Softmax + Cross-Entropy Loss = Significant Gradient
This post proves that combining softmax with cross-entropy loss yields a gradient equal to the difference between the predicted probabilities and the true labels, which keeps the gradient significant and aids effective learning.
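As a quick illustration (my own check, not code from the post), the claimed identity — that the gradient of the cross-entropy loss with respect to the logits is softmax(z) minus the one-hot label y — can be verified numerically. The logits and label below are arbitrary example values.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())              # shift for numerical stability
    return e / e.sum()

def cross_entropy(z, y):
    return -np.sum(y * np.log(softmax(z)))

z = np.array([2.0, -1.0, 0.5])           # arbitrary logits
y = np.array([0.0, 1.0, 0.0])            # one-hot true label

analytic = softmax(z) - y                # the claimed gradient

# Central finite differences as an independent check.
eps = 1e-6
numeric = np.array([
    (cross_entropy(z + eps * e_i, y) - cross_entropy(z - eps * e_i, y)) / (2 * eps)
    for e_i in np.eye(len(z))
])

print(np.allclose(analytic, numeric, atol=1e-5))  # True
```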
-
A Minimalistic Introduction to einsum
This post introduces the essence of the Einstein summation convention (einsum).
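For readers new to the notation, a few standard np.einsum one-liners (my own examples, not taken from the post) show the convention: indices repeated across operands are summed over, and only the indices listed after -> are kept.

```python
import numpy as np

A = np.random.rand(3, 4)
B = np.random.rand(4, 5)
v = np.random.rand(4)
M = np.random.rand(4, 4)

np.einsum('ik,kj->ij', A, B)     # matrix product: k is repeated, so it is summed (A @ B)
np.einsum('ik,k->i', A, v)       # matrix-vector product (A @ v)
np.einsum('ii->', M)             # trace: sum of the diagonal (np.trace(M))
np.einsum('ij->ji', A)           # transpose: no summation, only index reordering (A.T)

X = np.random.rand(8, 3, 4)
Y = np.random.rand(8, 4, 5)
np.einsum('bik,bkj->bij', X, Y)  # batched matrix multiply: b is kept, k is summed (X @ Y)
```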
-
Query2Box
This article discusses the method proposed in the paper "Query2Box: Reasoning over Knowledge Graphs in Vector Space using Box Embeddings" by Ren et al. [1] and explains the intuition behind the defined box operations and distance metric.
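A rough NumPy sketch of the box distance as I read it from the paper: a box is parameterized by a center and a non-negative offset, and the distance from an entity embedding v to the box combines an outside part (how far v lies beyond the box boundary) with a down-weighted inside part. The function name, parameter names, and toy values below are illustrative, not from the post.

```python
import numpy as np

def box_distance(v, center, offset, alpha=0.2):
    """Distance from entity embedding v to the box (center, offset).

    dist = dist_outside + alpha * dist_inside, with 0 < alpha < 1 so that
    points already inside the box are penalized less than points outside.
    """
    q_min, q_max = center - offset, center + offset
    # Outside part: per-dimension distance from v to the nearest box boundary.
    dist_outside = np.maximum(v - q_max, 0.0) + np.maximum(q_min - v, 0.0)
    # Inside part: distance from the box center to v clipped onto the box.
    dist_inside = center - np.minimum(q_max, np.maximum(q_min, v))
    return np.abs(dist_outside).sum() + alpha * np.abs(dist_inside).sum()

# Toy 2-D box centered at the origin with half-width 1.
center, offset = np.zeros(2), np.ones(2)
print(box_distance(np.array([0.5, 0.5]), center, offset))  # inside the box: only the alpha term
print(box_distance(np.array([2.0, 0.0]), center, offset))  # outside: both terms contribute
```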
-
Graph Neural Networks and Electronic Health Records
This article summarizes major studies that have applied graph neural networks (GNN) to the analysis of electronic health records (EHR).
-
The Attention Interpretability Debate
This post summarizes three papers at the center of the debate on whether attention is interpretable.
- Attention is not explanation [1]
- Is attention interpretable? [4]
- Attention is not not explanation [5]
-
Integrated Gradients
The paper (Sundararajan, Taly et al. 2017) identifies two axioms, sensitivity and implementation invariance, that attribution methods ought to satisfy. Based on these axioms, the authors design a new attribution method called integrated gradients. The method requires no modification to the original model and needs only a few calls to the standard gradient operator.
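A minimal sketch of the idea (my own approximation, not code from the paper): integrated gradients attribute the output difference F(x) - F(x') by averaging the gradient of F along the straight line from a baseline x' to the input x and scaling by x - x'. The grad_fn argument and the toy quadratic function below are assumptions for illustration.

```python
import numpy as np

def integrated_gradients(grad_fn, x, baseline, steps=50):
    """Riemann-sum approximation of integrated gradients.

    grad_fn(z) must return dF/dz at z (same shape as z); only gradient
    calls are needed, and the model itself is never modified.
    """
    alphas = (np.arange(steps) + 0.5) / steps                  # midpoints of [0, 1]
    grads = np.stack([grad_fn(baseline + a * (x - baseline)) for a in alphas])
    return (x - baseline) * grads.mean(axis=0)                 # per-feature attribution

# Toy check on F(x) = sum(x**2), whose gradient is 2x.
x, baseline = np.array([1.0, 2.0, -3.0]), np.zeros(3)
attrib = integrated_gradients(lambda z: 2 * z, x, baseline)
print(attrib, attrib.sum())   # attributions sum to F(x) - F(baseline) = 14 (completeness)
```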