Softmax + Cross-Entropy Loss = Significant Gradient
This post proves that the combination of softmax and cross-entropy loss ensures a significant gradient: the gradient with respect to the logits is simply the difference between the predicted probabilities and the true labels, which aids effective learning.
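To see the claim concretely, here is a minimal NumPy sketch (not code from the post; the helper names `softmax` and `cross_entropy` are illustrative) that checks the analytic gradient p - y against a finite-difference estimate of the loss:

```python
import numpy as np

def softmax(z):
    # Numerically stable softmax.
    e = np.exp(z - z.max())
    return e / e.sum()

def cross_entropy(z, y):
    # Cross-entropy between the one-hot label y and softmax(z).
    return -np.sum(y * np.log(softmax(z)))

z = np.array([2.0, -1.0, 0.5])   # logits
y = np.array([0.0, 1.0, 0.0])    # one-hot label

analytic = softmax(z) - y        # claimed gradient: p - y

# Central finite-difference check of d(loss)/d(z_i).
eps = 1e-6
numeric = np.array([
    (cross_entropy(z + eps * np.eye(3)[i], y) -
     cross_entropy(z - eps * np.eye(3)[i], y)) / (2 * eps)
    for i in range(3)
])

print(np.allclose(analytic, numeric, atol=1e-5))  # True
```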

A Minimalistic Introduction to einsum
This post introduces the essence of the Einstein summation convention (einsum).
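As a taste of the convention (a generic NumPy illustration, not an excerpt from the post), einsum spells out which indices are multiplied together and which are summed away:

```python
import numpy as np

A = np.random.rand(2, 3)
B = np.random.rand(3, 4)

# 'ik,kj->ij': multiply entries that share index k and sum k out,
# i.e. ordinary matrix multiplication.
C = np.einsum('ik,kj->ij', A, B)
print(np.allclose(C, A @ B))  # True

# An index missing from the output is summed out entirely:
# 'ii->' walks the diagonal and sums it, i.e. the trace.
M = np.random.rand(5, 5)
print(np.isclose(np.einsum('ii->', M), np.trace(M)))  # True
```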

Query2Box
This article discusses the method of the paper "Query2Box: Reasoning over Knowledge Graphs in Vector Space using Box Embeddings" by Ren et al. [1] and explains the intuition behind the defined box operations and distance metrics.
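As a rough illustration of the kind of distance the paper defines (a hedged NumPy sketch of the entity-to-box distance as I read it from [1], with `alpha` downweighting the inside part; the paper's exact formulation may differ):

```python
import numpy as np

def box_distance(entity, center, offset, alpha=0.2):
    """Distance from an entity embedding to a query box (center, offset >= 0).

    Splits into an outside part (how far the entity lies beyond the box
    boundary) and a downweighted inside part (how far the clipped point
    lies from the box center).
    """
    q_max = center + offset   # upper corner of the box
    q_min = center - offset   # lower corner of the box

    # Distance from the entity to the box surface (zero if inside).
    dist_outside = np.linalg.norm(
        np.maximum(entity - q_max, 0) + np.maximum(q_min - entity, 0), ord=1)

    # Distance from the clipped entity to the box center (bounded by offset).
    clipped = np.minimum(q_max, np.maximum(q_min, entity))
    dist_inside = np.linalg.norm(center - clipped, ord=1)

    return dist_outside + alpha * dist_inside

# A point inside the box only incurs the downweighted inside distance.
print(box_distance(np.array([0.1, 0.2]), np.array([0.0, 0.0]), np.array([0.5, 0.5])))
```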

Graph Neural Networks and Electronic Health Records
This article summarizes major studies that apply graph neural networks (GNNs) to the analysis of electronic health records (EHRs).

The Attention Interpretability Debate
This post summarizes three papers at the center of the debate on whether attention is interpretable:
- Attention is not explanation [1]
- Is attention interpretable? [4]
- Attention is not not explanation [5]

Integrated Gradients
The paper (Sundararajan, Taly et al. 2017) identifies two axioms, sensitivity and implementation invariance, that attribution methods ought to satisfy. Based on these axioms, the authors designed a new attribution method called integrated gradients. The method requires no modification to the original model and needs only a few calls to the standard gradient operator.
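To illustrate why a few gradient calls suffice (a minimal NumPy sketch, not the authors' code; the toy model `F`, its gradient, and the zero baseline are assumptions for illustration), integrated gradients can be approximated with a Riemann sum along the straight-line path from the baseline to the input:

```python
import numpy as np

def F(x):
    # Toy differentiable "model": a smooth scalar function of the input.
    return np.tanh(x).sum()

def grad_F(x):
    # Standard gradient operator for the toy model.
    return 1.0 - np.tanh(x) ** 2

def integrated_gradients(x, baseline, steps=50):
    """Riemann-sum approximation of integrated gradients.

    IG_i(x) = (x_i - baseline_i) * integral_0^1 dF(baseline + a*(x - baseline))/dx_i da
    """
    alphas = (np.arange(steps) + 0.5) / steps  # midpoints in (0, 1)
    grads = np.stack([grad_F(baseline + a * (x - baseline)) for a in alphas])
    return (x - baseline) * grads.mean(axis=0)

x = np.array([1.0, -2.0, 0.5])
baseline = np.zeros_like(x)  # a common choice of baseline

attributions = integrated_gradients(x, baseline)
# Completeness check: attributions should sum to F(x) - F(baseline).
print(attributions.sum(), F(x) - F(baseline))
```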