Questions tagged [softmax]
66 questions
73
votes
6 answers
Cross-entropy loss explanation
Suppose I build a neural network for classification. The last layer is a dense layer with Softmax activation. I have five different classes to classify. Suppose for a single training example, the true label is [1 0 0 0 0] while the predictions be…
enterML
- 3,011
- 9
- 26
- 38
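A minimal NumPy sketch of the computation the question describes (the prediction vector is illustrative, not from the thread):

import numpy as np

y_true = np.array([1, 0, 0, 0, 0])            # one-hot true label from the question
y_pred = np.array([0.6, 0.1, 0.1, 0.1, 0.1])  # illustrative softmax output

# Cross-entropy: -sum(y_true * log(y_pred)); only the true-class term survives
loss = -np.sum(y_true * np.log(y_pred))
print(loss)  # -log(0.6) ≈ 0.51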
31
votes
4 answers
Gumbel-Softmax trick vs Softmax with temperature
From what I understand, the Gumbel-Softmax trick is a technique that enables us to sample discrete random variables, in a way that is differentiable (and therefore suited for end-to-end deep learning).
Many papers and articles describe it as a way…
4-bit
- 411
- 1
- 4
- 3
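A short NumPy sketch of the contrast the question draws (logits and temperature are illustrative): softmax with temperature is deterministic, while the Gumbel-Softmax perturbs the logits with Gumbel noise, so each call yields a different, differentiable near-one-hot sample.

import numpy as np

rng = np.random.default_rng(0)
logits = np.array([2.0, 1.0, 0.1])   # illustrative
tau = 0.5                            # temperature

def softmax(z):
    e = np.exp(z - z.max())          # shift for numerical stability
    return e / e.sum()

# Softmax with temperature: same output on every call
print(softmax(logits / tau))

# Gumbel-Softmax: add Gumbel(0, 1) noise, then apply the tempered softmax
g = -np.log(-np.log(rng.uniform(size=logits.shape)))
print(softmax((logits + g) / tau))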
5
votes
1 answer
Can I turn any binary classification algorithms into multiclass algorithms using softmax and cross-entropy loss?
Softmax + cross-entropy loss for multiclass classification is used in ML algorithms such as softmax regression and (the last layer of) neural networks. I wonder if this method could turn any binary classification algorithm into a multiclass one? For…
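One illustrative reading of the question, sketched as a one-vs-rest scheme: train one binary scorer per class and combine the raw scores with a softmax (the dataset and classifier choice here are assumptions, not from the question):

import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=300, n_classes=3, n_informative=4, random_state=0)

# One binary logistic classifier per class, scored against the rest
scores = np.column_stack([
    LogisticRegression().fit(X, (y == k).astype(int)).decision_function(X)
    for k in range(3)
])
scores -= scores.max(axis=1, keepdims=True)   # numerical stability
probs = np.exp(scores) / np.exp(scores).sum(axis=1, keepdims=True)
print(probs[:2], probs[:2].sum(axis=1))       # each row forms a distribution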
5
votes
5 answers
Do I need to standardize my one hot encoded labels?
I'm trying to do a simple softmax regression where I have features (2 columns) and a one-hot encoded vector of labels (two categories: Left = 1 and Right = 0). Do I need to standardize just the vector of features or also the vector of labels? When…
José Lucas Araújo dos Santos
- 51
- 1
- 2
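A minimal sketch of the common practice, assuming the goal is a softmax/logistic regression: standardize the feature columns only and leave the labels untouched (the data here is synthetic):

import numpy as np
from sklearn.preprocessing import StandardScaler

X = np.random.rand(100, 2)             # two feature columns, as in the question
y = np.random.randint(0, 2, size=100)  # Left = 1 / Right = 0 labels

X_std = StandardScaler().fit_transform(X)  # zero mean, unit variance per feature
# y is a target, not an input: standardizing it would destroy its class meaning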
5
votes
1 answer
What is the advantage of using Euler's number (e^x) instead of another base in the softmax equation?
I understand the softmax equation is
$\boldsymbol{P}(y=j \mid x)=\frac{e^{x_{j}}}{\sum_{k=1}^{K} e^{x_{k}}}$
My question is: why use $e^x$ instead of, say, $3^x$. I understand $e^x$ is its own derivative, but how is that advantageous in this…
Codedorf
- 53
- 4
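A quick numeric check of the relevant identity: since $3^x = e^{x \ln 3}$, changing the base is equivalent to rescaling the logits (a temperature change), so the base-3 softmax is reachable from the base-$e$ one (values are illustrative):

import numpy as np

x = np.array([1.0, 2.0, 3.0])   # illustrative logits

def softmax_base(x, b):
    p = b ** x
    return p / p.sum()

print(softmax_base(x, 3.0))                  # base 3
print(softmax_base(x * np.log(3.0), np.e))   # base e on scaled logits: identical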
3
votes
1 answer
Difference in performance Sigmoid vs. Softmax
For the same Binary Image Classification task, if in the final layer I use 1 node with Sigmoid activation function and binary_crossentropy loss function, then the training process goes through pretty smoothly (92% accuracy after 3 epochs on…
Eric Cartman
- 51
- 4
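A small sketch of why the two setups are mathematically equivalent: a single sigmoid unit equals the second entry of a two-unit softmax over logits $[0, z]$, so any observed performance gap is usually attributed to optimization details rather than the functions themselves (the logit value is illustrative):

import numpy as np

z = 1.3  # illustrative logit

sigmoid = 1 / (1 + np.exp(-z))

# Two-unit softmax over [0, z]: the second entry reproduces sigmoid(z)
logits = np.array([0.0, z])
soft = np.exp(logits) / np.exp(logits).sum()
print(sigmoid, soft[1])  # equal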
3
votes
3 answers
Dot product for similarity in word to vector computation in NLP
In NLP, while computing word vectors (word2vec) we try to maximize log(P(o|c)), where P(o|c) is the probability that o is the outside word, given that c is the center word.
$u_o$ is the word vector for the outside word
$v_c$ is the word vector for the center word
$T$ is the number of words in…
Vivek Dani
- 130
- 1
- 5
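A NumPy sketch of the quantity the question refers to, with the dot product feeding a softmax over the vocabulary (all sizes and vectors are illustrative):

import numpy as np

rng = np.random.default_rng(0)
V, d = 10, 4                 # vocabulary size and embedding dimension (illustrative)
U = rng.normal(size=(V, d))  # outside-word vectors u_o, one row per word
v_c = rng.normal(size=d)     # center-word vector

# P(o | c) = exp(u_o . v_c) / sum_w exp(u_w . v_c)
scores = U @ v_c
scores -= scores.max()       # numerical stability
p = np.exp(scores) / np.exp(scores).sum()
print(p.sum(), np.log(p[3])) # distribution sums to 1; log P(o=3 | c) is what gets maximized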
3
votes
1 answer
Softmax activation predictions not summing to 1
I am a beginner with RNNs; consider this sample code
from tensorflow import keras
import numpy as np

if __name__ == '__main__':
    model = keras.Sequential((
        keras.layers.SimpleRNN(5, activation="softmax", input_shape=(1, 3)),
    ))
…
user329387
- 31
- 1
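A minimal sketch of the usual fix, assuming the goal is a probability distribution over 5 classes: keep the RNN's default activation and put the softmax on a separate Dense head (shapes follow the question's code):

import numpy as np
from tensorflow import keras

model = keras.Sequential([
    keras.layers.SimpleRNN(5, input_shape=(1, 3)),  # default tanh activation
    keras.layers.Dense(5, activation="softmax"),    # probabilities live here
])
out = model.predict(np.random.rand(2, 1, 3))
print(out.sum(axis=1))  # each row sums to ~1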
3
votes
1 answer
Pytorch doing a cross entropy loss when the predictions already have probabilities
So, normally categorical cross-entropy could be applied using a cross-entropy loss function in PyTorch, or by combining a LogSoftmax with the negative log-likelihood function, as follows:
m = nn.LogSoftmax(dim=1)
loss = nn.NLLLoss()
pred =…
user3023715
- 203
- 2
- 5
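A sketch of two equivalent options when the targets are already probabilities (the logits and soft labels are illustrative); note that F.cross_entropy accepts class-probability targets in PyTorch 1.10 and later:

import torch
import torch.nn.functional as F

logits = torch.tensor([[1.2, 0.3, -0.5]])  # raw model outputs (illustrative)
target = torch.tensor([[0.7, 0.2, 0.1]])   # soft label, already a distribution

# Manual soft-label cross-entropy: -sum(q * log_softmax(logits))
log_p = F.log_softmax(logits, dim=1)
loss_manual = -(target * log_p).sum(dim=1).mean()

# Built-in equivalent (PyTorch >= 1.10)
loss_builtin = F.cross_entropy(logits, target)
print(loss_manual.item(), loss_builtin.item())  # identical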
3
votes
1 answer
Multiclass Classification with Decision Trees: Why do we calculate a score and apply softmax?
I'm trying to figure out why, when using decision trees for multi-class classification, it is common to calculate a score and apply softmax, instead of just taking the average of the terminal nodes' probabilities.
Let's say our model is two trees. A…
Caleb
- 141
- 1
- 3
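A toy sketch of the score-then-softmax scheme the question asks about, as used in boosted tree ensembles: per-class raw margins are summed across trees (they need not be probabilities), then softmax turns the sums into a distribution (values are illustrative):

import numpy as np

# Per-class raw scores from two trees (illustrative)
tree1 = np.array([0.3, -0.1, 0.2])
tree2 = np.array([0.1, 0.4, -0.2])
scores = tree1 + tree2          # boosting sums margins; it does not average probabilities

probs = np.exp(scores) / np.exp(scores).sum()
print(probs, probs.sum())       # a valid class distribution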
2
votes
0 answers
Precision-Recall Curve Intuition for Multi-Class Classification Utilizing SoftMax Activation
I am running a CNN image multi-class classification model with Keras/TensorFlow and have established about 90% overall accuracy with my best model trial. I have 10 unique classes I am trying to classify. However, I want to present a PRC for the…
Coldchain9
- 159
- 5
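A common one-vs-rest sketch for this situation: binarize the labels and draw one precision-recall curve per class from the softmax scores (the labels and scores here are random stand-ins):

import numpy as np
from sklearn.preprocessing import label_binarize
from sklearn.metrics import precision_recall_curve

n_classes = 10
y_true = np.random.randint(0, n_classes, size=200)           # stand-in labels
y_score = np.random.dirichlet(np.ones(n_classes), size=200)  # stand-in softmax outputs

Y = label_binarize(y_true, classes=range(n_classes))
for k in range(n_classes):
    precision, recall, _ = precision_recall_curve(Y[:, k], y_score[:, k])
    # plot or aggregate (e.g. micro-average) the per-class curve here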
2
votes
1 answer
Problem with chain rule in softmax layer when differentiated separately
I have some problems with backpropagation in the softmax output layer. I know how it should work, but if I try to apply the chain rule in the classical way, I get different results compared to when softmax is differentiated together with the cross-entropy error. Here's an…
Display name
- 153
- 1
- 4
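A numeric check of the result this question circles around: when softmax and cross-entropy are differentiated together, the gradient with respect to the logits collapses to $p - y$ (the logits and target are illustrative):

import numpy as np

z = np.array([0.5, -0.2, 1.0])  # illustrative logits
y = np.array([0.0, 1.0, 0.0])   # one-hot target

def softmax(q):
    e = np.exp(q - q.max())
    return e / e.sum()

loss = lambda q: -np.sum(y * np.log(softmax(q)))

# Closed form for the combined derivative
grad_closed = softmax(z) - y

# Finite-difference check of the same gradient
eps = 1e-6
grad_fd = np.array([
    (loss(z + eps * np.eye(3)[i]) - loss(z - eps * np.eye(3)[i])) / (2 * eps)
    for i in range(3)
])
print(np.allclose(grad_closed, grad_fd, atol=1e-5))  # True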
2
votes
1 answer
Why use different variations of Softmax in training and validation for neural networks with Pytorch?
Specifically, I'm working on a modeling project, and I see someone else's code that looks like
def forward(self, x):
    x = self.fc1(x)
    x = self.activation1(x)
    x = self.fc2(x)
    x = self.activation2(x)
    x = self.fc3(x)
    x =…
Anon
- 123
- 4
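A sketch of the convention that code like this usually follows: LogSoftmax pairs with NLLLoss for numerically stable training, while plain Softmax (or exponentiating the log-probabilities) gives readable probabilities at validation time (the tensors are illustrative):

import torch
import torch.nn as nn

logits = torch.randn(4, 3)      # stand-in for the fc3 output
targets = torch.tensor([0, 2, 1, 0])

# Training: LogSoftmax + NLLLoss (numerically stable)
log_probs = nn.LogSoftmax(dim=1)(logits)
loss = nn.NLLLoss()(log_probs, targets)

# Validation: plain Softmax for human-readable probabilities
probs = nn.Softmax(dim=1)(logits)
print(torch.allclose(probs, log_probs.exp()))  # True: same distribution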
2
votes
2 answers
Why do we use a softmax activation function in Convolutional Autoencoders?
I have been working on an image segmentation project where I have created a convolutional autoencoder. I saw this image and implemented it using Keras.
At the output layer, the author has used the softmax activation function. Shouldn't it be ReLU?…
Shubham Panchal
- 2,140
- 8
- 21
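A minimal sketch of why softmax appears there, assuming the output is a per-pixel class map: softmax over the channel axis makes every pixel a distribution over classes, which ReLU would not (the shapes and class count are illustrative):

import numpy as np
from tensorflow import keras

inp = keras.layers.Input(shape=(32, 32, 8))
out = keras.layers.Conv2D(3, 1, activation="softmax")(inp)  # 3 classes; softmax over channels
model = keras.Model(inp, out)

p = model.predict(np.random.rand(1, 32, 32, 8))
print(np.allclose(p.sum(axis=-1), 1.0))  # each pixel's class scores sum to 1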
2
votes
1 answer
What is the benefit of the exponential function inside softmax?
I know that softmax is:
$$\operatorname{softmax}(x)_i = \frac{e^{x_i}}{\sum_{j=1}^{n} e^{x_j}}$$
This is an $\mathbb{R}^n \to \mathbb{R}^n$ function, and the elements of the output add up to 1. I understand that the purpose of normalizing is to have elements of $x$…
Victor2748
- 123
- 4
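A quick numeric illustration of what the exponential buys over naive normalization: it maps negative logits to positive values and amplifies differences before normalizing (the logits are illustrative):

import numpy as np

x = np.array([-1.0, 0.0, 2.0])  # logits can be negative

# Naive linear normalization breaks: negative "probabilities"
print(x / x.sum())

# exp makes every term positive and magnifies gaps
e = np.exp(x)
print(e / e.sum())              # a valid distribution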