Questions tagged [softmax]
66 questions
73
votes
6 answers
Cross-entropy loss explanation
Suppose I build a neural network for classification. The last layer is a dense layer with Softmax activation. I have five different classes to classify. Suppose for a single training example, the true label is [1 0 0 0 0] while the predictions be…
enterML
- 3,011
- 9
- 26
- 38
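A minimal NumPy sketch of the computation the question describes (the prediction vector is illustrative, not from the thread):

import numpy as np

y_true = np.array([1, 0, 0, 0, 0])            # one-hot true label from the question
y_pred = np.array([0.6, 0.1, 0.1, 0.1, 0.1])  # illustrative softmax output

# Cross-entropy: -sum(y_true * log(y_pred)); only the true-class term survives
loss = -np.sum(y_true * np.log(y_pred))
print(loss)  # -log(0.6) ≈ 0.51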
31
votes
4 answers
Gumbel-Softmax trick vs Softmax with temperature
From what I understand, the Gumbel-Softmax trick is a technique that enables us to sample discrete random variables, in a way that is differentiable (and therefore suited for end-to-end deep learning).
Many papers and articles describe it as a way…
4-bit
- 411
- 1
- 4
- 3
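A short NumPy sketch of the contrast the question draws (logits and temperature are illustrative): softmax with temperature is deterministic, while the Gumbel-Softmax perturbs the logits with Gumbel noise, so each call yields a different, differentiable near-one-hot sample.

import numpy as np

rng = np.random.default_rng(0)
logits = np.array([2.0, 1.0, 0.1])   # illustrative
tau = 0.5                            # temperature

def softmax(z):
    e = np.exp(z - z.max())          # shift for numerical stability
    return e / e.sum()

# Softmax with temperature: same output on every call
print(softmax(logits / tau))

# Gumbel-Softmax: add Gumbel(0, 1) noise, then apply the tempered softmax
g = -np.log(-np.log(rng.uniform(size=logits.shape)))
print(softmax((logits + g) / tau))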
5
votes
1 answer
Can I turn any binary classification algorithms into multiclass algorithms using softmax and cross-entropy loss?
Softmax + cross-entropy loss for multiclass classification is used in ML algorithms such as softmax regression and (the last layer of) neural networks. I wonder if this method could turn any binary classification algorithm into a multiclass one? For…
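One illustrative reading of the question, sketched as a one-vs-rest scheme: train one binary scorer per class and combine the raw scores with a softmax (the dataset and classifier choice here are assumptions, not from the question):

import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=300, n_classes=3, n_informative=4, random_state=0)

# One binary logistic classifier per class, scored against the rest
scores = np.column_stack([
    LogisticRegression().fit(X, (y == k).astype(int)).decision_function(X)
    for k in range(3)
])
scores -= scores.max(axis=1, keepdims=True)   # numerical stability
probs = np.exp(scores) / np.exp(scores).sum(axis=1, keepdims=True)
print(probs[:2], probs[:2].sum(axis=1))       # each row forms a distribution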
5
votes
5 answers
Do I need to standardize my one hot encoded labels?
I'm trying to do a simple softmax regression where I have features (2 columns) and a one-hot encoded vector of labels (two categories: Left = 1 and Right = 0). Do I need to standardize just the vector of features or also the vector of labels? When…
José Lucas Araújo dos Santos
- 51
- 1
- 2
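A minimal sketch of the common practice, assuming the goal is a softmax/logistic regression: standardize the feature columns only and leave the labels untouched (the data here is synthetic):

import numpy as np
from sklearn.preprocessing import StandardScaler

X = np.random.rand(100, 2)             # two feature columns, as in the question
y = np.random.randint(0, 2, size=100)  # Left = 1 / Right = 0 labels

X_std = StandardScaler().fit_transform(X)  # zero mean, unit variance per feature
# y is a target, not an input: standardizing it would destroy its class meaning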
5
votes
1 answer
What is the advantage of using Euler's number (e^x) instead of another base in the softmax equation?
I understand the softmax equation is
$\boldsymbol{P}(y=j \mid x)=\frac{e^{x_{j}}}{\sum_{k=1}^{K} e^{x_{k}}}$
My question is: why use $e^x$ instead of, say, $3^x$. I understand $e^x$ is its own derivative, but how is that advantageous in this…
Codedorf
- 53
- 4
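A quick numeric check of the relevant identity: since $3^x = e^{x \ln 3}$, changing the base is equivalent to rescaling the logits (a temperature change), so the base-3 softmax is reachable from the base-$e$ one (values are illustrative):

import numpy as np

x = np.array([1.0, 2.0, 3.0])   # illustrative logits

def softmax_base(x, b):
    p = b ** x
    return p / p.sum()

print(softmax_base(x, 3.0))                  # base 3
print(softmax_base(x * np.log(3.0), np.e))   # base e on scaled logits: identical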
3
votes
1 answer
Difference in performance Sigmoid vs. Softmax
For the same Binary Image Classification task, if in the final layer I use 1 node with Sigmoid activation function and binary_crossentropy loss function, then the training process goes through pretty smoothly (92% accuracy after 3 epochs on…
Eric Cartman
- 51
- 4
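A small sketch of why the two setups are mathematically equivalent: a single sigmoid unit equals the second entry of a two-unit softmax over logits $[0, z]$, so any observed performance gap is usually attributed to optimization details rather than the functions themselves (the logit value is illustrative):

import numpy as np

z = 1.3  # illustrative logit

sigmoid = 1 / (1 + np.exp(-z))

# Two-unit softmax over [0, z]: the second entry reproduces sigmoid(z)
logits = np.array([0.0, z])
soft = np.exp(logits) / np.exp(logits).sum()
print(sigmoid, soft[1])  # equal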
3
votes
3 answers
Dot product for similarity in word to vector computation in NLP
In NLP, while computing word vectors (word2vec) we try to maximize log(P(o|c)), where P(o|c) is the probability that o is the outside word, given that c is the center word.
$u_o$ is the word vector for the outside word
$v_c$ is the word vector for the center word
$T$ is the number of words in…
Vivek Dani
- 130
- 1
- 5
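A NumPy sketch of the quantity the question refers to, with the dot product feeding a softmax over the vocabulary (all sizes and vectors are illustrative):

import numpy as np

rng = np.random.default_rng(0)
V, d = 10, 4                 # vocabulary size and embedding dimension (illustrative)
U = rng.normal(size=(V, d))  # outside-word vectors u_o, one row per word
v_c = rng.normal(size=d)     # center-word vector

# P(o | c) = exp(u_o . v_c) / sum_w exp(u_w . v_c)
scores = U @ v_c
scores -= scores.max()       # numerical stability
p = np.exp(scores) / np.exp(scores).sum()
print(p.sum(), np.log(p[3])) # distribution sums to 1; log P(o=3 | c) is what gets maximized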
3
votes
1 answer
Softmax activation predictions not summing to 1
I am a beginner with RNNs; consider this sample code
from tensorflow import keras
import numpy as np

if __name__ == '__main__':
    model = keras.Sequential((
        keras.layers.SimpleRNN(5, activation="softmax", input_shape=(1, 3)),
    ))
…
user329387
- 31
- 1
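A minimal sketch of the usual fix, assuming the goal is a probability distribution over 5 classes: keep the RNN's default activation and put the softmax on a separate Dense head (shapes follow the question's code):

import numpy as np
from tensorflow import keras

model = keras.Sequential([
    keras.layers.SimpleRNN(5, input_shape=(1, 3)),  # default tanh activation
    keras.layers.Dense(5, activation="softmax"),    # probabilities live here
])
out = model.predict(np.random.rand(2, 1, 3))
print(out.sum(axis=1))  # each row sums to ~1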
3
votes
1 answer
Pytorch doing a cross entropy loss when the predictions already have probabilities
So, normally categorical cross-entropy could be applied using a cross-entropy loss function in PyTorch, or by combining a LogSoftmax with the negative log-likelihood function, as follows:
m = nn.LogSoftmax(dim=1)
loss = nn.NLLLoss()
pred =…
user3023715
- 203
- 2
- 5
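A sketch of two equivalent options when the targets are already probabilities (the logits and soft labels are illustrative); note that F.cross_entropy accepts class-probability targets in PyTorch 1.10 and later:

import torch
import torch.nn.functional as F

logits = torch.tensor([[1.2, 0.3, -0.5]])  # raw model outputs (illustrative)
target = torch.tensor([[0.7, 0.2, 0.1]])   # soft label, already a distribution

# Manual soft-label cross-entropy: -sum(q * log_softmax(logits))
log_p = F.log_softmax(logits, dim=1)
loss_manual = -(target * log_p).sum(dim=1).mean()

# Built-in equivalent (PyTorch >= 1.10)
loss_builtin = F.cross_entropy(logits, target)
print(loss_manual.item(), loss_builtin.item())  # identical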
3
votes
1 answer
Multiclass Classification with Decision Trees: Why do we calculate a score and apply softmax?
I'm trying to figure out why, when using decision trees for multi-class classification, it is common to calculate a score and apply softmax, instead of just taking the average of the terminal nodes' probabilities.
Let's say our model is two trees. A…
Caleb
- 141
- 1
- 3
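A toy sketch of the score-then-softmax scheme the question asks about, as used in boosted tree ensembles: per-class raw margins are summed across trees (they need not be probabilities), then softmax turns the sums into a distribution (values are illustrative):

import numpy as np

# Per-class raw scores from two trees (illustrative)
tree1 = np.array([0.3, -0.1, 0.2])
tree2 = np.array([0.1, 0.4, -0.2])
scores = tree1 + tree2          # boosting sums margins; it does not average probabilities

probs = np.exp(scores) / np.exp(scores).sum()
print(probs, probs.sum())       # a valid class distribution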
2
votes
0 answers
Precision-Recall Curve Intuition for Multi-Class Classification Utilizing SoftMax Activation
I am running a CNN image multi-class classification model with Keras/TensorFlow and have established about 90% overall accuracy with my best model trial. I have 10 unique classes I am trying to classify. However, I want to present a PRC for the…
Coldchain9
- 159
- 5
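A common one-vs-rest sketch for this situation: binarize the labels and draw one precision-recall curve per class from the softmax scores (the labels and scores here are random stand-ins):

import numpy as np
from sklearn.preprocessing import label_binarize
from sklearn.metrics import precision_recall_curve

n_classes = 10
y_true = np.random.randint(0, n_classes, size=200)           # stand-in labels
y_score = np.random.dirichlet(np.ones(n_classes), size=200)  # stand-in softmax outputs

Y = label_binarize(y_true, classes=range(n_classes))
for k in range(n_classes):
    precision, recall, _ = precision_recall_curve(Y[:, k], y_score[:, k])
    # plot or aggregate (e.g. micro-average) the per-class curve here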
2
votes
1 answer
Problem with chain rule in softmax layer when differentiated separately
I have some problems with backpropagation in the softmax output layer. I know how it should work, but if I try to apply the chain rule in the classical way, I get different results compared to when softmax is differentiated together with the cross-entropy error. Here's an…
Display name
- 153
- 1
- 4
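A numeric check of the result this question circles around: when softmax and cross-entropy are differentiated together, the gradient with respect to the logits collapses to $p - y$ (the logits and target are illustrative):

import numpy as np

z = np.array([0.5, -0.2, 1.0])  # illustrative logits
y = np.array([0.0, 1.0, 0.0])   # one-hot target

def softmax(q):
    e = np.exp(q - q.max())
    return e / e.sum()

loss = lambda q: -np.sum(y * np.log(softmax(q)))

# Closed form for the combined derivative
grad_closed = softmax(z) - y

# Finite-difference check of the same gradient
eps = 1e-6
grad_fd = np.array([
    (loss(z + eps * np.eye(3)[i]) - loss(z - eps * np.eye(3)[i])) / (2 * eps)
    for i in range(3)
])
print(np.allclose(grad_closed, grad_fd, atol=1e-5))  # True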
2
votes
1 answer
Why use different variations of Softmax in training and validation for neural networks with Pytorch?
Specifically, I'm working on a modeling project, and I see someone else's code that looks like
def forward(self, x):
    x = self.fc1(x)
    x = self.activation1(x)
    x = self.fc2(x)
    x = self.activation2(x)
    x = self.fc3(x)
    x =…
Anon
- 123
- 4
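A sketch of the convention that code like this usually follows: LogSoftmax pairs with NLLLoss for numerically stable training, while plain Softmax (or exponentiating the log-probabilities) gives readable probabilities at validation time (the tensors are illustrative):

import torch
import torch.nn as nn

logits = torch.randn(4, 3)      # stand-in for the fc3 output
targets = torch.tensor([0, 2, 1, 0])

# Training: LogSoftmax + NLLLoss (numerically stable)
log_probs = nn.LogSoftmax(dim=1)(logits)
loss = nn.NLLLoss()(log_probs, targets)

# Validation: plain Softmax for human-readable probabilities
probs = nn.Softmax(dim=1)(logits)
print(torch.allclose(probs, log_probs.exp()))  # True: same distribution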
2
votes
2 answers
Why do we use a softmax activation function in Convolutional Autoencoders?
I have been working on an image segmentation project where I have created a convolutional autoencoder. I saw this image and implemented it using Keras.
At the output layer, the author has used the softmax activation function. Shouldn't it be ReLU?…
Shubham Panchal
- 2,140
- 8
- 21
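A minimal sketch of why softmax appears there, assuming the output is a per-pixel class map: softmax over the channel axis makes every pixel a distribution over classes, which ReLU would not (the shapes and class count are illustrative):

import numpy as np
from tensorflow import keras

inp = keras.layers.Input(shape=(32, 32, 8))
out = keras.layers.Conv2D(3, 1, activation="softmax")(inp)  # 3 classes; softmax over channels
model = keras.Model(inp, out)

p = model.predict(np.random.rand(1, 32, 32, 8))
print(np.allclose(p.sum(axis=-1), 1.0))  # each pixel's class scores sum to 1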
2
votes
1 answer
What is the benefit of the exponential function inside softmax?
I know that softmax is:
$$\operatorname{softmax}(x)_i = \frac{e^{x_i}}{\sum_{j=1}^{n} e^{x_j}}$$
This is an $\mathbb{R}^n \to \mathbb{R}^n$ function, and the elements of the output add up to 1. I understand that the purpose of normalizing is to have elements of $x$…
Victor2748
- 123
- 4
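A quick numeric illustration of what the exponential buys over naive normalization: it maps negative logits to positive values and amplifies differences before normalizing (the logits are illustrative):

import numpy as np

x = np.array([-1.0, 0.0, 2.0])  # logits can be negative

# Naive linear normalization breaks: negative "probabilities"
print(x / x.sum())

# exp makes every term positive and magnifies gaps
e = np.exp(x)
print(e / e.sum())              # a valid distribution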