
I'm learning the usage of torch.nn.NLLLoss() and torch.nn.LogSoftmax(), and I'm confused about the results of them.

For example:

import torch
import torch.nn as nn

lsm = torch.nn.LogSoftmax(dim=-1)
nll = nn.NLLLoss()

grnd_truth = torch.tensor([1])
# let's say it predicted correctly!!!
raw_logits = torch.tensor([[0.3665, 0.5542, -1.0306]])

logsoftmax = lsm(raw_logits)
final_loss = nll(logsoftmax, grnd_truth)

logsoftmax:
>>> tensor([[-0.8976, -0.7099, -2.2947]])
final_loss:
>>> tensor(0.7099)    # Is it normal to get such a large loss when the prediction is correct?

# let's try an incorrect prediction
grnd_truth = torch.tensor([0])
nll(lsm(raw_logits), grnd_truth)
>>> tensor(0.8976)    # Shouldn't this be much larger?

As you can see, the loss for the mis-classified case is barely larger than the loss for the correctly classified case; the two values are almost the same.
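If I understand the formula correctly, the loss is just the negative log of the predicted probability of the target class, so I also tried reproducing the numbers by hand (my own rough check of the same example):

import torch

raw_logits = torch.tensor([[0.3665, 0.5542, -1.0306]])

# turn the logits into probabilities
probs = torch.softmax(raw_logits, dim=-1)
# probs ≈ tensor([[0.4076, 0.4917, 0.1008]])

# negative log-probability of the target class
print(-torch.log(probs[0, 1]))   # ≈ 0.7099  (the "correct" class 1)
print(-torch.log(probs[0, 0]))   # ≈ 0.8976  (the "incorrect" class 0)

So the "correct" prediction only has probability ≈ 0.49, which seems to be why its loss is still around 0.71 and why the two losses end up so close.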

I read the docs of NLLLoss, and I know its formula. I assume the designers had their reasons for defining it this way, but I haven't understood them yet.

In fact, the reason I'm posting this question is that I ran into trouble training a three-class classification model with NLLLoss() and LogSoftmax().

Isn't the goal of a loss function to make the loss as small as possible when the prediction is right, and as large as possible when it is wrong?

EvilRoach

1 Answer


The equivalent loss in TensorFlow is Categorical Crossentropy or Sparse Categorical Crossentropy:

https://www.tensorflow.org/api_docs/python/tf/keras/losses/CategoricalCrossentropy

https://www.tensorflow.org/api_docs/python/tf/keras/losses/SparseCategoricalCrossentropy
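For example, something like this should reproduce the numbers from your question using the sparse variant (integer targets, and from_logits=True so the raw logits can be passed directly); treat it as a sketch rather than a drop-in replacement:

import tensorflow as tf

raw_logits = [[0.3665, 0.5542, -1.0306]]

# sparse variant: integer class index as target, raw logits as input
loss_fn = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)
print(loss_fn([1], raw_logits).numpy())   # ≈ 0.7099, same as NLLLoss(LogSoftmax(...))
print(loss_fn([0], raw_logits).numpy())   # ≈ 0.8976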

Maybe their documentation describes the aim of the formula better.

As far as I know, it is mainly used for multi-class classification on large datasets, with either one-hot labels or integer targets.

https://github.com/christianversloot/machine-learning-articles/blob/main/how-to-use-sparse-categorical-crossentropy-in-keras.md

Note: PyTorch's NLLLoss expects log probabilities as its input, whereas the TensorFlow losses don't: they take probabilities, or raw logits when from_logits=True.
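This is also why, in PyTorch, nn.CrossEntropyLoss takes raw logits and applies log_softmax internally, giving the same value as the NLLLoss + LogSoftmax combination in your question:

import torch
import torch.nn as nn

raw_logits = torch.tensor([[0.3665, 0.5542, -1.0306]])
target = torch.tensor([1])

# CrossEntropyLoss = LogSoftmax followed by NLLLoss, applied to raw logits
print(nn.CrossEntropyLoss()(raw_logits, target))                 # tensor(0.7099)
print(nn.NLLLoss()(nn.LogSoftmax(dim=-1)(raw_logits), target))   # tensor(0.7099)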

Nicolas Martin