Questions tagged [loss-function]

A function used to quantify the difference between observed data and predicted values according to a model. Minimization of loss functions is a way to estimate the parameters of the model.

526 questions
114 votes, 5 answers

Why do cost functions use the square error?

I'm just getting started with machine learning, and until now I have been dealing with linear regression over one variable. I have learnt that there is a hypothesis, which is $h_\theta(x)=\theta_0+\theta_1x$. To find good values for the…
Golo Roden
66 votes, 2 answers

Sparse_categorical_crossentropy vs categorical_crossentropy (keras, accuracy)

Which is better for accuracy, or are they the same? Of course, if you use categorical_crossentropy you use one-hot encoding, and if you use sparse_categorical_crossentropy you encode the labels as plain integers. Additionally, when is one better than the…
Master M
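The relationship described in this excerpt can be sketched in plain Python. This is only an illustration under stated assumptions, not the Keras implementation: the two helper functions and the probability vector are hypothetical, and they show that a one-hot target and an integer target yield the same loss value for the same prediction.

```python
import math

def categorical_crossentropy(one_hot, probs):
    """Cross-entropy against a one-hot target vector (illustrative helper)."""
    return -sum(t * math.log(p) for t, p in zip(one_hot, probs))

def sparse_categorical_crossentropy(class_index, probs):
    """Cross-entropy against an integer class label (illustrative helper)."""
    return -math.log(probs[class_index])

# Hypothetical predicted probabilities for a 3-class problem.
probs = [0.1, 0.7, 0.2]

loss_dense = categorical_crossentropy([0, 1, 0], probs)   # one-hot target
loss_sparse = sparse_categorical_crossentropy(1, probs)   # integer target
```

Under these assumptions the two values agree, so the choice mainly concerns how the labels are stored rather than what is optimized.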
44 votes, 2 answers

What does from_logits=True do in SparseCategoricalCrossentropy loss function?

The documentation says that y_pred must be in the range [-inf, inf] when from_logits=True. I don't understand what this means, since probabilities need to be in the range 0 to 1! Can someone please explain…
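A minimal sketch of the idea behind from_logits, written in plain Python rather than with Keras itself: the `softmax` and `sparse_ce` helpers and the example logits are hypothetical. With from_logits=True the inputs are treated as unbounded raw scores and converted to probabilities inside the loss; with from_logits=False they are assumed to be probabilities already.

```python
import math

def softmax(logits):
    """Map unbounded scores to probabilities that sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def sparse_ce(y_true, y_pred, from_logits=False):
    """Illustrative cross-entropy with an integer label (not the Keras code)."""
    probs = softmax(y_pred) if from_logits else y_pred
    return -math.log(probs[y_true])

logits = [2.0, -1.0, 0.5]  # hypothetical raw scores, any real values allowed

loss_from_logits = sparse_ce(0, logits, from_logits=True)
loss_from_probs = sparse_ce(0, softmax(logits), from_logits=False)
```

Both calls give the same loss in this sketch; the flag only tells the loss whether the softmax step still has to be applied.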
44 votes, 5 answers

Intuitive explanation of Noise Contrastive Estimation (NCE) loss?

I read about NCE (a form of candidate sampling) from these two sources: the Tensorflow writeup and the original paper. Can someone help me with the following: A simple explanation of how NCE works (I found the above difficult to parse and get an understanding…
tejaskhot
29 votes, 6 answers

L2 loss vs. mean squared loss

I see that some literature treats L2 loss (least squared error) and mean squared error loss as two different kinds of loss functions. However, it seems to me these two loss functions essentially compute the same thing (with a 1/n factor…
Edamame
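The 1/n relationship mentioned in the excerpt can be checked with a small worked example. This is a sketch under one assumption: "L2 loss" is taken here to mean the plain sum of squared errors, although some sources define it with a square root or a factor of 1/2. The function names and data are hypothetical.

```python
def l2_loss(y_true, y_pred):
    """Sum of squared errors (one common reading of 'L2 loss')."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred))

def mse_loss(y_true, y_pred):
    """Mean squared error: the same sum divided by the number of samples."""
    return l2_loss(y_true, y_pred) / len(y_true)

# Hypothetical targets and predictions.
y_true = [1.0, 2.0, 3.0, 4.0]
y_pred = [1.5, 2.0, 2.0, 5.0]

l2 = l2_loss(y_true, y_pred)    # 0.25 + 0 + 1 + 1
mse = mse_loss(y_true, y_pred)  # l2 / 4
```

Because the two differ only by a constant positive factor, they share the same minimizer; the factor matters mainly for the scale of gradients and for comparing losses across datasets of different size.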
27 votes, 2 answers

What is the advantage of using log softmax instead of softmax?

Are there any advantages to using log softmax over softmax? What are the reasons to choose one over the other?
rawwar
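One commonly cited reason is numerical stability. The sketch below, with hypothetical helper names and deliberately extreme logits, compares a log softmax computed through the log-sum-exp trick with a naive "softmax, then log" version in plain Python.

```python
import math

def log_softmax(logits):
    """Log softmax via the log-sum-exp trick (stable for large inputs)."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return [z - log_sum_exp for z in logits]

def naive_log_softmax(logits):
    """Softmax followed by log; exp() can overflow for large logits."""
    exps = [math.exp(z) for z in logits]
    total = sum(exps)
    return [math.log(e / total) for e in exps]

logits = [1000.0, 0.0]  # hypothetical, chosen to stress the naive version

stable = log_softmax(logits)

try:
    naive_log_softmax(logits)
    naive_failed = False
except (OverflowError, ValueError):
    naive_failed = True
```

In this example the stable version returns finite values while the naive version fails on the exponential, which is one argument for computing the log probabilities directly.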
19 votes, 2 answers

Parameterization regression of rotation angle

Let's say I have a top-down picture of an arrow, and I want to predict the angle this arrow makes. This would be between $0$ and $360$ degrees, or between $0$ and $2\pi$. The problem is that this target is circular: $0$ and $360$ degrees are exactly…
17 votes, 2 answers

Custom loss function with additional parameter in Keras

I'm looking for a way to create a loss function that looks like this: The function should then maximize the reward. Is this possible to achieve in Keras? Any suggestions on how this can be achieved are highly appreciated. def…
Nickpick
14 votes, 3 answers

Keras Sequential model returns loss 'nan'

I'm implementing a neural network with Keras, but the Sequential model returns nan as the loss value. I have a sigmoid activation function in the output layer to squeeze the output between 0 and 1, but maybe it doesn't work properly. This is the code: def…
pairon
14 votes, 3 answers

Why is there a $2$ in the denominator of the mean squared error function?

In the famous Deep Learning Book, in chapter 1, equation 6, the Quadratic Cost (or Mean Squared Error) in a neural network is defined as $C(w, b) = \frac{1}{2n}\sum_{x}\|y(x)-a\|^2$ where $w$ is the set of all weights and $b$ the set of all…
Silas Berger
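The factor of 2 in this cost is often explained through differentiation. As a sketch, taking a single training input $x$ and treating the output $a$ as the variable (an assumption made here only for illustration):

$$\nabla_a \left( \frac{1}{2}\|y(x)-a\|^2 \right) = a - y(x)$$

The $\frac{1}{2}$ cancels the 2 produced by the power rule, so the gradient has no stray constant. The factor rescales the cost but does not move its minimizer.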
13 votes, 2 answers

Interpreting the Root Mean Squared Error (RMSE)!

I read all about the pros and cons of RMSE vs. other absolute errors, namely mean absolute error (MAE). See the following references: MAE and RMSE — Which Metric is Better? What's the bottom line? How to compare models Or this nice blogpost, or this…
TwinPenguins
13 votes, 3 answers

Tensorflow Adjusting Cost Function for Imbalanced Data

I have a classification problem with highly imbalanced data. I have read that over- and undersampling, as well as changing the cost for underrepresented categorical outputs, will lead to better fitting. Before this was done, tensorflow would categorize…
Cole
10 votes, 1 answer

What is the difference between the SGD classifier and logistic regression?

To my understanding, the SGD classifier and logistic regression seem similar. An SGD classifier with loss = 'log' implements logistic regression, and loss = 'hinge' implements a linear SVM. I also understand that logistic regression uses gradient…
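The two loss settings named in this excerpt correspond to different per-sample penalties on the same linear score. The sketch below is plain Python, not scikit-learn code; the function names and margin values are hypothetical, and labels are assumed to be encoded as -1 or +1 so that the margin is y * f(x).

```python
import math

def log_loss(margin):
    """Logistic loss for margin = y * f(x), with y in {-1, +1}."""
    return math.log(1.0 + math.exp(-margin))

def hinge_loss(margin):
    """Hinge loss for the same margin, as used by a linear SVM."""
    return max(0.0, 1.0 - margin)

# Hypothetical margins: misclassified, on the boundary, correct, very correct.
margins = [-1.0, 0.0, 1.0, 2.0]

log_losses = [log_loss(m) for m in margins]
hinge_losses = [hinge_loss(m) for m in margins]
```

The hinge loss becomes exactly zero once the margin reaches 1, while the logistic loss stays positive and only shrinks. In this view the model is determined by the loss, and stochastic gradient descent is just one way of minimizing it.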
10 votes, 2 answers

Validation showing huge fluctuations. What could be the cause?

I'm training a CNN for a 3-class image classification problem. My training loss decreased smoothly, which is the expected behaviour. However, my validation loss shows a lot of fluctuation. Is this something that I should be worried about, or should…
Josh
9 votes, 1 answer

XGBoost custom objective for regression in R

I implemented a custom objective and metric for an xgboost regression. In order to see if I'm doing this correctly, I started with a quadratic loss. The implementation seems to work well, but I cannot reproduce the results from a standard…
Peter