Questions tagged [cost-function]

74 questions
25 votes · 2 answers

Why do we have to divide by 2 in the ML squared error cost function?

I'm not sure why you need to multiply by $\frac{1}{2m}$ at the beginning. I understand that you would have to divide the whole sum by $m$, but why do we also divide by two? Is it because we have two $\theta$ here in the example?
Marton Langa
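
The usual answer, sketched in the standard squared-error notation (an assumption about the course material the asker is quoting): the $\frac{1}{2}$ is purely a convenience factor that cancels the 2 produced by differentiating the square, and scaling a cost by a positive constant never changes where its minimum lies.

```latex
J(\theta) = \frac{1}{2m} \sum_{i=1}^{m} \bigl( h_\theta(x^{(i)}) - y^{(i)} \bigr)^2
\quad\Longrightarrow\quad
\frac{\partial J}{\partial \theta_j}
  = \frac{1}{m} \sum_{i=1}^{m} \bigl( h_\theta(x^{(i)}) - y^{(i)} \bigr)\, x_j^{(i)}
```
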
24 votes · 3 answers

Python implementation of cost function in logistic regression: why dot multiplication in one expression but element-wise multiplication in another

I have a very basic question which relates to Python, numpy and multiplication of matrices in the setting of logistic regression. First, let me apologise for not using math notation. I am confused about the use of matrix dot multiplication versus…
GhostRider
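
A minimal numpy sketch (illustrative names, not the asker's code) of where each kind of product appears: the hypothesis needs a matrix–vector dot product, while the loss pairs each label with its own prediction element-wise before summing.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cost(theta, X, y):
    m = len(y)
    h = sigmoid(X.dot(theta))  # dot product: (m, n) @ (n,) -> one prediction per row
    # element-wise *: pair each label with its own prediction, then sum over examples
    return -np.sum(y * np.log(h) + (1 - y) * np.log(1 - h)) / m
```
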
23 votes · 1 answer

How does Gradient Descent and Backpropagation work together?

Please forgive me, as I am new to this. I have attached a diagram trying to model my understanding of neural networks and backpropagation. From videos on Coursera and resources online I formed the following understanding of how neural network…
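
A minimal runnable sketch of how the two fit together, using a hypothetical one-hidden-layer network on toy data: backpropagation computes the gradients, and gradient descent then uses those gradients to update the weights.

```python
import numpy as np

# Toy data: 100 examples, 3 features, binary targets.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = (X.sum(axis=1, keepdims=True) > 0).astype(float)

W1 = rng.normal(scale=0.1, size=(3, 4))   # input -> hidden
W2 = rng.normal(scale=0.1, size=(4, 1))   # hidden -> output
lr = 0.5

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

for step in range(1000):
    # Forward pass: compute predictions.
    a1 = sigmoid(X @ W1)
    y_hat = sigmoid(a1 @ W2)
    # Backpropagation: chain rule from the output layer back to the input.
    d2 = (y_hat - y) / len(X)            # dJ/dz2 for cross-entropy + sigmoid output
    dW2 = a1.T @ d2
    d1 = (d2 @ W2.T) * a1 * (1 - a1)     # dJ/dz1 through the hidden sigmoid
    dW1 = X.T @ d1
    # Gradient descent: step each weight matrix downhill along its gradient.
    W2 -= lr * dW2
    W1 -= lr * dW1
```
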
21 votes · 1 answer

What do "compile", "fit", and "predict" do in Keras sequential models?

I am a little confused about these parts of Keras sequential model functions. Can someone explain exactly what each one does? I mean, does compile do the forward pass and calculate the cost function, then pass it through fit to do the backward…
user3486308
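
A short sketch of the division of labour (X_train, y_train, and X_new are assumed placeholder arrays): compile only configures training, fit runs the training loop, and predict runs the forward pass alone.

```python
from tensorflow import keras

# Hypothetical data: X_train, y_train, X_new are assumed numpy arrays.
model = keras.Sequential([
    keras.layers.Dense(16, activation="relu", input_shape=(8,)),
    keras.layers.Dense(1, activation="sigmoid"),
])

# compile() only configures training (optimizer, loss, metrics); nothing runs yet.
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])

# fit() runs the training loop: forward pass, loss, backpropagation, weight updates.
model.fit(X_train, y_train, epochs=10, batch_size=32)

# predict() runs only the forward pass on new inputs.
y_pred = model.predict(X_new)
```
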
10 votes · 2 answers

What is the Time Complexity of Linear Regression?

I am working with linear regression and I would like to know its time complexity in big-O notation. The cost function of linear regression, without an optimisation algorithm such as gradient descent, needs to be computed over iterations of the…
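
For reference, the standard operation counts with $m$ examples and $n$ features (a general fact, not specific to the asker's setup):

```latex
% closed form (normal equation): forming X^\top X costs O(mn^2), inverting it O(n^3)
\theta = (X^\top X)^{-1} X^\top y \quad\Rightarrow\quad O(mn^2 + n^3)
% gradient descent: each step costs O(mn), so k iterations cost O(kmn)
```
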
10 votes · 1 answer

Cost function for Ordinal Regression using neural networks

What is the best cost function to train a neural network to perform ordinal regression, i.e. to predict a result whose value exists on an arbitrary scale where only the relative ordering between different values is significant (e.g: to predict…
xboard
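
One common approach (a cumulative or "threshold" encoding; a sketch, not the accepted answer): give the network $K-1$ sigmoid outputs trained with binary cross-entropy, where output $j$ predicts whether the true rank exceeds $j$.

```python
import numpy as np

def ordinal_targets(y, num_classes):
    """Encode rank k out of K classes as k ones followed by zeros.

    e.g. K = 4, y = 2 -> [1, 1, 0]. The network then has K - 1 sigmoid
    outputs trained with binary cross-entropy, and the predicted rank is
    the number of outputs above 0.5, which respects the ordering.
    """
    return (np.arange(num_classes - 1) < np.asarray(y)[:, None]).astype(float)

print(ordinal_targets([0, 2, 3], num_classes=4))
# [[0. 0. 0.]
#  [1. 1. 0.]
#  [1. 1. 1.]]
```
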
7 votes · 2 answers

What are the cases where it is fine to initialize all weights to zero

I've taken a few online courses in machine learning, and in general, the advice has been to choose random weights for a neural network to ensure that your neurons don't all learn the same thing, breaking symmetry. However, there were other cases…
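
A quick runnable illustration of the usual answer: with no hidden layer (e.g. logistic regression) there is no symmetry to break, because each weight already receives a distinct gradient from its own input feature, so all-zero initialization is fine.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
y = (X[:, 0] > 0).astype(float)

theta = np.zeros(3)                     # all-zero start
h = 1 / (1 + np.exp(-(X @ theta)))      # every prediction starts at 0.5 ...
grad = X.T @ (h - y) / len(y)           # ... but each weight gets its own gradient
print(grad)                             # distinct components: no symmetry to break
```
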
6 votes · 2 answers

Does an MLP always find a local minimum?

In linear regression we use a squared-error cost function, which is convex. In logistic regression we use a different cost function, because the squared-error cost is not convex when the hypothesis $h$ is the logistic function. We…
Green Falcon
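
The formulas the excerpt refers to were images that did not survive extraction; these are the standard cost functions in that notation (a reconstruction, not the original figures):

```latex
% squared-error cost (convex when h_\theta is linear):
J(\theta) = \frac{1}{2m} \sum_{i=1}^{m} \bigl( h_\theta(x^{(i)}) - y^{(i)} \bigr)^2
% logistic-regression cost (convex even when h_\theta is the sigmoid):
J(\theta) = -\frac{1}{m} \sum_{i=1}^{m}
  \Bigl[ y^{(i)} \log h_\theta(x^{(i)})
       + \bigl(1 - y^{(i)}\bigr) \log\bigl(1 - h_\theta(x^{(i)})\bigr) \Bigr]
```
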
5 votes · 2 answers

What is an intuitive explanation for the log loss cost function?

I would really appreciate it if someone could explain the log loss cost function and its use in measuring the performance of a classification model. I have read a few articles, but most of them concentrate on the mathematics rather than on an intuitive explanation…
Sai Kumar
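
A tiny runnable illustration of the intuition: log loss charges almost nothing for confident correct predictions and an exploding penalty for confident wrong ones.

```python
import numpy as np

def log_loss_single(y, p):
    # Penalty for predicting probability p when the true label is y.
    return -(y * np.log(p) + (1 - y) * np.log(1 - p))

print(log_loss_single(1, 0.9))   # ~0.105: confident and right -> small penalty
print(log_loss_single(1, 0.1))   # ~2.303: confident and wrong -> large penalty
```
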
4 votes · 2 answers

Cost sensitive learning and class balancing

I am facing a classification problem with classes that are really imbalanced (roughly 1% positive cases). In addition, the "cost" of a false negative (FN) is much higher than the cost of a false positive (FP). Given that, I decided to…
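
One common starting point (a sketch with assumed names and an assumed 50:1 cost ratio, not the asker's solution): pass per-class weights so that missing a positive costs more than a false alarm.

```python
from sklearn.linear_model import LogisticRegression

# Up-weight the rare, costly-to-miss positive class so a false negative
# contributes more to the loss than a false positive. The 50:1 ratio is an
# assumption; in practice it should reflect your real misclassification costs.
clf = LogisticRegression(class_weight={0: 1, 1: 50})
clf.fit(X_train, y_train)  # hypothetical training data
```
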
4 votes · 2 answers

Regularization for intercept parameter

Why is the regularization parameter not applied to the intercept parameter? From what I have read about the cost functions for Linear and Logistic regression, the regularization parameter (λ) is applied to all terms except the intercept. For…
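
In the usual notation, the penalty sum simply starts at $j = 1$: the intercept $\theta_0$ only shifts predictions up or down and does not contribute to model complexity, so shrinking it toward zero would just bias the fit.

```latex
J(\theta) = \frac{1}{2m} \sum_{i=1}^{m} \bigl( h_\theta(x^{(i)}) - y^{(i)} \bigr)^2
          + \frac{\lambda}{2m} \sum_{j=1}^{n} \theta_j^2
% the penalty sum starts at j = 1, so \theta_0 is never shrunk
```
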
4 votes · 2 answers

Using gradient descent instead of calculus: I checked previous questions, but there are still points to clarify

First of all I checked http://stats.stackexchange.com/questions/23128/solving-for-regression-parameters-in-closed-form-vs-gradient-descent, http://stackoverflow.com/questions/26804656/why-do-we-use-gradient-descent-in-linear-regression,…
J.Smith
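
Both options side by side in a minimal numpy sketch (illustrative, for ordinary least squares): the closed-form normal equation versus iterative gradient descent.

```python
import numpy as np

def normal_equation(X, y):
    # Closed-form ("calculus") solution: solve the n x n system X^T X theta = X^T y.
    # Exact in one shot, but the solve scales as O(n^3) and fails if X^T X is singular.
    return np.linalg.solve(X.T @ X, X.T @ y)

def gradient_descent(X, y, lr=0.1, steps=2000):
    # Iterative alternative: each step costs only O(mn).
    theta = np.zeros(X.shape[1])
    m = len(y)
    for _ in range(steps):
        theta -= lr * (X.T @ (X @ theta - y)) / m
    return theta

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.normal(size=200)
print(normal_equation(X, y))
print(gradient_descent(X, y))   # approaches the same coefficients
```
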
4 votes · 2 answers

XGBoost change loss function

I'm using XGBoost (through the sklearn API) and I'm trying to do binary classification. False positives are much worse for me than false negatives; how can I take this into account? The API confuses me a bit, and I found two arguments that might be…
cosec
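
One way to express asymmetric costs through the sklearn-style API (a sketch; the 5:1 factor and the data names are assumptions): up-weight the negative examples so false positives cost more.

```python
import numpy as np
from xgboost import XGBClassifier

# Hypothetical data: X_train, y_train are assumed arrays with 0/1 labels.
# If false positives are costlier, up-weight the NEGATIVE class so the booster
# pays a higher price for misclassifying it. The factor 5.0 is an assumption;
# tune it to your real FP/FN cost ratio.
weights = np.where(y_train == 0, 5.0, 1.0)
model = XGBClassifier()
model.fit(X_train, y_train, sample_weight=weights)
```
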
3 votes · 1 answer

Why does the MAE remain at all?

This may seem to be a silly question, but I just wonder why the MAE doesn't reduce to values close to 0. It's the result of an MLP with 2 hidden layers and 6 neurons per hidden layer, trying to estimate one output value depending on three input…
Turnvater
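
One standard reason the MAE plateaus above zero, stated as a formula: if the targets contain noise, no model can beat the noise's own absolute size.

```latex
% if y = f(x) + \varepsilon and \varepsilon has conditional median zero,
% the best achievable MAE is the irreducible noise term:
\min_g \; \mathbb{E}\,\lvert y - g(x) \rvert \;=\; \mathbb{E}\,\lvert \varepsilon \rvert
```
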
3 votes · 1 answer

Comparison between cost functions to determine the "best" model?

I'm building an LSTM neural net for time series prediction (regression) and I am incorporating custom loss functions into training. I'm trying to determine which cost function (of 3 cost functions) gives the "best" model, in other words, trying to…
PyRsquared
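
A common recipe, sketched with hypothetical names (build_lstm, candidate_losses, and the data splits are all placeholders): train one model per candidate loss, then score every model with the same held-out metric, so the comparison is not biased toward any training loss.

```python
from sklearn.metrics import mean_absolute_error

# build_lstm and candidate_losses are hypothetical: a model constructor and a
# dict mapping a name to each custom loss function under consideration.
results = {}
for name, loss_fn in candidate_losses.items():
    model = build_lstm(loss=loss_fn)
    model.fit(X_train, y_train, validation_data=(X_val, y_val), epochs=50)
    # Every model is judged by the SAME held-out metric, here MAE on a test split.
    results[name] = mean_absolute_error(y_test, model.predict(X_test))
print(sorted(results.items(), key=lambda kv: kv[1]))
```
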