Questions tagged [cost-function]
74 questions
25 votes · 2 answers
Why do we have to divide by 2 in the ML squared error cost function?
I'm not sure why you need to multiply by $\frac{1}{2m}$ at the beginning. I understand that you would have to divide the whole sum by $m$, but why do we have to multiply $m$ by two?
Is it because we have two $\theta$ parameters in the example?
Marton Langa · 353
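(For context, the usual answer in sketch form: the $\frac{1}{2}$ is purely a convenience; it cancels the $2$ produced by differentiating the square, so the gradient comes out clean.)

```latex
J(\theta) = \frac{1}{2m}\sum_{i=1}^{m}\bigl(h_\theta(x^{(i)}) - y^{(i)}\bigr)^2
\quad\Rightarrow\quad
\frac{\partial J}{\partial \theta_j} = \frac{1}{m}\sum_{i=1}^{m}\bigl(h_\theta(x^{(i)}) - y^{(i)}\bigr)\,x_j^{(i)}
```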
24 votes · 3 answers
Python implementation of cost function in logistic regression: why dot multiplication in one expression but element-wise multiplication in another
I have a very basic question which relates to Python, numpy and multiplication of matrices in the setting of logistic regression.
First, let me apologise for not using math notation.
I am confused about the use of matrix dot multiplication versus…
GhostRider · 353
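(A minimal numpy sketch for context; function and variable names are my own. The matrix product aggregates over samples, while the element-wise product pairs each label with its own prediction.)

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cost_and_grad(theta, X, y):
    m = len(y)
    h = sigmoid(X @ theta)                  # matrix product: (m, n) @ (n,) -> (m,)
    # element-wise: each label y_i is paired with its own prediction h_i
    cost = -np.mean(y * np.log(h) + (1 - y) * np.log(1 - h))
    grad = X.T @ (h - y) / m                # matrix product: sums over the m samples
    return cost, grad
```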
23 votes · 1 answer
How do Gradient Descent and Backpropagation work together?
Please forgive me as I am new to this. I have attached a diagram trying to model my understanding of neural networks and back-propagation. From videos on Coursera and resources online I formed the following understanding of how a neural network…
Mohamed Mahyoub · 345
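(A minimal numpy sketch of the division of labour; shapes and names are illustrative. Back-propagation computes the gradients by the chain rule, and gradient descent then uses those gradients to update the weights.)

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(4, 3)); y = rng.normal(size=(4, 1))   # toy data
W1 = rng.normal(size=(3, 5)); W2 = rng.normal(size=(5, 1))
lr = 0.1

for _ in range(100):
    # forward pass
    h = np.tanh(X @ W1)
    y_hat = h @ W2
    # backward pass (back-propagation): chain rule, output to input
    d_yhat = 2 * (y_hat - y) / len(y)       # d(MSE)/d(y_hat)
    dW2 = h.T @ d_yhat
    dh = d_yhat @ W2.T
    dW1 = X.T @ (dh * (1 - h**2))           # tanh'(a) = 1 - tanh(a)^2
    # gradient descent step: move weights against the gradient
    W1 -= lr * dW1
    W2 -= lr * dW2
```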
21 votes · 1 answer
What do "compile", "fit", and "predict" do in Keras sequential models?
I am a little confused by these parts of the Keras sequential-model API. Can someone explain exactly what each one does? Is it that compile does the forward pass and calculates the cost function, then passes it to fit to do the backward…
user3486308 · 1,260
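(In short, sketched below with an assumed toy model: compile only configures the model with an optimizer, loss, and metrics; fit runs the training loop, i.e. forward pass, loss, back-propagation, and weight updates; predict runs the forward pass only.)

```python
from tensorflow import keras

model = keras.Sequential([
    keras.layers.Dense(16, activation="relu", input_shape=(8,)),
    keras.layers.Dense(1),
])
model.compile(optimizer="adam", loss="mse")  # configuration only; no training yet
# model.fit(X_train, y_train, epochs=10)     # training loop (X_train/y_train assumed)
# y_pred = model.predict(X_test)             # forward pass only (X_test assumed)
```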
10 votes · 2 answers
What is the Time Complexity of Linear Regression?
I am working with linear regression and I would like to know its time complexity in big-O notation. The cost function of linear regression, without an optimisation algorithm (such as gradient descent), needs to be computed over iterations of the…
user134132523 · 149
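(For reference, the usual accounting with $m$ examples and $n$ features, sketched from the standard results:)

```latex
J(\theta)=\tfrac{1}{2m}\lVert X\theta-y\rVert^2:\; O(mn)
\qquad
\theta=(X^{\top}X)^{-1}X^{\top}y:\; O(mn^2+n^3)
\qquad
k \text{ gradient steps}:\; O(kmn)
```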
10 votes · 1 answer
Cost function for Ordinal Regression using neural networks
What is the best cost function to train a neural network to perform ordinal regression, i.e. to predict a result whose value exists on an arbitrary scale where only the relative ordering between different values is significant (e.g., to predict…
xboard · 348
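(One common family of answers is the cumulative, or threshold, encoding; a minimal sketch with my own function name. An ordinal label in $\{0,\dots,K-1\}$ becomes $K-1$ binary "is $y > k$" targets, trained with sigmoid outputs and binary cross-entropy.)

```python
import numpy as np

def ordinal_targets(y, n_classes):
    # y=2, K=4 -> [1, 1, 0]  ("greater than 0", "greater than 1", "greater than 2")
    return (y[:, None] > np.arange(n_classes - 1)[None, :]).astype(float)

print(ordinal_targets(np.array([0, 2, 3]), 4))
# [[0. 0. 0.]
#  [1. 1. 0.]
#  [1. 1. 1.]]
```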
7 votes · 2 answers
What are the cases where it is fine to initialize all weights to zero
I've taken a few online courses in machine learning, and in general, the advice has been to choose random weights for a neural network to ensure that your neurons don't all learn the same thing, breaking symmetry.
However, there were other cases…
Stephen · 272
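(A minimal sketch of the symmetry argument, with toy numbers and names of my own: under all-zero initialization every hidden unit sees an identical gradient, so the units never differentiate. A model with no hidden layer, such as logistic or linear regression, is convex, so zero initialization is fine there.)

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Toy data; all weights initialized to zero
X = np.array([[1.0, 2.0], [3.0, 4.0]]); y = np.array([[1.0], [0.0]])
W1 = np.zeros((2, 3)); W2 = np.zeros((3, 1)); lr = 0.5

for _ in range(50):                       # plain gradient descent on MSE
    h = sigmoid(X @ W1)
    d = (h @ W2) - y
    dW2 = h.T @ d
    dW1 = X.T @ ((d @ W2.T) * h * (1 - h))
    W1 -= lr * dW1; W2 -= lr * dW2

print(W1)   # every column identical: the hidden units never differentiate
```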
6 votes · 2 answers
Does an MLP always find a local minimum?
In linear regression we use the squared-error cost function, which is convex.
In logistic regression we use the cross-entropy cost function instead, because the squared-error cost is not convex when the hypothesis ($h$) is the logistic function. We…
Green Falcon · 13,868
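(The two cost functions the excerpt presumably refers to, in their reconstructed standard forms:)

```latex
J_{\text{lin}}(\theta)=\frac{1}{2m}\sum_{i=1}^{m}\bigl(h_\theta(x^{(i)})-y^{(i)}\bigr)^2
\qquad
J_{\text{log}}(\theta)=-\frac{1}{m}\sum_{i=1}^{m}\Bigl[y^{(i)}\log h_\theta(x^{(i)})+\bigl(1-y^{(i)}\bigr)\log\bigl(1-h_\theta(x^{(i)})\bigr)\Bigr]
```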
5 votes · 2 answers
What is an intuitive explanation for the log loss cost function?
I would really appreciate it if someone could explain the log loss cost function and its use in measuring a classification model's performance.
I have read a few articles, but most of them concentrate on the mathematics rather than on an intuitive explanation…
Sai Kumar · 601
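(A small numeric sketch of the intuition: log loss is the negative log-probability assigned to the true class, so confidently wrong predictions are punished far more than hesitant ones.)

```python
import numpy as np

# Log loss for a single example: -log(p) if y=1, -log(1-p) if y=0.
def log_loss(y, p):
    return -(y * np.log(p) + (1 - y) * np.log(1 - p))

for p in (0.9, 0.6, 0.1, 0.01):
    print(f"true y=1, predicted p={p}: loss={log_loss(1, p):.2f}")
# p=0.9 -> 0.11, p=0.6 -> 0.51, p=0.1 -> 2.30, p=0.01 -> 4.61
```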
4 votes · 2 answers
Cost sensitive learning and class balancing
I am facing a classification problem with classes that are really imbalanced (more or less 1% of positive cases). In addition, the "cost" of a False Negative (FN) is much higher than the cost of False Positive (FP).
Given that, I decided to…
A1010 · 193
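(Many libraries expose this directly as class weights; a minimal scikit-learn sketch, where the 50:1 ratio is illustrative rather than taken from the question:)

```python
from sklearn.linear_model import LogisticRegression

# Up-weight the rare positive class so false negatives cost more.
clf = LogisticRegression(class_weight={0: 1, 1: 50})
# clf.fit(X_train, y_train)   # X_train/y_train assumed
```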
4 votes · 2 answers
Regularization for intercept parameter
Why is the regularization parameter not applied to the intercept parameter?
From what I have read about the cost functions for linear and logistic regression, the regularization parameter (λ) is applied to all terms except the intercept. For…
N.M · 181
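(The convention in question, written out in its standard ridge-style form:)

```latex
J(\theta)=\frac{1}{2m}\sum_{i=1}^{m}\bigl(h_\theta(x^{(i)})-y^{(i)}\bigr)^2
+\frac{\lambda}{2m}\sum_{j=1}^{n}\theta_j^2
\quad\text{(the sum starts at } j=1\text{, skipping the intercept } \theta_0\text{)}
```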
4 votes · 2 answers
Question about using gradient descent instead of calculus; I checked previous questions but there are still points to clarify
First of all I checked http://stats.stackexchange.com/questions/23128/solving-for-regression-parameters-in-closed-form-vs-gradient-descent, http://stackoverflow.com/questions/26804656/why-do-we-use-gradient-descent-in-linear-regression,…
J.Smith · 458
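(The two alternatives those links compare, side by side for least squares: the iterative gradient descent update versus the closed-form normal equation.)

```latex
\theta \leftarrow \theta-\alpha\,\frac{1}{m}X^{\top}(X\theta-y)
\qquad\text{vs.}\qquad
\theta=(X^{\top}X)^{-1}X^{\top}y
```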
4 votes · 2 answers
XGBoost change loss function
I'm using XGBoost (through the sklearn API) and I'm trying to do a binary classification.
False Positives are much worse for me than False Negatives; how can I take this into account?
The API confuses me a bit and I found two arguments that might be…
cosec · 51
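(Two relevant knobs in XGBoost's scikit-learn API, sketched; the weights are illustrative, and since false positives are worse here, the negative class gets the extra weight:)

```python
import numpy as np
from xgboost import XGBClassifier

# scale_pos_weight rescales the positive class; values < 1 down-weight
# positives, which penalizes false positives relatively more.
clf = XGBClassifier(scale_pos_weight=0.2)

# Alternatively, per-sample weights at fit time (X_train/y_train assumed):
# w = np.where(y_train == 0, 5.0, 1.0)   # negatives cost 5x (illustrative)
# clf.fit(X_train, y_train, sample_weight=w)
```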
3 votes · 1 answer
Why does the MAE still remain, at all?
This may seem to be a silly question, but I just wonder why the MAE doesn't reduce to values close to 0.
It's the result of an MLP with 2 hidden layers and 6 neurons per hidden layer, trying to estimate one output value depending on three input…
Turnvater · 48
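(One standard reason, sketched: irreducible noise in the targets puts a floor under the achievable MAE, so even a perfect model cannot drive it to zero.)

```latex
y = f(x) + \varepsilon,\ \operatorname{med}(\varepsilon) = 0
\quad\Rightarrow\quad
\min_{\hat f}\ \mathbb{E}\,\lvert y - \hat f(x)\rvert \;=\; \mathbb{E}\,\lvert \varepsilon\rvert \;>\; 0
```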
3 votes · 1 answer
Comparison between cost functions to determine the "best" model?
I'm building an LSTM neural net for time series prediction (regression) and I am incorporating custom loss functions into training. I'm trying to determine which of three cost functions gives the "best" model; in other words, trying to…
PyRsquared · 1,584
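(A minimal sketch of one defensible protocol, with names of my own: models trained with different losses cannot be ranked by their own, mutually incomparable, training losses, so score all of them with a single common held-out metric.)

```python
import numpy as np

def rank_models(models, X_val, y_val):
    """Rank models trained with different losses by one shared metric."""
    scores = {}
    for name, model in models.items():
        y_pred = model.predict(X_val)
        scores[name] = np.mean(np.abs(y_val - y_pred))   # common yardstick: MAE
    return sorted(scores.items(), key=lambda kv: kv[1])  # best (lowest) first
```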