Questions tagged [gradient-descent]

Gradient Descent is an algorithm for finding the minimum of a function. It iteratively computes the gradient (the vector of partial derivatives) of the function and descends in steps proportional to the negative of that gradient. One major application of Gradient Descent is fitting a parameterized model to a set of data: the function to be minimized is an error function for the model.

Gradient descent is a first-order iterative optimization algorithm. It is used to find the values of the parameters (coefficients) of a function f that minimize a cost function.

To find a local minimum of a function using gradient descent, one takes steps proportional to the negative of the gradient (or of the approximate gradient) of the function at the current point.

Gradient descent is best used when the parameters cannot be calculated analytically (e.g. using linear algebra) and must be searched for by an optimization algorithm.

Gradient descent is also known as steepest descent, or the method of steepest descent.

https://en.wikipedia.org/wiki/Gradient_descent
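For illustration, a minimal sketch of the update rule in Python, on the one-dimensional function f(x) = (x - 3)^2, whose minimum is at x = 3; the starting point, learning rate, and iteration count are arbitrary choices for the example:

```python
# Gradient descent on f(x) = (x - 3)**2, whose minimum is at x = 3.
def gradient(x):
    return 2 * (x - 3)                     # f'(x) = 2(x - 3)

x = 0.0                                    # arbitrary starting point
learning_rate = 0.1                        # step size (a hyperparameter)
for _ in range(100):
    x -= learning_rate * gradient(x)       # step against the gradient

print(x)                                   # converges toward 3.0
```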

454 questions
75 votes · 6 answers

What is the difference between Gradient Descent and Stochastic Gradient Descent?

What is the difference between Gradient Descent and Stochastic Gradient Descent? I am not very familiar with these; can you describe the difference with a short example?
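As a hedged sketch of the distinction (the toy linear model and data are made up for the example): batch gradient descent averages the gradient over the whole dataset before each update, while stochastic gradient descent updates after every individual example.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=100)
y = 2.0 * X + rng.normal(scale=0.1, size=100)   # true slope is 2

def grad(w, x, t):
    return 2 * (w * x - t) * x                  # d/dw of (w*x - t)**2

# Batch gradient descent: one update per pass over all the data.
w = 0.0
for _ in range(50):
    w -= 0.1 * np.mean(grad(w, X, y))           # averaged gradient

# Stochastic gradient descent: one update per training example.
w_sgd = 0.0
for x_i, y_i in zip(X, y):
    w_sgd -= 0.1 * grad(w_sgd, x_i, y_i)        # noisy single-example gradient

print(w, w_sgd)                                 # both approach 2.0
```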
46 votes · 5 answers

Does gradient descent always converge to an optimum?

I am wondering whether there is any scenario in which gradient descent does not converge to a minimum. I am aware that gradient descent is not always guaranteed to converge to a global optimum. I am also aware that it might diverge from an optimum…
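One concrete scenario can be sketched in a few lines: on a non-convex function, plain gradient descent converges to whichever local minimum the starting point's basin contains (the function and settings below are chosen only to illustrate this).

```python
# Gradient descent on the non-convex f(x) = x**4 - 2*x**2 + 0.5*x,
# which has two local minima; the result depends on where we start.
def grad(x):
    return 4 * x**3 - 4 * x + 0.5

def descend(x, lr=0.01, steps=1000):
    for _ in range(steps):
        x -= lr * grad(x)
    return x

print(descend(-2.0))   # about -1.06, the global minimum
print(descend(+2.0))   # about +0.93, a worse local minimum
```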
28 votes · 4 answers

Scikit-learn: Getting SGDClassifier to predict as well as a Logistic Regression

A way to train a Logistic Regression is by using stochastic gradient descent, which scikit-learn offers an interface to. What I would like to do is take scikit-learn's SGDClassifier and have it score the same as a Logistic Regression here…
hlin117
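As a sketch of the comparison being asked about, using scikit-learn (the log loss is spelled loss="log_loss" in recent versions and loss="log" in older ones); the dataset and hyperparameters are illustrative, not a recipe for matching scores exactly:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression, SGDClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=2000, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Logistic regression fit with a batch solver (L-BFGS by default).
lr = LogisticRegression(max_iter=1000).fit(X_train, y_train)

# The same objective fit by stochastic gradient descent: with a log
# loss, SGDClassifier optimizes the logistic-regression cost.
sgd = SGDClassifier(loss="log_loss", max_iter=1000, tol=1e-4,
                    random_state=0).fit(X_train, y_train)

print(lr.score(X_test, y_test), sgd.score(X_test, y_test))
```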
23 votes · 1 answer

How do Gradient Descent and Backpropagation work together?

Please forgive me as I am new to this. I have attached a diagram trying to model my understanding of neural networks and back-propagation. From videos on Coursera and resources online, I formed the following understanding of how neural networks…
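The division of labor can be sketched on a tiny one-hidden-layer network (architecture, data, and learning rate are all made up for illustration): back-propagation is the chain rule that computes the gradients, and gradient descent is the rule that uses those gradients to update the weights.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(64, 2))                # toy inputs
y = X[:, :1] * X[:, 1:]                     # toy nonlinear targets, shape (64, 1)

W1 = rng.normal(scale=0.5, size=(2, 8))     # hidden-layer weights
W2 = rng.normal(scale=0.5, size=(8, 1))     # output-layer weights
lr = 0.05

for _ in range(2000):
    # Forward pass.
    h = np.tanh(X @ W1)
    pred = h @ W2
    # Back-propagation: chain rule, from the loss back to each layer.
    g_pred = 2 * (pred - y) / len(y)        # dLoss/dpred for mean squared error
    g_W2 = h.T @ g_pred                     # dLoss/dW2
    g_h = g_pred @ W2.T                     # dLoss/dh
    g_W1 = X.T @ (g_h * (1 - h**2))         # dLoss/dW1 (tanh' = 1 - tanh**2)
    # Gradient descent: step each weight matrix against its gradient.
    W2 -= lr * g_W2
    W1 -= lr * g_W1

print(np.mean((np.tanh(X @ W1) @ W2 - y) ** 2))   # the loss has decreased
```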
23 votes · 2 answers

Why is ReLU better than the other activation functions?

The answer here refers to the vanishing and exploding gradients that occur in sigmoid-like activation functions, but I guess ReLU has a disadvantage, and it is its expected value. There is no limit on the output of ReLU, so its expected…
13 votes · 4 answers

Is Gradient Descent central to every optimizer?

I want to know whether gradient descent is the main algorithm used in optimizers like Adam, Adagrad, RMSProp, and several others.
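A sketch of the relationship: optimizers such as Adam still step against a gradient; what changes is how the step is rescaled. Below, plain gradient descent and Adam (with its usual default coefficients, used here only for illustration) minimize the same toy function.

```python
import math

def grad(x):
    return 2 * (x - 3)                      # gradient of f(x) = (x - 3)**2

# Plain gradient descent.
x = 0.0
for _ in range(200):
    x -= 0.1 * grad(x)

# Adam: the same gradient, rescaled by running moment estimates.
x_adam, m, v = 0.0, 0.0, 0.0
beta1, beta2, lr, eps = 0.9, 0.999, 0.1, 1e-8
for t in range(1, 201):
    g = grad(x_adam)
    m = beta1 * m + (1 - beta1) * g         # first moment (mean of gradients)
    v = beta2 * v + (1 - beta2) * g * g     # second moment (mean of squares)
    m_hat = m / (1 - beta1 ** t)            # bias corrections
    v_hat = v / (1 - beta2 ** t)
    x_adam -= lr * m_hat / (math.sqrt(v_hat) + eps)

print(x, x_adam)                            # both approach 3.0
```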
13 votes · 2 answers

Why does averaging the gradient work in Gradient Descent?

In full-batch gradient descent or minibatch GD, we get gradients from several training examples. We then average them to obtain a "high-quality" gradient from several estimates and finally use it to correct the network all at once. But why…
Kari
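Part of the answer can be shown directly: because differentiation is linear, the mean of the per-example gradients is exactly the gradient of the mean loss over the batch, which the numerical check below confirms (toy model and data made up for the example).

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=16)
y = 3.0 * X
w = 0.5

# Mean of the per-example gradients of (w*x_i - y_i)**2.
mean_of_grads = np.mean(2 * (w * X - y) * X)

# Gradient of the mean loss, computed numerically via central differences.
loss = lambda w: np.mean((w * X - y) ** 2)
h = 1e-6
numeric = (loss(w + h) - loss(w - h)) / (2 * h)

print(mean_of_grads, numeric)               # the two agree up to float error
```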
12 votes · 1 answer

What feature engineering is necessary with tree based algorithms?

I understand data hygiene, which is probably the most basic feature engineering. That is, making sure all your data is properly loaded, making sure N/As are treated as a special value rather than a number between -1 and 1, and tagging your…
11 votes · 2 answers

Why is the learning rate causing my neural network's weights to skyrocket?

I am using TensorFlow to write simple neural networks for a bit of research, and I have had many problems with 'nan' weights while training. I tried many different solutions, like changing the optimizer, changing the loss, the data size, etc., but with…
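One common mechanism, sketched without assuming anything about the asker's actual network: on a quadratic with curvature c, gradient descent diverges once the learning rate exceeds 2/c, because every step overshoots the minimum by more than it corrects; the weights then grow geometrically until they overflow to inf and then nan.

```python
def grad(w):
    return 2 * w                 # gradient of f(w) = w**2 (curvature c = 2)

# Stable: learning rate below 2/c = 1.0, so each step shrinks w.
w = 1.0
for _ in range(50):
    w -= 0.1 * grad(w)
print(w)                         # decays toward 0

# Unstable: learning rate above the threshold, so each step overshoots
# and |w| grows geometrically until it overflows.
w = 1.0
for _ in range(5000):
    w -= 1.1 * grad(w)
print(w)                         # nan (after passing through inf)
```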
10 votes · 1 answer

How flexible is the link between objective function and output layer activation function?

It seems standard in many neural network packages to pair up the objective function to be minimised with the activation function in the output layer. For instance, for a linear output layer used for regression, it is standard (and often the only choice)…
Neil Slater
10 votes · 1 answer

What is the difference between the SGD classifier and Logistic regression?

To my understanding, the SGD classifier and Logistic regression seem similar. An SGD classifier with loss = 'log' implements Logistic regression, and loss = 'hinge' implements a Linear SVM. I also understand that logistic regression uses gradient…
10 votes · 4 answers

Why does it speed up gradient descent if the function is smooth?

I am reading a book titled "Hands-On Machine Learning with Scikit-Learn and TensorFlow", and in Chapter 11 it gives the following description in its explanation of ELU (Exponential Linear Unit): Third, the function is smooth everywhere, including around z…
Blaszard
10 votes · 2 answers

Stochastic gradient descent based on vector operations?

Let's assume that I want to train a stochastic gradient descent regression algorithm using a dataset that has N samples. Since the size of the dataset is fixed, I will reuse the data T times. At each iteration or "epoch", I use each training sample…
Pablo Suau
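The usual compromise can be sketched as minibatch SGD: the updates stay stochastic, but each batch gradient is computed with a single vectorized matrix operation instead of a Python loop over samples (model, data, and batch size below are illustrative).

```python
import numpy as np

rng = np.random.default_rng(0)
N, d = 1000, 5
X = rng.normal(size=(N, d))
true_w = rng.normal(size=d)
y = X @ true_w                              # noiseless linear targets

w = np.zeros(d)
batch_size, lr = 32, 0.1
for epoch in range(20):
    order = rng.permutation(N)              # reshuffle every epoch
    for start in range(0, N, batch_size):
        idx = order[start:start + batch_size]
        Xb, yb = X[idx], y[idx]
        # One vectorized expression computes the whole batch gradient.
        w -= lr * 2 * Xb.T @ (Xb @ w - yb) / len(idx)

print(np.linalg.norm(w - true_w))           # near zero: w recovers true_w
```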
9 votes · 3 answers

What is momentum in a neural network?

While using the "Two-Class Neural Network" in Azure ML, I encountered the "Momentum" property. As per the documentation, which is not clear, it says: For The momentum, type a value to apply during learning as a weight on nodes from previous…
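The property being asked about can be sketched with the classic momentum update (assuming the standard "heavy ball" formulation, not necessarily what Azure ML implements): a velocity term accumulates an exponentially decaying average of past gradients, and the momentum value (e.g. 0.9) is the decay weight on that history.

```python
def grad(x):
    return 2 * (x - 3)           # gradient of f(x) = (x - 3)**2

x, velocity = 0.0, 0.0
lr, momentum = 0.01, 0.9
for _ in range(500):
    # The velocity remembers past gradients, decayed by `momentum`;
    # consistent gradients accumulate, oscillating ones cancel out.
    velocity = momentum * velocity - lr * grad(x)
    x += velocity                # step along the accumulated velocity

print(x)                         # approaches 3.0
```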
9 votes · 1 answer

Understanding dropout and gradient descent

I am looking at how to implement dropout on deep neural networks, and I found something counter-intuitive. In the forward phase, dropout masks activations with a random tensor of 1s and 0s to force the net to learn the average of the weights. This helps the…
emanuele
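The masking described in the question can be sketched with the common "inverted dropout" variant, in which activations are rescaled at training time so that no adjustment is needed at test time (shapes and keep probability are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)

def dropout_forward(activations, keep_prob=0.8, train=True):
    if not train:
        return activations       # test time: use every unit, unscaled
    # Random mask of 0s and 1s; dividing by keep_prob keeps the expected
    # activation equal to the undropped one ("inverted" dropout).
    mask = (rng.random(activations.shape) < keep_prob) / keep_prob
    return activations * mask    # backprop would reuse this same mask

h = rng.normal(size=(4, 8))      # some hidden-layer activations
print(dropout_forward(h).round(2))
```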