An activation function is a non-linear transformation, usually applied in neural networks to the output of a linear or convolutional layer. Common activation functions include sigmoid, tanh, and ReLU.
Questions tagged [activation-function]
166 questions
45
votes
4 answers
Why is ReLU used as an activation function?
Activation functions are used to introduce non-linearities into the linear output of the form w * x + b in a neural network,
which I am able to understand intuitively for activation functions like sigmoid.
I understand the advantages of ReLU,…
Bunny Rabbit
- 573
- 1
- 4
- 6
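As a quick illustration of the point in the excerpt above, here is a minimal NumPy sketch (the weight, bias, and inputs are made up) showing how ReLU turns the purely linear output w * x + b into a piecewise-linear, non-linear function:

import numpy as np

def relu(z):
    # Element-wise max(0, z): negative pre-activations are clipped to zero,
    # which is what breaks the linearity of w * x + b.
    return np.maximum(0.0, z)

# Illustrative weight, bias, and a batch of scalar inputs (all made up).
w, b = 2.0, -1.0
x = np.linspace(-2.0, 2.0, 5)

linear_out = w * x + b            # a straight line in x
nonlinear_out = relu(linear_out)  # flat at 0 where w * x + b < 0, linear above

print(np.c_[linear_out, nonlinear_out])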
42
votes
2 answers
What is GELU activation?
I was going through the BERT paper, which uses GELU (Gaussian Error Linear Unit) and states the equation as
$$ \text{GELU}(x) = x\,P(X \le x) = x\,\Phi(x), $$ which in turn is approximated by $$ 0.5x\left(1 + \tanh\!\left[\sqrt{2/\pi}\left(x + 0.044715x^3\right)\right]\right). $$
Could you simplify the equation…
thanatoz
- 2,365
- 4
- 15
- 39
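For reference, here is a small sketch (NumPy and SciPy assumed) of the exact GELU from the question, x * Phi(x), next to the tanh approximation it quotes; the two agree to within a small numerical error:

import numpy as np
from scipy.stats import norm

def gelu_exact(x):
    # GELU(x) = x * P(X <= x) with X ~ N(0, 1), i.e. x * Phi(x).
    return x * norm.cdf(x)

def gelu_tanh(x):
    # The tanh approximation quoted in the question (used in the BERT code).
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x ** 3)))

x = np.linspace(-4.0, 4.0, 9)
print(np.max(np.abs(gelu_exact(x) - gelu_tanh(x))))  # small (well below 1e-2)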
33
votes
4 answers
How to use LeakyReLU as an activation function in a sequential DNN in Keras? When does it perform better than ReLU?
How do you use LeakyReLU as an activation function in a sequential DNN in Keras?
If I want to write something similar to:
model = Sequential()
model.add(Dense(90, activation='LeakyRelu'))
What is the solution? Do I put LeakyReLU in the same way as ReLU?
Second…
user10296606
- 1,784
- 5
- 17
- 31
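One common pattern (assuming tf.keras; the layer sizes and input shape here are illustrative) is to add LeakyReLU as its own layer after a linear Dense layer, instead of passing it as an activation string:

from tensorflow import keras
from tensorflow.keras import layers

model = keras.Sequential([
    keras.Input(shape=(64,)),              # assumed input size, purely illustrative
    layers.Dense(90),                      # no activation string here
    layers.LeakyReLU(0.1),                 # slope for negative inputs (argument name varies across Keras versions)
    layers.Dense(1, activation='sigmoid'),
])
model.compile(optimizer='adam', loss='binary_crossentropy')
model.summary()

Whether it performs better than ReLU is data-dependent; the usual motivation is avoiding "dead" units whose gradient is exactly zero for negative inputs.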
23
votes
2 answers
Why is ReLU better than the other activation functions?
Here the answer refers to the vanishing and exploding gradients that occur in sigmoid-like activation functions, but, I guess, ReLU has a disadvantage, and it is its expected value: there is no limitation on the output of ReLU, so its expected…
Green Falcon
- 13,868
- 9
- 55
- 98
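A tiny numeric sketch of the gradient argument (the values of z are chosen arbitrarily): the sigmoid gradient is capped at 0.25 and vanishes for large |z|, while the ReLU gradient is either 0 or 1, although the ReLU output itself is unbounded above, which is the expected-value concern raised in the excerpt:

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

z = np.array([-10.0, -2.0, 0.0, 2.0, 10.0])

sigmoid_grad = sigmoid(z) * (1.0 - sigmoid(z))  # at most 0.25, ~0 for large |z|
relu_grad = (z > 0).astype(float)               # exactly 0 or 1, never shrinks

print(sigmoid_grad)  # [~4.5e-05, ~0.105, 0.25, ~0.105, ~4.5e-05]
print(relu_grad)     # [0., 0., 0., 1., 1.]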
22
votes
1 answer
Difference of Activation Functions in Neural Networks in general
I have studied the activation function types for neural networks. The functions themselves are quite straightforward, but the application difference is not entirely clear.
It's reasonable that one differentiates between logical and linear type…
Hendrik
- 8,377
- 17
- 40
- 55
18
votes
3 answers
How to create custom Activation functions in Keras / TensorFlow?
I'm using Keras and I want to add my own activation function myf to the TensorFlow backend. How do I define the new function and make it operational? So instead of the line of code:
model.add(layers.Conv2D(64, (3, 3), activation='relu'))
I'll write…
Basta
- 181
- 1
- 1
- 4
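A hedged sketch (tf.keras assumed): because the excerpt cuts off, the function myf below is an arbitrary stand-in, and the simplest route is usually to pass a plain callable built from TensorFlow ops as the activation argument:

import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers

def myf(x):
    # Any composition of differentiable TensorFlow ops works here;
    # this swish-like function is purely illustrative.
    return x * tf.math.sigmoid(2.0 * x)

model = keras.Sequential([
    keras.Input(shape=(28, 28, 1)),             # assumed input shape
    layers.Conv2D(64, (3, 3), activation=myf),  # callable instead of the string 'relu'
    layers.Flatten(),
    layers.Dense(10, activation='softmax'),
])
model.compile(optimizer='adam', loss='sparse_categorical_crossentropy')
model.summary()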
14
votes
1 answer
Input normalization for ReLU?
Let's assume a vanilla MLP for classification with a given activation function for hidden layers.
I know it is a known best practice to normalize the input of the network between 0 and 1 if sigmoid is the activation function, and between -0.5 and 0.5 if tanh…
Taiko
- 243
- 1
- 2
- 6
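To make the two normalization schemes in the question concrete, here is a small NumPy sketch (the raw feature ranges are invented); zero-mean, unit-variance standardization is a common default when the hidden activations are ReLU, since the inputs are not squeezed into a fixed range:

import numpy as np

rng = np.random.default_rng(0)
X = rng.uniform(50.0, 200.0, size=(1000, 3))  # made-up raw features

# [0, 1] min-max scaling, the range the question mentions for sigmoid networks.
X_minmax = (X - X.min(axis=0)) / (X.max(axis=0) - X.min(axis=0))

# Zero-mean / unit-variance standardization, often used with ReLU hidden layers.
X_std = (X - X.mean(axis=0)) / X.std(axis=0)

print(X_minmax.min(axis=0), X_minmax.max(axis=0))  # ~0 and ~1 per feature
print(X_std.mean(axis=0), X_std.std(axis=0))       # ~0 and ~1 per feature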
12
votes
5 answers
How does sigmoid activation work in multi-class classification problems?
I know that for a problem with multiple classes we usually use softmax, but can we also use sigmoid? I have tried to implement digit classification with sigmoid at the output layer, and it works. What I don't understand is how it works.
bharath chandra
- 121
- 1
- 1
- 4
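A small NumPy sketch (the logits are made up) of why this can work: each sigmoid scores its class independently, so the outputs do not sum to 1 the way softmax outputs do, but both functions are monotone in the logit, so taking the argmax picks the same class:

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

logits = np.array([1.2, -0.3, 0.8, 2.5, -1.0, 0.1, 0.0, -2.0, 0.5, 1.9])  # 10 "digit" scores

p_sigmoid = sigmoid(logits)  # independent per-class probabilities, sum > 1
p_softmax = softmax(logits)  # competing probabilities, sum == 1

print(p_sigmoid.sum(), p_softmax.sum())        # well above 1 vs exactly 1
print(p_sigmoid.argmax(), p_softmax.argmax())  # same index (3)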
12
votes
2 answers
Why do deep learning models still use ReLU instead of SELU as their activation function?
I am trying to understand the SELU activation function, and I was wondering why deep learning practitioners keep using ReLU, with all its issues, instead of SELU, which enables a neural network to converge faster and internally normalizes each…
Konstantinos Skoularikis
- 323
- 2
- 10
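For comparison, a minimal NumPy sketch of SELU next to ReLU, using the two constants from Klambauer et al. (2017); the scaling is what lets SELU self-normalize, but the original recipe also expects lecun_normal initialization and alpha dropout, which is part of why it is less of a drop-in replacement:

import numpy as np

# Constants from the SELU paper (Klambauer et al., 2017), chosen so that
# activations drift toward zero mean and unit variance across layers.
ALPHA = 1.6732632423543772
SCALE = 1.0507009873554805

def selu(x):
    return SCALE * np.where(x > 0, x, ALPHA * (np.exp(x) - 1.0))

def relu(x):
    return np.maximum(0.0, x)

x = np.array([-3.0, -1.0, 0.0, 1.0, 3.0])
print(relu(x))  # [0. 0. 0. 1. 3.]
print(selu(x))  # negative inputs saturate toward -SCALE * ALPHA (about -1.76)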
10
votes
1 answer
Backpropagation: in second-order methods, would the ReLU derivative be 0? And what is its effect on training?
ReLU is an activation function defined as $h = \max(0, a)$ where $a = Wx + b$.
Normally, we train neural networks with first-order methods such as SGD, Adam, RMSprop, Adadelta, or Adagrad. Backpropagation in first-order methods requires first-order…
Rizky Luthfianto
- 2,176
- 2
- 19
- 22
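Sketching the derivatives a second-order method would see (leaving aside the non-differentiable point at $a = 0$):

$$ h = \max(0, a), \qquad \frac{\partial h}{\partial a} = \begin{cases} 1, & a > 0 \\ 0, & a < 0 \end{cases}, \qquad \frac{\partial^2 h}{\partial a^2} = 0 \ \text{ for } a \ne 0, $$

so the curvature contribution from the activation's own second derivative vanishes almost everywhere; any curvature in the loss then comes from the loss function and the composition of layers (the Gauss-Newton term) rather than from the ReLU itself.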
9
votes
4 answers
Activation function vs Squashing function
This may seem like a very simple and obvious question, but I haven't actually been able to find a direct answer.
Today, in a video explaining deep neural networks, I came across the term Squashing function. This is a term that I have never heard or…
Mate de Vita
- 193
- 1
- 1
- 6
8
votes
2 answers
Why is leaky ReLU not so common in real practice?
Since leaky ReLU does not force any value to 0, training always continues, and I can't think of any disadvantages it has.
Yet leaky ReLU is less popular than ReLU in real practice. Can someone tell me why?
Prashant Gupta
- 181
- 1
- 3
7
votes
2 answers
How does one derive the modified tanh activation proposed by LeCun?
In "Efficient Backprop" (http://yann.lecun.com/exdb/publis/pdf/lecun-98b.pdf), LeCun and others propose a modified tanh activation function of the form:
$$ f(x) = 1.7159 \tanh\!\left(\tfrac{2}{3} x\right) $$
They argue that:
It is easier to approximate with…
Lucas Morin
- 2,513
- 5
- 19
- 39
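One way to see where the constants come from (a quick check, consistent with the paper's stated goal that $f(\pm 1) = \pm 1$):

$$ f(1) = 1.7159\,\tanh\!\left(\tfrac{2}{3}\right) \approx 1.7159 \times 0.5827 \approx 1, $$

i.e. $1.7159 \approx 1/\tanh(2/3)$, so the function maps $\pm 1$ to $\pm 1$ while keeping the gain close to 1 over the typical range of normalized inputs.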
7
votes
4 answers
Can the vanishing gradient problem be solved by multiplying the input of tanh with a coefficient?
To my understanding, the vanishing gradient problem occurs when training neural networks in which the gradient of each activation function is less than 1, so that when corrections are back-propagated through many layers, the product of these gradients…
zephyr
- 121
- 1
- 9
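A one-line check of what multiplying the input by a coefficient $a$ does to the gradient:

$$ \frac{d}{dx}\tanh(ax) = a\left(1 - \tanh^2(ax)\right) \le a, $$

so a coefficient $a > 1$ does raise the maximum slope above 1, but it also makes $\tanh(ax)$ saturate at smaller $|x|$, where the $\left(1 - \tanh^2(ax)\right)$ factor again drives the gradient toward zero.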
7
votes
2 answers
Gradient Descent in ReLU Neural Network
I’m new to machine learning and have recently been facing a problem with the back-propagation step of training a neural network that uses the ReLU activation function, shown in the figure. My problem is how to update the weight matrices in the hidden and output layers.
The cost…
kelvincheng
- 71
- 1
- 2
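Since the figure from the question is not reproduced here, the following is only a generic sketch (the layer sizes, loss, and data are invented) of how the weight matrices are updated when the hidden layer uses ReLU; the key point is that ReLU simply gates the backpropagated gradient with the mask (z1 > 0):

import numpy as np

rng = np.random.default_rng(0)

# Tiny made-up network: 2 inputs -> 3 ReLU hidden units -> 1 linear output, squared-error loss.
X = rng.normal(size=(8, 2))
y = rng.normal(size=(8, 1))
W1, b1 = rng.normal(size=(2, 3)), np.zeros(3)
W2, b2 = rng.normal(size=(3, 1)), np.zeros(1)
lr = 0.1

# Forward pass.
z1 = X @ W1 + b1
h1 = np.maximum(0.0, z1)              # ReLU
y_hat = h1 @ W2 + b2
loss = np.mean((y_hat - y) ** 2)

# Backward pass: ReLU gates the gradient with the mask (z1 > 0).
d_yhat = 2.0 * (y_hat - y) / len(X)
dW2, db2 = h1.T @ d_yhat, d_yhat.sum(axis=0)
d_z1 = (d_yhat @ W2.T) * (z1 > 0)     # derivative of ReLU: 1 where z1 > 0, else 0
dW1, db1 = X.T @ d_z1, d_z1.sum(axis=0)

# One gradient-descent step on every weight matrix and bias.
W1 -= lr * dW1; b1 -= lr * db1
W2 -= lr * dW2; b2 -= lr * db2
print(loss)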