An activation function is a non-linear transformation, usually applied in neural networks to the output of a linear or convolutional layer. Common activation functions include sigmoid, tanh, and ReLU.
Questions tagged [activation-function]
166 questions
45
votes
4 answers
Why is ReLU used as an activation function?
Activation functions are used to introduce non-linearities into the linear output of the form w * x + b in a neural network,
which I am able to understand intuitively for activation functions like sigmoid.
I understand the advantages of ReLU,…
Bunny Rabbit
- 573
- 1
- 4
- 6
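As a quick illustration of the point in the excerpt above, here is a minimal NumPy sketch (the weight, bias, and inputs are made up) showing how ReLU turns the purely linear output w * x + b into a piecewise-linear, non-linear function:

import numpy as np

def relu(z):
    # Element-wise max(0, z): negative pre-activations are clipped to zero,
    # which is what breaks the linearity of w * x + b.
    return np.maximum(0.0, z)

# Illustrative weight, bias, and a batch of scalar inputs (all made up).
w, b = 2.0, -1.0
x = np.linspace(-2.0, 2.0, 5)

linear_out = w * x + b            # a straight line in x
nonlinear_out = relu(linear_out)  # flat at 0 where w * x + b < 0, linear above

print(np.c_[linear_out, nonlinear_out])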
42
votes
2 answers
What is GELU activation?
I was going through the BERT paper, which uses GELU (Gaussian Error Linear Unit) and states the equation as
$$ \text{GELU}(x) = x\,P(X \le x) = x\,\Phi(x), $$ which in turn is approximated by $$ 0.5x\left(1 + \tanh\!\left[\sqrt{2/\pi}\left(x + 0.044715x^3\right)\right]\right). $$
Could you simplify the equation…
thanatoz
- 2,365
- 4
- 15
- 39
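For reference, here is a small sketch (NumPy and SciPy assumed) of the exact GELU from the question, x * Phi(x), next to the tanh approximation it quotes; the two agree to within a small numerical error:

import numpy as np
from scipy.stats import norm

def gelu_exact(x):
    # GELU(x) = x * P(X <= x) with X ~ N(0, 1), i.e. x * Phi(x).
    return x * norm.cdf(x)

def gelu_tanh(x):
    # The tanh approximation quoted in the question (used in the BERT code).
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x ** 3)))

x = np.linspace(-4.0, 4.0, 9)
print(np.max(np.abs(gelu_exact(x) - gelu_tanh(x))))  # small (well below 1e-2)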
33
votes
4 answers
How to use LeakyReLU as an activation function in a sequential DNN in Keras? When does it perform better than ReLU?
How do you use LeakyReLU as an activation function in a sequential DNN in Keras?
If I want to write something similar to:
model = Sequential()
model.add(Dense(90, activation='LeakyRelu'))
What is the solution? Do I put LeakyReLU in the same way as ReLU?
Second…
user10296606
- 1,784
- 5
- 17
- 31
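One common pattern (assuming tf.keras; the layer sizes and input shape here are illustrative) is to add LeakyReLU as its own layer after a linear Dense layer, instead of passing it as an activation string:

from tensorflow import keras
from tensorflow.keras import layers

model = keras.Sequential([
    keras.Input(shape=(64,)),              # assumed input size, purely illustrative
    layers.Dense(90),                      # no activation string here
    layers.LeakyReLU(0.1),                 # slope for negative inputs (argument name varies across Keras versions)
    layers.Dense(1, activation='sigmoid'),
])
model.compile(optimizer='adam', loss='binary_crossentropy')
model.summary()

Whether it performs better than ReLU is data-dependent; the usual motivation is avoiding "dead" units whose gradient is exactly zero for negative inputs.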
23
votes
2 answers
Why is ReLU better than the other activation functions?
Here the answer refers to the vanishing and exploding gradients that occur in sigmoid-like activation functions, but, I guess, ReLU has a disadvantage, and it is its expected value: there is no limitation on the output of ReLU, so its expected…
Green Falcon
- 13,868
- 9
- 55
- 98
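A tiny numeric sketch of the gradient argument (the values of z are chosen arbitrarily): the sigmoid gradient is capped at 0.25 and vanishes for large |z|, while the ReLU gradient is either 0 or 1, although the ReLU output itself is unbounded above, which is the expected-value concern raised in the excerpt:

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

z = np.array([-10.0, -2.0, 0.0, 2.0, 10.0])

sigmoid_grad = sigmoid(z) * (1.0 - sigmoid(z))  # at most 0.25, ~0 for large |z|
relu_grad = (z > 0).astype(float)               # exactly 0 or 1, never shrinks

print(sigmoid_grad)  # [~4.5e-05, ~0.105, 0.25, ~0.105, ~4.5e-05]
print(relu_grad)     # [0., 0., 0., 1., 1.]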
22
votes
1 answer
Difference of Activation Functions in Neural Networks in general
I have studied the activation function types for neural networks. The functions themselves are quite straightforward, but the application difference is not entirely clear.
It's reasonable that one differentiates between logical and linear type…
Hendrik
- 8,377
- 17
- 40
- 55
18
votes
3 answers
How to create custom Activation functions in Keras / TensorFlow?
I'm using Keras and I want to add my own activation function myf to the TensorFlow backend. How do I define the new function and make it operational? So instead of the line of code:
model.add(layers.Conv2D(64, (3, 3), activation='relu'))
I'll write…
Basta
- 181
- 1
- 1
- 4
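A hedged sketch (tf.keras assumed): because the excerpt cuts off, the function myf below is an arbitrary stand-in, and the simplest route is usually to pass a plain callable built from TensorFlow ops as the activation argument:

import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers

def myf(x):
    # Any composition of differentiable TensorFlow ops works here;
    # this swish-like function is purely illustrative.
    return x * tf.math.sigmoid(2.0 * x)

model = keras.Sequential([
    keras.Input(shape=(28, 28, 1)),             # assumed input shape
    layers.Conv2D(64, (3, 3), activation=myf),  # callable instead of the string 'relu'
    layers.Flatten(),
    layers.Dense(10, activation='softmax'),
])
model.compile(optimizer='adam', loss='sparse_categorical_crossentropy')
model.summary()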
14
votes
1 answer
Input normalization for ReLU?
Let's assume a vanilla MLP for classification with a given activation function for hidden layers.
I know it is a known best practice to normalize the input of the network between 0 and 1 if sigmoid is the activation function, and between -0.5 and 0.5 if tanh…
Taiko
- 243
- 1
- 2
- 6
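To make the two normalization schemes in the question concrete, here is a small NumPy sketch (the raw feature ranges are invented); zero-mean, unit-variance standardization is a common default when the hidden activations are ReLU, since the inputs are not squeezed into a fixed range:

import numpy as np

rng = np.random.default_rng(0)
X = rng.uniform(50.0, 200.0, size=(1000, 3))  # made-up raw features

# [0, 1] min-max scaling, the range the question mentions for sigmoid networks.
X_minmax = (X - X.min(axis=0)) / (X.max(axis=0) - X.min(axis=0))

# Zero-mean / unit-variance standardization, often used with ReLU hidden layers.
X_std = (X - X.mean(axis=0)) / X.std(axis=0)

print(X_minmax.min(axis=0), X_minmax.max(axis=0))  # ~0 and ~1 per feature
print(X_std.mean(axis=0), X_std.std(axis=0))       # ~0 and ~1 per feature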
12
votes
5 answers
How does sigmoid activation work in multi-class classification problems?
I know that for a problem with multiple classes we usually use softmax, but can we also use sigmoid? I have tried to implement digit classification with sigmoid at the output layer, and it works. What I don't understand is how it works.
bharath chandra
- 121
- 1
- 1
- 4
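A small NumPy sketch (the logits are made up) of why this can work: each sigmoid scores its class independently, so the outputs do not sum to 1 the way softmax outputs do, but both functions are monotone in the logit, so taking the argmax picks the same class:

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

logits = np.array([1.2, -0.3, 0.8, 2.5, -1.0, 0.1, 0.0, -2.0, 0.5, 1.9])  # 10 "digit" scores

p_sigmoid = sigmoid(logits)  # independent per-class probabilities, sum > 1
p_softmax = softmax(logits)  # competing probabilities, sum == 1

print(p_sigmoid.sum(), p_softmax.sum())        # well above 1 vs exactly 1
print(p_sigmoid.argmax(), p_softmax.argmax())  # same index (3)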
12
votes
2 answers
Why do deep learning models still use ReLU instead of SELU as their activation function?
I am trying to understand the SELU activation function, and I was wondering why deep learning practitioners keep using ReLU, with all its issues, instead of SELU, which enables a neural network to converge faster and internally normalizes each…
Konstantinos Skoularikis
- 323
- 2
- 10
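For comparison, a minimal NumPy sketch of SELU next to ReLU, using the two constants from Klambauer et al. (2017); the scaling is what lets SELU self-normalize, but the original recipe also expects lecun_normal initialization and alpha dropout, which is part of why it is less of a drop-in replacement:

import numpy as np

# Constants from the SELU paper (Klambauer et al., 2017), chosen so that
# activations drift toward zero mean and unit variance across layers.
ALPHA = 1.6732632423543772
SCALE = 1.0507009873554805

def selu(x):
    return SCALE * np.where(x > 0, x, ALPHA * (np.exp(x) - 1.0))

def relu(x):
    return np.maximum(0.0, x)

x = np.array([-3.0, -1.0, 0.0, 1.0, 3.0])
print(relu(x))  # [0. 0. 0. 1. 3.]
print(selu(x))  # negative inputs saturate toward -SCALE * ALPHA (about -1.76)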
10
votes
1 answer
Backpropagation: in second-order methods, would the ReLU derivative be 0? And what is its effect on training?
ReLU is an activation function defined as $h = \max(0, a)$ where $a = Wx + b$.
Normally, we train neural networks with first-order methods such as SGD, Adam, RMSprop, Adadelta, or Adagrad. Backpropagation in first-order methods requires first-order…
Rizky Luthfianto
- 2,176
- 2
- 19
- 22
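Sketching the derivatives a second-order method would see (leaving aside the non-differentiable point at $a = 0$):

$$ h = \max(0, a), \qquad \frac{\partial h}{\partial a} = \begin{cases} 1, & a > 0 \\ 0, & a < 0 \end{cases}, \qquad \frac{\partial^2 h}{\partial a^2} = 0 \ \text{ for } a \ne 0, $$

so the curvature contribution from the activation's own second derivative vanishes almost everywhere; any curvature in the loss then comes from the loss function and the composition of layers (the Gauss-Newton term) rather than from the ReLU itself.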
9
votes
4 answers
Activation function vs Squashing function
This may seem like a very simple and obvious question, but I haven't actually been able to find a direct answer.
Today, in a video explaining deep neural networks, I came across the term Squashing function. This is a term that I have never heard or…
Mate de Vita
- 193
- 1
- 1
- 6
8
votes
2 answers
Why is leaky ReLU not so common in real practice?
Since leaky ReLU does not force any value to 0, training always continues, and I can't think of any disadvantages it has.
Yet leaky ReLU is less popular than ReLU in real practice. Can someone tell me why?
Prashant Gupta
- 181
- 1
- 3
7
votes
2 answers
How does one derive the modified tanh activation proposed by LeCun?
In "Efficient Backprop" (http://yann.lecun.com/exdb/publis/pdf/lecun-98b.pdf), LeCun and others propose a modified tanh activation function of the form:
$$ f(x) = 1.7159 \tanh\!\left(\tfrac{2}{3} x\right) $$
They argue that:
It is easier to approximate with…
Lucas Morin
- 2,513
- 5
- 19
- 39
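One way to see where the constants come from (a quick check, consistent with the paper's stated goal that $f(\pm 1) = \pm 1$):

$$ f(1) = 1.7159\,\tanh\!\left(\tfrac{2}{3}\right) \approx 1.7159 \times 0.5827 \approx 1, $$

i.e. $1.7159 \approx 1/\tanh(2/3)$, so the function maps $\pm 1$ to $\pm 1$ while keeping the gain close to 1 over the typical range of normalized inputs.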
7
votes
4 answers
Can the vanishing gradient problem be solved by multiplying the input of tanh with a coefficient?
To my understanding, the vanishing gradient problem occurs when training neural networks in which the gradient of each activation function is less than 1, so that when corrections are back-propagated through many layers, the product of these gradients…
zephyr
- 121
- 1
- 9
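A one-line check of what multiplying the input by a coefficient $a$ does to the gradient:

$$ \frac{d}{dx}\tanh(ax) = a\left(1 - \tanh^2(ax)\right) \le a, $$

so a coefficient $a > 1$ does raise the maximum slope above 1, but it also makes $\tanh(ax)$ saturate at smaller $|x|$, where the $\left(1 - \tanh^2(ax)\right)$ factor again drives the gradient toward zero.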
7
votes
2 answers
Gradient Descent in ReLU Neural Network
I’m new to machine learning and have recently been facing a problem with the back-propagation step of training a neural network that uses the ReLU activation function, shown in the figure. My problem is how to update the weight matrices in the hidden and output layers.
The cost…
kelvincheng
- 71
- 1
- 2
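Since the figure from the question is not reproduced here, the following is only a generic sketch (the layer sizes, loss, and data are invented) of how the weight matrices are updated when the hidden layer uses ReLU; the key point is that ReLU simply gates the backpropagated gradient with the mask (z1 > 0):

import numpy as np

rng = np.random.default_rng(0)

# Tiny made-up network: 2 inputs -> 3 ReLU hidden units -> 1 linear output, squared-error loss.
X = rng.normal(size=(8, 2))
y = rng.normal(size=(8, 1))
W1, b1 = rng.normal(size=(2, 3)), np.zeros(3)
W2, b2 = rng.normal(size=(3, 1)), np.zeros(1)
lr = 0.1

# Forward pass.
z1 = X @ W1 + b1
h1 = np.maximum(0.0, z1)              # ReLU
y_hat = h1 @ W2 + b2
loss = np.mean((y_hat - y) ** 2)

# Backward pass: ReLU gates the gradient with the mask (z1 > 0).
d_yhat = 2.0 * (y_hat - y) / len(X)
dW2, db2 = h1.T @ d_yhat, d_yhat.sum(axis=0)
d_z1 = (d_yhat @ W2.T) * (z1 > 0)     # derivative of ReLU: 1 where z1 > 0, else 0
dW1, db1 = X.T @ d_z1, d_z1.sum(axis=0)

# One gradient-descent step on every weight matrix and bias.
W1 -= lr * dW1; b1 -= lr * db1
W2 -= lr * dW2; b2 -= lr * db2
print(loss)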