Questions tagged [derivation]
20 questions
3
votes
1 answer
Is it valid to use numpy.gradient to find the slope of a line as well as the slope of a curve at any point?
What is the difference between the slope of a line and the slope of a curve? Is it valid to use numpy.gradient to find the slope of a line and the slope of a curve at any point?
#slope of line at any point
tanθ = (y2 - y1) / (x2 - x1)
#slope of…
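As a quick check (my own sketch, not from the question): numpy.gradient returns a constant slope for a line and a point-wise slope for a curve, using central differences at interior points.

```python
import numpy as np

# A line y = 3x + 1: the finite-difference slope is 3 everywhere.
x = np.linspace(0.0, 10.0, 101)
line = 3.0 * x + 1.0
line_slope = np.gradient(line, x)

# A curve y = x**2: the estimated slope varies with x (≈ 2x).
curve = x ** 2
curve_slope = np.gradient(curve, x)

print(line_slope[50])   # ≈ 3.0
print(curve_slope[50])  # ≈ 2 * x[50] = 10.0
```

So the same call covers both cases; for a curve, each entry is a local finite-difference estimate of the derivative at that sample.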
star
- 1,411
- 7
- 18
- 29
3
votes
1 answer
Maximum Entropy Policy Gradient Derivation
I am reading the paper Reinforcement Learning and Control as Probabilistic Inference: Tutorial and Review by Sergey Levine. I am having difficulty understanding this part of the derivation on Maximum Entropy Policy Gradients (Section…
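For context, the maximum-entropy objective that derivation builds on (as I understand the tutorial's setup; notation may differ slightly from the paper) augments expected reward with a policy-entropy term:

$$J(\pi) = \sum_{t} \mathbb{E}_{(s_t, a_t) \sim \rho_\pi}\big[\, r(s_t, a_t) + \mathcal{H}\big(\pi(\cdot \mid s_t)\big) \,\big],$$

so the resulting policy gradient picks up an extra $-\log \pi(a_t \mid s_t)$ term relative to the standard derivation.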
Ricky Sanjaya
- 39
- 3
2
votes
0 answers
Deriving vectorized form of linear regression
We first have a D-dimensional weight vector $w$ and a D-dimensional predictor vector $x$, both indexed by $j$. There are $N$ observations, all D-dimensional. $t$ is our vector of targets, i.e., the ground-truth values. We then derive the cost…
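The vectorized form such a derivation usually arrives at (a sketch under the standard setup; the $\frac{1}{2}$ factor is a convention the question's source may or may not use): stack the $N$ observations as rows of a design matrix $X \in \mathbb{R}^{N \times D}$, so that

$$E(w) = \frac{1}{2} \sum_{n=1}^{N} \Big( \sum_{j=1}^{D} w_j x_{nj} - t_n \Big)^2 = \frac{1}{2} \lVert Xw - t \rVert^2, \qquad \nabla_w E = X^\top (Xw - t).$$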
user2793618
- 143
- 4
2
votes
1 answer
1st order Taylor Series derivative calculation for autoregressive model
I wrote a blog post where I calculated the Taylor Series of an autoregressive function. It is not strictly the Taylor Series, but some variant (I guess). I'm mostly concerned about whether the derivatives look okay. I noticed I made a mistake and…
targetXING
- 121
- 7
2
votes
1 answer
Doubt in Derivation of Backpropagation
I was going through the derivation of the backpropagation algorithm provided in this document (added just for reference). I have a doubt at one specific point in this derivation. The derivation goes as follows:
Notation:
The subscript $k$ denotes the…
ATK
- 175
- 6
1
vote
1 answer
SVM - Making sense of distance derivation
I am studying the math behind SVM.
The following question is about a small but important detail during the SVM derivation.
The question
Why is the distance between the hyperplane $w \cdot x + b = 0$ and a data point (in vector form) $p$, $d = \frac{w \cdot p +…
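The excerpt is cut off, but the standard distance formula it refers to can be derived in one step (assuming $\lVert w \rVert$ is the Euclidean norm): pick any point $x_0$ on the hyperplane, so $w \cdot x_0 + b = 0$, and project $p - x_0$ onto the unit normal $w / \lVert w \rVert$:

$$d = \frac{w \cdot (p - x_0)}{\lVert w \rVert} = \frac{w \cdot p - w \cdot x_0}{\lVert w \rVert} = \frac{w \cdot p + b}{\lVert w \rVert}.$$

(Taking the absolute value gives an unsigned distance.)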
Alan Yue
- 21
- 2
1
vote
1 answer
How is this score function estimator derived?
In this paper they have this equation, where they use the score function estimator, to estimate the gradient of an expectation. How did they derive this?
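While the paper's exact equation can't be reproduced from the excerpt, the identity that score-function (REINFORCE) estimators rest on is:

$$\nabla_\theta \,\mathbb{E}_{x \sim p_\theta}[f(x)] = \mathbb{E}_{x \sim p_\theta}\big[ f(x)\, \nabla_\theta \log p_\theta(x) \big],$$

which follows from $\nabla_\theta p_\theta(x) = p_\theta(x)\, \nabla_\theta \log p_\theta(x)$ and exchanging the gradient with the integral.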
adam
- 13
- 2
1
vote
1 answer
Derivative of Loss wrt bias term
I read this and have a point of confusion.
I am trying to understand how to calculate the derivative of the loss w.r.t. the bias.
In this question, we have this definition:
np.sum(dz2,axis=0,keepdims=True)
Then in Casper's comment, he said that the derivative…
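A minimal numpy sketch (the names dz2 and db2 mirror the question; the shapes are my own illustrative choice) of why the bias gradient is a sum over the batch axis: the bias is broadcast to every row of the batch in the forward pass, so in the backward pass its gradient accumulates every row's upstream gradient.

```python
import numpy as np

rng = np.random.default_rng(0)
dz2 = rng.standard_normal((4, 3))  # upstream gradient dL/dz2, shape (batch, units)

# Forward: z2 = a1 @ W2 + b2, where b2 of shape (1, units) is broadcast
# over the batch axis. Backward: dL/db2 sums dz2 over that axis;
# keepdims=True preserves b2's (1, units) shape.
db2 = np.sum(dz2, axis=0, keepdims=True)

print(db2.shape)  # (1, 3)
```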
Gonzalo Sanchez cano
- 111
- 1
1
vote
0 answers
Backpropagation through time derivation issue
I read several posts about BPTT for RNN, but I am actually a bit confused about one step in the derivation. Given
$$h_t=f(b+Wh_{t-1}+Ux_t)$$
when we compute $\frac{\partial h_t}{\partial W}$, does anyone know why is it simply
$$\frac{\partial…
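For context, the usual resolution (my sketch, assuming $f$ acts elementwise on the pre-activation $a_t = b + W h_{t-1} + U x_t$): the "simple" expression is the immediate partial, which holds $h_{t-1}$ fixed,

$$\frac{\partial h_{t,i}}{\partial W_{ij}} = f'(a_{t,i})\, h_{t-1,j},$$

while the total derivative adds the recursive term $\mathrm{diag}\!\big(f'(a_t)\big)\, W\, \frac{\partial h_{t-1}}{\partial W}$, since $h_{t-1}$ itself depends on $W$; BPTT accumulates exactly these recursive terms over time.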
username123
- 151
- 4
1
vote
1 answer
A Derivation in Combinatory Categorial Grammar
I am reading about CCG on page 23 of Speech and Language processing. There is a derivation as follows:
(VP/PP)/NP , VP\((VP/PP)/NP) => VP?
Can anyone explain this, please? This makes sense if
VP\((VP/PP)/NP) is equivalent to (VP\(VP/PP))/NP
and…
chikitin
- 153
- 6
1
vote
0 answers
How to compute the backpropagation gradient via the chain rule using vector/matrix differentials?
I have some problems computing the derivative of the sum-of-squares error in a backprop neural network.
For example, we have a neural network as in the picture. For drawing simplicity, I've dropped the sample indexes.
Conventions:
x - data_set input.
W - is…
Grigogiy Reznichenko
- 11
- 2
1
vote
0 answers
Adding a group specific penalty to binary cross-entropy
I want to implement a custom Keras loss function that consists of plain binary cross-entropy plus a penalty that increases the loss for false negatives from one class (each observation can belong to one of two classes, privileged and unprivileged)…
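As a sketch of the loss's math only (plain numpy rather than Keras; `penalized_bce`, `fn_penalty`, and the privileged mask are my own illustrative names, not from the question): binary cross-entropy with an extra multiplier on positive-class terms from the privileged group, which is where its false negatives occur.

```python
import numpy as np

def penalized_bce(y_true, y_pred, privileged, fn_penalty=2.0, eps=1e-7):
    """BCE where positive-class errors (potential false negatives)
    from the privileged group get an extra weight fn_penalty."""
    y_pred = np.clip(y_pred, eps, 1.0 - eps)
    bce = -(y_true * np.log(y_pred) + (1.0 - y_true) * np.log(1.0 - y_pred))
    # Up-weight the loss on privileged positives that the model misses.
    weight = 1.0 + (fn_penalty - 1.0) * y_true * privileged
    return np.mean(weight * bce)

y_true = np.array([1.0, 1.0, 0.0])
y_pred = np.array([0.2, 0.2, 0.1])
priv   = np.array([1.0, 0.0, 0.0])
print(penalized_bce(y_true, y_pred, priv) > penalized_bce(y_true, y_pred, np.zeros(3)))
# True: penalizing the privileged false negative raises the loss
```

Porting this to Keras would mean expressing the same arithmetic with backend ops and passing the group mask alongside the labels.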
Tim
- 11
- 1
0
votes
1 answer
Loss function for points inside polygon
I am trying to optimize some parameters that are used to transform 2D points from one place to another (you may think of them as rotation & translation parameters, for simplicity).
The parameters are considered optimal if the transformed points lie inside a…
Humam Helfawi
- 101
- 2
0
votes
1 answer
Problem with a math formula in Weight Uncertainty in Neural Networks
I am studying the paper https://arxiv.org/pdf/1505.05424.pdf and there is a formula I don't get on page 4:
I don't understand how they obtain this formula. Moreover, with chain rule, I get $\frac{\partial f(\mathrm w, \theta)}{\partial\mathrm w} =…
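For what it's worth, the step that usually causes this confusion (assuming the paper's reparameterisation $w = \mu + \log(1 + \exp(\rho)) \circ \varepsilon$): $f(w, \theta)$ depends on $\mu$ both through $w$, with $\partial w / \partial \mu = 1$, and directly through $\theta = (\mu, \rho)$, so the total derivative has two terms:

$$\frac{\mathrm{d}}{\mathrm{d}\mu} f(w, \theta) = \frac{\partial f(w, \theta)}{\partial w} \frac{\partial w}{\partial \mu} + \frac{\partial f(w, \theta)}{\partial \mu} = \frac{\partial f(w, \theta)}{\partial w} + \frac{\partial f(w, \theta)}{\partial \mu}.$$

The plain chain rule through $w$ alone misses the second, direct term.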
Jack21
- 1
- 1
0
votes
0 answers
How to find the derivative of the hidden state of recurrent neural networks?
Recently I am reading the following paper (link)
Liu, Sifei, Jinshan Pan, and Ming-Hsuan Yang. “Learning Recursive Filters for Low-Level Vision via a Hybrid Neural Network.” In Computer Vision – ECCV 2016, edited by Bastian Leibe, Jiri Matas, Nicu…
user153245
- 101
- 3