
Why would the dimension of $W^{[2]}$ be $(n^{[2]}, n^{[1]})$?

This is a simple linear equation: $z^{[n]} = W^{[n]}a^{[n-1]} + b^{[n]}$

There seems to be an error in the screenshot: the weight matrix $W$ should be transposed. Please correct me if I am wrong.

$W^{[2]}$ is the matrix of weights assigned to the neurons in layer 2

$n^{[1]}$ is the number of neurons in layer 1

Screenshot from Andrew Ng's Deep Learning Coursera course video:

[Screenshot: backpropagation algorithm]


1 Answer


There seems to be an error in the screenshot: the weight matrix $W$ should be transposed. Please correct me if I am wrong.

You are wrong.

Matrix multiplication works so that if you multiply two matrices together, $C = AB$, where $A$ is an $i \times j$ matrix and $B$ is a $j \times k$ matrix, then $C$ will be an $i \times k$ matrix. Note that $A$'s column count must equal $B$'s row count ($j$).
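As a quick sanity check of that shape rule, here is a minimal NumPy sketch; the sizes $i$, $j$, $k$ below are arbitrary example values, not from the course:

```python
import numpy as np

i, j, k = 4, 3, 2           # arbitrary example sizes

A = np.random.randn(i, j)   # shape (4, 3)
B = np.random.randn(j, k)   # shape (3, 2)

C = A @ B                   # inner dimensions (j) must match
print(C.shape)              # (4, 2), i.e. i x k
```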

In the neural network, $a^{[1]}$ is an $n^{[1]} \times 1$ matrix (column vector), and $z^{[2]}$ needs to be an $n^{[2]} \times 1$ matrix, to match the number of neurons in each layer.

Therefore $W^{[2]}$ has to have dimensions $n^{[2]} \times n^{[1]}$ in order to produce an $n^{[2]} \times 1$ matrix from the product $W^{[2]}a^{[1]}$.
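Putting that together, here is a minimal NumPy sketch of the layer-2 computation; the layer sizes are hypothetical, chosen only to show the shapes:

```python
import numpy as np

n1, n2 = 3, 5                   # hypothetical layer sizes

a1 = np.random.randn(n1, 1)     # activations from layer 1: (n1, 1)
W2 = np.random.randn(n2, n1)    # weights for layer 2:      (n2, n1)
b2 = np.random.randn(n2, 1)     # biases for layer 2:       (n2, 1)

z2 = W2 @ a1 + b2               # (n2, n1) @ (n1, 1) -> (n2, 1)
print(z2.shape)                 # (5, 1): one pre-activation per layer-2 neuron
```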

  • Neil, would you mind taking another look at https://datascience.stackexchange.com/questions/23486/proper-derivation-of-dz1-expression-for-backpropagation-algorithm ? – kevin Oct 04 '17 at 13:52
  • It's helpful to think of the weight matrix, W, as an adjacency matrix for a directed graph between layers. Therefore, as @Neil Slater says, it's an n[next layer] × n[current layer] matrix (see the sketch below). – steviesh May 04 '18 at 21:29
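To illustrate the adjacency-matrix view from the last comment, a small sketch with made-up layer sizes: row $i$ of $W^{[2]}$ holds the weights of all edges coming into neuron $i$ of layer 2 from the layer-1 neurons.

```python
import numpy as np

n1, n2 = 3, 5                    # hypothetical layer sizes

W2 = np.random.randn(n2, n1)     # rows index layer-2 neurons, columns index layer-1 neurons

# W2[i, j] is the weight on the directed edge from neuron j (layer 1)
# to neuron i (layer 2); row i collects all of neuron i's incoming weights.
i = 4
print(W2[i])        # the n1 incoming edge weights for layer-2 neuron i
print(W2[i].shape)  # (3,)
```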