Should we add the bias to each entry of the convolution and then sum, or add the bias once after the convolution has been calculated in CNNs?
What do you mean by "sum"? Can you express your question mathematically? It's a bit vague in words. – Louis T Nov 08 '17 at 21:57
2 Answers
Short answer: the bias is added once after the convolution has been calculated.
Long answer: the discrete convolution you see in CNNs is a linear function applied to the pixel values in a small region of an image. The output of this linear function is then passed through some nonlinearity (like ReLU). For an $i \times j$ region $\mathbf{x}$ of an image and a convolutional filter $\mathbf{k}$, with no bias term, this linear function $f$ is defined as:
$$ f(\mathbf{x}, \mathbf{k}) = \mathbf{x}*\mathbf{k} = \sum_{i,j} k_{i,j} x_{i,j} $$
Without a bias term, this linear function $f$ must go through the origin. In other words, if $\mathbf{x}$ or $\mathbf{k}$ is all zeroes, the output of $f$ will be zero as well. This may not be desirable, so we add a bias term $b$. This gives the model more flexibility by providing a value that is always added to the output of the convolution, regardless of the values of $\mathbf{x}$ and $\mathbf{k}$ -- in other words, it's the intercept value.
$$ f(\mathbf{x}, \mathbf{k}, b) = b + (\mathbf{x}*\mathbf{k}) = b + \sum_{i,j} k_{i,j} x_{i,j} $$
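As a concrete numerical sketch of the formula above (the region, filter, and bias values here are made up for illustration):

```python
import numpy as np

# Illustrative 3x3 region of an image and a 3x3 filter
x = np.array([[1., 0., 2.],
              [0., 1., 0.],
              [3., 0., 1.]])
k = np.array([[ 1., 0., -1.],
              [ 1., 0., -1.],
              [ 1., 0., -1.]])
b = 0.5  # single bias, added once after the sum

# f(x, k, b) = b + sum_{i,j} k_ij * x_ij
f = b + np.sum(k * x)
print(f)  # 0.5 + (1 + 1 + 3) - (2 + 0 + 1) = 1.5
```

Note that the bias appears exactly once, regardless of the region size.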
If instead this value were added to each entry of the convolution before summing, the $i \times j$ copies of $b$ would simply collapse into a single constant $i \cdot j \cdot b$, so it would be equivalent to adding one (rescaled) bias after the sum. Adding the bias once at the end achieves the same effect with a single parameter.
Based on the answer here and the blog post here, there are two variants for using biases in convolutional layers: tied biases, where you use one bias per convolutional filter/kernel, and untied biases, where you use one bias per kernel and output location.
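The difference is easiest to see in the parameter shapes. A rough sketch (the layer sizes below are made up; only the broadcasting pattern matters):

```python
import numpy as np

out_channels, h_out, w_out = 4, 6, 6
# Placeholder for the pre-bias output of a conv layer with 4 filters
conv_out = np.zeros((out_channels, h_out, w_out))

# Tied biases: one scalar per filter, broadcast over all output locations
tied_b = np.random.randn(out_channels, 1, 1)
tied_result = conv_out + tied_b          # shape (4, 6, 6)

# Untied biases: one scalar per filter AND per output location
untied_b = np.random.randn(out_channels, h_out, w_out)
untied_result = conv_out + untied_b      # shape (4, 6, 6)

print(tied_b.size)    # 4 bias parameters
print(untied_b.size)  # 4 * 6 * 6 = 144 bias parameters
```

Tied biases are the common default in deep learning libraries; untied biases trade many more parameters for location-dependent offsets.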