4

Should we add bias to each entry of the convolution then sum, or add bias once at end of calculating the convolution in CNNs?

Green Falcon
  • What do you mean by "sum"? Can you express your question mathematically? It's a bit vague in words. – Louis T Nov 08 '17 at 21:57

2 Answers

7

Short answer: the bias is added once after the convolution has been calculated.

Long answer: the discrete convolution you see in CNNs is a linear function applied to the pixel values in a small region of an image. The output of this linear function is then passed through some nonlinearity (like ReLU). For a region $\mathbf{x}$ of size $i \times j$ of an image and a convolutional filter $\mathbf{k}$, and no bias term, this linear function $f$ would be defined as:

$$ f(\mathbf{x}, \mathbf{k}) = \mathbf{x}*\mathbf{k} = \sum_{i,j} k_{i,j} x_{i,j} $$

Without a bias term, this linear function $f$ must go through the origin. In other words, if $\mathbf{x}$ or $\mathbf{k}$ is all zeroes, the output of $f$ will be zero as well. This may not be desirable, so we add a bias term $b$. This gives the model more flexibility by providing a value that is always added to the output of the convolution, regardless of the values of $\mathbf{x}$ and $\mathbf{k}$ -- in other words, it's the intercept value.

$$ f(\mathbf{x}, \mathbf{k}, b) = b + (\mathbf{x}*\mathbf{k}) = b + \sum_{i,j} k_{i,j} x_{i,j} $$

If the bias were instead added to each term of the sum before summing, the output would be $\sum_{i,j}(k_{i,j} x_{i,j} + b) = (\mathbf{x}*\mathbf{k}) + i \cdot j \cdot b$, which is just the same single intercept rescaled by a constant. Adding it per entry therefore buys no extra flexibility, which is why the bias is applied once, after the convolution has been computed.
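To make the short answer concrete, here is a minimal NumPy sketch of the formula above, not tied to any particular framework; the function names `conv_region` and `conv2d` are illustrative, not from any library. The bias is a single scalar added once per output location, after the weighted sum.

```python
import numpy as np

def conv_region(x, k, b=0.0):
    """f(x, k, b) = b + sum_ij k_ij * x_ij for a single image region."""
    return b + np.sum(k * x)

def conv2d(image, kernel, b=0.0):
    """Valid convolution over a whole image: the same single bias is added
    to every output location, not to every product inside the sum."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.empty((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            out[i, j] = conv_region(image[i:i + kh, j:j + kw], kernel, b)
    return out

image = np.zeros((5, 5))               # an all-zero input region...
kernel = np.random.randn(3, 3)
print(conv2d(image, kernel))           # ...gives all zeros without a bias
print(conv2d(image, kernel, b=0.5))    # ...but a constant 0.5 with one
```

With `b = 0` the output is forced through the origin, exactly as described above; a nonzero `b` shifts every output by the same intercept.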

timleathart
1

Based on the answer here and the blog post here, there are two variants for using bias in convolutional layers: tied biases, where you use one bias per convolutional filter/kernel, and untied biases, where you use one bias per kernel and output location.
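As a rough sketch of the difference (the layer sizes and variable names below are illustrative assumptions, not taken from the linked answer or blog post), the two variants differ only in the shape of the bias parameter and therefore in the number of bias parameters learned:

```python
import numpy as np

# Hypothetical layer: 16 filters producing a 28 x 28 output feature map.
num_filters, out_h, out_w = 16, 28, 28
conv_output = np.random.randn(num_filters, out_h, out_w)  # pre-bias activations

# Tied biases: one scalar per filter, broadcast over all output locations.
tied_bias = np.zeros((num_filters, 1, 1))            # 16 parameters
tied_result = conv_output + tied_bias                # same value added everywhere

# Untied biases: one scalar per filter *and* per output location.
untied_bias = np.zeros((num_filters, out_h, out_w))  # 16 * 28 * 28 parameters
untied_result = conv_output + untied_bias            # each location gets its own bias
```

Tied biases, with one parameter per filter, are what standard convolutional layers such as `torch.nn.Conv2d` and `tf.keras.layers.Conv2D` use by default.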

Green Falcon