
What is the difference between Fully Connected layers and Bilinear layers in deep learning?

Green Falcon
N.IT

1 Answer


I quote from the answer to "What is a bilinear tensor layer (in contrast to a standard linear neural network layer) or how can I imagine it?":

A bilinear function is a function of two inputs $x$ and $y$ that is linear in each input separately. Simple examples of bilinear functions on vectors are the dot product and the element-wise product.

Let $M$ be a matrix. The function $f(x,y)=x^\top My=\sum_{i,j}M_{ij}x_iy_j$ is bilinear in $x$ and $y$. In fact, any scalar bilinear function of two vectors takes this form. Note that a bilinear function is a linear combination of the pairwise products $x_iy_j$, whereas a linear function such as $g(x,y)=Ax+By$ can only contain $x_i$ or $y_j$ terms. For neural nets, that means a bilinear function allows for richer interactions between inputs.
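
To make the contrast concrete, here is a minimal NumPy sketch (all shapes and values are illustrative) verifying that $x^\top My$ equals the double sum over pairwise products, next to a purely linear map that has no such cross terms:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=3)
y = rng.normal(size=4)
M = rng.normal(size=(3, 4))

# Bilinear form: a weighted sum over all pairwise products x_i * y_j.
f = x @ M @ y
f_explicit = sum(M[i, j] * x[i] * y[j] for i in range(3) for j in range(4))
assert np.isclose(f, f_explicit)

# Linear map g(x, y) = Ax + By: only x_i and y_j terms, no cross products.
A = rng.normal(size=(2, 3))
B = rng.normal(size=(2, 4))
g = A @ x + B @ y
```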

Now what if you want a bilinear function that outputs a vector? You simply define a matrix $M_k$ for each coordinate of the output and end up with a stack of matrices. That stack of matrices is called a tensor (a 3-mode tensor, to be exact). You can picture the bilinear tensor product of two vectors as $x^\top M_k y$ computed on each "slice" $M_k$ of the tensor.
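
As a small sketch (shapes illustrative), the stacked product can be written as a single `einsum`; this is also essentially what PyTorch's `torch.nn.Bilinear(in1_features, in2_features, out_features)` computes, plus a learnable bias:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=3)
y = rng.normal(size=4)
M = rng.normal(size=(5, 3, 4))  # 5 slices M_k, one per output coordinate

# z_k = x^T M_k y, evaluated for all slices k at once.
z = np.einsum("i,kij,j->k", x, M, y)

# Equivalent slice-by-slice computation.
z_loop = np.array([x @ M[k] @ y for k in range(5)])
assert np.allclose(z, z_loop)
```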

A bilinear model consists of two feature extractors whose outputs are multiplied using an outer product at each location of the image and pooled to obtain an image descriptor. [1]

[Figure: the bilinear CNN architecture, in which two feature extractors are combined by an outer product at each location and pooled into an image descriptor.]

Its advantage is that it can model pairwise feature interactions in a translationally invariant manner, which is particularly useful for fine-grained categorization. It also allows end-to-end training using image labels only, and achieves state-of-the-art performance on fine-grained classification.
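
Here is a rough sketch of that pooling step, with random arrays standing in for the two extractors' feature maps; all shapes are illustrative, and the signed square-root and L2 normalization reflect the post-processing described in the bilinear CNN paper:

```python
import numpy as np

rng = np.random.default_rng(0)
H, W, C1, C2 = 7, 7, 64, 64        # spatial grid and channel counts (illustrative)
fa = rng.normal(size=(C1, H * W))  # features from extractor A, one column per location
fb = rng.normal(size=(C2, H * W))  # features from extractor B

# Outer product of the two feature vectors at each location, averaged over
# all locations; the sum of per-location outer products is one matrix product.
bilinear = fa @ fb.T / (H * W)     # (C1, C2) image descriptor

# Signed square-root and L2 normalization before the classifier.
z = bilinear.flatten()
z = np.sign(z) * np.sqrt(np.abs(z))
z = z / (np.linalg.norm(z) + 1e-12)
```

Because the descriptor is pooled over all locations, the same classifier applies wherever the object appears in the image, which is the translation invariance mentioned above.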

Yashas
Green Falcon
  • Great summary. Just one point: for the Bilinear CNN model we directly took an outer product of $x$ and $y$, which would mean an identity matrix for $M$. Since $x$ and $y$ are the result of learned projections anyway, one can assume anything a matrix $M$ would encode can be learned end-to-end in the features. – AruniRC Jan 31 '19 at 18:33
  • I am stuck on the formula $f(x,y)=x^\top My=\sum_{i,j}M_{ij}x_iy_j$. Can someone explain it? How do we multiply, for instance, $x^\top M$? The dimensions don't seem to match, and I am not sure how to make them match, since $x$ is 2-D and $M$ is 3-D. – Eduardo Reis Feb 10 '21 at 06:50
  • They are computed separately for each image. You can search for vectorization. – Green Falcon Feb 10 '21 at 07:01
  • @EduardoReis It is not a simple matrix multiplication; it is done by Einstein summation, a.k.a. einsum. – Ritwik Jan 06 '23 at 10:58
  • @GreenFalcon Do bilinear functions work better than simple concatenation (of $x$ and $y$) followed by a linear layer? If yes, then why? – Ritwik Jan 06 '23 at 11:00
  • @Ritwik Sorry for my late response. It highly depends on your task; I cannot tell you that one is always better than the other, and there is no proof of that. – Green Falcon Feb 27 '23 at 13:31