
I am working with linear regression and I would like to know its time complexity in big-O notation. The cost function of linear regression without an optimisation algorithm (such as gradient descent) has to be computed over every combination of the weights (a brute-force approach). This makes the computation time dependent on the number of weights and, obviously, on the amount of training data.

Suppose $n$ is the number of training examples, $W$ is the number of weights, and the resolution of the weight space is $m$, meaning that each weight iterates through $m$ possible values. Then the time complexity of this linear regression is

$O(m^Wn)$

Is this correct?
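For concreteness, this is the kind of brute-force search I have in mind (a rough sketch; the grid bounds, the squared-error cost, and the function name are just my own illustration):

```python
import itertools
import numpy as np

def brute_force_linear_regression(X, y, m=10, lo=-1.0, hi=1.0):
    """Exhaustively search an m-point grid for each of the W weights.

    There are m**W candidate weight vectors, and each one is scored
    against the n training examples, i.e. O(m**W * n) cost evaluations.
    """
    n, W = X.shape
    grid = np.linspace(lo, hi, m)            # m candidate values per weight
    best_w, best_cost = None, np.inf
    for candidate in itertools.product(grid, repeat=W):   # m**W combinations
        w = np.asarray(candidate)
        cost = np.mean((X @ w - y) ** 2)     # O(n * W) work per candidate
        if cost < best_cost:
            best_w, best_cost = w, cost
    return best_w, best_cost
```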

user134132523

2 Answers


In linear regression you have to compute $$ \hat{\beta} = (X'X)^{-1}X'Y, $$ where $X$ is an $n\times p$ matrix. In general, the complexity of the matrix product $AB$ is $O(abc)$ whenever $A$ is $a\times b$ and $B$ is $b\times c$. Therefore we can evaluate the following complexities:

a) the matrix product $X'X$ with complexity $O(p^2n)$.

b) the matrix-vector product $X'Y$ with complexity $O(pn)$.

c) the inverse $(X'X)^{-1}$ with complexity $O(p^3)$.

Therefore the complexity is $O(np^2 + p^3)$.
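A minimal NumPy sketch of these three steps (for illustration only; in practice you would use a least-squares solver such as `numpy.linalg.lstsq` rather than forming the explicit inverse):

```python
import numpy as np

def ols_normal_equations(X, Y):
    """Solve the least-squares problem via the normal equations."""
    XtX = X.T @ X                 # (a) p x n times n x p  -> O(p^2 n)
    XtY = X.T @ Y                 # (b) p x n times n x 1  -> O(p n)
    XtX_inv = np.linalg.inv(XtX)  # (c) invert a p x p matrix -> O(p^3)
    return XtX_inv @ XtY          # overall: O(n p^2 + p^3)

# example usage
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 20))   # n = 1000 observations, p = 20 weights
Y = X @ rng.normal(size=20) + rng.normal(scale=0.1, size=1000)
beta_hat = ols_normal_equations(X, Y)
```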

Carlos Llosa

It depends heavily on the "solver" you are using.

Calling $n$ the number of observations and $p$ the number of weights, the overall complexity should be $n^2p+p^3$.

Indeed, when performing a linear regression you are doing a matrix multiplication, whose complexity is $n^2p$ (when evaluating $X'X$), and then inverting the resulting matrix. That matrix is square with $p$ rows, and the complexity of matrix inversion is usually $p^3$ (though it can be lowered).

Hence a theoretical complexity of $n^2p+p^3$.
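As a rough illustration of the solver dependence mentioned above, the same normal equations can be handled with or without forming the full inverse (a NumPy sketch of my own; it is not meant to reflect any particular library's internals):

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 5000, 50
X = rng.normal(size=(n, p))
y = X @ rng.normal(size=p) + rng.normal(size=n)

# explicit inversion of X'X (the textbook route)
beta_inv = np.linalg.inv(X.T @ X) @ (X.T @ y)

# same normal equations, solved directly (LU factorisation under the hood),
# which avoids the explicit inverse and is typically faster and more stable
beta_solve = np.linalg.solve(X.T @ X, X.T @ y)

# a least-squares solver working on X itself (SVD-based)
beta_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)

assert np.allclose(beta_inv, beta_solve) and np.allclose(beta_solve, beta_lstsq)
```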

Side notes

However, numerical simulations (using Python's scikit-learn library) suggest a time complexity closer to $n^{0.72} p^{1.3}$.

This may be because no implementation actually performs the full inversion (instead, the system can be solved using gradient descent), or because there are other ways to calibrate the weights of a linear regression.
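As a sketch of how such empirical exponents can be estimated (my own illustration rather than the original experiment; it times `sklearn.linear_model.LinearRegression` and fits the exponents by a log-log regression):

```python
import time
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
rows = []
for n in (2_000, 4_000, 8_000, 16_000):
    for p in (20, 40, 80, 160):
        X = rng.normal(size=(n, p))
        y = X @ rng.normal(size=p) + rng.normal(size=n)
        t0 = time.perf_counter()
        LinearRegression().fit(X, y)
        rows.append((n, p, time.perf_counter() - t0))

# fit log(t) ~ a*log(n) + b*log(p) + c to estimate the empirical exponents a, b
A = np.array([[np.log(n), np.log(p), 1.0] for n, p, _ in rows])
b = np.log([t for *_, t in rows])
(a_hat, b_hat, _), *_ = np.linalg.lstsq(A, b, rcond=None)
print(f"empirical complexity ~ n^{a_hat:.2f} p^{b_hat:.2f}")
```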

Source

An article from my blog: computational complexity of machine learning algorithms

RUser4512
  • Why is the exponent on $n$ squared? If I have 3 weights, then the error function computed over the $n$ training examples has to be recomputed for all possible combinations of those 3 weights, right? The same goes for a higher number of weights; that is why my reasoning was that the number of weights should be an exponent. Also, in your calculation, where is the limit on the number of possible iterations per weight? I am talking about a purely brute-force approach, even though it is not the optimal way to go about it. – user134132523 Sep 03 '18 at 18:01
  • "why is the exponent of n, squared. if I have 3 weights then the error function computed over n training sets have to be recomputed for all the possible combinations of those 3 weights right." No. This would be true if you were doing a naive approach, but based on the closed formula for $\hat{\beta}$, you do not have to try every combination of weights – RUser4512 Sep 04 '18 at 08:09
  • 6
    from : https://math.stackexchange.com/questions/84495/computational-complexity-of-least-square-regression-operation, you may have inverted the exponent on n and p. The complexity should be $np^2$, this is also in line with practical time complexity. – Lucas Morin Apr 15 '20 at 15:17
  • 2
    Indeed n and p are inverted, it confused me too. It would be nice if the author would fix the answer. – offlinehacker Jul 13 '21 at 08:19