18 Dec 2021 - tsp
Last update 12 Mar 2022
7 mins
As with the other mini blog articles in this series, this is just a short summary of how to perform least squares using QR decomposition in a practical way and of how I interpret and picture how this algorithm works. It’s again not a mathematically rigorous treatment (again: refer to your favorite mathematics textbook for a fully rigorous definition). The idea of this blog article is simply to summarize the idea.
The linear least squares problem is a typical regression or fitting problem. It usually arises when one tries to find approximate solutions for an over-determined system - such as fitting a function to real measurements or looking for approximate solutions to overdetermined systems of linear equations. It’s also part of general residual minimization methods such as GMRES that are used to solve large scale linear problems - such as problems that arise during FEM or FDM / FDTD calculations. As one can guess from its name it’s applied to linear functions, though the idea can of course be extended to non-linear functions - though then not as easily using just linear algebra.
The typical set of linear equations can be written as
[ A \vec{x} = \vec{b} ]In this case the matrix $A \in \mathfrak{R}^{m \times n}$ contains the coefficients of the $n$ variables contained in $\vec{x} \in \mathfrak{R}^n$. Thus each row describes one of the $m$ linear equations for the unknowns; the resulting constants are collected in the elements of $\vec{b} \in \mathfrak{R}^m$. For a determined system with $A \in \mathfrak{R}^{m \times m}$ that has a solution there exists a theoretical perfect solution that can be determined by matrix inversion:
[ A \vec{x} = \vec{b} \\ \vec{x} = A^{-1} \vec{b} ]This can indeed be done by performing Gaussian elimination or some similar method and is pretty easy to implement. To make this somewhat robust one usually has to implement a pivoting scheme - typically realized by row and column swapping so one always reduces with the largest element on the diagonal - while performing the diagonalization of the matrix $A$. Depending on the matrix one might also use different methods to efficiently invert it.
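Just to illustrate this direct approach for a small square system, a naive Gaussian elimination with partial pivoting might look like the following sketch (using Python and numpy here; this is of course not the QR based method this article is about, and for anything serious one would simply call numpy.linalg.solve):

```python
import numpy as np

def solve_square_gauss(A, b):
    """Naive Gaussian elimination with partial (row) pivoting.

    Solves A x = b for a square, non singular A. Purely illustrative,
    without any handling of ill conditioned systems.
    """
    A = A.astype(float).copy()
    b = b.astype(float).copy()
    n = A.shape[0]

    for col in range(n):
        # Partial pivoting: move the row with the largest element
        # in the current column onto the diagonal
        pivot = col + np.argmax(np.abs(A[col:, col]))
        if pivot != col:
            A[[col, pivot]] = A[[pivot, col]]
            b[[col, pivot]] = b[[pivot, col]]

        # Eliminate all entries below the pivot
        for row in range(col + 1, n):
            factor = A[row, col] / A[col, col]
            A[row, col:] -= factor * A[col, col:]
            b[row] -= factor * b[col]

    # Back substitution on the resulting upper triangular system
    x = np.zeros(n)
    for i in range(n - 1, -1, -1):
        x[i] = (b[i] - A[i, i + 1:] @ x[i + 1:]) / A[i, i]
    return x

# Tiny example: 3*x1 + x2 = 9, x1 + 2*x2 = 8  ->  x = (2, 3)
print(solve_square_gauss(np.array([[3.0, 1.0], [1.0, 2.0]]),
                         np.array([9.0, 8.0])))
```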
For an overdetermined system - as usually encountered in fitting problems - or for systems that cannot be solved exactly because of some kind of measurement noise, one usually defines the error deviation - also called the residual vector $\vec{r} \in \mathfrak{R}^{m}$:
[ A \vec{x} = \vec{b} \\ \to A \vec{x} - \vec{b} = \vec{0} \\ \vec{r} = A \vec{x} - \vec{b} ]As one can see, for an optimal solution the residual would be $\vec{r} = \vec{0}$. The main idea behind most fitting and regression models is to minimize this residual. For least squares one tries to achieve $\mid \vec{r} \mid_2 ^2 \to min $. One can imagine that this reduces the distance between the found solution and the real solution as far as possible and thus yields an optimal approximation. This is also what one tries to do with linear least squares.
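To make this concrete, imagine fitting a line $y = k * x + d$ to a handful of measured points: every measurement contributes one row $(x_i, 1)$ to $A$ and one entry $y_i$ to $\vec{b}$, and the squared residual norm tells how well a candidate $(k, d)$ explains the data. A tiny numpy sketch (the measurement values below are just made up for illustration):

```python
import numpy as np

# Made up measurements of something roughly following y = k*x + d
xs = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
ys = np.array([1.1, 2.9, 5.2, 6.8, 9.1])

# One row (x_i, 1) per measurement -> overdetermined 5x2 system A x = b
A = np.column_stack([xs, np.ones_like(xs)])
b = ys

def squared_residual_norm(A, x, b):
    """Return |A x - b|_2^2 for a candidate solution x."""
    r = A @ x - b
    return r @ r

# An arbitrary candidate (k, d); least squares searches for the candidate
# that makes this value as small as possible
print(squared_residual_norm(A, np.array([2.0, 1.0]), b))
```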
First let’s recall what QR decomposition is - just the key points; I’ve previously written a blog entry on how to perform QR decomposition using Givens rotations numerically. QR decomposition allows one to separate a matrix $A$ into an orthogonal part $Q$ (thus $Q^T Q = \mathfrak{1}$) and an upper triangular part $R$.
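Just as a reminder of what such a decomposition looks like in code, here is a compact (and naive) Givens rotation QR sketch - not the exact implementation from that earlier blog entry, just the same idea: every sub-diagonal element is zeroed out by a rotation acting on two rows, and the accumulated rotations form $Q$:

```python
import numpy as np

def qr_givens(A):
    """Naive QR decomposition via Givens rotations (for illustration).

    Returns Q (m x m, orthogonal) and R (m x n, upper triangular)
    such that A = Q @ R.
    """
    R = A.astype(float).copy()
    m, n = R.shape
    Q = np.eye(m)

    for col in range(n):
        for row in range(m - 1, col, -1):
            a, b = R[col, col], R[row, col]
            if b == 0.0:
                continue                      # already zero, nothing to do
            r = np.hypot(a, b)
            c, s = a / r, b / r

            # Rotate rows `col` and `row` of R so that R[row, col] becomes 0
            R[[col, row], :] = np.array([[c, s], [-s, c]]) @ R[[col, row], :]
            # Accumulate the transposed rotation into Q (so that A = Q R)
            Q[:, [col, row]] = Q[:, [col, row]] @ np.array([[c, -s], [s, c]])
    return Q, R

# Quick sanity check on a small overdetermined matrix
A = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
Q, R = qr_givens(A)
print(np.allclose(Q @ R, A), np.allclose(Q.T @ Q, np.eye(3)))  # True True
```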
Now one can apply the decomposition to the coefficient matrix $A$:
[ \mid A \vec{x} - \vec{b} \mid_2^2 \to min \\ \mid QR \vec{x} - \vec{b} \mid_2^2 \to min ]The key to applying QR decomposition to this problem is that $Q$ is an orthogonal matrix. Thus it rotates a vector in $m$ dimensional space but does not change its length - that’s also the reason why $Q^{-1} = Q^T$ by the way. So if one applies $Q^T$ to the vector whose norm (length) one is calculating, the norm does not change:
[ \mid QR \vec{x} - \vec{b} \mid_2^2 \to min \\ \mid Q^{T} ( QR \vec{x} - \vec{b} ) \mid_2^2 \to min \\ \mid Q^{T} Q R \vec{x} - Q^T \vec{b} \mid_2^2 \to min \\ \mid R \vec{x} - Q^T \vec{b} \mid_2^2 \to min ]As one can see the term $Q^T \vec{b}$ is simple to calculate since one knows all factors. For an exactly solvable system one could even express this as:
[ R \vec{x} = Q^T \vec{b} ]In this case the residual would be $\vec{r} = \vec{0}$. The interesting thing is that for an overdetermined system the matrix $R$ contains a bunch of zero rows below its upper square block - one can see immediately that it’s impossible to satisfy the equation above exactly for these rows:
[ \begin{pmatrix} r_{1,1} & \ldots & \ldots & r_{1,n} \\ 0 & r_{2,2} & \ldots & r_{2,n} \\ 0 & 0 & \ldots & \ldots \\ 0 & \ldots & 0 & r_{n,n} \\ 0 & \ldots & \ldots & 0 \\ \ldots & \ldots & \ldots & \ldots \\ 0 & 0 & \ldots & 0 \end{pmatrix} * \begin{pmatrix} x_1 \\ x_2 \\ \ldots \\ x_n \end{pmatrix} = \begin{pmatrix} q_{1,1} & \ldots & \ldots & q_{1,m} \\ \ldots & \ldots & \ldots & \ldots \\ q_{m,1} & \ldots & \ldots & q_{m,m} \end{pmatrix} * \begin{pmatrix} b_1 \\ b_2 \\ \ldots \\ b_m \end{pmatrix} ]The interesting part is that - for an optimal approximate solution - one can simply discard the lower zero rows of the $R$ matrix and only solve for the remaining rows using backsubstitution (setting $\vec{c} = (Q^T * \vec{b})_n$ for simplicity). The proof can be done by inserting explicitly and seeing that the lower rows do not depend on the solution $\vec{x}$ - obviously since they’re multiplied by 0 - and thus only add a constant offset to the residual that does not contribute to the derivatives during the extrema search:
[ \begin{pmatrix} r_{1,1} & \ldots & \ldots & r_{1,n} \\ 0 & r_{2,2} & \ldots & r_{2,n} \\ 0 & 0 & \ldots & \ldots \\ 0 & \ldots & 0 & r_{n,n} \end{pmatrix} * \begin{pmatrix} x_1 \\ x_2 \\ \ldots \\ x_n \end{pmatrix} = \begin{pmatrix} q_{1,1} & \ldots & \ldots & q_{1,m} \\ \ldots & \ldots & \ldots & \ldots \\ q_{n,1} & \ldots & \ldots & q_{n,m} \end{pmatrix} * \begin{pmatrix} b_1 \\ b_2 \\ \ldots \\ b_m \end{pmatrix} \\ \begin{pmatrix} r_{1,1} & \ldots & \ldots & r_{1,n} \\ 0 & r_{2,2} & \ldots & r_{2,n} \\ 0 & 0 & \ldots & \ldots \\ 0 & \ldots & 0 & r_{n,n} \end{pmatrix} * \begin{pmatrix} x_1 \\ x_2 \\ \ldots \\ x_n \end{pmatrix} = \begin{pmatrix} c_1 \\ c_2 \\ \ldots \\ c_n \end{pmatrix} ]The backsubstitution is then done line by line:
[ r_{n,n} * x_n = c_n \to x_n = \frac{c_n}{r_{n,n}} \\ r_{n-1,n-1} * x_{n-1} + r_{n-1,n} * x_{n} = c_{n-1} \to x_{n-1} = \frac{c_{n-1} - r_{n-1,n} * x_{n}}{r_{n-1,n-1}} \\ r_{n-2, n-2} * x_{n-2} + r_{n-2, n-1} * x_{n-1} + r_{n-2, n} * x_{n} = c_{n-2} \to x_{n-2} = \frac{c_{n-2} - r_{n-2, n-1} * x_{n-1} - r_{n-2, n} * x_{n}}{r_{n-2, n-2}} \\ \ldots \\ \to x_{\alpha} = \frac{c_{\alpha} - \sum_{i = \alpha+1}^{n} x_{i} * r_{\alpha, i}}{r_{\alpha, \alpha}} ]Taking a short look at the complexity of solving the linear least squares problem using QR decomposition by Givens rotations: one rotation is needed per sub-diagonal element and each rotation touches two rows of length $n$, which amounts to roughly $\mathcal{O}(m n^2)$ operations for the decomposition, while applying the rotations to $\vec{b}$ costs $\mathcal{O}(m n)$ and the final backsubstitution only $\mathcal{O}(n^2)$.
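Putting the pieces together, a minimal least squares solver along these lines could look like the sketch below. To keep it short it uses numpy’s built in QR (np.linalg.qr with mode='complete') instead of a hand rolled Givens variant - the Givens sketch from above would work just as well - and then discards the zero rows of $R$ before backsubstitution. The result should match numpy.linalg.lstsq up to floating point rounding.

```python
import numpy as np

def back_substitute(R, c):
    """Solve R x = c for a square upper triangular R."""
    n = R.shape[0]
    x = np.zeros(n)
    for i in range(n - 1, -1, -1):
        x[i] = (c[i] - R[i, i + 1:] @ x[i + 1:]) / R[i, i]
    return x

def lstsq_qr(A, b):
    """Least squares solution of the overdetermined system A x = b via QR.

    Q is m x m and R is m x n; the lower m - n zero rows of R are dropped
    and only the first n entries of Q^T b are kept before back substitution.
    """
    m, n = A.shape
    Q, R = np.linalg.qr(A, mode='complete')
    c = Q.T @ b
    return back_substitute(R[:n, :], c[:n])

# Line fit example from above: estimate (k, d) of y = k*x + d
xs = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
ys = np.array([1.1, 2.9, 5.2, 6.8, 9.1])
A = np.column_stack([xs, np.ones_like(xs)])

print(lstsq_qr(A, ys))                              # QR based solution
print(np.linalg.lstsq(A, ys, rcond=None)[0])        # reference solution
```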
The following GitHub Gist contains a simple and naive implementation of the described algorithm using QR decomposition via Givens rotations. The notebook is also available as a PDF.
This article is tagged: Math, Programming, Physics, How stuff works, Machine learning, Computational linear algebra
Dipl.-Ing. Thomas Spielauer, Wien (webcomplains389t48957@tspi.at)
This webpage is also available via TOR at http://rh6v563nt2dnxd5h2vhhqkudmyvjaevgiv77c62xflas52d5omtkxuid.onion/