In many practical problems in science, two quantities are correlated, with one behaving as a function of the other. Sometimes we need to find this function, but deriving it exactly by analysis is not always easy. Regression offers a solution: it approximates a function from a cloud of measured values, so that the function behaves similarly to the data. This helps to estimate values we cannot measure and to predict the behaviour over a larger range of data.
The approach described here is known as the least squares method. It is a type of linear regression, which means the function to find is a straight line. The least squares method computes the coefficients of that line directly from sums over the data points.
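In common textbook notation, the line and its coefficients can be written as below. The symbols are assumptions made for this sketch: $a_0$ is the intercept, $a_1$ the slope, $n$ the number of data points $(x_i, y_i)$, and $\bar{x}$, $\bar{y}$ the means of the two variables.

$$
y = a_0 + a_1 x
$$

$$
a_1 = \frac{n \sum x_i y_i - \sum x_i \sum y_i}{n \sum x_i^2 - \left(\sum x_i\right)^2},
\qquad
a_0 = \bar{y} - a_1 \bar{x}
$$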
The main idea is the following: find the linear equation for which the sum of the squared distances between each data point and the line is as small as possible, and therefore smaller than the sum of the squared distances between each data point and the mean. In formula:
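A sketch of this criterion, assuming the line is written as $y = a_0 + a_1 x$ and using the names $S_r$ and $S_t$ (chosen here for illustration) for the two sums of squares:

$$
S_r = \sum_{i=1}^{n} \left(y_i - a_0 - a_1 x_i\right)^2,
\qquad
S_t = \sum_{i=1}^{n} \left(y_i - \bar{y}\right)^2,
\qquad
S_r \le S_t
$$

The least squares line is the choice of $a_0$ and $a_1$ that minimises $S_r$. Since the constant function $y = \bar{y}$ is itself a line (with $a_1 = 0$), the minimum can never exceed $S_t$.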
And in graph:
This image was taken from the book Numerical Methods for Engineers (6th ed.), by S. Chapra.
In order to measure how effective the regression was, we calculate three values:
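- Standard error: the typical size of the errors around the regression line, which can be compared with the spread of the data around the mean used as a constant function.
- Correlation coefficient: how closely the values predicted by the function follow the original values. Values close to 1 or -1 indicate a strong linear relationship.
- Determination coefficient: how much better the new function is than the mean used as a constant function, expressed as the fraction of the variation around the mean that the line explains.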
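The three measures above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the code used for the results in this article: the function names `linear_fit` and `fit_quality` are hypothetical, the standard error is taken as the standard error of the estimate with n - 2 degrees of freedom, and the sample points are made up so that they lie exactly on y = 1 + 2x.

```python
import math


def linear_fit(xs, ys):
    """Return (a0, a1) for the least squares line y = a0 + a1 * x."""
    n = len(xs)
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_xx = sum(x * x for x in xs)
    a1 = (n * sum_xy - sum_x * sum_y) / (n * sum_xx - sum_x ** 2)
    a0 = sum_y / n - a1 * sum_x / n
    return a0, a1


def fit_quality(xs, ys, a0, a1):
    """Return (standard error, correlation coeff., determination coeff.)."""
    n = len(xs)
    y_mean = sum(ys) / n
    # Spread around the mean (constant function) and around the line.
    s_t = sum((y - y_mean) ** 2 for y in ys)
    s_r = sum((y - a0 - a1 * x) ** 2 for x, y in zip(xs, ys))
    std_error = math.sqrt(s_r / (n - 2))
    # Clamp at zero to guard against tiny negative rounding errors.
    r2 = max((s_t - s_r) / s_t, 0.0)
    # The correlation coefficient carries the sign of the slope.
    r = math.copysign(math.sqrt(r2), a1)
    return std_error, r, r2


# Made-up points lying exactly on y = 1 + 2x (illustration only).
xs = [0, 1, 2, 3]
ys = [1, 3, 5, 7]
a0, a1 = linear_fit(xs, ys)
std_error, r, r2 = fit_quality(xs, ys, a0, a1)
print(a0, a1, std_error, r, r2)
```

For real measurements the points do not lie exactly on a line, so the standard error is greater than zero and the determination coefficient is below 1.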
The following data set relates the chirp rate of crickets to the temperature:
|Cricket chirps per second (chirp/s)|Temperature (°F)|
The resulting function is:
The plot shows both the cloud of values and the fitted function.
Quantifying the error, we get the following values:
- Standard error: 3.80599
- Correlation coefficient: 0.846927
- Determination coefficient: 0.717285
Linear regression is a great tool for finding an approximate function. For it to work properly and give a usable approximation, the data must show linear behaviour. If we apply this regression to nonlinear data, the fitted line will describe the data poorly and any prediction made from it will be unreliable.
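A small experiment makes this limitation concrete. The sketch below is only an illustration: the helper `fit_line_r2` is a hypothetical name, it uses the mean-centred form of the least squares formulas, and both data sets are made up (one exactly linear, one an exact parabola).

```python
def fit_line_r2(xs, ys):
    """Fit y = a0 + a1 * x by least squares; return (a0, a1, r2)."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    # Mean-centred sums, equivalent to the usual sum formulas.
    s_xy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    s_xx = sum((x - x_mean) ** 2 for x in xs)
    a1 = s_xy / s_xx
    a0 = y_mean - a1 * x_mean
    s_t = sum((y - y_mean) ** 2 for y in ys)
    s_r = sum((y - a0 - a1 * x) ** 2 for x, y in zip(xs, ys))
    return a0, a1, (s_t - s_r) / s_t


xs = [-3, -2, -1, 0, 1, 2, 3]
line_ys = [2 * x + 1 for x in xs]   # made-up linear data
parabola_ys = [x * x for x in xs]   # made-up nonlinear data

_, _, r2_line = fit_line_r2(xs, line_ys)
_, slope_parabola, r2_parabola = fit_line_r2(xs, parabola_ys)
print("linear data:   r2 =", r2_line)
print("parabola data: r2 =", r2_parabola, "slope =", slope_parabola)
```

On the symmetric parabola the best line is flat, so it explains none of the variation even though the two variables are perfectly related. A low determination coefficient is therefore a hint to check whether the data are linear at all before trusting the fit.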