
== Clarification needed ==

This task needs more clarification, like a link to a suitable Wikipedia page. —[[User:Dkf|Donal Fellows]] 17:04, 29 June 2009 (UTC)

This task requires merging with [[Polynomial Fitting]], which already presents a least squares approximation example in the basis {1, ''x'', ''x''<sup>2</sup>}. Linear regression is just the same thing in the basis {1, ''x''}. --[[User:Dmitry-kazakov|Dmitry-kazakov]] 18:31, 29 June 2009 (UTC)
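To make the relationship above concrete, here is a small sketch (the data are made up for illustration): least squares in the basis {1, ''x''} is exactly ordinary linear regression, and NumPy's <code>np.linalg.lstsq</code> solves it directly.

```python
import numpy as np

# Hypothetical sample data: y is roughly 2 + 3*x, perturbed slightly.
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([2.1, 4.9, 8.0, 11.1, 13.9])

# Least squares in the basis {1, x}: design matrix with a column of ones.
A = np.column_stack([np.ones_like(x), x])
coeffs, residuals, rank, sv = np.linalg.lstsq(A, y, rcond=None)

# coeffs[0] is the intercept, coeffs[1] the slope.
print(coeffs)
```

Adding an ''x''<sup>2</sup> column to the design matrix turns the same call into quadratic polynomial fitting, which is the point of the proposed merge.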

OK, there are now implementations in two languages. It's not clear to me how this is different from the polynomial fitting task either, but I'm a completionist (for [[Tcl]]) and not a statistician... —[[User:Dkf|Donal Fellows]] 10:34, 9 July 2009 (UTC)

An explanation from the Lapack documentation may be helpful. [http://www.netlib.org/lapack/lug/node27.html] The idea is that you want to model a set of empirical data

$\{(x_1, F(x_1)), \ldots, (x_m, F(x_m))\}$

by fitting it to a function of the form

$\hat{F}(x) = \sum_{i=1}^n \beta_i f_i(x)$

where you've already chosen the $f_i$ functions and only the $\beta$'s need to be determined. The number of data points $m$ generally will exceed the number of functions in the model, $n$, and the functions $f_i$ can be anything, not just $x^i$ as in the case of polynomial curve fitting.

I don't believe the Ruby and Tcl solutions on the page solve the general case of this problem because they assume a polynomial model. I propose that the task be clarified to stipulate that the inputs are two tables or matrices of numbers, one containing all values of $f_j(x_i)$ and the other containing all values of $F(x_i)$, with $i$ ranging from 1 to $m$, $j$ ranging from 1 to $n$, and $m>n$. --[[User:Sluggo|Sluggo]] 12:52, 9 August 2009 (UTC)

:The thing that worried me was that there wasn't any code in there to determine whether the degree of the polynomial selected was justified. I had to hard-code the depth of polynomial to try in the example code. I ''think'' the fitting engine itself doesn't care; you can present any sampled function you want for fitting. (I suppose many of the other more-advanced statistics tasks have this same problem; they require an initial “and magic happens here” to have happened before you can make use of them.) —[[User:Dkf|Donal Fellows]] 13:47, 9 August 2009 (UTC)

:You can always fit any empirical data to a polynomial. This task is about fitting it to functions of a more general form (i.e., a different basis). For example, by selecting the functions $f_i$ as sinusoids with appropriate frequencies, it should be possible to obtain Fourier coefficients (albeit less efficiently than with a dedicated FFT solver). I don't see how the code in the given solutions could be used to do that. --[[User:Sluggo|Sluggo]] 16:57, 9 August 2009 (UTC)

:: Well, the only code in the Tcl example that assumes a polynomial is in the example part and not the core solution part; that's using polynomials of up to degree 2 because that's what the page that provided the data used for the example suggested, not out of some kind of endorsement of polynomials. —[[User:Dkf|Donal Fellows]] 20:51, 9 August 2009 (UTC)

::Fair enough. Thank you for the explanation. --[[User:Sluggo|Sluggo]] 22:44, 9 August 2009 (UTC)
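The sinusoidal-basis point above can be demonstrated in a few lines (a hypothetical sketch, not code from any of the task's solutions): with $\cos(kx)$ and $\sin(kx)$ as the basis functions, least squares recovers the Fourier-style coefficients of the signal.

```python
import numpy as np

# Signal with a known spectrum: 3*cos(x) + sin(2x).
m = 50
x = np.linspace(0.0, 2.0 * np.pi, m, endpoint=False)
F = 3.0 * np.cos(x) + 1.0 * np.sin(2.0 * x)

# Basis: cos(kx) and sin(kx) for k = 1, 2.
A = np.column_stack([np.cos(x), np.sin(x), np.cos(2 * x), np.sin(2 * x)])
beta, *_ = np.linalg.lstsq(A, F, rcond=None)

print(beta)   # recovers [3, 0, 0, 1]
```

An FFT would get the same numbers far more cheaply on an evenly spaced grid, but the least squares route also works for irregular sample points.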

== Misleading Note in Intro ==

The note in the introduction is incorrect. So I deleted it. This task is a multiple ''linear'' regression problem; the use of OLS indicates that we are dealing with a ''linear model''. This is very different from a polynomial fitting problem which, by definition, is generally non-linear. At best, the multiple regression task is ''multi-linear'' and it is most certainly a subset of polynomial fitting problems. The note would be correct if we were talking about a multi-variate polynomial fitting task (which would actually make an excellent task). --[[User:Treefall|Treefall]] 22:36, 20 August 2010 (UTC)

== Python example is not correct??? ==

The Python example solves a different problem, namely fitting a quadratic polynomial in one variable to a set of points in the plane. What is asked for is a way of fitting a linear polynomial in several variables to a set of points in some dimension. I think np.linalg.lstsq(), a function from NumPy, is what is needed.

: The method with the matrix operations was basically correct, but that was hard to see with it using random data. I substituted the Wikipedia example data so it is clearer that the method works. I also added an np.linalg.lstsq() version, which I understand is preferred. —[[User:Sonia|Sonia]] ([[User talk:Sonia|talk]]) 21:37, 3 April 2015 (UTC)
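For reference, a minimal sketch of the multi-variable case being described (the data here are invented, not the Wikipedia example): fit $y = \beta_0 + \beta_1 x_1 + \beta_2 x_2$ by prepending an intercept column and calling <code>np.linalg.lstsq</code>.

```python
import numpy as np

# Made-up observations of two predictor variables.
X = np.array([[1.0, 2.0],
              [2.0, 1.0],
              [3.0, 4.0],
              [4.0, 3.0],
              [5.0, 5.0]])
y = 1.0 + 2.0 * X[:, 0] - 0.5 * X[:, 1]    # exact linear response

A = np.column_stack([np.ones(len(X)), X])  # prepend intercept column
beta, *_ = np.linalg.lstsq(A, y, rcond=None)

print(beta)   # recovers [1.0, 2.0, -0.5]
```

The same call handles any number of predictor columns, which is what distinguishes this task from one-variable polynomial fitting.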

== Task description has too many equations and not enough guidance ==

Could someone at least try to reduce the need to already know what multiple regression is before you can make sense of the page?