Saturday, 10 January 2009

Linear regression example

Problem setup
-------------

y = mx + c

If m = 4 and c = 34, the values of y for x = [1, 2, 3, 4] are:

In [68@19:52:03]:x = array([1, 2, 3, 4])

In [71@19:52:03]:y = 4 * x + 34

In [72@19:52:03]:y
Out[72]: array([38, 42, 46, 50])



Solve for m, c
--------------

In [73@19:52:03]:a = array( [[1, x[0]], [1, x[1]], [1, x[2]], [1, x[3]]])

In [80@19:52:03]:numpy.linalg.lstsq(a, y)
Out[80]: (array([ 34., 4.]), array([ 0.]), 2, array([ 5.77937881, 0.77380911]))

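The first item returned holds c and m for the least-squares solution, in that order, matching the column order of 'a' (the column of ones first, then x). The second item is the sum of the squared residuals. The third is the rank of 'a'. The fourth holds the singular values of 'a'.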
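
To see what the second item measures, here is a minimal sketch using made-up data that does not lie exactly on a line. The y values are illustrative assumptions, not taken from the session above. It recomputes the sum of squared residuals by hand so it can be compared with what lstsq returns. Passing rcond=None is also an assumption about the NumPy version in use (it is accepted by reasonably recent releases and avoids a warning on some of them).

```python
import numpy

# Illustrative data (assumed): roughly y = 4x + 34 with small deviations.
x = numpy.array([1.0, 2.0, 3.0, 4.0])
y = numpy.array([38.5, 41.5, 46.5, 49.5])

# Design matrix: a column of ones (for c) and a column of x (for m).
a = numpy.column_stack([numpy.ones_like(x), x])

coeffs, residuals, rank, singulars = numpy.linalg.lstsq(a, y, rcond=None)
c, m = coeffs

# Sum of squared differences between the data and the fitted line.
manual = numpy.sum((y - (m * x + c)) ** 2)

print(residuals[0], manual)
```

The two printed numbers should agree up to floating-point rounding.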

If the relation weren't y = mx + c but, say, y = mx + nx^{2} + c, a third column could be added to 'a' to hold the squares of x. n would then be returned as well, as a third entry in the first item.
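
A minimal sketch of the quadratic case, assuming n = 2 and five x values purely for illustration (neither comes from the session above):

```python
import numpy

x = numpy.array([1.0, 2.0, 3.0, 4.0, 5.0])

# m and c as in the example above; n = 2 is an assumed value.
m, n, c = 4.0, 2.0, 34.0
y = m * x + n * x**2 + c

# Three columns: ones (for c), x (for m), x squared (for n).
a = numpy.column_stack([numpy.ones_like(x), x, x**2])

coeffs = numpy.linalg.lstsq(a, y, rcond=None)[0]
c_fit, m_fit, n_fit = coeffs
print(c_fit, m_fit, n_fit)
```

The coefficients come back in the same order as the columns of 'a', so here that is c, m, n.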


Improvement
-----------

If x and y are lists of coordinates:

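import numpy

# Column 0 stays all ones (for c); column 1 holds x (for m).
a = numpy.ones((len(x), 2))
a[:, 1] = x
y = numpy.array(y)
out = numpy.linalg.lstsq(a, y)

# Coefficients follow the column order of a: c first, then m.
c, m = out[0]
sum_residues = out[1]  # sum of the squared residuals
out_rank = out[2]      # rank of a
singulars = out[3]     # singular values of a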
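
The same steps could be wrapped in a small helper. This is only a sketch: the name fit_line is hypothetical, and rcond=None assumes a reasonably recent NumPy.

```python
import numpy


def fit_line(x, y):
    """Return (c, m) for the least-squares line y = m*x + c.

    fit_line is a hypothetical helper name, not part of NumPy.
    """
    a = numpy.ones((len(x), 2))
    a[:, 1] = x  # column 0 stays ones (for c), column 1 is x (for m)
    coeffs = numpy.linalg.lstsq(a, numpy.asarray(y, dtype=float), rcond=None)[0]
    return coeffs[0], coeffs[1]


# Usage with the values from the problem setup above.
c, m = fit_line([1, 2, 3, 4], [38, 42, 46, 50])
print(c, m)
```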
