Fitting Polynomial Models to Data


It is common engineering practice to "fit a line" to a set of data in order to determine some useful parameter in a mathematical model or perhaps to generate a calibration curve. A straight line is a simple polynomial, and the goal of the fit is to determine the coefficients (the slope and intercept) of the polynomial that lead to the "best fit" of a line to the data.

The fitting process can be generalized to determine the coefficients of the Nth-order polynomial that best fits N+1 (or more, usually) data points. The determination of the coefficients is usually termed "polynomial regression" and, in MATLAB, is accomplished by the function polyfit.

Topics

There is more to life than fitting just polynomials to data. Check here for other options in Fitting Land.



Computing the coefficients

Assuming you have two N-by-1 vectors of data values, x and y, the best coefficients for a straight-line fit (in the least-squares sense) are found through the command

   coeff = polyfit(x,y,1) 
while those for the least-squares polynomial fit of degree n (< N-1) are found by
   coeff = polyfit(x,y,n) 

Make sure that x and y are the same shape (i.e., both are row vectors or both are column vectors) or you'll get an error.

Note that there is a very definite order to the coefficients in the vector coeff: the coefficient of the highest-order monomial comes first and the rest follow in descending order (down to the constant term). Thus, for a straight-line fit,

   slope = coeff(1)          
   intercept = coeff(2)

Also, note that the coefficients from polyfit are returned as a row vector.
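
As a quick sketch of the whole sequence (the numbers here are invented purely for illustration), a straight-line fit might look like

   x = (1:5)';                          % made-up x-data (column vector)
   y = [2.1 3.9 6.2 8.1 9.8]';          % made-up y-data, same shape as x
   coeff = polyfit(x,y,1);              % coeff is a 1-by-2 row vector
   slope = coeff(1)
   intercept = coeff(2)

Substitute your own measurements for x and y; everything else stays the same.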


Evaluating the fit

polyfit makes the computation of the coefficients easy. The tough part of polynomial regression is knowing whether the "fit" is a good one. Determining the quality of the fit requires experience, a sense of balance, and some statistical summaries.

1. Does the fit look good?

A picture is worth a thousand words and so a plot of the curve that represents the fitted polynomial overlaid on the data is a powerful way of assessing the quality of the fit. Using the coefficients determined by polyfit, you can create such a plot via the commands

   coeff = polyfit(x,y,n);                  % fit a polynomial of degree n
   xfit = linspace(min(x),max(x),npts);     % npts points spanning the data
   yfit = polyval(coeff,xfit);              % evaluate the fit at those points
   plot(x,y,'o',xfit,yfit)                  % data as circles, fit as a curve

where n is the order of the polynomial you are fitting and npts is the number of points used to draw the fitted curve. npts = 2 is fine for a line, but more points are needed to make something like a cubic equation (with all its curves) look nice. Try something like 200 for npts to start things off.

It is important that the coefficients returned by polyfit are used without truncation (i.e., they should be used to full machine precision). This is accomplished by using the output of polyfit (the variable coeff) directly as the input to polyval.
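
If you are curious how much truncation matters, one quick (purely illustrative) check is to round the coefficients yourself and compare the two curves; the three-decimal rounding below simply stands in for copying numbers off the screen by hand.

   yfit_full = polyval(coeff,xfit);         % coefficients at full machine precision
   coeff_cut = round(coeff*1e3)/1e3;        % coefficients rounded to 3 decimals (illustrative)
   yfit_cut  = polyval(coeff_cut,xfit);
   max(abs(yfit_full - yfit_cut))           % worst-case error introduced by the rounding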

2. Are the residuals small and unpatterned?

The residuals are defined as the mismatch between the predicted values ypred (computed using the x-values in the data you are trying to fit) and the actual (data) values y.

If the coefficients of the fit are given in the vector coeff, the residuals are calculated via the commands

   ypred = polyval(coeff,x);      % predicted values
   resid = y - ypred;             % mismatch

Two different plots of the residuals can sometimes be quite helpful:

A. A plot of the residuals versus the predicted values

   plot(ypred,resid,'o')

should not show any patterns or trends. The plot should simply show random noise, according to the theory that underlies all the computations in polyfit. The degree of pattern and trend is a very good measure of the quality of the fit (no trend = good fit). If there is a trend, your model is missing something!

B. A normal-probability plot of the residuals should be a straight line if the premises of linear regression are correct and if the fit is good. If the Statistics toolbox is available, the function normplot will do this for you. An alternative is to use nsplot from BISKIT (available at Bucknell).
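
For example, with the residuals computed above and the Statistics toolbox installed, the plot is a one-liner:

   normplot(resid)        % normal-probability plot of the residuals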

Numerical methods for assessing the fit are given below.


Computing statistical summaries of the fit

There are any number of statistical measures of the "quality" and "appropriateness" of a model fit to a set of data. This section shows how to compute some of the more common measures (the coefficient of determination, the observed F-value for the fit, and the observed t-values/confidence intervals for the coefficients). Note that all of these measures are less informative than the by-eye views discussed above.

For what follows, it is assumed that you have a set of x-y data pairs and that you have used polyfit to compute the coefficients for a polynomial of given order (and have stored them in a vector called coeff).
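
As a starting point, here is one way to compute the coefficient of determination (R^2) and the observed F-value for the fit. The formulas are the standard ones for a least-squares fit of a degree-n polynomial to N data points; the variable names (SSE, SST, and so on) are just for illustration, and the coefficient confidence intervals mentioned above require additional machinery that is not shown here.

   ypred = polyval(coeff,x);                  % predicted values at the data points
   resid = y - ypred;                         % residuals
   N   = length(y);                           % number of data points
   n   = length(coeff) - 1;                   % degree of the fitted polynomial
   SSE = sum(resid.^2);                       % sum of squared residuals
   SST = sum((y - mean(y)).^2);               % total sum of squares about the mean
   R2  = 1 - SSE/SST                          % coefficient of determination
   F   = ((SST - SSE)/n)/(SSE/(N - n - 1))    % observed F-value for the fit

An R^2 near 1 and a large F-value are consistent with a good fit, but neither replaces the residual plots described above.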



The bottom line: polyfit makes polynomial fitting easy. It is vital, then, that you assess the quality of the results generated by polyfit. Hence,

Always think about and assess your results!

because ...

Blind use of the fitted coefficients is poor engineering practice!



Comments? Contact Jim Maneval at maneval@bucknell.edu