Simultaneous solution of multiple equations for multiple unknowns is a dauting task, in general. Many of the solution strategies from the solution of single equations in single unknowns (Newton's method, for example) can be applied but the possibilty for no solution or multiple solutions is very much greater.
The subject breaks naturally into two divisions:
No attempt is made here to provide a tutorial on solution of equations. Rather, some typical examples are presented and some conclusions are indicated. If you find yourself much in need of a tool in this area, consider the routines in the Optimization Toolbox (provided by the MathWorks and available at Bucknell on the Sun system).
Finally, before going too far with the material in this section you should
If the equations you wish to solve are linear, then the methods of linear algebra is the way to go. There are two cases that are usually of interest:
Example: Interpolating polynomial of degree N-1 passing through N distinct points.
Given: N distinct x-y pairs (xj,yj), (j = 1:N)
Find: Pn(x) such that Pn(xj) = yj and n = N - 1.
For each x-y pair (j = 1:N), the equation
cN xjN-1 + cN-1 xjN-2 + ... + c2 x + c1 = yj
holds since the interpolating polynomial must pass exactly through all the points in the given set. The goal is to use this set of N equations to determine the N unknown coefficients, ck (k = 1:N) of the polynomial.
If we create a set of coefficients ajk = xjN-k (i.e., the j-th x-value raised to the N-k power) then the set of equations may be written as
Ac = y
or
f(c) = Ac - y = 0
where A is the matrix of coefficients just defined, c an N-by-1 vector of the unknown coefficients and y is a vector that contains the y-values from the given data.
For example, find the coeffients of the parabola (N = 2) that passes through the points (-1,3), (1,2) and (2,5).
x = [-1 1 2]' % x-values y = [3 2 5]' % y-values A = [x.^2 x ones(size(x))] % coefficient matrix c = A\y % the coefficients check = polyfit(x,y,2) % apply polyfit c - check' % are they the same?
Except for the shape of the coefficient vector, the solution produced
by polyfit is the same (within roundoff) as that using
matrix inversion (matrix division).
Example: Fitting functions other than polynomials (polyfit generalization)
When you use polyfit to find the unknown coefficients in the fit of a polynomial of specified order to a set of data, you are carrying out a particular case of linear regression. More generally, given the functions f1(x), f2(x), ... , fN(x), the linear model
c1 f1(x) + c2 f2(x) + ... + cN fN(x) = y
can be fit to a set of x-y data ((xj, yj), j = 1:M) by determining the unknown coefficients ck (k = 1:N). The number of data pairs (M) is generally much greater than the number of fitting functions (N).
Inserting each of the M x-y pairs into the expression for the linear model given above produces the following set of M equations
c1 f1(xj) + c2 f2(xj) + ... + cN fN(xj) = yj
(j = 1:M) which constitutes a set of equations that are linear in the unknown coefficients, ck. Note that there is no restriction on the linearity on the functions chosen for the model. What is important here is that the functions are combined linearly.
Defining the elements of an M-by-N matrix A as ajk = fk(xj), the equations above may be written in compact matrix form as
Ac = y
which is the same f(c) that was found for the case of polynomial interpolation in the M = N case. The only difference here is that M > N and so there will not be a unique solution.
To generate a unique solution, the hope that f(x) be zero exactly is given up and some additional criterion for assesing the quality of the solution is imposed. A common criterion is that the coefficients ck are chosen such that the residual f(c) = Ac - y is minimized in some sense. If the square of the residual (fTf) is used to guide the minimization then it may be shown that the coefficients satisfy the N-by-N set of equations
Bc = d
where B = ATA and d = ATy. These are the so-called "normal" equations for the given system. The solution in this case (using MATLAB) is simply
c = A\y
Note that the matrix B and the vector d need not (and should not) be formed.
As a specific example, consider fitting the function
y = c1 + c2 x + c3 sin(x)
to a set of synthetic data (a model with error added):
x = linspace(0,5,50)'; % x-values for the data set y = 1 + 3*x - 2*sin(x); % the answer noise = 0.5*randn(size(x)); % some noise to throw in y = y + noise; % the disguised answer A = [ones(size(x)) x sin(x)]; % coefficient matrix c = A\y % the solution: is it [1 3 -2]'?
(you should be able to paste the above lines right into the MATLAB command window.) A graphical look at the quality of the solution can also be generated:
xi = linspace(0,5,200); % plotting coordinates yi = c(1) + c(2)*xi + c(3)*sin(xi); % evaluate the function plot(x,y,'o',xi,yi) % display a comparison
Strictly speaking, this particular type of problem is not root-finding since the function f(x) is not 0 at the solution. Rather, a feature of the solution (the minimum of its residual) is found. In a sense, however, a root-find did occur. At the minimum of a function, the derivative is zero. Hence, for this problem a root of the derivative of the residual was found.
Solution of sets of non-linear equations is a difficult task and MATLAB does not have a built-in solver for more than one variable (though the Optimization Toolbox does have one). Presently, you can retrieve fsolve35.m from the Mathworks FTP site (ftp://ftp.mathworks.com/contrib) that will give you some functionality in this area. The focus here will again be on examples that illustrate the way to set up and deal with these sorts of problems.
As with the linear case (above) two major cases can be distinguished:
Example: Newton's method for N equations in N unknowns
One way of deriving the up-dating procedure for the 1-dimensional version of Newton's method is to write the Taylor-series representation of the function and then truncate the expression. For the N-dimensional case, the same approach can be used but keeping track of the differentiation is most effectively done with matrix notation.
At the point xo, the function f(x) can be expanded in a Taylor series as
f(x) = f(xo) + J (x - xo) + higher-order terms
where J is a matrix of first derivatives of the function, called the Jacobian matrix. The elements of this matrix are
Jij = partial derivative of fi with respect to xj
Truncating this expansion at the first derivative and setting this expression to zero (which is what we would like f(x) to be when the root is found) gives
0 = f(xo) + J (x - xo)
which, on re-arrangement, leads to the following estimate of where the root can be found.
x = xo - J-1 f(xo)
If the function at this new estimate of the root is not close enough to zero, the estimate can replace xo and thus generate a new estimate.
A simple implementation of this version of Newton's method follows:
A typical definition line might look like
function [f,J] = myfun(x) %MYFUN Returns the function (f) and Jacobian (J) given x
tolerance = 1e-6; xo = a column vector of values [f,J] = myfun(xo); x = xo - J\f; while norm(x-xo) > tolerance xo = x; [f,J] = myfun(xo); x = xo - J\f; end
While this is not a "production" version of a robust root-finder, it is simple enough to incorporate in other functions or improve upon to create a robust root-finder. Aside from its simplistic nature, the method suffers from the need for you to provide a matrix of first-derivatives, a time-consuming if not impossible task. For "small" problems, however, the approach may be feasible.
Example: Fitting a multi-exponential decay
A fit of the expression
y = a1 exp(b1 x) + a2 exp(b2 x)
to a set of data points ((xj,yj), j = 1:M) occurs in many guises in engineering and physics. If the number of points involved is greater than 4 (and it generally should be) an exact determination of the parameters of the model (ak, bk, k = 1:2) is not possible. In such an instance, determination of the parameters is usually found by minimizing the difference between the prediction (generated by evaluating the function using an assumed set of parameters) and the actual y-values from the data.
The degree of mismatch between the model and the data is often characterized by the sum of the squares of the individual difference for each data point. Specifically, the "objective function" to be minimized is
f = d12 + d22 + ... + dM2
where dj2 = {yj - (a1 exp(b1 xj) + a2 exp(b2 xj) }2. While not strictly a root-finding problem, this data-fitting/optimization problem can be approached as a root-finding problem from the point of view of its first derivative (which is zero at the minimum of the function defined above.
The built-in function fmins (renamed fminsearch in version 6) can be used to minimize a function of 2 or more variables and the above problem provides a good example of how to use this function.
function sse = expmodel(p,x,y) %EXPMODEL Fit of a 2-term exponential model to a set of data % The parameters are in p (4-by-1 vector) and assumed to % be in the following order: a1 = p(1); b1 = p(2); a2 = p(3); b2 = p(4); ypred = a1*exp(b1*x) - a2*exp(b2*x); % predictions err = y - ypred; % mismatches sse = sum(err.^2); % residual measure
x = a N-by-1 vector of x-values
y = a N-by-1 vector of y-values
po = a 4-by-1 vector of inital estimates of the parameters
opt = foptions;
p = fmins('expmodel',po,opt,[],x,y);
The variable opt is used by fmins to customize performance and accuracy of the computation. See the on-line help information on foptions for your options here.
You should be able to see the obvious generalization of this approach
to fitting functions other than the one focussed on here. When
constructing Mfiles for use with fmins, you can either
include the data set as a parameter (as was done above) or you can
eliminate the two extra input arguments and put the data in the Mfile
directly. See the discussion of the syntax
options for fzero for additional information.