## Roots of the equation f(x) = 0 (f and x are vectors)

Simultaneous solution of multiple equations in multiple unknowns is, in general, a daunting task. Many of the strategies for solving single equations in single unknowns (Newton's method, for example) can be applied, but the possibility of no solution, or of multiple solutions, is much greater.


### Topics

The subject breaks naturally into two divisions:

1. f(x) is a linear vector function (f = Ax - b)
2. f(x) is a nonlinear vector function

No attempt is made here to provide a tutorial on solution of equations. Rather, some typical examples are presented and some conclusions are indicated. If you find yourself much in need of a tool in this area, consider the routines in the Optimization Toolbox (provided by the MathWorks).

Finally, before going too far with the material in this section, you should be comfortable with the methods for solving single equations in single unknowns.

### Linear sets of equations

If the equations you wish to solve are linear, then the methods of linear algebra are the way to go. There are two cases that are usually of interest:

1. N equations in N unknowns

Example: Interpolating polynomial of degree N-1 passing through N distinct points.

Given: N distinct x-y pairs (x_j, y_j), (j = 1:N)

Find: P_n(x) such that P_n(x_j) = y_j, where n = N - 1.

For each x-y pair (j = 1:N), the equation

c_N x_j^(N-1) + c_(N-1) x_j^(N-2) + ... + c_2 x_j + c_1 = y_j

holds since the interpolating polynomial must pass exactly through all the points in the given set. The goal is to use this set of N equations to determine the N unknown coefficients c_k (k = 1:N) of the polynomial.

If we create a set of coefficients a_jk = x_j^(N-k) (i.e., the j-th x-value raised to the N-k power), then the set of equations may be written as

Ac = y

or

f(c) = Ac - y = 0

where A is the matrix of coefficients just defined, c is an N-by-1 vector of the unknown coefficients, and y is a vector that contains the y-values from the given data.

For example, find the coefficients of the parabola (degree 2, so N = 3) that passes through the points (-1,3), (1,2) and (2,5).

```
x = [-1 1 2]'                % x-values
y = [3 2 5]'                 % y-values
A = [x.^2 x ones(size(x))]   % coefficient matrix
c = A\y                      % the coefficients
check = polyfit(x,y,2)       % apply polyfit
c - check'                   % are they the same?
```

Except for the shape of the coefficient vector, the solution produced by polyfit is the same (to within roundoff) as that obtained with matrix division (the backslash operator).

2. M equations in N unknowns (with M > N)

Example: Fitting functions other than polynomials (polyfit generalization)

When you use polyfit to find the unknown coefficients in the fit of a polynomial of specified order to a set of data, you are carrying out a particular case of linear regression. More generally, given the functions f_1(x), f_2(x), ..., f_N(x), the linear model

c_1 f_1(x) + c_2 f_2(x) + ... + c_N f_N(x) = y

can be fit to a set of x-y data ((x_j, y_j), j = 1:M) by determining the unknown coefficients c_k (k = 1:N). The number of data pairs (M) is generally much greater than the number of fitting functions (N).

Inserting each of the M x-y pairs into the expression for the linear model given above produces the following set of M equations

c_1 f_1(x_j) + c_2 f_2(x_j) + ... + c_N f_N(x_j) = y_j

(j = 1:M), which constitutes a set of equations that are linear in the unknown coefficients c_k. Note that there is no restriction on the linearity of the functions chosen for the model. What is important here is that the functions are combined linearly.

Defining the elements of an M-by-N matrix A as a_jk = f_k(x_j), the equations above may be written in compact matrix form as

Ac = y

which is the same f(c) that was found for the case of polynomial interpolation in the M = N case. The only difference here is that M > N, and so there will, in general, be no solution that satisfies every equation exactly.

To generate a unique solution, the hope that f(c) be exactly zero is given up and some additional criterion for assessing the quality of the solution is imposed. A common criterion is to choose the coefficients c_k so that the residual f(c) = Ac - y is minimized in some sense. If the square of the residual (f^T f) is used to guide the minimization, then it may be shown that the coefficients satisfy the N-by-N set of equations

Bc = d

where B = ATA and d = ATy. These are the so-called "normal" equations for the given system. The solution in this case (using MATLAB) is simply

```
c = A\y
```

Note that the matrix B and the vector d need not (and, for numerical accuracy, should not) be formed explicitly; the backslash operator solves the least-squares problem directly.
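As a quick check (on a small random system invented here for illustration), the normal-equations solution can be compared with the backslash solution; the two should agree to within roundoff:

```
A = rand(10,3);       % a random 10-by-3 coefficient matrix
y = rand(10,1);       % a random right-hand side
c1 = A\y;             % least-squares solution via backslash
c2 = (A'*A)\(A'*y);   % solution via the normal equations B*c = d
norm(c1 - c2)         % should be tiny (near machine precision)
```

The difference is that backslash works with A directly (via orthogonal factorization), avoiding the loss of precision that comes from forming A'*A.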

As a specific example, consider fitting the function

y = c_1 + c_2 x + c_3 sin(x)

to a set of synthetic data (a model with error added):

```
x = linspace(0,5,50)';        % x-values for the data set
y = 1 + 3*x - 2*sin(x);       % the answer
noise = 0.5*randn(size(x));   % some noise to throw in
y = y + noise;                % the disguised answer
A = [ones(size(x)) x sin(x)]; % coefficient matrix
c = A\y                       % the solution: is it [1 3 -2]'?
```

(You should be able to paste the above lines directly into the MATLAB command window.) A graphical look at the quality of the solution can also be generated:

```
xi = linspace(0,5,200);              % plotting coordinates
yi = c(1) + c(2)*xi + c(3)*sin(xi);  % evaluate the function
plot(x,y,'o',xi,yi)                  % display a comparison
```

Strictly speaking, this particular type of problem is not root-finding since the function f(c) is not 0 at the solution. Rather, a feature of the solution (the minimum of its residual) is found. In a sense, however, root-finding did occur: at the minimum of a function, the derivative is zero, so what was found here is a root of the derivative of the squared residual.
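This connection can be checked numerically. The gradient of the squared residual f^T f with respect to c is 2 A^T (Ac - y), which should be (nearly) zero at the least-squares solution. Re-using the synthetic-data example above:

```
x = linspace(0,5,50)';                        % same synthetic data as above
y = 1 + 3*x - 2*sin(x) + 0.5*randn(size(x));  % model plus noise
A = [ones(size(x)) x sin(x)];                 % coefficient matrix
c = A\y;                                      % least-squares solution
norm(A'*(A*c - y))                            % gradient of the residual: near zero
```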

### Non-linear sets of equations

Solution of sets of non-linear equations is a difficult task, and MATLAB does not have a built-in solver for more than one variable (though the Optimization Toolbox does have one). Presently, you can retrieve fsolve35.m from the MathWorks FTP site (ftp://ftp.mathworks.com/contrib), which will give you some functionality in this area. The focus here will again be on examples that illustrate the way to set up and deal with these sorts of problems.

As with the linear case (above) two major cases can be distinguished:

### When the number of equations is the same as the number of unknowns

Example: Newton's method for N equations in N unknowns

One way of deriving the updating procedure for the 1-dimensional version of Newton's method is to write the Taylor-series representation of the function and then truncate the expression. For the N-dimensional case, the same approach can be used, but keeping track of the differentiation is most effectively done with matrix notation.

At the point xo, the function f(x) can be expanded in a Taylor series as

f(x) = f(xo) + J (x - xo) + higher-order terms

where J is a matrix of first derivatives of the function, called the Jacobian matrix. The elements of this matrix are

J_ij = the partial derivative of f_i with respect to x_j

Truncating this expansion at the first derivative and setting this expression to zero (which is what we would like f(x) to be when the root is found) gives

0 = f(xo) + J (x - xo)

which, on re-arrangement, leads to the following estimate of where the root can be found.

x = xo - J^(-1) f(xo)

If the function at this new estimate of the root is not close enough to zero, the estimate can replace xo and thus generate a new estimate.

A simple implementation of this version of Newton's method follows:

1. Write a function which, when provided with a vector of x-values, computes the value of the N elements of the function as well as the N-by-N Jacobian matrix.

A typical definition line might look like

```
function [f,J] = myfun(x)
%MYFUN  Returns the function (f) and Jacobian (J) given x
```
2. Use that function in a version of the following script
```
tolerance = 1e-6;
xo = a column vector of values
[f,J] = myfun(xo);
x = xo - J\f;

while norm(x-xo) > tolerance
    xo = x;
    [f,J] = myfun(xo);
    x = xo - J\f;
end
```

While this is not a "production" version of a robust root-finder, it is simple enough to incorporate into other functions or to improve upon to create one. Aside from its simplistic nature, the method suffers from the need for you to provide a matrix of first derivatives, a time-consuming if not impossible task. For "small" problems, however, the approach may be feasible.
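As a small illustration, consider the (arbitrarily chosen) 2-by-2 system x_1^2 + x_2^2 = 4 and exp(x_1) + x_2 = 1, i.e., the intersection of a circle with an exponential curve. A suitable Mfile (the name circfun is just for this example) would be:

```
function [f,J] = circfun(x)
%CIRCFUN  Example system: f1 = x1^2 + x2^2 - 4, f2 = exp(x1) + x2 - 1
f = [x(1)^2 + x(2)^2 - 4
     exp(x(1)) + x(2) - 1];          % the function values (2-by-1)
J = [2*x(1)     2*x(2)
     exp(x(1))  1     ];             % the Jacobian: J(i,j) = dfi/dxj
```

With this Mfile on the path, running the script above with myfun replaced by circfun and a starting guess such as xo = [1; -1] should converge in a handful of iterations to a root near x = (1.0, -1.7). Starting guesses matter: this system has more than one root, and a different xo can send the iteration to a different solution.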

### When the number of equations is larger than the number of unknowns

Example: Fitting a multi-exponential decay

A fit of the expression

y = a_1 exp(b_1 x) + a_2 exp(b_2 x)

to a set of data points ((x_j, y_j), j = 1:M) occurs in many guises in engineering and physics. If the number of points involved is greater than 4 (and it generally should be), an exact determination of the parameters of the model (a_k, b_k, k = 1:2) is not possible. In such an instance, the parameters are usually determined by minimizing the difference between the prediction (generated by evaluating the function with an assumed set of parameters) and the actual y-values from the data.

The degree of mismatch between the model and the data is often characterized by the sum of the squares of the individual difference for each data point. Specifically, the "objective function" to be minimized is

f = d_1^2 + d_2^2 + ... + d_M^2

where d_j^2 = { y_j - (a_1 exp(b_1 x_j) + a_2 exp(b_2 x_j)) }^2. While not strictly a root-finding problem, this data-fitting/optimization problem can be approached as a root-finding problem from the point of view of its first derivative (which is zero at the minimum of the function defined above).

The built-in function fmins (renamed fminsearch in version 6) can be used to minimize a function of 2 or more variables and the above problem provides a good example of how to use this function.

1. Write a function Mfile that evaluates the objective function defined above:
```
function sse = expmodel(p,x,y)
%EXPMODEL  Fit of a 2-term exponential model to a set of data

% The parameters are in p (4-by-1 vector) and assumed to
% be in the following order:
a1 = p(1);
b1 = p(2);
a2 = p(3);
b2 = p(4);

ypred = a1*exp(b1*x) + a2*exp(b2*x);  % predictions
err = y - ypred;                      % mismatches

sse = sum(err.^2);                    % residual measure
```

2. Load in a set of x-y data and use fmins to find the values of the parameters that minimize the residual measure given above. The following script gives an idea of the direction to head in:
```
x = an M-by-1 vector of x-values
y = an M-by-1 vector of y-values
po = a 4-by-1 vector of initial estimates of the parameters

opt = foptions;
p = fmins('expmodel',po,opt,[],x,y);
```

The variable opt is used by fmins to customize performance and accuracy of the computation. See the on-line help information on foptions for your options here.
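To see the whole procedure end to end, here is a sketch using synthetic data; the "true" parameter values (2, -1.5, 0.5, -0.3) are arbitrary, chosen only for illustration:

```
x = linspace(0,3,40)';                 % synthetic x-values
y = 2*exp(-1.5*x) + 0.5*exp(-0.3*x);   % the "true" model
y = y + 0.02*randn(size(x));           % add some noise
po = [1; -1; 1; -0.1];                 % initial estimates of [a1 b1 a2 b2]
opt = foptions;                        % default options
p = fmins('expmodel',po,opt,[],x,y)    % near [2 -1.5 0.5 -0.3]'?
```

Because the two terms of the model are interchangeable, the recovered parameters may come back with the terms swapped, and multi-exponential fits are notoriously sensitive to noise and to the initial estimates, so do not be surprised if p is only roughly right.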

You should be able to see the obvious generalization of this approach to fitting functions other than the one focused on here. When constructing Mfiles for use with fmins, you can either include the data set as extra parameters (as was done above) or you can eliminate the two extra input arguments and put the data in the Mfile directly. See the discussion of the syntax options for fzero for additional information.

Comments? Contact Jim Maneval at maneval@bucknell.edu