\chapter{Spendley's et al. method}
\index{Spendley, W.}
\index{Hext, G. R.}
\index{Himsworth, F. R.}
In this chapter, we present the Spendley, Hext and Himsworth algorithm \cite{Spendley1962} for
unconstrained optimization.
We begin by presenting a global overview of the algorithm.
Then we present various geometric situations which might occur
during the algorithm. In the second section, we present several
numerical experiments which give some insight into the behavior
of the algorithm in some simple situations. The first two cases
involve only 2 variables and are based on a quadratic function.
The last numerical experiment explores the behavior of the algorithm
when the number of variables increases.
\section{Introduction}
In this section, we present Spendley's et al. algorithm for unconstrained optimization.
This algorithm is based on the iterative update of a simplex.
At each iteration, either a reflection or a shrink step is performed, so that
the shape of the simplex does not change during the iterations.
Then we present various geometric situations which might occur
during the algorithm. This allows us to understand when exactly a reflection
or a shrink step is performed in practice.
\subsection{Overview}
The goal of Spendley's et al. algorithm is to solve the
following unconstrained optimization problem
\begin{eqnarray}
\min f(\bx)
\end{eqnarray}
where $\bx\in \RR^n$, $n$ is the number of optimization parameters and $f$ is the objective
function $f:\RR^n\rightarrow \RR$.
This algorithm is based on the iterative update of
a \emph{simplex} made of $n+1$ points $S=\{\bv_i\}_{i=1,n+1}$. Each point
in the simplex is called a \emph{vertex} and is associated with
a function value $f_i=f(\bv_i)$ for $i=1,n+1$.
The vertices are sorted by increasing function values so that the
\emph{best} vertex has index 1 and the \emph{worst} vertex
has index $n+1$
\begin{eqnarray}
\label{sp-sorted-vertices-fv}
f_1 \leq f_2 \leq \ldots \leq f_n \leq f_{n+1}.
\end{eqnarray}
The $\bv_1$ vertex (resp. the $\bv_{n+1}$ vertex) is called the \emph{best}
vertex (resp. \emph{worst}), because it is associated with the lowest (resp. highest)
function value. As we are going to see, the \emph{next-to-worst} vertex $\bv_n$ has a
special role in this algorithm.
The centroid $\overline{\bx} (j)$ of the simplex is the center of the
$n$ vertices obtained when the vertex $\bv_j$ is
excluded. This centroid is
\begin{eqnarray}
\label{sp-centroid-generalized}
\overline{\bx} (j) = \frac{1}{n} \sum_{i=1,n+1, i\neq j} \bv_i.
\end{eqnarray}
The algorithm makes use
of one coefficient $\rho>0$, called the reflection factor. The standard
value of this coefficient is $\rho=1$.
The algorithm attempts to replace some vertex
$\bv_j$ by a new vertex $\bx(\rho,j)$ on the line from the vertex $\bv_j$
to the centroid $\overline{\bx}(j)$. The new vertex $\bx(\rho,j)$ is defined by
\begin{eqnarray}
\label{sp-interpolate-generalized}
\bx(\rho,j) = (1+\rho)\overline{\bx}(j) - \rho \bv_j.
\end{eqnarray}
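To make these definitions concrete, the following Python sketch computes the centroid $\overline{\bx}(j)$ and the reflected point $\bx(\rho,j)$ for a small triangle. This is an illustrative translation only; the scripts in this chapter use Scilab.

```python
import numpy as np

def centroid_excluding(simplex, j):
    # Mean of the n remaining vertices, with vertex j excluded.
    n = simplex.shape[0] - 1
    return (simplex.sum(axis=0) - simplex[j]) / n

def reflect(simplex, j, rho=1.0):
    # x(rho, j) = (1 + rho) * xbar(j) - rho * v_j
    xbar = centroid_excluding(simplex, j)
    return (1.0 + rho) * xbar - rho * simplex[j]

# A triangle in the plane (n = 2, hence n + 1 = 3 vertices).
S = np.array([[0.0, 0.0],
              [1.0, 0.0],
              [0.0, 1.0]])
xb = centroid_excluding(S, 2)   # centroid of the two remaining vertices: (0.5, 0)
xr = reflect(S, 2)              # reflected point with rho = 1: (1, -1)
```

With $\rho=1$, the reflected point is the mirror image of $\bv_j$ through the centroid of the other vertices, so the simplex keeps its shape.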
\subsection{Algorithm}
In this section, we analyze Spendley's et al. algorithm, which
is presented in figure \ref{algo-spendley}.
\begin{figure}[htbp]
\begin{algorithmic}
\STATE Compute an initial simplex $S_0$
\STATE Sort the vertices of $S_0$ with increasing function values
\STATE $S\gets S_0$
\WHILE{$\sigma(S)>tol$}
\STATE $\overline{x}\gets \overline{\bx}(n+1)$ \COMMENT{Compute the centroid}
\STATE $\bx_r \gets \bx(\rho,n+1)$ \COMMENT{Reflect with respect to worst}
\STATE $f_r \gets f(\bx_r)$
\IF {$f_r < f_n$}
\STATE Accept $\bx_r$, i.e. replace the worst vertex $\bv_{n+1}$ by $\bx_r$
\ELSE
\STATE $\overline{x}\gets \overline{\bx}(n)$ \COMMENT{Compute the centroid}
\STATE $\bx_r' \gets \bx(\rho,n)$ \COMMENT{Reflect with respect to next-to-worst}
\STATE $f_r' \gets f(\bx_r')$
\IF {$f_r' < f_n$}
\STATE Accept $\bx_r'$, i.e. replace the next-to-worst vertex $\bv_n$ by $\bx_r'$
\ELSE
\STATE Shrink the simplex toward the best vertex $\bv_1$
\ENDIF
\ENDIF
\STATE Sort the vertices of $S$ with increasing function values
\ENDWHILE
\end{algorithmic}
\caption{Spendley's et al. simplex algorithm}
\label{algo-spendley}
\end{figure}
\section{Numerical experiments}
In this section, we present some numerical experiments with
Spendley's et al. algorithm.
\subsection{Quadratic function}
The function we try to minimize is the following quadratic
in 2 dimensions
\begin{eqnarray}
f(x_1,x_2) = a x_1^2 + x_2^2,
\end{eqnarray}
where $a>0$ is a chosen scaling parameter.
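The reflect-or-shrink iteration of Spendley's et al. method can be sketched in Python as follows. This is an illustrative translation, not the Scilab implementation used in this manual; in particular, using the acceptance test $f_r < f_n$ for both reflections is our reading of the algorithm.

```python
import numpy as np

def spendley(f, simplex, rho=1.0, tol=1e-8, maxiter=10000):
    """Fixed-shape simplex iteration: reflect the worst vertex, then the
    next-to-worst; if neither improves, shrink toward the best vertex."""
    S = np.array(simplex, dtype=float)
    n = S.shape[1]                             # number of variables
    fv = np.array([f(v) for v in S])
    for _ in range(maxiter):
        order = np.argsort(fv)                 # best first, worst last
        S, fv = S[order], fv[order]
        if np.max(np.linalg.norm(S[1:] - S[0], axis=1)) < tol:
            break
        for j in (n, n - 1):                   # worst, then next-to-worst
            xbar = (S.sum(axis=0) - S[j]) / n  # centroid excluding v_j
            xr = (1.0 + rho) * xbar - rho * S[j]
            fr = f(xr)
            if fr < fv[n - 1]:                 # improves on next-to-worst
                S[j], fv[j] = xr, fr
                break
        else:                                  # both reflections failed
            S = (S + S[0]) / 2.0               # shrink toward best vertex
            fv = np.array([f(v) for v in S])
    return S[0], fv[0]

# Well-scaled quadratic (a = 1), regular initial simplex near (10, 10).
a = 1.0
f = lambda x: a * x[0] ** 2 + x[1] ** 2
S0 = [[10.0, 10.0], [11.0, 10.0], [10.5, 10.0 + np.sqrt(3.0) / 2.0]]
xopt, fopt = spendley(f, S0)
```

Note that the reflections preserve the shape of the simplex, so only the shrink steps change its size.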
The larger $a$ is, the more difficult the problem is
to solve with the simplex algorithm.
Indeed, let us compute the Hessian matrix associated with the
cost function. We have
\begin{eqnarray}
\bH = \left(
\begin{array}{cc}
2 a & 0 \\
0 & 2
\end{array}
\right).
\end{eqnarray}
Therefore, the eigenvalues of the Hessian matrix
are $2a$ and $2$, which are strictly
positive if $a>0$. Hence, the cost function is strictly convex and
there is a unique global solution, that is $\bx^\star = (0,0)^T$.
The ratio between these two eigenvalues is $a$. This leads
to an elongated valley, which is extremely narrow when $a$ is large.
The stopping criterion is based on the relative size of the simplex
with respect to the size of the initial simplex
\begin{eqnarray}
\sigma_+(S) < tol \times \sigma_+(S_0).
\end{eqnarray}
In this experiment, we use $tol=10^{-8}$ as the relative tolerance
on simplex size.
We set the maximum number of function evaluations to 400.
The initial simplex is a regular simplex with length unity.
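This stopping criterion can be sketched in Python, assuming the oriented length definition $\sigma_+(S) = \max_{i=2,\ldots,n+1} \|\bv_i - \bv_1\|_2$ (this particular definition is an assumption here; the Scilab toolbox computes it with \texttt{optimsimplex\_size}).

```python
import numpy as np

def sigma_plus(simplex):
    # Oriented length: largest distance from the first vertex to any other.
    S = np.asarray(simplex, dtype=float)
    return np.max(np.linalg.norm(S[1:] - S[0], axis=1))

def has_converged(simplex, simplex0, tol=1e-8):
    # Relative stopping criterion: sigma+(S) < tol * sigma+(S0).
    return sigma_plus(simplex) < tol * sigma_plus(simplex0)

S0 = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]    # sigma+(S0) = 1
Sk = [[0.0, 0.0], [1e-9, 0.0], [0.0, 1e-9]]  # sigma+(Sk) = 1e-9
converged = has_converged(Sk, S0)            # True: 1e-9 < 1e-8 * 1
```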
The following Scilab script performs the optimization.
\lstset{language=scilabscript}
\begin{lstlisting}
a = 100;
function y = quadratic (x)
  y = a * x(1)^2 + x(2)^2;
endfunction
nm = nmplot_new ();
nm = nmplot_configure(nm,"-numberofvariables",2);
nm = nmplot_configure(nm,"-function",quadratic);
nm = nmplot_configure(nm,"-x0",[10.0 10.0]');
nm = nmplot_configure(nm,"-maxiter",400);
nm = nmplot_configure(nm,"-maxfunevals",400);
nm = nmplot_configure(nm,"-tolxmethod","disabled");
nm = nmplot_configure(nm,"-tolsimplexizerelative",1.e-8);
nm = nmplot_configure(nm,"-simplex0method","spendley");
nm = nmplot_configure(nm,"-method","fixed");
nm = nmplot_configure(nm,"-verbose",1);
nm = nmplot_configure(nm,"-verbosetermination",0);
nm = nmplot_configure(nm,"-simplexfn","rosenbrock.fixed.history.simplex.txt");
nm = nmplot_configure(nm,"-fbarfn","rosenbrock.fixed.history.fbar.txt");
nm = nmplot_configure(nm,"-foptfn","rosenbrock.fixed.history.fopt.txt");
nm = nmplot_configure(nm,"-sigmafn","rosenbrock.fixed.history.sigma.txt");
nm = nmplot_search(nm);
nmplot_display(nm);
nm = nmplot_destroy(nm);
\end{lstlisting}
The numerical results are presented in table \ref{fig-spendley-numexp2-table},
where the experiment is presented for $a=100$. We can check that the
number of function evaluations reaches its maximum limit, while the value of the
function at the computed optimum is still very inaccurate ($f(\bx^\star) \approx 0.08$).
\begin{figure}[h]
\begin{center}
%\begin{tiny}
\begin{tabular}{|l|l|}
\hline
Iterations & 340 \\
Function Evaluations & 400 \\
$a$ & $100.0$ \\
$\bx_0$ & $(10.0,10.0)$ \\
Relative tolerance on simplex size & $10^{-8}$ \\
Exact $\bx^\star$ & $(0.,0.)$\\
Computed $\bx^\star$ & $(0.001,0.2)$\\
Computed $f(\bx^\star)$ & $0.08$\\
\hline
\end{tabular}
%\end{tiny}
\end{center}
\caption{Numerical experiment with Spendley's et al. method on a badly scaled quadratic function}
\label{fig-spendley-numexp2-table}
\end{figure}
The various simplices generated during the iterations are
presented in figure \ref{fig-spendley-numexp2-historysimplex}.
The method uses reflections in the early iterations. Then there
is no possible improvement using reflections, so that shrink steps are necessary.
But the repeated shrink steps make the simplex very small, leading to a large number of
iterations. This is a limitation of the method, which is based on a simplex
which can vary its size, but not its shape.
\begin{figure}
\begin{center}
\includegraphics[width=15cm]{spendleymethod/quad2-spendley-simplexcontours.png}
\end{center}
\caption{Spendley et al. numerical experiment with $f(x_1,x_2) = a x_1^2 + x_2^2$ and $a=100$ -- History of simplex}
\label{fig-spendley-numexp2-historysimplex}
\end{figure}
In figure \ref{fig-spendley-numexp2-scaling}, we analyze the
behavior of the method with respect to scaling.
We can see that the method behaves poorly when the scaling is
bad: the convergence becomes slower and slower, and is impractical
when $a>10$.
\begin{figure}[htbp]
\begin{center}
%\begin{tiny}
\begin{tabular}{|l|l|l|}
\hline
$a$ & Function evaluations & Computed $f(\bx^\star)$ \\
\hline
$1.0$ & 160 & $2.35 \times 10^{-18}$ \\
$10.0$ & 222 & $1.2 \times 10^{-17}$ \\
$100.0$ & 400 & $0.083$ \\
$1000.0$ & 400 & $30.3$ \\
$10000.0$ & 400 & $56.08$ \\
\hline
\end{tabular}
%\end{tiny}
\end{center}
\caption{Numerical experiment with Spendley's et al. method on a badly scaled quadratic function -- Sensitivity to scaling}
\label{fig-spendley-numexp2-scaling}
\end{figure}
\subsection{Sensitivity to dimension}
In this section, we study the convergence of
Spendley's et al. algorithm with respect to the number of variables,
as presented by Han \& Neumann in \cite{HanNeumann2006}.
We emphasize, though, that Han \& Neumann present their numerical
experiment with the Nelder-Mead algorithm, while we present
in this section the Spendley et al. algorithm.
The function we try to minimize is the following quadratic function
in $n$ dimensions
\begin{eqnarray}
\label{quadratic-sp-function3}
f(\bx) = \sum_{i=1,n} x_i^2.
\end{eqnarray}
The initial guess is the origin $\bx_0 = (0,0,\ldots,0)^T$,
which is also the global solution of the problem.
We have $f(\bx_0)=0$ so that this vertex is never updated
during the iterations.
The initial simplex is computed with a random number generator.
The first vertex of the initial simplex is the origin.
The coordinates of the other vertices are uniformly random in the interval $[-1,1]$.
An absolute termination criterion on the size of the simplex is used,
that is, the algorithm is stopped when the inequality
$\sigma_+(S_k) \leq 10^{-8}$ is satisfied.
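The construction of this random initial simplex can be sketched in Python (an illustrative sketch; the Scilab script below builds it with \texttt{rand}). The seed is an arbitrary choice made here for reproducibility.

```python
import numpy as np

rng = np.random.default_rng(12345)  # fixed seed, an arbitrary choice

def initial_simplex(n):
    # First vertex at the origin; the coordinates of the other n
    # vertices are drawn uniformly in [-1, 1].
    S = np.zeros((n + 1, n))
    S[1:, :] = rng.uniform(-1.0, 1.0, size=(n, n))
    return S

S = initial_simplex(5)  # 6 vertices in dimension 5
```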
For this test, we compute the rate of convergence as presented
in Han \& Neumann \cite{HanNeumann2006}. This rate is defined as
\begin{eqnarray}
\label{rho-sp-rate-convergence}
\rho(S_0,n) = \limsup_{k\rightarrow \infty}
\left(\prod_{i=0,k-1} \frac{\sigma(S_{i+1})}{\sigma(S_i)}\right)^{1/k},
\end{eqnarray}
where $k$ is the number of iterations.
That definition can be viewed as the geometric mean of the ratios of the
oriented lengths of successive simplices.
This definition implies
\begin{eqnarray}
\label{rho-sp-rate-convergence2}
\rho(S_0,n) = \limsup_{k\rightarrow \infty}
\left(\frac{\sigma(S_k)}{\sigma(S_0)}\right)^{1/k}.
\end{eqnarray}
If $k$ is the number of iterations required to achieve convergence, as
indicated by the termination criterion, the rate of convergence is computed in practice as
\begin{eqnarray}
\rho(S_0,n,k) = \left( \frac{\sigma(S_{k})}{\sigma(S_0)}\right)^{1/k}.
\end{eqnarray}
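As a quick numerical check of this practical formula, suppose that every iteration is a shrink step which halves the simplex size, so that $\sigma(S_k) = 2^{-k}\sigma(S_0)$; the computed rate is then $1/2$:

```python
sigma0 = 1.0                 # initial simplex size
k = 27                       # number of iterations (here, halvings)
sigmak = sigma0 / 2.0 ** k   # sigma(S_k) = sigma(S_0) / 2^k
rho = (sigmak / sigma0) ** (1.0 / k)
# rho is 1/2, up to floating point rounding
```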
The following Scilab script performs the optimization.
\lstset{language=scilabscript}
\begin{lstlisting}
function y = quadratic (x)
  y = x(:).' * x(:);
endfunction
//
// myoutputcmd --
// This command is called back by the Nelder-Mead
// algorithm.
// Arguments
// state : the current state of the algorithm
// "init", "iter", "done"
// data : the data at the current state
// This is a tlist with the following entries:
// * x : the optimal vector of parameters
// * fval : the minimum function value
// * simplex : the simplex, as a simplex object
// * iteration : the number of iterations performed
// * funccount : the number of function evaluations
// * step : the type of step in the previous iteration
//
function myoutputcmd ( state , data , step )
  global STEP_COUNTER
  STEP_COUNTER(step) = STEP_COUNTER(step) + 1;
endfunction
// OptimizeHanNeumann --
// Perform the optimization and returns the object
// Arguments
// N : the dimension
function nm = OptimizeHanNeumann ( N )
  global STEP_COUNTER
  STEP_COUNTER("init") = 0;
  STEP_COUNTER("done") = 0;
  STEP_COUNTER("reflection") = 0;
  STEP_COUNTER("expansion") = 0;
  STEP_COUNTER("insidecontraction") = 0;
  STEP_COUNTER("outsidecontraction") = 0;
  STEP_COUNTER("shrink") = 0;
  STEP_COUNTER("reflectionnext") = 0;
  x0 = zeros(N,1);
  nm = neldermead_new ();
  nm = neldermead_configure(nm,"-numberofvariables",N);
  nm = neldermead_configure(nm,"-function",quadratic);
  nm = neldermead_configure(nm,"-x0",x0);
  nm = neldermead_configure(nm,"-maxiter",10000);
  nm = neldermead_configure(nm,"-maxfunevals",10000);
  nm = neldermead_configure(nm,"-tolxmethod","disabled");
  nm = neldermead_configure(nm,"-tolsimplexizeabsolute",1.e-8);
  nm = neldermead_configure(nm,"-tolsimplexizerelative",0);
  nm = neldermead_configure(nm,"-simplex0method","given");
  coords0(1,1:N) = zeros(1,N);
  coords0(2:N+1,1:N) = 2 * rand(N,N) - 1;
  nm = neldermead_configure(nm,"-coords0",coords0);
  nm = neldermead_configure(nm,"-method","fixed");
  nm = neldermead_configure(nm,"-verbose",0);
  nm = neldermead_configure(nm,"-verbosetermination",0);
  nm = neldermead_configure(nm,"-outputcommand",myoutputcmd);
  //
  // Perform optimization
  //
  nm = neldermead_search(nm);
endfunction
for N = 1:10
  nm = OptimizeHanNeumann ( N );
  niter = neldermead_get ( nm , "-iterations" );
  funevals = neldermead_get ( nm , "-funevals" );
  simplex0 = neldermead_get ( nm , "-simplex0" );
  sigma0 = optimsimplex_size ( simplex0 , "sigmaplus" );
  simplexopt = neldermead_get ( nm , "-simplexopt" );
  sigmaopt = optimsimplex_size ( simplexopt , "sigmaplus" );
  rho = ( sigmaopt / sigma0 ) ^ ( 1 / niter );
  //mprintf ( "%d %d %d %e\n" , N , funevals , niter , rho );
  mprintf("%d %s\n",N, strcat(string(STEP_COUNTER)," "))
  nm = neldermead_destroy(nm);
end
\end{lstlisting}
Figure \ref{fig-sp-numexp3-nbsteps} presents the types of
steps performed for each number of variables.
We can see that the algorithm mostly performs shrink steps.
\begin{figure}[htbp]
\begin{center}
%\begin{tiny}
\begin{tabular}{|l|l|l|l|l|}
\hline
$n$ & \#Iterations & \# Reflections & \# Reflections & \#Shrink\\
& & / High & / Next to High & \\
\hline
1 & 27 & 0 & 0 & 26\\
2 & 28 & 0 & 0 & 27\\
3 & 30 & 2 & 0 & 27\\
4 & 31 & 1 & 1 & 28\\
5 & 29 & 0 & 0 & 28\\
6 & 31 & 2 & 0 & 28\\
7 & 29 & 0 & 0 & 28\\
8 & 29 & 0 & 0 & 28\\
9 & 29 & 0 & 0 & 28\\
10 & 29 & 0 & 0 & 28\\
11 & 29 & 0 & 0 & 28\\
12 & 29 & 0 & 0 & 28\\
13 & 31 & 0 & 2 & 28\\
14 & 29 & 0 & 0 & 28\\
15 & 29 & 0 & 0 & 28\\
16 & 31 & 0 & 1 & 29\\
17 & 30 & 0 & 0 & 29\\
18 & 30 & 0 & 0 & 29\\
19 & 31 & 0 & 1 & 29\\
20 & 32 & 2 & 0 & 29\\
\hline
\end{tabular}
%\end{tiny}
\end{center}
\caption{Numerical experiment with Spendley's et al. method on a
generalized quadratic function -- Number of iterations
and types of steps performed}
\label{fig-sp-numexp3-nbsteps}
\end{figure}
Figure \ref{fig-sp-numexp3-dimension} presents the number of function
evaluations depending on the number of variables.
We can see that the number of function evaluations
increases approximately linearly with the dimension of the problem in
figure \ref{fig-sp-numexp3-fvn}. A rough rule of thumb is that, for $n=1,20$,
the number of function evaluations is equal to $30n$:
most iterations are shrink steps and approximately 30 iterations are required,
almost independently of $n$.
\begin{figure}[htbp]
\begin{center}
%\begin{tiny}
\begin{tabular}{|l|l|l|l|}
\hline
$n$ & Function & Iterations & $\rho(S_0,n)$\\
& Evaluations & & \\
\hline
1 & 81 & 27 & 0.513002 \\
2 & 112 & 28 & 0.512532 \\
3 & 142 & 29 & 0.524482 \\
4 & 168 & 28 & 0.512532 \\
5 & 206 & 31 & 0.534545 \\
6 & 232 & 29 & 0.512095 \\
7 & 262 & 30 & 0.523127 \\
8 & 292 & 30 & 0.523647 \\
9 & 321 & 30 & 0.523647 \\
10 & 348 & 29 & 0.512095 \\
11 & 377 & 29 & 0.512095 \\
12 & 406 & 29 & 0.512095 \\
13 & 435 & 29 & 0.512095 \\
14 & 464 & 29 & 0.512095 \\
15 & 493 & 29 & 0.512095 \\
16 & 540 & 30 & 0.511687 \\
17 & 570 & 30 & 0.511687 \\
18 & 600 & 30 & 0.511687 \\
19 & 630 & 30 & 0.511687 \\
20 & 660 & 30 & 0.511687 \\
\hline
\end{tabular}
%\end{tiny}
\end{center}
\caption{Numerical experiment with Spendley et al. method on a generalized quadratic function}
\label{fig-sp-numexp3-dimension}
\end{figure}
\begin{figure}
\begin{center}
\includegraphics[width=10cm]{spendleymethod/spendley-dimension-nfevals.png}
\end{center}
\caption{Spendley et al. numerical experiment -- Number of function evaluations
depending on the number of variables}
\label{fig-sp-numexp3-fvn}
\end{figure}
Table \ref{fig-sp-numexp3-dimension} also shows the interesting
fact that the rate of convergence is almost constant and
very close to $1/2$. This is a consequence of the shrink steps,
which divide the size of the simplex by 2 at each iteration.
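Indeed, suppose that all $k$ iterations are shrink steps, each dividing the simplex size by 2. Then $\sigma(S_k) = 2^{-k} \sigma(S_0)$, so that the practical rate of convergence is
\begin{eqnarray}
\rho(S_0,n,k) = \left( \frac{2^{-k} \sigma(S_0)}{\sigma(S_0)} \right)^{1/k} = \frac{1}{2}.
\end{eqnarray}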
\section{Conclusion}
We saw in the first numerical experiment that the method
behaves reasonably well when the function is correctly scaled.
When the function is badly scaled, as in the second numerical
experiment, Spendley's et al. algorithm requires a large
number of function evaluations and converges very slowly.
This limitation occurs even with moderately badly scaled
functions and makes the method very slow in these
cases.
In the last experiment, we have explored what happens when the number of
variables increases. In this experiment, the rate of convergence is close to $1/2$
and the number of function evaluations is a linear function of the number
of variables (approximately $30n$).