\chapter{Spendley's et al. method} \index{Spendley, W.} \index{Hext, G. R.} \index{Himsworth, F. R.}

In this chapter, we present the algorithm of Spendley, Hext and Himsworth \cite{Spendley1962} for unconstrained optimization. We begin with a global overview of the algorithm. Then we present the various geometric situations which might occur during the algorithm. In the second section, we present several numerical experiments which give some insight into the behavior of the algorithm in some simple situations. The first two cases involve only two variables and are based on a quadratic function. The last numerical experiment explores the behavior of the algorithm when the number of variables increases.

\section{Introduction}

In this section, we present Spendley's et al. algorithm for unconstrained optimization. This algorithm is based on the iterative update of a simplex. At each iteration, either a reflection or a shrink step is performed, so that the shape of the simplex does not change during the iterations. Then we present the various geometric situations which might occur during the algorithm. This allows us to understand when exactly a reflection or a shrink is performed in practice.

\subsection{Overview}

The goal of Spendley's et al. algorithm is to solve the following unconstrained optimization problem
\begin{eqnarray}
\min f(\bx)
\end{eqnarray}
where $\bx\in \RR^n$, $n$ is the number of optimization parameters and $f$ is the objective function $f:\RR^n\rightarrow \RR$.

This algorithm is based on the iterative update of a \emph{simplex} made of $n+1$ points $S=\{\bv_i\}_{i=1,n+1}$. Each point in the simplex is called a \emph{vertex} and is associated with a function value $f_i=f(\bv_i)$ for $i=1,n+1$. The vertices are sorted by increasing function values, so that the \emph{best} vertex has index 1 and the \emph{worst} vertex has index $n+1$:
\begin{eqnarray}
\label{sp-sorted-vertices-fv}
f_1 \leq f_2 \leq \ldots \leq f_n \leq f_{n+1}.
\end{eqnarray}
The vertex $\bv_1$ (resp. $\bv_{n+1}$) is called the \emph{best} (resp. \emph{worst}) vertex, because it is associated with the lowest (resp. highest) function value. As we are going to see, the \emph{next-to-worst} vertex $\bv_n$ plays a special role in this algorithm.

The centroid of the simplex, $\overline{\bx}(j)$, is the center of the vertices, where the vertex $\bv_j$ has been excluded. This centroid is
\begin{eqnarray}
\label{sp-centroid-generalized}
\overline{\bx} (j) = \frac{1}{n} \sum_{i=1,n+1, i\neq j} \bv_i.
\end{eqnarray}
The algorithm makes use of one coefficient $\rho>0$, called the reflection factor. The standard value of this coefficient is $\rho=1$. The algorithm attempts to replace some vertex $\bv_j$ by a new vertex $\bx(\rho,j)$ on the line from the vertex $\bv_j$ to the centroid $\overline{\bx}(j)$. The new vertex $\bx(\rho,j)$ is defined by
\begin{eqnarray}
\label{sp-interpolate-generalized}
\bx(\rho,j) = (1+\rho)\overline{\bx}(j) - \rho \bv_j.
\end{eqnarray}

\subsection{Algorithm}

In this section, we analyze Spendley's et al. algorithm, which is presented in figure \ref{algo-spendley}. At each iteration, the algorithm first tries to reflect the worst vertex. If the reflected point does not improve on the next-to-worst vertex, the next-to-worst vertex is reflected instead; if this second reflection does not even improve on the worst vertex, the simplex is shrunk toward the best vertex.

\begin{figure}[htbp]
\begin{algorithmic}
\STATE Compute an initial simplex $S_0$
\STATE Sort the vertices of $S_0$ by increasing function values
\STATE $S\gets S_0$
\WHILE{$\sigma(S)>tol$}
\STATE $\overline{\bx}\gets \overline{\bx}(n+1)$ \COMMENT{Compute the centroid}
\STATE $\bx_r \gets \bx(\rho,n+1)$ \COMMENT{Reflect with respect to worst}
\STATE $f_r \gets f(\bx_r)$
\IF {$f_r < f_n$}
\STATE Accept $\bx_r$ \COMMENT{Replace the worst vertex $\bv_{n+1}$ by $\bx_r$}
\ELSE
\STATE $\overline{\bx}\gets \overline{\bx}(n)$ \COMMENT{Compute the centroid}
\STATE $\bx_r' \gets \bx(\rho,n)$ \COMMENT{Reflect with respect to next-to-worst}
\STATE $f_r' \gets f(\bx_r')$
\IF {$f_r' < f_{n+1}$}
\STATE Accept $\bx_r'$ \COMMENT{Replace the next-to-worst vertex $\bv_n$ by $\bx_r'$}
\ELSE
\STATE $\bv_i \gets (\bv_i+\bv_1)/2$, $f_i \gets f(\bv_i)$, for $i=2,n+1$ \COMMENT{Shrink toward best}
\ENDIF
\ENDIF
\STATE Sort the vertices of $S$ by increasing function values
\ENDWHILE
\end{algorithmic}
\caption{Spendley's et al. simplex algorithm}
\label{algo-spendley}
\end{figure}
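To make these update rules concrete, the following Scilab sketch implements a single iteration of the algorithm, directly from equations \ref{sp-centroid-generalized} and \ref{sp-interpolate-generalized}. This is an illustrative sketch of ours, not the implementation provided by the optimization components used below: it assumes that the simplex is stored as a matrix \texttt{v} of size $(n+1)\times n$ whose rows are the vertices, sorted by increasing function value, and that \texttt{fv} is the corresponding column vector of function values.
\lstset{language=scilabscript}
\begin{lstlisting}
// One iteration of Spendley's et al. algorithm (sketch).
// f   : the objective function
// v   : (n+1) x n matrix, one vertex per row, sorted by increasing f
// fv  : (n+1) x 1 vector of function values, fv(1) <= ... <= fv(n+1)
// rho : the reflection factor (standard value: 1)
function [ v , fv ] = spendley_iteration ( f , v , fv , rho )
  n = size(v,"c");
  // Reflect with respect to the worst vertex v(n+1,:)
  xbar = mean(v(1:n,:),"r");
  xr = (1+rho)*xbar - rho*v(n+1,:);
  fr = f(xr);
  if ( fr < fv(n) ) then
    v(n+1,:) = xr;
    fv(n+1) = fr;
  else
    // Reflect with respect to the next-to-worst vertex v(n,:)
    xbar = mean(v([1:n-1,n+1],:),"r");
    xr = (1+rho)*xbar - rho*v(n,:);
    fr = f(xr);
    if ( fr < fv(n+1) ) then
      v(n,:) = xr;
      fv(n) = fr;
    else
      // Shrink the simplex toward the best vertex v(1,:)
      for i = 2:n+1
        v(i,:) = (v(i,:) + v(1,:))/2;
        fv(i) = f(v(i,:));
      end
    end
  end
  // Sort the vertices by increasing function values
  [fv,k] = gsort(fv,"g","i");
  v = v(k,:);
endfunction
\end{lstlisting}
Because the vertices are re-sorted at the end of each iteration, the indices $1$, $n$ and $n+1$ always refer to the best, next-to-worst and worst vertices of inequality \ref{sp-sorted-vertices-fv}.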
\section{Numerical experiments}

\subsection{Badly scaled quadratic function}

In this section, we analyze the behavior of the algorithm on the badly scaled quadratic function
\begin{eqnarray}
f(x_1,x_2) = a x_1^2 + x_2^2,
\end{eqnarray}
where $a>0$ is a chosen scaling parameter. The larger $a$ is, the more difficult the problem is to solve with the simplex algorithm. Indeed, let us compute the Hessian matrix associated with the cost function. We have
\begin{eqnarray}
\bH = \left( \begin{array}{cc} 2 a & 0 \\ 0 & 2 \end{array} \right).
\end{eqnarray}
Therefore, the eigenvalues of the Hessian matrix are $2a$ and $2$, which are strictly positive if $a>0$. Hence, the cost function is strictly convex and there is only one global solution, that is $\bx^\star = (0,0)^T$. The ratio between these two eigenvalues is $a$. This leads to an elongated valley, which is extremely narrow when $a$ is large.

The stopping criterion is based on the relative size of the simplex with respect to the size of the initial simplex
\begin{eqnarray}
\sigma_+(S) < tol \times \sigma_+(S_0).
\end{eqnarray}
In this experiment, we use $tol=10^{-8}$ as the relative tolerance on the simplex size. We set the maximum number of function evaluations to 400. The initial simplex is a regular simplex with unit length.

The following Scilab script performs the optimization.
\lstset{language=scilabscript}
\begin{lstlisting}
a = 100;
function y = quadratic ( x )
  y = a * x(1)^2 + x(2)^2;
endfunction
nm = nmplot_new ();
nm = nmplot_configure(nm,"-numberofvariables",2);
nm = nmplot_configure(nm,"-function",quadratic);
nm = nmplot_configure(nm,"-x0",[10.0 10.0]');
nm = nmplot_configure(nm,"-maxiter",400);
nm = nmplot_configure(nm,"-maxfunevals",400);
nm = nmplot_configure(nm,"-tolxmethod","disabled");
nm = nmplot_configure(nm,"-tolsimplexizerelative",1.e-8);
nm = nmplot_configure(nm,"-simplex0method","spendley");
nm = nmplot_configure(nm,"-method","fixed");
nm = nmplot_configure(nm,"-verbose",1);
nm = nmplot_configure(nm,"-verbosetermination",0);
nm = nmplot_configure(nm,"-simplexfn","rosenbrock.fixed.history.simplex.txt");
nm = nmplot_configure(nm,"-fbarfn","rosenbrock.fixed.history.fbar.txt");
nm = nmplot_configure(nm,"-foptfn","rosenbrock.fixed.history.fopt.txt");
nm = nmplot_configure(nm,"-sigmafn","rosenbrock.fixed.history.sigma.txt");
nm = nmplot_search(nm);
nmplot_display(nm);
nm = nmplot_destroy(nm);
\end{lstlisting}

The numerical results are presented in table \ref{fig-spendley-numexp2-table}, where the experiment is performed with $a=100$. We can check that the number of function evaluations reaches its maximum limit, while the function value at the computed optimum is still very inaccurate ($f(\bx^\star) \approx 0.08$).

\begin{figure}[h]
\begin{center}
\begin{tabular}{|l|l|}
\hline
Iterations & 340 \\
Function Evaluations & 400 \\
$a$ & $100.0$ \\
$\bx_0$ & $(10.0,10.0)$ \\
Relative tolerance on simplex size & $10^{-8}$ \\
Exact $\bx^\star$ & $(0.,0.)$\\
Computed $\bx^\star$ & $(0.001,0.2)$\\
Computed $f(\bx^\star)$ & $0.08$\\
\hline
\end{tabular}
\end{center}
\caption{Numerical experiment with Spendley's et al. method on a badly scaled quadratic function}
\label{fig-spendley-numexp2-table}
\end{figure}

The various simplices generated during the iterations are presented in figure \ref{fig-spendley-numexp2-historysimplex}. The method uses reflections in the early iterations. Then no further improvement is possible using reflections, so that shrinking is necessary. But the repeated shrink steps make the simplex very small, which leads to a large number of iterations. This is a limitation of the method, which is based on a simplex which can vary in size, but not in shape.
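Both this experiment and the next one terminate on the simplex size $\sigma_+(S)$. As a complement to the script, here is a minimal Scilab sketch of the termination test, assuming that $\sigma_+(S)$ is the oriented length $\max_{i=2,\ldots,n+1} \|\bv_i - \bv_1\|_2$, that is, the largest distance between the first vertex and the other vertices; the helper names are ours, while the toolbox computes this quantity with \texttt{optimsimplex\_size} and the \texttt{"sigmaplus"} option.
\lstset{language=scilabscript}
\begin{lstlisting}
// Oriented length sigma_+(S) of a simplex (sketch).
// v : (n+1) x n matrix of vertices, one vertex per row
function s = simplex_sigmaplus ( v )
  n = size(v,"c");
  s = 0;
  for i = 2:n+1
    s = max(s, norm(v(i,:) - v(1,:)));
  end
endfunction

// Relative termination criterion used in this experiment:
// stop when sigma_+(S) < tol * sigma_+(S0)
function stop = spendley_hasconverged ( v , v0 , tol )
  stop = ( simplex_sigmaplus(v) < tol * simplex_sigmaplus(v0) );
endfunction
\end{lstlisting}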
\begin{figure}
\begin{center}
\includegraphics[width=15cm]{spendleymethod/quad2-spendley-simplexcontours.png}
\end{center}
\caption{Spendley et al. numerical experiment with $f(x_1,x_2) = a x_1^2 + x_2^2$ and $a=100$ -- History of simplex}
\label{fig-spendley-numexp2-historysimplex}
\end{figure}

In figure \ref{fig-spendley-numexp2-scaling}, we analyze the behavior of the method with respect to scaling. We check that the method behaves poorly when the scaling is bad: the convergence becomes slower and slower as $a$ increases, and is impractical when $a>10$.

\begin{figure}[htbp]
\begin{center}
\begin{tabular}{|l|l|l|}
\hline
$a$ & Function evaluations & Computed $f(\bx^\star)$ \\
\hline
$1.0$ & 160 & $2.35e-18$ \\
$10.0$ & 222 & $1.2e-17$ \\
$100.0$ & 400 & $0.083$ \\
$1000.0$ & 400 & $30.3$ \\
$10000.0$ & 400 & $56.08$ \\
\hline
\end{tabular}
\end{center}
\caption{Numerical experiment with Spendley's et al. method on a badly scaled quadratic function -- Sensitivity to scaling}
\label{fig-spendley-numexp2-scaling}
\end{figure}

\subsection{Sensitivity to dimension}

In this section, we study the convergence of the Spendley et al. algorithm with respect to the number of variables, following the numerical experiment presented by Han \& Neumann in \cite{HanNeumann2006}. We emphasize, though, that Han \& Neumann perform their numerical experiment with the Nelder-Mead algorithm, while we use here the Spendley et al. algorithm. The function we try to minimize is the following quadratic function in $n$ dimensions
\begin{eqnarray}
\label{quadratic-sp-function3}
f(\bx) = \sum_{i=1,n} x_i^2.
\end{eqnarray}
The initial guess is the origin $\bx_0 = (0,0,\ldots,0)^T$, which is also the global solution of the problem. We have $f(\bx_0)=0$, so that this vertex is never updated during the iterations. The initial simplex is computed with a random number generator: the first vertex of the initial simplex is the origin, and the coordinates of the other vertices are uniformly random in the interval $[-1,1]$. An absolute termination criterion on the size of the simplex is used, that is, the algorithm is stopped when the inequality $\sigma_+(S_k) \leq 10^{-8}$ is satisfied.

For this test, we compute the rate of convergence as presented by Han \& Neumann in \cite{HanNeumann2006}. This rate is defined as
\begin{eqnarray}
\label{rho-sp-rate-convergence}
\rho(S_0,n) = \limsup_{k\rightarrow \infty} \left(\prod_{i=0,k-1} \frac{\sigma(S_{i+1})}{\sigma(S_i)}\right)^{1/k},
\end{eqnarray}
where $k$ is the number of iterations. This definition can be viewed as the geometric mean of the ratios of the oriented lengths of successive simplices. Since the product telescopes, this definition implies
\begin{eqnarray}
\label{rho-sp-rate-convergence2}
\rho(S_0,n) = \limsup_{k\rightarrow \infty} \left(\frac{\sigma(S_k)}{\sigma(S_0)}\right)^{1/k}.
\end{eqnarray}
If $k$ is the number of iterations required to obtain convergence, as indicated by the termination criterion, the rate of convergence is computed in practice as
\begin{eqnarray}
\rho(S_0,n,k) = \left( \frac{\sigma(S_{k})}{\sigma(S_0)}\right)^{1/k}.
\end{eqnarray}

The following Scilab script performs the optimization.
\lstset{language=scilabscript}
\begin{lstlisting}
function y = quadratic ( x )
  y = x(:).' * x(:);
endfunction
//
// myoutputcmd --
//   This command is called back by the Nelder-Mead
//   algorithm.
// Arguments
//   state : the current state of the algorithm:
//     "init", "iter" or "done"
//   data : the data at the current state.
//     This is a tlist with the following entries:
//     * x : the optimal vector of parameters
//     * fval : the minimum function value
//     * simplex : the simplex, as a simplex object
//     * iteration : the number of iterations performed
//     * funccount : the number of function evaluations
//   step : the type of step performed in the previous iteration
//
function myoutputcmd ( state , data , step )
  global STEP_COUNTER
  STEP_COUNTER(step) = STEP_COUNTER(step) + 1
endfunction
//
// OptimizeHanNeumann --
//   Perform the optimization and return the object.
// Arguments
//   N : the dimension
//
function nm = OptimizeHanNeumann ( N )
  global STEP_COUNTER
  STEP_COUNTER("init") = 0;
  STEP_COUNTER("done") = 0;
  STEP_COUNTER("reflection") = 0;
  STEP_COUNTER("expansion") = 0;
  STEP_COUNTER("insidecontraction") = 0;
  STEP_COUNTER("outsidecontraction") = 0;
  STEP_COUNTER("shrink") = 0;
  STEP_COUNTER("reflectionnext") = 0;
  x0 = zeros(N,1);
  nm = neldermead_new ();
  nm = neldermead_configure(nm,"-numberofvariables",N);
  nm = neldermead_configure(nm,"-function",quadratic);
  nm = neldermead_configure(nm,"-x0",x0);
  nm = neldermead_configure(nm,"-maxiter",10000);
  nm = neldermead_configure(nm,"-maxfunevals",10000);
  nm = neldermead_configure(nm,"-tolxmethod","disabled");
  nm = neldermead_configure(nm,"-tolsimplexizeabsolute",1.e-8);
  nm = neldermead_configure(nm,"-tolsimplexizerelative",0);
  nm = neldermead_configure(nm,"-simplex0method","given");
  coords0(1,1:N) = zeros(1,N);
  coords0(2:N+1,1:N) = 2 * rand(N,N) - 1;
  nm = neldermead_configure(nm,"-coords0",coords0);
  nm = neldermead_configure(nm,"-method","fixed");
  nm = neldermead_configure(nm,"-verbose",0);
  nm = neldermead_configure(nm,"-verbosetermination",0);
  nm = neldermead_configure(nm,"-outputcommand",myoutputcmd);
  //
  // Perform optimization
  //
  nm = neldermead_search(nm);
endfunction

// Make the step counters visible at the main level
global STEP_COUNTER
for N = 1:20
  nm = OptimizeHanNeumann ( N );
  niter = neldermead_get ( nm , "-iterations" );
  funevals = neldermead_get ( nm , "-funevals" );
  simplex0 = neldermead_get ( nm , "-simplex0" );
  sigma0 = optimsimplex_size ( simplex0 , "sigmaplus" );
  simplexopt = neldermead_get ( nm , "-simplexopt" );
  sigmaopt = optimsimplex_size ( simplexopt , "sigmaplus" );
  rho = ( sigmaopt / sigma0 ) ^ ( 1 / niter );
  mprintf ( "%d %d %d %e\n" , N , funevals , niter , rho );
  mprintf ( "%d %s\n" , N , strcat(string(STEP_COUNTER)," ") );
  nm = neldermead_destroy(nm);
end
\end{lstlisting}

Figure \ref{fig-sp-numexp3-nbsteps} presents the types of steps which are performed for each number of variables. We see that the algorithm mostly performs shrink steps.
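Since almost every iteration ends with a shrink, we can roughly estimate both the number of iterations and the total cost. The following back-of-the-envelope computation is ours; it assumes that $\sigma_+(S_0)$ is of order unity, which is the case for vertices drawn from $[-1,1]$, and that every iteration ends with a shrink. Each shrink divides the simplex size by 2, so the absolute criterion $\sigma_+(S_k) \leq 10^{-8}$ is satisfied after approximately
\begin{eqnarray}
k \approx \log_2\left(10^{8}\right) \approx 26.6
\end{eqnarray}
iterations, consistent with the 27 to 32 iterations observed in figure \ref{fig-sp-numexp3-nbsteps}. Moreover, an iteration which ends with a shrink costs $n+2$ function evaluations: one for each of the two rejected reflection points, plus $n$ for the updated vertices of the shrunk simplex. The total cost is therefore approximately $k(n+2)$, i.e. roughly $30n$ function evaluations, and the practical rate of convergence is
\begin{eqnarray}
\rho(S_0,n,k) \approx \left( \left(\frac{1}{2}\right)^{k} \right)^{1/k} = \frac{1}{2}.
\end{eqnarray}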
\begin{figure}[htbp]
\begin{center}
\begin{tabular}{|l|l|l|l|l|}
\hline
$n$ & \#Iterations & \#Reflections & \#Reflections & \#Shrinks\\
 & & w.r.t. worst & w.r.t. next-to-worst & \\
\hline
1 & 27 & 0 & 0 & 26\\
2 & 28 & 0 & 0 & 27\\
3 & 30 & 2 & 0 & 27\\
4 & 31 & 1 & 1 & 28\\
5 & 29 & 0 & 0 & 28\\
6 & 31 & 2 & 0 & 28\\
7 & 29 & 0 & 0 & 28\\
8 & 29 & 0 & 0 & 28\\
9 & 29 & 0 & 0 & 28\\
10 & 29 & 0 & 0 & 28\\
11 & 29 & 0 & 0 & 28\\
12 & 29 & 0 & 0 & 28\\
13 & 31 & 0 & 2 & 28\\
14 & 29 & 0 & 0 & 28\\
15 & 29 & 0 & 0 & 28\\
16 & 31 & 0 & 1 & 29\\
17 & 30 & 0 & 0 & 29\\
18 & 30 & 0 & 0 & 29\\
19 & 31 & 0 & 1 & 29\\
20 & 32 & 2 & 0 & 29\\
\hline
\end{tabular}
\end{center}
\caption{Numerical experiment with Spendley et al. method on a generalized quadratic function -- Number of iterations and types of steps performed}
\label{fig-sp-numexp3-nbsteps}
\end{figure}

Figure \ref{fig-sp-numexp3-dimension} presents the number of function evaluations and the rate of convergence depending on the number of variables. We can see in figure \ref{fig-sp-numexp3-fvn} that the number of function evaluations increases approximately linearly with the dimension of the problem. A rough rule of thumb is that, for $n=1,20$, the number of function evaluations is approximately equal to $30n$: most iterations are shrink steps and approximately 30 iterations are required, almost independently of $n$.

\begin{figure}[htbp]
\begin{center}
\begin{tabular}{|l|l|l|l|}
\hline
$n$ & Function & Iterations & $\rho(S_0,n)$\\
 & Evaluations & & \\
\hline
1 & 81 & 27 & 0.513002 \\
2 & 112 & 28 & 0.512532 \\
3 & 142 & 29 & 0.524482 \\
4 & 168 & 28 & 0.512532 \\
5 & 206 & 31 & 0.534545 \\
6 & 232 & 29 & 0.512095 \\
7 & 262 & 30 & 0.523127 \\
8 & 292 & 30 & 0.523647 \\
9 & 321 & 30 & 0.523647 \\
10 & 348 & 29 & 0.512095 \\
11 & 377 & 29 & 0.512095 \\
12 & 406 & 29 & 0.512095 \\
13 & 435 & 29 & 0.512095 \\
14 & 464 & 29 & 0.512095 \\
15 & 493 & 29 & 0.512095 \\
16 & 540 & 30 & 0.511687 \\
17 & 570 & 30 & 0.511687 \\
18 & 600 & 30 & 0.511687 \\
19 & 630 & 30 & 0.511687 \\
20 & 660 & 30 & 0.511687 \\
\hline
\end{tabular}
\end{center}
\caption{Numerical experiment with Spendley et al. method on a generalized quadratic function}
\label{fig-sp-numexp3-dimension}
\end{figure}

\begin{figure}
\begin{center}
\includegraphics[width=10cm]{spendleymethod/spendley-dimension-nfevals.png}
\end{center}
\caption{Spendley et al. numerical experiment -- Number of function evaluations depending on the number of variables}
\label{fig-sp-numexp3-fvn}
\end{figure}

The table \ref{fig-sp-numexp3-dimension} also shows the interesting fact that the convergence rate is almost constant and very close to $1/2$. This is a consequence of the shrink steps, which divide the size of the simplex by 2 at each iteration.

\section{Conclusion}

We saw in the first numerical experiment that the method behaves reasonably when the function is correctly scaled. When the function is badly scaled, as in the second numerical experiment, the Spendley et al. algorithm requires a large number of function evaluations and converges very slowly. This limitation occurs even with moderately badly scaled functions, and makes the method very slow in these cases. In the last experiment, we explored what happens when the number of variables increases. In this experiment, the rate of convergence is close to $1/2$ and the number of function evaluations is a linear function of the number of variables (approximately $30n$).