\chapter{Nelder-Mead bibliography}

In this chapter, we present a brief overview of selected
papers, in chronological order, which deal with
the Nelder-Mead algorithm.

\section{Spendley, Hext, Himsworth, 1962}

"Sequential Application of Simplex Designs in Optimisation and Evolutionary Operation", 
Spendley W., Hext G. R. and Himsworth F. R., 
American Statistical Association and American Society for Quality, 1962

This article \cite{Spendley1962} presents an algorithm for unconstrained optimization in 
which a simplex is used. The simplex has a fixed, regular shape (i.e. all 
edge lengths are equal) and is made of $n+1$ vertices (where $n$ is the 
number of parameters to optimize). The algorithm is based on the 
reflection of the worst vertex with respect to the centroid of the better 
vertices. One can add a shrink step so that the simplex size can converge 
to zero. Because the simplex shape cannot change, the convergence rate 
may be very slow if the eigenvalues of the Hessian matrix have very 
different magnitudes.
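
As an illustration, here is a minimal Python sketch (ours, not from
the paper) of the basic reflection step, with the simplex stored as an
$(n+1) \times n$ array:

\begin{verbatim}
import numpy as np

def reflect_worst(simplex, f):
    # Reflect the worst vertex through the centroid of the
    # remaining (better) vertices, as in Spendley et al.
    values = np.array([f(v) for v in simplex])
    iworst = np.argmax(values)
    others = np.delete(simplex, iworst, axis=0)
    centroid = others.mean(axis=0)
    # A reflection coefficient of 1 preserves the regular shape.
    simplex[iworst] = 2.0 * centroid - simplex[iworst]
    return simplex
\end{verbatim}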

\section{Nelder, Mead, 1965}

"A Simplex Method for Function Minimization", 
Nelder J. A.  and Mead R., 
The Computer Journal, 1965

This article \cite{citeulike:3009487} presents the Nelder-Mead unconstrained optimization 
algorithm. It is based on a simplex made of $n+1$ vertices and is a 
modification of the algorithm of Spendley et al. It includes features 
which enable the simplex to adapt to the local landscape of the cost 
function. The additional steps are expansion, inside contraction and 
outside contraction. The stopping criterion is based on the standard 
deviation of the function values on the simplex.
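
This test can be sketched as follows in Python; the original paper
normalizes the sum of squares by $n$ rather than $n+1$, which makes no
practical difference for a stopping rule:

\begin{verbatim}
import numpy as np

def nm_converged(values, tol):
    # values: cost function values at the n+1 vertices.
    # Stop when the standard deviation of the values over
    # the simplex falls below the tolerance.
    return np.std(values) < tol
\end{verbatim}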

The convergence of the algorithm is better than that of Spendley et al. 
The method is compared against Powell's derivative-free method (1964) and 
shows comparable behavior. The algorithm is "greedy" in the sense that the 
expansion point is kept if it improves the best function value in the 
current simplex. Most Nelder-Mead variants analyzed since then keep the 
expansion point only if it improves over the reflection point.

\section{Box, 1965}

"A New Method of Constrained Optimization and a Comparison With Other Methods",
M. J. Box,
The Computer Journal, 8(1):42-52, 1965, British Computer Society

In this paper \cite{Box1965}, Box presents a modification of the Nelder-Mead algorithm which 
accounts for bound constraints and nonlinear constraints.
This variant is called the Complex method. 
The method expects that the initial guess satisfies the nonlinear constraints.
The nonlinear constraints are supposed to define a convex set.
The algorithm ensures that the complex evolves within the feasible space.

The bound constraints are handled by projecting the offending 
parameters back inside the bounded domain. If some 
nonlinear constraint is not satisfied, the trial point is moved halfway
toward the centroid of the remaining points (which all satisfy 
the nonlinear constraints).

The simplex may collapse into a subspace if a projection occurs. To 
circumvent this problem, $k \geq n+1$ vertices are used instead of the original 
$n+1$ vertices; a typical value is $k=2n$. The initial complex is computed with a random number 
generator which takes the bounds on the parameters into account. To 
account for the nonlinear constraints, each vertex of the 
initial complex is moved halfway toward the centroid of the points 
already satisfying the constraints (among which is the initial guess).
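
The following Python sketch (ours, with a hypothetical feasibility
predicate for the nonlinear constraints) illustrates the two
constraint-handling rules and the random initialization; since the
constraint set is assumed convex and contains the centroid, the
halving loop terminates:

\begin{verbatim}
import numpy as np

def enforce_constraints(trial, centroid, lower, upper, feasible):
    # Bound constraints: project the trial point into the box.
    trial = np.clip(trial, lower, upper)
    # Nonlinear constraints: move halfway toward the centroid
    # of the feasible points until the trial point is feasible.
    while not feasible(trial):
        trial = 0.5 * (trial + centroid)
    return trial

def initial_complex(x0, lower, upper, k, feasible, rng):
    # Random initial complex within the bounds; an infeasible
    # vertex is moved halfway toward the centroid of the
    # already accepted (feasible) vertices.
    points = [np.asarray(x0, dtype=float)]
    while len(points) < k:
        p = lower + rng.random(len(x0)) * (upper - lower)
        centroid = np.mean(points, axis=0)
        while not feasible(p):
            p = 0.5 * (p + centroid)
        points.append(p)
    return np.array(points)
\end{verbatim}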

\section{Guin, 1968}

"Discussion and correspondence: modification of the complex method of constrained optimization",
J. A. Guin,
The Computer Journal,
1968

In this article \cite{Guin:1968:DCM}, Guin suggests three rules to improve the practical convergence 
properties of Box's Complex method. The first suggestion is to use the 
next-to-worst point when the worst point does not produce an improvement 
of the function value. The second suggestion is to project infeasible 
points onto the bounds themselves, instead of slightly inside the bounds.
The third suggestion is related to the failure of the method
when the centroid is not feasible. In that case, Guin suggests 
restricting the search to the segment defined by the best 
vertex and the centroid.
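
As a purely hypothetical illustration of the third rule (the paper
does not give this code), one may sample the segment between the best
vertex and the centroid and keep the best feasible candidate:

\begin{verbatim}
import numpy as np

def centroid_fallback(best, centroid, f, feasible, m=11):
    # Hypothetical reading of Guin's third rule: when the centroid
    # is infeasible, restrict the search to the segment joining the
    # best vertex and the centroid.  The best vertex (t = 0) is
    # feasible, so the candidate list is never empty.
    ts = np.linspace(0.0, 1.0, m)
    candidates = [best + t * (centroid - best) for t in ts]
    return min((p for p in candidates if feasible(p)), key=f)
\end{verbatim}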

\section{O'Neill, 1971}

"Algorithm AS47 - Function minimization using a simplex procedure",
R. O'Neill, Applied Statistics, 1971

In this paper \cite{O'Neill1971AAF}, R. O'Neill presents a Fortran implementation of the 
Nelder-Mead algorithm. 
The initial simplex is computed axis-by-axis, given the initial guess
and a vector of step lengths.
A factorial test is used to check whether the computed optimum point is a local minimum.
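
A minimal Python sketch of this axis-by-axis construction, assuming
one step length per parameter:

\begin{verbatim}
import numpy as np

def initial_simplex(x0, steps):
    # Vertex 0 is the initial guess; vertex i is the guess
    # shifted by steps[i-1] along axis i-1.
    x0 = np.asarray(x0, dtype=float)
    vertices = [x0.copy()]
    for i, h in enumerate(steps):
        v = x0.copy()
        v[i] += h
        vertices.append(v)
    return np.array(vertices)
\end{verbatim}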

\section{Parkinson and Hutchinson, 1972}

In \cite{parkinson1972}, "An investigation into the efficiency of variants on the simplex method", 
Parkinson and Hutchinson explored 
several ways of improvement. First, they investigated the sensitivity
of the algorithm to the initial simplex. Two parameters were studied,
namely the initial length and the orientation of the simplex. 
An automatic setting for the orientation, though very desirable, is 
not easy to design. Parkinson and Hutchinson tried to automatically 
compute the scale of the initial simplex by two methods, based on 
a "line search" and on a local "steepest descent".
Their second investigation adds a new step to the algorithm, the unlimited
expansion. After a successful expansion, the algorithm tries to produce 
an improved point by taking the largest possible number of 
expansion steps. After an unlimited expansion is performed, the 
simplex is translated, so that excessive modification of its scale and shape 
is avoided. Tested on low-dimensional problems, the 
modified algorithm, named PHS, provides typical gains of 20\% in 
function evaluations.
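
A minimal sketch of the unlimited expansion, assuming numpy arrays
and a standard expansion coefficient of 2 (the control logic of the
paper differs in its details):

\begin{verbatim}
def unlimited_expansion(centroid, reflected, f, chi=2.0, max_steps=50):
    # Keep expanding away from the centroid while the cost
    # function keeps improving.
    best = reflected
    fbest = f(reflected)
    for _ in range(max_steps):
        candidate = centroid + chi * (best - centroid)
        fcand = f(candidate)
        if fcand >= fbest:
            break
        best, fbest = candidate, fcand
    return best
\end{verbatim}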

\section{Richardson and Kuester, 1973}

"Algorithm 454: the complex method for constrained optimization",
Richardson Joel A. and Kuester J. L.,
Commun. ACM,
1973

In this paper \cite{362324}, Richardson and Kuester show a Fortran implementation
of Box's Complex optimization method. The paper clarifies several 
specific points from Box's original paper while remaining very close to it.
Three test problems are presented, with the specific algorithmic settings (such as 
the number of vertices) and the number of iterations.

\section{Shere, 1974}

"Remark on algorithm 454 : The complex method for constrained optimization",
Shere Kenneth D.,
Commun. ACM,
1974

In this article \cite{372783}, Shere presents two counterexamples for which algorithm 454,
implemented by Richardson and Kuester, produces an infinite loop:
"This happens whenever the corrected point, the centroid of the remaining 
complex points, and every point on the line segment joining these two points
all have functional values lower than the functional values at each of the
remaining complex points."

\section{Routh, Swartz, Denton, 1977}

"Performance of the Super-Modified Simplex",
M.W. Routh, P.A. Swartz, M.B. Denton,
Analytical Chemistry,
1977

In this article \cite{Routh1977}, 
Routh, Swartz and Denton present a variant
of the Nelder-Mead algorithm, which is called
the Super-Modified Simplex (SMS) method in their
paper. The algorithm is modified in the following
way. After determination of the worst response (W), 
the responses at the centroid (C) and reflected (R)
vertices are measured and a second-order polynomial
curve is fitted to the responses at W, C and R.
Furthermore, the curve is extrapolated beyond W 
and R by a percentage of the W-R vector, resulting
in two types of curve shapes. In the concave-down
case, a maximum occurs within the interval.
Assuming a maximization process, evaluation of the
derivative of the curve reveals the location of the
predicted optimum; its response is subsequently
evaluated, the new vertex is located at that position,
and the optimization process is continued.
In the concave-up case, a response maximum
does not occur within the interval, so the extended
interval boundary producing the highest predicted
response is chosen as the new vertex location,
its response is determined, and the optimization
is continued. If the response at the predicted
extended interval boundary location does not
prove to be greater than the response at R, the 
vertex R may instead be retained as the new 
vertex and the process continued.
The slope at the extended interval boundary
may additionally be evaluated, dictating the 
magnitude of the expansion coefficient: the
greater the slope (indicating rapid approach
to the optimum location), the smaller the required
expansion coefficient; conversely, the smaller
the slope (indicating remoteness from the 
optimum location), the larger the required expansion
coefficient.
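
Since $R = 2C - W$, the points W, C and R are equally spaced along a
line, so the fitted parabola has a closed form. The following Python
sketch illustrates the vertex-selection logic under that
parameterization; the extension fraction beyond the interval is a
tuning parameter of our sketch, not the paper's exact value:

\begin{verbatim}
import numpy as np

def sms_new_vertex(W, C, R, yW, yC, yR, extension=0.5):
    # Parameterize the W-C-R line by t = -1 (W), 0 (C), +1 (R)
    # and fit the parabola y(t) = a t^2 + b t + yC.
    a = 0.5 * (yW + yR) - yC
    b = 0.5 * (yR - yW)
    if a < 0.0:
        # Concave down (maximization): interior predicted maximum.
        t = -b / (2.0 * a)
    else:
        # Concave up: take the extended boundary with the higher
        # predicted response.
        t = (1.0 + extension) if b > 0.0 else -(1.0 + extension)
    return C + t * (np.asarray(R) - np.asarray(C))
\end{verbatim}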

An additional safeguard procedure must be used
in order to prevent the collapse of the simplex.

\section{Van Der Wiel, 1980}

"Improvement of the Super-Modified Simplex
Optimization Procedure",
P.F.A. Van Der Wiel,
Analytica Chimica Acta,
1980

In this article \cite{VanDerWiel1980}, Van Der Wiel
tries to improve the SMS method by Routh et al.
His modifications are based on a Gaussian fit, a
weighted reflection point and an estimation of the response
at the reflection point. Van Der Wiel presents 
a simplified pseudo-code for one algorithm.
The method is tested on five cases, in which the cost
function depends on the exponential function.

\section{Walters, Parker, Morgan and Deming, 1991}

"Sequential Simplex Optimization for Quality and Productivity in Research, Development, and Manufacturing", 
F. S. Walters, L. R. Parker, Jr., S. L. Morgan, and S. N. Deming, 1991

In this book \cite{WaltersParkerMorganDeming1991}, Walters, Parker, Morgan and Deming
give a broad view of simplex methods in chemistry. 
The Spendley et al. and Nelder-Mead algorithms are analyzed in particular depth, 
with many experiments described in great detail.
Template tables are given, so that an engineer can manually perform the optimization
and make the necessary calculations.
Practical advice is given, which allows a better use of the algorithms.

In chapter 5, "Comments on Fixed-size and Variable-size Simplexes",
comparing the paths of the two algorithms allows one to check that a real optimum
has been found.
When the authors analyze the graph of the response as a function of the number of 
iterations, the general behavior of the fixed-size algorithm consists of four steps.
Gains in response are initially rapid, but the rate of return decreases as the 
simplex probes to find the ridge and then moves along the shallower 
ridge to find the optimum.
The behavior from different starting locations is also analyzed.
Varying the size of the initial simplex is also studied for the fixed-size 
algorithm; the authors emphasize the many iterations which are produced 
when a tiny initial simplex is used.

Chapter 6, "General Considerations", warns that the user may 
set up a degenerate initial simplex, leading to a false convergence 
of the algorithm. Various other initial simplices are analyzed.
Modifications of the algorithm to account for bound 
constraints are presented.
The behavior of the fixed-size and variable-size simplex algorithms 
is analyzed when the simplex converges.
The "k+1" rule, introduced by Spendley et al. to account 
for noise in the cost function, is presented.

Chapter 7, "Additional Concerns and Topics", deals with advanced questions 
regarding these algorithms. The variable-size simplex algorithm is analyzed
in the situation of a ridge. The partially oscillatory collapse of the 
Nelder-Mead algorithm is presented. The same behavior is 
presented in the case of a saddle point. This clearly shows that 
practitioners were aware of the convergence problems of this 
algorithm well before McKinnon presented a simple counterexample (in 1998).
The "Massive Contraction" step of Nelder and Mead is presented as a 
solution for this oscillatory behavior. The authors present 
a method, due to Ernst, which keeps the volume of the 
simplex instead of shrinking it. This method is based on a 
translation of the simplex and requires 
$n+1$ function evaluations. A more efficient method, due to King,
is based on reflection with respect to the next-to-worst 
vertex. This modification was first suggested by Spendley et al.
in their fixed-size simplex algorithm.

In the same chapter, the authors present the behavior of the algorithms 
in the case of multiple optima. They also present briefly other 
types of simplex algorithms.

A complete bibliography (from 1962 to 1990) on simplex-based optimization is given at 
the end of the book.

\section{Subrahmanyam, 1989}

"An extension of the simplex method to constrained nonlinear optimization",
M. B. Subrahmanyam, Journal of Optimization Theory and Applications, 1989

In this article \cite{69970}, the simplex algorithm of Nelder and Mead is extended to handle nonlinear 
optimization problems with constraints. To prevent the simplex from 
collapsing into a subspace near the constraints, a delayed reflection is 
introduced for those points moving into the infeasible region. Numerical 
experience indicates that the proposed algorithm yields good results in 
the presence of both inequality and equality constraints, even when the 
constraint region is narrow. 

If a vertex becomes infeasible, its function value is not increased 
until the next iteration is completed. Thus, the next iteration
is accomplished using the actual value of the function at the infeasible 
point. At the end of the iteration, in case this vertex is not the worst
vertex, it is assigned a high value, so that it then becomes a candidate
for reflection during the next iteration.
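
A hedged sketch of this bookkeeping, where BIG is a hypothetical
large constant:

\begin{verbatim}
BIG = 1.0e30

def delayed_penalty(values, feasible_flags, worst_index):
    # After an iteration: an infeasible vertex which did not turn
    # out to be the worst is assigned a high value, so that it is
    # reflected at the next iteration.
    for i, feasible in enumerate(feasible_flags):
        if not feasible and i != worst_index:
            values[i] = BIG
    return values
\end{verbatim}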

The paper presents numerical experiments which involve 
thousands of calls to the cost function. This may be related to 
the chosen reflection factor of 0.95, which probably causes 
a large number of reflections before the simplex can finally 
satisfy the constraints.

\section{Numerical Recipes in C, 1992}

"Numerical Recipes in C, Second Edition", 
W. H. Press, Saul A. Teukolsky, William T. Vetterling and Brian P. Flannery, 
1992

In this book \cite{NumericalRecipes}, an ANSI C implementation of the Nelder-Mead algorithm is given.
The initial simplex is based on the axes. 
The termination criterion is based on the relative difference of the 
function values of the best and worst vertices in the simplex.
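
In Python, this criterion can be transcribed as follows (the small
constant guards against division by zero when both values are tiny):

\begin{verbatim}
def amoeba_converged(fbest, fworst, ftol, tiny=1.0e-10):
    # Relative difference between the worst and best function
    # values over the simplex.
    rtol = 2.0 * abs(fworst - fbest) / (abs(fworst) + abs(fbest) + tiny)
    return rtol < ftol
\end{verbatim}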

\section{Lagarias, Reeds, Wright, Wright, 1998}

"Convergence Properties of the Nelder--Mead Simplex Method in Low Dimensions", 
Jeffrey C. Lagarias, James A. Reeds, Margaret H. Wright and Paul E. Wright, 
SIAM Journal on Optimization, 1998

This paper \cite{lagarias:112} presents convergence properties of the Nelder-Mead algorithm applied
to strictly convex functions in dimensions 1 and 2. Convergence to a minimizer 
is proved in dimension 1, and various limited convergence results are proved for dimension 2. 

\section{McKinnon, 1998}

"Convergence of the Nelder--Mead Simplex Method to a Nonstationary Point", 
K. I. M. McKinnon, SIAM J. on Optimization, 1998

In this article \cite{589109}, McKinnon analyzes the behavior of the 
Nelder-Mead simplex method on a family of examples which cause the 
method to converge to a nonstationary point. All the examples use 
continuous functions of two variables. The family of functions contains
strictly convex functions with up to three continuous derivatives.
In all the examples, the method repeatedly applies the inside 
contraction step with the best vertex remaining fixed. The simplices
tend to a straight line which is orthogonal to the steepest descent 
direction. It is shown that this behavior cannot occur for functions
with more than three continuous derivatives.
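
For reference, the family of functions has the following form, as it
is usually quoted (the parameters $\tau$, $\theta$ and $\phi$ control
the smoothness and the convexity; the exact conditions on them are
given in the paper):

\begin{displaymath}
f(\xi_1,\xi_2) = \left\{ \begin{array}{ll}
\theta \phi |\xi_1|^\tau + \xi_2 + \xi_2^2, & \xi_1 \leq 0, \\
\theta \xi_1^\tau + \xi_2 + \xi_2^2, & \xi_1 \geq 0.
\end{array} \right.
\end{displaymath}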

\section{Kelley, 1999}

"Detection and Remediation of Stagnation in the Nelder--Mead Algorithm Using a Sufficient Decrease Condition", 
C. T. Kelley, SIAM J. on Optimization, 1999

In this article \cite{589283}, Kelley presents a test for sufficient decrease which, 
if passed for the entire iteration, guarantees convergence of the 
Nelder-Mead iteration to a stationary point if the objective function is 
smooth. Failure of this condition is an indicator of potential 
stagnation. As a remedy, Kelley proposes to restart the algorithm with an 
oriented simplex, smaller than the previous optimum simplex, but with 
a better shape, which approximates the steepest descent step from the 
current best point. The method is tested on the McKinnon 
function and converges to the optimum, where the original Nelder-Mead 
algorithm converged to a non-stationary point. Although the 
oriented simplex works well in practice, other strategies may be chosen 
with similar results, such as a simplex based on the axes, a regular simplex 
(like Spendley's) or a simplex based on the variable magnitudes (like 
Pfeffer's suggestion in Matlab's fminsearch). The paper also gives a 
convergence theorem which proves that, if the sufficient decrease 
condition is satisfied and if the product of the condition number of the simplex 
by the simplex size converges to zero, then, with additional 
assumptions on the cost function and the sequence of simplices, any 
accumulation point of the simplices is a critical point of $f$.
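
Schematically, and with the precise definitions deferred to the
paper, the sufficient decrease condition requires, at each iteration
$k$,
\begin{displaymath}
\bar{f}^{k+1} - \bar{f}^{k} < -\alpha \|g^k\|^2,
\end{displaymath}
where $\bar{f}^k$ is the average of the function values over the
simplex vertices, $g^k$ is the simplex gradient and $\alpha>0$ is a
small parameter, in the spirit of the Armijo condition.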

The same ideas are presented in the book \cite{Kelley1999}.

\section{Han, 2000}

In his PhD thesis \cite{Han2000}, Lixing Han analyzes the 
properties of the Nelder-Mead algorithm.
Han presents two examples in which the Nelder-Mead 
simplex method does not converge to a single point. 
The first example is a nonconvex function with bounded 
level sets, which exhibits nonconvergence properties similar to those of 
the McKinnon counterexample; the function is $f(\xi_1,\xi_2) = \xi_1^2 - \xi_2(\xi_2-2)$. 
The second example is a convex function with bounded level sets, for 
which the Nelder-Mead simplices converge to a degenerate simplex, but 
not to a single point. These nonconvergent examples support the 
observations by some practitioners that the Nelder-Mead 
simplices may collapse into a degenerate simplex; they therefore support 
the use of a restart strategy. Han also investigates 
the effect of dimensionality on the Nelder-Mead method.
Specifically, Han considers the quadratic
function $\xi_1^2 + \cdots + \xi_n^2$ and shows that the Nelder-Mead 
method becomes less efficient as the dimension increases.
The considered example offers insight into the effect
of dimensionality on the Nelder-Mead method.
Given all the known failures and inefficiencies of the Nelder-Mead 
method, a very interesting question is why it is so popular in practice.
Han presents numerical results of the Nelder-Mead method on the standard
Mor\'e-Garbow-Hillstrom collection with dimensions $n\leq 6$.
Han compares the Nelder-Mead method with a finite difference BFGS 
method and a finite difference steepest descent method. The numerical
results show that the Nelder-Mead method is much more efficient 
than the finite difference steepest descent method for the problems he
tested with dimensions $n\leq 6$. It is also often comparable with the 
finite difference BFGS method, which is believed to be the 
best derivative-free method.
Some of these results are reproduced in \cite{HanNeumann2006} by Han and Neumann, 
"Effect of dimensionality on the Nelder-Mead simplex method",
and in \cite{hanNeumann2003}, "On the roots of certain polynomials arising 
from the analysis of the Nelder-Mead simplex method".

\section{Nazareth, Tseng, 2001}

"Gilding the Lily: A Variant of the Nelder-Mead Algorithm Based on Golden-Section Search"
Computational Optimization and Applications, 2001, 
Larry Nazareth and Paul Tseng

The article \cite{584536} proposes a variant of the Nelder-Mead algorithm derived from 
a reinterpretation of univariate golden-section direct search. In the 
univariate case, convergence of the variant can be analyzed analogously 
to golden-section search. 

The idea is based on a particular choice of the reflection, expansion, 
inside and outside contraction parameters, based on the golden ratio. 
This variant of the Nelder-Mead algorithm is called 
Nelder-Mead-Golden-Ratio, or NM-GS. In one dimension, the authors exploit the connection 
with the golden-section search method, which allows them to prove a convergence theorem on 
unimodal univariate functions. This is in marked contrast to the approach 
taken by Lagarias et al., where considerable effort is expended to show 
convergence of the original NM algorithm on strictly convex univariate 
functions. With the NM-GS variant, one obtains convergence in the 
univariate case (using a relatively simple proof) on the broader class 
of unimodal functions. 

In the multivariate case, the authors modify the variant by replacing 
strict descent with fortified descent and maintaining the interior 
angles of the simplex bounded away from zero. Convergence of the 
modified variant can be analyzed by applying results for a 
fortified-descent simplicial search method. Some numerical experience 
with the variant is reported.

\section{Perry, 2001}

"A New Method For Numerical Constrained Optimization",
Ronald N. Perry,
March 2001

In this report \cite{Perry01anew}, Perry proposes a new method for constraint 
handling that can be applied to established optimization algorithms and 
which significantly improves their ability to traverse through 
constrained space. To make the presentation concrete, the new 
constraint method is applied to the Nelder-Mead polytope algorithm. The 
resulting technique, called SPIDER, has shown great initial promise for 
solving difficult (e.g., nonlinear, nondifferentiable, noisy) 
constrained problems.

In the new method, constraints are partitioned into multiple
levels. A constrained performance, independent of the objective
function, is defined for each level. A set of rules, based on 
these partitioned performances, specifies the ordering and 
movement of vertices as they straddle constraint boundaries;
these rules [...] have been shown to significantly aid motion along
constraints toward an optimum. Note that the new approach uses no 
penalty function and thus does not warp the performance surface,
thereby avoiding the possible ill-conditioning of the objective
function typical in penalty methods.

No numerical experiment is presented.

\section{Andersson, 2001}

"Multiobjective Optimization in Engineering Design - Application to fluid Power Systems"
Johan Andersson, 2001

This PhD thesis \cite{Andersson01multiobjectiveoptimization} gives a brief overview of the Complex method by Box in 
section 5.1.

\section{Peters, Bolte, Marschner, N\"{u}ssen and Laur, 2002}

In \cite{590075}, "Enhanced Optimization Algorithms for the Development of Microsystems", 
the authors combine radial basis function interpolation methods 
with the complex algorithm by Box.
Interpolation with radial basis functions is a linear approach
in which the model function $f$ is generated
via a weighted sum of basis functions $\Phi_i(r)$.
The parameter $r$ describes the distance of the current
point from the center $x_i$ of the $i$-th basis function; it is
calculated via the Euclidean norm.
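
In its standard form, such a model reads
\begin{displaymath}
\tilde{f}(x) = \sum_{i=1}^{N} w_i \Phi(\|x - x_i\|),
\end{displaymath}
where the weights $w_i$ are determined by the interpolation
conditions $\tilde{f}(x_j) = f(x_j)$ at the $N$ sample points.
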
The combined method is named the ComplInt strategy; the name stands for
Complex in combination with interpolation. The Complex
strategy due to Box is very well suited for the
combination with radial basis function interpolation,
for it belongs to the polyhedron strategies.
The authors present a test performed on a practical application, 
which led them to the following comment:
"The best result achieved with the ComplInt strategy
is not only around 10\% better than the best result of
the Complex strategy due to Box, the ComplInt also
converges much faster than the Complex does: while
the Complex strategy needs an average of 7506, the
ComplInt only calls for an average of 2728 quality
function evaluations."


\section{Han, Neumann, 2006}

"Effect of dimensionality on the Nelder-Mead simplex method", 
L. Han and M. Neumann, 2006


In this article \cite{HanNeumann2006}, the effect of dimensionality on the Nelder-Mead algorithm is 
investigated. It is shown, using the quadratic function $f(x) = x^T x$, 
that the Nelder-Mead simplex method deteriorates as the dimension 
increases. More precisely, in dimension 1, with the quadratic function
$f(x) = x^2$ and a particular choice of the initial simplex, the algorithm 
applies the inside contraction step repeatedly, and the convergence rate 
(defined as the ratio between the lengths of the simplex at two consecutive 
steps) is $1/2$. 
In dimension 2, with a particular initial simplex, the algorithm 
applies the outside contraction step repeatedly and the convergence rate is 
$\sqrt{2}/2$.
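
For instance, in dimension 1, the centroid of the vertices other than
the worst is the best vertex $b$ itself, so an inside contraction
with the usual coefficient $\gamma = 1/2$ replaces the worst vertex
$w$ by
\begin{displaymath}
w' = b + \gamma (w - b) = \frac{b + w}{2} ;
\end{displaymath}
if the best vertex stays fixed, the simplex length is halved at each
step, which is consistent with the rate $1/2$.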

For $n \geq 3$, a numerical experiment is performed on the quadratic function 
with the fminsearch algorithm from Matlab. It is shown that the original 
Nelder-Mead algorithm has a convergence rate which converges towards 1 as $n$ 
increases. For $n=32$, the rate of convergence is 0.9912.

\section{Singer, Nelder, 2008}

\url{http://www.scholarpedia.org/article/Nelder-Mead_algorithm},
by Singer and Nelder.

This article is a complete review of the Nelder-Mead algorithm.
Restarting the algorithm is advised when a premature termination occurs.