authorMichaël Baudin <michael.baudin@scilab.org>2009-02-13 11:37:33 +0100
committerMichaël Baudin <michael.baudin@scilab.org>2009-02-13 11:37:33 +0100
commit9a172b4b8d17d215ae2af1ab223355105371625d (patch)
tree3fdf994a1f5f73c08def135f7da756cbd3759f95 /scilab_doc
parentebafb86506eacc708a8ba7da615d0a88fbf6ab90 (diff)
Draft for Scilab is not naive
Diffstat (limited to 'scilab_doc')
-rw-r--r--scilab_doc/scilabisnotnaive/Makefile34
-rw-r--r--scilab_doc/scilabisnotnaive/complexdivision.tex591
-rw-r--r--scilab_doc/scilabisnotnaive/conclusion.tex21
-rw-r--r--scilab_doc/scilabisnotnaive/introduction.tex56
-rw-r--r--scilab_doc/scilabisnotnaive/numericalderivative.tex412
-rw-r--r--scilab_doc/scilabisnotnaive/quadratic.tex504
-rw-r--r--scilab_doc/scilabisnotnaive/scilabisnotnaive.bib193
-rw-r--r--scilab_doc/scilabisnotnaive/scilabisnotnaive.pdfbin0 -> 264460 bytes
-rw-r--r--scilab_doc/scilabisnotnaive/scilabisnotnaive.tex138
9 files changed, 1949 insertions, 0 deletions
diff --git a/scilab_doc/scilabisnotnaive/Makefile b/scilab_doc/scilabisnotnaive/Makefile
new file mode 100644
index 0000000..26ec0e1
--- /dev/null
+++ b/scilab_doc/scilabisnotnaive/Makefile
@@ -0,0 +1,34 @@
#!/bin/sh

RAPPORT = scilabisnotnaive

LT = pdflatex

pdf:
	$(LT) $(RAPPORT)
	bibtex $(RAPPORT)
	$(LT) $(RAPPORT)
	$(LT) $(RAPPORT)


dvi:
	$(LT) ${RAPPORT}
	bibtex ${RAPPORT}
	$(LT) ${RAPPORT}

clean:
	rm -f *.aux
	rm -f *.bbl
	rm -f *.blg
	rm -f *.log
	rm -f *.out
	rm -f *.toc

spell:
	ispell -t ${RAPPORT}.tex

distclean:
	make clean
	rm -f ${RAPPORT}.pdf
	rm -f ${RAPPORT}.dvi

diff --git a/scilab_doc/scilabisnotnaive/complexdivision.tex b/scilab_doc/scilabisnotnaive/complexdivision.tex
new file mode 100644
index 0000000..0d726d1
--- /dev/null
+++ b/scilab_doc/scilabisnotnaive/complexdivision.tex
@@ -0,0 +1,591 @@
\section{Complex division}

In this section, we analyze the complex division problem in Scilab.
We highlight why the internal numerical treatment done by Scilab is
sometimes left unused, and why it is sometimes useless, or even
harmful (recall that the purpose of an introduction is to draw the
reader into the rest of the text, which is why we wrote these words
to be as provocative as possible!).

In particular, we detail the difference between the mathematical
definition and the floating point implementation.
We show how the division of complex numbers is performed in Scilab
when the "/" operator is used.
We also show that this implementation is not used consistently
within Scilab.
Finally, we show that the libraries used by the Intel and gfortran
compilers handle the problem correctly.

\subsection{Theory}

\subsubsection{Algebraic computations}

The well-known mathematical formula for the division of two complex
numbers is
\begin{eqnarray}
\frac{a + ib}{c + id} = \frac{ac + bd}{c^2 + d^2} + i \frac{bc - ad}{c^2 + d^2}
\end{eqnarray}
28
\subsubsection{Naive algorithm}

The naive algorithm for the computation of the complex division
is presented in figure \ref{naive-complexdivision}.

\begin{figure}[htbp]
\begin{algorithmic}
\STATE $den \gets c^2 + d^2$
\STATE $e \gets (ac + bd)/ den$
\STATE $f \gets (bc - ad)/ den$
\end{algorithmic}
\caption{Naive algorithm to compute the complex division}
\label{naive-complexdivision}
\end{figure}


\subsection{Experiments}

The naive formula is not robust when floating point numbers are
used \cite{368661}.
In short, the problem occurs when the numbers $a$, $b$, $c$ or $d$
approach the boundaries of the floating point domain, which can
cause an overflow or an underflow.

Suppose that we want to perform the following division

\begin{eqnarray}
\frac{1 + i}{1 + i \times 10^{308}} = 10^{-308} - i \times 10^{-308}
\end{eqnarray}

Let us use the naive mathematical formula to check this division.

\begin{eqnarray}
den &=& c^2 + d^2 = 1^2 + (10^{308})^2 = 1 + 10^{616} \approx 10^{616} \\
e &=& (ac + bd)/ den = (1 \times 1 + 1 \times 10^{308})/10^{616} \approx 10^{308}/10^{616} \approx 10^{-308}\\
f &=& (bc - ad)/ den = (1 \times 1 - 1 \times 10^{308})/10^{616} \approx -10^{308}/10^{616} \approx -10^{-308}
\end{eqnarray}

Numerically, things do not happen this way, because $vmax = 10^{308}$
is approximately the largest value before overflow. More precisely,
$vmax$ is such that any number $v > vmax$ satisfies
\begin{eqnarray}
(v \times 2) / 2 \neq v
\end{eqnarray}
In other words, multiplication and division are no longer accurate
for numbers $v > vmax$.

If we use the naive formula with double precision numbers, then
\begin{eqnarray}
den = c^2 + d^2 = 1^2 + (10^{308})^2 = Inf
\end{eqnarray}
that is, an overflow. The terms $e$ and $f$ are then computed as
\begin{eqnarray}
e = (ac + bd)/ den = (1 \times 1 + 1 \times 10^{308})/Inf = 10^{308}/Inf = 0\\
f = (bc - ad)/ den = (1 \times 1 - 1 \times 10^{308})/Inf = -10^{308}/Inf = 0
\end{eqnarray}
The result is then mathematically wrong but, above all, numerically
inaccurate, since none of the initial numbers was greater than $vmax$.

We can also show that the naive formula can generate underflows.

Suppose that we want to compute the following division:
\begin{eqnarray}
\frac{1 + i}{10^{-308} + i \times 10^{-308}} = 10^{308}
\end{eqnarray}

Let us use the naive mathematical formula to check this division.

\begin{eqnarray}
den &=& c^2 + d^2 = (10^{-308})^2 + (10^{-308})^2 = 2 \times 10^{-616} \\
e &=& (ac + bd)/ den = (1 \times 10^{-308} + 1 \times 10^{-308})/(2 \times 10^{-616}) \approx 10^{308} \\
f &=& (bc - ad)/ den = (1 \times 10^{-308} - 1 \times 10^{-308})/(2 \times 10^{-616}) \approx 0
\end{eqnarray}

With double precision numbers, the computation does not happen this
way. The denominator underflows and is numerically set to zero,
so that

\begin{eqnarray}
den &=& c^2 + d^2 = (10^{-308})^2 + (10^{-308})^2 = 0 \\
e &=& (ac + bd)/ den = (1 \times 10^{-308} + 1 \times 10^{-308})/0 = Inf \\
f &=& (bc - ad)/ den = (1 \times 10^{-308} - 1 \times 10^{-308})/0 = NaN
\end{eqnarray}

\subsubsection{Scilab implementation}

We can easily experiment in the Scilab v5.0.2 console, taking care
to choose a test case which raises the problem.

\begin{verbatim}
a=1+%i*1
b=1+%i*1e308
a/b
\end{verbatim}

The test shows that the implementation in Scilab corresponds to
Smith's method since, in this case, the result is correct:

\begin{verbatim}
-->a=1+%i*1
 a =

 1. + i

-->b=1+%i*1e308
 b =

 1. + 1.000+308i

-->a/b
 ans =

 1.000-308 - 1.000-308i
\end{verbatim}

If we perform the computation step by step, we find that the code
used corresponds to the source code of "wwdiv" in the
elementary\_functions module:

\begin{verbatim}
subroutine wwdiv(ar, ai, br, bi, cr, ci, ierr)
\end{verbatim}

which implements Smith's formula.
But, as we have seen, Smith's algorithm has a limited robustness.

We can also test the limits of Smith's method with the following
experiment:

158\begin{verbatim}
159-->a = 1e307 + %i * 1e-307
160 a =
161 1.000+307 + 1.000-307i
162-->b = 1e205 + %i * 1e-205
163 b =
164 1.000+205 + 1.000-205i
165-->a/b
166 ans =
167 1.000+102
168\end{verbatim}
169
170Cette réponse est fausse, mais correspond bien au résultat
171de la méthode de Smith.
172
173Par ailleurs, il est intéressant de constater que la procédure wwdiv n'est
174pas utilisée systématiquement dans Scilab.
175En effet, on peut trouver des sections de code utilisant l'implémentation
176de la division complexe fournie par le compilateur.

For example, the function lambda = spec(A,B) computes the
generalized eigenvalues of the complex matrices A and B.
The interface to the Lapack routine zggev is implemented in
intzggev, where one can find the following lines

\begin{verbatim}
      do 15 i = 1, N
         zstk(lALPHA-1+i)=zstk(lALPHA-1+i)/zstk(lBETA-1+i)
 15   continue
\end{verbatim}

This piece of code stores the result of the division alpha/beta
into the complex array associated with the variable alpha.
Since both operands are of complex type, the division is
implemented by the Fortran compiler (and, of course, without
using wwdiv).


\subsubsection{Fortran code}

The following Fortran code illustrates all the points presented
above. We test it with the Intel Fortran 10.1 compiler.

\begin{verbatim}

      program rndof
      double precision ar
      double precision ai
      double precision br
      double precision bi
c     Check that naive implementation does not have a bug
      ar = 1.d0
      ai = 2.d0
      br = 3.d0
      bi = 4.d0
      call compare(ar, ai, br, bi)
c     Check that naive implementation is not robust with respect to overflow
      ar = 1.d0
      ai = 1.d0
      br = 1.d0
      bi = 1.d308
      call compare(ar, ai, br, bi)
c     Check that naive implementation is not robust with respect to underflow
      ar = 1.d0
      ai = 1.d0
      br = 1.d-308
      bi = 1.d-308
      call compare(ar, ai, br, bi)
c     Check that Smith implementation is not robust with respect to
c     complicated underflow
      ar = 1.d307
      ai = 1.d-307
      br = 1.d205
      bi = 1.d-205
      call compare(ar, ai, br, bi)
      end

      subroutine naive(ar, ai, br, bi, cr, ci)
      double precision, intent(in) :: ar
      double precision, intent(in) :: ai
      double precision, intent(in) :: br
      double precision, intent(in) :: bi
      double precision, intent(out) :: cr
      double precision, intent(out) :: ci
      double precision den
      den = br * br + bi * bi
      cr = (ar * br + ai * bi) / den
      ci = (ai * br - ar * bi) / den
      end

      subroutine smith(ar, ai, br, bi, cr, ci)
      double precision, intent(in) :: ar
      double precision, intent(in) :: ai
      double precision, intent(in) :: br
      double precision, intent(in) :: bi
      double precision, intent(out) :: cr
      double precision, intent(out) :: ci
      double precision den
      double precision r
      if (abs(br) .ge. abs(bi)) then
         r = bi / br
         den = br + r*bi
         cr = (ar + ai*r) / den
         ci = (ai - ar*r) / den
      else
         r = br / bi
         den = bi + r*br
         cr = (ar*r + ai) / den
         ci = (ai*r - ar) / den
      endif
      end

      subroutine compare(ar, ai, br, bi)
      double precision, intent(in) :: ar
      double precision, intent(in) :: ai
      double precision, intent(in) :: br
      double precision, intent(in) :: bi
      complex*16 a
      complex*16 b
      complex*16 c
      double precision cr
      double precision ci
      print *, "****************"
      call naive(ar, ai, br, bi, cr, ci)
      print *, "Naive :", cr, ci
      call smith(ar, ai, br, bi, cr, ci)
      print *, "Smith :", cr, ci
      a = dcmplx(ar, ai)
      b = dcmplx(br, bi)
      c = a/b
      print *, "Fortran:", c
      end

\end{verbatim}

If we compile this code with Intel Fortran 10.1, we get the
following output in the console.

\begin{verbatim}
 ****************
 c naive: 0.440000000000000 8.000000000000000E-002
 c Smith: 0.440000000000000 8.000000000000000E-002
 c Fortran: (0.440000000000000,8.000000000000000E-002)
 ****************
 c naive: 0.000000000000000E+000 0.000000000000000E+000
 c Smith: 9.999999999999999E-309 -9.999999999999999E-309
 c Fortran: (9.999999999999999E-309,-9.999999999999999E-309)
 ****************
 c naive: Infinity NaN
 c Smith: 1.000000000000000E+308 0.000000000000000E+000
 c Fortran: (1.000000000000000E+308,0.000000000000000E+000)
 ****************
 c naive: 0.000000000000000E+000 0.000000000000000E+000
 c Smith: 1.000000000000000E+102 0.000000000000000E+000
 c Fortran: (9.999999999999999E+101,-9.999999999999999E-309)
\end{verbatim}

The fourth test shows that the implementation provided by the Intel
compiler gives a correct result, whereas Smith's method gives wrong
results.

\subsubsection{C code}

The following C code illustrates how the problem is handled.
It is based on the "double complex" type and works with the Intel
C 11.0 compiler configured for the C99 standard in the Visual
Studio environment.

\begin{verbatim}
#include <stdio.h>
#include <math.h>
#include <complex.h>

//
// naive --
//   Compute the complex division with a naive method.
//
void naive (double ar, double ai, double br, double bi, double * cr, double * ci)
{
  double den;
  den = br * br + bi * bi;
  *cr = (ar * br + ai * bi) / den;
  *ci = (ai * br - ar * bi) / den;
}

//
// smith --
//   Compute the complex division with Smith's method.
//
void smith (double ar, double ai, double br, double bi, double * cr, double * ci)
{
  double den;
  double r;
  double abr;
  double abi;
  abr = fabs(br);
  abi = fabs(bi);
  if ( abr >= abi )
  {
    r = bi / br;
    den = br + r*bi;
    *cr = (ar + ai*r) / den;
    *ci = (ai - ar*r) / den;
  }
  else
  {
    r = br / bi;
    den = bi + r*br;
    *cr = (ar*r + ai) / den;
    *ci = (ai*r - ar) / den;
  }
}

//
// compare --
//   Compare 3 methods for complex division:
//   * naive method
//   * Smith method
//   * C99 method
//
void compare (double ar, double ai, double br, double bi)
{
  double complex a;
  double complex b;
  double complex c;

  double cr;
  double ci;

  printf("****************\n");

  naive(ar, ai, br, bi, &cr, &ci);
  printf("Naif --> c = %e + %e * I\n" , cr , ci );

  smith(ar, ai, br, bi, &cr, &ci);
  printf("Smith --> c = %e + %e * I\n" , cr , ci );

  a = ar + ai*I;
  b = br + bi*I;
  c = a / b;
  printf("C --> c = %e + %e * I\n" , creal(c) , cimag(c) );
}

int main(void)
{
  double ar;
  double ai;
  double br;
  double bi;

  // Check that naive implementation does not have a bug
  ar = 1;
  ai = 2;
  br = 3;
  bi = 4;
  compare (ar, ai, br, bi);

  // Check that naive implementation is not robust with respect to overflow
  ar = 1;
  ai = 1;
  br = 1;
  bi = 1e307;
  compare (ar, ai, br, bi);

  // Check that naive implementation is not robust with respect to underflow
  ar = 1;
  ai = 1;
  br = 1e-308;
  bi = 1e-308;
  compare (ar, ai, br, bi);

  // Check that Smith method is not robust in complicated cases
  ar = 1e307;
  ai = 1e-307;
  br = 1e205;
  bi = 1e-205;
  compare (ar, ai, br, bi);

  return 0;
}
\end{verbatim}

Here is the output which appears in the console:

\begin{verbatim}
****************
Naif --> c = 4.400000e-001 + 8.000000e-002 * I
Smith --> c = 4.400000e-001 + 8.000000e-002 * I
C --> c = 4.400000e-001 + 8.000000e-002 * I
****************
Naif --> c = 0.000000e+000 + -0.000000e+000 * I
Smith --> c = 1.000000e-307 + -1.000000e-307 * I
C --> c = 1.000000e-307 + -1.000000e-307 * I
****************
Naif --> c = 1.#INF00e+000 + -1.#IND00e+000 * I
Smith --> c = 1.000000e+308 + 0.000000e+000 * I
C --> c = 1.000000e+308 + 0.000000e+000 * I
****************
Naif --> c = -1.#IND00e+000 + -0.000000e+000 * I
Smith --> c = 1.000000e+102 + 0.000000e+000 * I
C --> c = 1.000000e+102 + -1.000000e-308 * I
\end{verbatim}

This shows that the implementation of complex numbers in the
library provided by Intel handles the problem properly.

\subsection{Explanations}
\subsubsection{Smith's method}

This is why the authors of Scilab, who had read
\cite{WhatEveryComputerScientist}, implemented Smith's formula
\cite{368661} (but they cite Goldberg as the reference, which is a
mistake):

\begin{algorithmic}
\IF {$( |d| <= |c| )$}
 \STATE $r \gets d / c$
 \STATE $den \gets c + r * d$
 \STATE $e \gets (a + b * r)/ den $
 \STATE $f \gets (b - a * r)/ den $
\ELSE
 \STATE $r \gets c / d$
 \STATE $den \gets d + r * c$
 \STATE $e \gets (a * r + b)/den $
 \STATE $f \gets (b * r - a)/den$
\ENDIF
\end{algorithmic}

In the case $(1+i)/(1+10^{308} i)$, Smith's method performs

\begin{verbatim}
if ( |1e308| <= |1| ) > test false
else
 r = 1 / 1e308 ~ 0
 den = 1e308 + 0 * 1 = 1e308
 e = (1 * 0 + 1) / 1e308 = 1e-308
 f = (1 * 0 - 1) / 1e308 = -1e-308
\end{verbatim}

which is the correct result.

In the case $(1+i)/(10^{-308}+10^{-308} i)$, Smith's method performs

\begin{verbatim}
if ( |1e-308| <= |1e-308| ) > test true
 r = 1e-308 / 1e-308 = 1
 den = 1e-308 + 1 * 1e-308 = 2e-308
 e = (1 + 1 * 1) / 2e-308 = 1e308
 f = (1 - 1 * 1) / 2e-308 = 0
\end{verbatim}

which is, once again, the correct result.

It turns out that Smith's computation, published in 1962, works
well in a number of situations. The article [4] cites an analysis
by Hough which gives a bound on the error of the computed result
\begin{verbatim}
|zcomp - zref| <= eps |zref|
\end{verbatim}

\subsubsection{The limits of Smith's method}

The article \cite{214414} (1985), however, distinguishes between
the norm |zcomp - zref| and the values of the real and imaginary
parts. In particular, it shows an example in which the imaginary
part is wrong.

Suppose that m and n are integers with the following properties
\begin{verbatim}
m >> 0
n >> 0
n >> m
\end{verbatim}

One can then easily show that the following complex division can
be approximated by:

\begin{verbatim}
10^n + i 10^-n
-------------- = 10^(n-m) - i 10^(n-3m)
10^m + i 10^-m
\end{verbatim}

We then choose the numbers n and m smaller than 308, but such that
\begin{verbatim}
n - 3 m = -308
\end{verbatim}

For example, the pair m=205, n=307 satisfies the previous
equalities, so that

\begin{verbatim}
10^307 + i 10^-307
------------------ = 10^102 - i 10^-308
10^205 + i 10^-205
\end{verbatim}

It is easy to see that this last case defeats the naive formula.
It is more surprising that it also defeats Smith's formula.
Indeed, Smith's method performs the following operations

\begin{verbatim}
if ( |1e-205| <= |1e205| ) > test true
 r = 1e-205 / 1e205 = 0 (underflow)
 den = 1e205 + 0 * 1e-205 = 1e205
 e = (10^307 + 10^-307 * 0) / 1e205 = 1e102
 f = (10^-307 - 10^307 * 0) / 1e205 = 0
\end{verbatim}

The real part is exact while the imaginary part is wrong.
One can also check that the modulus of the result is dominated by
the real part, so that the inequality |zcomp - zref| <= eps |zref|
still holds.

The limits of Smith's method were overcome in \cite{214414}.
The proposed algorithm is based on a proposition which shows that
if n numbers x1...xn are representable, then min(xi) * max(xi) is
also representable. The implementation of the complex division
takes advantage of this proposition to perform a correct
computation.

It turns out that Stewart's algorithm is outperformed by the
algorithm of Li et al. \cite{567808}, but also by Kahan's
\cite{KAHAN1987} which, according to \cite{1039814}, is similar to
the one implemented in the C99 standard.


diff --git a/scilab_doc/scilabisnotnaive/conclusion.tex b/scilab_doc/scilabisnotnaive/conclusion.tex
new file mode 100644
index 0000000..04d191a
--- /dev/null
+++ b/scilab_doc/scilabisnotnaive/conclusion.tex
@@ -0,0 +1,21 @@
\section{Conclusion}

We have presented several cases where a mathematically perfect
algorithm (i.e. without obvious bugs) does not produce accurate
results on the computer in particular situations.
In this paper, we have shown that specific methods can be used to
cure some of the problems. We have also shown that these methods
do not cure all the problems.

All Scilab algorithms take floating point values as inputs,
and return floating point values as outputs. Problems arise when
the intermediate calculations involve terms which are not
representable as floating point values.

This article should not discourage us from implementing our own
algorithms. Rather, it should warn us that some specific work is
required when we translate mathematical material into an
algorithm. This article shows that accurate results can be
obtained with floating point numbers, provided that we are less
\emph{naïve}.

diff --git a/scilab_doc/scilabisnotnaive/introduction.tex b/scilab_doc/scilabisnotnaive/introduction.tex
new file mode 100644
index 0000000..bda5c1b
--- /dev/null
+++ b/scilab_doc/scilabisnotnaive/introduction.tex
@@ -0,0 +1,56 @@
\section{Introduction}

There is a wide gap between a numerical formula, as presented in
basic mathematical books, and the implementation of the numerical
algorithm, especially in Scilab.
While the mathematical theory deals with the correctness of the
formulas, \emph{Scilab takes care of your numbers}.

This difficulty comes from the fact that, while mathematics deals
with \emph{real} numbers, the computer deals with their
\emph{floating point representations}.

A central reference on this subject is the article by Goldberg,
"What Every Computer Scientist Should Know About Floating-Point
Arithmetic" \cite{WhatEveryComputerScientist}.
If one focuses on numerical algorithms, the "Numerical Recipes"
\cite{NumericalRecipes} is another good source of solutions for
that problem.
The work of Kahan is also central in this domain, for example
\cite{Kahan2004}.

In this article, we will show examples of these problems using
the following theoretical and experimental approach.
\begin{enumerate}
\item First, we derive the basic theory at the core of a numerical
formula.
\item Then we implement it in Scilab and compare with the result
given by the primitive provided by Scilab.
As we will see, some particular cases do not work well with our
formula, while the Scilab primitive computes a correct result.
\item Then we analyze the \emph{reasons} for the differences.
\end{enumerate}

When we compute errors, we use the relative error formula
\begin{eqnarray}
e_r=\frac{|x_c-x_e|}{|x_e|}, \qquad x_e\neq 0
\end{eqnarray}
where $x_c\in\RR$ is the computed value, and $x_e\in\RR$ is the
expected value, i.e. the mathematically exact result.
The relative error is linked with the number of significant
digits in the computed value $x_c$. For example, if the relative
error is $e_r=10^{-6}$, then the number of significant digits is
approximately 6.

When the expected value is zero, the relative error cannot be
computed, and we then use the absolute error
\begin{eqnarray}
e_a=|x_c-x_e|.
\end{eqnarray}

Before getting into the details, it is important to know that
real variables in the Scilab language are stored in \emph{double
precision} variables. Since Scilab follows the IEEE 754 standard,
this means that real variables are stored with 64 bits of
precision.
As we shall see later, this has a strong influence on the
results.

diff --git a/scilab_doc/scilabisnotnaive/numericalderivative.tex b/scilab_doc/scilabisnotnaive/numericalderivative.tex
new file mode 100644
index 0000000..54f4e52
--- /dev/null
+++ b/scilab_doc/scilabisnotnaive/numericalderivative.tex
@@ -0,0 +1,412 @@
\section{Numerical derivatives}

In this section, we detail the computation of the numerical
derivative of a given function.

In the first part, we briefly recall the first order forward
formula, which is based on Taylor's theorem.
We then present the naïve algorithm based on this mathematical
formula.
In the second part, we make some experiments in Scilab and compare
our naïve algorithm with the \emph{derivative} Scilab primitive.
In the third part, we analyze why and how floating point numbers
must be taken into account when numerical derivatives are to be
computed.

A reference for numerical derivatives is
\cite{AbramowitzStegun1972}, chapter 25. "Numerical Interpolation,
Differentiation and Integration" (p. 875).
The webpage \cite{schimdtnd} and the book \cite{NumericalRecipes}
give results about the rounding errors.

\subsection{Theory}

The basic result is the Taylor formula with one variable \cite{dixmier}

\begin{eqnarray}
f(x+h) &=& f(x)
+ h f^\prime(x)
+\frac{h^2}{2} f^{\prime \prime}(x)
+\frac{h^3}{6} f^{\prime \prime \prime}(x)
+\frac{h^4}{24} f^{\prime \prime \prime \prime}(x) + \mathcal{O}(h^5)
\end{eqnarray}

If we truncate the Taylor expansion of a one variable function $f$
\begin{eqnarray}
f(x+h) &\approx& f(x) + h f^\prime(x) + \frac{h^2}{2} f^{\prime \prime}(x)
\end{eqnarray}
we get the forward difference, which approximates the first
derivative at order 1
\begin{eqnarray}
f^\prime(x) &\approx& \frac{f(x+h) - f(x)}{h} - \frac{h}{2} f^{\prime \prime}(x)
\end{eqnarray}

The naive algorithm to compute the numerical derivative of
a function of one variable is presented in figure \ref{naive-numericalderivative}.

\begin{figure}[htbp]
\begin{algorithmic}
\STATE $f'(x) \gets (f(x+h)-f(x))/h$
\end{algorithmic}
\caption{Naive algorithm to compute the numerical derivative of a function of one variable}
\label{naive-numericalderivative}
\end{figure}

\subsection{Experiments}

The following Scilab function is a straightforward implementation
of the previous algorithm.

\lstset{language=Scilab}
\lstset{numbers=left}
\lstset{basicstyle=\footnotesize}
\lstset{keywordstyle=\bfseries}
\begin{lstlisting}
function fp = myfprime(f,x,h)
  fp = (f(x+h) - f(x))/h;
endfunction
\end{lstlisting}

In our experiments, we will compute the derivative of the
square function $f(x)=x^2$, which is $f'(x)=2x$.
The following Scilab script implements the square function.

\lstset{language=Scilab}
\lstset{numbers=left}
\lstset{basicstyle=\footnotesize}
\lstset{keywordstyle=\bfseries}
\begin{lstlisting}
function y = myfunction (x)
  y = x*x;
endfunction
\end{lstlisting}

The most naïve idea is that the computed relative error is small
when the step $h$ is small. Because \emph{small} is not a priori
clear, we take $\epsilon\approx 10^{-16}$ in double precision as a
good candidate for \emph{small}.
In the following script, we compare the relative error produced
by our naïve method with step $h=\epsilon$ and by the
\emph{derivative} primitive with its default step.

\lstset{language=Scilab}
\lstset{numbers=left}
\lstset{basicstyle=\footnotesize}
\lstset{keywordstyle=\bfseries}
\begin{lstlisting}
x = 1.0;
fpref = derivative(myfunction,x,order=1);
e = abs(fpref-2.0)/2.0;
mprintf("Scilab f''=%e, error=%e\n", fpref,e);
h = 1.e-16;
fp = myfprime(myfunction,x,h);
e = abs(fp-2.0)/2.0;
mprintf("Naive f''=%e, h=%e, error=%e\n", fp,h,e);
\end{lstlisting}

When executed, the previous script prints out:

\begin{verbatim}
Scilab f'=2.000000e+000, error=7.450581e-009
Naive f'=0.000000e+000, h=1.000000e-016, error=1.000000e+000
\end{verbatim}

Our naïve method turns out to be quite inaccurate: it does not
have even one significant digit!
The Scilab primitive, instead, has about 8 significant digits.

Since our faith is based on the truth of the mathematical theory,
deeper experiments must be performed. We make the following
experiment: we take the initial step $h=1.0$ and divide $h$ by 10
at each iteration of a 20-iteration loop.

\lstset{language=Scilab}
\lstset{numbers=left}
\lstset{basicstyle=\footnotesize}
\lstset{keywordstyle=\bfseries}
\begin{lstlisting}
x = 1.0;
fpref = derivative(myfunction,x,order=1);
e = abs(fpref-2.0)/2.0;
mprintf("Scilab f''=%e, error=%e\n", fpref,e);
h = 1.0;
for i=1:20
  h=h/10.0;
  fp = myfprime(myfunction,x,h);
  e = abs(fp-2.0)/2.0;
  mprintf("Naive f''=%e, h=%e, error=%e\n", fp,h,e);
end
\end{lstlisting}

Scilab then produces the following output.

\begin{verbatim}
Scilab f'=2.000000e+000, error=7.450581e-009
Naive f'=2.100000e+000, h=1.000000e-001, error=5.000000e-002
Naive f'=2.010000e+000, h=1.000000e-002, error=5.000000e-003
Naive f'=2.001000e+000, h=1.000000e-003, error=5.000000e-004
Naive f'=2.000100e+000, h=1.000000e-004, error=5.000000e-005
Naive f'=2.000010e+000, h=1.000000e-005, error=5.000007e-006
Naive f'=2.000001e+000, h=1.000000e-006, error=4.999622e-007
Naive f'=2.000000e+000, h=1.000000e-007, error=5.054390e-008
Naive f'=2.000000e+000, h=1.000000e-008, error=6.077471e-009
Naive f'=2.000000e+000, h=1.000000e-009, error=8.274037e-008
Naive f'=2.000000e+000, h=1.000000e-010, error=8.274037e-008
Naive f'=2.000000e+000, h=1.000000e-011, error=8.274037e-008
Naive f'=2.000178e+000, h=1.000000e-012, error=8.890058e-005
Naive f'=1.998401e+000, h=1.000000e-013, error=7.992778e-004
Naive f'=1.998401e+000, h=1.000000e-014, error=7.992778e-004
Naive f'=2.220446e+000, h=1.000000e-015, error=1.102230e-001
Naive f'=0.000000e+000, h=1.000000e-016, error=1.000000e+000
Naive f'=0.000000e+000, h=1.000000e-017, error=1.000000e+000
Naive f'=0.000000e+000, h=1.000000e-018, error=1.000000e+000
Naive f'=0.000000e+000, h=1.000000e-019, error=1.000000e+000
Naive f'=0.000000e+000, h=1.000000e-020, error=1.000000e+000
\end{verbatim}

We see that the relative error first decreases, and then
increases.
The optimal step is approximately $h=10^{-8}$, where the relative
error is approximately $e_r=6\times 10^{-9}$.
We should not be surprised that Scilab computed a derivative
which is close to this optimum.

\subsection{Explanations}

\subsubsection{Floating point implementation}

With a floating point computer, the total error that we get from
the forward difference approximation is (skipping the
multiplicative constants) the sum of the linearization error
$E_l = h$ (i.e. the $\mathcal{O}(h)$ term) and the rounding error
$r f(x)$ on the difference $f(x+h) - f(x)$
\begin{eqnarray}
E = \frac{r f(x)}{h} + \frac{h}{2} f^{\prime \prime}(x)
\end{eqnarray}
When $h\rightarrow 0$, the error is the sum of a term which
diverges toward $+\infty$ and a term which converges toward 0.
The total error is minimized when both terms are equal.
With a single precision computation, the rounding error is
$r = 10^{-7}$, and with a double precision computation, the
rounding error is $r = 10^{-16}$.
191We make here the assumption that the values $f(x)$ and
192$f^{\prime \prime}(x)$ are near 1 so that the error can be written
193\begin{eqnarray}
194E = \frac{r}{h} + h
195\end{eqnarray}
196We want to compute the step $h$ from the rounding error $r$ with a
197step satisfying
198\begin{eqnarray}
199h = r^\alpha
200\end{eqnarray}
201for some $\alpha > 0$.
202The total error is therefore
203\begin{eqnarray}
204E = r^{1-\alpha} + r^\alpha
205\end{eqnarray}
206The total error is minimized when both terms are equal, that is,
207when the exponents are equal $1-\alpha = \alpha$ which leads to
208\begin{eqnarray}
209\alpha = \frac{1}{2}
210\end{eqnarray}
211We conclude that the step which minimizes the error is
212\begin{eqnarray}
213h = r^{1/2}
214\end{eqnarray}
215and the associated error is
216\begin{eqnarray}
217E = 2 r^{1/2}
218\end{eqnarray}
219
220Typical values with single precision are $h = 10^{-4}$ and $E=2 \times 10^{-4}$
221and with double precision $h = 10^{-8}$ and $E=2 \times 10^{-8}$.
222These are the minimum errors which are achievable with a forward difference
223numerical derivative.
224
225To get a meaningful value of the step $h$, the step is scaled
226with respect to the point where the derivative is to be computed
227\begin{eqnarray}
228h = r^{1/2} x
229\end{eqnarray}
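These estimates are easy to check numerically. The following sketch is written in Python rather than Scilab, so that it is self-contained; the test function $f(x)=x^2$ is an assumption of this illustration, not part of the analysis. It compares the forward difference error at the optimal step $h=r^{1/2}$ with the error at a much smaller step:

```python
import math

# Forward difference applied to f(x) = x**2 at x = 1 (exact derivative: 2).
f = lambda x: x * x
fd = lambda h: (f(1.0 + h) - f(1.0)) / h

r = 2.0 ** -52                # double precision rounding error
h_opt = math.sqrt(r)          # optimal step h = r**(1/2), about 1.5e-8

err_opt = abs(fd(h_opt) - 2.0) / 2.0     # near r**(1/2), as predicted
err_tiny = abs(fd(1.0e-15) - 2.0) / 2.0  # rounding error dominates
```

With the optimal step the relative error stays close to $r^{1/2}\approx 10^{-8}$, while the much smaller step $10^{-15}$ produces an error several orders of magnitude larger.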
230
231One can generalize the previous computation with the
232assumption that the scaling parameter from the Taylor
233expansion is $h^{\alpha_1}$ and the order of the formula
234is $\mathcal{O}(h^{\alpha_2})$. The total error is then
235\begin{eqnarray}
236E = \frac{r}{h^{\alpha_1}} + h^{\alpha_2}
237\end{eqnarray}
238The optimal step is then
239\begin{eqnarray}
240h = r^{\frac{1}{\alpha_1 + \alpha_2}}
241\end{eqnarray}
242and the associated error is
243\begin{eqnarray}
244E = 2 r^{\frac{\alpha_2}{\alpha_1 + \alpha_2}}
245\end{eqnarray}
246
247
248An additional trick \cite{NumericalRecipes} is to compute the
249step $h$ so that the rounding error for the sum $x+h$ is minimum.
250This is performed by the following algorithm, which uses a temporary
251variable $t$
252\begin{eqnarray}
253t = x + h\\
254h = t - x
255\end{eqnarray}
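The effect of this trick can be observed in any IEEE 754 arithmetic; a minimal Python sketch follows, where the values of $x$ and $h$ are purely illustrative:

```python
x = 1.0
h = 1.0e-8    # not an exact increment: x + h must be rounded
t = x + h     # t is the rounded sum
h2 = t - x    # adjusted step; since t and x are close, t - x is computed exactly

# x + h2 now evaluates to t with no rounding error, so the denominator of the
# difference quotient matches the actual distance between the sample points.
print(h2 != h)             # True: the adjusted step differs from the requested one
print((x + h2) - x == h2)  # True: x + h2 is exact
```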
256
257
258\subsubsection{Results}
259
260In the following results, the variable $x$ is either a
261scalar $x\in \RR$ or a vector $x\in \RR^n$.
262When $x$ is a vector, the step $h_i$ is defined by
263\begin{eqnarray}
264h_i = h \, (0,\ldots,0,1,0,\ldots,0)^T
265\end{eqnarray}
266so that the only non-zero component of $h_i$ is the $i$-th component.
267
268\begin{itemize}
269
270\item First derivative : forward 2 points
271\begin{eqnarray}
272f^\prime(x) &\approx& \frac{f(x+h) - f(x)}{h} + \mathcal{O}(h)
273\end{eqnarray}
274Optimal step : $h = r^{1/2}$ and error $E=2r^{1/2}$.\\
275Single precision : $h \approx 10^{-4}$ and $E\approx 10^{-4}$.\\
276Double precision $h \approx 10^{-8}$ and $E\approx 10^{-8}$.
277
278\item First derivative : backward 2 points
279\begin{eqnarray}
280f^\prime(x) &\approx& \frac{f(x) - f(x-h)}{h} + \mathcal{O}(h)
281\end{eqnarray}
282Optimal step : $h = r^{1/2}$ and error $E=2r^{1/2}$.\\
283Single precision : $h \approx 10^{-4}$ and $E\approx 10^{-4}$.\\
284Double precision $h \approx 10^{-8}$ and $E\approx 10^{-8}$.
285
286\item First derivative : centered 2 points
287\begin{eqnarray}
288f^\prime(x) &\approx& \frac{f(x+h) - f(x-h)}{2h} + \mathcal{O}(h^2)
289\end{eqnarray}
290Optimal step : $h = r^{1/3}$ and error $E=2r^{2/3}$.\\
291Single precision : $h \approx 10^{-3}$ and $E\approx 10^{-5}$.\\
292Double precision $h \approx 10^{-5}$ and $E\approx 10^{-10}$.
293
294\end{itemize}
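These orders can be verified experimentally. The following Python sketch (with the arbitrary test function $f=\exp$ at $x=1$, an assumption of this illustration) evaluates the forward and centered formulas at their respective optimal steps:

```python
import math

f, x, exact = math.exp, 1.0, math.exp(1.0)  # f'(x) = exp(x)
r = 2.0 ** -52                              # double precision rounding error

h_fwd = r ** 0.5                            # optimal forward step, ~1.5e-8
h_ctr = r ** (1.0 / 3.0)                    # optimal centered step, ~6.1e-6

fwd = (f(x + h_fwd) - f(x)) / h_fwd
ctr = (f(x + h_ctr) - f(x - h_ctr)) / (2.0 * h_ctr)

err_fwd = abs(fwd - exact) / exact          # on the order of r**(1/2)
err_ctr = abs(ctr - exact) / exact          # on the order of r**(2/3)
```

The centered formula, at its optimal step, typically gains two or three digits over the forward formula, in agreement with the table above.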
295
296\subsubsection{Robust algorithm}
297
298The robust algorithm to compute the numerical derivative of
299a function of one variable is presented in figure \ref{robust-numericalderivative}.
300
301\begin{figure}[htbp]
302\begin{algorithmic}
303\STATE $h \gets \sqrt{\epsilon}$
304\STATE $f'(x) \gets (f(x+h)-f(x))/h$
305\end{algorithmic}
306\caption{A more robust algorithm to compute the numerical derivative of a function of one variable}
307\label{robust-numericalderivative}
308\end{figure}
309
310\subsection{One step further}
311
312As we can see, the \emph{derivative} command is based on
313a sound understanding of floating point arithmetic. Is it completely \emph{bulletproof}?
314Not exactly.
315
316See for example the following Scilab session.
317
318\begin{verbatim}
319-->fp = derivative(myfunction,1.e-100,order=1)
320 fp =
321 0.0000000149011611938477
322-->fe=2.e-100
323 fe =
324 2.000000000000000040-100
325-->e = abs(fp-fe)/fe
326 e =
327 7.450580596923828243D+91
328\end{verbatim}
329
330The exact answer is $x_e=2 \times 10^{-100}$ so that the
331result does not have any significant digit.
332
335The explanation is that the step is computed with $h = \sqrt{\epsilon}\approx 10^{-8}$.
336Then $f(x+h)=f(10^{-100} + 10^{-8}) \approx f(10^{-8}) = 10^{-16}$, because the
337term $10^{-100}$ is much smaller than $10^{-8}$. The result of the
338computation is therefore $(f(x+h) - f(x))/h = (10^{-16} - 10^{-200})/10^{-8} \approx 10^{-8}$.
339
340The additional experiment
341
342\begin{verbatim}
343-->sqrt(%eps)
344 ans =
345 0.0000000149011611938477
346\end{verbatim}
347
348explains the final result.
349
350To improve the accuracy of the computation, one can take control of the
351step $h$. A reasonable solution is to use $h=\sqrt{\epsilon} \cdot x$ so that the
352step is scaled depending on $x$.
353The following script illustrates that method, which produces
354results with 8 significant digits.
355
356\begin{verbatim}
357-->fp = derivative(myfunction,1.e-100,order=1,h=sqrt(%eps)*1.e-100)
358 fp =
359 2.000000013099139394-100
360-->fe=2.e-100
361 fe =
362 2.000000000000000040-100
363-->e = abs(fp-fe)/fe
364 e =
365 0.0000000065495696770794
366\end{verbatim}
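The same experiment can be sketched outside Scilab; in the following Python fragment, $f(x)=x^2$ stands in for \emph{myfunction} (an assumption of this illustration):

```python
import math

f = lambda x: x * x       # f'(x) = 2x
x = 1.0e-100
eps = 2.0 ** -52

h_fixed = math.sqrt(eps)          # default step, far too large for this x
h_scaled = math.sqrt(eps) * x     # step scaled by x

d_fixed = (f(x + h_fixed) - f(x)) / h_fixed     # ~1.5e-8 : meaningless
d_scaled = (f(x + h_scaled) - f(x)) / h_scaled  # ~2e-100 : ~8 digits correct

err_fixed = abs(d_fixed - 2.0e-100) / 2.0e-100
err_scaled = abs(d_scaled - 2.0e-100) / 2.0e-100
```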
367
368But when $x$ is exactly zero, the scaling method cannot work, because
369it would produce the step $h=0$, and therefore a division by zero
370exception. In that case, the default step provides good accuracy.
371
372The \emph{numdiff} command uses the step
373\begin{eqnarray}
374h=\sqrt{\epsilon}(1+10^{-3}|x|)
375\end{eqnarray}
376
377As we can see in the following session, the behaviour is approximately
378the same when the value of $x$ is 1.
379
380\begin{verbatim}
381-->fp = numdiff(myfunction,1.0)
382 fp =
383 2.0000000189353417390237
384-->fe=2.0
385 fe =
386 2.
387-->e = abs(fp-fe)/fe
388 e =
389 9.468D-09
390\end{verbatim}
391
392The accuracy is slightly decreased with respect to the optimal
393value 7.450581e-009 which was produced by \emph{derivative}. But the number
394of significant digits is approximately the same, i.e. 8 digits.
395
396The goal of this step is to produce good accuracy when the value of $x$
397is large. In that case, the \emph{numdiff} command produces accurate
398results, while \emph{derivative} performs poorly.
399
400\begin{verbatim}
401-->numdiff(myfunction,1.e10)
402 ans =
403 2.000D+10
404-->derivative(myfunction,1.e10,order=1)
405 ans =
406 0.
407\end{verbatim}
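This behaviour can be reproduced with the step formulas quoted above; a Python sketch, again with $f(x)=x^2$ as the assumed test function:

```python
import math

f = lambda x: x * x
eps = 2.0 ** -52
x = 1.0e10

h_derivative = math.sqrt(eps)                         # step used by derivative
h_numdiff = math.sqrt(eps) * (1.0 + 1.0e-3 * abs(x))  # step used by numdiff

# sqrt(eps) ~ 1.5e-8 is below the spacing of doubles near 1e10 (~2e-6),
# so x + h_derivative rounds back to x and the quotient is exactly zero.
d1 = (f(x + h_derivative) - f(x)) / h_derivative
d2 = (f(x + h_numdiff) - f(x)) / h_numdiff   # close to the exact value 2e10
```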
408
409This step is a trade-off because it allows one to keep good accuracy with
410large values of $x$, but produces a slightly sub-optimal step size when
411$x$ is near 1. The behaviour near zero is the same.
412
diff --git a/scilab_doc/scilabisnotnaive/quadratic.tex b/scilab_doc/scilabisnotnaive/quadratic.tex
new file mode 100644
index 0000000..536f829
--- /dev/null
+++ b/scilab_doc/scilabisnotnaive/quadratic.tex
@@ -0,0 +1,504 @@
1\section{Quadratic equation}
2
3In this section, we detail the computation of the roots of a quadratic polynomial.
4As we shall see, there is a whole world from the mathematics formulas to the
5implementation of such computations. In the first part, we briefly report the formulas which allow
6one to compute the real roots of a quadratic equation with real coefficients.
7We then present the naïve algorithm based on these mathematical formulas.
8In the second part, we make some experiments in Scilab and compare our
9naïve algorithm with the \emph{roots} Scilab primitive.
10In the third part, we analyse
11why and how floating point numbers must be taken into account when the
12implementation of such roots is required.
13
14\subsection{Theory}
15
16We consider the following quadratic equation, with real
17coefficients $a, b, c \in \RR$ \cite{wikipediaquadratic,wikipedialossofsign,mathworldquadratic} :
18
19\begin{eqnarray}
20a x^2 + b x + c = 0.
21\end{eqnarray}
22
23The real roots of the quadratic equations are
24\begin{eqnarray}
25x_- &=& \frac{-b- \sqrt{b^2-4ac}}{2a}, \label{real:x-}\\
26x_+ &=& \frac{-b+ \sqrt{b^2-4ac}}{2a}, \label{real:x+}
27\end{eqnarray}
28with the hypothesis that the discriminant $\Delta=b^2-4ac$
29is positive.
30
31The naive, simplified algorithm to compute the roots of the
32quadratic is presented in figure \ref{naive-quadratic}.
33
34\begin{figure}[htbp]
35\begin{algorithmic}
36\STATE $\Delta\gets b^2-4ac$
37\STATE $s\gets \sqrt{\Delta}$
38\STATE $x_-\gets (-b-s)/(2a)$
39\STATE $x_+\gets (-b+s)/(2a)$
40\end{algorithmic}
41\caption{Naive algorithm to compute the real roots of a quadratic equation}
42\label{naive-quadratic}
43\end{figure}
44
45\subsection{Experiments}
46
47The following Scilab function is a straightforward implementation
48of the previous formulas.
49
50\lstset{language=Scilab}
51\lstset{numbers=left}
52\lstset{basicstyle=\footnotesize}
53\lstset{keywordstyle=\bfseries}
54\begin{lstlisting}
55function r=myroots(p)
56 c=coeff(p,0);
57 b=coeff(p,1);
58 a=coeff(p,2);
59 r=zeros(2,1);
60 r(1)=(-b+sqrt(b^2-4*a*c))/(2*a);
61 r(2)=(-b-sqrt(b^2-4*a*c))/(2*a);
62endfunction
63\end{lstlisting}
64
65The goal of this section is to show that some additional
66work is necessary to compute the roots of the quadratic equation
67with sufficient accuracy.
68We will especially pay attention to rounding errors and
69overflow problems.
70In this section, we show that the \emph{roots} command
71of the Scilab language is not \emph{naive}, in the sense that it
72takes into account the floating point implementation details
73that we will see in the next section.
74
75
76
77\subsubsection{Rounding errors}
78
79We analyse the rounding errors which
80appear when the discriminant of the quadratic equation
81is such that $b^2\approx 4ac$.
82We consider the following quadratic equation
83\begin{eqnarray}
84\epsilon x^2 + (1/\epsilon)x - \epsilon = 0
85\end{eqnarray}
86with $\epsilon=0.0001=10^{-4}$.
87
88The two real solutions of the quadratic equation are
89\begin{eqnarray}
90x_- &=& \frac{-1/\epsilon- \sqrt{1/\epsilon^2+4\epsilon^2}}{2\epsilon} \approx -1/\epsilon^2, \\
91x_+ &=& \frac{-1/\epsilon+ \sqrt{1/\epsilon^2+4\epsilon^2}}{2\epsilon} \approx \epsilon^2
92\end{eqnarray}
93
94The following Scilab script shows an example of the computation
95of the roots of such a polynomial with the \emph{roots}
96primitive and with a naive implementation.
97Only the positive root $x_+ \approx \epsilon^2$ is considered in this
98test (the $x_-$ root is such that $x_- \rightarrow -\infty$ in both
99implementations).
100
101\lstset{language=Scilab}
102\lstset{numbers=left}
103\lstset{basicstyle=\footnotesize}
104\lstset{keywordstyle=\bfseries}
105\begin{lstlisting}
106p=poly([-0.0001 10000.0 0.0001],"x","coeff");
107e1 = 1e-8;
108roots1 = myroots(p);
109r1 = roots1(1);
110roots2 = roots(p);
111r2 = roots2(1);
112error1 = abs(r1-e1)/e1;
113error2 = abs(r2-e1)/e1;
114printf("Expected : %e\n", e1);
115printf("Naive method : %e (error=%e)\n", r1,error1);
116printf("Scilab method : %e (error=%e)\n", r2, error2);
117\end{lstlisting}
118
119The script then prints out :
120
121\begin{verbatim}
122Expected : 1.000000e-008
123Naive method : 9.094947e-009 (error=9.050530e-002)
124Scilab method : 1.000000e-008 (error=1.654361e-016)
125\end{verbatim}
126
127The result is astonishing, since the naive root has
128no correct digit and a relative error which is 14 orders
129of magnitude greater than the relative error of the Scilab root.
130
131The explanation for such a behaviour is that the expression of the
132positive root is the following
133
134\begin{eqnarray}
135x_+ &=& \frac{-1/\epsilon+ \sqrt{1/\epsilon^2+4\epsilon^2}}{2\epsilon}
136\end{eqnarray}
137
138and is numerically evaluated as
139
140\begin{verbatim}
141sqrt(1/eps^2 + 4*eps^2) = 10000.000000000001818989
142\end{verbatim}
143
144As we see, the first digits are correct, but the last digits
145are polluted with rounding errors. When the expression $-1/\epsilon+ \sqrt{1/\epsilon^2+4\epsilon^2}$
146is evaluated, the following computations are performed~:
147
148\begin{verbatim}
149-1/eps + sqrt(1/eps^2 + 4*eps^2)
150 = -10000.0 + 10000.000000000001818989
151 = 0.0000000000018189894035
152\end{verbatim}
153
154The user may think that the result is extreme, but it
155is not. Reducing further the value of $\epsilon$ down to
156$\epsilon=10^{-11}$, we get the following output :
157
158\begin{verbatim}
159Expected : 1.000000e-022
160Naive method : 0.000000e+000 (error=1.000000e+000)
161Scilab method : 1.000000e-022 (error=1.175494e-016)
162\end{verbatim}
163
164The relative error is this time 16 orders of magnitude
165greater than the relative error of the Scilab root.
166In fact, the naive implementation computes a false root $x_+$ even for
167a value of epsilon equal to $\epsilon=10^{-3}$, where the relative
168error is 7 times greater than the relative error produced by the
169\emph{roots} primitive.
170
171\subsubsection{Overflow}
172
173In this section, we analyse the overflow exception which
174appears when the discriminant of the quadratic equation
175is such that $b^2 \gg 4ac$.
176We consider the following quadratic equation
177\begin{eqnarray}
178x^2 + (1/\epsilon)x + 1 = 0
179\end{eqnarray}
180with $\epsilon\rightarrow 0$.
181
182The roots of this equation are
183\begin{eqnarray}
184x_- &\approx& -1/\epsilon \rightarrow -\infty, \qquad \epsilon \rightarrow 0\\
185x_+ &\approx& -\epsilon \rightarrow 0^-, \qquad \epsilon \rightarrow 0
186\end{eqnarray}
187To create a difficult case, we search $\epsilon$ so that
188$1/\epsilon^2 = 10^{310}$, because we know that $10^{308}$
189is the maximum value available with double precision floating
190point numbers. The solution is $\epsilon=10^{-155}$.
191
192The following Scilab script shows an example of the computation
193of the roots of such a polynomial with the \emph{roots}
194primitive and with a naive implementation.
195
196\lstset{language=Scilab}
197\lstset{numbers=left}
198\lstset{basicstyle=\footnotesize}
199\lstset{keywordstyle=\bfseries}
200\begin{lstlisting}
201// Test #3 : overflow because of b
202e=1.e-155
203a = 1;
204b = 1/e;
205c = 1;
206p=poly([c b a],"x","coeff");
207expected = [-e;-1/e];
208roots1 = myroots(p);
209roots2 = roots(p);
210error1 = abs(roots1-expected)/norm(expected);
211error2 = abs(roots2-expected)/norm(expected);
212printf("Expected : %e %e\n", expected(1),expected(2));
213printf("Naive method : %e %e (error=%e)\n", roots1(1),roots1(2),error1);
214printf("Scilab method : %e %e (error=%e)\n", roots2(1),roots2(2), error2);
215\end{lstlisting}
216
217The script then prints out :
218
219\begin{verbatim}
220Expected : -1.000000e-155 -1.000000e+155
221Naive method : Inf Inf (error=Nan)
222Scilab method : -1.000000e-155 -1.000000e+155 (error=0.000000e+000)
223\end{verbatim}
224
225As we see, the $b^2-4ac$ term has been evaluated as $1/\epsilon^2-4$,
226which is approximately equal to $10^{310}$. This number cannot
227be represented as a floating point number. It therefore produces the
228IEEE overflow exception and sets the result to \emph{Inf}.
229
230\subsection{Explanations}
231
232The technical report by G. Forsythe \cite{Forsythe1966} is
233especially interesting on that subject. The paper by Goldberg
234\cite{WhatEveryComputerScientist} is also a good reference for the
235quadratic equation. One can also consult the experiments performed by
236Nievergelt in \cite{Nievergelt2003}.
237
238The following tricks are extracted from the
239\emph{quad} routine of the \emph{RPOLY} algorithm by
240Jenkins \cite{Jenkins1975}. This algorithm is used by Scilab in the
241roots primitive, where a special case is handled when the
242degree of the equation is equal to 2, i.e. a quadratic equation.
243
244\subsubsection{Properties of the roots}
245
246One can easily show that the sum and the product of the roots
247allow one to recover the coefficients of the equation which was solved.
248One can show that
249\begin{eqnarray}
250x_- + x_+ &=&\frac{-b}{a}\\
251x_- x_+ &=&\frac{c}{a}
252\end{eqnarray}
253Put in another form, one can state that the computed roots are
254solutions of the normalized equation
255\begin{eqnarray}
256x^2 - \left(x_- + x_+\right) x + x_- x_+ &=&0
257\end{eqnarray}
258
259Another transformation leads to an alternative form of the roots.
260The original quadratic equation can be written as a quadratic
261equation on $1/x$
262\begin{eqnarray}
263c(1/x)^2 + b (1/x) + a &=&0
264\end{eqnarray}
265Using the previous expressions for the solution of $ax^2+bx+c=0$ leads to the
266following expression of the roots of the quadratic equation when the
267discriminant is positive
268\begin{eqnarray}
269x_- &=& \frac{2c}{-b+ \sqrt{b^2-4ac}}, \label{real:x-inverse}\\
270x_+ &=& \frac{2c}{-b- \sqrt{b^2-4ac}} \label{real:x+inverse}
271\end{eqnarray}
272These roots can also be computed from \ref{real:x-}, by multiplying
273the numerator and the denominator by $-b+ \sqrt{b^2-4ac}$.
274
275\subsubsection{Conditioning of the problem}
276
277The conditioning of the problem may be evaluated with the
278computation of the partial derivatives of the roots of the
279equations with respect to the coefficients.
280These partial derivatives measure the sensitivity of the
281roots of the equation with respect to small errors which might
282pollute the coefficients of the quadratic equation.
283
284In the following, we note $x_-=\frac{-b- \sqrt{\Delta}}{2a}$
285and $x_+=\frac{-b+ \sqrt{\Delta}}{2a}$ when $a\neq 0$.
286If the discriminant is strictly positive and $a\neq 0$, i.e. if the roots
287of the quadratic are real, the partial derivatives of the
288roots are the following :
289\begin{eqnarray}
290\frac{\partial x_-}{\partial a} &=& \frac{c}{a\sqrt{\Delta}} + \frac{b+\sqrt{\Delta}}{2a^2}, \qquad a\neq 0, \qquad \Delta\neq 0\\
291\frac{\partial x_+}{\partial a} &=& -\frac{c}{a\sqrt{\Delta}} + \frac{b-\sqrt{\Delta}}{2a^2}\\
292\frac{\partial x_-}{\partial b} &=& \frac{-1-b/\sqrt{\Delta}}{2a}\\
293\frac{\partial x_+}{\partial b} &=& \frac{-1+b/\sqrt{\Delta}}{2a}\\
294\frac{\partial x_-}{\partial c} &=& \frac{1}{\sqrt{\Delta}}\\
295\frac{\partial x_+}{\partial c} &=& -\frac{1}{\sqrt{\Delta}}
296\end{eqnarray}
297
298If the discriminant is zero, the partial derivatives of the
299double real root are the following :
300\begin{eqnarray}
301\frac{\partial x_\pm}{\partial a} &=& \frac{b}{2a^2}, \qquad a\neq 0\\
302\frac{\partial x_\pm}{\partial b} &=& \frac{-1}{2a}\\
303\frac{\partial x_\pm}{\partial c} &=& 0
304\end{eqnarray}
305
306The partial derivatives indicate that if $a\approx 0$ or $\Delta\approx 0$,
307the problem is ill-conditioned.
308
309
310
311\subsubsection{Floating-Point implementation : fixing rounding error}
312
313In this section, we show how to compute the roots of a
314quadratic equation with protection against rounding
315errors, protection against overflow and a minimum
316amount of multiplications and divisions.
317
318Few but important references deal with floating point
319implementations of the roots of a quadratic polynomial.
320These references include the important paper \cite{WhatEveryComputerScientist} by Goldberg,
321the Numerical Recipes \cite{NumericalRecipes}, chapter 5, section 5.6
322and \cite{FORSYTHE1991}, \cite{Nievergelt2003}, \cite{Kahan2004}.
323
324The starting point is the mathematical solution of the quadratic equation,
325depending on the sign of the discriminant $\Delta=b^2 - 4ac$ :
326\begin{itemize}
327\item If $\Delta> 0$, there are two real roots,
328\begin{eqnarray}
329x_\pm &=& \frac{-b\pm \sqrt{\Delta}}{2a}, \qquad a\neq 0
330\end{eqnarray}
331\item If $\Delta=0$, there is one double root,
332\begin{eqnarray}
333x_\pm &=& -\frac{b}{2a}, \qquad a\neq 0
334\end{eqnarray}
335\item If $\Delta< 0$,
336\begin{eqnarray}
337x_\pm &=&\frac{-b}{2a} \pm i \frac{\sqrt{-\Delta}}{2a}, \qquad a\neq 0
338\end{eqnarray}
339\end{itemize}
340
341
342In the following, we make the hypothesis that $a\neq 0$.
343
344The previous experiments suggest that the floating point implementation
345must deal with two different problems :
346\begin{itemize}
347\item rounding errors when $b^2\approx 4ac$, because of the cancellation of
348terms which have opposite signs,
349\item overflow in the computation of the discriminant $\Delta$ when $b$ is
350large in magnitude with respect to $a$ and $c$.
351\end{itemize}
352
353When $\Delta>0$, the rounding error problem can be split in two cases
354\begin{itemize}
355\item if $b<0$, then $-b-\sqrt{b^2-4ac}$ may suffer from rounding errors,
356\item if $b>0$, then $-b+\sqrt{b^2-4ac}$ may suffer from rounding errors.
357\end{itemize}
358
359Obviously, the rounding problem will not appear when $\Delta<0$,
360since the complex roots do not use the sum $-b+\sqrt{b^2-4ac}$.
361When $\Delta=0$, the double root does not cause further trouble.
362The rounding error problem must be solved only when $\Delta>0$ and the
363equation has two real roots.
364
365A possible solution may be found by combining the following expressions for the
366roots
367\begin{eqnarray}
368x_- &=& \frac{-b- \sqrt{b^2-4ac}}{2a}, \label{real:x-2}\\
369x_- &=& \frac{2c}{-b+ \sqrt{b^2-4ac}}, \label{real:x-inverse2}\\
370x_+ &=& \frac{-b+ \sqrt{b^2-4ac}}{2a}, \label{real:x+2}\\
371x_+ &=& \frac{2c}{-b- \sqrt{b^2-4ac}} \label{real:x+inverse2}
372\end{eqnarray}
373
374The trick is to pick the formula so that the sign of $b$ is the
375same as the sign of the square root.
376
377The following choice allows one to solve the rounding error problem
378\begin{itemize}
379\item compute $x_-$ : if $b<0$, then compute $x_-$ from \ref{real:x-inverse2}, else
380(if $b>0$), compute $x_-$ from \ref{real:x-2},
381\item compute $x_+$ : if $b<0$, then compute $x_+$ from \ref{real:x+2}, else
382(if $b>0$), compute $x_+$ from \ref{real:x+inverse2}.
383\end{itemize}
384
385The rounding error problem can be addressed by considering the
386modified Fagnano formulas
387\begin{eqnarray}
388x_1 &=& -\frac{2c}{b+sgn(b)\sqrt{b^2-4ac}}, \\
389x_2 &=& -\frac{b+sgn(b)\sqrt{b^2-4ac}}{2a},
390\end{eqnarray}
391where
392\begin{eqnarray}
393sgn(b)=\left\{\begin{array}{l}
3941, \textrm{ if } b\geq 0,\\
395-1, \textrm{ if } b< 0,
396\end{array}\right.
397\end{eqnarray}
398The roots $x_{1,2}$ correspond to $x_{+,-}$ so that if $b<0$, $x_1=x_-$ and
399if $b>0$, $x_1=x_+$. On the other hand, if $b<0$, $x_2=x_+$ and
400if $b>0$, $x_2=x_-$.
401
402An additional remark is that the division by two (and the multiplication
403by 2) is exact with floating point numbers, so these operations
404cannot be a source of problem. But it is
405interesting to use $b/2$, which involves only one division, instead
406of the three multiplications $2*c$, $2*a$ and $4*a*c$.
407This leads to the following expressions of the real roots
408\begin{eqnarray}
409x_1 &=& -\frac{c}{(b/2)+sgn(b)\sqrt{(b/2)^2-ac}}, \\
410x_2 &=& -\frac{(b/2)+sgn(b)\sqrt{(b/2)^2-ac}}{a},
411\end{eqnarray}
412which can be simplified into
413\begin{eqnarray}
414b'&=&b/2\\
415h&=& -\left(b'+sgn(b)\sqrt{b'^2-ac}\right)\\
416x_1 &=& \frac{c}{h}, \\
417x_2 &=& \frac{h}{a},
418\end{eqnarray}
419where the discriminant is positive, i.e. $b'^2-ac>0$.
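The benefit of this formulation can be checked on the ill-conditioned polynomial used in the experiments section. The following Python sketch (the helper names are ours, not from the original) compares the textbook formulas with the $b'$, $h$ formulation above:

```python
import math

def naive_roots(a, b, c):
    # Textbook formulas: subject to catastrophic cancellation.
    s = math.sqrt(b * b - 4.0 * a * c)
    return ((-b + s) / (2.0 * a), (-b - s) / (2.0 * a))

def stable_roots(a, b, c):
    # x1 = c/h, x2 = h/a with h = -(b' + sgn(b) sqrt(b'^2 - ac)), b' = b/2.
    bp = b / 2.0
    s = math.sqrt(bp * bp - a * c)
    g = 1.0 if b >= 0.0 else -1.0
    h = -(bp + g * s)
    return (c / h, h / a)

# eps*x^2 + (1/eps)*x - eps = 0 with eps = 1e-4; the positive root is ~1e-8.
a, b, c = 1.0e-4, 1.0e4, -1.0e-4
err_naive = abs(naive_roots(a, b, c)[0] - 1.0e-8) / 1.0e-8
err_stable = abs(stable_roots(a, b, c)[0] - 1.0e-8) / 1.0e-8
```

The naive positive root loses all significant digits, while the stable form recovers it to machine precision.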
420
421One can use the same value $b'=b/2$ with the complex roots in the
422case where the discriminant is negative, i.e. $b'^2-ac<0$ :
423\begin{eqnarray}
424x_1 &=& -\frac{b'}{a} - i \frac{\sqrt{ac-b'^2}}{a}, \\
425x_2 &=& -\frac{b'}{a} + i \frac{\sqrt{ac-b'^2}}{a},
426\end{eqnarray}
427
428A more robust algorithm, based on the previous analysis, is presented in figure \ref{robust-quadratic}.
429By comparing figures \ref{naive-quadratic} and \ref{robust-quadratic}, we can see that
430the two algorithms differ in many points.
431
432\begin{figure}[htbp]
433\begin{algorithmic}
434\IF {$a=0$}
435 \IF {$b=0$}
436 \STATE $x_-\gets 0$
437 \STATE $x_+\gets 0$
438 \ELSE
439 \STATE $x_-\gets -c/b$
440 \STATE $x_+\gets 0$
441 \ENDIF
442\ELSIF {$c=0$}
443 \STATE $x_-\gets -b/a$
444 \STATE $x_+\gets 0$
445\ELSE
446 \STATE $b'\gets b/2$
447 \STATE $\Delta\gets b'^2 - ac$
448 \IF {$\Delta<0$}
449 \STATE $s\gets \sqrt{-\Delta}$
450 \STATE $x_1^R\gets -b'/a$
451 \STATE $x_1^I\gets -s/a$
452 \STATE $x_2^R\gets x_1^R$
453 \STATE $x_2^I\gets -x_1^I$
454 \ELSIF {$\Delta=0$}
455 \STATE $x_1\gets -b'/a$
456 \STATE $x_2\gets x_1$
457 \ELSE
458 \STATE $s\gets \sqrt{\Delta}$
459 \IF {$b>0$}
460 \STATE $g=1$
461 \ELSE
462 \STATE $g=-1$
463 \ENDIF
464 \STATE $h=-(b'+g*s)$
465 \STATE $x_1\gets c/h$
466 \STATE $x_2\gets h/a$
467 \ENDIF
468\ENDIF
469\end{algorithmic}
470\caption{A more robust algorithm to compute the roots of a quadratic equation}
471\label{robust-quadratic}
472\end{figure}
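The figure can be transcribed almost line for line. The following sketch is written in Python rather than Scilab, so that the complex branch is easy to exercise; it is an illustration of the algorithm, not the actual \emph{roots} code:

```python
import math

def quadratic_roots(a, b, c):
    """Roots of a*x^2 + b*x + c = 0, following the robust algorithm."""
    if a == 0.0:
        # Degenerate (linear) equation, as in the figure.
        return (0.0, 0.0) if b == 0.0 else (-c / b, 0.0)
    if c == 0.0:
        return (-b / a, 0.0)
    bp = b / 2.0                    # exact operation
    delta = bp * bp - a * c
    if delta < 0.0:                 # two conjugate complex roots
        s = math.sqrt(-delta)
        return (complex(-bp / a, -s / a), complex(-bp / a, s / a))
    if delta == 0.0:                # one double real root
        x = -bp / a
        return (x, x)
    s = math.sqrt(delta)
    g = 1.0 if b > 0.0 else -1.0    # sgn(b)
    h = -(bp + g * s)               # no cancellation: bp and g*s share the same sign
    return (c / h, h / a)
```

For instance, `quadratic_roots(1.0, -3.0, 2.0)` returns the pair `(1.0, 2.0)` and `quadratic_roots(1.0, 0.0, 1.0)` returns the conjugate pair $\pm i$.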
473
474\subsubsection{Floating-Point implementation : fixing overflow problems}
475
476The remaining problem is to compute $b'^2-ac$ without creating
477unnecessary overflows.
478
479Notice that a small improvement
480has already been done : if $|b|$ is close to the upper bound $10^{154}$,
481then $|b'|$ may be less difficult to process since $|b'|=|b|/2 < |b|$.
482One can then compute the square root by using normalization methods,
483so that the overflow problem can be drastically reduced.
484The method is based on the fact that the term $b'^2-ac$ can be
485evaluated with two equivalent formulas
486\begin{eqnarray}
487b'^2-ac &=& b'^2\left[1-(a/b')(c/b')\right] \\
488b'^2-ac &=& c\left[b'(b'/c) - a\right]
489\end{eqnarray}
490
491\begin{itemize}
492\item If $|b'|>|c|>0$, then the expression involving $\left(1-(a/b')(c/b')\right)$
493is such that no overflow is possible since $|c/b'| < 1$ and the problem occurs
494only when $b$ is large in magnitude with respect to $a$ and $c$.
495\item If $|c|>|b'|>0$, then the expression involving $\left(b'(b'/c) - a\right)$
496should limit the possible overflows since $|b'/c| < 1$.
497\end{itemize}
498These normalization tricks are similar to the ones used by Smith in his
499algorithm for the division of complex numbers \cite{Smith1962}.
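The effect of the normalization can be sketched on the overflow example of the experiments section ($a=c=1$, $b=1/\epsilon$ with $\epsilon=10^{-155}$); the following Python fragment is an illustration of the idea, not the actual \emph{roots} code:

```python
import math

a, c = 1.0, 1.0
b = 1.0e155                 # 1/eps with eps = 1e-155
bp = b / 2.0

# Direct evaluation: bp*bp ~ 2.5e309 overflows to infinity.
naive_delta = bp * bp - a * c

# Normalized evaluation (case |b'| > |c| > 0):
# sqrt(b'^2 - ac) = |b'| * sqrt(1 - (a/b')(c/b')), which never forms b'^2.
s = abs(bp) * math.sqrt(1.0 - (a / bp) * (c / bp))
h = -(bp + math.copysign(1.0, b) * s)
x1, x2 = c / h, h / a       # close to -1e-155 and -1e155, as expected
```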
500
501
502
503
504
diff --git a/scilab_doc/scilabisnotnaive/scilabisnotnaive.bib b/scilab_doc/scilabisnotnaive/scilabisnotnaive.bib
new file mode 100644
index 0000000..cdf9518
--- /dev/null
+++ b/scilab_doc/scilabisnotnaive/scilabisnotnaive.bib
@@ -0,0 +1,193 @@
1#
2# polynomialroots.bib --
3#
4
5
6@book{AbramowitzStegun1972,
7author = {Abramowitz, M. and Stegun, I. A.},
8title = {Handbook of Mathematical Functions with Formulas, Graphs, and Mathematical Tables},
9year = {1972},
10publisher= {}}
11
12@UNPUBLISHED{wikipediaquadratic,
13 title = {Quadratic Equation},
14 note = {\url{http://en.wikipedia.org/wiki/Quadratic_equation}}
15 }
16
17@UNPUBLISHED{wikipediacubic,
18 title = {Cubic Equation},
19 note = {\url{http://en.wikipedia.org/wiki/Cubic_equation}}
20 }
21
22@UNPUBLISHED{wikipedialossofsign,
23 title = {Loss of significance},
24 note = {\url{http://en.wikipedia.org/wiki/Loss_of_significance}}
25 }
26
27@UNPUBLISHED{mathworldcubic,
28 title = {Cubic Equation},
29 note = {\url{http://mathworld.wolfram.com/CubicFormula.html}}
30 }
31
32@UNPUBLISHED{mathworldquadratic,
33 title = {Quadratic Equation},
34 note = {\url{http://mathworld.wolfram.com/QuadraticEquation.html}}
35 }
36
37@book{NumericalRecipes,
38author = {W. H. Press and Saul A. Teukolsky and William T. Vetterling and Brian P. Flannery},
39title = {Numerical Recipes in C, Second Edition},
40year = {1992},
41publisher= {}}
42
43@book{WhatEveryComputerScientist,
44author = {David Goldberg},
45title = {What Every Computer Scientist Should Know About Floating-Point Arithmetic},
46month={March},
47year = {1991},
48publisher= {Association for Computing Machinery, Inc.},
49note = {\url{http://www.physics.ohio-state.edu/~dws/grouplinks/floating_point_math.pdf}},
50}
51
52@article{Nievergelt2003,
53 jstor_articletype = {primary_article},
54 title = {How (Not) to Solve Quadratic Equations},
55 author = {Nievergelt, Yves},
56 journal = {The College Mathematics Journal},
57 jstor_issuetitle = {},
58 volume = {34},
59 number = {2},
60 jstor_formatteddate = {Mar., 2003},
61 pages = {90--104},
62 url = {http://www.jstor.org/stable/3595780},
63 ISSN = {07468342},
64 abstract = {},
65 publisher = {Mathematical Association of America},
66 language = {},
67 copyright = {Copyright © 2003 Mathematical Association of America},
68 year = {2003},
69 }
70
71@book{FORSYTHE1991,
72author = {George E. Forsythe},
73title = {How Do You Solve A Quadratic Equation ?},
74month={JUNE},
75year = {1966},
76publisher= {Computer Science Department, School of Humanities and Sciences, Stanford University},
77 note = {\url{ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/66/40/CS-TR-66-40.pdf}}
78}
79
80@article{Kahan2004,
81 title = {On the Cost of Floating-Point Computation Without Extra-Precise Arithmetic},
82 author = {W. Kahan},
83 note = {\url{http://www.cs.berkeley.edu/~wkahan/Qdrtcs.pdf}},
84 year = {2004},
85 }
86
87@article{Smith1962,
88  author = {Smith, Robert L.},
89 title = {Algorithm 116: Complex division},
90 journal = {Commun. ACM},
91 volume = {5},
92 number = {8},
93 year = {1962},
94 issn = {0001-0782},
95 pages = {435},
96 doi = {http://doi.acm.org/10.1145/368637.368661},
97 publisher = {ACM},
98 address = {New York, NY, USA},
99 }
100
101@article{Jenkins1975,
102  author = {Jenkins, M. A.},
103 title = {Algorithm 493: Zeros of a Real Polynomial [C2]},
104 journal = {ACM Trans. Math. Softw.},
105 volume = {1},
106 number = {2},
107 year = {1975},
108 issn = {0098-3500},
109 pages = {178--189},
110 doi = {http://doi.acm.org/10.1145/355637.355643},
111 publisher = {ACM},
112 address = {New York, NY, USA},
113 }
114
115@article{368661,
116  author = {Smith, Robert L.},
117 title = {Algorithm 116: Complex division},
118 journal = {Commun. ACM},
119 volume = {5},
120 number = {8},
121 year = {1962},
122 issn = {0001-0782},
123 pages = {435},
124 doi = {http://doi.acm.org/10.1145/368637.368661},
125 publisher = {ACM},
126 address = {New York, NY, USA},
127 }
128@article{214414,
129  author = {Stewart, G. W.},
130 title = {A note on complex division},
131 journal = {ACM Trans. Math. Softw.},
132 volume = {11},
133 number = {3},
134 year = {1985},
135 issn = {0098-3500},
136 pages = {238--241},
137 doi = {http://doi.acm.org/10.1145/214408.214414},
138 publisher = {ACM},
139 address = {New York, NY, USA},
140 }
141@article{567808,
142  author = {Li, Xiaoye S. and Demmel, James W. and Bailey, David H. and Henry, Greg and Hida, Yozo and Iskandar, Jimmy and Kahan, William and Kang, Suh Y. and Kapur, Anil and Martin, Michael C. and Thompson, Brandon J. and Tung, Teresa and Yoo, Daniel J.},
143 title = {Design, implementation and testing of extended and mixed precision BLAS},
144 journal = {ACM Trans. Math. Softw.},
145 volume = {28},
146 number = {2},
147 year = {2002},
148 issn = {0098-3500},
149 pages = {152--205},
150 doi = {http://doi.acm.org/10.1145/567806.567808},
151 publisher = {ACM},
152 address = {New York, NY, USA},
153 }
154@article{KAHAN1987,
155  author = {Kahan, W.},
156  title = {Branch cuts for complex elementary functions, or much ado about nothing's sign bit},
157 year = {1987},
158 pages = {165--211},
159  publisher = {In The State of the Art in Numerical Analysis. Proceedings of the Joint IMA/SIAM Conference, A. Iserles and M. J. D. Powell, Eds. Clarendon Press, Oxford, England},
160 }
161@article{1039814,
162 author = {Priest,, Douglas M.},
163 title = {Efficient scaling for complex division},
164 journal = {ACM Trans. Math. Softw.},
165 volume = {30},
166 number = {4},
167 year = {2004},
168 issn = {0098-3500},
169 pages = {389--401},
170 doi = {http://doi.acm.org/10.1145/1039813.1039814},
171 publisher = {ACM},
172 address = {New York, NY, USA},
173 }
174@UNPUBLISHED{schimdtnd,
175 title = {Numerical Derivatives},
176 note = {\url{http://fermi.la.asu.edu/PHY531/intro/node1.html}},
177 author = {K.E. Schmidt}
178 }
179@book{dixmier,
180author = {J. Dixmier and P. Dugac},
181title = {Cours de Mathématiques du premier cycle, 1ère année},
182year = {1969},
183publisher= {Gauthier-Villars}}
184
@techreport{Forsythe1966,
 author = {George E. Forsythe},
 title = {How Do You Solve A Quadratic Equation?},
 year = {1966},
 number = {CS 40},
 institution = {Stanford University},
 note = {\url{ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/66/40/CS-TR-66-40.pdf}},
 }

diff --git a/scilab_doc/scilabisnotnaive/scilabisnotnaive.pdf b/scilab_doc/scilabisnotnaive/scilabisnotnaive.pdf
new file mode 100644
index 0000000..2b5f8d0
--- /dev/null
+++ b/scilab_doc/scilabisnotnaive/scilabisnotnaive.pdf
Binary files differ
diff --git a/scilab_doc/scilabisnotnaive/scilabisnotnaive.tex b/scilab_doc/scilabisnotnaive/scilabisnotnaive.tex
new file mode 100644
index 0000000..0c35b37
--- /dev/null
+++ b/scilab_doc/scilabisnotnaive/scilabisnotnaive.tex
@@ -0,0 +1,138 @@
%
% scilabisnotnaive.tex --
% Some notes about floating point issues in Scilab.
%
% Copyright 2008 Michael Baudin
%
\documentclass[12pt]{article}

%% Good fonts for PDF
\usepackage[cyr]{aeguill}

%% Package for page headers
\usepackage{fancyhdr}

%% Package to include graphics
%% Comment for DVI
\usepackage[pdftex]{graphicx}

%% Figure formats: jpeg or pdf
%% Comment for DVI
\DeclareGraphicsExtensions{.jpg,.pdf}

%% Package to create hyperdocuments
%% Comment for DVI
\usepackage[pdftex,colorlinks=true,linkcolor=blue,citecolor=blue,urlcolor=blue]{hyperref}

%% Package to control printed area size
\usepackage{anysize}
%% ...by defining margins {left}{right}{top}{bottom}
\marginsize{22mm}{14mm}{12mm}{25mm}

%% Package used to include a bibliography
\usepackage{natbib}

%% R for real numbers
\usepackage{amssymb}

%% User defined commands

%% Figure reference
\newcommand{\figref}[1]{figure~\ref{#1}}

%% Equation reference
\newcommand{\Ref}[1]{(\ref{#1})}

%% Emphasize a word or a group of words
\newcommand{\empha}[1]{\textit{\textbf{#1}}}

%% Partial derivative operators
\newcommand{\D}{\partial}
\newcommand{\Dt}{\partial_t}
\newcommand{\Dx}{\partial_x}
\newcommand{\Dy}{\partial_y}

\usepackage{url}

% Scilab macros
\newcommand{\scimacro}[1]{\textit{#1}}
\newcommand{\scicommand}[1]{\textit{#1}}

% To highlight source code
\usepackage{listings}
\usepackage{algorithmic}

% Define theorem environments
\newtheorem{theorem}{Theorem}[section]
\newtheorem{lemma}[theorem]{Lemma}
\newtheorem{proposition}[theorem]{Proposition}
\newtheorem{corollary}[theorem]{Corollary}

\newenvironment{proof}[1][Proof]{\begin{trivlist}
\item[\hskip \labelsep {\bfseries #1}]}{\end{trivlist}}
\newenvironment{definition}[1][Definition]{\begin{trivlist}
\item[\hskip \labelsep {\bfseries #1}]}{\end{trivlist}}
\newenvironment{example}[1][Example]{\begin{trivlist}
\item[\hskip \labelsep {\bfseries #1}]}{\end{trivlist}}
\newenvironment{remark}[1][Remark]{\begin{trivlist}
\item[\hskip \labelsep {\bfseries #1}]}{\end{trivlist}}

\newcommand{\qed}{\nobreak \ifvmode \relax \else
 \ifdim\lastskip<1.5em \hskip-\lastskip
 \hskip1.5em plus0em minus0.5em \fi \nobreak
 \vrule height0.75em width0.5em depth0.25em\fi}

% Maths shortcuts
\newcommand{\RR}{\mathbb{R}}

% Algorithms
\usepackage{algorithm2e}
%%\usepackage{algorithmic}

\begin{document}
\author{Michael Baudin}
\date{February 2009}
\title{Scilab is not naïve}
\maketitle
\begin{abstract}
Most of the time, the mathematical formula is
used directly in the Scilab source code.
But in many algorithms, some additional work is
performed to take into account the fact that
the computer does not process mathematical real numbers,
but instead computes with their floating
point representations.
The goal of this article is to show that, in many
situations, Scilab is not naïve and uses algorithms
which have been specifically tailored for floating point
computers. We analyse the particular cases of the
quadratic equation, complex division and numerical
derivatives, and show that, on these examples, the
naïve algorithm is not sufficiently accurate.
\end{abstract}

\tableofcontents

\include{introduction}

\include{quadratic}

\include{numericalderivative}

\include{complexdivision}

\include{conclusion}

\clearpage

%% Bibliography


\addcontentsline{toc}{section}{Bibliography}
\bibliographystyle{plain}
\bibliography{scilabisnotnaive}


\end{document}