author    Michaël Baudin <michael.baudin@scilab.org>  2009-03-05 11:59:48 +0100
committer Michaël Baudin <michael.baudin@scilab.org>  2009-03-05 11:59:48 +0100
commit    32542f089c085323922e89343b6bfcf74835db07 (patch)
tree      f7f57db3d045a024a52f608af4f411b26fb749ef /scilab_doc
parent    8c70f93306f99b0751f0cf7021c869af272881c7 (diff)
Added some -not-so-funny- examples
Diffstat (limited to 'scilab_doc')
-rw-r--r--  scilab_doc/scilabisnotnaive/introduction.tex         |  41
-rw-r--r--  scilab_doc/scilabisnotnaive/numericalderivative.tex  |  17
-rw-r--r--  scilab_doc/scilabisnotnaive/pythagoreansum.tex       | 180
-rw-r--r--  scilab_doc/scilabisnotnaive/quadratic.tex            |  23
-rw-r--r--  scilab_doc/scilabisnotnaive/references.tex           |  10
-rw-r--r--  scilab_doc/scilabisnotnaive/scilabisnotnaive.bib     | 116
-rw-r--r--  scilab_doc/scilabisnotnaive/scilabisnotnaive.pdf     | bin 299187 -> 313874 bytes
-rw-r--r--  scilab_doc/scilabisnotnaive/scilabisnotnaive.tex     |   7
-rw-r--r--  scilab_doc/scilabisnotnaive/simpleexperiments.tex    | 149
9 files changed, 520 insertions(+), 23 deletions(-)
diff --git a/scilab_doc/scilabisnotnaive/introduction.tex b/scilab_doc/scilabisnotnaive/introduction.tex
index b697405..db7affb 100644
--- a/scilab_doc/scilabisnotnaive/introduction.tex
+++ b/scilab_doc/scilabisnotnaive/introduction.tex
@@ -5,12 +5,42 @@ While most mathematic books deals with exact formulas,
 Scilab uses algorithms which are specifically designed for
 computers.

+As a practical example of the problem considered in this
+document, consider the following Scilab 5.1 session, where we
+compute 0.1 in two different ways.
+
+\begin{verbatim}
+-->format(25)
+-->0.1
+ ans =
+    0.1000000000000000055511
+-->1.0-0.9
+ ans =
+    0.0999999999999999777955
+\end{verbatim}
+
+For a person who has never heard of these problems, this
+experiment may come as a shock. To make things clearer, let us
+check that the sine function is approximated as well.
+
+\begin{verbatim}
+-->format(25)
+-->sin(0.0)
+ ans =
+    0.
+-->sin(%pi)
+ ans =
+    0.0000000000000001224647
+\end{verbatim}
+
 The difficulty is generated by the fact that, while
 mathematics deals with \emph{real} numbers, the
 computer deals with their \emph{floating point representations}.
 This is the difference between the
 \emph{naive}, mathematical, approach, and the \emph{numerical},
 floating-point aware, implementation.
+(The detailed explanations of the previous examples are presented
+in the appendix of this document.)

 In this article, we will show examples of these problems by
 using the following theoretical and experimental approach.
@@ -36,18 +66,11 @@ digits in the computed value $x_c$. For example, if the relative
 error $e_r=10^{-6}$, then the number of significant digits is 6.

 When the expected value is zero, the relative error cannot
-be computed, and we then use the absolute error
+be computed, and we then use the absolute error instead
 \begin{eqnarray}
 e_a=|x_c-x_e|.
 \end{eqnarray}

-A central reference on this subject is the article
-by Goldberg, "What Every Computer Scientist Should Know About Floating-Point Arithmetic",
-\cite{WhatEveryComputerScientist}.
-If one focuses on numerical algorithms, the "Numerical Recipes" \cite{NumericalRecipes},
-is another good sources of solutions for that problem.
-The work of Kahan is also central in this domain, for example \cite{Kahan2004}.
-
 Before getting into the details, it is important to
 know that real variables in the Scilab language are stored in
 \emph{double precision} variables. Since Scilab is
@@ -56,3 +79,5 @@ variables are stored with 64 bits precision.
 As we shall see later, this has a strong influence on the
 results.

+
+
diff --git a/scilab_doc/scilabisnotnaive/numericalderivative.tex b/scilab_doc/scilabisnotnaive/numericalderivative.tex
index c95a3b3..7ce0768 100644
--- a/scilab_doc/scilabisnotnaive/numericalderivative.tex
+++ b/scilab_doc/scilabisnotnaive/numericalderivative.tex
@@ -12,12 +12,6 @@ In the third part, we analyse
 why and how floating point numbers must be taken into account when the
 numerical derivatives are to be computed.

-A reference for numerical derivates
-is \cite{AbramowitzStegun1972}, chapter 25. "Numerical Interpolation,
-Differentiation and Integration" (p. 875).
-The webpage \cite{schimdtnd} and the book \cite{NumericalRecipes} give
-results about the rounding errors.
-
 \subsection{Theory}

 The basic result is the Taylor formula with one variable \cite{dixmier}
@@ -307,7 +301,7 @@ a function of one variable is presented in figure \ref{robust-numericalderivativ
 \label{robust-numericalderivative}
 \end{figure}

-\subsection{One step further}
+\subsection{One more step}

 In this section, we analyse the behaviour of \emph{derivative}
 when the point $x$ is either large $x \rightarrow \infty$,
@@ -426,3 +420,12 @@ large values of $x$, but produces a slightly sub-optimal step size when
 $x$ is near 1. The behaviour near zero is the same, i.e. both commands
 produce wrong results when $x \rightarrow 0$ and $x\neq 0$.

+\subsection{References}
+
+A reference for numerical derivatives
+is \cite{AbramowitzStegun1972}, chapter 25, "Numerical Interpolation,
+Differentiation and Integration" (p. 875).
+The webpage \cite{schimdtnd} and the book \cite{NumericalRecipes} give
+results about the rounding errors.
+
+
diff --git a/scilab_doc/scilabisnotnaive/pythagoreansum.tex b/scilab_doc/scilabisnotnaive/pythagoreansum.tex
new file mode 100644
index 0000000..5833c69
--- /dev/null
+++ b/scilab_doc/scilabisnotnaive/pythagoreansum.tex
@@ -0,0 +1,180 @@
+\section{The Pythagorean sum}
+
+In this section, we analyse the computation of the Pythagorean sum,
+which appears in two different computations, namely the norm of a complex
+number and the 2-norm of a vector of real values.
+
+In the first part, we briefly present the mathematical formulas for these
+two computations.
+We then present the naive algorithm based on these mathematical formulas.
+In the second part, we make some experiments in Scilab and compare our
+naive algorithm with the \emph{abs} and \emph{norm} Scilab primitives.
+In the third part, we analyse
+why and how floating point numbers must be taken into account when the
+Pythagorean sum is to be computed.
+
+\subsection{Theory}
+
+\subsection{Experiments}
+
+% TODO : compare both abs and norm.
+
+\lstset{language=Scilab}
+\lstset{numbers=left}
+\lstset{basicstyle=\footnotesize}
+\lstset{keywordstyle=\bfseries}
+\begin{lstlisting}
+// Straightforward implementation
+function mn2 = mynorm2(a,b)
+  mn2 = sqrt(a^2+b^2)
+endfunction
+// With scaling : factor out the largest of |a|, |b|
+// so that the ratio r satisfies |r|<=1.
+function mn2 = mypythag1(a,b)
+  if (a==0.0) then
+    mn2 = abs(b);
+  elseif (b==0.0) then
+    mn2 = abs(a);
+  else
+    if (abs(b)>abs(a)) then
+      r = a/b;
+      t = abs(b);
+    else
+      r = b/a;
+      t = abs(a);
+    end
+    mn2 = t * sqrt(1 + r^2);
+  end
+endfunction
+// With Moler & Morrison's cubically convergent iteration :
+// p grows toward the result while q decreases toward zero.
+// At most 7 iterations are required.
+function mn2 = mypythag2(a,b)
+  p = max(abs(a),abs(b))
+  q = min(abs(a),abs(b))
+  //index = 0
+  while (q<>0.0)
+    //index = index + 1
+    //mprintf("index = %d, p = %e, q = %e\n",index,p,q)
+    r = (q/p)^2
+    s = r/(4+r)
+    p = p + 2*s*p
+    q = s * q
+  end
+  mn2 = p
+endfunction
+function compare(x)
+  mprintf("Re(x)=%e, Im(x)=%e\n",real(x),imag(x));
+  p = abs(x);
+  mprintf("%20s : %e\n","Scilab",p);
+  p = mynorm2(real(x),imag(x));
+  mprintf("%20s : %e\n","Naive",p);
+  p = mypythag1(real(x),imag(x));
+  mprintf("%20s : %e\n","Scaling",p);
+  p = mypythag2(real(x),imag(x));
+  mprintf("%20s : %e\n","Moler & Morrison",p);
+endfunction
+// Test #1 : all is fine
+x = 1 + 1 * %i;
+compare(x);
+// Test #2 : more difficult when x is large
+x = 1.e200 + 1 * %i;
+compare(x);
+// Test #3 : more difficult when x is small
+x = 1.e-200 + 1.e-200 * %i;
+compare(x);
+\end{lstlisting}
+
+\begin{verbatim}
+***************************************
+Example #1 : simple computation with Scilab 5.1
+x(1)=1.000000e+000, x(2)=1.000000e+000
+              Scilab : 1.414214e+000
+               Naive : 1.414214e+000
+             Scaling : 1.414214e+000
+    Moler & Morrison : 1.414214e+000
+***************************************
+Example #2 : with large numbers ?
+              Scilab : Inf
+               Naive : Inf
+             Scaling : 1.000000e+200
+    Moler & Morrison : 1.000000e+200
+***************************************
+Example #3 : with small numbers ?
+x(1)=1.000000e-200, x(2)=1.000000e-200
+              Scilab : 0.000000e+000
+               Naive : 0.000000e+000
+             Scaling : 1.414214e-200
+    Moler & Morrison : 1.414214e-200
+***************************************
+> Conclusion : Scilab is naive !
+Octave 3.0.3
+***************************************
+octave-3.0.3.exe:29> compare(x);
+***************************************
+x(1)=1.000000e+000, x(2)=1.000000e+000
+              Octave : 1.414214e+000
+               Naive : 1.414214e+000
+             Scaling : 1.414214e+000
+    Moler & Morrison : 1.414214e+000
+***************************************
+x(1)=1.000000e+200, x(2)=1.000000e+000
+              Octave : 1.000000e+200
+               Naive : Inf
+             Scaling : 1.000000e+200
+    Moler & Morrison : 1.000000e+200
+***************************************
+octave-3.0.3.exe:33> compare(x)
+x(1)=1.000000e-200, x(2)=1.000000e-200
+              Octave : 1.414214e-200
+               Naive : 0.000000e+000
+             Scaling : 1.414214e-200
+    Moler & Morrison : 1.414214e-200
+***************************************
+> Conclusion : Octave is not naive !
+
+With complex numbers.
+***************************************
+
+Re(x)=1.000000e+000, Im(x)=1.000000e+000
+              Scilab : 1.414214e+000
+               Naive : 1.414214e+000
+             Scaling : 1.414214e+000
+    Moler & Morrison : 1.414214e+000
+***************************************
+Re(x)=1.000000e+200, Im(x)=1.000000e+000
+              Scilab : 1.000000e+200
+               Naive : Inf
+             Scaling : 1.000000e+200
+    Moler & Morrison : 1.000000e+200
+***************************************
+Re(x)=1.000000e-200, Im(x)=1.000000e-200
+              Scilab : 1.414214e-200
+               Naive : 0.000000e+000
+             Scaling : 1.414214e-200
+    Moler & Morrison : 1.414214e-200
+***************************************
+> Conclusion : Scilab is not naive !
+\end{verbatim}
+
+\subsection{Explanations}
+
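+The naive formula is exact in real arithmetic, but, in double
+precision, $a^2$ overflows as soon as $|a|$ exceeds roughly
+$10^{154}$, even when the result $\sqrt{a^2+b^2}$ is itself
+representable, and $a^2$ underflows to zero when $|a|$ is very
+small. The following minimal session illustrates the squares
+which are at the core of the problem.
+
+\begin{verbatim}
+-->1.e200^2
+ ans =
+   Inf
+-->1.e-200^2
+ ans =
+   0.
+\end{verbatim}
+
+The scaled algorithm avoids this failure by factoring out
+$\max(|a|,|b|)$, so that the ratio $r$ satisfies $|r|\leq 1$
+and $r^2$ can neither overflow nor spoil the result when it
+underflows.
+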
+\subsection{References}
+
+The 1983 paper by Moler and Morrison \cite{journals/ibmrd/MolerM83} gives an
+algorithm to compute the Pythagorean sum $a\oplus b = \sqrt{a^2 + b^2}$
+without computing their squares or their square roots. Their algorithm is based on a cubically
+convergent sequence.
+The BLAS linear algebra suite of routines \cite{900236} includes the SNRM2, DNRM2
+and SCNRM2 routines, which compute the Euclidean norm of a vector.
+These routines are based on Blue \cite{355771} and Cody \cite{Cody:1971:SEF}.
+In his 1978 paper \cite{355771}, James Blue gives an algorithm to compute the
+Euclidean norm of an $n$-vector $\|x\| = \sqrt{\sum_{i=1,n}x_i^2}$.
+The exceptional values of the \emph{hypot} operator are defined as the
+Pythagorean sum in the IEEE 754 standard \cite{P754:2008:ISF,ieee754-1985}.
+The \emph{\_\_ieee754\_hypot(x,y)} C function is implemented in the
+Fdlibm software library \cite{fdlibm} developed by Sun Microsystems and
+available at netlib. This library is used by Matlab \cite{matlab-hypot}
+and its \emph{hypot} command.
+
+
+
diff --git a/scilab_doc/scilabisnotnaive/quadratic.tex b/scilab_doc/scilabisnotnaive/quadratic.tex
index 122e80d..1a888f9 100644
--- a/scilab_doc/scilabisnotnaive/quadratic.tex
+++ b/scilab_doc/scilabisnotnaive/quadratic.tex
@@ -229,12 +229,6 @@ IEEE overflow exception and set the result as \emph{Inf}.

 \subsection{Explanations}

-The technical report by G. Forsythe \cite{Forsythe1966} is
-especially interesting on that subject. The paper by Goldberg
-\cite{WhatEveryComputerScientist} is also a good reference for the
-quadratic equation. One can also consult the experiments performed by
-Nievergelt in \cite{Nievergelt2003}.
-
 The following tricks are extracted from the
 \emph{quad} routine of the \emph{RPOLY} algorithm by
 Jenkins \cite{Jenkins1975}. This algorithm is used by Scilab in the
@@ -498,7 +492,20 @@ should limit the possible overflows since $|b'/c| < 1$.
 These normalization tricks are similar to the one used by Smith in the
 algorithm for the division of complex numbers \cite{Smith1962}.

-
+\subsection{References}

+The 1966 technical report by G. Forsythe \cite{Forsythe1966}
+presents the floating point system and the large errors which are
+possible when mathematical algorithms are used blindly. An accurate way of solving
+a quadratic is outlined, and a few general remarks are made about
+computational mathematics. The 1991 paper by Goldberg
+\cite{WhatEveryComputerScientist} is a general presentation of the floating
+point system and its consequences. It begins with background on floating point
+representation and rounding errors, continues with a discussion
+of the IEEE floating point standard, and concludes with examples of how
+computer system builders can better support floating point. Section
+1.4, "Cancellation", specifically considers the computation of the roots
+of a quadratic equation.
+One can also consult the experiments performed by Nievergelt in \cite{Nievergelt2003}.


diff --git a/scilab_doc/scilabisnotnaive/references.tex b/scilab_doc/scilabisnotnaive/references.tex
new file mode 100644
index 0000000..8f7463e
--- /dev/null
+++ b/scilab_doc/scilabisnotnaive/references.tex
@@ -0,0 +1,10 @@
+\section{References}
+
+A central reference on this subject is the article
+by Goldberg, "What Every Computer Scientist Should Know About Floating-Point Arithmetic"
+\cite{WhatEveryComputerScientist}.
+For a focus on numerical algorithms, "Numerical Recipes" \cite{NumericalRecipes}
+is another good source of solutions for that problem.
+The work of Kahan is also central in this domain; see for example \cite{Kahan2004}.
+
+
diff --git a/scilab_doc/scilabisnotnaive/scilabisnotnaive.bib b/scilab_doc/scilabisnotnaive/scilabisnotnaive.bib
index 6de801c..36384a0 100644
--- a/scilab_doc/scilabisnotnaive/scilabisnotnaive.bib
+++ b/scilab_doc/scilabisnotnaive/scilabisnotnaive.bib
@@ -241,3 +241,119 @@ publisher= {Third Edition, Addison Wesley, Reading, MA}}
 bibsource = {DBLP, http://dblp.uni-trier.de}
 }

+@article{journals/ibmrd/MolerM83,
+  author  = {Cleve B. Moler and Donald Morrison},
+  title   = {Replacing Square Roots by Pythagorean Sums},
+  journal = {IBM Journal of Research and Development},
+  volume  = {27},
+  number  = {6},
+  pages   = {577--581},
+  year    = {1983},
+  url     = {http://dblp.uni-trier.de/db/journals/ibmrd/ibmrd27.html#MolerM83},
+}
+
+@techreport{900236,
+  author      = {Lawson, C. L. and Hanson, R. J. and Kincaid, D. R. and Krogh, F. T.},
+  title       = {Basic Linear Algebra Subprograms for FORTRAN Usage},
+  year        = {1977},
+  institution = {University of Texas at Austin},
+  address     = {Austin, TX, USA},
+}
+
+@article{355771,
+  author    = {Blue, James L.},
+  title     = {A Portable Fortran Program to Find the Euclidean Norm of a Vector},
+  journal   = {ACM Trans. Math. Softw.},
+  volume    = {4},
+  number    = {1},
+  year      = {1978},
+  issn      = {0098-3500},
+  pages     = {15--23},
+  doi       = {http://doi.acm.org/10.1145/355769.355771},
+  publisher = {ACM},
+  address   = {New York, NY, USA},
+}
+
+@incollection{Cody:1971:SEF,
+  author   = {W. J. Cody},
+  title    = {Software for the Elementary Functions},
+  crossref = {Rice:1971:MS},
+  pages    = {171--186},
+  year     = {1971},
+}
+
+@techreport{ieee754-1985,
+  author      = {Stevenson, David},
+  title       = {IEEE Standard for Binary Floating-Point Arithmetic},
+  institution = {ANSI/IEEE Std 754-1985},
+  month       = {August},
+  year        = {1985},
+  url         = {http://ieeexplore.ieee.org/xpls/abs\_all.jsp?arnumber=30711},
+}
+
+@book{P754:2008:ISF,
+  author    = {{IEEE Task P754}},
+  title     = {{IEEE 754-2008, Standard for Floating-Point Arithmetic}},
+  publisher = {IEEE},
+  pages     = {58},
+  month     = aug,
+  year      = {2008},
+  doi       = {http://doi.ieeecomputersociety.org/10.1109/IEEESTD.2008.4610935},
+  isbn      = {0-7381-5753-8 (paper), 0-7381-5752-X (electronic)},
+  url       = {http://ieeexplore.ieee.org/servlet/opac?punumber=4610933},
+}
+
+@misc{matlab-hypot,
+  title = {Matlab - hypot : square root of sum of squares},
+  note  = {\url{http://www.mathworks.com/access/helpdesk/help/techdoc/ref/hypot.html}},
+}
+
+@misc{fdlibm,
+  author = {{Sun Microsystems, Inc.}},
+  title  = {A Freely Distributable C Math Library},
+  year   = {1993},
+  note   = {\url{http://www.netlib.org/fdlibm}},
+}
+
+@book{261217,
+  author    = {Muller, Jean-Michel},
+  title     = {Elementary Functions: Algorithms and Implementation},
+  year      = {1997},
+  isbn      = {0-8176-3990-X},
+  publisher = {Birkhauser Boston, Inc.},
+  address   = {Secaucus, NJ, USA},
+}
+
+
diff --git a/scilab_doc/scilabisnotnaive/scilabisnotnaive.pdf b/scilab_doc/scilabisnotnaive/scilabisnotnaive.pdf
index c7e1f0d..7ef8eaa 100644
--- a/scilab_doc/scilabisnotnaive/scilabisnotnaive.pdf
+++ b/scilab_doc/scilabisnotnaive/scilabisnotnaive.pdf
Binary files differ
diff --git a/scilab_doc/scilabisnotnaive/scilabisnotnaive.tex b/scilab_doc/scilabisnotnaive/scilabisnotnaive.tex
index 0c35b37..aced19f 100644
--- a/scilab_doc/scilabisnotnaive/scilabisnotnaive.tex
+++ b/scilab_doc/scilabisnotnaive/scilabisnotnaive.tex
@@ -122,10 +122,17 @@ is not sufficiently accurate.

 \include{complexdivision}

+
 \include{conclusion}

 \clearpage

+\appendix
+
+\include{simpleexperiments}
+
+%\include{pythagoreansum}
+
 %% Bibliography


diff --git a/scilab_doc/scilabisnotnaive/simpleexperiments.tex b/scilab_doc/scilabisnotnaive/simpleexperiments.tex
new file mode 100644
index 0000000..bea9803
--- /dev/null
+++ b/scilab_doc/scilabisnotnaive/simpleexperiments.tex
@@ -0,0 +1,149 @@
+\section{Simple experiments}
+
+In this section, we analyse the examples given in the introduction of this
+article.
+
+\subsection{Why $0.1$ is rounded}
+
+In this section, we present a brief explanation for the
+following Scilab session.
+
+\begin{verbatim}
+-->format(25)
+-->x1=0.1
+ x1 =
+    0.1000000000000000055511
+-->x2 = 1.0-0.9
+ x2 =
+    0.0999999999999999777955
+-->x1==x2
+ ans =
+  F
+\end{verbatim}
+
+In fact, only the first 17 digits $0.100000000000000005$ are
+significant; the last digits are an artifact of Scilab's
+display system.
+
+The number $0.1$ can be represented as the normalized number
+$1.0 \times 10^{-1}$. But the binary floating point representation
+of $0.1$ is approximately \cite{WhatEveryComputerScientist}
+$1.100110011001100110011001... \times 2^{-4}$. As we see, the decimal
+representation is made of a finite number of digits while the
+binary representation is made of an infinite sequence of
+digits. Because Scilab computations are based on double precision numbers,
+and because such numbers only have 64 bits to represent a value,
+some \emph{rounding} must be performed.
+
+In our example, it happens that $0.1$ falls between two
+different binary floating point numbers. After rounding,
+the binary floating point number is associated with the decimal
+representation "0.100000000000000005", that is, "rounding up"
+in this case. On the other hand, $0.9$ is not representable
+as an exact binary floating point number either (but $1.0$ is exactly represented).
+It happens that, after the subtraction "1.0-0.9", the decimal representation of the
+result is "0.09999999999999997", which is different from the rounded
+value of $0.1$.
+
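+The same rounding appears as soon as these values are combined.
+The following session (a minimal additional illustration, not part
+of the original experiments) adds ten times the value 0.1 and
+compares the sum with 1.0: because each partial sum is rounded,
+the two values differ.
+
+\begin{verbatim}
+-->s = 0.0;
+-->for i = 1:10
+-->  s = s + 0.1;
+-->end
+-->s == 1.0
+ ans =
+  F
+\end{verbatim}
+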
+\subsection{Why $\sin(\pi)$ is rounded}
+
+In this section, we present a brief explanation of the following
+Scilab 5.1 session, where the sine function is applied to the
+number $\pi$.
+
+\begin{verbatim}
+-->format(25)
+-->sin(0.0)
+ ans =
+    0.
+-->sin(%pi)
+ ans =
+    0.0000000000000001224647
+\end{verbatim}
+
+Two kinds of approximations are associated with the previous
+result:
+\begin{itemize}
+\item $\pi=3.1415926...$ is approximated by Scilab
+as the value returned by $4*atan(1.0)$,
+\item the $\sin$ function is approximated by a polynomial.
+\end{itemize}
+
+This article is too short to make a complete presentation
+of the computation of elementary functions. The interested
+reader may find the direct analysis of the Fdlibm library
+very instructive \cite{fdlibm}.
+The "Elementary Functions" book by Muller \cite{261217}
+is a complete reference on this subject.
+
+In Scilab, the "sin" function is directly provided by a
+Fortran source code (sci\_f\_sin.f) and no additional
+algorithm is performed by Scilab itself.
+At the compiler level, though, the "sin" function is
+provided by a library which is compiler-dependent.
+The main structure of the algorithm which computes
+"sin" is probably the following:
+
+\begin{itemize}
+\item scale the input $x$ so that it lies in a restricted
+interval,
+\item use a polynomial approximation of the local
+behaviour of "sin" in the neighbourhood of 0, with a guaranteed
+precision.
+\end{itemize}
+
+In the Fdlibm library for example, the scaling interval is
+$[-\pi/4,\pi/4]$.
+The polynomial approximation of the sine function has the general form
+
+\begin{eqnarray}
+\sin(x) &\approx& x + a_3x^3 + \ldots + a_{2n+1} x^{2n+1}\\
+&\approx& x + x^3 p(x^2)
+\end{eqnarray}
+
+In the Fdlibm library, 6 terms are used.
+
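+To give a flavour of this approach, the following function is a
+minimal sketch which evaluates the degree 7 Taylor polynomial of
+the sine function with the Horner scheme. It is only illustrative:
+it uses the plain Taylor coefficients, not the minimax coefficients
+actually used by Fdlibm, and it assumes that the argument has
+already been reduced to $[-\pi/4,\pi/4]$.
+
+\lstset{language=Scilab}
+\begin{lstlisting}
+// Sketch : degree 7 Taylor polynomial for sin,
+// for x already reduced to [-%pi/4,%pi/4].
+function y = mytaylorsin(x)
+  x2 = x^2
+  // p(x^2) with the Taylor coefficients -1/6, 1/120, -1/5040
+  p = -1/6 + x2 * (1/120 - x2/5040)
+  y = x + x^3 * p
+endfunction
+// On the reduced interval, the error against the built-in
+// sin is of the order of the first neglected term, x^9/9!.
+mprintf("error = %e\n", mytaylorsin(0.5) - sin(0.5))
+\end{lstlisting}
+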
+For the inverse tangent "atan" function, which is
+used to compute an approximate value of $\pi$, the process is the same.
+All these operations are guaranteed with some precision.
+For example, suppose that the functions are guaranteed with 14 significant
+digits. That means that $17-14=3$ digits may be rounded in the process.
+In our current example, the value of $\sin(\pi)$ is approximated
+with 17 digits after the point as "0.00000000000000012", which means that
+2 digits have been rounded.
+
+\subsection{One more step}
+
+In fact, it is possible to reduce the number of
+significant digits of the sine function to as low as 0 significant digits.
+Mathematically, $\sin(2^k \pi) = 0$ for any integer $k$, but that is not true with
+floating point numbers. In the following Scilab session, we compute
+$\sin(2^k\pi)$ for $k=10,20,\ldots,50$.
+
122\begin{verbatim}
123-->for i = 1:5
124-->k=10*i;
125-->n = 2^k;
126-->sin(n*%pi)
127-->end
128 ans =
129 - 0.0000000000001254038322
130 ans =
131 - 0.0000000001284135242063
132 ans =
133 - 0.0000001314954487872237
134 ans =
135 - 0.0001346513391512239052
136 ans =
137 - 0.1374464882277985633419
138\end{verbatim}
+
+For $\sin(2^{50}\pi)$, all significant digits are lost. This computation
+may sound \emph{extreme}, but it must be noticed that it is well inside the
+range of double precision numbers, since $2^{50} \approx 3 \cdot 10^{15} \ll 10^{308}$.
+A solution may be to use multiple precision numbers, such as in the
+GNU Multiple Precision library.
+
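+The magnitude of these errors can be explained by a simple model
+(an illustrative check, not part of the original experiments).
+The stored value \texttt{\%pi} is not exactly $\pi$: if we write
+$\texttt{\%pi} = \pi + \delta$ with $|\delta| \approx 1.2 \cdot 10^{-16}$,
+then $\sin(2^k \, \texttt{\%pi}) = \sin(2^k\delta) \approx 2^k \delta$,
+so each factor $2^{10}$ costs roughly 3 decimal digits of accuracy.
+The following lines compare the computed values against this model.
+
+\begin{lstlisting}
+// Model : sin(2^k*%pi) is close to -2^k*sin(%pi), because
+// the representation error of %pi is amplified by 2^k.
+for i = 1:5
+  k = 10*i;
+  mprintf("k=%d, computed=%e, model=%e\n", ..
+    k, sin(2^k*%pi), -2^k*sin(%pi));
+end
+\end{lstlisting}
+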
+If you know a better algorithm, based on double precision only,
+which allows such values to be computed accurately, the Scilab
+team will surely be interested to hear from you!
+