author     Michaël Baudin  2009-03-05 11:59:48 +0100
committer  Michaël Baudin  2009-03-05 11:59:48 +0100
commit     32542f089c085323922e89343b6bfcf74835db07 (patch)
tree       f7f57db3d045a024a52f608af4f411b26fb749ef /scilab_doc
parent     8c70f93306f99b0751f0cf7021c869af272881c7 (diff)
diff --git a/scilab_doc/scilabisnotnaive/introduction.tex b/scilab_doc/scilabisnotnaive/introduction.tex
index b697405..db7affb 100644
--- a/scilab_doc/scilabisnotnaive/introduction.tex
+++ b/scilab_doc/scilabisnotnaive/introduction.tex
@@ -5,12 +5,42 @@ While most mathematical books deal with exact formulas,
 Scilab uses algorithms which are specifically designed for
 computers.
 
+As a practical example of the problem considered in this
+document, consider the following experiments, taken from a
+Scilab 5.1 session in which we compute 0.1 in two ways.
+
+\begin{verbatim}
+-->format(25)
+-->0.1
+ ans  =
+    0.1000000000000000055511
+-->1.0-0.9
+ ans  =
+    0.0999999999999999777955
+\end{verbatim}
+
+For a person who has never heard of these problems, this
+experiment may come as a shock. To make things clearer, let us
+check that the sine function is approximated as well.
+
+\begin{verbatim}
+-->format(25)
+-->sin(0.0)
+ ans  =
+    0.
+-->sin(%pi)
+ ans  =
+    0.0000000000000001224647
+\end{verbatim}
+
 The difficulty is generated by the fact that, while
 mathematics deals with \emph{real} numbers, the
 computer deals with their \emph{floating point representations}.
 This is the difference between the
 \emph{naive}, mathematical, approach and the \emph{numerical},
 floating-point aware, implementation.
+(Detailed explanations of the previous examples are presented
+in the appendix of this document.)
 
 In this article, we will show examples of these problems by
 using the following theoretical and experimental approach.
@@ -36,18 +66,11 @@ digits in the computed value $x_c$. For example, if the relative
 error is $e_r=10^{-6}$, then the number of significant digits is 6.
 
 When the expected value is zero, the relative error cannot
-be computed, and we then use the absolute error
+be computed, and we then use the absolute error instead
 \begin{eqnarray}
 e_a=|x_c-x_e|.
 \end{eqnarray}
 
-A central reference on this subject is the article by Goldberg,
-"What Every Computer Scientist Should Know About Floating-Point
-Arithmetic" \cite{WhatEveryComputerScientist}.
-For numerical algorithms, "Numerical Recipes" \cite{NumericalRecipes}
-is another good source of solutions for that problem.
-The work of Kahan is also central in this domain; see for example \cite{Kahan2004}.
-
 Before getting into the details, it is important to
 know that real variables in the Scilab language are stored as
 \emph{double precision} variables. Since Scilab is
@@ -56,3 +79,5 @@ variables are stored with 64 bits of precision.
 As we shall see later, this has a strong influence on the
 results.
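The rounding shown in the Scilab session above is a property of IEEE 754 doubles, not of Scilab itself. The same experiment can be reproduced in Python, whose floats are also 64-bit doubles; printing 22 decimals mimics Scilab's `format(25)`:

```python
# Python floats are IEEE 754 doubles, like Scilab real variables,
# so the same rounding appears outside Scilab.
x1 = 0.1        # rounded up to the nearest double
x2 = 1.0 - 0.9  # a different rounding of "one tenth"

print(f"{x1:.22f}")  # 0.1000000000000000055511
print(f"{x2:.22f}")  # 0.0999999999999999777955
print(x1 == x2)      # False: the two roundings differ
```

The two values are the nearest-double roundings of two different exact computations, which is why the equality test fails.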
diff --git a/scilab_doc/scilabisnotnaive/numericalderivative.tex b/scilab_doc/scilabisnotnaive/numericalderivative.tex
index c95a3b3..7ce0768 100644
--- a/scilab_doc/scilabisnotnaive/numericalderivative.tex
+++ b/scilab_doc/scilabisnotnaive/numericalderivative.tex
@@ -12,12 +12,6 @@ In the third part, we analyse
 why and how floating point numbers must be taken into account when
 numerical derivatives are to be computed.
 
-A reference for numerical derivatives
-is \cite{AbramowitzStegun1972}, chapter 25, "Numerical Interpolation,
-Differentiation and Integration" (p. 875).
-The webpage \cite{schimdtnd} and the book \cite{NumericalRecipes} give
-results about the rounding errors.
-
 \subsection{Theory}
 
 The basic result is the Taylor formula with one variable \cite{dixmier}
@@ -307,7 +301,7 @@ a function of one variable is presented in figure \ref{robust-numericalderivative}.
 \label{robust-numericalderivative}
 \end{figure}
 
-\subsection{One step further}
+\subsection{One more step}
 
 In this section, we analyse the behaviour of \emph{derivative}
 when the point $x$ is either large ($x \rightarrow \infty$),
@@ -426,3 +420,12 @@ large values of $x$, but produces a slightly sub-optimal step size when
 $x$ is near 1. The behaviour near zero is the same, i.e. both commands
 produce wrong results when $x \rightarrow 0$ and $x \neq 0$.
+
+\subsection{References}
+
+A reference for numerical derivatives
+is \cite{AbramowitzStegun1972}, chapter 25, "Numerical Interpolation,
+Differentiation and Integration" (p. 875).
+The webpage \cite{schimdtnd} and the book \cite{NumericalRecipes} give
+results about the rounding errors.

diff --git a/scilab_doc/scilabisnotnaive/pythagoreansum.tex b/scilab_doc/scilabisnotnaive/pythagoreansum.tex
new file mode 100644
index 0000000..5833c69
--- /dev/null
+++ b/scilab_doc/scilabisnotnaive/pythagoreansum.tex
@@ -0,0 +1,180 @@
+\section{The Pythagorean sum}
+
+In this section, we analyse the computation of the Pythagorean sum,
+which appears in two different computations: the modulus of a complex
+number and the 2-norm of a vector of real values.
+
+In the first part, we briefly present the mathematical formulas for these
+two computations.
+We then present the naive algorithm based on these mathematical formulas.
+In the second part, we make some experiments in Scilab and compare our
+naive algorithm with the \emph{abs} and \emph{norm} Scilab primitives.
+In the third part, we analyse
+why and how floating point numbers must be taken into account when the
+Pythagorean sum is to be computed.
+
+\subsection{Theory}
+
+\subsection{Experiments}
+
+% TODO : compare both abs and norm.
+
+\lstset{language=Scilab}
+\lstset{numbers=left}
+\lstset{basicstyle=\footnotesize}
+\lstset{keywordstyle=\bfseries}
+\begin{lstlisting}
+// Straightforward implementation
+function mn2 = mynorm2(a,b)
+  mn2 = sqrt(a^2+b^2)
+endfunction
+// With scaling
+function mn2 = mypythag1(a,b)
+  if (a==0.0) then
+    mn2 = abs(b);
+  elseif (b==0.0) then
+    mn2 = abs(a);
+  else
+    if (abs(b)>abs(a)) then
+      r = a/b;
+      t = abs(b);
+    else
+      r = b/a;
+      t = abs(a);
+    end
+    mn2 = t * sqrt(1 + r^2);
+  end
+endfunction
+// With Moler & Morrison's algorithm.
+// At most 7 iterations are required.
+function mn2 = mypythag2(a,b)
+  p = max(abs(a),abs(b))
+  q = min(abs(a),abs(b))
+  while (q<>0.0)
+    r = (q/p)^2
+    s = r/(4+r)
+    p = p + 2*s*p
+    q = s * q
+  end
+  mn2 = p
+endfunction
+function compare(x)
+  mprintf("Re(x)=%e, Im(x)=%e\n",real(x),imag(x));
+  p = abs(x);
+  mprintf("%20s : %e\n","Scilab",p);
+  p = mynorm2(real(x),imag(x));
+  mprintf("%20s : %e\n","Naive",p);
+  p = mypythag1(real(x),imag(x));
+  mprintf("%20s : %e\n","Scaling",p);
+  p = mypythag2(real(x),imag(x));
+  mprintf("%20s : %e\n","Moler & Morrison",p);
+endfunction
+// Test #1 : all is fine
+x = 1 + 1 * %i;
+compare(x);
+// Test #2 : more difficult when x is large
+x = 1.e200 + 1 * %i;
+compare(x);
+// Test #3 : more difficult when x is small
+x = 1.e-200 + 1.e-200 * %i;
+compare(x);
+\end{lstlisting}
+
+\begin{verbatim}
+***************************************
+Example #1 : simple computation with Scilab 5.1
+x(1)=1.000000e+000, x(2)=1.000000e+000
+              Scilab : 1.414214e+000
+               Naive : 1.414214e+000
+             Scaling : 1.414214e+000
+    Moler & Morrison : 1.414214e+000
+***************************************
+Example #2 : with large numbers ?
+              Scilab : Inf
+               Naive : Inf
+             Scaling : 1.000000e+200
+    Moler & Morrison : 1.000000e+200
+***************************************
+Example #3 : with small numbers ?
+x(1)=1.000000e-200, x(2)=1.000000e-200
+              Scilab : 0.000000e+000
+               Naive : 0.000000e+000
+             Scaling : 1.414214e-200
+    Moler & Morrison : 1.414214e-200
+***************************************
+> Conclusion : Scilab is naive !
+
+Octave 3.0.3
+***************************************
+octave-3.0.3.exe:29> compare(x);
+***************************************
+x(1)=1.000000e+000, x(2)=1.000000e+000
+              Octave : 1.414214e+000
+               Naive : 1.414214e+000
+             Scaling : 1.414214e+000
+    Moler & Morrison : 1.414214e+000
+***************************************
+x(1)=1.000000e+200, x(2)=1.000000e+000
+              Octave : 1.000000e+200
+               Naive : Inf
+             Scaling : 1.000000e+200
+    Moler & Morrison : 1.000000e+200
+***************************************
+octave-3.0.3.exe:33> compare(x)
+x(1)=1.000000e-200, x(2)=1.000000e-200
+              Octave : 1.414214e-200
+               Naive : 0.000000e+000
+             Scaling : 1.414214e-200
+    Moler & Morrison : 1.414214e-200
+***************************************
+> Conclusion : Octave is not naive !
+
+With complex numbers.
+***************************************
+Re(x)=1.000000e+000, Im(x)=1.000000e+000
+              Scilab : 1.414214e+000
+               Naive : 1.414214e+000
+             Scaling : 1.414214e+000
+    Moler & Morrison : 1.414214e+000
+***************************************
+Re(x)=1.000000e+200, Im(x)=1.000000e+000
+              Scilab : 1.000000e+200
+               Naive : Inf
+             Scaling : 1.000000e+200
+    Moler & Morrison : 1.000000e+200
+***************************************
+Re(x)=1.000000e-200, Im(x)=1.000000e-200
+              Scilab : 1.414214e-200
+               Naive : 0.000000e+000
+             Scaling : 1.414214e-200
+    Moler & Morrison : 1.414214e-200
+***************************************
+> Conclusion : Scilab is not naive !
+\end{verbatim}
+
+\subsection{Explanations}
+
+\subsection{References}
+
+The 1983 paper by Moler and Morrison \cite{journals/ibmrd/MolerM83} gives an
+algorithm to compute the Pythagorean sum $a\oplus b = \sqrt{a^2 + b^2}$
+without computing the squares or the square roots. Their algorithm is based
+on a cubically convergent sequence.
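A direct Python transcription of the `mypythag2` function above behaves the same way, since Python floats are also IEEE 754 doubles; it shows why the iteration succeeds on inputs where the naive formula overflows:

```python
import math

def pythag(a, b):
    # Moler & Morrison's cubically convergent iteration: computes
    # sqrt(a**2 + b**2) without squaring a or b directly, so no
    # overflow or underflow occurs for extreme inputs.
    p, q = max(abs(a), abs(b)), min(abs(a), abs(b))
    while q != 0.0:
        r = (q / p) ** 2
        s = r / (4.0 + r)
        p += 2.0 * s * p
        q *= s
    return p

print(pythag(1e200, 1e200))  # ~1.414e200, where sqrt(a**2 + b**2) gives inf
```

In practice Python exposes a robust built-in for this, `math.hypot`, which is the analogue of the non-naive `abs`/`norm` primitives discussed above.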
+The BLAS linear algebra suite of routines \cite{900236} includes the SNRM2, DNRM2
+and SCNRM2 routines, which compute the Euclidean norm of a vector.
+These routines are based on Blue \cite{355771} and Cody \cite{Cody:1971:SEF}.
+In his 1978 paper \cite{355771}, James Blue gives an algorithm to compute the
+Euclidean norm of an $n$-vector $\|x\| = \sqrt{\sum_{i=1}^{n}x_i^2}$.
+The exceptional values of the \emph{hypot} operator are defined as the
+Pythagorean sum in the IEEE 754 standard \cite{P754:2008:ISF,ieee754-1985}.
+The \emph{\_\_ieee754\_hypot(x,y)} C function is implemented in the
+Fdlibm software library \cite{fdlibm} developed by Sun Microsystems and
+available at netlib. This library is used by Matlab \cite{matlab-hypot}
+for its \emph{hypot} command.

diff --git a/scilab_doc/scilabisnotnaive/quadratic.tex b/scilab_doc/scilabisnotnaive/quadratic.tex
index 122e80d..1a888f9 100644
--- a/scilab_doc/scilabisnotnaive/quadratic.tex
+++ b/scilab_doc/scilabisnotnaive/quadratic.tex
@@ -229,12 +229,6 @@ IEEE overflow exception and set the result as \emph{Inf}.
 
 \subsection{Explanations}
 
-The technical report by G. Forsythe \cite{Forsythe1966} is
-especially interesting on that subject. The paper by Goldberg
-\cite{WhatEveryComputerScientist} is also a good reference for the
-quadratic equation. One can also consult the experiments performed by
-Nievergelt in \cite{Nievergelt2003}.
-
 The following tricks are extracted from the
 \emph{quad} routine of the \emph{RPOLY} algorithm by
 Jenkins \cite{Jenkins1975}. This algorithm is used by Scilab in the
@@ -498,7 +492,20 @@ should limit the possible overflows since $|b'/c| < 1$.
 These normalization tricks are similar to the one used by Smith in the
 algorithm for the division of complex numbers \cite{Smith1962}.
 
+\subsection{References}
 
+The 1966 technical report by G. Forsythe \cite{Forsythe1966}
+presents the floating point system and the large errors that can
+result from using mathematical algorithms blindly. An accurate way of solving
+a quadratic is outlined, and a few general remarks are made about
+computational mathematics. The 1991 paper by Goldberg
+\cite{WhatEveryComputerScientist} is a general presentation of the floating
+point system and its consequences. It begins with background on floating point
+representation and rounding errors, continues with a discussion
+of the IEEE floating point standard, and concludes with examples of how
+computer system builders can better support floating point. Section
+1.4, "Cancellation", specifically considers the computation of the roots
+of a quadratic equation.
+One can also consult the experiments performed by Nievergelt in \cite{Nievergelt2003}.

diff --git a/scilab_doc/scilabisnotnaive/references.tex b/scilab_doc/scilabisnotnaive/references.tex
new file mode 100644
index 0000000..8f7463e
--- /dev/null
+++ b/scilab_doc/scilabisnotnaive/references.tex
@@ -0,0 +1,10 @@
+\section{References}
+
+A central reference on this subject is the article by Goldberg,
+"What Every Computer Scientist Should Know About Floating-Point
+Arithmetic" \cite{WhatEveryComputerScientist}.
+For numerical algorithms, "Numerical Recipes" \cite{NumericalRecipes}
+is another good source of solutions for such problems.
+The work of Kahan is also central in this domain; see for example \cite{Kahan2004}.
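The cancellation problem that Goldberg analyses for the quadratic equation can be sketched in Python. The robust version below uses the standard remedy (compute the well-conditioned root first, then recover the other from the product of roots $x_1 x_2 = c/a$), in the spirit of the tricks quoted above from the RPOLY \emph{quad} routine; it is an illustrative sketch, not Scilab's actual solver:

```python
import math

def roots_naive(a, b, c):
    # Textbook formula: -b + d cancels badly when b*b >> 4*a*c.
    d = math.sqrt(b * b - 4.0 * a * c)
    return ((-b + d) / (2.0 * a), (-b - d) / (2.0 * a))

def roots_robust(a, b, c):
    # Compute the root with no cancellation, then use x1*x2 = c/a.
    d = math.sqrt(b * b - 4.0 * a * c)
    q = -0.5 * (b + math.copysign(d, b))
    return (q / a, c / q)

# x**2 - 1e8*x + 1 = 0 has roots near 1e8 and 1e-8.
print(min(roots_naive(1.0, -1e8, 1.0)))   # small root: digits lost
print(min(roots_robust(1.0, -1e8, 1.0)))  # small root: accurate
```

The large root is computed accurately by both versions; only the small root, obtained by subtracting two nearly equal quantities, exposes the naive formula.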
diff --git a/scilab_doc/scilabisnotnaive/scilabisnotnaive.bib b/scilab_doc/scilabisnotnaive/scilabisnotnaive.bib
index 6de801c..36384a0 100644
--- a/scilab_doc/scilabisnotnaive/scilabisnotnaive.bib
+++ b/scilab_doc/scilabisnotnaive/scilabisnotnaive.bib
@@ -241,3 +241,119 @@ publisher= {Third Edition, Addison Wesley, Reading, MA}}
 bibsource = {DBLP, http://dblp.uni-trier.de}
 }
 
+@article{journals/ibmrd/MolerM83,
+  author  = {Cleve B. Moler and Donald Morrison},
+  title   = {Replacing Square Roots by Pythagorean Sums},
+  journal = {IBM Journal of Research and Development},
+  volume  = {27},
+  number  = {6},
+  pages   = {577--581},
+  year    = {1983},
+  url     = {http://dblp.uni-trier.de/db/journals/ibmrd/ibmrd27.html#MolerM83},
+}
+
+@techreport{900236,
+  author      = {Lawson, C. L. and Hanson, R. J. and Kincaid, D. R. and Krogh, F. T.},
+  title       = {Basic Linear Algebra Subprograms for FORTRAN Usage},
+  year        = {1977},
+  institution = {University of Texas at Austin},
+  address     = {Austin, TX, USA},
+}
+
+@article{355771,
+  author    = {Blue, James L.},
+  title     = {A Portable Fortran Program to Find the Euclidean Norm of a Vector},
+  journal   = {ACM Trans. Math. Softw.},
+  volume    = {4},
+  number    = {1},
+  year      = {1978},
+  issn      = {0098-3500},
+  pages     = {15--23},
+  doi       = {http://doi.acm.org/10.1145/355769.355771},
+  publisher = {ACM},
+  address   = {New York, NY, USA},
+}
+
+@incollection{Cody:1971:SEF,
+  author   = {W. J. Cody},
+  title    = {Software for the Elementary Functions},
+  crossref = {Rice:1971:MS},
+  pages    = {171--186},
+  year     = {1971},
+}
+
+@techreport{ieee754-1985,
+  author = {Stevenson, David},
+  title  = {IEEE Standard for Binary Floating-Point Arithmetic},
+  number = {ANSI/IEEE Std 754-1985},
+  month  = {August},
+  year   = {1985},
+  url    = {http://ieeexplore.ieee.org/xpls/abs\_all.jsp?arnumber=30711},
+}
+
+@book{P754:2008:ISF,
+  author    = {{IEEE Task P754}},
+  title     = {{IEEE 754-2008, Standard for Floating-Point Arithmetic}},
+  publisher = pub-IEEE-STD,
+  address   = pub-IEEE-STD:adr,
+  pages     = {58},
+  month     = aug,
+  year      = {2008},
+  doi       = {http://doi.ieeecomputersociety.org/10.1109/IEEESTD.2008.4610935},
+  isbn      = {0-7381-5753-8 (paper), 0-7381-5752-X (electronic)},
+  url       = {http://ieeexplore.ieee.org/servlet/opac?punumber=4610933},
+}
+
+@misc{matlab-hypot,
+  title = {Matlab - hypot: square root of sum of squares},
+  note  = {\url{http://www.mathworks.com/access/helpdesk/help/techdoc/ref/hypot.html}},
+}
+
+@misc{fdlibm,
+  author = {{Sun Microsystems, Inc.}},
+  title  = {A Freely Distributable C Math Library},
+  year   = {1993},
+  note   = {\url{http://www.netlib.org/fdlibm}},
+}
+
+@book{261217,
+  author    = {Muller, Jean-Michel},
+  title     = {Elementary Functions: Algorithms and Implementation},
+  year      = {1997},
+  isbn      = {0-8176-3990-X},
+  publisher = {Birkhauser Boston, Inc.},
+  address   = {Secaucus, NJ, USA},
+}

diff --git a/scilab_doc/scilabisnotnaive/scilabisnotnaive.pdf b/scilab_doc/scilabisnotnaive/scilabisnotnaive.pdf
index c7e1f0d..7ef8eaa 100644
Binary files differ

diff --git a/scilab_doc/scilabisnotnaive/scilabisnotnaive.tex b/scilab_doc/scilabisnotnaive/scilabisnotnaive.tex
index 0c35b37..aced19f 100644
--- a/scilab_doc/scilabisnotnaive/scilabisnotnaive.tex
+++ b/scilab_doc/scilabisnotnaive/scilabisnotnaive.tex
@@ -122,10 +122,17 @@ is not sufficiently accurate.
 
 \include{complexdivision}
 
 \include{conclusion}
 
 \clearpage
 
+\appendix
+
+\include{simpleexperiments}
+
+%\include{pythagoreansum}
+
 %% Bibliography
 

diff --git a/scilab_doc/scilabisnotnaive/simpleexperiments.tex b/scilab_doc/scilabisnotnaive/simpleexperiments.tex
new file mode 100644
index 0000000..bea9803
--- /dev/null
+++ b/scilab_doc/scilabisnotnaive/simpleexperiments.tex
@@ -0,0 +1,149 @@
+\section{Simple experiments}
+
+In this section, we analyse the examples given in the introduction of this
+article.
+
+\subsection{Why $0.1$ is rounded}
+
+In this section, we present a brief explanation for the
+following Scilab session.
+
+\begin{verbatim}
+-->format(25)
+-->x1=0.1
+ x1  =
+    0.1000000000000000055511
+-->x2 = 1.0-0.9
+ x2  =
+    0.0999999999999999777955
+-->x1==x2
+ ans  =
+  F
+\end{verbatim}
+
+In fact, only the first 17 digits, $0.100000000000000005$, are
+significant; the last digits are an artifact of Scilab's
+display system.
+
+The number $0.1$ can be represented as the normalized number
+$1.0 \times 10^{-1}$. But the binary floating point representation
+of $0.1$ is approximately \cite{WhatEveryComputerScientist}
+$1.100110011001100110011001... \times 2^{-4}$. As you can see, the decimal
+representation is made of a finite number of digits, while the
+binary representation is an infinite sequence of
+digits. Because Scilab computations are based on double precision numbers,
+which have only 64 bits to represent a number,
+some \emph{rounding} must be performed.
+
+In our example, it happens that $0.1$ falls between two
+consecutive binary floating point numbers. After rounding,
+the binary floating point number is associated with the decimal
+representation "0.100000000000000005", that is, it is "rounded up"
+in this case. On the other hand, $0.9$ is also not representable
+as an exact binary floating point number (although 1.0 is exactly represented).
+It happens that, after the subtraction "1.0-0.9", the decimal representation of the
+result is "0.09999999999999997", which is different from the rounded
+value of $0.1$.
+
+\subsection{Why $\sin(\pi)$ is rounded}
+
+In this section, we present a brief explanation of the following
+Scilab 5.1 session, where the sine function is applied to the
+number $\pi$.
+
+\begin{verbatim}
+-->format(25)
+-->sin(0.0)
+ ans  =
+    0.
+-->sin(%pi)
+ ans  =
+    0.0000000000000001224647
+\end{verbatim}
+
+Two kinds of approximations are associated with the previous
+result:
+\begin{itemize}
+\item $\pi=3.1415926...$ is approximated by Scilab
+as the value returned by $4*atan(1.0)$,
+\item the $\sin$ function is approximated by a polynomial.
+\end{itemize}
+
+This article is too short to give a complete presentation
+of the computation of elementary functions. The interested
+reader may find the direct analysis of the Fdlibm library
+very instructive \cite{fdlibm}.
+The "Elementary Functions" book by Muller \cite{261217}
+is a complete reference on this subject.
+
+In Scilab, the "sin" function is computed directly by a
+Fortran source code (sci\_f\_sin.f) and no additional
+algorithm is performed by Scilab itself.
+At the compiler level, though, the "sin" function is
+provided by a library which is compiler-dependent.
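The size of the residue can be predicted from the two approximations just listed: since the stored value of $\pi$ is the double nearest to $\pi$, evaluating sine there gives roughly the gap $\pi - fl(\pi)$ itself, because $\sin(\pi - \delta) \approx \delta$ for tiny $\delta$. Python's math library, which also wraps the platform's libm, reproduces the value from the Scilab session:

```python
import math

# math.pi is the double closest to pi; the gap is about 1.22e-16.
# Since sin(pi - delta) ~ delta for tiny delta, sin(math.pi)
# returns roughly that gap instead of 0.
residue = math.sin(math.pi)
print(residue)  # about 1.2246e-16, matching the Scilab session
```

So the nonzero result is not a defect of the sine implementation: it is the correct sine of the slightly wrong argument.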
+The main structure of the algorithm which computes
+"sin" is probably the following:
+
+\begin{itemize}
+\item scale the input $x$ so that it lies in a restricted
+interval,
+\item use a polynomial approximation of the local
+behaviour of "sin" in the neighbourhood of 0, with a guaranteed
+precision.
+\end{itemize}
+
+In the Fdlibm library, for example, the scaling interval is
+$[-\pi/4,\pi/4]$.
+The polynomial approximation of the sine function has the general form
+
+\begin{eqnarray}
+\sin(x) &\approx& x + a_3x^3 + \ldots + a_{2n+1} x^{2n+1}\\
+&\approx & x + x^3 p(x^2)
+\end{eqnarray}
+
+In the Fdlibm library, 6 terms are used.
+
+For the inverse tangent "atan" function, which is
+used to compute an approximate value of $\pi$, the process is the same.
+All these operations are guaranteed with some precision.
+For example, suppose that the functions are guaranteed with 14 significant
+digits. That means that $17-14 = 3$ digits may be rounded in the process.
+In our current example, the value of $\sin(\pi)$ is approximated
+with 17 digits after the point as "0.00000000000000012"; that is,
+2 digits have been rounded.
+
+\subsection{One more step}
+
+In fact, it is possible to reduce the number of
+significant digits of the sine function to as low as 0.
+Mathematically, $\sin(2^n \pi) = 0$, but that is not true with
+floating point numbers. In the following Scilab session, we
+compute $\sin(2^k \pi)$ for $k=10,20,30,40,50$.
+
+\begin{verbatim}
+-->for i = 1:5
+-->k=10*i;
+-->n = 2^k;
+-->sin(n*%pi)
+-->end
+ ans  =
+  - 0.0000000000001254038322
+ ans  =
+  - 0.0000000001284135242063
+ ans  =
+  - 0.0000001314954487872237
+ ans  =
+  - 0.0001346513391512239052
+ ans  =
+  - 0.1374464882277985633419
+\end{verbatim}
+
+For $\sin(2^{50}\pi)$, all significant digits are lost. This computation
+may sound \emph{extreme}, but it must be noticed that it is well inside the
+double precision range, since $2^{50} \approx 10^{15} \ll 10^{308}$.
+A solution may be to use multiple precision numbers, such as in the
+GNU Multiple Precision system.
+
+If you know a better algorithm, based on double precision only,
+which allows such values to be computed accurately, the Scilab
+team will surely be interested to hear from you!
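The same digit-loss experiment can be run in Python, which uses the same doubles and a libm-style sine, and it makes the mechanism visible: multiplying a double by a power of two is exact, so the argument is exactly $2^k$ times the rounded $\pi$, and the inherited error $2^k(\pi - fl(\pi))$ grows from about $10^{-13}$ at $k=10$ to about $0.137$ at $k=50$:

```python
import math

# n * math.pi carries the error n * (pi - math.pi), because
# multiplying a double by a power of two is exact: the rounding
# error of math.pi is simply magnified 2**k times.
for k in range(10, 51, 10):
    n = 2.0 ** k
    print(k, math.sin(n * math.pi))
```

The printed values match the Scilab session above, confirming that the loss of digits comes from the representation of the argument, not from Scilab's sine.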