author | Michaël Baudin <michael.baudin@scilab.org> | 2009-03-05 11:59:48 +0100 |
---|---|---|
committer | Michaël Baudin <michael.baudin@scilab.org> | 2009-03-05 11:59:48 +0100 |
commit | 32542f089c085323922e89343b6bfcf74835db07 (patch) | |
tree | f7f57db3d045a024a52f608af4f411b26fb749ef /scilab_doc | |
parent | 8c70f93306f99b0751f0cf7021c869af272881c7 (diff) | |
Added some -not-so-funny- examples
Diffstat (limited to 'scilab_doc')
-rw-r--r-- | scilab_doc/scilabisnotnaive/introduction.tex | 41 | ||||
-rw-r--r-- | scilab_doc/scilabisnotnaive/numericalderivative.tex | 17 | ||||
-rw-r--r-- | scilab_doc/scilabisnotnaive/pythagoreansum.tex | 180 | ||||
-rw-r--r-- | scilab_doc/scilabisnotnaive/quadratic.tex | 23 | ||||
-rw-r--r-- | scilab_doc/scilabisnotnaive/references.tex | 10 | ||||
-rw-r--r-- | scilab_doc/scilabisnotnaive/scilabisnotnaive.bib | 116 | ||||
-rw-r--r-- | scilab_doc/scilabisnotnaive/scilabisnotnaive.pdf | bin | 299187 -> 313874 bytes | |||
-rw-r--r-- | scilab_doc/scilabisnotnaive/scilabisnotnaive.tex | 7 | ||||
-rw-r--r-- | scilab_doc/scilabisnotnaive/simpleexperiments.tex | 149 |
9 files changed, 520 insertions, 23 deletions
diff --git a/scilab_doc/scilabisnotnaive/introduction.tex b/scilab_doc/scilabisnotnaive/introduction.tex index b697405..db7affb 100644 --- a/scilab_doc/scilabisnotnaive/introduction.tex +++ b/scilab_doc/scilabisnotnaive/introduction.tex | |||
@@ -5,12 +5,42 @@ While most mathematics books deal with exact formulas, | |||
5 | Scilab uses algorithms which are specifically designed for | 5 | Scilab uses algorithms which are specifically designed for |
6 | computers. | 6 | computers. |
7 | 7 | ||
8 | As a practical example of the problems considered in this | ||
9 | document, consider the following experiment. Here is an example of | ||
10 | a Scilab 5.1 session, where we compute 0.1 in two ways. | ||
11 | |||
12 | \begin{verbatim} | ||
13 | -->format(25) | ||
14 | -->0.1 | ||
15 | ans = | ||
16 | 0.1000000000000000055511 | ||
17 | -->1.0-0.9 | ||
18 | ans = | ||
19 | 0.0999999999999999777955 | ||
20 | \end{verbatim} | ||
21 | |||
22 | For a reader who has never heard of these problems, | ||
23 | this experiment may come as a shock. To make things clearer, let us | ||
24 | check that the sine function is approximated. | ||
25 | |||
26 | \begin{verbatim} | ||
27 | -->format(25) | ||
28 | -->sin(0.0) | ||
29 | ans = | ||
30 | 0. | ||
31 | -->sin(%pi) | ||
32 | ans = | ||
33 | 0.0000000000000001224647 | ||
34 | \end{verbatim} | ||
35 | |||
8 | The difficulty is generated by the fact that, while | 36 | The difficulty is generated by the fact that, while |
9 | mathematics deals with \emph{real} numbers, the | 37 | mathematics deals with \emph{real} numbers, the |
10 | computer deals with their \emph{floating point representations}. | 38 | computer deals with their \emph{floating point representations}. |
11 | This is the difference between the | 39 | This is the difference between the |
12 | \emph{naive}, mathematical, approach, and the \emph{numerical}, | 40 | \emph{naive}, mathematical, approach, and the \emph{numerical}, |
13 | floating-point aware, implementation. | 41 | floating-point aware, implementation. |
42 | (The detailed explanations of the previous examples are presented | ||
43 | in the appendix of this document.) | ||
14 | 44 | ||
15 | In this article, we will show examples of these problems by | 45 | In this article, we will show examples of these problems by |
16 | using the following theoretical and experimental approach. | 46 | using the following theoretical and experimental approach. |
@@ -36,18 +66,11 @@ digits in the computed value $x_c$. For example, if the relative | |||
36 | error $e_r=10^{-6}$, then the number of significant digits is 6. | 66 | error $e_r=10^{-6}$, then the number of significant digits is 6. |
37 | 67 | ||
38 | When the expected value is zero, the relative error cannot | 68 | When the expected value is zero, the relative error cannot |
39 | be computed, and we then use the absolute error | 69 | be computed, and we then use the absolute error instead |
40 | \begin{eqnarray} | 70 | \begin{eqnarray} |
41 | e_a=|x_c-x_e|. | 71 | e_a=|x_c-x_e|. |
42 | \end{eqnarray} | 72 | \end{eqnarray} |
43 | 73 | ||
44 | A central reference on this subject is the article | ||
45 | by Goldberg, "What Every Computer Scientist Should Know About Floating-Point Arithmetic", | ||
46 | \cite{WhatEveryComputerScientist}. | ||
47 | If one focuses on numerical algorithms, the "Numerical Recipes" \cite{NumericalRecipes}, | ||
48 | is another good sources of solutions for that problem. | ||
49 | The work of Kahan is also central in this domain, for example \cite{Kahan2004}. | ||
50 | |||
51 | Before getting into the details, it is important to | 74 | Before getting into the details, it is important to |
52 | know that real variables in the Scilab language are stored in | 75 | know that real variables in the Scilab language are stored in |
53 | \emph{double precision} variables. Since Scilab is | 76 | \emph{double precision} variables. Since Scilab is |
@@ -56,3 +79,5 @@ variables are stored with 64 bits precision. | |||
56 | As we shall see later, this has a strong influence on the | 79 | As we shall see later, this has a strong influence on the |
57 | results. | 80 | results. |
58 | 81 | ||
82 | |||
83 | |||
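The introductory experiment is easy to reproduce outside Scilab. The following is a sketch in Python rather than Scilab (Python also stores real numbers as IEEE 754 doubles, so the rounding is identical); the printed digits match the session above.

```python
# Reproduce the introductory Scilab experiment in Python, which uses the
# same IEEE 754 double precision arithmetic as Scilab.
x1 = 0.1
x2 = 1.0 - 0.9
print(f"{x1:.22f}")  # 0.1000000000000000055511
print(f"{x2:.22f}")  # 0.0999999999999999777955
print(x1 == x2)      # False: the two roundings differ

# Relative error between the two computed values of "0.1",
# as defined in the text: e_r = |x_c - x_e| / |x_e|
e_r = abs(x1 - x2) / abs(x1)
print(e_r)  # about 2.8e-16, i.e. the first 15 digits agree
```

The relative error near $10^{-16}$ is consistent with the 64-bit double precision storage discussed above.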
diff --git a/scilab_doc/scilabisnotnaive/numericalderivative.tex b/scilab_doc/scilabisnotnaive/numericalderivative.tex index c95a3b3..7ce0768 100644 --- a/scilab_doc/scilabisnotnaive/numericalderivative.tex +++ b/scilab_doc/scilabisnotnaive/numericalderivative.tex | |||
@@ -12,12 +12,6 @@ In the third part, we analyse | |||
12 | why and how floating point numbers must be taken into account when the | 12 | why and how floating point numbers must be taken into account when the |
13 | numerical derivatives are to be computed. | 13 | numerical derivatives are to be computed. |
14 | 14 | ||
15 | A reference for numerical derivates | ||
16 | is \cite{AbramowitzStegun1972}, chapter 25. "Numerical Interpolation, | ||
17 | Differentiation and Integration" (p. 875). | ||
18 | The webpage \cite{schimdtnd} and the book \cite{NumericalRecipes} give | ||
19 | results about the rounding errors. | ||
20 | |||
21 | \subsection{Theory} | 15 | \subsection{Theory} |
22 | 16 | ||
23 | The basic result is the Taylor formula with one variable \cite{dixmier} | 17 | The basic result is the Taylor formula with one variable \cite{dixmier} |
@@ -307,7 +301,7 @@ a function of one variable is presented in figure \ref{robust-numericalderivativ | |||
307 | \label{robust-numericalderivative} | 301 | \label{robust-numericalderivative} |
308 | \end{figure} | 302 | \end{figure} |
309 | 303 | ||
310 | \subsection{One step further} | 304 | \subsection{One more step} |
311 | 305 | ||
312 | In this section, we analyse the behaviour of \emph{derivative} | 306 | In this section, we analyse the behaviour of \emph{derivative} |
313 | when the point $x$ is either large $x \rightarrow \infty$, | 307 | when the point $x$ is either large $x \rightarrow \infty$, |
@@ -426,3 +420,12 @@ large values of $x$, but produces a slightly sub-optimal step size when | |||
426 | $x$ is near 1. The behaviour near zero is the same, i.e. both commands | 420 | $x$ is near 1. The behaviour near zero is the same, i.e. both commands |
427 | produce wrong results when $x \rightarrow 0$ and $x\neq 0$. | 421 | produce wrong results when $x \rightarrow 0$ and $x\neq 0$. |
428 | 422 | ||
423 | \subsection{References} | ||
424 | |||
425 | A reference for numerical derivatives | ||
426 | is \cite{AbramowitzStegun1972}, chapter 25, "Numerical Interpolation, | ||
427 | Differentiation and Integration" (p. 875). | ||
428 | The webpage \cite{schimdtnd} and the book \cite{NumericalRecipes} give | ||
429 | results about the rounding errors. | ||
430 | |||
431 | |||
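The step-size trade-off analysed in this file can be sketched in plain Python (an illustration, not Scilab's derivative command; the step values are chosen for demonstration): a step near $\sqrt{\epsilon}$ balances truncation and rounding errors, while a much smaller step is destroyed by rounding and a larger one by truncation.

```python
import math

def forward_diff(f, x, h):
    # Naive forward finite difference (f(x+h) - f(x)) / h.
    return (f(x + h) - f(x)) / h

x = 1.0
exact = math.cos(x)  # exact derivative of sin at x
eps = 2.0 ** -52     # machine epsilon of double precision

# h = 1e-16 is swallowed by rounding (x + h == x),
# h = sqrt(eps) is near-optimal, h = 1e-4 is dominated by truncation.
for h in (1e-16, math.sqrt(eps), 1e-4):
    err = abs(forward_diff(math.sin, x, h) - exact)
    print(f"h = {h:9.3e}   error = {err:9.3e}")
```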
diff --git a/scilab_doc/scilabisnotnaive/pythagoreansum.tex b/scilab_doc/scilabisnotnaive/pythagoreansum.tex new file mode 100644 index 0000000..5833c69 --- /dev/null +++ b/scilab_doc/scilabisnotnaive/pythagoreansum.tex | |||
@@ -0,0 +1,180 @@ | |||
1 | \section{The Pythagorean sum} | ||
2 | |||
3 | In this section, we analyse the computation of the Pythagorean sum, | ||
4 | which appears in two different computations, namely the modulus of a complex | ||
5 | number and the 2-norm of a vector of real values. | ||
6 | |||
7 | In the first part, we briefly present the mathematical formulas for these | ||
8 | two computations. | ||
9 | We then present the naïve algorithm based on these mathematical formulas. | ||
10 | In the second part, we make some experiments in Scilab and compare our | ||
11 | naïve algorithm with the \emph{abs} and \emph{norm} Scilab primitives. | ||
12 | In the third part, we analyse | ||
13 | why and how floating point numbers must be taken into account when the | ||
14 | Pythagorean sum is to be computed. | ||
15 | |||
16 | \subsection{Theory} | ||
17 | |||
18 | \subsection{Experiments} | ||
19 | |||
20 | % TODO : compare both abs and norm. | ||
21 | |||
22 | \lstset{language=Scilab} | ||
23 | \lstset{numbers=left} | ||
24 | \lstset{basicstyle=\footnotesize} | ||
25 | \lstset{keywordstyle=\bfseries} | ||
26 | \begin{lstlisting} | ||
27 | // Straightforward implementation | ||
28 | function mn2 = mynorm2(a,b) | ||
29 | mn2 = sqrt(a^2+b^2) | ||
30 | endfunction | ||
31 | // With scaling | ||
32 | function mn2 = mypythag1(a,b) | ||
33 | if (a==0.0) then | ||
34 | mn2 = abs(b); | ||
35 | elseif (b==0.0) then | ||
36 | mn2 = abs(a); | ||
37 | else | ||
38 | if (abs(b)>abs(a)) then | ||
39 | r = a/b; | ||
40 | t = abs(b); | ||
41 | else | ||
42 | r = b/a; | ||
43 | t = abs(a); | ||
44 | end | ||
45 | mn2 = t * sqrt(1 + r^2); | ||
46 | end | ||
47 | endfunction | ||
48 | // With Moler & Morrison's | ||
49 | // At most 7 iterations are required. | ||
50 | function mn2 = mypythag2(a,b) | ||
51 | p = max(abs(a),abs(b)) | ||
52 | q = min(abs(a),abs(b)) | ||
53 | //index = 0 | ||
54 | while (q<>0.0) | ||
55 | //index = index + 1 | ||
56 | //mprintf("index = %d, p = %e, q = %e\n",index,p,q) | ||
57 | r = (q/p)^2 | ||
58 | s = r/(4+r) | ||
59 | p = p + 2*s*p | ||
60 | q = s * q | ||
61 | end | ||
62 | mn2 = p | ||
63 | endfunction | ||
64 | function compare(x) | ||
65 | mprintf("Re(x)=%e, Im(x)=%e\n",real(x),imag(x)); | ||
66 | p = abs(x); | ||
67 | mprintf("%20s : %e\n","Scilab",p); | ||
68 | p = mynorm2(real(x),imag(x)); | ||
69 | mprintf("%20s : %e\n","Naive",p); | ||
70 | p = mypythag1(real(x),imag(x)); | ||
71 | mprintf("%20s : %e\n","Scaling",p); | ||
72 | p = mypythag2(real(x),imag(x)); | ||
73 | mprintf("%20s : %e\n","Moler & Morrison",p); | ||
74 | endfunction | ||
75 | // Test #1 : all is fine | ||
76 | x = 1 + 1 * %i; | ||
77 | compare(x); | ||
78 | // Test #2 : more difficult when x is large | ||
79 | x = 1.e200 + 1 * %i; | ||
80 | compare(x); | ||
81 | // Test #3 : more difficult when x is small | ||
82 | x = 1.e-200 + 1.e-200 * %i; | ||
83 | compare(x); | ||
84 | \end{lstlisting} | ||
85 | |||
86 | \begin{verbatim} | ||
87 | *************************************** | ||
88 | Example #1 : simple computation with Scilab 5.1 | ||
89 | x(1)=1.000000e+000, x(2)=1.000000e+000 | ||
90 | Scilab : 1.414214e+000 | ||
91 | Naive : 1.414214e+000 | ||
92 | Scaling : 1.414214e+000 | ||
93 | Moler & Morrison : 1.414214e+000 | ||
94 | *************************************** | ||
95 | Example #2 : with large numbers ? | ||
96 | Scilab : Inf | ||
97 | Naive : Inf | ||
98 | Scaling : 1.000000e+200 | ||
99 | Moler & Morrison : 1.000000e+200 | ||
100 | *************************************** | ||
101 | Example #3 : with small numbers ? | ||
102 | x(1)=1.000000e-200, x(2)=1.000000e-200 | ||
103 | Scilab : 0.000000e+000 | ||
104 | Naive : 0.000000e+000 | ||
105 | Scaling : 1.414214e-200 | ||
106 | Moler & Morrison : 1.414214e-200 | ||
107 | *************************************** | ||
108 | > Conclusion : Scilab is naive ! | ||
109 | Octave 3.0.3 | ||
110 | *************************************** | ||
111 | octave-3.0.3.exe:29> compare(x); | ||
112 | *************************************** | ||
113 | x(1)=1.000000e+000, x(2)=1.000000e+000 | ||
114 | Octave : 1.414214e+000 | ||
115 | Naive : 1.414214e+000 | ||
116 | Scaling : 1.414214e+000 | ||
117 | Moler & Morrison : 1.414214e+000 | ||
118 | *************************************** | ||
119 | x(1)=1.000000e+200, x(2)=1.000000e+000 | ||
120 | Octave : 1.000000e+200 | ||
121 | Naive : Inf | ||
122 | Scaling : 1.000000e+200 | ||
123 | Moler & Morrison : 1.000000e+200 | ||
124 | *************************************** | ||
125 | octave-3.0.3.exe:33> compare(x) | ||
126 | x(1)=1.000000e-200, x(2)=1.000000e-200 | ||
127 | Octave : 1.414214e-200 | ||
128 | Naive : 0.000000e+000 | ||
129 | Scaling : 1.414214e-200 | ||
130 | Moler & Morrison : 1.414214e-200 | ||
131 | *************************************** | ||
132 | > Conclusion : Octave is not naive ! | ||
133 | |||
134 | With complex numbers. | ||
135 | *************************************** | ||
136 | |||
137 | Re(x)=1.000000e+000, Im(x)=1.000000e+000 | ||
138 | Scilab : 1.414214e+000 | ||
139 | Naive : 1.414214e+000 | ||
140 | Scaling : 1.414214e+000 | ||
141 | Moler & Morrison : 1.414214e+000 | ||
142 | *************************************** | ||
143 | Re(x)=1.000000e+200, Im(x)=1.000000e+000 | ||
144 | Scilab : 1.000000e+200 | ||
145 | Naive : Inf | ||
146 | Scaling : 1.000000e+200 | ||
147 | Moler & Morrison : 1.000000e+200 | ||
148 | *************************************** | ||
149 | Re(x)=1.000000e-200, Im(x)=1.000000e-200 | ||
150 | Scilab : 1.414214e-200 | ||
151 | Naive : 0.000000e+000 | ||
152 | Scaling : 1.414214e-200 | ||
153 | Moler & Morrison : 1.414214e-200 | ||
154 | *************************************** | ||
155 | > Conclusion : Scilab is not naive ! | ||
156 | \end{verbatim} | ||
157 | |||
158 | \subsection{Explanations} | ||
159 | |||
160 | \subsection{References} | ||
161 | |||
162 | The 1983 paper by Moler and Morrison \cite{journals/ibmrd/MolerM83} gives an | ||
163 | algorithm to compute the Pythagorean sum $a\oplus b = \sqrt{a^2 + b^2}$ | ||
164 | without computing their squares or their square roots. Their algorithm is based on a cubically | ||
165 | convergent sequence. | ||
166 | The BLAS linear algebra suite of routines \cite{900236} includes the SNRM2, DNRM2 | ||
167 | and SCNRM2 routines, which compute the Euclidean norm of a vector. | ||
168 | These routines are based on Blue \cite{355771} and Cody \cite{Cody:1971:SEF}. | ||
169 | In his 1978 paper \cite{355771}, James Blue gives an algorithm to compute the | ||
170 | Euclidean norm of an n-vector $\|x\| = \sqrt{\sum_{i=1,n}x_i^2}$. | ||
171 | The exceptional values of the \emph{hypot} operator are defined as the | ||
172 | Pythagorean sum in the IEEE 754 standard \cite{P754:2008:ISF,ieee754-1985}. | ||
173 | The \emph{\_\_ieee754\_hypot(x,y)} C function is implemented in the | ||
174 | Fdlibm software library \cite{fdlibm} developed by Sun Microsystems and | ||
175 | available at netlib. This library is used by Matlab \cite{matlab-hypot} | ||
176 | and its \emph{hypot} command. | ||
177 | |||
178 | |||
179 | |||
180 | |||
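For readers who want to try the comparison above outside Scilab, here is a Python sketch of the three approaches (Python's built-in math.hypot plays the role of a non-naive primitive; the function names are ours, not Scilab's):

```python
import math

def naive_norm(a, b):
    # Naive formula: overflows as soon as a*a exceeds ~1e308.
    return math.sqrt(a * a + b * b)

def scaled_norm(a, b):
    # Scaling trick: factor out the largest magnitude before squaring.
    a, b = abs(a), abs(b)
    if a == 0.0:
        return b
    if b == 0.0:
        return a
    if b > a:
        a, b = b, a
    r = b / a
    return a * math.sqrt(1.0 + r * r)

def moler_morrison(a, b):
    # Cubically convergent iteration; at most a few iterations for doubles.
    p, q = max(abs(a), abs(b)), min(abs(a), abs(b))
    while q != 0.0:
        r = (q / p) ** 2
        s = r / (4.0 + r)
        p = p + 2.0 * s * p
        q = s * q
    return p

print(naive_norm(1e200, 1.0))      # inf : overflow
print(scaled_norm(1e200, 1.0))     # 1e+200
print(moler_morrison(1e200, 1.0))  # 1e+200
print(math.hypot(1e-200, 1e-200))  # ~1.414e-200, no underflow
```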
diff --git a/scilab_doc/scilabisnotnaive/quadratic.tex b/scilab_doc/scilabisnotnaive/quadratic.tex index 122e80d..1a888f9 100644 --- a/scilab_doc/scilabisnotnaive/quadratic.tex +++ b/scilab_doc/scilabisnotnaive/quadratic.tex | |||
@@ -229,12 +229,6 @@ IEEE overflow exception and set the result as \emph{Inf}. | |||
229 | 229 | ||
230 | \subsection{Explanations} | 230 | \subsection{Explanations} |
231 | 231 | ||
232 | The technical report by G. Forsythe \cite{Forsythe1966} is | ||
233 | especially interesting on that subject. The paper by Goldberg | ||
234 | \cite{WhatEveryComputerScientist} is also a good reference for the | ||
235 | quadratic equation. One can also consult the experiments performed by | ||
236 | Nievergelt in \cite{Nievergelt2003}. | ||
237 | |||
238 | The following tricks are extracted from the | 232 | The following tricks are extracted from the |
239 | \emph{quad} routine of the \emph{RPOLY} algorithm by | 233 | \emph{quad} routine of the \emph{RPOLY} algorithm by |
240 | Jenkins \cite{Jenkins1975}. This algorithm is used by Scilab in the | 234 | Jenkins \cite{Jenkins1975}. This algorithm is used by Scilab in the |
@@ -498,7 +492,20 @@ should limit the possible overflows since $|b'/c| < 1$. | |||
498 | These normalization tricks are similar to the one used by Smith in the | 492 | These normalization tricks are similar to the one used by Smith in the |
499 | algorithm for the division of complex numbers \cite{Smith1962}. | 493 | algorithm for the division of complex numbers \cite{Smith1962}. |
500 | 494 | ||
501 | 495 | \subsection{References} | |
502 | 496 | ||
497 | The 1966 technical report by G. Forsythe \cite{Forsythe1966} | ||
498 | presents the floating point system and the possible large error | ||
499 | in using mathematical algorithms blindly. An accurate way of solving | ||
500 | a quadratic is outlined. A few general remarks are made about | ||
501 | computational mathematics. The 1991 paper by Goldberg | ||
502 | \cite{WhatEveryComputerScientist} is a general presentation of the floating | ||
503 | point system and its consequences. It begins with background on floating point | ||
504 | representation and rounding errors, continues with a discussion | ||
505 | of the IEEE floating point standard and concludes with examples of how | ||
506 | computer system builders can better support floating point. Section | ||
507 | 1.4, "Cancellation", specifically considers the computation of the roots | ||
508 | of a quadratic equation. | ||
509 | One can also consult the experiments performed by Nievergelt in \cite{Nievergelt2003}. | ||
503 | 510 | ||
504 | 511 | ||
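The cancellation issue discussed in this file can be illustrated with a short Python sketch. This is the standard textbook remedy (compute the larger-magnitude root by the quadratic formula, recover the other from the product of the roots), not the RPOLY quad routine itself; it assumes real roots and $a \neq 0$.

```python
import math

def quadratic_roots(a, b, c):
    # Cancellation-avoiding scheme (assumes b^2 - 4ac >= 0 and a != 0):
    # q has no cancellation because d carries the sign of b, and the
    # second root follows from the product of the roots, x1 * x2 = c / a.
    d = math.sqrt(b * b - 4.0 * a * c)
    q = -0.5 * (b + math.copysign(d, b))
    return q / a, c / q

a, b, c = 1.0, 1.0e8, 1.0  # roots near -1e8 and -1e-8
x1, x2 = quadratic_roots(a, b, c)
naive_x2 = (-b + math.sqrt(b * b - 4.0 * a * c)) / (2.0 * a)
print(x2)        # close to -1e-08, full accuracy
print(naive_x2)  # several digits lost to catastrophic cancellation
```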
diff --git a/scilab_doc/scilabisnotnaive/references.tex b/scilab_doc/scilabisnotnaive/references.tex new file mode 100644 index 0000000..8f7463e --- /dev/null +++ b/scilab_doc/scilabisnotnaive/references.tex | |||
@@ -0,0 +1,10 @@ | |||
1 | \section{References} | ||
2 | |||
3 | A central reference on this subject is the article | ||
4 | by Goldberg, "What Every Computer Scientist Should Know About Floating-Point Arithmetic", | ||
5 | \cite{WhatEveryComputerScientist}. | ||
6 | For a focus on numerical algorithms, "Numerical Recipes" \cite{NumericalRecipes} | ||
7 | is another good source of solutions for such problems. | ||
8 | The work of Kahan is also central in this domain, for example \cite{Kahan2004}. | ||
9 | |||
10 | |||
diff --git a/scilab_doc/scilabisnotnaive/scilabisnotnaive.bib b/scilab_doc/scilabisnotnaive/scilabisnotnaive.bib index 6de801c..36384a0 100644 --- a/scilab_doc/scilabisnotnaive/scilabisnotnaive.bib +++ b/scilab_doc/scilabisnotnaive/scilabisnotnaive.bib | |||
@@ -241,3 +241,119 @@ publisher= {Third Edition, Addison Wesley, Reading, MA}} | |||
241 | bibsource = {DBLP, http://dblp.uni-trier.de} | 241 | bibsource = {DBLP, http://dblp.uni-trier.de} |
242 | } | 242 | } |
243 | 243 | ||
244 | @article{journals/ibmrd/MolerM83, | ||
245 | title = {Replacing Square Roots by Pythagorean Sums.}, | ||
246 | author = {Cleve B. Moler and Donald Morrison}, | ||
247 | journal = {IBM Journal of Research and Development}, | ||
248 | number = {6}, | ||
249 | pages = {577-581}, | ||
250 | url = {http://dblp.uni-trier.de/db/journals/ibmrd/ibmrd27.html#MolerM83}, | ||
251 | volume = {27}, | ||
252 | year = {1983}, | ||
253 | description = {dblp}, | ||
254 | date = {2002-01-03}, | ||
255 | keywords = {dblp } | ||
256 | } | ||
257 | |||
258 | @techreport{900236, | ||
259 | author = {Lawson,, C. L. and Hanson,, R. J. and Kincaid,, D. R. and Krogh,, F. T.}, | ||
260 | title = {Basic Linear Algebra Subprograms for FORTRAN Usage}, | ||
261 | year = {1977}, | ||
262 | source = {http://www.ncstrl.org:8900/ncstrl/servlet/search?formname=detail\&id=oai%3Ancstrlh%3Autexas_cs%3AUTEXAS_CS%2F%2FCS-TR-77-72}, | ||
263 | publisher = {University of Texas at Austin}, | ||
264 | address = {Austin, TX, USA}, | ||
265 | } | ||
266 | |||
267 | @article{355771, | ||
268 | author = {Blue,, James L.}, | ||
269 | title = {A Portable Fortran Program to Find the Euclidean Norm of a Vector}, | ||
270 | journal = {ACM Trans. Math. Softw.}, | ||
271 | volume = {4}, | ||
272 | number = {1}, | ||
273 | year = {1978}, | ||
274 | issn = {0098-3500}, | ||
275 | pages = {15--23}, | ||
276 | doi = {http://doi.acm.org/10.1145/355769.355771}, | ||
277 | publisher = {ACM}, | ||
278 | address = {New York, NY, USA}, | ||
279 | } | ||
280 | @InCollection{Cody:1971:SEF, | ||
281 | author = "W. J. Cody", | ||
282 | title = "Software for the Elementary Functions", | ||
283 | crossref = "Rice:1971:MS", | ||
284 | pages = "171--186", | ||
285 | year = "1971", | ||
286 | bibdate = "Thu Sep 15 18:56:47 1994", | ||
287 | bibsource = "garbo.uwasa.fi:/pc/doc-soft/fpbiblio.txt", | ||
288 | acknowledgement = ack-nj, | ||
289 | } | ||
290 | |||
291 | @techreport{ieee754-1985, | ||
292 | author = {Stevenson, David }, | ||
293 | booktitle = {ANSI/IEEE Std 754-1985}, | ||
294 | citeulike-article-id = {1677576}, | ||
295 | journal = {ANSI/IEEE Std 754-1985}, | ||
296 | keywords = {floating-point, para08}, | ||
297 | month = {August}, | ||
298 | posted-at = {2008-04-21 12:47:40}, | ||
299 | priority = {1}, | ||
300 | title = {IEEE standard for binary floating-point arithmetic}, | ||
301 | url = {http://ieeexplore.ieee.org/xpls/abs\_all.jsp?arnumber=30711}, | ||
302 | year = {1985} | ||
303 | } | ||
304 | |||
305 | @Book{P754:2008:ISF, | ||
306 | author = "{IEEE Task P754}", | ||
307 | title = "{IEEE 754-2008, Standard for Floating-Point | ||
308 | Arithmetic}", | ||
309 | publisher = pub-IEEE-STD, | ||
310 | address = pub-IEEE-STD:adr, | ||
311 | pages = "58", | ||
312 | day = "29", | ||
313 | month = aug, | ||
314 | year = "2008", | ||
315 | DOI = "http://doi.ieeecomputersociety.org/10.1109/IEEESTD.2008.4610935", | ||
316 | ISBN = "0-7381-5753-8 (paper), 0-7381-5752-X (electronic)", | ||
317 | ISBN-13 = "978-0-7381-5753-5 (paper), 978-0-7381-5752-8 | ||
318 | (electronic)", | ||
319 | bibdate = "Thu Sep 25 09:50:30 2008", | ||
320 | URL = "http://ieeexplore.ieee.org/servlet/opac?punumber=4610933; | ||
321 | http://en.wikipedia.org/wiki/IEEE_754-2008", | ||
322 | abstract = "This standard specifies interchange and arithmetic | ||
323 | formats and methods for binary and decimal | ||
324 | floating-point arithmetic in computer programming | ||
325 | environments. This standard specifies exception | ||
326 | conditions and their default handling. An | ||
327 | implementation of a floating-point system conforming to | ||
328 | this standard may be realized entirely in software, | ||
329 | entirely in hardware, or in any combination of software | ||
330 | and hardware. For operations specified in the normative | ||
331 | part of this standard, numerical results and exceptions | ||
332 | are uniquely determined by the values of the input | ||
333 | data, sequence of operations, and destination formats, | ||
334 | all under user control.", | ||
335 | acknowledgement = ack-nhfb, | ||
336 | } | ||
337 | |||
338 | @article{matlab-hypot, | ||
339 | note={\url{http://www.mathworks.com/access/helpdesk/help/techdoc/index.html?/access/helpdesk/help/techdoc/ref/hypot.html&http://www.google.fr/search?q=hypot&ie=utf-8&oe=utf-8&aq=t&rls=org.mozilla:fr:official&client=firefox-a}}, | ||
340 | title={Matlab - hypot : square root of sum of squares}, | ||
341 | } | ||
342 | |||
343 | @article{fdlibm, | ||
344 | title = {A Freely Distributable C Math Library}, | ||
345 | author = {Sun Microsystems, Inc.}, | ||
346 | year = {1993}, | ||
347 | note = {\url{http://www.netlib.org/fdlibm}}, | ||
348 | } | ||
349 | |||
350 | @book{261217, | ||
351 | author = {Muller,, Jean-Michel}, | ||
352 | title = {Elementary functions: algorithms and implementation}, | ||
353 | year = {1997}, | ||
354 | isbn = {0-8176-3990-X}, | ||
355 | publisher = {Birkhauser Boston, Inc.}, | ||
356 | address = {Secaucus, NJ, USA}, | ||
357 | } | ||
358 | |||
359 | |||
diff --git a/scilab_doc/scilabisnotnaive/scilabisnotnaive.pdf b/scilab_doc/scilabisnotnaive/scilabisnotnaive.pdf index c7e1f0d..7ef8eaa 100644 --- a/scilab_doc/scilabisnotnaive/scilabisnotnaive.pdf +++ b/scilab_doc/scilabisnotnaive/scilabisnotnaive.pdf | |||
Binary files differ | |||
diff --git a/scilab_doc/scilabisnotnaive/scilabisnotnaive.tex b/scilab_doc/scilabisnotnaive/scilabisnotnaive.tex index 0c35b37..aced19f 100644 --- a/scilab_doc/scilabisnotnaive/scilabisnotnaive.tex +++ b/scilab_doc/scilabisnotnaive/scilabisnotnaive.tex | |||
@@ -122,10 +122,17 @@ is not sufficiently accurate. | |||
122 | 122 | ||
123 | \include{complexdivision} | 123 | \include{complexdivision} |
124 | 124 | ||
125 | |||
125 | \include{conclusion} | 126 | \include{conclusion} |
126 | 127 | ||
127 | \clearpage | 128 | \clearpage |
128 | 129 | ||
130 | \appendix | ||
131 | |||
132 | \include{simpleexperiments} | ||
133 | |||
134 | %\include{pythagoreansum} | ||
135 | |||
129 | %% Bibliography | 136 | %% Bibliography |
130 | 137 | ||
131 | 138 | ||
diff --git a/scilab_doc/scilabisnotnaive/simpleexperiments.tex b/scilab_doc/scilabisnotnaive/simpleexperiments.tex new file mode 100644 index 0000000..bea9803 --- /dev/null +++ b/scilab_doc/scilabisnotnaive/simpleexperiments.tex | |||
@@ -0,0 +1,149 @@ | |||
1 | \section{Simple experiments} | ||
2 | |||
3 | In this section, we analyse the examples given in the introduction of this | ||
4 | article. | ||
5 | |||
6 | \subsection{Why $0.1$ is rounded} | ||
7 | |||
8 | In this section, we present a brief explanation for the | ||
9 | following Scilab session. | ||
10 | |||
11 | \begin{verbatim} | ||
12 | -->format(25) | ||
13 | -->x1=0.1 | ||
14 | x1 = | ||
15 | 0.1000000000000000055511 | ||
16 | -->x2 = 1.0-0.9 | ||
17 | x2 = | ||
18 | 0.0999999999999999777955 | ||
19 | -->x1==x2 | ||
20 | ans = | ||
21 | F | ||
22 | \end{verbatim} | ||
23 | |||
24 | In fact, only the first 17 digits $0.100000000000000005$ are | ||
25 | significant and the last digits are an artifact of Scilab's | ||
26 | display system. | ||
27 | |||
28 | The number $0.1$ can be represented as the normalized number | ||
29 | $1.0 \times 10^{-1}$. But the binary floating point representation | ||
30 | of $0.1$ is approximately \cite{WhatEveryComputerScientist} | ||
31 | $1.100110011001100110011001... \times 2^{-4}$. As you see, the decimal | ||
32 | representation is made of a finite number of digits while the | ||
33 | binary representation is made of an infinite sequence of | ||
34 | digits. Because Scilab computations are based on double precision numbers | ||
35 | and because these numbers only have 64 bits to represent a value, | ||
36 | some \emph{rounding} must be performed. | ||
37 | |||
38 | In our example, it happens that $0.1$ falls between two | ||
39 | different binary floating point numbers. After rounding, | ||
40 | the binary floating point number is associated with the decimal | ||
41 | representation "0.100000000000000005", that is "rounding up" | ||
42 | in this case. On the other hand, $0.9$ is also not representable | ||
43 | as an exact binary floating point number (but 1.0 is exactly represented). | ||
44 | It happens that, after the subtraction "1.0-0.9", the decimal representation of the | ||
45 | result is "0.09999999999999997", which is different from the rounded | ||
46 | value of $0.1$. | ||
47 | |||
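The exact value stored for $0.1$ can be inspected in any IEEE 754 environment. For instance, a sketch in Python with the standard fractions module (Python doubles are the same binary64 numbers as Scilab's):

```python
from fractions import Fraction

# The double nearest to 0.1, written as an exact rational number
# (the denominator is the power of two 2**55):
print(Fraction(0.1))  # 3602879701896397/36028797018963968

# The rounding error committed when 0.1 is stored, about 5.55e-18,
# which explains the digits ...055511 displayed by format(25):
err = Fraction(0.1) - Fraction(1, 10)
print(float(err))
```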
48 | \subsection{Why $\sin(\pi)$ is rounded} | ||
49 | |||
50 | In this section, we present a brief explanation of the following | ||
51 | Scilab 5.1 session, where the sine function is applied to the | ||
52 | number $\pi$. | ||
53 | |||
54 | \begin{verbatim} | ||
55 | -->format(25) | ||
56 | -->sin(0.0) | ||
57 | ans = | ||
58 | 0. | ||
59 | -->sin(%pi) | ||
60 | ans = | ||
61 | 0.0000000000000001224647 | ||
62 | \end{verbatim} | ||
63 | |||
64 | Two kinds of approximations are associated with the previous | ||
65 | result | ||
66 | \begin{itemize} | ||
67 | \item $\pi=3.1415926...$ is approximated by Scilab | ||
68 | as the value returned by $4*atan(1.0)$, | ||
69 | \item the $sin$ function is approximated by a polynomial. | ||
70 | \end{itemize} | ||
71 | |||
72 | This article is too short to make a complete presentation | ||
73 | of the computation of elementary functions. The interested | ||
74 | reader may find a direct analysis of the Fdlibm library | ||
75 | \cite{fdlibm} very instructive. | ||
76 | The "Elementary Functions" book by Muller \cite{261217} | ||
77 | is a complete reference on this subject. | ||
78 | |||
79 | In Scilab, the "sin" function is handled directly by a | ||
80 | Fortran source file (sci\_f\_sin.f) and no additional | ||
81 | algorithm is applied by Scilab itself. | ||
82 | At the compiler level, though, the "sin" function is | ||
83 | provided by a library which is compiler-dependent. | ||
84 | The main structure of the algorithm which computes | ||
85 | "sin" is probably the following: | ||
86 | |||
87 | \begin{itemize} | ||
88 | \item scale the input $x$ so that it lies in a restricted | ||
89 | interval, | ||
90 | \item use a polynomial approximation of the local | ||
91 | behaviour of "sin" in the neighbourhood of 0, with a guaranteed | ||
92 | precision. | ||
93 | \end{itemize} | ||
94 | |||
95 | In the Fdlibm library for example, the scaling interval is | ||
96 | $[-\pi/4,\pi/4]$. | ||
97 | The polynomial approximation of the sine function has the general form | ||
98 | |||
99 | \begin{eqnarray} | ||
100 | \sin(x) &\approx& x + a_3x^3 + \ldots + a_{2n+1} x^{2n+1}\\ | ||
101 | &\approx & x + x^3 p(x^2) | ||
102 | \end{eqnarray} | ||
103 | |||
104 | In the Fdlibm library, 6 terms are used. | ||
105 | |||
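The form $x + x^3 p(x^2)$ above can be sketched in Python. Note the assumption: the coefficients below are the plain degree-13 Taylor coefficients, not the actual minimax coefficients used by Fdlibm, so the accuracy near the edge of $[-\pi/4,\pi/4]$ is slightly worse than Fdlibm's.

```python
import math

def sin_poly(x):
    # Degree-13 Taylor polynomial of sin, with 6 coefficients (as many
    # terms as Fdlibm uses), evaluated in Horner form as x + x^3 * p(x^2).
    # These are Taylor coefficients, NOT the fdlibm minimax coefficients.
    a = [-1.0 / 6, 1.0 / 120, -1.0 / 5040, 1.0 / 362880,
         -1.0 / 39916800, 1.0 / 6227020800]
    z = x * x
    p = 0.0
    for c in reversed(a):
        p = p * z + c
    return x + x ** 3 * p

x = 0.5
print(abs(sin_poly(x) - math.sin(x)))  # near machine precision
```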
106 | For the inverse tangent function "atan", which is | ||
107 | used to compute an approximated value of $\pi$, the process is the same. | ||
108 | All these operations are guaranteed with some precision. | ||
109 | For example, suppose that the functions are guaranteed with 14 significant | ||
110 | digits. That means that the last 17 - 14 = 3 digits may be rounded in the process. | ||
111 | In our current example, the value of $\sin(\pi)$ is approximated | ||
112 | with 17 digits after the point as "0.00000000000000012". That means that | ||
113 | 2 digits have been rounded. | ||
114 | |||
115 | \subsection{One more step} | ||
116 | |||
117 | In fact, it is possible to reduce the number of | ||
118 | significant digits of the sine function to as low as 0. | ||
119 | Mathematically, $\sin(2^n \pi) = 0$, but that is not true with | ||
120 | floating point numbers. In the following Scilab session, we | ||
121 | compute $\sin(2^k \pi)$ for increasing values of $k$. | ||
122 | \begin{verbatim} | ||
123 | -->for i = 1:5 | ||
124 | -->k=10*i; | ||
125 | -->n = 2^k; | ||
126 | -->sin(n*%pi) | ||
127 | -->end | ||
128 | ans = | ||
129 | - 0.0000000000001254038322 | ||
130 | ans = | ||
131 | - 0.0000000001284135242063 | ||
132 | ans = | ||
133 | - 0.0000001314954487872237 | ||
134 | ans = | ||
135 | - 0.0001346513391512239052 | ||
136 | ans = | ||
137 | - 0.1374464882277985633419 | ||
138 | \end{verbatim} | ||
139 | |||
140 | For $\sin(2^{50}\pi)$, all significant digits are lost. This computation | ||
141 | may sound \emph{extreme}, but it must be noticed that it is well within the | ||
142 | range of double precision, since $2^{50} \approx 10^{15} \ll 10^{308}$. | ||
143 | A solution may be to use multiple precision numbers, such as in the | ||
144 | GNU Multiple Precision library. | ||
145 | |||
146 | If you know a better algorithm, based on double precision only, | ||
147 | which can compute such values accurately, the Scilab | ||
148 | team will surely be interested in hearing from you! | ||
149 | |||
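The same loss of digits can be reproduced in Python (a sketch; Python's math.pi is the same binary64 approximation of $\pi$). Multiplying by $2^k$ only changes the exponent, so the argument $2^k \cdot \mathrm{math.pi}$ is computed exactly, and the error comes entirely from the initial rounding of $\pi$, about $1.22 \times 10^{-16}$, amplified by $2^k$.

```python
import math

# sin(2^k * pi) should be 0, but math.pi differs from pi by about 1.22e-16,
# and that rounding error is multiplied by 2^k before sin is evaluated.
for k in (10, 20, 30, 40, 50):
    print(f"k = {k:2d}   sin(2^k * pi) = {math.sin(2.0 ** k * math.pi): .10e}")
```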