
author     Michaël Baudin <michael.baudin@scilab.org>  2009-03-05 11:59:48 +0100
committer  Michaël Baudin <michael.baudin@scilab.org>  2009-03-05 11:59:48 +0100
commit     32542f089c085323922e89343b6bfcf74835db07 (patch)
tree       f7f57db3d045a024a52f608af4f411b26fb749ef /scilab_doc
parent     8c70f93306f99b0751f0cf7021c869af272881c7 (diff)
download   scilab-32542f089c085323922e89343b6bfcf74835db07.zip
           scilab-32542f089c085323922e89343b6bfcf74835db07.tar.gz

Added some -not-so-funny- examples

Diffstat (limited to 'scilab_doc')

 -rw-r--r--  scilab_doc/scilabisnotnaive/introduction.tex        |  41
 -rw-r--r--  scilab_doc/scilabisnotnaive/numericalderivative.tex |  17
 -rw-r--r--  scilab_doc/scilabisnotnaive/pythagoreansum.tex      | 180
 -rw-r--r--  scilab_doc/scilabisnotnaive/quadratic.tex           |  23
 -rw-r--r--  scilab_doc/scilabisnotnaive/references.tex          |  10
 -rw-r--r--  scilab_doc/scilabisnotnaive/scilabisnotnaive.bib    | 116
 -rw-r--r--  scilab_doc/scilabisnotnaive/scilabisnotnaive.pdf    | bin 299187 -> 313874 bytes
 -rw-r--r--  scilab_doc/scilabisnotnaive/scilabisnotnaive.tex    |   7
 -rw-r--r--  scilab_doc/scilabisnotnaive/simpleexperiments.tex   | 149

9 files changed, 520 insertions(+), 23 deletions(-)

diff --git a/scilab_doc/scilabisnotnaive/introduction.tex b/scilab_doc/scilabisnotnaive/introduction.tex
index b697405..db7affb 100644
--- a/scilab_doc/scilabisnotnaive/introduction.tex
+++ b/scilab_doc/scilabisnotnaive/introduction.tex
@@ -5,12 +5,42 @@ While most mathematics books deal with exact formulas,
 Scilab uses algorithms which are specifically designed for
 computers.
 
+As a practical example of the problem considered in this
+document, consider the following experiment: a Scilab 5.1
+session where we compute 0.1 in two ways.
+
+\begin{verbatim}
+-->format(25)
+-->0.1
+ans =
+   0.1000000000000000055511
+-->1.0-0.9
+ans =
+   0.0999999999999999777955
+\end{verbatim}
+
+For a person who has never heard of these problems, this
+experiment may come as a shock. To make things clearer, let us
+check that the sine function is also approximated.
+
+\begin{verbatim}
+-->format(25)
+-->sin(0.0)
+ans =
+   0.
+-->sin(%pi)
+ans =
+   0.0000000000000001224647
+\end{verbatim}
+
 The difficulty comes from the fact that, while
 mathematics deals with \emph{real} numbers, the
 computer deals with their \emph{floating point representations}.
 This is the difference between the
 \emph{naive}, mathematical, approach, and the \emph{numerical},
 floating-point aware, implementation.
+(The detailed explanations of the previous examples are presented
+in the appendix of this document.)
 
 In this article, we will show examples of these problems by
 using the following theoretical and experimental approach.
@@ -36,18 +66,11 @@ digits in the computed value $x_c$. For example, if the relative
 error $e_r=10^{-6}$, then the number of significant digits is 6.
 
 When the expected value is zero, the relative error cannot
-be computed, and we then use the absolute error
+be computed, and we then use the absolute error instead
 \begin{eqnarray}
 e_a=|x_c-x_e|.
 \end{eqnarray}
 
-A central reference on this subject is the article
-by Goldberg, "What Every Computer Scientist Should Know About Floating-Point Arithmetic",
-\cite{WhatEveryComputerScientist}.
-If one focuses on numerical algorithms, the "Numerical Recipes" \cite{NumericalRecipes},
-is another good sources of solutions for that problem.
-The work of Kahan is also central in this domain, for example \cite{Kahan2004}.
-
 Before getting into the details, it is important to
 know that real variables in the Scilab language are stored in
 \emph{double precision} variables. Since Scilab is
@@ -56,3 +79,5 @@ variables are stored with 64 bits precision.
 As we shall see later, this has a strong influence on the
 results.
 
+
+
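The sessions added to the introduction depend only on IEEE 754 double precision, not on Scilab itself; a minimal Python sketch of the same experiment gives the same digits:

```python
# 0.1 is not exactly representable in binary floating point, so the
# rounding of the literal 0.1 and of 1.0 - 0.9 land on different doubles.
x1 = 0.1
x2 = 1.0 - 0.9
print(f"{x1:.22f}")   # 0.1000000000000000055511
print(f"{x2:.22f}")   # 0.0999999999999999777955
print(x1 == x2)       # False

import math
# sin(pi) is not 0 either: math.pi is only the closest double to pi.
print(math.sin(math.pi))   # about 1.22e-16, not 0.0
```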

diff --git a/scilab_doc/scilabisnotnaive/numericalderivative.tex b/scilab_doc/scilabisnotnaive/numericalderivative.tex
index c95a3b3..7ce0768 100644
--- a/scilab_doc/scilabisnotnaive/numericalderivative.tex
+++ b/scilab_doc/scilabisnotnaive/numericalderivative.tex
@@ -12,12 +12,6 @@ In the third part, we analyse
 why and how floating point numbers must be taken into account when the
 numerical derivatives are to be computed.
 
-A reference for numerical derivatives
-is \cite{AbramowitzStegun1972}, chapter 25, "Numerical Interpolation,
-Differentiation and Integration" (p. 875).
-The webpage \cite{schimdtnd} and the book \cite{NumericalRecipes} give
-results about the rounding errors.
-
 \subsection{Theory}
 
 The basic result is the Taylor formula with one variable \cite{dixmier}
@@ -307,7 +301,7 @@ a function of one variable is presented in figure \ref{robust-numericalderivative}.
 \label{robust-numericalderivative}
 \end{figure}
 
-\subsection{One step further}
+\subsection{One more step}
 
 In this section, we analyse the behaviour of \emph{derivative}
 when the point $x$ is either large $x \rightarrow \infty$,
@@ -426,3 +420,12 @@ large values of $x$, but produces a slightly sub-optimal step size when
 $x$ is near 1. The behaviour near zero is the same, i.e. both commands
 produce wrong results when $x \rightarrow 0$ and $x \neq 0$.
 
+\subsection{References}
+
+A reference for numerical derivatives
+is \cite{AbramowitzStegun1972}, chapter 25, "Numerical Interpolation,
+Differentiation and Integration" (p. 875).
+The webpage \cite{schimdtnd} and the book \cite{NumericalRecipes} give
+results about the rounding errors.
+
+
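The step-size behaviour this file analyses (wrong results for large $x$ unless the step scales with $x$) can be illustrated outside Scilab. A hedged Python sketch of the trade-off, not Scilab's actual `derivative` command:

```python
import math

# Forward finite difference with the classical step h = sqrt(eps),
# with and without scaling by |x|.  A sketch of the step-size trade-off,
# not the implementation used by Scilab's 'derivative'.
def fd(f, x, h):
    return (f(x + h) - f(x)) / h

eps = 2.0 ** -52               # double-precision machine epsilon
f = lambda t: t * t            # exact derivative is 2*x
x = 1.0e10

unscaled = fd(f, x, math.sqrt(eps))          # h ~ 1.5e-8: x + h rounds back to x
scaled = fd(f, x, math.sqrt(eps) * abs(x))   # h grows with x

print(abs(unscaled - 2 * x))   # catastrophic: the difference quotient is 0
print(abs(scaled - 2 * x))     # small error
```

With the unscaled step, `x + h == x` in double precision, so the computed derivative is exactly zero; scaling the step by $|x|$ keeps the difference meaningful.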

diff --git a/scilab_doc/scilabisnotnaive/pythagoreansum.tex b/scilab_doc/scilabisnotnaive/pythagoreansum.tex
new file mode 100644
index 0000000..5833c69
--- /dev/null
+++ b/scilab_doc/scilabisnotnaive/pythagoreansum.tex
@@ -0,0 +1,180 @@
+\section{The Pythagorean sum}
+
+In this section, we analyse the computation of the Pythagorean sum,
+which appears in two different computations, namely the modulus of a
+complex number and the 2-norm of a vector of real values.
+
+In the first part, we briefly present the mathematical formulas for these
+two computations.
+We then present the naïve algorithm based on these mathematical formulas.
+In the second part, we make some experiments in Scilab and compare our
+naïve algorithm with the \emph{abs} and \emph{norm} Scilab primitives.
+In the third part, we analyse
+why and how floating point numbers must be taken into account when the
+Pythagorean sum is to be computed.
+
+\subsection{Theory}
+
+\subsection{Experiments}
+
+% TODO : compare both abs and norm.
+
+\lstset{language=Scilab}
+\lstset{numbers=left}
+\lstset{basicstyle=\footnotesize}
+\lstset{keywordstyle=\bfseries}
+\begin{lstlisting}
+// Straightforward implementation
+function mn2 = mynorm2(a,b)
+  mn2 = sqrt(a^2+b^2)
+endfunction
+// With scaling
+function mn2 = mypythag1(a,b)
+  if (a==0.0) then
+    mn2 = abs(b);
+  elseif (b==0.0) then
+    mn2 = abs(a);
+  else
+    if (abs(b)>abs(a)) then
+      r = a/b;
+      t = abs(b);
+    else
+      r = b/a;
+      t = abs(a);
+    end
+    mn2 = t * sqrt(1 + r^2);
+  end
+endfunction
+// With Moler & Morrison's algorithm.
+// At most 7 iterations are required.
+function mn2 = mypythag2(a,b)
+  p = max(abs(a),abs(b))
+  q = min(abs(a),abs(b))
+  while (q<>0.0)
+    r = (q/p)^2
+    s = r/(4+r)
+    p = p + 2*s*p
+    q = s * q
+  end
+  mn2 = p
+endfunction
+function compare(x)
+  mprintf("Re(x)=%e, Im(x)=%e\n",real(x),imag(x));
+  p = abs(x);
+  mprintf("%20s : %e\n","Scilab",p);
+  p = mynorm2(real(x),imag(x));
+  mprintf("%20s : %e\n","Naive",p);
+  p = mypythag1(real(x),imag(x));
+  mprintf("%20s : %e\n","Scaling",p);
+  p = mypythag2(real(x),imag(x));
+  mprintf("%20s : %e\n","Moler & Morrison",p);
+endfunction
+// Test #1 : all is fine
+x = 1 + 1 * %i;
+compare(x);
+// Test #2 : more difficult when x is large
+x = 1.e200 + 1 * %i;
+compare(x);
+// Test #3 : more difficult when x is small
+x = 1.e-200 + 1.e-200 * %i;
+compare(x);
+\end{lstlisting}
+
+\begin{verbatim}
+***************************************
+Example #1 : simple computation with Scilab 5.1
+x(1)=1.000000e+000, x(2)=1.000000e+000
+Scilab : 1.414214e+000
+Naive : 1.414214e+000
+Scaling : 1.414214e+000
+Moler & Morrison : 1.414214e+000
+***************************************
+Example #2 : with large numbers ?
+Scilab : Inf
+Naive : Inf
+Scaling : 1.000000e+200
+Moler & Morrison : 1.000000e+200
+***************************************
+Example #3 : with small numbers ?
+x(1)=1.000000e-200, x(2)=1.000000e-200
+Scilab : 0.000000e+000
+Naive : 0.000000e+000
+Scaling : 1.414214e-200
+Moler & Morrison : 1.414214e-200
+***************************************
+> Conclusion : Scilab is naive !
+Octave 3.0.3
+***************************************
+octave-3.0.3.exe:29> compare(x);
+***************************************
+x(1)=1.000000e+000, x(2)=1.000000e+000
+Octave : 1.414214e+000
+Naive : 1.414214e+000
+Scaling : 1.414214e+000
+Moler & Morrison : 1.414214e+000
+***************************************
+x(1)=1.000000e+200, x(2)=1.000000e+000
+Octave : 1.000000e+200
+Naive : Inf
+Scaling : 1.000000e+200
+Moler & Morrison : 1.000000e+200
+***************************************
+octave-3.0.3.exe:33> compare(x)
+x(1)=1.000000e-200, x(2)=1.000000e-200
+Octave : 1.414214e-200
+Naive : 0.000000e+000
+Scaling : 1.414214e-200
+Moler & Morrison : 1.414214e-200
+***************************************
+> Conclusion : Octave is not naive !
+
+With complex numbers.
+***************************************
+Re(x)=1.000000e+000, Im(x)=1.000000e+000
+Scilab : 1.414214e+000
+Naive : 1.414214e+000
+Scaling : 1.414214e+000
+Moler & Morrison : 1.414214e+000
+***************************************
+Re(x)=1.000000e+200, Im(x)=1.000000e+000
+Scilab : 1.000000e+200
+Naive : Inf
+Scaling : 1.000000e+200
+Moler & Morrison : 1.000000e+200
+***************************************
+Re(x)=1.000000e-200, Im(x)=1.000000e-200
+Scilab : 1.414214e-200
+Naive : 0.000000e+000
+Scaling : 1.414214e-200
+Moler & Morrison : 1.414214e-200
+***************************************
+> Conclusion : Scilab is not naive !
+\end{verbatim}
+
+\subsection{Explanations}
+
+\subsection{References}
+
+The 1983 paper by Moler and Morrison \cite{journals/ibmrd/MolerM83} gives an
+algorithm to compute the Pythagorean sum $a\oplus b = \sqrt{a^2 + b^2}$
+without computing the squares or extracting square roots. Their algorithm is
+based on a cubically convergent sequence.
+The BLAS linear algebra suite of routines \cite{900236} includes the SNRM2, DNRM2
+and SCNRM2 routines, which compute the Euclidean norm of a vector.
+These routines are based on Blue \cite{355771} and Cody \cite{Cody:1971:SEF}.
+In his 1978 paper \cite{355771}, James Blue gives an algorithm to compute the
+Euclidean norm of an $n$-vector $\|x\| = \sqrt{\sum_{i=1,n}x_i^2}$.
+The exceptional values of the \emph{hypot} operator are defined as the
+Pythagorean sum in the IEEE 754 standard \cite{P754:2008:ISF,ieee754-1985}.
+The \emph{\_\_ieee754\_hypot(x,y)} C function is implemented in the
+Fdlibm software library \cite{fdlibm} developed by Sun Microsystems and
+available at netlib. This library is used by Matlab \cite{matlab-hypot}
+and its \emph{hypot} command.
+
+
+
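The three Scilab functions in the new file translate directly to any IEEE 754 language. A Python transcription of the listing (a sketch keeping the same logic) makes the overflow failure of the naive formula easy to check:

```python
import math

# Python transcription of the Scilab listing above (a sketch).
def mynorm2(a, b):
    # Naive formula: the intermediate squares overflow for large a, b.
    return math.sqrt(a * a + b * b)

def mypythag1(a, b):
    # With scaling, as in the listing: divide by the largest magnitude.
    if a == 0.0:
        return abs(b)
    if b == 0.0:
        return abs(a)
    if abs(b) > abs(a):
        r, t = a / b, abs(b)
    else:
        r, t = b / a, abs(a)
    return t * math.sqrt(1.0 + r * r)

def mypythag2(a, b):
    # Moler & Morrison's iteration: cubically convergent, no squares of
    # the inputs, terminates when q underflows to zero.
    p, q = max(abs(a), abs(b)), min(abs(a), abs(b))
    while q != 0.0:
        r = (q / p) ** 2
        s = r / (4.0 + r)
        p = p + 2.0 * s * p
        q = s * q
    return p

print(mynorm2(1e200, 1e200))    # inf : the squares overflow
print(mypythag1(1e200, 1e200))  # ~1.4142e+200
print(mypythag2(1e200, 1e200))  # ~1.4142e+200
```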

diff --git a/scilab_doc/scilabisnotnaive/quadratic.tex b/scilab_doc/scilabisnotnaive/quadratic.tex
index 122e80d..1a888f9 100644
--- a/scilab_doc/scilabisnotnaive/quadratic.tex
+++ b/scilab_doc/scilabisnotnaive/quadratic.tex
@@ -229,12 +229,6 @@ IEEE overflow exception and set the result as \emph{Inf}.
 
 \subsection{Explanations}
 
-The technical report by G. Forsythe \cite{Forsythe1966} is
-especially interesting on that subject. The paper by Goldberg
-\cite{WhatEveryComputerScientist} is also a good reference for the
-quadratic equation. One can also consult the experiments performed by
-Nievergelt in \cite{Nievergelt2003}.
-
 The following tricks are extracted from the
 \emph{quad} routine of the \emph{RPOLY} algorithm by
 Jenkins \cite{Jenkins1975}. This algorithm is used by Scilab in the
@@ -498,7 +492,20 @@ should limit the possible overflows since $|b'/c| < 1$.
 These normalization tricks are similar to the one used by Smith in the
 algorithm for the division of complex numbers \cite{Smith1962}.
 
+\subsection{References}
+
+The 1966 technical report by G. Forsythe \cite{Forsythe1966}
+presents the floating point system and the possibly large errors
+caused by using mathematical algorithms blindly. An accurate way of solving
+a quadratic is outlined. A few general remarks are made about
+computational mathematics. The 1991 paper by Goldberg
+\cite{WhatEveryComputerScientist} is a general presentation of the floating
+point system and its consequences. It begins with background on floating point
+representation and rounding errors, continues with a discussion
+of the IEEE floating point standard and concludes with examples of how
+computer system builders can better support floating point. Section
+1.4, "Cancellation", specifically considers the computation of the roots
+of a quadratic equation.
+One can also consult the experiments performed by Nievergelt in \cite{Nievergelt2003}.
 
 
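The cancellation problem that Goldberg's section 1.4 discusses for quadratic roots is easy to exhibit. A hedged Python sketch of the classical fix (compute the larger-magnitude root first, then recover the other from the product of roots); this illustrates the idea only, not Scilab's RPOLY `quad` routine:

```python
import math

def roots_naive(a, b, c):
    # Textbook formula: (-b ± sqrt(b^2 - 4ac)) / 2a.
    # When b^2 >> 4ac, -b and sqrt(...) nearly cancel for one root.
    d = math.sqrt(b * b - 4.0 * a * c)
    return (-b + d) / (2.0 * a), (-b - d) / (2.0 * a)

def roots_robust(a, b, c):
    # Add quantities of the same sign first, so no cancellation occurs,
    # then use x1 * x2 = c / a to recover the small root.
    d = math.sqrt(b * b - 4.0 * a * c)
    q = -0.5 * (b + math.copysign(d, b))
    return q / a, c / q

a, b, c = 1.0, 1e8, 1.0            # exact roots: about -1e8 and -1e-8
print(roots_naive(a, b, c)[0])     # small root ruined by cancellation
print(roots_robust(a, b, c)[1])    # close to -1e-08
```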

diff --git a/scilab_doc/scilabisnotnaive/references.tex b/scilab_doc/scilabisnotnaive/references.tex
new file mode 100644
index 0000000..8f7463e
--- /dev/null
+++ b/scilab_doc/scilabisnotnaive/references.tex
@@ -0,0 +1,10 @@
+\section{References}
+
+A central reference on this subject is the article
+by Goldberg, "What Every Computer Scientist Should Know About Floating-Point Arithmetic"
+\cite{WhatEveryComputerScientist}.
+For numerical algorithms, the "Numerical Recipes" \cite{NumericalRecipes}
+is another good source of solutions for that problem.
+The work of Kahan is also central in this domain, for example \cite{Kahan2004}.
+
+

diff --git a/scilab_doc/scilabisnotnaive/scilabisnotnaive.bib b/scilab_doc/scilabisnotnaive/scilabisnotnaive.bib
index 6de801c..36384a0 100644
--- a/scilab_doc/scilabisnotnaive/scilabisnotnaive.bib
+++ b/scilab_doc/scilabisnotnaive/scilabisnotnaive.bib
@@ -241,3 +241,119 @@ publisher= {Third Edition, Addison Wesley, Reading, MA}}
 bibsource = {DBLP, http://dblp.uni-trier.de}
 }
 
+@article{journals/ibmrd/MolerM83,
+  author  = {Cleve B. Moler and Donald Morrison},
+  title   = {Replacing Square Roots by Pythagorean Sums},
+  journal = {IBM Journal of Research and Development},
+  volume  = {27},
+  number  = {6},
+  pages   = {577--581},
+  year    = {1983},
+  url     = {http://dblp.uni-trier.de/db/journals/ibmrd/ibmrd27.html#MolerM83},
+}
+
+@techreport{900236,
+  author      = {Lawson, C. L. and Hanson, R. J. and Kincaid, D. R. and Krogh, F. T.},
+  title       = {Basic Linear Algebra Subprograms for FORTRAN Usage},
+  year        = {1977},
+  institution = {University of Texas at Austin},
+  address     = {Austin, TX, USA},
+}
+
+@article{355771,
+  author    = {Blue, James L.},
+  title     = {A Portable Fortran Program to Find the Euclidean Norm of a Vector},
+  journal   = {ACM Trans. Math. Softw.},
+  volume    = {4},
+  number    = {1},
+  year      = {1978},
+  issn      = {0098-3500},
+  pages     = {15--23},
+  doi       = {10.1145/355769.355771},
+  publisher = {ACM},
+  address   = {New York, NY, USA},
+}
+
+@incollection{Cody:1971:SEF,
+  author   = {W. J. Cody},
+  title    = {Software for the Elementary Functions},
+  crossref = {Rice:1971:MS},
+  pages    = {171--186},
+  year     = {1971},
+}
+
+@techreport{ieee754-1985,
+  author      = {Stevenson, David},
+  title       = {IEEE Standard for Binary Floating-Point Arithmetic},
+  institution = {ANSI/IEEE Std 754-1985},
+  month       = {August},
+  year        = {1985},
+  url         = {http://ieeexplore.ieee.org/xpls/abs\_all.jsp?arnumber=30711},
+}
+
+@book{P754:2008:ISF,
+  author    = {{IEEE Task P754}},
+  title     = {{IEEE 754-2008, Standard for Floating-Point Arithmetic}},
+  publisher = pub-IEEE-STD,
+  address   = pub-IEEE-STD:adr,
+  pages     = {58},
+  day       = {29},
+  month     = aug,
+  year      = {2008},
+  doi       = {10.1109/IEEESTD.2008.4610935},
+  isbn      = {0-7381-5753-8 (paper), 0-7381-5752-X (electronic)},
+  url       = {http://ieeexplore.ieee.org/servlet/opac?punumber=4610933},
+}
+
+@misc{matlab-hypot,
+  title = {Matlab - hypot : square root of sum of squares},
+  note  = {\url{http://www.mathworks.com/access/helpdesk/help/techdoc/ref/hypot.html}},
+}
+
+@misc{fdlibm,
+  author = {{Sun Microsystems, Inc.}},
+  title  = {A Freely Distributable C Math Library},
+  year   = {1993},
+  note   = {\url{http://www.netlib.org/fdlibm}},
+}
+
+@book{261217,
+  author    = {Muller, Jean-Michel},
+  title     = {Elementary Functions: Algorithms and Implementation},
+  year      = {1997},
+  isbn      = {0-8176-3990-X},
+  publisher = {Birkhauser Boston, Inc.},
+  address   = {Secaucus, NJ, USA},
+}
+
+

diff --git a/scilab_doc/scilabisnotnaive/scilabisnotnaive.pdf b/scilab_doc/scilabisnotnaive/scilabisnotnaive.pdf
index c7e1f0d..7ef8eaa 100644
Binary files differ

diff --git a/scilab_doc/scilabisnotnaive/scilabisnotnaive.tex b/scilab_doc/scilabisnotnaive/scilabisnotnaive.tex
index 0c35b37..aced19f 100644
--- a/scilab_doc/scilabisnotnaive/scilabisnotnaive.tex
+++ b/scilab_doc/scilabisnotnaive/scilabisnotnaive.tex
@@ -122,10 +122,17 @@ is not sufficiently accurate.
 
 \include{complexdivision}
 
+
 \include{conclusion}
 
 \clearpage
 
+\appendix
+
+\include{simpleexperiments}
+
+%\include{pythagoreansum}
+
 %% Bibliography
 
 

diff --git a/scilab_doc/scilabisnotnaive/simpleexperiments.tex b/scilab_doc/scilabisnotnaive/simpleexperiments.tex
new file mode 100644
index 0000000..bea9803
--- /dev/null
+++ b/scilab_doc/scilabisnotnaive/simpleexperiments.tex
@@ -0,0 +1,149 @@
+\section{Simple experiments}
+
+In this section, we analyse the examples given in the introduction of this
+article.
+
+\subsection{Why $0.1$ is rounded}
+
+In this section, we present a brief explanation for the
+following Scilab session.
+
+\begin{verbatim}
+-->format(25)
+-->x1=0.1
+x1 =
+   0.1000000000000000055511
+-->x2 = 1.0-0.9
+x2 =
+   0.0999999999999999777955
+-->x1==x2
+ans =
+  F
+\end{verbatim}
+
+In fact, only the first 17 digits $0.100000000000000005$ are
+significant; the last digits are an artifact of Scilab's
+display system.
+
+The number $0.1$ can be represented as the normalized number
+$1.0 \times 10^{-1}$. But the binary floating point representation
+of $0.1$ is approximately \cite{WhatEveryComputerScientist}
+$1.100110011001100110011001... \times 2^{-4}$. As you can see, the decimal
+representation is made of a finite number of digits while the
+binary representation is made of an infinite sequence of
+digits. Because Scilab computations are based on double precision numbers,
+and because these numbers only have 64 bits to represent the value,
+some \emph{rounding} must be performed.
+
+In our example, it happens that $0.1$ falls between two
+different binary floating point numbers. After rounding,
+the binary floating point number is associated with the decimal
+representation "0.100000000000000005", that is, "rounding up"
+in this case. On the other hand, $0.9$ is also not representable
+as an exact binary floating point number (but $1.0$ is exactly represented).
+It happens that, after the subtraction "1.0-0.9", the decimal representation of the
+result is "0.09999999999999997", which is different from the rounded
+value of $0.1$.
+
+\subsection{Why $\sin(\pi)$ is rounded}
+
+In this section, we present a brief explanation of the following
+Scilab 5.1 session, where the sine function is applied to the
+number $\pi$.
+
+\begin{verbatim}
+-->format(25)
+-->sin(0.0)
+ans =
+   0.
+-->sin(%pi)
+ans =
+   0.0000000000000001224647
+\end{verbatim}
+
+Two kinds of approximations are associated with the previous
+result:
+\begin{itemize}
+\item $\pi=3.1415926...$ is approximated by Scilab
+as the value returned by $4*atan(1.0)$,
+\item the $\sin$ function is approximated by a polynomial.
+\end{itemize}
+
+This article is too short to make a complete presentation
+of the computation of elementary functions. The interested
+reader may find the direct analysis of the Fdlibm library
+very instructive \cite{fdlibm}.
+The "Elementary Functions" book by Muller \cite{261217}
+is a complete reference on this subject.
+
+In Scilab, the "sin" function is directly performed by a
+Fortran source code (sci\_f\_sin.f) and no additional
+algorithm is performed by Scilab itself.
+At the compiler level, though, the "sin" function is
+provided by a library which is compiler-dependent.
+The main structure of the algorithm which computes
+"sin" is probably the following:
+
+\begin{itemize}
+\item scale the input $x$ so that it lies in a restricted
+interval,
+\item use a polynomial approximation of the local
+behaviour of "sin" in the neighbourhood of 0, with a guaranteed
+precision.
+\end{itemize}
+
+In the Fdlibm library for example, the scaling interval is
+$[-\pi/4,\pi/4]$.
+The polynomial approximation of the sine function has the general form
+
+\begin{eqnarray}
+\sin(x) &\approx& x + a_3x^3 + \ldots + a_{2n+1} x^{2n+1}\\
+&\approx& x + x^3 p(x^2)
+\end{eqnarray}
+
+In the Fdlibm library, 6 terms are used.
+
+For the inverse tangent "atan" function, which is
+used to compute an approximate value of $\pi$, the process is the same.
+All these operations are guaranteed with some precision.
+For example, suppose that the functions are guaranteed with 14 significant
+digits. That means that $17-14 = 3$ digits may be rounded in the process.
+In our current example, the value of $\sin(\pi)$ is approximated
+with 17 digits after the point as "0.00000000000000012". That means that
+2 digits have been rounded.
+
+\subsection{One more step}
+
+In fact, it is possible to reduce the number of
+significant digits of the sine function to as low as 0 significant digits.
+Mathematically, $\sin(2^n \pi) = 0$, but that is not true with
+floating point numbers. In the following Scilab session, we compute
+$\sin(2^k \pi)$ for $k=10,20,30,40,50$.
+
+\begin{verbatim}
+-->for i = 1:5
+-->k=10*i;
+-->n = 2^k;
+-->sin(n*%pi)
+-->end
+ans =
+ - 0.0000000000001254038322
+ans =
+ - 0.0000000001284135242063
+ans =
+ - 0.0000001314954487872237
+ans =
+ - 0.0001346513391512239052
+ans =
+ - 0.1374464882277985633419
+\end{verbatim}
+
+For $\sin(2^{50}\pi)$, all significant digits are lost. This computation
+may sound \emph{extreme}, but it must be noticed that it stays well inside
+the double precision range, since $2^{50} \approx 10^{15} \ll 10^{308}$.
+The solution may be to use multiple precision numbers, such as in the
+GNU Multiple Precision library.
+
+If you know a better algorithm, based on double precision only,
+which allows such values to be computed accurately, the Scilab
+team will surely be interested to hear from you!
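The loop in this appendix is easy to replay with any IEEE 754 sine. A Python sketch of the same experiment: multiplying by $2^k$ is exact in binary floating point, so the only error in the argument is the representation error of the stored $\pi$ (about $1.22 \times 10^{-16}$), magnified by $2^k$:

```python
import math

# sin(2^k * pi) is mathematically 0, but math.pi is only the closest
# double to pi; 2.0**k * math.pi scales that approximation exactly, so
# sin() faithfully returns the sine of a slightly wrong argument whose
# error grows like 2^k * 1.22e-16.
for k in (10, 20, 30, 40, 50):
    print(k, math.sin(2.0 ** k * math.pi))
# magnitudes grow with k, reaching roughly 0.137 at k = 50
```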