summaryrefslogtreecommitdiffstats
path: root/scilab_doc/statisticswithscilab/introduction.tex
blob: 0da036f31722a6df5e98722186e42055ec4bb0e0 (plain)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
\chapter{Introduction}

As stated in \cite{scilabstats2001}, Scilab's core provides
a complete set of features related to simulation and statistical
computations.
Indeed, Scilab provides uniform pseudo-random number generators, functions to 
compute the moments of a distribution and a complete set of 
distributions. In this document, we will make a complete 
overview on these features.

It must be noticed, though, that these features
are not as complete as in other languages, like R for example.
This is why several toolboxes have developped in order to extend 
the features of Scilab. In this document, we will present two
major toolboxes, that is the Sci\_R toolbox and the Stix toolbox.

In the last chapter, we will analyse the missing statistical 
features and will analyse how these features are available in other
tools, such as Matlab, R, or Octave.

\section{A sample session}

A good introduction on the statistical features of Scilab is \cite{scilabintro2007}.
In the remaining of this introduction chapter, we will try to have 
a flavour of how to perform statistical computations with Scilab.
We focus on the algorithms which are used inside Scilab, to show what 
exact algorithms perform the computations.

As a first example, we will generate a sequence of numbers from a 
normal law with mean 0 and standard deviation 1 (example inspired and simplified 
from \cite{scilabintro2007}). The probability density function (pdf) 
and the cumulated probability density function of the normal law is 
\begin{eqnarray}
f(x) &=& \frac{1}{\sqrt{2\pi}} e^{-\frac{t^2}{2}},\\
P(x) &=& \frac{1}{\sqrt{2\pi}} \int_{-\infty}^x e^{-\frac{t^2}{2}}.
\end{eqnarray}

The empirical cumulated density function \cite{artcomputerKnuthVol2} of a given
set of data $\{x_i\}_{i=1,N}$ is given by 
\begin{eqnarray}
F_N(x) &=& \frac{\textrm{number of } x_1,x_2,\ldots,x_n \textrm{ that are }\leq x}{N}.
\end{eqnarray}


The numerical method used by Scilab to generate such numbers is the Polar 
method for normal deviates, as presented in \cite{artcomputerKnuthVol2}.

\lstset{language=Scilab}
\lstset{numbers=left}
\lstset{basicstyle=\footnotesize}
\lstset{keywordstyle=\color{green}\bfseries}
\begin{lstlisting}
N=200;
x = rand(1,N,"normal");
Xsorted =gsort(x,"g","i"); 
Ydata = (1:N)/N;
plot(Xsorted,Ydata);
e=gce();
e.children.polyline_style=2;
xtitle("Empirical Cumulated Density Function of Normal Law with 200 samples")
filename = "introduction_ecdfnormal.png";
xs2png(0,filename);
\end{lstlisting}

The empirical cumulated density function 
is presented in figure \ref{introduction_ecdfnormal}.

\begin{figure}[htbp]
\begin{center}
\includegraphics[height=10cm]{introduction_ecdfnormal.png}
\end{center}
\caption{Empirical Cumulated Density Function of Normal Law with 200 samples}
\label{introduction_ecdfnormal}
\end{figure}

To compare the data which is produced by rand with the 
cumulated density function of the normal law, we use the 
\emph{cdfnor} primitive. This primitive is based on \cite{Algorithm715}
and uses rational functions that theoretically approximate the normal 
distribution function to at least 18 significant decimal digits. The same 
primitive can be used to compute the inverse of the cumulated density 
function. In that case, rational functions are used as starting values to 
Newton's Iterations which compute the inverse standard normal.

\lstset{language=Scilab}
\lstset{numbers=left}
\lstset{basicstyle=\footnotesize}
\lstset{keywordstyle=\color{green}\bfseries}
\begin{lstlisting}
N=200;
x = rand(1,N,"normal");
Xsorted =gsort(x,"g","i"); 
Ydata = (1:N)/N;
x=linspace(-3,3,100);
P=cdfnor("PQ",x,zeros(x),ones(x));
plot(Xsorted,Ydata,x,P);
\end{lstlisting}

The comparison plot between the empirical cdf and the 
computed cdf is presented in figure \ref{introduction_ecdcomparison}.

\begin{figure}[htbp]
\begin{center}
\includegraphics[height=10cm]{introduction_ecdfcomparison.png}
\end{center}
\caption{Cumulated Density Function of Normal Law : comparison of cdf from 
rational functions and empirical cdf from Polar method }
\label{introduction_ecdcomparison}
\end{figure}

The moments of a distribution can be computed with the 
\emph{mean}, \emph{variance} and \emph{stdev} Scilab macros,
which are implementations of the moments. For the variance
and standard deviation, the scaling factor is $N-1$.
In the following script, one computes these moment for 
an increasing number of samples, from $10^1$ to $10^5$.

\lstset{language=Scilab}
\lstset{numbers=left}
\lstset{basicstyle=\footnotesize}
\lstset{keywordstyle=\color{green}\bfseries}
\begin{lstlisting}
nbpoints = 5;
means=zeros(nbpoints,1);
vars=zeros(nbpoints,1);
stdevs=zeros(nbpoints,1);
nlist = 1:nbpoints;
for i = nlist
  N=10^i;
  x = rand(1,N,"normal");
  means(i) = mean(x);
  vars(i) = variance(x);
  stdevs(i) = stdev(x);
end
plot(nlist,[means,vars,stdevs]);
\end{lstlisting}

The convergence plot of the moments is presented in 
figure \ref{introduction_convergencemoments}.

\begin{figure}[htbp]
\begin{center}
\includegraphics[height=10cm]{introduction_convergencemoments.png}
\end{center}
\caption{Convergence of the moments of the normal law}
\label{introduction_convergencemoments}
\end{figure}