Services on Demand
Journal
Article
Indicators
 Cited by SciELO
 Access statistics
Related links
 Cited by Google
 Similars in SciELO
 Similars in Google
Share
Computational & Applied Mathematics
Online version ISSN 18070302
Comput. Appl. Math. vol.31 no.3 São Carlos 2012
http://dx.doi.org/10.1590/S180703022012000300005
How good are MatLab, Octave and Scilab for computational modelling?
Eliana S. de Almeida; Antonio C. Medeiros; Alejandro C. Frery
Laboratório de Computação Científica e Análise Numérica (LaCCAN). Centro de Pesquisas em Matemática Computacional (CPMAT). Universidade Federal de Alagoas, BR 104 Norte Km 97, 57072970 Maceió, AL, Brazil. Emails: acfrery@gmail.com / eliana.almeida@gmail.com / medeiros.tonny@gmail.com
ABSTRACT
In this article we test the accuracy of three platforms used in computational modelling: MatLab, Octave and Scilab, running on i386 architecture and three operating systems (Windows, Ubuntu and Mac OS). We submitted them to numerical tests using standard data sets and using the functions provided by each platform. A Monte Carlo study was conducted in some of the datasets in order to verify the stability of the results with respect to small departures from the original input. We propose a set of operations which include the computation of matrix determinants and eigenvalues, whose results are known. We also used data provided by NIST (National Institute of Standards and Technology), a protocol which includes the computation of basic univariate statistics (mean, standard deviation and firstlag correlation), linear regression and extremes of probability distributions. The assessment was made comparing the results computed by the platforms with certified values, that is, known results, computing the number of correct significant digits.
Mathematical subject classification: Primary: 06B10; Secondary: 06D05.
Key words: numerical analisys, computational platforms, spectral graph analysis, statistical computing.
1 Introduction
Mathematical modelling aims at solving complex problems which can be described in a rigorous mathematical way that enables the use of computers for finding the solution.
Many mathematical models which arise in diverse areas as engineering, bioinformatics and ecology rely on partial differential equations (PDE) or ordinary differential equations (ODE), where the high number of variables requires strong computational effort in their solution.
The final output is the result of a myriad of often disregarded intermediate computations. To illustrate this, when a finite element mesh is used to perform a structural analysis, computing matrix inversions and determinants are important commonplace operations which are rarely checked.
The search for best approximate solutions, considering some reasonablebounds of errors, imposes tight accuracy requirements on the computational platforms and its libraries or functions. When dealing with huge structures described by irregular meshes, for instance, algorithms for domain partitioning and parallel computing are often needed, and the correctness of the results is still more critical. Such partitioning algorithms are usually based on either topological or spectral methods, which assess the algebraic properties of the graph associated to the mesh [1, 16]. That is, the mesh can be associated to a dual graph, such that the vertices correspond to the finite elements and the edges represent the connectivity of the elements which share common boards. If a graph is connected, then it is shown that the second eigenvalue of its Laplacian matrix is positive [7]. The components of the second eigenvector are associated with the corresponding vertices of the graph and can be used to assign weights for partitioning of the graph.
In complex problems with many variables and values, minute errors in obtaining the eigenvalue and eigenvector or a matrix determinant, in calculating an average, a standard deviation or a correlation coefficient, can lead to erroneous decisions. Computational platforms offer libraries and functions for carrying out these calculations. When it comes to modelling large problems, with complex variables, good, or at least controlled, responses are fundamental.
Little attention has been drawn to assess these platforms under the diversity of operational systems and hardware considering the accuracy of the results. Examples of such assessments are found in [3, 4, 6, 9]. Most of these studies, usually limited to spreadsheets, follow the methodology suggested by McCullough 10, 11, 12, 14, ]: constasting results with the certified values provided by the Statistical Reference Datasets (StRD) produced by the National Institute of Standards and Technology (NIST) [15]. We add a Monte Carlo study to the protocol in some of the datasets in order to verify the stability of the results with respect to small departures from the original input. Besides statistical tests, in [8] we proposed additional tests that employ operations on matrices in order to assess the scientific platforms from this viewpoint.
In this work we test three numerical scientific platforms: Octave 3.2.4, Scilab 5.3 and Matlab R2011a, under the three well known operating systems: Windows XP Professional SP 2, Linux Ubuntu 10.4 and Mac OS X Leopard 10.5.6, whenever the former are avaliable. In all cases, i386 architecture hardware was employed, and double precision computation was enforced.
Outline. The remainder of this work is organized as follows. Section discusses how accuracy is measured. Section presents the results obtained assessing basic statistics (subsection ), probability distribution functions (subection ), linear regression (subsection ) and operations on matrices (subsection ). Section concludes the paper.
2 Measuring accuracy
Errors in computational simulations can occur and arise from diferents sources. They range from modelling errors, defined by the difference between real world and the computable model, to numerical errors introduced in the solution of the problem. The latter are (i) roundoff errors, (ii) truncation and discretization errors, or (iii) numerical instability. Usually the availability to implementation details of the algorithms is very limited and, even when available, other factors, like hardware, compiler, operational system, may compromise the software accuracy.
Considering such limitations, many authors (see, for instance, [3, 4, 6, 9]) adopt the strategy of measuring the software accuracy from the user viewpoing, that is, comparing the results provided by the software with certified values known to be correct. The certified values and datasets employed in this study are obtained in the Statistical Reference Datasets from the National Institute of Standards and Technology (NIST) [15]. We also measure the stability of the results by Monte Carlo, and propose a strategy that considers the results of operations on matrices.
In the first strategy, statistical descriptive measures are assessed: the mean, the standard deviation and the first lag coefficient of autocorrelation. Linear regression and quantiles of tail probabilities of usual distributions are also computed.
The LRE (base10 logarithm of the absolute value of the relative error) is computed as a measure of the accuracy of the functions. LRE is approximately the number of matching significant digits between the certified and obtained values:
where x is the result of evaluation function computed by the software under assessment and c is the certified value.
The following convention was adopted: when LRE(x, c) > 1 we consider only one decimal place. If 0 < LRE(x, c) < 1, it is assumed zero, that is, no correct digit was found. If the value was very far from the certified, "–" is used; the word 'Inf' is used to mean that there is a perfect match, and when the platform returns an error, it is denoted by 'NA'.
LREs were computed using the R platform (http://www.rproject.org/), whose excellent numerical properties were checked in [3].
In order to assess the stability of the results, we propose a Monte Carlo procedure for some of the datasets used to compute the precision of statistical descriptive measures. A bootstrap estimate of the LRE produced by each platform for each measure (mean, standard deviation and first lag coefficient of autocorrelation) was computed for each of the four realworld datasets plus PiDigits. One hundred independent vectors of the same size of the original ones were obtained sampling with reposition from each original data set. Certified values were computed using R. These vectors were submitted to each platform, and the resulting observed quantities were contrasted with the certified values producing 100 LREs for each quantity of interest. These last values were used to compute an estimate of the standard deviation:
where LRE denotes the "true" logarithm of the absolute value of the relative error, which was observed with the original dataset. This procedure does not belong to the original protocol, but it allows verifying how stable the resuts are with respect to relatively small perturbations of the dataset.
The second strategy deals with operations on matrices. We propose two assessments: the first is to compute the determinant of a 2×2 matrix whose certified value is zero M = 0. Consider the matrix
with arbitrary values b, s, ε.
The numerical computation of the determinant with builtin functions will guarantee that the intermediate values (bε) and (s/ε) are evaluated. The values we proposed to be assessed are b = 10^{j} and s = 10^{j}, with j ∈ 0, 1, ..., 15, and
The measure of accuracy is the result of the logical comparison of the computed value and zero in the platform under assessment. That is, the interest in such case is not the value itself but the result of comparison.
We also propose another assessment considering spectral graph theory. The interest in this proposal is that the Laplacian matrix is directly related to many properties of the graph as, for instance, connectivity [5].
Let G = (V, E) be a nondirected finite graph without loops such that V = {w_{1}, w_{2}, ..., w_{n}} is the set of vertices and E is the set of edges. Denote by deg(w_{i}) the degree of vertex w_{i}. Let D be the diagonal degree matrix with entries deg(w_{i}) and A be the adjacency matrix with elements a_{ij}, which take value 1 if there is an edge between w_{i} and w_{j}. The Laplacian matrix (G) of G is the difference between D and A, i.e., (G) = D – A. Fiedler [7] noted that:

The number of zero eigenvalues of (G) is the number of connected components in the graph;

If λ_{1}, ..., λ_{n} are the eigenvalues of (G), then 0 = λ_{1} < λ_{2} < ... < λ_{n}.

If the second smallest eigenvalue λ_{2} is greater then zero, λ_{2} > 0, then G is connected and λ_{2} is called algebraic connectivity.
Considering algebraic connectivity, we propose an accuracy assessment based on the class of complete bipartite graphs. In such graphs we have two subsets of vertices, say V_{1} and V_{2}, such that no connections exists between vertices belonging to the same subset, and each vertex from V_{1} is connected to every vertex from V_{2}. Let m and n be the cardinality of V_{1} and V_{2}, respectively. If we denote this bipartite graph by K_{m,n}, its Laplacian matrix has the following form:
For this Laplacian matrix, Bolloboas [5] showed that the eigenvalues are λ_{1} = 0, λ_{m+n} = m + n and there are n – 1 eigenvalues whose value is m, and m – 1 eigenvalues whose value is n.
In order to do the assessment, we considered two special cases: one with almost perfect balance K_{m, m}_{+1}, and other with almost the worst possible balance K_{2, 2m1}, where m ∈ {9, 99, 999}. We formed examples of three sizes of graphs: small, medium and big. The assessment is based upon the observation of seven quantities:
(i) the LRE of the smallest eigenvalue (λ_{1} = 0) denoted _{1},
(ii) the LRE of the biggest eigenvalue (λ_{m+n} = m + n) denoted _{m+n},
(iii) the LRE of the sum of the eigenvalues denoted _{S},
(iv) the minimum LRE of those eigenvalues that should take value n (there are m – 1 of them) denoted _{n},
(v) the minimum LRE of those eigenvalues that should take value m (there are n – 1 of them) denoted _{m}, and
(vi,vii) the percentage of eigenvalues which test equal to m and to n (being the correct answers n – 1 and m – 1, respectively) denoted _{N} and _{M}.
3 Results
In this section we present the results of applying the two strategies described in the previous section. The tables present the accuracy of the three programming ambients under assessment running (whenever available) under Windows ('Win'), Linux ('Lin') and Mac OS ('Mac'), in 32 and 64 bits architecture.
3.1 Basic statistics
The univariate summary statistics we assessed are the sample mean, the sample standard deviation and the sample first lag correlation of nine datasets (Table 1). These datasets are classified by NIST in three levels of numerical difficulty: low, average and high. The datasets with low difficulty are Lew, Lottery, Mavro, Michelso (these four datasets come from real world experiments), NumAcc1 and PiDigits. The average difficulty datasets are NumAcc2 and NumAcc3, while NumAcc4 is the only high difficulty dataset. The certified values were calculated using multiple precision arithmetic to obtain 500 digits answers.
The command mean is common to all platforms. The standard deviation in Octave and MatLab is computed with the command std, whereas the Scilab command is st_deviation. For computing the correlation, Scilab provides the function correl which, surprisingly and in spite of what is informed in the documentation, returns the covariance rather than the correlation; the correlation was obtained dividing this result by the product of the sample standard deviations of the subvectors. In Octave we used the command Correl(v(1:n1), v(2:n)), and in Matlab the command applied was corr(v(1:n1), v(2:n)), considering in both the vector v of size n > 3.
The values in parenthesis are the bootstrap estimates of the standard deviation of LREs, s_{LRE}. Whenever 'Inf' was observed, LRE = 16, i.e., the highest possible accuracy in double precision, was used.
3.2 Statistical functions
The distributions herein assessed are the binomial (Table 2), Poisson (Tables 3 and 4), gamma (Table 5), normal (Table 6), χ^{2} (Table 7), beta (Table 8), tStudent (Table 9) and F (Table 10).
The commands to compute all the distributions, except for the beta in Matlab and Octave, are the same: binocdf for the binomial, poisscdf for the Poisson, gamcdf for the gamma, norminv for the normal, cdfchi for the χ^{2}, tinv for the tStudent, and finv for the F law. The command beta_cdf computes beta distribuition in Octave, while Matlab provides the command betainv. Scilab provides cdfbin, cdfpoi, cdfgam, cdfnor, cdfchi, cdfbet, cdft and cdff for computing the binomial, Poisson, gamma, normal, χ^{2}, beta, tStudent and F quantiles, respectively.
3.3 Linear regression
NIST offers eleven datasets to perform linear regression analysis. The datasets are divided into numerical difficulty levels: two of low level, (Norris and Pontius), two of average level, (Noint1 and Noint2) and seven of high level. Table 11 presents the smallest LRE of each regression, and the LRE of the residual standard deviation (RSD) of each fit.
Octave and Matlab do not provide explicit functions for performing linear regression. Rather than that, linear regression is computed solving a least squares problem, and the data requires prior preparation for that. Scilab provides the function reglin to obtain the coefficients and RSD.
3.4 Results on decisions based on matrices
The command det is used by all three platforms under assessment to compute determinants. As proposed Section 2, the evaluation is based on comparing the results with the certified value zero, rather than on the numerical value itself. This is due to the fact that more often than not what users are interested upon is a decision, and not a numerical value.
Curiosily, the number of correct results of comparing with zero was the same, that is, exactly 146 for the three platforms under assessment.
The results of computing spectral graph analyses are presented in Table 12.
4 Conclusions
Regarding the computation of basic statistics, Table 1 shows that the mean poses little difficulty for the platforms, with the exception of Octave for Linux, which presented the smallest number of LRE in five of the nine datasets (LRE(x, c) < 7). Surprisingly, these five datasets offer low numerical difficulty.
When computing the standard deviation, Octave presented the bests results when comparing with others two platforms. The version tested here was better than the one tested before (see [8]). Scilab presented an unacceptable low accuracy in a single dataset, for which LRE(x, c) < 5.
As in other studies, c.f. reference [2], the firstlag sample autocorrelation is a challenging quantity to compute. None of the platforms here tested provided acceptable results. All of them computed LRE(x, c) < 5, and Scilab had the worst performance.
Scilab is also the worst platform with respect to the stability of the results, as measured by estimated standard variation of the observed LRE. As can be noted in Table 1, all conclusions about the standard deviation may be reverted with small perturbations of the original input, e.g., the best results which were produced for the Lew dataset can be turned into unacceptable by subtracting s_{LRE} from the observed LRE.
Two other cases are notorious for their instability: the sample mean and the sample standard deviation, both computed by Octave under Linux. In most of the other cases, small perturbations of the original input do not change the conclusion about the precision.
Scilab presented the best performance when dealing with the binomial and tStudent distributions, and also when computing the cumulative distribution function of the Poisson law (LRE(x, c) > 6). In this last, Octave and MatLab presented better results that their previous versions (see [8]).
When computing the F distributions, Octave produced the best results, mainly if compared with its previous version; as presented by Frery et al. [8], this platform had produced the worst answers. But Octave fails to produce acceptable results when dealing with the binomial and tStudent laws. Regarding the normal distribution, MatLab and Octave obtained the same good results, while Scilab produced bad results. The three platforms were acceptable when dealing with the gamma law, that is, in this case LRE(x, c) > 6.
Matlab and Octave failed at computing the tStudent distribution; in every assessed case, there was no match or they returned an error message. This is a serious issue due to the widely spread use of this distribution in statistical tests.
Six out of eleven linear regression datasets were not adequately dealt with by any of the considered platforms. Only Matlab provided acceptable results for Filip, Norris, Wampler1 and for Wampler2. Wampler2 was acceptably treated by Octave under Windows and Linux and Scilab under the three operational systems. Again, no single platform can be advised as safe for the linear regression problems here considered.
Suprisingly, the same results were provided by the three platforms when making decisions about the determinant of illconditioned matrices under the three operating systems. The number of erroneus result was acceptable, that is only 94 in 240 logical comparisons with the value zero. Nevertheless, users are advised to be very careful when testing equality between a value of interest and a numerical computation involving determinats in these platforms.
The assessment based on spectral graph analysis presented a very consistent behavior with respect to the problem size (the bigger the graph, the worse the answer), being _{M} and _{N} the most sensitive quantities across all platforms and operating systems, and they can be reported as good in most cases. The first and last eigenvalues (_{1} and _{m+n}) are always dependable if computed in double precision and then tested in single precision, being the latter consistently more precise than the former. The balance of bipartite connected graphs did not have a strong impact on the results, except for the percentage of correct eigenvalues.
Extreme care must be taken when making decisions about graphs based on their spectral properties. As a rule of the thumb, doubleprecision computation is advised, but the comparison to known values should be made rounding or, at most, using at most floating point representation.
Regarding the variability among operating systems, MatLab and Octave were equivalent and more consistent than Scilab in most of the situations under assessment.
The results are the same in platforms under 32 and 64 bits operating systems, so the latter were not reported in the tables.
REFERENCES
[1] M. AlNasra and D.T. Nguyen, An algorithm for domain decomposition in finite element analisys. Computers & Structures (1991). [ Links ]
[2] M. Almiron, B.L. Vieira, A.L.C. Oliveira, A.C. Medeiros and A.C. Frery, On the numerical accuracy of spreadsheets. Journal of Statistical Software, 34(4) (2010), 129. [ Links ]
[3] M.G. Almiron, E.S. Almeida and M.N. Miranda, The reliability of statistical functions in four software packages freely used in numerical computation. Brazilian Journal of Probability and Statistics, 23(2) (2009), 107119. [ Links ]
[4] M.G. Almiron, B. Lopes, A.L.C. Oliveira, A.C. Medeiros and A.C. Frery, On the numerical accuracy of spreadsheets. Journal of Statistical Software, 34(4) (2010), 129. [ Links ]
[5] B. Bollobas, Modern Graph Theory. Springer (1998). [ Links ]
[6] O.H. Bustos and A.C. Frery, Statistical functions and procedures in IDL 5.6 and 6.0. Computational Statistics & Data Analysis, 50(2), 301310, January (2006). [ Links ]
[7] M. Fiedler, Algebraic connectivity of graphs. Czechoslovak Mathematical Journal, 23(98) (1973), 298305. [ Links ]
[8] A.C. Frery, E.S. Almeida, and A.C. Medeiros, Are Octave, Scilab and Matlab reliable? In: E. Dvorkin, M. Goldschmit and M. Storti, editors, CILAMCE 2010: XXXI Congreso IberoLatinoAmericano de Métodos Computacionales en la Ingeniería, volume XXIX, pages 22932307, Buenos Aires, Argentina, 2010. Asociación Argentina de Mecánica Computacional. [ Links ]
[9] K.B. Keeling and R.J. Pavur, A comparative study of the reliability of nine statistical software packages. Computational Statistics & Data Analysis, 51(8), 38113831, May (2007). [ Links ]
[10] B.D. McCullough, Assessing the reliability of statistical software: Part I. American Statistician, 52(4), 358366, November (1998). [ Links ]
[11] B.D. McCullough and B. Wilson, On the accuracy of statistical procedures in Microsoft Excel 97. Computational Statistics & Data Analysis, 31(1), 2737, July (1999). [ Links ]
[12] B.D. McCullough and B. Wilson, On the accuracy of statistical procedures in Microsoft Excel 2000 and Excel XP. Computational Statistics & Data Analysis, 40(4), 713721, October (2002). [ Links ]
[13] B.D. McCullough and B. Wilson, On the accuracy of statistical procedures in Microsoft Excel 2003. Computational Statistics & Data Analysis, 49(4), 12441252, June (2005). [ Links ]
[14] B.D. McCulluogh, The accurary of Mathematica 4 as a statistical package. Computational Statistics, 15(2) (2000), 279299. [ Links ]
[15] National Institute of Standards and Technology, Statistical reference datasets, 2010. URL http://www.itl.nist.gov/div898/strd/general/dataarchive.html. [ Links ]
[16] H.D Simon, Partitioning of unstructured problems for parallel processing. Computing Systems in Engeneering (1991). [ Links ]
Received: 22/IX/12.
Accepted: 28/IX/12.
#CAM699/12.
The authors received support from CNPq, Capes and Fapeal.