
The Impact of Compilation Flags and Choosing Single- or Double-Precision Variables in Linear Systems Solvers

ABSTRACT

This paper shows the impact of compiler optimization flags and of the variables' precision on direct methods for solving linear systems. The six chosen methods are simple direct methods, so this work can serve as an introductory study for new researchers in the field. The methods are LU Decomposition, LDU Decomposition, Gaussian Elimination, Gauss-Jordan Elimination, Cholesky Decomposition, and QR Decomposition using the Gram-Schmidt orthogonalization process. Our study showed a large difference in execution time between single and double precision in all methods, while the error observed with single precision was not as high. Moreover, the best flags for these methods were ‘-O3’ and ‘-Ofast’.

Keywords:
linear systems; compiler flags; optimization; variable precision

1 INTRODUCTION

Many areas of science want to find a fast and correct solution using computational models. At some point, most of the created models are turned into a system of linear equations. Such a system can be described as a set of m linear equations in n unknowns, in which each equation has the form $a_{i1}x_1 + a_{i2}x_2 + \cdots + a_{in}x_n = b_i$. All the coefficients can be stored in a matrix A and all constant terms in a column vector b. Both A and b can be stored in a single matrix M, called the "augmented matrix"; considering m = n, it is described in (1.1).

$$M = \begin{bmatrix} a_{1,1} & a_{1,2} & \cdots & a_{1,n} & b_1 \\ a_{2,1} & a_{2,2} & \cdots & a_{2,n} & b_2 \\ \vdots & \vdots & \ddots & \vdots & \vdots \\ a_{n,1} & a_{n,2} & \cdots & a_{n,n} & b_n \end{bmatrix} \qquad (1.1)$$

In this paper, we focus on six direct methods: Gaussian Elimination [14, 17], Gauss-Jordan Elimination [17], LU Decomposition [10], LDU Decomposition [10], Cholesky Decomposition [17], and QR Decomposition using the Gram-Schmidt process [10]. All these methods are classified as direct methods and they find the solution in a finite number of steps [6]. More precisely, in Table 1 we present the quantity of floating-point operations (FLOPs) required by each method, as found in [8, 11, 15, 16].

Table 1:
Quantity of floating-point operations needed by each method.

Although these methods were proposed to handle different kinds of matrices, this paper focuses on measuring the difference between single-precision and double-precision variables on a specific type of matrix, as a basic study for beginning researchers. The variable precision defines how many digits can be stored for each floating-point number. Usually, a single-precision variable uses one 32-bit word and a double-precision one uses two 32-bit words [12], which makes the computer take more time to fetch it from memory.
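These properties can be checked directly in C through the constants in <float.h>; the small program below merely prints them and is independent of the solvers discussed in this paper.

    #include <stdio.h>
    #include <float.h>

    int main(void) {
        /* On typical IEEE-754 platforms: float is 32 bits, double is 64 bits. */
        printf("float : %zu bytes, ~%d decimal digits, epsilon = %e\n",
               sizeof(float), FLT_DIG, (double)FLT_EPSILON);
        printf("double: %zu bytes, ~%d decimal digits, epsilon = %e\n",
               sizeof(double), DBL_DIG, DBL_EPSILON);
        return 0;
    }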

Besides, some numbers with a finite representation in the decimal numerical system have a non-terminating (repeating) representation in the binary numerical system used by computers, so they cannot be stored exactly. These representation errors, together with the errors propagated by arithmetic operations, can lead to large numerical errors in the result. These are some of the motivations of the present work, in which we compare the quality of the results against the execution time to see which precision is the better choice. As our objective is to present a study for beginning researchers, we focus on executing the methods on common CPUs, in contrast to recent research on FPGAs [4].
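A classical example is 0.1, which terminates in decimal but repeats in binary, so both precisions store only an approximation of it; the values in the comments below are the usual IEEE-754 roundings.

    #include <stdio.h>

    int main(void) {
        float  f = 0.1f;  /* nearest representable 32-bit value to 0.1 */
        double d = 0.1;   /* nearest representable 64-bit value to 0.1 */
        printf("float : %.20f\n", f);  /* typically 0.10000000149011611938 */
        printf("double: %.20f\n", d);  /* typically 0.10000000000000000555 */
        return 0;
    }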

The outline of this work is as follows. Section 2 summarizes all options of compilers and optimization flags used here. Numerical experiments are reported in Section 3. Finally, some conclusions are given in Section 4.

2 PROGRAMMING LANGUAGE AND COMPILERS USED

In this paper, all methods were implemented in the C language, as it has static typing and allows memory to be manipulated at run time with dynamic allocation [3]. This language has two floating-point types, named float and double, with single and double precision, respectively.

The C language is a compiled language, which means that our code has to be analyzed and translated into binary code by a compiler [3]. In [5], two compilers for the C language and one for C and C++ are compared; the authors concluded that GCC, the GNU Compiler Collection (found at https://gcc.gnu.org/), was by far a better compiler than Microsoft Visual Studio. Therefore, two versions of GCC are used here: version 5.4.0 (https://gcc.gnu.org/onlinedocs/5.4.0/) and version 7.1.0 (https://gcc.gnu.org/onlinedocs/7.1.0/). The former was released in June 2016 and the latter in May 2017.

2.1 Compiler options used

The GCC compiler lets us choose options to optimize the compilation in many ways. These options are called compiler flags, and there are eight of them for optimization: ‘-O0’, ‘-O’, ‘-O1’, ‘-O2’, ‘-O3’, ‘-Ofast’, ‘-Og’ and ‘-Os’. Each one corresponds to an optimization level, meaning that each level applies all the optimizations of the previous level plus new ones. In [9], an algorithm is proposed to automatically choose the optimization options for a program; however, it is an iterative algorithm that increases the compilation time, and the quality of the results is not shown. Below we describe each flag, for each version of GCC. All flags are explained in detail in [1, 2].

2.2 Compiler flags for GCC version 5.4.0

The default optimization flag is ‘-O0’, which reduces compilation time and lets debugging show the expected results. The next level of optimization consists of the flags ‘-O’ and ‘-O1’, which perform the same optimizations. At this level, the compiler tries to reduce code size and execution time, but without time-consuming optimization options.

Continuing through the optimization levels, the ‘-O2’ flag turns on almost all optimization options that do not involve a space-speed trade-off. One example of optimization performed at the ‘-O2’ level is aligning (in memory) the beginning of a function to a power of 2 greater than a given, usually machine-dependent, number. The next level of optimization is the ‘-O3’ flag. At this level, there are optimizations that increase the code size in order to decrease the execution time. For example, the compiler may copy the body of a function to the places where it finds calls to that function, i.e., function inlining.
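As a small illustration (not taken from the paper), a call such as the one below is a typical candidate for inlining at ‘-O3’: the compiler may replace the call to square by the multiplication itself, removing the call overhead at the cost of possibly larger code. Whether inlining actually happens depends on the compiler's heuristics.

    /* A tiny function that optimizers commonly inline at higher levels. */
    static double square(double x) {
        return x * x;
    }

    double sum_of_squares(const double *v, int n) {
        double s = 0.0;
        for (int i = 0; i < n; i++)
            s += square(v[i]);  /* at -O3 this call is typically expanded in place */
        return s;
    }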

The last level of optimization is the ‘-Ofast’ flag. Besides enabling all the optimizations of the ‘-O3’ flag, ‘-Ofast’ performs some optimizations that are not valid for all standards-compliant programs. For example, it turns on mathematical optimizations that can lead to incorrect results in programs that depend on the standard rules for math functions.

The last two flags, ‘-Os’ and ‘-Og’, are focused on optimizing the code size and the debugging experience, respectively. The ‘-Os’ flag turns on all ‘-O2’ optimizations that do not increase the final code size, plus some other optimizations specific to decreasing the size of the binary. The ‘-Og’ flag, in turn, turns on only the optimizations that do not interfere with the debugging process.

2.3 Compiler flags for GCC version 7.1.0

All the flags in version 7.1.0 include the same optimization options as in version 5.4.0, plus some new ones. At the ‘-O’ (or ‘-O1’) level, one new option is to reorder blocks of instructions to avoid branches and improve code locality.

The ‘-O2’ level keeps the old optimizations and adds new ones. In this version, the compiler searches for stores to memory that are smaller than a memory word and tries to merge them into a single store. At the ‘-O3’ level, one of the new optimizations is that the compiler tries to unroll loops that take few iterations, to decrease the time spent controlling the program flow.

The other three flags, ‘-Ofast’, ‘-Os’ and ‘-Og’, do not have any new optimizations compared to the same flags in GCC version 5.4.0.

3 NUMERICAL TESTS AND RESULTS

In all experiments, the matrix A was created as $A = CC^T + nI$, where C is a square matrix of order n filled with random numbers between 0 and 999. We forced the linear system to have the trivial solution, that is, $b = A(1, 1, \dots, 1)^T$, making it easier to compare the results obtained. This matrix A is well conditioned, since our objective is to compare the compilation flags in each method; some background on error propagation and linear system sensitivity can be found in [7]. The machine used in the experiments has an Intel Core i7 processor of 2.80 GHz with 4 cores, 8 GB of memory and 8 MB of cache memory, running Ubuntu 16.04. Our source code is available at https://github.com/rafaelaBrum/direct-methods.
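A sketch of how such a system can be generated is shown below; it is only illustrative (the authors' generator is in the repository cited above) and assumes that A and b are already allocated and that the matrices are stored as arrays of rows.

    #include <stdlib.h>

    /* Build A = C*C^T + n*I with C filled by random integers in [0, 999],
     * and b = A*(1,...,1)^T, so the exact solution is the all-ones vector. */
    void build_system(double **A, double *b, int n) {
        double **C = malloc(n * sizeof *C);
        for (int i = 0; i < n; i++) {
            C[i] = malloc(n * sizeof **C);
            for (int j = 0; j < n; j++)
                C[i][j] = (double)(rand() % 1000);
        }
        for (int i = 0; i < n; i++) {
            for (int j = 0; j < n; j++) {
                double s = (i == j) ? (double)n : 0.0;  /* the n*I term */
                for (int k = 0; k < n; k++)
                    s += C[i][k] * C[j][k];             /* (C*C^T)_{ij} */
                A[i][j] = s;
            }
            b[i] = 0.0;
            for (int j = 0; j < n; j++)
                b[i] += A[i][j];                        /* b = A * (1,...,1)^T */
        }
        for (int i = 0; i < n; i++) free(C[i]);
        free(C);
    }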

3.1 Execution time considering all flags

Our first experiment was to execute all six methods with all flags to see the impact of each one, using a linear system with 5000 unknowns. For the time measurements, we executed each combination of flag and method at least five times to obtain an average with a standard deviation of less than 1 s. Table 2 shows the average execution time of LU Decomposition in GCC versions 5.4.0 and 7.1.0. Comparing both versions, we can see that the more recent version of GCC does not have faster execution times than the version from 2016. One possible reason is that GCC version 5.4.0 ships with the Ubuntu release used in the tests and may be tuned specifically for this version of the OS, whereas GCC version 7.1.0 had to be installed by ourselves. Another observation is that the fastest flag for LU Decomposition was ‘-O3’. When the ‘-Ofast’ flag was used, the execution time in the newer version increased compared to ‘-O3’, which was somewhat unexpected, since with ‘-Ofast’ the compiler lets the program skip some checks in the mathematical operations.
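The paper does not show its timing instrumentation; the snippet below is one common way, on Linux, to measure the wall-clock time of a single run, around which the five or more repetitions mentioned above would be averaged. Here solve is only a placeholder for any of the methods.

    #define _POSIX_C_SOURCE 199309L
    #include <time.h>

    /* Wall-clock time, in seconds, of one call to a solver routine. */
    double time_one_run(void (*solve)(void)) {
        struct timespec t0, t1;
        clock_gettime(CLOCK_MONOTONIC, &t0);
        solve();                               /* the method being measured */
        clock_gettime(CLOCK_MONOTONIC, &t1);
        return (t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) / 1e9;
    }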

Table 2:
Execution times of LU Decomposition - matrix with 5000 unknowns.

Table 3 shows the average execution time of LDU Decomposition in GCC versions 5.4.0 and 7.1.0. From this table we observe that the fastest flag for LDU Decomposition in both GCC versions is ‘-O3’. We can also observe that the ‘-Os’ and ‘-Ofast’ flags have the most meaningful time variations between versions, due to the different optimizations made in each version of the compiler.

Table 3:
Execution times of LDU Decomposition - matrix with 5000 unknowns.

Comparing the times of LU and LDU Decomposition, we can see that they are very similar, as the algorithms are almost the same, LDU only having one more step in the final decomposition.
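In equation form, that extra step only factors the diagonal out of the upper-triangular matrix U obtained by the LU Decomposition:

$$A = LU = L\,D\,(D^{-1}U), \qquad D = \operatorname{diag}(u_{11}, u_{22}, \dots, u_{nn}),$$

where $D^{-1}U$ is unit upper triangular; dividing each row of U by its pivot is the only additional work of the LDU Decomposition.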

Table 4 shows the average execution time of Gaussian Elimination in both versions of GCC, 5.4.0 and 7.1.0. For Gaussian Elimination, the fastest flag is neither ‘-O3’ nor ‘-Ofast’, but ‘-Og’, the optimization level focused on debugging, which has the same time as the first level of optimization. Since the ‘-O3’ flag turns on optimizations that increase the code size, for example by inlining functions, it may also increase the number of variables used as well as the memory accesses. Finally, the ‘-Os’ flag, the optimization level focused on not increasing the final code size, has a poor execution time for Gaussian Elimination. The main part of this algorithm is a big loop, a sketch of which is given below; since optimizing a loop usually requires increasing the size of the code, it is expected that the ‘-Os’ level takes more time to finish than the other optimization levels.
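The sketch below shows the loop structure of Gaussian elimination (without pivoting) over the augmented matrix of (1.1), followed by back substitution. It is only an illustration of the general idea, not the authors' implementation (which is available in the repository cited at the beginning of this section), and it assumes the matrix is stored as n rows with n+1 columns and that no zero pivot appears.

    /* Gaussian elimination without pivoting on an augmented matrix M
     * (n rows, n+1 columns; column n holds b). Solution is written to x.
     * Illustrative sketch: assumes no zero pivot is encountered. */
    void gaussian_elimination(double **M, double *x, int n) {
        for (int k = 0; k < n - 1; k++)            /* forward elimination */
            for (int i = k + 1; i < n; i++) {
                double factor = M[i][k] / M[k][k];
                for (int j = k; j <= n; j++)
                    M[i][j] -= factor * M[k][j];
            }
        for (int i = n - 1; i >= 0; i--) {         /* back substitution */
            double sum = M[i][n];
            for (int j = i + 1; j < n; j++)
                sum -= M[i][j] * x[j];
            x[i] = sum / M[i][i];
        }
    }

Replacing every double by float gives the single-precision version whose times and errors are compared throughout this section.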

Table 4:
Execution times of Gaussian Elimination - matrix with 5000 unknowns.

Table 5 shows the average execution time of Gauss-Jordan Elimination in GCC versions 5.4.0 and 7.1.0. These results show that this algorithm cannot be optimized as much as Gaussian Elimination, for example. We observe a decrease in the execution time of Gauss-Jordan Elimination at the first level of optimization, the ‘-O’ flag, compared to the default level (‘-O0’), but we do not see any further decrease at the other levels, as the Gauss-Jordan algorithm is highly sequential and full of data dependencies.

Table 5:
Execution times of Gauss-Jordan Elimination - matrix with 5000 unknowns.

Comparing Gaussian Elimination to Gauss-Jordan Elimination, we can see that Gaussian Elimination is a better option, as it consumes less time and can be optimized a little more. From the no-optimization level (‘-O0’) to the first level of optimization (‘-O’), Gaussian Elimination reduces its execution time by a factor of 3, while Gauss-Jordan Elimination reduces its time by a factor of 2.35.

Table 6 shows the average execution time of Cholesky Decomposition in GCC versions 5.4.0 and 7.1.0. This method is the one with the second largest gain from the optimization levels. At the first level, the execution time is reduced to a third of the original time. With ‘-Ofast’, the fastest optimization level for Cholesky Decomposition, it is reduced to a sixth of the original time.

Table 6:
Execution times of Cholesky Decomposition - matrix with 5000 unknowns.

Table 7 shows the average execution time of QR Decomposition using the Gram-Schmidt process in both versions of GCC, 5.4.0 and 7.1.0. The QR Decomposition method is the one that takes the most time, as it is the most complex algorithm. It has to sweep the matrix three times: once to create the orthogonal matrix Q; a second time to multiply $Q^T$ by b; and a last time to find the values of the unknowns. As it uses many arithmetic operations throughout the algorithm [10], this method is the one that gains the most from the optimization process. With the ‘-Ofast’ flag, it reduces the execution time by a factor of 12.
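In equation form, the three sweeps correspond to computing the factorization and then solving a triangular system:

$$A = QR, \qquad Ax = b \;\Longrightarrow\; Rx = Q^{T}b,$$

where Q has orthonormal columns produced by the Gram-Schmidt process and R is upper triangular, so the last step is a back substitution.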

Table 7:
Execution times of QR Decomposition - matrix with 5000 unknowns.

In summary, the experiments in this section show that already at the first level of optimization, the ‘-O’ flag, the execution time of every method drops to half or less of the time with no optimization (flag ‘-O0’). We also observe similar execution times for the flags ‘-O’ and ‘-O1’, as they represent the same optimization level.

3.2 Numerical solution considering all flags

The next test concerned the numerical results obtained with each flag. For this test, we used the same linear system as in the previous test, took the computed solutions and calculated the absolute error of the obtained results.

Although the average execution times differed between the two versions of GCC, the numerical results were the same. Hence, we show the Euclidean norm of the errors only for GCC version 5.4.0.
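Since the systems were built so that the exact solution is the all-ones vector, the reported error norm can be obtained as sketched below, where x is assumed to hold the computed solution (this helper is illustrative, not taken from the authors' code; linking with -lm is needed for sqrt).

    #include <math.h>

    /* Euclidean norm of the error x - (1,...,1)^T of a computed solution x. */
    double error_norm(const double *x, int n) {
        double sum = 0.0;
        for (int i = 0; i < n; i++) {
            double e = x[i] - 1.0;
            sum += e * e;
        }
        return sqrt(sum);
    }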

Table 8 shows the Euclidean norm of the errors of the results obtained through LU Decomposition. We can see in this table that no flag changed the precision of the results. So, for LU Decomposition, we can choose the best optimization level looking only at the execution times in Table 2. That said, the best flag for LU Decomposition is ‘-Ofast’ for GCC version 5.4.0 and ‘-O3’ for GCC version 7.1.0.

Table 8:
Absolute error of results from LU Decomposition - matrix with 5000 unknowns.

Besides, the double-precision variables have an absolute error of the order of $10^{-13}$, while the single-precision ones have an error of the order of $10^{-4}$. The latter may not be acceptable in some applications.

For LDU Decomposition, the results in both precisions were the same as in LU Decomposition. Then, the decision on the best flag is made again by looking at the execution times in Table 3. For the 2016 version of GCC, the best flags are ‘-O3’ and ‘-Ofast’, as they have almost the same execution time; for the other version, the best flag is ‘-O3’. As the absolute error is the same as in LU Decomposition, the single-precision result, of the order of $10^{-4}$, may again not be acceptable in some applications.

The next table (Table 9) shows the Euclidean norm of the errors of the results obtained through Gaussian Elimination.

Table 9:
Absolute error of results from Gaussian Elimination - matrix with 5000 unknowns.

As we can see in this table, some flags could not compute the final result, due to the number of operations needed to find the unknowns; this probably caused an overflow (or underflow), which prints ‘-NaN’ as a result. That said, the only four flags that can be considered the best ones for Gaussian Elimination are ‘-O2’, ‘-O3’, ‘-Ofast’ and ‘-Os’. Looking at the execution times in Table 4, we conclude that any of these four flags can be the best, as they all have similar execution times.

Besides, concerning precision, single precision again has an absolute error of the order of $10^{-4}$, which can be inadequate for some applications. So, the double-precision variables, with an error of the order of $10^{-13}$, would be the best choice.

Table 10 shows the Euclidean norm of the errors of Gauss-Jordan Elimination, and we can see a behaviour similar to that of Gaussian Elimination (Table 9). Since Gauss-Jordan Elimination transforms the matrix M (shown in (1.1)) into its canonical (reduced) form, the step of finding the unknowns does not involve any multiplication or division. This explains why with Gauss-Jordan Elimination we can get a result for all flags, while with Gaussian Elimination we cannot with the double-precision variables. The best flags concerning precision are, however, the same as for Gaussian Elimination: ‘-O2’, ‘-O3’, ‘-Ofast’ and ‘-Os’. Looking at the execution times (Table 5), we conclude that all four flags can be used to optimize this method, as they have similar execution times.
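Schematically, the elimination takes the augmented matrix to its canonical (reduced row echelon) form,

$$[\,A \mid b\,] \;\longrightarrow\; [\,I_n \mid x\,],$$

so the solution is read directly from the last column, with no further multiplications or divisions.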

Table 10:
Absolute error of results from Gauss-Jordan Elimination - matrix with 5000 unknowns.

With the four flags mentioned above, the single precision in this method has a Euclidean error norm of the order of $10^{-4}$, which may not be acceptable in some applications. Then, the best option for Gauss-Jordan Elimination is double precision. We can also observe that, with the other four flags, the norms for the double-precision variables are large and similar to each other, because the input data contains large numbers and, in the multiplications and divisions, the computer has to truncate or round rather often.

For the Cholesky Decomposition, we observed that all flags produced results with the same Euclidean error norm: $1.50 \times 10^{-4}$ for single precision and $3.55 \times 10^{-15}$ for double precision. So, we can choose the best flags looking only at the execution times (Table 6). The best flag for Cholesky Decomposition is ‘-Ofast’ for single-precision variables and ‘-O3’ for double precision.

As for the QR Decomposition, the Euclidean error norms observed were $2.12 \times 10^{-4}$ for single precision and $1.18 \times 10^{-13}$ for double precision for almost all flags, except ‘-Ofast’ ($1.47 \times 10^{-4}$ for single precision and $1.02 \times 10^{-13}$ for double precision). This method is the one with the largest error norm, due to the number of multiplications and divisions performed, which leads to many truncation and rounding errors, as discussed in [13].

Most flags do not change the error norm of the QR Decomposition, except ‘-Ofast’. With this in mind and looking at the execution times in Table 7, we conclude that the ‘-Ofast’ flag is the best optimization choice for the QR Decomposition method.

To conclude these tests, we can see that for most of the methods the ‘-O3’ or the ‘-Ofast’ flag is the best option for optimization. That said, we move to our final test, with bigger matrices.

3.3 Comparison between double-and single-precision

Our next test was executed with the ‘-O3’ and ‘-Ofast’ flags and bigger matrices, with 5000, 10000, 15000 and 20000 unknowns. All the systems were created with the methodology shown at the beginning of Section 3. These tests were made in both versions of GCC, 5.4.0 and 7.1.0. Each combination of flag and method was executed at least 10 times, to obtain a standard deviation of less than 5%.

Figure 1 shows the average execution time of LU Decomposition for GCC versions 5.4.0 and 7.1.0. As the input data increases in size, the absolute difference between single and double precision increases, but not linearly. Table 11 shows how much slower the double-precision execution is than the single-precision one. We computed this relation as $E_d / E_s$, where $E_d$ is the execution time with double-precision variables and $E_s$ the one with single-precision variables.

Figure 1:
Average times of LU Decomposition.

Table 11:
Relation between double- and single-precision times of LU Decomposition.

For the ‘-O3’ flag, for example, Table 11 shows that with 5000 unknowns the double-precision time is 1.706 times the single-precision time in GCC version 5.4.0 and 1.546 times in version 7.1.0; with 10000 unknowns, it is 1.832 times in the former version and 1.864 times in the latter. For 15000 unknowns, the double-precision time is 1.930 times the single-precision time in the older version and 1.862 times in the newer one; for 20000 unknowns, it is 1.813 times in GCC version 5.4.0 and 1.977 times in GCC version 7.1.0.

Figure 2 shows the average execution time of LDU Decomposition for GCC versions 5.4.0 and 7.1.0. As the input data increases in size, the absolute difference between single and double precision increases, but again not linearly. In the next table (Table 12) we can see how much slower the double-precision execution is than the single-precision one.

Figure 2:
Average times of LDU Decomposition.

Table 12:
Relation between double- and single-precision times of LDU Decomposition.

For the ‘-Ofast’ flag, for example, Table 12 shows that with 5000 unknowns the double-precision time is 1.675 times the single-precision time in GCC version 5.4.0 and 2.168 times in version 7.1.0; with 10000 unknowns, it is 1.628 times in the former version and 2.271 times in the latter. For 15000 unknowns, the double-precision time is 1.517 times the single-precision time in the older version and 2.370 times in the newer one; for 20000 unknowns, it is 1.914 times in GCC version 5.4.0 and 2.003 times in GCC version 7.1.0. In GCC version 7.1.0, the double-precision code for LDU Decomposition takes more than twice as long to finish as the single-precision one.

The graphics in Figure 3 show the average execution time of Gaussian Elimination for GCC versions 5.4.0 and 7.1.0. As the input data increases in size, the absolute difference between single and double precision increases linearly. In the next table (Table 13) we can see how much slower the double-precision execution is than the single-precision one.

Figure 3:
Average times of Gaussian Elimination.

Table 13:
Relation between double- and single-precision times of Gaussian Elimination.

For the ‘-O3’ flag, for example, Table 13 shows that with 5000 unknowns the double-precision time is 1.546 times the single-precision time in GCC version 5.4.0 and 1.542 times in version 7.1.0; with 10000 unknowns, it is 1.557 times in the former version and 1.559 times in the latter. For 15000 unknowns, the double-precision time is 1.561 times the single-precision time in both versions; for 20000 unknowns, it is 1.558 times in GCC version 5.4.0 and 1.562 times in GCC version 7.1.0. As we can see in Table 13, all double-precision times are around 1.5 times longer than the single-precision ones.

The graphics in Figure 4 show the average execution time of Gauss-Jordan Elimination for GCC versions 5.4.0 and 7.1.0. As the input data increases in size, the absolute difference between single and double precision again increases linearly. In the next table (Table 14) we can see how much slower the double-precision execution is than the single-precision one.

Figure 4:
Average times of Gauss-Jordan Elimination.

Table 14:
Relation between double- and single-precision times of Gauss-Jordan Elimination.

For the ‘-Ofast’ flag, for example, Table 14 shows that with 5000 unknowns the double-precision time is 1.372 times the single-precision time in GCC version 5.4.0 and 1.371 times in version 7.1.0; with 10000 unknowns, it is 1.369 times in the former version and 1.382 times in the latter. For 15000 unknowns, the double-precision time is 1.390 times the single-precision time in the older version and 1.389 times in the newer one; for 20000 unknowns, it is 1.398 times in GCC version 5.4.0 and 1.376 times in GCC version 7.1.0.

Figure 5 shows the average execution time of Cholesky Decomposition for GCC versions 5.4.0 and 7.1.0. As the input data increases in size, the absolute difference between single and double precision increases, but not linearly. In the next table (Table 15) we can see how much slower the double-precision execution is than the single-precision one.

Figure 5:
Average times of Cholesky Decomposition.

Table 15:
Relation between double- and single-precision times of Cholesky Decomposition.

For the ‘-O3’ flag, for example, Table 15 shows that with 5000 unknowns the double-precision time is only 1.123 times the single-precision time in GCC version 5.4.0 and 1.104 times in version 7.1.0; with 10000 unknowns, it is 1.143 times in the former version and 1.156 times in the latter. For 15000 unknowns, the double-precision time is 1.191 times the single-precision time in the older version and 1.192 times in the newer one; for 20000 unknowns, it is 1.215 times in GCC version 5.4.0 and 1.201 times in GCC version 7.1.0.

For the Cholesky Decomposition, there is a large difference in the relation shown in Table 15 between the ‘-O3’ and ‘-Ofast’ flags. With the ‘-O3’ flag, both precisions have similar times; in contrast, with ‘-Ofast’ the double-precision times are around twice the single-precision times.

Figure 6 shows the average execution time of QR Decomposition for GCC versions 5.4.0 and 7.1.0. The QR Decomposition is the most expensive of all the methods; since its cost grows as O(n^3), the time for 10000 unknowns is roughly eight times the time for 5000 unknowns. For that reason, the other two input sizes were not computed. In the next table (Table 16) we can see how much slower the double-precision execution is than the single-precision one.

Figure 6:
Average times of QR Decomposition.

Table 16:
Relation between double- and single-precision times of QR Decomposition.

For the ‘-Ofast’ flag, for example, Table 16 shows that with 5000 unknowns the double-precision time is 1.831 times the single-precision time in GCC version 5.4.0 and 1.942 times in version 7.1.0; with 10000 unknowns, it is 1.748 times in the former version and 2.022 times in the latter.

Similarly to the Cholesky Decomposition, the QR Decomposition shows a large difference in this relation between the two flags studied. For the ‘-O3’ flag, both precisions have similar execution times; for the ‘-Ofast’ flag, however, the double-precision execution time is almost twice the single-precision one.

4 CONCLUSIONS

Our computational experiments show considerable differences in the execution times of some methods with double-precision and single-precision variables. Besides, we searched for the best GCC optimization flag for each of six different methods. The methods studied were: LU Decomposition, LDU Decomposition, Gaussian Elimination, Gauss-Jordan Elimination, Cholesky Decomposition and QR Decomposition using the Gram-Schmidt process. We showed execution times for all these methods with eight optimization flags of the GCC compiler: ‘-O0’, ‘-O’, ‘-O1’, ‘-O2’, ‘-O3’, ‘-Ofast’, ‘-Og’ and ‘-Os’. The experiments were compiled with two versions of GCC: version 5.4.0, released in June 2016, and version 7.1.0, released in May 2017.

Our first conclusion is that the best optimization flags for the studied methods were ‘-O3’ or ‘-Ofast’, depending on the method. The ‘-O3’ flag turns on all the optimizations of the two previous levels plus optimizations that spend more memory or compilation time to decrease execution time. The ‘-Ofast’ flag turns on all the optimizations of the ‘-O3’ level (the ones mentioned before) and some optimizations related to mathematical functions.

Finally, between single- and double-precision variables, we observed that in some methods (such as Gauss-Jordan Elimination) the double-precision time can be more than 1.5 times the single-precision one, while the single-precision error is small (the largest error is of the order of $10^{-3}$) for the flags mentioned above. If the application in question is not real-time, or if it has a tolerance of $10^{-3}$ or larger, it is better to use single-precision variables. However, if the application requires a smaller tolerance, it is better to use double-precision variables.

Acknowledgments

This work was partially supported by the Coordenação de Aperfeiçoamento de Pessoal de Nível Superior - Brasil (CAPES) - Finance Code 001 and by the Fundação de Amparo à Pesquisa do Estado do Rio de Janeiro (FAPERJ), process numbers E-26/010.001222/2019 and E-26/200.934/2017.

REFERENCES

  • 1 Using the GNU Compiler Collection (GCC) (2016). URL https://gcc.gnu.org/onlinedocs/gcc-5.4.0/gcc/. Accessed December 6, 2018.
  • 2 Using the GNU Compiler Collection (GCC) (2017). URL https://gcc.gnu.org/onlinedocs/gcc-7.1.0/gcc/. Accessed December 6, 2018.
  • 3 A.F.G. Ascencio & E.A.V.d. Campos. “Fundamentos da programação de computadores: algoritmos, Pascal, C/C++ e Java”. Pearson Prentice Hall, São Paulo (2008).
  • 4 M. Baboulin, A. Buttari, J. Dongarra, J. Kurzak, J. Langou, J. Langou, P. Luszczek & S. Tomov. Accelerating scientific computations with mixed precision algorithms. Computer Physics Communications, 180(12) (2009), 2526-2533. doi:10.1016/j.cpc.2008.11.005.
  • 5 T. Botor & H. Habiballa. Compiler optimization for scientific computation in C/C++. In “Proceedings of the International Conference of Computational Methods in Sciences and Engineering 2018 (ICCMSE 2018)”. Thessaloniki, Greece (2018), p. 030004. doi:10.1063/1.5079067.
  • 6 M.C.C. Cunha. “Métodos numéricos”. Editora da UNICAMP (2003).
  • 7 A. Deif. “Sensitivity analysis in linear systems”. Springer Science & Business Media (2012).
  • 8 N.J. Higham. Cholesky factorization. Wiley Interdisciplinary Reviews: Computational Statistics, 1(2) (2009), 251-254. doi:10.1002/wics.18.
  • 9 K. Hoste & L. Eeckhout. Cole: compiler optimization level exploration. In “Proceedings of the 6th annual IEEE/ACM international symposium on Code generation and optimization”. ACM (2008), p. 165-174.
  • 10 S. Lipschutz. “Álgebra linear (4a. ed.)”. Grupo A - Bookman (2000). URL http://public.eblib.com/choice/publicfullrecord.aspx?p=3236279. OCLC: 923757758.
  • 11 C.D. Meyer. “Matrix analysis and applied linear algebra”, volume 71. SIAM (2000).
  • 12 D.A. Patterson & J.L. Hennessy. “Organização e projeto de computadores: interface hardware/software”. Elsevier Brasil (2014).
  • 13 M.A.G. Ruggiero & V.L.d.R. Lopes. “Cálculo numérico: aspectos teóricos e computacionais”. Makron Books do Brasil (1997).
  • 14 R. Sedgewick. “Algorithms in C”. Addison-Wesley Series in Computer Science. Addison-Wesley Pub. Co, Reading, Mass (1990).
  • 15 J. Trahan, A. Kaw & K. Martin. Computational time for finding the inverse of a matrix: LU decomposition vs. naive Gaussian elimination. University of South Florida (2006).
  • 16 L.N. Trefethen. Three mysteries of Gaussian elimination. ACM SIGNUM Newsletter, 20(4) (1985), 2-5. doi:10.1145/1057954.1057955.
  • 17 D.S. Watkins. “Fundamentals of matrix computations”. Pure and Applied Mathematics. Wiley-Interscience, New York, 2nd ed. (2002).

Publication Dates

  • Publication in this collection
    29 May 2023
  • Date of issue
    Apr-Jun 2023

History

  • Received
    10 Dec 2018
  • Accepted
    28 Sept 2022