Dataset construction and data science analysis of physicochemical characterization of ordinary Portland cement

Abstract This paper presents a dataset construction and data science analysis from the literature results of physicochemical characterization of ordinary Portland cement (OPC). The physicochemical variables included the percentage by mass of calcium oxide (CaO), silicon dioxide (SiO2), aluminum oxide (Al2O3), iron oxide (Fe2O3), magnesium oxide (MgO), sulfuric oxide (SO3), sodium oxide (Na2O), potassium oxide (K2O), titanium oxide (TiO2), free lime (CaOfree), equivalent alkaline (Na2Oeq), loss on ignition, specific surface, density, water-cement ratio, and compressive strength of cement at 28 days. The searching, collection, and assembly of the dataset aimed to evaluate the information related to those variables through exploratory data analysis, enabling a basic understanding of characterization results of OPCs obtained in publications from different types, sources, years, and countries. The dataset provides a useful source of physicochemical characterization of ordinary cement, and the exploratory data analysis provided an understanding of central, dispersion, and data distribution with statistical metrics of each variable and their pair-wise correlations in the assembled dataset. The constructed dataset and its analysis are a starting point to further data, studies, and artificial intelligence models to provide a broader global view of the production and properties of ordinary Portland cement.


INTRODUCTION
Portland cement is one of the most widely used materials worldwide, and its compressive strength at 28 days is the most common measure of its engineering and performance properties. This strength is a critical input for the technological control of the material and for structural design. The curing process of cement is conventionally considered complete at 28 days, when the strength specified by the manufacturer is expected to be reached; this makes compressive strength a fundamental parameter for comparison and the most used requirement in the choice of cementitious materials [1], [2]. This property is therefore an important criterion for standard compliance and is frequently used in civil construction and scientific research. It is well known that the 28-day compressive strength is influenced by the constituent materials of the cement [2], [3].
Ordinary Portland cement (OPC) consists of clinker, whose raw materials are limestone, clay or siliceous materials, and materials containing iron and aluminum oxides, plus a small percentage of gypsum to regulate setting. The cement manufacturing process essentially consists of grinding the raw materials, mixing them intimately in given proportions, and burning them (at temperatures of up to around 1,450 °C) in large rotary kilns, where the material is sintered and partially melted. The chemical reactions among the raw materials in the rotary kiln generate the four main clinker compounds: tricalcium silicate or alite (C3S), dicalcium silicate or belite (C2S), tricalcium aluminate (C3A), and tetracalcium ferroaluminate (C4AF) [4]-[6]. The proportions of the phases present in the cement influence the physical properties of cementitious materials, such as strength and setting time, among others.
It is important to point out that the silicates in cement are not pure compounds, as they contain secondary oxides in solid solution. Both C3S and C2S invariably contain small amounts of magnesium, aluminum, iron, potassium, sodium, and sulfur ions. These oxides exert significant effects on the atomic arrangement, crystal shape, and hydraulic properties of the silicates. Similarly, Mehta and Monteiro [4] mention that in industrial clinkers both C3A and C4AF contain significant amounts of magnesium, sodium, potassium, and silica in their crystalline structure [4], [7].
The alkalis, potassium oxide (K2O) and sodium oxide (Na2O), are soluble and reactive; they are among the most common elements in nature and are found in small amounts in all raw materials used in the manufacture of cement, especially in clay compounds. Alkalis are of interest in concrete technology because they react with reactive aggregates, giving rise to the alkali-aggregate reaction, which causes disintegration of concrete [8]. In addition, Neville and Brooks [9] mention that they influence the rate of strength development of cement.
The Portland cement manufacturing process has undergone changes to improve the environmental aspects of production. The co-processing technique, for example, reuses waste as raw material, as a source of energy, or both, to replace natural mineral resources and fossil fuels such as coal, petroleum, and gas in industrial processes [10]. Although this practice can improve the efficiency of material resources, most waste contains contaminants that can be incorporated as secondary oxides into the structure of the four main cement compounds and interfere with the products formed. Cleaner production through co-processing therefore requires a good understanding of the impacts of these contaminants on the cement manufacturing process, cement quality, and the environment [11], [12].
Although the present research deals with OPC, commercial cements usually incorporate some type of supplementary cementitious material (SCM). Materials such as calcined clay, limestone, silica fume, rice husk ash with controlled burning, and metakaolin are studied and increasingly used in the sector as SCMs, each with a different influence on the final product. Limestone, for example, contributes to the hydration of cement: in the presence of finely ground carbonate material, carboaluminate compounds form during hydration, decreasing the porosity of the cementitious system. In addition, the mechanical strength of cementitious materials is greatly influenced by the presence of SCMs, with a slower strength gain that is lower at early ages and increases with time [13], [14].
Since the chemical composition of Portland cement and of its hydrates has a direct influence on the characteristics of cementitious matrices, its characterization and quantification are of fundamental importance. Quantitative analysis of the concentrations of cement elements is a step widely applied in research that uses cement in its experimental programs. Although Portland cement consists essentially of various calcium compounds, the results of routine chemical analyses are expressed in terms of the elemental oxides present [4].
In research that uses and investigates OPC, parameters such as chemical composition, specific surface, density, and water-cement ratio are usually investigated together with the 28-day compressive strength, since studies around the world have already demonstrated the influence of physicochemical properties on the development of the mechanical strength of cementitious materials. A change in the chemical composition of the cement, for example, influences the compressive strength, since the proportions of the different compounds vary significantly from one cement to another. The main oxides investigated in scientific research, expressed in percentage by mass, are calcium oxide (CaO), silicon dioxide (SiO2), aluminum oxide (Al2O3), iron oxide (Fe2O3), magnesium oxide (MgO), sulfuric oxide (SO3), sodium oxide (Na2O), potassium oxide (K2O), titanium oxide (TiO2), free lime (CaOfree), and alkaline equivalent (Na2Oeq). The oxide content of each cement influences the proportion of the main compounds (C3S, C2S, C3A, C4AF). Because each compound has a different reactivity and forms different products, they influence the mechanical strength in different ways [15]. Several techniques are applied to determine the composition of OPC; X-ray fluorescence spectroscopy (XRF) is widely used to characterize the oxides present in Portland cement samples. In addition, X-ray diffraction (XRD) with the Rietveld method, and XRF combined with the Bogue potential composition calculation, are used in studies with ordinary Portland cement to quantify its four main compounds (C3S, C2S, C3A, C4AF).
Regarding the physical characterization, fineness is a parameter reported by many researchers who work with cementitious matrices, as it is directly related to the rate of the hydration reaction and has a proven influence on mechanical behavior. The fineness of the cement is related to the specific surface of the grains, and its determination serves mainly to check the uniformity of the grinding process. This property is normally measured either by the nitrogen adsorption technique (BET), which is based on a mathematical theory that obtains the specific surface area of a material from the physical adsorption of gas molecules on its surface [16], or with the Blaine air-permeability apparatus, in which the specific surface is expressed as the total surface area in square centimeters per gram, or square meters per kilogram, of cement [17]. Since the reaction of Portland cement with water proceeds from the outer surface toward the interior of the grain, the degree of grinding of the cement influences the hydration rate and the development of compressive strength [1], [6], [11].
The determination of the compressive strength of Portland cement is standardized worldwide and uses cement mortar specimens. The standards of each country establish factors such as the dimensions of the specimens, water-cement and sand-cement ratios, type of sand, and consistency, among others, to provide uniformity in the production of mortars. The American standard ASTM C109/C109M [18] and the British BS EN 196-1 [19] serve as a theoretical basis for the development of standards in different countries. In Brazil, the test method is established by ABNT NBR 7215 [20]. The characterization and mechanical behavior of OPC have been investigated by researchers all over the world with several different goals and results, such as Malami et al. [21], Felekoǧlu et al. [22], Parande et al. [23], Yao and Sun [24], Dhandapani et al. [25], and Yun et al. [26]. However, there is a lack of statistical studies of its oxide components and standard properties. Furthermore, no paper in the literature has collected and statistically quantified the physicochemical characteristics of OPC composition considering different sources, years, and countries.
This paper aims to collect and analyze a dataset from the literature on the physicochemical characterization of OPC, considering as variables the mass percentages of its oxides, CaOfree, Na2Oeq, and loss on ignition, together with the commonly reported physical properties: specific surface, density, water-cement ratio, and compressive strength at 28 days. From the collected data, the present work also performs an exploratory data analysis of each variable to investigate its statistical moments, distribution characteristics, and outliers, as well as the statistical correlations among the variables that make up the dataset. A bibliometric study of the papers was also carried out to show the context of these publications, such as the most frequently used keywords, year and type of publication, and main sources. The main contribution of this paper is the collection and assembly of a novel dataset whose variables are the commonly reported physical properties and chemical composition of OPCs. Furthermore, the exploratory data analysis provided a basic understanding of the central tendency, dispersion, and distribution of the data, with statistical metrics for each variable and their pair-wise correlations in the assembled dataset. Such a dataset is crucial for statistical regression, machine learning, and artificial intelligence applications that develop predictive models for compressive strength based on the physicochemical characteristics of OPC.

METHODOLOGY, BIBLIOMETRIC REVIEW, AND DATASET COLLECTION AND ASSEMBLY
The dataset was formed from the reading of more than 3,000 scientific productions between March and June 2021 through the Scopus database. To standardize the entire search, the string "Ordinary Portland Cement" was used to limit the results to records that contained OPC in their titles, abstracts, or keywords. From the search results, the titles, abstracts, and especially the materials and methods sections of all works were read. The selection of a publication was based on the availability of the results of the OPC characterization tests. In other words, to be selected, the research needed to explicitly provide the mass percentage of at least the four main OPC oxides (CaO, SiO2, Al2O3, and Fe2O3) and the 28-day strength characterization. When a publication reported these five variables, especially the compressive strength, its results were collected and added to the dataset, along with any other variables reported. After the final screening, the dataset was formed from 102 publications.
An initial bibliometric review was carried out to analyze the selected scientific productions that used OPC, using the VOSviewer tool, which provides an interface for viewing and analyzing bibliometric and sociometric networks. The analysis in the software was performed on keyword occurrences, applying the full counting method to scan the titles, abstracts, and keywords. This tool employs a visualization method based on the distance between the nodes of the analyzed network, in which the distance between two nodes approximately indicates the intensity of the relationship between them: the smaller the distance, the greater the intensity of the relationship. Figure 1 presents the bibliometric network extracted from VOSviewer, which contains all the terms used in the titles, abstracts, and keywords. The network shows that the most used term is "compressive strength", with 17 occurrences, 66 links (that is, 66 connections with other terms), and a link strength of 76, which indicates the number of publications in which it appears linked to other keywords.
Table 1 presents the physicochemical variables (oxides or properties) considered in this study. Note that the table also specifies how each variable is labeled in all tables and graphs in this manuscript. The 102 scientific productions consisted of 92 journal papers (11 international journals), 2 book results (Calcined Clay for Sustainable Concrete), and 8 theses/dissertations.
The graph plotted in Figure 4 shows the data-filling matrix of the assembled dataset, in which the white blanks represent the lack of data for a given variable and the blue color represents the presence of data. It is possible to observe that TiO2 is the parameter least reported in the literature, either because it was not investigated in the respective publication or because it was not identified in the chemical characterization test, followed by the alkalis (Na2O and K2O) and CaOfree. In addition, the sample preparation process for chemical characterization tests such as XRF can influence the accuracy of the determination of the percentage of alkalis. The strength was the only parameter reported in all publications. The oxides CaO, SiO2, Al2O3, and Fe2O3, which give rise to the four main compounds of Portland cement, were also investigated by the researchers and described in the characterization of OPC. Likewise, the specific surface (surface) was reported in the works, and the water-cement ratio (wc) was indicated through the standard of the respective country of the publication.
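A data-filling matrix of this kind can be sketched in a few lines. The snippet below is only an illustration under stated assumptions: the records, their values, and the helper names `presence_matrix` and `fill_rate` are hypothetical and are not taken from the assembled dataset or from the software used by the authors.

```python
from typing import Optional

# Hypothetical records: one dict per collected OPC sample. None marks a value
# that the source publication did not report. All numbers are illustrative.
Record = dict[str, Optional[float]]


def presence_matrix(records: list[Record], variables: list[str]) -> list[list[bool]]:
    """One row per record; True where the variable was reported."""
    return [[rec.get(var) is not None for var in variables] for rec in records]


def fill_rate(records: list[Record], variables: list[str]) -> dict[str, float]:
    """Fraction of records that report each variable (1.0 = always reported)."""
    matrix = presence_matrix(records, variables)
    n = len(records)
    return {var: sum(row[j] for row in matrix) / n for j, var in enumerate(variables)}


records: list[Record] = [
    {"CaO": 63.1, "Na2O": 0.20, "TiO2": None, "strength": 52.0},
    {"CaO": 64.0, "Na2O": 0.15, "TiO2": 0.30, "strength": 48.5},
    {"CaO": 62.5, "Na2O": 0.31, "TiO2": None, "strength": 55.1},
    {"CaO": 63.7, "Na2O": None, "TiO2": None, "strength": 47.9},
]
variables = ["CaO", "Na2O", "TiO2", "strength"]

rates = fill_rate(records, variables)
least_reported = min(rates, key=rates.get)  # variable with the most gaps
```

Sorting the variables by `rates` gives the same ranking that is read visually from the matrix, with the least reported variable first.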

DATA SCIENCE ANALYSIS
From the assembled dataset, an exploratory analysis was performed, which consisted primarily of data preparation, exploratory data analysis of each variable, and analysis of the relationships between them. The following were computed: statistical metrics such as maximum, minimum, mean, and median; metrics of dispersion/variability such as standard deviation, coefficient of variation, range, and outlier detection; and metrics of data distribution such as interquartile range and skewness. The in-house software Tyche [27], with its data science module Datum, was used to compute those metrics.
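The metrics listed above can be reproduced with the Python standard library, as in the minimal sketch below. It is not the Tyche/Datum implementation: the function name `describe`, the quartile convention (the default "exclusive" method of `statistics.quantiles`), and the moment-based skewness estimator are assumptions, and other conventions would give slightly different values.

```python
import statistics


def describe(values: list[float]) -> dict[str, float]:
    """Summary statistics for one variable (missing values removed beforehand)."""
    n = len(values)
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)  # sample standard deviation (n - 1)
    q1, median, q3 = statistics.quantiles(values, n=4)  # assumed quartile method
    # Assumed skewness estimator: third central moment over m2 ** 1.5.
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return {
        "min": min(values),
        "max": max(values),
        "range": max(values) - min(values),
        "mean": mean,
        "sd": sd,
        "cv_percent": 100.0 * sd / mean,
        "median": median,
        "iqr": q3 - q1,
        "skewness": m3 / m2 ** 1.5,
    }


# Illustrative values only, not taken from the dataset.
summary = describe([1.0, 2.0, 3.0, 4.0, 5.0])
```

For a perfectly symmetric sample such as the one above, the skewness is zero and the mean equals the median.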
The analysis starts with Figure 5, which shows a data matrix with pair-wise scatter plots and individual histograms for the 16 variables. The main diagonal shows the histogram of each variable, whereas the off-diagonal components show the scatter plot for each combination of two variables. For example, the plot in the first row and third column is the scatter plot of CaO versus Al2O3. Note that the histogram for the water-cement ratio (wc) almost represents a categorical variable with only three populated bins: 0.40-0.41, 0.48-0.49, and 0.49-0.50, of which the last is significantly dominant. This is because almost all standards use a water-cement ratio of 0.50. A few exceptions used slightly different values, which were counted in the other bins. The Indian standard for determining compressive strength, IS 4031 (Part 6) [28], for example, obtains the water-cement ratio through another standard based on slump measurements, IS 4031 (Part 4) [29].
Although the water-cement ratio (wc) presented this extreme concentration at the 0.5 ratio, behaving almost as a deterministic variable in this dataset, the authors decided to keep it for completeness of the analysis. The oxides TiO2 and CaOfree also presented a dominant value, but to a lesser degree than wc, as shown in the histograms. This is also observed in the scatter plots of those variables (rows 9, 10, and 15), which tend to form horizontal lines. Figure 5 only allows an initial qualitative assessment of the data; therefore, the following paragraphs present an in-depth quantitative analysis of the variables.
The histogram of the compressive strength does not suggest any conventional probability distribution. Goodness-of-fit tests were performed for the main known distributions (normal, log-normal, uniform, exponential, and extreme value), and all of them rejected the null hypothesis, showing a significant difference between the observed strength values and the expected distributions. The determination of a possible probability distribution will be investigated in future papers. Table 2 presents the summary of the main statistical parameters for the samples of each variable of the dataset: minimum (min), maximum (max), mean, standard deviation (SD), coefficient of variation (CV), median, skewness, and interquartile range (IQR). From the mass percentages of the oxides in the collected dataset, the mean values of the main components of ordinary Portland cement, C3S (53.98%), C2S (17.87%), C3A (6.92%), and C4AF (10.15%), were calculated using Bogue's equations [21], considering the addition of gypsum. It is noteworthy that the remainder of the sum of the percentages of those four compounds was adopted as the content of incorporated calcium sulfate and impurities, giving a mean value of 11.08%.
For the Bogue calculation, it is necessary to assume that the four main components of Portland cement are C3S, C2S, C3A, and C4AF with theoretical stoichiometries; that all Fe2O3 present reacts with Al2O3 and CaO to form C4AF; that the remaining Al2O3 reacts with CaO to produce C3A; and that the remaining CaO reacts with SiO2 and becomes C3S and C2S. The method also assumes unrealistic clinkering temperatures close to 2,000 °C, a perfect combination of oxides, and equilibrium between C3S, C2S, and the liquid phase [30], [31]. According to Gobbo [30], the calculation restricts the constitution of cement clinkers to C3S, C2S, C3A, and C4AF and disregards the existence of minor elements such as TiO2, MgO, K2O, and Na2O, among others. It is important to emphasize that some impurities, instead of being present separately in the cement material, may be incorporated into the structures of the main compounds.
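The Bogue calculation can be sketched as follows. The coefficients are the classical ones for cements with an Al2O3/Fe2O3 ratio of at least 0.64, with an SO3 term for the CaO bound in calcium sulfate; whether the paper used exactly this variant (for example, with or without a free-lime correction) is an assumption. The function name and the oxide contents in the example are hypothetical; only the four mean phase contents used in the remainder check are taken from the text.

```python
def bogue_composition(cao: float, sio2: float, al2o3: float,
                      fe2o3: float, so3: float = 0.0) -> dict[str, float]:
    """Potential phase composition (% by mass) from oxide contents (% by mass).

    Classical Bogue coefficients, valid for Al2O3/Fe2O3 >= 0.64.
    No free-lime correction is applied in this sketch.
    """
    if fe2o3 <= 0 or al2o3 / fe2o3 < 0.64:
        raise ValueError("this sketch only covers Al2O3/Fe2O3 >= 0.64")
    c4af = 3.043 * fe2o3                     # all Fe2O3 goes into C4AF
    c3a = 2.650 * al2o3 - 1.692 * fe2o3      # remaining Al2O3 forms C3A
    c3s = (4.071 * cao - 7.600 * sio2 - 6.718 * al2o3
           - 1.430 * fe2o3 - 2.852 * so3)    # remaining CaO combines with SiO2
    c2s = 2.867 * sio2 - 0.7544 * c3s
    return {"C3S": c3s, "C2S": c2s, "C3A": c3a, "C4AF": c4af}


# Hypothetical oxide contents (% by mass), for illustration only.
phases = bogue_composition(cao=63.27, sio2=20.60, al2o3=5.0, fe2o3=3.0, so3=2.5)

# Remainder check using the mean phase contents reported in the text.
reported_means = {"C3S": 53.98, "C2S": 17.87, "C3A": 6.92, "C4AF": 10.15}
remainder = round(100.0 - sum(reported_means.values()), 2)  # calcium sulfate + impurities
```

The remainder computed from the reported means reproduces the 11.08% quoted in the text.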

Analysis of sample dispersion, distribution properties, and outliers
The coefficient of variation (CV) conveys the data dispersion (variability of the sample data) as the ratio of the standard deviation (SD) to the sample mean. The CV is a suitable quantity because it expresses the variability of the data without the influence of scale, allowing direct comparison among variables of different units or orders of magnitude. Figure 6 shows the CV, in percentage, for the 16 variables in descending order. The graph shows that two oxides (CaO and SiO2), the density (dens), and the water-cement ratio (wc) have a CV below 8%, meaning that their values for OPC manufactured around the world have very low variability. This was already noted for the water-cement ratio, since its histogram showed values almost exclusively within the range 0.49-0.50 because the compressive strength standards uniformly use the 0.5 ratio, as discussed before. Another important factor is that CaO and SiO2 are the first and second most dominant components of OPC by mass, with means of 63.27% and 20.60%, respectively, according to Table 2; together they account for approximately 84% (mean) of the mass of OPC. Thus, these two components, which are the ones most closely related to the extraction and manufacturing process of cement raw materials, showed very low variability in their percentage composition in the OPCs reported in the literature. However, the CV for the strength was 16%, which is almost double the CV of the most dominant components (in mass). The dashed line in Figure 6 allows a visual comparison of the coefficient of variation of the compressive strength with those of the other variables. Furthermore, the compressive strength samples had a standard deviation of 7.9 MPa and a mean of 50.74 MPa; the strength values reported in the literature range mainly from 40 to 60 MPa, with only 9 out of 102 below 40 MPa and only 1 out of 102 above 70 MPa.
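As a quick arithmetic check of the strength values quoted above (using the rounded SD and mean, so the last digit is approximate):

```latex
\mathrm{CV} = \frac{\mathrm{SD}}{\bar{x}} \times 100\%
            = \frac{7.9\ \text{MPa}}{50.74\ \text{MPa}} \times 100\%
            \approx 15.6\% \approx 16\%
```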
The majority of the other oxides presented a high CV: SO3 (31%), Fe2O3 (31%), TiO2 (35%), Na2Oeq (42%), K2O (48%), CaOfree (53%), MgO (55%), and Na2O (65%). This greater variability in the mass percentages of these oxides can be explained by the influence of impurities present in the raw materials extracted and used for cement manufacturing, as well as by adjustments made to the chemical composition of the material in each country to meet specific standards. Among the test property variables, the loss on ignition (loss) presented a high CV (51%), demonstrating the high variability of these test results in the literature.
The interquartile range (IQR) measures the spread of the data sample between the first (Q1) and third (Q3) quartiles (between the 25th and 75th percentiles). It therefore gives the range of values around the median (second quartile, Q2) that contains the central 50% of the samples. A smaller IQR relative to the total range implies that the lowest and highest 25% of the values are spread over longer left and right tails. The IQR and total range (max - min) values for each variable are presented in Table 2.
Figure 7a shows the IQR/Range ratio, in percentage, for each variable. The component CaOfree presented the highest ratio, showing that 46% of the range amplitude corresponds to the central 50% of the data samples. K2O and strength presented slightly more than one-third of the range amplitude as IQR. This shows that those three variables had a quantitatively compact data distribution around the median, as shown in Figure 5. However, the oxides CaO, Fe2O3, and SO3, and the specific surface (surface), had less than 15% of their respective total amplitude as IQR, which shows that more than 85% of their range is covered by samples below the first quartile (25% lowest values) and above the third quartile (25% highest values). The water-cement ratio (wc) presented the lowest IQR/Range percentage due to the concentration of values on the right-hand side of the data distribution, as shown in Figure 5 and explained before.
Based on each IQR, a systematic method of identifying outliers can be used to establish limit values outside Q1 and Q3. The lower limit is Q1 - 1.5 IQR, while the upper limit is Q3 + 1.5 IQR, and any sample value outside those limits is considered an outlier. Figure 7b shows the percentage of outliers identified for each variable: the specific surface, TiO2, CaO, Fe2O3, and SO3 each had more than 5% of their sample data as outliers. This agrees with the results of Figure 7a, which indicate that a higher percentage of the range of those variables lies toward the tails. The only exception is TiO2, which, although it presented a reasonable IQR/Range, had 8% of its data as outliers, demonstrating very discrepant TiO2 contents in the composition of ordinary cement. One possible explanation is that this oxide is not commonly used as a component of OPC. The high proportion of outliers for the specific surface (10%) shows a considerable percentage of extreme results for this characterization test in the literature, even for similar cement compositions. This property is related to cement grinding; the greater scatter and outlier percentage identified in the collected data show that this process is carried out in different ways in different countries, which can influence the rate of the hydration reactions and the strength gain at early ages of the final product.
Regarding the data distribution of each variable, the sample skewness was determined to quantitatively assess the level of asymmetry of the data distribution around its mean. Figure 8 shows the skewness value of each variable together with a schematic representation of symmetry and asymmetry, in which negative skewness indicates that the tail is on the left side of the data distribution and positive skewness indicates that the tail is on the right. Approximately null skewness indicates a symmetric data distribution, as shown in the figure. The oxides K2O, SiO2, and CaOfree, and the compressive strength, had symmetric data distributions, given their very small skewness magnitudes (< 0.4). However, the three oxides Na2O, MgO, and TiO2 had significant skewness to the right (skewness > 1.0), quantitatively confirming an asymmetric data distribution. All the other variables showed moderate to low skewness. The only exception is the water-cement ratio (wc), which presented a skewness of -3.2 due to the data concentration at 0.5 resulting from the uniformity of the standards.
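The IQR/Range ratio and the 1.5 IQR outlier rule discussed above can be illustrated with a short sketch. The function name `iqr_outliers`, the quartile convention (the default method of `statistics.quantiles`), and the sample values are assumptions made for illustration; they are not the authors' implementation or data.

```python
import statistics


def iqr_outliers(values: list[float], k: float = 1.5) -> dict:
    """IQR, IQR/Range ratio, outlier limits, and outliers for one variable."""
    q1, _, q3 = statistics.quantiles(values, n=4)  # assumed quartile method
    iqr = q3 - q1
    lower = q1 - k * iqr  # lower limit: Q1 - 1.5 IQR
    upper = q3 + k * iqr  # upper limit: Q3 + 1.5 IQR
    outliers = [v for v in values if v < lower or v > upper]
    data_range = max(values) - min(values)
    return {
        "iqr": iqr,
        "iqr_over_range_percent": 100.0 * iqr / data_range,
        "lower": lower,
        "upper": upper,
        "outliers": outliers,
        "outlier_percent": 100.0 * len(outliers) / len(values),
    }


# Illustrative sample with one extreme value on the right tail.
result = iqr_outliers([10, 11, 12, 13, 14, 15, 16, 17, 18, 40])
```

In this sample a single extreme value stretches the total range, so the IQR/Range ratio is low while the outlier percentage is high, which is the pattern described for the specific surface.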

Correlation between OPC physicochemical properties
The Pearson correlation matrix for the 16 variables is plotted in Figure 9. Due to the symmetry of the matrix, only the lower triangular part is plotted. Each coefficient (ρ) is a measure of the linear correlation between two sets of data, and the color intensity represents the magnitude of the correlation coefficient for each pair-wise combination. Among the oxides, only the correlations of Na2Oeq with Na2O and with K2O presented significant positive values, of 0.72 and 0.68, respectively. This is expected, since Na2Oeq is derived from the other two oxides. A moderate negative correlation of -0.45 can be observed between CaO and MgO, meaning that when the mass percentage of one of these components increases in the cement composition, the percentage of the other tends to decrease. The last row of the Pearson correlation matrix, which corresponds to the ρ-values between the compressive strength and all other variables, is plotted in Figure 10. The highest positive correlation (ρ = 0.23) was found with TiO2, whereas the strongest negative correlation (ρ = -0.35) was with MgO. Nevertheless, those magnitudes can be considered moderate to low. According to Moreno [8], the addition of titanium dioxide to cement aims to adjust the raw material, and MgO is derived from the magnesium carbonate present in the original limestone in the form of dolomite, present only in small amounts depending on the specificity of the cement to be produced. Thus, there is no evidence that the presence of these two compounds has a direct influence on compressive strength. It is important to note that the approximately null correlation between the water-cement ratio and compressive strength is due to the very low variability of wc in the collected dataset, with almost all values at 0.50, as already discussed.
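The pair-wise Pearson coefficient and the dependence of Na2Oeq on the two alkalis can be sketched as below. Two assumptions are made: the alkali equivalent follows the common convention Na2Oeq = Na2O + 0.658 K2O, which the paper does not state explicitly, and pairs with a missing value are dropped before computing ρ (pair-wise deletion). The function names and sample values are hypothetical.

```python
import math
from typing import Optional


def pearson(x: list[Optional[float]], y: list[Optional[float]]) -> float:
    """Pearson correlation using only pairs where both values are reported."""
    pairs = [(a, b) for a, b in zip(x, y) if a is not None and b is not None]
    n = len(pairs)
    if n < 2:
        raise ValueError("at least two complete pairs are required")
    mx = sum(a for a, _ in pairs) / n
    my = sum(b for _, b in pairs) / n
    sxy = sum((a - mx) * (b - my) for a, b in pairs)
    sxx = sum((a - mx) ** 2 for a, _ in pairs)
    syy = sum((b - my) ** 2 for _, b in pairs)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant variable")
    return sxy / math.sqrt(sxx * syy)


def na2o_eq(na2o: float, k2o: float) -> float:
    """Alkali equivalent, assuming the convention Na2O + 0.658 * K2O."""
    return na2o + 0.658 * k2o


# Illustrative alkali contents (% by mass), not taken from the dataset.
na2o = [0.10, 0.20, 0.30, 0.15]
k2o = [0.50, 0.40, 0.90, 0.60]
eq = [na2o_eq(a, b) for a, b in zip(na2o, k2o)]
r_na = pearson(na2o, eq)  # positive by construction of Na2Oeq
```

Because Na2Oeq is a linear combination of Na2O and K2O with positive weights, its positive correlation with both oxides is a consequence of the definition rather than an independent finding.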

CONCLUSIONS
This paper presented the methodology to construct and analyze a dataset, collected from literature of different sources, types, and dates, with the sixteen main physicochemical properties of ordinary Portland cement (OPC) as variables. The search covered the eleven most commonly reported oxide components of OPC and five commonly reported properties, including the 28-day compressive strength. The dataset was analyzed through an exploratory statistical analysis to quantify the main statistical properties and parameters of each variable and their correlations. The exploratory analysis provided a basic understanding of the data collected and of the relationships between the analyzed OPC variables. The constructed dataset is a starting point for further studies, with the addition of more data to complement it and provide a broader global view of the production and properties of ordinary Portland cement.
The more specific conclusions from the analysis are:
• The main oxides CaO and SiO2, which compose approximately 85% of the mass of OPC, presented very small data dispersion, with small coefficients of variation (< 7%). Therefore, the composition of OPC produced worldwide has low variability of CaO and SiO2, which had the lowest coefficients of variation and are responsible for 2 of the 4 main components of cement, C3S and C2S, indicating a certain standardization of these compounds. However, the 28-day compressive strength presented a much higher coefficient of variation, reaching 16%. Most of the remaining oxides (Na2O, MgO, CaOfree, K2O, Na2Oeq) presented higher dispersion in the literature, with coefficients of variation greater than 40%. Concerning C3A and C4AF, moderate coefficients of variation were observed for the oxides Al2O3 and Fe2O3.
• Compressive strength at 28 days presented a mean value of 50.7 MPa, and the data for this property lie mostly between 40 and 60 MPa, although some publications reported more scattered values, with a minimum of 38.2 MPa and a maximum of 71.0 MPa. The 28-day strength presented a symmetric data distribution, and its IQR corresponded to 36% of the total range. Furthermore, no outlier was detected for the strength data. All these statistical measurements show a compact set of OPC strength values in the literature, despite the variability and high skewness of the oxides that compose OPCs.
• Most of the oxides that compose the minority of the OPC mass had non-symmetric data distributions; in particular, Na2O, MgO, TiO2, and Al2O3 presented high skewness to the right (mean greater than the median). Among those oxides, TiO2 showed a high outlier percentage of 8%. The specific surface was the variable that presented the highest percentage of outliers (10%), showing extreme values for this characterization in the literature.
• The ratio between the interquartile range (Q3 - Q1) and the total range (max - min) showed good agreement with the percentage of outliers detected for each variable; in particular, variables with higher values of that ratio presented a very low or null percentage of outliers in their data samples.
• The correlations showed a moderate negative correlation (-0.45) between MgO and the main oxide CaO, which indicates that compositions with higher percentages of MgO had lower percentages of CaO. Moreover, an increase in the percentage of MgO in the OPC composition is associated with a moderate decrease in the 28-day compressive strength, as indicated by the negative correlation of -0.35 between those variables. The strength did not present any other relevant correlation with other variables.

Figure 2 presents the evolution of the number of scientific productions published per year among the selected 102 publications. It is possible to notice that the number of publications increases from 2015 to 2020. The only exceptions are 2019 and 2021, the latter of which had only partially elapsed by the date this paper was written.

Figure 2. Number of scientific productions per year.
Figure 3 contains the main sources with their respective numbers of selected publications that characterized the physicochemical properties of OPC. Construction and Building Materials (CBM) stands out with 33 publications, followed by Cement and Concrete Research (CCR) and Cement and Concrete Composites (CCC), with 22 and 17 papers, respectively.

Figure 3. Number of papers per source.Legend: Construction and Building Materials (CBM), Cement and Concrete Research (CCR), Cement and Concrete Composites (CCC), Thesis/Dissertation (TD), Journal of Building Engineering (JBE), Structural Concrete (SC), The International Journal of Cement Composites and Lightweight Concrete (IJCL), Fire and Materials (FM), Calcined clay for Sustainable Concrete (CCSC), International Journal of Energy Research (IJER), Procedia Engineering (PE), Science of the Total Environment (STE) and Journal of Materials in Civil Engineering (JMCE).

Figure 5. Matrix with histograms and pair-wise scatter plots of the variables.

Figure 6. Coefficient of variation (CV) of each variable.

Figure 7. Interquartile range (IQR) and outlier quantification: a) IQR/Range and b) the percentage of outliers for each variable.

Figure 8. Skewness of data distribution of each variable.

Table 2. Summary of statistical parameters of each variable.
Portland cement are C 3 S, C 2 S, C 3 A and C 4 AF with theoretical stoichiometries; all Fe 2 O 3 present reacts with Al 2 O 3 and CaO to turn into C 4 AF; the remaining Al 2 O 3 reacts with the CaO to produce C 3 A; the remaining CaO reacts with SiO 2