
Econometric model of iron ore through principal component analysis and multiple linear regression

Abstract

The price of iron ore is affected by instabilities in the microeconomic balance between supply and demand. Periods of equilibrium adjustment result in large price swings, associated with global growth or recession. These swings also affect the viability of mining enterprises and have consequences for important global economic scenarios. This research aims to evaluate the market variables capable of influencing the price of iron ore through two multivariate statistical techniques: principal component analysis and multiple linear regression. The variables studied were the value of iron ore and concentrate exports from Brazil; steel production in China, Japan, Europe, the United States and India; steel price; coal price; China's construction gross domestic product; the United States construction index; oil price; and global oil production. The first three principal components explained 89.12% of the variability of the data matrix. Multiple linear regression highlighted the significance of five variables: iron ore exports from Brazil, steel production in China, coal price, steel production in India and steel price.

Key words
multivariate statistics; iron ore; principal component analysis; multiple linear regression; mineral economy

INTRODUCTION

The iron ore pricing system has changed considerably over the past few decades. For a long time, the global iron ore market showed stable demand for the commodity. In the 1970s, an agreement between buyers and producers established a benchmark for annual price transactions (Gaggiato 2014). Long-term agreements between the major steelmakers and mining companies set the iron ore price for each period. This pricing system is known as the Benchmark Price System or Reference Price System.

An important factor that changed the iron ore price scenario in the global market was the growth of the mining industry relative to the stagnation of the steel industry, together with the shift of the world's main buying center from Europe and Japan to China. China has gradually increased its demand for iron ore and has consequently contributed to the rise in the price of the commodity.

The economic viability of mining enterprises depends directly on the market price and on the quality and extent of the reserves. Rising iron ore prices enabled new suppliers to enter the market, offering products whose quality characteristics differed from those of the traditionally traded ores. New trading systems and a growing market negotiated at spot prices with daily variations then emerged and began to compete with the reference price system (Gaggiato 2014). The Benchmark Price System was eventually abandoned and replaced by other systems, generally based on spot prices traded in the Chinese market.

The application of statistics to economic problems is called econometrics. Econometrics applies mathematical and statistical methods to economic problems in order to test hypotheses and predict future trends (Hoffmann 2016). An accurate methodology is essential to support more assertive economic decisions in response to market fluctuations. The prerequisites for econometrics are basic concepts of statistical estimation, including sampling procedures, estimators, confidence intervals, hypothesis tests and non-parametric statistics (Biage 2012). Regression analysis is one of the most widely used econometric techniques, and the multiple linear regression method was adopted in the present research.

Wårell (2018) used an econometric model to investigate the effect of the change in the iron ore pricing regime, based on monthly data from different periods between 2003 and 2017. The author applied multiple linear regression and reached important conclusions, such as the strong influence of Chinese GDP growth on the price of iron ore over the analyzed period.

Most research on price forecasting is grounded on the economic principle known as "ceteris paribus". This principle assumes that the factors of economic instability are constant and defined by an average over a time interval. Researchers usually treat instability as constant because this makes the variables easier to model and calibrate. According to Alameer (2020) and Li et al. (2020), it is more accurate to use principal component analysis to select the variables capable of affecting the price of the commodity. The neural-network-based system developed by Alameer (2020) was more accurate than the time series model for forecasting coal prices.

In this research, a dataset was built and multivariate statistical methods were applied to carry out an econometric analysis and investigate the influence of selected variables on the iron ore price from 1991 to 2020. A predictive model was created using appropriate techniques. Statistics is an excellent tool for data collection and analysis, and these methods are widespread and support scientific research in many areas.

MATERIALS AND METHODS

R software version 4.1.2 (R Core Team 2013) was used to carry out principal component analysis (PCA) and multiple linear regression (MLR) on the dataset. Both techniques were applied with the 'stats' package, which is part of the R distribution (R Core Team 2013). After the dataset was constructed, boxplot analysis was used to verify the need for data standardization. An exploratory analysis was carried out and the correlation matrix was computed in order to understand and identify the relationships between the variables.

Bartlett's test was carried out to verify whether there is sufficient correlation among the variables to justify the application of multivariate statistical techniques (Bartlett 1951). PCA was then carried out to understand the interdependence between the variables. Kaiser's criterion determined the number of principal components retained in the analysis (Kaiser 1970).
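A minimal Python sketch of Bartlett's test of sphericity is given below for illustration; it is not the study's code. It assumes the usual form of the statistic, χ² = −[(n − 1) − (2p + 5)/6] ln|R| with p(p − 1)/2 degrees of freedom, where R is the p × p sample correlation matrix and n the number of observations. The helper names and the correlation matrix are hypothetical, and the p-value is left to a chi-square routine (in R, for example, `pchisq(stat, df, lower.tail = FALSE)`).

```python
import math


def determinant(matrix):
    """Determinant by Gaussian elimination with partial pivoting."""
    a = [list(map(float, row)) for row in matrix]
    n = len(a)
    det = 1.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if a[pivot][col] == 0.0:
            return 0.0
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            det = -det  # a row swap flips the sign of the determinant
        det *= a[col][col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= factor * a[col][c]
    return det


def bartlett_sphericity(corr, n_obs):
    """Return (chi-square statistic, degrees of freedom) for Bartlett's test of sphericity.

    corr: p x p sample correlation matrix.
    n_obs: number of observations used to compute corr.
    """
    p = len(corr)
    stat = -((n_obs - 1) - (2 * p + 5) / 6.0) * math.log(determinant(corr))
    df = p * (p - 1) // 2
    return stat, df


# Hypothetical 3-variable correlation matrix for 30 annual observations
R = [[1.0, 0.8, 0.6],
     [0.8, 1.0, 0.7],
     [0.6, 0.7, 1.0]]
stat, df = bartlett_sphericity(R, n_obs=30)
print(f"chi2 = {stat:.2f} with {df} degrees of freedom")
```

A determinant close to 1 (nearly uncorrelated variables) drives the statistic toward zero, while strong correlations shrink |R| and inflate it, which is why a very small p-value supports applying PCA.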

Principal component analysis was also used to identify the variables with little influence on the iron ore price (Hotelling 1933). These variables were removed before the MLR analysis (Hair Jr et al. 2009). Multivariate outliers were then identified in order to improve the results (Filzmoser 2004); the observations identified were removed from the dataset, and the multiple regression model was fitted and proposed.

To validate the model, the residuals, linearity, residual homoscedasticity, residual normality and model accuracy were obtained and analyzed. The most significant variables for predicting the dependent variable were then determined.

The Dataset

The dataset uses information from the Trading Economics website (Trading Economics 2021). The platform provides information for 232 countries, including historical data for more than 300,000 economic indicators, exchange rates, stock market indices, government bond yields and commodity prices. The data are based on official sources and are regularly checked for inconsistencies. Table I and Table II present the consolidated dataset used in this research.

Table I
Dataset 1993 - 2020.
Table II
Dataset 1993 - 2020.

Independent variables were used to analyze their influence on the annual average iron ore price (Iron_Price), in US dollars per dry metric ton, from 1991 to 2020. The independent variables are: the average annual value of iron ore and concentrate exports from Brazil with 62% iron content (USD) (Br_Iron_Exp); steel production (t) in China (China_Steel), India (India_Steel), Japan (Japan_Steel), Europe (Euro_Steel) and the United States (USA_Steel); the annual average prices (USD/t) of steel (Steel_Price), coal (Coal_Price) and oil (Oil_Price); construction GDP in China (GDPCCCN) (CNY) (China_GDP); the US Construction Index (ICCUS) (%) (USA_Constr); and global oil production (bbl/d) (Glob_Oil_Prod).

The variables were selected according to their ability to influence the price of iron ore, mainly following those considered in the World Bank's "An Econometric Model of the Iron Ore Industry" (1987). The dataset was built from the Trading Economics website (Trading Economics 2021).

Multivariate analysis techniques

With regard to the analysis of variables, statistical methods are divided into two areas: univariate statistics (analysis of one variable at a time) and multivariate statistics (joint analysis of the variables) (Vicini 2005). Multivariate statistics allows the simultaneous investigation of multiple variables measured on each sample element. The variables should be random and correlated, and these techniques provide information that cannot be obtained or interpreted with univariate statistical techniques (Hair Jr et al. 2009). Principal component analysis (PCA) and multiple linear regression (MLR) are multivariate statistical techniques.

Determination of multivariate outliers

An outlier is an observation that deviates so much from the other observations that it arouses suspicion that it was generated by a different mechanism (Enderlein 1987). According to Krige & Magri (1982), outliers should be handled through the following steps:

  1. Use of statistical tools such as probability plots, histograms and scatterplots;

  2. Validation of the sample context, considering the domain according to its support and neighborhood, to decide whether it really is an anomalous value that needs to be modified or removed;

  3. Validation of the possibility of human error in the transcription of the sample value by checking the history of the sample;

  4. Validation of whether the outlier belongs to a previously stipulated confidence interval. If the outlier deviates by a factor of about two from the non-outlier values, it is appropriate to remove it from the statistical calculations.

Outliers can negatively influence the analysis and interpretation of the data matrix, so their identification is necessary. They may be eliminated depending on the purpose of the analysis and the researcher's experience.

The Mahalanobis distance is the most widely used measure for multivariate outlier detection. It measures the distance from the ith sample element to the mean vector of the data, taking the covariance among the variables into account, and is given by Equation (1).

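$$MD_i^2 = (\mathbf{x}_i - \bar{\mathbf{x}})^{\top}\,\mathbf{S}^{-1}\,(\mathbf{x}_i - \bar{\mathbf{x}}) \tag{1}$$
where $\mathbf{x}_i$ is the ith sample element, $\bar{\mathbf{x}}$ is the vector of means and $\mathbf{S}$ is the variance-covariance matrix of the dataset $\mathbf{X}$.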

For multivariate normal data, the squared distances of the sample elements approximately follow a chi-square distribution with p (number of variables) degrees of freedom. Multivariate outliers are defined as observations whose squared distance exceeds a chosen quantile of this chi-square distribution (Valadares et al. 2012).
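Equation (1) and the chi-square cutoff can be sketched in dependency-free Python as follows. This is an illustration under stated assumptions, not the study's procedure: it uses the classical mean vector and sample covariance matrix (Filzmoser's method relies on robust estimates), the 0.975 quantile is an assumed cutoff because the text does not state one, and the closed-form quantile −2 ln(1 − q) is valid only for p = 2 variables (for other p, R's `qchisq(q, df = p)` gives the value). Function names and data are hypothetical.

```python
import math


def invert(matrix):
    """Inverse of a square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(matrix)
    aug = [list(map(float, row)) + [float(i == j) for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("covariance matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        pv = aug[col][col]
        aug[col] = [v / pv for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [a - f * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def squared_mahalanobis(data):
    """Equation (1) for every observation (row) of data, using classical estimates."""
    n, p = len(data), len(data[0])
    mean = [sum(row[j] for row in data) / n for j in range(p)]
    dev = [[row[j] - mean[j] for j in range(p)] for row in data]
    # Sample variance-covariance matrix S (divisor n - 1)
    cov = [[sum(d[a] * d[b] for d in dev) / (n - 1) for b in range(p)] for a in range(p)]
    s_inv = invert(cov)
    return [sum(d[a] * s_inv[a][b] * d[b] for a in range(p) for b in range(p)) for d in dev]


def chi2_quantile_2df(prob):
    """Chi-square quantile in closed form; valid ONLY for 2 degrees of freedom."""
    return -2.0 * math.log(1.0 - prob)


# Hypothetical two-variable data: a tight cluster plus one distant point (index 16)
data = [list(pt) for pt in [(1, 0), (-1, 0), (0, 1), (0, -1)] * 4] + [[10, 10]]
d2 = squared_mahalanobis(data)
cutoff = chi2_quantile_2df(0.975)  # assumed quantile
outliers = [i for i, d in enumerate(d2) if d > cutoff]
print(outliers)
```

With classical estimates the squared distance can never exceed (n − 1)²/n, so in small samples an extreme observation inflates S and can partly mask itself; robust estimates such as those in Filzmoser (2004) reduce this effect.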

Principal component analysis

Principal component analysis (PCA) is a multivariate statistical method capable of explaining the interdependence between variables and reducing the dimensionality of the data (Varella 2008). The principal components should retain as much of the variance of the original variables as possible, so that they accurately represent the information contained in the data. The technique converts the original variables into new, uncorrelated variables called principal components, which are linear combinations of the original variables (Bouroche & Saporta 1982), as shown in Equation (2).

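$$PC_i = \mathbf{e}_i^{\top}\mathbf{X} = e_{i1}X_1 + e_{i2}X_2 + \cdots + e_{ip}X_p \tag{2}$$
where $PC_i$ is the ith principal component ($i = 1, 2, \ldots, p$); $\mathbf{e}_i^{\top}$ is the transposed ith eigenvector of the data correlation matrix; and $\mathbf{X}$ is the vector of the original variables (standardized, since the correlation matrix is used).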
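Equation (2) can be illustrated with a minimal, dependency-free Python sketch of PCA on the correlation matrix. It assumes the rows of `data` are observations (years) and the columns are variables; it standardizes the columns, builds the correlation matrix, extracts its eigenpairs with the cyclic Jacobi method (a stand-in for the eigendecomposition R performs internally, not the study's code) and computes the scores. All function names and data are hypothetical.

```python
import math
import statistics


def standardize(data):
    """Column-wise z-scores using the sample standard deviation (divisor n - 1)."""
    cols = list(zip(*data))
    means = [statistics.mean(c) for c in cols]
    sds = [statistics.stdev(c) for c in cols]
    return [[(x - m) / s for x, m, s in zip(row, means, sds)] for row in data]


def jacobi_eigh(a, tol=1e-14, max_sweeps=100):
    """Eigenpairs of a symmetric matrix by cyclic Jacobi rotations, largest eigenvalue first."""
    n = len(a)
    a = [list(map(float, row)) for row in a]
    v = [[float(i == j) for j in range(n)] for i in range(n)]
    for _ in range(max_sweeps):
        if sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j) < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if a[p][q] == 0.0:
                    continue
                # Rotation angle chosen so that the (p, q) element becomes zero
                theta = (a[q][q] - a[p][p]) / (2.0 * a[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1.0))
                c = 1.0 / math.sqrt(t * t + 1.0)
                s = t * c
                for k in range(n):  # A <- A J and V <- V J (columns p and q)
                    a[k][p], a[k][q] = c * a[k][p] - s * a[k][q], s * a[k][p] + c * a[k][q]
                    v[k][p], v[k][q] = c * v[k][p] - s * v[k][q], s * v[k][p] + c * v[k][q]
                for k in range(n):  # A <- J^T A (rows p and q)
                    a[p][k], a[q][k] = c * a[p][k] - s * a[q][k], s * a[p][k] + c * a[q][k]
    order = sorted(range(n), key=lambda i: a[i][i], reverse=True)
    return [a[i][i] for i in order], [[v[k][i] for k in range(n)] for i in order]


def pca_correlation(data):
    """PCA on the correlation matrix; scores follow Equation (2), PC_i = e_i^T z."""
    z = standardize(data)
    n, p = len(z), len(z[0])
    corr = [[sum(row[i] * row[j] for row in z) / (n - 1) for j in range(p)] for i in range(p)]
    eigenvalues, eigenvectors = jacobi_eigh(corr)
    scores = [[sum(e_j * z_j for e_j, z_j in zip(e, row)) for e in eigenvectors] for row in z]
    return eigenvalues, eigenvectors, scores


# Hypothetical observations of three variables (rows = years, columns = variables)
data = [[1.0, 2.0, 3.0], [2.0, 3.0, 1.0], [3.0, 5.0, 4.0], [4.0, 4.0, 2.0], [5.0, 7.0, 6.0]]
eigenvalues, eigenvectors, scores = pca_correlation(data)
print([round(lam, 3) for lam in eigenvalues])
```

A useful check on any implementation is that the sample variance of the scores of the ith component equals its eigenvalue, and that the eigenvalues of a correlation matrix sum to p.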

The variance of each principal component is given by its associated eigenvalue. The proportion of variance explained by each principal component is given by Equation (3).

$$P_i = \frac{\lambda_i}{\sum_{j=1}^{p}\lambda_j} \tag{3}$$
where $P_i$ is the proportion of the total variance explained by the ith principal component, $p$ is the number of variables and $\lambda_i$ is the ith eigenvalue.

The dimensionality of the problem can be reduced by discarding the principal components with a low proportion of explained variance. Kaiser's criterion, which retains only the components whose eigenvalues are greater than 1, can be used to define the number of retained principal components.
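Equation (3) and Kaiser's criterion reduce to simple arithmetic on the eigenvalues. The sketch below uses hypothetical eigenvalues of a 5 × 5 correlation matrix, not the values obtained in this study; the function names are illustrative.

```python
from itertools import accumulate


def explained_variance(eigenvalues):
    """Equation (3): proportion of variance per component, plus the cumulative share."""
    total = sum(eigenvalues)
    proportions = [lam / total for lam in eigenvalues]
    return proportions, list(accumulate(proportions))


def kaiser_retained(eigenvalues):
    """Kaiser's criterion: number of components whose eigenvalue exceeds 1."""
    return sum(1 for lam in eigenvalues if lam > 1.0)


# Hypothetical eigenvalues of a 5 x 5 correlation matrix (they sum to p = 5)
eigenvalues = [2.6, 1.3, 0.6, 0.3, 0.2]
props, cumulative = explained_variance(eigenvalues)
k = kaiser_retained(eigenvalues)
print(k, [round(c, 2) for c in cumulative])
```

Because the eigenvalues of a correlation matrix sum to p, their mean is 1, so Kaiser's criterion keeps exactly the components that explain more variance than a single standardized variable.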

Principal component analysis was carried out in R version 4.1.2 with the princomp function of the stats package, version 3.6.2 (R Core Team 2013).

Multiple linear regression

Multiple linear regression is a multivariate dependence method. It describes the linear relationship between predictor variables (independent variables) and a quantitative response variable (dependent variable) (Hair Jr et al. 2009). The result is a model that can reasonably predict future situations. The model is given by Equation (4).

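$$Y = \beta_0 + \sum_{i=1}^{p}\beta_i x_i + \varepsilon = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_p x_p + \varepsilon \tag{4}$$
where $Y$ is the dependent variable, $\beta_0$ is the intercept, $\beta_i$ is the ith partial regression coefficient, $x_i$ are the independent (predictor) variables, $p$ is the number of predictors and $\varepsilon$ is the error.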

Residual analysis assesses the adequacy of the multiple linear regression model. A residual is the difference between the observed value and the value predicted by the model; a suitable model has approximately normally distributed residuals with a mean close to zero.

Multiple linear regression was carried out in R version 4.1.2 with the lm function of the stats package, version 3.6.2 (R Core Team 2013).
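A dependency-free Python sketch of fitting Equation (4) by least squares and summarizing the residuals is shown below. R's lm() also solves the least squares problem through a QR decomposition, but with Householder reflections in compiled code; the modified Gram-Schmidt factorization here is a simplified stand-in chosen for brevity, and the function names and data are hypothetical.

```python
import statistics


def ols_qr(X, y):
    """Least squares coefficients [b0, b1, ..., bp] for Equation (4) via a thin QR factorization.

    X: list of observations, each a list of p predictor values (no intercept column).
    y: list of responses. Assumes the design matrix has full column rank.
    """
    n = len(y)
    # Design matrix stored by columns, with a leading column of ones for the intercept
    cols = [[1.0] * n] + [[float(row[j]) for row in X] for j in range(len(X[0]))]
    m = len(cols)
    q = [list(c) for c in cols]
    r = [[0.0] * m for _ in range(m)]
    for j in range(m):  # modified Gram-Schmidt: design matrix = Q R
        for i in range(j):
            r[i][j] = sum(a * b for a, b in zip(q[i], q[j]))
            q[j] = [b - r[i][j] * a for a, b in zip(q[i], q[j])]
        r[j][j] = sum(b * b for b in q[j]) ** 0.5
        q[j] = [b / r[j][j] for b in q[j]]
    qty = [sum(a * b for a, b in zip(q[i], y)) for i in range(m)]
    beta = [0.0] * m
    for i in reversed(range(m)):  # back substitution for R beta = Q^T y
        beta[i] = (qty[i] - sum(r[i][k] * beta[k] for k in range(i + 1, m))) / r[i][i]
    return beta


def residuals(X, y, beta):
    """Observed minus predicted values."""
    fitted = [beta[0] + sum(b * x for b, x in zip(beta[1:], row)) for row in X]
    return [obs - fit for obs, fit in zip(y, fitted)]


# Hypothetical data: six observations of two predictors
X = [[1, 2], [2, 1], [3, 5], [4, 3], [5, 8], [6, 4]]
y = [4.1, 3.2, 3.9, 5.6, 3.1, 7.4]
beta = ols_qr(X, y)
res = residuals(X, y, beta)
print("coefficients:", [round(b, 3) for b in beta])
print("residuals median/min/max:", statistics.median(res), min(res), max(res))
```

With an intercept in the model the residuals always sum to zero, so their median and extremes are more informative than their mean when judging the fit.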

RESULTS AND DISCUSSION

The dataset is composed of 30 samples (years) and 13 numerical variables. Figure 1 shows the boxplots of these variables. The objective of the boxplot analysis in this research is to verify the scale and distribution of the data. The variables present different scales and distributions, and their maximum and minimum values differ significantly. Figure 1 also shows the notably higher values of steel production in China.

Figure 1
Boxplot of the original and standardized variables.

The data variability shown in Figure 1 suggests using the correlation matrix in the PCA. The variables had to be standardized because differences in their variability can distort the interpretation of the information in the PCA and MLR analyses; the covariance matrix of standardized variables is their correlation matrix. Figure 1 presents the boxplots of the original and standardized data.
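The effect of standardization can be checked numerically: after each variable is converted to z-scores, the covariance between two variables equals their Pearson correlation, which is why PCA on standardized data is equivalent to PCA on the correlation matrix. The Python sketch below assumes the sample standard deviation (divisor n − 1), as R's `scale()` uses; the variables and values are hypothetical.

```python
import statistics


def standardize_column(values):
    """z = (x - mean) / sd, with the sample standard deviation."""
    m = statistics.mean(values)
    s = statistics.stdev(values)
    return [(v - m) / s for v in values]


def sample_cov(a, b):
    """Sample covariance with divisor n - 1."""
    ma, mb = statistics.mean(a), statistics.mean(b)
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / (len(a) - 1)


def pearson(a, b):
    """Pearson correlation coefficient."""
    return sample_cov(a, b) / (statistics.stdev(a) * statistics.stdev(b))


# Hypothetical variables on very different scales (tonnes vs USD/t)
production = [120e6, 150e6, 180e6, 260e6, 310e6]
price = [30.0, 28.0, 45.0, 60.0, 95.0]
z_prod, z_price = standardize_column(production), standardize_column(price)

# The raw covariance depends on the units; the covariance of z-scores equals the correlation
print(sample_cov(production, price), sample_cov(z_prod, z_price), pearson(production, price))
```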

An exploratory statistical analysis was carried out for each variable of the dataset (Table III).

Table III
Basic statistics of the dataset.

Bartlett's test was performed on the dataset and yielded a p-value of 1.74 × 10⁻⁹⁸. As the p-value is below 5%, the null hypothesis that the correlation matrix is an identity matrix is rejected. The result indicates that there are significant correlations among the variables. Figure 2 shows the scatterplot of the data and the correlations between the variables.

Figure 2
Scatterplot of Dataset.

Table IV presents the linear correlations between each variable and the iron ore price, which are significant for all variables except steel production (t) in the United States (USA_Steel), the only variable with an absolute correlation below 0.30. US steel production presented a weak negative correlation with the iron ore price, which can be associated mainly with the loss of US market share to China and India. The US Construction Index presented a significant negative correlation with the iron ore price, as the index tends to weaken when the price of iron ore increases.

Table IV
Correlations between the variables and the iron ore price.

Principal component analysis was performed on the standardized dataset. According to Kaiser's criterion, the components with eigenvalues greater than 1 must be retained; accordingly, the first three principal components were retained.

Table V shows the proportion of the original data variance explained by each principal component and the cumulative proportion. Principal components 1, 2 and 3 explain 59.80%, 18.95% and 10.36% of the variance, respectively, together representing 89.12% of the total variability of the original data matrix.

Table V
Proportion of the original data variance explained by each principal component.

The principal components are therefore capable of representing the original variables. The large share of variability captured by the first two principal components reflects strong interdependence between the variables and is evidence of the oligopolistic behavior of the iron ore market, that is, a market driven by a narrow group of countries.

The loadings indicate the importance of each variable in each principal component. Table VI presents the loadings for the first three principal components.

Table VI
Loadings for principal components 1, 2 and 3.

Figure 3 presents the biplot of the first two principal components, which together explain 78.75% of the original data variance. A clear change in market behavior is observed from 2010 onward.

Figure 3
Biplot of the first two principal components.

In Figure 3, most variables point in the same direction as the iron ore price, which shows a rising trend throughout the studied historical series. The US Construction Index (ICC-US) (%) (USA_Constr) behaves in the opposite way, which can be explained by the fact that rising iron ore prices discourage construction in the US. However, the vector of this variable has a small magnitude, so the variable does not have a great influence on the iron ore price over this historical series.

In 2008, a real estate bubble burst in the United States. A worldwide financial crisis followed, and steel production in Japan and Europe retracted, with only small positive oscillations afterward (Trading Economics 2021). In 2020, Japan had its worst performance in the analyzed historical series and Europe its third-worst production since 1991, a poor performance that can be explained by the global context of the COVID-19 pandemic. In Figure 3, the steel production vectors of Japan and Europe partly follow the behavior of steel production in China and India.

The United States also suffered the effects of the 2008 crisis. Its steel production (USA_Steel) decreased by almost 40% between 2007 and 2009 (Trading Economics 2021). The country implemented recovery measures and a protectionist policy, including surcharges on imported steel. However, production only returned to the levels observed in the early 1990s. In addition, the country registered the second-lowest steel production of the historical series in 2020 because of the COVID-19 pandemic. For these reasons, this variable behaves contrary to the iron ore price.

The biplot in Figure 3 shows two distinct clouds of sample elements (years). The years of the 1990s and early 2000s follow the behavior of steel production in the United States. The years of the first decade of the 21st century occupy an intermediate position, while the years of the decade starting in 2010 follow the accelerated growth of Chinese production.

To carry out the multiple linear regression analysis, the US Construction Index (ICC-US) (%) (USA_Constr), steel production (t) in Japan (Japan_Steel), steel production (t) in Europe (Euro_Steel) and steel production (t) in the United States (USA_Steel) were removed because of their lower interdependence with the dependent variable, the iron ore price. Before the multiple regression analysis, multivariate outlier detection was carried out, and three outliers were detected: 2004, 2008 and 2020.

In 2004, China registered its highest inflation since 1997. To control economic growth, the Chinese government regulated tax increases, since such increases would make it difficult for public companies to pay their debts. The measures implemented included restrictions on credit and investment projects, especially in the real estate market and the automobile industry, sectors that depend essentially on steel and iron ore.

According to Trevisan (2004), the growing Chinese demand for raw materials affected the prices of several products around the world. In 2003, the country consumed 30% of global steel production, influencing the price of the product on the international market. This scenario could be repeated within two years after the end of the COVID-19 pandemic, as the Chinese government will probably try to control the inflation caused by the large issuance of paper money. It is currently estimated that 20% of the US dollars in circulation were issued in 2020, a historic record. Consequently, several countries may adopt measures to rescue their economies.

Inflation in the Chinese economy also marked 2008. The Consumer Price Index (CPI), used to measure inflation trends, reached 8.7%, the largest increase in twelve years. The Chinese government then sought to contain price increases while maintaining stable economic growth, with an active fiscal policy and a relatively loose monetary policy. At the end of 2008, China injected about US$ 586 billion to stimulate the economy. In addition, taking advantage of its dominant position as a buyer, the country changed the contract system from long-term to medium-term agreements. A strong global financial crisis also directly affected commodity prices in that year.

The year 2020 was marked by the COVID-19 pandemic. According to data released by the Brazilian Mining Institute (IBRAM 2021), Brazil increased the volume of mineral products exported in 2020 by 2% over 2019. In iron ore trade between Brazil and China, China reinforced its position as the main destination for Brazilian iron ore: it accounted for 62% of exports in 2019, a share that rose to 72% in 2020 (ANBA 2020).

The outliers were removed and the multiple linear regression model was obtained. The model is given by Equation (5).

Iron_price=1,382(102)+8,3910(104)Br_Iron_Exp+4,508(104)China_Steel5,457(103)India_Steel+4,693(102)Steel_Price+9,242(101)Coal_Price+1,018(104)China_GDP+1,409(101)Oil_Price+3,205Glob_Oil_Prod+ε(5)

The residuals are the differences between the observed and predicted values. The residuals of the model had a median of -1.432, with minimum and maximum values of -13.215 and 21.741, respectively. Considering this range, the residuals are centered near zero, indicating a good fit of the model. The most significant variables in determining the iron ore price are the average annual value of iron ore and concentrate exports from Brazil with 62% iron content (USD) (Br_Iron_Exp), steel production (t) in China (China_Steel) and the annual average coal price (Coal_Price), since they presented p-values below 0.05. Significance was assessed from the least squares fit, computed by QR decomposition.

The adjusted R-squared measures the explanatory power of a regression model while penalizing the number of predictors. The model presented an adjusted R-squared of 94.2%, indicating an excellent fit.
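The adjusted R-squared reported by R's `summary()` for an lm fit can be reproduced from R² = 1 − SS_res/SS_tot and the correction 1 − (1 − R²)(n − 1)/(n − k − 1), where k is the number of predictors (excluding the intercept). A minimal Python sketch follows; the function names and numbers are hypothetical and do not reproduce the study's model.

```python
def r_squared(observed, fitted):
    """R^2 = 1 - SS_res / SS_tot (for least squares fits that include an intercept)."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - f) ** 2 for o, f in zip(observed, fitted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


def adjusted_r_squared(r2, n_obs, n_predictors):
    """1 - (1 - R^2)(n - 1)/(n - k - 1), with k predictors plus an intercept."""
    return 1.0 - (1.0 - r2) * (n_obs - 1) / (n_obs - n_predictors - 1)


# Hypothetical observed values and fitted values from a 3-predictor model
observed = [10.0, 12.0, 15.0, 11.0, 18.0, 20.0, 16.0, 14.0]
fitted = [10.5, 11.6, 14.2, 11.9, 17.5, 19.1, 16.8, 13.9]
r2 = r_squared(observed, fitted)
print(round(r2, 4), round(adjusted_r_squared(r2, n_obs=len(observed), n_predictors=3), 4))
```

The correction grows as k approaches n, so with few observations (such as annual series) each extra predictor must raise R² noticeably to improve the adjusted value.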

Analysis of variance (ANOVA) partitions the variance of the observations explained by the fitted model among its terms. It was used as a complementary analysis of variable significance. Table VII presents the ANOVA results.

Table VII
ANOVA results.

ANOVA identifies the average annual value of iron ore and concentrate exports from Brazil with 62% iron content (USD) (Br_Iron_Exp), steel production (t) in India (India_Steel), the annual average steel price (USD/t) (Steel_Price) and the annual average coal price (USD/t) (Coal_Price) as significant predictors in the model. These are also the variables with the greatest weight in principal component 1.

India is one of the fastest-growing economies in the world. Its infrastructure and automobile sectors have increased the demand for steel year after year, and the Indian government promotes the steel industry through investments and policy reforms.

According to T&A (2021), Indian steel production has been growing since the country's independence. India gained a prominent position in the global steel landscape through the establishment of new state-of-the-art steel plants, the modernization of older plants, incentives for energy-efficient technologies and backward integration with global raw material sources.

In 2018, India overtook Japan in the steel production ranking and became the second-largest producer in the world, behind only China. However, like other countries, its steel industry was also impacted by the COVID-19 pandemic in 2020. The country is expected to double its current average production by 2031. Figure 4 shows the annual steel production of the main producing countries.

Figure 4
Annual steel production of the main producers.

China has the fastest-growing economy in the world, with average GDP growth of 9.28% over the last 30 years. The United States, the world's largest economy, had average GDP growth of 2.30% in the same period.

According to Nonnenberg (2010), the rise of the Chinese economy is due to multiple factors, including: the liberalization of the price formation system; the liberalization of foreign trade; the creation of Special Economic Zones (SEZs); the absence of intellectual property protection; economies of scale afforded by the country's enormous population; a large contingent of low-wage labor; the growth of Foreign Direct Investment (FDI); and policies to encourage innovation and the transfer and generation of science and technology. Chinese GDP has been boosted by the construction and industrial sectors.

China has been the world's largest steel producer for the last two decades. In 2015, Chinese production retracted because steelmakers were forced to cut production due to falling demand, growing losses (mainly caused by the lowest steel prices in decades) and more restrictive bank credit.

In 2020, China was the only large producer to increase its steel production, which grew by 5.8% compared with 2019.

Steel is one of the main materials of Chinese civil construction. In 2017, the country had more than 300,000 construction companies. According to the National Bureau of Statistics of China, the value added by the sector represented 3.8% of the Chinese Gross Domestic Product (GDP) in 1978, a share that rose to 6.7% in 2017 (Portugueses 2018).

Steel is manufactured using iron ore, coal and lime. Brazil is the second-largest global exporter of iron ore and also ranks high in terms of reserves. In 2019, Brazilian iron ore exports had a FOB (Free On Board) value of US$ 21.8 billion, making iron ore the third most exported Brazilian product, behind only soybeans and oil. China is the main buyer of Brazilian iron ore, accounting for 59% of Brazilian exports in 2019. China is the world's largest consumer of the commodity and its third-largest producer, behind Australia and Brazil.

Coke, a product derived from mineral coal, is used by the steel industry, which is therefore largely dependent on coal. The price of coal fluctuates with global supply and demand, as well as with production costs. The largest consumers are China (responsible for half of world demand), the United States and India. By 2030, China and India are estimated to account for 60% of world coal demand (Rodrigues 2009).

The discussion above helps explain the importance of these independent variables, which are the most important variables in principal component 1 and the most significant variables in determining the iron ore price.
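To make "most important variables in principal component 1" concrete, the sketch below ranks variables by the absolute value of their loading on the first principal component of the correlation matrix. It is a minimal Python illustration under stated assumptions, not the authors' implementation: the data are synthetic, the column labels are borrowed from the paper only as names, and the helper `pc1_ranking` is hypothetical. Loadings are computed as eigenvector entries scaled by the square root of the eigenvalue, i.e. the correlation between each standardized variable and the component scores.

```python
import numpy as np


def pc1_ranking(X, names):
    """Rank variables by |loading| on the first principal component.

    PCA is done on the correlation matrix (equivalent to standardizing
    each column first). Loadings = eigenvector * sqrt(eigenvalue), i.e.
    the correlation between each variable and the PC1 scores.
    """
    corr = np.corrcoef(X, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(corr)   # eigenvalues in ascending order
    top = np.argmax(eigvals)
    loadings = eigvecs[:, top] * np.sqrt(eigvals[top])
    order = np.argsort(-np.abs(loadings))     # eigenvector sign is arbitrary
    return [(names[i], float(loadings[i])) for i in order]


# Synthetic example (hypothetical): three series share one common driver,
# the fourth is independent noise.
rng = np.random.default_rng(0)
n = 200
driver = rng.normal(size=n)
X = np.column_stack([
    driver + 0.1 * rng.normal(size=n),
    driver + 0.3 * rng.normal(size=n),
    driver + 0.8 * rng.normal(size=n),
    rng.normal(size=n),
])
names = ["China_Steel", "Br_Iron_Exp", "Coal_Price", "Oil_Price"]

ranking = pc1_ranking(X, names)
for name, loading in ranking:
    print(f"{name:12s} {loading:+.3f}")
```

Ranking by absolute value matters because the sign of an eigenvector is arbitrary; only the relative sign between variables carries meaning.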

Figure 5 shows constant variance of the experimental errors (homoscedasticity) and no trend in the residuals across samples, which supports a good model fit. The residuals also follow an approximately normal distribution (Figure 5), a further indication of good fit. The Shapiro-Wilk normality test did not reject normality (p-value = 0.1198, above the usual 0.05 significance level).
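The residual checks described for Figure 5 can be sketched as follows. This is a minimal Python illustration on synthetic data, not the authors' workflow: it fits an ordinary least squares model with `numpy.linalg.lstsq` and applies the Shapiro-Wilk test through `scipy.stats.shapiro`. As a numerical complement to the visual homoscedasticity check (a technique swapped in here, not one reported in the paper), it also computes the studentized (Koenker) Breusch-Pagan statistic n·R² from an auxiliary regression of the squared residuals on the regressors.

```python
import numpy as np
from scipy import stats

# Hypothetical data: two regressors, normal errors with constant variance.
rng = np.random.default_rng(42)
n, k = 30, 2
X = rng.normal(size=(n, k))
y = 1.0 + X @ np.array([2.0, -1.0]) + rng.normal(scale=0.5, size=n)

# OLS fit with an intercept column.
A = np.column_stack([np.ones(n), X])
beta, *_ = np.linalg.lstsq(A, y, rcond=None)
residuals = y - A @ beta

# Shapiro-Wilk: H0 = residuals are drawn from a normal distribution.
w_stat, p_normal = stats.shapiro(residuals)

# Studentized Breusch-Pagan: regress e^2 on the regressors;
# LM = n * R^2 is approximately chi2(k) under H0 of constant variance.
e2 = residuals ** 2
gamma, *_ = np.linalg.lstsq(A, e2, rcond=None)
aux_resid = e2 - A @ gamma
r2 = 1.0 - (aux_resid @ aux_resid) / np.sum((e2 - e2.mean()) ** 2)
lm_stat = n * r2
p_homosc = stats.chi2.sf(lm_stat, df=k)

alpha = 0.05
print(f"Shapiro-Wilk W = {w_stat:.3f}, p = {p_normal:.4f}")
print(f"Breusch-Pagan LM = {lm_stat:.3f}, p = {p_homosc:.4f}")
print("normality not rejected" if p_normal > alpha else "normality rejected")
```

A p-value above the chosen level means the test fails to reject its null hypothesis; it does not prove normality or constant variance.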

Figure 5
Homoscedasticity and histogram of residuals.

CONCLUSIONS

Multivariate analysis enables a well-grounded study of the relationships among economic variables in the iron ore context.

The first three principal components explain 89.12% of the data variability. In addition, the biplot allows visual verification of the behavior of the variables in relation to the iron ore price.
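The share of variability captured by the leading components can be computed from the eigenvalues of the correlation matrix, whose sum equals the number of variables. The Python sketch below is illustrative only: it uses a synthetic matrix built from two hypothetical latent factors, so its output will not reproduce the paper's 89.12%, and the helper `explained_variance` is hypothetical.

```python
import numpy as np


def explained_variance(X):
    """Proportion of total variance explained by each principal component
    of the correlation matrix (i.e. PCA on standardized variables)."""
    corr = np.corrcoef(X, rowvar=False)
    eigvals = np.linalg.eigvalsh(corr)[::-1]   # sort descending
    return eigvals / eigvals.sum()             # eigvals.sum() == n_variables


# Synthetic data: six variables driven by two hypothetical latent factors.
rng = np.random.default_rng(1)
n = 25
f1, f2 = rng.normal(size=n), rng.normal(size=n)
X = np.column_stack(
    [f1 + 0.2 * rng.normal(size=n) for _ in range(3)]
    + [f2 + 0.2 * rng.normal(size=n) for _ in range(3)]
)

ratio = explained_variance(X)
cumulative = np.cumsum(ratio)
print("cumulative % for PC1..PC3:", np.round(cumulative[:3] * 100, 2))
```

Reading the cumulative sum at the third entry gives the analogue of the "first three components explain X%" statement.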

Multiple linear regression identifies the variables most significant to the variation in the iron ore price: the average annual value of Brazilian exports of iron ore and concentrates with 62% iron content (USD) (Br_Iron_Exp), steel production (t) in China (China_Steel), the annual average price of coal (USD/t) (Coal_Price), steel production (t) in India (India_Steel), and the annual average price of steel (USD/t) (Steel_Price).
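The selection of significant variables can be illustrated with an ordinary least squares fit and two-sided t-tests on each coefficient. This is a minimal Python sketch under stated assumptions: the data are simulated, the labels `Br_Iron_Exp`, `China_Steel` and `Oil_Price` are reused from the paper purely as names, the helper `ols_summary` is hypothetical, and the 5% threshold is a conventional choice rather than one taken from the study.

```python
import numpy as np
from scipy import stats


def ols_summary(X, y, names):
    """OLS fit with an intercept; returns (name, coef, t, p) for each term."""
    n, k = X.shape
    A = np.column_stack([np.ones(n), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    dof = n - (k + 1)
    sigma2 = resid @ resid / dof                  # residual variance estimate
    cov = sigma2 * np.linalg.inv(A.T @ A)         # covariance of coefficients
    se = np.sqrt(np.diag(cov))
    t_stat = beta / se
    p_val = 2 * stats.t.sf(np.abs(t_stat), dof)   # two-sided p-values
    labels = ["Intercept"] + list(names)
    return list(zip(labels, beta, t_stat, p_val))


# Hypothetical data-generating process: Oil_Price has no true effect.
rng = np.random.default_rng(7)
n = 40
names = ["Br_Iron_Exp", "China_Steel", "Oil_Price"]
X = rng.normal(size=(n, 3))
y = 50 + 8.0 * X[:, 0] + 5.0 * X[:, 1] + 0.0 * X[:, 2] + rng.normal(scale=2.0, size=n)

summary = ols_summary(X, y, names)
significant = [name for name, _, _, p in summary[1:] if p < 0.05]
for name, coef, t_val, p in summary:
    print(f"{name:12s} coef={coef:8.3f} t={t_val:7.2f} p={p:.4f}")
print("significant at 5%:", significant)
```

With real, highly correlated economic series, multicollinearity can inflate standard errors, which is one reason the paper pairs regression with principal component analysis.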

This study demonstrates the relevance of China in the international iron ore market. The country is increasing its production in an attempt to restrain rising commodity prices; for this reason, prices tend to decrease in the coming years.

India was more relevant than expected: although the country is a notable steel producer, it was not expected to be more significant than other variables. On the other hand, despite the great influence of the United States economy on various sectors of the world economy, it showed little relevance to the price of iron ore.

Furthermore, the low influence of the annual average oil price on the iron ore price, in both the principal component analysis and the multiple linear regression, was unexpected.

A strong influence of Brazilian iron ore production on the international market was identified. Although the country's exports are strongly linked to Chinese and Indian demand, a reduction in Brazilian production would create a scenario of rising commodity prices. The model created through multiple linear regression could be used to predict future iron ore prices, provided the independent variables are known or can be estimated.
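Once the coefficients are estimated, a forecast is simply the fitted linear combination evaluated at known or estimated future values of the regressors. The sketch below assumes synthetic data and hypothetical coefficients; `fit_ols` and `predict` are illustrative helpers, not the model reported in the paper, and the uncertainty in the estimated regressors themselves is ignored.

```python
import numpy as np


def fit_ols(X, y):
    """Return [intercept, b1, ..., bk] from an OLS fit."""
    A = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    return beta


def predict(beta, X_new):
    """Evaluate the fitted linear model at new regressor values."""
    X_new = np.atleast_2d(X_new)
    return beta[0] + X_new @ beta[1:]


# Hypothetical history: two regressors and known "true" coefficients.
rng = np.random.default_rng(3)
X = rng.uniform(0, 10, size=(30, 2))
true_beta = np.array([20.0, 3.0, -1.5])
y = true_beta[0] + X @ true_beta[1:] + rng.normal(scale=0.5, size=30)

beta = fit_ols(X, y)

# Two scenarios for the (estimated) future regressor values;
# only the first regressor changes between them.
scenario = np.array([[5.0, 4.0], [6.0, 4.0]])
forecast = predict(beta, scenario)
print(np.round(forecast, 3))
```

Because the model is linear, the difference between the two scenario forecasts equals the estimated coefficient of the regressor that changed, which is how a scenario such as a change in Brazilian exports would propagate to the predicted price.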

ACKNOWLEDGMENTS

The authors wish to thank the Mining Department of the Universidade Federal de Ouro Preto (UFOP) and the Graduate Program in Mineral Engineering (PPGEM – UFOP). The authors are also grateful for the collaboration of the Research Group on Data Science in Engineering (CIDENG).

  • ALAMEER Z. 2020. Multistep-Ahead Forecasting of Coal Prices Using a Hybrid Deep Learning Model. Resources Policy 65. URL https://doi.org/10.1016/j.resourpol.2020.101588. 2021-02-24.
  • ANBA (Câmara de Comércio Árabe Brasileira). 2020. Mineração Brasileira Aumentou Exportação Em 2020. URL https://anba.com.br/mineracao-brasileira-aumentouexportacao-em-2020/. 2021-02-24.
  • BARTLETT MS. 1951. The Effect of Standardization on a χ² Approximation in Factor Analysis. Biometrika 38(3/4): 337-344. URL http://www.jstor.org/stable/2332580. 2022-09-25.
  • BIAGE M. 2012. Estatística Econômica e Introdução à Econometria. 3rd ed. Florianópolis: Departamento de Ciências Econômicas/UFSC, 79-97 p.
  • BOUROCHE J & SAPORTA G. 1982. Análise de Dados. Rio de Janeiro: Zahar Editores.
  • ENDERLEIN G. 1987. Hawkins, D. M.: Identification of Outliers. Chapman and Hall, London – New York 1980, 188 s., £14.50. Biometrical Journal 29: 188p.
  • FILZMOSER P. 2004. A Multivariate Outlier Detection Method. In: Proceedings of the Seventh International Conference on Computer Data Analysis and Modeling, Vol. 1. Minsk: Belarusian State University.
  • GAGGIATO VC. 2014. Do aço Ao Minério: Um Novo Modelo de Avalição Da Oferta e Demanda Global e Precificação de Minério de Ferro. Universidade Federal de Minas Gerais.
  • HAIR JR JF, BLACK WC, BABIN BJ & ANDERSON RE. 2009. Multivariate Data Analysis.
  • HOFFMANN R. 2016. Análise de regressão: uma introdução à econometria.
  • HOTELLING H. 1933. Analysis of a complex of statistical variables into principal components. J Educ Psychol 24(6): 417-441, 498-520.
  • IBRAM. 2021. Mineração industrial brasileira fecha 2020 com desempenho positivo. Belo Horizonte. URL https://ibram.org.br/noticia/mineracao-industrialbrasileira-fecha-2020-com-desempenho-positivo/. 2021-02-25.
  • KAISER HF. 1970. A Second Generation Little Jiffy. Psychometrika 35: 401–415.
  • KRIGE DG & MAGRI EJ. 1982. Studies of the Effects of Outliers and Data Transformation on Variogram Estimates for a Base Metal and a Gold Ore Body. Journal of the International Association for Mathematical Geology 14(6): 557–564.
  • LI D, MOGHADDAM MR, MONJEZI M, JAHED ARMAGHANI D & MEHRDANESH A. 2020. Development of a Group Method of Data Handling Technique to Forecast Iron Ore Price. Applied Sciences 10(7): 2364.
  • NONNENBERG MJB. 2010. China: Estabilidade e Crescimento Econômico. Brazilian Journal of Political Economy 30: 201–218.
  • PORTUGUESES P. 2018. Setor de construção da China registra rápido crescimento desde 1978. Beijing: O Diário do Povo Online. URL http://portuguese.people.com.cn/n3/2018/0910/c309806-9498945.html. 2021-02-16.
  • RODRIGUES AFS. 2009. Economia Mineral do Brasil. Série Estatísticas e Economia Mineral. Brasília: DNPM/MME, 764 p. URL https://www.gov.br/anm/pt-br/centrais-deconteudo/publicacoes/serie-estatisticas-e-economiamineral/outras-publicacoes-1/2-2-carvao. 2021-02-19.
  • T&A. 2021. Consulting. URL https://investexportbrasil.dpr.gov.br/arquivos/PesquisasMercado/PMRIndiaIndustriaSiderurgica2017.pdf. 2021-02-16.
  • TEAM RC. 2013. R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. URL http://www.R-project.org/.
  • TRADING E. 2021. tradingeconomics. URL https://tradingeconomics.com/. 2021-02-09.
  • TREVISAN C. 2004. Indústria superaquecida preocupa a China. São Paulo: Folha de S. Paulo. URL https://www1.folha.uol.com.br/folha/dinheiro/ult91u82949.shtml. 2021-02-16.
  • VALADARES FG, AQUINO A & RABELO R. 2012. Detecção de Outliers Multivariados Em Redes de Sensores Sem Fio. In: XLIV Simpósio Brasileiro de Pesquisa Operacional. SBPO.
  • VARELLA C. 2008. Análise de Componentes Principais. Rio de Janeiro: Instituto de Agronomia, Universidade Federal Rural do Rio de Janeiro/UFRJ.
  • VICINI L. 2005. Análise Multivariada Da Teoria à Prática. Santa Maria: UFSM/CCNE.
  • WRELL L. 2018. An Analysis of Iron Ore Prices During the Latest Commodity Boom. Mineral Economics 31(1): 203–216.

Publication Dates

  • Publication in this collection
    08 May 2023
  • Date of issue
    2023

History

  • Received
    27 Oct 2021
  • Accepted
    5 Feb 2022
Academia Brasileira de Ciências Rua Anfilófio de Carvalho, 29, 3º andar, 20030-060 Rio de Janeiro RJ Brasil, Tel: +55 21 3907-8100 - Rio de Janeiro - RJ - Brazil
E-mail: aabc@abc.org.br