1. Introduction

During the development of a mining project that includes mineral or metal processing to concentrate the metal of interest, it is important to know the geometallurgical response of the ore at the plant, as this is the main factor affecting the profitability of the mineral chain. This information helps to maximize the economic benefit of the project by integrating geological knowledge, mining, mineral processing, environmental controls and the market.

The metallurgical recovery information can be obtained experimentally at a pilot plant or by lab tests. Thus, representative ore samples are subjected to tests mimicking the ore concentration processing flow. These experiments demand time and are costly. One way to reduce the cost of these experiments is to identify chemical components currently analyzed in the ore with possible correlation with the metallurgical recovery. These correlated variables can be used to build a multivariate regression model (linear or nonlinear) to explain the response variable, i.e. metallurgical recovery.

To build and use models that predict and explain a phenomenon accurately, the most significant variables, those that explain the system's behavior in part or in whole, must first be carefully selected. Although many variables and parameters are necessary to predict a response variable, a small number of variables, in general, explains much of it. Thus, the first stage of a modeling process is to identify the right variables and the relationships among them.

Given the importance of modeling the geometallurgical variable, this study investigates the relationships among the variables in order to explain the ore response at the plant from statistics and additive variables (grades, for instance). This requires measuring the variables considered relevant to understanding the phenomenon under analysis. Converting the information obtained into knowledge poses many difficulties, especially in its statistical evaluation.

Using chemometrics to calculate the metallurgical recovery can help reduce the number of experimental analyses of this variable, which directly lowers the cost of the metallurgical characterization of the ore samples.

In this article, the development of a regression model for calculating the metallurgical niobium recovery at various stages of the production process is proposed. The results of the application are validated against the laboratory test data.

The chemical variables used in the response model are herein referred to as *X_{1}, X_{2}, X_{3}, ..., X_{9}* for the sake of confidentiality.

2. Material and methods

The data set considered in this study came from a drill hole campaign known as FSA. The samples obtained in this campaign were tested in the lab following a procedure that mimics the processing flowchart, and they were also chemically analyzed.

Cores from the drill holes were logged and the important geological domains were separated into: Soil; Orange weathered ore; Brown weathered ore; Saprolite; and Fresh rock (carbonatite).

Among the five geological domains mentioned, only one has enough data to be subjected to a multivariate analysis and to provide a meaningful response. This domain is the Orange weathered ore, which is the ore mined at the site.

Principal Component Analysis (PCA) helped to define which chemical components should be considered in the multivariate regression. This mathematical procedure uses an orthogonal transformation to convert a set of observations of possibly correlated variables into a set of linearly uncorrelated variables known as principal components. The transformation is defined so that the first principal component has the greatest variance, and each subsequent component has the maximum variance under the constraint of being orthogonal to the previous components (^{Pearson, 1901}).
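As a minimal illustration of the transformation described above, the following Python sketch runs a PCA on synthetic data by eigen-decomposition of the correlation matrix. The data, the function name `pca_correlation` and the use of NumPy are assumptions made for this example only; the study itself used MINITAB and its data are not reproduced here.

```python
import numpy as np


def pca_correlation(data):
    """PCA via eigen-decomposition of the correlation matrix.

    data: array of shape (n_samples, n_variables).
    Returns eigenvalues (descending), eigenvectors (as columns) and scores.
    """
    # Standardize each variable to zero mean and unit variance.
    z = (data - data.mean(axis=0)) / data.std(axis=0, ddof=1)
    corr = np.corrcoef(z, rowvar=False)
    # eigh returns eigenvalues in ascending order, so reverse them.
    eigvals, eigvecs = np.linalg.eigh(corr)
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    # Scores are the observations expressed in the principal components.
    scores = z @ eigvecs
    return eigvals, eigvecs, scores


# Synthetic example: two strongly correlated variables and one independent.
rng = np.random.default_rng(0)
base = rng.normal(size=(200, 1))
data = np.hstack([
    base + 0.1 * rng.normal(size=(200, 1)),
    base + 0.1 * rng.normal(size=(200, 1)),
    rng.normal(size=(200, 1)),
])

eigvals, eigvecs, scores = pca_correlation(data)
explained = eigvals / eigvals.sum()  # fraction of variance per component
print(explained)
```

In this sketch the covariance matrix of `scores` is diagonal, which is the "uncorrelated" property, and the entries of `explained` decrease from the first component onward.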

The first stage was to select the variables that best explain the variance of the response (metallurgical recovery). Although many variables and parameters are necessary to predict the niobium metallurgical recovery, PCA showed that five chemical variables (*X_{1}, X_{2}, X_{3}, X_{4}* and *X_{5}*) can explain much of it. The other four variables (*X_{6}, X_{7}, X_{8}* and *X_{9}*) were therefore disregarded, as they contributed little to explaining the plant metallurgical recovery. The second order multivariate regression methodology (Response Surface) was used to calculate the metallurgical recovery based on the five chemical variables. This statistical methodology is commonly used for modeling and analyzing problems in which the dependent variable is influenced by several factors, and the goal is to optimize this response (^{George, 2007}).

The Response Surface may be defined as the geometric representation obtained when a response variable is plotted as a function of two or more quantitative factors. This function can be defined as: *Y* = *f*(*x_{1}, x_{2},* ..., *x_{k}*) + e, where *Y* is the response (dependent variable); *x_{1}, x_{2},* ..., *x_{k}* are the factors (independent variables); and e is the random error (^{George, 2007}).

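In its polynomial equation form, the second order Response Surface is defined as:

$$Y = b_{0} + \sum_{i=1}^{k} b_{i} x_{i} + \sum_{i=1}^{k} \sum_{j=i}^{k} b_{ij} x_{i} x_{j} + e$$

where *x_{1}, x_{2}, ..., x_{k}* are the independent variables which influence the response *Y*; b_{0}, b_{i} (*i* = 1, 2, ..., *k*) and b_{ij} (*i* ≤ *j*; *j* = 1, 2, ..., *k*) are the unknown parameters; and e is the random error (^{Box and Draper, 1987}). The terms with *i* = *j* are the quadratic contributions of each variable, and the terms with *i* < *j* are the pairwise interactions.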
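A second-order polynomial with linear, quadratic and pairwise interaction terms can be fitted by ordinary least squares. The sketch below is only an illustration on synthetic data: the function names `second_order_design` and `fit_response_surface`, the coefficients and the noise level are hypothetical, and the code does not reproduce the MINITAB procedure used in the study.

```python
import numpy as np
from itertools import combinations_with_replacement


def second_order_design(x):
    """Design matrix with columns 1, x_i, and x_i * x_j for all i <= j."""
    n, k = x.shape
    cols = [np.ones(n)]
    cols += [x[:, i] for i in range(k)]
    cols += [x[:, i] * x[:, j]
             for i, j in combinations_with_replacement(range(k), 2)]
    return np.column_stack(cols)


def fit_response_surface(x, y):
    """Least-squares estimates of b_0, b_i and b_ij."""
    design = second_order_design(x)
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef


# Synthetic data with two factors and known (hypothetical) coefficients.
rng = np.random.default_rng(1)
x = rng.uniform(-1, 1, size=(300, 2))
y = (2 + x[:, 0] - 3 * x[:, 1]
     + 0.5 * x[:, 0] ** 2 + 4 * x[:, 0] * x[:, 1]
     + 0.01 * rng.normal(size=300))

# Coefficient order: b_0, b_1, b_2, b_11, b_12, b_22.
coef = fit_response_surface(x, y)
print(np.round(coef, 2))
```

With five factors, as in this study, the same design matrix has 1 + 5 + 15 = 21 columns, so the number of parameters grows quickly with the number of variables retained.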

Among the advantages of the Response Surface methodology, the main one is its robustness: its results remain consistent under non-ideal conditions, such as random errors and influential points. Another advantage is the simplicity with which the polynomials are obtained from the analytical Response Surface methodology. In general, polynomials of two or more variables are continuous functions.

MINITAB^{®} software, which offers a number of statistical tools for both univariate and multivariate analysis, was used for the multivariate regressions and the principal component analysis. The software SGeMS was also used to build the maps and the scatter plots (^{Remy et al., 2009}).

3. Results

Among the nine variables considered for the principal component analysis (*X_{1}; X_{2}; X_{3}; X_{4}; X_{5}; X_{6}; X_{7}; X_{8}* and *X_{9}*), the ones that contributed most to explaining the metallurgical recovery were *X_{1}; X_{2}; X_{3}; X_{4}* and *X_{5}*, which removed four of the nine initial independent variables. Besides identifying which variables contribute most to explaining a given phenomenon, PCA also makes it possible to identify which variables are redundant, meaning that one variable has a very high linear correlation with another.

The statement that the variables *X_{1}, X_{2}, X_{3}, X_{4}* and *X_{5}* contributed the most in this study was based on the Loading Plot analyses, which revealed the relationships among the variables considered in the space of the first two components. Figure 1 shows the plot of the loadings of the variables on the components. Each variable is a point, whose coordinates are given by the loadings on the two principal components. In this case, the *X_{6}, X_{7}* and *X_{8}* variables have similar low loadings for the first component, which is different from the *X_{9}* high loading.

The correlation between a component and a variable in PCA framework is called loading and it measures the information they share, so factor loadings represent how much a factor explains a variable in factor analysis. It is possible to examine the loading pattern in the Minitab software (^{Minitab, 2014}) factor loading analysis output to determine the factor that has the largest effect on each variable. Some variables might have high loadings on multiple factors. Factor loadings can range from -1 to 1. Loadings close to -1 or 1 indicate that the factor strongly affects the variable. Loadings close to zero indicate that the factor has a weak effect on a specific variable.
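The definition of a loading as the correlation between a variable and a component can be sketched in Python as follows. This is an illustration on synthetic data under the assumption that the PCA is run on the correlation matrix; the function name `pca_loadings` and the data are hypothetical and do not correspond to the MINITAB output discussed here.

```python
import numpy as np


def pca_loadings(data):
    """Loadings (variable-component correlations) and component scores."""
    z = (data - data.mean(axis=0)) / data.std(axis=0, ddof=1)
    corr = np.corrcoef(z, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(corr)
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    # Row j is a variable, column m is a component.
    loadings = eigvecs * np.sqrt(np.clip(eigvals, 0.0, None))
    scores = z @ eigvecs
    return loadings, scores


rng = np.random.default_rng(3)
common = rng.normal(size=(250, 1))
data = np.hstack([
    common + 0.3 * rng.normal(size=(250, 1)),
    -common + 0.3 * rng.normal(size=(250, 1)),
    rng.normal(size=(250, 1)),
    rng.normal(size=(250, 1)),
])

loadings, scores = pca_loadings(data)
# Coordinates of each variable in a loading plot of the first two components.
print(np.round(loadings[:, :2], 2))
# The loading equals the correlation between the variable and the score.
print(np.corrcoef(data[:, 0], scores[:, 0])[0, 1], loadings[0, 0])
```

Variables with loadings near zero on a component plot close to the origin along that axis, which is how weakly contributing variables are read off a loading plot.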

The first component is the most important one. The variables (*X_{1}* and *X_{4}*) provide more information, and these two variables are also strongly correlated. The variables *X_{6}, X_{7}*, and *X_{8}* are closely associated and they only provide a small amount of information, as they are close to zero in the first component. Variable *X_{9}* also has a low score in the first component and, unlike the variables *X_{6}, X_{7}*, and *X_{8}*, it has a high score in the second component.

According to ^{Abdi and Williams (2010)}, in general, different meanings of 'loadings' lead to equivalent interpretations of the components. This happens because the different types of loadings differ mostly by their type of normalization.

The squared loadings could be also used to interpret the relationship between the variables. The sum of the squared coefficients of correlation between a variable and all the components is equal to 1. This is due to the fact that the squared loadings give the proportion of the variance of the variables explained by the components.
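The unit sum of squared loadings can be checked with a short derivation. It is sketched here under the assumptions that the PCA is performed on the correlation matrix $R$ of $p$ standardized variables and that all $p$ components are retained; the symbols $\lambda_m$ (eigenvalue), $v_{jm}$ (eigenvector entry) and $\ell_{jm}$ (loading) are introduced only for this illustration.

```latex
R = V \Lambda V^{\top}, \qquad \ell_{jm} = v_{jm}\sqrt{\lambda_m}
\\[4pt]
\sum_{m=1}^{p} \ell_{jm}^{2}
  = \sum_{m=1}^{p} \lambda_m v_{jm}^{2}
  = \left( V \Lambda V^{\top} \right)_{jj}
  = R_{jj}
  = 1
```

When only the first few components are kept, the partial sum is below 1 and gives the share of the variance of variable $j$ explained by those components.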

The first multiple regression was performed without excluding outlier values. This regression was made in the MINITAB^{®} software, and it was necessary to define the dependent variable (DCCG) and the independent variables (*X_{1}; X_{2}; X_{3}; X_{4}* and *X_{5}*) to be used.

Figure 2 shows the result of the first regression performed without the outlier analysis. Note the scatter plots between the dependent variable and independent ones.

The result of the first regression without residual treatment presented a 65.04% coefficient of determination (R^{2}). Treating the residual appropriately can lead to a better result, optimizing the dependent variable response without influencing the equation representativeness in the geological domain considered.
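For reference, the coefficient of determination compares the residual sum of squares with the total sum of squares of the observed response. The Python sketch below uses four made-up values purely to show the calculation; the function name `r_squared` and the numbers are hypothetical and unrelated to the study data.

```python
import numpy as np


def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    observed = np.asarray(observed, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    ss_res = np.sum((observed - predicted) ** 2)
    ss_tot = np.sum((observed - observed.mean()) ** 2)
    return 1.0 - ss_res / ss_tot


# Made-up values: SS_res = 0.10 and SS_tot = 5.0, so R^2 = 0.98.
observed = [1.0, 2.0, 3.0, 4.0]
predicted = [1.1, 1.9, 3.2, 3.8]
r2 = r_squared(observed, predicted)
print(round(r2, 4))
```

An R^{2} of 65.04% therefore means that the residual sum of squares is about 35% of the total variability of the measured recovery.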

The residual analysis was first performed by excluding the samples that lacked a chemical analysis for any of the independent variables used in the regression (*X_{1}; X_{2}; X_{3}; X_{4}* and *X_{5}*), i.e. keeping only an isotopic subset of the data. Missing data in the chemical analysis of any of the required variables leads to an incorrect response model. Then, some specific values that were anomalous for the geological domain considered (Orange weathered ore) were excluded. Some cores were checked to understand the reasons for the geological anomalies; in most cases, the anomalies are related to veins composed of different minerals cutting the orange weathered ore. In a last stage, a residual analysis was carried out to improve the fit (R^{2}) of the proposed data model. No test was performed to characterize the outliers.
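Keeping an isotopic subset amounts to retaining only the samples in which every variable has a value. A minimal Python sketch, assuming missing analyses are stored as NaN in a NumPy array (the function name `isotopic_subset` and the small table are hypothetical):

```python
import numpy as np


def isotopic_subset(table):
    """Keep only the rows in which every column has a value (no NaN)."""
    complete = ~np.isnan(table).any(axis=1)
    return table[complete], complete


# Hypothetical table: rows are samples, columns are analyzed variables.
table = np.array([
    [1.0, 2.0, 3.0],
    [4.0, np.nan, 6.0],
    [7.0, 8.0, 9.0],
    [np.nan, np.nan, 1.0],
])

subset, complete = isotopic_subset(table)
print(subset)
print(complete)
```

The boolean mask `complete` can also be used to report how many samples were dropped at this step.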

During the residual analysis, 202 samples were excluded, corresponding to 13% of the whole data set. The remaining 1406 samples were used to build the regression equation.
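These figures are consistent with each other, assuming that the excluded and the remaining samples together make up the whole data set (the symbol $N_{\text{total}}$ is introduced only for this check):

```latex
N_{\text{total}} = 1406 + 202 = 1608,
\qquad
\frac{202}{1608} \approx 0.126 \approx 13\%
```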

Using the graphical technique (residuals versus fitted values), the residuals that had standard deviation exceeding 35% were excluded. After this procedure, the linear regression coefficient between the calculated and the analyzed metallurgical recovery was 82.59%. This correlation is very high, which allows the regression equation to be used to calculate the metallurgical recovery with an acceptable error.

The regression equation used to calculate the metallurgical recovery, after the residual treatment was defined as:

Figure 3 shows the scatter plot between the analyzed variable (DCCG), obtained from lab tests, and the resultant metallurgical recovery using the multivariate regression equation (SEC-REG).

Figure 4 presents the results of the multiple regression obtained with MINITAB^{®} software. It is possible to see the sequential regression model construction with the contribution of each variable, as well as their paired variable contribution. The variable that contributes most to the model's explanation is *X_{3}*, followed by the variable *X_{1}*, and so on.

4. Discussion

Response Surface methodology has been applied in the chemical industry, and its foundations were formalized by ^{Box and Draper (1987)}. In the agronomic field, it has been used to study yield and the effect of nutrient levels applied to the soil, along with other factors such as planting density and irrigation. Although this method has very rarely been applied in mining, its use offers potential benefits. The methodology is powerful and has many applications that can help explain phenomena in the mining industry.

^{Montgomery (2001)} wrote that the Response Surface equations can be graphically represented and used in three ways:

Describing how the test variables affect the response;

Determining the interrelationships between the variables under test; and

Describing the combined effect of all test variables on the response.

For this specific work, the target was the use of the combined effects of five variables (*X_{1}, X_{2}, X_{3}, X_{4}* and *X_{5}*) to explain the metallurgical recovery of niobium.

For the effective use of Response Surface, five assumptions should be considered (^{Montgomery, 2001}):

The factors that are critical to the process must be known;

The region where the factors influence the process must be known;

The factors must vary continuously along the chosen experimental group;

A mathematical function exists that relates the factors to the measured response;

The response defined by this function is a smooth surface.

Before building the regression model by the Response Surface methodology, it was necessary to perform a careful exploratory multivariate statistical analysis of the data. The results demonstrate that this methodology could be applied to determine the combined effects of the chemical variables on the mineral recovery response during the concentration process.

Per ^{Montgomery (2001)}, in the absence of sufficient knowledge about the true Response Surface, the user usually tries the first order model. However, when the first order model is not enough to fit the surface, it is necessary to incorporate higher order terms to improve the fit of the model.

First order regression models fitted to the same data set used in this study proved insufficient to explain the behavior of the metallurgical recovery from the same five chemical variables. With the first order regression model, the best fit obtained was 52% (R^{2}), a value already improved by outlier exclusion using residual analysis.
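The gap between first and second order models can be illustrated on synthetic data in which the response depends on an interaction and a quadratic term. The Python sketch below is hypothetical throughout (data, coefficients, function names) and only shows the mechanism; it does not reproduce the 52% figure reported here.

```python
import numpy as np


def r_squared(y, y_hat):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    ss_res = np.sum((y - y_hat) ** 2)
    ss_tot = np.sum((y - np.mean(y)) ** 2)
    return 1.0 - ss_res / ss_tot


def fit_predict(design, y):
    """Least-squares fit followed by prediction on the same data."""
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return design @ coef


# Synthetic response with an interaction and a quadratic contribution.
rng = np.random.default_rng(2)
x1 = rng.uniform(-1, 1, 400)
x2 = rng.uniform(-1, 1, 400)
y = 1 + x1 + x2 + 3 * x1 * x2 + 2 * x2 ** 2 + 0.1 * rng.normal(size=400)

ones = np.ones_like(x1)
first_order = np.column_stack([ones, x1, x2])
second_order = np.column_stack([ones, x1, x2, x1 ** 2, x1 * x2, x2 ** 2])

r2_first = r_squared(y, fit_predict(first_order, y))
r2_second = r_squared(y, fit_predict(second_order, y))
print(round(r2_first, 3), round(r2_second, 3))
```

Because the first order model is nested in the second order one, the second order fit can never have a lower R^{2} on the same data; the size of the improvement indicates how much of the response is carried by the interaction and quadratic terms.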

Among all the interactions between the pairs of variables, the interaction between *X_{4}* and *X_{3}* stands out in explaining the phenomenon, in addition to the interactions between *X_{3}* and *X_{1}*, and between *X_{5}* and *X_{4}*, which also exhibit high covariance coefficients.

The quadratic contribution of each variable also helps to define the regression model. The quadratic contribution of *X_{4}* is more significant in this case than the quadratic contribution of *X_{3}*. The quadratic contributions of the other variables are very low.

Figure 5 shows the effect of each independent variable on the response (SEC-REG).

Without the residual analysis, the regression coefficient between the analyzed and calculated metallurgical recovery was 65.04%, which means that the variables considered would explain only 65.04% of the metallurgical recovery. Even after excluding some outliers that were affecting the regression, 61 high-value residuals remained; they were kept to avoid creating a false impression of high correlation by excluding data without well-founded criteria.

Figure 6 shows the data points with large residual (red dots in the plot) and atypical values of the response variable SEC-REG (blue dots).

Figure 7 shows two maps with the drill hole distribution: in the first map, the colored dots represent the drill holes with geometallurgical data from laboratory tests, and in the second one, the drill holes with the calculated geometallurgical data. These two data sets can be used together to build a simulation model in order to improve the metallurgical recovery prediction.

5. Conclusion

Different mathematical regressions were tested in order to create a model that explains the behavior of the metallurgical recovery. Second order multivariate regression (Response Surface) proved suitable for modeling and analyzing the geometallurgical performance of the niobium ore, since this performance is influenced by several factors.

The application of the obtained regression equation was restricted to samples associated with the geological domain "Orange weathered ore".

Before using the Response Surface methodology, it is important to define which independent variables contribute to explaining the phenomenon of interest, in this case, the metallurgical recovery of niobium.

The interactions between the independent chemical variables considered, as well as their quadratic terms, contributed greatly to explaining the niobium metallurgical recovery. Using the multivariate regression equation obtained, it was possible to calculate the metallurgical recovery for all samples or locations where the chemical variables are known.

The Response Surface methodology used to predict the metallurgical niobium recovery provided processing information that is highly correlated with the measured metallurgical recovery data. Used correctly, this newly calculated data can increase geometallurgical knowledge and be incorporated into mine planning.