Artificial Neural Network for Predicting Silicon Content in the Hot Metal Produced in a Blast Furnace Fueled by Metallurgical Coke

2021 The main production route for cast iron and steel is through the blast furnace. The silicon content in cast iron is an important indicator of the thermal condition of a blast furnace. High silicon contents indicate an increase in the furnace’s thermal input and, in some cases, may indicate an excess of coke in the reactor. As coke costs predominate in the production of cast iron, tighter control of the silicon content has economic advantages. The main objective of this article was to design an artificial neural network to predict the silicon content in hot metal, varying the number of neurons in the hidden layer among 10, 20, 25, 30, 40, 50, 75, 100, 125, 150, 175 and 200 neurons. In general, all neural networks showed excellent results, with the network with 30 neurons showing the best results among the 12 modeled networks. The models were validated using the Mean Square Error (MSE) and Pearson’s correlation coefficient. The cross-validation technique was used to re-evaluate the performance of the neural networks. In short, neural networks can be used in practical operations due to the excellent correlations between the real values and those calculated by the neural network.


Introduction
The main route for the production of cast iron and steel is through the blast furnace. Steel is generally produced in two steps: First, the pig iron, also called hot metal, is obtained, which consists mainly of iron, a high carbon content (about 4.5%), and impurities such as sulfur, phosphorus, and silicon 1-3 . Hot metal is usually produced in a blast furnace, although a chemically similar but spongy-looking material can be obtained by the process of direct reduction, which is then used in a similar way to the hot metal produced in a blast furnace. However, this article is about the control of silicon in blast furnaces fed with metallurgical coke [4][5][6] .
The blast furnace is fed from the top with lump iron ore, sinter, pellets, fluxes such as dolomite and limestone, and a fuel called metallurgical coke. Hot air is blown into the furnace from the bottom, through the tuyeres. Fuels such as pulverized coal, biogas and natural gas are also injected into this area. A part of the blast furnace gas is burned in the hot stoves to heat the air jet entering the furnace to about 1050°C. The air jet is enriched with oxygen [7][8][9][10] .
The gas-solid, solid-solid and liquid-liquid physicochemical interactions occur in different zones of the blast furnace. The hot air combines with the descending glowing coke to produce carbon monoxide gas and release the energy needed to raise the internal temperature of the blast furnace to about 1650°C [10][11][12][13][14] .
The hot metal and molten slag are removed from the bottom of the furnace and sent to the steel mill for further processing 10 .
However, during hot metal production it is necessary to control the impurities of the hot metal, especially silicon. Hot metal with excess silicon is harmful to the secondary refining process in the steelworks. The final concentration of the hot metal in terms of impurities such as silicon is the result of the equilibrium between the iron phase and the slag phase as the iron droplets percolate through the slag phase 15 .
The silicon content in the cast iron is an important indicator of the thermal condition of a blast furnace and can therefore reflect the quality of the steel. High values of silicon content indicate increased heat input to the furnace and in some cases may indicate excess coke in the furnace. Since coke costs predominate in the production of cast iron, tighter control of silicon content therefore clearly has economic advantages 7,[15][16][17][18][19] .
In the field of complex process simulation, the application of solutions based on neural networks has become popular due to their versatility and possibility of development, as well as the greater reliability of the responses, since the neural network receives new data during the training process 20,21 .
Artificial neural networks are computational models that have a number of artificial neurons and mimic the functioning of a human biological neuron. The main feature of this technique is its ability to learn and solve problems that are not linearly separable using information from the environment in which it operates [22][23][24][25] .
The basic unit of the neural network is the artificial neuron, which can be divided into input signal, synaptic weight, bias, sum, activation function, and neuron output. These functions mimic the functioning of the human brain and make it possible to reproduce the synaptic communication between neurons 2,3,10, 26 .
In Figure 1, x_j represents the inputs of the network; w_kj represents the synaptic weight, where k is the neuron number and j is the input stimulus; b_k represents the bias; f(.) represents the activation function of the neuron; u_k represents the linear combination of the input signals; and y_k represents the output response of the neuron.
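The neuron described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function names, the example values, the logistic form of the sigmoid, and the choice to add the bias to the linear combination before applying the activation function are all assumptions.

```python
import math


def sigmoid(u):
    """Logistic sigmoid activation, f(u) = 1 / (1 + exp(-u))."""
    return 1.0 / (1.0 + math.exp(-u))


def neuron_output(x, w_k, b_k, f=sigmoid):
    """Return (u_k, y_k) for a single artificial neuron k.

    x   : input signals x_j
    w_k : synaptic weights w_kj, one per input
    b_k : bias of neuron k
    f   : activation function
    """
    if len(x) != len(w_k):
        raise ValueError("x and w_k must have the same length")
    # Weighted sum of the inputs; the bias is added here by assumption.
    u_k = sum(w * x_j for w, x_j in zip(w_k, x)) + b_k
    # The activation function maps the weighted sum to the neuron output.
    return u_k, f(u_k)


# Hypothetical inputs and weights, chosen only for illustration.
u, y = neuron_output(x=[0.5, -1.0, 2.0], w_k=[0.4, 0.3, 0.1], b_k=0.1)
```

With a sigmoid activation the output y_k always lies between 0 and 1, whatever the magnitude of the weighted sum.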
The most common artificial neural network architectures are single-layer feed-forward networks, multilayer feedforward networks, recurrent networks, and lattice networks 2,27 . In this paper, a single-layer feedforward architecture with a sigmoid activation function and the Levenberg-Marquardt training algorithm was chosen. Figure 2 shows a single-layer feedforward neural network.
In this type of architecture, the input layer is directly connected to one or more neurons that generate the output response. This type of architecture is used to solve classification and pattern problems. Perceptron and Adaline neural networks work with feedforward architecture.
Defining the topology of an artificial neural network (ANN) is not a trivial task, as it is usually determined on the basis of past experience and trial-and-error processes. According to the literature, several authors suggest that one or two layers are sufficient for modeling metallurgical problems. Only one hidden layer, using a sigmoid-type activation function, is sufficient for the network to converge to good results as long as there are enough input variables to train the algorithm 2,16,28 .
The Levenberg-Marquardt (LM) algorithm approximates the minimum of the error function using an approximation to Newton's method. This approximation is described by Equation 1.
$$x_{k+1} = x_k - \left[J^{T}J + \mu I\right]^{-1} J^{T} e(x_k) \quad (1)$$
Where $I$ is the identity matrix, $e(x_k)$ represents the error, $J$ corresponds to the Jacobian matrix of the error with respect to the weights, and $\mu$ is the damping parameter.
Considering all the context presented so far, the main objective of this paper was to model 12 different artificial neural networks to find the number of neurons in the hidden layer that gives the best results for predicting the silicon content in the hot metal. The numbers of neurons tested in the hidden layer were 10, 20, 25, 30, 40, 50, 75, 100, 125, 150, 175, and 200.
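A minimal sketch of how a Levenberg-Marquardt iteration can be written, assuming the commonly used update x_new = x - (JᵀJ + μI)⁻¹ Jᵀe, with a damping parameter μ that is reduced after a successful step and increased otherwise. The two-parameter exponential model, the data, the damping schedule and all function names are hypothetical; they only keep the linear system small enough to solve in closed form. In the paper the same kind of update is applied to the weights of the neural network.

```python
import math


def residuals_and_jacobian(params, t_values, y_values):
    """Errors e_i = a*exp(b*t_i) - y_i and their Jacobian rows."""
    a, b = params
    errors, jacobian = [], []
    for t, y in zip(t_values, y_values):
        g = math.exp(b * t)
        errors.append(a * g - y)
        jacobian.append((g, a * t * g))  # (de/da, de/db)
    return errors, jacobian


def lm_step(params, errors, jacobian, mu):
    """One update x - (J^T J + mu*I)^-1 J^T e for two parameters."""
    a00 = sum(r[0] * r[0] for r in jacobian) + mu
    a01 = sum(r[0] * r[1] for r in jacobian)
    a11 = sum(r[1] * r[1] for r in jacobian) + mu
    g0 = sum(r[0] * e for r, e in zip(jacobian, errors))
    g1 = sum(r[1] * e for r, e in zip(jacobian, errors))
    det = a00 * a11 - a01 * a01
    # Closed-form solution of the 2x2 system (Cramer's rule).
    d0 = (g0 * a11 - a01 * g1) / det
    d1 = (a00 * g1 - a01 * g0) / det
    return (params[0] - d0, params[1] - d1)


def fit(t_values, y_values, params=(1.0, 0.0), mu=1e-2, max_iter=100):
    """Iterate LM steps, adapting mu after each accepted or rejected step."""
    errors, jacobian = residuals_and_jacobian(params, t_values, y_values)
    sse = sum(e * e for e in errors)
    for _ in range(max_iter):
        if sse < 1e-20:
            break
        trial = lm_step(params, errors, jacobian, mu)
        trial_errors, trial_jac = residuals_and_jacobian(trial, t_values, y_values)
        trial_sse = sum(e * e for e in trial_errors)
        if trial_sse < sse:
            # Accept the step and move toward Gauss-Newton behaviour.
            params, errors, jacobian, sse = trial, trial_errors, trial_jac, trial_sse
            mu /= 10.0
        else:
            # Reject the step and move toward gradient-descent behaviour.
            mu *= 10.0
    return params


# Synthetic, noise-free data generated from a = 2, b = -1 (hypothetical).
T = [0.0, 0.25, 0.5, 0.75, 1.0]
Y = [2.0 * math.exp(-t) for t in T]
a_fit, b_fit = fit(T, Y)
```

A small μ makes the step close to a Gauss-Newton step, which converges quickly near the minimum; a large μ shortens the step toward the gradient direction, which is more robust far from it.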

Selection of process variables
The blast furnace studied in this paper has a total height of 110 m and a useful height of 43 m (considering the useful volume), while its internal volume is 3,617.01 m³. The crucible of the blast furnace, where the liquid metal and slag are stored, has a volume of 899.03 m³.
There are a large number of process variables in the blast furnace that affect the silicon content of the hot metal. Selecting an appropriate set of these variables is not a trivial task. The inclusion of many secondary variables in the model complicates its training and use. On the other hand, the inclusion of few variables makes the model poor 2,10,29 .
The input variables were classified into 7 groups. Table 1 shows the summary of the classification of the groups of input variables. Table 2 shows the output variables analyzed.
A total of 1100 operating days over a 3.5-year period were selected. 75 variables (74 input variables and 1 output variable) were selected, as described in Tables 1 and 2. The database therefore consists of 82,500 data points (1100 days × 75 variables).
During these 3.5 years the blast furnace was in normal operation and therefore the output of the neural network corresponds to the normal operation of the blast furnace.

Outlier removal
The database initially consisted of 1302 days of operation; after data processing and removal of the outliers, it comprised 1100 operational days.
In this paper, 2 techniques were used to identify the outliers. The first technique used the experience of engineers, consultants and operational technicians.
During the period of 3.5 years, the operation of the blast furnace under study was interrupted 10 times to perform maintenance on the reactor. The maintenance shutdowns lasted on average 3 days (72 hours) each. In the week before and the week after a maintenance shutdown, operational changes are made at the blast furnace. The 7 days before and the 7 days after each maintenance shutdown were therefore classified as operational outliers.
During the 3.5 years, the blast furnace experienced 7 serious operational instability events. Five events involved a sudden reduction in the blast volume and pressure of the reactor. The day of each sudden reduction in blast volume and the 2 following days were also classified as operational outliers.
There were also 2 events of reduced burden descent (loss of permeability) in the blast furnace. The day of each event and the 2 days before and after it were also classified as operational outliers. When the permeability of the blast furnace is reduced, the passage of air through the metallic charge inside the reactor is affected; when this air is trapped in a zone, the internal pressure increases, resulting in hanging of the burden and consequently a slower descent of the furnace charge. Table 3 shows the number of days that were considered outliers and excluded from the database.
The second technique to identify the outliers used the principle of exploratory data analysis. The method consists of defining a pair of inner fences and a pair of outer fences, as illustrated in Equations 2 to 5.
$$\text{Lower inner fence} = Q_1 - 1.5\,IQR \quad (2)$$
$$\text{Upper inner fence} = Q_3 + 1.5\,IQR \quad (3)$$
$$\text{Lower outer fence} = Q_1 - 3\,IQR \quad (4)$$
$$\text{Upper outer fence} = Q_3 + 3\,IQR \quad (5)$$
Where Q_1 = first quartile; Q_3 = third quartile; and IQR = interquartile range 10 . The data located between the inner fences are regular data, those between the inner and outer fences are moderate outliers, and those outside the outer fences are severe outliers.
Following the principle of exploratory data analysis, 23 days were found to be moderate outliers and 7 operational days were found to be severe outliers. The 23 days (moderate outliers) were retained in the database, but the 7 days considered severe outliers were eliminated. Table 4 illustrates the composition of the database. Figure 3 shows a normal distribution curve and the interval (> µ + 3σ and < µ − 3σ) that was considered a severe outlier and removed from the database.
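The fence method can be sketched as follows. The multipliers 1.5 and 3 are the conventional choices in exploratory data analysis and are assumed here, as are the quartile method, the function names and the sample values, which are hypothetical and not taken from the plant database.

```python
import statistics


def tukey_fences(values, inner_k=1.5, outer_k=3.0):
    """Return ((lower_inner, upper_inner), (lower_outer, upper_outer))."""
    # Quartiles by linear interpolation over the observed range (assumed).
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    inner = (q1 - inner_k * iqr, q3 + inner_k * iqr)
    outer = (q1 - outer_k * iqr, q3 + outer_k * iqr)
    return inner, outer


def classify(value, inner, outer):
    """Label one observation as regular, moderate or severe."""
    if value < outer[0] or value > outer[1]:
        return "severe"
    if value < inner[0] or value > inner[1]:
        return "moderate"
    return "regular"


# Hypothetical daily silicon contents (wt%), for illustration only.
silicon = [0.40, 0.42, 0.44, 0.45, 0.46, 0.48, 0.50, 0.52, 0.54]
inner_fences, outer_fences = tukey_fences(silicon)
labels = [classify(v, inner_fences, outer_fences) for v in (0.45, 0.62, 0.90)]
```

Following the rule described in the text, only observations labelled "severe" would be removed; "moderate" ones would stay in the database.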
In this paper, assuming a normal distribution of the database, all points outside the range from (µ − 3σ) to (µ + 3σ) were considered outliers and eliminated. For normally distributed data, as in the database of this research, the removal of outliers is justified 10 .
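The day counts reported in this section can be cross-checked with a short tally. The sketch below reproduces the reduction from 1302 to 1100 days under one assumption that the text does not state explicitly: the 3 shutdown days of each maintenance stop are excluded together with the 7 days before and the 7 days after. The variable names are illustrative.

```python
# Maintenance shutdowns: 10 stops, each with 3 shutdown days (assumed to be
# excluded as well) plus the 7 days before and the 7 days after.
maintenance_days = 10 * (3 + 7 + 7)

# Sudden blast-volume reductions: 5 events, event day plus 2 following days.
blast_event_days = 5 * (1 + 2)

# Permeability events: 2 events, event day plus 2 days before and 2 after.
permeability_days = 2 * (2 + 1 + 2)

# Severe statistical outliers removed by the fence criterion.
severe_outlier_days = 7

removed_days = (maintenance_days + blast_event_days
                + permeability_days + severe_outlier_days)
remaining_days = 1302 - removed_days
```

Under this reading the tally removes 202 days and leaves the 1100 operating days used for modeling.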

Database standardization
The database contains several physical variables such as temperature, hot metal production, slag production and blown air volume with very different magnitudes.
These data cannot be used directly to train an artificial neural network, because variables with a high magnitude dominate the respective synaptic weights compared to variables with a lower magnitude. In this paper, all variables were standardized to a mean of 0 and a standard deviation of 1, as shown in Equation 6.
$$Z = \frac{x - \mu}{\sigma} \quad (6)$$
To standardize a variable (Z), the mean (μ) and standard deviation (σ) are calculated. Then, for each observed value (x) of the variable, the mean is subtracted and the result is divided by the standard deviation. Database standardization is also important for interpreting the skewness and kurtosis behavior of each variable in the database.
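A minimal sketch of this standardization, assuming the population standard deviation is used (the text does not say whether the population or the sample form was applied). The function name and the temperature values are hypothetical.

```python
import statistics


def standardize(values):
    """Return z-scores: subtract the mean, divide by the standard deviation."""
    mu = statistics.fmean(values)
    sigma = statistics.pstdev(values)  # population form, by assumption
    if sigma == 0:
        raise ValueError("a constant variable cannot be standardized")
    return [(x - mu) / sigma for x in values]


# Hypothetical hot metal temperatures in degrees Celsius.
temperatures = [1480.0, 1500.0, 1520.0]
z_scores = standardize(temperatures)
```

After this transformation every variable has a mean of 0 and a standard deviation of 1, so temperatures in the thousands and compositions in fractions of a percent enter the network on the same scale.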
Skewness is the degree of deviation or departure from symmetry of a distribution. When the curve is symmetric, the mean, median, and mode coincide at the same point on the maximum ordinate, so there is perfect equilibrium in the distribution. If no equilibrium occurs, that is, the mean, median, and mode are at different points in the distribution, it is asymmetric, that is, skewed to the right or left 10 .
A distribution with negative skewness exists when the values are concentrated at the top of the scale and the tail spreads gradually to the left. The skewness becomes negative when the first quartile moves away from the median while the third quartile approaches it, with the limit Q_3 = Q_2, at which the skewness takes its maximum negative value (S = −1).
A distribution with positive skewness exists when the values are concentrated at the bottom of the scale and the tail spreads gradually to the right. The skewness is positive when the third quartile moves away from the median while the first quartile approaches it, with the limit Q_1 = Q_2, at which the skewness takes its maximum positive value (S = 1) 10 .
Kurtosis is the degree of flatness of a distribution, relative to the normal distribution. Kurtosis can be of three types: mesokurtic, when the distribution is normal; leptokurtic, when the distribution is sharper than normal; and platykurtic, when the distribution is flatter than normal 10 . Figure 4 illustrates the behavior of skewness, while Figure 5 illustrates the behavior of kurtosis.
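These shape measures can be computed as sketched below. The quartile limits quoted above are consistent with the Bowley (quartile) skewness coefficient, so that form is assumed here; the moment-based skewness and the excess kurtosis are the usual population estimators. The source does not state which estimators the authors used, and the function names are illustrative.

```python
import statistics


def _central_moment(values, order):
    """Population central moment of the given order."""
    mu = statistics.fmean(values)
    return sum((x - mu) ** order for x in values) / len(values)


def moment_skewness(values):
    """Third standardized moment: 0 for a symmetric distribution."""
    return _central_moment(values, 3) / _central_moment(values, 2) ** 1.5


def excess_kurtosis(values):
    """Fourth standardized moment minus 3: 0 for a normal distribution."""
    return _central_moment(values, 4) / _central_moment(values, 2) ** 2 - 3.0


def quartile_skewness(q1, q2, q3):
    """Bowley coefficient S = (Q3 + Q1 - 2*Q2) / (Q3 - Q1), between -1 and 1."""
    return (q3 + q1 - 2.0 * q2) / (q3 - q1)


symmetric = [1.0, 2.0, 3.0, 4.0, 5.0]
right_tailed = [1.0, 1.0, 1.0, 2.0, 10.0]
s_symmetric = moment_skewness(symmetric)
s_right = moment_skewness(right_tailed)
k_symmetric = excess_kurtosis(symmetric)
```

A positive excess kurtosis corresponds to a leptokurtic distribution and a negative one to a platykurtic distribution; the evenly spaced sample above is flatter than a normal curve, so its excess kurtosis is negative.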

Data segmentation
Several authors argue that normally 85% of the data is used to train and validate the neural network and the remaining 15% of the dataset is presented to the neural network only when its performance in the training and validation phase is considered satisfactory 2,16,30 .
The test step is important to evaluate the generalization and learning ability of the neural network. The neural network must be able to generalize what it has learned and reproduce the solutions from the trained examples for any problems similar to the training 2, 16 .
The database of the blast furnace operating records was segmented into 4 groups: (1) training, (2) validation, (3) test, and (4) cross-validation. Segmenting the data into a training set and a validation set helps to prevent the weights from converging to a local minimum. In this method, after the neural network has converged with the training data set, it is trained again with the validation data set 10 .
In the validation dataset, the final weights obtained in the training phase are used as the initial weights. If the error converges in the validation phase, the trained network can be used as a process model. To test the accuracy of the neural network model, test data is used. This data is used only once to test the accuracy of the model [31][32][33] .
After testing, it is possible to check the final result with an additional database to cross-validate the results obtained after the training, validation and testing steps.
The segmentation of the variables was performed randomly and an ANOVA test was performed to verify that the original set and the 4 segmented sets represent the same population given by their mean and standard deviation. Table 5 illustrates the segmentation of the database.
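A random segmentation into the four groups can be sketched as follows. The fractions below are hypothetical placeholders, since the actual proportions are given in Table 5 and not in the text; the function name and the fixed seed are also assumptions.

```python
import random


def segment(indices, fractions, seed=0):
    """Randomly split indices into named, non-overlapping groups.

    fractions maps a group name to its share of the data; the last group
    receives whatever remains, so no observation is dropped.
    """
    rng = random.Random(seed)  # fixed seed keeps the split reproducible
    shuffled = list(indices)
    rng.shuffle(shuffled)
    names = list(fractions)
    groups, start = {}, 0
    for position, name in enumerate(names):
        if position == len(names) - 1:
            end = len(shuffled)
        else:
            end = start + round(fractions[name] * len(shuffled))
        groups[name] = shuffled[start:end]
        start = end
    return groups


# Hypothetical proportions; the paper's actual split is listed in Table 5.
FRACTIONS = {"training": 0.60, "validation": 0.15,
             "test": 0.15, "cross_validation": 0.10}
groups = segment(range(1100), FRACTIONS)
```

Each operating day ends up in exactly one group, which is what allows the test and cross-validation sets to act as data the network has never seen.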

Network architecture
As explained above and shown in Figures 1 and 2, artificial neural networks are known to be computer models inspired by the human brain and used in machine learning and pattern recognition, where the smallest component of a neural network is the artificial neuron.
The optimization algorithm for training was the Levenberg-Marquardt (LM) because it allows for fast convergence.
The activation function used was of the sigmoid type. The model was evaluated using the mean square error (MSE) and Pearson's correlation coefficient (R).
According to the literature, the number of neurons in the hidden layer must be determined empirically, without explicit rules for an ideal calculation. It is recognized in the literature that in most cases the use of a single hidden layer is sufficient since this structure is able to approximate any non-linear equation (such as quadratic or exponential equations), as long as there are enough input variables to train the neural network.
Two hidden layers are already capable of representing any relationship between the data, including those that cannot be represented by equations.
More than two hidden layers are needed only for even more complex problems, such as time series and computer vision, where there is some relationship between the dimensions contained in the data (time in the first case and geometric shapes in the second).
The neural network in this paper has 74 input variables and 1 output variable, operating on 82,500 data points. Figure 6 illustrates the architecture of the artificial neural network.
The purpose of this paper is to evaluate the behavior of a neural network with a single hidden layer, a sigmoid activation function and the Levenberg-Marquardt algorithm, varying the number of neurons in the hidden layer among 10, 20, 25, 30, 40, 50, 75, 100, 125, 150, 175 and 200 neurons.
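The architecture described in this section can be sketched as a forward pass through one hidden layer. The linear output neuron, the random initial weights and the function names are assumptions made for illustration; the sketch does not include training.

```python
import math
import random

HIDDEN_SIZES = [10, 20, 25, 30, 40, 50, 75, 100, 125, 150, 175, 200]
N_INPUTS = 74  # input variables stated in the text


def sigmoid(u):
    return 1.0 / (1.0 + math.exp(-u))


def count_parameters(n_inputs, n_hidden, n_outputs=1):
    """Weights plus biases of a network with one hidden layer."""
    return n_hidden * (n_inputs + 1) + n_outputs * (n_hidden + 1)


def init_network(n_inputs, n_hidden, seed=0):
    """Small random weights and zero biases (an arbitrary choice)."""
    rng = random.Random(seed)
    return {
        "W1": [[rng.uniform(-0.5, 0.5) for _ in range(n_inputs)]
               for _ in range(n_hidden)],
        "b1": [0.0] * n_hidden,
        "W2": [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)],
        "b2": 0.0,
    }


def forward(net, x):
    """Predict one value from a vector of standardized inputs."""
    hidden = [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b)
              for row, b in zip(net["W1"], net["b1"])]
    # Linear output neuron, assumed here for a regression target.
    return sum(w * h for w, h in zip(net["W2"], hidden)) + net["b2"]


parameter_counts = {h: count_parameters(N_INPUTS, h) for h in HIDDEN_SIZES}
net = init_network(N_INPUTS, 30)
prediction = forward(net, [0.0] * N_INPUTS)
```

Under these assumptions the number of adjustable parameters grows linearly with the hidden layer, from 2,281 for 30 neurons to 15,201 for 200 neurons, which is one reason larger networks need more epochs to converge.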

Statistical analysis
Descriptive statistics applies techniques to describe and summarize a database. Some measures commonly used to summarize a database are measures of central tendency and measures of variability or dispersion.
Measures of central tendency include mean, median, and mode. Measures of variability include standard deviation, maximum and minimum values, skewness, and kurtosis.
As mentioned earlier, the segmentation of the database was done randomly and an ANOVA test was performed to verify that the original set and the segmented sets represent the same population given by their mean and standard deviation.
The mean, standard deviation, minimum, median, maximum, skewness and kurtosis were calculated. Tables 6 to 12 show the descriptive statistics for the groups of input variables. Table 13 shows the descriptive statistics for the output variable (silicon).
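The ANOVA comparison of the segmented sets can be illustrated with the one-way F statistic. This is a sketch with hypothetical data and an illustrative function name; it compares group means only, and turning F into a p-value requires the F distribution from a statistics package, which is outside this example.

```python
import statistics


def anova_f(groups):
    """One-way ANOVA F statistic for a list of samples."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n
    means = [statistics.fmean(g) for g in groups]
    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2
                     for g, m in zip(groups, means))
    # Variation of the observations around their own group mean.
    ss_within = sum((x - m) ** 2
                    for g, m in zip(groups, means) for x in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))


# Hypothetical samples, for illustration only.
f_different = anova_f([[1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [3.0, 4.0, 5.0]])
f_same = anova_f([[1.0, 2.0, 3.0], [1.0, 2.0, 3.0]])
```

An F value close to zero means the group means are nearly identical, which is the outcome expected when a random segmentation preserves the population.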

Model variables
In selecting the input variables, the most important variables that affect the operation of the blast furnace were chosen. The variables were selected considering 7 groups (Table 1): (1) blowing air; (2) blast furnace gas; (3) thermal control; (4) fuel; (5) ore, sinter and pellets; (6) hot metal and (7) slag.
Blowing air is supplied at the bottom of the blast furnace. Flow measurements are important for process control of the blast furnace and provide information on operating deviations.
Operational control of the blast furnace gas is important information for the application of thermochemical models and the frequency of its analysis is important to determine specific carbon consumption and specific air flow 2.
Thermal control is important to ensure the performance of the blast furnace and the quality of the final product. The amount of silicon in the hot metal is directly dependent on the temperature of the hot metal and the quality of the minerals fed from the top of the furnace 20 . The control of the blast furnace fuels, which can be injected through the tuyeres or charged into the blast furnace from the top, is the main group of variables. The blast furnace studied in this paper was operated for 4 years with 7 different types of coal, according to Table 10.
The same reactor has a pulverized coal injection system and can also be operated with natural gas injection. Coal and coke play a dual role in the production of hot metal. As fuels, they make it possible to reach the high temperatures (about 1,500°C) required to melt the ore, and as reducing agents, they combine with oxygen and reduce the ore at high temperature 16 .
Mineral control is important to optimize the final quality of the product and reduce the levels of unwanted impurities such as silicon, phosphorus and sulfur. Four years were spent running the blast furnace and parameterizing the chemical composition of the minerals and fuels. The hot metal control is important because it is necessary to ensure the final quality of the steel 10 .
Production of hot metal with high content of silicon, phosphorus and sulfur increases the final cost of the product and makes secondary refining unprofitable 10 .
Slag is obtained by melting and separating the gangue from the raw materials and fluxes. It consists mainly of thermodynamically stable oxides such as MgO, CaO, Al₂O₃ and SiO₂, which constitute up to 95 wt% of the slag. The control of the hot metal/slag system is important because silicon should preferably be slagged 2,10 .

Statistical analysis
The descriptive statistics of the input variables show that the database generally has little noise, a low standard deviation, and skewness and kurtosis close to zero. The silicon content (output variable) behaves similarly to the input variables.
The hypothesis tests were performed in Minitab software, using the database as a reference, for both the mean and the standard deviation.
The artificial neural network showed excellent results for all numbers of neurons. The hypothesis tests indicated that all sample groups calculated by the neural network are statistically equal to the database at a 99% confidence level, using Welch's method. Table 14 shows the descriptive statistics of the database (actual values) and the values calculated by the artificial neural network.
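Welch's method for comparing two means without assuming equal variances can be sketched as below. The paper ran its tests in Minitab; this stand-alone version, its function name and its data are illustrative only, and the comparison with a critical value at the 99% level is left to a statistics package.

```python
import math
import statistics


def welch_t(sample_a, sample_b):
    """Return Welch's t statistic and its approximate degrees of freedom."""
    n_a, n_b = len(sample_a), len(sample_b)
    mean_a, mean_b = statistics.fmean(sample_a), statistics.fmean(sample_b)
    # Sample variances (n - 1 in the denominator).
    var_a, var_b = statistics.variance(sample_a), statistics.variance(sample_b)
    se_a, se_b = var_a / n_a, var_b / n_b
    t_stat = (mean_a - mean_b) / math.sqrt(se_a + se_b)
    # Welch-Satterthwaite approximation of the degrees of freedom.
    dof = (se_a + se_b) ** 2 / (se_a ** 2 / (n_a - 1) + se_b ** 2 / (n_b - 1))
    return t_stat, dof


# Hypothetical measured and calculated values, for illustration only.
measured = [1.0, 2.0, 3.0, 4.0, 5.0]
calculated = [2.0, 3.0, 4.0, 5.0, 6.0]
t_value, degrees_of_freedom = welch_t(measured, calculated)
```

A t value whose magnitude stays below the critical value for the chosen confidence level means the two means cannot be distinguished statistically.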

Model validation
The usual method for evaluating a neural network mathematical model is the mean square error (MSE). Smaller MSE values indicate that the model has better predictive ability. The MSE is given by Equation 7.
$$MSE = \frac{1}{n}\sum_{i=1}^{n}\left(C_{real,i} - C_{neural,i}\right)^{2} \quad (7)$$
In many cases, Pearson's correlation coefficient (R) can also be used. However, this parameter only evaluates the linear relationship between variables. Pearson's correlation coefficient (R), or linear correlation, is given by Equation 8.
$$R = \frac{\sum_{i=1}^{n}\left(C_{real,i} - \bar{C}_{real}\right)\left(C_{neural,i} - \bar{C}_{neural}\right)}{\sqrt{\sum_{i=1}^{n}\left(C_{real,i} - \bar{C}_{real}\right)^{2}}\;\sqrt{\sum_{i=1}^{n}\left(C_{neural,i} - \bar{C}_{neural}\right)^{2}}} \quad (8)$$
Where n represents the number of observations, C_neural represents the value calculated by the artificial neural network, C_real represents the value measured during the blast furnace operation, and the overbar denotes the mean value.
The training, validation and testing phases were performed with up to 1,000 iterations and were automatically interrupted when the error converged to its smallest value. The model was validated using Pearson's correlation coefficient and the mean square error. Five correlation coefficients were calculated: (1) training; (2) validation; (3) test; (4) cross-validation; Figure 7 to Figure 10 show the results of model validation.
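Both validation metrics are straightforward to compute. The sketch below uses the standard definitions of the mean square error and Pearson's correlation coefficient; the function names and the silicon values are hypothetical.

```python
import math


def mse(c_real, c_neural):
    """Mean square error between measured and calculated values."""
    n = len(c_real)
    return sum((r - c) ** 2 for r, c in zip(c_real, c_neural)) / n


def pearson_r(c_real, c_neural):
    """Pearson's linear correlation coefficient."""
    n = len(c_real)
    mean_real = sum(c_real) / n
    mean_neural = sum(c_neural) / n
    covariance = sum((r - mean_real) * (c - mean_neural)
                     for r, c in zip(c_real, c_neural))
    spread_real = math.sqrt(sum((r - mean_real) ** 2 for r in c_real))
    spread_neural = math.sqrt(sum((c - mean_neural) ** 2 for c in c_neural))
    return covariance / (spread_real * spread_neural)


# Hypothetical measured and predicted silicon contents (wt%).
measured_si = [0.40, 0.50, 0.60]
predicted_si = [0.42, 0.50, 0.58]
error = mse(measured_si, predicted_si)
correlation = pearson_r(measured_si, predicted_si)
```

In this example the predictions are an exact linear function of the measurements, so R is essentially 1 even though the MSE is not zero. This is why the two metrics are reported together: R only captures the linear relationship, while the MSE captures the size of the deviations.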
When analyzing the values for the MSE between training, validation, and testing, as shown in Figures 9 and 10, no differences were found that could indicate overfitting, i.e., when the model has a low error during training and a high error during testing.
From the analysis of Figures 7, 8, 9 and 10, the best results were obtained with 25 and 30 neurons. The neural network with 30 neurons in the hidden layer showed about 1% better results than the ANN with 25 neurons. From Figures 7 and 8, it can be seen that the neural networks with 25 and 30 neurons have 12% higher mathematical correlation than the network with 200 neurons.
Considering that the best results were those of the neural network with 30 neurons, it can be mentioned that silicon had a Pearson correlation coefficient of 0.975, while the mean square error (MSE) was 0.0006. Regarding cross-validation, it is noted that the best results were also obtained with 25 and 30 neurons. During the cross-validation, the network with 30 neurons showed a mean square error (MSE) of 0.00035 and a Pearson's correlation coefficient of 0.955.
From the analysis of Figure 9, the MSE decreases as the number of neurons increases. In the present study, the neural networks were configured in up to 1000 training epochs. The worst convergence result was obtained by the network with 200 neurons, which required 317 epochs to achieve convergence. The neural network with 30 neurons converged quickly, requiring only 28 epochs to reach convergence.
In this context, the authors Saxén and Pettersson 19 mention that the silicon content has a more irregular behavior, which makes the convergence of the results more difficult. However, this fact was not found in this paper, probably due to the big data used and the elimination of severe outliers, which allowed better learning of the artificial neural network 9,10 .
Based on the results of this research, a comparison was made with the other models mentioned in the literature, as shown in Table 15.
The analysis of Table 15 shows that the results of this paper were superior to the models reported in the literature, suggesting that the use of big data and the prior treatment of databases are beneficial in modeling situations to refine the results.
From a metallurgical point of view, the silicon content in hot metal is an important quality parameter that must be monitored, as this element serves as an indicator of the thermal condition of the reactor. Lower amounts of silicon in the hot metal indicate that the reactor is probably cooling down. On the other hand, an increase in silicon content indicates excessive heat generation and thus wastage of metallurgical coke and pulverized coal 10 .
The silicon in the production process comes from raw materials, especially coke ash and gangue from the metallic charge. In order to improve the quality of the final product, it is necessary to use raw materials with low variations in chemical composition and with low silicon content, and to keep the silicon content as constant as possible with respect to its optimum level, in order to minimize the costs of secondary refining in the steelworks' converters. It is also worth noting that excess silicon in the hot metal requires a greater amount of calcium oxide (CaO) in the steel mill to perform secondary refining, resulting in a greater amount of slag and increasing production costs 2,10,16 .
In this sense, silicon content prediction models are useful for the production process. They support the reactor operation and make it possible to work with smaller safety margins, to optimize the fuel consumption and to improve the reactor efficiency.
Comparing the machine learning results with hot metal metallurgy and evaluating the synaptic weights of the models, it is found that the most important variables are sinter and blowing flow. Sinter contains SiO₂, which serves as a Si source for the hot metal, which could explain its influence on the model. Regarding the blowing flow, higher values favor a stronger blast penetration and affect the thermal level of the blast furnace, which affects the silicon content incorporated into the hot metal.
Other variables that also have an effect on silicon content were the O₂ enrichment, the pressure, and the amount of air blown through the tuyeres. These are variables that can affect the shape, thickness and position of the cohesive zone and consequently the silicon content of the hot metal.
For example, low gas permeability may indicate a thicker cohesive zone. Increasing the O₂ enrichment tends to increase the reactor temperature and decrease the amount of nitrogen injected, increasing the thermal level and favoring the permeability of the blast furnace. Thus, a higher or thicker cohesive zone causes an increase in the silicon content in the hot metal 10 .
Finally, the silicon introduced into the process is distributed between the hot metal and the slag. Thus, based on the binary basicity (CaO/SiO₂) and using a mass balance calculation, it is possible to determine the silicon content in the hot metal. This information underlines the importance of controlling the conditions that affect the basicity of the slag. It is therefore entirely justifiable to use artificial neural networks to predict the silicon content during hot metal production 10,16-18 .
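The mass balance idea can be illustrated with a deliberately simplified sketch. It assumes that all CaO charged reports to the slag, that the SiO₂ in the slag follows from the binary basicity, that the remaining silicon reports to the hot metal, and that losses such as dust or SiO gas are ignored. The function names and all quantities are hypothetical and are not taken from the paper.

```python
# Mass fraction of Si in SiO2, from the molar masses 28.086 and 60.084 g/mol.
SI_PER_SIO2 = 28.086 / 60.084


def binary_basicity(cao_pct, sio2_pct):
    """Binary basicity B2 = %CaO / %SiO2 of the slag."""
    return cao_pct / sio2_pct


def hot_metal_silicon_pct(sio2_input_kg, cao_input_kg, basicity,
                          hot_metal_kg=1000.0):
    """Estimate wt% Si in hot metal from a simplified silicon balance.

    sio2_input_kg : SiO2 charged per tonne of hot metal
    cao_input_kg  : CaO charged per tonne of hot metal (all goes to slag)
    basicity      : binary basicity CaO/SiO2 of the slag
    """
    sio2_in_slag_kg = cao_input_kg / basicity
    si_to_hot_metal_kg = (sio2_input_kg - sio2_in_slag_kg) * SI_PER_SIO2
    return 100.0 * si_to_hot_metal_kg / hot_metal_kg


# Hypothetical charge: 110 kg SiO2 and 120 kg CaO per tonne of hot metal,
# with a slag containing 42% CaO and 35% SiO2.
b2 = binary_basicity(cao_pct=42.0, sio2_pct=35.0)
si_pct = hot_metal_silicon_pct(sio2_input_kg=110.0, cao_input_kg=120.0,
                               basicity=b2)
```

In this simplified balance a higher basicity at constant CaO input means less SiO₂ held in the slag and therefore more silicon reporting to the hot metal, which is why the conditions affecting slag basicity matter for silicon control.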

Conclusions
The increasing development of computing capacity, leading to cheaper devices with greater capacity, has driven the development of more complex algorithms with better results, as is the case with neural networks;
The most important part of modeling a neural network is the prior treatment of the database to be used for model development;
The neural network is an interesting tool to support decision making and operational planning in terms of fuel economy, operational stability, and delivery of a quality product for the steelworks, and helps to improve process monitoring;
Neural networks are tools capable of predicting silicon content based on parameters of the reduction process in blast furnaces, and this can be verified by the precision of the model;
The Pearson correlation coefficient and MSE values confirmed that the hidden layer with 30 neurons gave the best results;
The analysis of the synaptic weights confirmed that the blowing air, the sinter, the oxygen enrichment and the pressure have a greater influence on the silicon content in the hot metal;
The slag rate, SiO₂ and CaO have a lesser influence on the variation of silicon content and do not directly contribute to the mechanism of silicon transfer to the hot metal;
In conclusion, neural networks can be used in practice due to the excellent correlations between real values and the values calculated by the neural network.

Table 15. Comparison between models reported in the literature [16][17][18] . Columns: Saxén and Pettersson 19 ; Dobrzanski et al. 18 ; David et al. 16 ; Diniz et al. 17 ; This paper.