Feature extraction and XGBoost prediction model for compaction process of roller compacted concrete based on discrete element simulation

ABSTRACT

To overcome the limitations of traditional roller-compacted concrete (RCC) compaction monitoring, which relies on macroscopic experiments, overlooks microscopic mechanisms, and lacks model interpretability, this study proposes a framework integrating the discrete element method (DEM) with an improved XGBoost algorithm. The framework incorporates multi-scale features to enable accurate prediction and interpretation of compaction quality. Key innovations include: (1) extracting microscopic parameters such as particle contact numbers, coordination numbers, and pore distributions via DEM simulations, and establishing a quantitative relationship between compaction passes and micro-parameter evolution; (2) introducing a SHAP-based XGBoost interpretability approach that identifies particle contact number as a key driver of compaction degree; and (3) developing a multi-scale feature fusion method to jointly optimize macroscopic and microscopic parameters. Results show that DEM simulations yield compaction degree and porosity errors below 1% and 8%, respectively. The improved XGBoost model achieves a mean absolute error (MAE) of 0.43, outperforming linear regression (1.22) and decision tree (0.91), with a coefficient of determination (R2) of 0.97. In practical validation, the prediction error remains within 1%. This research offers a high-precision, interpretable prediction system for RCC compaction, addressing the limitations of empirical methods and enabling real-time optimization of construction parameters.

Keywords:
Roller compacted concrete; Discrete element method; Multi-scale features; Extreme gradient boosting tree; Compactness

1. INTRODUCTION

Roller compacted concrete (RCC), as an efficient concrete construction method, has found extensive application across diverse domains like water conservancy engineering, road engineering, airport runway construction, and dam filling in recent years [1]. The core of this construction technology lies in using vibration compaction equipment to compact and shape freshly poured concrete, which boasts the benefits of rapid construction pace, affordability, and excellent durability. However, the compaction quality of RCC directly affects its mechanical properties and durability. During the compaction process, the particle arrangement, pore structure, and internal stress distribution of concrete undergo complex changes, which directly affect key performance indicators such as strength, impermeability, and frost resistance after concrete hardening. Insufficient compaction can introduce excessive porosity and internal defects, which degrade the concrete’s mechanical properties, compromise its durability, and may lead to premature structural failure and safety risks [2]. Therefore, effective monitoring and accurate prediction of the compaction process of RCC are crucial to ensuring optimal compaction results at every construction stage. This is essential for ensuring project quality, extending the service life of structures, and reducing maintenance costs in the later stages.

In recent years, many scholars have proposed their own solutions to this issue. MARJONO and ROCHMAN [3] used RCC as a building material to investigate the correlation of RCC pavement models under different compaction cycles. The outcomes indicated that the number of compaction cycles exerted a considerable impact on the performance of compacted concrete, especially on its compressive strength. For RCC with thicknesses of 5 cm, 6 cm, and 7 cm, the optimal compaction times were 16, 20, and 24, respectively [3]. HASSOON and ABBAS [4] used different compaction mechanisms and curing methods to explore the differences in the performance of RCC between laboratory environments and field conditions. It was found that the compressive strength of RCC was sensitive to the compaction method and curing process used [4]. LAM et al. [5] proposed a compressive strength prediction model for RCC pavement containing steel slag aggregate and fly ash to predict the compressive strength of RCC pavement. The results showed that among the prediction results of multiple regression analysis, artificial neural network (ANN), and fuzzy logic models, the reliability of the multiple regression analysis model was lower than that of the ANN model. The fuzzy logic model using three input variables (fly ash content, steel slag aggregate content, and age) could obtain prediction results comparable to the accuracy of the ANN model [5]. DEBBARMA and RANSINCHUNG [6] proposed an ANN model to predict the 28 day compressive strength of RCC pavement mixtures containing recycled asphalt pavement aggregates. The outcomes indicated that the model had good performance in predicting between input parameters and output values, and the coefficient of determination during the testing phase was 0.985 [6]. HABIB et al. [7] used commercial finite element (FE) software packages to estimate the stress and response of RCC dams under thermal and seismic effects. 
The simulation results showed: (1) The opening of the dam section would significantly increase the stress under earthquake action; (2) The crack index was proven to be an effective indicator that can better reflect the failure probability of RCC dams; (3) The heat of RCC dam gradually increased during the initial pouring stage, and then significantly increased in the following days [7]. In another study, HABIB et al. [8] compared and benchmarked the results of integrated machine learning models with traditional techniques to predict the safety factor of road slopes. It was found that machine learning models could quickly and accurately estimate the safety factor of road slopes, which was a promising alternative to traditional methods [8]. Furthermore, HABIB et al. [9] used Kernel Principal Component Analysis (KPCA) for data reconstruction, and then developed a predictive model using multivariate regression analysis to propose an improved method for estimating the mechanical properties of rubber concrete. The results showed that the performance of the proposed estimation method was significantly better than traditional regression techniques, and in the case of compressive strength, the method could reduce Root Mean Square Error (RMSE) by 80% [9].

Although previous research has achieved useful results, significant shortcomings remain: (1) traditional compaction research focuses on macroscopic indicators and pays little attention to the microscopic level; (2) traditional research relies on macroscopic experimental data and cannot reveal the mechanism by which the dynamic evolution of particles affects compaction quality; (3) existing prediction models often ignore the dynamic nonlinear characteristics of the compaction process, resulting in insufficient on-site adaptability; and (4) most importantly, black-box models make it difficult to analyze feature contributions, which seriously restricts decision-making when optimizing construction parameters. To address these issues, this study introduces the discrete element method (DEM) alongside traditional macroscopic experiments, explores microscopic features in depth, and integrates macroscopic and microscopic features into multi-scale features to characterize the compaction process more comprehensively. In addition, the study combines DEM with an improved XGBoost algorithm to achieve dynamic fusion of micro-scale and macro-scale features, and uses the SHAP method to reveal the key role of particle contact number as the dominant factor in compaction degree, providing a theoretical basis for optimizing construction parameters.

This study makes three theoretical contributions to the evaluation of RCC compaction quality by integrating discrete element microscopic simulation with machine learning prediction models. First, a quantitative correlation between particle contact number and macroscopic compaction degree is established, which compensates for the inability of traditional macroscopic experiments to reveal the underlying microscopic physics. Second, the multi-scale feature fusion technique overcomes the one-sidedness of single-scale analysis and provides a transferable methodological framework for cross-scale research on similar granular materials. Finally, the SHAP explanatory model, which is grounded in game theory, quantifies for the first time the nonlinear coupling effects of microscopic parameters during the compaction process. These findings support the theoretical assumptions of earlier scholars about the aggregate interlocking mechanism and lay a theoretical foundation for the feature engineering of the next generation of intelligent compaction systems. Compared with previous purely data-driven prediction models, the dual-driven “physical mechanism plus data model” paradigm developed in this study maintains computational efficiency while enhancing the physical interpretability of the results. This approach is particularly suitable for civil engineering applications that require high transparency of mechanisms.

The remainder of this article is arranged as follows. Section 2.1 introduces the DEM-based simulation of the RCC compaction process. Section 2.2 covers multi-scale feature extraction, together with the construction of the improved XGBoost prediction model and the SHAP interpretation method. Section 3 validates the effectiveness of the model through comparative experiments and an engineering example. Finally, Section 4 summarizes the research results and proposes directions for future improvement.

2. METHODS AND MATERIALS

2.1. RCC compaction process based on DEM simulation

RCC is a dry, stiff cement concrete that achieves high density and strength through vibratory compaction. Compared with ordinary cement concrete, RCC uses less water, has lower viscosity, and offers faster construction and shorter curing time. The compaction quality of RCC therefore directly affects the quality of construction. Traditional evaluation of RCC compaction quality generally adopts macroscopic compaction tests, which compact RCC under simulated construction conditions and then observe its overall compaction effect and performance. The main purpose of these tests is to confirm whether the compaction equipment meets the engineering requirements, determine reasonable compaction parameters, and evaluate the compaction effect, thereby ensuring that the foundation treatment meets the design requirements. However, macroscopic testing focuses on overall density and performance and cannot reveal microscopic-scale influences, whereas the mechanical properties of RCC are essentially determined by microscopic mechanisms such as pore structure evolution, aggregate interface interlocking, and interlayer stress transfer. Macroscopic compaction tests therefore cannot fully and accurately reflect the actual performance of RCC. To address this, the study proposes a discrete element microscopic simulation of layered construction. Micro feature extraction is added to traditional macro feature extraction to form multi-scale feature fusion, allowing the compaction process to be characterized more comprehensively. The discrete element simulation of layered construction is implemented mainly with the particle flow code (PFC) program, and the PFC simulation process is shown in Figure 1.

Figure 1
PFC simulation process.

Figure 1 shows the simulation process of PFC. From the figure, RCC is composed of many interacting particles. Therefore, for each particle, the motion equation can be calculated based on physical laws, including the particle’s velocity and acceleration. After calculating the motion equation of each particle, the time step calculation is carried out to determine the position and velocity of each particle in the next time step of the simulation. Then the process updates the position information of each particle. Next, it calculates the contact force and displacement relationship between particles, and traverses all contact points between particles [10, 11]. Finally, the process updates the contact force information based on the force displacement equation and the contact between particles. The entire process runs in a loop, with each loop representing a time step in the simulation. By continuously cycling this process, the mechanical behavior and evolution of granular materials under various conditions can be simulated [12]. Based on this, a discrete element model of RCC can be constructed. Considering that the grading of dam materials is the foundation of modeling, it is necessary to first calculate the number of mortar particles, as shown in Equation (1).

(1) $n_i = \dfrac{V_{sum}\,(1 - V_i)}{\frac{4}{3}\pi \bar{R}_i^{3}}$

In Equation (1), ni is the number of mortar particles, Vsum is the total volume of the model, Vi is the volume proportion of the i-th coarse aggregate, and R¯i is the average particle size of the mortar particles. Because layered construction is used, the specific procedure for discrete element modeling of RCC layered construction is shown in Figure 2.
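Before turning to the layered model, Equation (1) can be illustrated with a short Python sketch. It assumes that the term in parentheses is 1 minus the coarse aggregate proportion, that R¯i is used as a sphere radius, and that the result is rounded down to a whole particle. The function name and all numerical inputs are hypothetical and are not taken from the study.

```python
import math


def mortar_particle_count(v_sum, v_aggregate, r_mean):
    """Estimate the number of mortar spheres, following Equation (1).

    v_sum       -- total model volume (m^3)
    v_aggregate -- volume proportion of coarse aggregate, in [0, 1)
    r_mean      -- mean mortar particle radius (m); treating the
                   "average particle size" as a radius is an assumption
    """
    if not 0.0 <= v_aggregate < 1.0:
        raise ValueError("v_aggregate must lie in [0, 1)")
    if v_sum <= 0.0 or r_mean <= 0.0:
        raise ValueError("v_sum and r_mean must be positive")
    sphere_volume = 4.0 / 3.0 * math.pi * r_mean ** 3
    # Volume left for mortar divided by the volume of one mortar sphere.
    return math.floor(v_sum * (1.0 - v_aggregate) / sphere_volume)


# Hypothetical inputs: 1 m^3 model, 60% coarse aggregate, 10 mm radius.
n_mortar = mortar_particle_count(v_sum=1.0, v_aggregate=0.6, r_mean=0.01)
print(n_mortar)
```

The count grows with the inverse cube of the radius, which is one reason the choice of mortar particle size dominates the computational cost of a DEM model.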

Figure 2
Discrete element modeling of layered construction.

Figure 2 shows the discrete element modeling procedure for layered construction. The process is divided into four steps. First, particles representing the aggregates in the RCC material are generated randomly. Second, these particles are replaced with irregular aggregates to simulate the distribution and shape of aggregates in actual construction. Third, the lower layer is settled and compacted, which simulates compaction of the lower-layer aggregates during construction and increases their density and bearing capacity [13, 14]. Finally, after the lower layer has settled and been compacted, the upper layer is compacted to complete the double-layer rolling. Because the process is carried out under loose paving conditions, the study takes the cumulative settlement curve of the dam material surface as the calibration target. The corresponding calibration requirements are given in Equation (2).

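(2) $\begin{cases} \bar{\varepsilon} = \dfrac{1}{y}\sum_{i=1}^{n} \left| \dfrac{S_m - S_s}{S_s} \times 100 \right| \\ \varepsilon_z = \left| \dfrac{S_{mz} - S_{sz}}{S_{sz}} \times 100 \right| \end{cases}$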

In Equation (2), ε¯ represents the average error percentage, y represents the number of rolling passes, Sm represents the simulated settlement amount, Ss represents the actual measured settlement amount, εz represents the error percentage under specific conditions, Smz represents the simulated cumulative settlement amount under the final pass, and Ssz represents the actual cumulative settlement amount under the final pass [15]. To ensure the computational efficiency of the model, the equivalent load method was introduced, as shown in Equation (3).

(3) $\begin{cases} E_M = E_S \\ \dfrac{F_M}{W_M} = \dfrac{F_S}{W_S} \end{cases}$

In Equation (3), ES represents the compaction work done by the actual roller to the dam material, WM represents the static pressure of the model roller, EM represents the compaction work done by the roller in the model, WS represents the static pressure of the actual roller, FS represents the excitation force of the actual roller, and FM represents the excitation force of the model roller. Based on this, the discrete element modeling of RCC can be completed.
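The calibration check in Equation (2) can be sketched in Python as follows. The sketch assumes that the sum runs over the rolling passes, so that the averaging count equals the number of passes, and that settlements are cumulative values recorded after each pass. The function name and the settlement values are hypothetical and serve only as an illustration.

```python
def settlement_errors(simulated, measured):
    """Return (mean error %, final-pass error %) following Equation (2).

    simulated -- simulated cumulative settlement after each pass
    measured  -- measured cumulative settlement after each pass
    """
    if not simulated or len(simulated) != len(measured):
        raise ValueError("inputs must be non-empty and of equal length")
    # Absolute percentage error of the simulation for every pass.
    per_pass = [
        abs((s_m - s_s) / s_s * 100.0)
        for s_m, s_s in zip(simulated, measured)
    ]
    mean_error = sum(per_pass) / len(per_pass)
    final_error = per_pass[-1]  # error under the final pass
    return mean_error, final_error


# Hypothetical settlements in millimetres for four rolling passes.
measured_mm = [10.0, 16.0, 20.0, 22.0]
simulated_mm = [10.5, 15.6, 20.4, 21.8]
mean_err, final_err = settlement_errors(simulated_mm, measured_mm)
print(f"mean error = {mean_err:.2f}%, final-pass error = {final_err:.2f}%")
```

Tracking both quantities is useful because a model can match the final settlement well while still misrepresenting the shape of the settlement curve over earlier passes.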

2.2. Microscopic mechanism analysis of RCC construction layering based on multi-scale feature extraction

Based on the DEM simulation results, feature extraction can be further implemented by obtaining key feature data during the compaction process, and then analyzing the microscopic mechanism of RCC construction layering [16, 17]. The main operation of feature extraction is shown in Figure 3.

Figure 3
Feature extraction process based on DEM modeling.

Figure 3 shows the feature extraction process based on DEM modeling. From the figure, the process first defines the range of inter layer aggregate vertical coordinates, then traverses the particles, identifies specific parameters, and calculates new parameter values. If the conditions are met, the particles can be embedded into the aggregate. Next, it iterates through the embedded particles, calculates the parameters, and finds the minimum value. At the same time, it also checks the contact points, identifies the particle groups, and simulates interlayer contact if they are in the same group. If the particles at both ends of the contact are of mortar type, it considers the lower layer of mortar as layer particles, identifies parameters, and calculates new values. Finally, the process calculates the interlayer aggregate embedding value to complete key feature extraction. To achieve a comprehensive understanding of material performance, this study expands upon DEM-based micro-scale feature extraction by developing a multi-scale framework. Specifically, at the micro scale, DEM simulations are used to analyze the overall behavior of particle groups, while at the macro scale, traditional compaction tests are used to evaluate the overall performance of materials. Multi-scale feature extraction combines micro scale and macro scale, thus requiring the use of multi-scale statistical methods to integrate feature information from different scales. The first step in multi-scale statistical operations is to perform scale decomposition, as shown in Equation (4).

(4) $f(t) = \sum_{j}\sum_{k} c_{j,k}\,\psi_{j,k}(t)$

Equation (4) describes scale decomposition, in which the wavelet transform decomposes the signal into components at different scales. Here, f (t) represents the original signal, ψj,k(t) represents the wavelet basis function, and cj,k represents the wavelet coefficients. In the scale decomposition stage, the Daubechies 4 (db4) wavelet basis function was selected for a 5-level decomposition. This wavelet family has compact support and appropriate smoothness, making it suitable for analyzing the non-stationary signals generated by DEM simulations. The original DEM time series data (including particle displacement, contact force, etc.) are first standardized (zero mean and unit variance), and the discrete wavelet transform is then implemented with the Mallat algorithm. The number of decomposition levels is set to 5 based on the sampling frequency and the range of feature scales, corresponding to physical scales of L1 (0.5–1 ms), L2 (1–2 ms), L3 (2–4 ms), L4 (4–8 ms), and L5 (8–16 ms). The wavelet coefficients at each scale are denoised by soft thresholding (λ = 0.1) and used for subsequent analysis. The second step is feature extraction, which extracts statistical features at each scale, as shown in Equation (5).

(5) $\mu_j = \dfrac{1}{N_j}\sum_{k=1}^{N_j} c_{j,k}$

In Equation (5), j represents scale, Nj represents the number of samples at the corresponding scale, and μj represents the mean at the corresponding scale. The third step is cross scale modeling, which involves using regression models to establish the relationship between features at different scales, as shown in Equation (6).

(6) $Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_n X_n + \epsilon$

In Equation (6), Y represents the macro-scale feature, X1 to Xn represent the micro-scale features, β0 to βn represent the regression coefficients (β0 being the intercept), and ϵ is the error term. Finally, by introducing principal component analysis for dimensionality reduction, the main features of multi-scale data can be extracted, as shown in Equation (7).

(7) $Z_j = \sum_{i=1}^{p} a_{ji} X_i$

In Equation (7), aji is the loading coefficient and Zj is the new variable (principal component) obtained after dimensionality reduction.
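The denoising and per-scale statistics described for Equations (4) and (5) can be sketched in plain Python. The sketch assumes that the wavelet coefficients have already been computed for each level, and that the mean in Equation (5) is taken over the soft-thresholded coefficients. The function names and the coefficient values are hypothetical; the wavelet decomposition itself is not reproduced here.

```python
def soft_threshold(coefficient, lam=0.1):
    """Shrink one wavelet coefficient toward zero by lam."""
    if coefficient > lam:
        return coefficient - lam
    if coefficient < -lam:
        return coefficient + lam
    return 0.0  # small coefficients are treated as noise


def scale_means(coefficients_by_scale, lam=0.1):
    """Per-scale mean of the denoised coefficients, as in Equation (5)."""
    means = {}
    for scale, coefficients in coefficients_by_scale.items():
        denoised = [soft_threshold(c, lam) for c in coefficients]
        means[scale] = sum(denoised) / len(denoised)
    return means


# Hypothetical coefficients for two decomposition levels.
example = {1: [0.5, -0.05, -0.3], 2: [0.08, 0.2]}
features = scale_means(example)
print(features)
```

Each entry of the resulting dictionary is one scale-level feature; in the workflow described above, such features would then enter the regression of Equation (6) and the dimensionality reduction of Equation (7).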

2.2.1. XGBoost-based compaction prediction model for RCC

Although DEM is well suited to simulating the microscopic behavior of particle systems (such as collisions and friction), using it directly to predict macroscopic parameters such as material strength and flowability may require a large number of repeated simulations, resulting in extremely high computational costs. To tackle this problem, the study combines DEM with the XGBoost algorithm. The core of XGBoost lies in gradient boosting and decision trees (DT). Gradient boosting builds an ensemble from multiple weak learners, while a DT is a tree-structured model used for decision-making: each internal node tests a feature, each branch represents the outcome of that test, and each leaf node gives a final result [18, 19]. The workflow of XGBoost is shown in Figure 4.
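The gradient boosting idea, in which each new weak learner is fitted to the residuals of the current ensemble, can be sketched in a few lines of Python. This is a deliberately simplified stand-in and not XGBoost itself: it uses one-split regression stumps on a single feature, squared-error loss, and no regularization. The pairing of the four DEM compaction degrees reported in Section 3.1 with pass numbers 1 to 4 is an assumption made for illustration, and all function names are hypothetical.

```python
def fit_stump(xs, residuals):
    """Fit the best single-split regression stump (squared error)."""
    thresholds = sorted(set(xs))[:-1]
    if not thresholds:
        raise ValueError("need at least two distinct x values")
    best = None
    for threshold in thresholds:
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        sse = (sum((r - left_mean) ** 2 for r in left)
               + sum((r - right_mean) ** 2 for r in right))
        if best is None or sse < best[0]:
            best = (sse, threshold, left_mean, right_mean)
    _, threshold, left_mean, right_mean = best
    return lambda x: left_mean if x <= threshold else right_mean


def boost(xs, ys, n_trees=50, learning_rate=0.12):
    """Additive model: start from the mean, then fit stumps to residuals."""
    base = sum(ys) / len(ys)
    predictions = [base] * len(ys)
    stumps = []
    for _ in range(n_trees):
        residuals = [y - p for y, p in zip(ys, predictions)]
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        predictions = [p + learning_rate * stump(x)
                       for x, p in zip(xs, predictions)]
    return lambda x: base + learning_rate * sum(s(x) for s in stumps)


def mean_abs_error(y_true, y_pred):
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


passes = [1, 2, 3, 4]                     # assumed pass numbers
compaction = [85.2, 90.1, 93.5, 95.8]     # DEM values from Section 3.1
model = boost(passes, compaction)
baseline = sum(compaction) / len(compaction)
baseline_mae = mean_abs_error(compaction, [baseline] * 4)
boosted_mae = mean_abs_error(compaction, [model(x) for x in passes])
print(baseline_mae, boosted_mae)
```

The learning rate scales the contribution of every stump, so a smaller value needs more trees to reach the same training error but usually generalizes more smoothly.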

Figure 4
Workflow of XGBoost.

Figure 4 shows the workflow of XGBoost. From the figure, XGBoost starts with a simple model and then trains a new DT to predict residuals. Next, the new tree gets incorporated into the current model for better prediction of the target value [20]. This procedure is continuously reiterated until the predetermined number of trees is reached or the error is reduced to an acceptable range. Based on this, combining DEM with XGBoost can help expand RCC compaction from “simulation” to “prediction”. Specifically, by training DEM generated data such as particle size and response under loading conditions, XGBoost can quickly establish a non-linear mapping relationship between input parameters and output results, thereby achieving efficient prediction of RCC compaction. The objective function of XGBoost can be seen in Equation (8).

(8) $L(\theta) = \sum_{i=1}^{n} l(y_i, \hat{y}_i) + \Omega(f_t)$

In Equation (8), L(θ) represents the objective function, θ represents the model parameter, n represents the sample size, and l (yi, ŷi) represents the loss function. In order to prevent overfitting in XGBoost, the original XGBoost algorithm is optimized by adding a regularization term Ω(ft). Based on this, a compaction quality evaluation model for RCC can be constructed, as shown in Equation (9).

(9) $D = f(F, f, A, H, A_0, A_T, A_{three}, x_{\max}, a, \Delta\varphi)$

In Equation (9), f represents the excitation frequency, Δφ represents the hysteresis phase angle, A represents the amplitude, a represents the acceleration, A0 represents the frequency domain amplitude corresponding to the fundamental frequency, mainly reflecting the response characteristics of the material at a specific frequency. F represents the excitation force, xmax represents the maximum displacement, D represents the degree of compaction achieved by the material during the compaction process, and H represents the rolling thickness [21]. To further improve the interpretability of the model, SHAP is further introduced on this basis, as shown in Equation (10).

(10) $f(x) = \phi_0 + \sum_{i=1}^{M} \phi_i$

In Equation (10), f (x) represents the model’s prediction for a given input sample, ϕ0 is the baseline value, ϕi is the SHAP value of the i-th feature, and M is the number of features. To further improve the performance of XGBoost, the study also optimized its hyperparameters by adjusting the learning rate and the maximum tree depth through grid search, as shown in Equation (11).

(11) $\begin{cases} \theta^{*} = \arg\min_{\theta} CV_{error}(\theta) \\ \text{max depth} \in \mathbb{Z}^{+},\ \text{Learning rate} \in (0, 1) \end{cases}$

In Equation (11), θ* represents the optimal parameter value, θ is the model parameter, CVerror(θ) represents the cross-validation error, maxdepth represents the maximum depth of the tree, and Learning rate represents the learning rate. During hyperparameter optimization, 5-fold cross-validation (with the random seed set to 42) was used to evaluate each parameter combination and ensure reproducibility. The tested parameter ranges were: learning rate η ∈ [0.01, 0.3], maximum tree depth d_max ∈ [3, 10], subsampling ratio γ ∈ [0.6, 1.0], and regularization coefficient λ ∈ [0.1, 1.5]. The MAE on the validation set was monitored with an early stopping strategy (patience = 50 rounds), and the parameter combination that performed best on the validation set (η = 0.12, d_max = 6, γ = 0.85, λ = 0.8) was ultimately selected. The training and validation error curves showed consistent convergence trends, with a final error difference of 0.08, indicating that the model did not overfit. All experiments were repeated three times on a fixed hardware configuration (Intel Xeon E5-2680 v4, 128 GB RAM), and the results differed by less than 1%, indicating that the tuning process was stable. On this basis, the comprehensive evaluation of the compaction quality of RCC dam materials can be predicted. The specific procedure is shown in Figure 5.
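The selection rule in Equation (11), choosing the hyperparameter value with the lowest cross-validation error, can be sketched with the Python standard library alone. To keep the example self-contained, a one-dimensional ridge regression stands in for XGBoost, only the regularization coefficient is searched, and the data are synthetic. The function names, the grid values, and the data are all assumptions; only the 5-fold scheme, the seed of 42, and the use of MAE follow the description above.

```python
import random


def k_fold_indices(n_samples, k=5, seed=42):
    """Shuffle sample indices once and split them into k folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def fit_ridge_slope(xs, ys, lam):
    """Stand-in model: 1-D ridge regression through the origin."""
    return sum(x * y for x, y in zip(xs, ys)) / (sum(x * x for x in xs) + lam)


def cv_error(xs, ys, lam, k=5, seed=42):
    """Mean validation MAE over k folds for one hyperparameter value."""
    fold_errors = []
    for held_out in k_fold_indices(len(xs), k, seed):
        held = set(held_out)
        train = [i for i in range(len(xs)) if i not in held]
        slope = fit_ridge_slope([xs[i] for i in train],
                                [ys[i] for i in train], lam)
        fold_mae = sum(abs(ys[i] - slope * xs[i]) for i in held_out)
        fold_errors.append(fold_mae / len(held_out))
    return sum(fold_errors) / len(fold_errors)


def grid_search(xs, ys, grid):
    """Return the grid value with the lowest cross-validation error."""
    return min(grid, key=lambda lam: cv_error(xs, ys, lam))


# Synthetic data: y is roughly 2x plus bounded noise (fixed seed).
noise = random.Random(0)
xs = [float(i) for i in range(1, 21)]
ys = [2.0 * x + noise.uniform(-0.5, 0.5) for x in xs]
grid = [0.1, 0.5, 0.8, 1.5]
best_lam = grid_search(xs, ys, grid)
print(best_lam, cv_error(xs, ys, best_lam))
```

Fixing the shuffle seed means every candidate is scored on identical folds, so differences in cross-validation error reflect the hyperparameter rather than the split.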

Figure 5
Prediction of comprehensive evaluation model for RCC dam material compaction quality.

Figure 5 indicates the prediction operation process of the comprehensive evaluation model for the compaction quality of RCC dam materials. From the figure, the process involves collecting and normalizing raw data, which is divided into training and testing sets. A model is built using XGBoost algorithm, then it is used to predict the compaction degree of the test set, evaluate the accuracy through error analysis, and output the decision result [22]. Although the improved XGBoost can comprehensively evaluate the compaction quality of RCC dam materials, it is an offline model that heavily relies on historical data and cannot reflect the construction situation in real time. To provide real-time decision support, a real-time evaluation framework as shown in Figure 6 is introduced to capture dynamic changes in construction.

Figure 6
Real time evaluation framework for RCC compaction quality.

Figure 6 shows the real-time evaluation framework for RCC compaction quality. This framework consists of four modules: data collection, processing, monitoring and command, and on-site alarm. The collection module transmits compaction parameters and GNSS positioning information to the data processing center through 4G network. The processed data is sent to the monitoring interface through the Internet to display the compaction parameters and predict the compaction degree. The on-site alarm module notifies the staff when the compaction quality does not meet the standard to prevent engineering problems.

3. RESULTS AND DISCUSSION

3.1. DEM simulation results and their experimental comparison verification

To verify the effectiveness of the design process of the research, the DEM simulation results were first tested. Firstly, the simulation parameters were set, where the particle size distribution of the aggregate adopted continuous grading, with a maximum particle size of 40 mm and a minimum particle size of 5 mm. The aggregate density was 2650 kg/m3, and the cementitious material density was 2200 kg/m3. The moisture content was 6%, the compaction speed was 0.5 m/s, and the vibration frequency was 30 Hz. Based on DEM simulation, compaction degree and porosity were extracted as key characteristic parameters, and the results were compared with experimental results, as presented in Figure 7.

Figure 7
Simulation results and experimental comparison of (a) compaction degree situation and (b) porosity situation based on DEM.

Figure 7 compares the DEM simulation results with the experimental results. Figure 7a shows the compaction degree. Over the 4 compaction cycles, the compaction degrees from DEM simulation were 85.2%, 90.1%, 93.5%, and 95.8%, respectively. Relative to the experimental compaction degrees, the errors were 0.83%, 0.56%, 0.54%, and 0.31%, all less than 1%. Figure 7b shows the porosity. Over the same 4 compaction cycles, the porosities from DEM simulation were 14.8%, 9.9%, 6.5%, and 4.2%, respectively. Relative to the experimental porosities, the errors were 4.52%, 4.81%, 7.14%, and 6.67%, all less than 8%. Overall, compaction degree and porosity were negatively correlated: as compaction degree increased, porosity decreased. This indicates that the simulation accurately reflects the changes in compaction degree and porosity during compaction. Although some error remained between the simulation and experimental results, it stayed within an acceptable range, which supports the effectiveness of DEM.

3.2. Analysis of the evolution law of compaction process based on multi-scale feature extraction

Considering that the compaction degree and porosity are macro-scale indicators, and in order to demonstrate the multi-scale features extracted from RCC after introducing the DEM, the study also extracted micro-scale results, including the particle contact number and particle coordination number, as shown in Figure 8.

Figure 8
Multi-scale feature representation of (a) particle contact number situation and (b) particle coordination number situation based on DEM.

Figure 8 shows the multi-scale feature representation based on DEM. From Figures 8a and 8b, as the number of compaction passes increased, both the particle contact number and particle coordination number increased. This indicated that the interaction between particles was enhanced, which was beneficial for improving compaction performance.

3.3. Prediction verification based on XGBoost model

To test the effectiveness of the XGBoost algorithm in RCC compaction quality evaluation, a comparative experiment was conducted. Linear regression (LR) and a decision tree (DT) were included as comparison models, and mean absolute error (MAE) and the coefficient of determination (R2) were selected as evaluation indicators. Figure 9 presents the outcomes.

Figure 9
Performance of each model on (a) MAE and (b) R2.

Figure 9 shows the performance of each model on MAE and R2. Figure 9a shows the MAE of each model: LR had an MAE of 1.22, DT had an MAE of 0.91, and the XGBoost model used in the study had an MAE of 0.43. Figure 9b shows the R2 of each model: 0.85 for LR, 0.90 for DT, and 0.97 for XGBoost. Overall, the XGBoost algorithm performed well in predicting compaction degree, with the lowest MAE and the highest coefficient of determination among the compared models, indicating higher prediction accuracy and stability. Furthermore, because the SHAP method was introduced to improve the interpretability of the XGBoost model, the contribution of each extracted feature to the model’s predictions was calculated using SHAP, as shown in Table 1.

Table 1
Feature importance analysis.

The analysis reveals that the particle contact number is the most influential feature (mean SHAP value: 0.35), which quantitatively confirms the physical principle that denser particle packing is the primary driver of compaction quality. Beyond simple feature ranking, SHAP values also uncover crucial interaction effects. For instance, the analysis shows a strong positive synergy between particle contact number and compaction times (ρ = 0.68), indicating that the benefit of additional compaction passes is amplified when it effectively increases inter-particle contacts. Conversely, the analysis reveals a subtle inhibitory effect of compaction times on pore uniformity (β = -0.45). This suggests a potential trade-off where excessive compaction, while increasing overall density, might lead to localized over-compaction and a less uniform pore structure—a nuance that is difficult to capture with traditional analysis methods. These insights demonstrate the power of the SHAP-XGBoost model to not only predict outcomes with high accuracy but also to provide a deeper, physically interpretable understanding of the complex mechanisms in the RCC compaction process.
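The additive decomposition in Equation (10) can be checked on a toy model by computing exact Shapley values by brute force. The sketch below is not the SHAP library and not the study's model: it replaces features outside a coalition with a single baseline value and enumerates all coalitions, which is only feasible for a handful of features. The toy model, its coefficients, and the input values are hypothetical; an interaction term is included so that the attribution has to split a joint effect between the two features.

```python
from itertools import combinations
from math import factorial


def shapley_values(model, x, baseline):
    """Exact Shapley values of model(x) relative to model(baseline)."""
    m = len(x)

    def value(coalition):
        # Features outside the coalition keep their baseline value.
        mixed = [x[i] if i in coalition else baseline[i] for i in range(m)]
        return model(mixed)

    phi = []
    for i in range(m):
        others = [j for j in range(m) if j != i]
        total = 0.0
        for size in range(m):
            weight = factorial(size) * factorial(m - size - 1) / factorial(m)
            for subset in combinations(others, size):
                members = set(subset)
                total += weight * (value(members | {i}) - value(members))
        phi.append(total)
    return value(set()), phi


def toy_model(features):
    """Hypothetical compaction model with an interaction term."""
    contacts, passes = features
    return 80.0 + 2.0 * contacts + 1.0 * passes + 0.5 * contacts * passes


sample = [4.0, 6.0]       # hypothetical contact number and pass count
reference = [1.0, 2.0]    # hypothetical baseline sample
base_value, phi = shapley_values(toy_model, sample, reference)
print(base_value, phi, base_value + sum(phi), toy_model(sample))
```

The baseline value plus the sum of the attributions reproduces the model output for the sample, which is the property Equation (10) expresses.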

Because the results above only verify the performance of the individual steps of the proposed design, an RCC dam project at a hydropower station was selected as a case study to test the validity of the design scheme in practical application. This construction project requires real-time monitoring and prediction of RCC compaction quality to ensure the safety and stability of the dam body. The study first obtained multi-scale feature data during the RCC compaction process through DEM simulation, as presented in Table 2.

Table 2
Multi-scale characteristic data results during RCC compaction process.

Table 2 shows the multi-scale feature data obtained from the DEM simulation of the RCC compaction process; the data were collected from three regions. Using these data as input and the actual compaction degree as the target output, the proposed scheme produces the predicted compaction degree of RCC shown in Figure 10.

Figure 10
RCC compaction degree prediction results of (a) first test and (b) second test.

Figure 10 shows the predicted compaction degree of RCC for the two tests: Figure 10a presents the first test and Figure 10b the second. In regions A, B, and C alike, the compaction degrees predicted by the XGBoost model in both tests were very close to the actual values, with an error of less than 1%. This indicates that the model designed in this study has high prediction accuracy and stability.
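The acceptance check described above can be sketched as follows. The text does not state whether the 1% threshold is a relative error or a difference in percentage points; this sketch assumes a relative error, and the region values are hypothetical placeholders rather than the values plotted in Figure 10.

```python
def relative_error_percent(predicted: float, actual: float) -> float:
    """Relative error |predicted - actual| / |actual|, expressed in percent."""
    if actual == 0:
        raise ValueError("actual value must be non-zero")
    return abs(predicted - actual) / abs(actual) * 100.0


# Hypothetical (predicted, actual) compaction degrees in percent for each region.
regions = {"A": (96.2, 96.0), "B": (95.1, 95.6), "C": (97.3, 97.0)}
TOLERANCE_PERCENT = 1.0  # threshold quoted in the text

errors = {
    region: relative_error_percent(pred, act)
    for region, (pred, act) in regions.items()
}
for region, err in errors.items():
    status = "within" if err < TOLERANCE_PERCENT else "exceeds"
    print(f"region {region}: error = {err:.2f}% ({status} tolerance)")
```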

3.4. Discussion

To accurately monitor and predict RCC compaction, micro-feature extraction was introduced, and a multi-scale analysis framework was constructed through DEM to capture both the macroscopic mechanical response and the microscopic particle behavior. A prediction model was built using the regularization-optimized XGBoost algorithm together with the SHAP method, forming a technical chain of “feature extraction, model prediction, mechanism explanation”. The experiments showed that the DEM simulation had a compaction degree error of less than 1% and a porosity error of less than 8% over 4 compaction cycles, verifying the reliability of the feature extraction. Microscopic analysis showed that the particle contact number and coordination number were positively correlated with the number of compaction passes. This is consistent with the SHAP analysis, in which the particle contact number had the highest contribution (0.35), and indicates that microscopic parameters have a significant impact on compaction quality. By comparison, CALIS et al. [23] relied on macroscopic experimental data to construct a Bagging model with an R² of 0.962. Although that model has engineering practicality, it ignores the microscopic mechanism and cannot quantify the contribution of microscopic parameters [23]. By extracting microscopic features, the present study not only improved the prediction accuracy but also revealed the key influence of microscopic parameters on compaction quality, making up for this deficiency.

In the model comparison, the optimized XGBoost algorithm showed clear advantages, with an MAE of 0.43 and an R² of 0.97. Compared with the LR and DT models, it achieved a substantial improvement in prediction accuracy and generalization ability. This performance difference may stem from XGBoost's ability to capture nonlinear relationships and from the suppression of overfitting by its regularization terms. ZHANG et al. [24] used a MARS-GOA hybrid model (CoD = 0.811) for multi-parameter coupling, but the hyperparameter tuning introduced in their optimization method increased the complexity of the model, and the model lacked microscopic physical correlations [24]. In contrast, the XGBoost model in this study not only simplified the parameter tuning process but also provided a physical interpretation of microscopic parameters through the SHAP method, further improving the interpretability and engineering applicability of the model.

The case analysis based on the RCC dam project of a hydropower station further confirmed that the error between the value predicted by the model and the actual compaction degree remained within 1%, providing a scientific basis for the dynamic optimization of construction parameters. This result not only validates the model's reliability but also indicates that the proposed method has broad potential for practical engineering applications. In summary, the study combined multi-scale feature extraction with optimization algorithms, which not only improved the prediction accuracy for the RCC compaction process but also provided new ideas for explaining the microscopic mechanisms.

However, although DEM simulation can effectively capture the micromechanical behavior of RCC, numerical models still have inherent uncertainties that may affect the reliability of the results. These uncertainties come mainly from three sources: (1) Calibration errors in microscopic parameters, such as particle contact stiffness and friction coefficient, may persist despite calibration against experimental data, because the sensitivity of these parameters can be nonlinear across compaction stages. (2) Model simplifications, such as idealizing particles as spheres, may underestimate local stress concentration effects because actual aggregate morphology differs. (3) The idealized load boundary conditions do not fully reflect the dynamic vibration characteristics of on-site rolling equipment. To quantify these uncertainties, future work could draw on the clustering analysis methods used in earthquake engineering to screen representative working conditions, or identify dominant factors through parameter sensitivity studies [25]. In addition, studies have shown that the multi-stage nonlinear behavior of friction pendulum bearings depends significantly on the characteristics of the input excitation [26]. This suggests that future research should further incorporate random vibration theory to evaluate the confidence intervals of the prediction results under different combinations of rolling parameters.
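A parameter sensitivity study of the kind mentioned above can be sketched with a simple one-at-a-time perturbation. The surrogate function, its coefficients, and the parameter names below are hypothetical stand-ins for a calibrated DEM model, which would be far more expensive to evaluate.

```python
from typing import Callable, Dict

Params = Dict[str, float]


def oat_sensitivity(
    model: Callable[[Params], float], nominal: Params, rel_step: float = 0.05
) -> Dict[str, float]:
    """One-at-a-time normalized sensitivity.

    Each parameter is increased by `rel_step` (e.g. 5%) while the others stay
    at their nominal values. The result is the relative change in output
    divided by the relative change in input.
    """
    base = model(nominal)
    if base == 0:
        raise ValueError("nominal output must be non-zero for normalization")
    sensitivities = {}
    for name, value in nominal.items():
        perturbed = dict(nominal)
        perturbed[name] = value * (1.0 + rel_step)
        sensitivities[name] = ((model(perturbed) - base) / base) / rel_step
    return sensitivities


def surrogate_compaction(p: Params) -> float:
    # Hypothetical surrogate with made-up coefficients, standing in for a DEM run.
    return 100.0 - 10.0 * p["friction_coefficient"] + 2.0 * p["contact_stiffness_scaled"]


nominal = {"friction_coefficient": 0.5, "contact_stiffness_scaled": 1.0}
sens = oat_sensitivity(surrogate_compaction, nominal)
dominant = max(sens, key=lambda k: abs(sens[k]))

for name, s in sens.items():
    print(f"{name}: {s:+.4f}")
print("dominant parameter:", dominant)
```

One-at-a-time perturbation ignores interactions between parameters, so it is best read as a first screening step before a more complete variance-based analysis.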

In addition, the reliability of DEM simulation results depends strongly on the initialization strategy and on the calibration of microscopic parameters. The layered random generation algorithm used in the study can reflect the randomness of aggregate distribution, but local non-uniformity in the initial particle arrangement may lead to differences in the distribution of contact force chains during compaction, thereby affecting the prediction accuracy of porosity evolution. In a related study on fire-exposed FRP-strengthened concrete beams, analysis of the sensitivity of machine learning models to input parameters showed that small deviations in material parameters may lead to significant differences in the predicted failure potential [27]. To quantify this effect, the parameter optimization approach used for Gene Expression Programming (GEP) can serve as a reference, and the sensitivity of compaction prediction to different initialization schemes can be evaluated through a systematic grid search [28]. Preliminary tests showed that when the proportion of coarse aggregate deviated from the standard grading by more than 5%, the prediction error of the final compaction degree increased by 0.8% to 1.2%. Furthermore, the linearization assumption for the excitation force in the equivalent load method may underestimate the nonlinear energy transfer effect of vibration compaction. Future research should combine parameter sweeps with cross-validation to clarify the range of applicability of the key assumptions.

4. CONCLUSION

A method combining multi-scale feature fusion and optimization algorithms was proposed to effectively monitor and accurately predict the compaction process of RCC. The multi-scale analysis framework, which integrates microscopic features (e.g., particle contact numbers and coordination numbers) extracted via DEM with macroscopic compaction degree and porosity, achieved a compaction degree deviation of 0.3 percentage points (DEM: 95.8%, experimental: 95.5%) and a porosity error of less than 8% over 4 compaction cycles. After regularization optimization (η = 0.12, d_max = 6, γ = 0.85, λ = 0.8), the XGBoost model showed excellent predictive performance on the test set, with a mean absolute error (MAE) of 0.43 and a coefficient of determination (R²) of 0.97, clearly better than linear regression (MAE = 1.22, R² = 0.85) and decision trees (MAE = 0.91, R² = 0.90). SHAP analysis further quantified the influence of the input features: the contribution of particle contact number was the highest (SHAP value 0.35 ± 0.12), followed by compaction times (0.30 ± 0.08). This finding was consistent with the validation results of the hydropower station case (the error between the predicted and actual compaction degree was less than 1%). The method revealed the quantitative relationship (R² = 0.91, p < 0.01) of “increasing particle contact number → increasing coordination number → decreasing porosity → increasing compaction degree”, providing a scientific basis, from micro mechanism to macro performance, for the construction of RCC dam bodies. It should be noted that the model was calibrated under standard gradation (maximum particle size 40 mm, continuous gradation curve) and fixed moisture content (6%) conditions. Gradation deviations (such as oversized coarse aggregates or missing fine particles), fluctuations in moisture content (± 2%), and different binder types (such as high-calcium fly ash replacing ordinary cement) encountered in actual engineering may affect the accuracy of the model. Preliminary tests showed that when the gradation deviated from the standard curve by more than 15% or the moisture content changed by more than ± 1.5%, the model prediction error increased to 1.8–2.4%. Subsequent research will focus on: (1) developing a computer vision-based system for real-time aggregate gradation recognition to improve dynamic process control; (2) establishing a moisture-compaction coupling correction coefficient to account for material behavior under varying moisture and compaction conditions; and (3) investigating the transferability of SHAP feature importance across diverse cementitious systems to enhance model robustness under non-ideal working conditions.
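As a reading aid, the reported hyperparameters can be written as a parameter dictionary in the form accepted by the XGBoost library. The mapping of the symbols to the parameter names `eta`, `max_depth`, `gamma`, and `lambda` follows the usual XGBoost conventions and is an assumption here, as is the squared-error objective; the number of boosting rounds is not stated in this section and is therefore left out.

```python
# Hyperparameters reported in the conclusion, mapped to XGBoost parameter names.
# The symbol-to-name mapping and the objective are assumptions, not confirmed by the text.
xgb_params = {
    "eta": 0.12,        # learning rate (η)
    "max_depth": 6,     # maximum tree depth (d_max)
    "gamma": 0.85,      # minimum loss reduction required to split a node (γ)
    "lambda": 0.8,      # L2 regularization weight on leaf scores (λ)
    "objective": "reg:squarederror",  # assumed regression objective
}

# With the xgboost package installed, a dictionary like this would typically be
# passed as the first argument of xgboost.train together with a DMatrix of the
# multi-scale features; that step is omitted so the sketch has no dependencies.
for name, value in xgb_params.items():
    print(f"{name} = {value}")
```

Larger values of `gamma` and `lambda` make the trees more conservative, which is the regularization effect credited above with suppressing overfitting.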

Although this study achieved significant results, some limitations remain. First, although the assumption of spherical particles in the DEM simulation improved computational efficiency, it did not fully reflect the influence of the actual angular shape of aggregates on the distribution of contact force chains. Second, the model training data came mainly from simulation results under laboratory conditions. Although feasibility was verified through the hydropower station case, the applicability of the model in more complex environments, such as high-altitude areas or areas with extreme temperature differences, still needs further validation. Finally, the real-time prediction framework relies on the sampling frequency of sensor data and the stability of network transmission, which may pose technical challenges at some remote construction sites. These limitations point to directions for future research, including developing non-spherical particle modeling algorithms, establishing a multi-climate-zone verification database, and optimizing the edge computing architecture.

5. ACKNOWLEDGMENTS

This work was supported by the Open Research Fund Program of the Key Laboratory of Hydraulic Engineering Materials, Ministry of Water Resources (Preparation), under the project titled “Research and Development of New Technology and Equipment for Rapid Detection of Roller-Compacted Concrete Compaction Degree” (Grant No. EMF202502).

6. BIBLIOGRAPHY

  • [1] BOULGHEBAR, K., SADOK, A.H., BRAHMA, A., “Effect of recycled brick sand on mechanical and transfer properties of roller compacted concrete “RCC” used for dams”, Matéria, v. 30, n. 5, pp. 20240703, Apr. 2025. doi: http://doi.org/10.1590/1517-7076-rmat-2024-0703.
    » https://doi.org/10.1590/1517-7076-rmat-2024-0703
  • [2] YU, S., SHEN, S., “Compaction prediction for asphalt mixtures using wireless sensor and machine learning algorithms”, IEEE Transactions on Intelligent Transportation Systems, v. 24, n. 1, pp. 778–786, Jan. 2023. doi: http://doi.org/10.1109/TITS.2022.3218692.
    » https://doi.org/10.1109/TITS.2022.3218692
  • [3] MARJONO, M., ROCHMAN, T., “The review on the roller compacted concrete performance: The effect of compaction number on the compressive strength”, Civil Engineering and Architecture, v. 11, n. 5, pp. 2392–2404, Sep. 2023. doi: http://doi.org/10.13189/cea.2023.110511.
    » https://doi.org/10.13189/cea.2023.110511
  • [4] HASSOON, H.R., ABBAS, Z.K., “Analyzing lab and field compaction methods for designing Roller Compacted Concrete Pavements (RCCP) with different curing processes”, Engineering, Technology & Applied Science Research, v. 14, n. 5, pp. 17488–17493, Oct. 2024. doi: http://doi.org/10.48084/etasr.8614.
    » https://doi.org/10.48084/etasr.8614
  • [5] LAM, N.T.M., NGUYEN, D.L., LE, D.H., “Predicting compressive strength of roller-compacted concrete pavement containing steel slag aggregate and fly ash”, The International Journal of Pavement Engineering, v. 23, n. 3, pp. 731–744, Feb. 2022. doi: http://doi.org/10.1080/10298436.2020.1766688.
    » https://doi.org/10.1080/10298436.2020.1766688
  • [6] DEBBARMA, S., RANSINCHUNG, R.N.G.D., “Using artificial neural networks to predict the 28-day compressive strength of roller-compacted concrete pavements containing RAP aggregates”, Road Materials and Pavement Design, v. 23, n. 1, pp. 149–167, Jan. 2022. doi: http://doi.org/10.1080/14680629.2020.1822202.
    » https://doi.org/10.1080/14680629.2020.1822202
  • [7] HABIB, A., HOURI, A.A.L., HABIB, M., et al., “Structural performance and Finite Element modeling of roller compacted concrete dams: a review”, Latin American Journal of Solids and Structures, v. 18, n. 04, pp. e376, 2021. doi: http://doi.org/10.1590/1679-78256467.
    » https://doi.org/10.1590/1679-78256467
  • [8] HABIB, M., BASHIR, B., ALSALMAN, A., et al., “Evaluating the accuracy and effectiveness of machine learning methods for rapidly determining the safety factor of road embankments”, Multidiscipline Modeling in Materials and Structures, v. 19, n. 5, pp. 966–983, Jul. 2023. doi: http://doi.org/10.1108/MMMS-12-2022-0290.
    » https://doi.org/10.1108/MMMS-12-2022-0290
  • [9] HABIB, A., YILDIRIM, U., HABIB, M., “Applying Kernel principal component analysis for enhanced multivariable regression modeling of rubberized concrete properties”, Arabian Journal for Science and Engineering, v. 48, n. 4, pp. 5383–5396, Nov. 2023. doi: http://doi.org/10.1007/s13369-022-07435-8.
    » https://doi.org/10.1007/s13369-022-07435-8
  • [10] ABBAS, Z.K., “Roller compacted concrete: literature review”, Journal of Engineering, v. 28, n. 6, pp. 65–83, Jun. 2022.
  • [11] HASSOUN, H.R., ABBAS, Z.K., “Effect of production and curing methods on the properties of roller-compacted concrete: a review”, Journal of Engineering, v. 30, n. 9, pp. 58–73, Sep. 2024.
  • [12] ROESLER, J.R., OUELLET, J., CHEUNG, J.S., et al., “Compaction delay and temperature effects on early-age properties of roller-compacted concrete”, Transportation Research Record: Journal of the Transportation Research Board, v. 2678, n. 10, pp. 1569–1579, Oct. 2024. doi: http://doi.org/10.1177/03611981241239660.
    » https://doi.org/10.1177/03611981241239660
  • [13] SELVAM, M., SINGH, S., “Tailoring of compaction parameters of the vibratory table and vibratory hammer for roller compacted concrete pavements to resemble field properties”, Transportation Research Record: Journal of the Transportation Research Board, v. 2678, n. 5, pp. 242–258, 2024. doi: http://doi.org/10.1177/03611981231188719.
    » https://doi.org/10.1177/03611981231188719
  • [14] KHALID, M.Q., ABBAS, Z.K., “Recycled concrete aggregated for the use in roller compacted concrete: a literature review”, Journal of Engineering, v. 29, n. 3, pp. 142–153, Mar. 2023.
  • [15] OJHA, P., PRAKASH, S., SINGH, P., et al., “Effect of high ratio fly ash on roller compacted concrete for dam construction”, Research on Engineering Structures and Materials, v. 8, n. 2, pp. 233–251, Feb. 2022. doi: http://doi.org/10.17515/resm2022.374ma1216.
    » https://doi.org/10.17515/resm2022.374ma1216
  • [16] QIU, Y., ZHOU, J., KHANDELWAL, M., et al., “Performance evaluation of hybrid WOA-XGBoost, GWO-XGBoost and BO-XGBoost models to predict blast-induced ground vibration”, Engineering with Computers, v. 38, n. 5, pp. 4145–4162, Dec. 2022. doi: http://doi.org/10.1007/s00366-021-01393-9.
    » https://doi.org/10.1007/s00366-021-01393-9
  • [17] ASSELMAN, A., KHALDI, M., AAMMOU, S., “Enhancing the prediction of student performance based on the machine learning XGBoost algorithm”, Interactive Learning Environments, v. 31, n. 6, pp. 3360–3379, Aug. 2023. doi: http://doi.org/10.1080/10494820.2021.1928235.
    » https://doi.org/10.1080/10494820.2021.1928235
  • [18] BEN JABEUR, S., STEF, N., CARMONA, P., “Bankruptcy prediction using the XGBoost Algorithm and variable importance feature engineering”, Computational Economics, v. 61, n. 2, pp. 715–741, Feb. 2023. doi: http://doi.org/10.1007/s10614-021-10227-1.
    » https://doi.org/10.1007/s10614-021-10227-1
  • [19] BUDHOLIYA, K., SHRIVASTAVA, S.K., SHARMA, V., “An optimized XGBoost based diagnostic system for effective prediction of heart disease”, Journal of King Saud University-Computer and Information Sciences, v. 34, n. 7, pp. 4514–4523, Jul. 2022. doi: http://doi.org/10.1016/j.jksuci.2020.10.013.
    » https://doi.org/10.1016/j.jksuci.2020.10.013
  • [20] DEMIR, S., SAHIN, E.K., “An investigation of feature selection methods for soil liquefaction prediction based on tree-based ensemble algorithms using AdaBoost, gradient boosting, and XGBoost”, Neural Computing & Applications, v. 35, n. 4, pp. 3173–3190, Feb. 2023. doi: http://doi.org/10.1007/s00521-022-07856-4.
    » https://doi.org/10.1007/s00521-022-07856-4
  • [21] JABEUR, S.B., MEFTEH-WALI, S., VIVIANI, J.-L., “Forecasting gold price with the XGBoost algorithm and SHAP interaction values”, Annals of Operations Research, v. 334, n. 1–3, pp. 679–699, Mar. 2024. doi: http://doi.org/10.1007/s10479-021-04187-w.
    » https://doi.org/10.1007/s10479-021-04187-w
  • [22] PIRAEI, R., AFZALI, S.H., NIAZKAR, M., “Assessment of XGBoost to Estimate Total Sediment Loads in Rivers”, Water Resources Management, v. 37, n. 13, pp. 5289–5306, Oct. 2023. doi: http://doi.org/10.1007/s11269-023-03606-w.
    » https://doi.org/10.1007/s11269-023-03606-w
  • [23] CALIS, G., YILDIZEL, S.A., KESKIN, U.S., “Predicting compressive strength of color pigment incorporated roller compacted concrete via machine learning algorithms: a comparative study”, International Journal of Pavement Research and Technology, v. 17, n. 6, pp. 1586–1602, Nov. 2024. doi: http://doi.org/10.1007/s42947-023-00321-y.
    » https://doi.org/10.1007/s42947-023-00321-y
  • [24] ZHANG, G., HAMZEHKOLAEI, N.S., RASHNOOZADEH, H., et al., “Reliability assessment of compressive and splitting tensile strength prediction of roller compacted concrete pavement: introducing MARS-GOA-MCS”, The International Journal of Pavement Engineering, v. 23, n. 14, pp. 5030–5047, Nov. 2022. doi: http://doi.org/10.1080/10298436.2021.1990920.
    » https://doi.org/10.1080/10298436.2021.1990920
  • [25] HABIB, A., YILDIRIM, U., “Proposing unsupervised clustering-based earthquake records selection framework for computationally efficient nonlinear response history analysis of structures equipped with multi-stage friction pendulum bearings”, Soil Dynamics and Earthquake Engineering, v. 182, pp. 108732, Jul. 2024. doi: http://doi.org/10.1016/j.soildyn.2024.108732.
    » https://doi.org/10.1016/j.soildyn.2024.108732
  • [26] HABIB, A., YILDIRIM, U., “Influence of isolator properties and earthquake characteristics on the seismic behavior of RC structure equipped with quintuple friction pendulum bearings”, International Journal of Structural Stability and Dynamics, v. 23, n. 6, pp. 2350060, 2023. doi: http://doi.org/10.1142/S0219455423500608.
    » https://doi.org/10.1142/S0219455423500608
  • [27] HABIB, A., BARAKAT, S., AL-TOUBAT, S., et al., “Developing machine learning models for identifying the failure potential of fire-exposed frp-strengthened concrete beams”, Arabian Journal for Science and Engineering, v. 50, n. 11, pp. 8475–8490, Aug. 2024. doi: http://doi.org/10.1007/s13369-024-09497-2.
    » https://doi.org/10.1007/s13369-024-09497-2
  • [28] SHRIF, M., AL-SADOON, Z.A., BARAKAT, S., et al., “Optimizing gene expression programming to predict shear capacity in corrugated web steel beams”, Civil Engineering Journal, v. 10, n. 5, pp. 1370–1385, 2024. doi: http://doi.org/10.28991/CEJ-2024-010-05-02.
    » https://doi.org/10.28991/CEJ-2024-010-05-02

Publication Dates

  • Publication in this collection
    13 Oct 2025
  • Date of issue
    2025

History

  • Received
    15 Apr 2025
  • Accepted
    21 Aug 2025
Laboratório de Hidrogênio, Coppe - Universidade Federal do Rio de Janeiro, em cooperação com a Associação Brasileira do Hidrogênio, ABH2 Av. Moniz Aragão, 207, 21941-594, Rio de Janeiro, RJ, Brasil, Tel: +55 (21) 3938-8791 - Rio de Janeiro - RJ - Brazil
E-mail: revmateria@gmail.com