Abstract
The electrooxidation of glycerol offers a promising pathway for the energy transition and biomass valorization, making it a key area of research. This study employs machine learning (ML) to predict the onset and anodic peak potentials of glycerol electrooxidation, enhancing the understanding of the factors influencing these metrics. A dataset derived from 155 research articles includes parameters such as pH, electrolyte type, reference electrode, electrode material, current density, and scan rate. Fourteen ML algorithms were evaluated, with adaptive boosting (AdaBoost) achieving the best performance: root mean square error (RMSE) of 0.117 and coefficient of determination (R2) of 0.902 for onset potential, and RMSE of 0.122 and R2 of 0.870 for anodic peak potential. Explainable artificial intelligence (XAI) techniques, such as Shapley additive explanations (SHAP), identified pH, electrolyte type, and electrode properties (e.g., atomic number, electronegativity) as key predictors. Replacing elemental features with atomic properties improved performance and reduced complexity. This work demonstrates the potential of ML to optimize glycerol oxidation and advance alcohol electrooxidation research.
Keywords:
machine learning; biomass valorization; glycerol oxidation; energy transition; explainable AI
Introduction
The continuous global expansion of the economy and population has increased worldwide energy demand. Currently, the world energy matrix is sustained by fossil fuels (coal, natural gas, and oil), which are non-renewable sources.1 Petroleum-based fuels are subject to geopolitical pressures and are directly associated with climate change, since their combustion emits carbon dioxide (CO2), a driver of global warming. Given this scenario, substantial efforts in scientific research and development have been directed toward the search for environmentally sustainable and renewable alternative energy sources, such as green hydrogen, renewable electricity, and bioenergy.2, 3, 4, 5
From this perspective, opportunities for clean energy generation from abundantly available chemical substances, and for the use of clean energy in fuel production through electrochemical processes, have been the focus of significant investments.2,6,7 Bioenergy stands out as a promising alternative to fossil fuels due to the possibility of waste valorization and because it is considered a low-carbon energy source. Among biomass derivatives, glycerol emerges as a platform molecule of great interest.8, 9 Glycerol is a by-product of the biodiesel transesterification process; thus, as biodiesel production grows, quantities that exceed current market demand are generated.1
The conversion of this surplus glycerol into energy and high-value-added products represents a strategic opportunity for transforming the energy matrix and valorizing biomass.10 Additionally, its application in direct glycerol fuel cells allows for simultaneous electricity generation and the production of value-added products, maximizing the efficiency of this raw material.3, 4, 11, 12, 13 Alternatively, glycerol can be used in electrolyzers, replacing the oxygen evolution reaction (OER) with glycerol oxidation, resulting in a more thermodynamically favorable overall reaction.14, 15
The electrochemical valorization of glycerol requires heterogeneous catalysts that combine efficiency, accessibility, and durability, especially for energy applications.16, 17 This process involves not only the catalytic material but also the entire engineering of the electrochemical system and its components. At the laboratory scale, variables such as the reference electrode, electrolyte, substrate, and pH significantly influence the experimental results and the prospects for industrial scale-up. Nevertheless, catalyst development is still largely based on trial-and-error methods, which incur high costs in time and resources.2, 3 Traditional catalyst discovery remains constrained by the high cost of experimental screening, where each new material requires synthesis, characterization, and testing, which in some cases prevents a full exploration of the chemical space.18, 19
In recent decades, computational methods have contributed to this process.20, 21, 22 However, these methods still face fundamental limitations, as density functional theory (DFT) calculations cannot efficiently explore the entire catalyst composition space in a fast and cost-effective manner due to their high computational demands.22, 23, 24
In this context, machine learning (ML) emerges as a complementary tool to address these challenges, enabling the identification of structure-property relationships directly from data, whether experimental or derived from DFT calculations for various catalytic systems.25
ML algorithms are powerful function approximators capable of capturing complex relationships in physical systems, provided that sufficient and high-quality data is available.20 This approach complements traditional computational methods such as DFT: while DFT derives properties from fundamental laws of quantum mechanics,22 ML identifies statistical relationships in experimental and theoretical data.25 The synergy between these methods has accelerated catalyst discovery, as demonstrated in several studies.26, 27, 28 When high-quality data is available and appropriate algorithms and descriptors are employed, ML models can achieve high accuracy and efficiently capture a wide range of nonlinear relationships. Rather than replacing DFT or experiments, ML serves as a powerful approach that integrates and statistically explores vast amounts of theoretical and experimental data, opening new frontiers in catalyst discovery.26, 29
ML models can make accurate predictions, but to understand these predictions and their relationships with the features of our problem, we can leverage XAI (explainable artificial intelligence) methods. These techniques help to interpret how different features influence model outputs, providing insights in a human-interpretable manner.30 The current XAI literature presents a variety of approaches, including feature importance analysis (e.g., Shapley additive explanations, SHAP, and local interpretable model-agnostic explanations, LIME),31 surrogate models,32 and rule-based explanations,33 all of which enhance the transparency and interpretability of complex ML models.
Additionally, XAI methods play a critical role in maintaining model performance over time, as they allow practitioners to monitor and address issues related to bias, fairness, and unexpected behavior, thus ensuring that the models remain reliable and aligned with their intended purpose.30
To predict how variables in an electrochemical system influence the catalytic process, valuable information can be obtained through cyclic voltammetry.34 Two fundamental parameters for assessing the energy efficiency of an electrocatalyst are the onset potential and the oxidation peak potential. For oxidation reactions, the onset potential corresponds to the lowest potential at which product formation is observed at the electrode,35 while the anodic peak potentials (oxidation peak potential) indicate the potential of maximum oxidation current density.
An interesting recent example of the application of ML in electrochemistry, particularly for predicting alcohol oxidation potentials, is presented by von Zuben et al.36 In that work, the authors presented a methodology that uses information about the electrochemical reaction conditions and the characteristics of the working electrode to predict the onset and anodic peak potentials for alcohol electrooxidation processes. Building on their approach, we extended this methodology to predict the oxidation potentials of glycerol.
Predicting oxidation potential is a challenging task, particularly in catalytic systems, where the influence of the catalyst material must be considered. While the literature37, 38 often focuses on standard oxidation potentials of organic compounds, studies on catalytic systems remain scarce due to the inherent complexity of heterogeneous electrocatalysis. This complexity arises from both the properties of the catalytic material and the interactions among experimental factors, making predictive modeling especially demanding and underexplored.
Thus, in the following sections, we present the methodology used to construct the database, detailing the raw data handling required to enable the application of machine learning algorithms to predict the onset and oxidation potentials of glycerol. Subsequently, we discuss the results of 14 different ML algorithms, namely: decision tree (DT), random forest (RF), extreme gradient boosting (XGBoost), linear regression (LR), gradient boost regression (GBR), Bayesian ridge regression (BRR), Gaussian process regression (GPR), k-nearest neighbors regression (KNN), least absolute shrinkage and selection operator (LASSO), ridge, histogram-based gradient boosting (HGB), adaptive boosting (AdaBoost), CatBoost, and artificial neural network (ANN).
The results are assessed based on XAI methods (feature importance and SHAP) to evaluate model insights. The evaluation also considers training time, the complexity of hyperparameter tuning, and the predictive performance of the models (computational cost, generalizability, and scalability) for glycerol oxidation potentials across various catalysts and experimental conditions. Our adopted modeling approach allows for the screening of new catalysts, including those containing elements that are not present in the original dataset. This is achieved by improving how catalysts are described: instead of relying solely on categorical encoding (e.g., one-hot encoding) as in our previous work,36 the method now represents catalysts using continuous attributes derived from intrinsic elemental properties, such as atomic number, ionization energy, and electron affinity. This choice enables model generalization, allowing infrequent elements (such as niobium and thallium) or even absent ones (such as calcium and scandium) to be tested in predictions.
This approach provides a deeper understanding of how the experimental parameters relate to the oxidation potentials, with a particular focus on the top-performing models in this study. The focus is on reducing the time and resources required, promoting advances in the design of new catalysts, while offering a comprehensive perspective on these materials.
Methodology
The proposed approach for developing ML models to predict the oxidation peak and onset potentials of glycerol is illustrated in Figure 1. Initially, a database was constructed from literature data. Subsequently, the extracted data underwent processing, where parameters from the articles were standardized, and any missing information was completed. Finally, with the database prepared, feature selection was performed, followed by the evaluation of all 14 models.
The proposed methodology involved the selection of 155 papers. From their text, tables, and voltammograms, 307 examples related to glycerol oxidation were extracted to build the database. This database was then split randomly into training (80%) and testing (20%) sets for machine learning model development. Fourteen different algorithms were compared to predict the oxidation potential of glycerol. The results are assessed based on XAI methods, feature importance and SHAP, to evaluate model insights.
Data collection and preprocessing
A database was created by systematically gathering relevant data from scientific literature focused on glycerol electrooxidation. A thorough search was conducted in the Scopus database through April 2024, using the keyword combination: (glycerol OR glycerin*) AND (electrooxidation OR oxidation) AND (catalyst OR electrocatalyst OR catalysis) AND NOT *photo* AND NOT (steam AND reform*) AND NOT hydrogenolysis AND PUBYEAR > 2013. This search returned 327 articles of interest, of which 155 made up the final database. The articles were selected based on their data quality, emphasizing those with detailed documentation. Priority was given to studies clearly outlining the electrode material, reference electrode, anodic peak (oxidation peak) potentials, onset potential values, and experimental conditions. This effort yielded a robust dataset with 307 individual entries, each capturing a unique electrochemical context for glycerol oxidation. These scenarios include factors such as the working electrode used, scan rate, current density, deposited materials, pH, electrolyte, and electrolyte concentration. The information was extracted from text, tables, and voltammetric images. For data that was not listed directly in tables or text but was present exclusively in voltammograms, the open-source software WebPlotDigitizer 5.2 by Automeris LLC39 was used to obtain the necessary values.16
All data preprocessing, analysis, and machine learning modeling were performed on the Google Colab platform40 using the Python programming language. The Python libraries used include Pandas 2.2 by NumFOCUS, Inc.,41 NumPy 1.26.4 by the NumPy team,42 SciPy 1.13.1 by the SciPy team,43 scikit-learn,44 TensorFlow 2.17.0 by the Google Brain Team,45 and Matplotlib 1.1.1 by the Matplotlib development team46 for data handling, numerical operations, model implementation, and visualization.
Following data collection and extraction, several preprocessing steps were performed. Thirteen different substrates were considered in the database: gold (Au), carbon cloth (CC), copper foam (CF), carbon paper (CP), fluorine-doped tin oxide (FTO), glassy carbon disc (GCD), glassy carbon electrode (GCE), graphite electrode (GE), nanoporous stainless steel (NPSS), nickel (Ni), platinum (Pt), self-supported catalysts (SELF), and titanium (Ti). When pH values were not provided in the literature, they were inferred from electrolyte concentrations using the pHcalc Python library.47 All potentials were standardized to the normal hydrogen electrode (NHE) reference using the equations detailed in the Supplementary Information (SI) section. The reference electrodes were treated as saturated across all cases due to frequent gaps in electrolyte concentration data.
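As an illustration of this standardization step, the sketch below converts reported potentials to the NHE scale. The offsets are approximate tabulated values for saturated reference electrodes and serve only as an example; the exact conversion equations used in this work are given in the SI section.

```python
# Minimal sketch of reference-electrode standardization (approximate offsets
# for saturated electrodes vs. NHE, in V; the exact equations are in the SI).
REF_OFFSET_V = {
    "Ag/AgCl": 0.197,  # saturated KCl
    "SCE": 0.241,      # saturated calomel
    "Hg/HgO": 0.098,   # ~1 mol L-1 NaOH
    "MSE": 0.640,      # saturated K2SO4
    "NHE": 0.0,
    "SHE": 0.0,
}

def to_nhe(e_measured, reference, ph=None):
    """Convert a potential (V) measured vs. `reference` to the NHE scale."""
    if reference == "RHE":
        # RHE is pH-dependent: E(NHE) = E(RHE) - 0.0592 * pH at 25 °C
        return e_measured - 0.0592 * ph
    return e_measured + REF_OFFSET_V[reference]

print(to_nhe(0.85, "Ag/AgCl"))     # 1.047 V vs. NHE
print(to_nhe(1.30, "RHE", ph=13))  # ~0.53 V vs. NHE
```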
In academic research,48 optimizing electrodes often involves adjusting material concentrations or morphologies. In this work, however, we primarily focus on the elemental composition of the catalysts (i.e., the deposited materials). This focus is based on the following considerations: (i) standardizing a substantial number of studies reporting catalyst compositions has proven challenging along this work; (ii) our main goal is to identify active elemental combinations while prioritizing a broader range of catalysts, even if this means some loss of precision in capturing small concentration-dependent variations, thereby reducing experimental trial and error. Thus, our model is better aligned with the reality of the available data. Although atomic proportions are not explicitly modeled, identifying key elements can accelerate catalyst design by narrowing the space of possible elemental combinations. Stoichiometric adjustments can be addressed in subsequent stages using other machine learning paradigms, such as active learning and Bayesian optimization, leveraging surrogate models to efficiently explore the compositional space.49, 50, 51, 52
Our database encompasses several key pieces of information: the DOI (digital object identifier) of the paper, year of publication, working electrode (WE) with corresponding element specification (WE_(element)), deposited material with corresponding element specification (deposited_material(elements)), the electrolyte used (electrolyte), solution pH (pH), electrolyte concentration in molarity (El_conc), onset potential relative to the normal hydrogen electrode (onset_pot), oxidation peak potential (anodic peak potentials) relative to the normal hydrogen electrode (ox_pot), disparity between the oxidation potential and onset potential (ox_onset), current density achieved at the maximum value of the first peak related to glycerol oxidation in mA cm-2 (current_density), the reference electrode used (RE), scan rate in mV s-1 (scan_rate) and the concentration of the analyte in molarity (A_concentration). In terms of database notation, features beginning with “WE” denote aspects related to the working electrode made with the specified element or substrate. For more detailed descriptions of the database and explanations of the features, please refer to SI section.
Before training the models, it was necessary to define the labels and encode the features. Initially, following the literature,53, 54 one-hot encoding was used, transforming all obtained data into a binary representation. This approach resulted in a total of 72 features, which is a considerable number given the small dataset size of 306 entries. Such a high number of features can lead to what is known as the “curse of dimensionality”, a phenomenon whereby an increasing number of features exponentially raises the number of samples required to capture meaningful relationships between points.55 With many dimensions and limited data, the feature space becomes sparse, making it difficult to identify patterns and increasing statistical uncertainty. This often leads to overfitting, reducing the ability of the model to generalize to new data. Thus, certain steps were taken to reduce the dimensionality.
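For context, the snippet below illustrates the one-hot step on a hypothetical three-row miniature of the table: every distinct category becomes its own binary column, which is how the feature count inflates.

```python
import pandas as pd

# Hypothetical miniature of the raw table; real entries carry many more columns.
df = pd.DataFrame({"electrolyte": ["KOH", "NaOH", "H2SO4"],
                   "RE": ["RHE", "Ag/AgCl", "SCE"]})

# One-hot encoding: one binary column per distinct category value.
encoded = pd.get_dummies(df, columns=["electrolyte", "RE"])
print(encoded.shape)  # (3, 6): two categorical columns became six binary features
```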
The initial approach was inspired by the methodology proposed by Jiang et al.56 We observed that, among the catalysts present in our database, 7.18% contain only one chemical element, 35.95% contain two elements, 37.9% contain three elements, and 15.68% contain four elements. Therefore, a total of 96.15% of all entries include catalysts with four or fewer elements. Based on this analysis, we opted to ignore any element beyond the fourth and categorized the constituents of each catalyst into four classes: E1, E2, E3, and E4. Each element was represented by its corresponding atomic number, and in cases where a catalyst has fewer than four elements, the empty slots were filled with the value zero. For example, a catalyst containing platinum, bismuth, and carbon would be represented as E1 = 78, E2 = 83, E3 = 6, and E4 = 0, where the zero indicates the absence of a fourth element. This approach is depicted in Figure 2.
Figure 2. The “non-elemental” features approach describes the catalyst based on its chemical elements. Each element is further characterized by its atomic number, ionization energy, and electron affinity.
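A minimal sketch of this E1-E4 encoding, reproducing the Pt/Bi/C example above (the lookup table and helper function are ours, shown for a handful of elements):

```python
# Atomic numbers for a few elements occurring in the dataset (illustrative subset).
ATOMIC_NUMBER = {"C": 6, "Ni": 28, "Pd": 46, "Pt": 78, "Au": 79, "Bi": 83}

def encode_elements(elements, n_slots=4):
    """Map up to four constituent elements to atomic numbers, zero-padded."""
    z = [ATOMIC_NUMBER[el] for el in elements[:n_slots]]  # ignore any 5th+ element
    return z + [0] * (n_slots - len(z))                   # 0 marks an empty slot

print(encode_elements(["Pt", "Bi", "C"]))  # [78, 83, 6, 0]
```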
This methodology, however, does not account for “non-elemental” features, i.e., substances or classes of substances that, during data processing, were considered relevant for a more faithful description of the composition of the catalysts: rGO (reduced graphene oxide), G (graphene), CNT (carbon nanotube), Org (organic compounds or polymers), and OH (hydroxide). Together, these steps reduced the dimensionality of the elemental components of the catalysts from 43 features to 9. Additionally, to incorporate an atomic property of these elements, three new features were introduced, namely E1_av, E2_av, and E3/4_av. These features represent the average of the ionization energy (in eV) and the electron affinity (in eV) of the constituent elements. For the E3/4_av feature, we took the average of E3_av and E4_av, aiming to enhance the performance of the ML models. For the sake of simplicity, these features will be called electronegativities, owing to their similarity to Mulliken’s electronegativity,57 although they do not retain the full rigor of the original theory.
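The sketch below reproduces this feature construction for the same Pt/Bi/C example; the helper and the ionization energy/electron affinity values (approximate tabulated values, in eV) are ours.

```python
# Mulliken-like "electronegativity" used for E1_av, E2_av, and E3/4_av.
def mulliken_like(ie_ev, ea_ev):
    """Average of ionization energy and electron affinity, following Mulliken."""
    return (ie_ev + ea_ev) / 2.0

e1_av = mulliken_like(8.96, 2.13)               # Pt: IE ~8.96 eV, EA ~2.13 eV
e2_av = mulliken_like(7.29, 0.94)               # Bi: IE ~7.29 eV, EA ~0.94 eV
e3_av, e4_av = mulliken_like(11.26, 1.26), 0.0  # C; the fourth slot is empty
e34_av = (e3_av + e4_av) / 2.0                  # E3/4_av merges the last two slots
print(e1_av, e2_av, e34_av)                     # approximately 5.5, 4.1, 3.1 eV
```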
To reduce dimensionality even further, the categorical features “RE” and “electrolyte” were encoded numerically. Specifically, the electrolytes sulfuric acid (H2SO4), perchloric acid (HClO4), borate buffer (represented as Na2B4O7), potassium sulfate (K2SO4), sodium hydroxide (NaOH), and potassium hydroxide (KOH) were assigned the numerical values 0 through 5, respectively. Similarly, the reference electrodes silver chloride electrode (Ag/AgCl), mercury-mercury oxide electrode (Hg/HgO), mercury-mercurous sulfate electrode (MSE), normal hydrogen electrode (NHE), reversible hydrogen electrode (RHE), saturated calomel electrode (SCE), and standard hydrogen electrode (SHE) were represented by the numerical values 0 through 6. Thus, a set of 33 features was obtained, which was later reduced to 29. This reduction was based on a manual evaluation of the prediction capability of a random forest algorithm without any hyperparameter tuning, in which the features current_density, WE_NPSS (nanoporous stainless steel working electrode), WE_Ti (titanium working electrode), and WE_CF (copper foam working electrode) were excluded. It is worth highlighting that, for the design of new materials, the onset values should also not be used as inputs when predicting the oxidation peak potential, and vice versa, as these values are not manipulable but rather characteristics of a material yet to be synthesized. As a result of this feature processing, the dimensionality was reduced from 72 to 29 features. Once the relevant features were identified, a variety of regression algorithms were evaluated, as previously mentioned.
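A minimal sketch of this ordinal encoding, following the assignments described above:

```python
# Ordinal codes for the two remaining categorical features, as described in the text.
ELECTROLYTE_CODE = {"H2SO4": 0, "HClO4": 1, "Na2B4O7": 2,
                    "K2SO4": 3, "NaOH": 4, "KOH": 5}
RE_CODE = {"Ag/AgCl": 0, "Hg/HgO": 1, "MSE": 2, "NHE": 3,
           "RHE": 4, "SCE": 5, "SHE": 6}

row = {"electrolyte": "KOH", "RE": "RHE"}  # hypothetical database entry
print(ELECTROLYTE_CODE[row["electrolyte"]], RE_CODE[row["RE"]])  # 5 4
```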
For a more detailed description of this feature approach, please refer to the SI section (Table S2).
Performance of the models
The extensive database of 306 entries was utilized for both training and testing the ML models. The algorithms were implemented using the Python library scikit-learn. To ensure balanced representation, the dataset was stratified randomly to maintain proportional distribution between the training and test sets. Specifically, 80% of the data was allocated for training, while the remaining 20% was reserved for testing and evaluation.
To assess the model-building algorithms, we optimized critical hyperparameters, particularly the number of trees and nodes, as highlighted in the literature26, 27 for decision tree-based models. In this study, 5-fold cross-validation was used, rotating subsets of the training data to simulate a test set and check for underfitting or overfitting. Hyperparameter tuning was conducted using grid search and random search with 5-fold cross-validation, ensuring optimal model performance, accuracy, and robustness (consistency across diverse datasets and conditions). To evaluate model performance, we used four standard metrics: mean squared error (MSE), root mean squared error (RMSE), coefficient of determination (R2), and training time (in s), with RMSE serving as the primary evaluation criterion. Additionally, we applied feature importance analysis and SHAP to gain insights into the models.
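The snippet below sketches this evaluation protocol with scikit-learn. The 80/20 split, the 5-fold grid search, and the reported metrics follow the text; the placeholder data, random seed, and search space are illustrative assumptions.

```python
import numpy as np
from sklearn.ensemble import AdaBoostRegressor
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import GridSearchCV, train_test_split

# Placeholder data standing in for the curated 306 x 29 feature matrix and target.
rng = np.random.default_rng(0)
X, y = rng.normal(size=(306, 29)), rng.normal(size=306)

# 80/20 split, as described in the text (seed is an illustrative choice).
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.20,
                                                    random_state=42)

# Illustrative search space; the actual grid is not enumerated in the text.
param_grid = {"n_estimators": [10, 50, 100], "learning_rate": [0.01, 0.1, 1.0]}
search = GridSearchCV(AdaBoostRegressor(random_state=42), param_grid,
                      cv=5, scoring="neg_root_mean_squared_error")
search.fit(X_train, y_train)

# Metrics used in the study, computed on the held-out test set.
y_pred = search.best_estimator_.predict(X_test)
mse = mean_squared_error(y_test, y_pred)
print(f"MSE = {mse:.4f}, RMSE = {np.sqrt(mse):.4f}, "
      f"R2 = {r2_score(y_test, y_pred):.4f}")
```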
A brief description of the regression algorithms and interpretation methods applied is provided below.
Regression algorithms
(i) Linear regression: identifies a linear relationship between the input features and the target variable. This algorithm calculates the slope and intercept of the line that best fits the data, adjusting the parameters to minimize the sum of the squared differences between the observed and predicted values.58
(ii) Decision tree: is a supervised learning algorithm that splits the data into subsets based on the values of the input features. The model is constructed by recursively selecting the feature that best splits the data at each node, aiming to maximize the homogeneity of the target variable within each subset. The process continues until a stopping criterion is met, at which point the tree assigns a prediction for the target variable.59
(iii) Random forest: improves regression models by creating an ensemble of decision trees. Each tree is constructed using a random subset of the training data and features. The process involves recursively splitting the data into branches based on specific input feature thresholds until a leaf node is reached, which provides the final prediction of the tree. At each decision node, the algorithm determines whether an input feature surpasses a certain threshold value. This splitting divides the data into subsets, allowing each tree to capture distinct patterns and relationships. The final prediction is obtained by averaging the predictions from all the individual trees.60
(iv) XGBoost: is an algorithm that enhances regression models by building an ensemble of decision trees, where each tree corrects the errors of the previous ones. It uses gradient calculations to adjust trees and minimize prediction errors. XGBoost also includes regularization to control tree complexity and reduce overfitting. It assigns weights to trees and samples, prioritizing correcting misclassified samples in future iterations. This method creates a strong model that balances complexity with accuracy for reliable predictions.61, 62
(v) Gradient boost regression: is an ensemble technique that builds sequential trees, where each new tree aims to correct the errors of the previous ones. It optimizes the loss function by computing the gradient of the residuals and fitting new trees to minimize this error. GBR is highly effective for both regression and classification tasks, as it leverages the strengths of multiple weak learners to produce a strong overall model. Its flexibility, combined with various regularization techniques, helps to reduce overfitting.63
(vi) Bayesian ridge regression: applies Bayesian inference to linear regression, introducing a prior distribution over the model parameters. This prior is updated based on the observed data, resulting in a posterior distribution. The method adds regularization by assuming that the coefficients are random variables with a Gaussian prior, which prevents overfitting, especially in cases of multicollinearity. The result is a more robust model, especially when the dataset is small or noisy.64
(vii) Gaussian process regression: is a non-parametric regression method that models the distribution of potential functions that could explain the observed data. Instead of directly predicting a target value, GPR predicts a distribution of possible values, allowing for both the mean prediction and uncertainty estimation. This method is particularly useful when dealing with noisy or sparse data and provides a flexible and probabilistic approach to regression.65
(viii) KNN regression: predictions are made by averaging the target values of the k nearest training examples to the input. The distance between examples is calculated using a metric such as Euclidean distance. The algorithm does not make strong assumptions about the data, making it useful for nonlinear problems.66
(ix) LASSO: is a linear regression technique that applies L1 regularization, which adds a penalty equal to the absolute value of the magnitude of the coefficients. This penalty forces some coefficients to become exactly zero, effectively performing feature selection. LASSO is useful when dealing with high-dimensional data and aims to balance model complexity with predictive accuracy.67
(x) Ridge regression: is a linear regression method that applies L2 regularization, which adds a penalty proportional to the square of the magnitude of the coefficients. This prevents the model from becoming overly complex, particularly in cases of multicollinearity, where the independent variables are highly correlated. Ridge regression shrinks the coefficients but does not set any of them to zero, making it useful for multicollinear datasets.68
(xi) Histogram-based gradient boosting: is a variant of gradient boosting that accelerates the learning process by discretizing continuous input features into histograms, making it more efficient when working with large datasets. Instead of evaluating every possible split for continuous features, this method calculates potential splits based on the histogram bins, significantly reducing the computational cost without sacrificing model accuracy.69
(xii) AdaBoost (adaptive boosting): is an ensemble method that combines multiple weak learners, typically decision stumps (trees with one split), to create a stronger predictor. Each subsequent model is trained to focus more on examples that were misclassified by previous models. The final prediction is a weighted average of all the individual model predictions, with models that perform better given more weight. AdaBoost is highly effective for reducing both bias and variance in models.70
(xiii) CatBoost: is a gradient boosting algorithm specifically designed to handle categorical features efficiently, without the need for extensive preprocessing. It uses ordered boosting to prevent overfitting and incorporates specialized methods for dealing with categorical variables, making it particularly useful in situations where the dataset contains many such variables. CatBoost is also known for its scalability and high performance on large datasets.71
(xiv) Artificial neural network: is inspired by the human brain and consists of layers of interconnected nodes (neurons) that process input data and adjust their weights based on the prediction error. These models are highly flexible and capable of capturing complex, nonlinear relationships in the data. ANNs are trained using backpropagation, where the error from the output layer is propagated back through the network to update the weights and reduce the overall error. This process enables the model to learn patterns from data for tasks such as regression, classification, and more.72
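To make the comparison protocol concrete, the sketch below benchmarks a subset of these regressors under a single 5-fold cross-validation loop. Third-party models (XGBoost, CatBoost) and the ANN are omitted for brevity; the data, seed, and default hyperparameters are placeholders.

```python
import numpy as np
from sklearn.ensemble import (AdaBoostRegressor, GradientBoostingRegressor,
                              HistGradientBoostingRegressor, RandomForestRegressor)
from sklearn.linear_model import BayesianRidge, Lasso, LinearRegression, Ridge
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsRegressor
from sklearn.tree import DecisionTreeRegressor

models = {
    "LR": LinearRegression(), "LASSO": Lasso(), "Ridge": Ridge(),
    "BRR": BayesianRidge(), "KNN": KNeighborsRegressor(),
    "DT": DecisionTreeRegressor(random_state=42),
    "RF": RandomForestRegressor(random_state=42),
    "GBR": GradientBoostingRegressor(random_state=42),
    "HGB": HistGradientBoostingRegressor(random_state=42),
    "AdaBoost": AdaBoostRegressor(random_state=42),
}

# Placeholder data; in practice, use the curated 306 x 29 feature matrix.
rng = np.random.default_rng(0)
X, y = rng.normal(size=(306, 29)), rng.normal(size=306)

for name, model in models.items():
    rmse = -cross_val_score(model, X, y, cv=5,
                            scoring="neg_root_mean_squared_error").mean()
    print(f"{name:>8}: cross-validated RMSE = {rmse:.3f} V")
```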
Model interpretation methods
(i) Feature importance: measures the impact of each feature on a model’s predictions. It helps to identify which features have the greatest influence on the target variable and provides insights into the relationships between the input features and the predicted outcome.73, 74 This method applies to tree-based models (DT, RF, XGBoost, CatBoost, AdaBoost, GBR, and HGB).
(ii) SHAP: is a powerful interpretability tool in machine learning that explains the contribution of each feature to model predictions. SHAP treats each feature as a participant, calculating its impact on predictions by comparing how altering feature values changes the output, starting from a baseline prediction like the average. It evaluates all possible feature combinations to assign Shapley values, which precisely measure the influence of each feature. SHAP offers both global insights into feature importance and detailed explanations for individual predictions.75, 76
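As a minimal sketch of how such explanations can be generated for a fitted model, the snippet below uses the model-agnostic explainer of the shap package (tree-specific explainers do not cover AdaBoost); the model and data are placeholders.

```python
import numpy as np
import shap  # pip install shap
from sklearn.ensemble import AdaBoostRegressor

# Placeholder model and data; in practice, use the fitted AdaBoost model and
# the curated feature matrix with its real feature names.
rng = np.random.default_rng(0)
X, y = rng.normal(size=(306, 29)), rng.normal(size=306)
model = AdaBoostRegressor(random_state=42).fit(X, y)

# Model-agnostic explainer built from the prediction function and a data masker.
explainer = shap.Explainer(model.predict, X)
shap_values = explainer(X[:50])       # explain a subset of instances for speed

shap.plots.beeswarm(shap_values)      # global feature impact, as in Figure 9
shap.plots.waterfall(shap_values[0])  # single instance, as in Figures 10 and 12
```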
Results and Discussion
Data curation outcomes
Using the compiled database, an in-depth assessment was performed to verify the reliability and relevance of the collected data. Visualizations, with percentage values displayed on the y-axis, highlight the distribution of the information within the database.
Figure 3 displays data on the typical electrochemical conditions under which glycerol oxidation reactions are conducted. In terms of electrolyte usage (Figure 3a), KOH is shown to be the most used, followed by NaOH. This trend is reflected in the pH analysis (Figure 3c), where glycerol oxidation typically occurs under high pH conditions, around pH 13. Regarding the reference electrode, Figure 3b indicates the reversible hydrogen electrode (RHE) as the standard. This finding supports the standardization of electrochemical results, even though the actual reference electrode used may vary.
Figure 3. Histograms of the analysis for glycerol regarding: (a) electrolyte, (b) reference electrode, and (c) pH.
Regarding the electrochemical reaction outcomes, Figure 4 illustrates the findings related to onset potential (Figure 4a), oxidation (anodic) peak potential (Figure 4b) and achieved current density (Figure 4c). For glycerol oxidation, the most common peak potential is approximately 0.1 V, while the onset potential is around −0.2 V. In terms of current density, the values do not typically exceed 50 mA cm-2.
Figure 4. Histograms of the analysis for glycerol regarding: (a) onset potential, (b) oxidation potential, and (c) current density.
Figure 5 visually represents the elements found in electrodes used for glycerol oxidation. Figure 5a presents the elements deposited on the substrate, with platinum (Pt) being the most common, followed by carbon. It is worth noting that carbon also appears very frequently in other forms, such as reduced graphene oxide (rGO) and carbon nanotubes (CNT). The most used substrate is the glassy carbon electrode (GCE), as shown in Figure 5b. The metal elements used as substrates include nickel (Ni), platinum (Pt), gold (Au), and palladium (Pd). Figure 5c shows the relationship between the components used to form an electrode, indicating that most electrodes consist of three components.
Figure 5. Representation of the chemical elements in the working electrode. Histogram (a) illustrates the order of chemical elements and compounds usage; histogram (b) denotes the material of the substrate; histogram (c) presents the relationship between the percentage of entries and the number of chemical elements present in the working electrodes.
Figure 6 visually represents the elements found in electrodes used for glycerol oxidation. Carbon, in its various forms, is the most used element. Additionally, the analysis reveals the frequent use of transition metals from groups d6, d7, d8, and d9 in the electrodes, with the exclusion of elements from the last period of the periodic table.
Figure 6. Representation of the chemical elements in WE for glycerol oxidation. In the periodic table, the intensity of yellow shading reflects the frequency of the presence of each element in the electrode, while green indicates elements that are not utilized (adapted from reference 36).
Machine learning models
In our work, one of the main objectives was to develop predictive models that can help us evaluate and design new high-performance catalysts for glycerol oxidation, with performance assessed by the onset potential and the oxidation potential. We chose to evaluate a relatively small dataset across a range of machine learning models, highlighting those with the best and worst predictive accuracy. Accuracy here refers to how closely the model predictions match the experimental data, as quantified by statistical metrics. Ensemble techniques (e.g., RF, AdaBoost), which employ multiple decision trees to improve prediction accuracy, demonstrated the highest predictive capability among the constructed models. These methods stand out because they significantly reduce the risk of overfitting by averaging the predictions of numerous models, thus providing more stable and generalized results. Moreover, they enhance interpretability, allowing us to gain insights into the importance of various features. Both methods typically require minimal hyperparameter tuning, making them more user-friendly compared with other, more complex models.
Onset potential predictions
To predict the onset potential, the 29 previously presented features from the database were used to train the model. The same feature set was used for all 14 algorithms employed in both tasks: onset potential and oxidation (anodic) peak potential. Among the 14 algorithms explored, the AdaBoost algorithm demonstrated superior performance for this prediction task, presenting the lowest RMSE and MSE and the highest R2 (Figure 7). The model produced the following evaluation metrics: MSE = 0.014, RMSE = 0.118, R2 = 0.903.
Figure 7. Evaluation of algorithms for the onset potential model based on RMSE and R2 (a), and the average training time of the algorithms (b).
As shown in Figure 7a, the decision tree-based models (DT, RF, XGBoost, CatBoost, AdaBoost, GBR, and HGB) exhibit the best R2 and RMSE values. As expected, these models are highly effective in capturing complex and nonlinear relationships, providing superior performance by accounting for the influence of multiple variables on the outcomes.
Linear models (LR, LASSO, and ridge) are highly efficient due to their short training times, as shown in Figure 7b, and minimal need for hyperparameter tuning. However, they mainly serve as baselines for capturing linear relationships and struggle with nonlinear complexities, leading to the poorest performance. Bayesian models (BRR, GPR) and instance-based models (KNN) rely on statistical principles and proximity metrics, respectively, and often require large datasets, which is not the case in this study. ANNs, although powerful in modeling intricate patterns, are sensitive to data quality and hyperparameter tuning, and demand large datasets. Thus, these models also exhibit low predictive capability in this context.
It is worth noting that decision tree-based models required the longest training times. However, given the limited number of features and data, these time differences are on the order of seconds, which does not result in significant time consumption. While training time is reported for completeness, it is not a limiting factor in this study, as the primary focus is on prediction accuracy rather than speed, especially considering that the experimental process remains the true bottleneck.
The relationship between the actual onset potential values and the corresponding predicted values for the test dataset is shown in Figure 8a. The predictions that align most closely with the actual values are concentrated around −0.25 V, the most typical region of the dataset (Figure 4a). Figure 8b illustrates the feature importance method, where the importance of each feature was evaluated for the best-performing model, AdaBoost. E1_av stands out as the most important feature, demonstrating how its inclusion improved the predictions. The pH and the difference between the onset and oxidation potentials are the next two most important features, followed by the first element of the deposited material (E1) and the electrolyte. The “non-elemental” features, such as E1_av, E1, E2, E3, E3/4_av, and E4, proved to be significant for the model, with their inclusion contributing to the improved performance of the models.
Figure 8. (a) Relationship between the real values and predicted values for the test dataset. (b) Importance of input variables (features) for AdaBoost trained to predict the onset potential. The black bars indicate the standard deviation of each feature across 9 decision trees.
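The per-feature error bars in Figure 8b can be reproduced, in sketch form, by collecting the importances assigned by each boosted tree; everything below (data, seed, the nine-tree ensemble) is an illustrative assumption.

```python
import numpy as np
from sklearn.ensemble import AdaBoostRegressor

# Placeholder data and a nine-tree ensemble, mirroring the nine trees of Figure 8b.
rng = np.random.default_rng(0)
X, y = rng.normal(size=(306, 29)), rng.normal(size=306)
model = AdaBoostRegressor(n_estimators=9, random_state=42).fit(X, y)

# Importance of every feature in each boosted tree, then mean and spread
# (the spread corresponds to the black error bars in Figure 8b).
per_tree = np.array([tree.feature_importances_ for tree in model.estimators_])
mean_imp, std_imp = per_tree.mean(axis=0), per_tree.std(axis=0)
for i in np.argsort(mean_imp)[::-1][:5]:
    print(f"feature {i:2d}: {mean_imp[i]:.3f} ± {std_imp[i]:.3f}")
```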
In Figure 9, the SHAP values are visualized, providing insights into how different features impact the predicted potential values. Through a global interpretation of the model, we identified the ten most relevant features for our experiments.
Figure 9. Feature impact in the models according to SHAP values for (a) the onset potential model using the AdaBoost algorithm; (b) the oxidation potential model using the AdaBoost algorithm.
In Figure 9a, it is evident that the five most important features identified in Figure 8b remain consistent. A stronger impact of E1_av on the model is observed as the values of these electronegativities progressively decrease. Similarly, electrolytes with lower encoded values strongly influence the model, just as lower pH values do. These conclusions are interdependent, as it is well established that oxidation potentials for glycerol are significantly worse in acidic environments. In this context, the pH of the electrolyte strongly dictates the range of the potential values, with lower pH values, typically associated with acidic electrolytes, exerting a major influence on the model. This aligns with the observed trend of higher oxidation and onset potentials under acidic conditions.
In Figure 10, we can observe the impact of each feature on the predicted value for a given instance. In the example provided, the base prediction of the reference model is −0.104 V. We can then see how each feature contributes to shifting this value to the predicted onset potential of −0.306 V (which matches exactly the value reported for this instance). Notably, the electronegativity of palladium and the choice of KOH as the electrolyte (basic medium) play a significant role in bringing the prediction closer to the true value.
Figure 10. Waterfall plot of SHAP values for a selected instance in the dataset, illustrating the contributions of features to the prediction of the onset potential using the AdaBoost algorithm.
Oxidation potential predictions
To predict the oxidation potential, the same process used for the onset potential was applied. Among the 14 algorithms explored, the AdaBoost model once again demonstrated superior performance for this prediction task, presenting the lowest RMSE and MSE and the highest R2 (Figure 11a). The model produced the following evaluation metrics: MSE = 0.0149, RMSE = 0.1221, and R2 = 0.8705.
Figure 11. Evaluation of algorithms for the oxidation potential model based on RMSE and R2 (a), and the average training time of the algorithms (b).
The trends observed for the onset potential remain consistent for the oxidation potential. Decision tree-based models demonstrate superior metrics, whereas linear models, KNN, Bayesian models, and ANNs exhibit lower predictive performance. As expected, the training time follows the same trend, with linear models requiring the shortest times and decision tree-based models requiring the longest as shown in Figure 11b.
In Figure 12, we once again observe how each feature influences the predicted oxidation potential for a given instance. In this example, the base prediction of the reference model is 0.21 V. We can then track how each feature contributes to shifting this value toward the predicted oxidation potential of 0.886 V (which deviates from the actual value by −0.03984 V). Notably, the electrolyte (H2SO4) emerges as the primary factor driving the increase in oxidation potential, aligning with findings in the literature77 on glycerol oxidation. It is also interesting to note that the presence of tellurium as the second element, an element rarely found in the dataset, contributes to a more accurate prediction of the oxidation potential. Additionally, the acidic pH plays a significant role in increasing this potential.
Figure 12. Waterfall plot of SHAP values for a selected instance in the dataset, illustrating the contributions of features to the prediction of the oxidation potential using the AdaBoost algorithm.
Figure 13b illustrates the importance of each feature for the AdaBoost model. As in the onset potential model, the features “ox_onset”, “pH”, and “electrolyte” once again showed significance. The E1_av feature also proved important, highlighting the effectiveness of this approach. A_concentration appears with greater importance than in the onset model; however, the most important features remain similar, with some changes in their order of importance. In Figure 9b, the SHAP values for the oxidation potential model are visualized. Similar trends are observed, though with a change in the order of importance of the features. Still, trends such as higher pH and electrolyte values leading to higher predicted potentials can be seen, and the opposite trend appears to hold as well. Notably, for both the onset model (Figure 9a) and the oxidation model (Figure 9b), the same features retain the greatest importance. However, the El_conc feature, which was of high importance for the onset model, is replaced by rGO, which showed significant importance in the oxidation model.
Figure 13. (a) Relationship between the real values and predicted values for the test dataset. (b) Importance of input variables (features) for AdaBoost trained to predict the oxidation potential. The black bars indicate the standard deviation of each feature across 40 decision trees.
Comparison between algorithms
To analyze the algorithms used for the ML models, a comparison was established among them, as shown in Table 1 and Figure 14, aiming to determine which algorithms might be better suited for this type of electrochemical problem, which involves a relatively small database and a high number of features. The comparison considered the training time for onset potential prediction (Figure 7b), the complexity of hyperparameter tuning (where simpler tuning was preferred), and the prediction accuracy for the onset and oxidation potentials based on RMSE values. Linear model algorithms tend to offer easy and fast application; however, they produce poor results for potential prediction, with errors exceeding 0.259 V. Tree-based models, though more complex and time-consuming to apply, yield excellent results for potential prediction, especially AdaBoost and CatBoost, with errors lower than 0.138 V. Bayesian and instance-based models appear to perform the worst, requiring significant time and effort to apply while delivering poor prediction results, with errors reaching up to 0.297 V. While neural networks have training times and hyperparameter tuning complexity similar to those of tree-based models, their prediction error is considerably higher.
Table 1. Comparison between the ML algorithms, showing the training time of the onset models, the RMSE of the onset potential prediction, the RMSE of the oxidation potential prediction, and the complexity of hyperparameter tuning.
Figure 14. Assessment of ML algorithms: training time, hyperparameter tuning complexity, and prediction accuracy for the onset and oxidation potentials, with a general model assessment based on the characteristics analyzed.
Conclusions
This study demonstrates the potential of machine learning to predict oxidation potentials in glycerol electrooxidation reactions. Using a comprehensive dataset compiled from 155 research articles, we trained 14 different machine learning models. The AdaBoost model achieved the best performance, with an RMSE of 0.118 and R2 of 0.9 for onset potential prediction, and an RMSE of 0.122 and R2 of 0.87 for oxidation potential prediction.
To enhance model interpretability, we applied feature importance and SHAP analyses, identifying variables such as pH, electrolyte type, and electrode properties as the most significant for predicting oxidation potentials. Previous work17 used elemental representations as features, which resulted in a large feature count. Notably, we adopted a non-elemental representation approach, replacing individual elemental features with atomic number and electronegativity values. This approach not only reduced the feature count but also improved prediction accuracy, underscoring its efficiency for machine learning applications in electrochemistry.
These findings highlight the capacity of machine learning to accelerate research in electrocatalysis, aiding in the optimization of catalysts for glycerol electrooxidation. This work underscores the importance of computational tools in advancing biomass valorization, where glycerol can be transformed into high-value-added products, supporting the transition to a more sustainable energy landscape.
Finally, this study lays the groundwork for future research by exploring machine learning applications across other electrochemical systems. Expanding this approach to additional features and developing even more robust models holds potential for revolutionary advancements in electrocatalysis, guiding both the discovery of new materials and the optimization of electrochemical processes for diverse applications.
Although our models perform well, ensuring generalizability to unseen compositions requires larger datasets and the adoption of new machine learning paradigms, such as active learning, which can iteratively expand data coverage.
Supplementary Information
Supplementary data are available free of charge at http://jbcs.sbq.org.br as a PDF file.
Acknowledgments
All authors are grateful for the financial support of Brazilian funding agencies. This study was financed in part by Coordenação de Aperfeiçoamento de Pessoal de Nível Superior - Brasil (CAPES), finance code 001, Conselho Nacional de Desenvolvimento Científico e Tecnológico (CNPq, 308203/2021-6), and Fundação de Amparo à Pesquisa do Estado de São Paulo (FAPESP, grants No. 2017/11986-5, 2021/05976-2, and 2024/09666-6).
Data Availability Statement
The complete machine learning code and dataset is available at: https://github.com/cypr1ss/machine-learning-algorithms-for-predicting-the-oxidation-potential-of-glycerol.git.
References
1 Fan, L.; Liu, B.; Liu, X.; Senthilkumar, N.; Wang, G.; Wen, Z.; Energy Technol. 2021, 9, 2000804. [Crossref]
2 Mai, H.; Le, T. C.; Chen, D.; Winkler, D. A.; Caruso, R. A.; Chem. Rev. 2022, 122, 13478. [Crossref]
3 Othman, P.; Karim, N.; Kamarudin, S.; Chem. Phys. 2023, 564, 111711. [Crossref]
4 Palmara, G.; Carvajal, D.; Zanatta, M.; Mas-Marza, E.; Sans, V.; Curr. Res. Green Sustainable Chem. 2023, 7, 100386. [Crossref]
5 Babar, P.; Mahmood, J.; Maligal-Ganesh, R. V.; Kim, S. J.; Xue, Z.; Yavuz, C. T.; J. Mater. Chem. A 2022, 10, 20218. [Crossref]
6 Birdja, Y. Y.; Pérez-Gallent, E.; Figueiredo, M. C.; Göttle, A. J.; Calle-Vallejo, F.; Koper, M. T. M.; Nat. Energy 2019, 4, 732. [Crossref]
7 Chi, H.; Liang, Z.; Kuang, S.; Jin, Y.; Li, M.; Yan, T.; Lin, J.; Wang, S.; Zhang, S.; Ma, X.; Nat. Commun. 2025, 16, 979. [Crossref]
8 Wang, S.; Lin, Y.; Li, Y.; Tian, Z.; Wang, Y.; Lu, Z.; Ni, B.; Jiang, K.; Yu, H.; Wang, S.; Yin, H.; Chen, L.; Nat. Nanotechnol. 2025. [Crossref]
9 He, Z.; Hwang, J.; Gong, Z.; Zhou, M.; Zhang, N.; Kang, X.; Han, J. W.; Chen, Y.; Nat. Commun. 2022, 13, 3777. [Crossref]
10 Coutanceau, C.; Zalineeva, A.; Baranton, S.; Simoes, M.; Int. J. Hydrogen Energy 2014, 39, 15877. [Crossref]
11 Coutanceau, C.; Baranton, S.; Kouamé, R. S. B.; Front. Chem. 2019, 7, 100. [Crossref]
12 Liu, J.; Liu, H.; Wang, Q.; Li, T.; Yang, T.; Zhang, W.; Xu, H.; Li, H.; Qi, X.; Wang, Y.; Cabot, A.; Chem. Eng. J. 2024, 486, 150258. [Crossref]
13 Hamada, T.; Chiku, M.; Higuchi, E.; Randall, C. A.; Inoue, H.; ACS Appl. Energy Mater. 2024, 7, 1970. [Crossref]
14 Wu, G.; Dong, X.; Mao, J.; Li, G.; Zhu, C.; Li, S.; Chen, A.; Feng, G.; Song, Y.; Chen, W.; Wei, W.; Chem. Eng. J. 2023, 468, 143640. [Crossref]
15 Yonamine, N. C.; Zanata, C. R.; de Souza, M. B. C.; Fernandez, P. S.; Wender, H.; Martins, C. A.; ACS Appl. Mater. Interfaces 2024, 16, 18918. [Crossref]
16 Chen, W.; Zhang, L.; Xu, L.; He, Y.; Pang, H.; Wang, S.; Zou, Y.; Nat. Commun. 2024, 15, 2420. [Crossref]
17 Oh, L. S.; Han, J.; Lim, E.; Kim, W. B.; Kim, H. J.; Catalysts 2023, 13, 892. [Crossref]
18 Li, Z.; Wang, S.; Xin, H.; Nat. Catal. 2018, 1, 641. [Crossref]
19 Xin, H.; Mou, T.; Pillai, H. S.; Wang, S. H.; Huang, Y.; Acc. Mater. Res. 2024, 5, 22. [Crossref]
20 Chen, B.; Miao, H.; Hu, R.; Yin, M.; Wu, X.; Sun, S.; Wang, Q.; Li, S.; Yuan, J.; J. Alloys Compd. 2021, 852, 157012. [Crossref]
21 Greeley, J.; Annu. Rev. Chem. Biomol. Eng. 2016, 7, 605. [Crossref]
22 Nørskov, J. K.; Abild-Pedersen, F.; Studt, F.; Bligaard, T.; Proc. Natl. Acad. Sci. 2011, 108, 937. [Crossref]
23 Wan, X.; Zhang, Z.; Yu, W.; Guo, Y.; Mater. Rep.: Energy 2021, 1, 100046. [Crossref]
24 Steinmann, S. N.; Wang, Q.; Seh, Z. W.; Mater. Horiz. 2023, 10, 393. [Crossref]
25 Wang, T.; Wu, Q.; Han, Y.; Guo, Z.; Chen, J.; Liu, C.; Appl. Phys. Rev. 2025, 12, 011316. [Crossref]
26 Fiedler, L.; Shah, K.; Bussmann, M.; Cangi, A.; Phys. Rev. Mater. 2022, 6, 040301. [Crossref]
27 Shi, J.; Pršlja, P.; Jin, B.; Suominen, M.; Sainio, J.; Jiang, H.; Han, N.; Robertson, D.; Košir, J.; Caro, M.; Kallio, T.; Small 2024, 20, 2402190. [Crossref]
28 Miao, L.; Jia, W.; Cao, X.; Jiao, L.; Chem. Soc. Rev. 2024, 55, 2771. [Crossref]
29 Shi, Y. F.; Yang, Z. X.; Ma, S.; Kang, P. L.; Shang, C.; Hu, P.; Liu, Z. P.; Engineering 2023, 27, 70. [Crossref]
30 Gunning, D.; Stefik, M.; Choi, J.; Miller, T.; Stumpf, S.; Yang, G. Z.; Sci. Rob. 2019, 4, eaay7120. [Crossref]
31 Salih, A. M.; Raisi-Estabragh, Z.; Galazzo, I. B.; Radeva, P.; Petersen, S. E.; Lekadir, K.; Menegaz, G.; Adv. Intell. Syst. 2025, 7, 2400304. [Crossref]
32 Wilhelm, A.; Zweig, K. A.; ArXiv 2024. [Crossref]
33 Macha, D.; Kozielski, M.; Wróbel, Ł.; Sikora, M.; SoftwareX 2022, 20, 101209. [Link] accessed in May 2025
34 Elgrishi, N.; Rountree, K. J.; McCarthy, B. D.; Rountree, E. S.; Eisenhart, T. T.; Dempsey, J. L.; J. Chem. Educ. 2018, 95, 197. [Crossref]
35 Sanchis-Gual, R.; da Silva, A. S.; Coronado-Puchau, M.; Otero, T. F.; Abellán, G.; Coronado, E.; Electrochim. Acta 2021, 388, 138613. [Crossref]
36 von Zuben, T. W.; Salles, A. G.; Bonacin, J. A.; Barbon, S.; Electrochim. Acta 2025, 509, 145285. [Crossref]
37 Fedorov, R.; Gryn’ova, G.; J. Chem. Theory Comput. 2023, 19, 4796. [Crossref]
38 Jia, L.; Brémond, E.; Zaida, L.; Gaüzère, B.; Tognetti, V.; Joubert, L.; J. Comput. Chem. 2024, 45, 2383. [Crossref]
39 Drevon, D.; Fursa, S. R.; Malcolm, A. L.; Behav. Modif. 2017, 41, 323. [Crossref]
40 Google Colaboratory, https://colab.research.google.com, accessed in May 2025.
41 Gupta, P.; Bagchi, A.; Essentials of Python for Artificial Intelligence and Machine Learning; Springer: Cham, Switzerland, 2024, ch. 5. [Crossref]
42 Harris, C. R.; Millman, K. J.; van der Walt, S. J.; Gommers, R.; Virtanen, P.; Cournapeau, D.; Wieser, E.; Taylor, J.; Berg, S.; Smith, N. J.; Kern, R.; Picus, M.; Hoyer, S.; van Kerkwijk, M. H.; Brett, M.; Haldane, A.; Del Río, J. F.; Wiebe, M.; Peterson, P.; Gérard-Marchant, P.; Sheppard, K.; Reddy, T.; Weckesser, W.; Abbasi, H.; Gohlke, C.; Oliphant, T. E.; Nature 2020, 585, 357. [Crossref]
43 Virtanen, P.; Gommers, R.; Oliphant, T. E.; Haberland, M.; Reddy, T.; Cournapeau, D.; Burovski, E.; Peterson, P.; Weckesser, W.; Bright, J.; van der Walt, S. J.; Brett, M.; Wilson, J.; Millman, K. J.; Mayorov, N.; Nelson, A. R. J.; Jones, E.; Kern, R.; Larson, E.; Carey, C. J.; Polat, İ.; Feng, Y.; Moore, E. W.; VanderPlas, J.; Laxalde, D.; Perktold, J.; Cimrman, R.; Henriksen, I.; Quintero, E. A.; Harris, C. R.; Archibald, A. M.; Ribeiro, A. H.; Pedregosa, F.; van Mulbregt, P.; Vijaykumar, A.; Bardelli, A. P.; Rothberg, A.; Hilboll, A.; Kloeckner, A.; Scopatz, A.; Lee, A.; Rokem, A.; Woods, C. N.; Fulton, C.; Masson, C.; Häggström, C.; Fitzgerald, C.; Nicholson, D. A.; Hagen, D. R.; Pasechnik, D. V.; Olivetti, E.; Martin, E.; Wieser, E.; Silva, F.; Lenders, F.; Wilhelm, F.; Young, G.; Price, G. A.; Ingold, G. L.; Allen, G. E.; Lee, G. R.; Audren, H.; Probst, I.; Dietrich, J. P.; Silterra, J.; Webber, J. T.; Slavič, J.; Nothman, J.; Buchner, J.; Kulick, J.; Schönberger, J. L.; Cardoso, J. V. M.; Reimer, J.; Harrington, J.; Rodríguez, J. L. C.; Nunez-Iglesias, J.; Kuczynski, J.; Tritz, K.; Thoma, M.; Newville, M.; Kümmerer, M.; Bolingbroke, M.; Tartre, M.; Pak, M.; Smith, N. J.; Nowaczyk, N.; Shebanov, N.; Pavlyk, O.; Brodtkorb, P. A.; Lee, P.; McGibbon, R. T.; Feldbauer, R.; Lewis, S.; Tygier, S.; Sievert, S.; Vigna, S.; Peterson, S.; More, S.; Pudlik, T.; Oshima, T.; Pingel, T. J.; Robitaille, T. P.; Spura, T.; Jones, T. R.; Cera, T.; Leslie, T.; Zito, T.; Krauss, T.; Upadhyay, U.; Halchenko, Y. O.; Vázquez-Baeza, Y.; Nat. Methods 2020, 17, 261. [Crossref]
44 Hao, J.; Ho, T. K.; J. Educ. Behav. Stat. 2019, 44, 348. [Crossref]
45 Pang, B.; Nijkamp, E.; Wu, Y. N.; J. Educ. Behav. Stat. 2020, 45, 227. [Crossref]
46 Bisong, E.; Building Machine Learning and Deep Learning Models on Google Cloud Platform; Apress: Berkeley, CA, 2019, ch. 12. [Crossref]
47 Nelson, R.; pHcalc, version 0.2.0; GitHub/PyPI, USA, 2023.
48 Seh, Z. W.; Kibsgaard, J.; Dickens, C. F.; Chorkendorff, I.; Nørskov, J. K.; Jaramillo, T. F.; Science 2017, 355, eaad4998. [Crossref]
49 Xu, W.; Diesen, E.; He, T.; Reuter, K.; Margraf, J. T.; J. Am. Chem. Soc. 2024, 146, 7698. [Crossref]
50 Kusne, A. G.; Yu, H.; Wu, C.; Zhang, H.; Hattrick-Simpers, J.; DeCost, B.; Sarker, S.; Oses, C.; Toher, C.; Curtarolo, S.; Davydov, A. V.; Agarwal, R.; Bendersky, L. A.; Li, M.; Mehta, A.; Takeuchi, I.; Nat. Commun. 2020, 11, 5966. [Crossref]
51 Moon, J.; Beker, W.; Siek, M.; Kim, J.; Lee, H. S.; Hyeon, T.; Grzybowski, B. A.; Nat. Mater. 2024, 23, 108. [Crossref]
52 Kim, M.; Ha, M. Y.; Jung, W.; Yoon, J.; Shin, E.; Kim, I.; Lee, W. B.; Kim, Y.; Jung, H.; Adv. Mater. 2022, 34, 2108900. [Crossref]
53 Wei, C.; Shi, D.; Zhou, F.; Yang, Z.; Zhang, Z.; Xue, Z.; Mu, T.; Phys. Chem. Chem. Phys. 2023, 25, 7917. [Crossref]
54 Wei, S.; Luo, Y.; Zhang, K.; Zhang, Z.; Liu, G.; Chem. Phys. 2024, 579, 112197. [Crossref]
55 Aremu, O. O.; Hyland-Wood, D.; McAree, P. R.; Reliab. Eng. Syst. Saf. 2020, 195, 106706. [Crossref]
56 Jiang, X.; Wang, Y.; Jia, B.; Qu, X.; Qin, M.; ACS Appl. Mater. Interfaces 2022, 14, 41141. [Crossref]
57 Mulliken, R. S.; J. Chem. Phys. 1934, 2, 782. [Crossref]
58 Su, X.; Yan, X.; Tsai, C.; WIREs Comput. Stat. 2012, 4, 275. [Crossref]
59 Czajkowski, M.; Kretowski, M.; Appl. Soft Comput. 2016, 48, 458. [Crossref]
60 Keller, C. A.; Evans, M. J.; Geosci. Model Dev. 2019, 12, 1209. [Crossref]
61 Chen, T.; Guestrin, C.; KDD’16: Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining; San Diego, CA, USA, 2016, p. 785. [Crossref]
62 Wu, Z.; Wang, X.; Jiang, B.; Appl. Sci. 2020, 10, 3258. [Crossref]
63 Zemel, R.; Pitassi, T. In Advances in Neural Information Processing Systems, vol. 13; Leen, T.; Dietterich, T.; Tresp, V., eds.; MIT Press: Massachusetts, USA, 2000. [Link] accessed in May 2025
64 Efendi, A.; Effrihan; AIP Conf. Proc. 2017, 1913, 020031. [Crossref]
65 Schulz, E.; Speekenbrink, M.; Krause, A.; J. Math. Psychol. 2018, 85, 1. [Crossref]
66 Sitienei, M.; Otieno, A.; Anapapa, A.; Asian J. Probab. Stat. 2023, 24, 1. [Crossref]
67 Ranstam, J.; Cook, J. A.; Br. J. Surg. 2018, 105, 1348. [Crossref]
68 McDonald, G. C.; WIREs Comput. Stat. 2009, 1, 93. [Crossref]
69 Guryanov, A. In Analysis of Images, Social Networks and Texts; van der Aalst, W., ed.; Springer: Cham, Switzerland, 2019, p. 39. [Crossref]
70 Wu, S.; Nagahashi, H.; J. Electr. Comput. Eng. 2015, 2015, 835357. [Crossref]
71 Hancock, J. T.; Khoshgoftaar, T. M.; J. Big Data 2020, 7, 94. [Crossref]
72 Géron, A.; Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow: Concepts, Tools, and Techniques to Build Intelligent Systems; O’Reilly Media: Cambridge, USA, 2022.
73 Calle, M. L.; Urrea, V.; Briefings Bioinf. 2011, 12, 86. [Crossref]
74 Gregorutti, B.; Michel, B.; Saint-Pierre, P.; Stat. Comput. 2017, 27, 659. [Crossref]
75 Aas, K.; Jullum, M.; Løland, A.; Artif. Intell. 2021, 298, 103502. [Crossref]
76 Vitor, A. L. O.; Goedtel, A.; Barbon, S.; Bazan, G. H.; Castoldi, M. F.; Souza, W. A.; Expert Syst. Appl. 2023, 224, 119998. [Crossref]
77 Gomes, J. F.; Martins, C. A.; Giz, M. J.; Tremiliosi-Filho, G.; Camara, G. A.; J. Catal. 2013, 301, 154. [Crossref]
Edited by
Editor who handled this article: Maurício Coutinho Neto (Guest)

Publication Dates
Publication in this collection: 04 July 2025
Date of issue: 2025

History
Received: 22 Nov 2024
Accepted: 27 May 2025