Abstract
This paper describes the development, evaluation, features and applications of Chromophoreasy, an alternative Excel-based program for recognition and integration of chromatographic and electrophoretic peaks. Peak recognition follows parameters adjustable by the analyst, such as time range, noise smoothing window size and slope/curvature sensitivity. During integration, retention/migration time, area, height, half-height width, plate number, asymmetry factor, US Pharmacopeia tailing factor, resolution and statistical moments are determined. A chromatogram/electropherogram is plotted along with the baselines found. The effect of peak shape (height and symmetry) and baseline slope on accuracy was evaluated, and the precision of recognition/integration was investigated under several simulated conditions, with varied signal-to-noise levels, smoothing modes and smoothing window sizes. Data from liquid and gas chromatography, capillary electrophoresis and electrochromatography with refractive index, flame ionization, capacitively coupled contactless conductivity (lab-made) and ultraviolet absorbance detection, respectively, were treated, illustrating the broad applicability of the proposed program for standard and sample analysis. Statistically similar results were obtained when compared with commercial software, showing the program to be a simple, practical and reliable tool for general use in the separation area.
Keywords:
peak recognition; peak integration; chromatography; capillary electrophoresis; Excel macros
Introduction
Separation science clearly occupies a prominent position in analytical chemistry. Several advantages, such as the sensitivity of detection modes, the selectivity and efficiency of separation columns and short analysis times, have led the chromatographic and electromigration techniques to this high level.^{1-3} However, the dispersion of the analyte molecules during their continuous and differential motion along the separation system is one of the main unavoidable separation characteristics, so that the analyte registration should be as representative as possible. Since the presence of an analyte is observed through the appearance of a chromatographic/electrophoretic peak, whose height and area are sensitive to concentration,^{4} typical responses such as efficiency, resolution, signal-to-noise ratio, analysis time and symmetry, among others, depend entirely on accurate measurements made on that peak during optimization^{5,6} and validation^{7,8} of a method.
Since the allocation (even if automated) of peak boundaries and the subsequent peak integration are not necessarily accurate and precise, owing to the presence of noise, for instance,^{9} the errors associated with these procedures may propagate to the final result of an analysis. This issue becomes more critical with the increasing demand for faster analyses and narrower peaks, which motivates the developers of algorithms for chromatographic data treatment. Therefore, various peak recognition methods^{10-15} and mathematical models for deconvolution of overlapping peaks and integration in noisy and complex systems^{16,17} have been developed and evaluated. However, there is still a shortage of programs dedicated to chromatographic or electrophoretic data processing that are simple, practical and accessible to researchers, students and specialized laboratories. Furthermore, to our knowledge, no program allows one to inspect the details of the calculations used during peak recognition and integration. Thus, the way programs perform these procedures may be impossible to understand or control.
In this context, this paper describes the development, evaluation, features and some applications of Chromophoreasy, an alternative program proposed for recognition and integration of chromatographic and electrophoretic peaks, developed in Visual Basic for Applications (VBA), which operates in the familiar Microsoft Excel (2010) environment. In fact, Excel has been successfully employed in relevant works, such as an interface for analysis of liquid chromatography (LC)-mass spectrometry (MS) metabolomics data,^{18} retention prediction and separation optimization in LC,^{19} and simulation of chromatographic runs under column and detector viewpoints.^{20} Chromophoreasy was tested on several simulated and experimental chromatograms, electrochromatograms and electropherograms. Some of these applications are shown here, and some results were compared with those obtained from the Agilent ChemStation and Shimadzu GC Solution software.
Experimental
Since several experimental analyses through different separation techniques were carried out, the description of chemicals, reagents and instrumental conditions is extensive. Thus, this section describes only the basic elements that allow one to understand the purpose of the present work. Further details are available in the Supplementary Information, under the Experimental section.
Samples
For demonstration of program use on real samples, a commercial milk sample was treated according to literature reports^{21} and biodiesel samples were obtained from transesterification reactions.^{22}
Instrumental
The LC system used was a Breeze Modular high performance liquid chromatograph (HPLC) (Waters, Milford, USA) equipped with a refractive index detector for modular systems and controlled by Waters Breeze High Performance LC software. The method conditions were based on previous work.^{21} The gas chromatography (GC) analyses were performed in a GC 2010Plus gas chromatograph (Shimadzu, Kyoto, Japan), equipped with a flame ionization detector (FID) and controlled by Shimadzu GC Solution software (V. 2.32.00). The method conditions were based on the literature.^{23} For capillary zone electrophoresis (CZE) analysis of organic acid standards, an Agilent 1600 capillary electrophoresis (CE) system (HP^{3d} CE, Palo Alto, USA) equipped with a diode array detector (DAD) and controlled by HP ChemStation software (rev A.06.01) was used. The running conditions were based on a recent work.^{24} The CZE analyses of lactose and lactulose standards, based on literature reports,^{25} were performed in an Agilent 7100 CE system controlled by Agilent ChemStation software (rev. B.04.03) and equipped with a DAD and a lab-made capacitively coupled contactless conductivity detector (C^{4}D).^{26} For analysis of polycyclic aromatic hydrocarbon (PAH) standards by capillary electrochromatography (CEC), based on a previous work,^{27} the Agilent 7100 CE system was also used.
Chromatogram simulation
Simulated chromatograms were generated through a peak model based on an exponentially modified Gaussian (EMG) function expressed as:^{28}
where t_{R} is the retention time, h is the peak height, a is an asymmetry term (-2 < a < 2, a ≠ 0) and s is the standard deviation (SD). This simple EMG function allows the generation of peaks whose apex coordinates are exactly known, which is useful for evaluating the integrator performance in the determination of "experimental" t_{R} and h. Fronting and tailing peaks are obtained with negative and positive a values, respectively. Symmetrical Gaussian curves are obtained if a is approximately equal to zero. For chromatograms with multiple peaks, each peak was generated independently and the peaks were then summed (f_{1}(t) + f_{2}(t) + ...). Noise with Gaussian distribution was calculated with a function that returns the inverse of the normal cumulative distribution for a random probability, with mean equal to zero and SD equal to one. The noise values were multiplied by a scaling factor and added to the chromatograms.
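The noise-generation step described above can be sketched in Python as follows. This is only an illustration under stated assumptions: the peak is a plain symmetric Gaussian (the limit of a close to zero), not the asymmetric model of equation 1, and the function names `gaussian_peak`, `standard_normal_noise` and `simulate` are hypothetical, not part of Chromophoreasy.

```python
import math
import random
from statistics import NormalDist


def gaussian_peak(t, t_r, h, s):
    """Symmetric Gaussian peak (assumed stand-in for the a -> 0 limit)."""
    return h * math.exp(-((t - t_r) ** 2) / (2.0 * s ** 2))


def standard_normal_noise(rng):
    """Inverse normal CDF (mean 0, SD 1) of a uniform random probability."""
    p = rng.random()
    while p == 0.0:  # inv_cdf is only defined for 0 < p < 1
        p = rng.random()
    return NormalDist(0.0, 1.0).inv_cdf(p)


def simulate(times, peaks, noise_scale=0.0, seed=0):
    """Sum independently generated peaks, then add scaled Gaussian noise."""
    rng = random.Random(seed)
    signal = []
    for t in times:
        y = sum(gaussian_peak(t, t_r, h, s) for (t_r, h, s) in peaks)
        signal.append(y + noise_scale * standard_normal_noise(rng))
    return signal


# Hypothetical two-peak chromatogram: (t_R, h, s) for each peak.
times = [0.5 * i for i in range(121)]
peak_list = [(20.0, 1.0, 2.0), (40.0, 0.5, 3.0)]
clean = simulate(times, peak_list)
noisy = simulate(times, peak_list, noise_scale=0.01)
```

With `noise_scale=0.01` and a peak height of 1.0, the ratio of true height to baseline SD is about 100, the kind of S / N level discussed later in the text.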
Theory and method for peak detection and integration
Several VBA macros were developed for peak recognition, extraction of peaks to distinct spreadsheets, peak integration, fine adjustment of the peak boundaries, splitting of coeluted peaks into distinct spreadsheets, grouping of the integration results, plotting of the chromatogram with baselines, and operation of a command box. All available functions were inserted into the command box, which can be opened in any (Excel) chromatogram file via a keyboard shortcut. However, this section describes only some relevant characteristics that may affect the results discussed in this work. Readers interested in using and viewing other functions and aspects of the program are encouraged to see the details provided in the Supplementary Information, under the Additional tools subsection.
Smoothing function
To evaluate the efficiency of peak detection when the signal-to-noise ratio (S / N) is not sufficiently high, two types of smoothing by convolution were employed and tested. One of them is moving average (MA) smoothing, with the possibility of setting the number of points to be averaged (window). Thus, a chromatogram point (S_{i}) is replaced by an average (S_{i(MA)}), according to the equation given by:
where n, set by the analyst, is an integer greater than zero (if n = 0, there is no smoothing) and less than a third of the total number of chromatogram points. The second, the Savitzky-Golay (SG) smoothing method,^{29} is equivalent to a polynomial fitting, but the original points are simply multiplied by specific integer numbers. The following equation was employed:
The result of this equation is exactly the same as that obtained when the time value (t_{i}), in the middle of a set of nine points, is substituted into a third-order polynomial model fitted to these points by least squares. The advantage of equation 3 over fitting models is its computational simplicity, which leads to a much shorter processing time. The use of SG smoothing requires that the chromatogram have a constant sampling rate. Depending on the selected smoothing mode and window size (for MA smoothing), among other factors discussed below, the behavior of peak recognition may vary considerably, which allows this program to be applied to several chromatographic/electromigration techniques.
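Both smoothing modes can be illustrated with a short Python sketch. Two assumptions are made here: the MA window is taken as centered with 2n + 1 points (consistent with n = 0 meaning no smoothing and with the odd window sizes used later), and the SG filter uses the standard nine-point convolution integers for a cubic fit (-21, 14, 39, 54, 59, 54, 39, 14, -21, normalized by 231). End points that lack a full window are simply left unchanged, which may differ from the program's behavior.

```python
def moving_average(signal, n):
    """Centered moving average over an assumed (2n + 1)-point window."""
    out = list(signal)
    if n == 0:
        return out  # n = 0 means no smoothing
    width = 2 * n + 1
    for i in range(n, len(signal) - n):
        out[i] = sum(signal[i - n:i + n + 1]) / width
    return out


# Standard Savitzky-Golay integers: 9 points, quadratic/cubic smoothing.
SG9_SMOOTH = (-21, 14, 39, 54, 59, 54, 39, 14, -21)
SG9_NORM = 231.0


def savitzky_golay_9(signal):
    """Nine-point SG smoothing; requires a constant sampling rate."""
    out = list(signal)
    for i in range(4, len(signal) - 4):
        window = signal[i - 4:i + 5]
        out[i] = sum(c * y for c, y in zip(SG9_SMOOTH, window)) / SG9_NORM
    return out


spike = [0.0, 0.0, 3.0, 0.0, 0.0]
print(moving_average(spike, 1))  # the spike is spread over three points

cubic = [x ** 3 - 2 * x for x in range(20)]
print(savitzky_golay_9(cubic) == cubic)  # a cubic passes through unchanged
```

The second check reflects the equivalence stated above: because the filter reproduces the value of a fitted third-order polynomial at the window center, a signal that is itself a cubic is not altered.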
Thresholds calculation
A suitable way to distinguish the peaks from the (noisy) baseline, i.e., to recognize peaks, is through first (slope) and second (curvature) derivative analyses of every chromatogram segment (or of a time range only, set by the user). The behavior of a derivative curve is more predictable than that of the chromatogram signal itself, making the peak recognition, in general, more reproducible. Moreover, the simultaneous analysis of first derivatives (FD) and second derivatives (SD) prevents peak apexes, shoulders and valleys between two overlapped peaks from being wrongly interpreted as peak boundaries.
Once the smoothing parameters are defined and the chromatogram is smoothed (or not), the FD are calculated and their median (M_{FD}) is obtained. If the chromatogram has a representative baseline, besides the noise and some peaks, M_{FD} is a robust estimate of the "main" baseline slope. For drifting baselines, for instance, M_{FD} is not necessarily zero, which makes this step important. The deviations between each FD_{i} and M_{FD} are calculated and a new median (M_{D}) is obtained from this set of results. If the chromatogram has no peaks and no noise, M_{D} should be zero. Otherwise, the presence of peaks and noise raises M_{D}. Finally, M_{FD} and M_{D} are combined to provide the threshold range (T_{FD}) as:
where Sens is a sensitivity factor set (and eventually optimized) by the analyst. As Sens increases, the range between the superior and inferior thresholds (T_{S} - T_{I}) decreases and, thus, the peak recognition mechanism becomes more sensitive. The empirical number 5 in this equation gives suitable threshold ranges along with Sens. The same procedure is used to calculate the superior and inferior thresholds for second derivatives, i.e., T_{S(SD)} and T_{I(SD)}, respectively.
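A minimal Python sketch of this threshold calculation is given below. The exact form of equation 4 is not reproduced here, so the sketch assumes T = M ± 5 × M_{D} / Sens, which matches the description (the range narrows as Sens increases and the empirical factor is 5). It also assumes that the deviations are taken as absolute values. The function name `threshold_range` is hypothetical.

```python
from statistics import median


def threshold_range(derivatives, sens=3.0, factor=5.0):
    """Return (T_I, T_S) for a list of first or second derivatives.

    Assumed form: median +/- factor * (median absolute deviation) / sens.
    """
    values = list(derivatives)
    m = median(values)                           # M_FD: "main" baseline slope
    m_d = median([abs(v - m) for v in values])   # M_D: spread from peaks and noise
    half_width = factor * m_d / sens
    return m - half_width, m + half_width


fd = [0.0, 1.0, 2.0, 3.0, 4.0]          # hypothetical derivative values
print(threshold_range(fd, sens=5.0))    # (1.0, 3.0)
print(threshold_range(fd, sens=1.0))    # (-3.0, 7.0): wider, less sensitive
```

Because both statistics are medians, a few large derivative values inside peaks have little influence on the range, while a noisier baseline raises M_{D} and widens it.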
If signal (S) is processed with MA smoothing or it is not smoothed, FD_{i} and SD_{i} are calculated as:
where t is time. When the SG smoothing method is used, the derivatives are based on the fitted third-order polynomial, calculated directly from the original signal as:^{29}
where the terms "t_{i+1} - t_{i}" in the denominators were inserted to make these derivatives dimensionally equivalent to equations 5 and 6.
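The two ways of obtaining derivatives can be sketched in Python as follows. The finite-difference forms are an assumption based on the description of equations 5 and 6 (differences divided by t_{i+1} - t_{i}). The SG forms use the standard nine-point convolution integers for the first derivative of a cubic fit (normalizer 1188) and for the second derivative (normalizer 462), scaled by the sampling interval. The authors' exact expressions may be written differently.

```python
def finite_difference_derivatives(t, s):
    """First and second derivatives by forward differences (assumed form)."""
    fd = [(s[i + 1] - s[i]) / (t[i + 1] - t[i]) for i in range(len(s) - 1)]
    sd = [(fd[i + 1] - fd[i]) / (t[i + 1] - t[i]) for i in range(len(fd) - 1)]
    return fd, sd


# Standard Savitzky-Golay integers for a nine-point window.
SG9_D1 = (86, -142, -193, -126, 0, 126, 193, 142, -86)   # first derivative
SG9_D2 = (28, 7, -8, -17, -20, -17, -8, 7, 28)           # second derivative


def sg_derivatives(t, s):
    """SG derivatives at the window centers (indices 4 .. len(s) - 5)."""
    dt = t[1] - t[0]  # a constant sampling interval is assumed
    fd, sd = [], []
    for i in range(4, len(s) - 4):
        window = s[i - 4:i + 5]
        fd.append(sum(c * y for c, y in zip(SG9_D1, window)) / (1188.0 * dt))
        sd.append(sum(c * y for c, y in zip(SG9_D2, window)) / (462.0 * dt ** 2))
    return fd, sd


t = [0.5 * i for i in range(15)]
s = [x ** 3 for x in t]
fd, sd = sg_derivatives(t, s)
print(fd[0], sd[0])  # derivatives at t = 2.0 for s = t**3: 12.0 and 12.0
```

For a cubic signal the SG derivatives are exact, which makes this a convenient check of the coefficients.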
Peak searching
Once T_{S(FD)}, T_{I(FD)}, T_{S(SD)} and T_{I(SD)} are obtained, the peak scanning procedure is carried out on a matrix containing the first and second derivatives smoothed by MA with a three-point window. While the derivative values are less than T_{S} and greater than T_{I}, no peak start is found. When both FD_{i} and SD_{i} are greater than their respective T_{S}, a positive peak start is defined with the time coordinate of the point i - 2 ("-2" compensates a shift made by the MA window position, ensuring that the peak baseline will touch the correct chromatogram point). When both FD_{i} and SD_{i} are less than T_{I}, a negative peak start is defined. The algorithm for peak detection is summarized in Figure 1.
Scanning continues from the next point i, now looking for a peak end, i.e., the point at which both FD_{i} and SD_{i} are inside the threshold ranges. In that condition, the time coordinate of point i is defined as a peak end. It is important to stress that "defining" a peak start/end is not the same as "registering" it for integration purposes, which depends on the peak type option previously set by the analyst in the command box. For instance, if the user selected only positive peaks to be detected/integrated, the program will "define" and process both kinds of peaks, but will only "register" positive ones. This process prevents the apex of a negative peak from being interpreted as the start of a positive peak and vice versa.
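The scanning logic described in this subsection can be sketched as a small state machine in Python. This is a simplified reading of the text, not the program's VBA code: the derivative lists are assumed to be already smoothed with the three-point MA, the i - 2 shift is applied to negative peak starts as well (the text states it only for positive ones), and the function name `find_peaks` and its `polarity` argument are hypothetical.

```python
def find_peaks(fd, sd, fd_range, sd_range, polarity="positive"):
    """Return registered peaks as (start_index, end_index, kind) tuples.

    fd_range and sd_range are (T_I, T_S) pairs for first and second derivatives.
    polarity is "positive", "negative" or "both".
    """
    fd_lo, fd_hi = fd_range
    sd_lo, sd_hi = sd_range
    registered = []
    start, kind = None, None
    for i in range(min(len(fd), len(sd))):
        if start is None:
            # Looking for a peak start: both derivatives beyond the same threshold.
            if fd[i] > fd_hi and sd[i] > sd_hi:
                start, kind = max(i - 2, 0), "positive"
            elif fd[i] < fd_lo and sd[i] < sd_lo:
                start, kind = max(i - 2, 0), "negative"
        else:
            # Looking for a peak end: both derivatives back inside their ranges.
            fd_inside = fd_lo <= fd[i] <= fd_hi
            sd_inside = sd_lo <= sd[i] <= sd_hi
            if fd_inside and sd_inside:
                # Every peak is "defined", but only the selected type is "registered".
                if polarity in (kind, "both"):
                    registered.append((start, i, kind))
                start, kind = None, None
    return registered


# Hypothetical derivative traces for one positive peak.
fd = [0.0, 0.0, 0.0, 2.0, 2.0, 0.5, -2.0, -2.0, 0.0, 0.0]
sd = [0.0, 0.0, 0.0, 2.0, 0.0, -2.0, -2.0, 2.0, 0.0, 0.0]
print(find_peaks(fd, sd, (-1.0, 1.0), (-1.0, 1.0)))  # [(1, 8, 'positive')]
```

In the example, the apex (small FD, strongly negative SD) and the inflection points (SD near zero, large FD) do not end the peak, because the end requires both derivatives to be inside their ranges at the same point.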
Integration
The first step of the integration process consists of the construction of a straight baseline fitted to the points of the peak limits (Figure 2a). Parameters such as retention/migration time (t_{R}), height (h), area (A), half-height width (w_{0.5}), plate number (N), asymmetry (As) and US Pharmacopeia tailing factor (Tf) are calculated from the adjusted peak, obtained by subtracting the baseline from the original signal (Figure 2b). t_{R} and h are, respectively, the x- and y-coordinates of the parabola maximum (Figure 2c), obtained through the first derivative of its equation, which is determined with the three highest points of the peak apex.^{30} The same reasoning applies to negative peaks, considering the lowest point of the parabola. The w_{0.5} value is the horizontal distance between the two x-coordinates of the adjusted peak whose y-coordinates are h / 2. Because the coordinates (x, h / 2) probably do not exist in the (discrete) peak data, an interpolation between the two points nearest to (x, h / 2) is made on both sides of the peak (solid red lines in the peak of Figure 2b). In the case of partially coeluted peaks, w_{0.5} is estimated as follows: since w_{0.5} corresponds to 2.355σ (where σ is the standard deviation of a Gaussian curve) and A / h corresponds to 2.507σ, i.e., the width at 45.6% of height,^{30} w_{0.5} = (2.355σ) / (2.507σ) × A / h = 0.93937A / h. The plate number is calculated by N = 5.54t_{R}^{2} / w_{0.5}^{2}, useful for symmetrical peaks. The asymmetry is calculated by As = b_{0.1} / a_{0.1} and the tailing factor by Tf = (a_{0.05} + b_{0.05}) / (2a_{0.05}), where a is the front half-width and b is the back half-width of the peak, measured at 0.1h and 0.05h from the leading or trailing edge of the peak to t_{R} (Figure 2d).
(a) Simulated peak (original signal) with sloped baseline (red line, y = 0.5t - 1); (b) adjusted peak (baseline subtracted), from which t_{R}, h, A, w_{0.5}, N, As and Tf are obtained; (c) apex of the adjusted peak in detail, showing the parabola fitted to the three highest points; (d) base of the peak in detail, showing red straight lines used to obtain a and b, at 0.1h and 0.05h, for As and Tf calculations.
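The apex and width calculations described for the adjusted (baseline-subtracted) positive peak can be sketched in Python as follows. The sketch assumes that the three highest points are the maximum and its two neighbours, that linear interpolation is used at every fractional height, and that the peak falls below 5% of its height on both sides. All function names are hypothetical.

```python
def parabola_apex(t, y):
    """Vertex (t_R, h) of the parabola through the maximum and its two neighbours."""
    k = max(range(1, len(y) - 1), key=lambda i: y[i])
    x0, x1, x2 = t[k - 1], t[k], t[k + 1]
    y0, y1, y2 = y[k - 1], y[k], y[k + 1]
    d01 = (y1 - y0) / (x1 - x0)            # Newton divided differences
    d12 = (y2 - y1) / (x2 - x1)
    c2 = (d12 - d01) / (x2 - x0)           # quadratic coefficient (negative at a maximum)
    t_r = 0.5 * (x0 + x1) - d01 / (2.0 * c2)   # root of the first derivative
    h = y0 + d01 * (t_r - x0) + c2 * (t_r - x0) * (t_r - x1)
    return t_r, h


def crossing_times(t, y, level):
    """Interpolated times where the peak crosses `level` before and after the apex."""
    k = max(range(len(y)), key=lambda i: y[i])
    i = k
    while i > 0 and y[i - 1] > level:
        i -= 1
    j = k
    while j < len(y) - 1 and y[j + 1] > level:
        j += 1
    if i == 0 or j == len(y) - 1:
        raise ValueError("peak does not fall below the requested level")
    left = t[i - 1] + (level - y[i - 1]) * (t[i] - t[i - 1]) / (y[i] - y[i - 1])
    right = t[j] + (level - y[j]) * (t[j + 1] - t[j]) / (y[j + 1] - y[j])
    return left, right


def shape_parameters(t, y):
    """t_R, h, w_0.5, N, As and Tf of a baseline-subtracted positive peak."""
    t_r, h = parabola_apex(t, y)
    l50, r50 = crossing_times(t, y, 0.50 * h)
    l10, r10 = crossing_times(t, y, 0.10 * h)
    l05, r05 = crossing_times(t, y, 0.05 * h)
    w_half = r50 - l50
    a10, b10 = t_r - l10, r10 - t_r        # front and back half-widths at 0.1h
    a05, b05 = t_r - l05, r05 - t_r        # front and back half-widths at 0.05h
    return {
        "t_R": t_r,
        "h": h,
        "w_0.5": w_half,
        "N": 5.54 * t_r ** 2 / w_half ** 2,
        "As": b10 / a10,
        "Tf": (a05 + b05) / (2.0 * a05),
    }
```

For a symmetric Gaussian peak with s = 1 sampled finely, this sketch should return w_{0.5} close to 2.355 and As and Tf close to 1, in line with the definitions above.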
Next, A is calculated as the sum of increments by the trapezoidal rule:
where t is the time coordinate and S is the signal of a peak containing n points. Finally, a chart containing the peak and baseline is plotted in the worksheet.
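The baseline subtraction and the trapezoidal summation can be sketched in Python as follows. The straight baseline is assumed to pass through the first and last points of the peak window, and the function names are hypothetical.

```python
import math


def subtract_linear_baseline(t, s):
    """Subtract the straight line joining the first and last points of the peak window."""
    slope = (s[-1] - s[0]) / (t[-1] - t[0])
    return [y - (s[0] + slope * (x - t[0])) for x, y in zip(t, s)]


def trapezoid_area(t, s):
    """Area as the sum of trapezoidal increments between consecutive points."""
    return sum((t[i + 1] - t[i]) * (s[i + 1] + s[i]) / 2.0 for i in range(len(t) - 1))


# Hypothetical Gaussian peak (h = 1, s = 1) on a sloped baseline y = 0.5t - 1.
t = [0.1 * i for i in range(201)]
raw = [math.exp(-((x - 10.0) ** 2) / 2.0) + 0.5 * x - 1.0 for x in t]
adjusted = subtract_linear_baseline(t, raw)
area = trapezoid_area(t, adjusted)
print(area)            # close to h * s * sqrt(2 * pi), about 2.5066
print(0.93937 * area)  # A / h based estimate of w_0.5 for h = 1, about 2.355
```

The example also illustrates why a sloped baseline does not change the area once the straight baseline is subtracted.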
It is important to stress that the calculation of statistical moments of a peak is also possible. In this optional step (useful for evaluating with accuracy any type of peak shape), the zeroth moment (m_{0}) up to the fourth central moment (m_{4}), and N, are obtained. If there are two or more peaks in a chromatogram, the resolutions between adjacent peaks are calculated in a further step as R_{i,i+1} = 1.175 × (t_{R,i+1} - t_{R,i}) / (w_{0.5,i} + w_{0.5,i+1}).
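A Python sketch of the moment and resolution calculations is given below. The moments are computed by trapezoidal integration of the adjusted peak, and N is taken as m_{1}^{2} / m_{2}, the usual moment-based definition. The text does not state which expression the program uses, so this is an assumption, and the function names are hypothetical.

```python
import math


def _trapz(t, f):
    """Trapezoidal integral of f over t."""
    return sum((t[i + 1] - t[i]) * (f[i + 1] + f[i]) / 2.0 for i in range(len(t) - 1))


def statistical_moments(t, y):
    """Zeroth moment, first moment and second to fourth central moments of a peak."""
    m0 = _trapz(t, y)                                        # area
    m1 = _trapz(t, [x * v for x, v in zip(t, y)]) / m0       # centroid
    central = {}
    for k in (2, 3, 4):
        central[k] = _trapz(t, [(x - m1) ** k * v for x, v in zip(t, y)]) / m0
    return {
        "m0": m0, "m1": m1,
        "m2": central[2], "m3": central[3], "m4": central[4],
        "N": m1 ** 2 / central[2],                           # assumed definition
    }


def resolution(t_r1, w_half1, t_r2, w_half2):
    """Resolution between adjacent peaks from half-height widths."""
    return 1.175 * (t_r2 - t_r1) / (w_half1 + w_half2)


t = [0.1 * i for i in range(201)]
y = [math.exp(-((x - 10.0) ** 2) / 2.0) for x in t]   # Gaussian: t_R = 10, s = 1
print(statistical_moments(t, y))
print(resolution(10.0, 1.0, 12.0, 1.0))
```

For the Gaussian in the example, m_{2} is close to s^{2} = 1, m_{3} is close to zero and m_{4} is close to 3s^{4} = 3, so departures from these values in a real peak indicate asymmetry or excess tailing.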
Results and Discussion
Effect of peak shape on accuracy
The accuracy of peak recognition and integration was tested in a simulated chromatogram of 14 min at a sampling rate of 2.0 Hz, without noise, containing positive, negative, symmetric, fronting and tailing peaks with heights of six orders of magnitude (from 1 to 100000 units, Supplementary Information Figure S5). The program was set to detect the six peaks automatically (with no smoothing). Since t_{R} and h of the peaks simulated through equation 1 are known, the relative errors between reference and obtained values can be calculated, as shown in Table 1. The a values (second column) were used in equation 1 to provide the resulting As obtained from the integrator (values greater than 1 indicate tailing, values less than 1 indicate fronting, and values equal to 1 indicate symmetric Gaussian peaks).
Relative errors for retention time (t_{R}) and height (h) of a simulated chromatogram with different peak shapes
The results obtained from asymmetric peaks are less accurate than those obtained for the symmetric peak (peak 6). However, even the highest relative error found (peak 1) is still small. Besides, positive and negative peaks with the same dimensions, e.g., 3 and 4, show equivalent results. The (true) peak height also has some effect on the accuracy of the measured height, since the relative error tends to decrease for taller peaks. Finally, the same chromatogram was simulated with sloped baselines (y = 1000t and y = -1000t) and the integration results, including all available parameters, were identical (therefore, not shown) to those obtained when the baseline is horizontal. These results indicate that the peak shape and the slope of the chromatogram baseline do not seriously affect the detection and integration of the peaks.
Effect of noise over threshold ranges
Figure 3a shows a simulated chromatogram segment from 2 to 4 min (0.5 Hz), without noise, with a single symmetric Gaussian peak at 3.0 min, s = 1.5, h = 1.0 and a sloped baseline (y = -t). Figures 3b and 3c show, respectively, the FD and SD of this peak. In Figure 3d, the same peak is plotted, but with noise (standard deviation = 0.01) sufficient to provide S / N = 100. Figures 3e and 3f show the corresponding FD and SD. These peaks were submitted to recognition through the proposed algorithm, with sensitivity set to 3.0 and without smoothing. As the sensitivity is constant (in this case), the distance between the red horizontal lines (threshold ranges) is controlled only by M_{D}, from equation 4. The M_{D} of FD for the left peak (Figure 3b) is 0.09 and for the right peak (Figure 3e) is 0.27 (three times higher). The only contribution to M_{D} in the left peak is the signal variation, but for the right peak there is also a noise contribution, leading the threshold range to adapt automatically to each case. The same reasoning is valid for SD. In this case, M_{D} for the left peak (Figure 3c) is 1.8 and for the right peak (Figure 3f) is 5.7 (again, about three times higher). It is important to keep in mind that the peak window (between vertical dashed arrows) is defined by the algorithm as the time range where both FD and SD are outside the threshold ranges. Outside the peak window, FD and SD are necessarily inside the threshold ranges. Thus, when no noise is present, which makes the threshold ranges smaller, even small changes in slope and curvature are perceived, resulting in a longer peak baseline. Otherwise, these small changes are confused with noise, leading to a shorter baseline.
(a) Gaussian peak with negative slope baseline (y = -t); (b) first and (c) second derivatives of the Gaussian curve. Red horizontal lines define the threshold ranges, outside of which the peak window (between vertical arrows) is obtained. Sensitivity set to 3.0. Idem for (d), (e), and (f), except for the noise added (SD = 0.01, S / N = 100).
Effect of noise over recognition precision
To study the effect of S / N on recognition and, consequently, on the precision of the integration results, symmetrical (a = 0), tailing (a = 1) and fronting (a = -1) peaks with Gaussian height (h = 0.3989) and standard deviation (s = 1) were simulated with six levels of noise (five replicates for each level) at a sampling rate of 1 / 15 Hz (to provide about 30 to 40 data points to draw a peak with ca. 10 min of base width). The chromatograms were submitted to peak recognition with default sensitivity (3.0) and subsequent integration. Figure 4 shows the relative standard deviations (RSD) of t_{R}, A, h, w_{0.5} and N plotted against S / N levels. Both available smoothing modes were used.
Precision of recognition of simulated (a) Gaussian, (b) tailing and (c) fronting peaks as a function of signal-to-noise ratio (true h / SD of baseline). Left graphs were obtained through moving average smoothing and the right ones through Savitzky-Golay polynomial smoothing.
The main effect when S / N increases is the overall decrease of RSD for all parameters, as expected, since the integration depends on the correct allocation of the peak baseline. As the positions of the starting time (t_{S}) and ending time (t_{E}) become more hidden in the noise, more uncertainty arises. No significant difference is observed when comparing MA smoothing (used for the left graphs) with the SG method (right graphs). While tailing peaks (Figure 4b) are severely affected when S / N is low, Gaussian (Figure 4a) and fronting (Figure 4c) peaks provide RSD lower than 14%. The least affected parameter under all conditions is clearly t_{R}. In fact, even the highest noise levels added to the peak contributed less than 1% to its RSD. The RSDs of the other parameters are more closely grouped, suggesting equivalent influences of S / N. Because N depends on t_{R} and w_{0.5}, its RSD is slightly larger than the RSD of w_{0.5}, with a similar profile over the whole S / N range.
Effect of smoothing over recognition
Figure 5a shows a chromatogram with three peaks (h = 0.399; s = 0.5; a_{1} = 1; a_{2} = 0.00001; a_{3} = -1; sampling rate = 2 / 15 Hz and S / N = 10), smoothed by SG and MA (with several window sizes) methods. The chromatogram of Figure 5b has S / N = 50.
Simulated chromatograms with (a) S / N = 10 and (b) S / N = 50. The first chromatogram of each graph (top) is the original data; the second was obtained through Savitzky-Golay (SG) polynomial smoothing, and the last ten were obtained through moving average (MA) smoothing, with increasing window size (3 to 21 points, odd numbers).
The peaks from the chromatogram of Figure 5a could not be detected without smoothing. With SG smoothing, the first peak (tailing) was poorly detected (one fragment detected), while the second (Gaussian) was better defined and the third (fronting) was not detected. This recognition profile was similar to that obtained after MA smoothing with a 3-point window (first MA curve). In fact, these smoothed curves are similar. From 5 to 15 points (next six MA curves), MA smoothing led to detection of all peaks. With more than 15 points (last three MA curves), the shape of the peaks is damaged and recognition fails (two fragments per peak detected). For the chromatogram of Figure 5b, the first two peaks were recognized directly on the original data. With the SG and MA (with up to 17 points) smoothing modes, the three peaks were detected normally. When the number of points of the MA smoothing window increases, the peak height decreases and the width increases. As a result, the peak limits found may not be appropriate.
The SG method provided smoothing with greater fidelity to the original peaks, but with lower capacity to reduce the baseline noise. On the other hand, MA smoothing was more efficient at reducing noise, although the peak shape is damaged, with height and asymmetry losses, when a higher number of points is used. Thus, if SG smoothing is not adequate to improve recognition, the window size of MA smoothing should be as small as possible to detect all desired peaks satisfactorily.
Liquid chromatography data treatment
To illustrate the applicability of the proposed program to several real situations, some separation techniques were used and the chromatograms were submitted to recognition and integration. Figure 6 shows the recognition of lactulose (1) and lactose (2) peaks in the presence of impurities separated by LC of a commercial milk sample. The position of the baselines indicates the automatic recognition profile with MA smoothing (9 points) and sensitivity set to 9. With these settings, impurity peaks in the time range between 6.00 and 14.25 min were also detected (Figure 6 inset).
LC chromatogram of a milk sample for separation of lactulose (1, inset; 1.22 mmol L^{-1}) and lactose (2; 143 mmol L^{-1}), with refractive index detection. The baselines (red) indicate the automatic recognition profile with MA smoothing (9 points) and sensitivity set to 9.
Capillary zone electrophoresis data treatment
Figure 7 shows data obtained from a lab-made C^{4}D, where lactulose (1) and lactose (2) were partially separated by CZE. The inset shows the recognition profile of these peaks, which were split through drop-line mode in a further step. Although noise is not apparent, MA smoothing (9-point window) was applied to this electropherogram in order to avoid the detection of excess peak fragments. Nevertheless, abrupt variations in the electropherogram profile, e.g., the system peak at 4 min and the electroosmotic flow (EOF) signal, are still detected as peaks and should thus be ignored.
Electropherogram showing partial separation of lactulose (1; 1.22 mmol L^{-1}) and lactose (2; 143 mmol L^{-1}) by CZE, with C^{4}D. The baselines indicate the peak recognition profile with MA smoothing (9-point window) and sensitivity set to 2. Electroosmotic flow signal (EOF).
A more detailed study was made through separation of a mixture of ten organic acid standards by CZE (Figure 8), under indirect UV detection (220 nm), in order to compare the recognition and integration results of the present program with those of ChemStation. Table 2 shows the comparisons between t_{S}, t_{E}, t_{R}, A, h and w_{0.5} through the paired t-test. Chromophoreasy was set to SG smoothing with default sensitivity, and ChemStation was configured to detect negative peaks with slope sensitivity set to 100 and peak width set to 0.02 in the time events. To match the units and signs of the programs, the areas of Chromophoreasy (initially given in mAU min) were multiplied by 60 s min^{-1} and the heights were multiplied by -1. The limits of the pyruvic acid peak (2) were manually adjusted in both programs.
Electropherogram of organic acids (1 mmol L^{-1} each): oxalic (1), pyruvic (2), tartaric (3), citric (4), formic (5), malic (6), lactic (7), succinic (8), aspartic (9) and acetic (10); *unidentified peak; electroosmotic flow (EOF). (a) ChemStation and (b) Chromophoreasy stretched views of peaks 3 to 6 from (c) electropherogram of all analytes. Dashed arrows indicate the Chromophoreasy integration limits of peak 5.
Comparison of integration results of Chromophoreasy and ChemStation for the electropherogram of Figure 8
A systematic difference in t_{S} between the programs can be observed in Table 2, evidenced by the elevated t_{calculated} value (4.453). The t_{S} values obtained by ChemStation integration are slightly higher (delayed) than those obtained by Chromophoreasy, leading to this result. In fact, this difference can be observed in Figures 8a and 8b, indicated by the dashed arrows on peak 5. The left arrow points to the t_{S} detected by Chromophoreasy, which is located at a different position from the t_{S} of ChemStation. However, no differences between the t_{E} values of the programs are evident (as the right arrow shows). This observation is valid for all peaks in this electropherogram. The other parameters showed statistically similar behaviors. Figure 8c shows the electropherogram used for this comparison. The S / N varied from 10.5 (peak *) to 143.6 (peak 5).
Capillary electrochromatography data treatment
Figure 9 shows an electrochromatogram of a PAH standard analysis. As there is no single suitable wavelength for detection of all analytes simultaneously, 220 and 250 nm data were collected. MA smoothing and sensitivity set to 2 were the best choice for both wavelengths. The group of peaks 2 and 3 (at 220 nm) and the group of peaks 4 and 5 (at 220 and 250 nm) were split with drop-line mode. This electrochromatogram is a good example of the application of the program to a drifting baseline. As stated earlier, the slope of the chromatogram baseline does not seem to have affected the recognition of the peaks.
Figure 9. Electrochromatogram of PAH (1 mmol L^{-1} each): naphthalene (1), acenaphthene (2), fluorene (3), phenanthrene (4) and anthracene (5). Thiourea (t) was used as flow marker. The 220 nm data were plotted 4 mAU higher for a better view.
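The dropline split of fused peaks on a drifting baseline, as described for the electrochromatogram above, can be illustrated with a minimal Python sketch. It is not the Chromophoreasy implementation (which runs as Excel macros). The function name `dropline_areas`, the straight baseline joining the group limits, the trapezoidal rule and the synthetic Gaussian data are all assumptions made for illustration, and the apex positions are assumed to be known from a previous recognition step.

```python
import math

def dropline_areas(times, signal, i_start, i_apex1, i_apex2, i_end):
    """Split two fused peaks with a perpendicular drop at the valley.

    All indices are positions in `times`/`signal`; the apex indices are
    assumed to come from an earlier peak recognition step.
    Returns (valley time, area of first peak, area of second peak).
    """
    # Straight baseline joining the start and end of the peak group,
    # which follows a linearly drifting background.
    t0, y0 = times[i_start], signal[i_start]
    t1, y1 = times[i_end], signal[i_end]
    slope = (y1 - y0) / (t1 - t0)
    corrected = [
        signal[i] - (y0 + slope * (times[i] - t0))
        for i in range(i_start, i_end + 1)
    ]

    # Valley: lowest baseline-corrected point between the two apexes.
    first, second = i_apex1 - i_start, i_apex2 - i_start
    valley = min(range(first, second + 1), key=lambda i: corrected[i])

    def trapezoid(lo, hi):
        # Trapezoidal rule over corrected[lo..hi].
        return sum(
            0.5 * (corrected[i] + corrected[i + 1])
            * (times[i_start + i + 1] - times[i_start + i])
            for i in range(lo, hi)
        )

    return (times[i_start + valley],
            trapezoid(0, valley),
            trapezoid(valley, len(corrected) - 1))


# Synthetic, noise-free example (all values are made up for illustration).
def gaussian(t, center, height, sigma):
    return height * math.exp(-0.5 * ((t - center) / sigma) ** 2)

times = [4.2 + 0.01 * i for i in range(221)]           # 4.2 to 6.4 min
signal = [gaussian(t, 5.0, 10.0, 0.15) + gaussian(t, 5.6, 8.0, 0.15)
          + 0.5 * t + 1.0                                # drifting baseline
          for t in times]

t_valley, area_1, area_2 = dropline_areas(times, signal, 0, 80, 140, 220)
print(f"valley at {t_valley:.2f} min, areas {area_1:.3f} and {area_2:.3f}")
```

Because the baseline is drawn between the limits of the peak group, a linear drift is removed before the areas are summed, so the sum of the two dropline areas stays close to the total area of the underlying peaks.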
Gas chromatography data treatment
Five different biodiesel samples were analyzed on a GC instrument with FID (Figure 10). In this case, Chromophoreasy was set to detect peaks from 0 to 6 min, with SG polynomial smoothing and sensitivity set to 1. In the GC Solution software, peaks were detected and integrated automatically. Table 3 shows the results of the comparisons made through paired t-tests between Chromophoreasy and GC Solution for A and h. To match the units, the areas from Chromophoreasy (given in V min) were multiplied by 60 s min^{-1}.
Figure 10. Chromatograms of biodiesel samples from different sources: basic catalysis of (a) soybean oil; (b) sunflower oil; (c) food frying oils; (d) acid catalysis of soybean oil; (e) acid pretreatment followed by basic catalysis of food frying oil. Analytes are fatty acid methyl esters: methyl palmitate (1), methyl stearate (2), methyl oleate (3), methyl linoleate (4) and methyl linolenate (5). The sequence of analytes is the same for all graphs.
Table 3. Comparison of areas (A) and heights (h) from Chromophoreasy and GC Solution for the chromatograms of Figure 10
In Table 3, a small difference between the areas from Chromophoreasy and GC Solution can be observed. The t_{S} and t_{E} values obtained with these programs (not shown) are slightly different, probably because of differences in the peak recognition algorithms, so the GC Solution areas are slightly larger than the Chromophoreasy ones. However, the results are still statistically similar, with all t_{calculated} values lower than t_{(4;0.05/2)} (2.776).
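The comparison logic described above (unit matching followed by a paired t-test against the two-tailed critical value for 4 degrees of freedom) can be sketched in Python as follows. The area values are hypothetical and are not those of Table 3; the function name `paired_t` and the variable names are likewise assumptions. Only the conversion factor (60 s min^{-1}) and the critical value (2.776) come from the text.

```python
import math
import statistics

# Critical value quoted in the text: two-tailed, alpha = 0.05, 4 degrees of freedom.
T_CRITICAL = 2.776

def paired_t(values_a, values_b):
    """Absolute paired t statistic for two equally long sequences."""
    diffs = [a - b for a, b in zip(values_a, values_b)]
    mean_diff = statistics.mean(diffs)
    sd_diff = statistics.stdev(diffs)          # sample standard deviation
    return abs(mean_diff) / (sd_diff / math.sqrt(len(diffs)))

# Hypothetical areas for five paired results (NOT the values of Table 3).
areas_v_min = [0.0500, 0.0210, 0.1200, 0.2500, 0.0300]   # program A, in V min
areas_v_s = [3.02, 1.25, 7.23, 15.04, 1.79]              # program B, in V s

# Unit matching: V min * 60 s/min = V s.
converted = [a * 60.0 for a in areas_v_min]

t_calc = paired_t(converted, areas_v_s)
similar = t_calc < T_CRITICAL
print(f"t_calculated = {t_calc:.3f}; statistically similar: {similar}")
```

With five paired results there are 4 degrees of freedom, which is why the critical value 2.776 applies; a different number of pairs would require a different critical value.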
Conclusions
In this work, the development, evaluation, features and applications of an alternative program for recognition and integration of chromatographic and electrophoretic peaks in the familiar environment of Excel were demonstrated. The adjustable parameters (time range, sensitivity of threshold ranges and chromatogram smoothing), introduced to increase the efficiency of peak detection, allowed the program to be applied to several experimental situations, including the analysis of standards and samples by LC, GC, CZE and CEC with various types of detectors, in addition to simulated chromatograms. The results are easily updated when adjustments are necessary. Some data and formulas remain in the cells where they were generated, so one can inspect how the recognition and integration were obtained. This feature may be useful for academic purposes, for instance.
The several functions, such as peak recognition, extraction, integration, results grouping and plotting of the chromatogram with its baseline, can be performed separately or through a single command, which made the data treatment more practical and provided high throughput. As a result, the study of the effect of peak shape, chromatogram noise and smoothing modes on accuracy and precision took little time. Finally, the proposed program proved to be a reliable tool, providing results statistically similar to those of the commercial software used, thus meeting the aim of this paper.
Supplementary Information
Supplementary data (Experimental section, additional tools, recognition efficiency, figures), the Chromophoreasy program and an electropherogram sample are available free of charge at http://jbcs.sbq.org.br as a PDF file.
Acknowledgments
The authors wish to acknowledge Coordenação de Aperfeiçoamento de Pessoal de Nível Superior (CAPES: PNPD 23038.007000/201170), Conselho Nacional de Desenvolvimento Científico e Tecnológico (CNPq: 471288/20136 and 302432/20140) and Fundação de Amparo à Pesquisa do Estado de Minas Gerais (FAPEMIG: CEXPPM 0039813) for fellowships and financial support.
References

^{1}Righetti, P. G.; J. Chromatogr. A 2005, 1079, 24.

^{2}Meinert, C.; Meierhenrich, U. J.; Angew. Chem., Int. Ed. 2012, 51, 10460.

^{3}Núñez, O.; Gallart-Ayala, H.; Martins, C. P. B.; Lucci, P.; J. Chromatogr. A 2012, 1228, 298.

^{4}Li, J.; Anal. Chim. Acta 1999, 388, 187.

^{5}Mostafa, A.; Edwards, M.; Górecki, T.; J. Chromatogr. A 2012, 1255, 38.

^{6}Ferreira, S. L. C.; Bruns, R. E.; da Silva, E. G. P.; dos Santos, W. N. L.; Quintella, C. M.; David, J. M.; de Andrade, J. B.; Breitkreitz, M. C.; Jardim, I. C. S. F.; Neto, B. B.; J. Chromatogr. A 2007, 1158, 2.

^{7}Thompson, M.; Ellison, S. L. R.; Wood, R.; Pure Appl. Chem. 2002, 74, 835.

^{8}Ribani, M.; Bottoli, C. B. G.; Collins, C. H.; Jardim, I. C. S. F.; Melo, L. F. C.; Quim. Nova 2004, 27, 771.

^{9}Dyson, N.; J. Chromatogr. A 1999, 842, 321.

^{10}Zhang, J.; Gonzalez, E.; Hestilow, T.; Haskins, W.; Huang, Y.; Curr. Genomics 2009, 10, 388.

^{11}Vivó-Truyols, G.; Torres-Lapasió, J. R.; van Nederkassel, A. M.; Vander Heyden, Y.; Massart, D. L.; J. Chromatogr. A 2005, 1096, 133.

^{12}Peters, S.; Vivó-Truyols, G.; Marriott, P. J.; Schoenmakers, P. J.; J. Chromatogr. A 2007, 1156, 14.

^{13}Fong, S. S.; Rearden, P.; Kanchagar, C.; Sassetti, C.; Trevejo, J.; Brereton, R. G.; Anal. Chem. 2011, 83, 1537.

^{14}Yu, Y.-J.; Xia, Q.-L.; Wang, S.; Wang, B.; Xie, F.-W.; Zhang, X.-B.; Ma, Y.-M.; Wu, H.-L.; J. Chromatogr. A 2014, 1359, 262.

^{15}Wang, X.; Zhao, Y.; Sun, P.; Ji, M.; Bao, M.; Anal. Methods 2015, 7, 2670.

^{16}Marco, V. B. D.; Bombi, G. G.; J. Chromatogr. A 2001, 931, 1.

^{17}Vivó-Truyols, G.; Torres-Lapasió, J. R.; van Nederkassel, A. M.; Vander Heyden, Y.; Massart, D. L.; J. Chromatogr. A 2005, 1096, 146.

^{18}Creek, D. J.; Jankevics, A.; Burgess, K. E. V.; Breitling, R.; Barrett, M. P.; Bioinformatics 2012, 28, 1048.

^{19}Fasoula, S.; Zisi, C.; Gika, H.; Pappa-Louisi, A.; Nikitas, P.; J. Chromatogr. A 2015, 1395, 109.

^{20}Kadjo, A.; Dasgupta, P. K.; Anal. Chim. Acta 2013, 773, 1.

^{21}Chavez-Servin, J. L.; Castellote, A. I.; Lopez-Sabater, M. C.; J. Chromatogr. A 2004, 1043, 211.

^{22}Ferrari, R. A.; Oliveira, V. S.; Scabio, A.; Quim. Nova 2005, 28, 19.

^{23}Delmonte, P.; Kia, A.-R. F.; Kramer, J. K. G.; Mossoba, M. M.; Sidisky, L.; Rader, J. I.; J. Chromatogr. A 2011, 1218, 545.

^{24}Vaz, F. A. S.; da Silva, P. A.; Passos, L. P.; Heller, M.; Micke, G. A.; Costa, A. C. O.; de Oliveira, M. A. L.; Phytochem. Anal. 2012, 23, 569.

^{25}Soga, T.; Serwe, M.; Food Chem. 2000, 69, 339.

^{26}Fracassi da Silva, J. A.; do Lago, C. L.; Anal. Chem. 1998, 70, 4339.

^{27}Vaz, F. A. S.; Moutinho, A. D.; Mendonça, J. P. R. F.; Araújo, R. T.; Ribeiro, S. J. L.; Polachini, F. C.; Messaddeq, Y.; Oliveira, M. A. L.; Microchem. J. 2012, 100, 21.

^{28}Pápai, Z.; Pap, T. L.; J. Chromatogr. A 2002, 953, 31.

^{29}Savitzky, A.; Golay, M. J. E.; Anal. Chem. 1964, 36, 1627.

^{30}Dyson, N.; Chromatographic Integration Methods, 2nd ed.; RSC: Loughborough, 1998.
Publication Dates
Publication in this collection: Oct 2016
History
Received: 21 Jan 2016
Accepted: 15 Mar 2016