Green method by diffuse reflectance infrared spectroscopy and spectral region selection for the quantification of sulphamethoxazole and trimethoprim in pharmaceutical formulations

An alternative method for the quantification of sulphamethoxazole (SMZ) and trimethoprim (TMP) using diffuse reflectance infrared Fourier-transform spectroscopy (DRIFTS) and partial least squares (PLS) regression was developed. Interval Partial Least Squares (iPLS) and Synergy Interval Partial Least Squares (siPLS) were applied to select the spectral ranges that provided the lowest prediction error in comparison with the full-spectrum model. Fifteen commercial tablet formulations and forty-nine synthetic samples were used. The concentration ranges considered were 400 to 900 mg g-1 for SMZ and 80 to 240 mg g-1 for TMP. DRIFTS spectra were recorded between 600 and 4000 cm-1 with a resolution of 4 cm-1. The proposed procedure was compared with high performance liquid chromatography (HPLC). The root mean square errors of prediction (RMSEP) obtained during validation of the siPLS models for SMZ and TMP demonstrate that this approach is a valid technique for the quantitative analysis of pharmaceutical formulations. The interval selection algorithm allowed building regression models with lower errors than the full-spectrum PLS model. RMSEP values of 13.03 mg g-1 for SMZ and 4.88 mg g-1 for TMP were obtained after selection of the best spectral regions by siPLS.


FABIANA E.B. DA SILVA, ÉRICO M.M. FLORES, GRACIELE PARISOTTO, EDSON I. MÜLLER and MARCO F. FERRÃO

INTRODUCTION
Quantitative analysis of pharmaceutical samples by spectroscopy is typically accomplished by univariate regression methods. Infrared (IR) spectroscopy is generally only able to provide qualitative and semi-quantitative analysis of pharmaceutical samples because of deviations from Beer's law caused by instrument and sample effects. However, the development of reliable FTIR instrumentation and powerful computerized data-processing capabilities has greatly improved the performance of quantitative IR work. Thus, modern infrared spectroscopy has gained acceptance as a reliable tool for quantitative analysis (Settle 1997). Accessories for diffuse or attenuated total reflectance in the mid-infrared, and for reflectance or transmittance in the near infrared, have enabled the analysis of samples in many different forms, such as solutions, powders and intact tablets (Kipouros et al. 2006, Armenta et al. 2005, Boyer et al. 2006, Silva et al. 2009, 2012, Ferreira et al. 2013). In some instances, prior sample treatment is unnecessary and the results are obtained in real time (Lin et al. 2006). Quantitative infrared spectroscopy has been applied to pharmaceutical samples in association with multivariate methods (Bunaciu et al. 2010). Partial Least Squares (PLS) regression is the most popular multivariate calibration technique for building prediction models from spectroscopic signals (Lavine and Workman 2010). This association is particularly relevant because infrared spectroscopy can be a quick, non-destructive and environmentally friendly alternative to traditional analytical methods. In addition, the procedure is fast and requires only a few milligrams of sample (Ferrão and Davanzo 2005). Multivariate calibration is appropriate for a whole series of problems in quantitative analysis, such as the treatment of spectra with strongly overlapping bands. However, some spectral regions may contain information due to other analytes, non-modeled interferences, background variations and interactions, which degrade model accuracy (Hemmateenejad et al. 2007).
Recent applications have shown that spectral region selection using appropriate algorithms can significantly improve the performance of full-spectrum calibration techniques, avoiding non-modeled interferences and yielding well-fitted models (Lee et al. 2012, Nørgaard et al. 2005, Friedel et al. 2013). In practice, multivariate regression model optimization is based on identifying the data subset that produces the lowest prediction error (Chen et al. 2008). Several approaches have been proposed for selecting an optimal set of spectral regions for multivariate calibration, such as genetic algorithms, interval PLS (iPLS) and synergy interval PLS (siPLS) (Silva et al. 2009, Friedel et al. 2013, Leardi and Nørgaard 2004, Navea et al. 2005, Bogomolov and Hachey 2007, Menezes et al. 2014, Ruschel et al. 2014). Interval PLS builds a local PLS model on each spectral interval, and the Root Mean Square Error of Cross-Validation (RMSECV) of each model can be used as the criterion to evaluate the predictive ability of that interval. However, excluding intervals with higher RMSECV values can cause the loss of useful information. Thus, algorithms such as siPLS can be applied to find favorable combinations of intervals for calibration. Spectroscopic procedures involving multivariate calibration are increasingly applied in pharmaceutical analysis (Bodson et al. 2006, Blanco et al. 2007, Garcia-Reiriz et al. 2007, Müller et al. 2011, Li et al. 2012, Ferreira et al. 2013). However, mid-infrared (MIR) spectroscopy in combination with multivariate calibration is underutilized in pharmaceutical analysis compared with other spectroscopic techniques (Lundstedt-Enkel et al. 2006, Moros et al. 2007).
Antimicrobial compounds are one of the most interesting pharmacological groups to which multivariate calibration methods can be applied. These drugs are frequently formulated in combination and usually require a separation step prior to analysis. Sulphamethoxazole (SMZ) is a sulfonamide used in combination with trimethoprim (TMP) in a single pharmaceutical product to treat infections such as bronchitis, middle ear infection, urinary tract infection and traveler's diarrhea (O'Neil 2006). The structural formulas of sulphamethoxazole and trimethoprim are shown in Figure 1. Quantification of SMZ and TMP in pharmaceutical preparations has been described using a spectrophotometric method based on the formation of a red-colored product by diazotization of sulphonamides (Nagaraja et al. 2002), flow injection systems (Tomšů et al. 2004), high performance liquid chromatography (Akay and Ozkan 2002, Goulas et al. 2014), second-derivative spectrophotometry (Granero et al. 2002), adsorptive stripping voltammetry (Carapuça et al. 2005) and multivariate methods (Ni et al. 2006, Cordeiro et al. 2008).
Pharmacopoeial methods list HPLC as the official assay procedure for quality control of pharmaceutical preparations (USP 2007). In the present work, the DRIFTS quantification of SMZ and TMP in commercial tablets is presented. Interval Partial Least Squares (iPLS) and Synergy Interval Partial Least Squares (siPLS) were applied to select the spectral ranges that provided the lowest prediction error in comparison with the full-spectrum model.

… mg of SMZ and TMP per tablet, respectively) from nine manufacturers (named commercial samples) were purchased from local drugstores or donated by pharmaceutical industries. SMZ and TMP certified reference materials were acquired from the Brazilian Pharmacopoeia (batches 1010 and 1011 for SMZ and TMP, respectively). Methanol, acetonitrile and triethylamine were HPLC grade. The clusters in the hierarchical cluster analysis (HCA) were built using the Euclidean distance and incremental linkage, and HCA was carried out with the Pirouette® software (Infometrix). For the selection of the calibration and validation sets, … was employed. The calibration set was constructed with thirty-two synthetic samples and nine commercial samples, and the prediction set with seventeen synthetic samples and six commercial samples. Synthetic and commercial samples were prepared by powder mixing in a Spex Certiprep cryogenic mill (model 6750 Freezer Mill, Metuchen, USA). A mixing time of 2 min was sufficient for each sample, which was ground to particle sizes smaller than 80 μm.

MATERIALS AND SAMPLE PREPARATION
SMZ (batch 22960805) and TMP (batch 200504246) bulk drugs were purchased from Henrifarma (São Paulo, Brazil) and used for the preparation of the synthetic samples. Forty-nine formulations (named synthetic samples) containing SMZ (400 to 900 mg g-1), TMP (80 to 240 mg g-1) and diluent (starch and magnesium stearate, 99:1) were prepared in the laboratory, as shown in Table I.

A Nicolet Magna 550 spectrometer (Nicolet Instrument Co., Madison, WI) was used for all experiments. All spectra were recorded from 4000 to 600 cm-1 with 16 scans and a spectral resolution of 4 cm-1. The instrument was equipped with an EasiDiff® diffuse reflectance sampling accessory (Pike Technologies Inc., USA). For DRIFTS data acquisition, 34.0 ± 0.3 mg of solid sample was placed in the accessory and its spectrum was recorded without dilution in KBr (Wu et al. 2010). Spectroscopic-grade KBr alone was used for the background spectrum. For each sample, three spectra were acquired and the average spectrum was used to build the multivariate models. Data were handled using MATLAB version 6.5 (The MathWorks, Natick, USA). PLS multivariate calibration models were built with PLS Toolbox version 2.0 (Eigenvector Research, Manson, USA). The iToolbox for MATLAB was used for variable selection and multivariate model development (Nørgaard et al. 2000). The software was run on an IBM-compatible microcomputer with an Intel Pentium 4 3 GHz CPU and 2 GB of RAM. The spectral range was divided into 10, 25 and 50 intervals to evaluate the models generated by the iPLS and siPLS algorithms. Differences in compaction degree and particle size may cause baseline variations and artefacts through physical light scattering; therefore, multiplicative scatter correction (MSC) was employed to reduce this scattering effect. The spectra were then preprocessed by mean centering. A statistical F test (α = 0.5%) was applied to determine whether the prediction errors of the constructed models differed significantly.
HPLC REFERENCE METHOD
The SMZ and TMP contents were determined by HPLC according to the method described in the United States Pharmacopeia (USP 2007). This procedure was chosen as the reference and was performed on an Agilent 1100 Series HPLC system. Commercial tablets were finely powdered. A mass corresponding to 160 mg of sulphamethoxazole and 32 mg of trimethoprim for each formulation was accurately weighed and dissolved in 100 mL of methanol. The sample solutions were sonicated in an ultrasonic bath for fifteen minutes. An aliquot of 5 …

TABLE I columns: Samples; SMZ (mg g-1); TMP (mg g-1); Excipients (mg g-1).

$$\mathrm{RMSEP} = \sqrt{\frac{\sum_{i=1}^{n}\left(\hat{y}_i - y_i\right)^2}{n}} \qquad (1)$$

where ŷi is the predicted value for test set sample i, yi is the measured value for test set sample i, and n is the number of observations in the test set. The Root Mean Square Error of Cross-Validation (RMSECV) was used to evaluate the error of the calibration models and to select the number of latent variables. The Root Mean Square Error of Prediction (RMSEP) was used to compare the predictive ability of the different PLS models (Brereton 2003). The performance of the calibration models was also checked through the relative standard error of prediction (RSEP), calculated as:

$$\mathrm{RSEP}\,(\%) = 100 \times \sqrt{\frac{\sum_{i=1}^{n}\left(\hat{y}_i - y_i\right)^2}{\sum_{i=1}^{n} y_i^2}} \qquad (2)$$

where ŷi is the predicted value and yi the measured value for test set sample i. The iPLS models were built with the spectrum divided into 10, 25 and 50 intervals. The iPLS routine generated graphical information indicating the optimal number of latent variables for each interval model and the corresponding RMSECV values; the subinterval with the lowest RMSECV was selected. Synergy PLS models were constructed with the spectrum divided into 10, 25 and 50 intervals and combinations of two to five intervals; the combination of subintervals with the lowest RMSECV was selected. The systematic error (bias) and the standard deviation of validation (SDV) were calculated from equations 3 and 4, respectively (ASTM E1655-05 2005):

$$\mathrm{bias} = \frac{\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)}{n} \qquad (3)$$

$$\mathrm{SDV} = \sqrt{\frac{\sum_{i=1}^{n}\left[\left(y_i - \hat{y}_i\right) - \mathrm{bias}\right]^2}{n-1}} \qquad (4)$$

Thereafter, the t-test was applied according to equation 5 (ASTM E1655-05 2005):

$$t_{\mathrm{sist}} = \frac{\left|\mathrm{bias}\right|\sqrt{n}}{\mathrm{SDV}} \qquad (5)$$

The systematic error was not considered significant when t_sist was lower than the critical t value at α = 0.05 and df = n - 1.
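Equations 1 to 5 translate directly into code. The sketch below is a NumPy transcription that assumes the sign convention yi - ŷi for the bias (the sign does not affect t_sist, which uses the absolute value); the function names and the numbers in the example are illustrative, not data from this study.

```python
import numpy as np

def rmsep(y_ref, y_pred):
    """Equation 1: root mean square error over the n test-set samples."""
    y_ref, y_pred = np.asarray(y_ref, float), np.asarray(y_pred, float)
    return float(np.sqrt(np.mean((y_pred - y_ref) ** 2)))

def rsep(y_ref, y_pred):
    """Equation 2: relative standard error of prediction, in percent."""
    y_ref, y_pred = np.asarray(y_ref, float), np.asarray(y_pred, float)
    return float(100.0 * np.sqrt(np.sum((y_pred - y_ref) ** 2) / np.sum(y_ref ** 2)))

def bias_sdv_t(y_ref, y_pred):
    """Equations 3 to 5: bias, standard deviation of validation and t_sist."""
    e = np.asarray(y_ref, float) - np.asarray(y_pred, float)
    n = e.size
    bias = e.mean()
    sdv = np.sqrt(np.sum((e - bias) ** 2) / (n - 1))
    t_sist = abs(bias) * np.sqrt(n) / sdv
    return float(bias), float(sdv), float(t_sist)

# Illustrative values only (mg g-1), not results from this study
ref = [100.0, 200.0, 300.0, 400.0]
pred = [102.0, 198.0, 303.0, 397.0]
print(rmsep(ref, pred), rsep(ref, pred), bias_sdv_t(ref, pred))
```

The resulting t_sist is compared with the two-tailed critical t value for α = 0.05 and n - 1 degrees of freedom, read from a t table (or, if SciPy is available, computed with `scipy.stats.t.ppf(0.975, n - 1)`).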
Results obtained by DRIFTS for SMZ and TMP in commercial tablets were compared with the range permitted by the Brazilian Pharmacopoeia (93 to 107% of the declared value).
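The comparison with the pharmacopoeial range is a percentage-of-label-claim calculation. The sketch below assumes (this is not stated in the text) that the DRIFTS content in mg g-1 is converted to mg per tablet with the average tablet mass; the helper names, the tablet mass and the declared dose in the example are hypothetical.

```python
def mg_per_tablet(content_mg_per_g, tablet_mass_g):
    """Hypothetical helper: predicted content (mg g-1) times average tablet mass (g)."""
    return content_mg_per_g * tablet_mass_g

def percent_label_claim(found_mg, declared_mg):
    """Assay result expressed as a percentage of the declared (label) amount."""
    return 100.0 * found_mg / declared_mg

def within_limits(percent, low=93.0, high=107.0):
    """Acceptance range used in the text: 93 to 107% of the declared value."""
    return low <= percent <= high

# Hypothetical tablet: 0.5 g average mass, 800 mg g-1 SMZ predicted, 400 mg SMZ declared
found = mg_per_tablet(800.0, 0.5)         # 400.0 mg
pct = percent_label_claim(found, 400.0)   # 100.0 %
print(f"{pct:.1f}% of label claim, within limits: {within_limits(pct)}")
```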

SELECTION OF CALIBRATION AND VALIDATION SAMPLES
The variations among formulations could pose quite a challenge for the development of a universal model. Although the drugs in the tablets are the same, the types and amounts of excipients in their formulations can vary considerably between manufacturers. If the calibration sample set is carefully selected to cover these variations, a universal model should be achievable. Hierarchical cluster analysis (HCA) was therefore performed to obtain representative calibration and prediction sets from the different (synthetic and commercial) samples. Initially, to provide a baseline for assessing the variable selection algorithms and the effect of pretreatment, models were built using the full-spectrum DRIFTS information. Full-spectrum PLS models were obtained with fourteen and eight latent variables for spectra without and with pretreatment, respectively, and the results are shown in Tables II and III. The accuracy of the DRIFTS results was assessed through the RMSEP value.
When the results with and without pretreatment were compared, the number of latent variables increased for the models without pretreatment, as did the RMSECV and RMSEP values. These results demonstrate that the spectral data must be pretreated to build the multivariate regression models. On this basis, the remaining tables present only results obtained with preprocessed spectral data.

SULPHAMETHOXAZOLE iPLS MODELS
The principle behind the interval PLS algorithm is to split the spectrum into smaller equidistant regions and develop a PLS model for each subinterval. The RMSECV of each subinterval model is then compared with the full-spectrum RMSECV. The results are shown in Table III.
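The iPLS procedure described above can be illustrated with a compact NumPy sketch. It is not the iToolbox code used in this work: it assumes a NIPALS PLS1 regression, contiguous-block k-fold cross-validation and intervals holding (nearly) equal numbers of variables, obtained with `np.array_split`. The example data are synthetic, with a single analyte band centred in the second of three intervals.

```python
import numpy as np

def pls1_fit(X, y, n_lv):
    """NIPALS PLS1 on mean-centred data; returns (coefficients, x_mean, y_mean)."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xk, yk = X - x_mean, y - y_mean
    W, P, q = [], [], []
    for _ in range(n_lv):
        w = Xk.T @ yk
        w = w / np.linalg.norm(w)      # weight vector
        t = Xk @ w                     # scores
        tt = t @ t
        p = Xk.T @ t / tt              # X loadings
        qa = (yk @ t) / tt             # y loading
        Xk = Xk - np.outer(t, p)       # deflate X and y
        yk = yk - qa * t
        W.append(w); P.append(p); q.append(qa)
    W, P = np.array(W).T, np.array(P).T
    coef = W @ np.linalg.solve(P.T @ W, np.array(q))
    return coef, x_mean, y_mean

def pls1_predict(model, X):
    coef, x_mean, y_mean = model
    return (X - x_mean) @ coef + y_mean

def rmsecv(X, y, max_lv, n_folds=5):
    """RMSECV for 1..max_lv latent variables (contiguous-block k-fold CV)."""
    idx = np.arange(len(y))
    press = np.zeros(max_lv)
    for test in np.array_split(idx, n_folds):
        train = np.setdiff1d(idx, test)
        for a in range(1, max_lv + 1):
            model = pls1_fit(X[train], y[train], a)
            press[a - 1] += np.sum((pls1_predict(model, X[test]) - y[test]) ** 2)
    return np.sqrt(press / len(y))

def ipls(X, y, n_intervals, max_lv=8, n_folds=5):
    """One PLS model per contiguous interval: list of (interval, LVs, RMSECV)."""
    results = []
    for k, cols in enumerate(np.array_split(np.arange(X.shape[1]), n_intervals), start=1):
        errs = rmsecv(X[:, cols], y, min(max_lv, len(cols)), n_folds)
        best = int(np.argmin(errs))
        results.append((k, best + 1, float(errs[best])))
    return results

# Synthetic example: one analyte band centred in interval 2 of 3
rng = np.random.default_rng(0)
conc = rng.uniform(400, 900, 30)                          # mg g-1, hypothetical
band = np.exp(-0.5 * ((np.arange(60) - 30) / 3.0) ** 2)
X = np.outer(conc, band) / 1000 + rng.normal(0, 0.01, (30, 60))
results = ipls(X, conc, n_intervals=3, max_lv=3)
full = rmsecv(X, conc, 3).min()                           # full-spectrum reference
for k, lv, err in results:
    print(f"interval {k}: {lv} LV, RMSECV = {err:.1f} (full spectrum: {full:.1f})")
```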
Figure 3 shows the iPLS plot of the RMSECV values for each interval and the RMSECV of the full-spectrum model using eight latent variables. Interval 9 of the model with 10 intervals (iPLS10) produced the lowest RMSECV but did not produce an RMSEP lower than that of the full-spectrum PLS model. This model was affected by overfitting, which led to higher errors than those of the global model. This may be due to the lack of robustness of these models which, despite producing RMSECV values of the same order as the global model, did not contain enough information to build models with low prediction errors (Faber and Rajkó 2007). It is possible that the most important spectral information for the regression is not contiguous; in this case, the selection of a single range is insufficient and leads to increased prediction errors (Friedel et al. 2013). Moreover, calibration using the full spectrum may include non-informative spectral regions, making the model more vulnerable to noise. In this case, a judicious selection of spectral regions would improve the predictive ability of the PLS model (Lee et al. 2012).

Therefore, variable selection by siPLS was implemented to verify whether combining more than one interval would result in models with better predictive capacity. The siPLS algorithm splits the data set into a number of intervals (variable-wise) and calculates all possible PLS models combining two, three or more intervals. The RMSECV of each combination is then compared with the full-spectrum RMSECV. The spectrum was divided into 10, 25 or 50 intervals, combined in up to five subintervals. The best results were achieved when the spectrum was split into ten intervals and intervals 6, 7 and 10 were selected, as shown in Table IV. For this siPLS model, the results showed good correlation between reference and predicted values, indicated by a correlation coefficient of 0.994, as shown in Figure 4. The selected intervals included the regions 1960 to 2300 cm-1 (interval 6) and 1620 to 1960 cm-1 (interval 7). Both intervals correspond to overtone (harmonic) bands of the aromatic ring (Colthup et al. 1990). Interval 10 (600 to 939 cm-1) corresponds to the out-of-plane N-H bending vibration. Overall, the combination of intervals 6, 7 and 10 by the siPLS algorithm reduced the RMSECV and RMSEP values. It was therefore possible to find narrow regions for SMZ determination with small prediction errors, a reduced number of variables (525 compared with 1764 in the full-spectrum model) and fewer latent variables (4 LV compared with 8 LV in the full-spectrum model), resulting in a more robust model with better predictive power. Average prediction results and RMSEP for the selected siPLS calibration models are shown in Table V. The siPLS model using intervals 6, 7 and 10 also gave a low relative standard error of prediction (RSEP = 1.77%), suggesting that the method is accurate (Table V).
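siPLS differs from iPLS only in that every combination of a fixed number of intervals is modelled jointly. The sketch below repeats the assumptions of the iPLS illustration (NIPALS PLS1, contiguous-block k-fold CV, equal-sized intervals; it is not the iToolbox code) and uses synthetic data in which the analyte band in interval 1 fully overlaps an interferent whose isolated band lies in interval 3. No single interval can then resolve the analyte, but the pair (1, 3) can, which mirrors the argument that the relevant information need not be contiguous. The search is combinatorial: with 10 intervals and combinations of two to five intervals, 45 + 120 + 210 + 252 = 627 interval sets are modelled, whereas with 50 intervals the five-interval combinations alone number C(50, 5) = 2,118,760.

```python
import numpy as np
from itertools import combinations

def pls1_fit(X, y, n_lv):
    """NIPALS PLS1 on mean-centred data; returns (coefficients, x_mean, y_mean)."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xk, yk = X - x_mean, y - y_mean
    W, P, q = [], [], []
    for _ in range(n_lv):
        w = Xk.T @ yk
        w = w / np.linalg.norm(w)
        t = Xk @ w
        tt = t @ t
        p = Xk.T @ t / tt
        qa = (yk @ t) / tt
        Xk, yk = Xk - np.outer(t, p), yk - qa * t   # deflation
        W.append(w); P.append(p); q.append(qa)
    W, P = np.array(W).T, np.array(P).T
    return W @ np.linalg.solve(P.T @ W, np.array(q)), x_mean, y_mean

def rmsecv(X, y, max_lv, n_folds=5):
    """RMSECV for 1..max_lv latent variables (contiguous-block k-fold CV)."""
    idx = np.arange(len(y))
    press = np.zeros(max_lv)
    for test in np.array_split(idx, n_folds):
        train = np.setdiff1d(idx, test)
        for a in range(1, max_lv + 1):
            coef, xm, ym = pls1_fit(X[train], y[train], a)
            press[a - 1] += np.sum(((X[test] - xm) @ coef + ym - y[test]) ** 2)
    return np.sqrt(press / len(y))

def sipls(X, y, n_intervals, n_comb, max_lv=8, n_folds=5):
    """Model every combination of n_comb intervals; return the
    (intervals, LVs, RMSECV) triple with the lowest RMSECV."""
    blocks = np.array_split(np.arange(X.shape[1]), n_intervals)
    best = None
    for combo in combinations(range(n_intervals), n_comb):
        sel = np.concatenate([blocks[i] for i in combo])
        errs = rmsecv(X[:, sel], y, min(max_lv, len(sel)), n_folds)
        a = int(np.argmin(errs))
        if best is None or errs[a] < best[2]:
            best = (tuple(i + 1 for i in combo), a + 1, float(errs[a]))
    return best

# Synthetic example (hypothetical data): 60 variables split into 4 intervals of 15
rng = np.random.default_rng(1)
n = 40
y = rng.uniform(400, 900, n)                       # analyte, mg g-1
z = rng.uniform(0, 500, n)                         # interferent
var = np.arange(60)
b1 = np.exp(-0.5 * ((var - 7) / 2.0) ** 2)         # interval 1: analyte + interferent
b3 = np.exp(-0.5 * ((var - 37) / 2.0) ** 2)        # interval 3: interferent only
X = (np.outer(y + z, b1) + np.outer(z, b3)) / 1000 + rng.normal(0, 0.005, (n, 60))
best = sipls(X, y, n_intervals=4, n_comb=2, max_lv=3)
print(best)
```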
The errors calculated for the prediction samples showed random behavior, and this model had an insignificant systematic error (bias = 1.77 and t_sist < t_crit). For the subset of commercial samples, no significant trend was observed either (bias = 1.29 and t_sist < t_crit), which shows that the systematic error of the model may be considered insignificant.

Figure 5 shows the iPLS plot, with the RMSECV values for each interval (bars) and for the full-spectrum model (line) using four latent variables. Table VI shows the statistical indicators for the TMP iPLS calibration models with the spectrum subdivided into 10, 25 and 50 intervals. The models developed with the spectrum divided into 10 and 25 intervals selected similar regions (941 to 1280 and 1110 to 1250 cm-1), showing that this region is sufficient to build a model for drug quantification. For these regions, the RMSECV did not increase significantly compared with the global model, while the RMSEP and the number of variables were reduced. As in the previous case, siPLS was implemented to verify whether combining more than one interval would result in models with better predictive capacity.
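Interval numbers can be related to approximate wavenumbers if one assumes (a simplification of ours) that the 4000 to 600 cm-1 range is split into intervals of equal width numbered from high to low wavenumber. Under that assumption the 10-interval limits reproduce the ranges quoted in this work (interval 9: 1280 to 940 cm-1; intervals 6, 7 and 10: 2300 to 1960, 1960 to 1620 and 940 to 600 cm-1). Because iToolbox splits the data by number of variables rather than by wavenumber, limits for 25 or 50 intervals may differ somewhat from this approximation; the function is only a hypothetical reading aid.

```python
def interval_limits(k, n_intervals, wn_high=4000.0, wn_low=600.0):
    """Approximate (upper, lower) limits in cm-1 of interval k (1-based),
    assuming equal-width intervals numbered from wn_high downwards."""
    if not 1 <= k <= n_intervals:
        raise ValueError("interval number out of range")
    width = (wn_high - wn_low) / n_intervals
    upper = wn_high - (k - 1) * width
    return upper, upper - width

for k in (6, 7, 9, 10):
    print(k, interval_limits(k, 10))
# 6 (2300.0, 1960.0)
# 7 (1960.0, 1620.0)
# 9 (1280.0, 940.0)
# 10 (940.0, 600.0)
```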
TRIMETHOPRIM siPLS MODELS
The siPLS algorithm was implemented with the spectrum subdivided into 10, 25 or 50 intervals combined in up to five subintervals. Table VII shows the statistical indicators for the TMP siPLS calibration models. The lowest RMSEP value was obtained when the spectrum was split into 25 intervals and intervals 15 and 17 were combined. For this siPLS model, the results showed a good correlation between reference and predicted values, indicated by a correlation coefficient of 0.983, as shown in Figure 6. The siPLS model combining intervals 15 and 17 gave better predictive ability than the iPLS models and the full-spectrum PLS model. It was therefore possible to find narrow regions for TMP determination with small prediction errors and a reduced number of variables. Average prediction results, RMSEP and RSEP (%) for the selected siPLS calibration model are shown in Table V. This siPLS model combining two intervals resulted in low prediction errors (RSEP = 3.16%). The systematic error of the model was not significant: the errors calculated for the prediction samples showed random behavior (bias = 1.26 and t_sist < t_crit). For the subset of commercial samples, no significant trend was observed (bias = -0.09 and t_sist < t_crit), which shows that the systematic error of the model may be considered insignificant.

CONCLUSIONS
Using the PLS regression algorithm combined with DRIFTS data, it was possible to develop multivariate models for the simultaneous determination of SMZ and TMP in commercial pharmaceutical products. Assay results, expressed as percentage of the label claim, ranged from 95.8 to 103.9% for SMZ and from 95.7 to 106.4% for TMP. These results agree with the USP 30 requirements (93 to 107%) for the SMZ and TMP content of solid preparations. The variable selection techniques used in this work produced models with better predictive ability than the full-spectrum PLS models. The siPLS algorithm proved to be the most appropriate, combining the spectral regions containing the most relevant information for each analyte. The proposed method is simple and solvent-free, and has potential for the simultaneous, fast and reliable determination of SMZ and TMP in solid pharmaceutical dosage forms.

FULL
Figure 2 shows the SMZ and TMP spectra used for the preparation of the synthetic samples. These …

Figure 3 - Root mean square errors of cross-validation (RMSECV) for the full-spectrum model and the interval models (bars) for SMZ determination using the PLS and iPLS algorithms (the dotted line indicates the full-spectrum RMSECV; the numbers above the bars indicate the number of latent variables used in each model).

Figure 4 -Reference HPLC values versus predicted SMZ values for siPLS model using intervals 6, 7 and 10 and 4 latent variables.

Figure 5 - Root mean square errors of cross-validation (RMSECV) for the full-spectrum model and the interval models (bars) for TMP determination using the PLS and iPLS algorithms (the dotted line indicates the full-spectrum RMSECV; the numbers above the bars indicate the number of latent variables used in each model).
For the TMP siPLS model (correlation coefficient of 0.983, Figure 6), the selected intervals included the regions 2100 to 2230 cm-1 (interval 15) and 1830 to 1960 cm-1 (interval 17). Both intervals include overtone (harmonic) bands of the pyrimidine ring present in the TMP structure (Colthup et al. 1990).

Figure 6 - Reference HPLC values versus predicted TMP values for the siPLS model using intervals 15 and 17 and 6 latent variables.

TABLE I (continuation) mL
of each sample was added to a 50 mL volumetric flask and the volume was completed with the mobile phase. All determinations were performed in triplicate for the synthetic and commercial samples.

TABLE II Statistical results for the iPLS calibration models and the full-spectrum PLS model without pretreatment for SMZ.
b LVs: latent variables.

TABLE III Statistical results for the iPLS calibration models and the full-spectrum PLS model for SMZ.
a VN: total number of variables. b LVs: latent variables.

SULPHAMETHOXAZOLE siPLS MODELS

TABLE IV Statistical results for the siPLS calibration models and the full-spectrum PLS model for SMZ.
a VN: total number of variables.b LVs: latent variables.*selected model.

TABLE VI Statistical results for the iPLS calibration models and the full-spectrum PLS model for TMP.
Columns: Model; VN a; Intervals; LVs b; RMSECV TMP (mg g-1); R2 cal; RMSEP TMP (mg g-1).
a VN: total number of variables.b LVs: latent variables.

TABLE VII Statistical results for the siPLS calibration models and the full-spectrum PLS model for TMP.
a VN: total number of variables. b LVs: latent variables. *selected model.