Multivariate Classification Based on Chemical and Stable Isotopic Profiles in Sourcing the Origin of Marijuana Samples Seized in Brazil

Laboratório de Caracterização Química e Isotópica, Centro de Química e Meio Ambiente, Instituto de Pesquisas Energéticas e Nucleares, IPEN/CNEN-SP, Av. Lineu Prestes 2242, Cidade Universitária, 05508-970 São Paulo-SP, Brazil
Laboratório de Referências Metrológicas, Instituto de Pesquisas Tecnológicas do Estado de São Paulo, IPT-SP, Av. Prof. Almeida Prado, 532, Cidade Universitária, 05508-901 São Paulo-SP, Brazil


Introduction
Nowadays, the consumption and production of drugs involve almost all countries and move about US$ 500 billion per year around the world. According to the 2004 United Nations Office on Drugs and Crime (UNODC) report, 3% of the world population (185 million people), or 4.7% of the population aged 15 to 64, use illicit drugs. 1 Marijuana (Cannabis) reaches about 150 million users, with a production of 30,000 to 32,000 tons a year. Amphetamine, cocaine and heroin users number about 30, 13 and 9 million people, respectively. 1 South America is one of the major producing, trafficking and consuming regions for illicit drugs. Originally, Brazil was regarded as a classical transit country for cocaine produced in Colombia and Bolivia. However, in recent decades this picture has changed and Brazil has become a significant producer of narcotics, moving about 10 billion dollars a year. 1 The efforts to combat drug trafficking, allied to the rise in production, place Brazil fourth in quantity of marijuana seized, while Mexico, the USA and Nigeria lead the ranking. 1 As a consequence, the country has been gathering all the features related to this problem: production, traffic, consumption, violence and money laundering. 2
Despite this huge and increasing problem, detailed studies regarding the production, dealing and consumption of this drug in Brazil are still scarce. Official data provided by the Brazilian Federal Police Department (DPF) 3 suggest that the main producing zones in the country are located in the Northeastern region, more specifically in the semi-arid zone between Pernambuco and Bahia States, known as the Marijuana Polygon, and in the Midwestern region, along the Brazil-Paraguay border. However, the intensive and sustained field raids promoted by the Brazilian Federal Police between 1999 and 2003 in the Northeastern Region eradicated about 10 million Cannabis plants, reducing local productivity and forcing the producers to migrate to other States, mainly those located in the country's North region, such as Pará and Maranhão, which could change the traffic routes throughout the country. 3,4 Nowadays, the seizures made by the Brazilian Police in the main consumption centers, such as São Paulo city, suggest that most of these drugs come from Paraguay via the Mato Grosso do Sul route, or from producing regions located along the Brazil-Paraguay border. 5 Furthermore, recent information also indicates rising traffic between Paraguay and the Northeastern Region, mainly Ceará State.
Shibuya et al. 6,7 used carbon and nitrogen stable isotopes to investigate the climatic conditions of marijuana growth for samples seized in different Brazilian regions. Some samples from Ceará and Pernambuco appear to have been cultivated in humid regions, which is inconsistent with the local climate unless highly irrigated areas with soil management practices exist. 7 According to the authors, although these results point to the existence of traffic routes between the Northeastern and Midwestern regions, 7 additional investigations would be necessary for conclusive results.
The use of chemical fingerprints has grown as a way to set a classification pattern that can match drug samples to their geographical origin. This fingerprint can be established using the samples' organic, 8-11 inorganic 6,12-19 or isotopic 20-24 profiles, which are related to the plant growth conditions, such as climate and the availability of elements in the soil. Although each of these strategies provides very interesting and powerful results, a set of variables that are independent of each other can provide complementary information, which could make it a distinctive feature in sourcing their origins. 19,25 Generally, the inorganic profile of a plant reflects the elemental availability in the soil. This feature allows plant samples to be used as biomonitors of environmental pollution, since they can act as element accumulators, mainly for toxic metals. 26,27 Many articles in the literature show the feasibility of using these chemical profiles to source the origin of agricultural materials such as wine, 18 wheat, 19 tea 17,28 and barley; 25 however, data related to Cannabis samples are still scarce. 3,11,13 Coffman and Gentner showed that Cannabis plants cultivated in acidic soils present higher absorption of Mn, Fe and Zn. 8 Later, the same authors evaluated the chemical composition (nutrient and cannabinoid levels) of plants cultivated using mineral fertilizers. 9 Although some correlations were noted, the data gathered did not prove conclusive for tracking the geographical origin of these samples, owing to the large number of related parameters. 9
The nutrient profile determined by atomic absorption spectrometry, associated with linear discriminant analysis, was used by Landi 14 to classify samples from different regions of Italy, previously separated into inflorescences and leaves. Although 100% of the inflorescence samples were successfully classified, according to the author the methodology did not prove viable for the analysis of real samples, owing to the complexity of the problem. 14 The use of inductively coupled plasma mass spectrometry (ICP-MS), a more sensitive technique, allowed Watling 16 to analyze non-essential elements and micronutrients that are present at lower concentrations in plants. About 45 elements were measured qualitatively in samples from Australia, and the most relevant ones for discriminating the different regions of the country were the rare earth elements (REE), Au, Th, U and W. Some hydroponic Cannabis samples were also analyzed, and their results were very consistent, with low dispersion. 16][31][32] Although the samples must have been grown under very distinct climatic conditions for the differences to be significant, 6,24 this correlation makes them useful for delimiting potential producing areas, and the number of works that use these parameters to source the origin of illicit drugs such as heroin, 20 cocaine 21,22 and Cannabis 24,33,34 has grown in recent years. Shibuya et al. 6 demonstrated the potential of isotope ratio mass spectrometry (IRMS) for establishing the potential regions of Cannabis production in Brazil, clearly separating samples from humid and dry regions. However, this technique was not satisfactory in identifying the origin of some samples from areas far from each other that present similar climatic conditions; as a result, the model shows some overlap between the Midwestern and Amazon groups. 6,7 According to the authors, in this case additional information, such as the inorganic profiles, could be used to achieve better discrimination. 6,7
The aim of this work was to verify the existing differences in the elemental composition of samples seized in the main Brazilian regions of marijuana production, previously separated by their stable isotope results, and to use these differences to separate samples according to their origins. Sector field inductively coupled plasma mass spectrometry (HR-ICP-MS) was the analytical technique applied, and the data analysis was performed using hierarchical cluster analysis and linear discriminant analysis. This work was based on samples seized in the street and was performed in collaboration with the São Paulo State Police Department.

Samples
All samples analyzed in this work were seized by State law enforcement officers in the three Brazilian regions that present the highest levels of seizures and eradications (see Table 1) and forwarded to the Instituto de Pesquisas Energéticas e Nucleares (IPEN) by the Institutes of Criminalistics (IC) of each State. With the exception of 5 samples from Mato Grosso do Sul seized in 2004, all seizures were made between 1999 and 2002. In previous works, 6,7 the carbon and nitrogen stable isotopic compositions of these samples were measured, and the samples were clustered according to their climatic growth conditions (see Table 1).

Methodology
The measurements of inorganic constituents were carried out using a sector field inductively coupled plasma mass spectrometer (HR-ICP-MS, Element 1, Finnigan MAT) with the following configuration: CETAC 500 AX autosampler, nickel skimmer cones, Scott spray chamber and Meinhard nebulizer. The wet digestion was performed using a closed-vessel microwave oven (MARS 5, CEM Co., model HP-500), a system with temperature and pressure controls.
Initially 39 elements were analyzed (Ag, Al, Au, Ba, Be, Bi, Cd, Ce, Co, Cr, Er, Eu, Fe, Ga, Hf, Ho, In, La, Li, Lu, Mn, Mo, Nd, Ni, Pb, Pr, Rb, Sb, Sc, Sn, Sm, Sr, Th, Tl, Tm, U, V, Yb, Zn), but some of them (such as the heavy rare earth elements, Ag, Cd, Au, In and Sc) were below the quantification limits, and the methodology was reduced to 19 elements. The measured isotopes were ²⁷Al, ⁵⁹Co, ⁸⁵Rb, ⁸⁷Sr, ⁸⁹Y, ⁹⁵Mo, ¹³⁸Ba, ¹³⁹La, ¹⁴⁰Ce, ¹⁴¹Pr, ¹⁴³Nd, ²⁰⁸Pb, ²³²Th and ²³⁸U in low resolution (m/Δm = 300) and ⁶⁹Ga, ⁶⁵Cu, ⁶⁶Zn, ⁵⁶Fe and ⁵⁵Mn in medium resolution (m/Δm = 3000), with indium and scandium as internal standards, respectively. The accuracy and precision of this methodology were evaluated using the standard reference material NIST SRM 1547 (peach leaves), which was analyzed in hexaplicate. For Sr, In and Ba, the isobaric interferences were corrected using mathematical equations available in the Element software. No significant interferences were observed for ⁵⁹Co in low resolution. The HR-ICP-MS experimental parameters are listed in Table 2.

Sample preparation
The samples were cleaned in a sonicator for about 30 minutes in de-ionized water (twice), dried at 40 °C for about 24 hours, and ground in an electric mill with a ceramic mortar and pestle.

Data analysis
The assessment of the results and the building of a model to classify unknown samples relied on hierarchical cluster analysis (HCA) and linear discriminant analysis (LDA), using the SPSS (Statistical Package for the Social Sciences) program, version 10.0.5.
HCA aims to identify relatively homogeneous groups of objects (cases or variables) based on selected characteristics, 36 using an algorithm that starts with each sample or variable in a separate cluster and combines clusters until only one is left. Distance (dissimilarity) or similarity measures are computed for each pair of samples or variables, so that each object is very similar to the others in the same cluster with respect to the predetermined selection criteria. The results are presented in dendrograms, diagrams that display the distance or similarity between groups and provide a visual means of estimating relationships among multidimensional points. Ward's method, which was applied in this work, uses an analysis of variance approach to evaluate the distances between clusters and has been reported to outperform the other hierarchical methods. 37
LDA identifies a linear combination of quantitative predictor variables that best characterizes the differences among known groups (the calibration set). 36 Linear discriminant analysis enhances the separation of the groups, allows unknown samples to be classified, and lists the group to which each case most likely belongs, together with the probability of belonging to that group. 36 The enter-independents-together method builds the model using all available variables simultaneously, while the stepwise method selects the most relevant ones, reducing the data set. The resulting linear combinations are called canonical variables. The first canonical variable accounts for a large proportion of the variability within the original data, and the plot is defined so that the most significant differences among the groups are displayed along the horizontal axis. The second canonical variable represents the maximum dispersion in a direction perpendicular to the first one, and so forth. 36
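The agglomerative procedure described above can be sketched in a few lines of Python. This is a minimal illustration of Ward's criterion, not the SPSS implementation used in this work: the function name `ward_clusters`, the stopping rule at `k` clusters and the two-variable data points are hypothetical, chosen only to make the merge rule visible. At each step, the pair of clusters whose fusion gives the smallest increase in the within-cluster sum of squares is merged.

```python
from itertools import combinations


def ward_clusters(points, k=1):
    """Merge clusters by Ward's criterion until only k clusters remain.

    Returns (groups, history): groups is a sorted list of member-index lists,
    history holds one (members_a, members_b, cost) tuple per merge.
    """
    # Each cluster keeps its member indices and its centroid.
    clusters = {i: ([i], list(p)) for i, p in enumerate(points)}
    history = []
    next_id = len(points)
    while len(clusters) > k:
        best = None
        for a, b in combinations(clusters, 2):
            (ma, ca), (mb, cb) = clusters[a], clusters[b]
            d2 = sum((x - y) ** 2 for x, y in zip(ca, cb))
            # Increase in within-cluster sum of squares if a and b are merged.
            cost = len(ma) * len(mb) / (len(ma) + len(mb)) * d2
            if best is None or cost < best[0]:
                best = (cost, a, b)
        cost, a, b = best
        ma, ca = clusters.pop(a)
        mb, cb = clusters.pop(b)
        n = len(ma) + len(mb)
        # Centroid of the merged cluster is the size-weighted mean.
        centroid = [(len(ma) * x + len(mb) * y) / n for x, y in zip(ca, cb)]
        clusters[next_id] = (ma + mb, centroid)
        history.append((sorted(ma), sorted(mb), cost))
        next_id += 1
    groups = sorted(sorted(members) for members, _ in clusters.values())
    return groups, history


# Hypothetical, already standardized two-variable profiles for seven samples.
samples = [(0.0, 0.1), (0.2, 0.0), (0.1, 0.2),
           (5.0, 5.1), (5.2, 4.9),
           (9.0, 0.1), (9.2, 0.0)]
groups, history = ward_clusters(samples, k=3)
print(groups)
```

The merge costs stored in `history` are the heights at which branches would join in a dendrogram. With real concentration data the variables would usually be standardized first so that abundant elements do not dominate the distances; whether and how this was done is not stated here.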

Previous results
The isotopic profiles of samples from dry (Pernambuco, Bahia and Ceará) and humid regions (Maranhão, Pará and Mato Grosso do Sul) presented by Shibuya et al. 6,7 were clearly different, indicating that these variables can be used to obtain information regarding the climate of the producing zones. The high dispersion of δ¹³C for the Maranhão group could be explained by the peculiar geographical situation of the State: the climate and vegetation of its western expanse are similar to those of the Northern region (the Amazon region), whereas its eastern area presents a semi-arid climate. 6 Despite the existing information about highly irrigated Cannabis cultures in Pernambuco and Bahia, no overlap was observed between these groups and samples from more humid regions. On the other hand, the high dispersion observed for δ¹⁵N, especially for samples from Pernambuco, indicates Cannabis cultivation across the State, from the semi-arid region to areas around the São Francisco River Basin and the woodlands (Zona da Mata), as well as the use of fertilizers. 6,7 This information, obtained through the IR-MS technique, was used to group the samples and define the calibration set, as per Table 1. Four samples from Pernambuco State presented δ¹⁵N values above 8‰, which may be associated with the use of organic fertilizers (such as animal manure), and nine samples from Ceará seem to originate from the Paraguay-Mato Grosso do Sul route. 6,7
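For readers less familiar with the notation, the δ values quoted here are assumed to follow the conventional definition used in isotope ratio mass spectrometry, which this section uses without stating. R denotes the ratio of the heavy to the light isotope, and the reference materials are assumed to be the usual ones (VPDB for carbon and atmospheric N2 for nitrogen), since the standards are not named in this section:

```latex
\delta^{13}\mathrm{C} = \left( \frac{R_{\mathrm{sample}}}{R_{\mathrm{standard}}} - 1 \right) \times 1000, \qquad R = \frac{^{13}\mathrm{C}}{^{12}\mathrm{C}}

\delta^{15}\mathrm{N} = \left( \frac{R_{\mathrm{sample}}}{R_{\mathrm{standard}}} - 1 \right) \times 1000, \qquad R = \frac{^{15}\mathrm{N}}{^{14}\mathrm{N}}
```

Under this definition, a δ¹⁵N value above 8‰ means that the ¹⁵N/¹⁴N ratio of the sample is more than 0.8% higher than that of the reference.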

Results and Discussion
The results for NIST SRM 1547 indicated good accuracy, with a recovery better than 90% for the certified elements. Quantification limits, defined as 10σ (n = 8), ranged from 0.02 ng g⁻¹ (for U) to 12 ng g⁻¹ (for Zn).
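The two figures of merit quoted above reduce to simple arithmetic. The sketch below assumes that recovery is the mean of the replicate results divided by the certified value, and that σ in the 10σ criterion is the sample standard deviation of repeated blank readings. The function names and all numbers are hypothetical and are not data from this work.

```python
from statistics import mean, stdev


def recovery_percent(replicates, certified_value):
    """Mean measured value as a percentage of the certified value."""
    return 100.0 * mean(replicates) / certified_value


def quantification_limit(blank_readings, k=10.0):
    """k times the sample standard deviation of the blank readings."""
    return k * stdev(blank_readings)


# Hypothetical hexaplicate results for one element certified at 100.0 (same units).
replicates = [95.0, 97.0, 96.0, 94.0, 98.0, 96.0]
# Hypothetical blank readings (n = 8), in ng/g.
blanks = [0.10, 0.12, 0.11, 0.09, 0.10, 0.11, 0.12, 0.09]

print(f"recovery: {recovery_percent(replicates, 100.0):.1f}%")
print(f"quantification limit: {quantification_limit(blanks):.2f} ng/g")
```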
The results for all elements are listed in Table 3 together with data from the literature; by comparison, the Brazilian samples presented high Fe, Sr and Ba levels and low concentrations of Mo and Cu. 8,9 It can be observed that nutrient contents in samples from Region 1 (Mato Grosso do Sul) were high, mainly for Cu and Fe. Aluminium, Ga and Ba levels were also higher than those of the other groups (Table 3). Most soils of this region are of volcanic origin (magmatic rocks, most of them tholeiitic basalt) and are naturally acidic, with excess Al. Iron and Mn are also abundant elements in basalt-derived soils such as these, and Fe can reach toxic levels depending on the soil pH (pH < 5.0). 38 In comparison with the literature data, Cu and Zn concentrations for these samples were similar to those obtained by Landi, 14 although lower than those presented by Coffman and Gentner. 9 It is well known that Zn deficiency is the most generalized and critical one in those regions and is probably related to the composition of the parent material. 38 The Mo value in Cannabis samples was also lower than those available in the literature, which may point to a deficiency of this element in the soil. 38
Samples seized in Region 2 (Marijuana Polygon) presented high levels of U, Th, Pb, Mn and rare earth elements (REE), mainly lanthanum and cerium. The soils of that region are largely formed from granitic rocks and granulites, 39 which are naturally enriched in lanthanides (in some cases extremely enriched). 40,41 The levels of the nutrients Zn and Mo are similar to those observed for Regions 1 and 3, with the Zn level below the values described by Coffman and Gentner 9 (Table 3), whereas the Mn average was above those of the other groups. According to Horowitz and Dantas, Mn concentrations are high in the Northeastern dry backcountry (Sertão), probably as a consequence of low precipitation, low organic matter content and relatively low pH. 40 Copper, nevertheless, was considerably lower than in Region 1. The deficiency of that element is well known, notably because its absorption by plants is inhibited by the high Fe concentrations common in that region. 42,46 Leon finds that the soils in the dry backcountry of the Northeast (Sertão) apparently do not present Mn deficiencies, nor do those of the semi-arid region (Agreste) appear to present Zn and Mo deficiencies. 43 The variability of those elements in the Northeastern samples was large and, in accordance with the results obtained from the stable isotopic profiles, may indicate that they were produced not only in the Pernambuco dry backcountry (Sertão) but also in other locations with less nutrient deficiency, such as the semi-arid region (Agreste) and the woodlands (Zona da Mata).
The concentrations of the nutrients Cu, Mo and Zn obtained for Region 3 (Amazon) samples were similar to the levels reported for Region 2. The samples seized in Region 3 also showed low concentrations of cobalt (below 100 ng g⁻¹), whereas the average for this element in samples from the other regions was above 600 ng g⁻¹ (Table 3). The low concentrations of practically all elements in samples from the Amazon Region may be explained by the intense leaching of those soils. 44 Despite the region's great geological diversity, 45 the climatic conditions strongly affect the soil's physical and chemical properties, 46 hence the similarity in the chemical profiles of this group's samples.
As expected, the relationships among the elements for each region, evaluated using hierarchical cluster analysis, show a strong correlation among the lanthanides and also among Al, Mn and Fe (Figure 1). In the Amazon Region, Co, Cu and Zn appear correlated with the lanthanides, while in the Northeastern group these nutrients appear related to Rb, Sr, Ba and Ga. In Region 1 (MS), Zn and Cu appear independent of the other elements, while Co is related to U (Figure 1).
Hierarchical cluster analysis using Ward's method, based on the inorganic profile, was tested to evaluate the discriminatory capability of these variables. This analysis provided three different clusters, as follows.
The first one was composed of samples from humid and dry regions. This group is subdivided into two branches, the first composed of samples from MS (6), PA (18), MA (10) and the Marijuana Polygon (16), and the second formed by samples from the Marijuana Polygon (15), with 2 samples from MA and one from PA.
The second group was formed basically by samples from MS (23), and included samples from PA (1), BA (1) and PE (1). It is very interesting to notice that the 9 samples from Ceará State that were previously considered outliers on the basis of their stable isotopic profiles (Table 1) clearly belong to this group, presenting an elemental profile similar to that of the samples seized in Mato Grosso do Sul. The last group was formed basically by samples from the Marijuana Polygon, including the remaining ones from BA, PE and CE, except for one sample from MS.
Evaluating the chemical profile of each cluster, it can be noted that the first cluster gathers samples with low concentrations of almost all elements, while the last one presents high levels of lanthanides, U, Th, Mn, Al, Cu and Pb. The second cluster presented high levels of Ba, Sr, Fe and Mo. However, despite this separation, the classification with respect to origin was not satisfactory.
Using the full set of parameters (inorganic and isotopic profiles) improves this separation, eliminating the previously observed overlap, although samples from the Marijuana Polygon were divided into two distinct groups, as were those from Mato Grosso do Sul. This analysis provides four different clusters, as follows: i) samples from the Northeastern Region (9 from BA, 18 from PE and 4 from CE), including the 7 samples from MA with an isotopic profile typical of dry regions (see Table 1) and one sample from PA; ii) samples from humid regions, subdivided into Amazon (19 from PA and 5 from MA) and Mato Grosso do Sul (18), including the 9 samples from CE that probably originated from Paraguay, 3 from PE and 1 from BA; iii) the remaining samples from MS; iv) the rest of the samples from the Marijuana Polygon, including the remaining ones from CE. It is interesting to notice that 2 samples from Pernambuco that presented isotopic profiles different from the rest of the group (see Table 1) are clustered together with samples from Mato Grosso do Sul. The discrimination between clusters 1 and 4 (both formed by samples from the Marijuana Polygon) is related mainly to the levels of lanthanides, Y, Th, U, Co, Rb and Al, as can be seen in Figure 2a. The group from MS in cluster 2, which seems to be different from the other samples from this location (cluster 3), differs in the levels of Fe, Al, Co, Cu, Zn, Ga and also lanthanides (see Figure 2b).
Despite the indications that Cannabis is cultivated in different locations within the producing States, or under different cultivation conditions, such as the use of mineral fertilizers that could alter the geochemical properties of the soil, these exploratory analyses in general seem to confirm the previous groupings obtained from carbon and nitrogen isotopic profiles. Samples from the different Northeastern States cannot be separated from each other; however, the elemental compositions appear to be very useful for improving the separation between the Amazon Region and Mato Grosso do Sul groups, which present similar climatic conditions and could not be clearly separated on the basis of isotopic profiles alone.
The pre-evaluation of the data using stable isotopes is important, since temperature and water availability have a significant influence on the soil's physical and chemical properties, which directly affect the chemical reactions occurring therein, defining not only its pH but also the availability of nutrient elements and the predominant ions in the soil solution. 46 Generally speaking, soils formed in tropical climates present high weathering rates compared with those under temperate climates. 46 The more humid and warmer the climate, the stronger the mineral leaching, since heavy rainfall leaches basic ions such as calcium and magnesium and replaces them with acidic ions such as hydrogen and aluminum, as is the case in the Amazon Region. 47 In contrast, soils from arid regions tend to become alkaline, since rainfall is not heavy enough to leach basic ions. 46
Once the exploratory analysis and the establishment of the calibration set were completed, a classification model based on linear discriminant analysis (LDA) was proposed. This evaluation aims to determine the most relevant parameters and to establish a methodology that provides the probability of each sample belonging to each group. Since this technique must be based on samples of known origin, and the inclusion of samples with an unclear source could mislead the development of the model, the previously identified isotopic outliers (see Table 1) were not included in the calibration set.
Using the full set of variables (enter-independents-together method), 100% of the calibration set was assigned to the correct group. The stepwise selection of variables also allowed a 100% successful classification using 10 of the 21 available variables. Cross-validation, the procedure in which each case is classified according to the classification functions computed from all the data except the case being classified, 36 gave similar results (97.8% of cases correctly classified overall), with two samples seized in Region 2 (Marijuana Polygon) classified as originating from Region 1, and one sample from Region 3 (Amazon) classified as originating from Region 2 (the same one placed in cluster 1 in the cluster analysis using the full set of variables).
The most relevant parameters in this discrimination were δ¹³C, δ¹⁵N, Ba and Mn in function 1 (F1) and δ¹³C, Cu, Co, La, Zn, Y and Fe in function 2 (F2) (Table 4). According to these results, the climatic conditions are the most relevant features in defining the differences between samples from the studied regions, followed by the availability of micro- and macronutrients, as well as rare earth elements, in the soil.
The territorial map and the samples from Regions 1, 2 and 3 in the canonical variable plot can be seen in Figure 3. This figure shows that the discrimination between Regions 1 and 2 is basically related to F1, a parameter associated with climatic conditions; on the other hand, Region 3 samples can be distinguished from the rest by their lower F2 (mostly associated with δ¹³C in addition to the levels of nutrient elements).
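The classification and leave-one-out cross-validation steps described above can be illustrated with a compact linear discriminant sketch. It is a simplified stand-in for the SPSS procedure, under stated assumptions: only two predictor variables, a pooled within-group covariance matrix, prior probabilities proportional to group size, and no stepwise selection. The function names and the small data set (a δ¹³C-like value and a Ba-like concentration for three regions) are hypothetical.

```python
from collections import defaultdict
from math import log


def fit_lda(X, y):
    """Linear classification functions from a pooled 2x2 covariance matrix."""
    groups = defaultdict(list)
    for row, label in zip(X, y):
        groups[label].append(row)
    n, k = len(X), len(groups)
    means = {g: [sum(col) / len(rows) for col in zip(*rows)]
             for g, rows in groups.items()}
    # Pooled within-group covariance (this sketch handles two variables only).
    s = [[0.0, 0.0], [0.0, 0.0]]
    for g, rows in groups.items():
        for r in rows:
            d = [r[0] - means[g][0], r[1] - means[g][1]]
            for i in range(2):
                for j in range(2):
                    s[i][j] += d[i] * d[j] / (n - k)
    det = s[0][0] * s[1][1] - s[0][1] * s[1][0]
    inv = [[s[1][1] / det, -s[0][1] / det],
           [-s[1][0] / det, s[0][0] / det]]
    model = {}
    for g, m in means.items():
        w = [inv[0][0] * m[0] + inv[0][1] * m[1],
             inv[1][0] * m[0] + inv[1][1] * m[1]]
        # Constant term includes the log prior (assumed proportional to size).
        b = -0.5 * (w[0] * m[0] + w[1] * m[1]) + log(len(groups[g]) / n)
        model[g] = (w, b)
    return model


def predict(model, x):
    """Assign x to the group with the highest classification score."""
    return max(model, key=lambda g: model[g][0][0] * x[0]
               + model[g][0][1] * x[1] + model[g][1])


def leave_one_out_accuracy(X, y):
    """Classify each case with a model fitted to all the other cases."""
    hits = 0
    for i in range(len(X)):
        model = fit_lda(X[:i] + X[i + 1:], y[:i] + y[i + 1:])
        hits += predict(model, X[i]) == y[i]
    return hits / len(X)


# Hypothetical calibration set: (delta13C-like value, Ba-like concentration).
X = [(-29.0, 30.0), (-28.5, 34.0), (-29.5, 32.0), (-28.8, 28.0),
     (-25.0, 10.0), (-24.5, 12.0), (-25.5, 8.0), (-24.8, 11.0),
     (-29.2, 5.0), (-28.6, 7.0), (-29.4, 4.0), (-28.9, 6.0)]
y = ["Region 1"] * 4 + ["Region 2"] * 4 + ["Region 3"] * 4

print(leave_one_out_accuracy(X, y))
print(predict(fit_lda(X, y), (-25.1, 9.0)))
```

Refitting the model for every left-out case is what makes the cross-validated rate a more conservative estimate than the rate obtained on the calibration set itself.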

Conclusions
This work demonstrated that high-performance analytical techniques combined with chemometric methods represent a powerful tool for tracking the geographic origin of marijuana samples. Although the elemental profile alone is not accurate enough to source the origin of marijuana samples, the results clearly show the existence of three, or even four, different sites of Cannabis production with respect to soil geochemical properties. The IR-MS technique proved particularly useful in a preliminary evaluation of the samples, by clearly separating materials from humid and dry regions. Furthermore, these parameters were indispensable for the correct classification of the groups by cluster and linear discriminant analysis, which was achieved successfully using HR-ICP-MS and IR-MS data jointly.
While hierarchical cluster analysis is a very useful tool in exploratory analysis, linear discriminant analysis was extremely helpful in summarizing many variables in a few factors and in determining the most important parameters for separating these groups, and it proved fundamental to the evaluation of the results. Even so, obtaining reliable results depends on a full understanding of the problem under consideration and of all the variables involved.
The sampling strategy adopted yielded highly significant results, mainly from the point of view of forensic science. The results will be helpful in establishing chemical fingerprints for marijuana traded in Brazil, and this model could be useful in the future for classifying unknown samples. Good agreement was observed between the results obtained and the geological characteristics of the regions under study, such as low rare earth element levels for Regions 1 and 3, in contrast with the samples seized in the Polygon region. This work demonstrated that samples seized daily by law enforcement officers could be used to create a national databank of production sites, which will be very helpful in tracking the distribution routes of the drug throughout the country.

Figure 1. Dendrograms showing groupings of variables for Mato Grosso do Sul (a), Northeastern (b) and Amazon (c) samples by Ward's method.

Figure 2. Mean concentration values for (a) clusters 1 and 4 (Marijuana Polygon) and (b) clusters 2 and 3 (Amazon Region plus Mato Grosso do Sul, and Mato Grosso do Sul, respectively). The data were normalized by the mean value over the full set of samples.

Table 1. Main Brazilian producer regions, total of samples analyzed and groupings according to their IR-MS results

Table 2. HR-ICP-MS instrumental settings and data acquisition parameters

Table 3. Range of concentration for Cannabis samples from three main Brazilian producing zones and data from the literature. Mean values are presented in parentheses

Table 4. Pooled within-groups correlations between discriminating variables and standardized canonical discriminant functions. The variables are ordered by absolute size of correlation within function