The Violacein Biosynthesis Monitored by Multi-Wavelength Fluorescence Spectroscopy and by the PARAFAC Method

Obtaining information about a biosynthetic pathway is a complex and laborious procedure. In this context, the present work presents a new approach for the initial analysis of the biosynthesis of fluorescent natural products, using violacein biosynthesis as an example. To this end, a culture of Chromobacterium violaceum was inoculated into a bioreactor, from which aliquots were withdrawn every 2 h for subsequent analysis by 2D fluorescence spectroscopy. The excitation-emission matrices obtained showed the dynamic behavior of the signals of fluorophores that are consumed and produced by the bacterium. These signals were resolved by the PARAFAC method (parallel factor analysis), giving a total of six chemical species. Tryptophan and violacein were identified by spectral comparison. Identification of the other fluorophores proved to be the critical step, owing to the lack of a database of fluorescent natural products for comparison. Finally, this methodology shows great potential for providing information on the biosynthesis of natural products.


Introduction
Natural products have had historical success as biologically active structures. Among these, the violet pigment violacein, produced mainly by bacteria of the genus Chromobacterium, has attracted increased interest owing to its important biological activities and its pharmacological and industrial potential. 1,2 Violacein (Figure 1) is a secondary metabolite with a molar mass of 343.3 g mol⁻¹ and is constituted of 5-hydroxyindole, 2-pyrrolidone and 2-oxindole moieties that show strong absorption in the visible region due to resonance. 3 Studies of violacein biosynthesis began in 1934 with Tobie, 4 who observed that oxygenation of a Chromobacterium violaceum culture greatly reduced the time required for maximum pigment production. Twenty-five years later, DeMoss and Evans 5,6 discovered that the bacteria needed molecular oxygen to synthesize violacein and that L-tryptophan was incorporated into violacein, except for the carboxylic carbon, which was eliminated by decarboxylation during the biosynthesis. From 1987 to 1990, Hoshino et al. 7-9 demonstrated that (i) the carbon skeleton of the pyrrolidone moiety was built up by the condensation of the side chains of two L-tryptophan molecules, accompanied by the 1,2-shift of the indole ring on one molecule of tryptophan; 7 (ii) all carbon, nitrogen and hydrogen atoms in the pyrrolidone moiety were provided exclusively by L-tryptophan, while the oxygen atoms came from molecular oxygen; 8 and (iii) 5-hydroxy-L-tryptophan was an intermediate in violacein biosynthesis, since the hydroxylation of tryptophan was the first step of the pathway. 9 Up to that point, violacein biosynthesis had been evaluated only by radiolabel incorporation experiments.
In 1991, Pemberton et al. 10 began a study at the genetic level by isolating a 14.5 kb fragment containing the genes encoding violacein biosynthesis from C. violaceum. From 1993 to 2000, in a series of works, Hoshino et al. 11-14 discovered that (i) chromopyrrolic acid (CPA) was produced independently of violacein biosynthesis; 11 (ii) proviolacein, prodeoxyviolacein and pseudoviolacein were plausible biosynthetic intermediates of violacein, and oxygenation at the 2-position of the indole ring occurred in a final step of the biosynthesis; 12 (iii) oxygenases were involved and the cofactor NADPH was needed for the biosynthesis of violacein; 13 and (iv) the 1,2-shift of the indole ring occurred through an intramolecular process during the formation of the left part (5-hydroxyindole side) of the violacein skeleton. 14 In 2000, August et al. 15 showed that the violacein biosynthetic gene cluster comprised four genes, VioABCD. The products of VioA, VioC and VioD were nucleotide-dependent monooxygenases. Disruption of VioA or VioB completely abrogated the biosynthesis of violacein intermediates, while disruption of VioC or VioD resulted in the production of violacein precursors. Collecting all the information obtained so far, August et al. 15 proposed a hypothetical route for violacein biosynthesis from L-tryptophan. This pathway consisted of: (i) modification of one L-tryptophan by VioD, forming 5-hydroxy-L-tryptophan, and oxidative deamination of the other by VioA, producing indole-3-pyruvic acid (IPA); (ii) decarboxylative fusion of these two tryptophan-derived units and a 1,2-shift of the 5-hydroxyindole ring by VioB, resulting in prodeoxyviolacein; and (iii) oxygenation of the latter by VioC, yielding violacein.
In 2005, Howard-Jones and Walsh 16 reported a similarity between violacein biosynthesis and indolocarbazole biosynthesis. Both pathways include a decarboxylative fusion of two tryptophan-derived units. Additionally, the authors found 34% identity between VioB and RebD (a heme protein required for the formation of CPA and involved in the biosynthesis of the indolocarbazoles rebeccamycin and staurosporine). Based on the similarities between these two pathways, Sánchez et al. 17 proposed in 2006 the construction of hybrid pathways for the production of oxygenated indolocarbazoles by coexpression of the genes. The hybrid pathways did not yield oxygenated indolocarbazoles; however, their results provided new information on violacein biosynthesis. First, a pair of genes (VioAB), responsible for the earliest steps in violacein biosynthesis, was equivalent to the homologous pair in the indolocarbazole pathway (RebOD), directing the formation of CPA. Second, in addition to VioABCD, a fifth gene (VioE) was essential for violacein biosynthesis. Third, CPA was not an intermediate of this biosynthesis. However, its formation was not independent of violacein biosynthesis, as it was produced by VioAB in the absence of VioE. Fourth, both the VioC and VioD oxygenases appeared to act on the later steps of violacein biosynthesis. The last two results contradicted those reported by Hoshino et al. 9,11,14 and, consequently, the mechanism proposed by August et al. 15
In 2006, Balibar and Walsh 18 reconstituted the entire violacein pathway in vitro. Their results demonstrated that: (i) L-tryptophan, and not 5-hydroxy-L-tryptophan, was the sole precursor of violacein; (ii) VioB was responsible for the oxidative coupling of two molecules of IPA imine generated by VioA to form the pyrrole core; (iii) the fifth gene, VioE, was responsible for the 1,2-shift of the indole ring that resulted in the formation of prodeoxyviolacein rather than CPA; and (iv) formation of violacein then proceeded by the sequential action of VioD and VioC. These results confirmed the in vivo results reported by Sánchez et al. 17 In 2007, Shinoda et al. 19 reported the identification of a true intermediate of violacein biosynthesis, named protoviolaceinic acid, which was produced by incubating L-tryptophan with VioABDE in the presence of NADPH. The production of this compound confirmed that VioC oxygenates the 2-position of the right-side indole ring. Additionally, their results indicated that the oxygenation reaction that forms the central pyrrolidone core proceeded non-enzymatically. An overview of the violacein biosynthesis proposed by these authors is shown in Figure 2.
It begins with the VioA oxidation of L-tryptophan to indole-3-pyruvic acid (IPA) imine. Then, the oxidative coupling of two molecules of IPA imine by VioB gives an uncharacterized intermediate proposed to be an IPA imine dimer. The skeleton of this compound is catalytically rearranged by VioE through an intramolecular rearrangement of the indole ring, producing protodeoxyviolaceinic acid. The latter can be converted spontaneously by autoxidation into prodeoxyviolacein, or it can undergo hydroxylation catalyzed by VioD at the 5-position of the left-side indole ring, giving protoviolaceinic acid. Protoviolaceinic acid is hydroxylated by VioC at the 2-position of the right-side indole ring to form violaceinic acid. The subsequent conversion to violacein involves a non-enzymatic oxidative decarboxylation.
The mechanistic investigation of this complex biosynthetic pathway at the chemical, biochemical and genetic levels involved many steps. Radiolabel incorporation experiments coupled with techniques for separation and structural elucidation, molecular tools and recombinant methods were used. However, one special property of this system was not explored. In violacein biosynthesis, the precursor (L-tryptophan), the product (violacein) and, probably, the intermediates are fluorescent compounds. Hence, it is expected that information about this biosynthesis can be obtained from fluorescence spectroscopy when it is applied to detect compounds that are consumed and produced during cultivation.
Thus, a methodology for obtaining information about the biosynthetic pathway of fluorescent compounds is proposed in this work. It consists of employing multi-wavelength fluorescence spectroscopy and the PARAFAC method to detect and identify (through spectral resolution) fluorophores that are consumed and produced by a bacterium and, finally, to relate them to the biosynthesis of the natural product.
[...] 21-23 However, this methodology has not previously been employed to link the spectral resolution information directly to a biosynthetic pathway. Therefore, this is a new and complementary approach for the initial analysis of the biosynthesis of fluorescent natural products.

Organisms, culture media and cultivation conditions
Starting from a culture of C. violaceum CCT 3496, a colony was isolated by the streak plate method. This pure colony was inoculated into a 250 mL Erlenmeyer flask containing 50 mL of sterilized culture medium (0.50% D-glucose, 0.50% bacteriological peptone, 0.25% yeast extract and 0.03% L-tryptophan) and grown for 24 h at 33 °C on an orbital shaker at 200 rpm. Afterwards, 10 mL of this bacterial culture was transferred to a 1500 mL BioFlo bioreactor (New Brunswick Scientific) containing 1000 mL of sterilized culture medium. The temperature, stirring and air flow were adjusted (33 °C, 200 rpm and 1.0 L min⁻¹, respectively) and kept constant throughout the cultivation (36 h).

Sample collection
Every 2 h, 10 mL aliquots were withdrawn from the bioreactor with a glass syringe, stored in 15 mL Falcon tubes and frozen (-20 °C). At a later stage, each aliquot was thawed in a thermostatic bath at 30 °C for 5 min and then centrifuged at 7000 rpm for 10 min. The supernatant was discarded and 5 mL of absolute ethanol was added to the pellet to extract intracellular compounds; the mixture was then centrifuged at 7000 rpm for 10 min to remove the cells, and the new supernatant was collected. The crude ethanolic extract was passed through a 0.45 μm pore-size filter. Altogether, 18 samples corresponding to the crude ethanolic extract of C. violaceum at different stages of biosynthesis were collected.

Multi-wavelength fluorescence spectroscopy
Fluorescence measurements were performed using a Varian Cary Eclipse fluorescence spectrophotometer in scan mode. Aliquots of 0.5 mL of the samples were diluted in 2.5 mL of absolute ethanol in the quartz cuvette to reduce the effects of quenching, inner filter and energy transfer processes. Initially, the excitation-emission matrices (EEMs) were collected for each sample with spectral ranges of 250-620 nm for excitation and 270-800 nm for emission. The scan rate for each EEM was 1200 nm min⁻¹. The increments for the excitation and emission wavelengths were 5 and 2 nm, respectively. The excitation and emission slits were set to 5 nm. Filters were set to "Auto" for both the excitation and emission monochromators, which means that the filter wheel is moved automatically to the appropriate position according to the selected excitation/emission wavelength. The photomultiplier tube (PMT) voltage was set to 600 V. The measurements were carried out in 90° geometry.
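As a rough illustration of what these settings imply, the sketch below builds the two wavelength axes from the stated ranges and increments and estimates the pure scanning time of one EEM. It assumes that the full emission range is scanned once per excitation wavelength and ignores monochromator slewing and filter changes, so the figure is only a lower bound; the variable names are illustrative.

```python
import numpy as np

# Wavelength axes from the stated ranges and increments (nm).
ex_wl = np.arange(250, 620 + 1, 5)   # excitation: 250-620 nm, 5 nm steps
em_wl = np.arange(270, 800 + 1, 2)   # emission:   270-800 nm, 2 nm steps

scan_rate_nm_per_min = 1200.0

# Assumption: one full emission scan per excitation wavelength,
# with no overhead between scans (lower bound on acquisition time).
minutes_per_emission_scan = (em_wl[-1] - em_wl[0]) / scan_rate_nm_per_min
minutes_per_eem = minutes_per_emission_scan * ex_wl.size

print(ex_wl.size, em_wl.size)          # number of excitation / emission points
print(round(minutes_per_eem, 1))       # estimated scanning time per EEM, min
```

Under these assumptions each EEM holds 75 × 266 intensity values and takes a little over half an hour of pure scanning.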
It is known that the violacein molecule shows an emission band at 675 nm when excited at 575 nm. 2 However, with the measurement conditions described above, it was not possible to detect fluorescence signals in the spectral region of 550-620 nm for excitation and 600-800 nm for emission (Figure 3a). To address this problem, new instrumental settings were tested. Filters were set to "Open" for both the excitation and emission monochromators, which means that no filter was used, and the photomultiplier tube (PMT) voltage was increased to 800 V. Excitation and emission filters are usually used to reduce the effect of stray radiation, and increasing the PMT voltage enhances the output signal for a given amount of light.
These new instrumental settings allowed fluorescence signals to be visualized in the low energy region, as shown in Figure 3b. Hence, it was decided to investigate the entire spectral range by dividing it into smaller regions and using specific instrumental settings for each. Altogether, five regions were defined (Figure 3a), and their instrumental settings are summarized in Table 1.
Following this procedure, six EEMs, covering the entire spectral region and the regions B, C, D, E and F were collected for each sample.

Data preprocessing
Rayleigh scatter peaks are largely unrelated to the chemical properties of the sample. They occur when a molecule has been excited to a virtual energy state by a photon with insufficient energy to completely excite the molecule. 24 They are situated diagonally in the EEMs (Figure 3a) and do not show linear behavior, which may complicate the modeling of the fluorescence data. Therefore, Rayleigh scatter peaks should be handled prior to modeling.
[...] 27,28 The procedure employed here consisted of removing the affected areas and replacing them with values interpolated from the data in the remaining parts of the EEM. Interpolation of the scatter-affected areas was performed by applying the method developed by Bahram et al., 29 which is available elsewhere.
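The idea of removing a scatter band and filling it by interpolation can be sketched as follows. This is a simplified illustration, not the method of Bahram et al.: it uses plain linear interpolation along the emission axis (`numpy.interp`), and the band half-width of 15 nm, the function name and the synthetic test data are all assumptions.

```python
import numpy as np

def remove_rayleigh(eem, em_wl, ex_wl, half_width=15.0, order=1):
    """Replace the band around em = order * ex by linear interpolation.

    eem has shape (n_emission, n_excitation); half_width (nm) is an
    assumed value that would need tuning for real data.
    """
    cleaned = eem.astype(float).copy()
    for k, ex in enumerate(ex_wl):
        mask = np.abs(em_wl - order * ex) <= half_width   # scatter-affected points
        if mask.any() and (~mask).sum() >= 2:
            # Interpolate along emission from the unaffected points.
            cleaned[mask, k] = np.interp(em_wl[mask], em_wl[~mask], cleaned[~mask, k])
    return cleaned

# Synthetic example: one smooth fluorophore plus a first-order scatter ridge.
em_wl = np.arange(270.0, 801.0, 2.0)
ex_wl = np.arange(250.0, 621.0, 5.0)
signal = np.outer(np.exp(-((em_wl - 450.0) / 60.0) ** 2),
                  np.exp(-((ex_wl - 300.0) / 30.0) ** 2))
scatter = 5.0 * np.exp(-((em_wl[:, None] - ex_wl[None, :]) / 5.0) ** 2)
raw = signal + scatter
cleaned = remove_rayleigh(raw, em_wl, ex_wl)

err_before = np.max(np.abs(raw - signal))
err_after = np.max(np.abs(cleaned - signal))
print(err_before, err_after)
```

Calling the same function with `order=2` would treat a second-order band in the same way; whether that is needed depends on the spectral region and filter settings.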

Data modeling
Multi-wavelength fluorescence spectroscopy involves successive acquisition of emission spectra at multiple excitation wavelengths generating a two-dimensional matrix (EEM) per sample.Using the culture time, it is possible to produce a 3D array X with dimensions I × J × K by stacking EEMs one on top of another (Figure 4), in which I is the number of samples collected during the reaction, J is the number of emission wavelengths and K is the number of excitation wavelengths.
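A minimal sketch of how such an array could be assembled is given below. The dimensions (18 samples, 266 emission and 75 excitation wavelengths) are those reported for this data set; the random placeholder matrices and the assumption that the first aliquot was taken at 2 h are illustrative only.

```python
import numpy as np

n_samples, n_em, n_ex = 18, 266, 75   # I, J, K for the full spectral range

# Assumption: aliquots taken every 2 h, the first one at 2 h and the last at 36 h.
culture_time_h = 2 * np.arange(1, n_samples + 1)

# Placeholder EEMs (emission x excitation); real data would be the
# preprocessed matrices, one per sample, ordered by culture time.
rng = np.random.default_rng(0)
eems = [rng.random((n_em, n_ex)) for _ in culture_time_h]

# Stack along a new first axis to obtain the I x J x K array X.
X = np.stack(eems, axis=0)

# Sample-mode unfolding (I x JK), convenient for least-squares updates.
X_unfolded = X.reshape(n_samples, n_em * n_ex)

print(X.shape, X_unfolded.shape)
```

Keeping the samples on the first axis means that `X[i]` returns the EEM of the i-th aliquot unchanged.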
This array X can be modeled by PARAFAC, 31 a trilinear decomposition method for higher-order data whose structural basis is given by three loading matrices A, B and C (Figure 4). These matrices are obtained using an alternating least squares algorithm and contain information related to the relative concentration, emission spectrum and excitation spectrum of each analyte. The core H is a binary array (F × F × F) with ones on the superdiagonal and zeros in all other elements, indicating that only loading vectors with the same column number interact, i.e., only the f-th column of A interacts with the f-th columns of the B and C matrices.
Fluorescence excitation-emission matrices are known to follow a trilinear model approximately. 32 Therefore, applying the PARAFAC method to fluorescence excitation-emission data can resolve overlapping signals into pure concentration and spectral profiles.
The trilinear model is fitted by minimizing the sum of squares of the residuals, $e_{ijk}$, according to equation 1,
$$x_{ijk} = \sum_{f=1}^{F} a_{if}\, b_{jf}\, c_{kf} + e_{ijk} \qquad (1)$$
where $x_{ijk}$ is the intensity of the i-th sample at the j-th emission wavelength and the k-th excitation wavelength. The element $a_{if}$ may be interpreted as the relative concentration of analyte f in sample i. The vector $\mathbf{b}_f$ with elements $b_{jf}$ (j = 1, …, J) is the estimated emission spectrum of this analyte and, likewise, $\mathbf{c}_f$ is the estimated excitation spectrum.
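To make the alternating least squares idea concrete, the following is a bare-bones sketch of an unconstrained PARAFAC fit in NumPy. It is not the N-way Toolbox algorithm used in this work: there are no constraints, no line search and only a simple stopping rule, and the function names, the random initialization and the synthetic two-component data are assumptions for illustration.

```python
import numpy as np

def khatri_rao(U, V):
    """Column-wise Kronecker product of U (m x F) and V (n x F) -> (m*n x F)."""
    return np.einsum("mf,nf->mnf", U, V).reshape(-1, U.shape[1])

def parafac_als(X, n_factors, n_iter=500, tol=1e-10, seed=0):
    """Unconstrained PARAFAC by alternating least squares (illustrative only)."""
    I, J, K = X.shape
    rng = np.random.default_rng(seed)
    B = rng.random((J, n_factors))                 # random start, emission mode
    C = rng.random((K, n_factors))                 # random start, excitation mode
    X1 = X.reshape(I, J * K)                       # sample-mode unfolding
    X2 = np.moveaxis(X, 1, 0).reshape(J, I * K)    # emission-mode unfolding
    X3 = np.moveaxis(X, 2, 0).reshape(K, I * J)    # excitation-mode unfolding
    prev_sse = np.inf
    for _ in range(n_iter):
        # Each step solves a linear least-squares problem for one mode.
        A = X1 @ np.linalg.pinv(khatri_rao(B, C)).T
        B = X2 @ np.linalg.pinv(khatri_rao(A, C)).T
        C = X3 @ np.linalg.pinv(khatri_rao(A, B)).T
        sse = float(np.sum((X - np.einsum("if,jf,kf->ijk", A, B, C)) ** 2))
        if np.isfinite(prev_sse) and abs(prev_sse - sse) <= tol * prev_sse:
            break
        prev_sse = sse
    return A, B, C

# Synthetic two-component example: one species consumed, one produced.
t = 2.0 * np.arange(1, 19)
em = np.arange(40.0)
ex = np.arange(20.0)
A_true = np.column_stack([np.exp(-t / 10.0), t / 36.0])
B_true = np.column_stack([np.exp(-((em - 10.0) / 4.0) ** 2),
                          np.exp(-((em - 25.0) / 4.0) ** 2)])
C_true = np.column_stack([np.exp(-((ex - 5.0) / 3.0) ** 2),
                          np.exp(-((ex - 14.0) / 3.0) ** 2)])
X_demo = np.einsum("if,jf,kf->ijk", A_true, B_true, C_true)

A_hat, B_hat, C_hat = parafac_als(X_demo, n_factors=2)
X_fit = np.einsum("if,jf,kf->ijk", A_hat, B_hat, C_hat)
rel_err = np.linalg.norm(X_demo - X_fit) / np.linalg.norm(X_demo)
print(rel_err)
```

The recovered loadings are determined only up to scaling and ordering of the components, so profiles should be normalized before being compared with reference spectra.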
The most important step when applying the PARAFAC method is to estimate the appropriate number of components or factors (F) in the data set (see equation 1). 33,34 In this work, the number of components was assessed on the basis of three criteria: explained variance, CORCONDIA and lack of fit (LOF) of the models.
The explained variance (R²) is calculated from the sum of squares of the residuals, $e_{ijk}$, and the sum of squares of the elements of the array X, $x_{ijk}$, according to equation 2.
$$R^2 = \left(1 - \frac{\sum_{i=1}^{I}\sum_{j=1}^{J}\sum_{k=1}^{K} e_{ijk}^2}{\sum_{i=1}^{I}\sum_{j=1}^{J}\sum_{k=1}^{K} x_{ijk}^2}\right) \times 100 \qquad (2)$$
CORCONDIA is defined as
$$\mathrm{CORCONDIA} = \left(1 - \frac{\sum_{d=1}^{F}\sum_{e=1}^{F}\sum_{f=1}^{F} \left(g_{def} - h_{def}\right)^2}{F}\right) \times 100 \qquad (3)$$
where $g_{def}$ is the calculated element of the core using the PARAFAC model, $h_{def}$ is the element of a binary array and F is the number of factors in the model. In an ideal PARAFAC model, $g_{def}$ is equal to $h_{def}$ and, in this case, CORCONDIA will be equal to 100%. A model with a CORCONDIA value above 90% can be considered "very trilinear", whereas a model with a CORCONDIA in the neighborhood of 50% is problematic, with signs of both trilinear and non-trilinear variation. A CORCONDIA close to zero or even negative implies an invalid model. More detailed information can be found elsewhere. 33
In complex biological matrices, the determination of the number of factors by CORCONDIA is sometimes elusive and cannot be automated. 35 In this sense, the appropriate number of factors should also be evaluated by other criteria. Here, the LOF criterion is introduced and calculated according to equation 4.

(4)
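The three criteria can be sketched in NumPy as below. The explained variance and CORCONDIA follow the definitions given in the text, with the core computed as the least-squares Tucker3 core for fixed loadings. The LOF expression, 100·sqrt(Σe²/Σx²), is a commonly used form and is an assumption here; it should be checked against equation 4. Function names and the synthetic data are illustrative.

```python
import numpy as np

def reconstruct(A, B, C):
    """Trilinear model: x_ijk = sum_f a_if * b_jf * c_kf."""
    return np.einsum("if,jf,kf->ijk", A, B, C)

def explained_variance(X, A, B, C):
    """Percentage of explained variance (R^2)."""
    E = X - reconstruct(A, B, C)
    return 100.0 * (1.0 - np.sum(E ** 2) / np.sum(X ** 2))

def lack_of_fit(X, A, B, C):
    """Lack of fit in percent. ASSUMED definition: 100 * sqrt(SSE / SSX)."""
    E = X - reconstruct(A, B, C)
    return 100.0 * np.sqrt(np.sum(E ** 2) / np.sum(X ** 2))

def corcondia(X, A, B, C):
    """Core consistency: compare the least-squares core G with the ideal core H."""
    F = A.shape[1]
    # Least-squares Tucker3 core for fixed loadings: G = X x1 A+ x2 B+ x3 C+.
    G = np.einsum("ijk,di,ej,fk->def", X,
                  np.linalg.pinv(A), np.linalg.pinv(B), np.linalg.pinv(C),
                  optimize=True)
    H = np.zeros((F, F, F))
    idx = np.arange(F)
    H[idx, idx, idx] = 1.0            # ones on the superdiagonal
    return 100.0 * (1.0 - np.sum((G - H) ** 2) / F)

# Check on an exactly trilinear, noise-free array with three factors.
rng = np.random.default_rng(1)
A = rng.random((18, 3))
B = rng.random((30, 3))
C = rng.random((12, 3))
X = reconstruct(A, B, C)

r2 = explained_variance(X, A, B, C)
lof = lack_of_fit(X, A, B, C)
cc = corcondia(X, A, B, C)
print(r2, lof, cc)
```

For real data the three values would be tabulated for models with an increasing number of factors, and the loadings passed to these functions would come from the fitted PARAFAC model rather than from known profiles.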
The PARAFAC algorithm used in this work is from the N-way Toolbox, 36 which is available elsewhere 37 and was used with MATLAB 7.0 (MathWorks Inc., USA).

Growth phases of bacteria
Violacein biosynthesis was monitored in the bioreactor for 36 h, producing a total of 18 samples. The visual changes that occurred in the culture medium during this period are presented in Figure 5.
The first hours after inoculation corresponded to the lag phase, the period in which the bacteria adapt to the environment, characterized by intense metabolic activity and the absence of cell division. The lag phase is followed by the exponential or log phase, in which the population grows exponentially. This phase can be easily visualized by the turbidity of the culture medium. Finally, it is possible to see the stationary phase, in which the growth rate slows down as a result of nutrient depletion and the accumulation of toxic products. It is in the stationary phase that several secondary metabolites (among them, violacein) are synthesized. 38

Fluorescence data
The fluorescence data set consisted of 18 excitation-emission matrices (EEMs) of ethanolic extracts of C. violaceum collected at different stages of violacein biosynthesis. The fluorescence landscapes were initially measured at 266 emission wavelengths (from 270 to 800 nm at 2 nm intervals) and 75 excitation wavelengths (from 250 to 620 nm at 5 nm intervals), providing a detailed map of the fluorescence properties of the sample. The dynamic behavior of the fluorophores involved in violacein biosynthesis is shown in Figure 6 in the form of contour plots of some preprocessed EEMs.
In Figure 6a, the first fluorophore shows a broad emission band between 330 and 370 nm when excited at 270-290 nm. This compound seems to be consumed during the fermentation, since the intensity of the emission band decreases with the reaction time. The second fluorophore shows two excitation maxima (270-285 and 295-320 nm) and one emission maximum between 400 and 450 nm. The third fluorophore shows three excitation maxima (315-320, 325-340 and 345-355 nm) and one emission maximum at 440-480 nm. These last two compounds are possible intermediates in violacein biosynthesis, since they are produced as the fermentation progresses.
As previously described in the Experimental section, with the initial instrumental settings it was not possible to detect the fluorescence signal from violacein. However, the violet color of the culture medium (Figure 5) indicates that this compound is being biosynthesized, a fact that was confirmed by UV-Vis absorption spectroscopy (data not shown). This apparent contradiction is probably caused by the dependence of fluorescence sensitivity on both the fluorophore and the instrument. To address this problem, the entire spectral range was divided into five smaller regions, defined in Table 1, and specific measurement conditions were used for each of them. This procedure produced 18 fluorescence landscapes for each region.
The dynamic behavior of fluorophores in the low energy regions (D, E and F regions) is shown in Figure 6b. The fluorophore with maximum emission between 590 and 620 nm shows three excitation maxima (450-470, 525-535 and 550-570 nm). In this case, the absorption peak of lowest energy is due to fluorophore excitation to an accessible vibrational level of the first electronic excited state of the same multiplicity. The absorption peak of [...]
[...] concluded that the fourth component is not a chemical but a physical component. Therefore, the number of factors was chosen to be three for both regions. This observation reinforces that the determination of the number of factors by CORCONDIA in complex biological matrices is sometimes elusive and cannot be automated.
For regions C and E, the explained variance, LOF and CORCONDIA values indicate that two is an appropriate number of factors to model the data. Although these same criteria point to three-factor models for regions D and F, two was taken as the more reasonable value. This choice was based on the drastic variation of the CORCONDIA values for models with three or more factors when the N-way Toolbox and PLS-Toolbox 39 were run several times (an indication of ill-posed models).
The concentration, emission and excitation profiles obtained by the PARAFAC models for each region, with the suitable number of components, are shown in Figure 7.
In all, six chemical species probably involved in the biosynthesis of violacein could be detected by this methodology. In Figure 7a, the first excitation profile has a narrow maximum centered at 280 nm and a sharp emission maximum at 342 nm. The respective concentration profile shows a maximum at 6 h (lag phase) and then decreases with reaction time. The emission loadings are similar to the pure emission spectrum of tryptophan. 40 This fact corroborates the mechanism presented in Figure 2, in which tryptophan is the precursor in violacein biosynthesis. The second fluorophore has an emission peak at 423 nm when excited at 280 and 308 nm (see Figure 6a, sample-20 h). This compound is probably a tryptophan derivative, since the excitation at 280 nm is due to the indole nucleus. The double absorption can be understood as electronic transitions to different excited states or as electronic transitions to two different vibrational levels of the same excited state. The concentration profile indicates that this fluorophore reaches its maximum concentration after approximately 30 h and then begins to be consumed. Unfortunately, it was not possible to identify this compound by comparison with spectra available in the literature. However, the concentration profile suggests that it is an intermediate of violacein biosynthesis. The third fluorophore shows three excitation peaks at 315, 330 and 347 nm and one emission peak at 470 nm (see Figure 6a, sample-36 h). The concentration profile indicates that this fluorophore starts to be produced after approximately 18 h and reaches its maximum concentration at 32 h.
The emission profiles obtained by the resolution in regions B and C are the same as those shown in Figure 7a. The excitation profiles are also the same. However, as the excitation range is smaller, the excitation profiles are truncated. The juxtaposition of the excitation profiles obtained for regions B and C recovers the excitation profile of the entire spectral range.
The fluorophore represented by the continuous red line in Figure 7b shows a concentration profile similar to that of tryptophan. This compound seems to have three excitation peaks at 460, 525 and 560 nm and one emission peak close to 605 nm (see Figure 6b, sample-8 h).
Asamizu et al. 41 reported the electronic absorption spectrum of StaD, an enzyme involved in staurosporine biosynthesis that is a homolog of VioB and is responsible for the coupling reaction between two molecules of indole-3-pyruvic acid and NH₄⁺ to yield CPA. It has an absorption maximum at 430 nm and smaller absorption bands at 530 and 565 nm, and its spectrum is similar to the excitation spectrum of the fluorophore described above. This fact probably indicates that other indolocarbazoles are being produced during the cultivation.
The compound represented by the dashed red line in Figure 7b shows a broad excitation band at 400-470 nm and an emission peak around 512 nm. The concentration profile indicates that this fluorophore is produced during violacein biosynthesis. Finally, in Figure 7b, the excitation profile represented by the dotted red line, with a narrow maximum centered at 575 nm and an emission maximum at 675 nm, is similar to the spectrum of violacein. 3 As expected, the concentration profile of this compound increases with reaction time.

Conclusion
The use of multi-wavelength fluorescence spectroscopy and PARAFAC was introduced as a new approach for obtaining mechanistic information about the biosynthesis of fluorescent natural products. Violacein biosynthesis was used as an example, in which six fluorescent compounds were detected and their concentration and spectral profiles were resolved by this methodology. Tryptophan (identified by comparison with emission and excitation spectra) is consumed during the bioprocess, corroborating the mechanism presented in Figure 2, in which this compound is the precursor molecule in violacein biosynthesis. Additionally, the final product of this pathway, violacein, was also identified by means of its concentration and spectral profiles.
Unfortunately, the other fluorophores could not be identified by spectral comparison, due to the lack of a database of fluorescent natural products. This critical step highlights the need to improve databases to support the type of analysis proposed herein, in which a complex mechanism could be elucidated by a simpler methodology than the traditional methods. Thus, it is believed that the dissemination of this methodology is the first step in its consolidation. Efforts are under way to identify these compounds using separation and structural elucidation techniques.
Another important conclusion is that the use of different instrumental settings is important to enable the visualization of fluorescence signals in the lower energy spectral region, due to the wavelength-dependent sensitivity of the spectrofluorometer PMT.
Finally, this methodology has great potential to achieve a deeper insight into the biosynthesis of natural products.

Figure 3. (a) 2D contour plot of an EEM and spectral range of each region; (b) 3D surface and contour plots of an EEM in the spectral region of 550-620 nm for excitation and 600-800 nm for emission.

Figure 4. Three-way data array obtained by stacking EEMs for each sample and graphical representation of PARAFAC decomposition.
In Figure 6b, new EEMs were created by juxtaposition of the preprocessed D, E and F EEMs to give an overview of the fluorescence signals. The missing values were set to NaN.

Figure 6. Kinetic evolution of EEMs obtained during violacein biosynthesis: (a) entire spectral range and (b) low energy regions (D, E and F).

Figure 7. Estimates of emission and excitation spectra obtained by PARAFAC: (a) entire spectral range and (b) regions D, E and F juxtaposed.

Table 1. Measurement conditions for each spectral region

Table 2. Percentage of explained variance, CORCONDIA and LOF calculated for trilinear models using the PARAFAC method with 1 to 6 components. R²: explained variance; CORCONDIA: core consistency diagnostic; LOF: lack of fit.