13C ss-NMR singular value decomposition and fitting for sorghum proteins conformation elucidation

Carlos, SP, Brasil *tatianaribeiro@ufscar.br

Abstract
Kafirins, water-insoluble proteins from Sweet Sorghum BR501 grains, have been an alternative for preparing edible food coatings because of their hydrophobic character. In this work, the secondary structure (SS) contents of reduced (SSr) and unreduced (SSu) kafirins were determined by 13C solid-state NMR spectroscopy using the areas of the carbonyl peak. The SS elucidated by fitting the signal with a Lorentzian function shows 56% and 59% of α-helix and 40% and 12% of β-sheet structures for SSr and SSu, respectively. The SS elucidated by a singular value decomposition (SVD) method shows 55% and 49% of α-helix and 12% and 8% of β-sheet structures for SSr and SSu, respectively. Since SVD does not depend on the operator and has higher correlation coefficients for α-helix (0.96) and β-sheet (0.91), it is a reliable method to quantify the SS of insoluble proteins using


Introduction
Sweet sorghum has a sucrose-rich stem (like sugar cane). It can be used for ethanol production, with advantages such as fast growth and wide adaptability to different environments [1][2][3]. The grain is the by-product when sweet sorghum is used in this way [4]. Although the grains can be used to feed livestock, it is preferable to find more technological applications that could increase the value of this by-product. Some food industries already use zeins, proteins extracted from the endosperm of corn kernels, to produce edible films and coatings. Kafirins are proteins extracted from sweet sorghum and can also be a good option for making these films and/or for other technological applications [5][6][7].
Kafirins are a group of proteins that account for 68% to 73% of the total protein content of the sorghum grain and between 77% and 82% of the endosperm. Kafirins are classified according to their solubility and their molecular weight (kDa) in sodium dodecyl sulfate polyacrylamide gel electrophoresis (SDS/PAGE). The electrophoresis analysis shows four kafirin bands known as α-, γ-, β- and δ-. The most abundant ones are the α-kafirins, with values between 23 and 25 kDa, representing 66% to 84% of the total kafirins. The α-kafirins, in the same way as α-zeins, are insoluble in water and soluble in aqueous alcohol solutions of 70% ethanol (w/w). γ-kafirins are soluble in water after disulfide bond reduction and present monomer and dimer bands at 28 kDa and 49 kDa, respectively. The γ-kafirins represent 9% to 21% of the total kafirins. The β-kafirins represent 7% to 13% of the total kafirins, and there is some controversy about their kDa values. Some authors assigned the bands at 15, 17, and 18 kDa to these fractions, and others state that there is only one band from 18 kDa to 19 kDa. β-kafirins are soluble in aqueous alcohol solutions after disulfide bond reduction [8]. The δ-kafirins possibly comprise less than 1% of the protein content of the mature sorghum grain [8,9].
Several authors have studied the conformation of kafirins, particularly their secondary structure. Gao et al. [10] suggest that the secondary structure of these proteins is quite relevant to obtaining good film formation. Wu et al. [11] analyzed kafirins in 60% t-butanol solution by optical rotatory dispersion and circular dichroism (CD). The authors extracted kafirins from the grain sorghum hybrid cultivars OK612, RS626, TE77, and Funk G-766, and they found 40% of α-helix content. Duodu et al. [12] analyzed sorghum protein bodies (PB) by ATR-FTIR and concluded that the kafirins in the PB have 54%, 55%, 58%, and 59% of α-helix for the condensed-tannin-free sorghum P851171, P850029, KAT 369, and NK 283 mutants, respectively. Gao et al. [10] used ATR-FTIR to analyze kafirins from a mixture of two tannin-free cultivars (PANNAR 202 and 606) extracted with different solvents and drying methods. They observed that kafirins extracted with 60% t-butyl alcohol + 0.5% DTT and freeze-dried had an FTIR peak intensity ratio of α-helix to intermolecular β-sheet (1650 cm⁻¹/1620 cm⁻¹) of 1.39. Using 70% ethanol, 0.5% sodium metabisulfite, and 0.35% sodium hydroxide, they found a ratio of 1.10, and using 70% ethanol plus 0.5% sodium metabisulfite as the solvent, they observed a ratio of 1.00. However, for all preparations, the ratio was 0.90 when samples were heated and dried at 40 °C. They concluded that the best conditions for film formation occurred when the α-helix content was higher, since the intermolecular β-sheet can induce protein aggregation.
Wang et al. [13] extracted kafirins from distillers dried grains with solubles (DDGS) using three different extraction procedures based on HCl/ethanol, acetic acid, and sodium hydroxide-ethanol. In all preparations, FTIR analysis of samples prepared as KBr pellets showed that the α-helix is the predominant SS, with a small portion of β-sheet.
Xiao et al. [14] extracted kafirins using 60% t-butanol and ultrasonication, and their SS were evaluated by ATR-FTIR of kafirin powder. The authors used resolution-enhancement methods (second derivative and Gaussian fitting) to calculate the amount of SS and found 49% of α-helix, 24% of β-sheet, and 27% of β-turns. They also analyzed the SS of kafirins in solution by circular dichroism (CD). They found 68.4% of α-helix for kafirins dissolved in 85% ethanol, 57.9% in 60% t-butanol, and 53.1% in 65% isopropanol. The authors concluded that these differences are due to the different polarities of the solvents and that lower polarity is related to higher helical content. Dianda et al. [15] prepared kafirins from pre-heated sorghum flour (70 °C) using 70% ethanol at 68 °C with added sodium metabisulfite and acetic acid in a water bath (70 °C), as described by Olivera et al. [16]. By processing ATR-FTIR spectra of kafirin powder with Gaussian fitting, Dianda et al. [15] found 43.88%, 23.53%, and 19.30% for α-helix, β-sheet, and β-turns, respectively. They also evaluated the SS of kafirins by CD in ethanol at 70%, 80%, 85% and 90% and in t-butanol at 50%, 60%, 70% and 80%. In ethanol solutions, the α-helix proportion varied from 40.47 to 47.12% and the β-sheet proportion from 8.76 to 15.12%. In t-butanol solutions, the α-helix proportion varied from 26.89 to 41.72% and the β-sheet proportion from 16.54 to 11.626%. Compared to Xiao et al. [14], these authors concluded that the differences are due to the different calculation methods for the solvent polarities. Dianda et al. [15] also predicted a theoretical value of about 66% of α-helix based on analysis of the amino acid sequences of the kafirins.
High-resolution solid-state 13C NMR methods using the cross-polarization magic angle spinning (CPMAS) sequence have been developed to characterize insoluble proteins. They use the 13C chemical shift of the carbonyl (C=O) peaks to determine α-helix and β-sheet conformations, usually observed at about 176 ppm and 172 ppm, respectively [17]. However, 13C solid-state NMR has limitations: the broad linewidth leads to low resolution, overlapped peaks, and inaccurate determination of secondary structures. Fitting methods have been used to analyze these spectra, though they depend on choices such as the number of peaks, the specific line shape, and other parameters that are set by the operator [17]. In this research, the SS of kafirins were quantified using a pattern recognition method based on singular value decomposition (SVD) applied to the areas of the carbonyl peaks in solid-state 13C NMR spectra. The Lorentzian fitting method was also used for comparison. The SVD method does not depend on analyst input, such as the fitting function and the number and positions of peaks, and gives correlation coefficients of 0.96 and 0.90 for α-helix and β-sheet, respectively [17].

Protein extraction
Kafirin extraction was adapted from the literature [18] using Sweet Sorghum grains (BR501-white) provided by Embrapa Maize and Sorghum, Minas Gerais, Brazil. Reduced protein fractions were extracted as follows: sorghum grains were ground and defatted with hexane in a Soxhlet apparatus for 24 hours. The defatted flour was mixed with a 1.25 mol/L NaCl solution and kept under agitation for 3 hours to solubilize the albumins and globulins. After vacuum filtration, a 100 mmol/L aqueous sodium bisulfite solution was added to the flour for 2 hours, and then the residue was added to 70% ethanol (w/w) for 24 hours under agitation at room temperature. The solvent was evaporated, and the obtained kafirins were lyophilized. The unreduced fractions were extracted by the same methodology, omitting only the sodium bisulfite step.

SDS/PAGE
SDS/PAGE was performed on a 15% polyacrylamide gel, and the gel was stained with Coomassie Blue dye. The molecular weight standard employed was Benchmark Protein Ladder, Cat. No. 10747-012, from Invitrogen.

13C solid-state Nuclear Magnetic Resonance (13C ss-NMR)
The solid-state 13C NMR spectra were obtained with a Bruker Avance III HD 400 MHz spectrometer equipped with a solid-state MAS probe, with two channels configured for 1H and 13C frequencies of 400 MHz and 100.5 MHz, respectively. 13C cross-polarization magic angle spinning (CP/MAS) was the sequence used for the analysis. The operational conditions were: a 90° 1H pulse length of 2.3 µs, a contact time of 2 ms, a spectral width of 50 kHz, a recycle delay of 3 s, and 4096 scans [17]. Spectra were filtered by an exponentially decaying function with 20 Hz of line broadening. Samples were packed in a 5 mm zirconia rotor and spun at the magic angle at 10 kHz. The chemical shift of an external hexamethylbenzene standard (at 17.3 ppm) was used as reference.
The carbonyl peak area in the CP/MAS 13C NMR spectra of reduced and unreduced kafirins was used to calculate the SS [19,20]. The region from 180 to 160 ppm, comprising a total of 859 data points, was taken from each ss-13C NMR spectrum, and its second derivative was used to find the three individual peaks: 172 ppm, assigned to β-sheet; 174 ppm, assigned to unordered structures; and 176 ppm, assigned to the α-helix structure.
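The second-derivative peak search described above can be sketched as follows. This is only an illustration under stated assumptions: the spectrum is synthetic (three Lorentzian lines with arbitrary heights and widths, not the kafirin data), the function name `second_derivative_peaks` and the threshold `rel_threshold` are hypothetical, and the software actually used for this step is not stated in the text.

```python
import numpy as np

def lorentzian(x, height, center, hwhm):
    """Lorentzian line with the given height, center, and half width at half maximum."""
    return height * hwhm**2 / ((x - center) ** 2 + hwhm**2)

def second_derivative_peaks(ppm, intensity, rel_threshold=0.1):
    """Return peak positions as the deep local minima of the second derivative.

    rel_threshold is a hypothetical tuning parameter: a minimum is kept only if
    it is at least this fraction of the deepest minimum.
    """
    d2 = np.gradient(np.gradient(intensity, ppm), ppm)
    i = np.arange(1, len(ppm) - 1)                    # interior points only
    is_min = (d2[i] < d2[i - 1]) & (d2[i] < d2[i + 1])
    is_deep = d2[i] < rel_threshold * d2.min()        # d2.min() is negative
    return np.sort(ppm[i[is_min & is_deep]])

# Synthetic carbonyl region: 859 points from 180 to 160 ppm, as in the text.
ppm = np.linspace(180.0, 160.0, 859)
spectrum = (
    lorentzian(ppm, 0.3, 172.0, 0.8)    # arbitrary height, assumed width
    + lorentzian(ppm, 0.5, 174.0, 0.8)
    + lorentzian(ppm, 1.0, 176.0, 0.8)
)
peaks = second_derivative_peaks(ppm, spectrum)
print(peaks)
```

On real, noisy spectra the second derivative amplifies noise, so some smoothing would normally be applied first; that step is omitted here.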
These three peaks were used in the Lorentzian multiple-peak fitting, and the area of each peak corresponded to the proportion of the SS assigned to it. The adjusted R-square of the Lorentzian multiple-peak fitting was 0.994 and 0.992 for unreduced and reduced kafirins, respectively, and both fittings reached the chi-square tolerance of 1 × 10⁻⁹.
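How fitted Lorentzian peaks translate into SS proportions can be illustrated with a simplified sketch. It assumes that the peak centers and half widths are already known and fits only the heights by linear least squares, whereas a full multiple-peak fit would normally also optimize centers and widths. The data are synthetic, and the names `area_fractions`, `centers`, and `hwhms` are hypothetical.

```python
import numpy as np

def unit_lorentzian(x, center, hwhm):
    """Unit-height Lorentzian with half width at half maximum hwhm."""
    return hwhm**2 / ((x - center) ** 2 + hwhm**2)

def area_fractions(ppm, intensity, centers, hwhms):
    """Fit peak heights at fixed centers/widths and return relative peak areas."""
    basis = np.column_stack(
        [unit_lorentzian(ppm, c, w) for c, w in zip(centers, hwhms)]
    )
    heights, *_ = np.linalg.lstsq(basis, intensity, rcond=None)
    areas = np.pi * heights * np.asarray(hwhms)   # analytic area: pi * height * HWHM
    return areas / areas.sum()

# Synthetic example (arbitrary values, not the kafirin spectra).
ppm = np.linspace(180.0, 160.0, 859)
centers = [172.0, 174.0, 176.0]     # beta-sheet, unordered, alpha-helix
hwhms = [0.7, 0.9, 0.8]             # assumed widths in ppm
true_heights = [0.2, 0.3, 0.5]
spectrum = sum(
    h * unit_lorentzian(ppm, c, w) for h, c, w in zip(true_heights, centers, hwhms)
)
fractions = area_fractions(ppm, spectrum, centers, hwhms)
print(dict(zip(["beta-sheet", "unordered", "alpha-helix"], fractions.round(3))))
```

The sketch makes the operator dependence visible: the number of peaks, their centers, and their widths are all inputs chosen before the areas are computed.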
The calibration matrix for the SVD method was published elsewhere [17]. Equation 1 shows how the SVD method relates the NMR spectrum of each protein (R) to the protein secondary structure concentration (F) through a calibration matrix (X). The F and R matrices used by Andrade et al. [17] gave mean square errors of ±1.5 for α-helix and ±1.8 for β-sheet structure prediction. The F matrix (15 × 4) consists of 15 proteins with the proportions of four secondary structures obtained from X-ray crystallographic data [21]. The R matrix comprises the 13C NMR spectra (carbonyl area) of each protein in the F matrix.
The SVD method reduces the rank of the R matrix, making it consistent with the information content, and calculates the generalized inverse as shown in Equation 2. It is then possible to calculate the SS of a protein whose structure has not been determined (SS unknown protein) from its NMR spectrum (N) multiplied by the X matrix, as shown in Equation 4.
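The calibration and prediction steps can be sketched with NumPy as below. Since the equations themselves are not reproduced in this text, the orientation used here is an assumption: spectra are stored as rows, so that F ≈ R·X, X = R⁺·F with R⁺ the rank-truncated generalized inverse, and the prediction is N·X. The reference spectra are synthetic (four invented "pure structure" line shapes), not the published calibration set [17], and the names `calibration_matrix`, `predict`, and `rank` are hypothetical.

```python
import numpy as np

def calibration_matrix(R, F, rank):
    """Rank-truncated SVD calibration.

    R: (n_proteins, n_points) reference spectra, one protein per row (assumed layout).
    F: (n_proteins, n_structures) known secondary-structure fractions.
    Returns X of shape (n_points, n_structures) such that F ~ R @ X.
    """
    U, s, Vt = np.linalg.svd(R, full_matrices=False)
    U_k, s_k, Vt_k = U[:, :rank], s[:rank], Vt[:rank]
    R_pinv = Vt_k.T @ np.diag(1.0 / s_k) @ U_k.T   # generalized inverse of R
    return R_pinv @ F

def predict(N, X):
    """Secondary-structure fractions of an unknown protein from its spectrum N."""
    return N @ X

# Synthetic calibration set: 15 proteins, 4 structures, 859 spectral points.
rng = np.random.default_rng(0)
ppm = np.linspace(180.0, 160.0, 859)
centers = [170.0, 172.0, 174.0, 176.0]             # invented pure-structure lines
pure = np.array([1.0 / ((ppm - c) ** 2 + 1.0) for c in centers])
F = rng.dirichlet(np.ones(4), size=15)             # each row sums to 1
R = F @ pure                                       # noise-free mixture spectra

X = calibration_matrix(R, F, rank=4)
f_unknown = np.array([0.5, 0.1, 0.3, 0.1])
N = f_unknown @ pure
predicted = predict(N, X)
print(predicted.round(3))
```

With noise-free synthetic data the unknown fractions are recovered essentially exactly; with measured spectra, the retained rank controls how much noise is propagated into X, which is the reason for the rank reduction mentioned above.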
The reduced kafirin fraction showed four bands (Figure 1, second lane): β-kafirin at 17 kDa and three other bands at 21, 25, and 26 kDa, assigned to α2-, α1- and γ-kafirins, respectively. This result is in good agreement with the description of Byaruhanga et al. [22]. The unreduced kafirin fraction (Figure 1, first lane) showed a more intense band at 17 kDa, attributed to β-kafirin, and a weak one at 25-26 kDa, attributed to the α+γ-kafirin agglomerate. El Nour et al. [23] extracted kafirin with 60% t-butanol without a reducing agent, and they also found β-kafirin and weak bands for α2-, α1- and γ-kafirins. They concluded that under unreduced conditions, β-kafirin is extracted in the monomeric form.
Figures 2A and 2B show the NMR spectra of the unreduced and reduced kafirin fractions, respectively. The NMR spectra show typical protein signals: from 172 ppm to 176 ppm due to carbonyl peaks; from 140 ppm to 100 ppm due to amino acids with aromatic side chains; from 70 ppm to 45 ppm due to α carbons; and from 45 ppm to 15 ppm due to the aliphatic amino acid side chains [19,21,24]. The signals at 73 ppm were assigned to starch and are stronger in the unreduced kafirins (2A) than in the reduced kafirins (2B). The spectra in Figure 2 also showed an intense peak at 130 ppm. This peak was assigned to unsaturated fatty acids on the basis of spectra obtained with the single-pulse technique (Figure 3), which is used to detect the presence of mobile molecules [25,26].
Figure 3 shows the unreduced kafirin spectrum obtained with the single-pulse sequence, which is similar for both reduced and unreduced kafirins. The carboxyl signal at 173 ppm was attributed to unsaturated fatty acids. The peak at 130 ppm was attributed to the double-bond carbons. The intense peaks from 10 ppm to 40 ppm were assigned to the methyl and methylene carbons. Similar fatty acid contents have been noted in zein extracts [25,27], indicating that kafirins may also be fatty acid-binding proteins.
The carboxyl peak at 173 ppm can overlap with other carbonyl signals from the amino acids. The SVD method has an advantage over Lorentzian fitting because it does not need the assignment of peaks to calculate the SS % of proteins. The area of the carbonyl peak was calculated by Equation 3 (section 2.3) using the 13C NMR spectra of the 15 proteins (R matrix) as well as the generalized inverse of F (F⁻¹) calculated by the SVD method [17]. The Lorentzian fitting presents three signals, at 176 ppm, 174 ppm, and 172 ppm, attributed to the α-helix, unordered, and β-sheet structures, respectively [19,20].

SS analysis using solid-state 13C NMR spectroscopy
The SS results calculated for kafirins with the Lorentzian fitting and the SVD method are given in Table 1. The peak areas in Figure 4A (unreduced kafirins) show 59% of α-helix, 12% of β-sheet, and 29% of unordered structures. The carbonyl signals of reduced kafirins (Figure 4B) show 56% of α-helix, 40% of β-sheet, and 4% of unordered structures. Both extracted kafirins therefore have a high α-helix content, regardless of whether the reducing agent was used. The presence of α-helix and β-sheet is confirmed by the α-carbon signals: signals at 56 ppm and 59 ppm are typical of the α-helix, and signals at 53 ppm are typical of the β-sheet [19,20].
This fitting method depends on the following factors: the number of peaks and respective positions and the line shape function used in the fitting process, among other factors [28,29] .
The SS was also calculated by the SVD method. In this method, the carbonyl area of the kafirin ss-13C NMR spectra is multiplied by a calibration matrix [17], as shown by Equation 3.
The SVD method also indicated high helical content for reduced and unreduced kafirins, 55% and 49%, respectively, with 13% and 8% of β-sheet, respectively. The difference in the SS values may be due to the different electrophoretic patterns: β-, α1-, α2- and γ-kafirin fractions were observed in the reduced kafirins, while the unreduced kafirins showed an intense signal for β-kafirin, which may occur as a monomer [23], and a weak signal in the range of α1-, α2- and γ-kafirin, named here the α+γ-agglomerate.
The difference in the absolute SS values obtained with the Lorentzian fitting can be attributed to the fact that this fitting method depends on the operator, who must choose the number of peaks and the positions assigned within the carbonyl region. The SVD method does not require this operator-dependent step. The α- and β-secondary structure contents of reduced (SSr) and unreduced (SSu) kafirins are predicted with high correlation coefficients by the SVD method applied to CP/MAS 13C NMR. Table 1 shows the SS % calculated for reduced and unreduced kafirins using the Lorentzian fitting and SVD methods. It is noteworthy that CP/MAS 13C NMR is not a quantitative method. Its signal depends on cross-polarization (CP) efficiency and spin-lattice relaxation in the rotating frame [17,30]. Despite that, when the carbons have similar chemical environments, such as the carbonyls of proteins, these two parameters are alike, giving quantitative results. The high content of α-helix structure in kafirins from sweet sorghum grains agrees with the data published by Gao et al. [10], Wu et al. [11], and Duodu et al. [12], who used other sorghum cultivars. According to these authors, the study of the SS of kafirins is quite relevant for obtaining good films to apply on foods, since a higher proportion of β-sheet structures could induce protein agglomeration and reduce their solubility.

Table 1. α-helix and β-sheet kafirin secondary structures (SS %) elucidated by CP/MAS 13C NMR spectroscopy. These kafirins were extracted with sodium bisulfite (reduced form) and without sodium bisulfite (unreduced form). Lorentzian fitting and the SVD method were used for SS % calculation.

Conclusion
We applied Lorentzian fitting and an SVD method to the carbonyl peak areas in the CP/MAS 13C NMR spectra of unreduced and reduced kafirins to verify which mathematical method is more reliable for calculating their % SS. The reduced kafirin fraction extracted from sweet sorghum grains is composed of four protein fractions assigned to β-, α1-, α2- and γ-kafirins, and the unreduced fraction of β-kafirins in the monomeric form and an α+γ agglomerate. A higher proportion of α-helix structure and a smaller proportion of β-sheet structure were identified in both the reduced and unreduced kafirins, in agreement with other spectroscopy techniques. Thus, high-resolution 13C solid-state NMR spectroscopy is promising for elucidating the secondary structures of insoluble proteins. The main difference between the two methods was in the absolute values of the SS proportions, since the SVD method does not depend on the operator and has high correlation coefficients for α-helix and β-sheet prediction [17]. Our research indicates that the SVD method is more reliable than Lorentzian fitting for calculating the % SS of proteins by CP/MAS 13C NMR spectroscopy.