Fitting Structure-Data Files (.SDF) Libraries to Progenesis QI Identification Searches

Sanches, Pedro H. G.; Oliveira, Danilo C. de; Reis, Ivan G. M. dos; Fernandes, Anna M. A. P.; Silva, Alex A. R.; Eberlin, Marcos N.; Carvalho, Patrícia O.; Duarte, Gustavo H. B.; Porcari, Andreia M.

doi:10.21577/0103-5053.20230016

Abstract

Progenesis QI (PQI) is a multiplatform bioinformatics tool that facilitates the identification workflow for metabolomics experiments. PQI uses fragmentation data provided by MassBank of North America (MoNA) libraries, among others, for metabolite annotation. However, PQI does not officially support MoNA libraries and other libraries based on structure-data files (.sdf). This paper describes the development and application of a software tool named MoNA to Progenesis QI Library Converter, which makes PQI and MoNA compatible by correcting the fragmentation data of the library for Progenesis readability. We evaluated several public experimental datasets, including human plasma, plant extracts, cultured cells, bacteria, rat serum, and rat hippocampus. The results showed that each library must be converted to allow PQI to access fragmentation information from .msp (main spectra profile) files. This step is highly recommended to improve the identification level of the metabolites.

Keywords:
Progenesis QI; MoNA; metabolite identification; fragmentation; feature annotation

Introduction

Metabolite identification is the most challenging and important part of an untargeted metabolomic investigation.¹ This step is critical for turning instrumental data into meaningful biological information² and remains the main bottleneck in the field.¹

Fragmentation information from tandem mass spectrometry (MS/MS) experiments is crucial in metabolomics, improving the Metabolomics Standards Initiative (MSI) identification level from merely putatively characterized compound classes to putatively annotated compounds, allowing more confident metabolite annotation.³

Since metabolite fragmentation data carry the structural signature of the molecule, experimentally acquired MS/MS data have been used to identify metabolites by using reliable software to match these data with known available metabolite databases.⁴⁻⁶

Unlike peptides, whose fragmentation patterns are expected, metabolites are structurally very heterogeneous, which makes their MS/MS fragmentation patterns challenging to interpret. To this day, these patterns are not well established and are mostly unknown.⁷

Progenesis™ QI (PQI, Waters Corporation™© Nonlinear Dynamics)⁵ is a data processing tool for high-resolution mass spectrometry (MS). It handles full scan, data-dependent analysis (DDA), and data-independent analysis (DIA) data through peak alignment, peak picking, and data mining.

PQI is commercial proprietary software that is not limited to instruments and data from Waters Corporation™. It accepts most of the regular file formats, including Waters UNIFI (v1.0.6744.42923) and raw (.raw; v1.0.6901.37225) data, SCIEX (.wiff; v1.0.6680.30256), and Thermo and Thermo with Fourier-transform ion cyclotron resonance (FT-ICR) data (.raw; v1.0.6680.30349) files. PQI also accepts open MS file formats (.mzML and .mzXML; v1.0.6680.30233), making it compatible with instruments from any vendor.

Furthermore, it applies a search-based approach for experimental feature annotation by matching the physicochemical properties and spectral similarity of each feature with public/commercial spectral libraries.³,⁵,⁶ Using this identification method, PQI compares experimental data with the data downloaded from the libraries and evaluates the identification quality using up to five similarity parameters: (i) mass similarity (in parts per million, ppm), (ii) isotope similarity (in ppm; 0-100%), (iii) retention time similarity (in minutes; 0-100%), (iv) collision cross-section (CCS) similarity (in percentage or Å²; 0-100%), and (v) fragmentation score (by the cos(θ) similarity method;⁸,⁹ 0-100%).¹⁰
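As an illustration of the cos(θ) idea behind the fragmentation score, the following minimal sketch computes a generic cosine similarity between two peak lists. It is not PQI's scoring algorithm, which is documented in reference 9; the function name `cosine_score`, the greedy peak matching, and the default 15 ppm window are assumptions made only for this example.

```python
import math


def cosine_score(experimental, library, tolerance_ppm=15.0):
    """Return a 0-100 cosine similarity between two peak lists.

    Each spectrum is a list of (m/z, intensity) pairs. Every library peak
    is paired with the closest unused experimental peak inside the ppm
    window; unpaired peaks only contribute to the vector norms.
    """
    used = set()
    dot = 0.0
    for lib_mz, lib_int in library:
        best, best_err = None, None
        for i, (exp_mz, _) in enumerate(experimental):
            if i in used:
                continue
            err = abs(exp_mz - lib_mz) / lib_mz * 1e6
            if err <= tolerance_ppm and (best_err is None or err < best_err):
                best, best_err = i, err
        if best is not None:
            used.add(best)
            dot += lib_int * experimental[best][1]

    norm_exp = math.sqrt(sum(i * i for _, i in experimental))
    norm_lib = math.sqrt(sum(i * i for _, i in library))
    if norm_exp == 0 or norm_lib == 0:
        return 0.0  # an empty spectrum cannot be compared
    return 100.0 * dot / (norm_exp * norm_lib)
```

Two identical spectra score 100 and two spectra with no shared fragments score 0, which is the behavior expected from any cosine-based measure regardless of the exact matching rules.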

These parameters consider methodological and instrumental information. If a given parameter is unavailable or disabled, or the external library does not include it, the parameter takes a value of 0% and is not considered for matching. Each parameter represents 20% of the final score. For example, if only mass similarity, isotope similarity, and fragmentation score are used, the maximum achievable score is 60%. This approach relies on libraries in the structure-data (.sdf) and main spectra profile (.msp) file formats, which are used to annotate the precursor and adduct masses and to check fragmentation patterns, respectively.
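The 20% weighting described above can be written out as a short calculation. The sketch below is only an arithmetic illustration of that rule; the parameter keys and the function name `overall_score` are hypothetical and do not correspond to any PQI interface.

```python
# Hypothetical keys for the five similarity parameters listed in the text.
PARAMETERS = ("mass", "isotope", "retention_time", "ccs", "fragmentation")


def overall_score(similarities):
    """Average five 0-100 similarities; missing or disabled ones count as 0."""
    total = 0.0
    for name in PARAMETERS:
        value = similarities.get(name)
        total += 0.0 if value is None else value
    return total / len(PARAMETERS)


# Only mass, isotope, and fragmentation are available, each a perfect match.
best_case = overall_score({"mass": 100, "isotope": 100, "fragmentation": 100})
```

Here `best_case` evaluates to 60.0, reproducing the 60% ceiling stated in the text for a search that uses three of the five parameters.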

The MassBank of North America (MoNA) platform¹¹ is a well-known curated, centralized, and collaborative public source of experimental and in silico fragmentation spectra, which provides compounds in both .sdf and .msp formats. It offers public records for compounds from the Human Metabolome Database (HMDB),¹² LipidBlast,¹³ Global Natural Products Social Molecular Networking (GNPS),¹⁴ and others. MoNA users can also submit their novel spectra for broad sharing, and the corresponding curated spectra can eventually be downloaded as well.¹⁵

By default, metabolite annotation in PQI cannot access .msp files downloaded from the MoNA database: we noticed an incompatibility between PQI and external .sdf libraries, such as MoNA, that leads to the loss of fragmentation data. Therefore, this work aimed to develop and apply a computational tool named SDF to Progenesis QI Library Converter to correct these libraries and make them compatible with PQI annotation searches. The application and code are publicly available in a GitHub repository.¹⁶

Experimental

Datasets assessed

The studies assessed in this work comprise six liquid chromatography-mass spectrometry (LC-MS) metabolomics experiments using data-dependent acquisition (DDA) and data-independent acquisition (DIA) datasets, covering human plasma, plant extracts, rat hippocampus and serum, chicken cells, and bacteria. All studies have already been published in peer-reviewed journals and have their .raw data, study information, and list of identified metabolites publicly available in the MetaboLights data repository.¹⁷ The chosen datasets have the following identifiers: (i) MTBLS1584,¹⁸ (ii) MTBLS1783,¹⁹ (iii) MTBLS1115,²⁰ (iv) MTBLS496,²¹ and (v) MTBLS952.²²

Data processing, tool development, and application

The SDF to Progenesis QI Library Converter (SDF2PQI) was developed as a console application in the C programming language, using the open-source Code::Blocks 13.12 integrated development environment and the Minimalist GNU for Windows (MinGW) implementation of the GNU Compiler Collection (GCC). It is publicly available on GitHub.¹⁶

LC-MS raw data were processed using the Progenesis MetaScope search engine. For molecular feature annotation, the following libraries were used: HMDB, LipidBlast, Fatty Acid ester of Hydroxyl Fatty Acid (FAHFA), Oxidized Phospholipids, Vaniya/Fiehn Natural Products Library, Plant Specialized Metabolome Annotation (RIKEN PlaSMA), ReSpect, Bruker Sumner, MetaboBASE Plant Library (MetaboBASE), Lipid Maps, and GNPS.

For the PQI MetaScope identification process, quality control (QC) samples were chosen when available, with a tolerance of 15 ppm for both precursor and fragments; otherwise, all the samples of the study were used, with a tolerance of 100 ppm for both precursor and fragments.
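To make the ppm tolerances concrete, the sketch below shows how a mass error in parts per million is usually computed and compared with a window. The function names are hypothetical, and the check is a generic one rather than the MetaScope implementation.

```python
def ppm_error(observed_mz, reference_mz):
    """Signed mass error of an observed m/z relative to a reference, in ppm."""
    return (observed_mz - reference_mz) / reference_mz * 1e6


def within_tolerance(observed_mz, reference_mz, tolerance_ppm=15.0):
    """True when the absolute ppm error does not exceed the tolerance."""
    return abs(ppm_error(observed_mz, reference_mz)) <= tolerance_ppm
```

For a reference ion at m/z 500.0, an observed value of 500.005 is about 10 ppm away and passes the 15 ppm window, whereas 500.01 (about 20 ppm) passes only the wider 100 ppm window.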

All the computational processing was carried out on a processing station equipped with an Intel® Core™ i9-9900K CPU @ 3.60 GHz, 64 GB of RAM, and the Windows 10 Enterprise 64-bit operating system.

Results and Discussion

The .sdf files from PQI and MoNA share a similar data structure, divided into fields, as presented in Figure 1. Each field plays a different role in the annotation processing for every compound. Fields 1 and 2 indicate the name of the molecule and the information about the software/instrument used to generate the record. Field 3, called the “count line”, describes the number of atoms and bonds for a given compound. The following field, number 4, is the “atoms block” and provides the coordinates of each atom on the x, y, and z axes and the atom symbol to be used (e.g., C for the carbon atom). It is also possible to include information about the charge of the molecule in the same field.

Figure 1
Templates from Progenesis QI (a) and MoNA (b) platforms for the compound records in .sdf files. (c) The seventh field, named “data field”, highlighted in red, was zoomed in to indicate the different number of “space characters” causing the observed failure in the matching processing.

Field 5, the “bonds block”, describes the bonds among atoms, designating the positions of the bonded atoms and the bond type. Field 6, called the “terminator”, indicates the end of the given compound record. The last one, field 7, is called the “additional data field”. Similarly to an XML file, it contains a header but, in this case, it must begin with “> <ID_Info>”, followed by an identification code, shown as “00001” for PQI and as “MMS553002” for MoNA (Figure 1c).
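The record layout described above can be explored with a few lines of code. The sketch below assumes the standard V2000 layout, in which the counts line is the fourth line of the record and stores the atom and bond counts in two fixed three-character columns, and in which the terminator is written "M  END" with two spaces. The sample record, the "> <ID>" tag, and the function names are hypothetical and serve only as an illustration.

```python
# Hypothetical single-atom record, used only to exercise the functions below.
SAMPLE_RECORD = "\n".join([
    "water",
    "  sketch",
    "",
    "  1  0  0  0  0  0  0  0  0  0999 V2000",
    "    0.0000    0.0000    0.0000 O   0  0",
    "M  END",
    "> <ID>",
    "00001",
    "",
    "$$$$",
])


def parse_counts_line(line):
    """Read atom and bond counts from the fixed-width V2000 counts line."""
    n_atoms = int(line[0:3])
    n_bonds = int(line[3:6])
    return n_atoms, n_bonds


def split_record(record_text):
    """Split one record into structure lines and data-field lines at 'M  END'."""
    lines = record_text.splitlines()
    for index, line in enumerate(lines):
        if line.startswith("M  END"):
            return lines[: index + 1], lines[index + 1:]
    raise ValueError("record has no 'M  END' terminator")


structure, data_fields = split_record(SAMPLE_RECORD)
atoms, bonds = parse_counts_line(structure[3])
```

Separating the structure block from the data fields in this way isolates the part of the record where the PQI and MoNA files differ, since the structure block is the same in both.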

The comparison of the Progenesis and MoNA data structures (Figures 1a and 1b) reveals that the .sdf record from MoNA follows the requirements of Progenesis QI for fields 1 to 6. For the seventh field, highlighted in red, the number of “space characters” differs between Figure 1a (n = 1 space character) and Figure 1b (n = 2 space characters), which is one of the causes of the incompatibility.

According to the Dassault Systèmes® documentation on .sdf files, this extra character should not exist, and the “M END” line must be followed by a blank line.²³
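A correction of this kind can be expressed as a single pattern replacement. The sketch below assumes that the extra space sits between the ">" and the "<" of the data-field header, which the text does not state explicitly; the function name and the "> <ID>" tag are likewise hypothetical, and the code is not taken from the SDF2PQI source.

```python
import re

# One or more spaces or tabs between '>' and '<' at the start of a header line.
_HEADER_GAP = re.compile(r"^>[ \t]+<")


def normalize_data_header(line):
    """Collapse the gap in a data-field header such as '>  <ID>' to '> <ID>'."""
    return _HEADER_GAP.sub("> <", line, count=1)
```

Lines that do not start with a data-field header, such as atom or bond lines, pass through unchanged, so the function can be applied to every line of a library file.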

Through the analysis of an in-house MS/MS library obtained from one hypothetical molecular feature (i.e., extracted ion chromatogram peak), the expected template for .msp fragmentation files was determined. We observed that the fields “Name”, “Precursor_type”, and “Formula” were not considered in the identification process and were therefore left blank (Figure 2).

Figure 2
LipidBlast single-compound template for .msp files obtained from (a) Progenesis QI and (b) MoNA. The key issue preventing Progenesis QI annotation processing from matching the MoNA library is highlighted in red.

To enable PQI to correctly verify the correspondence between the experimental MS/MS and the external library, the .msp’s “DATABASE_ID” field (Figure 2a) must match with the seventh field in the .sdf file, namely, “> <ID>” (Figure 1a, field 7). If there is no match, the field “DATABASE_ID” is displayed as “DB#” in the .msp files (Figure 2b) and might be ignored by PQI in the annotation searches.
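The field-name mismatch described above could be handled by relabeling the identifier line of each .msp record. The sketch below is a hypothetical illustration, assuming that each field occupies one "key: value" line; it is not the SDF2PQI implementation, and the function names are invented for the example.

```python
def rename_db_field(line, old_key="DB#", new_key="DATABASE_ID"):
    """Rewrite 'DB#: value' as 'DATABASE_ID: value'; leave other lines alone."""
    key, sep, value = line.partition(":")
    if sep and key.strip() == old_key:
        return f"{new_key}:{value}"
    return line


def convert_msp_lines(lines):
    """Apply the rename lazily so large libraries need not fit in memory."""
    for line in lines:
        yield rename_db_field(line)
```

Because the conversion is a generator over lines, it can be fed directly from an open file object and written out line by line.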

The mismatched fields force PQI to skip them, which causes the error in the matching process between the experimental MS/MS spectra and the library, as shown in Figure 3.

Figure 3
Progenesis fail message.

After establishing the PQI requirements for the downloaded .sdf and .msp files and identifying the data format and patterns, we developed a console application that iterates over each line while searching for specific predefined patterns and replaces them to match the required SDF syntax, allowing files from the MoNA library to be correctly used by PQI.

To exemplify the utility of our tool, we selected six datasets available at the MetaboLights platform¹⁷ containing public MS data. These data files were submitted to the PQI identification process using both unconverted and converted libraries. Table 1 displays the overall results for positive and negative ion modes using different libraries for different datasets, evaluating the results in terms of the number of MS/MS library-matched molecular features. Figure 4 shows the annotation results of selected molecular features for different datasets before and after library correction.


Table 1
The total number of MS/MS library-matched molecular features for positive and negative ion modes using raw and converted libraries for different datasets

Figure 4
Fragmentation mass spectra of selected library-matched molecular features from different datasets. The left panels refer to identification processing results using raw library files, and the right panels refer to processing using converted library files. Library-matched fragments are highlighted in red.

Figure 4 shows that converting the library enables the correspondence between the experimental and external library spectra (mirror plot). Moreover, the identification quality parameters (namely Score and Fragmentation Score) increased, indicating an improvement in identification quality.

The processing time depends on the chosen library. To verify the average processing time, we used the MassBank library as an example, one of the most comprehensive ones available on the MoNA repository, comprising 72,439 mass spectra at the time of the experiment (2022-10-21). To estimate time consumption relative to file size, we used a downloaded .zip file of ca. 29.5 MB, with an unpacked .msp file of 197.2 MB, containing 4,287,601 lines (the line count depends on the number of indexed molecules). This file was fully converted with our tool to a compatible one in about 2:30 min, and the following identification process took 1:10 min, resulting in 591 matches for 4,169 searched features.
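The figures above translate into a conversion throughput and a match rate by simple division. The following sketch only restates the reported numbers; the function names are hypothetical.

```python
def throughput(size_mb, n_lines, seconds):
    """Return (MB per second, lines per second) for a conversion run."""
    return size_mb / seconds, n_lines / seconds


def match_rate(matches, features):
    """Percentage of searched features with an MS/MS library match."""
    return 100.0 * matches / features


# Values reported in the text: 197.2 MB, 4,287,601 lines, 2:30 min, 591 of 4,169.
mb_per_s, lines_per_s = throughput(197.2, 4_287_601, 2 * 60 + 30)
rate = match_rate(591, 4_169)
```

With these inputs the conversion runs at roughly 1.3 MB/s (about 28,600 lines per second), and about 14% of the searched features received a library match.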

Conclusions

This work described the successful development and application of a tool that corrects SDF library formats for PQI annotation searches. This tool, as an additional identification resource, enabled a significant increase in the number of MS/MS library-matched molecular features. Nowadays, identifying unknown molecules based on MS/MS library search remains a burden to overcome.

Despite all the improvements in instrumentation, the number of identified compounds remains limited and is highly dependent on instrumental settings (i.e., duty cycle, MS/MS acquisition speed, precursor ion isolation width, accumulation time per single MS/MS spectrum, intensity threshold, collision energy, activation mode, instrumental design of tandem mass spectrometer), which should be optimized according to the requirements of each experiment. Moreover, it heavily depends on the availability of public mass spectral data and authentic analytical standards.¹⁵ Thus, the presented tool is not intended to overcome this barrier.

The strongest feature of the tool we present here is that it offers users of Progenesis QI a simple way to access any library contained in MoNA, as well as other .sdf-based libraries, such as the ones from the HMDB and Lipid Maps platforms.²⁴ With this tool, any PQI user will have all MoNA libraries available to increase the quality and quantity of feature annotation. Therefore, the novelty lies not in computational issues but in the application, and this is precisely where the scientific merit of our work lies.

Acknowledgments

This study was supported by the Fundação de Amparo à Pesquisa do Estado de São Paulo - FAPESP, via the processes No. 2018/13317-6 granted to P.O.C. and No. 2019/04314-6 to A.M.P. Also, by the Coordenação de Aperfeiçoamento de Pessoal de Nível Superior-CAPES, via the process No. 88887.639447/2021-00 granted to P.H.G.S., and the Conselho Nacional de Desenvolvimento Científico e Tecnológico - CNPq via the process No. 870359/1997-5 granted to G.H.B.D.

References

¹
Johnson, C. H.; Gonzalez, F. J.; J. Cell. Physiol. 2012, 227, 2975. [Crossref]
²
Creek, D. J.; Dunn, W. B.; Fiehn, O.; Griffin, J. L.; Hall, R. D.; Lei, Z.; Mistrik, R.; Neumann, S.; Schymanski, E. L.; Sumner, L. W.; Trengove, R.; Wolfender, J.-L.; Metabolomics 2014, 10, 350. [Crossref]
³
Sumner, L. W.; Amberg, A.; Barrett, D.; Beale, M. H.; Beger, R.; Daykin, C. A.; Fan, T. W.-M.; Fiehn, O.; Goodacre, R.; Griffin, J. L.; Hankemeier, T.; Hardy, N.; Harnly, J.; Higashi, R.; Kopka, J.; Lane, A. N.; Lindon, J. C.; Marriott, P.; Nicholls, A. W.; Reily, M. D.; Thaden, J. J.; Viant, M. R.; Metabolomics 2007, 3, 211. [Crossref]
⁴
Kwak, M.; Kang, K.; Wang, Y.; J. Comput. Inf. Syst. 2022, 62, 12. [Crossref]
⁵
Progenesis™ QI, version 2.4; Waters Corporation™© Nonlinear Dynamics, Newcastle, UK, 2018.
⁶
Waters Corporation™, A Facile Database Search Engine for Metabolite Identification and Biomarker Discovery in Metabolomics, https://www.waters.com/nextgen/en/library/application-notes/2014/facile-database-search-engine-metabolite-identification-biomarker-discovery-metabolomics.html, accessed in December 2022.
⁷
Ji, H.; Xu, Y.; Lu, H.; Zhang, Z.; Anal. Chem. 2019, 91, 5629. [Crossref]
⁸
Horai, H.; Arita, M.; Kanaya, S.; Nihei, Y.; Ikeda, T.; Suwa, K.; Ojima, Y.; Tanaka, K.; Tanaka, S.; Aoshima, K.; Oda, Y.; Kakazu, Y.; Kusano, M.; Tohge, T.; Matsuda, F.; Sawada, Y.; Hirai, M. Y.; Nakanishi, H.; Ikeda, K.; Akimoto, N.; Maoka, T.; Takahashi, H.; Ara, T.; Sakurai, N.; Suzuki, H.; Shibata, D.; Neumann, S.; Iida, T.; Tanaka, K.; Funatsu, K.; Matsuura, F.; Soga, T.; Taguchi, R.; Saito, K.; Nishioka, T.; J. Mass Spectrom. 2010, 45, 703. [Crossref]
⁹
How does Database Fragmentation Scoring Work?, https://www.nonlinear.com/progenesis/qi/v2.4/faq/database-fragmentation-algorithm.aspx, accessed in December 2022.
¹⁰
How are the Scores Calculated for Possible Compound identifications?, https://www.nonlinear.com/progenesis/qi/v2.4/faq/identifications-scoring-algorithm.aspx, accessed in December 2022.
¹¹
Fiehn Laboratory, MassBank of North America (MoNA), https://mona.fiehnlab.ucdavis.edu, accessed in December 2022.
¹²
Wishart, D. S.; Feunang, Y. D.; Marcu, A.; Guo, A. C.; Liang, K.; Vázquez-Fresno, R.; Sajed, T.; Johnson, D.; Li, C.; Karu, N.; Sayeeda, Z.; Lo, E.; Assempour, N.; Berjanskii, M.; Singhal, S.; Arndt, D.; Liang, Y.; Badran, H.; Grant, J.; Serra-Cayuela, A.; Liu, Y.; Mandal, R.; Neveu, V.; Pon, A.; Knox, C.; Wilson, M.; Manach, C.; Scalbert, A.; Nucleic Acids Res. 2018, 46, D608. [Crossref]
¹³
Kind, T.; Liu, K.-H.; Lee, D. Y.; DeFelice, B.; Meissen, J. K.; Fiehn, O.; Nat. Methods 2013, 10, 755. [Crossref]
¹⁴
Wang, M.; Carver, J. J.; Phelan, V. V.; Sanchez, L. M.; Garg, N.; Peng, Y.; Nguyen, D. D.; Watrous, J.; Kapono, C. A.; Luzzatto-Knaan, T.; Porto, C.; Bouslimani, A.; Melnik, A. V.; Meehan, M. J.; Liu, W.-T.; Crüsemann, M.; Boudreau, P. D.; Esquenazi, E.; Sandoval-Calderón, M.; Kersten, R. D.; Pace, L. A.; Quinn, R. A.; Duncan, K. R.; Hsu, C.-C.; Floros, D. J.; Gavilan, R. G.; Kleigrewe, K.; Northen, T.; Dutton, R. J.; Parrot, D.; Carlson, E. E.; Aigle, B.; Michelsen, C. F.; Jelsbak, L.; Sohlenkamp, C.; Pevzner, P.; Edlund, A.; McLean, J.; Piel, J.; Murphy, B. T.; Gerwick, L.; Liaw, C.-C.; Yang, Y.-L.; Humpf, H.-U.; Maansson, M.; Keyzers, R. A.; Sims, A. C.; Johnson, A. R.; Sidebottom, A. M.; Sedio, B. E.; Klitgaard, A.; Larson, C. B.; Boya, P. C. A.; Torres-Mendoza, D.; Gonzalez, D. J.; Silva, D. B.; Marques, L. M.; Demarque, D. P.; Pociute, E.; O’Neill, E. C.; Briand, E.; Helfrich, E. J. N.; Granatosky, E. A.; Glukhov, E.; Ryffel, F.; Houson, H.; Mohimani, H.; Kharbush, J. J.; Zeng, Y.; Vorholt, J. A.; Kurita, K. L.; Charusanti, P.; McPhail, K. L.; Nielsen, K. F.; Vuong, L.; Elfeki, M.; Traxler, M. F.; Engene, N.; Koyama, N.; Vining, O. B.; Baric, R.; Silva, R. R.; Mascuch, S. J.; Tomasi, S.; Jenkins, S.; Macherla, V.; Hoffman, T.; Agarwal, V.; Williams, P. G.; Dai, J.; Neupane, R.; Gurr, J.; Rodríguez, A. M. C.; Lamsa, A.; Zhang, C.; Dorrestein, K.; Duggan, B. M.; Almaliti, J.; Allard, P.-M.; Phapale, P.; Nothias, L.-F.; Alexandrov, T.; Litaudon, M.; Wolfender, J.-L.; Kyle, J. E.; Metz, T. O.; Peryea, T.; Nguyen, D.-T.; VanLeer, D.; Shinn, P.; Jadhav, A.; Müller, R.; Waters, K. M.; Shi, W.; Liu, X.; Zhang, L.; Knight, R.; Jensen, P. R.; Palsson, B. Ø.; Pogliano, K.; Linington, R. G.; Gutiérrez, M.; Lopes, N. P.; Gerwick, W. H.; Moore, B. S.; Dorrestein, P. C.; Bandeira, N.; Nat. Biotechnol. 2016, 34, 828. [Crossref]
¹⁵
Kind, T.; Tsugawa, H.; Cajka, T.; Ma, Y.; Lai, Z.; Mehta, S. S.; Wohlgemuth, G.; Barupal, D. K.; Showalter, M. R.; Arita, M.; Fiehn, O.; Mass Spectrom. Rev. 2018, 37, 513. [Crossref]
¹⁶
SDF2PQI (SDF to PQI Library Converter), https://github.com/pedrohgodoys/sdf_to_pqi, accessed in December 2022.
¹⁷
Haug, K.; Cochrane, K.; Nainala, V. C.; Williams, M.; Chang, J.; Jayaseelan, K. V.; O’Donovan, C.; Nucleic Acids Res. 2020, 48, D440. [Crossref]
¹⁸
Fernandes, A. M. A. P.; Messias, M. C. F.; Duarte, G. H. B.; de Santis, G. K. D.; Mecatti, G. C.; Porcari, A. M.; Murgu, M.; Simionato, A. V. C.; Rocha, T.; Martinez, C. A. R.; Carvalho, P. O.; Metabolites 2020, 10, 262. [Crossref]
¹⁹
Dávila-Lara, A.; Rodríguez-López, C. E.; O’Connor, S. E.; Mithöfer, A.; Int. J. Mol. Sci. 2020, 21, 4376. [Crossref]
²⁰
Liu, P.; Yin, Y.; Gong, Y.; Qiu, X.; Sun, Y.; Tan, L.; Song, C.; Liu, W.; Liao, Y.; Meng, C.; Ding, C.; Viruses 2019, 11, 962. [Crossref]
²¹
Fu, Q.; Liu, D.; Wang, Y.; Li, X.; Wang, L.; Yu, F.; Shen, J.; Xia, X.; J. Chromatog. B 2018, 1079, 62. [Crossref]
²²
Wang, C.; Lin, H.; Yang, N.; Wang, H.; Zhao, Y.; Li, P.; Liu, J.; Wang, F.; Molecules 2019, 24, 1712. [Crossref]
²³
BIOVIA CTFile Formats - BIOVIA Databases 2020; BIOVIA, Dassault Systèmes Software: Vélizy-Villacoublay, France, 2020. [Link] accessed in December 2022.
²⁴
Ni, Z.; Wölk, M.; Jukes, G.; Mendivelso Espinosa, K.; Ahrends, R.; Aimo, L.; Alvarez-Jarreta, J.; Andrews, S.; Andrews, R.; Bridge, A.; Clair, G. C.; Conroy, M. J.; Fahy, E.; Gaud, C.; Goracci, L.; Hartler, J.; Hoffmann, N.; Kopczyinki, D.; Korf, A.; Lopez-Clavijo, A. F.; Malik, A.; Ackerman, J. M.; Molenaar, M. R.; O’Donovan, C.; Pluskal, T.; Shevchenko, A.; Slenter, D.; Siuzdak, G.; Kutmon, M.; Tsugawa, H.; Willighagen, E. L.; Xia, J.; O’Donnell, V. B.; Fedorova, M.; Nat. Methods 2022 [Crossref]

Edited by

Editor handled this article: Andrea R. Chaves (Associate)

Publication Dates

Publication in this collection
23 June 2023
Date of issue
July 2023

History

Received
26 Sept 2022
Accepted
03 Feb 2023

This is an Open Access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

[1] ¹
Johnson, C. H.; Gonzalez, F. J.; J. Cell. Physiol. 2012, 227, 2975. [Crossref]
» Crossref

[2] ²
Creek, D. J.; Dunn, W. B.; Fiehn, O.; Griffin, J. L.; Hall, R. D.; Lei, Z.; Mistrik, R.; Neumann, S.; Schymanski, E. L.; Sumner, L. W.; Trengove, R.; Wolfender, J.-L.; Metabolomics 2014, 10, 350. [Crossref]
» Crossref

[3] ³
Sumner, L. W.; Amberg, A.; Barrett, D.; Beale, M. H.; Beger, R.; Daykin, C. A.; Fan, T. W.-M.; Fiehn, O.; Goodacre, R.; Griffin, J. L.; Hankemeier, T.; Hardy, N.; Harnly, J.; Higashi, R.; Kopka, J.; Lane, A. N.; Lindon, J. C.; Marriott, P.; Nicholls, A. W.; Reily, M. D.; Thaden, J. J.; Viant, M. R.; Metabolomics 2007, 3, 211. [Crossref]
» Crossref

[4] ⁴
Kwak, M.; Kang, K.; Wang, Y.; J. Comput. Inf. Syst. 2022, 62, 12. [Crossref]
» Crossref

[5] ⁵
Progenesis™ QI, version 2.4; Waters Corporation™^© Nonlinear Dynamics, Newcastle, UK, 2018.

[6] ⁶
Waters Corporation™, A Facile Database Search Engine for Metabolite Identification and Biomarker Discovery in Metabolomics, https://www.waters.com/nextgen/en/library/application-notes/2014/facile-database-search-engine-metabolite-identification-biomarker-discovery-metabolomics.html, accessed in December 2022.
» https://www.waters.com/nextgen/en/library/application-notes/2014/facile-database-search-engine-metabolite-identification-biomarker-discovery-metabolomics.html

[7] ⁷
Ji, H.; Xu, Y.; Lu, H.; Zhang, Z.; Anal. Chem. 2019, 91, 5629. [Crossref]
» Crossref

[8] ⁸
Horai, H.; Arita, M.; Kanaya, S.; Nihei, Y.; Ikeda, T.; Suwa, K.; Ojima, Y.; Tanaka, K.; Tanaka, S.; Aoshima, K.; Oda, Y.; Kakazu, Y.; Kusano, M.; Tohge, T.; Matsuda, F.; Sawada, Y.; Hirai, M. Y.; Nakanishi, H.; Ikeda, K.; Akimoto, N.; Maoka, T.; Takahashi, H.; Ara, T.; Sakurai, N.; Suzuki, H.; Shibata, D.; Neumann, S.; Iida, T.; Tanaka, K.; Funatsu, K.; Matsuura, F.; Soga, T.; Taguchi, R.; Saito, K.; Nishioka, T.; J. Mass Spectrom. 2010, 45, 703. [Crossref]
» Crossref

[9] ⁹
How does Database Fragmentation Scoring Work?, https://www.nonlinear.com/progenesis/qi/v2.4/faq/database-fragmentation-algorithm.aspx, accessed in December 2022.
» https://www.nonlinear.com/progenesis/qi/v2.4/faq/database-fragmentation-algorithm.aspx

[10] ¹⁰
How are the Scores Calculated for Possible Compound identifications?, https://www.nonlinear.com/progenesis/qi/v2.4/faq/identifications-scoring-algorithm.aspx, accessed in December 2022.
» https://www.nonlinear.com/progenesis/qi/v2.4/faq/identifications-scoring-algorithm.aspx

[11] ¹¹
Fiehn Laboratory, MassBank of North America (MoNA), https://mona.fiehnlab.ucdavis.edu, accessed in December 2022.
» https://mona.fiehnlab.ucdavis.edu

[12] ¹²
Wishart, D. S.; Feunang, Y. D.; Marcu, A.; Guo, A. C.; Liang, K.; Vázquez-Fresno, R.; Sajed, T.; Johnson, D.; Li, C.; Karu, N.; Sayeeda, Z.; Lo, E.; Assempour, N.; Berjanskii, M.; Singhal, S.; Arndt, D.; Liang, Y.; Badran, H.; Grant, J.; Serra-Cayuela, A.; Liu, Y.; Mandal, R.; Neveu, V.; Pon, A.; Knox, C.; Wilson, M.; Manach, C.; Scalbert, A.; Nucleic Acids Res. 2018, 46, D608. [Crossref]
» Crossref

[13] ¹³
Kind, T.; Liu, K.-H.; Lee, D. Y.; DeFelice, B.; Meissen, J. K.; Fiehn, O.; Nat. Methods 2013, 10, 755. [Crossref]
» Crossref

[14] ¹⁴
Wang, M.; Carver, J. J.; Phelan, V. V.; Sanchez, L. M.; Garg, N.; Peng, Y.; Nguyen, D. D.; Watrous, J.; Kapono, C. A.; Luzzatto-Knaan, T.; Porto, C.; Bouslimani, A.; Melnik, A. V.; Meehan, M. J.; Liu, W.-T.; Crüsemann, M.; Boudreau, P. D.; Esquenazi, E.; Sandoval-Calderón, M.; Kersten, R. D.; Pace, L. A.; Quinn, R. A.; Duncan, K. R.; Hsu, C.-C.; Floros, D. J.; Gavilan, R. G.; Kleigrewe, K.; Northen, T.; Dutton, R. J.; Parrot, D.; Carlson, E. E.; Aigle, B.; Michelsen, C. F.; Jelsbak, L.; Sohlenkamp, C.; Pevzner, P.; Edlund, A.; McLean, J.; Piel, J.; Murphy, B. T.; Gerwick, L.; Liaw, C.-C.; Yang, Y.-L.; Humpf, H.-U.; Maansson, M.; Keyzers, R. A.; Sims, A. C.; Johnson, A. R.; Sidebottom, A. M.; Sedio, B. E.; Klitgaard, A.; Larson, C. B.; Boya, P. C. A.; Torres-Mendoza, D.; Gonzalez, D. J.; Silva, D. B.; Marques, L. M.; Demarque, D. P.; Pociute, E.; O’Neill, E. C.; Briand, E.; Helfrich, E. J. N.; Granatosky, E. A.; Glukhov, E.; Ryffel, F.; Houson, H.; Mohimani, H.; Kharbush, J. J.; Zeng, Y.; Vorholt, J. A.; Kurita, K. L.; Charusanti, P.; McPhail, K. L.; Nielsen, K. F.; Vuong, L.; Elfeki, M.; Traxler, M. F.; Engene, N.; Koyama, N.; Vining, O. B.; Baric, R.; Silva, R. R.; Mascuch, S. J.; Tomasi, S.; Jenkins, S.; Macherla, V.; Hoffman, T.; Agarwal, V.; Williams, P. G.; Dai, J.; Neupane, R.; Gurr, J.; Rodríguez, A. M. C.; Lamsa, A.; Zhang, C.; Dorrestein, K.; Duggan, B. M.; Almaliti, J.; Allard, P.-M.; Phapale, P.; Nothias, L.-F.; Alexandrov, T.; Litaudon, M.; Wolfender, J.-L.; Kyle, J. E.; Metz, T. O.; Peryea, T.; Nguyen, D.-T.; VanLeer, D.; Shinn, P.; Jadhav, A.; Müller, R.; Waters, K. M.; Shi, W.; Liu, X.; Zhang, L.; Knight, R.; Jensen, P. R.; Palsson, B. Ø.; Pogliano, K.; Linington, R. G.; Gutiérrez, M.; Lopes, N. P.; Gerwick, W. H.; Moore, B. S.; Dorrestein, P. C.; Bandeira, N.; Nat. Biotechnol. 2016, 34, 828. [Crossref]
» Crossref

[15] ¹⁵
Kind, T.; Tsugawa, H.; Cajka, T.; Ma, Y.; Lai, Z.; Mehta, S. S.; Wohlgemuth, G.; Barupal, D. K.; Showalter, M. R.; Arita, M.; Fiehn, O.; Mass Spectrom. Rev. 2018, 37, 513. [Crossref]
» Crossref

[16] ¹⁶
SDF2PQI (SDF to PQI Library Converter), https://github.com/pedrohgodoys/sdf_to_pqi, accessed in December 2022.
» https://github.com/pedrohgodoys/sdf_to_pqi

[17] ¹⁷
Haug, K.; Cochrane, K.; Nainala, V. C.; Williams, M.; Chang, J.; Jayaseelan, K. V.; O’Donovan, C.; Nucleic Acids Res. 2020, 48, D440. [Crossref]
» Crossref

[18] ¹⁸
Fernandes, A. M. A. P.; Messias, M. C. F.; Duarte, G. H. B.; de Santis, G. K. D.; Mecatti, G. C.; Porcari, A. M.; Murgu, M.; Simionato, A. V. C.; Rocha, T.; Martinez, C. A. R.; Carvalho, P. O.; Metabolites 2020, 10, 262. [Crossref]
» Crossref

[19] ¹⁹
Dávila-Lara, A.; Rodríguez-López, C. E.; O’Connor, S. E.; Mithöfer, A.; Int. J. Mol. Sci. 2020, 21, 4376. [Crossref]
» Crossref

[20] ²⁰
Liu, P.; Yin, Y.; Gong, Y.; Qiu, X.; Sun, Y.; Tan, L.; Song, C.; Liu, W.; Liao, Y.; Meng, C.; Ding, C.; Viruses 2019, 11, 962. [Crossref]
» Crossref

[21] ²¹
Fu, Q.; Liu, D.; Wang, Y.; Li, X.; Wang, L.; Yu, F.; Shen, J.; Xia, X.; J. Chromatog. B 2018, 1079, 62. [Crossref]
» Crossref

[22] ²²
Wang, C.; Lin, H.; Yang, N.; Wang, H.; Zhao, Y.; Li, P.; Liu, J.; Wang, F.; Molecules 2019, 24, 1712. [Crossref]
» Crossref

[23] ²³
BIOVIA CTFile Formats - BIOVIA Databases 2020; BIOVIA, Dassault Systèmes Software: Vélizy-Villacoublay, France, 2020. [Link] accessed in December 2022.
» Link

[24] ²⁴
Ni, Z.; Wölk, M.; Jukes, G.; Mendivelso Espinosa, K.; Ahrends, R.; Aimo, L.; Alvarez-Jarreta, J.; Andrews, S.; Andrews, R.; Bridge, A.; Clair, G. C.; Conroy, M. J.; Fahy, E.; Gaud, C.; Goracci, L.; Hartler, J.; Hoffmann, N.; Kopczyinki, D.; Korf, A.; Lopez-Clavijo, A. F.; Malik, A.; Ackerman, J. M.; Molenaar, M. R.; O’Donovan, C.; Pluskal, T.; Shevchenko, A.; Slenter, D.; Siuzdak, G.; Kutmon, M.; Tsugawa, H.; Willighagen, E. L.; Xia, J.; O’Donnell, V. B.; Fedorova, M.; Nat. Methods 2022 [Crossref]
» Crossref

Dataset	Ionization mode	MS data acquisition mode	Matches for raw/converted fragmentation library^a	Library
Human plasma	positive	DIA	0/277	^b,c,d,e
Human plasma	negative	DIA	0/282	^b,c,d,e
Plant extract	positive	DDA	0/102	^c,f,g,h,i,j
Chicken cells	positive	DDA	0/690	^b,c,d,e
Chicken cells	negative	DDA	0/591	^b,c,d,e
Bacteria	positive	DIA	0/603	^b,c,k
Bacteria	negative	DIA	0/229	^b,c,k
Rat serum	positive	DIA	0/2010	^b,c,d,e
Rat serum	negative	DIA	0/2909	^b,c,d,e
Rat hippocampus	positive	DIA	0/1182	^b,c,d,e
Rat hippocampus	negative	DIA	0/384	^b,c,d,e

^a Number of MS/MS-matched features before (raw) and after (converted) the library conversion; ^b HMDB: Human Metabolome Database; ^c LipidBlast; ^d FAHFA: fatty acid esters of hydroxy fatty acids; ^e Oxidized Phospholipids; ^f GNPS: Global Natural Products Social Molecular Networking Library; ^g Vaniya/Fiehn Natural Products Library; ^h ReSpect: RIKEN MSn Spectral Database for Phytochemicals; ^i MetaboBASE; ^j RIKEN PlaSMA: Plant Specialized Metabolome Annotation; ^k Pathogen Box.
DIA: data-independent acquisition; DDA: data-dependent acquisition.
