SISTAX – An Intelligent Tool for Recovering Information on Natural Products Chemistry

This work describes the development of a new program for the expert system SISTEMAT, named SISTAX. The program allows those interested in chemotaxonomy to carry out an "intelligent search" for organic compounds in databases by means of chemical structures. When coupled with an efficient coding system, the program recognizes skeleton types and can find any substructural constraints requested by the user. An example of the application of the program to diterpenes found in plants is described.


Introduction
The identification of substructures, that is, parts of structures, has several applications in organic chemistry. Two research fields that apply the concept of substructures are computer-assisted structure elucidation and chemotaxonomy. In both fields the implementation of computer programs involves chemists, mathematicians and computer engineers, and the interdisciplinary nature of the problems makes it a great challenge. [3-11]
Substructures, allied to other biochemical inferences, are the main tools of chemotaxonomic methodology and may be useful for discriminating genera, species etc. [12,13] The aim of this work is to demonstrate how a specialist system developed to assist the chemist in both fields described above can be used for chemotaxonomic purposes.
To recognize substructures for classification purposes in chemotaxonomy and evolution, a new program named SISTAX was developed. It permits a search at a given botanical rank (family or genus) by chemical category, such as chemical class, carbon skeleton type and functional groups. The program works on a database built specifically for chemical data. We show applications of this program to natural products chemistry because of the great diversity of compounds already recorded in this field of science, as well as the great number of plants chemically studied in laboratories. [8-16]

Methods
The expert system SISTEMAT
The specialist system SISTEMAT consists of a set of programs designed primarily as an auxiliary tool for the structure determination of natural products, and secondarily for chemotaxonomic studies. [6-16] The latter application is only beginning. [17] The system allows the stored data to be analyzed thanks to the efficient structure-encoding method provided by SISTEMAT. [16] This makes it possible to recover any chemical information contained in the encoding of a substance. This type of encoding also allows the information to be obtained quickly and simply, which has been of fundamental importance in the development of the SISTAX program.
J. Braz. Chem. Soc.

The SISTAX program: Definitions
Skeletons are the different carbon arrangements exhibited by a given chemical class. [20] Chemical classes are large groupings of natural products that share a common biosynthetic origin, that is, the same chemical precursor. Figure 1 shows chemical classes with their respective precursors and different carbon skeletons. Chemists working on natural products frequently use the concept of skeletal types for taxonomic and structure determination purposes.

The SISTAX program
The SISTAX program was developed to perform intelligent searches in SISTEMAT's database. The current version is written in FORTRAN, with the screen-control and data-entry facilities written in PASCAL.
The first approach used by chemists in this field is to investigate the distribution of structural types (carbon skeletons or substructures) of one or several classes of natural products in a botanical taxon (family or genus).
The intelligent search performed by the program uses SISTEMAT's method of encoding compounds. This encoding makes it possible to look within the connectivity matrix of a given compound for information about its chemical structure. Because the database containing chemical information is linked to others containing botanical data, both types of data can be recovered simultaneously. Searches are quick and simple for the user, who only has to answer the program's questions using the encoding shown on the computer screen. With these answers, the researcher defines the structure types and the extent of the search relevant to his or her research. The search results are listed in tables that can be imported into statistical worksheets.
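The link between chemical and botanical data, and the export of results to a worksheet, can be illustrated with a minimal Python sketch. It is not SISTAX code (the program itself is written in FORTRAN and PASCAL); the table layouts, the names `CHEMICAL`, `BOTANICAL`, `linked_rows` and `to_csv`, and all records are hypothetical placeholders chosen only for illustration.

```python
import csv
import io

# Hypothetical chemical table: compound id -> structural annotations.
CHEMICAL = {
    1: {"class": "diterpene", "skeleton": "clerodane"},
    2: {"class": "diterpene", "skeleton": "abietane"},
}

# Hypothetical botanical table: (compound id, family, genus).
BOTANICAL = [
    (1, "FamilyX", "GenusA"),
    (1, "FamilyX", "GenusB"),
    (2, "FamilyX", "GenusA"),
]


def linked_rows(chemical, botanical):
    """Join each botanical occurrence with the chemical data of its compound."""
    rows = []
    for compound_id, family, genus in botanical:
        chem = chemical[compound_id]
        rows.append([family, genus, chem["class"], chem["skeleton"]])
    return rows


def to_csv(rows):
    """Serialize the joined rows as CSV text that a worksheet can import."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["family", "genus", "class", "skeleton"])
    writer.writerows(rows)
    return buf.getvalue()


table = to_csv(linked_rows(CHEMICAL, BOTANICAL))
print(table)
```

Keying both tables on a shared compound identifier is one simple way to recover chemical and botanical data in a single pass; the actual linkage used by SISTEMAT is not detailed in the text.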
The structure types to be defined are: chemical class (triterpene, diterpene, monoterpene etc.); carbon skeleton (lupane, clerodane, menthane etc.); and substructures (parts of structures), which can be functional groups such as hydroxyl or carbonyl, or sets of interlinked atoms such as an acetate, an aromatic ring or a furan ring, among others.
The extent of the search can be represented by a flow chart (Figure 2), in which the user can examine: the occurrence of one or several chemical classes among plant families and genera; the occurrence of one or several skeletons belonging to a chemical class; and the occurrence of one or several substructures in a chemical class or on a specific skeleton belonging to a given chemical class.
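The three search extents described above (chemical class, skeleton, substructure) can be pictured as optional filters applied to occurrence records within a family. The following Python sketch assumes a flat record type, `Occurrence`, and a helper, `count_by_genus`, both invented here; the records are placeholders and do not come from SISTEMAT's database.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Occurrence:
    """One recorded isolation of a compound from a plant (toy record)."""
    family: str
    genus: str
    chemical_class: str
    skeleton: str
    substructures: frozenset


def count_by_genus(records, family, chemical_class=None, skeleton=None,
                   substructure=None):
    """Count occurrences per genus in a family; each filter is optional."""
    counts = Counter()
    for r in records:
        if r.family != family:
            continue
        if chemical_class is not None and r.chemical_class != chemical_class:
            continue
        if skeleton is not None and r.skeleton != skeleton:
            continue
        if substructure is not None and substructure not in r.substructures:
            continue
        counts[r.genus] += 1
    return dict(counts)


# Placeholder records, invented for illustration only.
RECORDS = [
    Occurrence("FamilyX", "GenusA", "diterpene", "clerodane",
               frozenset({"furan ring"})),
    Occurrence("FamilyX", "GenusA", "diterpene", "abietane",
               frozenset({"hydroxyl"})),
    Occurrence("FamilyX", "GenusB", "diterpene", "clerodane",
               frozenset({"hydroxyl", "acetate"})),
    Occurrence("FamilyX", "GenusB", "triterpene", "lupane",
               frozenset({"hydroxyl"})),
    Occurrence("FamilyY", "GenusC", "diterpene", "clerodane",
               frozenset({"furan ring"})),
]

by_class = count_by_genus(RECORDS, "FamilyX", chemical_class="diterpene")
by_skeleton = count_by_genus(RECORDS, "FamilyX", skeleton="clerodane")
by_substructure = count_by_genus(RECORDS, "FamilyX", skeleton="clerodane",
                                 substructure="furan ring")
```

Each call narrows the previous one, mirroring the progression from chemical class to skeleton to substructure.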

The database
To evaluate the SISTAX program, SISTEMAT's database containing 2359 occurrences of diterpenes isolated from the Lamiaceae family was used. This database was built from data published in journals indexed by Chemical Abstracts up to 1997.

Verification of the occurrence of a given skeleton
In this test the occurrence of the clerodane skeleton (Figure 3) was verified among genera of the Lamiaceae family. Figure 4 shows the information that the SISTAX program requests from the user so that the analysis can be done. The results obtained are shown in Table 1. This approach enables one to verify, for example, whether a preferred skeleton accumulates in some genera of a family. It is important to note that the skeleton in Figure 3 is numbered according to an arbitrary criterion adopted by chemists, known as "biosynthetic numbering". From the computational point of view, the SISTEMAT program stores this numbering as a vector attached to the connectivity matrix of each compound. When this encoding is analyzed computationally, the biosynthetic vector permits a more precise search within the connectivity matrices, so that the user can determine, for instance, which of C-6 and C-7 is methylenic (Figure 3).
Another use of the biosynthetic vector is searching for functional groups attached to specific positions of a carbon skeleton. Generally these groups are associated with some pharmacological properties or appear in compounds that are mainly isolated from characteristic genera of plants. [21]
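The role of the biosynthetic vector can be sketched in Python as follows. The representation is an assumption made for illustration: a symmetric matrix of bond orders between heavy atoms, a parallel list of element symbols, and a vector giving the biosynthetic number of each atom (0 for unnumbered atoms). SISTEMAT's actual encoding is not described here, and the four-carbon fragment is a toy, not a real clerodane.

```python
# Toy fragment: C-5, C-6, C-7, C-8 in a chain, with an oxygen on C-6.
ATOMS = ["C", "C", "C", "C", "O"]
BIO_VECTOR = [5, 6, 7, 8, 0]  # biosynthetic number per atom, 0 = unnumbered
MATRIX = [
    [0, 1, 0, 0, 0],
    [1, 0, 1, 0, 1],
    [0, 1, 0, 1, 0],
    [0, 0, 1, 0, 0],
    [0, 1, 0, 0, 0],
]

VALENCE = {"C": 4, "O": 2}


def implicit_hydrogens(matrix, atoms, i):
    """Hydrogens on atom i: standard valence minus bond orders to heavy atoms."""
    return VALENCE[atoms[i]] - sum(matrix[i])


def is_methylene(matrix, atoms, bio_vector, position):
    """True if the carbon with this biosynthetic number is a CH2 group.

    Raises ValueError if no atom carries the requested number.
    """
    i = bio_vector.index(position)
    return atoms[i] == "C" and implicit_hydrogens(matrix, atoms, i) == 2


c6_is_ch2 = is_methylene(MATRIX, ATOMS, BIO_VECTOR, 6)
c7_is_ch2 = is_methylene(MATRIX, ATOMS, BIO_VECTOR, 7)
```

In this toy fragment C-6 bears an oxygen and is therefore not methylenic, whereas C-7 is. Without the vector, the matrix alone would not say which row corresponds to which chemist-assigned position.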

Verification of the occurrence of a defined substructure
To search for a substructure in SISTEMAT's data banks, a substructure code is needed; that is, it is necessary to define the size of the substructure and the types of atoms it contains, whose presence is then searched for in the connectivity matrices. The possible substructures are presented in Table 2 and the chemical groupings in Table 3, from which a substructure and the desired chemical groupings can be selected. As an example, Table 4 shows the encoding for a furan ring that may be present in clerodane diterpenes.
The aim of this test is to verify the occurrence of the furan ring in clerodanes among the genera of the Lamiaceae family. As a constraint, it was established that the furan ring should be located at carbons 13-16, according to the biosynthetic numbering often used by natural product chemists for the clerodane skeleton (Figure 3). Figure 5 shows the search for the furan ring as requested by the user through the SISTAX program. Table 5 summarizes the results of the analysis carried out by the program, that is, for each family and genus, the number of compounds with the clerodane skeleton that have a furan ring at carbons 13-16.
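A positional substructure test of this kind can be sketched in Python. The sketch is not the code of Table 4: it assumes a hypothetical encoding in which bonds are stored as a dictionary from atom-index pairs to bond orders, atoms carry biosynthetic numbers (0 if unnumbered), and the furan ring is written with explicit alternating double bonds (C-13=C-16 and C-14=C-15) rather than an aromatic flag.

```python
def bond_order(bonds, i, j):
    """Return the bond order between atoms i and j (0 if not bonded)."""
    return bonds.get(frozenset((i, j)), 0)


def has_furan_at_13_16(atoms, bio_vector, bonds):
    """True if C-13..C-16 and one oxygen form a furan ring (Kekule form)."""
    try:
        c13, c14, c15, c16 = (bio_vector.index(n) for n in (13, 14, 15, 16))
    except ValueError:
        return False  # at least one numbered position is absent
    if any(atoms[i] != "C" for i in (c13, c14, c15, c16)):
        return False
    carbon_bonds_ok = (
        bond_order(bonds, c13, c14) == 1
        and bond_order(bonds, c14, c15) == 2
        and bond_order(bonds, c13, c16) == 2
    )
    # The ring must be closed by an oxygen singly bonded to C-15 and C-16.
    oxygen_bridge = any(
        atoms[k] == "O"
        and bond_order(bonds, k, c15) == 1
        and bond_order(bonds, k, c16) == 1
        for k in range(len(atoms))
    )
    return carbon_bonds_ok and oxygen_bridge


# Toy side chain: C-12 attached to a furan ring built on C-13..C-16.
ATOMS = ["C", "C", "C", "C", "O", "C"]
BIO_VECTOR = [13, 14, 15, 16, 0, 12]
BONDS = {
    frozenset((0, 1)): 1,  # C-13 - C-14
    frozenset((1, 2)): 2,  # C-14 = C-15
    frozenset((2, 4)): 1,  # C-15 - O
    frozenset((3, 4)): 1,  # C-16 - O
    frozenset((0, 3)): 2,  # C-13 = C-16
    frozenset((0, 5)): 1,  # C-13 - C-12
}
# Same atoms with the O - C-16 bond removed: the ring is no longer closed.
OPEN_CHAIN = {k: v for k, v in BONDS.items() if k != frozenset((3, 4))}

ring_found = has_furan_at_13_16(ATOMS, BIO_VECTOR, BONDS)
ring_found_open = has_furan_at_13_16(ATOMS, BIO_VECTOR, OPEN_CHAIN)
```

Checking bond orders as well as connectivity keeps the test from matching saturated or partly oxidized five-membered rings at the same positions; a general substructure search would need graph matching rather than this fixed-position check.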

Verification of oxidation at specific positions
The SISTAX program makes it possible to verify whether a given position in a skeleton type, within a taxon, is oxidized more frequently than another position. For example, one can search for the occurrence of CH2 groups at C-6 and C-7 in clerodanes (Figure 6). The results are presented in Table 6, where one can see that in clerodanes from Teucrium, Scutellaria and Ajuga, C-6 is more frequently oxidized than C-7.
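The comparison between positions reduces to counting, per genus, how many compounds keep a CH2 group at each position: the position with fewer CH2 occurrences is the one more often modified. The Python sketch below assumes each compound has already been reduced to its genus and the set of positions that are methylenic; the function name `methylene_counts` and the records are invented for illustration and are not the values of Table 6. Reading "not CH2" as "oxidized" follows the text and is a simplification.

```python
from collections import defaultdict


def methylene_counts(records, positions=(6, 7)):
    """Count, per genus, the compounds with a CH2 group at each position."""
    counts = defaultdict(lambda: {p: 0 for p in positions})
    for genus, methylene_positions in records:
        row = counts[genus]  # ensures the genus is listed even with no CH2
        for p in positions:
            if p in methylene_positions:
                row[p] += 1
    return {genus: dict(row) for genus, row in counts.items()}


# Placeholder records: (genus, set of positions that are CH2 in the compound).
RECORDS = [
    ("GenusA", {7}),
    ("GenusA", {6, 7}),
    ("GenusA", {7}),
    ("GenusB", set()),
]

ch2_table = methylene_counts(RECORDS)
```

In the placeholder data, GenusA keeps CH2 at C-7 in three compounds but at C-6 in only one, which is the pattern the text interprets as C-6 being oxidized more frequently.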

Conclusions
With the development of the SISTAX program, the expert system SISTEMAT acquires a new tool that allows searches for requirements such as chemical classes, carbon skeletons and substructures at a given level of botanical classification. The program makes it possible to correlate botanical information with chemical constraints. Thus, the results obtained can help forthcoming chemosystematic and evolutionary studies. Since chemosystematics and evolution papers usually comprise studies on occurrences of compounds at several hierarchical levels [13], the SISTAX program may be seen as a powerful tool for the basic step of chemotaxonomic tasks. At present, correlating hundreds of genera with, for example, dozens of chemical constraints is a task that cannot be carried out without a computer-assisted tool.

Figure 1. Some chemical classes of natural products, skeletal types and biosynthetic precursors.

Figure 2. Flow chart of the SISTAX program.

Figure 4. Information requested by the SISTAX program for verification of occurrences of the clerodane skeleton in Lamiaceae.
labels are in agreement with the programming code.
a triple bond and * = represents the aromatic ring.

Figure 5. Information requested by the SISTAX program for verification of occurrences of a furan ring in the clerodane skeleton in Lamiaceae.

Table 1. Number of occurrences of clerodane skeletal types in Lamiaceae genera

Table 2. The substructure sets used by SISTEMAT

Table 3. Atomic groupings used by SISTEMAT

Table 4. Furan ring code

Table 5. Number of occurrences of clerodanes with a furan ring in Lamiaceae genera

Table 6. Number of occurrences of CH2 groups at C-6 and C-7 in clerodane skeletal types in Lamiaceae genera