Abstract in English: Clinical and cytogenetic studies were performed in 65 infertile individuals, 56 of whom were also screened for microdeletions in Yq11 (AZF region). Relevant environmental etiological factors were identified in 10 cases (15.4%). Sertoli-cell-only syndrome was diagnosed in six patients (9.2%). Karyotype abnormalities were detected in six individuals, and five other patients presented desynapsis of bivalents in meiosis. Three of the 56 patients screened carried microdeletions in the AZF region, one of whom also presented chromosomal mosaicism for an extra i(22p).
Abstract in English: The multistep process of tumorigenesis involves the classic chromosomal instability pathway and the mutator phenotype pathway, the latter characterized by a predisposition to acquire mutations in tumor suppressor genes and oncogenes. Expansion and contraction of microsatellite sequences due to a deficient mismatch repair system are a marker of the mutator phenotype. Conflicting results have been reported on the extent of microsatellite instability (MSI) in the development and progression of myeloid malignancies. Here, we investigated MSI and loss of heterozygosity (LOH) frequencies at the microsatellite loci BAT-26, D7S486, D8S135, ANK1, IFNA, TP53 and bcr in 19 Brazilian patients with acute (AML) or chronic myeloid leukemia (CML). One AML patient and one CML patient were classified as having a high degree of microsatellite instability (MSI-H), corresponding to 10.5% (2/19) of all patients. LOH at loci BAT-26 and TP53 was present in 30% of the patients with AML alone. Despite the small sample size, our results suggest that the mutator phenotype, as indicated by MSI frequency, could play a role in the leukemogenesis of a small subset of patients with myeloid leukemia.
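MSI status is usually assigned from the fraction of typed loci that show instability. The abstract does not state the cutoff used, so the minimal sketch below assumes the common convention that instability at 30% or more of the loci means MSI-H, fewer unstable loci means MSI-L, and none means MSS. The function name and the example counts are hypothetical.

```python
def classify_msi(unstable_loci: int, total_loci: int, high_cutoff: float = 0.30) -> str:
    """Classify MSI status from the fraction of unstable microsatellite loci.

    The 30% cutoff for MSI-H is an assumed convention, not taken from the study.
    """
    if total_loci <= 0 or not 0 <= unstable_loci <= total_loci:
        raise ValueError("need 0 <= unstable_loci <= total_loci and total_loci > 0")
    fraction = unstable_loci / total_loci
    if fraction >= high_cutoff:
        return "MSI-H"  # high-degree instability
    if unstable_loci > 0:
        return "MSI-L"  # low-degree instability
    return "MSS"        # microsatellite stable


# Hypothetical patients typed at the seven loci of this study.
for unstable in (0, 1, 3):
    print(unstable, "of 7 unstable:", classify_msi(unstable, 7))
```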
Abstract in English: The allelic variability of four dinucleotide microsatellites located in the HLA region (MOGc, D6S265, MIB and TNFa) was analyzed in 67 individuals from three Amerindian populations of the Argentine Gran Chaco: Toba, Wichi and Chorote. Genomic DNA was extracted from peripheral blood using the standard phenol-chloroform procedure. Alleles were identified by PCR using a reverse primer end-labeled with the fluorescent dye 6-FAM. Despite the small number of samples studied, a high level of gene diversity was observed in each population and at each locus, with mean numbers of alleles of 7.7, 5.3, 10.0 and 7.0 at loci MOGc, D6S265, MIB and TNFa, respectively. Differentiation tests between pairs of populations showed clear differentiation between the Wichi and the other two groups. However, the proportion of the total genetic variability due to differences among populations, estimated by the Gst' index, was relatively low (6%), and almost all of the genetic variation occurred within populations (96%). This high intrapopulation variation suggests intense gene flow among the Gran Chaco tribes, a result that historical information seems to confirm.
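The partition of gene diversity into within- and among-population components can be illustrated with Nei's classic G_ST = (H_T − H_S)/H_T, where H_S is the mean within-population gene diversity and H_T the diversity of the pooled populations; 1 − G_ST = H_S/H_T is then the share held within populations. The Gst' reported above may be a corrected or standardized variant, so this is only a sketch of the underlying idea, with hypothetical allele frequencies.

```python
def gene_diversity(freqs):
    """Nei's gene diversity (expected heterozygosity) H = 1 - sum(p_i^2) at one locus."""
    return 1.0 - sum(p * p for p in freqs)


def nei_gst(pop_freqs):
    """Nei's G_ST = (H_T - H_S) / H_T for one locus, weighting populations equally.

    pop_freqs: one allele-frequency list per population, same allele order in each.
    """
    k = len(pop_freqs)
    h_s = sum(gene_diversity(f) for f in pop_freqs) / k  # mean within-population diversity
    pooled = [sum(col) / k for col in zip(*pop_freqs)]    # allele frequencies of the pooled sample
    h_t = gene_diversity(pooled)                          # total diversity
    return (h_t - h_s) / h_t


# Hypothetical frequencies of three alleles in three populations.
pops = [[0.5, 0.3, 0.2], [0.4, 0.4, 0.2], [0.6, 0.2, 0.2]]
gst = nei_gst(pops)
print(f"G_ST = {gst:.3f}, share within populations = {1 - gst:.3f}")
```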
Abstract in English: The p53 codon 72 polymorphism appears to be associated with HPV-associated carcinogenesis, although conflicting data have been reported. We analyzed a series of Brazilian women with cervical carcinoma. Of the 148 women, 99 (67%) were homozygous for the arginine allele (arg/arg) and 49 (33%) were heterozygous (arg/pro). This polymorphism may be an important determinant of cervical cancer risk, but it does not appear to be sufficient for carcinogenesis.
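The genotype counts above determine the allele frequencies directly. The sketch below computes them and, purely for illustration, the genotype numbers expected under Hardy-Weinberg proportions; since the sample consists only of cancer patients, such expectations are not a test of the authors' hypothesis.

```python
# Genotype counts reported above; no pro/pro homozygotes were observed.
counts = {"arg/arg": 99, "arg/pro": 49, "pro/pro": 0}
n = sum(counts.values())                                       # 148 women
p_arg = (2 * counts["arg/arg"] + counts["arg/pro"]) / (2 * n)  # arginine allele frequency
q_pro = 1.0 - p_arg                                            # proline allele frequency

# Genotype numbers expected under Hardy-Weinberg proportions (illustrative only).
expected = {
    "arg/arg": p_arg ** 2 * n,
    "arg/pro": 2 * p_arg * q_pro * n,
    "pro/pro": q_pro ** 2 * n,
}
for genotype, count in counts.items():
    share = 100 * count / n
    print(f"{genotype}: observed {count} ({share:.0f}%), expected {expected[genotype]:.1f}")
print(f"arg allele frequency = {p_arg:.3f}")
```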
Abstract in English: BRCA1 mutations are known to account for the majority of hereditary breast and ovarian cancers in women with early onset and a family history of the disease. Here we present a mutational survey of 47 Brazilian patients with breast/ovarian cancer, selected on the basis of age at diagnosis, family history, tumor laterality and the presence of breast cancer in male patients. All 22 coding exons and the intron-exon junctions were sequenced. Constitutional mutations were found in seven families: a single-base insertion (insC5382) in exon 20 (four patients); a 4-bp deletion (3450-3453delCAAG) in exon 11 resulting in a premature stop codon (one patient); a transition (IVS17+2T>C) in intron 17 affecting an mRNA splice site (one patient); and a C>T transition creating a stop codon (Q1135X) in exon 11 (one patient). Identifying these mutations, which are associated with hereditary breast and ovarian cancer, will contribute to characterizing the mutational spectrum of BRCA1 and to improving genetic counseling for familial breast/ovarian cancer patients in Brazil.
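Why the insertion and the 4-bp deletion disrupt the protein follows from simple arithmetic: a coding indel whose length is not a multiple of 3 shifts the reading frame, and a glutamine codon (CAA or CAG) with a C>T change at its first base becomes a stop codon (TAA or TAG). The minimal sketch below checks both rules; it assumes the indel lies in coding sequence and does not locate where the premature stop falls. The function names are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def indel_effect(bases: str) -> str:
    """Classify a coding insertion/deletion by its length: frameshift unless a multiple of 3."""
    if not bases:
        raise ValueError("empty indel")
    return "in-frame" if len(bases) % 3 == 0 else "frameshift"


def c_to_t_at(codon: str, position: int) -> str:
    """Return the codon after a C>T change at 0-based `position`."""
    if codon[position] != "C":
        raise ValueError("no C at that position")
    return codon[:position] + "T" + codon[position + 1:]


print(indel_effect("C"))     # insC5382: 1-bp insertion
print(indel_effect("CAAG"))  # 3450-3453delCAAG: 4-bp deletion
# Glutamine is encoded by CAA and CAG; a C>T change at the first base gives a stop codon.
print([c_to_t_at(q, 0) in STOP_CODONS for q in ("CAA", "CAG")])
```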
Abstract in English: The cellular prion protein PrPC contains about 250 amino acids, with some variation among species, and is expressed in several cell types. PrPC is converted to PrPSc by a post-translational process in which parts of its sequence adopt a beta-sheet three-dimensional conformation. Variations in the prion protein gene were observed among 16 genera of New World primates (Platyrrhini), resulting in amino acid substitutions relative to the human sequence. Seven substitutions not previously described in the literature were found: W -> R at position 31 in Cebuella, T -> A at position 95 in Cacajao and Chiropotes, N -> S at position 100 in Brachyteles, L -> Q at position 130 in Leontopithecus (in the sequence responsible for generating beta-sheet 1), D -> E at position 144 in Lagothrix (in the sequence responsible for alpha-helix 1), D -> G at position 147 in Saguinus (also in the alpha-helix 1 region), and M -> I at position 232 in Alouatta. The phylogenetic trees generated by parsimony, neighbor-joining and Bayesian analyses strongly supported the monophyly of the platyrrhines but did not resolve relationships among families. However, the results corroborate previous findings indicating that the three platyrrhine families radiated rapidly from an ancient split.
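Substitutions such as "W -> R at position 31" are read from an alignment against a reference sequence, with positions numbered on the ungapped reference. The sketch below shows that bookkeeping for a pair of aligned sequences; the sequences are toy fragments, not real prion protein data, and gap columns are simply skipped.

```python
def substitutions(reference: str, query: str, start: int = 1) -> list[str]:
    """List amino acid substitutions of `query` relative to an aligned `reference`.

    Positions are numbered on the ungapped reference, starting at `start`.
    Alignment columns with a gap ('-') in either sequence are not reported.
    """
    if len(reference) != len(query):
        raise ValueError("sequences must come from the same alignment")
    changes = []
    position = start - 1
    for ref_aa, qry_aa in zip(reference, query):
        if ref_aa == "-":
            continue  # insertion in the query: no reference position consumed
        position += 1
        if qry_aa != "-" and qry_aa != ref_aa:
            changes.append(f"{ref_aa}{position}{qry_aa}")
    return changes


# Toy aligned fragments (not real PrP sequences).
print(substitutions("ACDEF", "ACNEF"))
print(substitutions("AC-DE", "ACKDQ"))
```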
Abstract in English: One of the main concerns of conservation biology is identifying priority areas for conservation, and developing quantitative methods is important for this task. Many phylogenetic diversity indexes and higher-taxon approaches have been used in this context. In this study, Faith's phylogenetic diversity index and the number of evolutionarily independent lineages of Carnivora were calculated at the average patch level, based on phylogenetic autocorrelation analysis of phenotypic traits, in 18 conservation units in the Americas (often National Parks). Despite controversy about which hierarchical level to adopt, the characters included in this study suggest that the family level yields independent units for analyzing phenotypic diversity in Carnivora. The number of evolutionarily independent lineages was positively correlated with species richness (r = 0.67; P < 0.05), showing that it is a valid criterion for prioritizing conservation areas. Faith's phylogenetic diversity index was also highly correlated with species richness (r = 0.87; P < 0.05) and with the number of evolutionarily independent lineages (r = 0.89; P < 0.05). Thus, the conservation units with more species also hold more evolutionary information to be preserved.
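Faith's phylogenetic diversity (PD) of a set of taxa is the total length of the tree branches that connect them. The minimal sketch below uses the rooted variant, which counts every branch from each taxon up to the root (an unrooted variant would omit branches above the taxa's most recent common ancestor). The tree, taxa and branch lengths are hypothetical.

```python
def faith_pd(parent: dict, branch_length: dict, taxa) -> float:
    """Rooted Faith's PD: total length of the branches linking `taxa` to the root.

    parent: child -> parent node (the root has no entry)
    branch_length: child -> length of the branch joining it to its parent
    """
    branches = set()
    for node in taxa:
        while node in parent:   # walk up to the root
            branches.add(node)  # each branch is identified by its child node
            node = parent[node]
    return sum(branch_length[b] for b in branches)


# Hypothetical tree: root -> X (1.0), root -> Y (2.0); X -> a, b (0.5 each); Y -> c (1.0)
parent = {"X": "root", "Y": "root", "a": "X", "b": "X", "c": "Y"}
length = {"X": 1.0, "Y": 2.0, "a": 0.5, "b": 0.5, "c": 1.0}
print(faith_pd(parent, length, {"a", "b"}))  # 0.5 + 0.5 + 1.0
print(faith_pd(parent, length, {"a", "c"}))  # 0.5 + 1.0 + 1.0 + 2.0
```

Sharing a branch (as a and b share X) adds it only once, which is why closely related species contribute less PD than distantly related ones.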
Abstract in English: The aim of this investigation was to study the influence of maternal effects on the genetic evaluation of sires in Tabapuã beef cattle. Single- and multiple-trait analyses were performed on weights adjusted to 120, 240 and 420 days of age. Antagonism was observed between direct additive and maternal genetic effects, and the maternal effect was larger up to weaning. Including maternal effects in the models removed part of the additive variance only in single-trait analyses and resulted in smaller means and standard deviations of sire breeding values. Including maternal effects in single- or multiple-trait models may change sire rankings. The contradictory results of the single- and multiple-trait analyses for direct additive and maternal effects indicate that caution is needed when making recommendations about the importance of maternal effects in Tabapuã beef cattle.
Abstract in English: Behavioral, morphological, allozyme, DNA hybridization and sequencing data all suggest independent evolution of the Old and New World parrots and support tribe rank for the American species, although phylogenetic relationships within this tribe remain poorly understood. A previous study showed that the Yellow-faced parrot (Amazona xanthops Spix, 1824) differs markedly in karyotype from the other Amazona species and suggested that it be renamed Salvatoria xanthops, although the relationships between S. xanthops and other New World parrots remain unclear. In the present work, we describe the karyotype of the Scaly-headed parrot (Pionus maximiliani Kuhl, 1820) and the karyotype and C-banding pattern of the Short-tailed parrot (Graydidascalus brachyurus Kuhl, 1820), and compare them with the karyotype and C-banding pattern of S. xanthops and with the karyotypes of other New World parrots. The chromosomal similarity among these three species, and their karyotypic differences from other New World parrots, suggest that G. brachyurus and S. xanthops are sister species and are most closely related to members of the genus Pionus.
Abstract in English: Morphological and chromosomal markers were used to infer the structure and genetic variability of a population of fish of the genus Astyanax geographically isolated in sinkhole 2 of Vila Velha State Park, Paraná, Brazil. Two morphotypes were observed: the standard phenotype I and phenotype II, which showed an anatomical alteration probably resulting from inbreeding. Fluctuating asymmetry (FA) analysis of different characters showed low levels of morphological variation in the population from sinkhole 2 and in another population from the Tibagi River (Paraná, Brazil). The Astyanax karyotype was characterized in terms of chromosome morphology, constitutive heterochromatin and nucleolar organizer regions. Males and females had similar karyotypes (2n = 48, 6M+18SM+14ST+10A) with no evidence of a sex chromosome system. One female from sinkhole 2 was a natural triploid with 2n = 3x = 72 chromosomes (9M+27SM+21ST+15A). The data are discussed with regard to the maintenance of population structure and their evolutionary importance; they suggest that the Astyanax population from sinkhole 2 of Vila Velha State Park was recently isolated.
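Fluctuating asymmetry is measured from paired right and left values of a bilateral character. The sketch below computes the index commonly labeled FA1, the mean absolute right-minus-left difference, together with the mean signed difference, which FA indices assume to be close to zero (otherwise the asymmetry is directional). The characters actually scored in the study are not named in the abstract, so the fin-ray counts are hypothetical.

```python
def fa1(right, left):
    """FA1 index: mean absolute right-minus-left difference of a bilateral trait."""
    if len(right) != len(left) or not right:
        raise ValueError("need paired, non-empty right/left measurements")
    return sum(abs(r - l) for r, l in zip(right, left)) / len(right)


def directional_asymmetry(right, left):
    """Mean signed R - L difference; FA indices assume this is close to zero."""
    return sum(r - l for r, l in zip(right, left)) / len(right)


# Hypothetical pectoral-fin ray counts for five fish.
right = [12, 13, 12, 12, 14]
left = [12, 12, 13, 12, 13]
print(fa1(right, left), directional_asymmetry(right, left))
```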
Abstract in English: The composition of heterochromatin classes along the chromosomes of specimens from two populations of the fish Astyanax scabripinnis was examined using fluorescence banding with GC- and AT-specific fluorochromes and fluorescence in situ hybridization (FISH) with an AT-rich satellite DNA probe (As51). In neither population did the pericentromeric heterochromatin blocks respond to the GC/AT-specific fluorochromes or to FISH. In contrast, the distal (telomeric) heterochromatin blocks of both populations hybridized with the FISH probe but responded differently to GC-specific fluorochrome treatment, leading us to propose different structural arrangements of the FISH-positive heterochromatin. These differences in chromosome banding patterns, together with other karyotypic differences, suggest that the two populations are differentiated at the taxonomic level.
Abstract in English: Damselfishes (Pomacentridae, Perciformes) occur in all major oceans and, with approximately 320 species, form one of the most diverse families of marine Teleostei. Their taxonomy is problematic because of the large number of species complexes and the range of color patterns they display, which vary among individuals and populations of the same species. In this study, we examined the cytogenetic characteristics of four Stegastes species (S. pictus, S. fuscus, S. variabilis and S. leucostictus) found along the coast of Brazil. All four had 2n = 48: S. pictus 14m+28sm+2st+4a (fundamental number, FN = 92), S. fuscus 20m+22sm+6a (FN = 90), S. variabilis 18m+22sm+8a (FN = 88) and S. leucostictus 18m+22sm+8a (FN = 88). The nucleolar organizer regions were single and homologous in all species, located interstitially on the short arm of the first submetacentric pair. Heterochromatin segments were small and conservatively distributed over the centromeric and pericentromeric regions of most chromosomes. The marked divergence in the number of chromosome arms, compared with other Perciformes (2n = 48, FN = 48), indicates that varying degrees of multiple pericentric inversions occurred during the karyotypic evolution of the Pomacentridae. The subtle karyotypic differences between S. variabilis and S. leucostictus suggest either recent divergence or karyotypes that are less susceptible to change. These results indicate that cytogenetic analyses could provide important complementary data for characterizing populations and species of Stegastes and of damselfishes in general.
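The fundamental numbers above follow from the karyotype formulas once each chromosome type is assigned one or two arms. Conventions differ between authors; the sketch below counts metacentric, submetacentric and subtelocentric chromosomes as biarmed and acrocentrics as uniarmed, because that assignment reproduces the reported FN = 92 for S. pictus. The helper name is hypothetical.

```python
import re

BIARMED = {"m", "sm", "st"}  # counted as two arms, as the FN values above imply


def diploid_and_fn(formula: str) -> tuple[int, int]:
    """Return (2n, FN) for a karyotype formula such as '14m+28sm+2st+4a'."""
    two_n = fn = 0
    for count, kind in re.findall(r"(\d+)\s*(sm|st|m|a)", formula, flags=re.IGNORECASE):
        n = int(count)
        two_n += n
        fn += 2 * n if kind.lower() in BIARMED else n  # acrocentrics contribute one arm
    return two_n, fn


for species, formula in [
    ("S. pictus", "14m+28sm+2st+4a"),
    ("S. fuscus", "20m+22sm+6a"),
    ("S. variabilis", "18m+22sm+8a"),
]:
    print(species, diploid_and_fn(formula))
```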
Abstract in English: In the present study, five callichthyid species of the subfamily Corydoradinae were karyotyped: three species of Aspidoras and two of Corydoras. The three Aspidoras species shared the same diploid number, 2n = 46, and had similar karyotypic formulae, with most chromosomes metacentric or submetacentric, single interstitial Ag-NORs and C-band-positive segments mainly in centromeric positions. Comparative analysis of the cytogenetic data available for Aspidoras and other Corydoradinae suggests that several centric fusions occurred in the origin of the Aspidoras species. The two Corydoras species had high diploid numbers, 2n = 74 in C. sodalis and 2n = 90 in C. britskii. While C. sodalis had single Ag-NORs and terminal, interstitial and centromeric C-band-positive segments in almost all chromosomes, C. britskii had multiple Ag-NORs and few C-band-positive segments, located terminally in one acrocentric (A) pair and interstitially in one subtelocentric (ST) pair. High diploid numbers and many ST and A chromosomes are uncommon among the Corydoradinae, suggesting that numerous chromosome rearrangements, mainly centric fissions, occurred in the origin of the Corydoras species studied.
Abstract in English: Three tiger beetle species of the tribe Cicindelini were examined cytogenetically and found to have the following karyotypes: Cicindela argentata, 2n = 18 + X1X2Y/X1X1X2X2; Cicindela aurulenta, 2n = 18 + X1X2X3Y/X1X1X2X2X3X3; and Cicindela suturalis, 2n = 18 + X1X2X3X4Y/X1X1X2X2X3X3X4X4. Fluorescence in situ hybridization (FISH) with a PCR-amplified 18S rDNA fragment as probe showed ribosomal clusters on two autosomes in C. argentata, on two autosomes and two heterosomes in C. aurulenta, and on two heterosomes in C. suturalis (male configuration), revealing two new patterns of rDNA localization. These results illustrate the cytogenetic variability of the species-rich genus Cicindela (sensu lato), mainly in the localization of rDNA genes and in the number and morphology of the heterosomes, despite the stability of autosome numbers. Changes in the localization and number of rDNA clusters were independent of changes in the number of sex chromosomes, indicating that several processes may have contributed to the great karyotypic diversity of this speciose coleopteran group.
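Each karyotype is written as the autosome number plus the sex chromosomes, male/female. The short sketch below turns such a formula into a total chromosome count, so that, for example, C. argentata males (18 + X1X2Y) have 21 chromosomes and females (18 + X1X1X2X2) have 22. The function name is hypothetical.

```python
import re


def chromosome_count(formula: str) -> int:
    """Count chromosomes in 'autosomes + sex chromosomes', e.g. '18 + X1X2Y' -> 21."""
    autosomes, sex_chromosomes = formula.split("+")
    return int(autosomes) + len(re.findall(r"[XY]\d*", sex_chromosomes))


for male, female in [
    ("18 + X1X2Y", "18 + X1X1X2X2"),                    # C. argentata
    ("18 + X1X2X3Y", "18 + X1X1X2X2X3X3"),              # C. aurulenta
    ("18 + X1X2X3X4Y", "18 + X1X1X2X2X3X3X4X4"),        # C. suturalis
]:
    print(chromosome_count(male), chromosome_count(female))
```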
Abstract in English: Analysis of polytene chromosomes in 26 strains of seven species of the Drosophila fasciola subgroup from several locations in Brazil, together with strains of two species of the Drosophila mulleri subgroup (D. aldrichi and D. mulleri), showed that the 3c inversion found in the latter species differs in one of its breakpoints from that present in the species of the fasciola subgroup. We therefore propose renaming the inversion of the mulleri complex from 3c to 3u. Accordingly, the fasciola subgroup is no longer a subordinate branch within the mulleri subgroup; rather, it is directly related to the likely ancestor of the repleta group, called Primitive I. This finding removes the main obstacle to considering the Drosophila fasciola subgroup an ancestral group within the Drosophila repleta species group, as hypothesized by Throckmorton. Our data also support the conclusion that D. onca and D. carolinae are closely related, as both share a new inversion on chromosome 4 (4f²). D. fascioloides and D. ellisoni also form a pair of sister species, based on the presence of 2-4 and 3-5 chromosome fusions. D. rosinae is related only to the likely ancestor of the fasciola subgroup, in which the 3c inversion was fixed.
Abstract in English: In this study, the breeding potential of 41 open-pollinated progenies of Eucalyptus grandis was evaluated on the basis of wood traits. The progenies were distributed in two experiments in a randomized complete block design with three replicates and linear plots of six plants each. Traits were assessed at eight years of age on two trees per plot, selected for superior growth, stem form and plant health. Significant differences among progenies were detected for basic density, sapwood/heartwood ratio, bowing, specific gravity, parallel compression and static bending. These traits are potentially promising for breeding programs, with progeny-mean heritabilities ranging from 0.34 to 0.61. No genetic variation among progenies was detected for moisture content, board end-splitting, log volume under bark, log eccentricity, bark content, crooking or shear strength. Moderate to high, significant genetic correlations were detected among the physical and mechanical properties, and between pairs of traits such as basic density and log end-splitting, basic density and bowing, specific gravity and bowing, sapwood/heartwood ratio and bowing, log volume and bowing, and log volume and log end-splitting. These results show that growth stress levels in trees can be reduced by indirect selection on traits such as the sapwood/heartwood ratio and bowing.
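Heritability on a progeny-mean basis is usually obtained from the analysis of variance of plot means. Assuming the expected mean squares E[MS_progeny] = σ²e + rσ²p and E[MS_error] = σ²e, the progeny variance is σ²p = (MS_progeny − MS_error)/r and h² = σ²p/(σ²p + σ²e/r), which simplifies to 1 − MS_error/MS_progeny. The sketch below uses hypothetical mean squares, since the study's ANOVA tables are not given here.

```python
def progeny_mean_heritability(ms_progeny: float, ms_error: float, replicates: int):
    """Progeny variance and heritability on a progeny-mean basis from RCBD mean squares.

    Assumes E[MS_progeny] = s2_e + r*s2_p and E[MS_error] = s2_e (plot-mean analysis).
    """
    s2_p = (ms_progeny - ms_error) / replicates  # among-progeny variance
    s2_mean = s2_p + ms_error / replicates       # phenotypic variance of progeny means
    return s2_p, s2_p / s2_mean                  # equals 1 - MS_error / MS_progeny


# Hypothetical mean squares with r = 3 replicates, as in the design described above.
s2_p, h2 = progeny_mean_heritability(ms_progeny=10.0, ms_error=4.0, replicates=3)
print(round(s2_p, 3), round(h2, 3))
```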
Abstract in English: A segregating bulk population derived from a single cross between the cultivar Carioca MG and the line ESAL 686 was used to investigate whether natural selection acting in the direction desired by breeders, combined with delayed line extraction, would increase the chance of obtaining families with higher grain yield. The population was advanced from F2 to F24, and families were derived from individual plants in generations F2, F8 and F24. These families and their parents were evaluated for grain yield (kg/ha) in Lavras, MG, in three sowing seasons (July 2001, November 2001 and March 2002), in an 18 x 18 lattice design with two replications in the first sowing and three in the other two. Regardless of sowing season, the highest mean yield was obtained by the families derived from F24 plants. The frequency of superior families increased when line extraction was delayed to more advanced generations.
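Delaying line extraction also changes how homozygous the extracted plants are. Assuming advancement by selfing with no selection or outcrossing, heterozygosity halves each generation, so a plant in generation Fn is expected to be heterozygous at a fraction (1/2)^(n−1) of the segregating loci. The sketch below evaluates this for the three generations sampled above.

```python
from fractions import Fraction


def heterozygosity(generation: int) -> Fraction:
    """Expected heterozygosity in generation F_n of a single cross advanced by selfing.

    F1 is fully heterozygous and each selfing generation halves heterozygosity,
    so H(F_n) = (1/2)^(n-1). Selection and outcrossing are ignored.
    """
    if generation < 1:
        raise ValueError("generation must be >= 1")
    return Fraction(1, 2 ** (generation - 1))


for gen in (2, 8, 24):
    h = heterozygosity(gen)
    print(f"F{gen}: heterozygosity {float(h):.2e}, homozygosity {float(1 - h):.6f}")
```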
Abstract in English: To compare the relative efficiency of different markers and identify the most suitable one for maize diversity studies, we evaluated 18 tropical maize inbred lines with four marker systems: 774 amplified fragment length polymorphism (AFLP) loci, 262 random amplified polymorphic DNA (RAPD) loci, 185 restriction fragment length polymorphism (RFLP) loci and 68 simple sequence repeat (SSR) loci. For genetic distance estimates, the AFLP and RFLP markers gave the most highly correlated results (r = 0.87). Bootstrap analysis was used to evaluate the number of loci for each marker, and the coefficients of variation (CV) revealed a skewed distribution. The dominant markers (AFLP and RAPD) had small CV values, indicating a skewed distribution, whereas the codominant markers gave high CV values. Using the maximum genetic distance CV within each sample size was an efficient way to determine the number of loci needed to obtain a maximum CV of 10%. The numbers of RFLP and AFLP loci used were sufficient to give CV values below 5%, whereas the SSR and RAPD loci gave higher CV values. For all markers except RAPD, genetic distance was correlated with single-cross performance and heterosis, showing that these markers could be useful for predicting single-cross performance and heterosis in intrapopulation crosses of broad-based populations. Our results indicate that AFLP appears to be the best-suited molecular assay for fingerprinting and for assessing genetic relationships among tropical maize inbred lines with high accuracy.
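The bootstrap procedure for choosing the number of loci can be sketched as follows: resample loci with replacement, recompute the genetic distance each time, and take the coefficient of variation of the resampled distances. The distance measure used by the authors is not stated, so simple matching on 0/1 marker profiles is an assumption here, as are the function names and profiles. This sketch handles a single pair of lines, whereas the study took the maximum CV over all pairs at each sample size.

```python
import random
import statistics


def distance(a, b):
    """Simple-matching distance: share of loci where two 0/1 marker profiles differ."""
    return sum(x != y for x, y in zip(a, b)) / len(a)


def bootstrap_cv(a, b, n_loci, n_boot=1000, seed=1):
    """CV (%) of the a-b distance when n_loci loci are resampled with replacement."""
    rng = random.Random(seed)
    loci = range(len(a))
    values = []
    for _ in range(n_boot):
        sample = rng.choices(loci, k=n_loci)
        values.append(distance([a[i] for i in sample], [b[i] for i in sample]))
    return 100 * statistics.stdev(values) / statistics.fmean(values)


def loci_needed(a, b, sizes, max_cv=10.0):
    """Smallest candidate number of loci whose bootstrap CV is at most max_cv."""
    for n in sorted(sizes):
        if bootstrap_cv(a, b, n) <= max_cv:
            return n
    return None


# Hypothetical profiles of two inbred lines over 200 dominant-marker loci (distance 0.5).
line_1 = [1, 0] * 100
line_2 = [1, 1, 0, 0] * 50
print(loci_needed(line_1, line_2, sizes=[10, 25, 50, 150, 200]))
```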
Abstract in English: tRNA genes are known target sites for the integration of pathogenicity islands (PAIs) and other genetic elements, such as bacteriophages, into bacterial genomes. In most STEC (Shiga toxin-producing Escherichia coli), the PAI called LEE (locus of enterocyte effacement) is related to bacterial virulence and is mostly associated with the tRNA genes selC and pheU. In this work, we first investigated the relationship between LEE and the tRNA genes selC and pheU in 43 STEC strains, and found that 28 strains (65%) had a disrupted selC and/or pheU. Three of these strains (637/1, 650/5 and 654/3) were subjected to a RAPD-PCR technique modified by adding specific primers (corresponding to the 5' end of the selC and pheU genes) to the reaction, which we called "anchored RAPD-PCR". The PCR fragments obtained were transferred onto membranes, and the fragments that hybridized to selC and pheU probes were isolated. One of these fragments, from strain 637/1, was partially sequenced, and an 85-nucleotide sequence was found to be similar to the cfxA2 gene, which encodes a beta-lactamase and is part of transposon Tn4555, a pathogenicity island originally integrated into the Bacteroides genome.
Abstract in English: Although linkage disequilibrium, epistasis and inbreeding are common in the genetic systems that control quantitative traits, developing and analyzing the theory is very complex, especially when these phenomena are considered together. The objective of this study is to extend quantitative genetics theory to define and analyze, for non-inbred cross-pollinating populations, the components of genotypic variance, heritabilities and predicted gains, assuming linkage disequilibrium and no epistasis. The genotypic variance and its components, the additive and dominance genetic variances, are invariant over generations only for completely linked genes and for genes in equilibrium. When the population is structured in half-sib families, the additive variance in the parental generation and the genotypic variance in the population can be estimated. When the population is structured in full-sib families, none of the components of genotypic variance can be estimated. The narrow-sense heritability at the plant level can be estimated from the parent-offspring or mid-parent-offspring regression. When there is dominance, the narrow-sense heritability estimate in the F2 is biased by linkage disequilibrium when estimated by the Warner method, but not when estimated by the regression of F3 families on F2 plants. The bias is proportional to the number of pairs of linked genes that do not assort independently and to the degree of dominance; it tends to be positive when genes in the coupling phase predominate, and negative and larger in magnitude when genes in the repulsion phase predominate. Linkage disequilibrium also biases estimates of narrow-sense heritability at the full-sib family mean level and at the plant level within half-sib and full-sib families. In general, the magnitude of the bias is proportional to the number of pairs of genes in disequilibrium and to the frequency of recombinant gametes.
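Why variance components stay constant only for completely linked genes and for genes in equilibrium can be seen from the standard decay of linkage disequilibrium under random mating: D_t = D_0(1 − r)^t, where r is the recombination frequency. D is unchanged when r = 0 or D_0 = 0 and otherwise shrinks every generation. The sketch assumes a large random-mating population without selection, mutation or drift.

```python
def ld_after(d0: float, r: float, generations: int) -> float:
    """Linkage disequilibrium after t generations of random mating: D_t = D_0 * (1 - r)**t.

    Assumes a large random-mating population without selection, mutation or drift;
    r is the recombination frequency between the two loci.
    """
    if not 0.0 <= r <= 0.5:
        raise ValueError("recombination frequency must lie in [0, 0.5]")
    return d0 * (1.0 - r) ** generations


print(ld_after(0.25, 0.0, 10))  # complete linkage: D does not decay
print(ld_after(0.25, 0.5, 3))   # independent loci: D halves each generation
print(ld_after(0.0, 0.2, 5))    # loci already in equilibrium stay there
```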
Abstract in English: We analyzed the performance of a real-coded steady-state genetic algorithm (SSGA), combined with a grid-based methodology, in docking five HIV-1 protease-ligand complexes with known three-dimensional structures. All ligands tested are highly flexible, with more than 10 conformational degrees of freedom. The SSGA was tested in both rigid and flexible ligand docking. The implemented genetic algorithm successfully docked both rigid and flexible ligands, although its performance decreased as the number of ligand conformational degrees of freedom increased. The lowest-energy docked structures had root mean square deviations (RMSD) from the corresponding experimental crystallographic structures ranging from 0.037 Å to 0.090 Å in rigid docking and from 0.420 Å to 1.943 Å in flexible docking. We found that algorithm performance depends not only on the number of ligand conformational degrees of freedom but also on which dihedral angles are involved, the more internal dihedral angles being critical. Furthermore, our results showed that the initial population distribution can affect algorithm performance.
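The RMSD values quoted above compare each docked pose with the crystallographic ligand atom by atom: RMSD = sqrt((1/N) Σ d_i²), where d_i is the distance between matched atoms. The sketch assumes the common docking convention of comparing poses in the receptor frame without refitting; the coordinates are hypothetical.

```python
import math


def rmsd(pose, reference):
    """RMSD between two matched lists of (x, y, z) atom coordinates, in the same units.

    No superposition is done: the docked pose is compared in the receptor frame.
    """
    if len(pose) != len(reference) or not pose:
        raise ValueError("need two non-empty, atom-matched coordinate lists")
    squared = sum(math.dist(p, q) ** 2 for p, q in zip(pose, reference))
    return math.sqrt(squared / len(pose))


# Hypothetical three-atom ligand shifted by 0.5 Å along x.
crystal = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (1.5, 1.5, 0.0)]
docked = [(x + 0.5, y, z) for x, y, z in crystal]
print(rmsd(docked, crystal))
```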
Abstract in English: An approach to the hydrophobic-polar (HP) protein folding model was developed using a genetic algorithm (GA) to find optimal structures on a 3D cubic lattice. The scoring system of the original model was modified to improve its capacity to generate more natural-like structures. The modification was based on the assumption that it may be preferable for a hydrophobic monomer to have a polar neighbor than to be in direct contact with the polar solvent. Compactness and segregation criteria were used to compare the structures produced by the original and modified HP models. An island model, a new selection scheme and multipoint crossover were used to improve the performance of the algorithm. Ten sequences were analyzed, seven of length 27 and three of length 64. Our results suggest that the modified model has a greater tendency to form globular structures. This may be preferable, since the original HP model does not take into account the positioning of long polar segments. The algorithm was implemented as a program with a graphical user interface that may have didactic potential for teaching GAs and for understanding hydrophobic core formation.
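The fitness that such a GA minimizes in the original HP model is the number of non-bonded H-H contacts, each scoring −1, for a self-avoiding walk on the cubic lattice. The abstract does not give the exact scores of the modified model (which also rewards H-P contacts over H-solvent exposure), so only the original energy is sketched here; the function name and example chain are hypothetical.

```python
NEIGHBORS = ((1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1))


def hp_energy(sequence: str, coords) -> int:
    """Original HP energy on the cubic lattice: -1 per non-bonded H-H contact."""
    if len(sequence) != len(coords) or len(set(coords)) != len(coords):
        raise ValueError("coords must be a self-avoiding walk matching the sequence")
    for a, b in zip(coords, coords[1:]):
        if sum(abs(u - v) for u, v in zip(a, b)) != 1:
            raise ValueError("consecutive residues must be lattice neighbors")
    index = {c: i for i, c in enumerate(coords)}
    energy = 0
    for i, (x, y, z) in enumerate(coords):
        if sequence[i] != "H":
            continue
        for dx, dy, dz in NEIGHBORS:
            j = index.get((x + dx, y + dy, z + dz))
            # j > i + 1 skips chain neighbors and counts each contact once
            if j is not None and j > i + 1 and sequence[j] == "H":
                energy -= 1
    return energy


# A 4-residue chain folded into a unit square brings the two H ends into contact.
square = [(0, 0, 0), (1, 0, 0), (1, 1, 0), (0, 1, 0)]
print(hp_energy("HPPH", square))
```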
Abstract in English: The main goal of this study is to find the most effective parameter sets for the Simplified Generalized Simulated Annealing algorithm (SGSA) when applied to different cost functions, and to look for a possible correlation between these parameter values and topological characteristics of the hypersurface of each cost function. SGSA is an extended and simplified derivative of the GSA algorithm, a Markovian stochastic process based on Tsallis statistics that has been used in many classes of problems, in particular the optimization of biological molecular systems. For all but one of the cost functions studied, the global minimum was found in 100% of the 50 runs. For these functions, the best visiting parameter, qV, lies in the interval [1.2, 1.7]. Also, the temperature decay parameter, qT, should be increased when better precision is required. Moreover, the similarity in the locus of optimal parameter sets observed for some functions indicates that it may be possible to extract topological information about the cost functions from these sets.
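In GSA, temperature falls according to T(t) = T0 (2^(q−1) − 1)/((1 + t)^(q−1) − 1), which gives T(1) = T0, reduces to fast (Cauchy-like) annealing T0/t at q = 2, and falls faster as q grows. In the original GSA the exponent is the visiting parameter; the sketch assumes that in SGSA it is the separate temperature-decay parameter qT mentioned above, which would explain why raising qT sharpens the final search.

```python
def gsa_temperature(t: int, t0: float, q: float) -> float:
    """GSA cooling schedule T(t) = T0 * (2**(q-1) - 1) / ((1 + t)**(q-1) - 1), t >= 1, q != 1.

    In the original GSA the exponent is the visiting parameter; here it is assumed
    to play the role of the temperature-decay parameter qT of SGSA.
    """
    if t < 1 or q == 1:
        raise ValueError("requires t >= 1 and q != 1")
    return t0 * (2 ** (q - 1) - 1) / ((1 + t) ** (q - 1) - 1)


for q in (1.5, 2.0, 2.5):
    print(q, [round(gsa_temperature(t, 1.0, q), 4) for t in (1, 10, 100)])
```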
Abstract in English: This work presents a data-driven comparative study of clustering methods used in the analysis of gene expression time courses (time series). Five clustering methods from the gene expression literature are compared: agglomerative hierarchical clustering, CLICK, dynamical clustering, k-means and self-organizing maps. To evaluate the methods, a k-fold cross-validation procedure adapted to unsupervised methods is applied. The accuracy of the results is assessed by comparing the partitions obtained in these experiments with gene annotations, such as protein function and series classification.
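Comparing a clustering with an annotation-based partition needs an agreement index that ignores how the clusters are labeled. The abstract does not name the index used; the adjusted Rand index of Hubert and Arabie, which is 1 for identical partitions and close to 0 for chance-level agreement, is assumed here as one standard choice.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_a, labels_b):
    """Adjusted Rand index between two partitions of the same items."""
    n = len(labels_a)
    if n != len(labels_b) or n < 2:
        raise ValueError("partitions must cover the same items (at least two)")
    pairs = Counter(zip(labels_a, labels_b))               # contingency table cells
    sum_ij = sum(comb(c, 2) for c in pairs.values())
    sum_a = sum(comb(c, 2) for c in Counter(labels_a).values())
    sum_b = sum(comb(c, 2) for c in Counter(labels_b).values())
    expected = sum_a * sum_b / comb(n, 2)                   # agreement expected by chance
    maximum = (sum_a + sum_b) / 2
    if maximum == expected:
        return 1.0
    return (sum_ij - expected) / (maximum - expected)


# Cluster labels versus hypothetical functional annotations for four genes.
print(adjusted_rand_index([0, 0, 1, 1], ["kinase", "kinase", "ribosome", "ribosome"]))
```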
Abstract in English: To evaluate the effects of non-reversibility on compositional base changes and on the distribution of branch lengths along a phylogeny, we used computer simulations to extend our previous sequential PCR in vitro evolution experiment. In that study, an 18S rRNA gene evolved neutrally for 280 generations, and a homogeneous, non-stationary model of base substitution based on non-reversible dynamics was built from the in vitro evolution data to describe the observed pattern of nucleotide substitutions. Here, the process was extended to 840 generations without selection, using the model parameters calculated from the in vitro evolution experiment. Under the non-reversible model, the G+C content of the sequences increased significantly compared with simulations under a reversible model. The mean and variance of branch lengths were reduced under non-reversible dynamics, although branch lengths still followed a Poisson distribution. We conclude that the major implication of non-reversibility is an overall decrease in branch lengths, although no transition from a stochastic to an ordered process is observed. According to our model, this neutral process will result in an increase in the G+C content of the descendant sequences, together with an overall decrease in the frequency of substitutions.
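How a non-reversible substitution process shifts base composition can be illustrated by propagating base frequencies through a per-generation transition matrix, π(t+1) = π(t)P, and checking detailed balance (π_i P_ij = π_j P_ji), which a reversible model satisfies and a non-reversible one does not. The matrix below is a toy, hypothetical one that favors A/T to G/C changes; it is not the model estimated from the in vitro data.

```python
BASES = "ACGT"
MU = 1e-3  # hypothetical per-generation substitution probability per site

# Hypothetical relative substitution rates (from -> to); A/T -> G/C changes are favored.
RATES = {
    "A": {"G": 0.6, "C": 0.2, "T": 0.2},
    "T": {"C": 0.6, "G": 0.2, "A": 0.2},
    "G": {"A": 0.15, "T": 0.15, "C": 0.1},
    "C": {"T": 0.15, "A": 0.15, "G": 0.1},
}

# Per-generation transition matrix P[src][dst]; the diagonal keeps each row summing to 1.
P = {}
for src in BASES:
    row = {dst: MU * RATES[src].get(dst, 0.0) for dst in BASES}
    row[src] = 1.0 - sum(row.values())
    P[src] = row


def step(freqs):
    """One generation of base-composition change: pi'[j] = sum_i pi[i] * P[i][j]."""
    return {dst: sum(freqs[src] * P[src][dst] for src in BASES) for dst in BASES}


def gc_trajectory(generations, start=None):
    """G+C content at every generation, starting from equal base frequencies by default."""
    freqs = dict(start) if start else {b: 0.25 for b in BASES}
    gc = [freqs["G"] + freqs["C"]]
    for _ in range(generations):
        freqs = step(freqs)
        gc.append(freqs["G"] + freqs["C"])
    return gc


# Stationary frequencies of this toy matrix (derived by hand): A = T = 3/22, G = C = 8/22.
PI = {"A": 3 / 22, "T": 3 / 22, "G": 8 / 22, "C": 8 / 22}


def is_reversible(pi, tol=1e-12):
    """Detailed balance: pi[i] * P[i][j] == pi[j] * P[j][i] for every pair of bases."""
    return all(abs(pi[i] * P[i][j] - pi[j] * P[j][i]) < tol for i in BASES for j in BASES)


gc = gc_trajectory(840)
print(f"G+C: {gc[0]:.3f} -> {gc[-1]:.3f}; reversible: {is_reversible(PI)}")
```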
Abstract in English: The aim of data mining is to find useful knowledge in databases. Several methods can be used to extract such knowledge, among them machine learning (ML) algorithms. In this work we focus on ML algorithms that express the extracted knowledge in symbolic form, such as rules. This representation may allow us to "explain" the data. Rule learning algorithms are mainly designed to induce classification rules that predict new cases with high accuracy. However, such rules generally express common-sense knowledge, so many interesting and useful rules go undiscovered. Furthermore, domain-independent biases, especially those related to the language used to express the induced knowledge, could lead to rules that are difficult to understand. Exceptions can be used to overcome these drawbacks. Exceptions are defined as rules that contradict common beliefs. Rules of this kind can play an important role both in understanding the underlying data and in making critical decisions. Because they contradict the user's common beliefs, exceptions are bound to be interesting. This work proposes a method for finding exceptions. To illustrate the potential of our approach, we apply the method to a real-world data set to discover rules and exceptions in the HIV protein cleavage process. A good understanding of the process that generates these data plays an important role in research on cleavage inhibitors. We believe that the proposed approach may help domain experts to further understand this process.
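The relation between a common-sense rule and its exception can be made concrete with rule confidence: a strong rule "A → C" may reverse under a more specific condition "A and B → not C". The sketch below is a generic illustration of that pattern with hypothetical boolean records; it is not the authors' exception-finding method.

```python
def confidence(records, antecedent, consequent):
    """Share of records satisfying `antecedent` that also satisfy `consequent`."""
    covered = [r for r in records if antecedent(r)]
    if not covered:
        return 0.0
    return sum(1 for r in covered if consequent(r)) / len(covered)


# Hypothetical records with boolean attributes A, B and class C.
records = (
    [{"A": True, "B": False, "C": True}] * 6
    + [{"A": True, "B": True, "C": False}] * 2
    + [{"A": False, "B": False, "C": False}] * 4
)

# Common-sense rule: A -> C
rule = confidence(records, lambda r: r["A"], lambda r: r["C"])
# Exception to that rule: A and B -> not C
exception = confidence(records, lambda r: r["A"] and r["B"], lambda r: not r["C"])
print(rule, exception)
```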
Abstract in English: Pattern recognition is an important process for locating genes in genomes. Ribosome binding sites are signals that can help identify a gene, but they are difficult to find in the genome with conventional methods because they are highly degenerate. In this work, artificial neural networks are used to address this problem.
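A neural network sees a candidate ribosome binding site as numbers, typically a one-hot encoding with four inputs per nucleotide. The sketch below shows that encoding and a single logistic unit scoring a window; the network architecture used in this work is not described, and the hand-set weights (rewarding matches to the bacterial Shine-Dalgarno consensus AGGAGG) stand in for weights that training on labeled examples would learn.

```python
import math

BASES = "ACGT"


def one_hot(window: str) -> list[float]:
    """Encode a DNA window as four 0/1 inputs per position (A, C, G, T); other symbols -> zeros."""
    vector = []
    for base in window.upper():
        vector.extend(1.0 if base == b else 0.0 for b in BASES)
    return vector


def logistic_unit(inputs, weights, bias):
    """Single neuron: sigmoid(w . x + b), a score between 0 and 1."""
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1.0 / (1.0 + math.exp(-z))


# Hand-set weights that reward matches to the Shine-Dalgarno consensus AGGAGG;
# a trained network would learn its weights from labeled examples instead.
weights = one_hot("AGGAGG")
for window in ("AGGAGG", "AGGTGG", "TTTCTT"):
    print(window, round(logistic_unit(one_hot(window), weights, bias=-3.0), 3))
```

Because degenerate sites match the consensus only partly, the unit gives them intermediate scores rather than a yes/no answer.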
Abstract in English:Gene expression profiles contain the expression levels of thousands of genes. Depending on the issue under investigation, this large amount of data can make analysis impractical. Thus, it is important to select subsets of relevant genes to work with. This paper investigates different metrics for gene selection. The metrics are evaluated on their ability to select genes whose expression profiles provide information to distinguish between tumor and normal tissues. This evaluation is made by constructing classifiers using the genes selected by each metric and then comparing the performance of these classifiers. The performance of the classifiers is measured by their error rate in classifying new tissues. As the dataset has few tissue samples, the leave-one-out methodology was employed to obtain more reliable estimates. The classifiers are generated with two machine learning algorithms: Support Vector Machines (SVMs) and C4.5. The experiments are conducted using SAGE data obtained from the NCBI web site. Few analyses involving SAGE data have been reported in the literature. For the data and algorithms employed, the logistic metric was found to be the best.
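The abstract does not give the evaluation code. Below is a minimal sketch of the leave-one-out error estimate. It assumes a nearest-centroid classifier as a stand-in for the SVM and C4.5 classifiers used in the paper, plus toy expression values for two already-selected genes. Each tissue is held out once, and the classifier is trained on the remaining n-1 tissues. The error rate is the fraction of held-out tissues that are misclassified.

```python
from typing import Dict, List

Matrix = List[List[float]]


def fit_centroids(X: Matrix, y: List[str]) -> Dict[str, List[float]]:
    """Stand-in classifier: store the mean expression profile of each class."""
    centroids = {}
    for label in set(y):
        rows = [x for x, lab in zip(X, y) if lab == label]
        centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]
    return centroids


def predict(centroids: Dict[str, List[float]], x: List[float]) -> str:
    """Assign x to the class whose centroid is nearest in squared Euclidean distance."""
    return min(centroids, key=lambda lab: sum((a - b) ** 2 for a, b in zip(x, centroids[lab])))


def loocv_error_rate(X: Matrix, y: List[str]) -> float:
    """Leave-one-out: train on n-1 tissues, test on the held-out one, repeat n times."""
    errors = 0
    for i in range(len(X)):
        model = fit_centroids(X[:i] + X[i + 1:], y[:i] + y[i + 1:])
        if predict(model, X[i]) != y[i]:
            errors += 1
    return errors / len(X)


# Toy expression values for two hypothetically selected genes (rows are tissues).
X = [[5.0, 1.0], [6.0, 0.5], [5.5, 1.2], [1.0, 4.0], [0.8, 5.0], [1.2, 4.5]]
y = ["tumor"] * 3 + ["normal"] * 3
print(loocv_error_rate(X, y))  # 0.0
```

If the gene-selection metric itself uses the class labels, selection should be repeated inside each fold. Otherwise the leave-one-out estimate can be optimistically biased.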
Abstract in English:The study of Euclidean Steiner Trees is one of the alternative methods for unveiling Nature's plans for the internal architecture of biomacromolecules. Recently, the minimum surface structures of A-DNA and of the Tobacco Mosaic Virus were shown to be described by a "strake" surface. These results have been substantiated by an explicit calculation of the Steiner Ratio Function in a very restrictive modelling scheme. In the present work, we also introduce the measure of chirality as an essential part of a thermodynamic approach to modelling biomolecular structure. In a certain sense, the Steiner Ratio Function is constrained by the chirality measure to assume a value dictated by Nature. This value is a measure of the free energy of the molecular configuration.
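The abstract uses the Steiner Ratio Function without defining it. For a finite point set, the Steiner ratio is usually defined as ρ = L_SMT / L_MST. This is the length of the Euclidean Steiner minimal tree divided by the length of the minimum spanning tree. In the plane, ρ ≥ √3/2 for three points, with equality for an equilateral triangle. The Python sketch below computes this standard ratio for three points only. In that case, when every angle is below 120°, the Steiner minimal tree joins each point to the Fermat point. The sketch assumes the textbook definition matches the paper's usage, which the abstract does not confirm. It does not model the helical A-DNA or Tobacco Mosaic Virus structures.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def mst_length(points: List[Point]) -> float:
    """Total edge length of the Euclidean minimum spanning tree (Prim's algorithm)."""
    in_tree = {0}
    total = 0.0
    while len(in_tree) < len(points):
        length, nearest = min(
            (math.dist(points[i], points[j]), j)
            for i in in_tree
            for j in range(len(points))
            if j not in in_tree
        )
        in_tree.add(nearest)
        total += length
    return total


def fermat_point(a: Point, b: Point, c: Point, iterations: int = 200) -> Point:
    """Weiszfeld iteration for the point minimising the summed distance to a, b and c.

    Assumes every angle of the triangle is below 120 degrees, so the optimum is an
    interior Steiner point; the update would divide by zero if it hit a vertex exactly.
    """
    vertices = (a, b, c)
    x = (sum(p[0] for p in vertices) / 3, sum(p[1] for p in vertices) / 3)  # start at centroid
    for _ in range(iterations):
        weights = [1.0 / math.dist(x, p) for p in vertices]
        total = sum(weights)
        x = (
            sum(w * p[0] for w, p in zip(weights, vertices)) / total,
            sum(w * p[1] for w, p in zip(weights, vertices)) / total,
        )
    return x


def steiner_ratio_triangle(a: Point, b: Point, c: Point) -> float:
    """rho = L_SMT / L_MST for three points; the SMT is the star through the Fermat point."""
    s = fermat_point(a, b, c)
    smt = sum(math.dist(s, p) for p in (a, b, c))
    return smt / mst_length([a, b, c])


equilateral = ((0.0, 0.0), (1.0, 0.0), (0.5, math.sqrt(3) / 2))
print(round(steiner_ratio_triangle(*equilateral), 6))  # 0.866025
```

For the unit equilateral triangle the MST has length 2 and the Steiner tree has length √3, so ρ = √3/2 ≈ 0.866. This is the smallest value possible for three points in the plane.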
Abstract in English:The Human Genome Project has generated a large amount of sequence data, and a number of works are currently concerned with analyzing these data. One such analysis is the identification of gene structures in the sequences obtained, for which one can search for particular signals associated with gene expression. Splice junctions are one type of signal present in eukaryotic genes. Many studies have applied Machine Learning techniques to the recognition of such regions. However, most genetic databases contain noisy data, which can affect the performance of the learning techniques. This paper evaluates the effectiveness of five data pre-processing algorithms in eliminating noisy instances from two splice junction recognition datasets. After the pre-processing phase, two learning techniques, Decision Trees and Support Vector Machines, are employed in the recognition process.
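The abstract does not name the five pre-processing algorithms. As one example of the general idea, the sketch below implements Wilson's Edited Nearest Neighbor (ENN) rule, without implying it is among the algorithms evaluated. An instance is discarded when its label disagrees with the majority label of its k nearest neighbors. The Hamming distance over equal-length nucleotide windows, the EI/N labels, and the toy sequences are assumptions for illustration.

```python
from collections import Counter
from typing import List, Tuple

Instance = Tuple[str, str]  # (nucleotide window, class label)


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length windows differ."""
    if len(a) != len(b):
        raise ValueError("windows must have equal length")
    return sum(x != y for x, y in zip(a, b))


def edited_nearest_neighbor(data: List[Instance], k: int = 3) -> List[Instance]:
    """Wilson's ENN: keep an instance only if most of its k nearest neighbors share its label."""
    kept = []
    for i, (seq, label) in enumerate(data):
        others = data[:i] + data[i + 1:]
        # sorted() is stable, so distance ties are broken by dataset order.
        neighbors = sorted(others, key=lambda inst: hamming(seq, inst[0]))[:k]
        majority, _ = Counter(lab for _, lab in neighbors).most_common(1)[0]
        if majority == label:
            kept.append((seq, label))
    return kept


# Toy windows: EI = exon/intron boundary, N = neither (labels are illustrative).
data = [
    ("CAGGTAAG", "EI"),
    ("AAGGTAAG", "EI"),
    ("CAGGTGAG", "EI"),
    ("CTTCCTCC", "N"),
    ("CTTCCTTC", "N"),
    ("TTTCCTCC", "N"),
    ("CAGGTAGG", "N"),  # donor-like window labelled N: a likely noisy instance
]
print(edited_nearest_neighbor(data))  # the ("CAGGTAGG", "N") instance is removed
```

The filtered list would then be passed to the learning algorithms (here, Decision Trees or SVMs) in place of the original training set.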
Abstract in English:A new scheme is described for representing proteins of different lengths (in number of amino acids) so that they can be presented for classification to Artificial Neural Networks (ANNs) with a fixed number of inputs. After dimensionality reduction with Principal Component Analysis (PCA), the new vectors could be clustered with K-Means and subsequently classified. The new representation scheme was applied to a set of 112 antigen sequences from several parasitic helminths, selected from the National Center for Biotechnology Information and classified into four different groups. This bioinformatic tool established a good correlation with already well-characterized domains, as confirmed by the PFAM database, despite the differences between the sequences. Additionally, sequences were grouped according to their similarity, as confirmed by hierarchical clustering using ClustalW.
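The abstract does not describe the new representation scheme itself. As a hedged stand-in, the sketch below uses amino-acid composition, a common way to turn proteins of any length into fixed-length vectors (20 values, one per standard residue). It then groups the vectors with a plain K-Means loop. The PCA step that the authors applied before clustering is omitted. The toy sequences are hypothetical, not helminth antigens.

```python
import math
from typing import List

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def composition(seq: str) -> List[float]:
    """Map a protein of any length to a fixed 20-dimensional frequency vector."""
    seq = seq.upper()
    counts = [seq.count(aa) for aa in AMINO_ACIDS]
    total = sum(counts)
    if total == 0:
        raise ValueError("sequence contains no standard amino acids")
    return [c / total for c in counts]


def kmeans(vectors: List[List[float]], k: int, iterations: int = 50) -> List[int]:
    """Plain Lloyd's K-Means, initialised with the first k vectors for reproducibility."""
    centroids = [list(v) for v in vectors[:k]]
    assignment = [0] * len(vectors)
    for _ in range(iterations):
        assignment = [
            min(range(k), key=lambda c: math.dist(v, centroids[c])) for v in vectors
        ]
        for c in range(k):
            members = [v for v, a in zip(vectors, assignment) if a == c]
            if members:  # keep the old centroid if a cluster empties
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return assignment


# Toy sequences of different lengths: two glycine-rich, two lysine/leucine-rich.
proteins = ["GGGGSGGG", "GGSGGGGGGGSG", "KKKLLKKLK", "KLKKLLKKKKL"]
print(kmeans([composition(p) for p in proteins], k=2))  # [0, 0, 1, 1]
```

Composition discards residue order, so any scheme that preserves positional or domain information, as the authors' apparently does, would carry more signal than this stand-in.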
Abstract in English:This work presents a method to analyze characteristics of a set of genes that may influence a certain anomaly, such as a particular type of cancer. A measure is proposed for diagnosing individuals with respect to the anomaly under study, and some characteristics of the genes are analyzed. Maximum likelihood equations are presented for both general and particular cases.
Abstract in English:GCLASS is an algorithm that explores small samples from two distinct biological states to find small sets of genes whose feature vectors are sufficient to separate the two states. A typical sample is a set of 60 microarrays, 30 for each biological state, with several thousand genes. The technique consists of: defining a spreading model in the space of the small gene sets studied, centered on each feature vector considered; designing optimal linear classifiers under this spreading model; and ranking the designed classifiers by their error and their robustness relative to the spreading. The feature vectors used in the best classifiers are considered the best feature vectors. Because the number of potential feature sets is very large, a parallel implementation is a good option for reducing execution time. This paper presents a parallel solution of GCLASS and shows some performance results. The experimental results show that the proposed solution provides quasi-linear speedup compared with the sequential implementation. For example, using 60 genes as the complete feature space and 6 genes as the small feature space, our parallel version with 11 processors is approximately 10.98 times faster than the sequential version.
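Below is a short worked example of the quantities behind the last sentences. Speedup is S = T_sequential / T_parallel, and parallel efficiency is E = S / p. The reported speedup of 10.98 on p = 11 processors therefore gives E ≈ 0.998, which is what quasi-linear means here. Suppose every 6-gene subset of the 60-gene space were evaluated; the abstract does not say the search is exhaustive, so this is an assumption. There would then be C(60, 6) = 50,063,860 candidates. The round-robin `worker_share` split is a hypothetical partitioning, not the paper's parallel design.

```python
from itertools import combinations, islice
from math import comb


def efficiency(speedup: float, processors: int) -> float:
    """Parallel efficiency E = S / p; E close to 1 means near-linear speedup."""
    return speedup / processors


def worker_share(n_genes: int, subset_size: int, worker: int, n_workers: int):
    """Hypothetical static split: worker w evaluates every n_workers-th gene subset."""
    return islice(combinations(range(n_genes), subset_size), worker, None, n_workers)


print(comb(60, 6))                      # 50063860 candidate 6-gene subsets
print(round(efficiency(10.98, 11), 3))  # 0.998
```

A static round-robin split needs no communication between workers. It balances well only when every subset costs roughly the same to evaluate.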
Abstract in English:A large number of DNA sequencing projects all over the world have yielded an enormous amount of data, whose analysis is currently a major challenge for computational biology. The limiting step in this task is the integration of large volumes of data stored in highly heterogeneous repositories of genomic and cDNA sequences, as well as of gene expression results. Solving this problem requires automated analytical tools that optimize operations and efficiently generate knowledge. This paper presents an information flow model, called GenFlow, that can tackle this analytical task.