Linking dopamine neurotransmission and neurogenesis: The evolutionary history of the NTAD (NCAM1-TTC12-ANKK1-DRD2) gene cluster

Genetic studies have long suggested an important role for the DRD2 gene in psychiatric disorders and behavior. Further research has shown a conjoined effect of genes in the Chr11q22–23 region, which includes the NCAM1, TTC12, ANKK1 and DRD2 genes, known as the NTAD cluster. Despite a growing need to unravel the role of this cluster, few studies have taken interspecies and evolutionary approaches. This study shows that behaviorally relevant SNPs from the NTAD cluster, such as rs1800497 (Taq1A) and rs6277, are ancient polymorphisms that date back to the common ancestor of modern humans and Neanderthals/Denisovans. Conserved synteny and neighborhood indicate that the NTAD cluster was probably established at least 400 million years ago, when the first Sarcopterygians emerged. The NTAD genes are apparently co-regulated, which could be attributed to adaptive functional properties, including those that emerged when the central nervous system became more complex. Finally, our findings indicate that NTAD genes, which are related to neurogenesis and dopaminergic neurotransmission, should be approached as a unit in behavioral and psychiatric genetic studies.


Introduction
The role of the NTAD cluster in psychiatric disorders
The crucial role of dopaminergic neurotransmission systems in psychiatric genetics has attracted great interest for several years. Among the candidate genes, one of the main research foci has been the dopamine receptor D2 gene (DRD2), especially the nearby single-nucleotide polymorphism (SNP) rs1800497 (Taq1A). The rs1800497 SNP was considered to be a silent mutation located about 10 kb from DRD2, in the 3' untranslated region of this gene. The identification of a novel gene in the neighboring forward-strand region of DRD2, named ANKK1, showed that rs1800497 is actually located in exon 8 of ANKK1 (Neville et al., 2004), where it causes an amino acid change (Glu713Lys) in the protein's 11th ankyrin repeat. Although the rs1800497 polymorphism lies in ANKK1, it appears to be in linkage disequilibrium with several DRD2 variants (Gelernter et al., 2006; Dubertret et al., 2010).
Despite the enormous number of studies on the role of the rs1800497 SNP in psychiatric disorders (especially alcohol and nicotine dependence), it is still far from clear how, and to what degree, it affects these disorders. The three most recent meta-analyses support a link between the rs1800497 T allele and alcoholism (Munafò et al., 2007; Smith et al., 2008; Le Foll et al., 2009). Two meta-analyses have focused on the relationship between rs1800497 and smoking behavior, with conflicting results (Li et al., 2004; Munafò et al., 2004). Strong heterogeneity has been a hallmark of several of these meta-analyses, which raises the need to identify the variables that might explain it.
It has been suggested that two other nearby genes, NCAM1 and TTC12, are also good candidate genes for psychiatric disorders and could harbor causal variants for phenotypes previously attributed to DRD2 polymorphisms (Gelernter et al., 2006; Huang et al., 2008; Dubertret et al., 2010). For example, Yang et al. (2007) found an association of SNPs in the NCAM1, TTC12 and ANKK1 genes with alcohol dependence, but no association for SNPs in DRD2. In another study, a single haplotype spanning TTC12 and ANKK1, as well as multiple SNPs in these two genes, was associated with nicotine dependence (Gelernter et al., 2006).
These genes are located on chromosome 11 (more precisely, in the 11q22-23 region) and form a 521 kb gene cluster comprising the NCAM1, TTC12, ANKK1 and DRD2 genes, known as the NTAD gene cluster (Figure S1). All four genes of the NTAD cluster seem to act in the brain, although the details of their specific functions in neural tissue have yet to be discovered. The neural cell adhesion molecule 1 (NCAM1) plays an important role in neurogenesis, specifically in axon and dendrite growth (McIntyre et al., 2010). TTC12 encodes the poorly understood tetratricopeptide repeat domain 12 protein, which seems to be involved in dopaminergic transmission and neurodevelopment via the Wnt signaling pathway (Kahto and Kahto, 2003; Castelo-Branco and Arenas, 2006). The ankyrin repeat and kinase domain containing 1 (ANKK1) gene encodes a signaling protein that takes part in the indirect modulation of DRD2 expression (Huang et al., 2008); this is the clearest currently known evidence of co-regulation in the NTAD cluster.

Gene cluster and genomic architecture: Evolutionary aspects
Gene order in eukaryotes cannot be attributed entirely to chance. Multiple lines of evidence indicate that co-expressed, co-regulated and co-functional genes can be maintained as gene clusters by natural selection (Hurst et al., 2004; Sémon and Duret, 2006; Michalak, 2008).
Given that polymorphisms of the NTAD gene cluster seem to have a conjoined effect, the question arises of how long these genes have been maintained with shared synteny (genes on the same chromosome) and conserved neighborhood (genes side by side in the same order) over the course of evolution. If the NTAD conformation is recent or human-specific, it may reflect an adaptive novelty or stochastic clustering in Homo/Homo sapiens. By contrast, the maintenance of a specific gene cluster such as NTAD over long evolutionary periods is much harder to explain by random processes alone. An ancient clustering in this case might reflect a functional benefit and/or an orchestrated evolution of genes involved in neurotransmission and neurogenesis, probably mediated by co-regulation, co-expression or molecular co-functionality.
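As a rough illustration of why random processes alone are a weak explanation, the sketch below computes, under a deliberately simple null model, the probability that four particular genes end up side by side when all n genes of a chromosome are placed in a uniformly random order. The model, the function names and the example chromosome sizes are our own assumptions, not part of this study; the model ignores block-wise rearrangements and says nothing by itself about maintenance over time, so it should be read only as an order-of-magnitude intuition.

```python
import math
import random


def p_adjacent_exact(n_genes: int, k: int, fixed_order: bool) -> float:
    """Probability that k chosen genes occupy consecutive positions when
    n_genes genes are arranged in a uniformly random order (toy null model)."""
    if not 1 <= k <= n_genes:
        raise ValueError("need 1 <= k <= n_genes")
    # Glue the k genes into one block: (n - k + 1)! arrangements out of n!,
    # i.e. 1 / perm(n, k - 1), times k! internal orders unless the order is fixed.
    internal = 1 if fixed_order else math.factorial(k)
    return internal / math.perm(n_genes, k - 1)


def p_adjacent_simulated(n_genes: int, k: int, fixed_order: bool,
                         trials: int = 100_000, seed: int = 1) -> float:
    """Monte Carlo cross-check of p_adjacent_exact by shuffling gene order."""
    rng = random.Random(seed)
    order = list(range(n_genes))
    cluster = list(range(k))  # genes 0..k-1 stand in for the cluster members
    hits = 0
    for _ in range(trials):
        rng.shuffle(order)
        positions = sorted(order.index(g) for g in cluster)
        adjacent = positions[-1] - positions[0] == k - 1
        if adjacent and fixed_order:
            # also require the exact reference order along the chromosome
            adjacent = [order[p] for p in positions] == cluster
        hits += adjacent
    return hits / trials


if __name__ == "__main__":
    print(p_adjacent_exact(10, 4, fixed_order=False))      # 1/30
    print(p_adjacent_simulated(10, 4, fixed_order=False))  # close to 1/30
    print(p_adjacent_exact(1000, 4, fixed_order=True))     # about 1.0e-9
```

The counting argument shows the probability falling off roughly in proportion to 1/n^(k-1); the simulation is included only to check that argument, not to model real genome evolution.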
Hominid comparative genomics has made great strides since the publication of the Neanderthal and Denisovan nuclear genomes, which has allowed researchers to build on knowledge about unique human phenotypes, including psychiatric disorder susceptibilities in present-day populations. Some of these psychiatric disorders have been repeatedly associated with variation across the whole NTAD cluster, but no previous study has investigated the cluster from a wider evolutionary perspective using comparative genomic approaches. We therefore performed the first in silico study addressing the genomic architecture and chromosomal dynamics of the NTAD cluster from an evolutionary perspective. Our results provide a more comprehensive view of this gene cluster and of how its dynamics could shape future genetic studies of complex behavioral phenotypes and psychiatric disorders in humans.

Analysis of NTAD SNP status in primates
Seven human SNPs were chosen based on previous associations with psychiatric disorders and/or evidence of functionality: rs646558 from NCAM1; rs723077 and rs2303380 from TTC12; rs2734849 and rs1800497 from ANKK1; and rs6277 and rs2283265 from DRD2 (Figure S1). These polymorphic sites were then compared with their counterparts in the genomes of two archaic hominids, Homo neanderthalensis (Green et al., 2010) and a Denisovan specimen (Reich et al., 2010), as well as nine non-human primates. The age of the derived allele was estimated for the rs1800497 and rs6277 SNPs with the frequency-based method, as proposed by Slatkin and Rannala (2000).
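The text does not spell out the estimator. As an illustration only, the sketch below implements one classic frequency-based expectation of the kind reviewed by Slatkin and Rannala (2000): the Kimura-Ota formula for the mean age of a neutral allele currently at frequency p, t = -4 Ne p ln(p) / (1 - p) generations. The effective population size (Ne = 10,000), the 25-year generation time, the function name and the example frequencies are placeholder assumptions, not values used in this study.

```python
import math


def expected_allele_age(p: float, ne: float, years_per_generation: float = 25.0) -> float:
    """Expected age, in years, of a neutral derived allele now at frequency p.

    Kimura-Ota neutral expectation: t = -4 * Ne * p * ln(p) / (1 - p) generations.
    Assumes constant effective population size Ne and no selection.
    """
    if not 0.0 < p < 1.0:
        raise ValueError("derived-allele frequency must lie strictly between 0 and 1")
    generations = -4.0 * ne * p * math.log(p) / (1.0 - p)
    return generations * years_per_generation


if __name__ == "__main__":
    # Placeholder inputs (assumptions): Ne = 10,000, 25 years per generation,
    # and arbitrary example frequencies rather than data from this study.
    for freq in (0.1, 0.3, 0.6):
        print(freq, round(expected_allele_age(freq, ne=10_000)))
```

For a fixed Ne the estimate depends only on the derived-allele frequency, so population differences in that frequency translate directly into different ages; this is one way population-specific estimates such as those reported for rs6277 in the Results can arise.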
Three approaches were used to predict whether the three non-synonymous mutations among the seven SNPs (rs1800497 and rs2734849 in ANKK1; rs723077 in TTC12) cause important functional changes in the proteins: PolyPhen2, SNAP and Grantham scores of chemical distance (Grantham, 1974). For the Grantham scores we used the classification of Li et al. (1985): conservative (score 0-50), moderately conservative (51-100), moderately radical (101-150) and radical (≥ 151). These methods predict the possible impact of amino acid substitutions on protein structure and function from physicochemical and comparative evidence.
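The Li et al. (1985) classes amount to a threshold lookup on the Grantham distance. A minimal sketch, assuming integer scores already taken from Grantham's (1974) matrix (the matrix itself is not reproduced here, and the function name and example scores are ours):

```python
def li_category(grantham_score: int) -> str:
    """Map a Grantham distance to the four classes of Li et al. (1985)."""
    if grantham_score < 0:
        raise ValueError("Grantham distances are non-negative")
    if grantham_score <= 50:
        return "conservative"
    if grantham_score <= 100:
        return "moderately conservative"
    if grantham_score <= 150:
        return "moderately radical"
    return "radical"


if __name__ == "__main__":
    for score in (20, 75, 125, 175):  # arbitrary example distances
        print(score, li_category(score))
```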

Orthology, synteny and neighborhood status of the NTAD cluster in vertebrate genomes
The online databases Ensembl (release 66), UCSC Genome Browser and UniProt were used as sources for the nucleotide and protein sequences of the NCAM1, TTC12, ANKK1 and DRD2 genes in the human genome and of their orthologues in 47 other vertebrate species (Table S1). Their synteny and neighborhood status were inferred from the available contigs. BLAST/BLAT searches in these databases were performed to find possible unannotated orthologues. BioEdit version 7.0.9.0 (Hall, 1999) was used to align orthologue sequences when necessary. The DECODE database was used to infer transcription factor binding sites in the NTAD cluster. DECODE is based on text-mining applications by SABiosciences and on gene annotations of regulatory binding sites available at the UCSC Genome Browser.
Mota et al.

Results and Discussion
Comparative analyses of seven SNPs belonging to the NTAD gene cluster in human and non-human primates
Our analyses of the seven SNPs selected for their association with psychiatric disorders and/or evidence of functionality revealed several informative patterns (Table 1). For instance, polymorphisms believed to be H. sapiens-specific turned out to be plausibly widespread in the genus Homo. Although introgression of H. sapiens derived alleles into other hominids cannot be ruled out, these polymorphisms could be traced back at least to the divergence of our lineage from Neanderthals/Denisovans, 270,000-800,000 years ago (Green et al., 2010; Reich et al., 2010). A mutation event produces a new (mutant, or derived) allele that differs from the original (ancestral) one. For the rs1800497 SNP, both the ancestral and derived alleles are present in the Denisovan genome, whereas only the derived allele was found in Neanderthals, indicating that the A→G (Glu713Lys) mutation occurred before the origin of Homo sapiens. Our estimate of the age of the derived allele (Slatkin and Rannala, 2000), 379,000-447,000 years, is compatible with this hypothesis.
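The allele-status reasoning above can be sketched in a few lines: polarize a biallelic human SNP using an outgroup allele as a proxy for the ancestral state, then label each archaic genome by which allele it carries. The outgroup convention (for example, the chimpanzee allele), the function names and the placeholder alleles X (ancestral) and Y (derived) are illustrative assumptions; the example mirrors the pattern described for rs1800497 and is not real genotype data.

```python
def polarize(human_alleles: set[str], outgroup_allele: str) -> tuple[str, str]:
    """Return (ancestral, derived) for a biallelic human SNP, taking the
    outgroup allele as a proxy for the ancestral state."""
    if outgroup_allele not in human_alleles:
        raise ValueError("outgroup allele not segregating in humans; cannot polarize")
    derived = set(human_alleles) - {outgroup_allele}
    if len(derived) != 1:
        raise ValueError("expected exactly two alleles in humans")
    return outgroup_allele, derived.pop()


def archaic_status(human_alleles: set[str], outgroup_allele: str,
                   archaic_calls: dict[str, set[str]]) -> dict[str, str]:
    """Label each archaic genome by which of the two human alleles it carries."""
    ancestral, derived = polarize(human_alleles, outgroup_allele)
    labels = {}
    for genome, alleles in archaic_calls.items():
        has_anc, has_der = ancestral in alleles, derived in alleles
        if has_anc and has_der:
            labels[genome] = "both alleles"
        elif has_der:
            labels[genome] = "derived only"
        elif has_anc:
            labels[genome] = "ancestral only"
        else:
            labels[genome] = "no matching call"
    return labels


def derived_allele_in_archaic(labels: dict[str, str]) -> bool:
    """True if any archaic genome carries the derived allele."""
    return any(label in ("both alleles", "derived only") for label in labels.values())


if __name__ == "__main__":
    # Placeholder alleles: X = ancestral (outgroup state), Y = derived.
    labels = archaic_status({"X", "Y"}, "X",
                            {"Denisovan": {"X", "Y"}, "Neanderthal": {"Y"}})
    print(labels, derived_allele_in_archaic(labels))
```

As the text notes, a derived allele in an archaic genome is compatible with a mutation older than the human-archaic split, but introgression cannot be excluded, so the boolean flags that interpretation rather than testing it.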
A similar situation was detected for the DRD2 rs6277 SNP, but in this case both alleles were found in Neanderthals (Table 1). Interestingly, the estimated age of the derived allele of this polymorphism varied more among populations: about 65,000 years for Africans and Asians, but 359,000 years for Europeans. These results raise some intriguing hypotheses, including introgression of the allele from Neanderthals into European H. sapiens populations, with subsequent dispersal to other continents. Selection and genetic drift inflating the derived allele frequency in Europe are other possible explanations. However, it seems unlikely that stochastic processes alone would have maintained these polymorphisms for such a long time. Signals of positive selection have been found in variant alleles of a related dopaminergic gene (DRD4) that are associated with modern psychiatric disorders. Some of these variants are associated with behavioral traits that may have conferred an adaptive advantage in the past but may have clinical implications today (Ding et al., 2002; Wang et al., 2004; Tovo-Rodrigues et al., 2011). A similar scenario may account for the maintenance of these polymorphic sites in the human lineage, although only further studies can test this hypothesis.
On the other hand, the TTC12 rs2303380 and DRD2 rs2283265 SNPs are likely to be H. sapiens-specific. The derived allele (C) of NCAM1 rs646558 is also H. sapiens-specific, but, curiously, its ancestral allele (A) is found exclusively in Old World primates, while a third allele (G) is found in lemurs and New World monkeys (Table 1). This suggests that the G→A mutation possibly occurred on the phylogenetic branch leading to the Old World primates.
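Placing the G→A change of rs646558 on the branch leading to Old World primates is a small-parsimony problem. The sketch below applies Fitch's algorithm to an illustrative five-taxon tree; the species standing in for each group, the tree topology and the function names are our assumptions, not the nine primates analyzed here. Humans are coded as polymorphic {A, C}, so the human-specific C allele is not placed on any branch.

```python
from typing import Union

# A tree is a leaf name (str) or a (left, right) pair of subtrees. Nested tuples
# are hashable, so each subtree can be used directly as a dictionary key.
Tree = Union[str, tuple]


def fitch_sets(tree: Tree, leaf_states: dict, sets: dict) -> int:
    """Bottom-up Fitch pass: store a state set for every node and return the
    minimum number of changes needed in this subtree."""
    if isinstance(tree, str):
        sets[tree] = set(leaf_states[tree])
        return 0
    left, right = tree
    changes = fitch_sets(left, leaf_states, sets) + fitch_sets(right, leaf_states, sets)
    common = sets[left] & sets[right]
    if common:
        sets[tree] = common
    else:
        sets[tree] = sets[left] | sets[right]  # disagreement costs one change
        changes += 1
    return changes


def assign_states(tree: Tree, sets: dict, parent_state=None, changes=None) -> list:
    """Top-down pass: keep the parent's state when the node's set allows it,
    otherwise take the alphabetically first state and record a change on the
    branch above this node as (subtree, from_state, to_state)."""
    if changes is None:
        changes = []
    state = parent_state if parent_state in sets[tree] else min(sets[tree])
    if parent_state is not None and state != parent_state:
        changes.append((tree, parent_state, state))
    if not isinstance(tree, str):
        for child in tree:
            assign_states(child, sets, state, changes)
    return changes


# Illustrative topology (assumption): catarrhines (human, chimpanzee, macaque),
# a New World monkey (marmoset) and a strepsirrhine (lemur).
TREE = (((("human", "chimpanzee"), "macaque"), "marmoset"), "lemur")
LEAF_STATES = {
    "human": {"A", "C"},  # polymorphic: ancestral A plus the human-specific C
    "chimpanzee": {"A"},
    "macaque": {"A"},
    "marmoset": {"G"},
    "lemur": {"G"},
}

sets: dict = {}
cost = fitch_sets(TREE, LEAF_STATES, sets)
changes = assign_states(TREE, sets)
print(cost, changes)  # 1 [((('human', 'chimpanzee'), 'macaque'), 'G', 'A')]
```

With the allele groups described in the text, a single change (G to A on the branch leading to the catarrhine clade) suffices, which matches the interpretation above.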
The additional analysis of the three non-synonymous SNPs identified only the Met73Leu substitution (rs723077) in the TTC12 gene as non-neutral: the SNAP prediction tool indicated that this substitution may change the protein's physicochemical properties. Failure to detect similar signals for the other two substitutions does not necessarily mean that they have no functional impact, since SNAP, PolyPhen2 and Grantham scores are designed to flag only substantial physicochemical changes in protein structure. Likewise, a non-neutral prediction does not necessarily imply a direct association with relevant phenotypic changes.
Orthology, synteny and neighborhood status of the NTAD cluster in vertebrate genomes
Our comparative analysis used the human NTAD gene sequences as queries for BLAST searches against the other 46 available vertebrate genome sequences (Table S1). Most orthologous genes were already annotated in the databases used. In addition, we identified some new ANKK1 orthologues. These included an unannotated ~1.5 kb ANKK1 orthologous sequence in the porcine (Sus scrofa) genome (Table S1), comprising three exons (orthologous to human exons 3, 4 and 5; 76% identity), from which a 119-amino-acid protein sequence could be predicted (86% identity). This ANKK1 orthologue in pigs is located downstream of DRD2, changing the gene order in the cluster from NTAD to NTDA (NCAM1-TTC12-DRD2-ANKK1). This result points to a chromosomal inversion unique to the porcine lineage (Figure 1). Two other probable new ANKK1 orthologues were identified in the alpaca (Vicugna pacos) and rabbit (Oryctolagus cuniculus) genomes (Table S1), but in both cases the expected NTAD gene order was retained.
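Reported identity values depend on how they are computed. A minimal sketch for two sequences that have already been aligned (for example, in BioEdit), assuming identity is measured over ungapped columns; we do not know which convention underlies the 76% and 86% values, and the function name and toy sequences below are invented for illustration.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity between two pre-aligned sequences of equal length.

    Identity is computed over columns where neither sequence has a gap;
    other tools (e.g. BLAST) may count gapped columns in the denominator.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    columns = [(x, y) for x, y in zip(aligned_a.upper(), aligned_b.upper())
               if x != gap and y != gap]
    if not columns:
        raise ValueError("no ungapped columns to compare")
    matches = sum(x == y for x, y in columns)
    return 100.0 * matches / len(columns)


if __name__ == "__main__":
    # Toy alignment, not real ANKK1 sequence: 4 matches in 5 ungapped columns.
    print(percent_identity("ACGT-A", "ACCTTA"))  # 80.0
```

The same function works for nucleotide and protein alignments, since it compares characters column by column.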
Conserved synteny and neighborhood of the whole NTAD cluster were observed in 22 of the 48 vertebrate species studied (46%); four other species show conserved synteny but lack full neighborhood (Teleostei: Danio rerio, Gasterosteus aculeatus, Tetraodon nigroviridis; Artiodactyla: Sus scrofa; Figure 1). The synteny and neighborhood status of the NTAD cluster in the remaining genomes could not be ascertained because of low coverage and/or incomplete assembly.
One of the main features of the NTAD cluster (Figure S1) is that DRD2 lies on the reverse (minus) strand relative to the other three genes in all vertebrate genomes, except the porcine genome because of the inversion described above. The presence of multiple copies of NCAM1 and DRD2 orthologues in teleost fishes (Figure 1B, Table S1) agrees with the known genome duplication in this taxon (Jaillon et al., 2009). However, we did not observe conserved synteny for all copies of NCAM1 and DRD2, since the extant copies not included in the NTAD cluster lie on different chromosomes (Table S1). This raises the question of possible subfunctionalization or neofunctionalization of these gene copies.
Notably, in teleost fishes the synteny of the NTAD cluster is conserved but the neighborhood of its genes is not. About three million bases, containing ~100 genes, separate NCAM1 from the TAD cluster (TTC12-ANKK1-DRD2), which can be traced back at least ~525 Mya, to the origin of vertebrates. The NCAM1 neighborhood was apparently gained when the Sarcopterygians emerged ~400 Mya and seems to have been maintained since then (Figure 1C).
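The synteny and neighborhood definitions used in this study (genes on the same chromosome; genes side by side in the same order) can be made operational on a gene annotation. The sketch below is a simplified version under stated assumptions: genes are ordered by start coordinate, neighborhood requires that no other annotated gene lie between cluster members and that the reference order hold in either reading direction, and genomes with missing or extra (paralogous) copies are left unresolved, so the teleost paralogues would need separate handling. The class and function names, the GENE_X/GENE_Y placeholders and all coordinates are invented toy layouts that mimic the human, porcine and teleost patterns described above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Gene:
    name: str
    chrom: str  # chromosome or contig identifier
    start: int  # used only to order genes along the chromosome


def cluster_status(cluster: list[str], genes: list[Gene]) -> dict:
    """Synteny and neighborhood status of `cluster` in one genome annotation.

    synteny: every cluster gene lies on the same chromosome/contig.
    neighborhood: the genes are also adjacent (no other annotated gene between
    them) and appear in the reference order, read in either direction.
    """
    unresolved = {"synteny": None, "neighborhood": None, "order": None, "intervening": None}
    members = [g for g in genes if g.name in cluster]
    if sorted(g.name for g in members) != sorted(cluster):
        return unresolved  # missing gene or extra (e.g. paralogous) copies
    chroms = {g.chrom for g in members}
    if len(chroms) > 1:
        return {**unresolved, "synteny": False, "neighborhood": False}
    chrom = chroms.pop()
    on_chrom = sorted((g for g in genes if g.chrom == chrom), key=lambda g: g.start)
    idx = [i for i, g in enumerate(on_chrom) if g.name in cluster]
    order = [on_chrom[i].name for i in idx]
    intervening = idx[-1] - idx[0] + 1 - len(idx)  # non-cluster genes inside the span
    same_order = order == cluster or order == cluster[::-1]
    return {
        "synteny": True,
        "neighborhood": intervening == 0 and same_order,
        "order": "-".join(order),
        "intervening": intervening,
    }


def toy_genome(names: list[str], chrom: str = "chr1") -> list[Gene]:
    """Hypothetical genome: genes placed 1 kb apart in the given order."""
    return [Gene(name, chrom, 1_000 * i) for i, name in enumerate(names)]


NTAD = ["NCAM1", "TTC12", "ANKK1", "DRD2"]
human_like = toy_genome(["GENE_X", *NTAD, "GENE_Y"])
pig_like = toy_genome(["NCAM1", "TTC12", "DRD2", "ANKK1"])
fish_like = toy_genome(["NCAM1", "GENE_X", "GENE_Y", "TTC12", "ANKK1", "DRD2"])
split_like = toy_genome(["TTC12", "ANKK1", "DRD2"]) + toy_genome(["NCAM1"], chrom="chr2")

for label, genome in [("human-like", human_like), ("pig-like", pig_like),
                      ("fish-like", fish_like), ("split", split_like)]:
    print(label, cluster_status(NTAD, genome))
```

The `intervening` count corresponds to the number of genes separating cluster members, the quantity that distinguishes the teleost layout (NCAM1 separated from TAD) from the conserved tetrapod neighborhood.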
Although phenomena such as tandem duplications, inversions, rearrangements and indels may account for non-random patterns in the genome (Hurst et al., 2004), none of them alone explains the origin and conservation of the NTAD cluster. Other hypotheses therefore need to be considered.
Several authors have shown that clustered genes are kept together over long periods of time to preserve their co-regulatory systems and, consequently, phenotypic integrity. For instance, several cis-regulatory sequences are conserved throughout vertebrate genomes because of their role in development (Kikuta et al., 2007). An interesting feature of the NTAD gene cluster is that a polymorphism in one gene might indirectly affect the expression of a neighboring gene. Huang et al. (2008) demonstrated that the ANKK1 rs2734849 SNP alters the expression level of NF-κB-regulated genes; since DRD2 expression is regulated by the transcription factor NF-κB (Fiorentini et al., 2002; Bontempi et al., 2007), DRD2 might be indirectly regulated by ANKK1. A search for regulatory transcription factors in the DECODE database showed that NCAM1 also seems to be regulated by NF-κB, as do TTC12 paralogues (Table S2), suggesting a possible role for this transcription factor in the co-regulation of the NTAD genes.
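Inferring candidate co-regulators from database predictions amounts to finding transcription factors shared by several cluster genes. A minimal sketch, assuming the predictions have been exported as a gene-to-factor mapping; the function name is ours, only the NF-κB entries for NCAM1 and DRD2 (written NF-kB in the code) follow the text, and TF_A and TF_B are hypothetical placeholders rather than contents of Table S2.

```python
from collections import defaultdict


def shared_regulators(predictions: dict[str, set[str]], min_genes: int = 2) -> dict[str, list[str]]:
    """Map each transcription factor predicted for at least `min_genes` genes
    to the sorted list of those genes."""
    targets = defaultdict(set)
    for gene, factors in predictions.items():
        for tf in factors:
            targets[tf].add(gene)
    return {tf: sorted(genes) for tf, genes in sorted(targets.items())
            if len(genes) >= min_genes}


# Only the NF-kB entries follow the text; TF_A and TF_B are hypothetical placeholders.
PREDICTIONS = {
    "NCAM1": {"NF-kB", "TF_A"},
    "TTC12": {"TF_B"},
    "ANKK1": {"TF_A"},
    "DRD2": {"NF-kB"},
}
print(shared_regulators(PREDICTIONS))
# {'NF-kB': ['DRD2', 'NCAM1'], 'TF_A': ['ANKK1', 'NCAM1']}
```

A factor shared in such a list is only a hint: shared predicted binding does not by itself demonstrate co-regulation.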
Based on computer simulations, Yerushalmi and Teicher (2007) showed a strong tendency for essential genes to cluster as a result of natural selection. Our results illustrate, for the first time to our knowledge, such a tendency for genes with essential functions in neurogenesis and dopaminergic neurotransmission. It is thus likely that natural selection, through the formation of the NTAD cluster, played a role in the emergence of an efficient co-regulatory mechanism when the vertebrate central nervous system acquired novel traits and gained complexity. Natural selection was probably equally important in keeping the NTAD cluster practically intact for at least 400 million years.
Nevertheless, certain limitations must be considered when interpreting the results of the present study. Several assumptions presented here rely on available genomic data and analyses that are still preliminary. This is especially true for tests of alternative explanations for the specific SNPs that are widespread in the genus Homo, and for the rate at which four-gene clusters have been maintained since, or before, the origin of vertebrates. A major goal of this study is to stimulate further research on the evolutionary history of gene clusters using currently available genomic data and emerging bioinformatics tools.

Conclusion
Our results suggest that genes related to neurogenesis and dopaminergic neurotransmission may have been interconnected during the evolution of the complex vertebrate nervous system through a common functional genomic architecture and chromosomal dynamics. Together with due consideration of linkage disequilibrium patterns, this points to the importance of approaching the NTAD cluster as a candidate functional unit, rather than its genes separately, in behavioral and psychiatric genetic studies.
Table S2. Transcription factors predicted by DECODE as regulators of the NTAD genes in humans.
*Although NF-κB was not predicted as a regulator of TTC12, it regulates the TTC12 paralogues TOMM34 and STIP1 as well as other genes encoding the tetratricopeptide repeat domain (not shown).

Figure S1 - The NTAD cluster in the human genome, showing the single nucleotide

Table 1 - Single nucleotide variation in the NTAD cluster across primate genomes.

Table S1 - Orthologues of genes in the NTAD cluster in 47 available vertebrate genomes, showing accession number and location.
