Genetics and sport performance: current challenges and directions to the future

CDD.20.ed.575.1 796.092 796.5

Abstract
In recent years there has been great progress in molecular biology techniques, which has facilitated research on the influence of genetics on human performance. There are specific regions of DNA that can vary between individuals. Such variations (i.e., polymorphisms) may, in part, explain why some individuals have differentiated responses to certain stimuli, including the responses to sports training. In a particular sport, the presence of specific polymorphisms may contribute to high levels of performance. Since 1998, several polymorphisms have been associated with athletic phenotypes; however, the accumulation of information generated over these 15 years shows that the influence of genetics on sport is extremely complex. In this review, we will summarise the current status of the field, discussing the implications of available knowledge for the practice of professionals involved with sport and suggesting future directions for research. We also discuss topics related to the importance of polygenic profile characterization of athletes, methods for the identification of new polymorphisms associated with physical performance, the use of genetic testing for predicting competitive success, and how crucial the genetic profile is for the success of athletes in competition.
KEY WORDS: Genetic; Polymorphism; Athletes; Sports; Performance.

Introduction
The determinants of human athletic performance have long been a challenging field of study in sport sciences. Sports performance is an enormously complex multifactorial phenomenon, and is determined by numerous intrinsic (e.g., genetics, motor behavior, physiological and psychological profile) and extrinsic factors (e.g., training, nutrition, development opportunities and overall health conditions) as well as by the interaction between them [1]. Although no single formula can turn anyone into a successful athlete, it is widely accepted that any individual who is highly committed and dedicated to training is able to improve athletic performance. Likewise, becoming a top-level athlete requires several years of dedication to an organized and rigid training system, although this is no guarantee of success. However, a few athletes seem to be exceptionally gifted and demonstrate extraordinarily high performance levels even before taking part in training programs; some athletes demonstrate better responses to training than others, or may be able to consistently sustain high levels of performance over their competitive career [2][3].
Despite the awareness of the genetic influences on competitive success, the genetics of sports performance is a fairly recent area of investigation. As a consequence, the currently available knowledge is largely incipient, and some authors consider the area to be in its infancy [4]. Hence, every effort to improve our understanding of this phenomenon is of great importance.
The earlier studies on the genetics of human performance focused on estimating the heritability of different complex traits. Using approaches such as twin studies and familial aggregation studies, investigators were able to estimate the percentage contribution of genetic factors to muscle fibre type distribution and enzyme activities [5][6], bone density and muscle strength [7], aerobic [8] and anaerobic capacities [9], among other performance-relevant variables. Although unquestionably relevant, these studies did not provide specific information on which particular genes and genetic variants would be involved in such genetic influences. The first polymorphism related to sport performance (i.e., in the angiotensin-converting enzyme gene, ACE) was not identified until 1998 [10][11].
Over the past 15 years, advances in biotechnology and molecular biology tools have facilitated a rapid increase in the identification of structural genetic variations capable of exerting some influence on the phenotypes related to athletic performance [12]. Association studies made possible the establishment of a human gene map for exercise [13], and constant updates reflect the rapid increase in the number of polymorphisms entered in this map [14][15]. To date, more than 200 polymorphisms have been associated with some feature related to physical exercise [15], and this number is expected to increase in the coming years [15]. However, only about 20 of these >200 genetic variants were specifically observed in athletes [4]. Furthermore, most of these genes and variants have failed to confirm association in replication studies [16], so that fewer than 10 genetic variants have been consistently associated with sports performance [17]. Considering that the human genome has over 20,000 genes and that each gene may present an enormous diversity of common variants that could theoretically influence some performance-related phenotype, it is extremely likely that our current knowledge represents only a small fraction of the genetic factors that influence sports performance. Hence, numerous new genetic variants are yet to be discovered, and we still barely understand how genes interact with each other and with environmental factors.
High-level sports performance is an extremely complex phenotype and genetic background is only one of its multiple contributory factors. It is likely that the contribution of heritability to a particular phenotype will largely depend on the specific sport discipline, among other factors. Even if only the genetic factors are considered, sports success remains an extremely complex phenomenon because it is a multigenic trait [18].
Understanding the influence of genetics in sport is a critical step to unravel the determinants of sports excellence. However, the challenges are enormous and the experimental approaches to address this highly complex and multifaceted phenotype are somewhat limited. In this critical review, we discuss the potential and limitations of research methods on the genetics of sports performance, the implications of this knowledge in "real-world" sport settings, as well as directions for future research.
For the benefit of the reader, the terms and concepts most important for understanding this discussion are briefly reviewed below.

Basic concepts
The genome is the entire collection of hereditary genetic material that a given species possesses and transmits to the next generation [19]. In eukaryotes, including humans, the genome is encoded in DNA sequences. These sequences are composed of four different nitrogenous bases (Adenine, Thymine, Cytosine and Guanine: A, T, C and G, respectively) that, by binding in sequence, make a single DNA strand (e.g., AACGGT is a sequence of nitrogenous bases forming a single-stranded DNA). Each base also binds to its complementary base (i.e., A binds to T and C binds to G) so that a single DNA strand is attached to a complementary single DNA strand, forming a double-stranded DNA. The DNA molecules are predominantly found inside the nucleus, although a small portion is also found inside mitochondria. Inside the nucleus, the genome is organized in 23 different pairs of chromosomes.
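As an illustration of the base-pairing rule described above (A with T, C with G), the following minimal Python sketch derives the complementary strand for the example sequence. The function name `complement_strand` is hypothetical, and the sketch pairs bases position by position without modelling strand orientation.

```python
# Pairing rule as stated in the text: A binds to T, C binds to G.
PAIRING = {"A": "T", "T": "A", "C": "G", "G": "C"}


def complement_strand(strand: str) -> str:
    """Return the base-by-base complementary strand (hypothetical helper).

    Strand orientation (5' to 3') is deliberately ignored in this sketch.
    """
    return "".join(PAIRING[base] for base in strand.upper())


# The single-stranded example used in the text.
print(complement_strand("AACGGT"))  # TTGCCA
```

Applying the function twice returns the original sequence, which reflects the symmetry of the pairing rule.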
The human genome has over 3 billion base pairs. Of this total, only ~5% are coding regions. A gene is considered a specific region of the genome whose DNA sequence encodes a biologically active product, which is, in most cases, an RNA molecule that ultimately results in a protein. Every individual has two copies of each gene, which are called alleles. In the coding sequences, a sequence of 3 nucleotides encodes 1 specific amino acid, which will take its place in the peptide chain. These 3-nucleotide sequences encoding 1 amino acid are called codons.
Although all human beings share the same genes, they display some slight structural variations in their base sequences. It is estimated that only 0.1% of the genome varies between individuals [20]. However, this minor portion of the genome explains the enormous phenotypic diversity that exists among humans. These variations may occur as: 1) changes in a single base pair (for example, in the sequence AACGGT the nucleotide G is replaced by a nucleotide A, so the variant sequence is AACAGT); 2) deletions of a single base pair (for example, in the sequence AACGGT, the nucleotide G is deleted, so the variant sequence is AACGT); 3) insertions of single base pairs (for example, in the sequence AACGGT, the nucleotide C is inserted, so the variant sequence is AACCGGT); or 4) changes, deletions or insertions in two or more base pairs. The most common type of variation in the human genome consists of a change in a single base pair [21]. This type of variant can be a single nucleotide polymorphism (SNP) or a point mutation, depending on how prevalent it is in the population and on the impact it has on phenotype (see further discussion).
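The three single-base variant types listed above can be sketched in a few lines of Python, using the reference sequence AACGGT from the text. The helper names and the 0-based positions are assumptions made for illustration. Note that inserting a base lengthens the sequence, so inserting a C after the third base of AACGGT gives the seven-base sequence AACCGGT.

```python
def substitute(seq: str, pos: int, new_base: str) -> str:
    """Replace the base at 0-based index `pos` (single-base change)."""
    return seq[:pos] + new_base + seq[pos + 1:]


def delete(seq: str, pos: int) -> str:
    """Remove the base at 0-based index `pos` (single-base deletion)."""
    return seq[:pos] + seq[pos + 1:]


def insert(seq: str, pos: int, base: str) -> str:
    """Insert `base` before 0-based index `pos` (single-base insertion)."""
    return seq[:pos] + base + seq[pos:]


REFERENCE = "AACGGT"

print(substitute(REFERENCE, 3, "A"))  # AACAGT
print(delete(REFERENCE, 3))           # AACGT
print(insert(REFERENCE, 3, "C"))      # AACCGGT
```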
Despite being non-coding, the remaining 95% of the genome does play fundamental physiological roles, especially by regulating gene expression (i.e., whether or not a gene will be transcribed into RNA to produce a protein, and how much RNA will be produced from its corresponding gene). Therefore, genetic variants may encompass both coding and non-coding regions of the genome [22]. Genetic variants that occur in coding regions may affect the sequence of amino acids in a protein and, depending on the type of variant, the protein structure can be slightly or severely affected. The more a protein structure is affected, the greater its physiological impact tends to be. The variant protein may be less functional or even not functional at all. On the other hand, when genetic variants occur in non-coding regions of the genome, the protein structure is normally unaffected and the physiological impact tends to be less pronounced. In these cases, it is more likely that the rate of gene expression is affected. Some genetic variations are rare and some are common. When a variant appears in less than 1% of the population, it is considered a mutation; when its frequency in the population is greater than 1%, it is considered a polymorphism [19]. Normally, a mutation has a greater impact on physiological functions than a polymorphism [23]. As a consequence, mutations tend to have some health impact whereas polymorphisms tend to account for normal phenotypic variation. However, there are rare mutations that do not lead to a disease, just as there are common polymorphisms that increase the likelihood of an individual developing a disease. Whether these variants should be considered mutations or polymorphisms is still a matter of debate and is beyond the scope of this review.
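The frequency-based distinction described above can be expressed as a simple rule. The sketch below is illustrative only: the function name is hypothetical, and the treatment of a variant at exactly 1% (classified here as a polymorphism) is an assumption, since the text only specifies "less than" and "greater than" 1%.

```python
def classify_variant(frequency: float, threshold: float = 0.01) -> str:
    """Classify a variant by its population frequency (hypothetical helper).

    frequency: proportion of the population carrying the variant (0 to 1).
    threshold: the 1% cut-off given in the text. A variant at exactly the
               threshold is treated as a polymorphism (an assumption).
    """
    if not 0.0 <= frequency <= 1.0:
        raise ValueError("frequency must be between 0 and 1")
    return "mutation" if frequency < threshold else "polymorphism"


print(classify_variant(0.005))  # mutation
print(classify_variant(0.30))   # polymorphism
```

As the text notes, frequency alone does not settle every case, so a rule of this kind is only a first approximation.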
As discussed earlier, there are two copies of each gene in the genome [a]. One allele is found in a specific region of a specific chromosome whilst the other allele (which is exactly the same gene but not necessarily formed by exactly the same sequence) is found in the same region of the homologous chromosome. Considering a given variation in a given gene, an individual may have one or two copies of the most frequent variant (which is often referred to as the "normal" copy) and/or one or two copies of the least frequent variant (which is often referred to as the polymorphic or mutated copy).
Therefore, for this specific variation, the genotype of an individual can be: 1) homozygous (two alleles of a "normal" copy of the gene); 2) heterozygous (one allele of a "normal" copy and the other of a polymorphic copy of the gene); or 3) homozygous (two alleles of a polymorphic or mutated copy of the gene).
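The three possible genotypes for a single variation can be enumerated with a short Python sketch. It assumes a biallelic variant (only a "normal" and a polymorphic allele exist), and the allele labels "G" and "A" below are hypothetical placeholders rather than a reference to any real polymorphism.

```python
def classify_genotype(allele_1: str, allele_2: str, normal: str) -> str:
    """Classify a genotype at a biallelic site (hypothetical helper).

    `normal` is the label of the most frequent ("normal") allele; any
    other label is treated as the polymorphic allele (an assumption).
    """
    n_normal = (allele_1 == normal) + (allele_2 == normal)
    if n_normal == 2:
        return "homozygous (normal)"
    if n_normal == 1:
        return "heterozygous"
    return "homozygous (polymorphic)"


# Hypothetical site where "G" is the most frequent allele.
print(classify_genotype("G", "G", normal="G"))  # homozygous (normal)
print(classify_genotype("G", "A", normal="G"))  # heterozygous
print(classify_genotype("A", "A", normal="G"))  # homozygous (polymorphic)
```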
Phenotypic traits are observable characteristics controlled by genes. Thus, a given genotype affects a given phenotype to some extent. Some traits are controlled by one single gene, and they are referred to as monogenic traits. Normally, it is relatively easy to establish a link between a genotype and a phenotype in monogenic traits since they obey the Mendelian logic of inheritance. On the other hand, polygenic traits are far more complex because they are influenced by several genes, as well as by multiple non-genetic environmental factors. Due to their multifactorial nature, it is normally difficult to establish a strong association between one single genetic variant and a complex phenotype, which often imposes a hurdle to studies attempting to identify specific genes that influence a complex phenotype. This explains, at least in part, why the associations between genetic polymorphisms and athletic performance are normally weak and frequently not confirmed in replication studies.

Genetic influences on quantitative traits and sports performance
Sports performance is an extremely complex phenotypic trait, which is in turn influenced, although not determined, by many other traits, such as muscle fibre type distribution, aerobic power and capacity, anaerobic power and capacity, and trainability of physical capacities [24].
Most traits that are relevant to sports performance are quantitative, meaning that they can be measured and quantified. Some examples of quantitative traits that are relevant to physical performance are body composition, aerobic power and muscle strength. In some cases, the final outcome of sport performance can also be a quantitative trait. For example, swimming times, running races, jumps, throws and all other sports in which final performance is quantifiable can be considered quantitative traits. In other cases, however, sports performance per se is not a quantitative trait. This is the case of unpredictable sports, such as team sports, individual sports that depend on natural conditions (e.g., surfing and sailing) and individual sports that depend on opponents' actions (e.g., combat sports). Theoretically, some performance-relevant quantitative traits are strongly influenced by genetic factors, which is also the case for some "predictable sports" (FIGURE 1). On the other hand, other traits as well as "unpredictable sports" are less influenced by genetic factors (FIGURE 1) and, therefore, genotype-phenotype relationships are less likely to be established. This must be kept in mind when performing association studies, as discussed later in this review.

Experimental approaches for studying genetics of sport performance
Over the past 15 years, there has been a great effort to identify, at the molecular level, variations in DNA sequence that may contribute to sports performance or to any trait that is relevant to sport performance, so-called genotype-phenotype correlations. However, in complex polygenic multifactorial traits, genotype-phenotype correlations are often elusive and difficult to identify clearly.

Candidate genes association studies
One of the most frequent experimental approaches for assessing genotype-phenotype correlations is the genetic association study. In genetic association studies, a candidate polymorphism is correlated with a performance-relevant trait. For example, the frequency of a candidate polymorphism is compared between two highly distinct populations: elite athletes and non-athletes. If the polymorphism is significantly more frequent in the athletic group, it is assumed that this polymorphism is associated with athletic status and contributes to elite performance. In general, a polymorphism is considered a candidate based on the physiological role of the gene and on how the difference in nucleotide sequence affects gene function and/or expression.
Association studies can be divided into three main categories: 1) case-control studies, which compare the frequency of genotypes in a cohort of controls (non-athletes) and a cohort of elite athletes; 2) cross-sectional studies, which compare selected physiological and/or performance data between different genotypes [25]; and 3) longitudinal studies, in which responses to a given intervention (e.g., exercise training or diet) are compared between genotype groups. All approaches are important to demonstrate the relevance of a genetic variant to performance [26]. However, polymorphisms emerging from association studies remain "candidate" genetic variants until the association is replicated in other independent cohorts and a plausible biological explanation for the impact of the polymorphism is formulated [25].
Case-control association studies are relatively cost-effective and easy to perform, especially when a large number of DNA samples from top-level athletes and controls is readily available, which makes this approach attractive for the initial screening of potential candidate polymorphisms. However, association studies do not establish cause-effect relationships, meaning that an association between a candidate polymorphism and elite athletic status is not sufficient to accept a candidate polymorphism as valid. Thus, providing further evidence on the influence of that polymorphism on physical performance is extremely important. This evidence can be produced using the cross-sectional and longitudinal prospective approaches detailed in the previous paragraph, as well as by determining a physiological role of the polymorphism in sports-related phenotypes. The critical steps for producing compelling evidence on the role of a genetic variant in enhancing sports performance are schematised in FIGURE 2.

Although relatively cost-effective and straightforward, association studies often face obstacles that may prevent researchers from drawing secure conclusions from the data obtained.
To circumvent these problems, some measures are recommended when performing association studies.
In some cases, a particular polymorphism correlates with athletic phenotypes in a group of individuals from a specific region and, later, the same association is observed in a different set of individuals from a distinct region [27][28]. This positive replication (i.e., a consistent association) strengthens the evidence for the influence of the polymorphism on phenotype. However, even if the replication occurs in more than one population, it does not mean that the same association will be found in every population of the world [29]. In fact, associations found in a study are frequently not replicated in subsequent studies [30][31][32][33]. Depending on the characteristics of the studies that showed the association (e.g., sample size, ethnic background and homogeneity of the athletic cohort in terms of competitive level and sports disciplines), such inconsistencies may be interpreted as: 1) the polymorphism is not relevant to physical performance; 2) the polymorphism is relevant in a population with a specific ethnic background; or 3) the polymorphism is relevant for some specific sports disciplines.
Depending on the ethnic background of the study cohort, the frequency of each genotype can vary dramatically [33][34]. Therefore, all polymorphisms reported in the literature should be replicated in different populations [4,16]. It is possible that some polymorphisms are relevant to performance only in some specific regions or under some specific conditions, whereas other polymorphisms may have a more "universal" effect.

Genome-wide studies
One major disadvantage of candidate-gene studies is that only one or a few (in the case of polygenic profile studies) genes can be assessed at a time. In view of this, new methods allowing the screening of the whole genome were developed.
Genome-wide linkage studies (GWLS) were the first approach to analyse genetic markers across the entire genome. This method identifies chromosomal regions that harbour genes affecting quantitative traits over generations (i.e., it identifies quantitative trait loci, QTL) [24]. GWLS have been used to discover QTL associated with a variety of diseases and other phenotypes [25].
More recently, technological advances gave rise to another technique, the so-called genome-wide association study (GWAS), capable of identifying genes, rather than genomic loci, associated with a phenotype [35]. Unlike GWLS, which require familial data (the basic unit of observation is a pair of relatives, usually siblings), GWAS analyse individual data [24]. This new approach is becoming increasingly popular in the search for variations that contribute to complex traits [24].
While candidate gene studies are driven by the theoretical impact that a variant would have on physical performance, GWAS do not make any prior assumption regarding the genes and variants involved in physical performance [2]. Due to our limited ability to select candidate polymorphisms for association with performance based only on available theory, the design of candidate-gene association studies will always be restricted to a certain degree [24]. In this sense, the fact that GWAS are "theory-free" and that polymorphism selection is based on observational data makes this approach more robust. This increases the chances of finding new and perhaps unexpected genes and variants affecting physical performance, which might open new avenues for investigation.
Despite being considered more robust than candidate-gene studies, GWAS are very expensive and, therefore, not widely used in sports sciences or not used in truly large athletic cohorts [35]. In 2008, the cost of a run containing two human genomes (30x coverage, Illumina HiSeq 2000 sequencing machine) was around US$60,000 [36]. Although this value has been falling (in 2011, the price of a similar run was around US$10,000 [36]), a very large study with adequate statistical power would still reach extremely high costs.
Because of the multivariate nature of GWAS, the p value must be many times lower than usual for a correlation to be accepted as significant (normally 5 × 10^-8 rather than 0.05) [2,37], which obviously requires a very large sample size to achieve a desirable statistical power. By their very nature, cohorts of fairly homogeneous top-elite athletes are small. Thus, reaching desirable statistical power in GWAS can be an enormous challenge in sports sciences. The study by P and W [36] describes a good example of how sample size can affect GWAS outcomes, reporting two studies designed to quantify the heritability of stature, a very stable and easily measured variable, using the GWAS approach. In one study, a set of ~30,000 individuals was analysed, and it was found that less than 5% of the variation in the phenotype is explained by genetics. In contrast, the other study, using a much larger set of individuals (n ≈ 180,000), showed that genetics explained 10% of the variance in height.
Likewise, the number of SNPs included in the analysis affects the statistical power of a GWAS. The greater the number of SNPs, the greater the number of comparisons carried out in the statistical analysis, so the p value threshold accepted for statistical significance is reduced [16]. Another limitation of GWAS is that, despite screening for genetic variants along the entire genome, only SNPs are detected in this analysis, meaning that other types of genetic variation, such as copy number variants, will not be captured [4,24].
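The relationship between the number of SNPs tested and the significance threshold can be illustrated with a Bonferroni-style correction, in which the nominal level is divided by the number of comparisons. This is a sketch under stated assumptions: the text does not name the correction method, and the figure of one million independent tests is a commonly used convention rather than a value taken from the studies cited here.

```python
import math


def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test significance threshold under a Bonferroni correction."""
    if n_tests < 1:
        raise ValueError("n_tests must be at least 1")
    return alpha / n_tests


# A single candidate polymorphism keeps the usual 0.05 level.
single = bonferroni_threshold(0.05, 1)

# Assumed convention: about one million independent tests genome-wide,
# which gives a threshold close to the 5 x 10^-8 quoted in the text.
genome_wide = bonferroni_threshold(0.05, 1_000_000)

print(single, genome_wide)
print(math.isclose(genome_wide, 5e-8))  # True
```

The more SNPs are included, the smaller the per-test threshold becomes, which is why larger samples are needed to keep statistical power acceptable.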
A massive amount of information is generated by one GWAS. However, as with any other association study, the results of GWAS can be vague or uninformative [38]. Within the same study, the following situations are possible: 1) a significant association is found between a SNP and a phenotype, and the result is replicated in other cohorts (true-positive); 2) a significant association between the SNP and the phenotype is found, but replication studies fail to confirm it in other cohorts (false-positive); 3) the GWAS was not sensitive enough to detect associations which were already confirmed in previous studies (lost results) [38]. It has been shown that non-replicating results and inconsistency in the magnitude of results (heterogeneity) are common flaws in GWAS [38].
In the last 5 years, several investigations were carried out using GWAS for complex phenotypes, mainly pathologies [35]. In 2011, B et al. [39] conducted the first study applying GWAS in a group of individuals (non-athletes) before and after a training period. They examined the association of SNPs with the VO2max responses to physical training.
The authors managed to perform the study in a quite large sample (> 1000 subjects) and to compare the results in different cohorts. The study successfully identified genetic variants associated with training responses, which made it possible to construct a panel of 21 SNPs that accounted for 49% of the variance in VO2max responses (p < 0.05). To date, there are no studies using the GWAS approach in a cohort of athletes. Although it is understandable that access to large cohorts of truly elite athletes represents a tremendous barrier to high-quality genome-wide studies in this population, a collective effort from research groups around the world is indeed necessary to undertake this extremely relevant type of study.

The role of sample size
The number of participants is probably the most important limitation of genetic association studies [26], especially when referring to elite athletes [40]. Because performance-relevant polymorphisms are normally found at low frequencies in a given population [12], small samples will probably return very small numbers of individuals presenting the rare genotype. Each polymorphism exhibits distinct frequencies in different populations, so the sample size necessary to detect an association can vary according to the polymorphism and to the population analysed [17,41]. Sample size is directly related to the statistical power of the study, and a reduced number of participants can hamper the drawing of firm conclusions [4]. This limitation becomes even more evident when multiple comparisons enter into the statistical model (e.g., in GWAS and polygenic studies), which makes the analysis more stringent and lowers the set level of significance [16].
Some authors claim that this limitation is justifiable since there is a very limited number of high-level athletes in most regions and countries. This makes the collection of a sufficiently large cohort almost impossible and explains why studies with elite athletes usually assess a small number of individuals (i.e., n < 100) [42]. Indeed, this is a very strong and truthful argument and, regrettably, not much can be done to enlarge athletic cohorts, especially in countries where competitive sport is less prominent. However, researchers in this area should endeavour to maximize the number of athletes in their cohorts. In fact, some authors advocate that research groups around the world should create an international consortium with DNA samples from elite athletes worldwide [4,16]. This is probably the best way to circumvent this relevant limitation in sample size. However, researchers should be aware that this procedure could result in groups that are heterogeneous in both athletic and ethnic backgrounds, which would add extra confounding variables to the analysis.
Alternatively, increasing the number of participants in the non-athletic (control) group can be an appealing way to minimize problems with small sample sizes and underpowered studies. For example, in a statistical simulation, when a group of athletes (n = 100; frequency of the rare allele = 30%) is compared with a control group (n = 100; frequency of the rare allele = 20%), no significant difference in allele frequency is observed between groups (χ² = 2.16, p = 0.14). However, as shown in FIGURE 3, even if the number of athletes remains unchanged (n = 100; frequency of the rare allele = 30%), it is possible to detect a significant difference between groups by increasing the number of controls to 325 (frequency of the rare allele remains 20%) (χ² = 3.85, p = 0.0497). As displayed in FIGURE 3, successive increases in the number of controls are paralleled by increases in statistical power.
However, beyond a certain point, further large increases in sample size result in merely slight increases in power. In view of this, researchers are advised to increase the number of controls to a maximum in their analysis. Possibly, a control sample size between 1000 and 1500 will yield a desirably good statistical power for the majority of situations.
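The two comparisons described above can be reproduced with a chi-square test on a 2 × 2 table. The sketch below uses only the Python standard library and rests on two assumptions that are not stated in the text: the table counts individuals carrying the rare allele (e.g., 30 of 100 athletes versus 20 of 100 controls), and Yates' continuity correction is applied. Under these assumptions the reported statistics are recovered; the function names are hypothetical.

```python
import math


def yates_chi_square(a: int, b: int, c: int, d: int) -> tuple:
    """Chi-square test (1 df) with Yates' correction for the table
    [[a, b], [c, d]]; returns (chi2, p_value)."""
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    # Continuity-corrected difference, floored at zero.
    corrected = max(0.0, abs(a * d - b * c) - n / 2)
    chi2 = n * corrected ** 2 / (row1 * row2 * col1 * col2)
    # Survival function of the chi-square distribution with 1 df.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


def compare_groups(n_athletes: int, n_controls: int,
                   freq_athletes: float = 0.30,
                   freq_controls: float = 0.20) -> tuple:
    """Build the 2x2 table (carriers vs non-carriers) and test it."""
    a = round(n_athletes * freq_athletes)
    c = round(n_controls * freq_controls)
    return yates_chi_square(a, n_athletes - a, c, n_controls - c)


for n_controls in (100, 325):
    chi2, p = compare_groups(100, n_controls)
    print(f"controls = {n_controls}: chi2 = {chi2:.2f}, p = {p:.4f}")
```

With 100 controls the statistic is about 2.16 (p ≈ 0.14), and with 325 controls it is about 3.85 (p just under 0.05), in line with the values quoted in the text.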

The importance of making comparisons between homogeneous groups
It is widely accepted that each sport discipline has its particular physiological, psychological, anthropometrical and biomechanical demands, which directly influence the characteristics that would most contribute to competitive success [43][44][45]. In fact, it has been considered that the polymorphisms influencing endurance sports are different from the polymorphisms influencing sprint sports, which display a greater demand for muscle strength and power [46]. Considering that some (if not all) of these characteristics are in part genetically determined, studies comparing genotypes between athletes should carefully select which sports disciplines will be included in a given group when categorizing athletes.
Most studies have categorized athletes into two opposing groups, namely sprint/power-orientated sports vs. endurance-orientated sports [31,47]. Although this is an interesting approach, since it would group together sports under the influence of the same genes and polymorphisms, the inclusion in the same group of sports that are not really analogous can represent a major flaw in the analysis. For example, football, 200-m swimming and powerlifting could be classified as power/sprint-orientated sports, even though the determinants of these disciplines are clearly different. Even nearly identical sprint/power-orientated sports, such as 50-m and 400-m swimming, could have quite different demands. In view of this, it becomes evident that researchers should include in the same group only athletes who compete in truly comparable disciplines.
The study by M et al. [48] compared the distribution of 6 polymorphisms between endurance runners (> 5000 meters) and endurance cyclists, sports that are usually clustered in the same group due to their similar metabolic demand. Interestingly, the frequency of the I and D alleles (indel polymorphism in the ACE gene) was different between these groups. Despite the similarities regarding the metabolic demands, factors such as mechanical efficiency and movement economy may have influenced the differential role of the ACE polymorphism in these sports [48].
Another matter of concern is how to categorize athletes according to competitive levels. Ideally, a polymorphism that exerts some influence on sports performance should be capable of differentiating not only athletes from non-athletes, but also elite athletes from non-elite athletes. Thus, the criteria utilised to categorise athletes into competitive groups can also be confounding factors in association studies. Usually, "high-elite" athletes are those who participate in Olympic Games or World Championships; "elite" are those who participate in international-level competitions (e.g., continental championships); "sub-elite" participate at the national level; and "non-elite" in state- and regional-level competitions. Although slightly different classifications have been used in the literature 40, this is an easy and straightforward manner to classify athletes according to competitive levels. However, researchers from different countries should be cautious in relation to how a particular sport is developed in their country. For example, judo is a very popular and developed sport in Brazil, whilst rugby is not. On the other hand, judo is not a popular and developed sport in New Zealand, whereas rugby is probably the most prominent sport. In this context, reaching national judo level in Brazil is probably more difficult than reaching international high-elite levels in judo in New Zealand. Likewise, a world-cup rugby player in Brazil would probably have fewer of the necessary characteristics to succeed in rugby than a national-level New Zealand athlete. Because of these national disparities, authors should ponder the most appropriate way to classify the athletes from their cohorts, taking into consideration how the specific sport is developed.

Single gene approach vs. polygenic profile
Most studies on genetics of sports performance have assessed the impact of only one variant on athletic performance 46. Due to the multifactorial nature of sports performance, the effect of a single variant is most likely to be small 49. According to F et al. 50, one important drawback of the single gene approach in sports sciences is that the selected gene may only marginally contribute to performance, or it may not be the "bottleneck" of performance, since many other genes may compensate for the altered function of the polymorphic gene. The regulation of every physiological system involved in exercise performance and training responses is dependent on a complex network of interconnected genes 51. Thus, it is assumed that the contribution of genetics to physical performance relies on the combined action of different genes and, consequently, a number of different polymorphisms account for the variation in sports excellence and training responses.
Despite their limitations, single gene studies are important to first identify polymorphisms associated with performance, especially when it is not possible to perform GWAS. However, because one single polymorphism would only account for a minor part of the total variation, a combination of polymorphisms (i.e., a polygenic profile) would provide a better model to explain how changes at the molecular level in DNA would affect the phenotype.
The concept of polygenic profile is now becoming more solid. It was recently shown that healthy individuals who possess more alleles associated with aerobic metabolism present better responses to aerobic training (according to VO2max responses to training) 39. Likewise, endurance athletes with a greater number of alleles already associated with endurance performance have a greater chance of being successful in endurance-orientated sports 52. Therefore, it is becoming increasingly clear that high-level athletes are genetically distinguished individuals, meaning that they probably present a combination of numerous polymorphic alleles associated with physical performance 12.
In 2008, W and F 17 proposed the calculation of the total genotype score (TGS). To calculate the TGS, the polymorphisms that have been previously associated with a target phenotype must first be selected. Each polymorphism then receives a score from 0 to 2 according to the individual's genotype. Homozygous genotypes for the allele associated with the phenotype receive the score "2", heterozygotes receive the score "1", and homozygous genotypes for the allele not associated with the phenotype receive the score "0". After the scores for each polymorphism are determined, they are summed and the result is expressed on a scale ranging from 0 to 100, which yields the TGS. Using this score, it is possible to assess the balance between several selected polymorphisms that are known to be relevant to performance 41.
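The scoring procedure described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the original authors' implementation: the function names, the two-character genotype encoding and the example panel are assumptions made here, and apart from ACTN3 R577X the marker names (`SNP_2`, `SNP_3`) and their favourable alleles are hypothetical placeholders. The sketch assumes that every polymorphism carries equal weight and that the summed score is rescaled to 0-100 by dividing by the maximum attainable sum (2 per polymorphism).

```python
from typing import Mapping


def genotype_score(genotype: str, favourable_allele: str) -> int:
    """Return 0, 1 or 2: the number of copies of the favourable allele."""
    if len(genotype) != 2:
        raise ValueError(f"expected a two-allele genotype, got {genotype!r}")
    return sum(1 for allele in genotype if allele == favourable_allele)


def total_genotype_score(
    genotypes: Mapping[str, str],
    favourable_alleles: Mapping[str, str],
) -> float:
    """Sum the per-polymorphism scores and rescale the sum to 0-100."""
    if not favourable_alleles:
        raise ValueError("the panel must contain at least one polymorphism")
    total = sum(
        genotype_score(genotypes[name], allele)
        for name, allele in favourable_alleles.items()
    )
    max_total = 2 * len(favourable_alleles)  # every locus scores at most 2
    return 100.0 * total / max_total


# Hypothetical sprint/power panel: marker name -> allele assumed favourable.
panel = {"ACTN3 R577X": "R", "SNP_2": "A", "SNP_3": "G"}
# Hypothetical athlete: one heterozygote, one optimal and one non-optimal genotype.
athlete = {"ACTN3 R577X": "RX", "SNP_2": "AA", "SNP_3": "TT"}

print(total_genotype_score(athlete, panel))
```

In this hypothetical example the per-polymorphism scores are 1, 2 and 0, so the athlete scores 3 out of a possible 6, i.e., a TGS of 50.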
From 2009 to the present day, the TGS has been used by some research groups and has proved to be a sensitive tool to differentiate endurance from strength/power athletes 31,47,[53][54]. Additionally, the use of the TGS has revealed that high-elite athletes have a polygenic profile significantly different from that of the general population, which probably favours their sports success 18.
The TGS assumes a dose-response effect, i.e., the more associated alleles an athlete has, the better his/her genetic profile for sports success. Hence, it is assumed that there is an additive effect of polymorphisms 4. Moreover, the sensitivity of the TGS to differentiate individuals with different genetic predispositions to excel in a given type of sport seems to be dependent on the number of polymorphisms included in the calculation 18.
The choice of which polymorphisms are included in the TGS calculation should be made in light of the characteristics of the studied population. For example, if a study aims to determine the polygenic profile of sprinters, polymorphisms associated with muscle size and strength, as well as anaerobic energy metabolism and other sprint-orientated phenotypes, should be included in the formula 41. In addition, the chosen polymorphisms should be only those truly and consistently associated with the phenotype. Otherwise, the TGS will lack power in differentiating athletes from non-athletes or sprinters from endurance athletes 17,41.
A potential limitation of the TGS is the fact that it considers all polymorphisms as influencing athletic performance to the same extent 4. With the currently available knowledge, it is impossible to determine the exact weight that each polymorphism should have in the TGS calculation, simply because the weight of each polymorphism in the regulation of the different sports-related phenotypes has not yet been determined 55.
The number of genes that are modulated in response to physical training is relatively large. In contrast, the scientific literature shows a relatively limited number of polymorphisms associated with physical performance in athletes, i.e., about 20 genes 17,41. The scores so far published have included 6-10 polymorphisms in the TGS calculation, which clearly represents only a small fraction of the genes that putatively affect sports-related phenotypes 52,56. As a consequence, the optimal combination of polymorphisms that would result in an ideal TGS model is yet to be determined 49. However, this does not preclude the TGS from being successfully used to distinguish endurance athletes from strength/power athletes and from non-athletes. Furthermore, the TGS will probably be updated and optimized in the upcoming years as the knowledge on single genes associated with performance improves 42. One important caveat, however, is that as the number of polymorphisms included in the calculation increases, the likelihood of finding an individual with optimum genotypes decreases exponentially 17,41, which could cause the TGS to lose sensitivity. Thus, an ideal model of TGS will include only the genes that most contribute to that specific trait. The exact number and the most important genes for each phenotype are fundamental questions that researchers should address in the future.
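The caveat about panel size can be made concrete with a small calculation. The sketch below assumes, for illustration only, that the polymorphisms are inherited independently, so that the probability of carrying the optimal genotype at every locus is the product of the per-locus optimal-genotype frequencies. The function name and the frequency of 0.25 used in the example are hypothetical choices, not data from the cited studies.

```python
from math import prod
from typing import Sequence


def probability_of_optimal_profile(optimal_genotype_freqs: Sequence[float]) -> float:
    """Probability of carrying the optimal genotype at every locus.

    Assumes the loci are independent, so the joint probability is the
    product of the per-locus frequencies of the optimal genotype.
    """
    if any(not 0.0 <= freq <= 1.0 for freq in optimal_genotype_freqs):
        raise ValueError("frequencies must lie between 0 and 1")
    return prod(optimal_genotype_freqs)


# Hypothetical panels in which every optimal genotype has a frequency of 0.25.
for n_polymorphisms in (6, 10, 20):
    p = probability_of_optimal_profile([0.25] * n_polymorphisms)
    print(f"{n_polymorphisms:>2} polymorphisms: p = {p:.3e}")
```

With these hypothetical frequencies the probability falls by a factor of four with each added polymorphism, which illustrates why a "perfect" profile quickly becomes vanishingly rare as the panel grows.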

Identification of new candidate polymorphisms
To choose a new polymorphism to be analysed in an association study, it is first necessary to choose a candidate gene to be explored. A convenient way to choose a candidate gene is to observe which genes are modulated during or after exercise 57. If a particular gene is consistently modulated, it would be worthwhile to search for structural variations in this gene that could influence phenotypes related to physical performance 58.
Recently, T et al. 57 used microarrays to screen mRNA expression in the vastus lateralis muscle of healthy subjects who underwent 20 weeks of aerobic training. Afterwards, the authors searched for polymorphisms in the genes whose expression accounted for the variation in VO2max responses. Six SNPs that may account for the variability in the cardiorespiratory response to aerobic training were identified.
Although useful, microarrays are somewhat expensive and time-consuming. The enormous amount of data generated might also represent another obstacle to the use of this technique. Alternative methods can therefore be employed to identify new candidate polymorphisms influencing athletic performance. A considerable amount of information on mRNA expression and exercise is available in the literature, and potential genes are waiting to be explored 59.
After choosing one candidate gene to be explored, an investigator should examine all genetic variations described for the gene, which can easily be done with online public databases (e.g., NCBI and UCSC) 4. For the vast majority of the genes, several structural variations will be found, including single nucleotide, indel and copy number polymorphisms. The selection of the best candidates should consider previously available data. Nonetheless, multiple polymorphisms for each gene should be tested in order to verify which one presents the best correlation with the phenotype 25.
There is a natural tendency in sports research to study genes involved in motor activities. However, more recent studies indicate that genes involved in psychological characteristics may also influence athletic performance [60][61]. Some prime examples are the genes 5HTT, BDNF and UCP2 62, and future studies should start focusing on the psychological aspects implicated in sports performance when selecting new candidate genes.

Strategies to identify physiological roles of a polymorphism
As previously discussed, unravelling the underlying physiological mechanisms by which a genetic variant affects performance is crucial to definitively associate that variant with performance. However, this is often a very laborious and elusive task. Nevertheless, a few experimental approaches might be of great use for this purpose.
Knowing whether or not the structural DNA variant affects gene and protein expression at the tissue level is probably the most fundamental question. If the polymorphism severely affects protein structure (e.g., a non-sense or a frameshift polymorphism), it is usually easier to observe its impact on physiology and on phenotype. In fact, the best-characterized polymorphism impacting physical performance is the R577X in the ACTN3 gene, which is a non-sense polymorphism leading to the synthesis of a nonfunctional protein 63. This is because the absence of the alpha-actinin-3 protein in humans carrying both polymorphic alleles can be mimicked in knockout animals, providing a very interesting model to evaluate how the lack of the protein impacts muscle structure, muscle metabolism and performance 64. However, most polymorphisms are SNPs, indels or copy number variants, and they may occur in noncoding genomic regions. Consequently, their impact on protein structure is less evident and the creation of a good animal model is often unfeasible. In these cases, human studies are indeed necessary.
The knowledge of the tissues and cells where the polymorphic gene is expressed should dictate the best approach to examine how the polymorphism affects gene and protein expression. For example, for a variant in a gene that is expressed exclusively in skeletal muscle, muscle biopsies should be collected from a group of individuals covering all genotypes. These samples could be submitted to a variety of analyses, including gene expression (e.g., qPCR), protein expression (e.g., western blotting), and morphological inspection (e.g., light or electron microscopy). Further analyses should also be carried out depending on the role of the gene. For example, if the gene encodes an enzyme, then comparing enzyme activity between genotypes is probably a very reasonable approach. Obviously, in some cases human tissue collection is not an option, especially if the gene is expressed in heart, bone, kidney, liver, brain or any other organ where a biopsy is too invasive and unjustifiable. This would imply the necessity for alternative and more indirect approaches. These scenarios illustrate quite well how difficult establishing a physiological link between a polymorphism and a phenotype can be.
In cases where an animal model can be generated so that the genetic variation in humans is properly mimicked, acceptable sample sizes for these animal studies can be quite small, since all other sources of variation are controlled. In contrast, when human studies are performed and groups of individuals with different genotypes are compared, sample size becomes critical. Because studies with humans, especially the cross-sectional ones, normally have poor control of major intervening variables (e.g., diet, exercise background, genetic background, use of medications and development conditions), it is imperative that all genotypes have a large enough number of participants in order to minimize the chances of groups being different due to confounding factors rather than to genotype itself. In this case, the importance of large samples is not only about statistical power, but also about decreasing the likelihood of assuming that genotype groups are different when the cause of the difference is not the genotype "per se".

Genetic interactions
A still unexplored, yet interesting and promising area of research in sports genetics is how genes interact with other genes to modulate exercise responses or to modulate a phenotype that is strongly related to performance 4. It is well recognised that genes do not act in isolation. Rather, there are complex interactions among many genes modulating phenotypes 42. Some polymorphisms might not have a meaningful impact on athletic performance alone, but the presence of other polymorphisms, through gene-gene interactions, may enlarge their impact upon phenotype, so that not the presence of a single polymorphism but, instead, the combination of some specific polymorphisms may be potentially beneficial to some performance-related phenotypes 4.
In addition to the interactions between genes, gene-environment interactions must be taken into consideration in future research, especially in light of the multifactorial nature of sports performance. After certain environmental stimuli, some polymorphisms may have greater or lesser influence on the responses to those stimuli 26. The replication of the results in different cohorts, under different environmental stimuli, is an indirect way to assess this interaction.
The researchers' primary efforts have been focused on the discovery of novel polymorphisms with the capability to influence physical fitness components. Once a picture has been established for one or more polymorphisms, possible gene-gene and gene-environment interactions should subsequently be investigated.

Rare variants
It is well accepted that the influence of common genetic variants on performance is slight, but the combination of several polymorphisms is probably more influential 16. On the other hand, it is highly plausible that rare genetic variants or mutations may eventually confer a very significant advantage to performance. Rare variants, therefore, are probably a very important aspect of the genetic variability that underlies sports excellence. However, due to the extreme difficulty in identifying and characterizing these variants, this is still poorly understood 15.
In most instances, mutations lead to loss of function, disease or disabilities, which adversely affect sports-related phenotypes 24. A good example is the mutation found in the PYGM gene that results in a deficit in carbohydrate metabolism (i.e., McArdle's disease) and, as a consequence, the patient presents intolerance to physical exercise. Many other examples of mutations that would cause incompatibility with athletic phenotypes could be provided. However, not all mutations are detrimental to physical performance. The notorious case of a mutation resulting in the absence of the myostatin protein illustrates how a rare variant can represent an extreme advantage to some performance-related phenotypes without causing any harm to health 65. It must be noted, however, that it is currently unknown whether this specific mutation in the myostatin gene is in fact favourable to physical performance.
Because each rare variant will be carried by no more than a few individuals, identifying these rare variants and correlating them with a specific sports-related phenotype is not an easy task. Nonetheless, efforts should be made in order to identify new rare variants and broaden our knowledge about the genetics of sports excellence. Performing a genome-wide scan of individuals presenting abnormally high phenotypes such as muscle mass, strength, flexibility and aerobic capacity, as well as extraordinarily excellent performance in sports, may be a reasonable approach to search for new rare variants.

The use of genetic markers to detect sports talent
The molecular mechanisms influencing athletic performance are regulated by genes, and these mechanisms are gradually being revealed. Some authors argue that testing for the presence of key genetic variants in youth can be used as a way to select potential athletes 61.
The early identification of potential elite athletes could, in theory, optimize training plans and competitive stimuli during growth and development, therefore increasing the chances of reaching the peak of physical performance 61. This process of identifying talent by means of polymorphisms (genetic testing) could, in principle, be revolutionary to the field of sport 25.
The publication of Y et al. 66 describing the influence of the R577X polymorphism in the ACTN3 gene on athletic performance had a major impact on the scientific community and on the general population. Some authors 67 have stated that the association of ACTN3 with physical performance is sufficient for using this polymorphism as a tool to select potential athletes. In 2004, just one year after the publication of Yang and colleagues' paper, an Australian company and later, in 2008, an American company began to develop and sell, at a cost of US $170, a genetic test, claiming that it would determine for which sport a person would have a greater "vocation" 68. Currently, there are at least 7 companies that sell genetic tests, now with a broader range of polymorphisms being offered 61.
Despite the strong appeal of these genetic tests, the information provided by them is useless to anyone who seeks a competitive career in any sport. As discussed throughout this paper, sports performance is multifactorial. Genetics comprises only one of many contributing factors. Moreover, the genes and variants that may have a positive effect on performance are numerous, all of them under the regulation of an extremely complex network of other genes and variants, among other factors.
Therefore, the presence or the absence of a few polymorphisms will never have such a predictive power. The literature presents many good examples of very successful athletes who, nevertheless, presented a genetic profile that could be considered "unfavourable". Regarding the ACTN3, which is one of the best-characterized polymorphisms 69, there are athletes achieving great results in strength activities who do not display the RR genotype 70.
The Council of Europe Bioethics Convention and the Genetic Information Non-Discrimination Act in the US consider the use of genetic testing to predict performance an unethical action, and whether such tests should be banned is currently being evaluated 71. Besides the absence of scientific validity and the lack of useful information, three other important ethical issues must be highlighted in relation to information misuse: 1) the autonomy of the individual not to know their genetic information, since some polymorphisms associated with physical performance may also be associated with diseases; 2) invasion of privacy, as personal information can be passed to other people and used in a discriminatory way; 3) professional misconduct and diminished opportunities, as coaches and trainers may not be willing to invest their time and efforts in an individual without "genetic predisposition" to sports 61.
With the intention of finding gifted individuals at early stages, the major target of genetic testing would be children and youths. Some authors point out that fanatical relatives could push their children to train very hard if they believe, based on the results of genetic testing, that the children have better chances of success 72. Excessive training loads at young ages are known to represent a health hazard, and may result in overtraining, burnout and sport abandonment 73. It is also important to note that, as highlighted by H 74, an athlete who knows his/her genetic information may feel discouraged from continuing to practise, especially if the information falls short of his/her expectations.
A potential use of genetic testing is related to the individualized prescription of exercise 26. Several researchers argue that genetics should be used as an aid for improving exercise prescription and optimizing training responses in non-athletes [75][76][77][78][79][80]. Although some authors have already published training recommendations according to athletes' genotype 67, it must be noted that there is no currently available information to support any genotype-directed training for athletes. In fact, there is no study addressing how genotypes influence training responses in athletes. Although in principle the proposal of genetic tests is revolutionary, at the moment, we still have no evidence supporting the creation of any model with acceptable predictive value. At present, the use of genetic tests has no advantage over the traditional methods for talent identification already used by trainers 25.

Conclusion
The future of genetic studies involving athletes is promising. In recent years, many polymorphisms have been associated with athletic phenotypes, but definitive confirmation of association and of the underlying physiological mechanisms has proven to be a difficult task. The challenges to progress in this novel area are enormous, but a variety of experimental approaches can be used to unravel part of the mystery. Researchers and the general population should be conscious of the implications of the misuse of genetic information. While some people may claim that genetic information could be used to detect talent and to drive athletic development, it must be noted that there is no scientific evidence for the predictive value of genetics in sports. The most appropriate statement at the moment is that genetics is only one out of many contributing factors to athletic performance, and sometimes it may play only secondary roles. It will be a long way until we know exactly what the role of genetics is for each sport, which variants account for it at the molecular level, and how they work.

a. In fact, some genes are present in more than one copy per haploid genome (considering only the 23 non-redundant chromosomes), which means that there are more than two copies of these genes per diploid genome (considering the entire 46 chromosomes).

FIGURE 1 - Contribution of genetic factors to performance-relevant quantitative traits.

FIGURE 2 - Critical steps and experimental approaches necessary to validate a candidate gene as relevant to sports performance.

FIGURE 3 - Level of significance found considering a fixed sample of athletes and varying the amount of nonathletic subjects (controls).