Allele Speciﬁ c Expression (ASE) analysis between Bos Taurus and Bos Indicus cows using RNA-Seq data at SNP level and gene level

: In the current study, allele specifi c expression analysis was performed in two subspecies cows (Bos taurus and Bos indicus) at SNP and gene levels. RNA-Seq data of 21,078,477 and 20940063 paired end reads from pooling of whole blood samples (Leukocyte) from 40 US Holstein (Bos Taurus) and 45 Cholistani cows (Bos indicus) obtained from SRA database in NCBI. Quality control and trimming of row RNA-Seq data were processed by FASTQC and Trimmomatic softwares. The transcriptome was assembled by TopHat2 software in two cow’s population by aligning and mapping the RNA-Seq reads on bovine reference genome. The SNPs were discovered by Samtools software and ASE analysis was performed by Chi-square test. Results showed that 50183 and 137954 SNPs were discovered on the assembled transcriptome of Holstein and Cholistani cow samples, respectively, and 15308 SNPs were common in both breeds. 10158 SNPs from 50183 (20%) in Holstein and 31523 SNPs from 137954 (23%) in Cholistani cows were identifi ed as ASE-SNPs. Reference allele and alternative allele count in Holstein and Cholistani cows were 3041 and 7155, respectively. Among 131 discovered SNPs in 41 genes with different expression in Holstein and Cholistani cows, 31 ASE-SNPs (5 in Holstein; 26 in Cholistani cows) were discovered.


INTRODUCTION
Allele Specifi c Expression (ASE) is the phenomena that two alleles of the same loci are expressed differently (Gu & Wang 2015), and its a powerful method that measures the expression of each allele through SNP in RNA samples. ASE is an important aspect of gene regulation and one of the important genetic factors that lead to phenotypic variation can be used to identify the variance of gene regulation factors (Gaur et al. 2013, Mayba et al. 2014). Although the majority of genes are expressed equally from both alleles, some genes are differentially expressed. Besides the gene expression differences between species, the inter individual differences in gene expression are often highly heritable and can be highly context-specific (Wayne et al. 2004, Gibson & Weir 2005, Hughes et al. 2006, Lemos et al. 2008, Ayroles et al. 2009, McDaniell et al. 2010. ASE may accumulate with genetic divergence and possibly with adaptation to different environments and are responsive to dynamic developmental processes (Von Korff et al. 2009). ASE assays can be used to identify cis, trans and cis-by-trans regulatory variation (Main et al. 2009).
RNA sequencing (RNA-Seq) is a powerful new method for mapping and quantifying transcriptomes developed to analyze global gene expression. In other words, RNA-Seq is a next generating sequencing based technology for studying of whole transcriptome and gene expression. This technique provides insights at multiple levels into the transcription of the genome as it yields sequence, splicing and expression-level information, so provides a far more precise measurement of levels of transcripts and their isoforms than other methods (Wang et al. 2009). It simultaneously enables study of transcriptomics sequences and very accurate quantitative gene expression (digital expression). Hence, these data are very suitable for high-throughput study of expression level of all transcribed genes and their SNPs. Recently, RNA-Seq has also been used as an efficient and cost-effective method to systematically identify SNPs in transcribed regions in different species (Cloonan et al. 2008, Morin et al. 2008, Chepelev et al. 2009, Cirulli et al. 2010. Transcription is the first step in translation of genome to function underlying genetic codes. Therefore, transcriptase might fill the gap between genotype and phenotype and help understanding the mechanisms from sequence to function (Wang et al. 2009).
Previous studies discovered SNPs in bovine milk transcriptome using RNA-Seq (Canovas et al. 2010, Wickramasinghe et al. 2012, Banabazi et al. 2016, Pareek et al. 2016). It has been detected 19,175 genes expressed in milk samples corresponding to approximately 70% of the total number of analyzed genes. The SNP detection analysis revealed 100,734 SNPs in Holstein samples, and a large number of those corresponded to differences between the Holstein breed and the Hereford bovine genome (Canovas et al. 2010). Chitwood et al. (2013) were analyzed transcriptomics data to identify SNP in individual blastocyst expressed genes, and individual SNP were examined to characterize allele specific expression. Expressed biallelic SNP variants with allelic imbalances were observed in 473 SNP, where one allele represented between 65-95% of a variant's transcripts.
In recent years, single nucleotide polymorphisms (SNP) have been the most important and efficient tool in animal breeding. About 40% of the SNPs in the genes cause a change in an amino acid. SNPs are either transition or transversion. Transitions are interchanges of two-ring purines (A↔G) or onering pyrimidines (T↔C), while transversions are interchanges of purine to pyrimidine and viceversa (G↔C، G↔T، A↔C ،A↔T). Arefnezhad et al. (2015) reported that transition and transversion nucleotide replacement were 1155417 and 512986 in Caspian horse, respectively, and replacement ratio of transition to transversion (Ts/Tv) for SNPs was 2.25.
The importance of understanding transcriptomic variation is obvious as the role of gene expression in shaping phenotypes is well documented. In particular, the transcriptomic variation among cattle breeds may provide mechanistic knowledge on their differentiation on phenotypes including appearance, physiological, behavioral, and production traits. There is accumulating evidence that variation in gene expression, presumably controlled by genomic variations within regulatory elements, contributes to phenotypic variation (Passador-Gurgel et al. 2007). There are substantial phenotypic difference between Holstein and Cholistani cattle. In particular, they differ remarkably in their resistance to thermal stress, parasites, and diseases (Huang et al. 2012).
In the current study, SNP discovery and Allele Specific Expression analysis were performed in two subspecies cows (Bos taurus and Bos indicus) at SNP level and gene level. We used mRNA-Seq to characterize and compare the Leukocyte transcriptomes of US Holstein and Cholistani cows. These variations may provide partial explanations for differential phenotypes between cattle breeds, particularly between Bos taurus and Bos indicus cattle.

Quality control and preparation of RNA-Seq data
After data editing, the removed and low quality reads in both breeds were almost equal and relatively low. For example, amongst the 20940063 initial reads in Cholistani cows, 19379487 reads had high quality and 1560576 reads had low quality, therefore, 5.7% reads were removed from the analysis.
The length of whole transcriptome assembled, for example 52798651 bases in Holstein, indicates around 2% of the whole genome (around 2.6 Mbp) expressed as mRNA. In Cholistani cows, read mapping rate for forward and reverse reads were 81.3 and 79.9%, respectively, and multiple alignments rate was about 9.4%. Overall read mapping was 80.6% and concordant pair alignment was 70.1%. In Holstein cows, read mapping rate for forward and reverse reads were 66.3 and 55.4%, respectively, and multiple alignments rate was about 7.2%. Overall read mapping was 60.8% and concordant pair alignment was 51.3%.

SNP and ASE-SNP discovery
After quality control and filtering, 50183 and 137954 SNPs were discovered on the assembled transcriptome of US Holstein and Cholistani cow samples, respectively, and 15308 SNPs were common in both breeds. The number of discovered SNPs in Cholistani cows (Bos Indicus) was approximately three times higher than Holstein (Bos Taraus) cows (Table I).
Based on the results of Chi-square (χ2) test on 3041 and 7155 loci in the Holstein and Cholistani cows, respectively, it was found that number of reference and alternate alleles were equal.
Totally, in Holstein cows 10158 from 50183 SNPs (20%) were identified as ASE-SNPs. From 10158 loci, number of imbalance alternate and reference alleles were 5006 (49%) and 5152 (51%), respectively. There is generally some bias toward reference allele. This indicates the reference genome has been applied well for mapping RNA reads on both subspecies.

SNP and ASE-SNP types on SNP level and gene level
In the present study, 12 SNP types were identified (4 transition and 8 transversion) and the most commonly SNPs were transition SNPs, including 69.6% in Holstein cows and 70.6% in Cholistani cows (Table 2). Replacement ratio of transition to transversion (Ts/Tv) for SNPs was 2.3 and 2.4 in Holstein and Cholistani cows, respectively. The results obtained by Arefnezhad et al. (2015) confirmed this concept.
In ASE-SNPs, the percentage of transition increased from 69.6% to 71% and 70.6% to 73% in Holstein and Cholistani cows, respectively. Replacement ratio of transition to transversion (Ts/Tv) for ASE-SNPs increase from 2.3 to 2.4 and 2.4 to 2.7 in Holstein and Cholistani cows, respectively (Table II).
In transcriptome of US Holstein and Pakistanian Cholistani cows' population, 24616

DISCUSSION
Based on the results of current study the number of discovered SNPs in Cholistani cows (Bos Indicus) was approximately three times higher than Holstein cows (Bos Taraus). Because, for the alignment of both species; which Holstein is a bos taurus and Cholistani (zebo) is a Bos indicus; used a same reference genome with Herford origin, which is also a Bos taurus cow. In addition, stringent settings of tophat2 program were not used in alignment, as with large number of mismatch between the nucleotides on the transcriptome of Cholistani cows and reference genome, alignment may still be successful. Therefore, in SNP discovery analysis, all these mismatches were considered as SNP. Also, above mentioned settings increase relative alignment and mapping rate. Some additional discovered SNPs on the tanscriptome of Cholistani cow are due to 20% higher alignment and mapping rate in Cholistani compared to Holstein cows (70.1% versus 51.3%). The number of discovered SNPs did not correlate with chromosome length (Table  I). So, transcription across the genome does not occur with a homogeneous distribution with the same coverage. In other words, some regions contain more candidate genes or important genes that transcription is more intense and deeper in those regions. So, these regions have a larger share of the assembled transcriptomes. Also, the SNPs in these regions have high frequency and remain after filtration.
By SNP screening process, Allelic specific expression (ASE) was identified in both American Holstein and Pakistani Cholistani cows. Gene's expression levels in Cholistani and Holstein cows have been shown in Table III. Results showed that there are significant different between these two subspecies (P.value < 0.01). Gene ontology (GO) enrichment and pathway analysis revealed that these genes are involved in 20 pathways. A large number of genes are involved on immune response pathways, the electron transport chain and the pathway of translate. These pathways maybe effect on different levels of heat stress and disease resistance. Results showed that most of the genes in metabolic pathways had high expression in Zebo while these genes had low or no expression in Holstein cows, likewise many of these genes are involved on immune pathways in Cholistani cows. Some factors effect on gene expression difference in mentioned two sub-species including: mutation in genes (as Single Nucleotide Polymorphism), epigenetic effects including allele specific expression in this article, environmental effects and gene expression regulatory effects (gene interactions as gene-network). Banabazi et al. (2016) were found 53478 and 145443 SNPs across the genome on the transcriptome of Holstein and Cholistani cows respectively; that 178 SNPs (24 SNPs in Holstein cows and 154 SNPs in Cholistani cows) were found in 41 detected gene with different expression in current research.
Based on the results there was no SNP in some genes. Generally, a portion of difference in gene expression is due to SNPs and also it could be caused due to regulation of gene expression under different condition or due to epigenetic effects, such as allelic specific expression. The expression difference between two alleles in a single-nucleotide position causes phenotype diversity and probably explains the large part of variances between these two bovine subspecies, especially in diversity, susceptibility to disease and parasites, tolerating environmental stress such as biological and nonbiological stresses in different environmental conditions. While, differential gene expression analysis or even allelic specific expression in gene level may not be able to explain phenotype diversity.