Whole genome SNPs discovery in Nero Siciliano pig

Abstract Autochthonous pig breeds represent an important genetic reserve to be utilized mainly for the production of typical products. To explore the genetic variability of the Nero Siciliano breed, here we present for the first time whole genome sequencing data and SNPs discovered in a male domestic Nero Siciliano pig compared to the latest pig reference genome, Sus scrofa 11.1. A total of 346.8 million paired reads were generated by sequencing. After quality control, 99.03% of the reads were mapped to the reference genome, and over 11 million variants were detected. Additionally, we evaluated sequence diversity in 21 fitness-related loci selected based on their biological function and/or their proximity to relevant QTLs. We focused on genes that have been related to environmental adaptation and reproductive traits in previous studies regarding local breeds. A total of 6,747 variants were identified, resulting in a rate of 1 variant every ~276 bases. Among these variants, 1,132 were novel to the dbSNP151 database. This study represents a first step in the genetic characterization of the Nero Siciliano pig and also provides a platform for future comparative studies between this and other swine breeds.

Preservation of genetic variability that is used or potentially usable for the production of food or non-food raw materials, or that is related to social, cultural, and economic aspects, represents a challenge of fundamental importance on a planetary level. Although the difficulty of conserving biodiversity in species of zootechnical interest is not a recent concern, in recent years the need to preserve genetic variability within breeds has increased and is important in production systems (Kuec et al., 2012). Animal genetic resources and their management systems are an integral part of ecosystems and productive landscapes in Italy, especially in Sicily. The role of livestock is, more than ever, to provide sufficient food for humans that is protein-rich, safe and healthy, and with high nutritional and organoleptic values. Local pig breeds can be used for the production of raw materials particularly suitable for typical processed products. The higher economic value of typical productions compared to conventional commercial products and the growing consumer preference for quality food could support plans for livestock biodiversity conservation (Herrero-Medrano et al., 2013; Wilkinson et al., 2013). In this respect, governments, institutional breeding organizations, private breeders and market demand play a crucial role in the endeavor to protect and valorize local breeds (Tapio et al., 2006; Ollivier, 2009).
Although the interest in local pig breeds has increased significantly in recent years, only a few of them have been included in whole-genome sequencing projects (Esteve-Codina et al., 2011; Herrero-Medrano et al., 2014). Knowledge of the genetic background of these local breeds is very important, as many of them have unique characteristics that could help address the challenges related to climate change, the increase in world population, and food security and nutrition, as highlighted in the Domestic Animal Diversity Information System (DAD-IS) of FAO.
The Nero Siciliano pig is an autochthonous genetic type of the rural areas in Sicily (Italy). It lives in the woods of the Nebrodi and Madonie mountains and is reared in extensive and semi-extensive systems, making good use of pasture and other natural plant resources following the traditional practices used in this area. It is resistant to disease and has great potential for adaptation to difficult environments, as it has a great ability for rooting and for procuring food. Its "Register of Native Breeds" was established in 2001 and now contains about 14,000 animals, of which 5,000 are sows (ANAS - Italian Pig Breeders Association, 2018), from over 128 farms. The meat obtained from these pigs is sold at a higher price than that of commercial pigs, and in 2005 a request was made to allow labelling fresh Nero Siciliano meat with the Protected Denomination of Origin (D'Alessandro et al., 2007). Black pigs are rustic, disease-resistant animals that live well in harsh conditions, but they run a high risk of losing their original traits because there is no real plan for genetic selection or for setting up appropriate breeding systems and controls. The genetic variability of the Nero Siciliano pig has been assessed with various genetic markers in several studies on the molecular characterization of genetic structure and on the analysis of coat colour genes (MC1R and KIT) to evaluate their usefulness for breed traceability (Russo et al., 2004; Fontanesi et al., 2010; Guastella et al., 2010).
All the procedures used in this research were in compliance with the European guidelines for the care and use of animals in research (Directive 2010/63/EU). A blood sample from a male Nero Siciliano pig was used for DNA extraction. The individual was chosen for this study as one of the most representative boars of this breed, registered in the "Register of native breed" (ANAS; ID:163347). The leukocyte fraction recovered from the fresh whole blood sample was used for total genomic DNA (gDNA) extraction using the Wizard® Genomic DNA Purification Kit (Promega Corporation, Italy), following the manufacturer's instructions. For DNA quantification, a Qubit 2.0 Fluorometer was used with the Qubit dsDNA HS Assay Kit (Thermo Fisher, Italy). DNA quality was assessed with a Nanophotometer P-330 (Implen GmbH) and also by visual inspection after agarose gel electrophoresis (1% agarose in 1X TAE buffer). A PCR-free library was prepared with the TruSeq DNA kit (insert size 350 bp) using 1 µg of gDNA and following the protocol provided by Illumina. Paired-end sequencing was carried out on a HiSeq X platform (Illumina).
The sequenced raw reads were checked using the FastQC program and cleaned with Trimmomatic v. 0.36 (Bolger et al., 2014) to remove adapters and low-quality sequences (Phred score < 30). Good quality reads were mapped against the Sus scrofa reference genome (version 11.1; GenBank: GCA_000003025.6) with BWA (version 0.7.12-r1039), and mapping quality was evaluated using Qualimap2 (Okonechnikov et al., 2016). Single nucleotide polymorphism (SNP), short insertion and deletion (INDEL), and structural variant (SV) analyses were performed using the SUPERW and PINDEL pipelines (Ye et al., 2009; Sanseverino et al., 2015). The resulting variants were further filtered using the following parameters: QUAL (phred-scaled quality score of called variant) ≥ 30, DP (number of high-quality bases for called variant) ≥ 10, AD (allele depth) ≥ 10, and removal of all called variants that showed the same genotype as the reference. Putative effects of SNPs were evaluated using SnpEff software v4_3m_core (Cingolani et al., 2012). We further focused on 21 fitness-related gene sequences (Table 1) obtained using samtools and bcftools (Li, 2011).
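The filtering criteria above can be illustrated with a minimal Python sketch. It is not the pipeline used in this study: the `VariantCall` record, its field names, and the example calls are invented for illustration, and AD is assumed here to mean the depth supporting the alternative allele.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VariantCall:
    """Hypothetical, simplified variant record (not a full VCF line)."""
    chrom: str
    pos: int
    qual: float      # phred-scaled quality of the call (QUAL)
    depth: int       # high-quality bases covering the site (DP)
    alt_depth: int   # assumed meaning of AD: reads supporting the alternative allele
    genotype: str    # e.g. "0/0", "0/1", "1/1"


# Genotypes identical to the reference are discarded.
REFERENCE_GENOTYPES = {"0/0", "0|0"}


def passes_filters(call, min_qual=30.0, min_depth=10, min_alt_depth=10):
    """Return True if the call meets QUAL >= 30, DP >= 10, AD >= 10
    and does not carry the reference genotype."""
    if call.genotype in REFERENCE_GENOTYPES:
        return False
    return (
        call.qual >= min_qual
        and call.depth >= min_depth
        and call.alt_depth >= min_alt_depth
    )


# Invented example calls, for illustration only.
calls = [
    VariantCall("chr6", 1000, 55.0, 38, 20, "0/1"),   # passes all thresholds
    VariantCall("chr6", 2000, 12.0, 40, 22, "0/1"),   # fails QUAL
    VariantCall("chr3", 3000, 90.0, 8, 8, "1/1"),     # fails DP and AD
    VariantCall("chr16", 4000, 70.0, 35, 0, "0/0"),   # reference genotype
    VariantCall("chr1", 5000, 30.0, 10, 10, "1/1"),   # exactly at thresholds
]

kept = [c for c in calls if passes_filters(c)]
```

The thresholds are inclusive, matching the "≥" in the text, so a call sitting exactly at QUAL 30, DP 10, and AD 10 is retained.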
Subsequently, the resulting high-impact mutations were aligned and manually inspected with MEGA7 using the reference genomic and relative transcript sequences retrieved from GenBank, in order to evaluate the putative functional role of the variants on the respective protein sequences. Variants called by SUPERW and PINDEL were compared with bedtools intersect, and duplicates were removed from the PINDEL output. In order to detect novel SNPs, SnpSift (Cingolani et al., 2012) was run against the dbSNP151 database (ftp.ncbi.nih.gov/snp/organisms/pig_9823/), and all resulting novel SNPs were manually examined and confirmed.
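The idea behind removing PINDEL calls that duplicate SUPERW calls can be sketched as an interval-overlap test. This is only an illustration of the concept, not the bedtools command used in the study; the function names and coordinates are hypothetical, and intervals are assumed to be half-open (start inclusive, end exclusive), as in BED files.

```python
def overlaps(a, b):
    """True if two (chrom, start, end) intervals share at least one base.
    Coordinates are assumed half-open, as in BED files."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def drop_overlapping(pindel_calls, superw_calls):
    """Keep only the PINDEL intervals that overlap no SUPERW interval.
    Quadratic in the number of calls, which is fine for a small sketch
    but not for genome-scale data."""
    return [
        p for p in pindel_calls
        if not any(overlaps(p, s) for s in superw_calls)
    ]


# Invented coordinates, for illustration only.
pindel = [("chr3", 100, 105), ("chr3", 200, 210), ("chr6", 100, 105)]
superw = [("chr3", 103, 104)]

unique_pindel = drop_overlapping(pindel, superw)
```

Here the first PINDEL interval is dropped because it covers the SUPERW call on the same chromosome, while the interval with the same coordinates on a different chromosome is kept.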
To explore the genetic resources of this breed, here we present for the first time the whole genome sequencing analysis of a male domestic Nero Siciliano pig, as well as a comparison with the most recent pig reference genome (Sscrofa 11.1) released by the International Swine Genome Sequencing Consortium and improved in annotation and assembly by Warr et al. (2015). In particular, we focused our attention on 21 genes that were selected according to their function and/or their association with specific traits (Table 1). These genes have been chosen because they affect phenotypes related to rusticity, adaptability to poor conditions of management and feeding, and great resistance to diseases, all these representing some of the most distinctive features of autochthonous breeds, especially the Nero Siciliano pig.
In this study, a total of 346.8 million raw paired-end reads were produced by Illumina HiSeq X sequencing. After quality filtering and trimming, ~344.3 million (99.29%) high-quality reads were mapped to the S. scrofa reference genome, with a mean coverage of 39.5X. A total of 11,253,945 genetic variants were detected by SUPERW in this study. Of these, ~82% were SNPs, whereas ~12% and 5% were short insertions and deletions, respectively. Moreover, more than 58% of the detected SNPs (6,555,556 variants) were heterozygous, while the remaining 42% were found in the alternative homozygous state. The overall observed frequency was 1 variant every 222 bases, with a SNP mutation rate of 1/269 bp. However, we cannot confirm that all DNA mutations detected in this study segregate in the Nero Siciliano breed, as only one sample was considered.
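As a rough consistency check on these figures, the reported variant count and densities can be combined in a few lines of Python. The sequence length used as denominator is not stated in the text, so the sketch back-calculates it from the reported values; the variable names are illustrative.

```python
# Values reported in the text.
total_variants = 11_253_945
bases_per_variant = 222      # 1 variant every 222 bases
bases_per_snp = 269          # SNP rate of 1/269 bp

# Sequence length implied by the overall variant density
# (a back-calculation, not a figure given in the article).
implied_length_bp = total_variants * bases_per_variant

# Number of SNPs implied by the SNP density over the same length,
# and the share of all variants that this represents.
implied_snps = implied_length_bp / bases_per_snp
snp_fraction = implied_snps / total_variants

print(f"implied length: {implied_length_bp:,} bp")
print(f"implied SNPs:   {implied_snps:,.0f}")
print(f"SNP share:      {snp_fraction:.1%}")
```

The implied length is about 2.5 Gb, and the implied SNP share falls between 82% and 83%, in line with the ~82% reported above.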
SnpEff analysis showed that most of the variants recognized were located in non-coding regions of the genome, such as introns and intergenic regions (Figure 1a). Approximately 36% missense, 0.4% nonsense, and 63.6% silent mutations were observed, resulting in missense/silent and Ts/Tv (transition/transversion) ratios of 0.5617 and 2.3956, respectively. The Ts/Tv ratio was similar to that of other pig genomes (Kang et al., 2015), while the observed SNP mutation rate was slightly higher than that reported by Jungerius et al. (2005).
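For readers unfamiliar with the Ts/Tv ratio, the following Python sketch shows how it is defined: transitions are purine-to-purine (A↔G) or pyrimidine-to-pyrimidine (C↔T) substitutions, and all other substitutions are transversions. SnpEff computes this statistic internally; the functions and the toy SNP list here are illustrative assumptions, not the study's data.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def is_transition(ref, alt):
    """True for A<->G or C<->T substitutions; any other change between
    two different bases is a transversion."""
    pair = {ref.upper(), alt.upper()}
    return len(pair) == 2 and (pair <= PURINES or pair <= PYRIMIDINES)


def ts_tv_ratio(snps):
    """Ts/Tv ratio for an iterable of (ref, alt) pairs.
    Assumes every pair is a valid SNP (two different bases from ACGT)."""
    snps = list(snps)
    transitions = sum(1 for ref, alt in snps if is_transition(ref, alt))
    transversions = len(snps) - transitions
    if transversions == 0:
        raise ValueError("Ts/Tv is undefined without transversions")
    return transitions / transversions


# Toy SNP list, for illustration only: 5 transitions and 2 transversions.
toy_snps = [
    ("A", "G"), ("C", "T"), ("G", "A"), ("T", "C"), ("T", "C"),
    ("A", "C"), ("G", "T"),
]

ratio = ts_tv_ratio(toy_snps)
```

The missense/silent ratio follows the same pattern: dividing the rounded percentages given above (36 / 63.6) gives roughly 0.57, close to the reported 0.5617, which was presumably computed from the unrounded counts.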
Among the structural variants identified by PINDEL, we observed a total of 808,486 insertions, 452,926 deletions, 196,971 replacements, 2,383 tandem duplications, and 1,029 inversions. Of these, 586,686 were heterozygous, whereas 875,109 were in alternative homozygosity.
Using the panel of fitness-related genes selected in this study, we identified a total of 6,747 SNPs and short INDELs (Figure 1b), which were classified according to Cingolani et al. (2012) as 7 "high", 35 "moderate", and 54 "low" impact variants and 6,651 modifiers (Table 2). This resulted in a mutation rate of 1 per ~276 bases; for further details see the supplementary material Tables S1, S2, and S3. Among the total variants identified, 1,132 were novel, consisting of 476 heterozygous variants and 656 in the alternative homozygous state.
The seven high impact mutations, all in the alternative homozygous state, affected five out of the 21 examined genes: VPS13A (Vacuolar protein sorting 13 homolog A); AZGP1 (Alpha-2-glycoprotein 1, zinc-binding); LCORL (Ligand-dependent nuclear receptor corepressor-like protein); FUT1 (Fucosyltransferase 1); and PRLR (Prolactin Receptor). These variants consisted of one SNP and six nucleotide insertions. Four of the insertions were gain of function mutations and restored the reading frames of the VPS13A, AZGP1, FUT1 and PRLR genes, as evidenced by comparative analysis with the reference genome and its transcripts. The remaining two insertions produced a premature stop codon in AZGP1 and the loss of a stop codon in LCORL, whereas the single SNP detected was a missense mutation (ACG→GCG; Thr103→Ala103) affecting the FUT1 gene. Five of these seven high impact mutations were novel to the dbSNP database (see Table S1). Among the structural variations affecting the subset of fitness-related genes, we observed 101 replacements (RPL), 132 insertions, and 112 deletions. Of these, 203 were heterozygous and 142 were in the alternative homozygous state. Figure 2 shows the gene-wide distribution of all detected mutations, including the related sequencing coverage for all genes investigated.
SNP discovery analysis of the 21 fitness-related genes showed a mutation rate consistent with the whole-genome data. We focused on high impact mutations that may affect the gene product. The VPS13A gene plays a role in the maintenance of thermostatic conditions during thermal stress and is involved in blood circulation (Groenen, 2016). We found a novel nucleotide insertion (G, genome position: 230125827) that causes a frameshift mutation restoring the VPS13A reading frame.
AZGP1 is a putative candidate gene for adaptation to the environment. A mutation in this gene, which overlaps QTLs for the number of vertebrae (Beeckmann et al., 2003; Harmegnies et al., 2006), abdominal fat, and ear shape and size (Ma et al., 2009), was identified in Mangalica, Cinta Senese, and one European wild boar, but not in commercial pigs (Herrero-Medrano et al., 2014). Furthermore, AZGP1 is correlated with lipid mobilization and is considered a candidate gene for body weight regulation and obesity in humans (Mracek et al., 2010). We found two novel frameshift mutations in this gene, both nucleotide insertions (genome positions: 7874326 and 7874521 of chromosome 3), which could affect its function, with possible effects on fat deposition. The LCORL gene overlaps a QTL involved in morphological modifications that occurred during domestication, namely elongation of the back and an increased number of vertebrae (Rubin et al., 2012). This gene is considered a candidate gene for body size. A known LCORL frameshift mutation (rs791023757; genome position: 12829718) was detected in this study and resulted in the loss of a stop codon. Unfortunately, no phenotypes have been associated with this mutation so far, as evidenced by the lack of information in the dbSNP database.
We also found two high impact variants in the FUT1 gene. This gene encodes a membrane protein involved in the synthesis of a precursor of blood group antigen. Previous studies showed that polymorphisms in this gene are associated with the adhesion and colonization capacity of F18 fimbriated Escherichia coli to intestinal mucosa (Bao et al., 2012; Zhang et al., 2015). The toxins produced by this microorganism cause piglet post-weaning diarrhea (Luo et al., 2010; Zhang et al., 2015). We identified a missense mutation in position 54079560 of chromosome 6 (FUT1 gene) that results in an amino acid change at position 103 (Thr→Ala) of the protein. This SNP, already recorded in the dbSNP database (rs335979375), was associated with E. coli F18-resistant or susceptible genotypes (Meijerink et al., 1997, 2000). The second identified variant was a G insertion (genome position: 54079637), but further studies will be needed to validate these findings and the role of these mutations in the Nero Siciliano breed.
The PRLR gene encodes a receptor for prolactin and is considered a strong candidate gene for various traits affecting directly (ovulation rate) or indirectly (ovarian weight, uterine length and number of teats) litter size and general reproductive performance in pigs (Vincent et al., 1998; van Rens et al., 2003; Tomás et al., 2006). In the PRLR gene we detected a G insertion in position 20642378 (chromosome 16), but its contribution to the phenotypic variation remains to be elucidated.
The Nero Siciliano pig is not a well-characterised breed, and this study represents a first step in the genetic characterization of this animal, even if further research on the whole population reared in Sicily is needed to confirm the observed genetic variation and to integrate our data. In fact, all genetic changes detected in this study are only differences compared to the reference genome used and are therefore not indicative of the presence of mutated loci in the breed. Since publication of the Sus scrofa reference genome (Warr et al., 2015), several re-sequencing projects have been undertaken, but few have focused on local breeds. In this study we report, for the first time, the sequencing and variant calling analysis of a single Nero Siciliano boar, with the aim of starting to acquire useful information on its genetic background that could be crucial to understand new genetic selection concepts for creating new sustainable pork chains based on local pig breeds. Therefore, the importance of preserving local breeds as a source of genomic diversity for further improvements of commercial pigs represents an added value in typical local productions. However, information regarding local pigs in Italy is currently very limited, and further sequencing studies will therefore be essential for determining the extent of genetic diversity in the Nero Siciliano pig.
The data sets supporting the results of this article are included within the article and its additional files. The raw reads used for the genome-wide analysis have been deposited in the NCBI Sequence Read Archive (SRA) under the following accession number: SRX3406507.

Supplementary material
The following online material is available for this article: