Current understanding of an Emerging Coronavirus using in silico approach: Severe Acute Respiratory Syndrome-Coronavirus-2 (SARS-CoV-2)

Abstract The novel coronavirus (nCoV) “SARS-CoV-2” is responsible for the current pandemic, which began in Wuhan (China) in December 2019 and has so far been reported, with epidemiological linkage to China, in about 221 countries and territories. In this study we characterized the genetic lineage of SARS-CoV-2 and report on recombination within the genera and subgenera of coronaviruses. The phylogenetic relationships of thirty-nine coronaviruses belonging to four genera and five subgenera were analyzed by the Neighbor-Joining method in MEGA 6.0. Phylogenetic trees were constructed separately from the nucleotide sequences of the full-length genome and of the spike, envelope, membrane and nucleocapsid protein genes. Putative recombination was probed with RDP4. Our analysis shows that “SARS-CoV-2”, although highly similar to Bat-SARS-CoV sequences across the whole genome (89% sequence similarity), exhibits conflicting grouping with the Bat-SARS-like coronavirus sequences (MG772933 and MG772934). Furthermore, RDP4 detected seven recombination events in SARS-CoV-2 (NC_045512), but none of them met the criteria for a high level of certainty. Recombination was located mostly in the spike protein gene rather than in the rest of the genome, indicating that breakpoint clusters arise beyond the 95% and 99% breakpoint density intervals. The levels of genetic similarity observed between “SARS-CoV-2” and Bat-SARS-CoVs suggest that the latter are not the specific variant that caused the outbreak in humans, while remaining consistent with the suggestion that “SARS-CoV-2” possibly originated from bats. These genomic features and their probable association with virus characteristics and virulence in humans require further consideration.

was not imported from China, noted in the United States. On 30th January 2020, in a meeting held under the International Health Regulations (IHR), WHO declared this outbreak a "Public Health Emergency of International Concern (PHEIC)" because of its extensive spread and human-to-human transmission. On March 11, after COVID-19 cases had increased many-fold, involving numerous countries and many deaths, WHO declared COVID-19 a "pandemic" (Takian et al., 2020; Cascella et al., 2020).
The CoVs belong to a large family of positive-sense single-stranded RNA (+ssRNA) viruses that have been isolated from various animal species (Perlman and Netland 2009). CoVs appear crown-like, and their name derives from coronam, a Latin term meaning crown, because the envelope carries spike glycoproteins. The subfamily Orthocoronavirinae of the family Coronaviridae (order Nidovirales) is divided into four genera: (1) Alphacoronavirus (alphaCoV), (2) Betacoronavirus (betaCoV), (3) Deltacoronavirus (deltaCoV) and (4) Gammacoronavirus (gammaCoV). The probable gene sources of the first two genera are bats and rodents, and those of the last two are avian species. Moreover, the genus betaCoV splits further into five subgenera/lineages (Chan et al., 2013).
The genome comprises ~30,000 nucleotides with a 5' cap and a 3' poly(A) tail, and the virion is round or elliptic and frequently pleomorphic, with a diameter of about 60-140 nm. At least six ORFs constitute the genome and subgenomes of a representative CoV. The first ORFs (ORF1a/b), which encode 16 non-structural proteins (nsp1-16; Gammacoronavirus lacks nsp1), represent about 2/3 of the whole genome length. A (−1) ribosomal frameshift between ORF1a and ORF1b yields two polypeptides: pp1a, translated from ORF1a, and pp1ab, translated from ORF1a together with ORF1b. Later, these polypeptides are processed into the 16 nsps by virally encoded proteases, namely one or two papain-like proteases and the chymotrypsin-like protease (3CLpro), also called the main protease (Mpro) (Masters 2006; Ziebuhr et al. 2000), as shown in Figures 1 and 2. The remaining ORFs, located on the 1/3 of

Introduction
According to the World Health Organization (WHO), the continuous emergence of viral diseases poses a serious problem for public health. Over the preceding two decades, several viral epidemics were recorded, such as severe acute respiratory syndrome coronavirus (SARS-CoV) in 2002-2003 and H1N1 influenza in 2009. Middle East Respiratory Syndrome Coronavirus (MERS-CoV) was the most recent and was first recognized in Saudi Arabia in 2012. SARS-CoV triggered an epidemic in China that affected many other countries, with about 8,000 cases, whereas MERS-CoV caused about 2,500 cases; each was associated with about 800 deaths (Cascella et al., 2020).
In a timeline that extends to the present day, an epidemic of unexplained lower respiratory tract infections was identified in Wuhan, the largest metropolitan area of Hubei province, and was first reported to the WHO Country Office in China on December 31, 2019. However, published literature traces the onset of symptomatic cases back to the beginning of December 2019. Because the causative agent was unidentified, the initial cases were called "pneumonia of unknown etiology." The Chinese Center for Disease Control and Prevention (CDC) and a few local CDCs established a complete outbreak investigation program. The causative agent of this infection is a "novel" member of the coronavirus (CoV) family. It was first called 2019-nCoV; later, the International Committee on Taxonomy of Viruses (ICTV) named it "SARS-CoV-2" because of its great similarity to the virus that caused the SARS outbreak (SARS-CoVs) (Gorbalenya et al., 2020).
On 11th February 2020, Dr. Tedros Adhanom Ghebreyesus, Director-General of WHO, announced that the illness caused by the novel CoV would be called "COVID-19," an acronym of "coronavirus disease 2019." The virus is very contagious and has spread quickly around the world. Another turning point came on 26th February 2020, as another first case which
the genome near the 3′ terminus, encode no fewer than four main structural proteins: spike (S), envelope (E), membrane (M) and nucleocapsid (N) (as illustrated in Figure 2). All accessory and structural proteins are translated from the sgRNAs of CoVs (Hussain et al., 2005).
These viruses are able to cross species barriers and, for unknown reasons, cause illnesses in humans ranging from the "common cold" to very severe diseases such as MERS and SARS (Singhal, 2020; Assiri et al., 2013). Most likely, these viruses originated in bats and passed through other mammalian hosts, the Himalayan palm civet for SARS-CoV and the dromedary camel for MERS-CoV, before jumping to humans (Banerjee et al. 2019). The dynamics of "SARS-CoV-2" are at present not well known, although there are assumptions about its animal origin. Its origins are not clear, but genomic analyses suggest that SARS-CoV-2 possibly evolved from a strain present in bats, and a mutation in the original strain may have directly triggered virulence towards humans. The high capacity of this virus to cause a global pandemic poses a serious public health risk, as it spreads faster than its two predecessors, SARS-CoV and MERS-CoV (Singhal, 2020).
Management differs for suspected patients, who need to be isolated at once, and confirmed patients, who need to be transferred to hospital. The basic necessities are to maintain hydration and nutrition and to control fever and cough (Jin et al., 2020; Singhal, 2020; Chen et al., 2020). Where needed, oxygen therapy should be administered by nasal catheter, mask oxygen, High Flow Nasal Oxygen Therapy (HFNO), Non-Invasive Ventilation (NIV) or Extracorporeal Membrane Oxygenation (ECMO; Jin et al., 2020). Previously, no particular vaccine or antiviral cure for COVID-19 was recommended (Lu et al., 2020). The use of antibiotics and antivirals should be avoided in confirmed cases (Jin et al., 2020; Singhal, 2020), although antibiotics and antifungals are required if co-infections occur (Russell et al., 2020; Zhao et al., 2020). Various agents might be used, including antiviral drugs such as oseltamivir, ganciclovir and lopinavir/ritonavir, neuraminidase inhibitors, nucleoside analogues, arbidol, the peptide EK1, RNA synthesis inhibitors (TDF, 3TC), anti-inflammatory agents such as hormones, and traditional Chinese medicines (Lianhuaqingwen Capsule and ShuFengJieDu) (Lu et al., 2020; Jin et al., 2020). Remdesivir, chloroquine, arbidol (an antiviral drug available in China and Russia), interferons, intravenous immunoglobulin and plasma from recovered COVID-19 patients can be proposed for therapy once more evidence of efficacy and safety is available (Jin et al., 2020; Zhang and Liu, 2020). Currently, different vaccines, viz. Moderna, Pfizer-BioNTech, Johnson & Johnson/Janssen, Sinovac, Sinopharm/Beijing, Sputnik V and Oxford/AstraZeneca, have been administered in many countries of the world (Our World in Data, 2021).

Retrieval and acquisition of sequence data
Analysis of the "SARS-CoV-2" whole genome (GenBank accession no. NC_045512) places it in the genus betacoronavirus, and full-genome sequence analysis showed that it diverges from MERS-CoV and SARS-CoV, as shown in Figure 3. "SARS-CoV-2" and the Bat-SARS-CoVs form a distinct lineage within the subgenus sarbecovirus. We aimed to characterize the genomic relationships of the full-length genome and four open reading frames, spike protein (S), envelope protein (E), membrane protein (M) and nucleocapsid protein (N), of "SARS-CoV-2" and to explore putative recombination. Thirty-nine full viral sequences belonging to the four genera of CoVs (alpha-CoVs, beta-CoVs, gamma-CoVs and delta-CoVs) and to five subgenera of beta-CoVs were downloaded from the NCBI nucleotide sequence database (NCBI, 2021). The accession number, organism, host, country and collection year of all thirty-nine CoV sequences are given in Supplementary Material Table 1. These sequences were selected from published research articles (Paraskevis et al., 2020; Wang et al., 2018). Each genome was partitioned into its S, E, M and N regions for phylogenetic, recombination and mutational analyses.
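The partitioning of each genome into S, E, M and N regions can be illustrated with a minimal sketch. It assumes that gene boundaries are supplied as 1-based inclusive coordinates, as in GenBank feature tables. The function names, the toy sequence and the toy coordinates below are hypothetical and are not the annotation of any real genome or the procedure actually used in this study.

```python
def extract_region(genome, start, end):
    """Return the subsequence for 1-based, inclusive coordinates (GenBank style)."""
    if not (1 <= start <= end <= len(genome)):
        raise ValueError(f"coordinates {start}..{end} fall outside the sequence")
    return genome[start - 1:end]


def split_genome(genome, annotation):
    """Map each gene name to its subsequence, given {name: (start, end)}."""
    return {name: extract_region(genome, start, end)
            for name, (start, end) in annotation.items()}


# Toy example only: a made-up 27-nt sequence with two made-up "genes".
toy_genome = "ATGAAACCCGGGTTTTAGATGCCCTAA"
toy_annotation = {"S": (1, 18), "N": (19, 27)}

if __name__ == "__main__":
    for gene, seq in split_genome(toy_genome, toy_annotation).items():
        print(gene, len(seq), seq)
```

In practice the coordinates would be read from the annotation of each downloaded record rather than typed by hand, since gene positions differ between coronavirus genomes.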

Phylogenetic tree construction using MEGA 6.0.
Sequences were first aligned with the ClustalW method in MEGA6 (Tamura et al., 2013), separately for the full-length genome and for the S, E, M and N sequences, before phylogenetic tree construction. The evolutionary history was inferred using the Neighbor-Joining method (Saitou and Nei, 1987). The optimal trees for the full-length genome, S, E, M and N have sums of branch lengths of 7.97034047, 9.34476698, 8.30616966, 8.67787963 and 8.54006468, respectively (as shown in Supplementary Material Figure 1 (a), (b), (c), (d)). The percentage of replicate trees in which the associated taxa clustered together in the bootstrap test (1000 replicates) is shown next to the branches (Felsenstein, 1985). The trees are drawn to scale, with branch lengths in the same units as the evolutionary distances (Jukes and Cantor, 1969) used to infer the phylogenetic tree.
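The text cites Jukes and Cantor (1969) in connection with the evolutionary distances. As a minimal sketch, assuming the one-parameter Jukes-Cantor model was the distance measure, the distance between two aligned sequences is d = -(3/4) ln(1 - (4/3) p), where p is the proportion of differing sites. The function names are hypothetical, and the handling of gaps (sites with a gap or ambiguity code in either sequence are skipped) is an assumption that may differ from the gap treatment selected in MEGA.

```python
import math


def p_distance(seq_a, seq_b):
    """Proportion of differing sites between two aligned sequences.

    Sites where either sequence has a gap or a non-ACGT symbol are skipped.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    valid = set("ACGT")
    compared = 0
    differences = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a in valid and b in valid:
            compared += 1
            if a != b:
                differences += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    return differences / compared


def jukes_cantor_distance(seq_a, seq_b):
    """Jukes-Cantor (JC69) distance in substitutions per site."""
    p = p_distance(seq_a, seq_b)
    if p >= 0.75:
        raise ValueError("p-distance >= 0.75: JC69 distance is undefined")
    return -0.75 * math.log(1.0 - (4.0 / 3.0) * p)


if __name__ == "__main__":
    # Toy alignment: 1 difference in 10 sites, so p = 0.1.
    print(jukes_cantor_distance("ACGTACGTAC", "ACGTACGTAA"))
```

The correction matters more as sequences diverge: for small p the distance is close to p itself, while for larger p the logarithm accounts for multiple substitutions at the same site. A matrix of such pairwise distances is the input that a Neighbor-Joining implementation operates on.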

Recombination detection by RDP-4
The Recombination Detection Program (RDP4; Martin et al., 2015) was used to detect putative recombination by identifying probable parental sequences and recombinant regions in the coronaviruses. Sequences were aligned in MEGA6 and then transferred to RDP-4 for recombination detection. Recombination events were evaluated with the default settings of the automated (X-over) analysis, based on RDP (R), BOOTSCAN (B), GENECONV (G), MaxChi (M), SiScan (S), Chimaera (C), 3SEQ (T), LARD (L) and PhylPro (P), with a p-value cut-off of 0.05. Additionally, these recombination events were examined by phylogenetic analysis of individual genes together with nucleotide alignment in MegAlign (Lasergene, DNASTAR). For validity, recombinants were further confirmed by more than one method. In addition, the amino acid sequences of S, E, M and N (GenBank accession no. NC_045512) were retrieved from NCBI and used for three-dimensional modelling of the proteins with Phyre2 (Kelley et al., 2015); the models are presented in Figure 2.

Results and Discussion
Phylogenetic analysis showed that "SARS-CoV-2" belongs to the genus "βCoV" and the subgenus "sarbecovirus", along with other SARS and SARS-CoV viruses isolated from humans, bats and other animals. Phylogenetic trees of the full-length, S, E, M and N nucleotide sequences were constructed separately, and the thirty-nine sequences were arranged according to their four genera (αCoV, βCoV, δCoV and γCoV) and five subgenera, viz. sarbecovirus, merbecovirus, hibecovirus, embecovirus and nobecovirus. Figure 3 shows that the new coronavirus "SARS-CoV-2" (NC_045512) is more closely related to the Bat-SARS-CoVs (MG772933 and MG772934; isolated from bats in China) than to other human-infecting CoVs, including SARS-CoVs, and that it is not a mosaic. Not only is the full genome of the Bat-SARS-CoVs the closest to SARS-CoV-2, but their S, E, M and N nucleotide sequences are also the closest matches.
The BLASTn results show that the full-length, S, E, M and N nucleotide sequences of the novel coronavirus and the Bat-SARS-CoVs share 89%, 83.68%, 98.68%, 93.92% and 91.28% identity, respectively. The spike gene of SARS-CoV-2 shows the greatest variation among them, which points to its role in host selectivity: the M and E proteins are involved in virus assembly, whereas the spike protein (S) mediates virus entry into host cells and is a critical determinant of viral host range (Li, 2016). "SARS-CoV-2" probably originated from bat CoVs, but the level of genetic similarity suggests that "SARS-CoV-2" is not an exact variant of the previously known Bat-SARS-CoVs. Moreover, they share less than 90% sequence similarity, so a rather long branch separates them in the phylogenetic tree, indicating that the Bat-SARS-CoVs are not the direct ancestors of SARS-CoV-2. Importantly, bats are the natural reservoirs of SARS-CoV and MERS-CoV and humans are the terminal hosts, whilst masked palm civets and Arabian camels are the intermediate hosts of SARS-CoV and MERS-CoV, respectively. In the case of the nCoV, it is assumed that the virus is hosted by bats and was transmitted to humans via one or more animals that are presently unknown (Lu et al., 2020).
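To make the identity figures concrete, the following minimal sketch shows how a percent identity can be computed from a pairwise alignment: the number of identical aligned positions divided by the alignment length. The function name and the toy alignment are hypothetical. BLASTn reports identity over the local alignments it finds, so this global calculation is only an approximation of how the reported values were obtained.

```python
def percent_identity(aligned_a, aligned_b):
    """Percent identity between two aligned sequences ('-' marks a gap).

    Columns where both sequences have a gap are ignored. A column with a gap
    in only one sequence counts towards the alignment length as a mismatch.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("sequences must be aligned to the same length")
    matches = 0
    columns = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == "-" and b == "-":
            continue
        columns += 1
        if a == b:
            matches += 1
    if columns == 0:
        raise ValueError("alignment has no informative columns")
    return 100.0 * matches / columns


if __name__ == "__main__":
    # Toy alignment: 5 identical positions over 6 columns.
    print(round(percent_identity("ACGT-A", "ACGTTA"), 2))
```

Because the denominator includes gapped columns, an insertion or deletion lowers the identity in the same way as a substitution, which is one reason identity values for a highly variable gene such as spike can fall well below the whole-genome figure.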
Codon usage analyses can resolve the origin of proteins that have deep ancestry and inadequate phylogenetic signal or that arose de novo. The spike protein of the novel coronavirus is closely related to bat coronavirus sequences, indicating common ancestry (Supplementary Material Figure 1a). The spike (S) protein of the novel coronavirus could be the result of recombination with an as yet unsampled coronavirus (Ji et al., 2020). Moreover, phylogenetic discordance in the deep relationships of coronaviruses is common and can be explained by an early recombination event, by different evolutionary rates in different lineages, or by a combination of both (Magiorkinis et al., 2004).
Recombination among ten different coronavirus sequences was characterized using RDP-4. Seven recombination events were observed in SARS-CoV-2 (NC_045512). For an event to be considered reliable, it must be supported by at least five of the nine methods and have a recombination score above 0.50. In this analysis, summarized in Table 1, no recombination event fulfilled these requirements. Moreover, recombination breakpoint distribution maps, relating the size and distribution of recombination events, were generated for the selected coronavirus sequences as described by Heath et al. (2006) and Varsani et al. (2008). Many recombination events occurred in the spike protein gene rather than in the rest of the genome, indicating that breakpoint clusters exceed the 95% and 99% breakpoint density intervals (grey areas), as depicted in Figure 4. Recombination events are more complex and more frequent in Bat-CoVs than in SARS-CoV-2. Hence, recombination is not responsible for the emergence of the nCoV and the COVID-19 outbreak (Lu et al., 2020).
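The acceptance rule stated above (support from at least five of the nine methods and a recombination score above 0.50, with a p-value cut-off of 0.05 per method) can be written as a small filter. This is a sketch under assumptions: the data structure, the field names and the example values are hypothetical, they are not taken from Table 1, and they do not reproduce the RDP4 output format.

```python
from dataclasses import dataclass

# One-letter codes for the nine detection methods listed in the text.
METHODS = ("R", "B", "G", "M", "S", "C", "T", "L", "P")


@dataclass
class RecombinationEvent:
    event_id: int
    p_values: dict   # method code -> p-value, only for methods that reported one
    score: float     # recombination score assigned to the event


def supporting_methods(event, alpha=0.05):
    """Method codes whose p-value falls below the significance cut-off."""
    return [m for m in METHODS
            if m in event.p_values and event.p_values[m] < alpha]


def is_high_confidence(event, min_methods=5, min_score=0.50, alpha=0.05):
    """True if enough methods support the event and its score exceeds the threshold."""
    enough_methods = len(supporting_methods(event, alpha)) >= min_methods
    return enough_methods and event.score > min_score


if __name__ == "__main__":
    # Hypothetical events for illustration only.
    weak = RecombinationEvent(1, {"R": 0.01, "G": 0.03, "M": 0.20}, 0.41)
    strong = RecombinationEvent(
        2, {"R": 1e-5, "B": 1e-4, "G": 1e-3, "M": 1e-6, "S": 1e-8}, 0.62)
    for ev in (weak, strong):
        print(ev.event_id, supporting_methods(ev), is_high_confidence(ev))
```

Requiring both conditions at once is the conservative choice: an event flagged by many methods but with a low score, or a high-scoring event seen by only one or two methods, is rejected.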
Hence, our study rejects the hypothesis that "SARS-CoV-2" emerged as a consequence of a recent recombination event. The SARS-CoV-2 genome has no close genetic relationship with the other members of the subgenus sarbecovirus and thus represents a new lineage. Next-generation sequencing and bioinformatics are changing the way we can respond to pandemics by fast-tracking the identification of pathogens, improving our understanding of disease incidence and transmission, and promoting data sharing (Armstrong et al., 2019). Moreover, high-throughput, unbiased sequencing is a powerful technique for the identification of novel pathogens (Palacios et al., 2008).

Table 1. Details of seven recombination events detected in SARS-CoV-2 using RDP-4. Major and minor parents are inferred from the genetic fragments they donated to the recombinant, with the major parent donating the larger fragment and the minor parent the smaller fragment. The methods used to detect recombination are RDP (R), GENECONV (G), BOOTSCAN (B), MAXCHI (M), CHIMAERA (C), SISCAN (S), PHYLPRO (P), LARD (L) and 3SEQ (T). The method with the most significant associated p-value is indicated in bold for each event.
Column headers: Recombinant, Major
Note: A recombinant score > 0.5 indicates that the recombination event was inferred with a high degree of certainty.

Conclusion
In a nutshell, we have described in detail the genomic structure of SARS-CoV-2, the cause of the current serious pandemic, together with its phylogenetic and recombination analysis against other CoVs belonging to four genera and five subgenera. More generally, the COVID-19 outbreak linked to 2019-nCoV highlights the hidden virus reservoir in wild animals and its potential to occasionally spill over into human populations. To bring this pandemic to an end, a large share of the world needs to be immune to the virus through vaccination. The challenge now is to make these vaccines available to people around the world. It will be key that people in all countries, not just rich countries, receive the required protection.
MAGIORKINIS, G., MAGIORKINIS, E., PARASKEVIS, D., VANDAMME, A.M., VAN