New Coronavirus 2 (SARS-CoV-2) Detection Method from Human Nucleic Acid Sequences Using Capsule Networks

Abstract

The new coronavirus SARS-CoV-2 is an infectious virus with a long incubation period. It was first detected in Wuhan, China, and has spread all over the world, seriously threatening human life. Therefore, accurate and rapid detection of SARS-CoV-2 is very important for controlling the epidemic and preventing its further spread. Currently, nucleic acid detection makes an important contribution to the prevention and control of SARS-CoV-2. In this study, a new and highly sensitive nucleic acid detection method for SARS-CoV-2 is proposed. The nucleic acid sequences were digitized by the Entropy-based mapping technique. The digitized sequences were then divided into 100-unit sections using the sliding window method and given as input to Capsule Networks. In total, 10988 segments (5494 SARS-CoV-2, 5494 normal) were classified with capsule networks. With the proposed method, an accuracy of 100% was achieved in identifying SARS-CoV-2 from nucleic acid sequences. The results show that the proposed method successfully identifies SARS-CoV-2 from nucleic acid sequences.

Keywords:
SARS-CoV-2; Covid-19; Nucleic acid detection; Capsule networks; Coronavirus

HIGHLIGHTS

• DNA genome sequences of 10 different races are compared.

• Covid-19 nucleic acid sequences are digitized by Entropy based mapping technique.

• The digitized Covid-19 nucleic acid sequences are classified by the capsule networks.

INTRODUCTION

The new coronavirus SARS-CoV-2, which has spread all over the world and seriously threatens human life, is an infectious virus with a long incubation period [1]. The new coronavirus, a positive-strand RNA virus, is a member of the coronavirus family, which can affect both animals and humans [2,3]. Other viruses from the coronavirus family have caused severe respiratory diseases such as Middle East Respiratory Syndrome (MERS) and Severe Acute Respiratory Syndrome (SARS). Coronaviruses can be transmitted between humans and animals. The genetic information of viruses varies due to the mutation of their ribonucleic acid (RNA) [4]. The virus attaches to the cell through its surface proteins, and by changing these proteins it can evade an already established immune response and multiply faster, causing much more damage to the cells. Symptoms of the new coronavirus infection include sore throat, headache, fever, dry cough, runny nose, upset stomach and vomiting, shortness of breath, and difficulty breathing [5,6]. In more severe cases, the infection can cause pneumonia, acute respiratory failure, kidney failure, and even death. Like all types of viruses, coronaviruses have constantly evolved over time and have begun to cause more serious health problems with flu-like symptoms. Therefore, accurate and rapid identification of pathogenic viruses such as coronaviruses plays a vital role in preventing epidemics, saving people's lives, and even in choosing appropriate treatments [7]. Recently, developments in molecular biology technology have led to the rapid development of deoxyribonucleic acid (DNA) signal processing methods, which have become very important in Covid-19 virus detection. The existing detection methods for Covid-19 nucleic acid are introduced here to help scientists develop better methods for efficient detection of coronavirus. These methods are polymerase chain reaction (PCR)-based methods, isothermal nucleic acid amplification-based methods, microarray-based methods, and newly developed methods [8]. PCR is a laboratory technique used to amplify DNA sequences. The method uses short gene segments called primers to assemble a copy alongside each segment.
The PCR-based method is a widely used technique for screening for coronaviruses [9,10]. However, these methods are both time consuming and costly, so they are not widely used in clinical samples. Isothermal nucleic acid amplification-based methods are commonly used for the amplification of DNA. These methods exhibit high specificity and sensitivity as a result of their exponential amplification [11]. Loop-mediated isothermal amplification (LAMP) is a fast technique that does not require expensive tools. LAMP tests may be preferred to help reduce the cost of Covid-19 diagnosis [12]. However, the biggest disadvantage of LAMP-based methods is that they show optimum performance only at 65°C. One of the fast and highly efficient detection methods is the microarray method [13].
In this technique, cDNA is first produced from coronavirus RNA via reverse transcription, the cDNAs are then loaded into each well to hybridize with oligonucleotides, and free DNAs are removed [14]. Finally, specific probes detect the coronavirus RNA. Microarray methods are widely used in coronavirus detection [15,16]. The newly developed methods use artificial intelligence (AI) and deep learning approaches, which have been preferred for detecting Covid-19. Also, in [17] the authors provide information on recent advances in virus detection using CRISPR-Cas systems such as CRISPR-Cas12a/Cas13a. Their review also highlights the importance and advantages of these methods. The paper [18] introduces nucleic acid detection techniques for SARS-CoV-2 according to different platforms. It also presents the advantages and disadvantages of these technology platforms and provides a reference for selecting the appropriate nucleic acid detection technology for SARS-CoV-2.
In [19], the authors present the clinical and laboratory characteristics of 19 suspected cases. In the study, the 2019-nCoV nucleic acid amplification test was performed with three different kits and the results were compared. They obtained the same result for each sample with these three kits. In [20], the authors evaluate antibody-based and nucleic acid-based tests for SARS-CoV-2-infected patients. They studied 133 patients diagnosed with SARS-CoV-2 and collected demographic data, clinical records, laboratory tests, and outcomes. In the study, the method increases the accuracy of coronavirus detection and provides an effective complement to false-negative results from nucleic acid tests for SARS-CoV-2 infection. In [21], the authors analyze a case of COVID-19 in which a recovered patient showed recurrent positive nucleic acid detection, and review the relevant literature. In [22], the authors evaluated mismatches in PCR primers against SARS-CoV-2 genomes using a bioinformatics analysis approach.
Primers and probe sequences targeting ORF1ab gene assays displayed > 98.3% accuracy. In [23], a COVID-19 detection method based on the K-nearest neighbors (KNN) classifier using the complete genome sequences of human coronaviruses is proposed. Two features based on CpG islands are also described for detecting COVID-19 cases. The proposed method achieves 98.4% accuracy. In [24], a prediction model based on amino acid encoding is proposed. The model classifies the types of coronavirus. The performance of the proposed model was compared with machine learning methods such as decision trees, k-nearest neighbors, random forest, and support vector machines. A performance of 98.69% was achieved. In [25], the performance of seven machine learning methods was compared using two different databases containing plasma and serum samples for the diagnosis of Covid-19. The highest performance, 93%, was achieved with the PLS-DA model. In [26], a new RT-LAMP assay with artificial intelligence was developed. This assay system gave more precise and faster results in large-scale tests. The tool, named RT-LAMP-DETR, achieved 100% sensitivity, specificity, and accuracy relative to the results obtained from RT-qPCR in the diagnosis of Covid-19.

As mentioned above, studies in the literature have used various methods for coronavirus detection, each with advantages and disadvantages. Although PCR-based methods are routine and reliable for coronavirus detection, they are both time consuming and costly and are therefore not preferred for clinical samples. LAMP-based methods are fast and do not require expensive instruments, so performing the LAMP test can help reduce the cost of detecting Covid-19. However, the biggest disadvantage of LAMP-based methods is that they need to reach a temperature of 65°C for optimum performance. Although microarray-based methods are widely used for coronavirus detection, they also have disadvantages: synthesizing, purifying, and storing DNA solutions prior to microarray fabrication is labor intensive, and microarrays became more expensive as more printing devices were required. Rapid, accurate, low-cost identification of the coronavirus plays a vital role in selecting appropriate treatments for SARS-CoV-2, saving people's lives, and preventing Covid outbreaks. This study presents an effective and accurate detection method for coronavirus from nucleic acid sequences using capsule networks. The method does not require high cost or specific laboratory conditions. This work will assist researchers and clinicians in developing better techniques for timely, rapid, and highly accurate detection of coronavirus infection.

The following is a summary of the main contributions:

  • A new and highly sensitive SARS-CoV-2 nucleic acid identification model is presented for the inexpensive and rapid diagnosis of coronavirus.

  • To the best of our knowledge, this is the first study in which the Covid-19 nucleic acid sequences were digitized and classified by capsule networks.

  • This study provides researchers with a better technique for rapid and accurate detection of coronavirus.

  • This study presents an alternative to the existing methods in molecular biology technology for detecting Covid-19 from nucleic acids.

  • The proposed method achieved an accuracy of 100% with capsule networks for Covid-19 infection detection using nucleic acid sequences.

The rest of this paper is arranged as follows: Section 2 provides fundamental information about the dataset, numerical mapping techniques, creating spectrogram images, capsule networks, and the proposed approach. Section 3 discusses the findings obtained in the experiments. Section 4 presents the conclusion.

MATERIAL AND METHODS

In this section, the steps of the proposed SARS-CoV-2 detection model for predicting Covid-19 from human nucleic acid sequences using capsule networks are summarized.

The proposed SARS-CoV-2 detection method

This section describes sample collection, the digitization and preprocessing of DNA sequences, the capsule network, and the proposed architecture. The overview of the proposed SARS-CoV-2 detection model is shown in Figure 1. In this block diagram, SARS-CoV-2 and normal nucleic acid sequences are used as input for the proposed detection method. Next, the collected sequences are digitized by the Entropy-based mapping technique. The digitized sequences are then divided into 100-unit sections using the sliding window method and given as input to Capsule Networks. The segments (5494 SARS-CoV-2, 5494 normal) are classified with capsule networks.

Figure 1
Flowchart of the proposed SARS-CoV-2 detection method

The sample collection

The SARS-CoV-2 and normal nucleic acid sequences used in the study were obtained from the NCBI Virus GenBank database [27] (https://www.ncbi.nlm.nih.gov/labs/virus/vssi/#). The dataset used for the implementation is given in Table 1. In the experiment, 10 Covid-19 genome sequence samples with a length of 29172 bases, from 10 different geo locations, and 10 healthy DNA gene sequence samples of the same length and the same races were used.

Table 1
SARS-CoV-2 and normal genome sequences

Digitization of the sample sequences and preprocess

In machine learning based applications, nucleic acid sequences must first be converted to digital signals. In this study, the dataset was digitized by the Entropy-based mapping technique [28,29]. The entropy-based technique was chosen because it increases the discrimination of SARS-CoV-2-associated regions in nucleic acid sequences. It better reflects the complex structure of the nucleic acid sequences and provides a wide range of correlation information on the genome sequence, because nucleotide sequences are digitized according to the repetition frequency of codons. There are 64 possible codons, and each codon encodes a specific amino acid or a stop signal. For instance, the ‘Asp’ amino acid is encoded by the ‘GAT’ and ‘GAC’ codons, ‘Phe’ by the ‘TTT’ and ‘TTC’ codons, and ‘Lys’ by the ‘AAA’ and ‘AAG’ codons. The pseudo code of the Entropy-based numerical technique is given in Algorithm 1.

Procedure: Entropy-based numerical mapping technique

Input: DNAS = 'AGTTCCA…' signal with length of max_base

Output: digital signal (Z) with length of max_base-2

Step 1: Z = []; L = length(DNAS) - 2;

Step 2: CodeFreq = codonbias(DNAS); // finding the codons corresponding to each amino acid in the DNAS signal, with their frequencies

Step 3: for j = 1 to L

Step 4: PNN = [DNAS(j) DNAS(j+1) DNAS(j+2)]; // the three-base codon starting at position j

Step 5: switch PNN

Step 6: case CodeFreq.Ala.Codon(1): // GCA, GCC, GCG, GCT codons for the Ala amino acid

Step 7: z = CodeFreq.Ala.Freq(1) * log(CodeFreq.Ala.Freq(1)); // the formula for the GCA codon

Step 8: z = z * ((CodeFreq.Ala.Freq(1))^(1/log(CodeFreq.Ala.Freq(1))));

Step 9: Z = [Z z];

Step 10: repeat Steps 7, 8, 9 for Ala.Codon(2), Ala.Codon(3), Ala.Codon(4)

Step 11: repeat Steps 6, 7, 8, 9, 10 for all 20 amino acids such as Arg, Val, etc.

Step 12: otherwise;

Step 13: break;

Step 14: end switch

Step 15: end for j
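
The procedure in Algorithm 1 can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the authors' implementation. The helper `codon_frequencies` is a hypothetical stand-in for the `codonbias` step and assumes that frequencies are counted over non-overlapping codons in reading frame 1 and normalized within each amino acid. The logarithm is assumed to be the natural logarithm, and frequencies of 0 or 1, for which the formula is undefined, are mapped to 0. Under the natural-logarithm assumption the factor f^(1/log f) equals e, so each value reduces to e · f · log f.

```python
import math
from collections import Counter
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order ('*' = stop).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TO_AA = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def codon_frequencies(seq):
    """Frequency of each codon relative to its synonymous codons.

    Assumption: codons are counted in reading frame 1 (non-overlapping).
    """
    counts = Counter(seq[i:i + 3] for i in range(0, len(seq) - 2, 3))
    totals = Counter()
    for codon, n in counts.items():
        if codon in CODON_TO_AA:
            totals[CODON_TO_AA[codon]] += n
    return {c: n / totals[CODON_TO_AA[c]]
            for c, n in counts.items() if c in CODON_TO_AA}


def entropy_value(f):
    """z = f*log(f) * f**(1/log(f)), as in the pseudocode (natural log assumed)."""
    if f <= 0.0 or f >= 1.0:
        return 0.0  # assumption: undefined edge cases are mapped to 0
    return f * math.log(f) * f ** (1.0 / math.log(f))


def entropy_map(seq):
    """Map every overlapping three-base window of `seq` to its entropy value."""
    seq = seq.upper()
    freq = codon_frequencies(seq)
    return [entropy_value(freq.get(seq[j:j + 3], 0.0))
            for j in range(len(seq) - 2)]


signal = entropy_map("GCAGCAGCCGAT")  # toy sequence, not from the dataset
```

For an input of length max_base, the returned list has max_base - 2 entries, one per overlapping three-base window, which matches the output length stated in the procedure.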

Nucleic acid sequences with a total length of 29172 bp were used for both normal and SARS-CoV-2 samples and were digitized by the entropy-based mapping technique. The digitized sequences were segmented to be suitable for capsule network input. Three different window sizes, 50, 100, and 200 units, were used to segment the digitized signal. For example, when a 100 bp window size was used, 14487 Covid-19 and 14487 normal sections were obtained. The segments of each length were then classified by the capsule networks.
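
The segmentation step can be illustrated with a short Python sketch. The function name `sliding_windows` and the default stride of 1 are assumptions made for illustration, and the sketch does not attempt to reproduce the segment counts reported in the text, which depend on details not given here.

```python
def sliding_windows(signal, window, stride=1):
    """Return every full-length window of `signal`, moving `stride` units at a time."""
    if window <= 0 or stride <= 0:
        raise ValueError("window and stride must be positive")
    return [signal[i:i + window]
            for i in range(0, len(signal) - window + 1, stride)]


# Toy digitized signal of 10 values (hypothetical data).
toy_signal = [0.1 * k for k in range(10)]
segments = sliding_windows(toy_signal, window=4, stride=1)
```

For a signal of length N, a window of w units, and a stride of s units, this yields floor((N - w) / s) + 1 segments when N is at least w, and none otherwise.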

Capsule networks

Convolutional Neural Network (CNN) architectures achieve very high performance in object recognition and classification, but they cannot reveal much information about the objects in the image [30]. For example, they cannot capture the relationship between objects in the image or the orientation of an object. To overcome these shortcomings, Sabour and coauthors [31] proposed Capsule Networks, a new neural network architecture, in 2017. Whereas artificial neural networks place individual neurons in their hidden layers, the Capsule Network architecture groups neurons into structures called capsules. Also, the output of a neuron in a CNN is a scalar, while the output of a capsule is a vector. CNNs use activation functions such as sigmoid, hyperbolic tangent, and ReLU. Capsule Networks, on the other hand, use the activation function shown in Equation 1, called squashing [31-33].

(1) $v_j = \dfrac{\lVert s_j \rVert^2}{1 + \lVert s_j \rVert^2} \cdot \dfrac{s_j}{\lVert s_j \rVert}$

In Equation 1, s_j denotes the capsule input and v_j the output vector of the capsule. The capsule network structure used in this study is shown in Figure 2.
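
As a small illustration of Equation 1, the squashing function can be written in plain Python. The function name `squash` and the small constant `eps`, added only to avoid division by zero for an all-zero input, are assumptions of this sketch.

```python
import math


def squash(s, eps=1e-12):
    """Squashing non-linearity: keeps the direction of s, maps its length into [0, 1)."""
    norm_sq = sum(x * x for x in s)
    norm = math.sqrt(norm_sq)
    # ||s||^2 / (1 + ||s||^2) sets the output length; s / ||s|| keeps the direction.
    scale = norm_sq / (1.0 + norm_sq) / (norm + eps)
    return [scale * x for x in s]


v = squash([3.0, 4.0])  # input of length 5
```

Short input vectors are shrunk towards zero length and long ones approach unit length, so the length of the output vector can be read as the probability that the entity represented by the capsule is present.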

Figure 2
Capsule network structure for classification of SARS-CoV-2 sequences

The proposed capsule network architecture consists of three convolution layers, two max pooling layers, a primary capsule layer, and a Label capsule layer. The convolution layers contain 16, 32, and 64 filters, respectively, with filter sizes of 7, 5, and 9. Max pooling with a stride of 2 was used. Unlike the original capsule network architecture, two convolution and two max-pooling layers were added to the first layers of the network. Thus, the feature maps of the inputs were obtained, and the processing load was reduced by reducing their size with max pooling. In Table 2, the parameters of the proposed capsule network architecture are given for the 200-unit input signal. In Table 3, the hyper-parameters for 200-unit segments are given.
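
To show how the convolution and pooling layers shrink the input before the capsule layers, the following sketch tracks the signal length through the stack. It is illustrative only: it assumes one-dimensional convolutions with stride 1 and no padding, a pooling window of 2, and the layer order convolution, pooling, convolution, pooling, convolution. None of these details is stated in the text, so the resulting lengths may differ from those in Table 2.

```python
def conv_out(n, kernel, stride=1):
    """Output length of an unpadded 1-D convolution."""
    return (n - kernel) // stride + 1


def pool_out(n, size=2, stride=2):
    """Output length of 1-D max pooling (window size 2 is an assumption)."""
    return (n - size) // stride + 1


def feature_lengths(n, kernels=(7, 5, 9)):
    """Signal length after each layer; pooling follows the first two convolutions."""
    lengths = [n]
    for idx, k in enumerate(kernels):
        n = conv_out(n, k)
        lengths.append(n)
        if idx < 2:  # assumed layer order: conv, pool, conv, pool, conv
            n = pool_out(n)
            lengths.append(n)
    return lengths


lengths_200 = feature_lengths(200)  # lengths for a 200-unit input segment
```

Under these assumptions each pooling layer roughly halves the length, which is the reduction in processing load described above.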

Table 2
Details of layers and parameters of capsule network architecture
Table 3
Hyper parameters of capsule network

RESULTS AND DISCUSSION

In the study, nucleic acid sequences of SARS-CoV-2 and normal samples, each 29172 bases long, were used. The genome sequences of these two groups were digitized by the Entropy-based numerical mapping technique. The digitized genome sequences were divided into segments of 50, 100, and 200 units by the sliding window method (1 unit). The resulting segments were used to feed the capsule network. In addition, the k-fold cross-validation method, with k = 5, was used to evaluate the performance of the system accurately. The training curves of the model for the classification of nucleic acid sequences are shown in Figure 3 for the three different window sizes.
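
The 5-fold protocol can be sketched in plain Python as follows. The function name `k_fold_indices`, the shuffling, and the fixed seed are assumptions for illustration; the text does not say how the segments were assigned to folds.

```python
import random


def k_fold_indices(n_samples, k=5, seed=0):
    """Yield (train, test) index lists; every sample is tested exactly once."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]  # k nearly equal parts
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


splits = list(k_fold_indices(23, k=5))  # 23 is an arbitrary toy sample count
```

Each of the five rounds trains on four parts and tests on the remaining one, and the reported score is the average over the rounds. When segments overlap, as they do with a sliding window, assigning folds by source sequence rather than by segment is one way to keep near-identical windows out of both sets.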

Figure 3
Training and loss graphics a- 50, b-100, c-200 units signal length

Tables 4, 5, and 6 present the accuracy, precision, sensitivity, specificity, and F1-score values, and their averages, obtained by 5-fold cross-validation for the classification of SARS-CoV-2 and normal nucleic acid sequences with capsule networks.
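
For reference, the five measures can be computed from the counts of a two-class confusion matrix as in the sketch below. The function name and the example counts are hypothetical and are not taken from the tables; SARS-CoV-2 is treated as the positive class.

```python
def classification_metrics(tp, fn, tn, fp):
    """Standard binary classification measures from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    precision = tp / (tp + fp)    # predicted positives that are truly positive
    sensitivity = tp / (tp + fn)  # true positives that were found (recall)
    specificity = tn / (tn + fp)  # true negatives that were found
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    return {
        "accuracy": accuracy,
        "precision": precision,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "f1": f1,
    }


# Illustrative counts only.
metrics = classification_metrics(tp=95, fn=5, tn=90, fp=10)
```

The sketch assumes that every denominator is non-zero, which holds whenever both classes are present and at least one segment is predicted positive.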

Table 4
Test results of the proposed method 50 units
Table 5
Test results of the proposed method 100 units.
Table 6
Test results of the proposed method 200 units.

In order to better demonstrate the validity and reliability of the experiment, the data set was divided into five parts and each part was tested separately. As seen in Tables 4-6, accuracies of 97.01%, 99.95%, and 100.00% were obtained, respectively. These results indicate that the Entropy-based digital mapping technique supports good classification, because it deepens the discrimination between different types of sequences. In addition, this technique better reflects the complex structure of DNA sequences. In the study, three different signal sizes were tried to determine the signal length to be given as input to the capsule network. Figure 4 shows that signal lengths between 100 and 200 units are appropriate values for the analysis of nucleic acid sequences.

Figure 4
Signal segment size - accuracy relationship

The confusion matrix is given in Figure 5 to examine the statistical success of the proposed method. Figure 5 shows the numbers of correctly and incorrectly classified sequences belonging to the normal and SARS-CoV-2 classes, which were digitized by the Entropy-based numerical technique. Only 6 segments from the 5494 SARS-CoV-2 nucleic acid sequences, and only 18 segments from the normal sequences, were misidentified.

Figure 5
Confusion matrix for classification of nucleic acid sequences of SARS-CoV-2 by capsule networks (50 units, 100 units, and 200 units respectively).

In this paper, a method based on capsule networks is presented to identify SARS coronavirus from base sequences. In the literature, there are various detection methods for coronavirus nucleic acid: isothermal amplification-based methods, microarray-based methods, PCR-based methods, and newly developed methods. Table 7 shows a comparison of our method with the current methods in the literature for the detection of SARS-CoV-2 from DNA sequences. The common feature of these methods is that they are molecular biology techniques. They require various equipment, trained analysts, high cost, and high temperatures, and they can only be performed under good laboratory conditions. Unlike these methods, our method does not require a laboratory environment, high cost, or high temperature. The newly developed methods are based on machine learning. Many studies have been presented in the literature for the detection of Covid-19 using chest X-ray images. However, these methods do not detect coronavirus from nucleic acid sequences but detect Covid-19 from such images. The proposed study differs from the other studies in the literature on the identification of SARS-CoV-2 from nucleic acid sequences. It achieved the highest classification accuracy, 100%, with capsule networks in detecting SARS-CoV-2 from nucleic acid sequences.

Table 7
A comparison of the proposed method with other methods

CONCLUSION

In this study, a method based on Capsule Networks has been proposed to identify and classify SARS-CoV-2 from nucleic acid sequences. To examine these genome sequences, the nucleic acid sequences were first digitized with the Entropy-based numerical mapping technique. Thus, the data were made suitable as input to a new neural network model, Capsule Networks. The results show that sequences infected with SARS-CoV-2 can be distinguished from normal sequences. Experimental results show that our method provides 100% accuracy in the screening of SARS-CoV-2 from nucleotide sequences. The proposed method offers a fast, inexpensive, and high-accuracy technology, in contrast to other laboratory-based and high-cost methods, for the diagnosis of Covid-19 from DNA sequences. This highlights the effectiveness of the proposed capsule network-based method. In the future, when other coronavirus-like virus types emerge, it may be possible to apply the proposed method to detect them as well.

REFERENCES

  • 1
    Segars J, Katler Q, McQueen DB, Kotlyar A, Glenn T, Knight Z, et al. Prior and novel coronaviruses, Coronavirus Disease 2019 (COVID-19), and human reproduction: what is known?. Fertil. Steril. 2020 Jun;113(6):1140-9.
  • 2
    Heer CD, Sanderson DJ, Voth LS, Alhammad YMO, Schmidt MS, Trammell SAJ, et al. Coronavirus Infection and PARP Expression Dysregulate the NAD Metabolome: An Actionable Component of Innate Immunity. J. Biol. Chem. 2020 Oct; 295(52):17986-96.
  • 3
    Calvo C, Hortelano MGL, Vicente JCC, Martínez JLV. Recommendations on The Clinical Management of the COVID-19 Infection by The «new coronavirus» SARS-CoV2. Spanish Paediatric Association Working Group. 2020 Apr; 92(4): 241.e1.
  • 4
    Zhang Y, Cao X, Wang P, Wang G, Lei G, Shou Z, et al. Emotional ‘Inflection Point’ in Public Health Emergencies with The 2019 New Coronavirus Pneumonia (NCP) in China. J. Affect. Disord. 2020 Nov; 276: 797-803.
  • 5
    Ning PY, Yu AP, Wang Y, Guo LR, Shan D, Kong M, et al. Environmental Monitoring of a Laboratory for New Coronavirus Nucleic Acid Testing. Biomed. Environ. Sci. 2020 Oct; 33(10):771-4.
  • 6
    Yesilkaya UY, Sen M, Karamustafalioglu N. New Variants and New Symptoms in COVID-19: First Episode Psychosis and Cotard’s Syndrome Two Months After Infection with the B.1.1.7 variant of coronavirus. Schizophr. Res. 2021 May; 243:315-6.
  • 7
    Al-Raeei M. The Basic Reproduction Number of The New Coronavirus Pandemic with Mortality for India, the Syrian Arab Republic, the United States, Yemen, China, France, Nigeria and Russia with Different Rate of Cases. Clinical Epidemiology and Global Health. 2021 Jan-Mar; 9:147-9.
  • 8
    Shen M, Zhou Y, YE J, Maskri AAA, Kang Y, Zeng S, et al. Recent Advances and Perspectives of Nucleic Acid Detection for Coronavirus. J. Pharm. Anal. 2020 Apr; 10(2): 97-101.
  • 9
    Balboni A, Gallina L, Palladini A, Prosperi S, Battilani M. A Real-time PCR Assay for Bat SARS-like Coronavirus Detection and its Application to Italian Greater Horseshoe Bat Faecal Sample Surveys. Sci. World J. 2012 Apr; 2012: 989514.
  • 10
    Uhlenhaut C, Cohen JI, Pavletic S, Illei G, Banacloche JCG, Asab MA, et al. Use of a Novel Virus Detection Assay to Identify Coronavirus HKU1 in the Lungs of a Hematopoietic Stem Cell Transplant Recipient with Fatal Pneumonia. Transpl. Infect. Dis. 2012 Jul; 14:79-85.
  • 11
    Notomi T, Okayama H, Masubuchi H, Yonekawa T, Watanabe K, Amino N, Hase T. Loop-mediated Isothermal Amplification of DNA. Nucleic Acids Res. 2000 Jun; 28(12): e63.
  • 12
    Enosawa M, Kageyama S, Sawai K, Watanabe K, Notomi T, Onoe S, et al. Use of Loop-Mediated Isothermal Amplification of the IS900 Sequence for Rapid Detection of Cultured Mycobacterium Avium Subsp. Paratuberculosis. J. Clin. Microbiol. 2003 Sep; 41(9):4359-65.
  • 13
    Long WH, Xiao HS, Gu XM, Zhang QH, Yang HJ, Zhao GP, et al. A Universal Microarray for Detection of SARS Coronavirus. J Virol Methods. 2004 Oct; 121(1): 57-63.
  • 14
    Chen Q, Li J, Deng ZR, Xiong W, Wang Q, Hu YQ. Comprehensive Detection and Identification of Seven Animal Coronaviruses and Human Respiratory Coronavirus 229E with a Microarray Hybridization Assay. Intervirology. 2010 Dec; 53(2):95-104.
  • 15
    Shi R, Ma W, Wu Q, Zhang B, Song Y, Guo Q, et al. Design and Application of 60mer Oligonucleotide Microarray in SARS Coronavirus Detection. Chin. Sci. Bull. 2003 Jun; 48(12):1165-9.
  • 16
    Guo X, Geng P, Wang Q, Cao B, Liu B. Development of a Single Nucleotide Polymorphism DNA Microarray for the Detection and Genotyping of the SARS Coronavirus. J. Microbiol. Biotechnol. 2014 Oct; 24(10):1445-54.
  • 17
    Yin L, Man S, Ye S, Liu G, Ma L. CRISPR-Cas Based Virus Detection: Recent Advances and Perspectives. Biosens. Bioelectron. 2021 Dec; 193:113541.
  • 18
    Sheng N, Xue-Ping M, Pang SY, Song QX, Zou BJ, Zhou GH. Research Progress of Nucleic Acid Detection Technology Platforms for New Coronavirus SARS-CoV-2. Chinese J. Anal. Chem. 2020 Oct; 48(10):1279-87.
  • 19
    Xie C, Jiang L, Huang G, Pu H, Gong B, Lin H, et al. Comparison of Different Samples for 2019 Novel Coronavirus Detection by Nucleic Acid Amplification Tests. Int. J. Infect. Dis. 2020 Apr; 93: 264-7.
  • 20
    Liu R, Liu X, Yuan L, Han H, Shereen MA, Zhen J, et al. Analysis of Adjunctive Serological Detection to Nucleic Acid Test for Severe Acute Respiratory Syndrome Coronavirus 2 (SARS-CoV-2) Infection Diagnosis. Int. Immunopharmacol. 2020 Sep; 86: 106746.
  • 21
    Geling T, Huaizheng G, Ying C, Hua H. Recurrent Positive Nucleic Acid Detection in a Recovered COVID-19 Patient: A Case Report and Literature Review. Respir. Med. Case Rep. 2020; 31: 101152.
  • 22
    Mani K, Thirumalmuthu K, Kathiresan DS, Ramalingam S, Sankaran R, Jeyaraj S. In-silico Analysis of Covid-19 Genome Sequences of Indian Origin: Impact of Mutations in Identification of SARS-Co-V2. Mol. Cell. Probes. 2021 Aug; 58:101748.
  • 23
    Arslan H, Arslan H. A New COVID-19 Detection Method from Human Genome Sequences Using CpG Island Features and KNN Classifier. Eng. Sci. Technol. Int. J. 2021 Apr; 24(4):839-47.
  • 24
    Alkady W, ElBahnasy K, Leiva V, Gad W. Classifying COVID-19 Based on Amino Acids Encoding with Machine Learning Algorithms. Chemom. Intell. Lab. Syst. 2022 May; 224:104535.
  • 25
    Cobre AF, Surek M, Stremel DP, Fachi MM, Borba HHL, et al. Diagnosis and Prognosis of COVID-19 Employing Analysis of Patients’ Plasma and Serum via LC-MS and Machine Learning. Comput. Biol. Med. 2022 Jul; 146:105659.
  • 26
    Jaroenram W, Chatnuntawech I, Kampeera J, Pengpanich S, Leaungwutiwong P, Tondee B, et al. One-step Colorimetric Isothermal Detection of COVID-19 with AI-assisted Automated Result Analysis: A Platform Model for Future Emerging Point-of-care RNA/DNA Disease Diagnosis. Talanta. 2022 Nov; 249:123375.
  • 27
    National Center for Biotechnology Information (NCBI) [Internet]. USA [updated 2021 Jul]. Available from: https://www.ncbi.nlm.nih.gov/labs/virus/vssi/#
  • 28
    Das B, Turkoglu I. A Novel Numerical Mapping Method Based on Entropy for Digitizing DNA Sequences. Neural Comput. Appl. 2018 Feb; 29(8):207-15.
  • 29
    Das B. A Deep Learning Model for Identification of Diabetes Type 2 Based on Nucleotide Signals. Neural Comput. Appl. 2022 Mar; 34:12587-99.
  • 30
    Toraman S. Preictal and Interictal Recognition for Epileptic Seizure Prediction Using Pre-trained 2DCNN Models. Traitement du Signal. 2020 Dec; 37(6):1045-54.
  • 31
    Sabour S, Frosst N, Hinton GE. Dynamic Routing Between Capsules. Proceedings of the 31st International Conference on Neural Information Processing Systems. 2017 Dec 4-9; Long Beach, CA, USA. Curran Associates Inc.; 2017. p. 3859-69.
  • 32
    Mukhometzianov R, Carrillo J. CapsNet Comparative Performance Evaluation for Image Classification. ArXiv. 2018 May; arXiv:1805.11195.
  • 33
    Toraman S, Alakus B, Turkoglu I. Convolutional Capsnet: A Novel Artificial Neural Network Approach to Detect COVID-19 Disease from X-ray Images Using Capsule Networks. Chaos, Solitons & Fractals. 2020 Nov; 140: 110122.
Editor-in-Chief: Paulo Vitor Farago
Associate Editor: Marcelo Ricardo Vicari

Publication Dates

  • Publication in this collection
    13 Feb 2023
  • Date of issue
    2023

History

  • Received
    29 Apr 2022
  • Accepted
    12 Oct 2022
Instituto de Tecnologia do Paraná - Tecpar Rua Prof. Algacyr Munhoz Mader, 3775 - CIC, 81350-010 Curitiba PR Brazil, Tel.: +55 41 3316-3052/3054, Fax: +55 41 3346-2872 - Curitiba - PR - Brazil
E-mail: babt@tecpar.br