BioRxiv, February 4: Machine learning-based analysis of genomes suggests associations between Wuhan 2019-nCoV and bat Betacoronaviruses

  • Source topic: COVID-19 Research Progress Monitoring
  • Compiled by: zhangmin
  • Published: 2020-02-05
  • Machine learning-based analysis of genomes suggests associations between Wuhan 2019-nCoV and bat Betacoronaviruses

    Gurjit S Randhawa, Maximillian P.M. Soltysiak, Hadi El Roz, Camila P.E. de Souza, Kathleen A. Hill, Lila Kari

    doi: https://doi.org/10.1101/2020.02.03.932350

    Abstract

    As of February 3, 2020, the 2019 Novel Coronavirus (2019-nCoV) spread to 27 countries with 362 deaths and more than 17000 confirmed cases. 2019-nCoV is being compared to the infamous SARS coronavirus outbreak. Between November 2002 and July 2003, SARS resulted in 8098 confirmed cases worldwide with a 9.6% death rate and 774 deaths. Mainland China alone suffered 349 deaths and 5327 confirmed cases. Though 2019-nCoV has a death rate of 2.2% as of 3 February, the more than 17000 confirmed cases in a few weeks (December 8, 2019 to February 3, 2020) are alarming. Cases are likely under-reported given the comparatively longer incubation period. Such outbreaks demand rapid elucidation and analysis of the virus genomic sequence for timely treatment plans. We classify the 2019-nCoV using MLDSP and MLDSP-GUI, alignment-free methods that use Machine Learning (ML) and Digital Signal Processing (DSP) for genome analyses. Genomic sequences were mapped into their respective genomic signals (discrete numeric series) using a two-dimensional numerical representation (Chaos Game Representation). The magnitude spectra were computed by applying Discrete Fourier Transform on the genomic signals. The Pearson Correlation Coefficient was used to calculate a pairwise distance matrix. The feature vectors were constructed from the distance matrix and used as an input to the supervised machine learning algorithms. 10-fold cross-validation was applied to compute the average classification accuracy scores. The trained classifier models were used to predict the labels of 29 2019-nCoV sequences. The classification strategy used over 5000 genomes and tested associations at taxonomic levels of realm to species. From our machine learning-based alignment-free analyses using MLDSP-GUI, we corroborate the current hypothesis of a bat origin and classify 2019-nCoV as Sarbecovirus, within Betacoronavirus.
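    The signal-processing steps named in the abstract (Chaos Game Representation, Discrete Fourier Transform magnitude spectrum, Pearson-based pairwise distance) can be sketched roughly as follows. This is a minimal, standard-library-only illustration and not the MLDSP or MLDSP-GUI implementation: the function names, the corner assignment of the nucleotides, the k-mer length, the row-wise flattening of the grid, and the (1 - r) / 2 scaling of the distance are all assumptions made for this example. Classifier training and 10-fold cross-validation are not shown.

```python
import cmath
import math

# Assumed corner assignment for the Chaos Game Representation (one common convention).
CORNERS = {"A": (0, 0), "C": (0, 1), "G": (1, 1), "T": (1, 0)}


def cgr_signal(seq, k=3):
    """Count k-mers on a 2^k x 2^k CGR grid and flatten it row by row."""
    side = 2 ** k
    grid = [[0] * side for _ in range(side)]
    seq = seq.upper()
    for start in range(len(seq) - k + 1):
        kmer = seq[start:start + k]
        if any(ch not in CORNERS for ch in kmer):
            continue  # skip k-mers containing ambiguous bases such as N
        x = y = 0
        for i, ch in enumerate(kmer):
            bx, by = CORNERS[ch]
            # Later nucleotides select coarser quadrants, as in the chaos game.
            x |= bx << i
            y |= by << i
        grid[y][x] += 1
    return [count for row in grid for count in row]


def magnitude_spectrum(signal):
    """Magnitudes of the discrete Fourier transform (naive O(n^2) version)."""
    n = len(signal)
    return [
        abs(sum(signal[t] * cmath.exp(-2j * math.pi * f * t / n) for t in range(n)))
        for f in range(n)
    ]


def pearson(a, b):
    """Pearson correlation coefficient; returns 0.0 if either input is constant."""
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    denom = math.sqrt(var_a * var_b)
    return cov / denom if denom else 0.0


def distance_matrix(sequences, k=3):
    """Pairwise distances (1 - r) / 2 between magnitude spectra (assumed scaling)."""
    spectra = [magnitude_spectrum(cgr_signal(s, k)) for s in sequences]
    return [[(1 - pearson(p, q)) / 2 for q in spectra] for p in spectra]


if __name__ == "__main__":
    # Toy strings, not real viral genomes.
    toy = ["ACGTACGTACGTACGT", "ACGTACGTACGAACGT", "GGGGCCCCGGGGCCCC"]
    for row in distance_matrix(toy, k=2):
        print([round(d, 3) for d in row])
```

    In this sketch, row i of the matrix plays the role of the feature vector for sequence i, which is what a supervised classifier would then take as input.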

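    *Note: this article is a preprint manuscript, a preliminary report that has not been peer reviewed. Its views are intended only for exchange among researchers and are not conclusive; please use it with caution.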

  • Original source: https://www.biorxiv.org/content/10.1101/2020.02.03.932350v1
Related reports
  • BioRxiv, February 9: Machine learning analysis of genomic signatures provides evidence of associations between Wuhan 2019-nCoV and bat betacoronaviruses

    • Source topic: COVID-19 Research Progress Monitoring
    • Compiled by: zhangzx
    • Published: 2020-02-10
    • Abstract. As of February 8, 2020, the 2019 Novel Coronavirus (2019-nCoV) spread to 29 countries with 725 deaths and more than 34000 confirmed cases. 2019-nCoV is being compared to the infamous SARS coronavirus, which resulted, between November 2002 and July 2003, in 8098 confirmed cases worldwide with a 9.6% death rate and 774 deaths. Though 2019-nCoV has a death rate of 2% as of 8 February, the 34963 confirmed cases in a few weeks (December 8, 2019 to February 8, 2020) are alarming, with cases...
  • BioRxiv, February 4: Preliminary identification of potential vaccine targets for 2019-nCoV based on SARS-CoV immunological studies

    • Source topic: COVID-19 Research Progress Monitoring
    • Compiled by: zhangmin
    • Published: 2020-02-05
    • Preliminary identification of potential vaccine targets for 2019-nCoV based on SARS-CoV immunological studies. Syed Faraz Ahmed, Ahmed A. Quadeer, Matthew R. McKay. doi: https://doi.org/10.1101/2020.02.03.933226. Abstract: The beginning of 2020 has seen the emergence of the 2019 novel coronavirus (2019-nCoV) outbreak. Since the first reported case in the Wuhan city of China, 2019-nCoV has spread to other cities in China as well as to multiple countries across four continents. There is an imminent need to better understand this novel virus and to develop ways to control its spread. In this study, we sought to gain insights for vaccine design against 2019-nCoV by considering the high genetic similarity between 2019-nCoV and the Severe Acute Respiratory Syndrome coronavirus (SARS-CoV), and leveraging existing immunological studies of SARS-CoV. By screening the experimentally-determined SARS-CoV-derived B cell and T cell epitopes in the immunogenic structural proteins of SARS-CoV, we identified a set of B cell and T cell epitopes derived from the spike (S) and nucleocapsid (N) proteins that map identically to 2019-nCoV proteins. As no mutation has been observed in these identified epitopes among the available 2019-nCoV sequences (as of 29 January 2020), immune targeting of these epitopes may potentially offer protection against 2019-nCoV. For the T cell epitopes, we performed a population coverage analysis of the associated MHC alleles and proposed a set of epitopes that is estimated to provide broad coverage globally, as well as in China. Our findings provide a screened set of epitopes that can help guide experimental efforts towards the development of vaccines against 2019-nCoV. *Note: this article is a preprint manuscript, a preliminary report that has not been peer reviewed. Its views are intended only for exchange among researchers and are not conclusive; please use it with caution.
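The screening idea in the abstract above, keeping only epitopes that map identically to the new virus's proteins and show no mutation across the available sequences, can be illustrated with a toy exact-match filter. This is only a sketch under stated assumptions: the function name is hypothetical, the strings are made-up placeholders rather than real epitope or protein sequences, and the preprint's actual screening procedure is not described in enough detail here to reproduce.

```python
def conserved_epitopes(epitopes, target_sequences):
    """Return epitopes that occur verbatim in every target protein sequence."""
    return [
        epitope
        for epitope in epitopes
        if all(epitope in sequence for sequence in target_sequences)
    ]


if __name__ == "__main__":
    # Placeholder strings only; not real epitopes or viral proteins.
    known_epitopes = ["KLMN", "QRST", "VWYA"]
    available_sequences = ["AAKLMNCCQRSTDD", "GGKLMNHHQRATEE"]
    print(conserved_epitopes(known_epitopes, available_sequences))
```

An exact substring test is the strictest possible criterion: a single substitution in any one sequence removes the epitope from the result.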