bioRxiv, March 2: Vorpal: A Novel RNA Virus Feature-Extraction Algorithm Demonstrated Through Interpretable Genotype-to-Phenotype Linear Models

  • Source topic: COVID-19 Research Updates Monitoring
  • Compiled by: zhangmin
  • Published: 2020-03-03
  • Vorpal: A Novel RNA Virus Feature-Extraction Algorithm Demonstrated Through Interpretable Genotype-to-Phenotype Linear Models

    Phillip Davis, John Bagnoli, David Yarmosh, Alan Shteyman, Lance Presser, Sharon Altmann, Shelton Bradrick, Joseph A Russell III

    doi: https://doi.org/10.1101/2020.02.28.969782

    Abstract

    In the analysis of genomic sequence data, so-called alignment-free approaches are often selected for their relative speed compared to alignment-based approaches, especially in the application of distance comparisons and taxonomic classification. These methods are typically reliant on excising K-length substrings of the input sequence, called K-mers. In the context of machine learning, K-mer-based feature vectors have been used in applications ranging from amplicon sequencing classification to predictive modeling for antimicrobial resistance genes. This can be seen as analogous to the bag-of-words model successfully employed in natural language processing and computer vision for document and image classification.

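    *Note: This is a preprint manuscript, a preliminary report that has not been peer reviewed. Its views are offered only for exchange among researchers and are not conclusive; please use it with caution.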

  • Original source: https://www.biorxiv.org/content/10.1101/2020.02.28.969782v1
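
The K-mer bag-of-words idea described in the abstract above can be illustrated with a minimal sketch. This is not Vorpal's own code: the function names `extract_kmers` and `kmer_count_vectors` and the two toy RNA sequences are hypothetical, chosen only for illustration, and the sketch uses plain exact-match counting.

```python
from collections import Counter


def extract_kmers(sequence, k):
    """Return every overlapping K-length substring of `sequence`, in order."""
    if k <= 0:
        raise ValueError("k must be positive")
    # A sequence shorter than k yields an empty list.
    return [sequence[i:i + k] for i in range(len(sequence) - k + 1)]


def kmer_count_vectors(sequences, k):
    """Build bag-of-K-mers count vectors over a shared, sorted vocabulary.

    Hypothetical helper: each sequence becomes a list of counts, one entry
    per distinct K-mer observed anywhere in the input sequences.
    """
    counts = [Counter(extract_kmers(s, k)) for s in sequences]
    vocabulary = sorted(set().union(*counts))
    vectors = [[c[kmer] for kmer in vocabulary] for c in counts]
    return vocabulary, vectors


# Two made-up sequences that differ by a single substitution.
vocabulary, vectors = kmer_count_vectors(["AUGGCUA", "AUGGCAA"], k=3)
```

In this toy case the single substitution changes two of the five 3-mers in each sequence, so the two count vectors disagree in four positions. Exact matching is therefore sensitive to even small sequence differences.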
Related reports
  • "March 2: MRIGlobal demonstrates an RNA virus feature-extraction algorithm through genotype-to-phenotype models"

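    • Source topic: COVID-19 Research Updates Monitoring
    • Compiled by: xuwenwhlib
    • Published: 2020-03-04
    • March 2: MRIGlobal demonstrates an RNA virus feature-extraction algorithm through genotype-to-phenotype models

      1. Date: March 2, 2020

      2. Institution or team: MRIGlobal, USA

      3. Summary: On March 2, the bioRxiv preprint platform published an article from the MRIGlobal research team titled "Vorpal: A Novel RNA Virus Feature-Extraction Algorithm Demonstrated Through Interpretable Genotype-to-Phenotype Linear Models".

         The article notes that alignment-free methods are often chosen in genomic sequence analysis because they are faster than alignment-based methods, particularly for distance comparisons and taxonomic classification. These methods typically rely on excising K-length substrings (K-mers) of the input sequence. In machine learning, K-mer-based feature vectors have been used in applications ranging from amplicon sequencing classification to predictive modeling for antimicrobial resistance genes. This can be seen as analogous to the bag-of-words model used in natural language processing and computer vision for document and image classification. Feature-extraction techniques from natural language processing have previously been applied to genomic data. However, because of high inter-sequence variation and the exact-match requirement of K-mers, the "bag-of-words" approach is unreliable when applied to RNA virus data.

         To reconcile the simplicity of the bag-of-words approach with the complexity that accompanies RNA virus variation, the paper designs a method that addresses the unreliability of K-mers in a way that objectively reflects the underlying biological phenomena. The resulting algorithm, Vorpal, allows the construction of interpretable linear models that take clustered K-mers as input vectors and, through regularization, output sparse predictors of binary phenotypes. The paper demonstrates Vorpal's utility by fitting nucleotide-level genomic motif predictors for binary phenotypes in three separate RNA virus clades: human pathogen vs. non-human pathogen in Orthocoronavirinae, hemorrhagic-fever-causing vs. non-hemorrhagic-fever-causing in Ebolavirus, and human host vs. non-human host in Influenza A. The code is available at https://github.com/mriglobal/vorpal.

         *Note: This is a preprint manuscript, a preliminary report that has not been peer reviewed. Its views are offered only for exchange among researchers and are not conclusive; please use it with caution.

      4. Attachment: original link https://www.biorxiv.org/content/10.1101/2020.02.28.969782v1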
  • "bioRxiv, March 20: Ancient RNA virus epidemics through the lens of recent adaptation in human genomes"

    • Source topic: COVID-19 Research Updates Monitoring
    • Compiled by: zhangmin
    • Published: 2020-03-21
    • Ancient RNA virus epidemics through the lens of recent adaptation in human genomes

      David Enard, Dmitri Petrov

      doi: https://doi.org/10.1101/2020.03.18.997346

      Abstract

      Over the course of the last several million years of evolution, humans likely have been plagued by hundreds or perhaps thousands of epidemics. Little is known about such ancient epidemics and a deep evolutionary perspective on current pathogenic threats is lacking. The study of past epidemics has typically been limited in temporal scope to recorded history, and in physical scope to pathogens that left sufficient DNA behind, such as Yersinia pestis during the Great Plague. Host genomes however offer an indirect way to detect ancient epidemics beyond the current temporal and physical limits. Arms races with pathogens have shaped the genomes of the hosts by driving a large number of adaptations at many genes, and these signals can be used to detect and further characterize ancient epidemics.

      *Note: This is a preprint manuscript, a preliminary report that has not been peer reviewed. Its views are offered only for exchange among researchers and are not conclusive; please use it with caution.