The increasing volume of quantitative data generated by transcriptomics and proteomics requires integrative strategies for analysis. Here, we present an R package, clusterProfiler, that automates the process of biological-term classification and the enrichment analysis of gene clusters. The analysis module and visualization module were combined into a reusable workflow. Currently, clusterProfiler supports three species: human, mouse, and yeast. Methods provided in this package can be easily extended to other species and ontologies. The clusterProfiler package is released under the Artistic-2.0 License within the Bioconductor project. The source code and vignette are freely available at http://bioconductor.org/packages/release/bioc/html/clusterProfiler.html
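Enrichment analysis of a gene cluster is commonly framed as an over-representation test based on the hypergeometric distribution. The sketch below illustrates that calculation in Python under this assumption; it is not the package's R code, and the function name and parameter names are hypothetical.

```python
from math import comb


def enrichment_p_value(population, annotated, cluster_size, overlap):
    """Upper-tail hypergeometric probability P(X >= overlap).

    population   -- number of background genes (assumed universe)
    annotated    -- background genes carrying the biological term
    cluster_size -- number of genes in the cluster being tested
    overlap      -- cluster genes that carry the term
    """
    max_overlap = min(annotated, cluster_size)
    total = comb(population, cluster_size)
    # Sum the probability of every overlap at least as large as observed.
    favourable = sum(
        comb(annotated, i) * comb(population - annotated, cluster_size - i)
        for i in range(overlap, max_overlap + 1)
    )
    return favourable / total


if __name__ == "__main__":
    # Toy numbers, chosen only for illustration.
    print(enrichment_p_value(population=10, annotated=4, cluster_size=3, overlap=2))
```

A small p-value suggests that the term is over-represented in the cluster. In practice the p-values for many terms would also need multiple-testing correction, which this sketch omits.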
The methodologies used to generate genome and metagenome annotations are diverse and vary between groups and laboratories. Descriptions of the annotation process are helpful in interpreting genome annotation data. Some groups have produced Standard Operating Procedures (SOPs) that describe the annotation process, but standards are lacking for the structure and content of these descriptions. In addition, there is no central repository to store and disseminate procedures and protocols for genome annotation. We highlight the importance of SOPs for genome annotation and endorse an online repository of SOPs.
The SnRK2 family members are plant-specific serine/threonine kinases involved in plant response to abiotic stresses and abscisic acid (ABA)-dependent plant development. SnRK2s have been classified into three groups: group 1 comprises kinases not activated by ABA, group 2 comprises kinases not activated or activated very weakly by ABA, and group 3 comprises kinases strongly activated by ABA. So far, the ABA-dependent kinases belonging to group 3 have been studied most thoroughly. They are considered major regulators of plant response to ABA. The regulation of the plant response to ABA via SnRK2 pathways occurs by direct phosphorylation of various downstream targets, for example, SLAC1, KAT1, AtRbohF, and transcription factors required for the expression of numerous stress response genes. Members of group 2 share some cellular functions with group 3 kinases; however, their contribution to ABA-related responses is not clear. There are strong indications that they are positive regulators of plant responses to water deficit. Most probably they complement the ABA-dependent kinases in plant defense against environmental stress. So far, data concerning the physiological role of ABA-independent SnRK2s are very limited; they are expected to be studied extensively in the near future.
High-throughput sequencing technologies, also known as next-generation sequencing (NGS) technologies, have revolutionized the way that genomic research is advancing. In addition to the static genome, these state-of-the-art technologies have recently been exploited to analyze the dynamic transcriptome, and the resulting technology is termed RNA sequencing (RNA-seq). RNA-seq is free from many limitations of other transcriptomic approaches, such as microarrays and tag-based sequencing methods. Although RNA-seq has only been available for a short time, studies using this method have completely changed our perspective of the breadth and depth of eukaryotic transcriptomes. In terms of the transcriptomics of teleost fishes, both model and non-model species have benefited from the RNA-seq approach and have undergone tremendous advances in the past several years. RNA-seq has helped not only in mapping and annotating fish transcriptomes but also in our understanding of many biological processes in fish, such as development, adaptive evolution, host immune response, and stress response. In this review, we first provide an overview of each step of RNA-seq from library construction to the bioinformatic analysis of the data. We then summarize and discuss the recent biological insights obtained from the RNA-seq studies in a variety of fish species.
Traditional Chinese medicine (TCM) has been used for thousands of years to treat or prevent disease. The health care paradigm has shifted from a focus on disease to TCM therapy with a holistic approach. However, the actual value of TCM has not been fully recognized worldwide due to a lack of scientific approaches to its study. Omics technologies are now practically available and resemble TCM in many respects. They can serve as a key driving force for translating traditional Chinese medical formulae (chinmediformulae) into practice and for developing and advancing the concept of the metabolomics of chinmediformulae (chinmedomics). Chinmedomics seeks to elucidate the therapeutic and synergistic properties and metabolism of chinmediformulae and the involved metabolic pathways using modern analytical techniques. It is an integral part of top-down systems biology, which aims to improve understanding of chinmediformulae. Combining chinmedomics and chinmediformulae with modern health care systems may lead to a revolution in TCM therapy. Although the scientific study of chinmedomics is at an early stage and requires further scrutiny and validation, the approach has major implications for improving the efficacy of chinmediformulae. This article introduces and reviews the concept of chinmedomics and highlights recent examples of the approach, which are presented for description and discussion.
Postgenomics data are produced in large volumes by life sciences and clinical applications of novel omics diagnostics and therapeutics for precision medicine. To move from “data-to-knowledge-to-innovation,” a crucial missing step in the current era is, however, our limited understanding of biological and clinical contexts associated with data. Prominent among the emerging remedies to this challenge are the gene set enrichment tools. This study reports on GeneAnalytics™ (geneanalytics.genecards.org), a comprehensive and easy-to-apply gene set analysis tool for rapid contextualization of expression patterns and functional signatures embedded in the postgenomics Big Data domains, such as Next Generation Sequencing (NGS), RNAseq, and microarray experiments. GeneAnalytics' differentiating features include in-depth evidence-based scoring algorithms, an intuitive user interface, and proprietary unified data. GeneAnalytics employs LifeMap Science's GeneCards suite, including GeneCards® (the human gene database), MalaCards (the human diseases database), and PathCards (the biological pathways database). Expression-based analysis in GeneAnalytics relies on LifeMap Discovery® (the embryonic development and stem cells database), which includes manually curated expression data for normal and diseased tissues, enabling an advanced matching algorithm for gene–tissue association. This assists in evaluating differentiation protocols and discovering biomarkers for tissues and cells. Results are directly linked to gene, disease, or cell “cards” in the GeneCards suite. Future developments aim to enhance the GeneAnalytics algorithm as well as visualizations, employing varied graphical display items.
Such attributes make GeneAnalytics a broadly applicable postgenomics data analyses and interpretation tool for translation of data to knowledge-based innovation in various Big Data fields such as precision medicine, ecogenomics, nutrigenomics, pharmacogenomics, vaccinomics, and others yet to emerge on the postgenomics horizon.
Glycosylation defines the adhesive properties of animal cell surfaces and the surrounding extracellular environments. Because cells respond to stimuli by altering glycan expression, glycan structures vary according to spatial location in tissue and temporal factors. These dynamic structural expression patterns, combined with the essential roles glycans play in physiology, drive the need for analytical methods for glycoconjugates. In addition, recombinant glycoprotein drug products represent a multibillion dollar market. Effective analytical methods are needed to speed the identification of new targets and the development of industrial glycoprotein products, both new and biosimilar. Mass spectrometry is an enabling technology in glycomics. This review summarizes mass spectrometry of glycoconjugate glycans. The intent is to summarize appropriate methods for glycans given their chemical properties as distinct from those of proteins, lipids, and small molecule metabolites. Special attention is given to the uses of mass spectral profiling for glycomics with respect to the N-linked, O-linked, ganglioside, and glycosaminoglycan compound classes. Next, the uses of tandem mass spectrometry of glycans are summarized. The review finishes with an update on mass spectral glycoproteomics.
Weak acids are widely used as food preservatives (e.g., acetic, propionic, benzoic, and sorbic acids), herbicides (e.g., 2,4-dichlorophenoxyacetic acid), and as antimalarial (e.g., artesunic and artemisinic acids), anticancer (e.g., artesunic acid), and immunosuppressive (e.g., mycophenolic acid) drugs, among other possible applications. The understanding of the mechanisms underlying the adaptive response and resistance to these weak acids is a prerequisite for developing more effective strategies to control spoilage yeasts and the emergence of resistant weeds, drug-resistant parasites, or cancer cells. Furthermore, the identification of toxicity mechanisms and resistance determinants to weak acid-based pharmaceuticals increases current knowledge on their cytotoxic effects and may lead to the identification of new drug targets. This review integrates current knowledge on the mechanisms of toxicity and tolerance to weak acid stress obtained in the model eukaryote Saccharomyces cerevisiae using genome-wide approaches and more detailed gene-by-gene analysis. The major features of the yeast response to weak acids in general, and the more specific responses and resistance mechanisms towards a specific weak acid or a group of weak acids, depending on the chemical nature of the side chain R group (R-COOH), are highlighted. The involvement of several transcriptional regulatory networks in the genomic response to different weak acids is discussed, focusing on the regulatory pathways controlled by the transcription factors Msn2p/Msn4p, War1p, Haa1p, Rim101p, and Pdr1p/Pdr3p, which are known to orchestrate weak acid stress response in yeast. The extrapolation of the knowledge gathered in yeast to other eukaryotes is also attempted.
The identification of ligand-binding sites is often the starting point for protein function annotation and structure-based drug design. Many computational methods for the prediction of ligand-binding sites have been developed in recent decades. Here we present a consensus method, metaPocket, in which the predicted sites from four methods (LIGSITE(cs), PASS, Q-SiteFinder, and SURFNET) are combined to improve the prediction success rate. All these methods are evaluated on two datasets of 48 unbound/bound structures and 210 bound structures. The comparison results show that metaPocket improves the success rate from ~70% to 75% at the top 1 prediction. MetaPocket is available at http://metapocket.eml.org.
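One plausible way to combine site predictions from several methods is to group predicted pocket centers that lie close together in space and rank each group by how many methods support it. The sketch below illustrates that general idea in Python. It is not the published metaPocket algorithm: the greedy clustering rule, the 8.0 Å cutoff, the ranking criterion, and all function names are assumptions made for illustration.

```python
from math import dist


def centroid(points):
    """Arithmetic mean of a list of 3D points."""
    return tuple(sum(axis) / len(points) for axis in zip(*points))


def consensus_sites(predictions, cutoff=8.0):
    """Group predicted pocket centers and rank groups by method support.

    predictions -- dict mapping a method name to a list of (x, y, z) centers
    cutoff      -- assumed distance threshold (in angstroms) for merging sites
    Returns a list of (cluster_centroid, sorted_method_names) tuples, with the
    cluster supported by the most methods first.
    """
    clusters = []
    for method, sites in predictions.items():
        for site in sites:
            # Greedy rule: join the first cluster whose centroid is near enough.
            for cluster in clusters:
                if dist(site, centroid(cluster["points"])) <= cutoff:
                    cluster["points"].append(site)
                    cluster["methods"].add(method)
                    break
            else:
                clusters.append({"points": [site], "methods": {method}})
    # list.sort is stable, so ties keep their order of first appearance.
    clusters.sort(key=lambda c: len(c["methods"]), reverse=True)
    return [(centroid(c["points"]), sorted(c["methods"])) for c in clusters]


if __name__ == "__main__":
    # Made-up coordinates, for illustration only.
    example = {
        "LIGSITE(cs)": [(0.0, 0.0, 0.0), (20.0, 0.0, 0.0)],
        "PASS": [(1.0, 0.0, 0.0)],
        "Q-SiteFinder": [(0.0, 1.0, 0.0)],
        "SURFNET": [(50.0, 50.0, 50.0)],
    }
    for center, methods in consensus_sites(example):
        print(center, methods)
```

Ranking by the number of agreeing methods reflects the intuition behind a consensus predictor: a site found independently by several approaches is more likely to be a true binding site than one reported by a single method.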
The glycome is defined as the glycan repertoire of cells, tissues, and organisms, as found under specified conditions. The vastly diverse glycome is generated by non-template-driven biosynthesis, which is indirectly encoded in the genome and is highly dynamic. Due to this overwhelming diversity, glycomic analysis must be approached at different hierarchical levels of complexity. In this review, five such levels of complexity and the experimental approaches used for analysis at each level are discussed for a subclass of the glycome: the sialome. The sialome, in analogy to the canopy of a forest, covers the cell membrane with a diverse array of complex sialylated structures. Sialome complexity includes modification of the sialic acid core structure (the leaves and flowers), the linkage to the underlying sugar (the stems), the identity and arrangement of the underlying glycans (the branches), the structural attributes of the underlying glycans (the trees), and finally, the spatial organization of the sialoglycans in relation to components of the intact cell surface (the forest). Understanding the full complexity of the sialome thus requires combined analyses at multiple levels; that is, the sialome is far more than the sum of its parts.
Acute exacerbations of chronic obstructive pulmonary disease (COPD) are a major source of morbidity and contribute significantly to healthcare costs. Although bacterial infections are implicated in nearly 50% of exacerbations, only a handful of pathogens have been consistently identified in COPD airways, primarily by culture-based methods, and the bacterial microbiota in acute exacerbations remains largely uncharacterized. The aim of this study was to comprehensively profile airway bacterial communities using a culture-independent microarray, the 16S rRNA PhyloChip, of a cohort of COPD patients requiring ventilatory support and antibiotic therapy for exacerbation-related respiratory failure. PhyloChip analysis revealed the presence of over 1,200 bacterial taxa representing 140 distinct families, many previously undetected in airway diseases; bacterial community composition was strongly influenced by the duration of intubation. A core community of 75 taxa was detected in all patients, many of which are known pathogens. Bacterial community diversity in COPD airways is substantially greater than previously recognized and includes a number of potential pathogens detected in the setting of antibiotic exposure. Comprehensive assessment of the COPD airway microbiota using high-throughput, culture-independent methods may prove key to understanding the relationships between airway bacterial colonization, acute exacerbation, and clinical outcomes in this and other chronic inflammatory airway diseases.
Neurodegenerative diseases such as Alzheimer's disease (AD), Parkinson's disease (PD), and amyotrophic lateral sclerosis (ALS) lack robust diagnostic and prognostic biomarkers. Metabolomics is a postgenomics field that offers fresh insights for biomarkers of common complex as well as rare diseases. Using data on metabolite-disease associations published in the previous decade (2006–2016) in PubMed, ScienceDirect, Scopus, and Web of Science, we identified 101 metabolites as putative biomarkers for these three neurodegenerative diseases. Notably, uric acid, choline, creatine, L-glutamine, alanine, creatinine, and N-acetyl-L-aspartate were the shared metabolite signatures among the three diseases. The disease-metabolite-pathway associations pointed to the importance of membrane transport (through ATP binding cassette transporters), particularly of the amino acids arginine and proline, in all three neurodegenerative diseases. When disease-specific and common metabolic pathways were queried using pathway enrichment analyses, we found that alanine, aspartate, glutamate, and purine metabolism might act as alternative pathways to overcome inadequate glucose supply and energy crisis in neurodegeneration. These observations underscore the importance of metabolite-based biomarker research in deciphering the elusive pathophysiology of neurodegenerative diseases. Future research investments in metabolomics of complex diseases might provide new insights on AD, PD, and ALS, which continue to place a significant burden on global health.
Driverless cars with artificial intelligence (AI) and automated supermarkets run by collaborative robots (cobots) working without human supervision have sparked off new debates: what will be the impacts of extreme automation, turbocharged by the Internet of Things (IoT), AI, and the Industry 4.0, on Big Data and omics implementation science? The IoT builds on (1) broadband wireless internet connectivity, (2) miniaturized sensors embedded in animate and inanimate objects ranging from the house cat to the milk carton in your smart fridge, and (3) AI and cobots making sense of Big Data collected by sensors. Industry 4.0 is a high-tech strategy for manufacturing automation that employs the IoT, thus creating the Smart Factory. Extreme automation until "everything is connected to everything else" poses, however, vulnerabilities that have been little considered to date. First, highly integrated systems are vulnerable to systemic risks such as total network collapse in the event of failure of one of its parts, for example, by hacking or Internet viruses that can fully invade integrated systems. Second, extreme connectivity creates new social and political power structures. If left unchecked, they might lead to authoritarian governance by one person in total control of network power, directly or through her/his connected surrogates. We propose Industry 5.0 that can democratize knowledge coproduction from Big Data, building on the new concept of symmetrical innovation. Industry 5.0 utilizes IoT, but differs from predecessor automation systems by having three-dimensional (3D) symmetry in innovation ecosystem design: (1) a built-in safe exit strategy in case of demise of hyperconnected entrenched digital knowledge networks. 
Importantly, such safe exits are orthogonal in that they allow "digital detox" by employing pathways unrelated/unaffected by automated networks, for example, electronic patient records versus material/article trails on vital medical information; (2) equal emphasis on both acceleration and deceleration of innovation if diminishing returns become apparent; and (3) next generation social science and humanities (SSH) research for global governance of emerging technologies: "Post-ELSI Technology Evaluation Research" (PETER). Importantly, PETER considers the technology opportunity costs, ethics, ethics-of-ethics, framings (epistemology), independence, and reflexivity of SSH research in technology policymaking. Industry 5.0 is poised to harness extreme automation and Big Data with safety, innovative technology policy, and responsible implementation science, enabled by 3D symmetry in innovation ecosystem design.
Biological psychiatry research has long focused on the brain in elucidating the neurobiological mechanisms of anxiety- and trauma-related disorders. This review challenges this assumption and suggests that the gut microbiome and its interactome also deserve attention to understand brain disorders and develop innovative treatments and diagnostics in the 21st century. The recent, in-depth characterization of the human microbiome spurred a paradigm shift in human health and disease. Animal models strongly suggest a role for the gut microbiome in anxiety- and trauma-related disorders. The microbiota–gut–brain (MGB) axis sits at the epicenter of this new approach to mental health. The microbiome plays an important role in the programming of the hypothalamic–pituitary–adrenal (HPA) axis early in life, and stress reactivity over the life span. In this review, we highlight emerging findings of microbiome research in psychiatric disorders, focusing on anxiety- and trauma-related disorders specifically, and discuss the gut microbiome as a potential therapeutic target. 16S rRNA sequencing has enabled researchers to investigate and compare microbial composition between individuals. The functional microbiome can be studied using methods involving metagenomics, metatranscriptomics, metaproteomics, and metabolomics, as discussed in the present review. Other factors that shape the gut microbiome should be considered to obtain a holistic view of the factors at play in the complex interactome linked to the MGB. In all, we underscore the importance of microbiome science, and gut microbiota in particular, as emerging critical players in mental illness and maintenance of mental health. This new frontier of biological psychiatry and postgenomic medicine should be embraced by the mental health community as it plays an ever-increasing transformative role in integrative and holistic health research in the next decade.
Machine learning (ML) is being ubiquitously incorporated into everyday products such as Internet search, email spam filters, product recommendations, image classification, and speech recognition. New approaches for highly integrated manufacturing and automation such as the Industry 4.0 and the Internet of things are also converging with ML methodologies. Many approaches incorporate complex artificial neural network architectures and are collectively referred to as deep learning (DL) applications. These methods have been shown capable of representing and learning predictable relationships in many diverse forms of data and hold promise for transforming the future of omics research and applications in precision medicine. Omics and electronic health record data pose considerable challenges for DL. This is due to many factors such as low signal to noise, analytical variance, and complex data integration requirements. However, DL models have already been shown capable of both improving the ease of data encoding and predictive model performance over alternative approaches. It may not be surprising that concepts encountered in DL share similarities with those observed in biological message relay systems such as gene, protein, and metabolite networks. This expert review examines the challenges and opportunities for DL at a systems and biological scale for a precision medicine readership.
Omics is a form of high-throughput systems science. However, taxonomies for omics studies are limited, inviting us to rethink new ways in which we classify, prioritize, and rank various omics systems science studies. In this overarching context, the genome-wide study approaches have proliferated in number and popularity over the past decade. However, their hierarchy is not well organized and the development of attendant terminology is not controlled. In the present study, we searched the literature in PubMed and the Web of Science databases published from March 1999 to September 2016 using the keywords, including genome-wide, association, whole genome, transcriptome-wide, metabolome, epigenome, and phenome. We identified the whole genome study approaches and sorted them according to the omics technology types (genomics, proteomics, and so on) and hierarchy. Thirty-four studies from over 90 publications were sorted into 10 omics groups: DNA level, transcriptomics, proteomics, interactomics, metabolomics, epigenomics, miRNomics/ncRNomics, phenomics, environmental omics, and pharmacogenomics. We suggest here modifications of terminology for study approaches, which share the same acronyms such as EWAS for epigenome-wide association and environment-wide association studies, and MWAS for methylome-wide association and metabolome-wide association studies. Taken together, our study presented here provides the first systematic review and analyses of whole genome approaches and presents a baseline for further controlled terminology development, with a view to a new taxonomy for omics and multi-omics studies in the future. Finally, we call for greater dialogue and collaboration across diverse omics knowledge domains and applications, for example, across plants, animals, clinical medicine, and ecology.
Mass spectrometry is an analytical technique for the characterization of biological samples and is increasingly used in omics studies because of its targeted, nontargeted, and high throughput abilities. However, due to the large datasets generated, it requires informatics approaches such as machine learning techniques to analyze and interpret relevant data. Machine learning can be applied to MS-derived proteomics data in two ways. First, directly to mass spectral peaks and second, to proteins identified by sequence database searching, although relative protein quantification is required for the latter. Machine learning has been applied to mass spectrometry data from different biological disciplines, particularly for various cancers. The aims of such investigations have been to identify biomarkers and to aid in diagnosis, prognosis, and treatment of specific diseases. This review describes how machine learning has been applied to proteomics tandem mass spectrometry data. This includes how it can be used to identify proteins suitable for use as biomarkers of disease and for classification of samples into disease or treatment groups, which may be applicable for diagnostics. It also includes the challenges faced by such investigations, such as prediction of proteins present, protein quantification, planning for the use of machine learning, and small sample sizes.
The Human Microbiome Project (HMP) is a global initiative undertaken to identify and characterize the collection of human-associated microorganisms at multiple anatomic sites (skin, mouth, nose, colon, vagina), and to determine how intra-individual and inter-individual alterations in the microbiome influence human health, immunity, and different disease states. In this review article, we summarize the key findings and applications of the HMP that may impact pharmacology and personalized therapeutics. We propose a microbiome cloud model, reflecting the temporal and spatial uncertainty of defining an individual's microbiome composition, with examples of how intra-individual variations (such as age and mode of delivery) shape the microbiome structure. Additionally, we discuss how this microbiome cloud concept explains the difficulty of defining a core human microbiome and of classifying individuals according to their biome types. Detailed examples are presented on microbiome changes related to colorectal cancer, antibiotic administration, and pharmacomicrobiomics, or drug–microbiome interactions, highlighting how an improved understanding of the human microbiome, and alterations thereof, may lead to the development of novel therapeutic agents, the modification of antibiotic policies and implementation, and improved health outcomes. Finally, the prospects of a collaborative computational microbiome research initiative in Africa are discussed.
Triple negative breast cancer (TNBC) represents approximately 15% of breast cancers and is characterized by lack of expression of both estrogen receptor (ER) and progesterone receptor (PR), together with absence of human epidermal growth factor receptor 2 (HER2). TNBC has attracted considerable attention due to its aggressive features, such as large tumor size, high proliferation rate, and metastasis. The absence of clinically efficient molecular targets is of great concern in the treatment of patients with TNBC. In light of the complexity of TNBC, we applied a systematic and integrative transcriptomics and interactomics approach utilizing transcriptional regulatory and protein–protein interaction networks to discover putative transcriptional control mechanisms of TNBC. To this end, we identified TNBC-driven molecular pathways such as the Janus kinase-signal transducer and activator of transcription (JAK-STAT) and tumor necrosis factor (TNF) signaling pathways. The multi-omics molecular target and biomarker discovery approach presented here can offer ways forward on novel diagnostics and potentially help to design personalized therapeutics for TNBC in the future.
Recent advancements in mass spectrometric proteomics show promise for using saliva to explore biomarkers for diagnostic purposes. However, the issues of specificity or redundancy of disease-associated salivary biomarkers have not been described. This systematic review therefore aimed to define and summarize disease-related salivary biomarkers identified by mass spectrometry proteomics. Peer-reviewed articles published through July 2009 within three databases were reviewed. Out of 243 articles, 21 studies were selected for this systematic review, with conditions including Sjögren's syndrome, squamous cell carcinoma, dental caries, diabetes, breast cancer, periodontitis, gastric cancer, systemic sclerosis, oral lichen planus, bleeding oral cavity, and graft-versus-host disease. The sample size ranged from 3 to 41 in both diseased and control subjects, with no consensus on sample collection protocol. One hundred eighty biomarkers were identified in total: 87 upregulated, 63 downregulated, and 30 varying based on disease. Except for Sjögren's syndrome, the majority of studies of the same disease produced inconsistent biomarkers. Larger sample sizes and standardization of sample collection/treatment protocols may improve future studies.