Circular RNAs composed of exonic sequence have been described in a small number of genes. Thought to result from splicing errors, circular RNA species possess no known function. To delineate the universe of endogenous circular RNAs, we performed high-throughput sequencing (RNA-seq) of libraries prepared from ribosome-depleted RNA with or without digestion with the RNA exonuclease, RNase R. We identified >25,000 distinct RNA species in human fibroblasts that contained non-colinear exons (a "backsplice") and were reproducibly enriched by exonuclease degradation of linear RNA. These RNAs were validated as circular RNA (ecircRNA), rather than linear RNA, and were more stable than associated linear mRNAs in vivo. In some cases, the abundance of circular molecules exceeded that of associated linear mRNA by >10-fold. By conservative estimate, we identified ecircRNAs from 14.4% of actively transcribed genes in human fibroblasts. Application of this method to murine testis RNA identified 69 ecircRNAs in precisely orthologous locations to human circular RNAs. Of note, paralogous kinases HIPK2 and HIPK3 produce abundant ecircRNA from their second exon in both humans and mice. Though HIPK3 circular RNAs contain an AUG translation start, it and other ecircRNAs were not bound to ribosomes. Circular RNAs could be degraded by siRNAs and, therefore, may act as competing endogenous RNAs. Bioinformatic analysis revealed shared features of circularized exons, including long bordering introns that contained complementary ALU repeats. These data show that ecircRNAs are abundant, stable, conserved and nonrandom products of RNA splicing that could be involved in control of gene expression.
Recently, several laboratories have reported thousands of circular RNAs (circRNAs) in animals. Numerous circRNAs are highly stable and have specific spatiotemporal expression patterns. Even though a function for circRNAs is unknown, these features make circRNAs an interesting class of RNAs as possible biomarkers and for further research. We developed a database and website, "circBase," where merged and unified data sets of circRNAs and the evidence supporting their expression can be accessed, downloaded, and browsed within the genomic context. circBase also provides scripts to identify known and novel circRNAs in sequencing data. The database is freely accessible through the web server at http://www.circbase.org/.
It is now clear that there is a diversity of circular RNAs in biological systems. Circular RNAs can be produced by the direct ligation of 5' and 3' ends of linear RNAs, as intermediates in RNA processing reactions, or by "backsplicing," wherein a downstream 5' splice site (splice donor) is joined to an upstream 3' splice site (splice acceptor). Circular RNAs have unique properties including the potential for rolling circle amplification of RNA, the ability to rearrange the order of genomic information, protection from exonucleases, and constraints on RNA folding. Circular RNAs can function as templates for viroid and viral replication, as intermediates in RNA processing reactions, as regulators of transcription in cis, as snoRNAs, and as miRNA sponges. Herein, we review the breadth of circular RNAs, their biogenesis and metabolism, and their known and anticipated functions.
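The definition of backsplicing given above (a downstream splice donor joined to an upstream splice acceptor) can be written as a simple coordinate test. The following is a minimal sketch, not a method taken from the review: the function name `is_backsplice` and the use of plain genomic coordinates with a strand flag are illustrative assumptions.

```python
def is_backsplice(donor_pos: int, acceptor_pos: int, strand: str) -> bool:
    """Classify a splice junction from the genomic positions of its two ends.

    donor_pos    -- genomic coordinate of the 5' splice site (splice donor)
    acceptor_pos -- genomic coordinate of the 3' splice site (splice acceptor)
    strand       -- '+' or '-', the strand of the transcript

    In canonical (linear) splicing the donor lies upstream of the acceptor
    in transcript orientation. A backsplice is the reverse: the donor lies
    downstream of the acceptor it is joined to.
    """
    if strand == "+":
        # On the plus strand, "downstream" means a larger coordinate.
        return donor_pos > acceptor_pos
    if strand == "-":
        # On the minus strand, "downstream" means a smaller coordinate.
        return donor_pos < acceptor_pos
    raise ValueError("strand must be '+' or '-'")


# Hypothetical coordinates, for illustration only.
print(is_backsplice(donor_pos=5000, acceptor_pos=1000, strand="+"))  # backsplice
print(is_backsplice(donor_pos=1000, acceptor_pos=5000, strand="+"))  # linear junction
```

The test only captures the ordering of the two sites; it says nothing about whether a read supporting the junction is genuine.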
RNA-seq is now the technology of choice for genome-wide differential gene expression experiments, but it is not clear how many biological replicates are needed to ensure valid biological interpretation of the results or which statistical tools are best for analyzing the data. An RNA-seq experiment with 48 biological replicates in each of two conditions was performed to answer these questions and provide guidelines for experimental design. With three biological replicates, nine of the 11 tools evaluated found only 20%-40% of the significantly differentially expressed (SDE) genes identified with the full set of 42 clean replicates. This rises to >85% for the subset of SDE genes changing in expression by more than fourfold. To achieve >85% for all SDE genes regardless of fold change requires more than 20 biological replicates. The same nine tools successfully control their false discovery rate at ≲5% for all numbers of replicates, while the remaining two tools fail to control their FDR adequately, particularly for low numbers of replicates. For future RNA-seq experiments, these results suggest that at least six biological replicates should be used, rising to at least 12 when it is important to identify SDE genes for all fold changes. If fewer than 12 replicates are used, a superior combination of true positive and false positive performances makes edgeR and DESeq2 the leading tools. For higher replicate numbers, minimizing false positives is more important and DESeq marginally outperforms the other tools.
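The comparison described above, in which the genes called with a few replicates are scored against the genes called with the full replicate set, can be illustrated with plain set arithmetic. This is a minimal sketch under stated assumptions: the function name, the gene identifiers, and the exact definition of the false fraction are hypothetical and are not taken from the study.

```python
def recall_and_false_fraction(called, reference):
    """Score one tool's gene calls against a reference call set.

    called    -- genes called SDE using a small number of replicates
    reference -- genes called SDE using the full set of replicates
                 (treated here as the truth, which is an assumption)

    Returns (recall, false_fraction):
      recall         = fraction of reference genes that were recovered
      false_fraction = fraction of called genes absent from the reference
    """
    called, reference = set(called), set(reference)
    if not reference:
        raise ValueError("reference set must not be empty")
    true_positives = len(called & reference)
    recall = true_positives / len(reference)
    # With no calls at all there are no false calls to count.
    false_fraction = (len(called) - true_positives) / len(called) if called else 0.0
    return recall, false_fraction


# Hypothetical gene identifiers, for illustration only.
reference_genes = {"g1", "g2", "g3", "g4", "g5"}
subset_calls = {"g1", "g2", "x1"}
recall, false_fraction = recall_and_false_fraction(subset_calls, reference_genes)
print(f"recall={recall:.2f} false_fraction={false_fraction:.2f}")
```

Restricting both sets to genes above a fold-change threshold before calling the function would give the fold-change-specific figures of the kind quoted in the abstract.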
While the human transcriptome contains a large number of circular RNAs (circRNAs), the functions of most circRNAs remain unclear. Sequence annotation suggests that most circRNAs are generated from splicing in reversed orders across exons. However, the mechanisms of this backsplicing are largely unknown. Here we constructed a single exon minigene containing split GFP, and found that the pre-mRNA indeed produces circRNA through efficient backsplicing in human and Drosophila cells. The backsplicing is enhanced by complementary introns that form double-stranded RNA structure to bring splice sites in proximity, but such structure is not required. Moreover, backsplicing is regulated by general splicing factors and cis-elements, but with regulatory rules distinct from canonical splicing. The resulting circRNA can be translated to generate functional proteins. Unlike linear mRNA, poly-adenosine or poly-thymidine in 3' UTR can inhibit circular mRNA translation. This study revealed that backsplicing can occur efficiently in diverse eukaryotes to generate circular mRNAs.
Stable circular RNAs (circRNAs) have recently been identified as an abundant class of noncoding RNAs in Archaea, Caenorhabditis elegans, mice, and humans through high-throughput deep sequencing coupled with analysis of massive transcriptional data. CircRNAs play important roles in miRNA function and transcriptional control by acting as competing endogenous RNAs or as positive regulators of their parent coding genes. However, little is known regarding circRNAs in plants. Here, we report 2354 rice circRNAs that were identified through deep sequencing and computational analysis of ssRNA-seq data. Among them, 1356 are exonic circRNAs. Some circRNAs exhibit tissue-specific expression. Rice circRNAs have a considerable number of isoforms, including alternative backsplicing and alternative splicing circularization patterns. Parental genes with multiple exons are preferentially circularized. Only 484 circRNAs have backsplices derived from known splice sites. In addition, only 92 circRNAs were found to be enriched for miniature inverted-repeat transposable elements (MITEs) in flanking sequences or to be complementary to at least 18-bp flanking intronic sequences, indicating that there are some other production mechanisms in addition to direct backsplicing in rice. Rice circRNAs have no significant enrichment for miRNA target sites. A transgenic study showed that overexpression of a circRNA construct could reduce the expression level of its parental gene in transgenic plants compared with empty-vector control plants. This suggested that a circRNA and its linear form might act as negative regulators of the parental gene. Overall, these analyses reveal the prevalence of circRNAs in rice and provide new biological insights into rice circRNAs.
The microRNA (miRNA) "sponge" method was introduced three years ago as a means to create continuous miRNA loss of function in cell lines and transgenic organisms. Sponge RNAs contain complementary binding sites to a miRNA of interest, and are produced from transgenes within cells. As with most miRNA target genes, a sponge's binding sites are specific to the miRNA seed region, which allows them to block a whole family of related miRNAs. This transgenic approach has proven to be a useful tool to probe miRNA functions in a variety of experimental systems. Here we will discuss the ways sponge and related constructs can be optimized and review recent applications of this method with particular emphasis on stable expression in cancer studies and in transgenic animals.
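Because sponge binding sites are matched to the seed region, locating candidate sites in a sponge sequence reduces to searching for the reverse complement of the seed. The sketch below is illustrative only: it assumes the seed is nucleotides 2-8 of the miRNA, considers perfect Watson-Crick matches only, and uses made-up sequences that are not meant to represent any real miRNA or sponge construct.

```python
# Complement table for RNA bases (A-U, C-G pairing only; no G-U wobble).
_COMPLEMENT = str.maketrans("ACGU", "UGCA")


def seed_site(mirna: str, start: int = 2, end: int = 8) -> str:
    """Return the target-site sequence complementary to the miRNA seed.

    start/end are 1-based, inclusive positions from the miRNA 5' end.
    The default 2-8 window is an assumption; 2-7 is also commonly used.
    """
    seed = mirna.upper().replace("T", "U")[start - 1:end]
    # Reverse complement, so the site reads 5'->3' on the target RNA.
    return seed.translate(_COMPLEMENT)[::-1]


def find_seed_sites(target: str, mirna: str) -> list:
    """Return 0-based start positions of perfect seed matches in target."""
    site = seed_site(mirna)
    target = target.upper().replace("T", "U")
    hits = []
    i = target.find(site)
    while i != -1:
        hits.append(i)
        i = target.find(site, i + 1)  # step by one to allow overlapping hits
    return hits


# Made-up sequences, for illustration only.
mirna_a = "ACGGAUCCUAGGCAUUCGAUGC"
mirna_b = "ACGGAUCCAAAAAAAAAAAAAA"  # same seed, different 3' region
sponge = "AAAA" + seed_site(mirna_a) + "UUUU" + seed_site(mirna_a) + "AAAA"

print(find_seed_sites(sponge, mirna_a))
print(find_seed_sites(sponge, mirna_b))
```

Both made-up miRNAs return the same positions because they share a seed, which mirrors the point that a seed-matched sponge can act on a whole family of related miRNAs.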
To investigate the global expression profile of miRNAs in primary breast cancer (BC) and normal adjacent tumor tissues (NATs) and its potential relevance to clinicopathological characteristics and patient survival, the genome-wide expression profiling of miRNAs in BC was investigated using a microarray containing 435 mature human miRNA oligonucleotide probes. Nine miRNAs (hsa-miR-21, hsa-miR-365, hsa-miR-181b, hsa-let-7f, hsa-miR-155, hsa-miR-29b, hsa-miR-181d, hsa-miR-98, and hsa-miR-29c) were observed to be up-regulated greater than twofold in BC compared with NAT, whereas seven miRNAs (hsa-miR-497, hsa-miR-31, hsa-miR-355, hsa-miR-320, rno-mir-140, hsa-miR-127, and hsa-miR-30a-3p) were observed to be down-regulated greater than twofold. The most significantly up-regulated miRNA, hsa-mir-21 (miR-21), was quantitatively analyzed by TaqMan real-time PCR in 113 BC tumors. Interestingly, among the 113 BC cases, high-level expression of miR-21 was significantly correlated with advanced clinical stage (P = 0.006, Fisher's exact test), lymph node metastasis (P = 0.007, Fisher's exact test), and shortened survival of the patients (hazard ratio [HR] = 5.476, P < 0.001). Multivariate Cox regression analysis revealed this prognostic impact (HR = 4.133, P = 0.001) to be independent of disease stage (HR = 2.226, P = 0.013) and histological grade (HR = 3.681, P = 0.033). This study identified a differential miRNA expression profile in BC and revealed that miR-21 overexpression was correlated with specific breast cancer biopathologic features, such as advanced tumor stage, lymph node metastasis, and poor survival of the patients, indicating that miR-21 may serve as a molecular prognostic marker for BC and disease progression.
RNA abundance and DNA copy number are routinely measured in high-throughput using microarray and next-generation sequencing (NGS) technologies, and the attributes of different platforms have been extensively analyzed. Recently, the application of both microarrays and NGS has expanded to include microRNAs (miRNAs), but the relative performance of these methods has not been rigorously characterized. We analyzed three biological samples across six miRNA microarray platforms and compared their hybridization performance. We examined the utility of these platforms, as well as NGS, for the detection of differentially expressed miRNAs. We then validated the results for 89 miRNAs by real-time RT-PCR and challenged the use of this assay as a "gold standard." Finally, we implemented a novel method to evaluate false-positive and false-negative rates for all methods in the absence of a reference method.
Polycomb repressive complex-2 (PRC2) is a histone methyltransferase required for epigenetic silencing during development and cancer. Among chromatin modifying factors shown to be recruited and regulated by long noncoding RNAs (lncRNAs), PRC2 is one of the most studied. Mammalian PRC2 binds thousands of RNAs in vivo, and it is becoming a model system for the recruitment of chromatin modifying factors by RNA. Yet, well-defined PRC2-binding motifs within target RNAs have been elusive. From the protein side, PRC2 RNA-binding subunits contain no known RNA-binding domains, complicating functional studies. Here we provide a critical review of existing models for the recruitment of PRC2 to chromatin by RNAs. This discussion may also serve researchers who are studying the recruitment of other chromatin modifiers by lncRNAs.
Riboswitches are commonly used by bacteria to detect a variety of metabolites and ions to regulate gene expression. To date, nearly 40 different classes of riboswitches have been discovered, experimentally validated, and modeled at atomic resolution in complex with their cognate ligands. The research findings produced since the first riboswitch validation reports in 2002 reveal that these noncoding RNA domains exploit many different structural features to create binding pockets that are extremely selective for their target ligands. Some riboswitch classes are very common and are present in bacteria from nearly all lineages, whereas others are exceedingly rare and appear in only a few species whose DNA has been sequenced. Presented herein are the consensus sequences, structural models, and phylogenetic distributions for all validated riboswitch classes. Based on our findings, we predict that there are potentially many thousands of distinct bacterial riboswitch classes remaining to be discovered, but that the rarity of individual undiscovered classes will make it increasingly difficult to find additional examples of this RNA-based sensory and gene control mechanism.
A plethora of noncoding (nc) RNAs has been revealed through the application of high-throughput analysis of the transcriptome, and this has led to an intensive search for possible biological functions attributable to these transcripts. A major category of functional ncRNAs that has emerged is for those that are implicated in coordinate gene silencing, either in cis or in trans. The archetype for this class is the well-studied long ncRNA Xist which functions in cis to bring about transcriptional silencing of an entire X chromosome in female mammals. An important step in X chromosome inactivation is the recruitment of the Polycomb repressive complex PRC2 that mediates histone H3 lysine 27 methylation, a hallmark of the inactive X chromosome, and recent studies have suggested that this occurs as a consequence of PRC2 interacting directly with Xist RNA. Accordingly, other ncRNAs have been linked to PRC2 targeting either in cis or in trans, and here also the mechanism has been proposed to involve direct interaction between PRC2 proteins and the different ncRNAs. In this review, I discuss the evidence for and against this hypothesis, in the process highlighting alternative models and discussing experiments that, in the future, will help to resolve existing discrepancies.
N6-methyladenosine (m6A) is the most abundant modification in mammalian mRNA and long noncoding RNA (lncRNA). Recent discoveries of two m6A demethylases and cell-type and cell-state-dependent m6A patterns indicate that m6A modifications are highly dynamic and likely play important biological roles for RNA akin to DNA methylation or histone modification. Proposed functions for m6A modification include mRNA splicing, export, stability, and immune tolerance; but m6A studies have been hindered by the lack of methods for its identification at single nucleotide resolution. Here, we develop a method that accurately determines m6A status at any site in mRNA/lncRNA, termed site-specific cleavage and radioactive-labeling followed by ligation-assisted extraction and thin-layer chromatography (SCARLET). The method determines the precise location of the m6A residue and its modification fraction, which are crucial parameters in probing the cellular dynamics of m6A modification. We applied the method to determine the m6A status at several sites in two human lncRNAs and three human mRNAs and found that the m6A fraction varies between 6% and 80% among these sites. We also found, however, that many m6A candidate sites in these RNAs are not modified. The precise determination of m6A status in a long noncoding RNA also enables the identification of an m6A-containing RNA structural motif.
Malat1 is an abundant long noncoding RNA that localizes to nuclear bodies known as nuclear speckles, which contain a distinct set of pre-mRNA processing factors. Previous studies in cell culture have demonstrated that Malat1 interacts with pre-mRNA splicing factors, including the serine- and arginine-rich (SR) family of proteins, and regulates a variety of biological processes, including cancer cell migration, synapse formation, cell cycle progression, and responses to serum stimulation. To address the physiological function of Malat1 in a living organism, we generated Malat1-knockout (KO) mice using homologous recombination. Unexpectedly, the Malat1-KO mice were viable and fertile, showing no apparent phenotypes. Nuclear speckle markers were also correctly localized in cells that lacked Malat1. However, the cellular levels of another long noncoding RNA, Neat1, which is an architectural component of nuclear bodies known as paraspeckles, were down-regulated in a particular set of tissues and cells lacking Malat1. We propose that Malat1 is not essential in living mice maintained under normal laboratory conditions and that its function becomes apparent only in specific cell types and under particular conditions.
Competition between mammalian RNAi-related gene silencing pathways is well documented. It is therefore important to identify all classes of small RNAs to determine their relationship with RNAi and how they affect each other functionally. Here, we identify two types of 5'-phosphate, 3'-hydroxylated human tRNA-derived small RNAs (tsRNAs). tsRNAs differ from microRNAs in being essentially restricted to the cytoplasm and in associating with Argonaute proteins, but not MOV10. The first type belongs to a previously predicted Dicer-dependent class of small RNAs that we find can modestly down-regulate target genes in trans. The 5' end of type II tsRNA was generated by RNaseZ cleavage downstream from a tRNA gene, while the 3' end resulted from transcription termination by RNA polymerase III. Consistent with their preferential association with the nonslicing Argonautes 3 and 4, canonical gene silencing activity was not observed for type II tsRNAs. The addition, however, of an oligonucleotide that was sense to the reporter gene, but antisense to an overexpressed version of the type II tsRNA, triggered robust, >80% gene silencing. This correlated with the redirection of the thus reconstituted fully duplexed double-stranded RNA into Argonaute 2, whereas Argonautes 3 and 4 were skewed toward less structured small RNAs, particularly single-strand RNAs. We observed that the modulation of tsRNA levels had minor effects on the abundance of microRNAs, but more pronounced changes in the silencing activities of both microRNAs and siRNAs. These findings support that tsRNAs are involved in the global control of small RNA silencing through differential Argonaute association, suggesting that small RNA-mediated gene regulation may be even more finely regulated than previously realized.
With an increasing interest in RNA therapeutics and in targeting RNA to treat disease, there is a need for the tools used in protein-based drug design, particularly DOCKing algorithms, to be extended or adapted for nucleic acids. Here, we have compiled a test set of RNA-ligand complexes to validate the ability of the DOCK suite of programs to successfully recreate experimentally determined binding poses. With the optimized parameters and a minimal scoring function, 70% of the test set with fewer than seven rotatable ligand bonds and 26% of the test set with fewer than 13 rotatable bonds can be successfully recreated within 2 Å heavy-atom RMSD. When DOCKed conformations are rescored with the implicit solvent models AMBER generalized Born with solvent-accessible surface area (GB/SA) and Poisson-Boltzmann with solvent-accessible surface area (PB/SA) in combination with explicit water molecules and sodium counterions, the success rate increases to 80% with PB/SA for fewer than seven rotatable bonds and to 58% with AMBER GB/SA and 47% with PB/SA for fewer than 13 rotatable bonds. These results indicate that DOCK can indeed be useful for structure-based drug design aimed at RNA. Our studies also suggest that RNA-directed ligands often differ from typical protein-ligand complexes in their electrostatic properties, but these differences can be accommodated through the choice of potential function. In addition, in the course of the study, we explore a variety of newly added DOCK functions, demonstrating the ease with which new functions can be added to address new scientific questions.
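The success criterion used above, a docked pose falling within a heavy-atom RMSD cutoff of the experimental pose, can be sketched in a few lines. This is an illustration under stated assumptions rather than the DOCK implementation: both poses are assumed to list the same heavy atoms in the same order in the receptor frame (no superposition and no symmetry correction), and treating exactly 2 Å as a success is an assumption. All names and coordinates are hypothetical.

```python
import math


def heavy_atom_rmsd(pose_a, pose_b):
    """Root-mean-square deviation between two poses of the same ligand.

    Each pose is a sequence of (x, y, z) heavy-atom coordinates in
    angstroms, with atoms in matching order. No fitting is performed,
    so any rigid-body displacement of the ligand counts toward the RMSD.
    """
    if len(pose_a) != len(pose_b) or not pose_a:
        raise ValueError("poses must be non-empty and of equal length")
    squared = sum(
        (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
        for (xa, ya, za), (xb, yb, zb) in zip(pose_a, pose_b)
    )
    return math.sqrt(squared / len(pose_a))


def success_rate(rmsds, cutoff=2.0):
    """Fraction of complexes whose RMSD is at or below the cutoff."""
    if not rmsds:
        raise ValueError("need at least one RMSD value")
    return sum(r <= cutoff for r in rmsds) / len(rmsds)


# Hypothetical coordinates: the docked pose is shifted 1 angstrom along z.
crystal = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
docked = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0)]
print(heavy_atom_rmsd(crystal, docked))

# Hypothetical per-complex RMSD values for a small test set.
print(success_rate([0.5, 1.9, 3.0, 2.5]))
```

Skipping the superposition step is deliberate in this sketch: a pose is only counted as reproduced if it sits in the right place in the binding site, not merely in the right conformation.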
High-salinity, drought, and low temperature are three common environmental stress factors that seriously influence plant growth and development worldwide. Recently, microRNAs (miRNAs) have emerged as a class of gene expression regulators that have also been linked to stress responses. However, the relationship between miRNA expression and stress responses is just beginning to be explored. Here, we identified 14 stress-inducible miRNAs using microarray data in which the effects of three abiotic stresses were surveyed in Arabidopsis thaliana. Among them, 10 high-salinity-, four drought-, and 10 cold-regulated miRNAs were detected, respectively. miR168, miR171, and miR396 responded to all of the stresses. Expression profiling by RT-PCR analysis showed great cross-talk among the high-salinity, drought, and cold stress signaling pathways. The existence of stress-related elements in miRNA promoter regions provided further evidence supporting our results. These findings extend the current view about miRNA as ubiquitous regulators under stress conditions.
Proper normalization is a critical but often underappreciated aspect of quantitative gene expression analysis. This study describes the identification and characterization of appropriate reference RNA targets for the normalization of microRNA (miRNA) quantitative RT-PCR data. miRNA microarray data from dozens of normal and diseased human tissues revealed ubiquitous and stably expressed normalization candidates for evaluation by qRT-PCR. miR-191 and miR-103, among others, were found to be highly consistent in their expression across 13 normal tissues and five pairs of distinct tumor/normal adjacent tissues. These miRNAs were statistically superior to the reference RNAs most commonly used in miRNA qRT-PCR experiments, such as 5S rRNA, U6 snRNA, or total RNA. The most stable normalizers were also highly conserved across flash-frozen and formalin-fixed paraffin-embedded lung cancer tumor/NAT sample sets, resulting in the confirmation of one well-documented oncomir (let-7a), as well as the identification of novel oncomirs. These findings constitute the first report describing the rigorous normalization of miRNA qRT-PCR data and have important implications for proper experimental design and accurate data interpretation.
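To show how stable reference miRNAs such as miR-191 and miR-103 would enter a calculation, the sketch below applies the widely used delta-delta-Ct approach. It is not the procedure described in the study, which the abstract does not spell out: the averaging of reference Ct values, the assumption of perfect amplification efficiency (a doubling per cycle), the function names, and all Ct values are assumptions made for illustration.

```python
from statistics import mean


def delta_ct(target_ct, reference_cts):
    """Ct of the target minus the mean Ct of the reference RNAs.

    Averaging Ct values is equivalent to taking the geometric mean of
    the underlying reference quantities.
    """
    return target_ct - mean(reference_cts)


def relative_expression(target_ct_sample, ref_cts_sample,
                        target_ct_control, ref_cts_control):
    """Fold change of the target in sample versus control (2^-ddCt).

    Assumes every assay amplifies with 100% efficiency.
    """
    ddct = (delta_ct(target_ct_sample, ref_cts_sample)
            - delta_ct(target_ct_control, ref_cts_control))
    return 2.0 ** (-ddct)


# Hypothetical Ct values; the two references stand in for miR-191 and miR-103.
tumor_target_ct, tumor_reference_cts = 24.0, [20.0, 22.0]
nat_target_ct, nat_reference_cts = 26.0, [20.0, 22.0]

fold = relative_expression(tumor_target_ct, tumor_reference_cts,
                           nat_target_ct, nat_reference_cts)
print(fold)
```

If a reference RNA itself differs between the tumor and the adjacent tissue, that difference propagates directly into the fold change, which is why the stability of the chosen normalizers matters.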
Alternative splicing of pre-mRNAs is a major contributor to both proteomic diversity and control of gene expression levels. Splicing is tightly regulated in different tissues and developmental stages, and its disruption can lead to a wide range of human diseases. An important long-term goal in the splicing field is to determine a set of rules or "code" for splicing that will enable prediction of the splicing pattern of any primary transcript from its sequence. Outside of the core splice site motifs, the bulk of the information required for splicing is thought to be contained in exonic and intronic cis-regulatory elements that function by recruitment of sequence-specific RNA-binding protein factors that either activate or repress the use of adjacent splice sites. Here, we summarize the current state of knowledge of splicing cis-regulatory elements and their context-dependent effects on splicing, emphasizing recent global/genome-wide studies and open questions.