Motivation: The availability of flexible open source software for the analysis of gene expression raw level data has greatly facilitated the development of widely used preprocessing methods for these technologies. However, the expansion of microarray applications has exposed the limitation of existing tools. Results: We developed the oligo package to provide a more general solution that supports a wide range of applications. The package is based on the BioConductor principles of transparency, reproducibility and efficiency of development. It extends the existing tools and leverages existing code for visualization, accessing data and widely used preprocessing routines. The oligo package implements a unified paradigm for preprocessing data and interfaces with other BioConductor tools for downstream analysis. Our infrastructure is general and can be used by other BioConductor packages.
The assessment of data quality is a major concern in microarray analysis. arrayQualityMetrics is a Bioconductor package that provides a report with diagnostic plots for one or two colour microarray data. The quality metrics assess reproducibility, identify apparent outlier arrays and compute measures of signal-to-noise ratio. The tool handles most current microarray technologies and is amenable to use in automated analysis pipelines or for automatic report generation, as well as for use by individuals. The diagnosis of quality remains, in principle, a context-dependent judgement, but our tool provides powerful, automated, objective and comprehensive instruments on which to base a decision. Availability:: arrayQualityMetrics is a free and open source package, under LGPL license, available from the Bioconductor project at www.bioconductor.org. A users guide and examples are provided with the package. Some examples of HTML reports generated by arrayQualityMetrics can be found at http://www.microarray-quality.org Contact:: email@example.com Supplementary information:: Supplementary data are available at Bioinformatics online.
Motivation: When running experiments that involve multiple high density oligonucleotide arrays, it is important to remove sources of variation between arrays of non-biological origin. Normalization is a process for reducing this variation. It is common to see non-linear relations between arrays and the standard normalization provided by Affymetrix does not perform well in these situations. Results: We present three methods of performing normalization at the probe intensity level. These methods are called complete data methods because they make use of data from all arrays in an experiment to form the normalizing relation. These algorithms are compared to two methods that make use of a baseline array: a one number scaling based algorithm and a method that uses a non-linear normalizing relation by comparing the variability and bias of an expression measure. Two publicly available datasets are used to carry out the comparisons. The simplest and quickest complete data method is found to perform favorably. Availablity: Software implementing all three of the complete data normalization methods is available as part of the R package Affy, which is a part of the Bioconductor project http://www.bioconductor.org. Contact: firstname.lastname@example.org Supplementary information: Additional figures may be found at http://www.stat.berkeley.edu/similar tobolstad/normalize/ index.html.
Next-generation massively parallel DNA sequencing technologies provide ultrahigh throughput at a substantially lower unit data cost; however, the data are very short read length sequences, making de novo assembly extremely challenging. Here, we describe a novel method for de novo assembly of large genomes from short read sequences. We successfully assembled both the Asian and African human genome sequences, achieving an N50 contig size of 7.4 and 5.9 kilobases (kb) and scaffold of 446.3 and 61.9 kb, respectively. The development of this de novo short read assembly method creates new opportunities for building reference sequences and carrying out accurate analyses of unexplored genomes in a cost-effective way.
limma is an R/Bioconductor software package that provides an integrated solution for analysing data from gene expression experiments. It contains rich features for handling complex experimental designs and for information borrowing to overcome the problem of small sample sizes. Over the past decade, limma has been a popular choice for gene discovery through differential expression analyses of microarray and high-throughput PCR data. The package contains particularly strong facilities for reading, normalizing and exploring such data. Recently, the capabilities of limma have been significantly expanded in two important directions. First, the package can now perform both differential expression and differential splicing analyses of RNA sequencing (RNA-seq) data. All the downstream analysis tools previously restricted to microarray data are now available for RNA-seq as well. These capabilities allow users to analyse both RNA-seq and microarray data with very similar pipelines. Second, the package is now able to go past the traditional gene-wise expression analyses in a variety of ways, analysing expression profiles in terms of co-regulated sets of genes or in terms of higher-order expression signatures. This provides enhanced possibilities for biological interpretation of gene expression differences. This article reviews the philosophy and design of the limma package, summarizing both new and historical features, with an emphasis on recent enhancements and features that have not been previously described.
Single-stranded oligonucleotides stabilize highly fluorescent Ag nanoclusters, with emission colors tunable via DNA sequence. We utilized DNA microarrays to optimize these scaffold sequences for creating nearly spectrally pure Ag nanocluster fluorophores that are highly photostable and exhibit great buffer stability. Five different nanocluster emitters have been created with tunable emission from the blue to the near-IR and excellent photophysical properties. Ensemble and single molecule fluorescence studies show that oligonucleotide encapsulated Ag nanoclusters exhibit significantly greater photostability and higher emission rates than commonly used cyanine dyes.
We present a genotyping method for simultaneously scoring 116,204 SNPs using oligonucleotide arrays. At call rates >99%, reproducibility is >99.97% and accuracy, as measured by inheritance in trios and concordance with the HapMap Project, is >99.7%. Average intermarker distance is 23.6 kb, and 92% of the genome is within 100 kb of a SNP marker. Average heterozygosity is 0.30, with 105,511 SNPs having minor allele frequencies >5%.
The Gene Expression Omnibus (GEO, http://www.ncbi.nlm.nih.gov/geo/) is an international public repository for high-throughput microarray and next-generation sequence functional genomic data sets submitted by the research community. The resource supports archiving of raw data, processed data and metadata which are indexed, cross-linked and searchable. All data are freely available for download in a variety of formats. GEO also provides several web-based tools and strategies to assist users to query, analyse and visualize data. This article reports current status and recent database developments, including the release of GEO2R, an R-based web application that helps users analyse GEO data.
Expression Atlas (http://www.ebi.ac.uk/gxa) is a value-added database providing information about gene, protein and splice variant expression in different cell types, organism parts, developmental stages, diseases and other biological and experimental conditions. The database consists of selected high-quality microarray and RNA-sequencing experiments from ArrayExpress that have been manually curated, annotated with Experimental Factor Ontology terms and processed using standardized microarray and RNA-sequencing analysis methods. The new version of Expression Atlas introduces the concept of 'baseline' expression, i.e. gene and splice variant abundance levels in healthy or untreated conditions, such as tissues or cell types. Differential gene expression data benefit from an in-depth curation of experimental intent, resulting in biologically meaningful 'contrasts', i.e. instances of differential pairwise comparisons between two sets of biological replicates. Other novel aspects of Expression Atlas are its strict quality control of raw experimental data, up-to-date RNA-sequencing analysis methods, expression data at the level of gene sets, as well as genes and a more powerful search interface designed to maximize the biological value provided to the user.