The presenter(s) will be available for live Q&A in this session (BCC East).
Mark Ziemann 1, Antony Kaspi 2
1 Deakin University, Geelong, Australia. Email: m.ziemann@deakin.edu.au
2 The Walter and Eliza Hall Institute of Medical Research, Melbourne, Australia.
Project Website: http://dee2.io/
Source Code: https://github.com/markziemann/dee2
License: (example: GNU General Public License v3.0)
RNA-seq is currently the most popular method for transcriptome-wide gene expression profiling,
but despite data-sharing requirements, rates of data reuse are still very low. This is due to the need
for high-end computing infrastructure and pipelines that require command-line expertise for raw
data processing. Resources such as Recount2, ARCHS4 and Digital Expression Explorer 2 (DEE2)
provide easy access to some uniformly processed data, with queryable web interfaces, bulk
downloads and R packages.
Keeping up with the rapid pace of data deposition to the Sequence Read Archive (SRA) is proving a
challenge. As of May 2020, there are 1.49M samples available in SRA for the nine organisms
included in DEE2, and of these 0.88M are available as processed data in DEE2 (Figure 1). This
makes DEE2 coverage about twice as extensive as the next largest dataset (ARCHS4). Since its original
publication in 2019, DEE2 has grown from 5.3 to 8.05 trillion mapped reads.
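As a quick check on the figures above, the short Python sketch below derives the share of SRA samples already processed in DEE2 and the fold growth in mapped reads since 2019. It uses only the numbers quoted in this abstract; the variable and function names are illustrative and not part of any DEE2 tooling.

```python
# Figures quoted in the abstract (as of May 2020).
# Sample counts are in millions; mapped reads are in trillions.
sra_samples_m = 1.49    # samples in SRA for the nine DEE2 organisms
dee2_samples_m = 0.88   # samples available as processed data in DEE2
reads_2019_t = 5.3      # mapped reads at original publication (2019)
reads_2020_t = 8.05     # mapped reads as of May 2020


def coverage_fraction(processed, available):
    """Fraction of available samples that have been processed."""
    return processed / available


def fold_growth(before, after):
    """Ratio of the later value to the earlier value."""
    return after / before


coverage = coverage_fraction(dee2_samples_m, sra_samples_m)
growth = fold_growth(reads_2019_t, reads_2020_t)

print(f"DEE2 coverage of SRA samples: {coverage:.0%}")       # 59%
print(f"Growth in mapped reads since 2019: {growth:.2f}x")   # 1.52x
```

On these numbers, roughly three in five eligible SRA samples are already in DEE2, and the mapped-read total has grown by about half since publication.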
In this presentation we will outline the challenges and strategies in maintaining and growing
resources of this scale. In addition, we will discuss recent enhancements, including direct integration
of the web interface with Degust (http://degust.erc.monash.edu/), a popular web-based tool for
statistical analysis of RNA-seq data. The R package getDEE2 has been extensively updated and
submitted to Bioconductor. It allows programmatic access to DEE2 datasets in the form of
SummarizedExperiment objects that are compatible with many downstream analysis tools in the
Bioconductor ecosystem. Together these advances are helping DEE2 to achieve the goal of making
all RNA-seq data freely available to everyone.