Sunday, July 19 • 23:31 - 23:45
Digital Expression Explorer 2: a repository of 8 trillion uniformly processed RNA-seq reads and still counting

Mark Ziemann 1, Antony Kaspi 2

1 Deakin University, Geelong, Australia. Email: m.ziemann@deakin.edu.au
2 The Walter and Eliza Hall Institute of Medical Research, Melbourne, Australia.

Project Website: http://dee2.io/
Source Code: https://github.com/markziemann/dee2
RNA-seq is currently the most popular method for transcriptome-wide gene expression profiling,
but despite data-sharing requirements, rates of data reuse are still very low. This is due to the need
for high-end computing infrastructure and pipelines that require command-line expertise for raw
data processing. Resources such as recount2, ARCHS4 and Digital Expression Explorer 2 (DEE2)
provide easy access to some uniformly processed data, with queryable web interfaces, bulk
downloads and R packages.
Keeping up with the rapid pace of data deposition to the Sequence Read Archive (SRA) is proving a
challenge. As of May 2020, there are 1.49M samples available in SRA for the nine organisms
included in DEE2, and of these 0.88M are available as processed data in DEE2 (Figure 1). This
makes DEE2 coverage about twice as extensive as the next-largest dataset (ARCHS4). Since its original
publication in 2019, DEE2 has grown from 5.3 to 8.05 trillion mapped reads.
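
To put the figures above in proportion, here is a minimal Python sketch of the arithmetic. It uses only the numbers quoted in this abstract; the variable names are illustrative and are not part of DEE2 or its tooling.

```python
# Figures quoted in the abstract (as of May 2020).
sra_samples_m = 1.49   # samples in SRA for the nine DEE2 organisms, in millions
dee2_samples_m = 0.88  # of those, available as processed data in DEE2, in millions
reads_2019_t = 5.3     # mapped reads at original publication (2019), in trillions
reads_2020_t = 8.05    # mapped reads as of this abstract, in trillions

# Share of eligible SRA samples already processed by DEE2.
coverage = dee2_samples_m / sra_samples_m

# Samples still waiting to be processed, in millions.
backlog_m = sra_samples_m - dee2_samples_m

# Relative growth in mapped reads since publication.
growth = reads_2020_t / reads_2019_t - 1

print(f"Coverage of eligible SRA samples: {coverage:.1%}")
print(f"Unprocessed backlog: {backlog_m:.2f}M samples")
print(f"Growth in mapped reads since 2019: {growth:.1%}")
```

On these numbers, roughly 59% of eligible samples are processed, about 0.61M remain in the backlog, and mapped reads have grown by about 52% since publication.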
In this presentation I will outline the challenges and strategies in maintaining and growing
resources of this scale. In addition, I will discuss recent enhancements, including direct integration
of the web interface with Degust (http://degust.erc.monash.edu/), a popular web-based tool for
statistical analysis of RNA-seq data. The R package getDEE2 has been extensively updated and
submitted to Bioconductor. It allows programmatic access to DEE2 datasets in the form of
SummarizedExperiment objects that are compatible with many downstream analysis tools in the
Bioconductor ecosystem. Together these advances are helping DEE2 to achieve the goal of making
all RNA-seq data freely available to everyone.

Hi there 👋 I am a Lecturer and researcher in computational biology at Deakin University, Australia. Our group is focused on building data resources and software tools to accelerate biomedical discovery. We collaborate closely with clinicians and biologists to get the most out...

Sunday July 19, 2020 23:31 - 23:45 EDT