Abstract
The presenter(s) will be available for live Q&A in this session (BCC East).
Mark Ziemann 1, Antony Kaspi 2
1 Deakin University, Geelong, Australia. Email: m.ziemann@deakin.edu.au
2 The Walter and Eliza Hall Institute of Medical Research, Melbourne, Australia.
Project Website: http://dee2.io/
Source Code: https://github.com/markziemann/dee2
License: (example: GNU General Public License v3.0)
RNA-seq is currently the most popular method for transcriptome-wide gene expression profiling, but despite data-sharing requirements, rates of data reuse remain very low. This is because raw data processing requires high-end computing infrastructure and pipelines that demand command-line expertise. Resources such as Recount2, ARCHS4 and Digital Expression Explorer 2 (DEE2) provide easy access to some uniformly processed data through queryable web interfaces, bulk downloads and R packages.
Keeping up with the rapid pace of data deposition to the Sequence Read Archive (SRA) is proving to be a challenge. As of May 2020, there are 1.49M samples available in SRA for the nine organisms included in DEE2, and of these 0.88M are available as processed data in DEE2 (Figure 1). This makes DEE2 coverage about twice as extensive as the next largest dataset (ARCHS4). Since its original publication in 2019, DEE2 has grown from 5.3 to 8.05 trillion mapped reads.
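To put these figures in proportion, the short Python sketch below works out the share of SRA samples already processed, the remaining backlog, and the relative growth in mapped reads. It uses only the numbers quoted above. The variable names are illustrative, and treating the mapped-read totals as trillions of reads is an assumption.

```python
# Figures quoted in the abstract (May 2020 snapshot).
# Variable names are illustrative; the "trillions" unit is an assumption.
sra_samples_millions = 1.49         # SRA samples for the nine DEE2 organisms
dee2_samples_millions = 0.88        # of those, samples processed in DEE2
mapped_reads_2019_trillions = 5.3   # at original publication
mapped_reads_2020_trillions = 8.05  # as of May 2020


def ratio(part: float, whole: float) -> float:
    """Return part / whole, rejecting a non-positive denominator."""
    if whole <= 0:
        raise ValueError("whole must be positive")
    return part / whole


# Share of eligible SRA samples that DEE2 has already processed.
coverage = ratio(dee2_samples_millions, sra_samples_millions)

# Samples still waiting to be processed, in millions.
backlog_millions = sra_samples_millions - dee2_samples_millions

# Relative growth in mapped reads since the original publication.
growth = ratio(mapped_reads_2020_trillions, mapped_reads_2019_trillions) - 1

print(f"Coverage of SRA samples: {coverage:.1%}")
print(f"Unprocessed backlog: {backlog_millions:.2f}M samples")
print(f"Growth in mapped reads: {growth:.1%}")
```

On these numbers, DEE2 covers roughly 59% of the eligible SRA samples, leaving about 0.61M samples unprocessed, while the mapped-read total has grown by roughly 52%.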
In this presentation we will outline the challenges and strategies in maintaining and growing resources of this scale. In addition, we will discuss recent enhancements, including direct integration of the web interface with Degust (http://degust.erc.monash.edu/), a popular web-based tool for statistical analysis of RNA-seq data. The R package getDEE2 has been extensively updated and submitted to Bioconductor. It allows programmatic access to DEE2 datasets in the form of SummarizedExperiment objects that are compatible with many downstream analysis tools in the Bioconductor ecosystem. Together, these advances are helping DEE2 achieve its goal of making all RNA-seq data freely available to everyone.