BCC2020 has ended
Registration closed July 15.

BCC2020 is online, global, and affordable. The meeting and training are now done, and the CoFest is under way.

The 2020 Bioinformatics Community Conference brings together the Bioinformatics Open Source Conference (BOSC) and the Galaxy Community Conference into a single event featuring training, a meeting, and a CollaborationFest. Events run from July 17 through July 25 and are held in both the eastern and western hemispheres.

Sunday, July 19 • 23:31 - 23:45
Digital Expression Explorer 2: a repository of 8 trillion uniformly processed RNA-seq reads and still counting 🍐




Abstract


The presenter(s) will be available for live Q&A in this session (BCC East).

Mark Ziemann 1, Antony Kaspi 2

1 Deakin University, Geelong, Australia. Email: m.ziemann@deakin.edu.au
2 The Walter and Eliza Hall Institute of Medical Research, Melbourne, Australia.

Project Website: http://dee2.io/
Source Code: https://github.com/markziemann/dee2
License: (example: GNU General Public License v3.0)

RNA-seq is currently the most popular method for transcriptome-wide gene expression profiling,
but despite data-sharing requirements, rates of data reuse are still very low. This is due to the need
for high-end computing infrastructure and pipelines that require command-line expertise for raw
data processing. Resources such as Recount2, ARCHS4 and Digital Expression Explorer 2 (DEE2)
provide easy access to some uniformly processed data, with queryable web interfaces, bulk
downloads and R packages.
Keeping up with the rapid pace of data deposition to the Sequence Read Archive (SRA) is proving a
challenge. As of May 2020, there are 1.49M samples available in SRA for the nine organisms
included in DEE2, and of these 0.88M are available as processed data in DEE2 (Figure 1). This
makes DEE2 coverage about twice as extensive as the next-largest dataset (ARCHS4). Since its original
publication in 2019, DEE2 has grown from 5.3 to 8.05 trillion mapped reads.
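
As a quick check on the scale described above, the short sketch below works out what share of the SRA samples has been processed and how much the mapped-read total has grown. It uses only the figures quoted in this abstract; the variable and function names are illustrative and are not part of DEE2 or its tooling.

```python
# Figures quoted in the abstract (as of May 2020).
# All names below are illustrative, not part of any DEE2 software.
sra_samples_millions = 1.49            # SRA samples for the nine DEE2 organisms
dee2_samples_millions = 0.88           # of those, available as processed data in DEE2
reads_at_publication_trillions = 5.3   # mapped reads at original publication (2019)
reads_may_2020_trillions = 8.05        # mapped reads as of May 2020


def percent(part, whole):
    """Return `part` expressed as a percentage of `whole`."""
    return 100.0 * part / whole


# Share of eligible SRA samples already processed by DEE2.
coverage_pct = percent(dee2_samples_millions, sra_samples_millions)

# Relative growth in mapped reads since publication.
growth_pct = percent(
    reads_may_2020_trillions - reads_at_publication_trillions,
    reads_at_publication_trillions,
)

print(f"DEE2 coverage of SRA samples: {coverage_pct:.1f}%")
print(f"Growth in mapped reads since publication: {growth_pct:.1f}%")
```

By these figures, DEE2 covers roughly 59% of the eligible SRA samples, and its mapped-read total has grown by roughly 52% since publication.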
In this presentation I will outline the challenges and strategies in maintaining and growing
resources of this scale. I will also discuss recent enhancements, including direct integration
of the web interface with Degust (http://degust.erc.monash.edu/), a popular web-based tool for
statistical analysis of RNA-seq data. The R package getDEE2 has been extensively updated and
submitted to Bioconductor. It allows programmatic access to DEE2 datasets in the form of
SummarizedExperiment objects that are compatible with many downstream analysis tools in the
Bioconductor ecosystem. Together, these advances are helping DEE2 to achieve the goal of making
all RNA-seq data freely available to everyone.


Speakers

Mark Ziemann

Deakin University
I am a Lecturer and researcher in computational biology at Deakin University, Australia. Our group is focused on building data resources and software tools to accelerate biomedical discovery. We collaborate closely with clinicians and biologists to get the most out...


Sunday July 19, 2020 23:31 - 23:45 EDT
BOSC