BCC2020 is online, global, and affordable. The meeting and training are now done, and the CoFest is under way.

The 2020 Bioinformatics Community Conference brings together the Bioinformatics Open Source Conference (BOSC) and the Galaxy Community Conference into a single event featuring training, a meeting, and a CollaborationFest. Events run from July 17 through July 25 and are held in both the eastern and western hemispheres.

Wednesday, July 22 • 00:31 - 00:45
Serratus: Ultra-deep search for novel coronaviruses 🍐

Rayan Chikhi 1, Kyl Wellman, Steven J. Hallam 2, Anton Korobeynikov 3, Dan Lohr, Robert
C. Edgar, Artem Babaian, Dmitry Meloshko 3, Tomer Altman, Ryan J. McLaughlin 2, Jeff
Taylor, Victor Lin, and Gherman Novakovsky 2.

# Author list randomized
1 Institut Pasteur & CNRS
2 University of British Columbia
3 Center for Algorithmic Biotechnologies, Saint Petersburg State University.

Contact: artem@rRNA.ca
Project Website: http://serratus.io
Source Code: https://github.com/ababaian/serratus
License: GPLv3


Despite intense efforts to sequence and analyze SARS-CoV-2 isolates, understanding
of the virus's provenance is limited by incomplete genomic characterization of the
Coronaviridae (CoV) family.

Serratus is an open science project for the discovery of new virus sequences by
aligning all RNA-seq, metagenomic, metatranscriptomic and environmental NGS data
in the NCBI Sequence Read Archive (SRA).

Here we report a preliminary survey of 1.14 million sequence libraries (26.78
petabases), in which we uncovered several previously unreported CoV species and
identified thousands of CoV+ libraries.

To perform this ultra-high throughput CoV search, we leveraged AWS cloud HPC with
a 22,500 vCPU cluster. Using a hyper-parallelized architecture we could bypass
conventional networking and disk IO bottlenecks to achieve a processing rate in
excess of 500,000 sequencing libraries per day, at a cost of ~$0.01 per library.
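
As a rough sanity check on the scale quoted above, the following sketch derives a few back-of-envelope quantities from the abstract's own figures. It assumes the survey ran at the quoted peak rate (stated only as a lower bound) and that the ~$0.01 per-library cost applies uniformly; the helper name `survey_estimates` is hypothetical and not part of Serratus.

```python
# Back-of-envelope figures derived only from the numbers quoted in the abstract.
LIBRARIES_SURVEYED = 1_140_000      # "1.14 million sequence libraries"
PETABASES_SURVEYED = 26.78          # "26.78 petabases"
VCPUS = 22_500                      # "22,500 vCPU cluster"
PEAK_LIBRARIES_PER_DAY = 500_000    # "in excess of 500,000 ... per day" (lower bound)
COST_PER_LIBRARY_USD = 0.01         # "~$0.01 per library" (approximate)


def survey_estimates():
    """Return rough derived quantities (illustrative helper, not Serratus code)."""
    bases_total = PETABASES_SURVEYED * 1e15  # petabases -> bases
    return {
        # Time to process the whole survey if the peak rate were sustained.
        "days_at_peak_rate": LIBRARIES_SURVEYED / PEAK_LIBRARIES_PER_DAY,
        # Total compute cost implied by the per-library cost.
        "total_cost_usd": LIBRARIES_SURVEYED * COST_PER_LIBRARY_USD,
        # Mean library size, in gigabases.
        "gigabases_per_library": bases_total / LIBRARIES_SURVEYED / 1e9,
        # Mean per-vCPU throughput at the peak rate.
        "libraries_per_vcpu_per_day": PEAK_LIBRARIES_PER_DAY / VCPUS,
    }


if __name__ == "__main__":
    for name, value in survey_estimates().items():
        print(f"{name}: {value:,.2f}")
```

Under these assumptions the survey corresponds to roughly 2.3 days at the peak rate, about $11,400 in total, a mean of about 23.5 gigabases per library, and about 22 libraries per vCPU per day. The abstract does not state the actual wall-clock time, so these are estimates only.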

We are building a 100% open data set of all viral sequences in the SRA to accelerate
the translation of these data. All notebooks, source code, and raw and processed
sequence data generated in Serratus are freely available within 24h of discovery.

Expanding the known repertoire of coronaviruses will not only help determine
the origins of this pandemic, but can also help prevent another one.

Artem Babaian

University of British Columbia

Wednesday July 22, 2020 00:31 - 00:45 EDT