Abstract
The presenter(s) will be available for live Q&A in this session (BCC West).
Sam Kovaka 1, Yunfan Fan 2, Bohan Ni 1, Winston Timp 2, Michael C. Schatz 1,3,4
Email: skovaka1@jhu.edu
1 Department of Computer Science, Johns Hopkins University, Baltimore, MD
2 Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD
3 Department of Biology, Johns Hopkins University, Baltimore, MD
4 Cold Spring Harbor Laboratory, Cold Spring Harbor, NY
Project Source Code:
https://github.com/skovaka/UNCALLED
License: MIT License
ReadUntil sequencing allows nanopore devices to selectively stop sequencing an individual read in real time by ejecting it from the pore and immediately switching to another read. If reads could be rapidly mapped to large references while being sequenced, this would enable targeted sequencing of specific genomic regions or even specific genomes. However, most mapping methods require basecalling, which is computationally intensive and requires a significant portion of the read to be sequenced first.
Here we present UNCALLED (Utility for Nanopore Current ALignment to Large Expanses of DNA), an open-source mapper that rapidly matches raw streaming nanopore current signals to a large DNA reference without basecalling. This is accomplished by probabilistically considering all possible k-mers that the signal could represent, and then pruning the possibilities based on the reference genome sequence encoded using an FM-index. Importantly, UNCALLED dynamically adjusts the signal level model probability cutoffs during alignment to achieve both high accuracy and high speed when aligning the noisy signal data.
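The k-mer pruning idea described above can be illustrated with a small Python sketch. This is a toy under stated assumptions, not UNCALLED's implementation: the 2-mer current levels in `MODEL`, the noise width `sigma`, and the fixed cutoff `min_logp` are all invented for illustration (the real tool uses a pore model and adjusts its cutoffs dynamically), and the function names are hypothetical. Each signal event keeps every k-mer whose Gaussian log-probability passes the cutoff, and a candidate survives only if extending it still matches somewhere in the FM-indexed reference.

```python
import math
from itertools import product

def build_fm_index(text):
    """Naive FM-index: suffix array, BWT, and count of smaller characters."""
    text += "$"
    sa = sorted(range(len(text)), key=lambda i: text[i:])
    bwt = "".join(text[i - 1] for i in sa)
    first = {ch: sum(1 for c in text if c < ch) for ch in set(text)}
    return sa, bwt, first

def extend_left(bwt, first, ch, lo, hi):
    """One backward-search step on the half-open suffix-array range [lo, hi)."""
    if ch not in first:
        return 0, 0
    return first[ch] + bwt[:lo].count(ch), first[ch] + bwt[:hi].count(ch)

def log_prob(level, mean, sigma):
    """Gaussian log-density of an observed current level for one k-mer."""
    z = (level - mean) / sigma
    return -0.5 * z * z - math.log(sigma * math.sqrt(2 * math.pi))

# Hypothetical 2-mer level model: AA=60, AC=65, ..., TT=135 (illustrative only).
MODEL = {"".join(p): 60.0 + 5.0 * i
         for i, p in enumerate(product("ACGT", repeat=2))}

def map_events(events, fm, model=MODEL, min_logp=-6.0, sigma=1.5):
    """Return reference start positions consistent with the event levels."""
    sa, bwt, first = fm
    paths = None  # leftmost k-mer -> list of FM-index ranges
    for level in reversed(events):  # backward search grows matches leftward
        likely = [k for k, mu in model.items()
                  if log_prob(level, mu, sigma) >= min_logp]
        new_paths = {}
        for kmer in likely:
            if paths is None:
                # Last event: look up the whole k-mer from scratch.
                lo, hi = 0, len(bwt)
                for ch in reversed(kmer):
                    lo, hi = extend_left(bwt, first, ch, lo, hi)
                if lo < hi:
                    new_paths[kmer] = [(lo, hi)]
                continue
            for nxt, ranges in paths.items():
                if kmer[1:] != nxt[:-1]:  # consecutive k-mers must overlap
                    continue
                for lo, hi in ranges:
                    lo2, hi2 = extend_left(bwt, first, kmer[0], lo, hi)
                    if lo2 < hi2:  # prune paths absent from the reference
                        new_paths.setdefault(kmer, []).append((lo2, hi2))
        paths = new_paths
        if not paths:
            return []
    return sorted({sa[i] for ranges in paths.values()
                   for lo, hi in ranges for i in range(lo, hi)})

fm = build_fm_index("GATTACAGGCTTACGA")
noisy_events = [134.2, 121.1, 64.5, 90.8]  # roughly TT, TA, AC, CG
print(map_events(noisy_events[:3], fm))    # [2, 10]: "TTAC" occurs twice
print(map_events(noisy_events, fm))        # [10]: one more event disambiguates
```

With the loose cutoff used here, each event admits two candidate 2-mers, yet only the path that is spelled out in the reference survives. The example also shows why mapping can stop early: once the surviving ranges collapse to a single location, no further signal is needed to make a keep-or-eject decision.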
We used UNCALLED to deplete the sequencing of known bacterial genomes within a Zymo mock microbial community, enriching the remaining yeast sequence from ~20x coverage to ~100x. We also used UNCALLED to enrich for 148 human genes associated with hereditary cancers to 29.6x coverage (a 5.6-fold increase) using a single MinION flowcell, enabling accurate detection of SNPs, indels, structural variants (SVs), and methylation in these genes. Notably, this detected twice as many SVs as 50x-coverage Illumina sequencing, as verified by whole-genome nanopore and PacBio HiFi sequencing. Finally, we show that UNCALLED could be used to enrich larger gene panels, such as all 717 genes in the COSMIC Census, or be used with cDNA/RNA sequencing, for example to deplete high-abundance transcripts.