The presenter(s) will be available for live Q&A in this session (BCC West).
Peter Cock 1, David Cooke 2, Leighton Pritchard 3
1 Information and Computational Sciences, James Hutton Institute, Invergowrie, Dundee, UK
2 Cell and Molecular Sciences, James Hutton Institute, Invergowrie, Dundee, UK
3 Strathclyde Institute of Pharmacy & Biomedical Sciences, Glasgow, UK
Repository: https://github.com/peterjc/thapbi-pict/
Documentation: https://thapbi-pict.readthedocs.io/
License: MIT
Molecular barcodes are central to environmental monitoring and identification of species present in a
sample, and use PCR primers to amplify a diagnostic genome region of the organisms of interest. We are
interested in metabarcoding where multiple samples are multiplexed for high-throughput sequencing on the
Illumina platform, using overlapping paired end reads. Each sample yields a collection of marker sequences,
and matching these to a database of known species produces a taxonomic breakdown reflecting community
composition.
THAPBI PICT is a metabarcoding tool we developed for the UK-funded Tree Health and Plant Biosecurity
Initiative (THAPBI) Phyto-Threats project, which focused on identifying Phytophthora species in
commercial tree nurseries. Phytophthora (from Greek meaning plant-destroyer) are economically important
plant pathogens in both agriculture and forestry. This project targeted an ITS1 marker (Internal
Transcribed Spacer one, a region found in eukaryotic genomes between the 18S and 5.8S rRNA genes) with
nested primers to identify Phytophthora species. By varying primer settings and using a custom database,
THAPBI PICT can be applied to other organisms and/or barcode marker sequences - making it more than
just a Phytophthora ITS1 Classification Tool (PICT).
The analysis pipeline starts from demultiplexed paired FASTQ files, as produced by the Illumina MiSeq
platform. External tools are called to quality trim the reads, merge overlapping pairs, and trim the primers.
The merged reads are then deduplicated, giving a much smaller list of unique sequences with associated read
counts, of which only those passing a minimum count threshold (intended to exclude "noise") are kept.
These are matched to a curated database using a
range of methods, producing both plain text and formatted Excel output. An edit graph in XGMML format
is also produced for display in Cytoscape and other visualisation tools.
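The deduplication and thresholding step described above can be illustrated with a minimal sketch. This is not THAPBI PICT's own code: the function names, the threshold value, and the toy sequences are hypothetical, and only the simplest possible matching method (exact sequence identity) is shown, whereas the tool offers a range of methods.

```python
from collections import Counter


def deduplicate(sequences, min_abundance=2):
    """Collapse reads into unique sequences with counts.

    Sequences seen fewer than min_abundance times are dropped as
    likely noise. The default here is an arbitrary illustrative
    value, not the tool's actual setting.
    """
    counts = Counter(sequences)
    return {seq: n for seq, n in counts.items() if n >= min_abundance}


def classify_exact(unique_counts, reference):
    """Tally read counts per species using exact matches only.

    reference maps a marker sequence to a species name (hypothetical
    stand-in for a curated database); unmatched sequences are
    reported under "Unknown".
    """
    tally = Counter()
    for seq, n in unique_counts.items():
        tally[reference.get(seq, "Unknown")] += n
    return dict(tally)


# Toy data standing in for merged, primer-trimmed reads of one sample.
reads = ["ACGT"] * 5 + ["ACGA"] * 3 + ["TTTT"]
reference = {"ACGT": "Species A"}  # hypothetical database entry

unique = deduplicate(reads, min_abundance=2)
print(unique)
print(classify_exact(unique, reference))
```

Working on unique sequences rather than individual reads means each distinct marker only needs to be classified once, with its read count carried along for the final per-sample report.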
THAPBI PICT is released as open source software under the MIT licence. It is written in Python, a free
open source language available on all major operating systems. The source code, documentation, and database
builds, including the hand-curated reference set of Phytophthora ITS1 sequences, are version controlled with
git and hosted publicly on GitHub. Continuous integration of the test suite is currently run on
both TravisCI and CircleCI. Software is released to the Python Package Index (PyPI) as standard for
the Python ecosystem, and additionally packaged for Conda via the BioConda channel. The Conda package offers
simple installation of the tool itself and all its command line dependencies on Linux or macOS. The documentation
is currently hosted on Read The Docs, updated automatically from the GitHub repository.