BCC2020 has ended
Registration closed July 15.

BCC2020 is online, global, and affordable. The meeting and training are now done, and the CoFest is under way.

The 2020 Bioinformatics Community Conference brings together the Bioinformatics Open Source Conference (BOSC) and the Galaxy Community Conference into a single event featuring training, a meeting, and a CollaborationFest. Events run from July 17 through July 25 and are held in both the eastern and western hemispheres.

Thursday, July 16
 

11:00 EDT

Pre-BCC Open House!
The conference is using the Remo.co platform as our venue. Remo offers an experience that more closely mirrors an in-person event. It has great small group and presentation support, including posters and demos. It's also more fun than most online conference platforms. Remo is also not familiar to most BCC participants.

So we are having two open houses, one in each hemisphere, the day before BCC training starts. These walk-throughs will introduce participants (yes, everyone at BCC is a participant) to Remo's features and how to navigate between sessions, poster/demos, BoFs, training and everything else.

All registered participants will receive invites the day before the open houses.

We are looking forward to showing you the BCC venue. (But, please bring your own snacks.)

Thursday July 16, 2020 11:00 - 12:00 EDT
West
 
Friday, July 17
 

03:00 EDT

Pre-BCC Open House!
The conference is using the Remo.co platform as our venue. Remo offers an experience that more closely mirrors an in-person event. It has great small group and presentation support, including posters and demos. It's also more fun than most online conference platforms. Remo is also not familiar to most BCC participants.

So we are having two open houses, one in each hemisphere, the day before BCC training starts. These walk-throughs will introduce participants (yes, everyone at BCC is a participant) to Remo's features and how to navigate between sessions, poster/demos, BoFs, training and everything else.

All registered participants will receive invites the day before the open houses.

We are looking forward to showing you the BCC venue. (But, please bring your own snacks.)

Friday July 17, 2020 03:00 - 04:00 EDT
East

09:00 EDT

West Training 1
BCC includes 12 different sessions in two hemispheres, covering a wealth of topics.

This is the first training session of BCC2020 and features these topics:


Want to participate? Register early and save 50%.

Friday July 17, 2020 09:00 - 11:30 EDT
Joint

09:01 EDT

Adding Fun(ction) to Microbiome Analysis: Metatranscriptomics and Metaproteomics Workflows in Galaxy
 Schedule, Chat, Slides, GTN, Video

Functional microbiome analysis, which estimates the functional groups expressed by a microbial community, enables researchers to look beyond taxonomic composition and correlation with the condition under study. The tutorial will introduce attendees to microbial community RNA-Seq data analysis using the ASaiM metatranscriptomics workflow (https://training.galaxyproject.org/training-material/topics/metagenomics/tutorials/metatranscriptomics-short/tutorial.html).

We will also present metaproteomics workflows (https://training.galaxyproject.org/training-material/topics/proteomics/tutorials/metaproteomics/tutorial.html) for characterizing microbial proteins using mass spectrometry data. Attendees will be able to run workflows on small test datasets. Lastly, workshop trainers will update attendees on the latest developments in Galaxy tools and workflows for functional microbiome and multi-omics analysis.
  • Introduction to functional microbiome analysis
  • Metatranscriptomics using Galaxy framework
  • Metaproteomic analysis using Galaxy

Prerequisites
  • A laptop with a modern web browser.

Speakers

Timothy J. Griffin

Professor, University of Minnesota

James Johnson

Senior Software Developer, Minnesota Supercomputing Institute, University of Minnesota
Galaxy for genomics and proteomics

Saskia Hiltemann

Erasmus MC
Metagenomics, Training materials, board games, CTF & security

Pratik Jagtap

Research Assistant Professor, University of Minnesota
Metaproteomics . DIA . Proteogenomics

Subina Mehta

Researcher, University of Minnesota


Friday July 17, 2020 09:01 - 11:30 EDT
Training E

09:01 EDT

Getting started in Git using GitHub Desktop
 Schedule, Chat, Video

Git doesn't need to be tricky, and you don't need to use a terminal to do it. In a 2.5 hour session, we will talk over the basics of version control covering:
  • why version control is useful,
  • how to create your first git repository,
  • the basics of markdown,
  • what a pull request is,
  • and why open source is important in science.
Instead of focusing on code in a specific programming language, we will focus on a common neutral ground, Markdown, which will also give participants the ability to create their own personal or lab website on GitHub Pages.

Prerequisites
  • A laptop capable of running GitHub Desktop (e.g. a Linux, Mac, or Windows laptop, but not a Chromebook or tablet).

Speakers

Yo Yehudi

Software Developer, University of Cambridge & Open Life Science
Integrated genomic data (InterMine)


Friday July 17, 2020 09:01 - 11:30 EDT
Training A

09:01 EDT

How to use Reactome data, tools and web services
Slides, Tutorial, Video

Reactome stakeholders span the informatics, clinical and basic research communities, and present us with a broad set of user requirements, from casual browsing of online pathway information to network analysis and modeling. During the BCC2020 training session, we will introduce the Reactome graph database, web site, web services, Docker image, and downloadable data sets. We will demonstrate how Reactome is useful to bioinformaticians and data integrators who are interested in finding, organizing, and utilizing biological information to support data visualization, integration and analysis. We will address the following:
  • Different use cases for using the web portal (analysis tool, curated content, content service, download files). 
  •  What data/bioinformatics questions Reactome can help answer.
  • How to use Reactome’s Content Service and Analysis Service web interfaces and APIs.
  • How to do basic queries using Reactome’s Graph Database (Neo4J and Cypher).


Prerequisites
  •  A wi-fi enabled laptop with a modern web browser.
  •  Basic knowledge of how to navigate a system and run commands from the command line (curl, grep, etc…)
  • A robust text editor and web browser.
  • Optional: A laptop capable of running Docker (installation instructions).


Friday July 17, 2020 09:01 - 11:30 EDT
Training B

09:01 EDT

Introduction to Galaxy Administration I
 Schedule, Chat, byobo Cheatsheet, Video

This session is full.

After attending this three-session workshop you will be able to set up, configure, and administer a fairly polished Galaxy instance. Topics include:
  • deployment and platform options
  • using Ansible to install and configure your own server
  • customizing and extending your instance
  • defining and importing genomes, running data managers
  • upgrading to a new Galaxy release
  • configuring the nginx web server with Galaxy
  • database overview and best practices
  • running tools in containers
  • users and groups and quotas
  • storage management and using heterogeneous storage services
  • exploring the Galaxy job configuration file
  • connecting Galaxy to compute clusters
  • polishing Galaxy on uWSGI application server
  • instance monitoring using Grafana
  • shared data management with CVMFS
  • when things go wrong: Galaxy server troubleshooting tips & examples

Prerequisites
  • Knowledge of and comfort with the Unix/Linux command line interface and a text editor. If you don't know what cd, mv, rm, mkdir, chmod, grep and so on can do, then you will struggle.

Speakers

That Other Person

Software Engineer, Galaxy Project, Johns Hopkins University

Martin Čech

Dev and Trainer, free element
Galaxy Enthusiast

Nate Coraor

System Administrator, Galaxy Project, Penn State University

Nicola Soranzo

Earlham Institute


Friday July 17, 2020 09:01 - 11:30 EDT
Training C

09:01 EDT

Introduction to Using Galaxy
Outline, Chat, Slides, Tutorial, Video

This workshop will introduce the Galaxy user interface and how it can be used for reproducible data analysis. We will cover the basic features of Galaxy, including where to find tools, how to import and use your data, and an introduction to workflows. This session is recommended for anyone who has not used, or only rarely uses, Galaxy.

Prerequisites
  • Little or no experience using Galaxy
  • A wi-fi enabled laptop with a modern web browser. Google Chrome, Firefox and Safari will work best

Speakers

Delphine Lariviere

Penn State University
Post-doc in the Galaxy Team (Nekrutenko Lab). Works on bacterial genomics, assembly, RNA Seq, TnSeq. Also interested in evolution, metagenomics, epigenetics and visualisation.

Wolfgang Maier

University of Freiburg
Interests: Galaxy tool development, variant calling tools and pipelines, user training


Friday July 17, 2020 09:01 - 11:30 EDT
Training D

11:30 EDT

Break
Break!  Grab some food, check your email, stretch...

Friday July 17, 2020 11:30 - 12:15 EDT
Joint

12:15 EDT

12:16 EDT

From stars to constellations: Scaling analyses in Galaxy
Chat, Slides, Video

Everyone knows how to do their analysis on a single dataset, but now it’s the Big Data era and data is pouring in faster than you can process it! We will show you how to manage importing hundreds and thousands of samples, processing these in batch, and scaling analyses to hundreds and thousands of datasets with complex experimental designs. You’ll learn about the new rule-based uploader and how to attach metadata to datasets in bulk, how to manage sample collections in workflows, and how to scale your processing to meet your demands.

Prerequisites

Speakers

Marius van den Beek

Penn State University

Saskia Hiltemann

Erasmus MC
Metagenomics, Training materials, board games, CTF & security


Friday July 17, 2020 12:16 - 14:45 EDT
Training D

12:16 EDT

Handling integrated biological data using Python, Jupyter, and InterMine
Tutorial, Video

This session is full.

This tutorial will guide you through loading and analyzing integrated biological data (generally genomic or proteomic data) using InterMine, either via the UI or via an API in Python. Topics covered will include automatically generating code to perform queries, customising the code to meet your needs, and automated analysis of sets, e.g. gene sets, including enrichment statistics. Skills gained can be re-used in any of the dozens of InterMines available, covering a broad range of organisms and dedicated purposes, from model organisms to plants, drug targets, and mitochondrial DNA.
Users will also learn how to import and export their gene and protein lists to and from Jupyter notebooks hosted on https://jupyter.org/.

Prerequisites
  • Little or no experience using Galaxy
  • A wi-fi enabled laptop with a modern web browser.

Speakers

Sergio Contrino

University of Cambridge

Yo Yehudi

Software Developer, University of Cambridge & Open Life Science
Integrated genomic data (InterMine)

Rachel Lyne

InterMine, University of Cambridge


Friday July 17, 2020 12:16 - 14:45 EDT
Training A

12:16 EDT

Introduction to Galaxy Administration II
 Schedule, Chat, byobo Cheatsheet, Video

This session is full.

This is the second session of a three-part workshop. Please sign up for all 3 sessions.

After attending this three-session workshop you will be able to set up, configure, and administer a fairly polished Galaxy instance. Topics include:
  • deployment and platform options
  • using Ansible to install and configure your own server
  • customizing and extending your instance
  • defining and importing genomes, running data managers
  • upgrading to a new Galaxy release
  • configuring the nginx web server with Galaxy
  • database overview and best practices
  • running tools in containers
  • users and groups and quotas
  • storage management and using heterogeneous storage services
  • exploring the Galaxy job configuration file
  • connecting Galaxy to compute clusters
  • polishing Galaxy on uWSGI application server
  • instance monitoring using Grafana
  • shared data management with CVMFS
  • when things go wrong: Galaxy server troubleshooting tips & examples

Prerequisites
  • Knowledge of and comfort with the Unix/Linux command line interface and a text editor. If you don't know what cd, mv, rm, mkdir, chmod, grep and so on can do, then you will struggle.

Speakers

That Other Person

Software Engineer, Galaxy Project, Johns Hopkins University

Martin Čech

Dev and Trainer, free element
Galaxy Enthusiast

Nate Coraor

System Administrator, Galaxy Project, Penn State University

Nicola Soranzo

Earlham Institute


Friday July 17, 2020 12:16 - 14:45 EDT
Training C

12:16 EDT

R / Bioconductor in the Cloud
 Tutorial, AnVIL, Video

Bioconductor provides more than 1800 R packages for the analysis and comprehension of high-throughput genomic data. Most users install and run Bioconductor on a personal computer or perhaps use an academic cluster. Cloud-based solutions are increasingly appealing, removing the headaches of local installation while providing access to (a) better, scalable computing resources; and (b) large-scale 'consortium' and other reference data sets. This session introduces the AnVIL cloud computing environment. We cover use of the cloud for:
  • replacing desktop-style computing;
  • integrating workflows for 'upstream' processing of large data resources with interactive 'downstream' analysis and comprehension, using Human Cell Atlas single-cell datasets as an example; and
  • querying cloud-based consortium data for integration with a user's own data sets.

Prerequisites
  • Participants should be comfortable working with R and RStudio.
  • Some familiarity with Bioconductor is helpful but not required.
  • No prior cloud-based experience is necessary.
  • A wifi enabled laptop with RStudio installed.

Speakers

Martin Morgan

Roswell Park Comprehensive Cancer Center
I am an evolutionary biologist by training, and my worst grades in college were in statistics and computer science. So it is a little ironic that I have spent the last fifteen years of my life in bioinformatics, working on the Bioconductor project for the statistical analysis and...

Nitesh Turaga

Roswell Park Comprehensive Cancer Center

Lori Shepherd

Senior Programmer / Project Management, Roswell Park Comprehensive Cancer Center
I am a member of the Bioconductor Core Team


Friday July 17, 2020 12:16 - 14:45 EDT
Training E

12:16 EDT

Scaling genomic analysis with Glow and Apache Spark
Tutorial, Get Started, Ingest Data, Transform Variants, Run GloWgR

Glow makes genomic data work with Apache Spark, the leading engine for working with large structured datasets. It fits natively into the ecosystem of tools that have enabled thousands of organizations to scale their workflows to petabytes of data. Glow bridges the gap between bioinformatics and the Spark ecosystem by working with datasets in common file formats like VCF, BGEN, and Plink as well as high-performance big data standards. You can write queries using the native Spark SQL APIs in Python, SQL, R, Java, and Scala. The same APIs allow you to bring your genomic data together with other datasets such as electronic health records, real world evidence, and medical images. Glow makes it easy to parallelize existing tools and libraries implemented as command line tools or Pandas functions.

Prerequisites
  • Basic Python
  • Some exposure to Spark useful but not necessary

Speakers

Henry Davidge

Databricks

Karen Feng

Software Engineer, Databricks

Rishi Ghose

Solutions Architect, Databricks

Michael Shtelma

Lead Specialist Solutions Architect, Databricks

Frank Nothaft

GTM Lead-Genomics, Databricks

Kiavash Kianfar

Sr. Software Engineer, Databricks
I am currently a Sr. Software Engineer in the Health and Life Sciences team at Databricks working on scalable unified analytics for Genomics, while being on leave from my tenured Associate Professor position at Texas A&M University. @Databricks: Developing algorithms and software...

Amir Kermany

Databricks

Will Brandler

Databricks


Friday July 17, 2020 12:16 - 14:45 EDT
Training B

14:45 EDT

Break
Break!  Grab some food, check your email, stretch...

Friday July 17, 2020 14:45 - 15:30 EDT
Joint

15:30 EDT

15:31 EDT

Building communities with open source + open science
Chat, Video

This session is full.

Many journals require scientific or research code to be open source before the work can be published, but sharing source code alone usually isn’t enough to draw in new users and contributors. This session will teach researchers and coders the basics of how to make their open source scientific code repositories inclusive and welcoming to contributors. Experienced community managers are also welcome to attend and help pass their knowledge on to others. This session will be run by the Open Life Science team, who collectively have experience working openly, mentoring, and training others in open practice.

Prerequisites
  • An interest in open science
  • A wi-fi enabled laptop
Updated information

Assignment before this event: (should take 20-30 minutes)
1. Project vision: Reflect on your current work.
  • Take personal notes regarding your favorite open project (that you either lead or work on) by answering the following questions:
    • The problem the project is trying to solve.
    • How you think openness and open leadership will help solve it.
    • How meeting personal goals will help you and help solve the problem.
    • How meeting your cultural goals for your community, organization, or project will help solve the problem.
2. Implicit bias and inclusion: Please do the Implicit Bias Quiz
  • Go to https://implicit.harvard.edu to complete the ‘Gender - Career’ or ‘Gender - Science’ quiz (10 minutes). You can ‘continue as a guest’ by choosing your country.
  • Reflect on these questions when you’ve finished the implicit association test:
    • What does inclusion mean to you?
    • Did your results of the implicit association test surprise you?
Please bring your notes from these assignments to the workshop so that you can make the best of your learning experience and add thoughtfully to the group discussions.
If you have any questions that we can help you with, please contact us by emailing team@openlifesci.org

Speakers

Malvika Sharan

Senior Researcher, The Alan Turing Institute
I am a senior researcher for the Tools, Practices and Systems research programme at The Alan Turing Institute, London. With a focus on Open Research, I lead a team of community managers and co-lead The Turing Way project that aims to make data science reproducible, collaborative...

Yo Yehudi

Software Developer, University of Cambridge & Open Life Science
Integrated genomic data (InterMine)


Friday July 17, 2020 15:31 - 18:00 EDT
Training A

15:31 EDT

Dockstore Fundamentals: Introduction to Docker and Descriptors for Reproducible Analysis
Slides, Video

This will be a hands-on workshop to train a beginner on the fundamental technologies used to create portable and reproducible workflows. Attendees will learn how to use Docker for packaging software into containers, how to write analytical workflows in a descriptor language (CWL, WDL, or Nextflow), and how to publish these workflows on Dockstore for sharing with others. We will cover basic Dockstore features such as running workflows using the Dockstore command-line interface and end with an overview of more advanced topics like best practices for workflows, publishing using DOIs, and sharing collections of workflows through organizations.

Prerequisites
  • Basic command line and scripting knowledge
  • A laptop with a modern web browser.

Speakers

Louise Cabansay

Software Engineer, Dockstore, UC Santa Cruz Genomics Institute

Andrew Duncan

Software Developer, OICR

Denis Yuen

Senior Software Developer, Ontario Institute for Cancer Research
Workflows, cloud, GA4GH, Docker, Java



Friday July 17, 2020 15:31 - 18:00 EDT
Training D

15:31 EDT

Import, handle, visualize and analyze biodiversity data in Galaxy
Slides, Tutorial, Video

This ecology-focused session will introduce using Galaxy to import (from external sources such as GBIF, iNaturalist, Atlas of Living Australia or Zenodo repositories), handle (filter, rename fields, search/replace text patterns), visualize (stacked histograms) and analyze (calculate species abundance, phenology and trends) biodiversity data.

Prerequisites
  • Introduction to Using Galaxy or equivalent experience
  • A wi-fi enabled laptop with a modern web browser. Google Chrome, Firefox and Safari will work best


Speakers

Yvan Le Bras

Research engineer, French National Museum of Natural History


Friday July 17, 2020 15:31 - 18:00 EDT
Training C

15:31 EDT

Introduction to Galaxy Administration III
 Schedule, Chat, byobo Cheatsheet, Video

This session is full.

This is the third session of a three-part workshop. Please sign up for all 3 sessions.

After attending this three-session workshop you will be able to set up, configure, and administer a fairly polished Galaxy instance. Topics include:
  • deployment and platform options
  • using Ansible to install and configure your own server
  • customizing and extending your instance
  • defining and importing genomes, running data managers
  • upgrading to a new Galaxy release
  • configuring the nginx web server with Galaxy
  • database overview and best practices
  • running tools in containers
  • users and groups and quotas
  • storage management and using heterogeneous storage services
  • exploring the Galaxy job configuration file
  • connecting Galaxy to compute clusters
  • polishing Galaxy on uWSGI application server
  • instance monitoring using Grafana
  • shared data management with CVMFS
  • when things go wrong: Galaxy server troubleshooting tips & examples

Prerequisites
  • Knowledge of and comfort with the Unix/Linux command line interface and a text editor. If you don't know what cd, mv, rm, mkdir, chmod, grep and so on can do, then you will struggle.

Speakers

That Other Person

Software Engineer, Galaxy Project, Johns Hopkins University

Martin Čech

Dev and Trainer, free element
Galaxy Enthusiast

Nate Coraor

System Administrator, Galaxy Project, Penn State University


Friday July 17, 2020 15:31 - 18:00 EDT
Training B

15:31 EDT

Processing of Single Cell RNA-Seq Data with Galaxy
Slides, Tutorial, Starting History, Complete History, Video

Single-cell RNA-seq analysis is a rapidly evolving field at the forefront of transcriptomic research. Galaxy offers a multitude of analysis options. In this training, participants will learn about the processing, mapping and quantification of 10x Genomics data from the raw barcoded reads to the count matrices.

Prerequisites

Speakers

Bjorn Gruning

University of Freiburg

Mehmet Tekman

Post Doc, University of Freiburg
Single-Cell, Developing Wrappers with Emacs

Hans-Rudolf Hotz

Friedrich Miescher Institute for Biomedical Research, Basel


Friday July 17, 2020 15:31 - 18:00 EDT
Training E

18:00 EDT

Interregnum
West Training Day 1 is done, and East Training Day 1 is coming


Friday July 17, 2020 18:00 - 21:00 EDT
Joint

21:00 EDT

21:01 EDT

Command Line Essentials for Bioinformaticians
Slides, Tutorial, Video

This session is full.

We will start the session with a quick refresher on the basics of bash. I will then introduce a few well-known Unix tools and features of the shell, with a focus on how to use these to make key bioinformatics tasks easier and more efficient.

Getting the most out of your shell (bash centric)
As bioinformaticians we regularly deal with directories filled with hundreds of files and have to manage running an equally large number of parallel jobs. There are many features of the shell that can make this easier. Here I will focus on some of the key ones that I use often.

Tools: bash (loops, functions, strings), xargs, parallel

Manipulating tabular data
Lots of bioinformatics data is tabular: GFF, VCF, SAM. Using these formats as examples, I will introduce some useful tools for manipulating tabular data.

Tools: cut, paste, awk, shuf, comm

Manipulating sequence data
Manipulating sequence data like fasta and fastq requires specialised bioinformatic tools. Two very useful ones are samtools and bioawk. This section will show you how to easily accomplish common tasks like splitting, sampling or reformatting a large sequence file.

Tools: samtools, bioawk

Prerequisites
  • A laptop with a modern web browser

Speakers

Ira Cooke

Senior Lecturer, James Cook University
Ira is interested in computational tools and workflows for analysing large ‘omics datasets. He applies these to a wide variety of research questions from clinical applications of genomics to human health to aquaculture and the biology of corals and cephalopods. His two main research...

Wytamma Wirth

James Cook University

Jia Zhang

James Cook University

Legana Fingerhut

James Cook University


Friday July 17, 2020 21:01 - 23:30 EDT
Training E

21:01 EDT

Dockstore Fundamentals: Introduction to Docker and Descriptors for Reproducible Analysis
Slides, Video

This session is full.

This will be a hands-on workshop to train a beginner on the fundamental technologies used to create portable and reproducible workflows. Attendees will learn how to use Docker for packaging software into containers, how to write analytical workflows in a descriptor language (CWL, WDL, or Nextflow), and how to publish these workflows on Dockstore for sharing with others. We will cover basic Dockstore features such as running workflows using the Dockstore command-line interface and end with an overview of more advanced topics like best practices for workflows, publishing using DOIs, and sharing collections of workflows through organizations.

Prerequisites
  • Basic command line and scripting knowledge
  • A laptop with a modern web browser.

Speakers

Louise Cabansay

Software Engineer, Dockstore, UC Santa Cruz Genomics Institute

Andrew Duncan

Software Developer, OICR

Denis Yuen

Senior Software Developer, Ontario Institute for Cancer Research
Workflows, cloud, GA4GH, Docker, Java



Friday July 17, 2020 21:01 - 23:30 EDT
Training C

21:01 EDT

How to use Reactome data, tools and web services
Slides, Tutorial, Video

Reactome stakeholders span the informatics, clinical and basic research communities, and present us with a broad set of user requirements, from casual browsing of online pathway information to network analysis and modeling. During the BCC2020 training session, we will introduce the Reactome graph database, web site, web services, Docker image, and downloadable data sets. We will demonstrate how Reactome is useful to bioinformaticians and data integrators who are interested in finding, organizing, and utilizing biological information to support data visualization, integration and analysis. We will address the following:
  • Different use cases for using the web portal (analysis tool, curated content, content service, download files). 
  •  What data/bioinformatics questions Reactome can help answer.
  • How to use Reactome’s Content Service and Analysis Service web interfaces and APIs.
  • How to do basic queries using Reactome’s Graph Database (Neo4J and Cypher).


Prerequisites
  •  A wi-fi enabled laptop with a modern web browser.
  •  Basic knowledge of how to navigate a system and run commands from the command line (curl, grep, etc…)
  • A robust text editor and web browser.
  • Optional: A laptop capable of running Docker (installation instructions).


Friday July 17, 2020 21:01 - 23:30 EDT
Training A

21:01 EDT

Introduction to Galaxy Administration I
Schedule, Chat, Slides, Tutorial, byobo Cheatsheet, Video

This session is full.

This is the first session of a 3 session workshop. Please sign up for all 3 sessions.

After attending this three-session workshop you will be able to set up, configure, and administer a fairly polished Galaxy instance. Topics include:
  • deployment and platform options
  • using Ansible to install and configure your own server
  • customizing and extending your instance
  • defining and importing genomes, running data managers
  • upgrading to a new Galaxy release
  • configuring the nginx web server with Galaxy
  • database overview and best practices
  • running tools in containers
  • users and groups and quotas
  • storage management and using heterogeneous storage services
  • exploring the Galaxy job configuration file
  • connecting Galaxy to compute clusters
  • polishing Galaxy on uWSGI application server
  • instance monitoring using Grafana
  • shared data management with CVMFS
  • when things go wrong: Galaxy server troubleshooting tips & examples

Prerequisites
  • Knowledge of and comfort with the Unix/Linux command line interface and a text editor. If you don't know what cd, mv, rm, mkdir, chmod, grep and so on can do, then you will struggle.

Speakers

Simon Gladman

University of Melbourne

Kiran K Telukunta

TMS Foundation
Completed a PhD in Pharmaceutical Bioinformatics at the University of Freiburg. As part of TMS Foundation and Bioclues, promoting the Galaxy open-source framework in India. Working as a Cloud Architect. For a detailed profile please visit my LinkedIn page. The following reasons should encourage you to...

Nicholas Rhodes

SysAdmin & DBA, Queensland Facility for Advanced Bioinformatics
I have a forty-plus year involvement with life sciences computing. My first encounter with sequences was Dayhoff's "Atlas of Protein Sequence and Structure" - a book that listed all the 65 known protein sequences.

Catherine Bromhead

University of Melbourne


Friday July 17, 2020 21:01 - 23:30 EDT
Training D

21:01 EDT

Introduction to Using Galaxy
Outline, Slides, Tutorial, GTN, Video

This workshop will introduce the Galaxy user interface and how it can be used for reproducible data analysis. We will cover the basic features of Galaxy, including where to find tools, how to import and use your data, and an introduction to workflows. This session is recommended for anyone who has not used, or only rarely uses, Galaxy.

Prerequisites
  • Little or no experience using Galaxy
  • A wi-fi enabled laptop with a modern web browser. Google Chrome, Firefox and Safari will work best

Speakers

Jennifer Hillman-Jackson

Galaxy Project, Penn State University
Application Science Support Training at GalaxyProject.org 

Igor Makunin

Bioinformatician, QFAB
I am the principal User Support for Galaxy Australia and act as the gatekeeper for all requests and queries to the platform. I also lead training events on the platform and have done so for the past six years.


Friday July 17, 2020 21:01 - 23:30 EDT
Training B

23:30 EDT

Break
Break!  Grab some food, check your email, stretch...

Friday July 17, 2020 23:30 - Saturday July 18, 2020 00:15 EDT
Joint
 
Saturday, July 18
 

00:15 EDT

00:16 EDT

Bionitio: building better bioinformatics tools with batteries included I
 Schedule, Slides, Tutorial, Video

Software development is a central part of bioinformatics, but for many reasons software quality is not always prioritised, leading to problems in maintenance, usability and reproducibility. Adopting software engineering best practices at the beginning of a project can address these problems, but this is often not done due to lack of time and/or experience. This workshop covers the essentials of good programming practices and provides you with tools and knowledge to build high quality bioinformatics software from the outset. We will introduce Bionitio (https://github.com/bionitio-team/bionitio), a tool for quickly creating new software projects with important features and infrastructure already included.

Learning Outcomes
By the end of the workshop you will have:
  • Created a new software repository
  • Committed it to GitHub
  • Set up continuous integration testing
  • Used test-driven-development to add a new feature to the program
  • Learnt about good practices in software development and why they are useful
  • Learnt how to use a tool (bionitio) to automate this for future projects.

Target audience

Bioinformaticians with beginner to intermediate level of programming experience who want to apply good software engineering practices in their daily work. Experience with the Unix command-line is assumed. Basic familiarity with Python (or similar languages) is an advantage.

Requirements

  • You'll need to bring a laptop with Unix (e.g. an Apple Mac or Linux).
  • Windows users: please install PuTTY.
  • Before attending the workshop, please set up a GitHub account (free, https://github.com/join).

Speakers

Anna Syme

Bioinformatician, Royal Botanical Gardens, Melbourne, Australia

Bernie Pope

Victorian Health and Medical Research Fellow, Melbourne Bioinformatics, University of Melbourne
I am an Associate Professor at The University of Melbourne. My research focuses on applying computational techniques to biological questions, especially related to Human Genomics and Cancer.


Saturday July 18, 2020 00:16 - 02:45 EDT
Training B

00:16 EDT

From stars to constellations: Scaling analyses in Galaxy
 Chat, Slides, Video

Everyone knows how to do their analysis on a single dataset, but now it’s the Big Data era and data is pouring in faster than you can process it! We will show you how to manage importing hundreds and thousands of samples, processing these in batch, and scaling analyses to hundreds and thousands of datasets with complex experimental designs. You’ll learn about the new rule-based uploader and how to attach metadata to datasets in bulk, how to manage sample collections in workflows, and how to scale your processing to meet your demands.

Prerequisites

Speakers

Maria Doyle

Application and Training Specialist, Peter MacCallum Cancer Centre

Marius van den Beek

Penn State University


Saturday July 18, 2020 00:16 - 02:45 EDT
Training C

00:16 EDT

Getting started in Git using GitHub Desktop
Slides, Tutorial, Video

Git doesn't need to be tricky, and you don't need to use a terminal to do it. In a 2.5 hour session, we will talk over the basics of version control covering:
  • why version control is useful,
  • how to create your first git repository,
  • the basics of markdown,
  • what a pull request is,
  • and why open source is important in science.
Instead of focusing on code in a specific programming language, we will focus on a common neutral ground - markdown - which will also give participants the ability to create their own personal or lab website on GitHub Pages.

Prerequisites
  • A laptop capable of running GitHub Desktop (e.g. a Linux, Mac, or Windows laptop, but not a Chromebook or tablet).

Speakers

Thom Cuddihy

Bioinformatician/Software Developer, QFAB
Thom Cuddihy currently works at QFAB as a bioinformatician and software developer. He specialises in multiple programming languages including Python, C#, Java, and R, and has a strong background in databases, system administration and high-performance computing. He also has extensive...


Saturday July 18, 2020 00:16 - 02:45 EDT
Training A

00:16 EDT

Introduction to Galaxy Administration II
→ Schedule, Chat, Slides, Tutorial, byobo Cheatsheet, Video

This session is full.

This is the second session of a 3 session workshop. Please sign up for all 3 sessions.

After attending this three-session workshop you will be able to set up, configure, and administer a fairly polished Galaxy instance. Topics include:
  • deployment and platform options
  • using Ansible to install and configure your own server
  • customizing and extending your instance
  • defining and importing genomes, running data managers
  • upgrading to a new Galaxy release
  • configuring the nginx web server with Galaxy
  • database overview and best practices
  • running tools in containers
  • users and groups and quotas
  • storage management and using heterogeneous storage services
  • exploring the Galaxy job configuration file
  • connecting Galaxy to compute clusters
  • polishing Galaxy on uWSGI application server
  • instance monitoring using Grafana
  • shared data management with CVMFS
  • when things go wrong: Galaxy server troubleshooting tips & examples

Prerequisites
  • Knowledge and comfort with the Unix/Linux command line interface and a text editor: If you don't know what cd, mv, rm, mkdir, chmod, grep and so on can do then you will struggle

Speakers

Simon Gladman

University of Melbourne

Kiran K Telukunta

TMS Foundation
Completed a PhD in Pharmaceutical Bioinformatics at the University of Freiburg. As part of the TMS Foundation and Bioclues, promoting the Galaxy open-source framework in India. Working as a Cloud Architect. For a detailed profile please visit my LinkedIn page. Following reasons should encourage you to...

Nicholas Rhodes

SysAdmin & DBA, Queensland Facility for Advanced Bioinformatics
I have a forty-plus year involvement with life sciences computing. My first encounter with sequences was Dayhoff's "Atlas of Protein Sequence and Structure" - a book that listed all the 65 known protein sequences.

Catherine Bromhead

University of Melbourne


Saturday July 18, 2020 00:16 - 02:45 EDT
Training D

00:16 EDT

Produce a portable germline variant-calling pipeline in CWL and WDL using Janis and GATK I
 Schedule, Slides, Tutorial, Video

This is the first session of a two-part training. Please register for both. The second session is on Sunday.

In this session, we'll use Janis (a Python workflow framework) to build a GATK pipeline to call variants. We'll show how Janis workflows can be translated to CWL and WDL, and how to use the Janis assistant to run these pipelines in CWLTool and Cromwell.

Through the use of containers, the pipeline produced in this workshop can be run locally, on HPCs with Singularity or through cloud vendors (such as Google Cloud and AWS).

Prerequisites
  • Some basic Python is preferred
  •  Installed Software: Docker, Python 3.6+

Speakers

Michael Franklin

Research Software Engineer, University of Melbourne
I'm a research software engineer at the University of Melbourne / Peter MacCallum Cancer Centre who's interested in all things pipelines! I develop Janis, a workflow assistant that generates CWL and WDL. I'm also interested in general programming, specifically web (React) and database-y...

Richard Lupat

Peter MacCallum Cancer Centre


Saturday July 18, 2020 00:16 - 02:45 EDT
Training E

02:45 EDT

Break
Break!  Grab some food, check your email, stretch...

Saturday July 18, 2020 02:45 - 03:30 EDT
Joint

03:30 EDT

03:31 EDT

Bionitio: building better bioinformatics tools with batteries included II
 Schedule, Slides, Tutorial, Video

This is the second part of a two-session workshop. If you register for one, you are required to also register for the other.

Software development is a central part of bioinformatics, but for many reasons software quality is not always prioritised, leading to problems in maintenance, usability and reproducibility. Adopting software engineering best practices at the beginning of a project can address these problems, but this is often not done due to lack of time and/or experience. This workshop covers the essentials of good programming practices and provides you with tools and knowledge to build high quality bioinformatics software from the outset. We will introduce Bionitio (https://github.com/bionitio-team/bionitio), a tool for quickly creating new software projects with important features and infrastructure already included.

Learning Outcomes
By the end of the workshop you will have:
  • Created a new software repository
  • Committed it to GitHub
  • Set up continuous integration testing
  • Used test-driven-development to add a new feature to the program
  • Learnt about good practices in software development and why they are useful
  • Learnt how to use a tool (bionitio) to automate this for future projects.

Target audience

Bioinformaticians with beginner to intermediate level of programming experience who want to apply good software engineering practices in their daily work. Experience with the Unix command-line is assumed. Basic familiarity with Python (or similar languages) is an advantage.

Requirements

  • You'll need to bring a laptop with Unix (e.g. an Apple Mac or Linux).
  • Windows users: please install PuTTY.
  • Before attending the workshop, please set up a GitHub account (free, https://github.com/join).



Speakers

Anna Syme

Bioinformatician, Royal Botanical Gardens, Melbourne, Australia

Bernie Pope

Victorian Health and Medical Research Fellow, Melbourne Bioinformatics, University of Melbourne
I am an Associate Professor at The University of Melbourne. My research focuses on applying computational techniques to biological questions, especially related to Human Genomics and Cancer.


Saturday July 18, 2020 03:31 - 06:00 EDT
Training B

03:31 EDT

Building communities with open source + open science
Chat, Video

Many journals require scientific / research code to be open source in order to be published, but simply sharing source code alone isn’t usually enough to draw in new users and contributors. This session will teach researchers and coders the basics of how to make their open source scientific code repositories inclusive and welcoming to contributors. Experienced community managers are also welcome to attend and help pass their knowledge on to others. This session will be run by the Open Life Science team, who collectively have experience working openly, mentoring, and training others in open practice.

Prerequisites
  • An interest in open science
  • A laptop

Speakers

Yo Yehudi

Software Developer, University of Cambridge & Open Life Science
Integrated genomic data (InterMine)


Saturday July 18, 2020 03:31 - 06:00 EDT
Training A

03:31 EDT

High-throughput molecular dynamics with Galaxy
Chat, Slides, Tutorial, Server, Video

Molecular dynamics (MD) is one of the most commonly used techniques in computational chemistry and biophysics for biomolecular modeling. However, MD has a steep learning curve, regarding both the underlying theory and the software. During this session, you will learn some of the main principles behind MD simulation and analysis of protein-ligand systems, using the Galaxy platform to provide an intuitive, graphical interface to the MD engine and analysis software. We will then show you how to scale up to high-throughput MD using Galaxy collections.

Prerequisites
  • Basic familiarity with Galaxy

Speakers

Chris Barnett

Lecturer, University of Cape Town

Simon Bray

University of Freiburg
I'm a member of the European Galaxy Team at the University of Freiburg, interested in computational chemistry, molecular dynamics, and the use of workflow management systems for virtual screening.

Tharindu Senapathi

PhD Student, University of Cape Town
Physical and computational chemist with expertise in the development and application of free energy and hybrid classical/quantum mechanical methods for application to chemical, life and biomedical sciences.


Saturday July 18, 2020 03:31 - 06:00 EDT
Training E

03:31 EDT

Import, handle, visualize and analyze biodiversity data in Galaxy
 Slides, Tutorial, Video

This Ecology-focused session will introduce using Galaxy to import (from external sources such as GBIF, iNaturalist, Atlas of Living Australia or Zenodo repositories), handle (filter, rename fields, search/replace text patterns), visualize (stacked histograms) and analyze (calculate species abundance, phenology and trends) biodiversity data.
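
As a rough illustration of the "handle" and "analyze" steps outside Galaxy, the sketch below filters a few made-up occurrence records and counts observations per species. The field names (`species`, `year`), the sample values and the `species_abundance` helper are assumptions for illustration only; they are not taken from the workshop materials, and the Galaxy tools may compute abundance differently.

```python
from collections import Counter

# Hypothetical occurrence records; field names and values are illustrative only.
records = [
    {"species": "Parus major", "year": 2018},
    {"species": "Parus major", "year": 2019},
    {"species": "Erithacus rubecula", "year": 2019},
    {"species": "", "year": 2019},  # record with a missing species name
    {"species": "Erithacus rubecula", "year": 2017},
]


def species_abundance(rows, since_year):
    """Count observations per species, keeping named records from since_year on."""
    kept = [r for r in rows if r["species"] and r["year"] >= since_year]
    return Counter(r["species"] for r in kept)


abundance = species_abundance(records, since_year=2018)
for name, count in abundance.most_common():
    print(f"{name}\t{count}")
```

The filter runs before the count, so records with missing names or outside the period of interest never reach the abundance table.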

Prerequisites
  • Introduction to Using Galaxy or equivalent experience
  • A wi-fi enabled laptop with a modern web browser. Google Chrome, Firefox and Safari will work best


Speakers

Yvan Le Bras

Research engineer, French National Museum of Natural History


Saturday July 18, 2020 03:31 - 06:00 EDT
Training C

03:31 EDT

Introduction to Galaxy Administration III
→ Schedule, Chat, Slides, Tutorial, byobo Cheatsheet, Video

This session is full.

This is the third session of a 3 session workshop. Please sign up for all 3 sessions.

After attending this three-session workshop you will be able to set up, configure, and administer a fairly polished Galaxy instance. Topics include:
  • deployment and platform options
  • using Ansible to install and configure your own server
  • customizing and extending your instance
  • defining and importing genomes, running data managers
  • upgrading to a new Galaxy release
  • configuring the nginx web server with Galaxy
  • database overview and best practices
  • running tools in containers
  • users and groups and quotas
  • storage management and using heterogeneous storage services
  • exploring the Galaxy job configuration file
  • connecting Galaxy to compute clusters
  • polishing Galaxy on uWSGI application server
  • instance monitoring using Grafana
  • shared data management with CVMFS
  • when things go wrong: Galaxy server troubleshooting tips & examples

Prerequisites
  • Knowledge and comfort with the Unix/Linux command line interface and a text editor: If you don't know what cd, mv, rm, mkdir, chmod, grep and so on can do then you will struggle

Speakers

Simon Gladman

University of Melbourne

Kiran K Telukunta

TMS Foundation
Completed a PhD in Pharmaceutical Bioinformatics at the University of Freiburg. As part of the TMS Foundation and Bioclues, promoting the Galaxy open-source framework in India. Working as a Cloud Architect. For a detailed profile please visit my LinkedIn page. Following reasons should encourage you to...

Nicholas Rhodes

SysAdmin & DBA, Queensland Facility for Advanced Bioinformatics
I have a forty-plus year involvement with life sciences computing. My first encounter with sequences was Dayhoff's "Atlas of Protein Sequence and Structure" - a book that listed all the 65 known protein sequences.

Catherine Bromhead

University of Melbourne


Saturday July 18, 2020 03:31 - 06:00 EDT
Training D

06:00 EDT

Interregnum
East Training Day 1 is done, and West Training Day 2 is coming


Saturday July 18, 2020 06:00 - 09:00 EDT
Joint

09:00 EDT

West Training 4
Fourth training session of BCC2020 West and the first of the second day of training.  Topics offered are:


Interested? Register early and save 50%.


Saturday July 18, 2020 09:00 - 11:30 EDT
Joint

09:01 EDT

Galaxy Code Architecture
Videos, Video

How is the Galaxy code structured? What do the various other projects related to Galaxy do? What happens when I start Galaxy?

Please join us to explore various aspects of the Galaxy codebase, understand the various top-level files and modules in Galaxy, understand how dependencies work in Galaxy's frontend and backend, and a whole lot more.

Prerequisites
  • A laptop with a modern web browser (Google Chrome, Firefox and Safari will work best).

Speakers

Nate Coraor

System Administrator, Galaxy Project, Penn State University


Saturday July 18, 2020 09:01 - 11:30 EDT
Training C

09:01 EDT

How to write a JBrowse 2 plugin
Slides, Tutorial, Video

JBrowse 2 is a new genome browser that is built using ReactJS. It has new features for structural variant visualization and comparative genomics with things like split views, synteny views, Circos views and more. We will demonstrate how to set up JBrowse 2 and show how new plugins can create custom views, custom tracks, or custom data adapters in JBrowse 2.

Prerequisites
  • A wi-fi enabled laptop with a modern web browser.
  • Experience with JavaScript is a plus but not necessary.


Speakers

Colin Diesh

University of California, Berkeley

Garrett Stevens

Developer, University of California, Berkeley
I'm one of the core developers on the JBrowse team, where I've been mostly working on JBrowse 2. I'd love to answer any questions you have about JBrowse 2, or just talk shop about web development, UI design, etc. I'll be a trainer in two of the JBrowse 2 workshops.


Saturday July 18, 2020 09:01 - 11:30 EDT
Training D

09:01 EDT

Introduction to Machine Learning
Slides, Tutorial, Video

Questions:
  • What is machine learning and why is it useful?
  • How to use regression and classification techniques to create predictive models from biological datasets?
Learning objectives
  • Provide the basics of machine learning and its variants
  • Learn how to do classification and regression using the training and test data
  • Learn how to use Galaxy's machine learning tools
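
To make the training/test idea from the objectives concrete, here is a minimal sketch in plain Python: it splits synthetic data into training and test sets, fits a straight line by ordinary least squares on the training part, and measures the error on the held-out part. The data, the helper names and the 25% test fraction are assumptions chosen for illustration; the workshop itself uses Galaxy's machine learning tools rather than this code.

```python
import random


def train_test_split(pairs, test_fraction=0.25, seed=0):
    """Shuffle (x, y) pairs reproducibly and split them into train and test sets."""
    shuffled = pairs[:]
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, int(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


def fit_line(pairs):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def mean_squared_error(pairs, slope, intercept):
    """Average squared difference between observed and predicted y values."""
    return sum((y - (slope * x + intercept)) ** 2 for x, y in pairs) / len(pairs)


# Synthetic, noise-free data following y = 2x + 1 (illustrative only).
data = [(float(x), 2.0 * x + 1.0) for x in range(20)]
train, test = train_test_split(data)
slope, intercept = fit_line(train)
test_mse = mean_squared_error(test, slope, intercept)
print(f"slope={slope:.2f} intercept={intercept:.2f} test MSE={test_mse:.4f}")
```

The error is computed only on data the model never saw during fitting, which is the point of holding out a test set.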

Prerequisites
  • Introduction to Galaxy or equivalent experience
  • A laptop with a modern web browser (Google Chrome, Firefox and Safari will work best).

Speakers

Alireza Khanteymoori

Postdoc Researcher, University of Freiburg


Saturday July 18, 2020 09:01 - 11:30 EDT
Training A

09:01 EDT

RNA-Seq analysis with AskOmics Interactive Tool
→ Slides, Tutorial, Video

This session is full.

AskOmics is web software for data integration and querying using Semantic Web technologies. It helps users convert multiple data sources (CSV/TSV files, GFF and BED annotation) into RDF triples and perform complex queries against these files, as well as against distant SPARQL endpoints. AskOmics provides a user-friendly interface to build the queries, so users don't have to learn the SPARQL language.

AskOmics is useful for cross-referencing result datasets with various reference data. For example, in RNA-Seq studies, we often need to filter the results on the fold change and the p-value to get the most significant differentially expressed genes. These genes often need to be linked to the reference genome to obtain more information about their location. Then, we may need to determine if these genes are part of a QTL associated with a phenotype of interest. Finally, we can access distant endpoints to get diseases or publications linked to our genes.

AskOmics offers a solution to 1) automatically convert the multiple data formats to RDF, 2) use a user-friendly interface to perform complex SPARQL queries on the RDF datasets to find the genes you are interested in, and 3) cross-reference local datasets with distant databases (NeXtProt for example).

During this training session, we will use the Galaxy AskOmics Interactive Tool to integrate Galaxy datasets into an AskOmics instance. Then we will perform complex queries against this data and the distant NeXtProt SPARQL endpoint to answer a biological question.
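
The fold-change and p-value filtering step described above can be sketched in a few lines of Python. This is only an illustration of the logic: AskOmics does this through its query builder and SPARQL, not through this code, and the gene names, values, column names and cut-offs (absolute log2 fold change of at least 1, p-value of at most 0.05) are assumptions made for the example.

```python
# Hypothetical differential-expression results; gene IDs and numbers are made up.
results = [
    {"gene": "geneA", "log2_fold_change": 2.5, "p_value": 0.001},
    {"gene": "geneB", "log2_fold_change": -1.8, "p_value": 0.030},
    {"gene": "geneC", "log2_fold_change": 0.4, "p_value": 0.0005},
    {"gene": "geneD", "log2_fold_change": 3.1, "p_value": 0.20},
]


def significant_genes(rows, min_abs_log2fc=1.0, max_p_value=0.05):
    """Keep genes that pass both the fold-change and the p-value cut-off."""
    return [
        r["gene"]
        for r in rows
        if abs(r["log2_fold_change"]) >= min_abs_log2fc
        and r["p_value"] <= max_p_value
    ]


selected = significant_genes(results)
print(selected)
```

Using the absolute value of the fold change keeps both up- and down-regulated genes; a gene with a large change but a weak p-value, or the reverse, is dropped.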

Prerequisites
  • Basic knowledge about RNA-seq
  • A laptop with a modern web browser. Google Chrome, Firefox and Safari will work best


Speakers

Anthony Bretaudeau

BIPAA/GenOuest

Xavier Garnier

Univ Rennes, Inria, CNRS, IRISA
I am working on AskOmics, a web tool to integrate and query biological data using Semantic Web technologies, and its interaction with Galaxy


Saturday July 18, 2020 09:01 - 11:30 EDT
Training B

09:01 EDT

Train the Galaxy Trainer
Slides, Tutorial, Video

This workshop will introduce:
  • using Galaxy as a training tool
  • Determining aim and audience
    • e.g. single topic; string of related topics;
    • e.g. response to specific request for training; or general upskilling people in Galaxy bioinformatics
  • setting up appropriate infrastructure
    • usegalaxy.* resources
    • TIaaS
    • Your own
  • The available materials 
    • GTN tutorials
    • and/or write your own; including how to contribute it to GTN
    • Customising materials for your needs (Slides, language etc.)
  • Distributed workshops 
    • In practice
    • Local facilitators vs lead trainers
    • Using Zoom / Skype / other video conferencing software
  • Practise setting up your own workshop
    • e.g. choose a topic from GTN
    • check that it runs on the Galaxy server of choice
    • time it and modify if need be (e.g. cut down the data set more?)
    • create a schedule, e.g. Google Doc → publish → tinyurl
  • Getting good feedback!

Prerequisites
  • An interest in bioinformatics training and Galaxy


Speakers

Saskia Hiltemann

Erasmus MC
Metagenomics, Training materials, board games, CTF & security

Delphine Lariviere

Penn State University
Post-doc in the Galaxy Team (Nekrutenko Lab). Works on bacterial genomics, assembly, RNA Seq, TnSeq. Also interested in evolution, metagenomics, epigenetics and visualisation.


Saturday July 18, 2020 09:01 - 11:30 EDT
Training E

11:30 EDT

Break
Break!  Grab some food, check your email, stretch...

Saturday July 18, 2020 11:30 - 12:14 EDT
Joint

12:15 EDT

12:16 EDT

Building communities with open source + open science
Chat, Video

This session is full.

Many journals require scientific / research code to be open source in order to be published, but simply sharing source code alone isn’t usually enough to draw in new users and contributors. This session will teach researchers and coders the basics of how to make their open source scientific code repositories inclusive and welcoming to contributors. Experienced community managers are also welcome to attend and help pass their knowledge on to others. This session will be run by the Open Life Science team, who collectively have experience working openly, mentoring, and training others in open practice.

Prerequisites
  • An interest in open science
  • A laptop
Updated information

Assignment before this event: (should take 20-30 minutes)
1. Project vision: Reflect on your current work.
  • Take personal notes regarding your favorite open project (that you either lead or work on) by answering the following questions:
    • The problem the project is trying to solve.
    • How you think openness and open leadership will help solve it.
    • How meeting personal goals will help you and help solve the problem.
    • How meeting your cultural goals for your community, organization, or project will help solve the problem.
2. Implicit bias and inclusion: Please do the Implicit Bias Quiz
  • Go to https://implicit.harvard.edu to complete the ‘Gender - Career’ or ‘Gender - Science’ quiz (10 minutes). You can ‘continue as a guest’ by choosing your country.
  • Reflect on these questions when you’ve finished the implicit association test:
    • What does inclusion mean to you?
    • Did your results of the implicit association test surprise you?
Please bring your notes from these assignments to the workshop so that you can make the best of your learning experience and add thoughtfully to the group discussions.
If you have any questions that we can help you with, please contact us by emailing team@openlifesci.org

Speakers

Malvika Sharan

Senior Researcher, The Alan Turing Institute
I am a senior researcher for the Tools, Practices and Systems research programme at The Alan Turing Institute, London. With a focus on Open Research, I lead a team of community managers and co-lead The Turing Way project that aims to make data science reproducible, collaborative...

Yo Yehudi

Software Developer, University of Cambridge & Open Life Science
Integrated genomic data (InterMine)


Saturday July 18, 2020 12:16 - 14:45 EDT
Training B

12:16 EDT

Embedding JBrowse 2 in your website
Slides, Tutorial, Video

JBrowse 2 is the next generation of JBrowse genome browsers, with an all-new pluggable technology platform based on React, mobx-state-tree, and web workers. The embedded version, JBrowse 2 Embedded, is designed to be self-contained and easily embedded in any website without requiring any iframes or CSS hacking and without requiring any specific JavaScript frameworks.

We will show you several ways of embedding JBrowse 2 Embedded in your web-based tool or website.

Prerequisites
  • A wi-fi enabled laptop with a modern web browser.


Speakers

Colin Diesh

University of California, Berkeley

Garrett Stevens

Developer, University of California, Berkeley
I'm one of the core developers on the JBrowse team, where I've been mostly working on JBrowse 2. I'd love to answer any questions you have about JBrowse 2, or just talk shop about web development, UI design, etc. I'll be a trainer in two of the JBrowse 2 workshops.


Saturday July 18, 2020 12:16 - 14:45 EDT
Training C

12:16 EDT

Getting your hands on Climate data
Slides, Tutorial, Video

This training covers accessing and analyzing climate data in Galaxy. During this session you will learn how to use climate data to develop a simple adaptation case study using the Galaxy Climate workbench. We will first explain the difference between climate and weather data, then show how to visualize climate data on a map with Galaxy, and finally how to create a simple workflow for framing a very simple adaptation case study.

Prerequisites
  • Introduction to Using Galaxy or equivalent experience
  • A wi-fi enabled laptop with a modern web browser. Google Chrome, Firefox and Safari will work best


Speakers

Anne Fouilloux

Research Software Engineer, University of Oslo
I am working on Galaxy Climate (development of tools, integration of climate data, training material).


Saturday July 18, 2020 12:16 - 14:45 EDT
Training E

12:16 EDT

Reference data with CVMFS and remote jobs with Pulsar
Slides, Video

Learn to use CVMFS for easy access to ready-to-go terabytes of reference data in Galaxy. Then find out how to send jobs to the ends of the universe with Pulsar!

Prerequisites
  • Basic understanding of Galaxy from a developer point of view.
  • A laptop with a modern web browser. Google Chrome, Firefox, and Safari will work best


Speakers

Gianmauro Cuccuru

University of Freiburg

Nate Coraor

System Administrator, Galaxy Project, Penn State University


Saturday July 18, 2020 12:16 - 14:45 EDT
Training D

12:16 EDT

Scripting Galaxy with BioBlend
Slides, Tutorial, Video

Galaxy has an always-growing API that allows for external programs to upload and download data, manage histories and datasets, run tools and workflows, and even perform admin tasks. This session will cover various approaches to access the API, in particular using the BioBlend Python library.
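
As one example of the "various approaches" to the API, the sketch below talks to Galaxy over plain HTTP using only the Python standard library. It assumes the API is served under `/api/` and that an API key can be sent in the `x-api-key` header; the server URL is a placeholder, and the helper names `api_url` and `list_histories` are hypothetical, not part of Galaxy or BioBlend.

```python
import json
import urllib.parse
import urllib.request


def api_url(base_url, endpoint, **params):
    """Build a Galaxy API URL, optionally with query parameters."""
    url = f"{base_url.rstrip('/')}/api/{endpoint.lstrip('/')}"
    if params:
        url += "?" + urllib.parse.urlencode(params)
    return url


def list_histories(base_url, api_key):
    """Return the user's histories as parsed JSON (needs a reachable server)."""
    request = urllib.request.Request(
        api_url(base_url, "histories"),
        headers={"x-api-key": api_key},
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


# Placeholder server: replace with a real Galaxy URL and your own API key
# before calling list_histories().
print(api_url("https://galaxy.example.org/", "histories"))
```

With BioBlend, the equivalent listing is usually written as `GalaxyInstance(url=..., key=...)` followed by `gi.histories.get_histories()`, which hides the URL and header handling shown here.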

Prerequisites
  • Basic understanding of Galaxy from a developer point of view.
  • Python programming.
  • A laptop with a modern web browser (Google Chrome, Firefox and Safari will work best).

Speakers

Marius van den Beek

Penn State University

Dannon Baker

Galaxy Project, Johns Hopkins University
Talk to me about Galaxy Development, especially the UI!

Nicola Soranzo

Earlham Institute


Saturday July 18, 2020 12:16 - 14:45 EDT
Training A

14:45 EDT

Break
Break!  Grab some food, check your email, stretch...

Saturday July 18, 2020 14:45 - 15:30 EDT
Joint

15:30 EDT

15:31 EDT

Bioinformatics in Education
Frequently, bioinformatics is introduced to students as a set of tools, algorithms and platforms, enticing students with mathematical/technical interests and potentially leaving behind those whose primary interest lies in the soft sciences, such as biology, medicine or health care. Using bioinformatics tools and routines to solve biological questions encourages students in this latter group to engage and to acquire bioinformatics principles "in passing".

Prerequisites
  • Internet access and an up-to-date web browser, preferably Firefox, Safari or Chrome. Others might work.

Speakers

Uwe Hilgert

Director Industry Engagement, STEM Training & Workforce Development, University of Arizona

Isabella Viney

University of Arizona


Saturday July 18, 2020 15:31 - 18:00 EDT
Training E

15:31 EDT

Fit your tools into any platform with Bioconda and BioContainers
Audience
Platform experts, tool developers, domain experts using the tools

Goals
Learn packaging with conda build and how containers are constructed from conda recipes/packages.

Schedule
See the workshop page.

Organizers
This workshop is organised by WP2 as part of EOSC-Life.

Prerequisites
  • Linux/OSX recommended
  • Python 2, Python 3

Speakers

Andrea Giachetti

University of Florence (UNIFI)

Loraine Guéguen

Station Biologique de Roscoff

Vincenzo Laveglia

CIRMMP Florence



Saturday July 18, 2020 15:31 - 18:00 EDT
Training A

15:31 EDT

Introduction to Nextflow
This introductory course begins by describing the core elements of Nextflow. This is followed by a hands-on tutorial where participants implement a proof-of-concept RNA-seq pipeline. The goal is to become familiar with basic concepts, including how to define parameters, use channels for data and write processes to perform tasks. We then introduce how to build and use containers and finish with how to run pipelines in the cloud.

Prerequisites
  • Participants need a basic knowledge of Linux shell programming. The workshop includes all materials and a preconfigured compute environment in the cloud.
  • A laptop with a modern web browser and access to a Linux/Unix shell

Speakers

Evan Floden

Seqera Labs

Paolo Di Tommaso

Seqera Labs

Kevin Sayers

Solutions Architect, Seqera Labs


Saturday July 18, 2020 15:31 - 18:00 EDT
Training C

15:31 EDT

Proteomic data analysis in Galaxy
 Chat, Slides, Tutorial, Video

This session will cover:
  • Introduction to protein identification and quantification (with slides)
  • Galaxy Training: Peptide and protein identification using OpenMS tools
  • Galaxy Training: Label-free data analysis using MaxQuant

Prerequisites
  • Introduction to Galaxy or equivalent experience
  • A laptop with a modern web browser

Speakers

Melanie Föll

PostDoc, Northeastern University Boston

Matthias Fahrner

PhD student, Institute for Surgical Pathology, Medical Center – University of Freiburg


Saturday July 18, 2020 15:31 - 18:00 EDT
Training D

15:31 EDT

Reproducible Analysis in the Cloud with Dockstore and Terra
This training will lead users through the steps of performing reproducible analysis at scale in the cloud. Attendees will learn how to find workflows on Dockstore and how to export them to Terra’s interoperable cloud compute platform. We will give a brief tutorial of the Terra platform by walking through an example use case for genomic analysis. Along the way we’ll give you tips and tricks for scaling analyses on the Terra environment and introduce some of the more advanced features like using Jupyter Notebooks for producing and exploring results.

Prerequisites
  • Preferably, a Google account set up with Terra; instructions will be provided ahead of time.
  • A laptop with a modern web browser.

Speakers

Louise Cabansay

Software Engineer, Dockstore, UC Santa Cruz Genomics Institute

Beth Sheets

Program Manager / Dockstore, AnVIL, BioData Catalyst, UC Santa Cruz Genomics Institute

Beri Shifaw

Broad Institute



Saturday July 18, 2020 15:31 - 18:00 EDT
Training B

18:00 EDT

Interregnum
West Training Day 2 is done, and East Training Day 2 is coming


Saturday July 18, 2020 18:00 - 21:00 EDT
Joint

21:00 EDT

East Training 4
First training session of the second day of BCC2020 East. Offerings in this session are:


Interested? Register early and save 50%.

Saturday July 18, 2020 21:00 - 23:30 EDT
Joint

21:01 EDT

Galaxies for Crop Science
Schedule, Slides, Tutorial, Video

This session will demonstrate Galaxy use for bioinformatics tasks routinely employed in breeding for crop improvement, such as conducting GWAS, identifying trait-specific markers for candidate genes across crop cultivars, and performing genomic selection. The crop-specific datasets and tools in two public Galaxy instances, Rice Galaxy (https://galaxy.irri.org) and Excellence In Breeding Platform Galaxy (http://galaxy-demo.excellenceinbreeding.org/), will be used. These instances are being merged into one "Crop Galaxy", soon to be available at https://cropgalaxy.excellenceinbreeding.org.

Our tutorials are available from this GitHub Wiki site: https://tinyurl.com/y2w2m9mn. This home page links to the tutorials for GWAS (Dmytro), SNP data tools for candidate genes (Ramil), and genomic selection (Venice). Video recordings of the sessions will be made available after the workshop. Our presentation (Google slide deck) is here: https://tinyurl.com/yyryyzn9.

Prerequisites
  • Basic knowledge of molecular breeding 
  • Familiarity with Galaxy or some experience using the Galaxy interface
  • A laptop with a modern web browser (Google Chrome preferred; Firefox and Safari will also work).

Speakers

Venice Juanillas

Specialist-Information Systems, IRRI

Dmytro Chebotarov

Scientist, Computational Genetics, International Rice Research Institute (IRRI)

Ken McNally

Sr. Scientist II - Rice Genomics, International Rice Research Institute
I lead the Bioinformatics and Genomics Cluster at IRRI, the International Rice Research Institute. Our products include 1) the Rice SNP-Seek database (https://snp-seek.irri.org) and 2) Rice and EiB galaxies (soon to be at https://cropgalaxy.excellenceinbreeding.org). IRRI coordinates...

Ramil Mauleon

Senior University Lecturer, Southern Cross University
I specialize in bioinformatics, genetics, and genomics, with a focus on agricultural crops.



Saturday July 18, 2020 21:01 - 23:30 EDT
Training E

21:01 EDT

Galaxy Code Architecture
Slides, Videos, Video

How is the Galaxy code structured? What do the various other projects related to Galaxy do? What happens when I start Galaxy?

Please join us to explore various aspects of the Galaxy codebase, understand the various top-level files and modules in Galaxy, understand how dependencies work in Galaxy's frontend and backend, and a whole lot more.

Prerequisites
  • A laptop with a modern web browser (Google Chrome, Firefox and Safari will work best).

Speakers

Nate Coraor

System Administrator, Galaxy Project, Penn State University


Saturday July 18, 2020 21:01 - 23:30 EDT
Training A

21:01 EDT

How to write a JBrowse 2 plugin
Slides, Tutorial, Video

JBrowse 2 is a new genome browser built using ReactJS. It has new features for structural variant visualization and comparative genomics, such as split views, synteny views, Circos views and more. We will demonstrate how to set up JBrowse 2 and show how new plugins can create custom views, custom tracks, or custom data adapters in JBrowse 2.

Prerequisites
  • A wi-fi enabled laptop with a modern web browser.
  • Experience with JavaScript is a plus but not necessary.


Speakers

Rob Buels

University of California, Berkeley

Peter Xie

Software Developer, University of California, Berkeley


Saturday July 18, 2020 21:01 - 23:30 EDT
Training D

21:01 EDT

R / Bioconductor in the Cloud
Tutorial, AnVIL, Video

Bioconductor provides more than 1800 R packages for the analysis and comprehension of high-throughput genomic data. Most users install and run Bioconductor on a personal computer or perhaps use an academic cluster. Cloud-based solutions are increasingly appealing, removing the headaches of local installation while providing access to (a) better, scalable computing resources; and (b) large-scale 'consortium' and other reference data sets. This session introduces the AnVIL cloud computing environment. We cover use of the cloud for
  • a replacement to desktop-style computing;
  • integrating workflows for 'upstream' processing of large data resources with interactive 'downstream' analysis and comprehension, using Human Cell Atlas single-cell datasets as an example; and
  • querying cloud-based consortium data for integration with a user's own data sets.

Prerequisites
  • Participants should be comfortable working with R and RStudio.
  • Some familiarity with Bioconductor is helpful but not required.
  • No prior cloud-based experience is necessary.
  • A wifi enabled laptop with RStudio installed.

Speakers

Martin Morgan

Roswell Park Comprehensive Cancer Center
I am an evolutionary biologist by training, and my worst grades in college were in statistics and computer science. So it is a little ironic that I have spent the last fifteen years of my life in bioinformatics, working on the Bioconductor project for the statistical analysis and...

Nitesh Turaga

Roswell Park Comprehensive Cancer Center

Lori Shepherd

Senior Programmer / Project Management, Roswell Park Comprehensive Cancer Center
I am a member of the Bioconductor Core Team


Saturday July 18, 2020 21:01 - 23:30 EDT
Training B

21:01 EDT

Train the Galaxy Trainer
Chat, Slides

This workshop will introduce:
  • using Galaxy as a training tool
  • Determining aim and audience
    • e.g. single topic; string of related topics;
    • e.g. response to specific request for training; or general upskilling people in Galaxy bioinformatics
  • setting up appropriate infrastructure
    • usegalaxy.* resources
    • TIaaS
    • Your own
  • The available materials 
    • GTN tutorials
    • and/or write your own; including how to contribute it to GTN
    • Customising materials for your needs (Slides, language etc.)
  • Distributed workshops 
    • In practice
    • Local facilitators vs lead trainers
    • Using Zoom / Skype / other video conferencing software
  • Practise setting up your own workshop?
    • e.g. choose a topic from GTN
    • check that it runs on the Galaxy server of choice
    • time it and modify if need be (e.g. cut down the data set more?)
    • create a schedule, e.g. Google Doc → publish → tinyurl
  • Getting good feedback!

Prerequisites
  • An interest in bioinformatics training and Galaxy


Speakers

Gareth Price

Head of Computational Biology, QCIF Facility for Advanced Bioinformatics

Maria Doyle

Application and Training Specialist, Peter MacCallum Cancer Centre

Simon Gladman

University of Melbourne


Saturday July 18, 2020 21:01 - 23:30 EDT
Training C

23:30 EDT

Break
Break!  Grab some food, check your email, stretch...

Saturday July 18, 2020 23:30 - Sunday July 19, 2020 00:15 EDT
Joint
 
Sunday, July 19
 

00:15 EDT

00:16 EDT

Automating Practical Classroom with GitHub Classroom
Slides, Video

GitHub Classroom is a product from GitHub (part of GitHub Education) that helps trainers and educators create repositories for students. It automates the provisioning of a private Git repository for each student and lets trainers and educators keep track of every repository that has been provisioned. This workshop will show you how to use GitHub Classroom to automate a practical session or classroom with students, or even a workshop. This workshop covers the following topics:

- Introduction to GitHub Classroom
- Creating a template repository
- Creating a classroom
- Enrolling students into a classroom
- Assigning project to students

Prerequisites

- GitHub ID (compulsory; please let me know your GitHub ID prior to attending the session)
- Basic knowledge of Git and GitHub, or attendance at the "Getting started in Git using GitHub Desktop" session (optional)
- A laptop with a modern web browser. Google Chrome, Firefox, and Safari will work best


Speakers

Muhammad Farhan Sjaugi

Perdana University

Abd Rahim

UNITEN - Universiti Tenaga Nasional


Sunday July 19, 2020 00:16 - 02:45 EDT
Training C

00:16 EDT

Embedding JBrowse 2 in your website
Slides, Tutorial

JBrowse 2 is the next generation of the JBrowse genome browser, with an all-new pluggable technology platform based on React, mobx-state-tree, and web workers. The embedded version, JBrowse 2 Embedded, is designed to be self-contained and easily embedded in any website without requiring any iframes or CSS hacking and without requiring any specific JavaScript frameworks.

We will show you several ways of embedding JBrowse 2 Embedded in your web-based tool or website.

Prerequisites
  • A wi-fi enabled laptop with a modern web browser.


Speakers

Rob Buels

University of California, Berkeley

Peter Xie

Software Developer, University of California, Berkeley


Sunday July 19, 2020 00:16 - 02:45 EDT
Training A

00:16 EDT

Introduction to RNA-Seq Analysis with Galaxy
Slides, Tutorial, Video

This workshop will introduce the concepts behind transcriptomics with NGS data and how to analyze this data in Galaxy. Specifically, this workshop will focus on de novo transcriptome reconstruction of RNA-seq data with the following goals:
  • comprehensive identification of all transcripts across an experiment
  • appropriately annotating classes of transcripts
  • generating abundance estimates across a transcriptome
  • significance testing of differentially expressed transcripts (video: https://vimeo.com/494282844)
  • visualisation of reads and transcript structures

Prerequisites

Speakers

Gareth Price

Head of Computational Biology, QCIF Facility for Advanced Bioinformatics


Sunday July 19, 2020 00:16 - 02:45 EDT
Training D

00:16 EDT

Produce a portable germline variant-calling pipeline in CWL and WDL using Janis and GATK II
Schedule, Slides, Tutorial, Video

This is the second session of a two part training. Please register for both. The first session is on Saturday.

In this session, we'll use Janis (a Python workflow framework) to build a GATK pipeline to call variants. We'll show how Janis workflows can be translated to CWL and WDL, and how to use the Janis assistant to run these pipelines in CWLTool and Cromwell.

Through the use of containers, the pipeline produced in this workshop can be run locally, on HPCs with Singularity or through cloud vendors (such as Google Cloud and AWS).

Prerequisites
  • Some basic Python is preferred
  •  Installed Software: Docker, Python 3.6+

Speakers

Michael Franklin

Research Software Engineer, University of Melbourne
I'm a research software engineer at the University of Melbourne / Peter MacCallum Cancer Centre who's interested in all things pipelines! I develop Janis, a workflow assistant that generates CWL and WDL. I'm also interested in general programming, specifically web (React) and database-y...

Richard Lupat

Peter MacCallum Cancer Centre


Sunday July 19, 2020 00:16 - 02:45 EDT
Training E

00:16 EDT

Reference data with CVMFS and remote jobs with Pulsar
 Schedule, Video

Learn to use CVMFS for easy access to ready-to-go terabytes of reference data in Galaxy. Then find out how to send jobs to the ends of the universe with Pulsar!

Prerequisites
  • Basic understanding of Galaxy from a developer point of view.
  • A laptop with a modern web browser. Google Chrome, Firefox, and Safari will work best


Speakers

Simon Gladman

University of Melbourne

Catherine Bromhead

University of Melbourne


Sunday July 19, 2020 00:16 - 02:45 EDT
Training B

02:45 EDT

Break
Break!  Grab some food, check your email, stretch...

Sunday July 19, 2020 02:45 - 03:30 EDT
Joint

03:30 EDT

03:31 EDT

Getting your hands on Climate data
Slides, Tutorial, Video

Training on accessing and analyzing climate data in Galaxy. During this session you will learn how to use climate data to develop a simple adaptation case study using the Galaxy Climate workbench. We will first explain the difference between climate and weather data, then show how to visualize climate data on a map with Galaxy, and finally how to create a simple workflow for framing a very simple adaptation case study.

Prerequisites
  • Introduction to Using Galaxy or equivalent experience
  • A wi-fi enabled laptop with a modern web browser. Google Chrome, Firefox and Safari will work best


Speakers

Anne Fouilloux

Research Software Engineer, University of Oslo
I am working on Galaxy Climate (development of tools, integration of climate data, training material).


Sunday July 19, 2020 03:31 - 06:00 EDT
Training B

03:31 EDT

Handling integrated biological data using Python, Jupyter, and InterMine
Tutorial, Video

This tutorial will guide you through loading and analyzing integrated biological data (generally genomic or proteomic data) using InterMine, either via the UI or via an API in Python. Topics covered will include automatically generating code to perform queries, customising the code to meet your needs, and automated analysis of sets, e.g. gene sets, including enrichment statistics. Skills gained can be re-used in any of the dozens of InterMines available, covering a broad range of organisms and dedicated purposes, from model organisms to plants, drug targets, and mitochondrial DNA.
Users will also learn how to import and export their gene and protein lists to and from Jupyter notebooks hosted on https://jupyter.org/.

Prerequisites
  • Little or no experience using Galaxy
  • A laptop with a modern web browser.

Speakers

Daniela Butano

Research Software Engineer, InterMine, University of Cambridge

Yo Yehudi

Software Developer, University of Cambridge & Open Life Science
Integrated genomic data (InterMine)

Rachel Lyne

InterMine, University of Cambridge


Sunday July 19, 2020 03:31 - 06:00 EDT
Training D

03:31 EDT

Introduction to Machine Learning
Slides, Tutorial

Questions:
  • What is machine learning and why is it useful?
  • How to use regression and classification techniques to create predictive models from biological datasets?
Learning objectives
  • Provide the basics of machine learning and its variants
  • Learn how to do classification and regression using the training and test data
  • Learn how to use Galaxy's machine learning tools

Prerequisites
  • Introduction to Galaxy or equivalent experience
  • A laptop with a modern web browser (Google Chrome, Firefox and Safari will work best).

Speakers

Alireza Khanteymoori

Postdoc Researcher, University of Freiburg


Sunday July 19, 2020 03:31 - 06:00 EDT
Training E

03:31 EDT

Processing of Single Cell RNA-Seq Data with Galaxy
Slides, Tutorial, Starting History, Complete History, Video

Single-cell RNA-seq analysis is a rapidly evolving field at the forefront of transcriptomic research. Galaxy offers a multitude of analysis options. In this training, participants will learn about the processing, mapping and quantification of 10x Genomics data from the raw barcoded reads to the count matrices.

Prerequisites

Speakers

Bjorn Gruning

University of Freiburg

Mehmet Tekman

Post Doc, University of Freiburg
Single-Cell, Developing Wrappers with Emacs

Hans-Rudolf Hotz

Friedrich Miescher Institute for Biomedical Research, Basel


Sunday July 19, 2020 03:31 - 06:00 EDT
Training A

03:31 EDT

Scripting Galaxy with BioBlend
Slides, Tutorial, Video

Galaxy has an always-growing API that allows for external programs to upload and download data, manage histories and datasets, run tools and workflows, and even perform admin tasks. This session will cover various approaches to access the API, in particular using the BioBlend Python library.

Prerequisites
  • Basic understanding of Galaxy from a developer point of view.
  • Python programming.
  • A laptop with a modern web browser (Google Chrome, Firefox and Safari will work best).

Speakers

Marius van den Beek

Penn State University

Nuwan Goonasekera

University of Melbourne

Nicola Soranzo

Earlham Institute

Clare Sloggett

University of Melbourne


Sunday July 19, 2020 03:31 - 06:00 EDT
Training C

06:00 EDT

Interregnum
East Training Day 2 is done, and the West Main Meeting Day 1 is coming.


Sunday July 19, 2020 06:00 - 10:00 EDT
Joint

10:00 EDT

BCC2020 Conference Day 1: West
Keynotes, accepted talks, posters, demos, and networking in the West.

Sunday July 19, 2020 10:00 - 15:00 EDT
Joint

10:01 EDT

Welcome
Welcome to the 2020 Bioinformatics Community Conference (BCC2020)!

We'll introduce the conference, talk about the logistics of this online event, and present last minute news. This session will also include a tribute to James Taylor, one of the founders and PIs of the Galaxy Project who had a huge impact on open source and open science.

We will also hold a short icebreaker or two.

Moderators

Dave Clements

Training and Outreach Coordinator, Galaxy Project, Johns Hopkins University

Nomi Harris

BOSC Chair, LBNL
This is my 10th year chairing or co-chairing BOSC, the Bioinformatics Open Source Conference. In 2020, BOSC is part of the online Bioinformatics Community Conference, BCC2020.

Sunday July 19, 2020 10:01 - 10:30 EDT
Joint

10:30 EDT

West Keynote 1: How Open Source has Changed the World
Lincoln Stein, Ontario Institute for Cancer Research

This keynote will be presented live.

Abstract

During the week of March 16, 2020, the Ontario universities of Waterloo, Toronto, and McMaster closed their campuses due to the COVID-19 outbreak. Just a few days later, a small group of students who suddenly found themselves with lots of free time mounted a web site called flatten.ca to collect self-reported symptoms from individuals with COVID-19 and to display the distribution of cases across the country. On the first day it opened, flatten.ca had about 300 visitors. Within two weeks this number had swelled to 337,000 and continues to grow. The system is now used by public health authorities across the country, has been adopted by the City of Montreal as its official COVID-19 tracking system, and has spawned similar sites in locales as far away as Somalia. The students did not need to write a research grant proposal, apply to a health data registry for access, seek REB approval, or obtain software licenses. They perceived an urgent need, applied open source tools and methodologies, and built a fully functional system in record time, well ahead of the "professionals" in academia and industry.

This is the world that the pioneers of Open Source envisioned. One in which a passionate community of individuals can turn an idea into reality with a few keystrokes by building on top of a large set of unencumbered high quality tools, techniques and datasets.

However, it doesn't always go this way. In biomedical research we continue to be encumbered by antiquated protocols for accessing health data, stymied by published descriptions of computational protocols that are faulty or incomplete, impeded by the logistics of moving large data sets around, and blindered by restrictive data usage conditions that discourage the creative integration of diverse datasets. In this talk, I will look back over the progress we have made, and then look forward to the new paradigms for code and data sharing that promise to make success stories like flatten.ca the rule rather than the exception.


This keynote will be introduced by Nomi Harris.

Speakers

Lincoln Stein

OICR
Lincoln Stein focuses on supporting biomedical research both in Ontario and around the world by making large and complex biological datasets findable, accessible and usable. Prior to joining OICR in 2006, Dr. Stein played an integral role in many large-scale data initiatives at Co...


Sunday July 19, 2020 10:30 - 11:15 EDT
Joint

11:15 EDT

Break!
Take a break!  Check your email, grab some food, acknowledge your family and pets, ...

Just make sure you are back in 15 minutes.

Sunday July 19, 2020 11:15 - 11:30 EDT
Joint

11:30 EDT

BOSC West Session 1a: Sequencing & analysis 🍐
The first talk session of BCC2020 is split into multiple tracks. This track will include talks submitted to the BOSC track.

Moderators

Chris Fields

Director, HPCBio, University of Illinois Urbana-Champaign
I am a reformed molecular microbiologist associatively directing a moderately sized group of very smart people from crazy diverse backgrounds, and we all work on anything and everything sequence-related.

Sunday July 19, 2020 11:30 - 12:20 EDT
BOSC
  Meeting-West

11:30 EDT

Galaxy West Session 1: Applications and use cases 🌀
The first talk session of BCC2020 is split into multiple tracks. This track will include talks submitted to the Galaxy track.

Moderators

Delphine Lariviere

Penn State University
Post-doc in the Galaxy Team (Nekrutenko Lab). Works on bacterial genomics, assembly, RNA Seq, TnSeq. Also interested in evolution, metagenomics, epigenetics and visualisation.

Sunday July 19, 2020 11:30 - 12:45 EDT
Galaxy
  Meeting-West

11:31 EDT

Cooperative bacteriophage genome annotation in the biologist-friendly Galaxy and Apollo platforms 🌀
Abstract

Jolene Ramsey 1,2, Cory Maughmer 1,2, Anthony Criscione 1,2, Mei Liu 1,2, Ry Young 1,2, Jason J. Gill 1,3

  1. Center for Phage Technology, Texas A&M University, College Station, Texas, USA
  2. Department of Biochemistry and Biophysics, Texas A&M University, College Station, Texas, USA
  3. Department of Animal Science, Texas A&M University, College Station, Texas, USA

The presenter(s) will be available for live Q&A at the end of this session (BCC West).
In the modern genomic era, scientists without extensive bioinformatic training need to apply advanced computational analyses to genome annotation. At the Center for Phage Technology (CPT), we use two open source, web-based platforms: Galaxy, for reproducible computational analyses, and Apollo, a collaborative genome annotation editor, to facilitate annotation of phage genomes. The development and expansion of the Galaxy-Apollo bridge has been discussed at prior Galaxy Community Conferences, and the critical contributions by many former and current community members are gratefully acknowledged. In this presentation, we will describe how scientists and students have been trained to use semi-automated workflows in Galaxy and Apollo for collaborative annotation of genomes, including feature calling, contextualized functional prediction, and comparative genomics.
Unlike the genomes of most cellular life forms, phage genomes are usually a single contiguous molecule <200,000 bases in length. Their size allows high standards for complete, evidence-based annotations, and is amenable to genomics education settings. The CPT Galaxy and Apollo system is used for original biological research and development of new bioinformatic tools to analyze many individual phage genomes, as well as clusters of related phages. Our robust suite of phage-oriented tools includes open source applications such as PhageTerm, as well as unique programs for finding Shine-Dalgarno sequences, a collection of tools used for confident identification of lysis genes, and identification of interrupted genes that contain frameshifts or introns. The step-wise process moves all aspects of control and choice into the user’s court. In comparison to widely used automated and fast command-line annotation methods, our integrated and flexible approach benefits from trained human intervention to result in high-quality final annotations.
The CPT has educated a steady stream of scientists, as well as both undergraduate and graduate students, informally and through formal university course offerings on using this Galaxy-Apollo infrastructure to annotate phage genomes. The resulting data, continuously collated on our BioProject page (https://www.ncbi.nlm.nih.gov/bioproject/?term=PRJNA222858), is deposited in public sequence repositories and published regularly. By creating a free user account, local or international teams can begin their own analyses. Accompanying user training material in the Galaxy Training Network format is hosted at https://cpt.tamu.edu/training-material/.
Project Website: https://cpt.tamu.edu/galaxy-pub

Speakers

Jolene Ramsey

Postdoc, Texas A&M University
I love to study the viruses of bacteria, called bacteriophages, or phages. Ask me about viruses, or my favorite podcast, This Week in Virology.



Sunday July 19, 2020 11:31 - 11:45 EDT
Galaxy

11:31 EDT

Digital Expression Explorer 2: a repository of 8 trillion uniformly processed RNA-seq reads and still counting 🍐
Abstract


The presenter(s) will be available for live Q&A in this session (BCC East).

Mark Ziemann 1, Antony Kaspi 2

1 Deakin University, Geelong, Australia. Email: m.ziemann@deakin.edu.au
2 The Walter and Eliza Hall Institute of Medical Research, Melbourne, Australia.

Project Website: http://dee2.io/
Source Code: https://github.com/markziemann/dee2
License: (example: GNU General Public License v3.0)

RNA-seq is currently the most popular method for transcriptome-wide gene expression profiling, but despite data-sharing requirements, rates of data reuse are still very low. This is due to the need for high end computing infrastructure and pipelines that require command line expertise for raw data processing. Resources such as Recount2, ARCHS4 and Digital Expression Explorer 2 (DEE2) provide easy access to some uniformly processed data, with queryable web interfaces, bulk downloads and R packages.

Keeping up with the rapid pace of data deposition to the Short Read Archive (SRA) is proving a challenge. As of May 2020, there are 1.49M samples available in SRA for the nine organisms included in DEE2, and of these 0.88M are available as processed data in DEE2 (Figure 1). This makes DEE2 coverage about twice as extensive as the next largest dataset (ARCHS4). Since original publication in 2019, DEE2 has grown from 5.3 to 8.05 T mapped reads.
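
The figures quoted above can be checked with a few lines of arithmetic. The sketch below only restates the numbers from the abstract; the variable names are illustrative and are not part of DEE2 or its R package.

```python
# Figures quoted in the abstract (May 2020).
sra_samples_millions = 1.49       # SRA samples for the nine DEE2 organisms
dee2_samples_millions = 0.88      # of those, available as processed data in DEE2
mapped_reads_2019_trillions = 5.3
mapped_reads_2020_trillions = 8.05

# Fraction of eligible SRA samples already processed by DEE2.
coverage = dee2_samples_millions / sra_samples_millions

# Relative growth in mapped reads since the original publication.
growth = mapped_reads_2020_trillions / mapped_reads_2019_trillions - 1

print(f"DEE2 covers {coverage:.1%} of eligible SRA samples")
print(f"Mapped reads grew by {growth:.1%} since 2019")
```

Roughly 59% of the eligible samples are processed, so about two in five samples still form a backlog while new data keep arriving.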

In this presentation I will outline the challenges and strategies in maintaining and growing resources of this scale. In addition we will discuss recent enhancements including direct integration of the web interface to Degust (http://degust.erc.monash.edu/), a popular web based tool for statistical analysis of RNA-seq data. The R package getDEE2 has been extensively updated and submitted to BioConductor. It allows programmatic access to DEE2 datasets in the form of SummarizedExperiment objects that are compatible with many downstream analysis tools in the BioConductor ecosystem. Together these advances are helping DEE2 to achieve the goal of making all RNA-seq data freely available to everyone.


Speakers

Mark Ziemann

Deakin University
Hi there 👋 I am a Lecturer and researcher in computational biology at Deakin University, Australia. Our group is focused on building data resources and software tools to accelerate biomedical discovery. We collaborate closely with clinicians and biologists to get the most out...



Sunday July 19, 2020 11:31 - 11:45 EDT
BOSC

11:45 EDT

Community genome annotation integrates with Galaxy via Apollo providing greater integration and more functional annotation options 🌀
➞ Abstract 

Nathan Dunn 1, Helena Rasche 2, Anthony Bretaudeau 3, Ian Holmes 4

  1. Lawrence Berkeley National Lab, Berkeley, CA
  2. University of Freiburg, Freiburg, Germany
  3. French National Institute for Agriculture, Food, and Environment (INRAE), Rennes, France
  4. University of California Berkeley, Berkeley, CA

The presenter(s) will be available for live Q&A at the end of this session (BCC West)

Speakers

Nathan Dunn

Software Developer, Lawrence Berkeley National Lab



Sunday July 19, 2020 11:45 - 11:50 EDT
Galaxy

11:45 EDT

Targeted nanopore sequencing by real-time mapping of raw electrical signal with UNCALLED 🍐
Abstract


The presenter(s) will be available for live Q&A in this session (BCC West).

Sam Kovaka 1, Yunfan Fan 2, Bohan Ni 1, Winston Timp 2, Michael C. Schatz 1,3,4
Email: skovaka1@jhu.edu

1 Department of Computer Science, Johns Hopkins University, Baltimore, MD
2 Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD
3 Department of Biology, Johns Hopkins University, Baltimore, MD
4 Cold Spring Harbor Laboratory, Cold Spring Harbor, NY

Project Source Code: https://github.com/skovaka/UNCALLED
License: MIT License

ReadUntil sequencing allows nanopore devices to selectively stop sequencing an individual read in real-time by ejecting it from the pore and immediately switch to another read. If reads could be rapidly mapped to large references while being sequenced, this would enable targeted sequencing of specific genomic regions or even specific genomes. However, most mapping methods require basecalling, which is computationally intensive and requires a significant amount of the read to be sequenced.

Here we present UNCALLED (Utility for Nanopore Current ALignment to Large Expanses of DNA), an open-source mapper that rapidly matches raw streaming nanopore current signals to a large DNA reference without basecalling. This is accomplished by probabilistically considering all possible k-mers that the signal could represent, and then pruning the possibilities based on the reference genome sequence encoded using an FM-index. Importantly, UNCALLED dynamically adjusts the signal level model probability cutoffs during alignment to achieve both high accuracy and high speed when aligning the noisy signal data.
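
To illustrate the FM-index pruning idea in general terms, here is a minimal Python sketch of exact-match backward search over a toy reference. It is not UNCALLED's implementation: the signal model and probability cutoffs are omitted, and the function names (`build_fm_index`, `extend_left`, `count_matches`) are hypothetical. It only shows how a candidate sequence can be discarded as soon as the index reports that it no longer occurs in the reference.

```python
from collections import Counter


def build_fm_index(reference):
    """Build a toy FM-index (C table and occurrence counts) for a short reference."""
    text = reference + "$"  # the sentinel sorts before A, C, G and T
    # Suffix array by direct sorting: fine for a toy, far too slow for a genome.
    suffix_array = sorted(range(len(text)), key=lambda i: text[i:])
    # Burrows-Wheeler transform: the character preceding each sorted suffix.
    bwt = "".join(text[i - 1] for i in suffix_array)
    # c_table[ch] = number of characters in the text strictly smaller than ch.
    counts = Counter(text)
    c_table, total = {}, 0
    for ch in sorted(counts):
        c_table[ch] = total
        total += counts[ch]
    # occ[ch][i] = number of occurrences of ch in bwt[:i].
    occ = {ch: [0] * (len(bwt) + 1) for ch in counts}
    for i, b in enumerate(bwt):
        for ch in occ:
            occ[ch][i + 1] = occ[ch][i] + (1 if ch == b else 0)
    return c_table, occ, len(bwt)


def extend_left(index, interval, ch):
    """One backward-search step: narrow the suffix-array interval by prepending ch.

    Returns None when the extended sequence does not occur in the reference,
    which is the point at which a candidate would be pruned.
    """
    c_table, occ, _ = index
    if ch not in c_table:
        return None
    lo, hi = interval
    lo = c_table[ch] + occ[ch][lo]
    hi = c_table[ch] + occ[ch][hi]
    return (lo, hi) if lo < hi else None


def count_matches(index, pattern):
    """Count exact occurrences of pattern, giving up as soon as no match remains."""
    interval = (0, index[2])
    for ch in reversed(pattern):
        interval = extend_left(index, interval, ch)
        if interval is None:
            return 0
    return interval[1] - interval[0]


index = build_fm_index("ACGTACGA")
print(count_matches(index, "ACG"))  # occurs twice in the toy reference
print(count_matches(index, "GG"))   # absent, so this candidate is pruned
```

Each extension step costs the same regardless of reference length, which is why an index of this kind suits checking many candidate sequences against a large genome.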

We used UNCALLED to deplete the sequencing of known bacterial genomes within a Zymo mock microbial community, enriching the remaining yeast sequence from ~20x coverage to ~100x. We also used UNCALLED to enrich for 148 human genes associated with hereditary cancers to 29.6x coverage (a 5.6-fold increase) using a single MinION flowcell, enabling accurate detection of SNPs, indels, structural variants (SVs), and methylation in these genes. Notably, twice as many SVs were detected compared to 50x coverage Illumina sequencing, verified by whole-genome nanopore and PacBio HiFi sequencing. Finally, we show that UNCALLED could be used to enrich larger gene panels such as all 717 genes in the COSMIC Census, or be used with cDNA/RNA sequencing, for example to deplete high-abundance transcripts.



Speakers

Sam Kovaka

Johns Hopkins University



Sunday July 19, 2020 11:45 - 11:50 EDT
BOSC

11:50 EDT

THAPBI PICT -- a metabarcoding analysis pipeline developed as a Phytophthora ITS1 Classification Tool 🍐
Abstract | Slides | Video

The presenter(s) will be available for live Q&A in this session (BCC West).

Peter Cock 1, David Cooke 2, Leighton Pritchard 3

1 Information and Computational Sciences, James Hutton Institute, Invergowrie, Dundee, UK
2 Cell and Molecular Sciences, James Hutton Institute, Invergowrie, Dundee, UK
3 Strathclyde Institute of Pharmacy & Biomedical Sciences, Glasgow, UK

Repository: https://github.com/peterjc/thapbi-pict/
Documentation: https://thapbi-pict.readthedocs.io/
License: MIT

Molecular barcodes are central to environmental monitoring and identification of species present in a
sample, and use PCR primers to amplify a diagnostic genome region of the organisms of interest. We are
interested in metabarcoding where multiple samples are multiplexed for high-throughput sequencing on the
Illumina platform, using overlapping paired end reads. Each sample yields a collection of marker sequences,
and matching these to a database of known species produces a taxonomic breakdown reflecting community
composition.
THAPBI PICT is a metabarcoding tool we developed for the UK funded Tree Health and Plant
Biosecurity Initiative (THAPBI) Phyto-Threats project, which focused on identifying Phytophthora species in
commercial tree nurseries. Phytophthora (from Greek meaning plant-destroyer) are economically important
plant pathogens in both agriculture and forestry. This project targeted an ITS1 marker (Internal
Transcribed Spacer one, a region found in eukaryotic genomes between the 18S and 5.8S rRNA genes) with
nested primers to identify Phytophthora species. By varying primer settings and using a custom database,
THAPBI PICT can be applied to other organisms and/or barcode marker sequences - making it more than
just a Phytophthora ITS1 Classification Tool (PICT).
The analysis pipeline starts from demultiplexed paired FASTQ files, as produced by the Illumina MiSeq
platform. These are quality trimmed, overlapping reads merged and primer trimmed (calling external tools)
and then deduplicated giving a much smaller list of unique sequences and associated read counts (passing a
minimum count threshold intended to exclude "noise"). These are matched to a curated database using a
range of methods, producing both plain text and formatted Excel output. An edit graph in XGMML format
is also produced for display in Cytoscape and other visualisation tools.
THAPBI PICT is released as open source software under the MIT licence. It is written in Python, a free
open source language available on all major operating systems. Version control using git hosted publicly on
GitHub is used for the source code, documentation, and database builds including tracking the hand-curated
reference set of Phytophthora ITS1 sequences. Continuous integration of the test suite is currently run on
both TravisCI and CircleCI. Software is released to the Python Packaging Index (PyPI) as standard for
the Python ecosystem, and additionally packaged for Conda via the BioConda channel. This offers simple
installation of the tool itself, and all the command line dependencies on Linux or macOS. The documentation
is currently hosted on Read The Docs, updated automatically from the GitHub repository.
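
The deduplication step described above (collapsing reads into unique sequences with counts, then applying a minimum count threshold) can be sketched in Python as follows. This is an illustration and not THAPBI PICT's own code: the function name and the threshold value are hypothetical, and the input is assumed to be already merged and primer-trimmed reads.

```python
from collections import Counter


def unique_sequences(reads, min_count=2):
    """Collapse reads to unique sequences and drop those seen fewer than min_count times."""
    counts = Counter(seq.upper() for seq in reads)
    kept = {seq: n for seq, n in counts.items() if n >= min_count}
    # Most abundant first; ties broken alphabetically for stable output.
    return sorted(kept.items(), key=lambda item: (-item[1], item[0]))


reads = ["ACGT", "acgt", "ACGT", "TTTT", "GGGG", "GGGG"]
for seq, n in unique_sequences(reads, min_count=2):
    print(seq, n)
```

Sequences below the threshold (here the single TTTT read) are treated as noise and excluded before any database matching.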


Speakers

Peter Cock

The James Hutton Institute
Bioinformatician at the James Hutton Institute, a member of the BOSC organizing committee, treasurer of the Open Bioinformatics Foundation, and a core developer on the Biopython project.



Sunday July 19, 2020 11:50 - 11:55 EDT
BOSC

11:50 EDT

Computational chemistry analysis using Galaxy: Exploring antigen-antibody binding patterns for MUC1-AR20.5 🌀
➞ Abstract

Christopher Barnett 1, Tharindu Senapathi 1, Sean Collins 2, Kyllen Dilsook 2, Natalie Terry 2

  1. Scientific Computing Research Unit and Department of Chemistry, University of Cape Town, Rondebosch, 7701, South Africa. Email: chris.barnett@uct.ac.za
  2. Department of Chemistry, University of Cape Town, Rondebosch, 7701, South Africa

The presenter(s) will be available for live Q&A at the end of this session in both BCC West and BCC East.

Speakers

Chris Barnett

Lecturer, University of Cape Town



Sunday July 19, 2020 11:50 - 12:05 EDT
Galaxy

11:55 EDT

Please contribute to FASTQE so I don’t have to 🍐
Abstract


The presenter(s) will be available for live Q&A in this session (BCC East).

Andrew Lonsdale 1,2

1 Peter MacCallum Cancer Centre, Melbourne, Victoria 3000, Australia. Email: andrew.lonsdale@petermac.org
2 Sir Peter MacCallum Department of Oncology, The University of Melbourne, Victoria 3010, Australia

Project Website: http://fastqe.com
Source Code: https://github.com/lonsbio/fastqe
License: MIT License

FASTQE is a utility for viewing the quality of biological sequence data as emoji. It
takes the FASTQ format, summarises the average quality score per base-position, and
transcribes each ASCII-encoded Phred summary score into a corresponding emoji to see the
good, the bad, and the ugly of sequencing data.
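
As a rough illustration of the idea, the Python sketch below averages Phred scores per base position across reads and maps each average onto an emoji. The emoji bins and function names are assumptions made for this example and are not FASTQE's actual mapping or API. Phred+33 encoding is also assumed.

```python
from statistics import mean

# Hypothetical quality bins as (lower bound, emoji); not FASTQE's real table.
EMOJI_BINS = [(30, "😄"), (20, "😐"), (0, "💩")]


def phred_scores(quality_line, offset=33):
    """Decode an ASCII-encoded quality string into integer Phred scores."""
    return [ord(ch) - offset for ch in quality_line]


def mean_quality_per_position(quality_lines):
    """Average the Phred score at each base position across all reads."""
    decoded = [phred_scores(q) for q in quality_lines]
    longest = max(len(d) for d in decoded)
    return [mean(d[i] for d in decoded if i < len(d)) for i in range(longest)]


def to_emoji(score):
    """Return the emoji of the first bin whose lower bound the score meets."""
    for lower, emoji in EMOJI_BINS:
        if score >= lower:
            return emoji
    return EMOJI_BINS[-1][1]


def summarise(quality_lines):
    """One emoji per base position, based on the mean quality at that position."""
    return "".join(to_emoji(s) for s in mean_quality_per_position(quality_lines))


print(summarise(["II5#", "I+"]))
```

Reads of different lengths are handled by averaging only over the reads that reach a given position.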

Initially just a proof of concept at the end of a 2016 PyConAU talk, it has gradually evolved
into a Python package that is also available as a command line program. It can be
installed both via PyPI and Bioconda. When invoked from the command line it can also
display the minimum and maximum quality scores per position, and bin quality
scores into a reduced set of emoji.

Despite little promotion beyond social media (@fastqe), it has gained some popularity.
FASTQE has been used for an undergraduate command line workshop [1], presentations,
and workshops. Surprisingly, there have even been serious uses of the tool. FASTQE
helped show that artefacts in single-cell RNA-seq data can increase the burden of
error correction in cell barcodes, and revealed at least one case of a software bug that
can lead to incorrect barcode correction.

Despite these compelling use cases, FASTQE has a bus-factor of 1. In order to provide
a more valuable tool for bioinformatics training, education and outreach, contributions are
needed. This presentation will demonstrate the functionality of FASTQE, outline the current
status of the project, a roadmap for enhancements, and a call for more contributions to this
open source project. Everyone knows this is a silly idea. This talk will persuade future
contributors that maybe it isn't as silly as it sounds.

[1] Rachael St. Jacques, Max Maza, Sabrina Robertson, Guoqing Lu, Andrew Lonsdale, Ray A Enke (2019). A Fun
Introductory Command Line Exercise: Next Generation Sequencing Quality Analysis with Emoji!. NIBLSE
Incubator: Intro to Command Line Coding Genomics Analysis, (Version 2.0). QUBES Educational Resources.
doi:10.25334/Q4D172

Speakers

Andrew Lonsdale

Peter MacCallum Cancer Centre, Melbourne, Victoria 3000, Australia



Sunday July 19, 2020 11:55 - 12:00 EDT
BOSC
  Meeting-West

12:00 EDT

A reproducible workflow for amplicon-based microbial community analysis using the drake R package 🍐
Abstract


The presenter(s) will be available for live Q&A in this session (BCC West).

Rodrigo Ortega-Polo 1, Shefali Vishwakarma 2,3, Lan Tran 4, Amanda Gregoris 4, Marta Guarna 4

1 Lethbridge Research and Development Centre, Agriculture and Agri-Food Canada; Lethbridge, Alberta,
Canada. Email: rodrigo.ortegapolo@canada.ca
2 Lethbridge Research and Development Centre, Agriculture and Agri-Food Canada; Lethbridge, Alberta,
Canada.
3 Department of Molecular Biology and Biochemistry, Simon Fraser University; Surrey, British Columbia,
Canada.
4 Beaverlodge Research Farm, Agriculture and Agri-Food Canada; Beaverlodge, Alberta, Canada.

Project Website: https://github.com/BeeCSI-Microbiome/dada2_drake_workflow
Source Code: https://github.com/BeeCSI-Microbiome/dada2_drake_workflow
License: MIT License

The use of workflow management systems promotes best practices in computational biology such
as reproducibility, provenance tracking and documentation of steps and parameters used in
analyses. Furthermore, the ability to restart workflows from a given point in the analysis instead of
starting over provides an efficient way for developing data analysis pipelines. The drake R package
is a framework for workflow management that allows users to design workflows and visualize their
status in a reproducible and scalable manner (Figure 1). In our work, we used drake to design a
pipeline for amplicon-based microbial community data using DADA2 for denoising and taxonomic
classification, phyloseq and other R packages for visualization and data tidying. We implemented
this workflow for the analysis of 16S rRNA microbial community datasets from the honey bee gut
microbiome. This workflow has the advantage of enabling users to evaluate microbial communities
with amplicon sequencing data working entirely within R.

Speakers

Rodrigo Ortega-Polo

Agriculture and Agri-Food Canada



Sunday July 19, 2020 12:00 - 12:05 EDT
BOSC

12:05 EDT

Q & A 🌀
Question and Answer session for the just finished talks.

Moderators

Delphine Lariviere

Penn State University
Post-doc in the Galaxy Team (Nekrutenko Lab). Works on bacterial genomics, assembly, RNA Seq, TnSeq. Also interested in evolution, metagenomics, epigenetics and visualisation.

Sunday July 19, 2020 12:05 - 12:10 EDT
Galaxy

12:05 EDT

SigBio-Shiny: A standalone interactive application for detecting biological significance on a set of genes 🍐
→ Abstract


The presenter(s) will be available for live Q&A in this session (BCC East).

Sangram Keshari Sahu

Independent Researcher, Bangalore, India.

Email: sangramsahu15@gmail.com
Project Website: https://github.com/sk-sahu/sig-bio-shiny
Source Code: https://github.com/sk-sahu/sig-bio-shiny
Licence: MIT Licence

Detecting biological significance is an essential step for any high-throughput sequence analysis.
Once sequence reads are mapped and assembled, different quantification analyses follow,
ending with a set of features (transcripts/genes). Quickly exploring those features
together from different angles, along with statistical inference, gives a good idea of the
biology they are involved in.
Doing this kind of exploration for a particular organism requires an up-to-date annotation
database. Currently available online/API platforms support either very few organisms or only model
organisms. Apart from that, reproducibility is a primary issue as databases are continually updated.
To tackle these problems I am presenting SigBio-Shiny, a standalone interactive application
based on R-Shiny which supports more than just model organisms with no requirement of
manual database maintenance. It leverages available open-source resources such as
Bioconductor's AnnotationHub to collect the organism's updated database in real time while
keeping track of which version of the database was used. On top of this database, it helps with
detecting biological significance on a set of genes by doing gene mapping, enrichment analysis
of Gene Ontology (GO) and Pathway analysis.
Keywords: Interactive application, Significant biology, Non-model Organism, Annotation
database, Gene mapping, Gene Ontology (GO), Pathway, Enrichment analysis

Speakers

Sangram Keshari Sahu

Genomics Data Scientist



Sunday July 19, 2020 12:05 - 12:10 EDT
BOSC
  Meeting-West

12:10 EDT

Streamlining accessibility and computability of large-scale genomic datasets with the NHGRI genome data science Analysis, Visualization, and Informatics Lab-Space (AnVIL) 🍐
Abstract


The presenter(s) will be available for live Q&A in this session (BCC West).

Michael C. Schatz 1, Anthony Philippakis 2, on behalf of the AnVIL project team 3
                                                
1 Departments of Computer Science and Biology, Johns Hopkins University, Baltimore, MD. Email: mschatz@cs.jhu.edu
2 Broad Institute of MIT and Harvard, Cambridge, MA
3 City University of New York, Harvard, Oregon Health & Sciences University, Penn State, Roswell Park Cancer Institute, University of California Santa Cruz, University of Chicago, Vanderbilt, Washington University.


Project Website: https://anvilproject.org/ 
Source Code: https://github.com/anvilproject 
License: MIT License


The traditional model of genomic data sharing – centralized data warehouses such as dbGaP from which researchers download data to analyze locally – is increasingly unsustainable. Not only are transfer/download costs prohibitive, but this approach also leads to redundant siloed compute infrastructure and makes ensuring security and compliance of protected data highly problematic.
The NHGRI Genomic Data Science Analysis, Visualization, and Informatics Lab-Space, or AnVIL, inverts this model, providing a cloud environment for the analysis of large genomic and related datasets. By providing a unified environment for data management and compute, AnVIL eliminates the need for data movement, allows for active threat detection and monitoring, and provides elastic, shared computing resources that can be acquired by researchers as needed. AnVIL provides access to key NHGRI datasets, such as the CCDG (Centers for Common Disease Genomics), CMG (Centers for Mendelian Genomics), eMERGE (Electronic Medical Records and Genomics), as well as other relevant datasets.
The platform is built on a set of established components that have been used in a number of flagship scientific projects. The Terra platform provides a compute environment with secure data and analysis sharing capabilities. Dockstore provides standards based sharing of containerized tools and workflows. Bioconductor and Galaxy provide environments for users at different skill levels to construct and execute analyses. The Gen3 data commons framework provides data and metadata ingest, querying, and organization.
AnVIL provides a collaborative environment for creating and sharing data and analysis workflows for both users with limited computational expertise and sophisticated data scientist users. It provides multiple entry points for data access and analysis, including execution of batch workflows written in WDL, notebook environments including Jupyter and RStudio, Bioconductor packages for building analysis on top of AnVIL APIs and services, and will offer Galaxy instances for interactive analysis. It will be possible to integrate additional analysis environments through standard APIs.
Long-term, the AnVIL will provide a unified platform for ingestion and organization for a multitude of current and future genomic and genome-related datasets. Importantly, it will ease the process of acquiring access to protected datasets for investigators and drastically reduce the burden of performing large-scale integrated analyses across many datasets to fully realize the potential of ongoing data production efforts.

Speakers

Michael Schatz

Johns Hopkins University



Sunday July 19, 2020 12:10 - 12:15 EDT
BOSC

12:10 EDT

Integrating and analyzing genotype, phenotype, and environmental data through CartograTree and Tripal Galaxy 🌀
➞ Abstract

Irene Cobo-Simón 1, Nic Herndon 2, Margaret Staton 3, Emily Grau 4, Sean Buehler 4, Peter Richter 4, Risharde Ramnath 4, Charlie Demurjian 4, Abdullah Almsaeed 3, Jill Wegrzyn 4

  1. Department of Ecology and Evolutionary Biology, University of Connecticut, Storrs, CT, USA
  2. Department of Computer Science, East Carolina University, NC, USA
  3. Department of Entomology and Plant Pathology, University of Tennessee, Knoxville, TN, USA
  4. Department of Ecology and Evolutionary Biology, University of Connecticut, Storrs, CT, USA

The presenter(s) will be available for live Q&A at the end of this session (BCC West)

Speakers

Irene Cobo

Postdoctoral Scholar, Department of Ecology and Evolutionary Biology, University of Connecticut
My research interest is mainly focused on evolutionary biology from a molecular perspective. In particular, I am interested in studying the genomic basis of adaptation and biodiversity.



Sunday July 19, 2020 12:10 - 12:25 EDT
Galaxy

12:15 EDT

A comprehensive benchmarking of WGS-based structural variant callers 🍐
Abstract


The presenter(s) will be available for live Q&A in this session (BCC East).

Varuni Sarwal 1,2, Sebastian Niehus 3,4, Ram Ayyala 1, Serghei Mangul 5

1 University of California, Los Angeles, CA 90095, USA. Email: sarwal8@gmail.com
2 Indian Institute of Technology Delhi, Hauz Khas, New Delhi, Delhi 110016, India
3 Berlin Institute of Health (BIH), Anna-Louisa-Karsch-Str. 2, 10178 Berlin, Germany
4 Charité-Universitätsmedizin Berlin, corporate member of Freie Universität Berlin,
Humboldt-Universität zu Berlin, and Berlin Institute of Health, Charitéplatz 1, 10117 Berlin, Germany
5 University of Southern California, Los Angeles, CA 90089, USA

Project Website: https://github.com/Mangul-Lab-USC/benchmarking_SV_publication
Source Code: https://github.com/Mangul-Lab-USC/benchmarking_SV_publication
License: MIT License

Structural variants (SVs) are genomic regions that contain an altered DNA sequence due to
deletion, duplication, insertion, or inversion, and vary in pathogenicity.
Dissecting SVs from whole genome sequencing (WGS) data presents a number of challenges
and a plethora of SV-detection methods have been developed. Currently, there is a paucity of
evidence which investigators can use to select appropriate SV-detection tools. We evaluated the
performance of 15 SV-detection tools based on their ability to detect deletions from aligned
WGS reads using a comprehensive PCR-confirmed gold standard set of SVs to find methods
with a good balance between sensitivity and precision. While the number of true deletions is
3710, the number of deletions detected by the tools ranged from 899 to 82,225. 53% of the
methods reported fewer deletions than are known to be present in the sample. The length
distribution of detected deletions varied across tools and was substantially different from the
distribution of true deletions. 53% of tools underestimate the true size of SVs and deletions
detected by BreakDancer were the closest to the true median deletion length. We allowed
the coordinates of detected deletions to deviate from the coordinates of the true deletions
by thresholds from 0 to 10,000 bp. Manta achieved the highest F-score for all thresholds.
Methods with high specificity rates tend to also have significantly higher F-score and precision
rates. CLEVER was able to achieve the highest sensitivity while the most precise method was
PopDel. We assessed the performance of SV callers at coverages from 32x to 0.1x generated by
down-sampling the original WGS data. DELLY showed the highest F-score for coverage below
4x while Manta was the best performing tool from 8x to 32x. We assessed the effect of deletion
length on the accuracy of detection. Manta and CREST were the only tools with high specificity
for deletions shorter than 500bp. LUMPY was the only method able to deliver an F-score above
30% across all categories. Manta and LUMPY were the best performing tools for general
applications. Our recommendations can help researchers choose the best SV detection software,
as well as inform the developer community of the challenges of SV detection.
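
The threshold-based comparison described above can be sketched in Python as follows. The abstract does not spell out the exact matching rule, so the rule used here (both breakpoints within the threshold, each true deletion matched at most once) is an assumption, as are the function names and the toy coordinates.

```python
def matches(call, truth, threshold):
    """A call matches a true deletion if both breakpoints lie within threshold bp."""
    return (
        call[0] == truth[0]
        and abs(call[1] - truth[1]) <= threshold
        and abs(call[2] - truth[2]) <= threshold
    )


def evaluate(calls, truths, threshold):
    """Return (precision, sensitivity, F-score) for deletion calls."""
    unmatched = list(truths)
    tp = 0
    for call in calls:
        hit = next((t for t in unmatched if matches(call, t, threshold)), None)
        if hit is not None:
            unmatched.remove(hit)  # each true deletion is counted once
            tp += 1
    fp = len(calls) - tp
    fn = len(unmatched)
    precision = tp / (tp + fp) if calls else 0.0
    sensitivity = tp / (tp + fn) if truths else 0.0
    denom = precision + sensitivity
    f_score = 2 * precision * sensitivity / denom if denom else 0.0
    return precision, sensitivity, f_score


# Toy data: (chromosome, start, end); values are invented for illustration.
truths = [("chr1", 1000, 2000), ("chr1", 5000, 5600)]
calls = [("chr1", 1010, 1990), ("chr1", 8000, 8100), ("chr2", 1000, 2000)]
for threshold in (0, 100):
    print(threshold, evaluate(calls, truths, threshold))
```

Raising the threshold can only keep or increase the number of matched calls, which is why results are reported across a range of thresholds.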

Speakers

Varuni Sarwal

Undergraduate student, UC Los Angeles



Sunday July 19, 2020 12:15 - 12:20 EDT
BOSC
  Meeting-West

12:20 EDT

Q&A for session B1a 🍐
The presenter(s) will be available for live Q&A in this session.

Moderators

Chris Fields

Director, HPCBio, University of Illinois Urbana-Champaign
I am a reformed molecular microbiologist associatively directing a moderately sized group of very smart people from crazy diverse backgrounds, and we all work on anything and everything sequence-related.

Sunday July 19, 2020 12:20 - 12:25 EDT
BOSC

12:24 EDT

BOSC West Session 1b: Open data 🍐
The first talk session of BCC2020 is split into multiple tracks.  This track will include talks submitted to the BOSC track.

Moderators

Chris Fields

Director, HPCBio, University of Illinois Urbana-Champaign
I am a reformed molecular microbiologist associatively directing a moderately sized group of very smart people from crazy diverse backgrounds, and we all work on anything and everything sequence-related.

Sunday July 19, 2020 12:24 - 12:39 EDT
BOSC
  Meeting-West

12:25 EDT

Tripal: an example of successful open-source distributed team development 🍐
Abstract


The presenter(s) will be available for live Q&A in this session (BCC West).

Margaret Staton 1*, Abdullah Almsaeed 1, Noah Caldwell 1, Ethalinda Cannon 2, Valentin Guignon 3,
Doreen Main 4, Monica Polechau 5, Manuel Ruiz 3, Jill Wegrzyn 6, Bradford Condon 1, Stephen Ficklin 6,
Lacey Anne Sanderson 7

1. University of Tennessee, Knoxville, TN, USA. * Email: mstaton1@utk.edu
2. Iowa State University, Ames, Iowa, USA.
3. Bioversity International, Montpellier, France.
4. Washington State University, Pullman, WA, USA.
5. USDA-ARS National Agricultural Library, Beltsville, MD, USA.
6. University of Connecticut, Storrs, Connecticut, USA.
7. University of Saskatchewan, Saskatoon, Saskatchewan, Canada

Project Website: http://tripal.info/
Source Code: https://github.com/tripal
License: GNU General Public License v2.0

Tripal is an open-source software toolkit for building community-oriented biological databases
with a focus on genetic and genomic data. Beyond database structure and data access, it provides a
mechanism for data standardization and consistent implementation of FAIR principles across
communities. Currently, the Tripal software provides the foundation for over 30 databases
spanning animals, plants, insects, and more. Tripal has an active international developer
community working from academia, government agencies, and research institutes. Over the past
decade, the Tripal developer community has built a distributed team software development model
with over 30 developers from at least 10 different research groups and 3 countries. Two aspects to
Tripal have helped to make this a success. First, we have recently defined a community governance
structure with a project management committee and an internal advisory board. These function to
promote communication, provide a mechanism for shared decision making, and balance innovation
with sustainability. Second, Tripal's architecture consists of a core of common, centralized
functionality that can be easily expanded with shareable extension modules. This balances shared
community structure and reusable code with the need for individual research groups to customize
and develop quickly and independently. We have noted some disadvantages, but mostly
advantages, due to the unique community structure and software architecture.

Speakers

Margaret Staton

Assistant Professor, University of Tennessee, Knoxville
On the cyberinfrastructure side, I work on community genome databases (particularly Tripal software) and mobile apps for citizen science/outreach. I also do a lot with basic data analysis around genomes, transcriptomes, and epigenomes of plants.



Sunday July 19, 2020 12:25 - 12:30 EDT
BOSC

12:25 EDT

Automated real-time data analysis and visualizations for the SARS-CoV-2/Covid19 portal 🌀
➞ Abstract

Marius van den Beek 1, Dannon Baker 2, Anton Nekrutenko 1

  1. Department of Biochemistry and Molecular Biology, Penn State University, University Park PA, USA
  2. Department of Biology, Johns Hopkins University, Baltimore MD, USA

The presenter(s) will be available for live Q&A at the end of this session (BCC West).

Speakers

Marius van den Beek

Penn State University



Sunday July 19, 2020 12:25 - 12:40 EDT
Galaxy

12:30 EDT

BioThings Explorer: A platform for distributed knowledge integration across biomedical APIs 🍐
Abstract


The presenter(s) will be available for live Q&A in this session (BCC West).

Jiwen Xin 1, Sebastian Lelong 1, Xinghua Zhou 1, Marco Cano 1, Ginger Tsueng 1, Chunlei Wu 1, Andrew Su 1

1 Scripps Research, 10550 North Torrey Pines Road, La Jolla, CA 92037, kevinxin@scripps.edu

Project Website: https://biothings.io/explorer/ 
Source Code: https://github.com/biothings/biothings_explorer 
License: Apache License

BioThings Explorer (BTE) is a distributed biomedical data integration solution that enables complex queries to be constructed and executed by aligning and connecting disparate RESTful APIs. It facilitates exploring and querying the vast wealth of biomedical data that is continuously being generated by investigators, giving users the opportunity to seek out logical relationships between bio-entities and discover hidden connections in biomedical data without the burden of building a centralized data warehouse.

BioThings Explorer leverages SmartAPI (https://smart-api.info), an API registry that extends the OpenAPI standard. SmartAPI records provide rich metadata info of the type of associations (e.g. Disease (input) -> treated_by -> Gene (output)) an API is able to deliver, as well as how to retrieve that association. (An example can be found at https://bit.ly/smartapi_opentarget). Together, these SmartAPI records form a metaknowledge graph (https://smartapi.info/registry/translator/meta-kg) that describes the compatibility of APIs based on shared input and output types. BioThings Explorer can then take advantage of the metaknowledge graph to automate the planning and execution of queries across the API network based on specific user requests.
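
The query-planning idea can be sketched with a toy meta-knowledge graph in Python. Everything in this example is hypothetical: the API names, semantic types, and predicates are invented, and the breadth-first search stands in for whatever planning BTE actually performs. It only shows how records of the form input type, predicate, output type can be chained into a multi-step query.

```python
from collections import deque

# Hypothetical records: (api name, input type, predicate, output type).
META_KG = [
    ("api_a", "Disease", "treated_by", "ChemicalSubstance"),
    ("api_b", "Disease", "related_to", "Gene"),
    ("api_c", "Gene", "targeted_by", "ChemicalSubstance"),
    ("api_d", "Gene", "participates_in", "Pathway"),
]


def plan(start_type, end_type, meta_kg=META_KG):
    """Breadth-first search for the shortest chain of API operations."""
    queue = deque([(start_type, [])])
    seen = {start_type}
    while queue:
        current, path = queue.popleft()
        if current == end_type and path:
            return path
        for api, src, predicate, dst in meta_kg:
            if src == current and dst not in seen:
                seen.add(dst)
                queue.append((dst, path + [(api, src, predicate, dst)]))
    return None  # no chain of APIs connects the two types


for step in plan("Disease", "Pathway"):
    print(step)
```

Because each record only declares input and output types, adding a new API to this toy graph means appending one tuple, which mirrors the extensibility argument made for SmartAPI records.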

Compared to traditional centralized data integration solutions, BTE offers several advantages. First, it can be easily extended by the community. Adding a new API into the distributed knowledge graph only requires the creation of a SmartAPI metadata record, not the addition of any new code to enforce standardized syntax. Because of its extensibility, over 27 APIs have already been integrated into BTE, covering 138 API operations and 14 semantic types. Second, querying source APIs on the fly guarantees that the data retrieved are always up-to-date with the source. Last, this approach is highly scalable, since the BTE client runs on each user's own computing infrastructure, so there is no centralized component that could become a single point of failure.

Through both the Python package and the web interface, BioThings Explorer can be used to answer two classes of queries -- "PREDICT" and "EXPLAIN". The EXPLAIN queries are designed to identify plausible reasoning chains to explain the relationship between two entities, for example, Why does imatinib have an effect on the treatment of chronic myelogenous leukemia (CML)? (try it live at CoLab: https://bit.ly/bte_explain_colab). And the PREDICT queries are designed to predict plausible relationships between one entity and an entity class, for example, What drugs might be used to treat hyperphenylalaninemia? (try it live at CoLab: https://bit.ly/bte_predict_colab).

Speakers

Jiwen Xin

Scripps Research
I'm a senior staff scientist in Scripps Research. I'm a Ph.D. in Biology and a self-taught computer engineer. I love combining my expertise in both Biology and Computer Science to build scalable and high performance open source applications to facilitate biomedical research.



Sunday July 19, 2020 12:30 - 12:35 EDT
BOSC

12:35 EDT

Don’t worry about data management - use Cenzontle 🍐
Abstract

The presenter(s) will be available for live Q&A in this session (BCC West).

Asis Hallab 1 , Verónica Suaste 2 , Francisco Ramírez 2 , Constantin Eiteneuer 1 , Thomas Voecking 1 , Alicia Mastretta-Yanes 2

1 Jülich Research Center, Germany. Email: asis.hallab@gmail.com
2 CONABIO, Mexico.

Project Website: https://sciencedb.github.io/ 
Source Code: https://github.com/ScienceDb
License: GPL-3

The need for a feature-complete, flexible management suite capable of handling big distributed data
In the life sciences, data are often diverse, interdisciplinary, and stored at different sites. The reproducibility crisis has long been recognized. In the US alone, an annual loss of 28 billion dollars has been attributed to research funding spent on projects that yielded irreproducible results (doi.org/10.1371/journal.pbio.1002165). Identified causes are diverse but regularly include insufficient data management. Data should be findable, accessible, interoperable, and reusable (FAIR), and a concise data management plan is key to receiving funding and publication. The problem is that creating a suitable data management platform is a considerable software engineering task in itself, more so for diverse big data, and even more so if several distributed data warehouses are to be integrated. Efficient and reliable data management often has no ideal solution, because research groups need to do science, not data warehouse software engineering.

Solution: Have software build your data administration warehouse for you
We present Cenzontle, a set of software generators that automatically create your custom data warehouse. Define your data formats in standard JSON and get a fully functional warehouse with little to no coding effort. The warehouse comprises two interfaces. The first is a graphical, browser-based interface that follows Google’s material design standards and thus has both a professional look and intuitive handling. No documentation is needed to use it. Custom visualizations with Plotly can be integrated to help scientists explore the data and form hypotheses. The second is a programmatic interface (API) that allows data scientists to build exhaustive queries, execute them efficiently, and thus feed data directly into their analysis pipelines from any programming language. A luxurious IDE helps with query building and has complete, searchable documentation. Standard “CRUD” access functions are offered for all data models. Data can be created, also en masse by uploading tables. It can be read, searched, sorted, and separated into bite-sized subsets. Records can be updated and deleted, of course. Most importantly, different data stores can be incorporated. Use any number of databases and servers you like. Relations between records, even on different servers, are supported. Full security is guaranteed using standard authentication and role-based authorization, verified on each standard access function.

Speakers

Asis Hallab

Jülich Research Center



Sunday July 19, 2020 12:35 - 12:40 EDT
BOSC

12:40 EDT

Q & A 🌀
Question and Answer session for the just finished talks.

Moderators

Delphine Lariviere

Penn State University
Post-doc in the Galaxy Team (Nekrutenko Lab). Works on bacterial genomics, assembly, RNA Seq, TnSeq. Also interested in evolution, metagenomics, epigenetics and visualisation.

Sunday July 19, 2020 12:40 - 12:45 EDT
Galaxy

12:40 EDT

Q&A for session B1b 🍐
→ Abstract


The presenter(s) will be available for live Q&A in this session (not sure yet which hemisphere).

Moderators

Chris Fields

Director, HPCBio, University of Illinois Urbana-Champaign
I am a reformed molecular microbiologist associatively directing a moderately sized group of very smart people from crazy diverse backgrounds, and we all work on anything and everything sequence-related.

Sunday July 19, 2020 12:40 - 12:45 EDT
BOSC

12:45 EDT

Break!
Take a break!  Check your email, grab some food, acknowledge your family and pets, ...

Just make sure you are back in 15 minutes.

Sunday July 19, 2020 12:45 - 13:00 EDT
Joint

13:00 EDT

Broad Institute Data Sciences Platform Sponsor Table
The Broad Institute's Data Sciences Platform aims to accelerate science, transform medicine, and improve lives through data technologies. It is a diverse organization of more than 160 people including engineers, computational scientists and designers who work together and with many external collaborators to deliver high-quality open source software and services, such as the Genome Analysis Toolkit (GATK), the Cromwell workflow management system and Terra, the Broad Institute's cloud-based data access and analysis platform.

Please stop by and learn more about the Broad Data Sciences Platform. We are located on the first floor of the Poster / Demo building.

Speakers

Geraldine Van der Auwera

Director of Outreach and Communications, Broad Institute Data Sciences Platform
I direct outreach and communication efforts for the software and services developed by the Data Sciences Platform at the Broad Institute, which include GATK, the Broad's open source toolkit for variant discovery analysis; the Cromwell/WDL workflow management system; and Terra.bio... Read More →



Sunday July 19, 2020 13:00 - 13:45 EDT
Joint

13:00 EDT

eLife Innovation Sponsor Table
eLife works to improve research communication through open science and open technology innovation.

eLife is a non-profit organisation inspired by research funders and led by scientists. Our mission is to help scientists accelerate discovery by operating a platform for research communication that encourages and recognises the most responsible behaviours in science.

eLife sponsored childcare at the 2018 joint conference, and again at the 2019 Galaxy Conference. This year eLife is sponsoring closed captioning for conference talks.

Please stop by and learn more about eLife. We are located on the first floor of the Poster / Demo building.

Speakers

Emmy Tsang

Innovation Community Manager, Delft University of Technology



Sunday July 19, 2020 13:00 - 13:45 EDT
Joint

13:00 EDT

GigaScience Sponsor Table
GigaScience is an online open access, open data, open peer-review journal published by Oxford University Press and BGI. The journal offers ‘big data’ research from the life and biomedical sciences, and on top of 'Omics research includes the growing range of work that uses difficult-to-access large-scale data, such as imaging, neuroscience, ecology, systems biology, and other new types of shareable data. GigaScience is unique in the publishing industry as it publishes all research objects (data, software tools, source code, workflows, containers and other elements related to the work underpinning the findings in the article). Promoting Open Science, all published software needs to be under an OSI-license, all supporting data must be available and open, and all peer review is carried out transparently. Presenting workflows via our GigaGalaxy.net server, novel work presented at the meeting utilising Galaxy is eligible to a 15% APC if it is submitted to our Galaxy series.

Please stop by and learn more about GigaScience. We are located on the first floor of the Poster / Demo building.

Speakers

Laurie Goodman

Publishing Director, GigaScience Press
Laurie Goodman, PhD, is the Publishing Director for GigaScience Press, which publishes the international, open-science journals GigaScience and GigaByte. Both journals have won awards for Innovation in publishing. Dr. Goodman received her BS and MS from Stanford University in 1986... Read More →

Ken Cho

Systems Programmer Analyst, GigaScience

Scott Edmunds

Editor in Chief, GigaScience Press/BGI Hong Kong
Scott Edmunds is the Editor in Chief of GigaScience Press. With over 15 years experience in Open Access and Open Data publishing he is co-founder of CivicSight (formerly Open Data Hong Kong) and CitizenScience.Asia, and is on the Board of Directors of the Dryad Digital Repository... Read More →




Sunday July 19, 2020 13:00 - 13:45 EDT
Joint

13:00 EDT

P1-03: : A somatic variant-calling pipeline for the support of molecular tumor boards at German university hospitals 🍐
This poster will be presented live at BCC West.

Speakers

Wolfgang Maier

University of Freiburg
Interests: Galaxy tool development, variant calling tools and pipelines, user training



Sunday July 19, 2020 13:00 - 13:45 EDT
Joint

13:00 EDT

P1-07: : Automating the annotation of biological data through semantic technologies and machine learning 🍐
➞ Abstract

This poster will be presented live at BCC West.

Speakers

Lorcán Pigott-Dix

PhD Student, Earlham Institute
I am exploring how to improve the automatic annotation of biological data through machine learning and semantic technologies. My background is in computational ecology, and I am interested in biology, natural language processing, and cultural evolution.


Sunday July 19, 2020 13:00 - 13:45 EDT
Joint

13:00 EDT

P1-08: : BioThings Explorer: A platform for distributed knowledge integration across biomedical APIs 🍐
➞ Abstract

This poster will be presented live at BCC West, Poster Room P01-08. 

Speakers

Jiwen Xin

Scripps Research
I'm a senior staff scientist at Scripps Research. I have a Ph.D. in Biology and am a self-taught computer engineer. I love combining my expertise in both Biology and Computer Science to build scalable, high-performance open source applications to facilitate biomedical research.



Sunday July 19, 2020 13:00 - 13:45 EDT
Joint

13:00 EDT

P1-10: : BLAST in a container 🍐
➞ Abstract

This poster will be presented live at BCC West.

Tom Madden, Christiam Camacho, Yuri Merezhuk, Yan Raytselis
National Center for Biotechnology Information, National Library of  Medicine, National Institutes of Health.  Email: madden@ncbi.nlm.nih.gov
Project Website: https://github.com/ncbi/blast_plus_docs
Source Code: https://github.com/ncbi/docker/tree/master/blast
License: Public Domain
 
The Basic Local Alignment Search Tool (BLAST) is a very popular application for searching and aligning DNA and protein sequences. BLAST is widely used in many different environments and pipelines. To better support these use cases, we now provide a containerized version of BLAST using Docker. This approach offers some advantages, including a reproducible run-time environment and the ability to work with bioinformatics workflow languages such as CWL. Additionally, we are staging BLAST databases on some cloud providers, facilitating the use of these resources in the cloud. We discuss the advantages of a containerized version of BLAST and show examples using the containers we provide.
Additionally, we discuss work in progress on a Kubernetes-based system to start our containerized version of BLAST on multiple machines in order to handle large search sets.
This research was supported by the Intramural Research Program of the National Library of Medicine, National Institutes of Health.
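
As a rough illustration of what running BLAST from a container might look like, the sketch below only builds a `docker run` command line; it does not execute anything. The image name `ncbi/blast` and the BLAST+ options `-query`, `-db`, and `-out` are real, but the container-side mount points, the assumption that the image looks up databases under `/blast/blastdb`, and all file and database names are illustrative assumptions that should be checked against the project documentation linked above.

```python
import shlex


def blast_docker_command(program, query, database, out,
                         host_db_dir, host_query_dir, host_result_dir,
                         image="ncbi/blast"):
    """Build (but do not run) a `docker run` command for containerized BLAST.

    The container-side paths below are assumptions made for this sketch.
    """
    return [
        "docker", "run", "--rm",
        # Mount databases and queries read-only, results read-write.
        "-v", f"{host_db_dir}:/blast/blastdb:ro",
        "-v", f"{host_query_dir}:/blast/queries:ro",
        "-v", f"{host_result_dir}:/blast/results:rw",
        image,
        program,
        "-query", f"/blast/queries/{query}",
        "-db", database,
        "-out", f"/blast/results/{out}",
    ]


# Hypothetical file, database, and directory names.
cmd = blast_docker_command("blastp", "query.fsa", "my_proteins", "blastp.out",
                           "/data/blastdb", "/data/queries", "/data/results")
print(shlex.join(cmd))
```

Keeping the command as a list of arguments makes it easy to hand to `subprocess.run` later, or to embed in a workflow step, without shell-quoting problems.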

Speakers

Tom Madden

NIH
Team Lead for BLAST at the NCBI/NLM/NIH


Sunday July 19, 2020 13:00 - 13:45 EDT
Joint

13:00 EDT

P2-10: : Deploying Galaxy workflows in containers 🌀
Abstract

This poster will be presented live at BCC West.

Speakers

Bert Droesbeke

Data Scientist, VIB



Sunday July 19, 2020 13:00 - 13:45 EDT
Joint

13:00 EDT

P3-01: : Earth System Modelling and data analysis with Galaxy Climate Science Workbench 🌀
➞ Abstract

This poster will be presented live at BCC West.

Speakers

Anne Fouilloux

Research Software Engineer, University of Oslo
I am working on Galaxy Climate (development of tools, integration of climate data, training material).



Sunday July 19, 2020 13:00 - 13:45 EDT
Joint

13:00 EDT

P3-04: : Enabling computational workflows with Tripal and Galaxy 🌀
Abstract

This poster will be presented live at BCC West.

Speakers

Sean Buehler

University of Connecticut



Sunday July 19, 2020 13:00 - 13:45 EDT
Joint

13:00 EDT

P3-09: : Exploring chromatogram library-based data independent acquisition analysis using EncyclopeDIA within Galaxy framework 🌀
Abstract, Poster

This poster will be presented live at BCC West.

Speakers

Pratik Jagtap

Research Assistant Professor, University of Minnesota
Metaproteomics . DIA . Proteogenomics



Sunday July 19, 2020 13:00 - 13:45 EDT
Joint

13:00 EDT

P3-10: : Extensions of READemption for the analysis of several RNA-seq based protocols 🍐
➞ Abstract

This poster will be presented live at BCC West.

Speakers

Sunday July 19, 2020 13:00 - 13:45 EDT
Joint

13:00 EDT

P3-12: : Full-factorial examination of high-throughput microbiome sequencing workflows from sample preparation to bioinformatic analysis 🍐
➞ Abstract
The development of sequencing technologies to evaluate bacterial microbiota composition has allowed insight into the role of the microbiome in human health. However, the variety of methodologies used to prepare and analyze samples for microbiota composition can introduce artifacts, including errors and biases. These artifacts alter our perception of bacterial diversity and our final interpretation of microbiota differences among samples. Using a mock bacterial microbiota of known composition and abundance, we performed a translational bioinformatic pipeline evaluation of various PCR conditions, amplicon library preparation methods, and bioinformatic analyses to gain insight into methodological sources of artifacts.
Genomic DNA was extracted from pure cultures of individual mock bacterial isolates (n = 43), quantified, and then pooled. To compare the effects of PCR on the development of artifacts, we performed all possible permutations of three polymerases, three alternative primer pairs targeting varying regions of the 16S rRNA gene, two barcoding approaches, five elongation times, two annealing temperature offsets, and two amplicon cleanup methods. All individual PCR reactions were sequenced on an Illumina MiSeq platform. Bioinformatic analysis was performed with three different microbiome analysis pipelines, including DADA2, mothur, and QIIME2. Resulting sequence variants were classified as expected or unexpected, and missing members of the mock community were identified. Unexpected reads were further identified as an artifact representative of either chimeras using DECIPHER, mock community sequences containing mismatches or indels, primer dimers, 16S rRNA contamination, or non-16S rRNA off-target amplification.
We found that primer choice accounted for a significant amount of discord between the mock community and sequence output. Additionally, longer amplicon fragment lengths negatively impacted the quality of sequencing reads. Polymerase choice, annealing temperature, and elongation time had negligible impacts on sequencing results. QIIME2 and DADA2 performed similarly using standard pipelines and produced the most accurate results. The use of mothur was associated with a high number of operational taxonomic units that were classified as contamination and that inflated the estimated community diversity.
By employing a defined mock community, this full-factorial experiment allowed us to gain insight into methodological sources of pipeline artifacts and to identify a methodology that results in an optimized workflow for improved examination of microbiota composition. This workflow enables full-circle analysis of samples with superior precision in comparison to current workflow standards.
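
To make the size of the full-factorial PCR design concrete, the following sketch enumerates every combination of the six factors listed in the abstract. The abstract gives only the number of levels per factor, so all level names here are placeholders, and the resulting count follows from those level counts alone (it says nothing about replicates).

```python
from itertools import product

# Placeholder level names: only the number of levels per factor
# comes from the abstract.
factors = {
    "polymerase": ["pol_A", "pol_B", "pol_C"],
    "primer_pair": ["primers_1", "primers_2", "primers_3"],
    "barcoding": ["barcode_1", "barcode_2"],
    "elongation_time": ["t1", "t2", "t3", "t4", "t5"],
    "annealing_offset": ["offset_1", "offset_2"],
    "cleanup": ["cleanup_1", "cleanup_2"],
}


def full_factorial(levels):
    """Return one dict per combination of factor levels."""
    names = list(levels)
    return [dict(zip(names, combo)) for combo in product(*levels.values())]


conditions = full_factorial(factors)
print(len(conditions))  # 3 * 3 * 2 * 5 * 2 * 2 = 360
```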
This poster will be presented live at BCC West.

Speakers

Travis J. De Wolfe

Postdoctoral Scholar, University of Pittsburgh
I am a Postdoctoral Scholar in the Department of Biomedical Informatics at the University of Pittsburgh School of Medicine. The goal of my research is to use microbiological culture techniques and sequencing technologies to test theories regarding the role of bacterial communities... Read More →


Sunday July 19, 2020 13:00 - 13:45 EDT
Joint

13:00 EDT

P4-07: : GVL Demo: from Administrators to End-users 🌀
Abstract

This poster will be presented live at BCC East and BCC West.

Speakers

Nuwan Goonasekera

University of Melbourne



Sunday July 19, 2020 13:00 - 13:45 EDT
Joint

13:00 EDT

P5-04: : Magic-BLAST an accurate RNA-seq mapper 🍐
➞ Abstract

This poster will be presented live at BCC West.

Grzegorz Boratyn, Jean Thierry-Mieg, Danielle Thierry-Mieg, Tom Madden
National Center for Biotechnology Information, National Library of  Medicine, National Institutes of Health.  Email: boratyng@ncbi.nlm.nih.gov

Project Website: https://ncbi.github.io/magicblast
Source Code: https://ftp.ncbi.nlm.nih.gov/blast/executables/magicblast/1.5.0/ncbi-magicblast-1.5.0-src.tar.gz
License: Public domain

Next-generation sequencing (NGS) technologies facilitate rapid analysis of gene expression across individuals, tissues, or conditions. Mapping reads against a reference genome is the first step in many genomics analysis pipelines. It is therefore essential to map the reads reliably. Many algorithms have been developed to tackle this problem; however, few of them can map long reads well.

We present Magic-BLAST, a tool for mapping NGS runs against one or multiple genomes or transcriptomes. It incorporates ideas from the MAGIC-AceView pipeline implemented within the BLAST code base. Magic-BLAST processes NGS reads in batches. It builds an index of a batch of reads and scans a BLAST database (a genome or transcriptome) for potential word matches. Each match becomes a seed for local alignment computation. To avoid aligning to repeats, Magic-BLAST first counts word occurrences in the genome and removes frequent words from the read index. Finally, collinear local alignments are combined into spliced alignments.
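
The seeding idea in the paragraph above can be illustrated with a toy sketch. This is not Magic-BLAST's implementation: the word size, the repeat threshold, and all function names are assumptions chosen to keep the example small, and the real tool goes on to extend seeds into local alignments and chain collinear ones into spliced alignments.

```python
from collections import Counter, defaultdict


def kmers(seq, k):
    """Yield (position, word) for every k-long word in seq."""
    for i in range(len(seq) - k + 1):
        yield i, seq[i:i + k]


def find_seeds(reads, genome, k=4, max_count=2):
    """Return (read_id, read_pos, genome_pos) word matches.

    Words occurring more than max_count times in the genome are treated
    as repeats and left out of the read index before scanning.
    """
    # Count word occurrences in the genome.
    genome_counts = Counter(word for _, word in kmers(genome, k))

    # Index the batch of reads, skipping frequent (repeat) words.
    index = defaultdict(list)
    for read_id, read in reads.items():
        for pos, word in kmers(read, k):
            if genome_counts[word] <= max_count:
                index[word].append((read_id, pos))

    # Scan the genome; every hit against the read index is a seed.
    seeds = []
    for gpos, word in kmers(genome, k):
        for read_id, rpos in index.get(word, ()):
            seeds.append((read_id, rpos, gpos))
    return seeds


genome = "ACACACACACGTTGCAAT"
reads = {"r1": "CGTTGC", "r2": "ACACAC"}
seeds = find_seeds(reads, genome)
print(seeds)  # [('r1', 0, 9), ('r1', 1, 10), ('r1', 2, 11)]
```

Read `r2` consists only of words that are frequent in the toy genome, so it produces no seeds, while the three seeds for `r1` share the same offset between genome and read positions and could be merged into one local alignment.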

Magic-BLAST is very robust across a wide range of conditions. It works well with reads generated by Illumina, Roche 454, and PacBio platforms. It also provides very good performance when mapping against genomes with biased compositions or from related species. Magic-BLAST is very accurate in intron discovery and outperforms similar programs.

Magic-BLAST is convenient to use. It does not need any special tuning for different technologies and genomes. It works well in different conditions using default parameters. It directly accesses reads stored in the NCBI Sequence Read Archive (SRA), without the need to download the data beforehand. It works with FASTA and FASTQ files. It can align reads to sequences in BLAST databases or FASTA files and integrates well with NCBI facilities and services.

Magic-BLAST is available as Linux, Mac, and Windows executables and as a Docker image, and it can be installed from Bioconda. Recently added features include better handling of nanopore reads and reporting results that skip over regions with too many sequencing errors for reliable alignment.

This research was supported by the Intramural Research Program of the National Library of Medicine at the NIH.

Speakers

Grzegorz Boratyn

BLAST developer at NCBI/NLM/NIH


Sunday July 19, 2020 13:00 - 13:45 EDT
Joint

13:00 EDT

P5-07: : Open-Source, Large-Scale Set Similarity Search with Sketch 🍐
➞ Abstract

This poster will be presented live at BCC West.

Speakers

Daniel Baker

PhD Candidate, Johns Hopkins University


Sunday July 19, 2020 13:00 - 13:45 EDT
Joint

13:00 EDT

P5-10: : Planet Microbe: Toward the reintegration of oceanographic ‘omics dataset in their environmental and physiochemical context 🍐
➞ Abstract

This poster will be presented live at BCC West.

Speakers

Alise Jany Ponsero

postdoc, University of Arizona
I'm a postdoc at the University of Arizona, working on computational tools and cyberinfrastructures for metagenomics


Sunday July 19, 2020 13:00 - 13:45 EDT
Joint

13:00 EDT

P5-12: : Progerin expression induces a significant downregulation of transcription from human repetitive sequences in iPSC-derived dopaminergic neurons 🌀
Abstract

This poster will be presented live at BCC West.

Speakers

Walter Arancio

Precarious researcher in Italy..., None
My main research line concerns the molecular aspects that underlie the processes of human development and aging, and their effects on oncogenic transformation. In detail, my studies regard the mutual influences between [1] repeated sequences (LINE-1, ALU, et cetera), [2] ncRNAs... Read More →



Sunday July 19, 2020 13:00 - 13:45 EDT
Joint

13:00 EDT

P6-01: : pyGenomeTracks: Reproducible plots for multivariate genomic data sets 🍐
➞ Abstract

This poster will be presented live at BCC East and BCC West.

Speakers

Lucille Delisle

Post-doc, EPFL SV ISREC UPDUB
Hi, I am a Post-doc in the Denis Duboule lab working on gene regulation during development. For the scientific part, I analyzed various NGS methods including Hi-C, ATAC-seq, CUT&RUN. I recently developed a new method for single-cell RNA-seq, named baredSC. For the Galaxy part, I develop... Read More →


Sunday July 19, 2020 13:00 - 13:45 EDT
Joint

13:00 EDT

P6-02: : Reproducible, collaborative and exploratory data analysis using CyVerse VICE 🍐
➞ Abstract

This poster will be presented live at BCC West.

Speakers

Reetu Tuteja

Science Analyst, CyVerse, University of Arizona


Sunday July 19, 2020 13:00 - 13:45 EDT
Joint

13:00 EDT

P6-03: : Running and sharing the ENCODE atac-seq pipeline on Truwl 🍐
➞ Abstract

Please see a video of the demo here: https://youtu.be/J_hlAuopobY

This poster will be presented live at BCC West.

The Encyclopedia of DNA Elements (ENCODE) Data Coordinating Center (DCC) has produced a set of high-quality analysis pipelines that are used by the ENCODE Consortium and have been released to the community. The pipelines are described with the Workflow Description Language (WDL) and use containerization to enhance reproducibility. To increase the usability and dissemination of these pipelines further we have developed a web interface on Truwl (https://truwl.com/) for specifying parameters and inputs for the ENCODE atac-seq pipeline. The pipeline can be executed directly from the web interface on Google Cloud Platform (GCP). Once compute jobs are successfully executed, the analysis is posted back to Truwl to allow others to view the parameters, inputs, and outputs of previously executed pipelines. Automatically posting previously executed jobs provides increased transparency of computational experiments and provides examples for others to follow. All content on Truwl is open-access, web-searchable, and has unique identifiers making it easy to find and easy to share. In this software demonstration we will show the use of the atac-seq pipeline from Truwl by both specifying the parameters and inputs from the web interface individually and reusing a previously posted analysis.


Speakers

Karl Sebby

President, Truwl


Sunday July 19, 2020 13:00 - 13:45 EDT
Joint

13:00 EDT

P6-04: : SigBio-Shiny: A standalone interactive application for detecting biological significance on a set of genes 🍐
➞ Abstract

This poster will be presented live at BCC East and BCC West.

Speakers

Sangram Keshari Sahu

Genomics Data Scientist


Sunday July 19, 2020 13:00 - 13:45 EDT
Joint

13:00 EDT

P6-08: : Targeted nanopore sequencing by real-time mapping of raw electrical signal with UNCALLED 🍐
➞ Abstract

This poster will be presented live at BCC West.

Speakers

Sam Kovaka

Johns Hopkins University


Sunday July 19, 2020 13:00 - 13:45 EDT
Joint

13:00 EDT

P6-12: : Towards more FAIR research software 🍐
➞ Abstract

This poster will be presented live at BCC East and BCC West.

Speakers

Mateusz Kuzak

Community Officer, The Netherlands eScience Center


Sunday July 19, 2020 13:00 - 13:45 EDT
Joint

13:00 EDT

Poster / Demo West Session 1
The first poster and demo session of BCC2020.

Access the Poster / Demo hall through the "Go to Posters" button at the top left in the main BCC2020 Remo conference space.

Sunday July 19, 2020 13:00 - 13:45 EDT
Joint

13:45 EDT

Break!
Take a break!  Check your email, grab some food, acknowledge your family and pets, ...

Just make sure you are back in 15 minutes.

Sunday July 19, 2020 13:45 - 14:00 EDT
Joint

14:00 EDT

BOSC West Session 2: Reproducibility and standards 🍐
The second accepted talk session of BCC2020 is split into multiple tracks. This track will include talks submitted to the BOSC track.

Moderators

Moni Muñoz-Torres

Oregon State University

Sunday July 19, 2020 14:00 - 15:00 EDT
BOSC
  Meeting-West

14:00 EDT

Galaxy West Session 2: Extending the Galaxy ecosystem 🌀
Presentations about extending the Galaxy ecosystem.

All speakers in this session will be available for live Q&A at the end of this session.

Moderators
Sunday July 19, 2020 14:00 - 15:00 EDT
Galaxy
  Meeting-West

14:01 EDT

Automated generation of training materials from markdown documents 🌀
➞ Abstract 

Delphine Larivière 1,4, Frederick Tan 2, John Muschelli 2, James Taylor 3,4, Jeff Leek 2 and the Galaxy Project 4

  1. Nekrutenko Lab, BMB department, Eberly College of Science, The Pennsylvania State University
  2. Leek group, Data Science Lab, Department of Biostatistics, Johns Hopkins Bloomberg School of Public Health
  3. Taylor Lab, Biology Department, Johns Hopkins University
  4. Galaxy Project https://galaxyproject.org/

The presenter(s) will be available for live Q&A at the end of this session (BCC West).

Speakers

Delphine Lariviere

Penn State University
Post-doc in the Galaxy Team (Nekrutenko Lab). Works on bacterial genomics, assembly, RNA Seq, TnSeq. Also interested in evolution, metagenomics, epigenetics and visualisation.



Sunday July 19, 2020 14:01 - 14:15 EDT
Galaxy

14:01 EDT

Bionitio: building better bioinformatics tools with batteries included 🍐
→ Abstract


The presenter(s) will be available for live Q&A in this session (BCC East).

Authors: Peter Georgeson, Anna Syme, Jessica Chung, Michael Milton, Harriet Dashnow, Andrew Lonsdale, Clare Sloggett, Bernard Pope
License: MIT
URL: https://github.com/bionitio-team/bionitio
Publication: Georgeson, Syme et al. Bionitio: demonstrating and facilitating best practices for bioinformatics command-line software. Gigascience 8, (2019).

The results-driven focus of bioinformatics means that shortcuts are often taken during software development for the sake of making something "that works". Furthermore, many bioinformaticians are not trained in software engineering, and research-oriented projects have limited budgets for quality assurance.

In response to this problem we have developed Bionitio, a tool that automates the process of starting new bioinformatics software projects following recommended best practices. With a single command, the user can create a new well-structured project in one of twelve programming languages. The resulting software is functional — carrying out a prototypical bioinformatics task — and thus serves as both a working example and a template for building new tools. Key features include command-line argument parsing, error handling, logging, defined exit status values, a test suite, a version number, standardised building and packaging, documentation, a standard open-source software license, revision control, and containerisation.

For example, the following command creates a new Python 3 project called skynet using the BSD 3 Clause license and creates a remote repository on GitHub for username cyberdyne:

bionitio-boot.sh -i python -n skynet -c BSD-3-Clause -g cyberdyne

Bionitio serves as a learning aid for beginner-to-intermediate bioinformatics programmers and provides an excellent starting point for new projects. This helps developers adopt good programming practices from the beginning of a project and encourages high-quality tools to be developed more rapidly. Bionitio has been used in several workshops, providing a common codebase for coordination of workshop materials and an extensible platform for the delivery of hands-on practical activities. Additionally, by providing complete working examples in many different languages, Bionitio acts as a kind of "Rosetta Stone" and is therefore an excellent vehicle for comparative programming skills transfer.

In this talk we will describe the design and implementation of Bionitio and demonstrate how it can be used to quickly start new open source bioinformatics projects.
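
To give a feel for several of the practices listed above (command-line argument parsing, error handling, logging, and defined exit status values), here is a minimal hypothetical Python skeleton. It is not the code that Bionitio generates; the task (counting FASTA records), the option names, and the exit codes are all assumptions made for illustration.

```python
import argparse
import logging
import sys

# Defined exit status values (hypothetical choices for this sketch).
EXIT_OK = 0
EXIT_FILE_ERROR = 1


def parse_args(argv):
    """Parse command-line arguments."""
    parser = argparse.ArgumentParser(
        description="Count sequences in FASTA files")
    parser.add_argument("fasta_files", nargs="+", help="input FASTA files")
    parser.add_argument("--log", metavar="LOG_FILE",
                        help="write a log to this file")
    return parser.parse_args(argv)


def count_sequences(lines):
    """Count FASTA records by counting header lines."""
    return sum(1 for line in lines if line.startswith(">"))


def main(argv=None):
    """Run the tool and return an exit status instead of raising."""
    args = parse_args(sys.argv[1:] if argv is None else argv)
    if args.log:
        logging.basicConfig(filename=args.log, level=logging.INFO)
    for path in args.fasta_files:
        try:
            with open(path) as handle:
                print(path, count_sequences(handle), sep="\t")
        except OSError as error:
            logging.error("cannot read %s: %s", path, error)
            print(f"error: cannot read {path}", file=sys.stderr)
            return EXIT_FILE_ERROR
    return EXIT_OK

# As a script entry point one would call: sys.exit(main())
```

Returning the status from `main` rather than calling `sys.exit` inside it keeps the function easy to exercise from a test suite.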

Speakers

Bernie Pope

Victorian Health and Medical Research Fellow, Melbourne Bioinformatics, University of Melbourne
I am an Associate Professor at The University of Melbourne. My research focuses on applying computational techniques to biological questions, especially related to Human Genomics and Cancer.



Sunday July 19, 2020 14:01 - 14:15 EDT
BOSC
  Meeting-West

14:15 EDT

Enhancing rigor and reproducibility in biomedical research 🍐
Abstract


The presenter(s) will be available for live Q&A in this session (BCC West).

Jaqueline J. Brito 1,*, Jun Li 2, Jason H. Moore 3, Casey S. Greene 4,5, Nicole A. Nogoy 6, Lana X. Garmire 2, Serghei Mangul 1,7

1 Dept. of Clinical Pharmacy, School of Pharmacy, University of Southern California, USA
2 Dept. of Computational Medicine & Bioinformatics, University of Michigan, USA
3 Dept. of Biostatistics, Epidemiology, and Informatics, Institute for Biomedical Informatics, University of Pennsylvania, USA
4 Dept. of Systems Pharmacology and Translational Therapeutics, University of Pennsylvania, USA
5 Childhood Cancer Data Lab, Alex's Lemonade Stand, USA
6 GigaScience, Hong Kong
7 Quantitative and Computational Biology, University of Southern California, USA
*Email: britoj@usc.edu

Project Website: https://github.com/Mangul-Lab-USC/enhancing_reproducibility
License: CC BY 4.0 License

Computational methods have reshaped the landscape of modern biology, creating new channels of communication for publishing and sharing the most recent techniques and methodologies. While the biomedical community's dependence on computational tools increases steadily, the mechanisms ensuring open data, open software, and reproducibility are heterogeneously enforced. Institutions, funders, and publishers offer different guidelines, or no guidelines at all. For instance, publications may cite software artifacts that are key to reproducing research results but that may become unavailable or depend on packages that are no longer supported. Publications lacking fully reproducible research significantly limit the role of reviewers in evaluating technical strength and scientific contribution. Moreover, incomplete ancillary information for an academic software package will likely bias and restrict any subsequent research produced with the tool.
In this presentation, we provide eight recommendations across four different domains to improve three main principles: reproducibility, transparency, and rigor in computational biology. These are the main principles that should be emphasized in life sciences curricula, especially as assays and pipelines grow more complex than ever. We propose that a combination of lowering the learning curve needed to maintain the three principles and stricter guidelines is key to ensuring adoption by the community. Ultimately, our recommendations aim to foster a sustainable data science ecosystem in biomedicine and life science research.
Keywords: Reproducibility; Open science; Reproducible research; FAIR principles.

Speakers

Jaqueline J. Brito

Dept. of Clinical Pharmacy, School of Pharmacy, University of Southern California



Sunday July 19, 2020 14:15 - 14:20 EDT
BOSC

14:15 EDT

Integrating refgenie and Galaxy for reference data management: a proposal for IDC 🌀
➞ Abstract

Ignacio Eguinoa 1,2, Frederik Coppens 1,2

  1. Ghent University, Department of Plant Biotechnology and Bioinformatics, 9052 Ghent, Belgium
  2. VIB Center for Plant Systems Biology, 9052 Ghent, Belgium

The presenter(s) will be available for live Q&A at the end of this session (BCC West).

Speakers

Ignacio Eguinoa

ELIXIR Belgium - VIB Center for Plant Systems Biology



Sunday July 19, 2020 14:15 - 14:20 EDT
Galaxy

14:20 EDT

Galaxy and its Tool Shed on Python 3: conclusion of a long journey 🌀
➞ Abstract

Nicola Soranzo 1, Marius van den Beek 2

  1. Earlham Institute, Norwich Research Park, Norwich, UK. Email: nicola.soranzo@earlham.ac.uk
  2. Penn State University, University Park PA, USA.

The presenter(s) will be available for live Q&A at the end of this session in both BCC West and BCC East.

Speakers

Nicola Soranzo

Earlham Institute



Sunday July 19, 2020 14:20 - 14:25 EDT
Galaxy

14:20 EDT

Secondary analysis of publicly available omics data across almost 3 million publications 🍐
Abstract


The presenter(s) will be available for live Q&A in this session (BCC West).

Nicholas Darci-Maher 1, Kerui Peng 3, Dat Duong 1, Richard J. Abdill 2, Eleazar Eskin 1, Serghei Mangul 3

1 University of California, Los Angeles, California, USA. Email: niko.darcimaher@gmail.com
2 University of Minnesota, Minnesota, USA
3 University of Southern California, California, USA

Methods code: https://github.com/smangul1/data_reusability
License: MIT License

Abstract
As today's high-throughput sequencing techniques become increasingly affordable and accurate, the number of publicly available omics datasets is growing rapidly. Bioinformatics methods provide unprecedented opportunities for analysis of omics datasets in quantitative biological research. Traditionally, such research has included primary analysis of novel omics data generated as part of the study. However, these data have the potential to be reused and are often valuable beyond the scope of the study that introduced them. Data-driven research by secondary analysis of existing datasets is becoming more important. Increased availability of public omics data represents an opportunity to find novel insights and discoveries across different datasets.
This study presents a quantitative analysis of the reusability of omics datasets in two online repositories, the Sequence Read Archive (SRA) and the Gene Expression Omnibus (GEO). We downloaded over 2.5 million publications from the PubMed Central Open Access corpus and identified those that referenced SRA or GEO datasets. We used these papers to examine reusability based on various factors, including journal, repository, sequencing technology, and species. We find that most datasets are never reused: they are mentioned once in the study that introduced them, but then never referenced again. In recent years, however, data reuse has been rising. We aim to shed light on the landscape of data sharing in the quantitative biology research community and to illuminate the benefits of secondary analysis of omics data.
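
One plausible way to identify publications that reference SRA or GEO datasets is to search the article text for accession-like identifiers and then count how many distinct papers mention each one. The sketch below is only an assumption about how such a scan could work, not the authors' method (their code is linked above); the regular expressions are simplified, and the paper texts and accession numbers are made up.

```python
import re
from collections import defaultdict

# Simplified accession patterns (assumptions for this sketch):
# GEO series/sample/dataset/platform IDs and SRA/ENA/DDBJ-style IDs.
GEO_PATTERN = re.compile(r"\bG(?:SE|SM|DS|PL)\d+\b")
SRA_PATTERN = re.compile(r"\b[SED]R[APRSX]\d{6,}\b")


def dataset_mentions(papers):
    """Map each accession to the set of paper IDs that mention it."""
    mentions = defaultdict(set)
    for paper_id, text in papers.items():
        for pattern in (GEO_PATTERN, SRA_PATTERN):
            for accession in pattern.findall(text):
                mentions[accession].add(paper_id)
    return mentions


def reused(mentions):
    """Return accessions referenced by more than one paper."""
    return {acc for acc, ids in mentions.items() if len(ids) > 1}


# Made-up example texts and accession numbers.
papers = {
    "paper_1": "Reads were deposited in GEO under GSE12345.",
    "paper_2": "We reanalysed GSE12345 together with SRP098765.",
    "paper_3": "Raw data are available from the SRA (SRP098765, SRR1234567).",
    "paper_4": "Expression data: GSE99999.",
}
mentions = dataset_mentions(papers)
print(sorted(reused(mentions)))  # ['GSE12345', 'SRP098765']
```

In this toy corpus, two of the four accessions are mentioned by a single paper only, which mirrors the "mentioned once, never referenced again" pattern described in the abstract.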

Speakers

Nicholas Darci-Maher

University of California, Los Angeles



Sunday July 19, 2020 14:20 - 14:25 EDT
BOSC

14:25 EDT

Q & A 🌀
Question and Answer session for the just-finished talks.

Moderators
Sunday July 19, 2020 14:25 - 14:30 EDT
Galaxy

14:25 EDT

Q&A 🍐
The presenter(s) will be available for live Q&A in this session.

Sunday July 19, 2020 14:25 - 14:30 EDT
BOSC

14:30 EDT

CrowdGO: Gene Ontology prediction using a meta approach 🍐
Abstract


The presenter(s) will be available for live Q&A in this session (BCC West).

Maarten JMF Reijnders 1,2 and Robert M. Waterhouse 1,2

1 University of Lausanne, Lausanne, Switzerland.
2 Swiss Institute of Bioinformatics, Lausanne, Switzerland.
Email: maarten.reijnders@unil.ch

Source code: https://gitlab.com/mreijnders/CrowdGO
License: GNU General Public License v3.0

Methods to predict protein functions- defined here as assigning Gene Ontology (GO) terms -
vary considerably in their underlying approach, with different methods employing techniques
such as sequence homology, machine learning, or text mining. This often results in dramatically
different sets of GO terms predicted for the same sets of proteins. These methods are reviewed
in the Critical Assessment of Functional Annotation competitions (CAFA) (Zhou 2019), but even
the best scoring methods can be inaccurate, and none truly stand out. To concurrently exploit
the strengths of each method, we developed a meta-predictor that evaluates the predictions of
multiple top-performing methods.
CrowdGO compares the predictions of different methods and uses a machine learning model to
improve the precision, recall, and f-max scores of the resulting meta-predictions. The model can
be trained based on user-selected prediction methods, or a pre-trained model can be used. The
pre-trained models are built using prediction tools that are exclusively open-source, easy to use,
and computationally non-demanding. CrowdGO includes Snakemake workflows to use existing
models for GO term prediction, or to train new models.
Using a model built with four input predictions from a sequence homology-based predictor, Wei2GO (Reijnders 2020), two protein domain-based predictors, InterProScan (Mitchell 2019) and FunFams (Scheibenreif 2019), and a deep learning predictor, DeepGOPlus (Kulmanov 2019), CrowdGO increases both the precision and meaningful recall compared to each input method (Figure 1).
CrowdGO is fully open source and leverages other open source tools. It is straightforward to use, thanks both to the simplicity of the software and to the accompanying Snakemake pipelines. Due to the nature of its meta-prediction algorithm, it will stay relevant even when improved function prediction software becomes available.
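
The f-max score mentioned in this abstract is, in CAFA-style evaluations, the best harmonic mean of precision and recall over all score thresholds. The following is a minimal sketch of that metric under stated assumptions: the function name f_max, the dictionary layout, and the threshold grid are illustrative choices, not CrowdGO's code, and every protein in the truth set is assumed to have at least one true GO term.

```python
from typing import Dict, Set


def f_max(predictions: Dict[str, Dict[str, float]],
          truth: Dict[str, Set[str]],
          steps: int = 100) -> float:
    """Protein-centric F-max over thresholds 1/steps, 2/steps, ..., 1.

    predictions maps protein -> {GO term: score in [0, 1]} (hypothetical layout).
    truth maps protein -> non-empty set of true GO terms.
    """
    best = 0.0
    for i in range(1, steps + 1):
        threshold = i / steps
        precisions = []      # only proteins with at least one prediction count
        recall_sum = 0.0     # every protein in the truth set counts
        for protein, true_terms in truth.items():
            scores = predictions.get(protein, {})
            predicted = {go for go, s in scores.items() if s >= threshold}
            hits = len(predicted & true_terms)
            if predicted:
                precisions.append(hits / len(predicted))
            recall_sum += hits / len(true_terms)
        if not precisions:
            continue
        precision = sum(precisions) / len(precisions)
        recall = recall_sum / len(truth)
        if precision + recall > 0:
            best = max(best, 2 * precision * recall / (precision + recall))
    return best
```

A meta-predictor in this spirit would be judged by whether its combined scores reach a higher f-max than any single input method on the same proteins.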


Speakers

Maarten Reijnders

Department of Ecology and Evolution, University of Lausanne



Sunday July 19, 2020 14:30 - 14:35 EDT
BOSC

14:30 EDT

Implementation of the IEEE-2791-2020 standard (BioCompute Objects) in Galaxy via workflow invocations 🌀
➞ Abstract

Charles Hadley King 1, Nicola Soranzo 2

  1. George Washington University, Washington D.C. USA
  2. Earlham Institute, Norwich Research Park, Norwich, UK

The presenter(s) will be available for live Q&A at the end of this session (BCC West)

Speakers

Charles Hadley King

Senior Research Associate, George Washington University



Sunday July 19, 2020 14:30 - 14:35 EDT
Galaxy

14:35 EDT

Goslin - A grammar of succinct lipid nomenclature 🍐
Abstract


The presenter(s) will be available for live Q&A in this session (BCC West).

Nils Hoffmann 1, Dominik Kopczynski 1, Bing Peng 2, Robert Ahrends 3

1 Leibniz-Institut für Analytische Wissenschaften – ISAS – e.V., Otto-Hahn-Straße 6b, 44227
Dortmund, Germany. Email: nils.hoffmann@isas.de
2 Karolinska Institutet, Solna, Stockholm, Sweden.
3 Department of Analytical Chemistry, University of Vienna, Vienna, Austria.

Project Website: https://lifs.isas.de/goslin & https://apps.lifs.isas.de/goslin
Source Code: https://github.com/lifs-tools/goslin (main hub to implementations)
License: Apache v2 LICENSE & MIT License


Main Text of Abstract

We introduce the 'Grammar of Succinct Lipid Nomenclature' (Goslin), a polyglot grammar for
common lipid shorthand nomenclatures based on the LipidMaps nomenclature and the shorthand
nomenclature established by Liebisch et al. and used by LipidHome and SwissLipids, accompanied
by parser implementations in C++, Java, Python and R.

Lipid naming has evolved into several dialects, which complicates the unified computational
treatment and parsing of lipid names. As a consequence, long and error-prone manual curation
is often necessary in order to streamline lists of lipid names for their processing in follow-up
analysis scripts, workflows, or tools, or for their submission to research data repositories. Goslin
was designed to address the following pressing issues in the lipidomics field especially: 1) to
simplify the implementation of lipid name handling for developers of mass spectrometry-based
lipidomics tools; 2) to offer a tool that unifies and normalizes the main existing lipid name dialects
enabling a lipidomics analysis in a high-throughput fashion.

Goslin and its parser implementations are thus designed to act as a library for the development of
lipidomics tools providing a standardized data structure for storing structural lipid information.
The parsing of lipid names as well as the lipid name generation are the main functions of Goslin. We
therefore defined a context-free grammar (with ANTLR4) that defines rules and productions for all
structural properties of the lipid nomenclature, including mass spectrometry specific information
about unlabeled and heavy isotope labeled species, as well as fragments and adducts. We recently
added the calculation of masses and sum formulas, when the head group's sum composition is
known. Currently, the grammar covers 289 lipid classes within the seven most common lipid
categories in eukaryotic organisms, namely fatty acyls, glycerolipids, glycerophospholipids,
saccharolipids, sphingolipids, sterol lipids, and polyketides. The major advantages of using a
grammar rather than a manually coded parser are its flexibility and extensibility. Regular
expressions are also not suitable for parsing lipid names, since they are incapable of recognizing
nested patterns and can only recognize words from regular languages.
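
To make the grammar-driven approach concrete, here is a toy sketch of a hand-written recursive-descent parser for a deliberately tiny shorthand of the form "PC 16:0/18:1". Everything in it is an assumption made for illustration: the three-rule grammar, the class names ToyLipidParser, Lipid and FattyAcyl, and the accepted separators. It is not the Goslin grammar, does not use ANTLR4, and does not reflect the API of any Goslin implementation.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class FattyAcyl:
    carbons: int
    double_bonds: int


@dataclass
class Lipid:
    head_group: str
    chains: List[FattyAcyl]


class ToyLipidParser:
    """Parser for a hypothetical toy grammar:

        lipid : HEAD ' ' chain (SEP chain)*
        chain : INT ':' INT
        SEP   : '/' | '_'
    """

    def __init__(self, text: str):
        self.text = text
        self.pos = 0

    def _peek(self) -> str:
        return self.text[self.pos] if self.pos < len(self.text) else ""

    def _fail(self, expected: str) -> None:
        raise ValueError(f"expected {expected} at position {self.pos} in {self.text!r}")

    def _expect(self, char: str) -> None:
        if self._peek() != char:
            self._fail(repr(char))
        self.pos += 1

    def _run(self, accept, expected: str) -> str:
        # Consume the longest run of characters for which accept() is true.
        start = self.pos
        while accept(self._peek()):
            self.pos += 1
        if start == self.pos:
            self._fail(expected)
        return self.text[start:self.pos]

    def _chain(self) -> FattyAcyl:
        carbons = int(self._run(str.isdigit, "a carbon count"))
        self._expect(":")
        return FattyAcyl(carbons, int(self._run(str.isdigit, "a double bond count")))

    def parse(self) -> Lipid:
        head = self._run(str.isalpha, "a head group")
        self._expect(" ")
        chains = [self._chain()]
        while self._peek() in ("/", "_"):
            self.pos += 1
            chains.append(self._chain())
        if self.pos != len(self.text):
            self._fail("end of name")
        return Lipid(head, chains)
```

Calling ToyLipidParser("PC 16:0/18:1").parse() yields a structured object rather than a string, which is the kind of standardized data structure the abstract describes; a real grammar adds many more rules for lipid classes, isotopes, fragments and adducts.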

We provide implementations of Goslin in four major programming languages, namely C++, Java,
Python 3, and R to kick-start adoption and integration. Further, we set up a web service for users to
work with Goslin directly and via an OpenAPI-compliant REST API. All implementations are
available free of charge under a permissive open source license; binary releases are available from
Zenodo. We are currently working on making the libraries available via BioConda/BioContainers
and other community-facing repositories.

Speakers

Nils Hoffmann

Leibniz-Institut für Analytische Wissenschaften – ISAS – e.V.



Sunday July 19, 2020 14:35 - 14:40 EDT
BOSC

14:35 EDT

Porting the rCASC workflow for scRNA-Seq data analysis to Galaxy and the Laniakea Galaxy on-demand system 🌀
➞ Abstract

Pietro Mandreoli 1, Luca Alessandrì 2, Marco Antonio Tangaro 3, Raffaele Calogero 2, Federico Zambelli 4

  1. Dept. of Biosciences, University of Milano - Italy.
  2. Dept. of Molecular Biotechnology and Health Sciences, University of Torino - Italy.
  3. Institute of Biomembranes, Bioenergetics and Molecular Biotechnologies, CNR - Italy.
  4. Dept. of Biosciences, University of Milano - Italy.

The presenter(s) will be available for live Q&A at the end of this session (BCC West)

Speakers

pietro mandreoli

Dept. of Biosciences, University of Milano



Sunday July 19, 2020 14:35 - 14:40 EDT
Galaxy

14:40 EDT

Executable Research Article (ERA): Enrich a research paper with code and data 🍐
Abstract


The presenter(s) will be available for live Q&A in this session (BCC West).

the eLife team and the Stencila team

(Presenter: Emmy Tsang, Innovation Community Manager, eLife; email: e.tsang@elifesciences.org)

Project Website: https://elifesci.org/reprodoc (this will be updated early June)
Source Code: https://github.com/stencila; https://github.com/elifesciences
License: Apache License 2.0 (for Stencila); MIT (for eLife)

Main Text of Abstract

Code and data are important research outputs and integral to a full understanding of the research
findings and experimental approaches in a paper. However, traditional research articles seldom
embed them in the manuscript's narrative; instead, they are left as "supplementary
materials", if they are openly available at all.

With Executable Research Articles (ERAs), our vision is to enrich the traditional narrative of a
research article with code, data and interactive figures that can be executed in the browser,
downloaded and explored. It will give readers a direct insight into the methods, algorithms and key
data behind the published research.

We published our first demo ERA in February 2019. Over the past year, we have been working
closely with our collaborator Stencila to build an open tool stack that would enable our authors and
production team to easily publish ERAs at scale. In this talk, we hope to showcase the potential of
ERAs with examples and walk through how authors can enrich their traditional eLife paper using
Stencila Hub, through:

- Starting a Stencila Hub project linked to their eLife paper
- Converting the article to a reproducible notebook format of their preference, while preserving the relevant
 journal article metadata
- Uploading the data required to enable live re-execution of tables and figures in the article
- Replacing static tables and figures with code chunks that reproduce them

We will share our current vision of how ERAs will be integrated into our production workflow and
collect feedback. We also hope to engage participants in exploring potential functionalities for the
tool stack and building a community-driven roadmap.

Speakers

Emmy Tsang

Innovation Community Manager, Delft University of Technology



Sunday July 19, 2020 14:40 - 14:55 EDT
BOSC

14:40 EDT

Galaxy, Selenium, and End-to-end Testing 🌀
➞ Abstract

Oleg Zharkov 1, Dave Bouvier 2, Juan David Mendez 1, Björn Grüning 1, John Chilton 2

  1. Department of Computer Science, Albert-Ludwigs-Universität Freiburg
  2. Department of Biochemistry and Molecular Biology, Penn State University, University Park PA, USA.

The presenter(s) will be available for live Q&A at the end of this session (BCC West).

Speakers

Oleg Zharkov

Albert-Ludwigs-Universität Freiburg



Sunday July 19, 2020 14:40 - 14:55 EDT
Galaxy

14:55 EDT

Q & A 🌀
Question and Answer session for the just finished talks.

Moderators
Sunday July 19, 2020 14:55 - 15:00 EDT
Galaxy

14:55 EDT

Q&A 🍐
The presenter(s) will be available for live Q&A in this session.

Sunday July 19, 2020 14:55 - 15:00 EDT
BOSC

15:00 EDT

Break!
The official day is done, but Birds of a Feather sessions are about to begin.  Before that happens, take a break!  Check your email, grab some food, acknowledge your family and pets, ...


Sunday July 19, 2020 15:00 - 15:15 EDT
Joint

15:00 EDT

Interregnum
West Conference Day 1 is done, and East Conference Day 1 is coming.


Sunday July 19, 2020 15:00 - 22:00 EDT
Joint

15:15 EDT

Birds of a Feather (BOFs)
Birds of a Feather (BoFs) are informal, self-organized meetups focused on specific topics. They are a great way to meet other like-minded community members and have an in-depth discussion on a topic of interest.

This BoF session currently has these BoF sessions scheduled:

  1. Board games Social, led by Delphine Lariviere


Anyone is welcome to propose a BoF! All you need is a title, an organizer, and a brief description. At BCC2020, BoFs will be scheduled the hour before or after the main meeting days in both hemispheres. You can choose to hold your BoF in one or both hemispheres.

Please propose BoFs no later than July 10. After that date, new BoF signups will be closed but you are welcome to organize informal "meetups" during BCC2020.

Sunday July 19, 2020 15:15 - 16:00 EDT
Joint

15:15 EDT

BoF: Board games Social
What better way to break the ice than board and party games? Whether your style is collaborating to fight a zombie invasion or competing for the best joke, there is a game for you. We will be using free online board game platforms, and one of our organizers has Jackbox games for party games.


Birds of a Feather (BoFs) are informal, self-organized meetups focused on specific topics. Anyone is welcome to propose a BoF. Have an idea? Please propose a BoF no later than July 10.

Moderators

Delphine Lariviere

Penn State University
Post-doc in the Galaxy Team (Nekrutenko Lab). Works on bacterial genomics, assembly, RNA Seq, TnSeq. Also interested in evolution, metagenomics, epigenetics and visualisation.

Sunday July 19, 2020 15:15 - 17:00 EDT
Joint

15:15 EDT

BoF: IWC - Intergalactic Workflow Commission
We've met a few times over the years and discussed some good plans for defining high-quality workflows (inspired by the IUC and nf-core). I still think defining high-quality Galaxy workflows (with documentation, links to training, and test suites) is an important topic that's worth some energy. If you agree, let's see if we can get things rolling in earnest.

We'll use this room for the meeting:
https://live.remo.co/e/bcc2020-bof-iwc


Speakers

Brad Langhorst

Development Group Leader, NEB


Sunday July 19, 2020 15:15 - 17:00 EDT
Joint

15:15 EDT

BoF: Parenting during Coronavirus
A place for parents to get together and talk about the trials and tribulations of parenting during the uncertainty of COVID-19.



Speakers

Assunta DeSanto

Computational Scientist 3, Penn State University
I'm fairly new to Galaxy and the world of Bioinformatics. Teach me something new! Additionally, I'm a mom, lifelong learner, both a cat and dog person, who loves to read and be outside. Let's chat about common interests :)


Sunday July 19, 2020 15:15 - 17:00 EDT
Joint

15:15 EDT

BoF: Where is the bar?
I checked all the Floors in Remo and I don't see a single bar. From what I understand, BCC is about community building. Come to this BoF to introduce yourself and talk about life! :)



Speakers

Ula Afgan

GalaxyWorks
It is great to finally connect all the bits and pieces of information and take a step on to the path to learn more about Bioinformatics. What better place and time to learn than BCC 2020? This is where all the Bioinformatics MasterMinds are! I'm excited and looking forward to enriching...



Sunday July 19, 2020 15:15 - 17:00 EDT
Joint

17:00 EDT

Galaxy Social One!
Hey!  Come back!*

BCC is all about community, and while we can't have our usual after hours gatherings, we can still meet new collaborators and spend time with old friends.  This session is an opportunity to do just that. 

We'll start with a few slides (we will highlight our Fellowship recipients) and then do 2-3 short rounds of icebreakers to get the new folks into the groove, and then leave the rest of the session for unstructured socializing.

And while this is a Galaxy-sponsored event, we strongly encourage the BOSC folk to also join in.** (You'll just have to put up with a lot of enthusiasm about the one true data analysis platform.*** )

We really, really hope to see you there (But please bring your own drinks and snacks. Sorry.)

Thanks,
Everyone who is, or ever will be, in the Galaxy Community

* Or, "Hey show up early!"
** Oh don't worry, we will crash your party too.
*** And we will even still talk to you if you have a different idea on this subject. 

Moderators

Dave Clements

Training and Outreach Coordinator, Galaxy Project, Johns Hopkins University

Sunday July 19, 2020 17:00 - 19:00 EDT
Joint

22:00 EDT

BCC2020 Conference Day 1: East
Keynotes, accepted talks, posters, demos, and networking in the East.

Sunday July 19, 2020 22:00 - Monday July 20, 2020 03:00 EDT
Joint

22:01 EDT

Welcome
Welcome to the 2020 Bioinformatics Community Conference (BCC2020)!

We'll introduce the conference, talk about the logistics of this online event, and present last minute news. This session will also include a tribute to James Taylor, one of the founders and PIs of the Galaxy Project who had a huge impact on open source and open science.

We will also hold a short icebreaker or two.

Moderators

Nomi Harris

BOSC Chair, LBNL
This is my 10th year chairing or co-chairing BOSC, the Bioinformatics Open Source Conference. In 2020, BOSC is part of the online Bioinformatics Community Conference, BCC2020.

Gareth Price

Head of Computational Biology, QCIF Facility for Advanced Bioinformatics


Sunday July 19, 2020 22:01 - 22:30 EDT
Joint

22:30 EDT

East Keynote 1: Open minds bring open collaborations
➔ Slides, Abstract

Prashanth N Suravajhala

  1. Birla Institute of Scientific Research, Statue circle, Jaipur, India
  2. Bioclues.org, India

This keynote will be presented live.

The post-COVID-19 era has ushered in fierce competition to deliver, be it a vaccine, funding, or a publication. As researchers, we have a fair notion that we should be guided by reason, not emotion, amid the ‘publish or perish’ adage. On the other hand, multitasking research and publishing has become a noticeable goal, and combining these tasks over time has become the need of the hour. In today’s reserved funding situation, many early- and mid-career researchers face the daunting task of establishing and developing their research programs, for example by starting their own labs, crowdsourcing, or obtaining funds from their previous associations or host institutions, and then publishing the results. But to what extent are we trying to preserve the fairness and integrity of science? I would like to draw your attention to the ‘Hippocratic Oath for Scientists’, which would ensure that research vitality is kept in the best interests of science to sustain excellence. To this end, the talk will delve into how the three Cs, viz. Consistency, Continuity and Credibility, augur well for a successful open organization. This would invariably bring successful Collaborations, Convergence, and importantly Control over mind to the fore. The growth of an individual or organization depends on fostering commitment to open culture, net neutrality and universal access to information in education and science. So, it is the Collaborative index (C-index) that matters. Are we ready?


This keynote will be introduced by Gareth Price.

Speakers

Prashanth Suravajhala

Senior Scientist and Founder, Bioclues.org, Birla Institute of Scientific Research; Bioclues
Prashanth N Suravajhala is a senior scientist at Birla Institute of Scientific Research, Jaipur. A PhD in Systems Biology, he went on to gain more than 7 years of postdoctoral experience across four different laboratories. He is interested in exploring the known unknown regions in the human genome, primarily...



Sunday July 19, 2020 22:30 - 23:15 EDT
Joint

23:15 EDT

Break!
Take a break!  Check your email, grab some food, acknowledge your family and pets, ...

Just make sure you are back in 15 minutes.

Sunday July 19, 2020 23:15 - 23:30 EDT
Joint

23:30 EDT

BOSC East Session 1a: Sequencing & analysis 🍐
The first talk session of BCC2020 is split into multiple tracks.  This track will include talks submitted to the BOSC track.

Moderators

Moni Muñoz-Torres

Oregon State University

Sunday July 19, 2020 23:30 - Monday July 20, 2020 00:20 EDT
BOSC

23:30 EDT

Galaxy East Session 1: Applications and use cases 🌀
The first talk session of BCC2020 is split into multiple tracks.  This track will include talks submitted to the Galaxy track.

Moderators

Maria Doyle

Application and Training Specialist, Peter MacCallum Cancer Centre

Sunday July 19, 2020 23:30 - Monday July 20, 2020 00:45 EDT
Galaxy

23:31 EDT

Cooperative bacteriophage genome annotation in the biologist-friendly Galaxy and Apollo platforms 🌀
Abstract

Jolene Ramsey 1,2, Cory Maughmer 1,2, Anthony Criscione 1,2, Mei Liu 1,2, Ry Young 1,2, Jason J. Gill 1,3

  1. Center for Phage Technology, Texas A&M University, College Station, Texas, USA
  2. Department of Biochemistry and Biophysics, Texas A&M University, College Station, Texas, USA
  3. Department of Animal Science, Texas A&M University, College Station, Texas, USA

Speakers

Jolene Ramsey

Postdoc, Texas A&M University
I love to study the viruses of bacteria, called bacteriophages, or phages. Ask me about viruses, or my favorite podcast, This Week in Virology.


Sunday July 19, 2020 23:31 - 23:45 EDT
Galaxy

23:31 EDT

Digital Expression Explorer 2: a repository of 8 trillion uniformly processed RNA-seq reads and still counting 🍐
Abstract


The presenter(s) will be available for live Q&A in this session (BCC East).

Mark Ziemann 1, Antony Kaspi 2

1 Deakin University, Geelong, Australia. Email: m.ziemann@deakin.edu.au
2 The Walter and Eliza Hall Institute of Medical Research, Melbourne, Australia.

Project Website: http://dee2.io/
Source Code: https://github.com/markziemann/dee2
License: (example: GNU General Public License v3.0)

RNA-seq is currently the most popular method for transcriptome-wide gene expression profiling,
but despite data-sharing requirements, rates of data reuse are still very low. This is due to the need
for high end computing infrastructure and pipelines that require command line expertise for raw
data processing. Resources such as Recount2, ARCHS4 and Digital Expression Explorer 2 (DEE2)
provide easy access to some uniformly processed data, with queryable web interfaces, bulk
downloads and R packages.
Keeping up with the rapid pace of data deposition to the Sequence Read Archive (SRA) is proving a
challenge. As of May 2020, there are 1.49M samples available in SRA for the nine organisms
included in DEE2, and of these 0.88M are available as processed data in DEE2 (Figure 1). This
makes DEE2 coverage about twice as extensive as the next largest dataset (ARCHS4). Since original
publication in 2019, DEE2 has grown from 5.3 to 8.05 T mapped reads.
In this presentation I will outline the challenges and strategies in maintaining and growing
resources of this scale. In addition we will discuss recent enhancements including direct integration
of the web interface to Degust (http://degust.erc.monash.edu/), a popular web based tool for
statistical analysis of RNA-seq data. The R package getDEE2 has been extensively updated and
submitted to Bioconductor. It allows programmatic access to DEE2 datasets in the form of
SummarizedExperiment objects that are compatible with many downstream analysis tools in the
Bioconductor ecosystem. Together, these advances are helping DEE2 to achieve the goal of making
all RNA-seq data freely available to everyone.


Speakers

Mark Ziemann

Deakin University
Hi there 👋 I am a Lecturer and researcher in computational biology at Deakin University, Australia. Our group is focused on building data resources and software tools to accelerate biomedical discovery. We collaborate closely with clinicians and biologists to get the most out...


Sunday July 19, 2020 23:31 - 23:45 EDT
BOSC

23:45 EDT

Community genome annotation integrates with Galaxy via Apollo providing greater integration and more functional annotation options 🌀
➞ Abstract 

Nathan Dunn 1, Helena Rasche 2, Anthony Bretaudeau 3, Ian Holmes 4

  1. Lawrence Berkeley National Lab, Berkeley, CA
  2. University of Freiburg, Freiburg, Germany
  3. French National Institute for Agriculture, Food, and Environment (INRAE), Rennes, France
  4. University of California Berkeley, Berkeley, CA

Speakers

Nathan Dunn

Software Developer, Lawrence Berkeley National Lab


Sunday July 19, 2020 23:45 - 23:50 EDT
Galaxy

23:45 EDT

Targeted nanopore sequencing by real-time mapping of raw electrical signal with UNCALLED 🍐
Abstract


The presenter(s) will be available for live Q&A in this session (BCC West).

Sam Kovaka 1, Yunfan Fan 2, Bohan Ni 1, Winston Timp 2, Michael C. Schatz 1,3,4
Email: skovaka1@jhu.edu

1 Department of Computer Science, Johns Hopkins University, Baltimore, MD.
2 Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD.
3 Department of Biology, Johns Hopkins University, Baltimore, MD.
4 Cold Spring Harbor Laboratory, Cold Spring Harbor, NY.

Project Source Code: https://github.com/skovaka/UNCALLED
License: MIT License

ReadUntil sequencing allows nanopore devices to selectively stop sequencing an individual read in real-time by ejecting it from the pore and immediately switch to another read. If reads could be rapidly mapped to large references while being sequenced, this would enable targeted sequencing of specific genomic regions or even specific genomes. However, most mapping methods require basecalling, which is computationally intensive and requires a significant amount of the read to be sequenced.

Here we present UNCALLED (Utility for Nanopore Current ALignment to Large Expanses of DNA), an open-source mapper that rapidly matches raw streaming nanopore current signals to a large DNA reference without basecalling. This is accomplished by probabilistically considering all possible k-mers that the signal could represent, and then pruning the possibilities based on the reference genome sequence encoded using an FM-index. Importantly, UNCALLED dynamically adjusts the signal level model probability cutoffs during alignment to achieve both high accuracy and high speed when aligning the noisy signal data.
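
The pruning idea in this paragraph can be illustrated with a toy sketch. Several simplifications are assumed here and none of them come from the abstract: each signal event advances the read by exactly one base, each event has already been reduced to a set of plausible k-mers (standing in for a probability cutoff), and a plain Python dictionary stands in for the FM-index. The names index_kmers and map_events are hypothetical; this is not UNCALLED's code.

```python
from collections import defaultdict
from typing import Dict, List, Set


def index_kmers(reference: str, k: int) -> Dict[str, Set[int]]:
    """Map every k-mer to its start positions (a stand-in for an FM-index)."""
    index: Dict[str, Set[int]] = defaultdict(set)
    for i in range(len(reference) - k + 1):
        index[reference[i:i + k]].add(i)
    return index


def map_events(candidates_per_event: List[Set[str]],
               reference: str, k: int) -> Set[int]:
    """Return reference start positions consistent with every event.

    candidates_per_event[i] holds the k-mers that event i could represent.
    A candidate survives only if it occurs in the reference one base after
    a position that survived the previous event.
    """
    if not candidates_per_event:
        return set()
    index = index_kmers(reference, k)
    active: Set[int] = set()
    for kmer in candidates_per_event[0]:
        active |= index.get(kmer, set())
    for candidates in candidates_per_event[1:]:
        survivors: Set[int] = set()
        for kmer in candidates:
            for pos in index.get(kmer, set()):
                if pos - 1 in active:
                    survivors.add(pos)
        active = survivors
        if not active:
            break  # no location in the reference explains the signal
    shift = len(candidates_per_event) - 1
    return {pos - shift for pos in active}
```

Even in this toy form, ambiguous early events are resolved as later events rule out locations, which is why a read can be placed after only a short stretch of signal.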

We used UNCALLED to deplete the sequencing of known bacterial genomes within a Zymo mock microbial community, enriching the remaining yeast sequence from ~20x coverage to ~100x. We also used UNCALLED to enrich for 148 human genes associated with hereditary cancers to 29.6x coverage (a 5.6-fold increase) using a single MinION flowcell, enabling accurate detection of SNPs, indels, structural variants (SVs), and methylation in these genes. Notably, twice as many SVs were detected compared to 50x coverage Illumina sequencing, verified by whole-genome nanopore and PacBio HiFi sequencing. Finally, we show that UNCALLED could be used to enrich larger gene panels such as all 717 genes in the COSMIC Census, or be used with cDNA/RNA sequencing, for example to deplete high-abundance transcripts.



Speakers

Sam Kovaka

Johns Hopkins University


Sunday July 19, 2020 23:45 - 23:50 EDT
BOSC

23:50 EDT

THAPBI PICT -- a metabarcoding analysis pipeline developed as a Phytophthora ITS1 Classification Tool 🍐
Abstract, Slides, Video

The presenter(s) will be available for live Q&A in this session (BCC West).

Peter Cock 1, David Cooke 2, Leighton Pritchard 3

1 Information and Computational Sciences, James Hutton Institute, Invergowrie, Dundee, UK
2 Cell and Molecular Sciences, James Hutton Institute, Invergowrie, Dundee, UK
3 Strathclyde Institute of Pharmacy & Biomedical Sciences, Glasgow, UK

Repository: https://github.com/peterjc/thapbi-pict/
Documentation: https://thapbi-pict.readthedocs.io/
License: MIT

Molecular barcodes are central to environmental monitoring and identification of species present in a
sample, and use PCR primers to amplify a diagnostic genome region of the organisms of interest. We are
interested in metabarcoding where multiple samples are multiplexed for high-throughput sequencing on the
Illumina platform, using overlapping paired end reads. Each sample yields a collection of marker sequences,
and matching these to a database of known species produces a taxonomic breakdown reflecting community
composition.
THAPBI PICT is a metabarcoding tool we developed for the UK-funded Tree Health and Plant Biosecurity
Initiative (THAPBI) Phyto-Threats project, which focused on identifying Phytophthora species in
commercial tree nurseries. Phytophthora (from Greek meaning plant-destroyer) are economically important
plant pathogens, important in both agriculture and forestry. This project targeted an ITS1 marker (Internal
Transcribed Spacer one, a region found in eukaryotic genomes between the 18S and 5.8S rRNA genes) with
nested primers to identify Phytophthora species. By varying primer settings and using a custom database,
THAPBI PICT can be applied to other organisms and/or barcode marker sequences - making it more than
just a Phytophthora ITS1 Classification Tool (PICT).
The analysis pipeline starts from demultiplexed paired FASTQ files, as produced by the Illumina MiSeq
platform. These are quality trimmed, overlapping reads merged and primer trimmed (calling external tools)
and then deduplicated giving a much smaller list of unique sequences and associated read counts (passing a
minimum count threshold intended to exclude "noise"). These are matched to a curated database using a
range of methods, producing both plain text and formatted Excel output. An edit graph in XGMML format
is also produced for display in Cytoscape and other visualisation tools.
THAPBI PICT is released as open source software under the MIT licence. It is written in Python, a free
open source language available on all major operating systems. Version control using git hosted publicly on
GitHub is used for the source code, documentation, and database builds including tracking the hand-curated
reference set of Phytophthora ITS1 sequences. Continuous integration of the test suite is currently run on
both TravisCI and CircleCI. Software is released to the Python Packaging Index (PyPI) as standard for
the Python ecosystem, and additionally packaged for Conda via the BioConda channel. This offers simple
installation of the tool itself, and all the command line dependencies on Linux or macOS. The documentation
is currently hosted on Read The Docs, updated automatically from the GitHub repository.
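
The deduplication step described in this abstract (collapsing merged reads into unique sequences with read counts, then dropping those below a minimum count) can be sketched in a few lines. This is an illustrative sketch only: the function name unique_sequences and the default threshold of 2 are assumptions, not THAPBI PICT's code or its default settings.

```python
from collections import Counter
from typing import Dict, Iterable


def unique_sequences(merged_reads: Iterable[str],
                     min_count: int = 2) -> Dict[str, int]:
    """Collapse reads into unique sequences with counts, most abundant first.

    Sequences seen fewer than min_count times are treated as noise and dropped.
    The threshold value here is a placeholder for illustration.
    """
    counts = Counter(seq.upper() for seq in merged_reads)
    return {seq: n for seq, n in counts.most_common() if n >= min_count}
```

The resulting table is typically far smaller than the read set, so the database matching that follows only has to run once per unique sequence.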


Speakers

Peter Cock

The James Hutton Institute
Bioinformatician at the James Hutton Institute, a member of the BOSC organizing committee, treasurer of the Open Bioinformatics Foundation, and a core developer on the Biopython project.


Sunday July 19, 2020 23:50 - 23:55 EDT
BOSC

23:50 EDT

Computational chemistry analysis using Galaxy: Exploring antigen-antibody binding patterns for MUC1-AR20.5 🌀
➞ Abstract

Christopher Barnett 1, Tharindu Senapathi 1, Sean Collins 2, Kyllen Dilsook 2, Natalie Terry 2

  1. Scientific Computing Research Unit and Department of Chemistry, University of Cape Town, Rondebosch, 7701, South Africa. Email: chris.barnett@uct.ac.za
  2. Department of Chemistry, University of Cape Town, Rondebosch, 7701, South Africa

The presenter(s) will be available for live Q&A at the end of this session in both BCC West and BCC East.

Speakers

Chris Barnett

Lecturer, University of Cape Town


Sunday July 19, 2020 23:50 - Monday July 20, 2020 00:05 EDT
Galaxy

23:55 EDT

Please contribute to FASTQE so I don’t have to 🍐
Abstract


The presenter(s) will be available for live Q&A in this session (BCC East).

Andrew Lonsdale 1,2

1 Peter MacCallum Cancer Centre, Melbourne, Victoria 3000, Australia. Email: andrew.lonsdale@petermac.org
2 Sir Peter MacCallum Department of Oncology, The University of Melbourne, Victoria 3010, Australia

Project Website: http://fastqe.com
Source Code: https://github.com/lonsbio/fastqe
License: MIT License

FASTQE is a utility for viewing the quality of biological sequence data as emoji. It
takes the FASTQ format, summarises the average quality score per base-position, and
transcribes each ASCII-encoded Phred summary score into a corresponding emoji to see the
good, the bad, and the ugly of sequencing data.
Initially just a proof of concept at the end of a 2016 PyConAU talk, it has gradually evolved
into a Python package that is also available as a command line program. It can be
installed both via PyPI and Bioconda. When invoked from the command line it can also
display the minimum and maximum quality scores per position, and bin quality
scores into a reduced set of emoji.
Despite little promotion beyond social media (@fastqe), it has gained some popularity.
FASTQE has been used for an undergraduate command line workshop [1], presentations,
and workshops. Surprisingly, there have even been serious uses of the tool. Using
FASTQE, it was found that artefacts in single-cell RNA-seq data can increase the burden of
error correction in cell barcodes, and at least one case of a software bug that
can lead to incorrect barcode correction was revealed.
Despite these compelling use cases, FASTQE has a bus-factor of 1. In order to provide
a more valuable tool for bioinformatics training, education and outreach, contributions are
needed. This presentation will demonstrate the functionality of FASTQE, outline the current
status of the project and a roadmap for enhancements, and call for more contributions to this
open source project. Everyone knows this is a silly idea. This talk will persuade future
contributors that maybe it isn't as silly as it sounds.
[1] Rachael St. Jacques, Max Maza, Sabrina Robertson, Guoqing Lu, Andrew Lonsdale, Ray A Enke (2019). A Fun
Introductory Command Line Exercise: Next Generation Sequencing Quality Analysis with Emoji!. NIBLSE
Incubator: Intro to Command Line Coding Genomics Analysis, (Version 2.0). QUBES Educational Resources.
doi:10.25334/Q4D172

Speakers
AL

Andrew Lonsdale

Peter MacCallum Cancer Centre, Melbourne, Victoria 3000, Australia


Sunday July 19, 2020 23:55 - Monday July 20, 2020 00:00 EDT
BOSC
 
Monday, July 20
 

00:00 EDT

A reproducible workflow for amplicon-based microbial community analysis using the drake R package 🍐
Abstract


The presenter(s) will be available for live Q&A in this session (BCC West).

Rodrigo Ortega-Polo 1, Shefali Vishwakarma 2,3, Lan Tran 4, Amanda Gregoris 4, Marta Guarna 4

1 Lethbridge Research and Development Centre, Agriculture and Agri-Food Canada; Lethbridge, Alberta,
Canada. Email: rodrigo.ortegapolo@canada.ca
2 Lethbridge Research and Development Centre, Agriculture and Agri-Food Canada; Lethbridge, Alberta,
Canada.
3 Department of Molecular Biology and Biochemistry, Simon Fraser University; Surrey, British Columbia,
Canada.
4 Beaverlodge Research Farm, Agriculture and Agri-Food Canada; Beaverlodge, Alberta, Canada.

Project Website: https://github.com/BeeCSI-Microbiome/dada2_drake_workflow
Source Code: https://github.com/BeeCSI-Microbiome/dada2_drake_workflow
License: MIT License

The use of workflow management systems promotes best practices in computational biology such
as reproducibility, provenance tracking and documentation of steps and parameters used in
analyses. Furthermore, the ability to restart workflows from a given point in the analysis instead of
starting over provides an efficient way for developing data analysis pipelines. The drake R package
is a framework for workflow management that allows users to design workflows and visualize
their status in a reproducible and scalable manner (Figure 1). In our work, we used drake to design a
pipeline for amplicon-based microbial community data using DADA2 for denoising and taxonomic
classification, phyloseq and other R packages for visualization and data tidying. We implemented
this workflow for the analysis of 16S rRNA microbial community datasets from the honey bee gut
microbiome. This workflow has the advantage of enabling users to evaluate microbial communities
with amplicon sequencing data working entirely within R.
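
The restart behaviour described above, where only out-of-date steps are rerun, can be illustrated with a small sketch. It is written in Python for illustration only and is not drake's API: the `Workflow` class, the `target` method, and the two toy steps are hypothetical names, and a target counts as up to date when its function name and inputs are unchanged.

```python
import hashlib
import json


class Workflow:
    """Toy target cache: a target is rerun only when its inputs change."""

    def __init__(self):
        self.cache = {}  # target name -> (fingerprint, value)
        self.runs = []   # names of targets that were actually executed

    def target(self, name, func, *inputs):
        # Fingerprint the step by its function name and JSON-serialisable inputs.
        fingerprint = hashlib.sha256(
            json.dumps([func.__name__, inputs]).encode()
        ).hexdigest()
        cached = self.cache.get(name)
        if cached is not None and cached[0] == fingerprint:
            return cached[1]  # up to date: skip the work
        value = func(*inputs)
        self.cache[name] = (fingerprint, value)
        self.runs.append(name)
        return value


def filter_reads(reads, min_len):
    return [r for r in reads if len(r) >= min_len]


def count_reads(reads):
    return len(reads)


def run(wf, reads, min_len):
    kept = wf.target("filtered", filter_reads, reads, min_len)
    return wf.target("count", count_reads, kept)


wf = Workflow()
reads = ["ACGT", "AC", "ACGTAC"]
run(wf, reads, 3)  # first run executes both targets
run(wf, reads, 3)  # nothing changed, so nothing is executed
run(wf, reads, 4)  # the filter reruns; its output is unchanged, so the count is skipped
print(wf.runs)
```

A real workflow manager also tracks changes to the code of each step and to files on disk; this sketch tracks only function names and in-memory inputs.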

Speakers
RO

Rodrigo Ortega-Polo

Agriculture and Agri-Food Canada


Monday July 20, 2020 00:00 - 00:05 EDT
BOSC

00:05 EDT

Q & A 🌀
Question and Answer session for the just finished talks.

Moderators
MD

Maria Doyle

Application and Training Specialist, Peter MacCallum Cancer Centre

Monday July 20, 2020 00:05 - 00:10 EDT
Galaxy

00:05 EDT

SigBio-Shiny: A standalone interactive application for detecting biological significance on a set of genes 🍐
→ Abstract


The presenter(s) will be available for live Q&A in this session (BCC East).

Sangram Keshari Sahu

Independent Researcher, Bangalore, India.

Email: sangramsahu15@gmail.com
Project Website: https://github.com/sk-sahu/sig-bio-shiny
Source Code: https://github.com/sk-sahu/sig-bio-shiny
Licence: MIT Licence

Detecting biological significance is an essential step for any high-throughput sequence analysis.
Once sequence reads are mapped and assembled, this is followed by different quantification
analyses, which end up with a set of features (transcripts/genes). Quickly exploring those features
together from different angles, along with statistical inference, gives a good idea of the
biology they are involved in.
Doing this kind of exploration for a particular organism requires an up-to-date annotation
database. Currently available online/API platforms support either very few organisms or only model
organisms. Apart from that, reproducibility is a primary issue, as databases are continually updated.
To tackle these problems I am presenting SigBio-Shiny, a standalone interactive application
based on R-Shiny that supports more than just model organisms, with no requirement for
manual database maintenance. It leverages available open-source resources such as
Bioconductor's AnnotationHub to collect the organism's updated database in real time while
keeping track of which version of the database is used. On top of this database, it helps with
detecting biological significance in a set of genes through gene mapping, enrichment analysis
of Gene Ontology (GO) terms, and pathway analysis.
Keywords: Interactive application, Significant biology, Non-model Organism, Annotation
database, Gene mapping, Gene Ontology (GO), Pathway, Enrichment analysis

Speakers

Sangram Keshari Sahu

Genomics Data Scientist


Monday July 20, 2020 00:05 - 00:10 EDT
BOSC

00:10 EDT

Streamlining accessibility and computability of large-scale genomic datasets with the NHGRI genome data science Analysis, Visualization, and Informatics Lab-Space (ANVIL) 🍐
Abstract


The presenter(s) will be available for live Q&A in this session (BCC West).

Michael C. Schatz 1, Anthony Philippakis 2, on behalf of the AnVIL project team 3
                                                
1 Departments of Computer Science and Biology, Johns Hopkins University, Baltimore, MD. Email: mschatz@cs.jhu.edu
2 Broad Institute of MIT and Harvard, Cambridge, MA
3 City University of New York, Harvard, Oregon Health & Sciences University, Penn State, Roswell Park Cancer Institute, University of California Santa Cruz, University of Chicago, Vanderbilt, Washington University.


Project Website: https://anvilproject.org/ 
Source Code: https://github.com/anvilproject 
License: MIT License


The traditional model of genomic data sharing – centralized data warehouses such as dbGaP from which researchers download data to analyze locally – is increasingly unsustainable. Not only are transfer/download costs prohibitive, but this approach also leads to redundant siloed compute infrastructure and makes ensuring security and compliance of protected data highly problematic.
The NHGRI Genomic Data Science Analysis, Visualization, and Informatics Lab-Space, or AnVIL, inverts this model, providing a cloud environment for the analysis of large genomic and related datasets. By providing a unified environment for data management and compute, AnVIL eliminates the need for data movement, allows for active threat detection and monitoring, and provides elastic, shared computing resources that can be acquired by researchers as needed. AnVIL provides access to key NHGRI datasets, such as the CCDG (Centers for Common Disease Genomics), CMG (Centers for Mendelian Genomics), eMERGE (Electronic Medical Records and Genomics), as well as other relevant datasets.
The platform is built on a set of established components that have been used in a number of flagship scientific projects. The Terra platform provides a compute environment with secure data and analysis sharing capabilities. Dockstore provides standards based sharing of containerized tools and workflows. Bioconductor and Galaxy provide environments for users at different skill levels to construct and execute analyses. The Gen3 data commons framework provides data and metadata ingest, querying, and organization.
AnVIL provides a collaborative environment for creating and sharing data and analysis workflows for both users with limited computational expertise and sophisticated data scientist users. It provides multiple entry points for data access and analysis, including execution of batch workflows written in WDL, notebook environments including Jupyter and RStudio, Bioconductor packages for building analysis on top of AnVIL APIs and services, and will offer Galaxy instances for interactive analysis. It will be possible to integrate additional analysis environments through standard APIs.
Long-term, the AnVIL will provide a unified platform for ingestion and organization for a multitude of current and future genomic and genome-related datasets. Importantly, it will ease the process of acquiring access to protected datasets for investigators and drastically reduce the burden of performing large-scale integrated analyses across many datasets to fully realize the potential of ongoing data production efforts.
                                   
    

Speakers
MS

Michael Schatz

Johns Hopkins University


Monday July 20, 2020 00:10 - 00:15 EDT
BOSC

00:10 EDT

Integrating and analyzing genotype, phenotype, and environmental data through CartograTree and Tripal Galaxy 🌀
➞ Abstract

Irene Cobo-Simón 1, Nic Herndon 2, Margaret Staton 3, Emily Grau 4, Sean Buehler 4, Peter Richter 4, Risharde Ramnath 4, Charlie Demurjian 4, Abdullah Almsaeed 3, Jill Wegrzyn 4

  1. Department of Ecology and Evolutionary Biology, University of Connecticut, Storrs, CT, USA
  2. Department of Computer Science, East Carolina University, NC, USA
  3. Department of Entomology and Plant Pathology, University of Tennessee, Knoxville, TN, USA
  4. Department of Ecology and Evolutionary Biology, University of Connecticut, Storrs, CT, USA

Speakers

Irene Cobo

Postdoctoral Scholar, Department of Ecology and Evolutionary Biology, University of Connecticut
My research interest is mainly focused on evolutionary biology from a molecular perspective. In particular, I am interested in studying the genomic basis of adaptation and biodiversity.


Monday July 20, 2020 00:10 - 00:25 EDT
Galaxy

00:15 EDT

A comprehensive benchmarking of WGS-based structural variant callers 🍐
Abstract


The presenter(s) will be available for live Q&A in this session (BCC East).

Varuni Sarwal 1,2, Sebastian Niehus 3,4, Ram Ayyala 1, Serghei Mangul 5

1 University of California, Los Angeles, CA 90095, USA. Email: sarwal8@gmail.com
2 Indian Institute of Technology Delhi, Hauz Khas, New Delhi, Delhi 110016, India
3 Berlin Institute of Health (BIH), Anna-Louisa-Karsch-Str. 2, 10178 Berlin, Germany
4 Charité-Universitätsmedizin Berlin, corporate member of Freie Universität Berlin,
Humboldt-Universität zu Berlin, and Berlin Institute of Health, Charitéplatz 1, 10117 Berlin, Germany
5 University of Southern California, Los Angeles, CA 90089, USA

Project Website: https://github.com/Mangul-Lab-USC/benchmarking_SV_publication
Source Code: https://github.com/Mangul-Lab-USC/benchmarking_SV_publication
License: MIT License

Structural variants (SVs) are genomic regions that contain an altered DNA sequence due to
deletion, duplication, insertion, or inversion, and they vary in their pathogenicity.
Dissecting SVs from whole genome sequencing (WGS) data presents a number of challenges
and a plethora of SV-detection methods have been developed. Currently, there is a paucity of
evidence which investigators can use to select appropriate SV-detection tools. We evaluated the
performance of 15 SV-detection tools based on their ability to detect deletions from aligned
WGS reads using a comprehensive PCR-confirmed gold standard set of SVs to find methods
with a good balance between sensitivity and precision. While the number of true deletions is
3710, the number of deletions detected by the tools ranged from 899 to 82,225. 53% of the
methods reported fewer deletions than are known to be present in the sample. The length
distribution of detected deletions varied across tools and was substantially different from the
distribution of true deletions. 53% of tools underestimate the true size of SVs and deletions
detected by BreakDancer were the closest to the true median deletion length. We allowed
deviation in the coordinates of the detected deletions and compared deviations to the coordinates
of the true deletions from 0 to 10,000 bp. Manta achieved the highest f-score for all thresholds.
Methods with high specificity rates tend to also have significantly higher f-score and precision
rates. CLEVER was able to achieve the highest sensitivity while the most precise method was
PopDel. We assessed the performance of SV callers at coverages from 32x to 0.1x generated by
down-sampling the original WGS data. DELLY showed the highest F-score for coverage below
4x while Manta was the best performing tool from 8x to 32x. We assessed the effect of deletion
length on the accuracy of detection. Manta and CREST were the only tools with high specificity
for deletions shorter than 500bp. LUMPY was the only method able to deliver an F-score above
30% across all categories. Manta and LUMPY were the best performing tools for general
applications. Our recommendations can help researchers choose the best SV detection software,
as well as inform the developer community of the challenges of SV detection.
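
The threshold-based comparison described above can be sketched as follows. This is an illustrative Python example, not the benchmark's code: the matching rule (both breakpoints within the threshold, each true deletion matched at most once, greedy order) is an assumption, and the coordinates are invented.

```python
def matches(call, truth, threshold):
    """A call matches a true deletion when both breakpoints lie within threshold bp."""
    return (abs(call[0] - truth[0]) <= threshold
            and abs(call[1] - truth[1]) <= threshold)


def evaluate(calls, truths, threshold):
    """Return (precision, sensitivity, F-score) for deletions given as (start, end)."""
    unmatched = list(truths)
    true_positives = 0
    for call in calls:
        for truth in unmatched:
            if matches(call, truth, threshold):
                unmatched.remove(truth)  # each true deletion is matched once
                true_positives += 1
                break
    false_positives = len(calls) - true_positives
    false_negatives = len(unmatched)
    precision = true_positives / (true_positives + false_positives) if calls else 0.0
    sensitivity = true_positives / (true_positives + false_negatives) if truths else 0.0
    total = precision + sensitivity
    f_score = 2 * precision * sensitivity / total if total else 0.0
    return precision, sensitivity, f_score


# Invented coordinates, for illustration only.
truths = [(100, 200), (1000, 1500), (5000, 5600)]
calls = [(105, 198), (1200, 1700), (9000, 9100), (5000, 5600)]

for threshold in (10, 1000):
    print(threshold, evaluate(calls, truths, threshold))
```

Loosening the threshold can only add matches, so sensitivity and precision for a fixed call set never decrease as the allowed deviation grows.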

Speakers

Varuni Sarwal

Undergraduate student, UC Los Angeles


Monday July 20, 2020 00:15 - 00:20 EDT
BOSC

00:20 EDT

Q&A for session B1a 🍐
The presenter(s) will be available for live Q&A in this session.

Moderators
MM

Moni Muñoz-Torres

Oregon State University

Monday July 20, 2020 00:20 - 00:25 EDT
BOSC

00:25 EDT

Tripal: an example of successful open-source distributed team development 🍐
Abstract


The presenter(s) will be available for live Q&A in this session (BCC West).

Margaret Staton 1*, Abdullah Almsaeed 1, Noah Caldwell 1, Ethalinda Cannon 2, Valentin Guignon 3,
Doreen Main 4, Monica Polechau 5, Manuel Ruiz 3, Jill Wegrzyn 6, Bradford Condon 1, Stephen Ficklin 6,
Lacey Anne Sanderson 7

1. University of Tennessee, Knoxville, TN, USA. * Email: mstaton1@utk.edu
2. Iowa State University, Ames, Iowa, USA.
3. Bioversity International, Montpellier, France.
4. Washington State University, Pullman, WA, USA.
5. USDA-ARS National Agricultural Library, Beltsville, MD, USA.
6. University of Connecticut, Storrs, Connecticut, USA.
7. University of Saskatchewan, Saskatoon, Saskatchewan, Canada

Project Website: http://tripal.info/
Source Code: https://github.com/tripal
License: GNU General Public License v2.0

Tripal is an open-source software toolkit for building community-oriented biological databases
with a focus on genetic and genomic data. Beyond database structure and data access, it provides a
mechanism for data standardization and consistent implementation of FAIR principles across
communities. Currently, the Tripal software provides the foundation for over 30 databases
spanning animals, plants, insects, and more. Tripal has an active international developer
community working from academia, government agencies, and research institutes. Over the past
decade, the Tripal developer community has built a distributed team software development model
with over 30 developers from at least 10 different research groups and 3 countries. Two aspects to
Tripal have helped to make this a success. First, we have recently defined a community governance
structure with a project management committee and an internal advisory board. These function to
promote communication, provide a mechanism for shared decision making, and balance innovation
with sustainability. Second, Tripal's architecture consists of a core of common, centralized
functionality that can be easily expanded with shareable extension modules. This balances shared
community structure and reusable code with the need for individual research groups to customize
and develop quickly and independently. We have noted some disadvantages, but mostly
advantages, due to the unique community structure and software architecture.

Speakers

Margaret Staton

Assistant Professor, University of Tennessee, Knoxville
On the cyberinfrastructure side, I work on community genome databases (particularly Tripal software) and mobile apps for citizen science/outreach. I also do a lot with basic data analysis around genomes, transcriptomes, and epigenomes of plants.


Monday July 20, 2020 00:25 - 00:30 EDT
BOSC

00:25 EDT

Automated real-time data analysis and visualizations for the SARS-CoV-2/Covid19 portal 🌀
➞ Abstract

Marius van den Beek 1, Dannon Baker 2, Anton Nekrutenko 1

  1. Department of Biochemistry and Molecular Biology, Penn State University, University Park PA, USA
  2. Department of Biology, Johns Hopkins University, Baltimore MD, USA

The presenter(s) will be available for live Q&A at the end of this session (BCC East).

Speakers

Marius van den Beek

Penn State University


Monday July 20, 2020 00:25 - 00:40 EDT
Galaxy

00:25 EDT

BOSC East Session 1b: Open data 🍐
The first talk session of BCC2020 is split into multiple tracks.  This track will include talks submitted to the BOSC track.

Moderators
MM

Moni Muñoz-Torres

Oregon State University

Monday July 20, 2020 00:25 - 00:45 EDT
BOSC

00:30 EDT

BioThings Explorer: A platform for distributed knowledge integration across biomedical APIs 🍐
Abstract


The presenter(s) will be available for live Q&A in this session (BCC West).

Jiwen Xin 1, Sebastian Lelong 1, Xinghua Zhou 1, Marco Cano 1, Ginger Tsueng 1, Chunlei Wu 1, Andrew Su 1

1 Scripps Research, 10550 North Torrey Pines Road, La Jolla, CA 92037, kevinxin@scripps.edu

Project Website: https://biothings.io/explorer/ 
Source Code: https://github.com/biothings/biothings_explorer 
License: Apache License

BioThings Explorer (BTE) represents a distributed biomedical data integration solution that enables complex queries to be constructed and executed by aligning and connecting disparate RESTful APIs. It facilitates exploring and querying the vast wealth of biomedical data, which is continuously being generated by investigators, granting users the opportunity to seek out logical relationships between bio-entities and discover hidden connections in biomedical data without the burden of building a centralized data warehouse.

BioThings Explorer leverages SmartAPI (https://smart-api.info), an API registry that extends the OpenAPI standard. SmartAPI records provide rich metadata info of the type of associations (e.g. Disease (input) -> treated_by -> Gene (output)) an API is able to deliver, as well as how to retrieve that association. (An example can be found at https://bit.ly/smartapi_opentarget). Together, these SmartAPI records form a metaknowledge graph (https://smartapi.info/registry/translator/meta-kg) that describes the compatibility of APIs based on shared input and output types. BioThings Explorer can then take advantage of the metaknowledge graph to automate the planning and execution of queries across the API network based on specific user requests.
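
How a metaknowledge graph supports query planning can be sketched with a small Python example. This is an illustration of the idea, not BTE's implementation: apart from the Disease -> treated_by -> Gene association quoted above, the records, API names, and function names are hypothetical.

```python
from collections import defaultdict, deque

# (input type, predicate, output type, API name); API names are placeholders.
RECORDS = [
    ("Disease", "treated_by", "Gene", "hypothetical_api_1"),
    ("Gene", "involved_in", "Pathway", "hypothetical_api_2"),
    ("ChemicalSubstance", "targets", "Gene", "hypothetical_api_3"),
]


def build_meta_kg(records):
    """Index API operations by the semantic type they accept as input."""
    graph = defaultdict(list)
    for input_type, predicate, output_type, api in records:
        graph[input_type].append((predicate, output_type, api))
    return graph


def plan_query(graph, start, goal):
    """Breadth-first search for the shortest chain of API operations from start to goal."""
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        node, path = queue.popleft()
        if node == goal:
            return path
        for predicate, output_type, api in graph.get(node, []):
            if output_type not in seen:
                seen.add(output_type)
                step = (node, predicate, output_type, api)
                queue.append((output_type, path + [step]))
    return None  # no chain of APIs connects the two types


meta_kg = build_meta_kg(RECORDS)
for step in plan_query(meta_kg, "Disease", "Pathway"):
    print(step)
```

Because the graph is built only from metadata records, adding an API in this sketch means appending one record rather than writing new query code.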

Compared to traditional centralized data integration solutions, BTE offers several advantages. First, it can be easily extended by the community. Adding a new API into the distributed knowledge graph only requires the creation of a SmartAPI metadata record, not the addition of any new code to enforce standardized syntax. Because of its extensibility, over 27 APIs have already been integrated into BTE, covering 138 API operations and 14 semantic types. Second, querying source APIs on the fly guarantees that the data retrieved are always up-to-date with the source. Last, this approach is highly scalable, since the BTE client runs on each user's own computing infrastructure, so there is no centralized component that could become a single point of failure.

Through both the Python package and the web interface, BioThings Explorer can be used to answer two classes of queries -- "PREDICT" and "EXPLAIN". The EXPLAIN queries are designed to identify plausible reasoning chains to explain the relationship between two entities, for example, Why does imatinib have an effect on the treatment of chronic myelogenous leukemia (CML)? (try it live at CoLab: https://bit.ly/bte_explain_colab). And the PREDICT queries are designed to predict plausible relationships between one entity and an entity class, for example, What drugs might be used to treat hyperphenylalaninemia? (try it live at CoLab: https://bit.ly/bte_predict_colab).

Speakers

Jiwen Xin

Scripps Research
I'm a senior staff scientist in Scripps Research. I'm a Ph.D. in Biology and a self-taught computer engineer. I love combining my expertise in both Biology and Computer Science to build scalable and high performance open source applications to facilitate biomedical research.


Monday July 20, 2020 00:30 - 00:35 EDT
BOSC

00:35 EDT

Don’t worry about data management - use Cenzontle 🍐
Abstract

The presenter(s) will be available for live Q&A in this session (BCC West).

Asis Hallab 1 , Verónica Suaste 2 , Francisco Ramírez 2 , Constantin Eiteneuer 1 , Thomas Voecking 1 , Alicia Mastretta-Yanes 2

1 Jülich Research Center, Germany. Email: asis.hallab@gmail.com
2 CONABIO, Mexico.

Project Website: https://sciencedb.github.io/ 
Source Code: https://github.com/ScienceDb
License: GPL-3

The need for a feature-complete, flexible management suite capable of handling big distributed data
In the life sciences, data is often diverse, interdisciplinary, and stored at different sites. The reproducibility crisis has long been recognized. In the US alone an annual loss of 28 billion dollars has been attributed to research funding spent on projects that yielded irreproducible results (doi.org/10.1371/journal.pbio.1002165). The identified causes are diverse but regularly include insufficient data management. Data should be findable, accessible, interoperable, and reusable (FAIR), and a concise data management plan is key to receiving funding and publication. The problem is that creating a suitable data management platform is a considerable software engineering task in itself, more so for diverse big data, and even more so if several distributed data warehouses are to be integrated. Efficient and reliable data management often has no ideal solution, because research groups need to do science, not data warehouse software engineering.

Solution: Have software build your data administration warehouse for you
We present Cenzontle, a set of software generators that automatically create your custom data warehouse. Define your data formats in standard JSON and get a fully functional warehouse with minimal or no coding effort. The warehouse comprises two interfaces. The first is a graphical, browser-based interface that follows Google’s Material Design standards and thus has both a professional look and intuitive handling. No documentation is needed to use it. Custom visualizations with Plotly can be integrated to help scientists explore the data and form hypotheses. The second, a programmatic interface (API), allows data scientists to build exhaustive queries, execute them efficiently, and thus feed data directly into their analysis pipelines from any programming language. A luxurious IDE helps with query building and has complete, searchable documentation. Standard “CRUD” access functions are offered for all data models. Data can be created, including en masse by uploading tables. It can be read, searched, sorted, and separated into bite-sized subsets. Records can, of course, be updated and deleted. Most importantly, different data stores can be incorporated: use any number of databases and servers you like. Relations between records, even on different servers, are supported. Full security is guaranteed using standard authentication and role based authorization, verified on each standard access function.
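
The idea of generating CRUD access functions from a JSON model definition can be illustrated with a toy Python sketch. It does not reproduce Cenzontle's generators or its model format: the JSON layout, the type names, and `generate_store` are assumptions made for this example, and the store lives in memory only.

```python
import json

# Hypothetical model definition; the JSON layout Cenzontle expects is not shown here.
MODEL_JSON = """
{"model": "sample", "attributes": {"name": "String", "weight_g": "Float"}}
"""

TYPE_MAP = {"String": str, "Int": int, "Float": float}


def generate_store(definition):
    """Build an in-memory store class with CRUD methods from a model definition."""
    attributes = {
        name: TYPE_MAP[type_name]
        for name, type_name in definition["attributes"].items()
    }

    class Store:
        def __init__(self):
            self._records = {}
            self._next_id = 1

        @staticmethod
        def _validate(values):
            # Reject attributes the model does not declare, and wrongly typed values.
            for name, value in values.items():
                if name not in attributes:
                    raise KeyError(f"unknown attribute: {name}")
                if not isinstance(value, attributes[name]):
                    raise TypeError(f"{name} must be {attributes[name].__name__}")

        def create(self, **values):
            self._validate(values)
            record_id = self._next_id
            self._next_id += 1
            self._records[record_id] = dict(values)
            return record_id

        def read(self, record_id):
            return dict(self._records[record_id])

        def update(self, record_id, **values):
            self._validate(values)
            self._records[record_id].update(values)

        def delete(self, record_id):
            del self._records[record_id]

    return Store


SampleStore = generate_store(json.loads(MODEL_JSON))
store = SampleStore()
sample_id = store.create(name="leaf-01", weight_g=2.5)
store.update(sample_id, weight_g=3.0)
print(store.read(sample_id))
```

Deriving the validation and the four access functions from the definition is what keeps per-model coding effort low; a real generator would emit database-backed code and authorization checks rather than a dictionary.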

Speakers
AH

Asis Hallab

Jülich Research Center


Monday July 20, 2020 00:35 - 00:40 EDT
BOSC

00:40 EDT

Q & A 🌀
Question and Answer session for the just finished talks.

Moderators
MD

Maria Doyle

Application and Training Specialist, Peter MacCallum Cancer Centre

Monday July 20, 2020 00:40 - 00:45 EDT
Galaxy

00:40 EDT

Q&A for session B1b 🍐
→ Abstract


The presenter(s) will be available for live Q&A in this session (not sure yet which hemisphere).

Moderators
MM

Moni Muñoz-Torres

Oregon State University

Monday July 20, 2020 00:40 - 00:45 EDT
BOSC

00:45 EDT

Break!
Take a break!  Check your email, grab some food, acknowledge your family and pets, ...

Just make sure you are back in 15 minutes.

Monday July 20, 2020 00:45 - 01:00 EDT
Joint

01:00 EDT

eLife Innovation Sponsor Table
eLife works to improve research communication through open science and open technology innovation.

eLife is a non-profit organisation inspired by research funders and led by scientists. Our mission is to help scientists accelerate discovery by operating a platform for research communication that encourages and recognises the most responsible behaviours in science.

eLife sponsored childcare at the 2018 joint conference, and again at the 2019 Galaxy Conference. This year eLife is sponsoring closed captioning for conference talks.

Please stop by and learn more about eLife. We are located on the first floor of the Poster / Demo building.

Speakers

Emmy Tsang

Innovation Community Manager, Delft University of Technology



Monday July 20, 2020 01:00 - 01:45 EDT
Joint

01:00 EDT

GigaScience Sponsor Table
GigaScience is an online open access, open data, open peer-review journal published by Oxford University Press and BGI. The journal offers ‘big data’ research from the life and biomedical sciences, and on top of 'Omics research includes the growing range of work that uses difficult-to-access large-scale data, such as imaging, neuroscience, ecology, systems biology, and other new types of shareable data. GigaScience is unique in the publishing industry as it publishes all research objects (data, software tools, source code, workflows, containers and other elements related to the work underpinning the findings in the article). Promoting Open Science, all published software needs to be under an OSI-license, all supporting data must be available and open, and all peer review is carried out transparently. Presenting workflows via our GigaGalaxy.net server, novel work presented at the meeting utilising Galaxy is eligible to a 15% APC if it is submitted to our Galaxy series.

Please stop by and learn more about GigaScience. We are located on the first floor of the Poster / Demo building.

Speakers

Ken Cho

Systems Programmer Analyst, GigaScience

Scott Edmunds

Editor in Chief, GigaScience Press/BGI Hong Kong
Scott Edmunds is the Editor in Chief of GigaScience Press. With over 15 years experience in Open Access and Open Data publishing he is co-founder of CivicSight (formerly Open Data Hong Kong) and CitizenScience.Asia, and is on the Board of Directors of the Dryad Digital Repository... Read More →

Laurie Goodman

Publishing Director, GigaScience Press
Laurie Goodman, PhD, is the Publishing Director for GigaScience Press, which publishes the international, open-science journals GigaScience and GigaByte. Both journals have won awards for Innovation in publishing. Dr. Goodman received her BS and MS from Stanford University in 1986... Read More →



Monday July 20, 2020 01:00 - 01:45 EDT
Joint

01:00 EDT

P1-01: A composable rootless container system for genomic analysis pipelines 🍐
➞ Abstract

This poster will be presented live at BCC East.

Speakers

Xu Yang

School of Frontier Sciences, University of Tokyo


Monday July 20, 2020 01:00 - 01:45 EDT
Joint

01:00 EDT

P2-02: Computational meta-analysis of transcriptome in Tetralogy of Fallot reveals dysregulated hubs and pathways 🌀
➞ Abstract Poster

This poster will be presented live at BCC East.

Speakers

Sona Charles

PhD student, Bharathiar University
I'm a graduate student working in the area of transcriptomics.



Monday July 20, 2020 01:00 - 01:45 EDT
Joint

01:00 EDT

P2-07: Data integration promises robust and faster discovery of COVID-19 drug targets 🍐
Updated Title: Multi-omics data integration for the discovery of COVID-19 drug targets

Poster available in the f1000 BOCC collection

Lab page
  

Speakers

Tyrone Chen

PhD student, Monash University
PhD student in computational biology in the Bioinformatics Lab at Monash University. I am working on harmonising data from multiple modalities to build regulatory and functional signatures of a biological process. This review contains some background information for this project. W... Read More →



Monday July 20, 2020 01:00 - 01:45 EDT
Joint

01:00 EDT

P4-07: GVL Demo: from Administrators to End-users 🌀
Abstract

This poster will be presented live at BCC East and BCC West.

Speakers

Nuwan Goonasekera

University of Melbourne



Monday July 20, 2020 01:00 - 01:45 EDT
Joint

01:00 EDT

P6-01: pyGenomeTracks: Reproducible plots for multivariate genomic data sets 🍐
➞ Abstract

This poster will be presented live at BCC East and BCC West.

Speakers

Lucille Delisle

Post-doc, EPFL SV ISREC UPDUB
Hi, I am a Post-doc in the Denis Duboule lab working on gene regulation during development. For the scientific part, I analyzed various NGS methods including Hi-C, ATAC-seq, CUT&RUN. I recently developed a new method for single-cell RNA-seq, named baredSC. For the Galaxy part, I develop... Read More →


Monday July 20, 2020 01:00 - 01:45 EDT
Joint

01:00 EDT

P6-04: SigBio-Shiny: A standalone interactive application for detecting biological significance on a set of genes 🍐
➞ Abstract

This poster will be presented live at BCC East and BCC West.

Speakers

Sangram Keshari Sahu

Genomics Data Scientist


Monday July 20, 2020 01:00 - 01:45 EDT
Joint

01:00 EDT

P6-12: Towards more FAIR research software 🍐
➞ Abstract

This poster will be presented live at BCC East and BCC West.

Speakers

Mateusz Kuzak

Community Officer, The Netherlands eScience Center


Monday July 20, 2020 01:00 - 01:45 EDT
Joint

01:00 EDT

Poster / Demo East Session 1
The first poster and demo session of BCC2020.

Access the Poster / Demo hall through the "Go to Posters" button at the top left in the main BCC2020 Remo conference space.

Monday July 20, 2020 01:00 - 01:45 EDT
Joint

01:45 EDT

Break!
Take a break!  Check your email, grab some food, acknowledge your family and pets, ...

Just make sure you are back in 15 minutes.

Monday July 20, 2020 01:45 - 02:00 EDT
Joint

02:00 EDT

BOSC East Session 2: Reproducibility and standards 🍐
The second accepted talk session of BCC2020 is split into multiple tracks.  This track will include talks submitted to the BOSC track.

Moderators

Heather Wiencko

Software Engineer, Hosted Graphite
I'm a software engineer working for Hosted Graphite in the heart of Dublin, Ireland. As co-chair of BOSC, I'm excited about partnering with GCC to put on a conference the likes of which the world has never seen. I also serve as a member-at-large on the OBF Board of Directors. If you're... Read More →

Monday July 20, 2020 02:00 - 03:00 EDT
BOSC

02:00 EDT

Galaxy Session 2: Extending the Galaxy Ecosystem 🌀
The second accepted talk session of BCC2020 is split into multiple tracks.  This track will include talks submitted to the Galaxy track.

Moderators

Simon Gladman

University of Melbourne

Monday July 20, 2020 02:00 - 03:00 EDT
Galaxy

02:01 EDT

Automated generation of training materials from markdown documents 🌀
➞ Abstract 

Delphine Larivière 1,4, Frederick Tan 2, John Muschelli 2, James Taylor 3,4, Jeff Leek 2 and the Galaxy Project 4

  1. Nekrutenko Lab, BMB department, Eberly College of Science, The Pennsylvania State University
  2. Leek group, Data Science Lab, Department of Biostatistics, Johns Hopkins Bloomberg School of Public Health
  3. Taylor Lab, Biology Department, Johns Hopkins University
  4. Galaxy Project https://galaxyproject.org/

Speakers
Delphine Lariviere

Penn State University
Post-doc in the Galaxy Team (Nekrutenko Lab). Works on bacterial genomics, assembly, RNA Seq, TnSeq. Also interested in evolution, metagenomics, epigenetics and visualisation.


Monday July 20, 2020 02:01 - 02:15 EDT
Galaxy

02:01 EDT

Bionitio: building better bioinformatics tools with batteries included 🍐
→ Abstract


The presenter(s) will be available for live Q&A in this session (BCC East).

Authors: Peter Georgeson, Anna Syme, Jessica Chung, Michael Milton, Harriet Dashnow, Andrew Lonsdale, Clare Sloggett, Bernard Pope
License: MIT
URL: https://github.com/bionitio-team/bionitio
Publication: Georgeson, Syme et al. Bionitio: demonstrating and facilitating best practices for bioinformatics command-line software. Gigascience 8, (2019).

The results-driven focus of bioinformatics means that shortcuts are often taken during software development for the sake of making something "that works". Furthermore, many bioinformaticians are not trained in software engineering, and research-oriented projects have limited budgets for quality assurance.

In response to this problem we have developed Bionitio, a tool that automates the process of starting new bioinformatics software projects following recommended best practices. With a single command, the user can create a new well-structured project in one of twelve programming languages. The resulting software is functional — carrying out a prototypical bioinformatics task — and thus serves as both a working example and a template for building new tools. Key features include command-line argument parsing, error handling, logging, defined exit status values, a test suite, a version number, standardised building and packaging, documentation, a standard open-source software license, revision control, and containerisation.

For example, the following command creates a new Python 3 project called skynet using the BSD 3 Clause license and creates a remote repository on GitHub for username cyberdyne:

bionitio-boot.sh -i python -n skynet -c BSD-3-Clause -g cyberdyne

Bionitio serves as a learning aid for beginner-to-intermediate bioinformatics programmers and provides an excellent starting point for new projects. This helps developers adopt good programming practices from the beginning of a project and encourages high-quality tools to be developed more rapidly. Bionitio has been used in several workshops, providing a common codebase for coordination of workshop materials and an extensible platform for the delivery of hands-on practical activities. Additionally, by providing complete working examples in many different languages, Bionitio acts as a kind of "Rosetta Stone" and is therefore an excellent vehicle for comparative programming skills transfer.

In this talk we will describe the design and implementation of Bionitio and demonstrate how it can be used to quickly start new open source bioinformatics projects.
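
Several of the key features named in this abstract (command-line argument parsing, error handling, logging, defined exit status values, a version number) can be illustrated with a minimal Python sketch. It is not taken from the code that Bionitio generates; the program name `fastacount`, the exit status constants, and the function names are hypothetical and chosen only for illustration.

```python
import argparse
import logging
import sys

# Hypothetical names and exit codes, chosen only for this sketch.
PROGRAM_NAME = "fastacount"
VERSION = "0.1.0"
EXIT_OK = 0
EXIT_FILE_ERROR = 1


def parse_args(argv):
    """Parse command-line arguments."""
    parser = argparse.ArgumentParser(
        prog=PROGRAM_NAME, description="Count sequences in FASTA files."
    )
    parser.add_argument("--version", action="version", version=VERSION)
    parser.add_argument("--log", metavar="LOG_FILE", help="write a log to this file")
    parser.add_argument("fasta_files", nargs="+", metavar="FASTA_FILE")
    return parser.parse_args(argv)


def count_sequences(lines):
    """Count FASTA records: each record starts with a '>' header line."""
    return sum(1 for line in lines if line.startswith(">"))


def main(argv=None):
    """Run the tool and return an exit status instead of raising."""
    args = parse_args(sys.argv[1:] if argv is None else argv)
    if args.log:
        logging.basicConfig(filename=args.log, level=logging.INFO)
    for path in args.fasta_files:
        try:
            with open(path) as handle:
                count = count_sequences(handle)
        except OSError as error:
            logging.error("cannot read %s: %s", path, error)
            print(f"{PROGRAM_NAME}: cannot read {path}", file=sys.stderr)
            return EXIT_FILE_ERROR
        logging.info("processed %s", path)
        print(f"{path}\t{count}")
    return EXIT_OK

# In a script, the last line would typically be: sys.exit(main())
```

Returning the status from `main` rather than calling `sys.exit` inside it keeps the function easy to exercise from a test suite, which is one of the practices the abstract lists.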

Speakers
Bernie Pope

Victorian Health and Medical Research Fellow, Melbourne Bioinformatics, University of Melbourne
I am an Associate Professor at The University of Melbourne. My research focuses on applying computational techniques to biological questions, especially related to Human Genomics and Cancer.



Monday July 20, 2020 02:01 - 02:15 EDT
BOSC

02:15 EDT

Enhancing rigor and reproducibility in biomedical research 🍐
Abstract


The presenter(s) will be available for live Q&A in this session (BCC West).

Jaqueline J. Brito 1,*, Jun Li 2, Jason H. Moore 3, Casey S. Greene 4,5, Nicole A. Nogoy 6, Lana X.
Garmire 2, Serghei Mangul 1,7

1 Dept. of Clinical Pharmacy, School of Pharmacy, University of Southern California, USA
2 Dept. of Computational Medicine & Bioinformatics, University of Michigan, USA
3 Dept. of Biostatistics, Epidemiology, and Informatics, Institute for Biomedical Informatics, University of Pennsylvania, USA
4 Dept. of Systems Pharmacology and Translational Therapeutics, University of Pennsylvania, USA
5 Childhood Cancer Data Lab, Alex's Lemonade Stand, USA
6 GigaScience, Hong Kong
7 Quantitative and Computational Biology, University of Southern California, USA
*Email: britoj@usc.edu

Project Website: https://github.com/Mangul-Lab-USC/enhancing_reproducibility
License: CC BY 4.0 License

Computational methods reshaped the landscape of modern biology, generating new channels of
communications to publish and share the most recent techniques and methodologies. While the
dependence on computational tools of the biomedical community increases steadily, the
mechanisms ensuring open data, open software, and reproducibility are heterogeneously
enforced. Institutions, funders, and publishers offer different guidelines, or no guideline at all.
For instance, publications may cite software artifacts, key to reproduce research results, that
may become unavailable or depend on packages that are no longer supported. Publications lacking fully
reproducible research significantly limit the role of reviewers in evaluating technical strength
and scientific contribution. Moreover, incomplete ancillary information for an academic
software package will likely bias and restrict any subsequent research produced with the tool.
In this presentation, we provide eight recommendations across four different domains to
improve three main principles: reproducibility, transparency, and rigor in computational
biology. These are the main principles which should be emphasized in life sciences curricula,
especially as assays and pipelines grow more complex than ever. We propose that a
combination of lowering the learning curve needed to maintain the three principles and stricter
guidelines is key to ensuring adoption by the community. Ultimately, our
recommendations target fostering a sustainable data science ecosystem in biomedicine and life
science research.
Keywords: Reproducibility; Open science; Reproducible research; FAIR principles.

Speakers
Jaqueline J. Brito

Dept. of Clinical Pharmacy, School of Pharmacy, University of Southern California


Monday July 20, 2020 02:15 - 02:20 EDT
BOSC

02:15 EDT

Integrating refgenie and Galaxy for reference data management: a proposal for IDC 🌀
➞ Abstract

Ignacio Eguinoa 1,2, Frederik Coppens 1,2

  1. Ghent University, Department of Plant Biotechnology and Bioinformatics, 9052 Ghent, Belgium
  2. VIB Center for Plant Systems Biology, 9052 Ghent, Belgium

Speakers
Ignacio Eguinoa

ELIXIR Belgium - VIB Center for Plant Systems Biology


Monday July 20, 2020 02:15 - 02:20 EDT
Galaxy

02:20 EDT

Galaxy and its Tool Shed on Python 3: conclusion of a long journey 🌀
➞ Abstract

Nicola Soranzo 1, Marius van den Beek 2

  1. Earlham Institute, Norwich Research Park, Norwich, UK. Email: nicola.soranzo@earlham.ac.uk
  2. Penn State University, University Park PA, USA.

The presenter(s) will be available for live Q&A at the end of this session in both BCC West and BCC East.

Speakers
Nicola Soranzo

Earlham Institute


Monday July 20, 2020 02:20 - 02:25 EDT
Galaxy

02:20 EDT

Secondary analysis of publicly available omics data across almost 3 million publications 🍐
Abstract


The presenter(s) will be available for live Q&A in this session (BCC West).

Nicholas Darci-Maher 1, Kerui Peng 3, Dat Duong 1, Richard J. Abdill 2, Eleazar Eskin 1, Serghei Mangul 3

1 University of California, Los Angeles, California, USA. Email: niko.darcimaher@gmail.com
2 University of Minnesota, Minnesota, USA
3 University of Southern California, California, USA

Methods code: https://github.com/smangul1/data_reusability
License: MIT License

Abstract
As today's high throughput sequencing techniques become increasingly affordable and accurate,
the number of publicly available omics datasets is rapidly accumulating. Bioinformatics methods provide
unprecedented opportunities for analysis of omics datasets in quantitative biological research.
Traditionally, such research has included primary analysis of novel omics data developed as part of the
study. However, this data has the potential to be reused, and is often valuable beyond the scope of the
study that introduced it. Data-driven research by secondary analysis on existing datasets is becoming
more important. Increased availability of public omics data represents an opportunity to find novel
insights and discoveries across different datasets.
This study presents a quantitative analysis of the reusability of omics datasets in two online
repositories, the Sequence Read Archive (SRA) and the Gene Expression Omnibus (GEO). We
downloaded over 2.5 million publications from the PubMed Central Open Access corpus, and identified
those that referenced SRA or GEO datasets. We used these papers to examine reusability based on various
factors, including journal, repository, sequencing technology, and species. We find that most datasets are
never reused--these datasets are mentioned once in the study that introduced them, but then never
referenced again. In recent years, however, data reuse is rising. We aim to shed light on the landscape of
data sharing in the quantitative biology research community, and illuminate the benefits of secondary
analysis of omics data.
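
The approach described in this abstract, finding publications that reference SRA or GEO datasets and then counting how often each dataset is mentioned, can be sketched roughly as follows. This is an illustration only, not the authors' pipeline (see the methods code linked above). The regular expression covers just a few common accession prefixes (GSE for GEO series; SRP, SRX, SRR, and SRS for SRA), and the function names and the notion of "reused" as "mentioned by more than one paper" are assumptions made for this sketch.

```python
import re
from collections import defaultdict

# Assumed simplification: only GEO series (GSE) and four SRA prefixes.
ACCESSION_PATTERN = re.compile(r"\b(GSE\d+|SR[PXRS]\d+)\b")


def find_accessions(text):
    """Return the set of dataset accessions mentioned in one paper's text."""
    return set(ACCESSION_PATTERN.findall(text))


def papers_per_dataset(papers):
    """Map each accession to the set of paper IDs that mention it.

    `papers` is a dict of paper ID -> full text (a toy stand-in for a corpus).
    """
    mentions = defaultdict(set)
    for paper_id, text in papers.items():
        for accession in find_accessions(text):
            mentions[accession].add(paper_id)
    return dict(mentions)


def reused_datasets(mentions):
    """In this sketch, a dataset mentioned by more than one paper is 'reused'."""
    return {acc for acc, ids in mentions.items() if len(ids) > 1}
```

A real analysis would also need to separate the paper that introduced a dataset from later papers that reuse it, which this toy version does not attempt.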

Speakers
Nicholas Darci-Maher

University of California, Los Angeles


Monday July 20, 2020 02:20 - 02:25 EDT
BOSC

02:25 EDT

Q & A 🌀
Question and Answer session for the just finished talks.

Moderators
Simon Gladman

University of Melbourne

Monday July 20, 2020 02:25 - 02:30 EDT
Galaxy

02:25 EDT

Q&A 🍐
The presenter(s) will be available for live Q&A in this session.

Moderators
Heather Wiencko

Software Engineer, Hosted Graphite

Monday July 20, 2020 02:25 - 02:30 EDT
BOSC

02:30 EDT

CrowdGO: Gene Ontology prediction using a meta approach 🍐
Abstract


The presenter(s) will be available for live Q&A in this session (BCC West).

Maarten JMF Reijnders 1,2 and Robert M. Waterhouse 1,2

1 University of Lausanne, Lausanne, Switzerland.
2 Swiss Institute of Bioinformatics, Lausanne, Switzerland.
Email: maarten.reijnders@unil.ch

Source code: https://gitlab.com/mreijnders/CrowdGO
License: GNU General Public License v3.0

Methods to predict protein functions, defined here as assigning Gene Ontology (GO) terms,
vary considerably in their underlying approach, with different methods employing techniques
such as sequence homology, machine learning, or text mining. This often results in dramatically
different sets of GO terms predicted for the same sets of proteins. These methods are reviewed
in the Critical Assessment of Functional Annotation competitions (CAFA) (Zhou 2019), but even
the best scoring methods can be inaccurate, and none truly stand out. To concurrently exploit
the strengths of each method, we developed a meta-predictor that evaluates the predictions of
multiple top-performing methods.
CrowdGO compares the predictions of different methods and uses a machine learning model to
improve the precision, recall, and f-max scores of the resulting meta-predictions. The model can
be trained based on user-selected prediction methods, or a pre-trained model can be used. The
pre-trained models are built using prediction tools that are exclusively open-source, easy to use,
and computationally non-demanding. CrowdGO includes Snakemake workflows to use existing
models for GO term prediction, or to train new models.
Using a model built with four input predictions from a sequence homology-based predictor, Wei2GO (Reijnders 2020), two protein domain-based predictors, InterProScan (Mitchell 2019) and FunFams (Scheibenreif 2019), and a deep learning predictor, DeepGOPlus (Kulmanov 2019), CrowdGO increases both the precision and meaningful recall compared to each input method (Figure 1).
CrowdGO is fully open source and leverages other open source tools. It is straightforward to use, thanks to both the simplicity of the software and the accompanying Snakemake pipelines. Due to the nature of its meta-prediction algorithm, it will stay relevant even when improved function prediction software becomes available.
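
For readers unfamiliar with the F-max score mentioned in this abstract: it is the maximum F-measure obtained over all confidence thresholds. The sketch below uses a simplified variant that pools all (protein, GO term) pairs, rather than the per-protein averaging used in CAFA, and it is unrelated to CrowdGO's own code. The function name and input layout are hypothetical.

```python
def f_max(predictions, truth):
    """Maximum F1 over score thresholds.

    predictions: dict mapping (protein, go_term) -> confidence score in [0, 1]
    truth: set of (protein, go_term) pairs that are correct annotations
    """
    best = 0.0
    # Each distinct score is a candidate threshold.
    for threshold in sorted(set(predictions.values())):
        predicted = {pair for pair, score in predictions.items() if score >= threshold}
        true_positives = len(predicted & truth)
        if true_positives == 0:
            continue
        precision = true_positives / len(predicted)
        recall = true_positives / len(truth)
        best = max(best, 2 * precision * recall / (precision + recall))
    return best
```

Raising the threshold usually trades recall for precision; F-max reports the best balance between the two that a predictor can reach.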


Speakers
Maarten Reijnders

Department of Ecology and Evolution, University of Lausanne


Monday July 20, 2020 02:30 - 02:35 EDT
BOSC

02:30 EDT

Implementation of the IEEE-2791-2020 standard (BioCompute Objects) in Galaxy via workflow invocations 🌀
➞ Abstract

Charles Hadley King 1, Nicola Soranzo 2

  1. George Washington University, Washington D.C. USA
  2. Earlham Institute, Norwich Research Park, Norwich, UK

Speakers
Charles Hadley King

Senior Research Associate, George Washington University


Monday July 20, 2020 02:30 - 02:35 EDT
Galaxy

02:35 EDT

Goslin - A grammar of succinct lipid nomenclature 🍐
Abstract


The presenter(s) will be available for live Q&A in this session (BCC West).

Nils Hoffmann 1, Dominik Kopczynski 1, Bing Peng 2, Robert Ahrends 3

1 Leibniz-Institut für Analytische Wissenschaften – ISAS – e.V., Otto-Hahn-Straße 6b, 44227
Dortmund, Germany. Email: nils.hoffmann@isas.de
2 Karolinska Institutet, Solna, Stockholm, Sweden.
3 Department of Analytical Chemistry, University of Vienna, Vienna, Austria.

Project Website: https://lifs.isas.de/goslin & https://apps.lifs.isas.de/goslin
Source Code: https://github.com/lifs-tools/goslin (main hub to implementations)
License: Apache License 2.0 & MIT License


Main Text of Abstract

We introduce the 'Grammar of Succinct Lipid Nomenclature' (Goslin), a polyglot grammar for
common lipid shorthand nomenclatures based on the LipidMaps nomenclature and the shorthand
nomenclature established by Liebisch et al. and used by LipidHome and SwissLipids, accompanied
by parser implementations in C++, Java, Python and R.

Lipid naming has evolved into several dialects which complicates the unified computational
treatment and parsing of lipid names. As a consequence, long and error-prone manual curation
often is necessary in order to streamline lists of lipid names for their processing in follow-up
analysis scripts, workflows, or tools, or for their submission to research data repositories. Goslin
was designed to address the following pressing issues in the lipidomics field especially: 1) to
simplify the implementation of lipid name handling for developers of mass spectrometry-based
lipidomics tools; 2) to offer a tool that unifies and normalizes the main existing lipid name dialects
enabling a lipidomics analysis in a high-throughput fashion.

Goslin and its parser implementations are thus designed to act as a library for the development of
lipidomics tools providing a standardized data structure for storing structural lipid information.
The parsing of lipid names as well as the lipid name generation are the main functions of Goslin. We
therefore defined a context-free grammar (with ANTLR4) that defines rules and productions for all
structural properties of the lipid nomenclature, including mass spectrometry specific information
about unlabeled and heavy isotope labeled species, as well as fragments and adducts. We recently
added the calculation of masses and sum formulas, when the head group's sum composition is
known. Currently, the grammar covers 289 lipid classes within the seven most common lipid
categories in eukaryotic organisms, namely fatty acyls, glycerolipids, glycerophospholipids,
saccharolipids, sphingolipids, sterol lipids, and polyketides. The major advantages of using a
grammar rather than a manually coded parser are its flexibility and extensibility. Regular
expressions are also not suitable for parsing lipid names, since they are incapable of recognizing
nested patterns and can only recognize words from regular languages.

We provide implementations of Goslin in four major programming languages, namely C++, Java,
Python 3, and R to kick-start adoption and integration. Further, we set up a web service for users to
work with Goslin directly and via an OpenAPI-compliant REST API. All implementations are
available free of charge under a permissive open source license; binary releases are available from
Zenodo. We are currently working on making the libraries available via BioConda/BioContainers
and other community-facing repositories.
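
To illustrate what parsing a lipid shorthand name into a structured object involves, here is a toy recursive-descent parser in Python. It handles only a hypothetical mini-grammar (a head group followed by fatty acyl chains written as carbons:double bonds and separated by "/" or "_"). It is neither the Goslin grammar nor the Goslin API, and all class and function names are assumptions made for this sketch.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class FattyAcyl:
    carbons: int
    double_bonds: int


@dataclass
class Lipid:
    head_group: str
    chains: List[FattyAcyl]


class LipidParseError(ValueError):
    pass


def parse_lipid(name: str) -> Lipid:
    """Parse names such as 'PC 18:1/16:0' using the toy grammar:

        lipid := HEAD ' ' chain (('/' | '_') chain)*
        chain := INT ':' INT
    """
    pos = 0

    def read_while(predicate) -> str:
        # Consume one or more characters that satisfy the predicate.
        nonlocal pos
        start = pos
        while pos < len(name) and predicate(name[pos]):
            pos += 1
        if pos == start:
            raise LipidParseError(f"unexpected input at position {pos} in {name!r}")
        return name[start:pos]

    def expect(char: str) -> None:
        # Consume exactly one expected character.
        nonlocal pos
        if pos >= len(name) or name[pos] != char:
            raise LipidParseError(f"expected {char!r} at position {pos} in {name!r}")
        pos += 1

    def chain() -> FattyAcyl:
        carbons = int(read_while(str.isdigit))
        expect(":")
        return FattyAcyl(carbons, int(read_while(str.isdigit)))

    head = read_while(str.isalpha)
    expect(" ")
    chains = [chain()]
    while pos < len(name) and name[pos] in "/_":
        pos += 1
        chains.append(chain())
    if pos != len(name):
        raise LipidParseError(f"trailing input at position {pos} in {name!r}")
    return Lipid(head, chains)
```

Each grammar rule maps to one small function here, which hints at why a grammar-driven parser is easier to extend than a hand-tuned pattern when new name dialects have to be supported.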

Speakers
Nils Hoffmann

Leibniz-Institut für Analytische Wissenschaften – ISAS – e.V.


Monday July 20, 2020 02:35 - 02:40 EDT
BOSC

02:35 EDT

Porting the rCASC workflow for scRNA-Seq data analysis to Galaxy and the Laniakea Galaxy on-demand system 🌀
➞ Abstract

Pietro Mandreoli 1, Luca Alessandrì 2, Marco Antonio Tangaro 3, Raffaele Calogero 2, Federico Zambelli 4

  1. Dept. of Biosciences, University of Milano - Italy.
  2. Dept. of Molecular Biotechnology and Health Sciences, University of Torino - Italy.
  3. Institute of Biomembranes, Bioenergetics and Molecular Biotechnologies, CNR - Italy.
  4. Dept. of Biosciences, University of Milano - Italy.

Speakers
Pietro Mandreoli

Dept. of Biosciences, University of Milano


Monday July 20, 2020 02:35 - 02:40 EDT
Galaxy

02:40 EDT

Executable Research Article (ERA): Enrich a research paper with code and data 🍐
Abstract


The presenter(s) will be available for live Q&A in this session (BCC West).

The eLife team and the Stencila team

(Presenter: Emmy Tsang, Innovation Community Manager, eLife; email: e.tsang@elifesciences.org)

Project Website: https://elifesci.org/reprodoc (this will be updated early June)
Source Code: https://github.com/stencila; https://github.com/elifesciences
License: Apache License 2.0 (for Stencila); MIT (for eLife)

Main Text of Abstract

Code and data are important research outputs and integral to a full understanding of research
findings and experimental approaches in a paper. However, traditional research articles seldom
have these embedded in the manuscript's narrative, but instead, leave them as "supplementary
materials", if they are openly available.

With Executable Research Articles (ERAs), our vision is to enrich the traditional narrative of a
research article with code, data and interactive figures that can be executed in the browser,
downloaded and explored. It will give readers a direct insight into the methods, algorithms and key
data behind the published research.

We published our first demo ERA in February 2019. Over the past year, we have been working
closely with our collaborator Stencila to build an open tool stack that would enable our authors and
production team to easily publish ERAs at scale. In this talk, we hope to showcase the potential of
ERAs with examples and walk through how authors can enrich their traditional eLife paper using
Stencila Hub, through:

- Starting a Stencila Hub project linked to their eLife paper
- Converting the article to a reproducible notebook format of their preference, while preserving the relevant journal article metadata
- Uploading the data required to enable live re-execution of tables and figures in the article
- Replacing static tables and figures with code chunks that reproduce them

We will share our current vision of how ERAs will be integrated into our production workflow and
collect feedback. We also hope to engage participants in exploring potential functionalities for the
tool stack and building a community-driven roadmap.

Speakers
Emmy Tsang

Innovation Community Manager, Delft University of Technology


Monday July 20, 2020 02:40 - 02:55 EDT
BOSC

02:40 EDT

Galaxy, Selenium, and End-to-end Testing 🌀
➞ Abstract

Oleg Zharkov 1, Dave Bouvier 2, Juan David Mendez 1, Björn Grüning 1, John Chilton 2

  1. Department of Computer Science, Albert-Ludwigs-Universität Freiburg
  2. Department of Biochemistry and Molecular Biology, Penn State University, University Park PA, USA.

Speakers
Oleg Zharkov

Albert-Ludwigs-Universität Freiburg


Monday July 20, 2020 02:40 - 02:55 EDT
Galaxy

02:55 EDT

Q & A 🌀
Question and Answer session for the just finished talks.

Moderators
Simon Gladman

University of Melbourne

Monday July 20, 2020 02:55 - 03:00 EDT
Galaxy

02:55 EDT

Q&A 🍐
→ Abstract


The presenter(s) will be available for live Q&A in this session (not sure yet which hemisphere).

Moderators
Heather Wiencko

Software Engineer, Hosted Graphite

Monday July 20, 2020 02:55 - 03:00 EDT
BOSC

03:00 EDT

Break!
The official day is done, but Birds of a Feather sessions are about to begin.  Before that happens, take a break!  Check your email, grab some food, acknowledge your family and pets, ...


Monday July 20, 2020 03:00 - 03:15 EDT
Joint

03:00 EDT

Interregnum
East Conference Day 1 is done, and the West Conference Day 2 is coming.


Monday July 20, 2020 03:00 - 10:00 EDT
Joint

03:15 EDT

Birds of a Feather (BoFs)
Birds of a Feather (BoFs) are informal, self-organized meetups focused on specific topics. They are a great way to meet other like-minded community members and have an in-depth discussion on a topic of interest.

Anyone is welcome to propose a BoF! All you need is a title, an organizer, and a brief description. At BCC2020, BoFs will be scheduled the hour before or after the main meeting days in both hemispheres. You can choose to hold your BoF in one or both hemispheres.

Please propose BoFs no later than July 10. After that date, new BoF signups will be closed but you are welcome to organize informal "meetups" during BCC2020.

Monday July 20, 2020 03:15 - 04:00 EDT
Joint

05:00 EDT

Galaxy Social East!
Hey!  Come back!

BCC is all about community, and while we can't have our usual after hours gatherings, we can still meet new collaborators and catch up with old mates.  This session is an opportunity to do just that. 

We'll start with a few slides (we will highlight our Fellowship recipients) and then do 2-3 short rounds of icebreakers to get the new folks into the groove, and then leave the rest of the session for chatting and meeting people.

And while this is a Galaxy-sponsored event, we strongly encourage BOSCers to also join in.

This event will be soooooo much better than what the folks in the west are putting on. Stay up or get up early, and find out what's happening on this side of the world.

See you there.

Moderators
Gareth Price

Head of Computational Biology, QCIF Facility for Advanced Bioinformatics

Monday July 20, 2020 05:00 - 07:00 EDT
Joint

09:00 EDT

Birds of a Feather (BoFs)
Birds of a Feather (BoFs) are informal, self-organized meetups focused on specific topics. They are a great way to meet other like-minded community members and have an in-depth discussion on a topic of interest.

These BoFs are currently scheduled for this session:

  1. A code of conduct for the OBF, led by Bastian Greshake Tzovaras and Malvika Sharan
  2. Celestial Masses: Galaxy for mass spectrometry-based research, led by Pratik Jagtap, Oliver Schilling, and Yves Vandenbrouck
  3. Computational Immunoinformatics Workflows Over Galaxy Server, led by Ambarish Kumar
  4. Galaxy Africa, led by Peter van Heusden
  5. Galaxy India, led by Anshu Bhardwaj


Anyone is welcome to propose a BoF! All you need is a title, an organizer, and a brief description. At BCC2020, BoFs will be scheduled the hour before or after the main meeting days in both hemispheres. You can choose to hold your BoF in one or both hemispheres.

Please propose BoFs no later than July 10. After that date, new BoF signups will be closed but you are welcome to organize informal "meetups" during BCC2020.

Monday July 20, 2020 09:00 - 09:45 EDT
Joint

09:00 EDT

BoF: A code of conduct for the OBF
The Open Bioinformatics Foundation is looking to adopt a code of conduct that would cover both in-person events and online interactions and that could also be used by its member projects. Following the discussions at BOSC2019, we will discuss the current draft.


Birds of a Feather (BoFs) are informal, self-organized meetups focused on specific topics. Anyone is welcome to propose a BoF. Have an idea? Please propose a BoF no later than July 10.

Moderators
Malvika Sharan

Senior Researcher, The Alan Turing Institute
I am a senior researcher for the Tools, Practices and Systems research programme at The Alan Turing Institute, London. With a focus on Open Research, I lead a team of community managers and co-lead The Turing Way project that aims to make data science reproducible, collaborative...

Bastian Greshake Tzovaras

Director of Research, Open Humans Foundation
Bastian Greshake Tzovaras is the Director of Research for the Open Humans Foundation which is dedicated to empowering individuals and communities around their personal data, to explore  and share for the purposes of education, health, and research.

Monday July 20, 2020 09:00 - 09:45 EDT
Joint

09:00 EDT

BoF: Celestial Masses: Galaxy for mass spectrometry-based research
This is a Birds of a Feather meeting for researchers in the field of mass spectrometry-based proteomics. It also extends to multi-omics studies such as proteogenomics, metaproteomics, and metabolomics. We will discuss some of the work by researchers of the Galaxy-P team, the Schilling Lab at the University of Freiburg, and the ProteoRE team, along with ongoing projects and the challenges and opportunities in the field.

Moderators
Timothy J. Griffin

Professor, University of Minnesota

Pratik Jagtap

Research Assistant Professor, University of Minnesota
Metaproteomics . DIA . Proteogenomics

Oliver Schilling

University of Freiburg

Monday July 20, 2020 09:00 - 09:45 EDT
Joint

09:00 EDT

BoF: Computational Immunoinformatics Workflows Over Galaxy Server
Computational prediction of neoantigens has emerged as a pressing topic during the current viral pandemic. It also serves personalized immunogenic therapy for cancer patients. The Galaxy Tool Shed offers immunoinformatics resources that can be combined into workflows for neoantigen prediction. Administrators may add their own customized tools for specific computational steps. Artificial intelligence plays an important role in vaccine design, and algorithms may be added to Galaxy as customized tools to advance vaccine design further. Population coverage is an additional advantage of computational workflows in immunoinformatics.


Speakers

Monday July 20, 2020 09:00 - 09:45 EDT
Joint

09:00 EDT

BoF: Galaxy Africa - Outreach Plans
While there have been Galaxy users in Africa since at least 2009, the last few years have seen an increase in adoption. This BoF is for and by Galaxy users in Africa with a short overview of Galaxy usage on the continent and time to plan our way forward to grow and consolidate the Galaxy Africa community.



Monday July 20, 2020 09:00 - 09:45 EDT
Joint

09:00 EDT

BoF: Galaxy-India: Launch and Outreach Plans
Galaxy Community members have been preparing to launch the Galaxy-India community later this year.  If you are interested in learning more or contributing to this community then please join us for this BoF.  We will discuss the plans for the launch and outreach, including (pandemic permitting) workshops in India later this year or next year.  We will also discuss possible work towards a Galaxy-India server.  Interested?  Please join us.

A tentative structure for the BoF is as follows:
  1. Learning from other major Galaxy installations 
  2. Preparations for setting up Galaxy India - Human and infrastructure resources 
  3. Roadmap: role mapping and responsibilities so that we have a dedicated team to coordinate and manage this activity 
  4. Integration of Bioclues and Galaxy communities


Moderators
Anshu Bhardwaj

Long term Fellow, Senior Scientist, Interdisciplinary Research Center (CRI) and CSIR-Institute of Microbial Technology

Harpreet Singh

Assistant Professor, Hans Raj Mahila Maha Vidyalaya (HMV), Jalandhar
I am the Head of the Department of Bioinformatics at one of the best women's colleges in India. My research interests include Structural Bioinformatics, Machine Learning and Genomics. I am also the Finance Secretary of one of the biggest Bioinformatics communities in India, i.e. Bioclues.org...

Monday July 20, 2020 09:00 - 09:45 EDT
Joint

10:00 EDT

BCC2020 Conference Day 2: West
Keynotes, accepted talks, posters, demos, and networking in the West.

Monday July 20, 2020 10:00 - 15:00 EDT
Joint

10:01 EDT

Day 2 Welcome
Daily announcements and an icebreaker.

Moderators
Dave Clements

Training and Outreach Coordinator, Galaxy Project, Johns Hopkins University

Monday July 20, 2020 10:01 - 10:15 EDT
Joint

10:15 EDT

West Keynote 2: Open minds bring open collaborations
➔  Slides, Abstract

Prashanth N Suravajhala

  1. Birla Institute of Scientific Research, Statue circle, Jaipur, India
  2. Bioclues.org, India

The post-COVID-19 era has ushered in fierce competition to deliver, be it a vaccine, funding, or a publication. As researchers, we are expected to be guided by reason, not emotion, amid the ‘publish or perish’ adage. Multitasking across research and publishing has become a visible goal, and combining these tasks over time has become the need of the hour. In today’s constrained funding situation, many early- and mid-career researchers face the daunting task of establishing and developing their research programs, for example by starting their own labs through crowdsourcing or by obtaining funds from previous associations or host institutions, and then publishing the results. But to what extent are we trying to preserve the fairness and integrity of science? I would like to draw your attention to a ‘Hippocratic Oath for Scientists’, which would help keep research vital and in the best interests of science in order to sustain excellence. To this end, the talk will explore how the three Cs, namely Consistency, Continuity, and Credibility, augur well for a successful open organization. These invariably bring successful Collaborations, Convergence, and, importantly, Control over mind to the fore. The growth of an individual or organization depends on fostering commitment to open culture, net neutrality, and universal access to information in education and science. So, it is the Collaborative index (C-index) that matters. Are we ready?


This session will be introduced by Dave Clements.

Speakers
Prashanth Suravajhala

Senior Scientist and Founder, Bioclues.org, Birla Institute of Scientific Research; Bioclues
Prashanth N Suravajhala is a senior scientist at Birla Institute of Scientific Research, Jaipur. A PhD in Systems Biology, he went on to gain more than 7 years of postdoctoral experience across four different laboratories. He has interests exploring the known unknown regions in the human genome, primarily...



Monday July 20, 2020 10:15 - 11:00 EDT
Joint
  Meeting-West

11:00 EDT

Break!
Take a break!  Check your email, grab some food, acknowledge your family and pets, ...

Just make sure you are back in 15 minutes.

Monday July 20, 2020 11:00 - 11:15 EDT
Joint

11:15 EDT

BOSC West Session 3: Building Open Source Communities (BOSC) 🍐
Accepted talks and lightning talks.

Moderators

Yo Yehudi

Software Developer, University of Cambridge & Open Life Science
Integrated genomic data (InterMine)

Monday July 20, 2020 11:15 - 12:00 EDT
BOSC
  Meeting-West

11:15 EDT

Galaxy West Session 3: Galaxy Administration 🌀
Accepted talks and lightning talks.

Monday July 20, 2020 11:15 - 12:00 EDT
Galaxy
  Meeting-West

11:16 EDT

Building open source communities and empowering new contributors 🍐
Abstract


The presenter(s) will be available for live Q&A in this session (BCC West).

Yo Yehudi 1, Adrián Bazaga 1,2, Daniela Butano 1, Rachel Lyne 1, K.H. Reierskog 1,
InterMine Collaborators, Gos Micklem 1

1 Department of Genetics, University of Cambridge, Cambridge, United Kingdom
2 STORM Therapeutics Ltd, Cambridge, United Kingdom

Project Website: http://intermine.org/
Source Code: https://github.com/intermine/internships
License: CC-BY + Apache - https://github.com/intermine/internships/blob/master/README.md

Background: Open source software is software whose source code is open for redistribution and
modification, and whose license doesn't restrict how the software can be used
(https://opensource.org/osd). Many open source projects take the meaning of open source far
beyond this definition by building structured communities that facilitate contributions to the code
base, documentation, and design of the software. We will share our experiences from building
community interactions into InterMine (an open source biological data warehouse).

Internship programs: Joining open source communities can often be a challenge for
newcomers, who may not be aware of unwritten rules, community norms, and expectations. To
help change this, projects like InterMine participate in structured long-term programs to help
onboard newcomers. Two programs of note in this domain are Google Summer of Code (also
known as GSoC, https://summerofcode.withgoogle.com/) and Outreachy
(https://www.outreachy.org/).

Unpaid initiatives: Hacktoberfest (https://hacktoberfest.digitalocean.com/) is a month-long
drive to incentivise contributions to open source software. With the "first timers only" initiatives
(https://www.firsttimersonly.com/) InterMine curates, describes, and tags easier issues to make
them extra-friendly for beginners, creating a low-barrier on-ramp for its contributors.

Practical benefits: InterMine has been mentoring interns recruited via GSoC and Outreachy on
a yearly basis since 2017 and is doing so again in 2020. Over this time we have had tangible
production-ready practical benefits from the projects our interns have worked on, including a
registry for listing public instances of our software (http://registry.intermine.org/) and upgraded
SOLR search functionality
(https://intermineorg.wordpress.com/2018/11/15/intermine-3-0-solr-search/).

Contributors are offered benefits such as sponsored conference and hackathon attendance,
community-branded "swag", and recommendations for university and job applications.

Year-on-year, we find interns and Hacktoberfest contributors tend to return in later years in
many ways: as mentors, as technical support for their work, and even as staff.

Summary: Scientific and research software can strongly benefit from embracing open source
community models and initiatives, gaining both completed practical projects and a greater pool
of skilled contributors. Thoughtfully designed pathways enable contributors to engage and stay
involved in the longer term, even when contributors themselves come from non-scientific
backgrounds.

Speakers

Yo Yehudi

Software Developer, University of Cambridge & Open Life Science
Integrated genomic data (InterMine)



Monday July 20, 2020 11:16 - 11:30 EDT
BOSC

11:16 EDT

The cloud-native Galaxy: Galaxy on Kubernetes 🌀
➞ Abstract

Alexandru Mahmoud 1, Nuwan Goonasekera 2, Pablo Moreno 3, John Chilton 4, Marius van den Beek 4, Enis Afgan 1

  1. Johns Hopkins University, Baltimore, MD, USA
  2. Melbourne Bioinformatics, University of Melbourne, Victoria 3010, Australia
  3. The European Bioinformatics Institute (EMBL-EBI), Cambridgeshire, United Kingdom
  4. Penn State University, State College, PA, USA

The presenter(s) will be available for live Q&A at the end of this session (BCC West).

Speakers

Alexandru Mahmoud

Galaxy Team, Johns Hopkins University



Monday July 20, 2020 11:16 - 11:30 EDT
Galaxy

11:30 EDT

Codeathons as a tool for improving diversity in computer science 🍐
Abstract


The presenter(s) will be available for live Q&A in this session (BCC West).

ALLISSA DILLMAN 1, RANA MORRIS 2, PETER COOPER 3, ERIC SAYERS 4, BART TRAWICK 5

1 Allissa Dillman, NCBI/NLM/NIH, Bethesda MD 20892 allissa.dillman@nih.gov
2 Rana Morris NCBI/NLM/NIH, Bethesda MD 20892
3 Peter Cooper NCBI/NLM/NIH, Bethesda MD 20892
4 Eric Sayers NCBI/NLM/NIH, Bethesda MD 20892
5 Bart Trawick NCBI/NLM/NIH, Bethesda MD 20892

Project Website: https://ncbi-codeathons.github.io/
Source Code: https://github.com/topics/womenled-nih-2019
License: MIT License

Women are underrepresented in computer science, accounting for only ~18% of the population
receiving degrees in this field. These numbers have been dropping since the 1980s when female
representation was at ~37%. A perceived lack of experience and few opportunities for female
mentorship are often cited as barriers to women entering computationally intensive fields. Hackathons
are one place where early career computer scientists can explore their creativity and code as part of a
team. Additionally, these events also allow the opportunity to network with others in the biological,
data, and computer science fields, improving representation throughout career stages and creating
opportunities to find novel mentors. Finally, hackathons are a great way to learn new skills, tools and
technologies on the fly from peers. However, hackathons typically also have a gender gap that reflects
the overall participation rate in computer science, with only around 20-25% of participants being
female. Our goal was to facilitate collaboration among communities in science and technology who may
often not interact and to increase the representation of women in computer science activities. To this
end, we created the women-led biodata science codeathon, an event with all-female organization and
leadership and where team projects were proposed, led, developed and presented by women. The
event itself was held May 8-10, 2019 on the National Institutes of Health main campus in Bethesda,
Maryland. We had forty-six women from 11 NIH institutes, 10 universities, two consulting firms, two
industrial companies, and a software company work together as teams on eight projects using cloud
infrastructure provided free of charge by the National Center for Biotechnology Information. The majority
of our participants were first time hackathoners and many of them cited the fact that this event was
women-led as the reason for their interest. The event was so successful that several teams continue to
collaborate on their codeathon projects, through on-going analysis, writing manuscripts, and working on
posters for upcoming conferences. Many women were asking for another iteration of the event before it
had even finished. The 2nd annual women-led BioData Science Codeathon at NIH will take place in the
fall of 2020. We are continuing to empower diverse coding, science and technology groups with the
goal of creating more codeathons and other data and computational events that will encourage data
democratization for all.

Monday July 20, 2020 11:30 - 11:35 EDT
BOSC

11:30 EDT

Custos: Enabling User Authentication via External Institutional Identities 🌀
➞ Abstract

Juleen Graham 1, Dannon Baker 1, Isuru Ranawaka 2, Alexandru Mahmoud 1, Terry Fleury 3, Suresh Marru 2, Marlon Pierce 2, Enis Afgan 1

  1. Johns Hopkins University, Baltimore, MD, USA
  2. Indiana University, Bloomington, IN, USA
  3. University of Illinois, Urbana, IL, USA

The presenter(s) will be available for live Q&A at the end of this session (BCC West)

Speakers

Juleen Graham

Johns Hopkins University



Monday July 20, 2020 11:30 - 11:35 EDT
Galaxy

11:35 EDT

CyVerse Learning Institute’s foundational open science skills workshop 🍐
Abstract


The presenter(s) will be available for live Q&A in this session (BCC West).

Tyson Lee Swetnam

University of Arizona, Tucson AZ. Email: tswetnam@arizona.edu

Project Website: https://learning.cyverse.org/projects/foss-2020/en/latest/
Source Code: https://github.com/CyVerse-learning-materials/foss-2020
License: CC BY 4.0

Abstract
CyVerse is a research cyberinfrastructure funded by the National Science Foundation’s Directorate for Biological Sciences. CyVerse provides life scientists with computational infrastructure to handle big datasets and complex analyses, thus enabling data-driven discovery. Principal investigators have reported that access to computing resources is not the bottleneck to data-driven discovery; rather, the requisite skills in utilizing cyberinfrastructure and access to training are the most limiting. Our “Foundational Open Science Skills (FOSS)” workshop was designed as a weeklong, camp-style training to address these problems. The focus of FOSS is on computational research strategies, full lifecycle data management, the FAIR data principles, collaboration skills, and using open-source software. FOSS prepares researchers to meet the growing expectations of funding agencies, publishers, and research institutions for scientific reproducibility, data accessibility, and advanced analytics. In this talk, I will discuss our lessons learned, how participants become familiar with productivity software for organizing their data science lab group, communications, and research; and how we approach teaching computational skills from laptop to cloud and high-performance computing (HPC) systems. In the last twelve months, FOSS has been taught twice to over forty early career researchers. Participants have gone on to begin tenure-track positions, conduct funded research, write new proposals utilizing FOSS techniques, and win competitive grant awards. To contribute back to the community, we have placed our training materials online in GitHub in ReadTheDocs format, where anyone can learn from them or contribute back to the project.

Speakers

Tyson Swetnam

Research Assistant Professor, University of Arizona
I work for CyVerse.org. Lately, I've been developing containerized workflows for use in cyberinfrastructure in life and earth science.  If you're interested in foundational open science skills or learning more about using free research computing come talk to me!



Monday July 20, 2020 11:35 - 11:40 EDT
BOSC

11:35 EDT

On-demand Galaxy with Laniakea: results and future perspectives 🌀
➞ Abstract

Tangaro Marco Antonio 1, Donvito Giacinto 2, Antonacci Marica 2, Chiara Matteo 3, Mandreoli Pietro 3, Alverà Martina 3, Pesole Graziano 1,4, Zambelli Federico 1,3

  1. Institute of Biomembranes, Bioenergetics and Molecular Biotechnologies (CNR), Bari, Italy
  2. National Institute for Nuclear Physics, Bari Section, Italy
  3. Dept. of Biosciences, University of Milan, Italy
  4. Dept. of Biosciences, Biotechnologies and Pharmacological Sciences, University of Bari, Italy

The presenter(s) will be available for live Q&A at the end of this session (BCC West).

Speakers

Marco Antonio Tangaro

Institute of Biomembranes, Bioenergetics and Molecular Biotechnologies - National Research Council, Bari, Italy



Monday July 20, 2020 11:35 - 11:40 EDT
Galaxy

11:40 EDT

HiSCiAp and Human Cell Atlas Galaxy instance: User-friendly, scalable tools and workflows for single-cell analysis 🌀
➞ Abstract

Moreno, P. 1, Huang, N. 1,2, Manning, J.R. 1, Mohammed S. 1, Solovyev A. 1, Polanski, K. 2, Chazarra, R. 1, Talavera-López, C. 1,2, Doyle, M. 3,4, Marnier, G. 1, Grüning, B. 5, Rasche, H. 5, Miao, C. 1, Bacon, W. 1, Perez-Riverol, Y. 1, Haeussler, M. 6, Brazma, A. 1, Meyer, K.B. 2, Teichmann, S. 2, Papatheodorou, I. 1

  1. EMBL-EBI
  2. Wellcome Sanger Institute
  3. Research Computing Facility, Peter MacCallum Cancer Centre, Melbourne, Victoria 3000, Australia
  4. Sir Peter MacCallum Department of Oncology, The University of Melbourne, Victoria 3010, Australia
  5. U. of Freiburg
  6. Genomics Institute, University of California at Santa Cruz, 1156 High Street, Santa Cruz 95064, USA

The presenter(s) will be available for live Q&A at the end of this session (BCC West).

Speakers

Pablo Moreno

EMBL-EBI European Bioinformatics Institute



Monday July 20, 2020 11:40 - 11:55 EDT
Galaxy

11:40 EDT

Open Life Science: Empowering early career researchers to become open science leaders 🍐
→ Abstract


The presenter(s) will be available for live Q&A in this session (BCC West).

By the Open Life Science team members: Bérénice Batut, Malvika Sharan, Yo Yehudi

Project Website: http://openlifesci.org/
Source Code: https://github.com/open-life-science/open-life-science.github.io 
License: CC-BY for all training and mentoring materials, CC-BY-SA for the website content

Motivation: As scientists, we are provided training and guidance in how to conduct research in the lab, design algorithms, analyse data and publish them. However, scientists are rarely expected to apply important skills such as open science principles for tooling and road mapping their projects, planning reproducible workflows, involving others in their work, and leading an inclusive community. Modern bioinformatics communities stand at the interface of computational and biological research. This interdisciplinary position requires us to develop collaborative projects by implementing such “open by design” principles in our research projects systematically -- skills that aren’t necessarily taught at university or graduate school level.

About the project: Open Life Science (OLS) is a volunteer-driven training and mentoring program aimed at empowering early career researchers and potential academic leaders to become open science ambassadors. Participants join OLS with a proposal to work on an open science project and attend a series of one-on-one mentoring calls over 16 weeks, alternating with full cohort calls that provide training on specific open science and leadership skills. OLS’s work is underpinned by a community of over 50 mentors and expert guest speakers.

Cohort calls cover a broad spectrum of topics relevant to leading an open project, ranging from open science topics, community building, and project and contribution management of GitHub repositories to caring both for yourself and others in your community. Calls are designed to be interactive and engaging, utilising a mix of Zoom’s break-out room features to facilitate group discussion, collaborative document editing, and guest speakers from academia and industry giving short talks. The program is modelled on the exact principles we teach, and hence, all materials, including syllabus, call notes, and slides, are shared under the CC-BY licence. Cohort calls are recorded and shared openly on YouTube. Third-party organisations and individuals are encouraged to fork, remix and re-use materials.

Overview of the first round: OLS’s first cohort (OLS-1), known as “Open Seeds”, was conducted from January 2020 until May 2020 with 29 project leaders working on 20 projects. Project leaders came from around the world, including the Netherlands, Spain, Norway, Japan, India, Nepal, Thailand, Kenya, Brazil, Russia, Canada, the United Kingdom, and the United States. At the end of the program, the project leaders graduate by presenting their work, sharing their mentorship experience and discussing their future plans on publicly live-streamed video calls.

In this talk, we will report important observations and outcomes from running the first cohort of our mentoring and training program. At the time of writing, OLS-1 is in final stages of wrap-up and graduation, and we aim to open applications for OLS-2 in May 2020. We will also welcome new mentors and experts, including the project leaders from OLS-1, who will be encouraged to return to join the mentor and expert teams for OLS-2.

Speakers

Bérénice Batut

Post-doc, University of Freiburg

Yo Yehudi

Software Developer, University of Cambridge & Open Life Science
Integrated genomic data (InterMine)

Malvika Sharan

Senior Researcher, The Alan Turing Institute
I am a senior researcher for the Tools, Practices and Systems research programme at The Alan Turing Institute, London. With a focus on Open Research, I lead a team of community managers and co-lead The Turing Way project that aims to make data science reproducible, collaborative...



Monday July 20, 2020 11:40 - 11:55 EDT
BOSC

11:55 EDT

Q & A 🌀
Question and Answer session for the just finished talks.

Monday July 20, 2020 11:55 - 12:00 EDT
Galaxy

11:55 EDT

Q&A 🍐
The presenter(s) will be available for live Q&A in this session.

Moderators

Yo Yehudi

Software Developer, University of Cambridge & Open Life Science
Integrated genomic data (InterMine)

Monday July 20, 2020 11:55 - 12:00 EDT
BOSC

12:00 EDT

Break!
Take a break!  Check your email, grab some food, acknowledge your family and pets, ...

Just make sure you are back in 15 minutes.

Monday July 20, 2020 12:00 - 12:15 EDT
Joint

12:15 EDT

Sponsor Session West
Learn more about BCC2020 Sponsors. Sponsors make this event possible and affordable, and are potential partners for your research.

Moderators

Dave Clements

Training and Outreach Coordinator, Galaxy Project, Johns Hopkins University

Monday July 20, 2020 12:15 - 13:00 EDT
Joint
  Meeting-West

12:20 EDT

AWS Gold Sponsor Talk: Scalable genomics data analysis in the cloud
Slides

Abstract:
The amount of raw genomics data is continuously growing, with some estimating that the amount of data worldwide is on the order of exabytes. Processing such mountains of FASTQs into science-ready formats like VCFs, expression matrices, etc. is no trivial task and requires workflow architectures that can scale in both performance and cost efficiency. The cloud offers practically unlimited compute capacity, elasticity, and flexibility to process enormous amounts of genomics data cost-effectively and on demand. In this talk, we’ll highlight the core patterns, architectures, and tooling used by many genomics customers who are leveraging the cloud to tackle their biggest genomics data processing challenges.

Sponsorship:
Amazon Web Services is a Gold Level sponsor of BCC2020.  Lee Pang is also giving this talk in BCC East. AWS is used in the research behind several presentations at BCC2020.

Speakers

Lee Pang

Amazon Web Services
Lee is a Principal Bioinformatics Architect with the Health AI services team at AWS. He has a PhD in Bioengineering and over a decade of hands-on experience as a practicing research scientist and software engineer in bioinformatics, computational systems biology, and data science developing tools ranging from high throughput pipelines for *omics data processing...




Monday July 20, 2020 12:20 - 12:40 EDT
Joint

12:40 EDT

Software Sustainability Institute Gold Sponsor Talk: Most code is cr#p and that’s okay
Abstract
The COVID-19 pandemic has shone a spotlight on research software, with everyone from scientists to sceptics scrutinising the codes and models used to inform policy and interventions. This has highlighted a gap in expectations of the quality of this software – but what is the reality and what is good enough? What should we strive for as the people using software to power our research?

In this talk, I’ll discuss some of the challenges of developing reusable research software, and why collaboration and openness are the strongest tools to improve the quality of your code.

Sponsorship:
The Software Sustainability Institute has been working for a decade in this area, encouraging and enabling researchers and research software engineers to live up to our slogan of “better software, better research”.  The Software Sustainability Institute is a Gold Level sponsor of BCC2020.


Speakers

Neil Chue Hong

Director, Software Sustainability Institute, University of Edinburgh
Neil Chue Hong is the founding Director and PI of the Software Sustainability Institute and a Senior Research Fellow at EPCC, based at the University of Edinburgh. He graduated with an MPhys in Computational Physics, also from the University of Edinburgh. He completed an internship...



Monday July 20, 2020 12:40 - 13:00 EDT
Joint

13:00 EDT

eLife Innovation Sponsor Table
eLife works to improve research communication through open science and open technology innovation.

eLife is a non-profit organisation inspired by research funders and led by scientists. Our mission is to help scientists accelerate discovery by operating a platform for research communication that encourages and recognises the most responsible behaviours in science.

eLife sponsored childcare at the 2018 joint conference, and again at the 2019 Galaxy Conference. This year eLife is sponsoring closed captioning for conference talks.

Please stop by and learn more about eLife. We are located on the first floor of the Poster / Demo building.

Speakers

Emmy Tsang

Innovation Community Manager, Delft University of Technology



Monday July 20, 2020 13:00 - 13:45 EDT
Joint

13:00 EDT

GigaScience Sponsor Table
GigaScience is an online open access, open data, open peer-review journal published by Oxford University Press and BGI. The journal offers ‘big data’ research from the life and biomedical sciences, and on top of 'Omics research includes the growing range of work that uses difficult-to-access large-scale data, such as imaging, neuroscience, ecology, systems biology, and other new types of shareable data. GigaScience is unique in the publishing industry as it publishes all research objects (data, software tools, source code, workflows, containers and other elements related to the work underpinning the findings in the article). Promoting Open Science, all published software needs to be under an OSI-license, all supporting data must be available and open, and all peer review is carried out transparently. Presenting workflows via our GigaGalaxy.net server, novel work presented at the meeting utilising Galaxy is eligible to a 15% APC if it is submitted to our Galaxy series.

Please stop by and learn more about GigaScience. We are located on the first floor of the Poster / Demo building.

Speakers

Ken Cho

Systems Programmer Analyst, GigaScience

Scott Edmunds

Editor in Chief, GigaScience Press/BGI Hong Kong
Scott Edmunds is the Editor in Chief of GigaScience Press. With over 15 years experience in Open Access and Open Data publishing he is co-founder of CivicSight (formerly Open Data Hong Kong) and CitizenScience.Asia, and is on the Board of Directors of the Dryad Digital Repository...

Laurie Goodman

Publishing Director, GigaScience Press
Laurie Goodman, PhD, is the Publishing Director for GigaScience Press, which publishes the international, open-science journals GigaScience and GigaByte. Both journals have won awards for Innovation in publishing. Dr. Goodman received her BS and MS from Stanford University in 1986...



Monday July 20, 2020 13:00 - 13:45 EDT
Joint

13:00 EDT

P1-05: Automated generation of training materials from markdown documents 🌀
Abstract

This poster will be presented live at BCC West.

Speakers

Delphine Lariviere

Penn State University
Post-doc in the Galaxy Team (Nekrutenko Lab). Works on bacterial genomics, assembly, RNA Seq, TnSeq. Also interested in evolution, metagenomics, epigenetics and visualisation.



Monday July 20, 2020 13:00 - 13:45 EDT
Joint

13:00 EDT

P1-06: Automated real-time data analysis and visualizations for the SARS-CoV-2/Covid19 portal 🌀
Abstract

This poster will be presented live at BCC West.

Speakers

Marius van den Beek

Penn State University



Monday July 20, 2020 13:00 - 13:45 EDT
Joint

13:00 EDT

P1-09: BioViz Connect: Web application linking CyVerse cloud resources to genomic visualization in the Integrated Genome Browser 🍐
➞ Abstract

This poster will be presented live at BCC West.

Advances in high throughput sequencing have increased the need for tools that aid in data storage, analysis, annotation, and visualization. Many such tools are available, but their usability and accessibility vary. To make essential tools more accessible, the bioinformatics community has coalesced around the idea of using cloud-based platforms to provide access to computational power and data storage resources. CyVerse is a multi-institution project focused on supporting life science research by providing user-friendly access to national cyberinfrastructure resources, including HPC clusters and storage infrastructure. As part of this effort, CyVerse developers built the Terrain Application Programmer Interfaces (APIs), which offer programmatic access to these resources.

One important limitation of the CyVerse ecosystem, however, is that there is currently no easy way for researchers to visualize genomic data sets stored in CyVerse accounts. This is problematic because visualization is essential for all aspects of data analysis, from validating the output of algorithms to detecting biologically meaningful patterns in data.

BioViz Connect solves this problem by connecting CyVerse resources to Integrated Genome Browser, a full-featured, open source, visualization tool for genomics used by thousands of researchers worldwide. BioViz Connect uses Terrain APIs to forward data from CyVerse into IGB. The BioViz Connect interface (Figure 1) lets users annotate data files with key meta-data, notably the version of reference genomes used in data analysis. Users can also run compute-intensive visual analytics tasks and then display the results in IGB. To our knowledge, no other group has yet experimented with using Terrain for application development outside of the CyVerse team.


Speakers

Nowlan Freese

Research Associate, UNC Charlotte



Monday July 20, 2020 13:00 - 13:45 EDT
Joint

13:00 EDT

P1-12: Codeathons as a tool for improving diversity in computer science 🍐
➞ Abstract

This poster will be presented live at BCC West.

Monday July 20, 2020 13:00 - 13:45 EDT
Joint

13:00 EDT

P2-01: Community genome annotation integrates with Galaxy via Apollo providing greater integration and more functional annotation options 🌀
Abstract

This poster will be presented live at BCC West.

Speakers

Nathan Dunn

Software Developer, Lawrence Berkeley National Lab



Monday July 20, 2020 13:00 - 13:45 EDT
Joint

13:00 EDT

P2-05: CrowdGO: a wisdom of the crowd-based Gene Ontology prediction tool 🍐
➞ Abstract

This poster will be presented live at BCC West.

Speakers

Maarten Reijnders

Department of Ecology and Evolution, University of Lausanne


Monday July 20, 2020 13:00 - 13:45 EDT
Joint

13:00 EDT

P3-03: EDAM: the ontology of bioinformatics operations, topics, data, and formats (update 2020) 🍐
➞ Abstract

This poster will be presented live at BCC West on Monday, and East on Tuesday.


Matúš Kalaš 1, Hervé Ménager 2, Alban Gaignard 3, Veit Schwämmle 4, Jon Ison 5, and the EDAM contributors and advisors

1. University of Bergen, Norway
2. Institut Pasteur, Paris, France
3. University of Nantes, France
4. University of Southern Denmark, Odense, Denmark
5. French Institute of Bioinformatics (ELIXIR France)

Project website: https://edamontology.org
Source code: https://github.com/edamontology/edamontology
License: CC BY-SA 4.0

EDAM is an ontology of well-established, familiar concepts that are prevalent within bioinformatics, and bioscientific data analysis in general [1,2]. The scope of EDAM includes types of data and data identifiers, data formats, operations, and topics. EDAM has a relatively simple structure, and comprises a set of concepts with terms, synonyms, definitions, relations, links, persistent identifiers, and some additional information (especially for data formats).

EDAM is developed in a participatory and transparent fashion, within a growing international community of contributors. The development of EDAM is coordinated with the development and curation of tools registries (e.g. bio.tools and BIII.eu); registries of training materials (e.g. TeSS); with packaging of open-source bioinformatics software (especially Debian Med [3]); the Common Workflow Language [4]; and other related communities and initiatives. These include the developers’ community of Galaxy [5], and collaborations with specialised networks of experts, such as within the development of EDAM-bioimaging [6]. EDAM-bioimaging is an extension of EDAM towards bioimage informatics and machine learning, where a broad group of experts in bioimaging, image analysis, and deep learning has been contributing to the common effort. The comprehensive but concise inclusion of machine learning topics is one of the new additions in 2020. The latest release of EDAM at the time of publication was version 1.24 [7], and EDAM-bioimaging version alpha06 [8].

In summary, EDAM functions as a common controlled vocabulary when publishing, sharing, and integrating information about bioinformatics tools, workflows, training materials, and other resources. In addition, EDAM is also useful when choosing terminology, for data provenance, and in text mining (e.g. EDAMmap).

Poster published in F1000Research on 6 Jun 2020. https://doi.org/10.7490/f1000research.1117983.1
Video presentation: https://youtu.be/Jq16bnq8kbk


Speakers

Matúš Kalas

Senior Engineer, University of Bergen
Attending GCC2022 virtually 🌌 Working on open science|source|society|education, EDAM ontology, ELIXIR Norway, Bio.tools, ...



Monday July 20, 2020 13:00 - 13:45 EDT
Joint

13:00 EDT

P3-06: eSPiGA: a population genomic analyses package with graphical interface 🍐
➞ Abstract

This poster will be presented live at BCC West.


Monday July 20, 2020 13:00 - 13:45 EDT
Joint

13:00 EDT

P4-01: Functionally Assembled Terrestrial Ecosystem Simulator (FATES) with Community Land Model in Galaxy 🌀
Abstract

This poster will be presented live at BCC West.

Speakers

Anne Fouilloux

Research Software Engineer, University of Oslo
I am working on Galaxy Climate (development of tools, integration of climate data, training material).

Hui Tang

University of Oslo, Department of Geosciences

Sonya Geange

Department of Biological Sciences, University of Bergen, Norway



Monday July 20, 2020 13:00 - 13:45 EDT
Joint

13:00 EDT

P4-05: Goslin - A grammar of succinct lipid nomenclature 🍐
➞ Abstract

This poster will be presented live at BCC West.

Speakers

Nils Hoffmann

Leibniz-Institut für Analytische Wissenschaften – ISAS – e.V.


Monday July 20, 2020 13:00 - 13:45 EDT
Joint

13:00 EDT

P4-09: HiSCiAp and Human Cell Atlas Galaxy instance: User-friendly, scalable tools and workflows for single-cell analysis 🌀
Abstract

This poster will be presented live at BCC West.

Speakers
Pablo Moreno

EMBL-EBI European Bioinformatics Institute



Monday July 20, 2020 13:00 - 13:45 EDT
Joint

13:00 EDT

P4-11: Implementation of the IEEE-2791-2020 standard (BioCompute Objects) in Galaxy via workflow invocations 🌀
Abstract

This poster will be presented live at BCC West.

Speakers
Charles Hadley King

Senior Research Associate, George Washington University



Monday July 20, 2020 13:00 - 13:45 EDT
Joint

13:00 EDT

P5-01: Integrating refgenie and Galaxy for reference data management: a proposal for IDC 🌀
Abstract

This poster will be presented live at BCC West.

Speakers
Ignacio Eguinoa

ELIXIR Belgium - VIB Center for Plant Systems Biology



Monday July 20, 2020 13:00 - 13:45 EDT
Joint

13:00 EDT

P5-03: Jasmine: Fast and accurate structural variant comparison across many individuals 🍐
➞ Abstract

This poster will be presented live at BCC West.

Speakers

Monday July 20, 2020 13:00 - 13:45 EDT
Joint

13:00 EDT

P5-08: OpenBio.eu: An extrovert bioinformatics research object repository and workflow management system 🍐
➞ Abstract

This poster will be presented live at BCC West.


Monday July 20, 2020 13:00 - 13:45 EDT
Joint

13:00 EDT

P6-06: sRNAflow: a tool for analysis of small RNA-seq data 🍐
➞ Abstract

This poster will be presented live at BCC West.

Speakers

Monday July 20, 2020 13:00 - 13:45 EDT
Joint

13:00 EDT

P6-10: The cloud-native Galaxy: Galaxy on Kubernetes 🌀
Abstract

This poster will be presented live at BCC West.

Speakers
Nuwan Goonasekera

University of Melbourne

The Other Enis Afgan

Research scientist, Johns Hopkins University

Alexandru Mahmoud

Galaxy Team, Johns Hopkins University



Monday July 20, 2020 13:00 - 13:45 EDT
Joint

13:00 EDT

P7-01: ViPRA-Haplo: de novo reconstruction of viral populations using paired-end sequencing data 🍐
➞ Abstract

This poster will be presented live at BCC West.

Viruses replicating within a host exist as a collection of closely related genetic variants known as viral haplotypes. The diversity in a viral population, or quasispecies, arises from mutations (insertions, deletions, or substitutions) and recombination events that occur during virus replication. These haplotypes differ in relative frequency and together play an important role in the fitness and evolution of the viral population. This variation in viral sequences poses a challenge to vaccine design and drug development. We present ViPRA-Haplo, a de novo assembly algorithm for reconstructing viral haplotypes in a virus population from paired-end next-generation sequencing (NGS) data. The proposed Viral Path Reconstruction Algorithm (ViPRA) uses the pairing information of reads to generate a subset of paths from a De Bruijn graph of the reads. These paths represent contigs of the virus, and they over-estimate the set of possible contigs. We therefore propose two methods to obtain an optimal set of contigs representing the viral haplotypes. The first uses VSEARCH to cluster the paths reconstructed by ViPRA; the centroid of each cluster represents a contig. The second, MLEHaplo, generates a maximum likelihood estimate of the viral population from the ViPRA paths. We evaluate and compare ViPRA-Haplo on a simulated data set, on a real HIV MiSeq data set (SRR961514) with sequencing errors, and on a real data set from the emerging SARS-CoV-2 virus (SRR10903401). On the simulated data, ViPRA-Haplo reconstructs full-length viral haplotypes with 99.7% sequence identity to the true viral haplotypes at 250x sequencing coverage. On the real NGS data, the error correction software Karect is used to improve de novo assembly. The real HIV data set contains 714,994 pairs (2x250 bp) of reads that cover the five strains to 20,000x. Our method reconstructs contigs that cover over 90% of each strain of the reference genomes, which is higher than the benchmark tool PEHaplo. On the SARS-CoV-2 data, after filtering for SARS-CoV-2 contigs using the metagenomic classifier Centrifuge, the contigs reconstructed by our method cover over 99% of the reference genome. The comparisons on both simulated and real data show that ViPRA-Haplo outperforms the existing tools, both in coverage of the reference genome(s) and in retaining the variation naturally present in the viral population.
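
For readers unfamiliar with the underlying data structure, the following is a minimal illustrative sketch of a De Bruijn graph built from reads, with its paths spelled out as candidate contigs. It is not the ViPRA algorithm: it ignores read pairing, sequencing errors, and coverage, and the function names (de_bruijn_edges, spell_paths) and the toy reads are assumptions made only for illustration.

```python
from collections import defaultdict


def de_bruijn_edges(reads, k):
    """Build a De Bruijn graph: each k-mer links its (k-1)-mer prefix to its suffix."""
    graph = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].add(kmer[1:])
    # Sorted lists make the structure deterministic and easy to inspect.
    return {node: sorted(successors) for node, successors in graph.items()}


def spell_paths(graph, start):
    """Enumerate paths from `start` that never revisit a node, spelled as sequences."""
    paths = []
    stack = [(start, [start])]
    while stack:
        node, path = stack.pop()
        successors = [n for n in graph.get(node, []) if n not in path]
        if not successors:
            # Dead end: the contig is the first node plus the last base of each later node.
            paths.append(path[0] + "".join(n[-1] for n in path[1:]))
        for nxt in successors:
            stack.append((nxt, path + [nxt]))
    return sorted(paths)


# Two toy reads (hypothetical) that differ at a single position.
reads = ["ACGTA", "ACGGA"]
graph = de_bruijn_edges(reads, k=3)
candidate_contigs = spell_paths(graph, "AC")
print(candidate_contigs)
```

In this toy case the single differing base creates a branch at node CG, so the graph yields one path per variant. Real data produces far more paths than true haplotypes, which is why the abstract describes a subsequent clustering or maximum likelihood step.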


Speakers
Weiling Li

postdoc, Indiana University - Bloomington



Monday July 20, 2020 13:00 - 13:45 EDT
Joint