
BCC2020 is online, global, and affordable. The meeting and training are now done, and the CoFest is under way.

The 2020 Bioinformatics Community Conference brings together the Bioinformatics Open Source Conference (BOSC) and the Galaxy Community Conference into a single event featuring training, a meeting, and a CollaborationFest. Events run from July 17 through July 25 and are held in both the eastern and western hemispheres.

Tuesday, July 21 • 10:30 - 10:45
CWL-Airflow: a lightweight pipeline manager supporting Common Workflow Language 🍐




Abstract


The presenter(s) will be available for live Q&A in this session (BCC West).

Michael Kotliar 1*, Andrey V. Kartashov 1, Artem Barski 1,2

1 Division of Allergy and Immunology, Cincinnati Children’s Hospital Medical Center and Department of Pediatrics, College of Medicine, University of Cincinnati, Cincinnati, OH, USA and
2 Division of Human Genetics, Cincinnati Children’s Hospital Medical Center and Department of Pediatrics, College of Medicine, University of Cincinnati, Cincinnati, OH, USA
*michael.kotliar@cchmc.org

Project Website: https://barski-lab.github.io/cwl-airflow/ 
Source Code: https://github.com/Barski-lab/cwl-airflow
License: Apache License 2.0

Modern biomedical research has seen a remarkable increase in the production and computational analysis of large datasets, leading to an urgent need to share standardized analytical techniques. However, of the more than 100 computational workflow systems used in research, most define their own specifications for computational pipelines. The Common Workflow Language (CWL) working group was formed to create a language for describing analysis workflows and tools in a way that makes them portable and scalable across a variety of software and hardware environments. Herein, we present CWL-Airflow, a package that adds support for CWL to the Apache Airflow pipeline manager. Adding CWL capability to Airflow has made it more convenient for scientific computing, in which users are more interested in the flow of data than in the tasks being executed. While Airflow defines workflows only as sequences of steps to be executed (i.e., DAGs), the CWL description of inputs and outputs leads to a better representation of data flow. This allows for a better understanding of data dependencies and produces more readable workflows.
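To illustrate how declared inputs and outputs can determine a step order, here is a minimal Python sketch (Python 3.9+ for `graphlib`). It is not CWL-Airflow's code: the `workflow_steps` dictionary is a hypothetical, simplified stand-in for a CWL workflow, and the step names and the helper functions `step_dependencies` and `execution_order` are assumptions made for this example.

```python
from graphlib import TopologicalSorter

# Hypothetical, simplified CWL-like workflow. Each step lists where its inputs
# come from: "step/output" refers to another step's output, while a bare name
# refers to a workflow-level input.
workflow_steps = {
    "call_peaks": {"in": ["sort/sorted_bam"], "out": ["peaks"]},
    "coverage": {"in": ["sort/sorted_bam"], "out": ["bigwig"]},
    "sort": {"in": ["align/bam"], "out": ["sorted_bam"]},
    "align": {"in": ["fastq_file", "index"], "out": ["bam"]},
}


def step_dependencies(steps):
    """Map each step to the set of upstream steps whose outputs it consumes."""
    deps = {}
    for name, step in steps.items():
        upstream = set()
        for source in step["in"]:
            if "/" in source:  # "producer_step/output_name"
                upstream.add(source.split("/", 1)[0])
        deps[name] = upstream
    return deps


def execution_order(steps):
    """Return one valid execution order derived purely from data dependencies."""
    return list(TopologicalSorter(step_dependencies(steps)).static_order())


order = execution_order(workflow_steps)
print(order)
```

The steps are deliberately listed out of order to show that the order comes from the data dependencies, not from the position of a step in the description. The steps `call_peaks` and `coverage` both depend only on `sort`, so either may run first or both may run in parallel.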

After CWL-Airflow was published in 2019, we introduced major architectural changes that make the program more suitable for large-scale data processing. The original approach of creating a separate CWLDAG-class instance on each new run was replaced by a more efficient one: triggering the same workflow with updated input parameters through the API server. Additionally, we added the Workflow Execution Service (WES) API as a standardized way to manage workflow execution programmatically. To run a CWL pipeline in Airflow, our package loads the CWL workflow descriptor file and creates a CWLDAG-class instance that reflects the CWL workflow structure. The step execution order is derived from step inputs and outputs, thereby implementing the dataflow principles and architecture that are missing in Airflow. For computationally intensive pipelines, Airflow can use the Celery task queue to distribute processing over multiple nodes. Celery helps not only to balance the load across machines but also to define task priorities by assigning tasks to separate queues.
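As a rough illustration of what a WES-style run submission carries, the Python sketch below assembles the form fields that the GA4GH WES specification defines for a run request (`workflow_url`, `workflow_type`, `workflow_type_version`, `workflow_params`). The abstract does not describe how CWL-Airflow exposes its API server, so the sketch only builds the fields and does not contact any server. The function name `build_wes_run_request`, the file names, and the job contents are hypothetical.

```python
import json


def build_wes_run_request(workflow_url, job, cwl_version="v1.1"):
    """Assemble WES run-request form fields for a CWL workflow.

    `job` is the CWL input object (the updated input parameters for this run).
    WES expects `workflow_params` to be a JSON-encoded string.
    """
    return {
        "workflow_url": workflow_url,
        "workflow_type": "CWL",
        "workflow_type_version": cwl_version,
        "workflow_params": json.dumps(job, sort_keys=True),
    }


# Hypothetical workflow file and inputs, for illustration only.
form = build_wes_run_request(
    "workflow.cwl",
    {"fastq_file": {"class": "File", "location": "reads.fastq"}},
)
print(form)
```

Re-running the same workflow on new data then amounts to building another request with a different `job`, which mirrors the "same workflow, updated input parameters" approach described above.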

Since the key promise of the CWL specification is the portability and reproducibility of analyses, the CWL-Airflow team took part in the Global Alliance for Genomics and Health (GA4GH) Workflow Execution Challenge both as a workflow author and as a participant. The results showed that CWL-Airflow complies with the CWL specification, supports portability, and performs analyses in a reproducible manner. CWL-Airflow leverages the benefits provided by Airflow, such as scaling and multi-platform support, a web-based GUI, workflow execution pools and queues, and simple installation and configuration. In summary, CWL-Airflow complies with the CWL v1.1 specification and gives users the ability to execute CWL workflows anywhere Airflow can run, from a laptop to a cluster or cloud environment.

Speakers

Michael Kotliar

Cincinnati Children's Hospital Medical Center



Tuesday July 21, 2020 10:30 - 10:45 EDT
BOSC