BCC2020 has ended
Registration closed July 15.

BCC2020 is online, global, and affordable. The meeting and training are now done, and the CoFest is under way.

The 2020 Bioinformatics Community Conference brings together the Bioinformatics Open Source Conference (BOSC) and the Galaxy Community Conference into a single event featuring training, a meeting, and a CollaborationFest. Events run from July 17 through July 25 and are held in both the eastern and western hemispheres.

Monday, July 20 • 00:35 - 00:40
Don’t worry about data management - use Cenzontle 🍐




Abstract

The presenter(s) will be available for live Q&A in this session (BCC West).

Asis Hallab (1), Verónica Suaste (2), Francisco Ramírez (2), Constantin Eiteneuer (1), Thomas Voecking (1), Alicia Mastretta-Yanes (2)

(1) Jülich Research Center, Germany. Email: asis.hallab@gmail.com
(2) CONABIO, Mexico.

Project Website: https://sciencedb.github.io/ 
Source Code: https://github.com/ScienceDb
License: GPL-3

The need for a feature-complete, flexible management suite capable of handling big distributed data
In the life sciences, data are often diverse, interdisciplinary, and stored at different sites. The reproducibility crisis has long been recognized. In the US alone, an annual loss of 28 billion dollars has been attributed to research funding spent on projects that yielded irreproducible results (doi.org/10.1371/journal.pbio.1002165). The identified causes are diverse but regularly include insufficient data management. Data should be findable, accessible, interoperable, and reusable (FAIR), and a concise data management plan is key to receiving funding and getting published. The problem is that creating a suitable data management platform is a considerable software engineering task in itself, more so for diverse big data, and even more so if several distributed data warehouses are to be integrated. Efficient and reliable data management often has no ideal solution, because research groups need to do science, not data warehouse software engineering.

Solution: Have software build your data administration warehouse for you
We present Cenzontle, a set of software generators that automatically create a custom data warehouse for you. Define your data formats in standard JSON and get a fully functional warehouse with little to no coding effort. The warehouse comprises two interfaces. The first is a graphical, browser-based interface that follows Google’s Material Design standards and thus has both a professional look and intuitive handling. No documentation is needed to use it. Custom visualizations built with Plotly can be integrated to help scientists explore the data and form hypotheses. The second is a programmatic interface (API) that allows data scientists to build exhaustive queries, execute them efficiently, and thus feed data directly into their analysis pipelines from any programming language. A feature-rich IDE helps with query building and includes complete, searchable documentation. Standard “CRUD” access functions are offered for all data models. Data can be created, also en masse by uploading tables. They can be read, searched, sorted, and split into bite-sized subsets. Records can, of course, be updated and deleted. Most importantly, different data stores can be incorporated: use any number of databases and servers you like. Relations between records, even across different servers, are supported. Security is enforced through standard authentication and role-based authorization, verified on each standard access function.
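
To make the idea concrete, here is a minimal Python sketch of what "define a model in JSON, get CRUD functions with a role check on each one" could look like. It is not Cenzontle's code or its actual model format: the JSON field names, the `Warehouse` class, the `PERMISSIONS` table, and the role names are all assumptions invented for illustration, and an in-memory dictionary stands in for a real database.

```python
import json
from dataclasses import dataclass, field

# Hypothetical model definition; field names are illustrative only and do
# not reproduce Cenzontle's actual JSON schema.
MODEL_JSON = """
{
  "model": "plant",
  "attributes": {"id": "int", "species": "str", "height_cm": "float"}
}
"""

TYPES = {"int": int, "str": str, "float": float}

# Hypothetical role table: which role may call which access function.
PERMISSIONS = {
    "reader": {"read"},
    "editor": {"create", "read", "update", "delete"},
}


@dataclass
class Warehouse:
    """In-memory stand-in for a warehouse derived from one model definition."""
    definition: dict
    records: dict = field(default_factory=dict)

    def _authorize(self, role, action):
        # The role is checked on every access function.
        if action not in PERMISSIONS.get(role, set()):
            raise PermissionError(f"role {role!r} may not {action}")

    def _validate(self, record):
        # Records must match the attribute names and types of the model.
        attrs = self.definition["attributes"]
        if set(record) != set(attrs):
            raise ValueError(f"expected fields {sorted(attrs)}")
        for name, type_name in attrs.items():
            if not isinstance(record[name], TYPES[type_name]):
                raise TypeError(f"{name} must be {type_name}")

    def create(self, role, record):
        self._authorize(role, "create")
        self._validate(record)
        if record["id"] in self.records:
            raise KeyError(f"id {record['id']} already exists")
        self.records[record["id"]] = dict(record)

    def read(self, role, sort_by="id", offset=0, limit=10):
        # Sorted, paginated read: returns one small subset at a time.
        self._authorize(role, "read")
        rows = sorted(self.records.values(), key=lambda r: r[sort_by])
        return [dict(r) for r in rows[offset:offset + limit]]

    def update(self, role, record_id, **changes):
        self._authorize(role, "update")
        if "id" in changes:
            raise ValueError("the id of a record cannot be changed")
        updated = {**self.records[record_id], **changes}
        self._validate(updated)
        self.records[record_id] = updated

    def delete(self, role, record_id):
        self._authorize(role, "delete")
        del self.records[record_id]


warehouse = Warehouse(json.loads(MODEL_JSON))
samples = [("Zea mays", 210.0), ("Agave", 95.5), ("Opuntia", 130.0)]
for i, (species, height) in enumerate(samples, start=1):
    warehouse.create("editor", {"id": i, "species": species, "height_cm": height})

# First page of two records, sorted by height.
page = warehouse.read("reader", sort_by="height_cm", limit=2)
print([row["species"] for row in page])  # ['Agave', 'Opuntia']
```

In this sketch a call such as `warehouse.delete("reader", 1)` raises `PermissionError`, because the check runs inside each access function rather than once at login. A generated warehouse would presumably route the same operations to one or more real databases.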

Speakers

Asis Hallab

Jülich Research Center


Monday July 20, 2020 00:35 - 00:40 EDT
BOSC