Abstract

The presenter(s) will be available for live Q&A in this session (BCC West).
Asis Hallab 1, Verónica Suaste 2, Francisco Ramírez 2, Constantin Eiteneuer 1, Thomas Voecking 1, Alicia Mastretta-Yanes 2
1 Jülich Research Center, Germany. Email: asis.hallab@gmail.com
2 CONABIO, Mexico.
Project Website: https://sciencedb.github.io/
Source Code: https://github.com/ScienceDb
License: GPL-3
The need for a feature-complete, flexible management suite capable of handling big distributed data

In the life sciences, data are often diverse, interdisciplinary, and stored at different sites. The reproducibility crisis has long been recognized: in the US alone, an annual loss of 28 billion dollars has been attributed to research funding spent on projects that yielded irreproducible results (doi.org/10.1371/journal.pbio.1002165). The identified causes are diverse but regularly include insufficient data management. Data should be findable, accessible, interoperable, and reusable (FAIR), and a concise data management plan is key to receiving funding and to publication. The problem is that creating a suitable data management platform is a considerable software engineering task in itself, more so for diverse big data, and even more so if several distributed data warehouses are to be integrated. Efficient and reliable data management thus often lacks an ideal solution, because research groups need to do science, not data warehouse software engineering.
Solution: Have software build your data warehouse for you

We present Cenzontle, a set of software generators that create your custom data warehouse automatically. Define your data models in standard JSON and get a fully functional warehouse with no or minimal coding effort. The warehouse comprises two interfaces. A graphical, browser-based interface follows Google's Material Design standards and thus has both a professional look and intuitive handling; no documentation is needed to use it. Custom visualizations built with Plotly can be integrated to help scientists explore the data and form hypotheses. A programmatic interface (API) allows data scientists to build exhaustive queries, execute them efficiently, and thus feed data directly into their analysis pipelines from any programming language. A full-featured IDE helps with query building and provides complete, searchable documentation. Standard "CRUD" access functions are offered for all data models: data can be created, also en masse by uploading tables; read, searched, sorted, and paginated into bite-sized subsets; and, of course, records can be updated and deleted. Most importantly, different data storages can be incorporated: use any number of databases and servers, and relations between records, even across different servers, are supported. Security is enforced through standard authentication and role-based authorization, verified on each standard access function.
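As an illustration of the JSON model definitions the generators consume, a minimal sketch is shown below. The model name, attribute names, and storage targets are hypothetical, and the exact schema keys may differ in detail from Cenzontle's actual definition format:

```json
{
  "model": "plant_sample",
  "storageType": "sql",
  "attributes": {
    "sample_name": "String",
    "collection_date": "Date",
    "latitude": "Float",
    "longitude": "Float"
  },
  "associations": {
    "taxon": {
      "type": "many_to_one",
      "target": "taxon",
      "targetKey": "taxon_id",
      "targetStorageType": "sql"
    }
  }
}
```

From a definition like this, the generators would produce the full set of standard CRUD access functions (create, read with search/sort/pagination, update, delete) in both the browser interface and the API, with the declared association resolved even when the two models live in different data storages.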