The data scientist will be associated with the research group of Wolfgang Huber (www.huber.embl.org) and will be responsible for the data and information flow within the whole collaboration. The collaboration also includes the labs of Michael Boutros (https://www.dkfz.de/en/signaling) at the German Cancer Research Centre (DKFZ), Jan Lohmann at the University of Heidelberg, and Oliver Stegle (https://www.dkfz.de/en/bioinformatik-genomik-systemgenetik) at EMBL and DKFZ. In this highly interdisciplinary setting, closely connected to a unique data generation effort, he/she will have the opportunity to develop innovative workflows for large-scale data analysis and scientific exploitation.
About EMBL
The European Molecular Biology Laboratory (EMBL) is an international scientific research organisation with sites in Heidelberg, Cambridge, Barcelona, Grenoble, Hamburg and Rome. The research group of Wolfgang Huber develops computational and statistical methods for emerging experimental technologies aimed at fundamental biological discovery and biomedical and biotechnological translation.
Responsibilities of the Data Scientist
The data manager position is analogous to that of a scientific project manager, but with a focus on data resources and data flows. Your tasks include:
- Establish and maintain the project’s data repository, comprising all relevant resources (including PB-scale high-throughput sequencing and imaging raw data, metadata, processed data tables, analysis results and analysis software code), within the EMBL and DKFZ private cloud computing and object storage infrastructure;
- Establish and maintain a data portal that includes records of data provenance, versioning and dependencies;
- Support the research team in locating and using relevant data resources, and in ingesting newly produced data;
- Automate data quality control and consistency assurance, perform manual curation where needed, and proactively co-design the production of additional data;
- Participate in project management tasks, including collaborative data production, analysis and presentation, and facilitate the flow and availability of information (written and oral) across the project and the participating groups.
Depending on their interests, the data manager/data scientist will also have the opportunity to engage in scientific aspects of the project, ranging from method development, benchmarking and optimization to biological discovery. He/she will be based both at the joint project headquarters and data production centre in the BioQuant building on the university campus and at the Huber lab at EMBL Heidelberg; flexible teleworking options are available.
Requirements
The candidate should hold a PhD in biology or another field of science and have experience with large-scale (e.g., omics) data analyses.
Intermediate-level skills in R or Python programming are required, as well as the ability to acquire proficiency with modern big-data management systems (cloud, object storage) and with data analysis tasks.
The tasks will also require the programmatic construction of web interfaces, for instance using tools such as R Markdown and Shiny.
The post holder will have good organizational skills and enjoy working independently and proactively in a highly motivated, collegial international scientific environment. An interest in computational algorithms, statistical analysis or biological discovery is a plus.