DataLad EuroScipy 2025

Welcome to the website for the DataLad [1] tutorial at EuroScipy 2025! Here, you can access all exercise materials and slides. The rest of this page walks you through the setup process and resources for the tutorial.

Installation

DataLad requires Git and Python 3.8 or later. All other dependencies can be installed via pip. If you want to learn more about DataLad and its dependencies, check out the handbook.

Git

To install and configure Git, follow these steps:

  • Install Git: https://git-scm.com/downloads
  • Configure your Git user name: git config --global user.name "user"
  • Configure your Git user email: git config --global user.email "user@mail.com"
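To confirm that the configuration took effect, a small check along the following lines may help. It is only a sketch, not part of the tutorial materials: the helper names git_config_value and missing_identity_keys are made up for this illustration, and the script only reads the global Git configuration through the standard git config --global --get command.

```python
import shutil
import subprocess


def git_config_value(key):
    """Return the global Git config value for `key`, or None if unset or Git is missing."""
    if shutil.which("git") is None:
        return None
    result = subprocess.run(
        ["git", "config", "--global", "--get", key],
        capture_output=True,
        text=True,
    )
    value = result.stdout.strip()
    return value if result.returncode == 0 and value else None


def missing_identity_keys(lookup=git_config_value):
    """List the identity keys that still need to be configured."""
    return [key for key in ("user.name", "user.email") if lookup(key) is None]


if __name__ == "__main__":
    missing = missing_identity_keys()
    if missing:
        print("Still missing: " + ", ".join(missing))
    else:
        print("Git identity is configured.")
```

If the script reports a missing key, rerun the corresponding git config command from the list above with your own name or email address.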

Python

We recommend creating a new virtual environment for this project with Python 3.8 or later, using a tool like uv, pixi, or conda. In that dedicated environment, you can install DataLad and its dependencies via pip [2]:

pip install datalad datalad-next git-annex

Some of the exercises require you to run Python scripts, so you'll have to install their dependencies as well:

pip install pandas seaborn
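Once both pip commands have finished, you can check the environment with a short script like the one below. Treat it as a rough sketch under a few assumptions: the import names (in particular datalad_next for the datalad-next package) are assumed to match the installed distributions, git-annex is looked up as an executable on the PATH rather than as an importable module, and the function names are invented for this example.

```python
import importlib.util
import shutil
import sys

# Import names assumed to correspond to the pip packages installed above.
MODULES = ("datalad", "datalad_next", "pandas", "seaborn")


def python_is_supported(version=sys.version_info, minimum=(3, 8)):
    """Check the interpreter version against the minimum stated on this page."""
    return tuple(version[:2]) >= minimum


def missing_modules(names=MODULES):
    """Return the top-level modules that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    print("Python 3.8 or later:", python_is_supported())
    print("Missing Python modules:", missing_modules() or "none")
    # git-annex is a command-line program, so we look for it on the PATH.
    print("git-annex on PATH:", shutil.which("git-annex") is not None)
```

Run it with the Python interpreter of the virtual environment you created, otherwise it will report on a different installation.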

Other

We also recommend setting the following Git configuration option, which enables the full functionality of the DataLad-next extension (installed above) by allowing it to override the behavior of the core DataLad package:

  • git config --global --add datalad.extensions.load next
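Because the command above uses --add, the key datalad.extensions.load can hold several values. The following sketch, with helper names chosen purely for illustration, reads all of them via git config --global --get-all and reports whether next is among them.

```python
import shutil
import subprocess


def parse_config_values(raw_output):
    """Split `git config --get-all` output into a list of non-empty values."""
    return [line.strip() for line in raw_output.splitlines() if line.strip()]


def loaded_extensions():
    """Return all global values of datalad.extensions.load (empty if unset or Git is missing)."""
    if shutil.which("git") is None:
        return []
    result = subprocess.run(
        ["git", "config", "--global", "--get-all", "datalad.extensions.load"],
        capture_output=True,
        text=True,
    )
    return parse_config_values(result.stdout) if result.returncode == 0 else []


if __name__ == "__main__":
    if "next" in loaded_extensions():
        print("datalad-next is enabled.")
    else:
        print("datalad.extensions.load does not include 'next' yet.")
```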

Data

To demonstrate DataLad’s data management capabilities, we’ll use a dataset hosted on GIN. You don’t have to download it upfront since cloning the dataset is part of the exercises. The data contains measurements from different penguin species and was originally published by Kristen B Gorman and colleagues [3].

Further Reading

If you want to learn more about DataLad, check out the handbook, which contains many beginner-friendly and advanced tutorials, and the technical documentation, which describes all DataLad features in detail. If you want to learn more about the underlying file system operations, the git-annex documentation is a useful resource as well.

Footnotes

  1. Halchenko et al. (2021). DataLad: distributed system for joint management of code, data, and their relationship. Journal of Open Source Software, 6(63), 3262, https://doi.org/10.21105/joss.03262.

  2. Git-annex is not a Python package. It is written in Haskell, and plenty of installation methods are available. However, it is also distributed as a Python wheel, which makes it very convenient to install as a dependency of DataLad in a virtual environment.

  3. Gorman, K. B., Williams, T. D., & Fraser, W. R. (2014). Ecological sexual dimorphism and environmental variability within a community of Antarctic penguins (genus Pygoscelis). https://doi.org/10.1371/journal.pone.0090081.