Part 3: Creating Backups and Sharing DataLad Datasets
Creating a Backup
Command | Description |
---|---|
git init --bare ~/mydir | Create a --bare repository called mydir in the home directory (on Linux/macOS) |
git init --bare %USERPROFILE%\mydir | Create a --bare repository called mydir in the home directory (on Windows / CMD) |
git init --bare "$env:USERPROFILE\mydir" | Create a --bare repository called mydir in the home directory (on Windows / PowerShell) |
datalad siblings | List all siblings of the current dataset |
datalad siblings add --name new --url ~/mydir | Add the repository at ~/mydir as a new sibling with the name new |
datalad push --to new | Push the dataset content to the sibling named new |
Exercise 1 List all siblings of the current dataset.
datalad siblings
Exercise 2 Initialize a --bare git repository at a path outside of this dataset.
On Linux/macOS
git init --bare ~/penguins_backup
On Windows (CMD)
git init --bare %USERPROFILE%\penguins_backup
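To confirm that Exercise 2 really produced a bare repository, you can ask Git directly. This is a minimal sketch for a Linux/macOS shell, using the same path as the exercise:

```shell
# Create the bare repository outside the dataset (re-running this is harmless:
# Git simply reinitializes an existing repository)
git init --bare ~/penguins_backup

# A bare repository has no working tree; Git reports this explicitly
git -C ~/penguins_backup rev-parse --is-bare-repository   # prints: true
```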
Exercise 3 Add a new sibling to the dataset using the path to the newly created git repository as the --url. Then, list all siblings to confirm it was added.
On Linux/macOS
datalad siblings add --name backup --url ~/penguins_backup
datalad siblings
On Windows (CMD)
datalad siblings add --name backup --url %USERPROFILE%\penguins_backup
datalad siblings
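Because a sibling added this way is recorded as an ordinary Git remote, plain Git can serve as a second check. A minimal sketch, assuming it is run from the root of the dataset after the backup sibling from Exercise 3 was added:

```shell
# Assumption: the current directory is the dataset root and the
# "backup" sibling exists.

# Print the location Git has recorded for the sibling
git remote get-url backup

# The same information, read straight from the repository configuration
git config --get remote.backup.url
```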
Exercise 4 Push the dataset to the new sibling twice.
We need to push twice because the first push initializes the repository’s annex ID, and the second (and each subsequent) push actually transfers the annexed files.
datalad push --to backup
datalad push --to backup
Exercise 5 Move to a directory outside of this dataset and clone the new sibling dataset.
On Linux/macOS
cd ..
datalad clone ~/penguins_backup
On Windows (CMD)
cd ..
datalad clone %USERPROFILE%\penguins_backup
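A fresh clone contains the dataset's file tree and history, but annexed file content is only retrieved on demand. As a sketch, assuming the clone landed in a directory named penguins_backup (DataLad derives the name from the source path), the content can be fetched like this; both commands work in a Linux/macOS shell and in CMD alike:

```shell
# Enter the clone created by "datalad clone"
cd penguins_backup

# Retrieve all annexed content; git-annex fetches it from a repository
# that has a copy, such as the backup
datalad get .
```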
BONUS: Sharing your Dataset online
Command | Description |
---|---|
ssh-keygen | Generate a public and private authentication key pair |
datalad siblings | List all siblings of the current dataset |
datalad siblings add --name gin --url git@gin.g-node.org:/user/repo.git | Add the GIN repository at https://gin.g-node.org/user/repo as a new sibling with the name gin |
datalad push --to gin | Push the dataset content to the sibling named gin |
Exercise 6 Use ssh-keygen to generate a public and private key pair (you don’t have to use a passphrase). Note the location where the public key is stored, e.g. .ssh/id_ed25519.pub. Open the .pub file and copy its whole content; it should look something like this: ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAIBOYcoRKZZLWA4FWECpW2K/fTOvuRYXBnBA6gcea2bFq <user>@<computer>
ssh-keygen
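If you prefer to skip the interactive prompts, ssh-keygen accepts the key type, file location, and passphrase as options. A minimal sketch, assuming the default Ed25519 key location and an empty passphrase; the -C comment is only a label, and the address shown is a placeholder. If a key already exists at that path, ssh-keygen asks before overwriting it.

```shell
# Make sure the SSH directory exists with restrictive permissions
mkdir -p ~/.ssh
chmod 700 ~/.ssh

# Ed25519 key pair with an empty passphrase (-N ""), written to the default
# location (-f); the comment (-C) is a placeholder label
ssh-keygen -t ed25519 -N "" -f ~/.ssh/id_ed25519 -C "you@example.com"

# Print the public key so its whole content can be copied into GIN
cat ~/.ssh/id_ed25519.pub
```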
Exercise 7 Log in to your GIN account, go to your user settings, and add the copied SSH key. Now DataLad should be able to connect to your GIN account!
Exercise 8 Create a new repository on GIN; make sure NOT to initialize it with a README.
Exercise 9 Add a new sibling to the dataset using the --url of the newly created GIN repository and confirm the connection. Then, list all siblings to confirm it was added.
For the repository in the image above, the command would look like this:
datalad siblings add --name gin --url git@gin.g-node.org:/adswa/DataLad-101.git
Exercise 10 Push the dataset to the new GIN sibling. Then, open the repository in your browser to confirm the content was pushed.
datalad push --to gin
Exercise 11 Move to a directory outside of this dataset and clone the new GIN sibling.
For the repository in the image above, the command would look like this:
cd ..
datalad clone https://gin.g-node.org/adswa/DataLad-101
Further reading
In the examples above, the annex was published together with the Git repository. However, this is a bit of a special case; in many scenarios, the Git repository and the annexed content can be published separately. For an overview and examples of several different publishing scenarios, see the Beyond shared infrastructure chapter of the DataLad handbook.
Git-annex supports multiple options for publishing file contents; see the list of built-in special remotes. For a very special case, in which git-annex places the Git repository itself on hosting that is not Git-aware, see git-remote-annex.
Finally, Forgejo is gaining popularity as a self-hosted software forge. Forgejo-aneksajo is a soft fork of Forgejo that adds git-annex capability. See also Collaborative infrastructure for a lab: Forgejo on the DataLad blog.