Dataset tools
shelephant
Available commands:
command | description |
---|---|
init | Initialize a new dataset. |
add | Add storage location to dataset. |
remove | Remove storage location from dataset. |
rename | Rename storage location. |
update | Update dataset. |
status | Show status of files. |
info | Show global information about dataset. |
diff | Show difference between two storage locations. |
lock | Lock as a storage location. |
send_storage | Send .shelephant/storage/<name>.yaml to the storage location. |
get_storage | Get .shelephant/storage/<name>.yaml from the storage location. |
cp | Copy files from one location to another. |
mv | Move files from one location to another (both local). |
rm | Remove files from one location. |
pwd | Print equivalent directory in the storage location. |
git | Run a git command on the database directory (.shelephant). |
gitignore | Add all managed symbolic links to the dataset's root .gitignore. |
usage: shelephant [-h] [--version]
{status,info,update,cp,mv,rm,pwd,diff,gitignore,send_storage,get_storage,add,remove,rename,lock,git,init}
Positional Arguments
- command
Possible choices: status, info, update, cp, mv, rm, pwd, diff, gitignore, send_storage, get_storage, add, remove, rename, lock, git, init
Command to run.
Named Arguments
- --version
show program’s version number and exit
shelephant init
Initialize a shelephant database by creating a directory .shelephant with an ‘empty’ database. Use shelephant add to add storage locations.
usage: shelephant init [-h] [--version]
Named Arguments
- --version
show program’s version number and exit
shelephant add
Add a storage location to the database.
The database in .shelephant is updated as follows:
- The name is added to .shelephant/storage.yaml.
- A file .shelephant/storage/<name>.yaml is created with the search settings and the present state of the storage location.
- A symlink .shelephant/data/<name> to the storage location is created (if --ssh is given, the symlink is a dead link).
Note
A special case is
shelephant add here --rglob '*.h5'
which helps to investigate your dataset directory.
Note that ‘here’ is a reserved name and that you should not specify the root.
usage: shelephant add [-h] [--ssh str] [--mount Path] [--prefix Path]
[--rglob str] [--glob str] [--exec str] [--skip str]
[--shallow] [-q] [--version]
str [Path]
Positional Arguments
- name
Name of the storage location.
- root
Path to the storage location.
Named Arguments
- --ssh
SSH host (e.g. user@host).
- --mount
Optional mount location for SSH host.
- --prefix
Add prefix to all files.
- --rglob
Search pattern for Path(root).rglob(...).
Default: []
- --glob
Search pattern for Path(root).glob(...).
Default: []
- --exec
Command to run from root.
Default: []
- --skip
Pattern to skip (Python regex).
Default: []
- --shallow
Do not compute checksums.
Default: False
- -q, --quiet
Do not print progress.
Default: False
- --version
show program’s version number and exit
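The following Python sketch illustrates how --rglob, --glob and --skip might combine. It gathers files under a root with pathlib, then drops every relative path that one of the skip regexes matches. It is a hypothetical illustration, not shelephant's own search code. The function name collect_files, the use of re.search, and the POSIX-style relative paths are assumptions. --exec and --ssh are not modelled.

```python
import re
from pathlib import Path
from typing import Iterable, List


def collect_files(root, rglob: Iterable[str] = (), glob: Iterable[str] = (),
                  skip: Iterable[str] = ()) -> List[str]:
    """Hypothetical sketch: list files under `root`, minus paths matching a `skip` regex."""
    root = Path(root)
    found = set()
    for pattern in rglob:
        # Recursive search, as with Path(root).rglob(...)
        found.update(p for p in root.rglob(pattern) if p.is_file())
    for pattern in glob:
        # Non-recursive search, as with Path(root).glob(...)
        found.update(p for p in root.glob(pattern) if p.is_file())
    # Assumption: paths are stored relative to root, with forward slashes
    relative = sorted(p.relative_to(root).as_posix() for p in found)
    skips = [re.compile(s) for s in skip]
    return [r for r in relative if not any(s.search(r) for s in skips)]
```

For example, take a root containing a.h5, sub/b.h5, sub/tmp/c.h5 and notes.txt. Here collect_files(root, rglob=["*.h5"], skip=[r"(^|/)tmp/"]) keeps a.h5 and sub/b.h5 and drops sub/tmp/c.h5.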
shelephant remove
Remove a storage location from the database.
The database in .shelephant is updated as follows:
- The name is removed from .shelephant/storage.yaml.
- .shelephant/storage/<name>.yaml is removed.
- The symlink .shelephant/data/<name> is removed.
usage: shelephant remove [-h] [--version] str
Positional Arguments
- name
Name of the storage location.
Named Arguments
- --version
show program’s version number and exit
shelephant update
Update the database. This command always updates the symbolic links, and optionally updates the available files and checksums of one or more storage locations.
usage: shelephant update [-h] [--version] [--base-link] [--clean] [-s]
[--verbose] [--chunk <lambda>] [--force] [-q]
[str] [Path ...]
Positional Arguments
- name
Update storage location(s).
- path
Update only specific paths on location.
Named Arguments
- --version
show program’s version number and exit
- --base-link
Update link .shelephant/data/{name} based on .shelephant/storage/{name}.yaml.
Default: False
- --clean
Clean database entry with symlinks.
Default: False
- -s, --shallow
Do not compute checksums.
Default: False
- --verbose
Verbose commands.
Default: False
- --chunk
Chunk size for computing checksums (bytes).
Default: 30000000000.0
- --force
Force update of path(s).
Default: False
- -q, --quiet
Do not print progress.
Default: False
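The checksums that --shallow skips are sha256 digests (compare status --unknown). Below is a minimal sketch of computing one with Python's hashlib. It reads the file in fixed-size blocks so memory use stays bounded. This text does not say whether --chunk plays this role in shelephant. The function name sha256sum and the 1 MiB default block size are arbitrary assumptions.

```python
import hashlib


def sha256sum(path, block_size: int = 1 << 20) -> str:
    """Return the hex sha256 digest of `path`, reading `block_size` bytes at a time."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        # iter() with a b"" sentinel stops once read() reaches end of file
        for block in iter(lambda: fh.read(block_size), b""):
            digest.update(block)
    return digest.hexdigest()
```

Reading in blocks gives the same digest as hashing the whole file at once. The block size only affects memory use and speed.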
shelephant status
Status of the storage locations.
Tip
Use --list or --print0 to get a list of files instead of a table.
Use for example as:
shelephant cp source dest $(shelephant status --copies 1 --list)
or to copy in batches of 100:
shelephant status --copies 1 --print0 | xargs -n 100 -0 shelephant cp source dest
To copy at most 100 files per call, you can instead use the --nout (-n) option of shelephant status:
shelephant cp source dest $(shelephant status --copies 1 --list -n 100)
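The same batching can be done from Python. The sketch below splits the NUL-separated output of --print0 into paths. It then groups them into lists of at most n items, mirroring xargs -n 100 -0. The helper names split_print0 and batches are hypothetical. The sketch only parses text and does not call shelephant itself.

```python
from itertools import islice
from typing import Iterable, Iterator, List


def split_print0(data: bytes) -> List[str]:
    """Split NUL-separated output into paths, dropping empty fields (e.g. a trailing NUL)."""
    return [part.decode() for part in data.split(b"\0") if part]


def batches(items: Iterable[str], n: int) -> Iterator[List[str]]:
    """Yield consecutive lists of at most `n` items, like `xargs -n n`."""
    iterator = iter(items)
    while True:
        batch = list(islice(iterator, n))
        if not batch:
            return
        yield batch
```

For example, capture the output with subprocess.run(["shelephant", "status", "--copies", "1", "--print0"], capture_output=True).stdout and pass it to split_print0. Then append each batch to ["shelephant", "cp", "source", "dest"].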
usage: shelephant status [-h] [--version] [--min-copies int] [--copies int]
[--ne] [--na] [--unknown] [--list] [--print0]
[-n int] [--table str] [--in-use str] [--not-on str]
[--on str] [-b]
[str ...]
Positional Arguments
- path
Filter to paths (either one directory, or multiple files).
Named Arguments
- --version
show program’s version number and exit
- --min-copies
Show files with at least the given number of copies.
- --copies
Show files with a specific number of copies.
- --ne
Show files with unequal copies.
Default: False
- --na
Show files that are unavailable on at least one storage location.
Default: False
- --unknown
Show files with unknown sha256.
Default: False
- --list
Print list of files (no table).
Default: False
- --print0
Print list of files separated by null characters (no table), for use with xargs -0.
Default: False
- -n, --nout
Maximal number of output arguments.
- --table
Select print style.
Default: “SINGLE_BORDER”
- --in-use
Select storage location in use (use ‘none’ for unavailable).
- --not-on
List files that are not on a storage location.
- --on
Limit to files available on storage location.
Default: []
- -b, --relative-to-base
Show path relative to base directory of dataset.
Default: False
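The sketch below applies --copies, --min-copies, --ne and --na to a toy in-memory table. The data model (a mapping from path to {location: sha256, or None if absent}) is an assumption for illustration, and so are the exact semantics chosen for each filter. This text does not describe shelephant's own database layout. --unknown is not modelled.

```python
from typing import Dict, List, Optional

# Hypothetical table: path -> {location: sha256, or None if the file is absent there}
Table = Dict[str, Dict[str, Optional[str]]]


def select(table: Table, copies: Optional[int] = None, min_copies: Optional[int] = None,
           ne: bool = False, na: bool = False) -> List[str]:
    """Return the paths that pass every requested filter."""
    selected = []
    for path, entry in sorted(table.items()):
        available = [sha for sha in entry.values() if sha is not None]
        if copies is not None and len(available) != copies:
            continue
        if min_copies is not None and len(available) < min_copies:
            continue
        if ne and len(set(available)) <= 1:  # keep only files whose copies differ
            continue
        if na and len(available) == len(entry):  # keep only files missing somewhere
            continue
        selected.append(path)
    return selected
```

In this sketch, combining options narrows the selection (a logical AND). That is one plausible reading, not documented behaviour.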
shelephant info
Show global information about dataset.
usage: shelephant info [-h] [--version] [--cachedir] [--basedir] [str ...]
Positional Arguments
- location
Name of the storage location(s).
Named Arguments
- --version
show program’s version number and exit
- --cachedir
Print cache-dir and quit.
Default: False
- --basedir
Print basedir (containing ‘.shelephant’) and quit.
Default: False
shelephant lock
Lock as a storage location.
usage: shelephant lock [-h] [--version] str
Positional Arguments
- name
Name of the storage location.
Named Arguments
- --version
show program’s version number and exit
shelephant cp
Copy files between storage locations and update the database. After copying, the checksums on the destination are recomputed and the database is updated. Use:
- -s, --shallow to skip the checksum computation (store only path/size/mtime).
- -x, --no-update to skip the database update altogether.
Note
The paths that you specify are reduced to only the paths known to exist on the source.
If you know that the paths exist, but they are not part of the database (or it is outdated), use -e, --exists to avoid the filter.
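The filtering described in this note can be sketched as follows. A requested path keeps every database entry that equals it or lies below it, so . selects everything. The exception is when the -e/--exists behaviour is requested. The function name filter_known is hypothetical, and so is the assumption that all paths are POSIX-style and relative to the dataset root.

```python
from pathlib import PurePosixPath
from typing import List


def filter_known(requested: List[str], known: List[str], exists: bool = False) -> List[str]:
    """Reduce `requested` to database paths equal to, or located below, one of them."""
    if exists:
        # Mimics -e/--exists: trust the user and skip the filter
        return list(requested)
    targets = [PurePosixPath(r) for r in requested]
    selected = []
    for entry in known:
        path = PurePosixPath(entry)
        # A match is the path itself or one of its parent directories ("." included)
        if any(path == t or t in path.parents for t in targets):
            selected.append(entry)
    return selected
```

Comparing whole path components, rather than string prefixes, keeps a request for a from also matching ab/x.h5.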
Tip
To make a clone, call shelephant cp source destination . from the dataset’s root.
Note
The copied files are added to the database of the destination.
There is no check that they match the destination's dump and search settings.
usage: shelephant cp [-h] [--version] [--colors str] [-f] [-q] [-n] [-x] [-e]
[-s] [--mode str]
str str Path [Path ...]
Positional Arguments
- source
name of the source
- destination
name of the destination
- path
path(s) to copy
Named Arguments
- --version
show program’s version number and exit
- --colors
color scheme [none, dark]
Default: “dark”
- -f, --force
overwrite without prompt
Default: False
- -q, --quiet
do not print progress
Default: False
- -n, --dry-run
print copy-plan and exit
Default: False
- -x, --no-update
no database update
Default: False
- -e, --exists
all paths exist on source
Default: False
- -s, --shallow
do not compute checksums
Default: False
- --mode
use ‘sha256’, ‘rsync’, and/or ‘basic’ to compare files
Default: “sha256,basic”
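The --mode option decides how source and destination files are compared. The following sketch shows one plausible decision rule, not shelephant's implementation. 'sha256' compares checksums when both are known. 'basic' is assumed here to compare the size and modification time that --shallow still stores. Modes are tried in order until one can decide. The 'rsync' mode is not modelled. The Entry record and the functions same_file and copy_plan are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Sequence


@dataclass
class Entry:
    """Hypothetical record of one file on one storage location."""
    size: int
    mtime: float
    sha256: Optional[str] = None  # None when checksums were skipped (--shallow)


def same_file(src: Entry, dest: Optional[Entry],
              modes: Sequence[str] = ("sha256", "basic")) -> Optional[bool]:
    """Compare using the first mode that can decide; None if no mode can."""
    if dest is None:
        return False  # absent on the destination
    for mode in modes:
        if mode == "sha256" and src.sha256 and dest.sha256:
            return src.sha256 == dest.sha256
        if mode == "basic":
            # Assumption: 'basic' compares size and modification time
            return src.size == dest.size and src.mtime == dest.mtime
    return None


def copy_plan(source: Dict[str, Entry], dest: Dict[str, Entry],
              modes: Sequence[str] = ("sha256", "basic")) -> List[str]:
    """Paths to copy: every source file not known to be identical on the destination."""
    return [p for p, e in sorted(source.items()) if same_file(e, dest.get(p), modes) is not True]
```

The default string "sha256,basic" would map onto modes via "sha256,basic".split(",").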
shelephant mv
Move files from one storage location to another (both must be local).
Note
The moved files are added to the database of the destination.
There is no check that they match the destination's dump and search settings.
usage: shelephant mv [-h] [--version] [--colors str] [-f] [-q] [-n]
str str Path [Path ...]
Positional Arguments
- source
name of the source.
- destination
name of the destination.
- path
path(s) to move.
Named Arguments
- --version
show program’s version number and exit
- --colors
Color scheme [none, dark].
Default: “dark”
- -f, --force
Overwrite without prompt.
Default: False
- -q, --quiet
Do not print progress.
Default: False
- -n, --dry-run
Print copy-plan and exit.
Default: False
shelephant rm
Remove files from a storage location.
Warning
This removes the actual data. The link is also removed if there is no alternative source left.
usage: shelephant rm [-h] [--version] [-f] [-q] [-n] str Path [Path ...]
Positional Arguments
- source
name of the source.
- path
path(s) to remove.
Named Arguments
- --version
show program’s version number and exit
- -f, --force
Remove without prompt.
Default: False
- -q, --quiet
Do not print progress.
Default: False
- -n, --dry-run
Print the removal plan and exit.
Default: False
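The sketch below models the rule in the warning above: a link is removed only when no other location holds the file. It uses a toy availability map from each path to the set of locations that have it. The data structure and the function name remove_from are hypothetical.

```python
from typing import Dict, List, Set


def remove_from(availability: Dict[str, Set[str]], location: str, paths: List[str]) -> List[str]:
    """Mark `paths` as deleted on `location`; return those left with no copy anywhere."""
    orphaned = []
    for path in paths:
        holders = availability[path]  # KeyError for paths unknown to the database
        holders.discard(location)     # the data on `location` is deleted
        if not holders:
            orphaned.append(path)     # no alternative source left: the link goes too
    return orphaned
```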
shelephant pwd
Print the directory in a storage location that is equivalent to the current working directory.
usage: shelephant pwd [-h] [--version] [--base] [--abspath] str
Positional Arguments
- source
name of the source.
Named Arguments
- --version
show program’s version number and exit
- --base
Print the base directory.
Default: False
- --abspath
Print absolute path.
Default: False
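A program cannot change the working directory of the shell that started it, so pwd prints a path instead. Below is a minimal sketch of the mapping. It assumes the equivalent directory is the current directory's path relative to the dataset's base directory, re-rooted at the storage location's root. The function name equivalent_dir is hypothetical. Whether shelephant prints the result as a relative or absolute path is governed by --abspath and is not modelled here.

```python
from pathlib import PurePosixPath


def equivalent_dir(cwd: str, basedir: str, storage_root: str) -> str:
    """Re-root `cwd`, which must lie inside the dataset at `basedir`, at `storage_root`."""
    # relative_to raises ValueError if cwd is not inside basedir
    relative = PurePosixPath(cwd).relative_to(basedir)
    return str(PurePosixPath(storage_root) / relative)
```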
shelephant gitignore
Add all symbolic links managed by shelephant to the dataset’s root .gitignore.
Note
This is the /path/to/dataset/.gitignore file, not /path/to/dataset/.shelephant/.gitignore.
usage: shelephant gitignore [-h] [--version]
Named Arguments
- --version
show program’s version number and exit
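The sketch below shows roughly what collecting the managed links could look like. It walks the dataset, skips the .shelephant directory, and records every symbolic link, dead ones included. It then appends the links not yet listed to the root .gitignore. The function names are assumptions, and so is writing plain relative paths as ignore patterns. This is not shelephant's actual behaviour.

```python
import os
from pathlib import Path
from typing import List


def find_symlinks(root) -> List[str]:
    """Relative POSIX paths of all symlinks under `root`, skipping .shelephant/."""
    root = Path(root)
    found = []
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune the database directory; os.walk does not descend into linked directories
        dirnames[:] = [d for d in dirnames if d != ".shelephant"]
        for name in dirnames + filenames:
            path = Path(dirpath) / name
            if path.is_symlink():  # also True for dead links
                found.append(path.relative_to(root).as_posix())
    return sorted(found)


def add_to_gitignore(root, entries: List[str]) -> List[str]:
    """Append entries not yet in root/.gitignore and return the ones added."""
    path = Path(root) / ".gitignore"
    text = path.read_text() if path.exists() else ""
    existing = set(text.splitlines())
    new = [e for e in entries if e not in existing]
    if new:
        if text and not text.endswith("\n"):
            text += "\n"
        path.write_text(text + "\n".join(new) + "\n")
    return new
```

Running add_to_gitignore twice with the same entries adds nothing the second time, so the sketch is safe to repeat.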