Background: Snakemake external Python scripts
Snakemake (https://snakemake.github.io/) is a popular Python-based workflow management system. Tasks in Snakemake are defined as rules in Snakefiles.
Each rule may define input files, output files and non-file parameters. Rule definitions in the Snakefiles can call an external command using the shell, directly contain code to be executed (e.g. Python code) or invoke an external script (in different programming languages, including Python), to which the rule parameters are passed.
The following is an example of calling an external Python script:
rule a:
input:
inputfile1="i1", inputfile2="i2"
output:
outputfile1="o1", outputfile2="o2"
params:
f="oo", bar=True
script:
script(path/to/example.py)
Inside the script example.py
, the input, output and params are
accessible from the attributes input
, output
and params
of the global variable snakemake
.
Background: CLI scripts based on docopt
Docopt (http://docopt.org/) is a command line interface description language with implementations in different languages, including Python.
The syntax of the script and the available options are described in a string,
which is passed to the docopt()
function. For example:
Usage:
test_script.py <inputfile1> <outputfile1> [INPUTFILE2] [options]
Options:
-2, --outputfile2 FNAME Output file 2 (default: stdout)
-f VALUE Option with a value
--bar Boolean option
The return value of the function is a dictionary, which contains the values
of options and positional arguments. In this example it would contain
the keys "<inputfile1>", "INPUTFILE2", "--outputfile1", "-x", "--bar"
.
SnaCLI
SnaCLI allows to easily combine the two approaches (docopt and snakemake) for providing arguments to a script. Therefore the script can be invoked both from the command line and from snakemake.
This is obtained by employing the snacli.args()
context manager.
Lists are passed to args()
describing how to obtain from the snakemake
rule the values of options and positional arguments described in the docopt
string.
For the examples given above:
with snacli.args(input=["<inputfile1>", "INPUTFILE2"],
output=["<outputfile1>", "--outputfile2"],
params=["--bar", "-f"]) as args:
# code which does something with args, as if they would
# come from docopt, e.g.
print(args["<inputfile>"])
The value yielded from the context manager is then always equivalent to the
value returned by docopt()
, both in the case that the script is invoked
from the command line and that it is called from a Snakefile. Thus, the
same code can be used in both cases, without modifications.
Mapping of docopt keys to snakemake names
Although Snakefiles support both named filenames (e.g. input: input1="file1"
) and unnamed filenames (e.g. input: "file1"
) in the input
and output keys, for SnaCLI to work, all filenames must be named.
The docopt keys contain formatting: UPCASE
or <angular>
for
positional arguments, an initial -
or --
for options.
These cannot be used directly as names in the input, output and params
of snakemake rules. Thus the snacli.args()
maps each docopt key value to a snakemake name value, by stripping the
initial -
or --
, removing the angular brackets and making upcase-only
keys lowcase. If further -
are present, they are replaced
by an underscore.
Examples of docopt keys and the corresponding snakemake name:
<inputfile1>
:inputfile1
INPUTFILE2
:inputfile2
--param1
:param1
-x
:x
--long-name
:long_name
Customized docopt key to snakemake name mapping
It is possible to manually override the mapping of docopt keys
to snakemake names by using, in the lists passed to snacli.args()
,
a 2-tuple (docopt_key, snakemake_name)
instead of a string.
Eg. to map --param-2
to param2
instead of param_2
:
# in the snakefile:
rule foo:
params: param2: "value"
script: "foo.py"
# in foo.py:
with snacli.args(params=["--param1", ("--param-2", "param2")]) as args:
print (args["--param-2"])
...
Passing options to docopt
By default, the docstring of the script (__doc__
) is passed to docopt()
as string. Another string can be used instead, by passing it to the keyword
argument doc
, e.g.
with snacli.args(doc=somestring, input= ...
Besides the lists of input, output and params, the snacli.args()
also
accepts keyword arguments which are passed to docopt()
, i.e. argv
,
help
, version
, options_first
– see the docopt documentation
for their meaning, e.g.
with snacli.args(version="1.0", input= ...
Using a script also as a non-interactive module
Sometimes a script shall be also used as non-interactive, i.e. imported as module in another Python module.
In case the script is imported as module, the value yielded by the
snacli.args()
context manager will be None
.
Thus to support inclusion as a module, an additional if
condition
must be added, e.g.:
with snacli.args(...) as args:
if args:
...
Reusing argument definitions in multiple scripts
If the same arguments are used in multiple scripts, they can be collected in a separate module and re-used.
For example, say that the optional arguments
--input1
and --param2
are used in multiple scripts.
Then the docstring of a script could be set to:
Usage:
foo [options]
Options:
--specific Specific option for this script only
{}
The definition string for the options and the mapping of docopt strings
to snakemake could be provided in a module bar
, which can be reused
also in other scripts, e.g.:
optstr='''
--input1 Input filename 1
--param2 VALUE Value of parameter 2
'''
optmap = {"input": ["--input1"], "params": ["--param2"]}
Then in the script, SnaCLI can be used as follows:
import bar
with snacli.args(bar.optmap, docvars=[bar.optstr]),
params = ["--specific"]) as args:
...
I.e. the common mapping is passed as positional argument to snacli.args
,
before any keyword argument, and the docvars keyword argument is used,
with the arguments which shall be passed to format()
called on the docopt string.
This can be generalized over multiple re-usable modules, e.g.:
...
Options:
{bar_opts}
{foo_opts}
...
import foo
import bar
with snacli.args(foo.optmap, bar.optmap,
docvars={"bar_opts": bar.optstr,
"foo_opts": foo.optstr},
params = ["--specific"]) as args:
...
Multiple entries for the same key
It is possible to override the name mapping for a key passed with a positional argument, using one of the following positional arguments, or in the keyword arguments.
In the example above, if foo.optmap contains {inputs = ["--specific"]}
, the later setting in the keyword argument params
would
be applied instead, i.e. specific
would be taken from params
and not
from inputs
.
Instead, using the same key in different lists of the same positional argument, or in different keyword arguments is an error, leading to unspecified behaviour.