I've started using scons to manage my data analysis workflows. A lot of the work is done by Python scripts that read in data from one or more sources, crunch it, and spit out a new table. So each output depends not only on the input data but also on the script and any modules it imports. You can get scons to run a script using a simple Command builder. For instance, say you have some "script.py" that expects a command line like "script.py input1 input2 output".


env.Command('output.tbl', ['script.py', 'input1.tbl', 'input2.tbl'], 'python $SOURCES $TARGETS')
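For concreteness, here is a minimal sketch of what such a script might look like. The names `merge_tables` and `main`, and the choice to simply concatenate the input tables, are assumptions for illustration only; the point is just the argument order (inputs first, output last) that matches `python $SOURCES $TARGETS`.

```python
import sys

def merge_tables(input_paths, output_path):
    """Concatenate the lines of every input table into one output table."""
    with open(output_path, 'w') as out:
        for path in input_paths:
            with open(path) as src:
                for line in src:
                    # make sure every row ends with a newline
                    out.write(line if line.endswith('\n') else line + '\n')

def main(argv):
    # argv mirrors the command line scons runs: script.py input1 input2 output
    if len(argv) < 3:
        raise SystemExit('usage: script.py input1 [input2 ...] output')
    *inputs, output = argv[1:]
    merge_tables(inputs, output)

# only act when arguments were actually given on the command line
if __name__ == '__main__' and len(sys.argv) > 1:
    main(sys.argv)
```

Because the output file is always the last argument, the script does not care how many input tables scons hands it.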
 


This works fairly well, but if script.py imports another module (e.g. one with functions shared by several scripts), you have to specify that dependency manually. With a little extra code, scons can scan Python scripts for import statements automatically and add any modules imported from the script's own directory as dependencies. I also like to use an emitter that moves the script to the front of the source list, so I don't have to worry about the order in which I specify the sources.


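import os
import re

# "from module import name"
import1_re = re.compile(r'^[ \t]*from\s+(\S+)\s+import', re.M)
# "import module1, module2 as alias" (anchored so "from x import y" is not matched twice)
import2_re = re.compile(r'^[ \t]*import\s+(.+)$', re.M)

def pyfile_scan(node, env, path):
    """ returns the .py files in the script's own directory that the script imports """
    imports = []
    search_path = os.path.dirname(str(node))
    # get_text_contents() returns a str; get_contents() returns bytes under Python 3
    text = node.get_text_contents()
    for item in (import1_re.findall(text) + import2_re.findall(text)):
        for x in item.split(','):
            words = x.split()   # drops any "as alias" suffix or trailing comment
            if not words:
                continue
            # keep the directory so scripts in subdirectories resolve correctly
            test_file = os.path.join(search_path, words[0] + '.py')
            if os.path.exists(test_file):
                imports.append(test_file)
    return imports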

def py_targets(target,source,env):
    """ moves the python script to the front of the source list so it comes first on the command line """
    out = []
    for x in source:
        if str(x).endswith('.py'):
            out.insert(0,x)
        else:
            out.append(x)
    return target,out

pybuild = Builder(action='python $SOURCES $TARGETS $SCRIPTOPTS',
                  emitter=py_targets)
pyscan  = Scanner(function = pyfile_scan,
                  skeys = ['.py'])
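To check what this kind of scan picks up without running scons, the same logic can be tried on its own. The sketch below is a standalone variant, not part of the SConstruct: `local_imports`, `FROM_RE`, `IMPORT_RE`, and the demo files are names made up for illustration, and the regular expressions are anchored to the start of the line and strip any "as alias" suffix, which is an assumption about how you want imports matched.

```python
import os
import re
import tempfile

FROM_RE = re.compile(r'^[ \t]*from\s+(\S+)\s+import', re.M)
IMPORT_RE = re.compile(r'^[ \t]*import\s+(.+)$', re.M)

def local_imports(script_path):
    """List .py files next to script_path that the script imports."""
    directory = os.path.dirname(script_path)
    with open(script_path) as f:
        text = f.read()
    found = []
    for item in FROM_RE.findall(text) + IMPORT_RE.findall(text):
        for name in item.split(','):
            words = name.split()          # "helpers as h" -> ["helpers", "as", "h"]
            if not words:
                continue
            candidate = os.path.join(directory, words[0] + '.py')
            # only modules that exist beside the script count as dependencies
            if os.path.exists(candidate) and candidate not in found:
                found.append(candidate)
    return found

# Build a throwaway directory with a script and one local module
demo_dir = tempfile.mkdtemp()
with open(os.path.join(demo_dir, 'helpers.py'), 'w') as f:
    f.write('def clean(rows):\n    return rows\n')
with open(os.path.join(demo_dir, 'script.py'), 'w') as f:
    f.write('import os, sys\nimport helpers as h\nfrom helpers import clean\n')

result = local_imports(os.path.join(demo_dir, 'script.py'))
print([os.path.basename(p) for p in result])   # prints ['helpers.py']
```

Standard library modules such as os and sys are ignored simply because no os.py or sys.py sits next to the script, which is the same reason the scanner only reports local modules.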

 


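Now you can use the custom builder as follows, and scons will recognize any local modules script.py depends on. Anything extra for the command line can be passed through SCRIPTOPTS, either set on the environment or given as a keyword argument to the builder call.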


# add the python builder to the environment
env = Environment()
env.Append(BUILDERS = {'PyBuild' : pybuild})
env.Append(SCANNERS = pyscan)

env.PyBuild('output.tbl', ['script.py', 'input1.tbl', 'input2.tbl'])