Batch¶

pipeline.Batch Allows the configuration and running of pipelines on multiple sources.

Data sources can be extracted from either a CSV file, or a Data Package.

Arguments¶

source: Filepath to the lust of data sources to run the batch against.
source_type: ‘csv’ (CSV file) or ‘dp’ (Data Package file).
data_key: If source_type is ‘csv’, then this is the name of the header that indicates the data URL.
schema_key: If source_type is ‘csv’, then this is the name of the header that indicates the schema URL.
pipeline_options: The options keyword argument for the pipeline.Pipeline constructor.
post_task: Any callable that takes the batch instance as its only argument. Runs after the batch processing is complete.
pipeline_post_task: Any callable that takes a pipeline instance as its only argument. Runs on completion of each pipeline.

For an example of the batch processor at work, including use of post_task and pipeline_post_task, see spd-admin.