pipeline.Batch allows pipelines to be configured and run on multiple sources.
Data sources can be read from either a CSV file or a Data Package.
- source: Filepath to the list of data sources to run the batch against.
- source_type: ‘csv’ (CSV file) or ‘dp’ (Data Package file).
- data_key: If source_type is ‘csv’, then this is the name of the header that indicates the data URL.
- schema_key: If source_type is ‘csv’, then this is the name of the header that indicates the schema URL.
- pipeline_options: The options keyword argument for the pipeline.Pipeline constructor.
- post_task: Any callable that takes the batch instance as its only argument. Runs after the batch processing is complete.
- pipeline_post_task: Any callable that takes a pipeline instance as its only argument. Runs on completion of each pipeline.
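As a rough illustration of how these arguments fit together, the sketch below writes a CSV source list, defines the two callbacks, and assembles the keyword arguments. The column names (data, schema), the example.com URLs, the helper names, and the empty pipeline_options dict are hypothetical. The final pipeline.Batch call is left commented out because the import path, the position of the source argument, and a run() method are assumptions not stated in this section.

```python
import csv
import os
import tempfile

# Hypothetical column names; they only need to match data_key / schema_key.
DATA_KEY = "data"
SCHEMA_KEY = "schema"


def write_source_list(path, rows):
    """Write the CSV file that lists one data source per row."""
    with open(path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=[DATA_KEY, SCHEMA_KEY])
        writer.writeheader()
        writer.writerows(rows)


finished = []  # pipeline instances collected as each one completes


def pipeline_post_task(pipeline):
    """Called with each pipeline instance when that pipeline completes."""
    finished.append(pipeline)


def post_task(batch):
    """Called once with the batch instance after all processing is done."""
    print("batch complete; pipelines finished:", len(finished))


def batch_kwargs():
    """Keyword arguments matching the parameter list above."""
    return {
        "source_type": "csv",
        "data_key": DATA_KEY,
        "schema_key": SCHEMA_KEY,
        "pipeline_options": {},  # assumed to be a dict of Pipeline options
        "post_task": post_task,
        "pipeline_post_task": pipeline_post_task,
    }


# Hypothetical source list with a single data/schema pair.
source_path = os.path.join(tempfile.mkdtemp(), "sources.csv")
write_source_list(source_path, [
    {DATA_KEY: "http://example.com/a.csv",
     SCHEMA_KEY: "http://example.com/a.json"},
])

# Assumption: `pipeline` is imported from this library, `source` is the first
# positional argument, and the batch is started with a run() method.
# batch = pipeline.Batch(source_path, **batch_kwargs())
# batch.run()
```

Keeping the callbacks as plain functions means the same pair can be reused across batches; pipeline_post_task here only records each pipeline, and post_task reports the count once the batch is done.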
For an example of the batch processor at work, including use of post_task and pipeline_post_task, see spd-admin.