Reports¶
The results of any run over data, either by a standalone processor or a pipeline, are written to a report.
Each report is an instance of a tellme.Report, which is a small library we also developed (See the TellMe library for more information on its API).
Reports can then be generated in a variety of output formats supported by TellMe.
Pipeline reports¶
In a pipeline, the pipeline.Pipeline each processor writes report results to the pipeline’s report instance.
After processing of the data is complete, additional calculations are performed for a summary.
Finally, the report is generated to an output format (a Python dict in this case) and returned.
From a top-level view, a pipeline report will have the following structure:
{
'success': True,
'meta': {'name': 'Pipeline'},
'summary': {#summary},
'results': [...]
}
All the interesting stuff is happening in the results array and the sumamry object.
See below for a description of each object in the results array, and likewise a description of the summary object.
Standalone processor reports¶
Standalone processors (for example, the built-in StructureProcessor) have a report object almost identical to that of a pipeline report, except they do not have a summary object.
Report result schema¶
{
'result_type': '# type of this result',
'result_category': '# category of this result (row/header)',
'result_level': '# level of this result (info/warning/error)',
'result_message': '# message of this result',
'result_context': [# a list of the values of the row that the result was generated from]
'row_index': '# index of the row',
'row_name': # 'headers' or valud of id or _id if present, or empty
'column_index': '# index of the column (can be None)',
'column_name': '# name of the column (can be '')',
}
Report summary schema¶
{
'message': '# a summary message',
'total_rows': # int,
'total_columns': # int,
'bad_rows': # int,
'bad_columns': # int,
'columns': [# list of dicts with position, name, type conformance (%) per column]
}