ODE/DataHub Canary Lambda Function Technical Challenges
Challenges and Concerns
- Configuration:
- How do we manage the configuration of this function? How do we manage the configuration of the validation library?
- Supporting multiple message types will require logic for loading and referencing multiple configuration files
- Logic will be required for matching each record to the correct configuration file
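
One possible shape for the record-to-configuration matching above, as a minimal sketch only. The `messageType` field, the message type names, and the config file paths are all hypothetical placeholders; the real record schema and the validation library's config format may differ.

```python
# Hypothetical registry: message type -> validation config file.
# Both the type names and the paths are placeholders for illustration.
CONFIG_BY_MESSAGE_TYPE = {
    "bsm": "config/bsm.ini",
    "tim": "config/tim.ini",
}


def message_type_of(record: dict) -> str:
    """Return the record's message type, lowercased.

    Assumes the record carries a top-level 'messageType' field (hypothetical).
    """
    return str(record.get("messageType", "")).lower()


def config_path_for(record: dict, registry: dict = CONFIG_BY_MESSAGE_TYPE) -> str:
    """Look up the config file for a record, failing loudly on unknown types."""
    message_type = message_type_of(record)
    if message_type not in registry:
        raise ValueError(
            f"no validation config registered for message type {message_type!r}"
        )
    return registry[message_type]
```

Raising on an unknown message type, rather than silently skipping the record, means a new message type cannot go unvalidated without someone noticing.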
- Sampling:
- There is a trade-off between sampling more S3 files and requiring more memory management logic
- S3 list results can only be constrained by key prefix (file path) and cannot be wildcarded, so querying for specific file types will require additional client-side filtering logic
- How do we ensure the sampling covers enough data to be confident in the results?
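
A rough sketch of how the sampling concerns above could be handled, under stated assumptions. `list_keys` takes a boto3-style S3 client and uses the `list_objects_v2` paginator, filtering by suffix on the client side because S3 only filters by prefix. `reservoir_sample` keeps at most `k` keys in memory regardless of how many objects are listed. `samples_needed` is a back-of-the-envelope estimate that assumes independent, uniformly random samples. All function names are illustrative, not part of any existing code.

```python
import math
import random


def list_keys(s3_client, bucket: str, prefix: str, suffix: str):
    """Yield keys under `prefix` that end with `suffix`.

    S3 filters only by prefix on the server side, so the suffix
    (file type) check happens here, on the client.
    """
    paginator = s3_client.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        for obj in page.get("Contents", []):
            if obj["Key"].endswith(suffix):
                yield obj["Key"]


def reservoir_sample(items, k: int, rng=random):
    """Pick up to k items uniformly at random from an iterable of unknown
    length, holding at most k items in memory (reservoir sampling)."""
    sample = []
    for i, item in enumerate(items):
        if i < k:
            sample.append(item)
        else:
            j = rng.randint(0, i)  # inclusive on both ends
            if j < k:
                sample[j] = item
    return sample


def samples_needed(invalid_fraction: float, confidence: float) -> int:
    """Samples needed to see at least one invalid record with the given
    confidence, if `invalid_fraction` of all records are invalid.

    Derived from 1 - (1 - p)^n >= confidence.
    """
    return math.ceil(math.log(1 - confidence) / math.log(1 - invalid_fraction))
```

Usage might look like `reservoir_sample(list_keys(boto3.client("s3"), bucket, prefix, ".json"), k=50)`, where the bucket, prefix, and suffix are placeholders. As a reference point, `samples_needed(0.01, 0.95)` gives 299: catching a 1% invalid rate with 95% confidence takes roughly 300 randomly chosen records.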
- Result Reporting:
- What should be the format of the report?
- Do we want to send reports when no validation failures occurred?
- How can we make it easy for the user to find the invalid data?
- What is the response process when a report contains validation failures?
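
To make the report-format questions above more concrete, here is one hypothetical structure, not a decided format. Every field name is an assumption. The idea is that each failure carries the S3 key and the record's position in the file so the invalid data is easy to locate, and the `ok` flag lets the caller decide whether to send a report when nothing failed.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class Failure:
    s3_key: str        # which sampled file the record came from
    record_index: int  # position of the record within that file
    field_name: str    # field that failed validation
    detail: str        # human-readable reason


@dataclass
class Report:
    files_sampled: int
    records_checked: int
    failures: list = field(default_factory=list)

    @property
    def ok(self) -> bool:
        """True when no validation failures were recorded."""
        return not self.failures

    def to_json(self) -> str:
        """Serialize the report, including the derived `ok` flag."""
        body = asdict(self)
        body["ok"] = self.ok
        return json.dumps(body, indent=2)
```

A caller could skip sending when `report.ok` is true, or send a short "all clear" summary instead; that choice is still one of the open questions.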
- Deployment and Integration:
- How do we package the validation library?
- How do we ensure the validation library is up to date?
- Consider using Travis CI to automate deployment for basic CI/CD (credential management with CSN may be difficult)
- How do we manage the automation? (Serverless CloudFormation templates are a strong candidate)