ODE/DataHub Canary Lambda Function Technical Challenges

Challenges and Concerns

  • Configuration:
    • How do we manage the configuration of this function? How do we manage the configuration of the validation library?
    • Supporting multiple message types will require logic to load and reference multiple configuration files
    • Logic will also be required to match each record to the correct configuration file
  • Sampling:
    • There is a trade-off between sampling more S3 files and requiring more memory management logic
    • S3 list results can only be constrained by key prefix (filepath) and cannot be wildcarded, so selecting specific file types will require additional filtering logic in the function
    • How do we ensure the sampling covers enough data to be confident in the results?
  • Result Reporting:
    • What should be the format of the report?
    • Do we want to send reports when no validation failures occurred?
    • How can we make it easy for the user to find the invalid data?
    • What is the response process when a report contains validation failures?
  • Deployment and Integration:
    • How do we package the validation library?
    • How do we ensure the validation library is up to date?
    • Consider using Travis to automate deployment for basic CI/CD (credential management with CSN may be difficult)
    • How do we manage the automation? (Serverless CloudFormation templates are one option)
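
The Sampling points above (prefix-only listing, file type filtering, and bounding memory use) could be sketched roughly as follows. This is a minimal illustration, not a settled design: the function names list_keys, filter_by_suffix, and sample_keys are hypothetical, and the sketch assumes the function is written in Python and is handed a boto3 S3 client. The only AWS call used is the list_objects_v2 paginator.

```python
import random


def list_keys(s3_client, bucket, prefix):
    """Yield every object key under a prefix, following pagination."""
    paginator = s3_client.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        # "Contents" is absent when a page has no matching objects.
        for obj in page.get("Contents", []):
            yield obj["Key"]


def filter_by_suffix(keys, suffixes):
    """Keep only keys ending in one of the suffixes.

    S3 cannot wildcard on file type, so this filter runs client-side.
    """
    suffixes = tuple(suffixes)
    return [key for key in keys if key.endswith(suffixes)]


def sample_keys(keys, max_files, seed=None):
    """Pick at most max_files keys at random (hypothetical sampling policy)."""
    keys = list(keys)
    if len(keys) <= max_files:
        return keys
    return random.Random(seed).sample(keys, max_files)
```

In the Lambda handler this might be called with a client from boto3.client("s3"), for example sample_keys(filter_by_suffix(list_keys(client, bucket, prefix), [".json"]), max_files=10), where the bucket, prefix, suffix, and sample size are placeholders. Capping the sample size keeps memory use predictable, at the cost of the coverage question raised under Sampling.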