SDC Metadata Bug Analysis and Recovery Report
Overview
This document outlines the results obtained by running the pseudoqueries specified in the ODE Metadata Bug Impact, Resolution, and Prevention document.
WYDOT's production ODE instance was updated to the new version with the metadata bug fix (release 1.0.6) on April 11, 2019 at 6:30PM Mountain Time which corresponds to a timezone-naive UTC timestamp of April 12, 2019, 12:30AM. For the purposes of this document, dateOfBugFix is considered to be the end of the day on 04/11/2019 (midnight 4/12, in UTC time, i.e. 2019-04-12T00:00:00.000Z ISO format).
Note that these timestamps are inclusive; all data on December 3rd and April 11th is analyzed.
Timeline
High-Level Information
Data Store:
Data Provider Source Analyzed: wydot
Findings
Query Number | Query Name | Pseudoquery | Result | Notes |
---|---|---|---|---|
1 | totalRecordCount |
Executed: SELECT COUNT(*) from wydot_bsm_core SELECT COUNT(*) from wydot_tim_metadata | BSM: 6,458,749 TIM: 416,763 | Pay attention to dates in query: has to be "less than", not "less or equal to" as these are string comparisons with times appended (would exclude 04/11/2019 if done "less than or equal" 04/11/2019) |
2 | timBroadcastRecordCount |
Executed: SELECT COUNT(*) from wydot_tim_metadata | 74,843 | |
3 | goodOtherRecordCount | SELECT COUNT(*) WHERE metadata.odeReceivedAt >= 2/13/2019 AND metadata.odeReceivedAt < dateOfBugFix AND metadata.payloadType != us.dot.its.jpo.ode.model.OdeBsmPayload AND metadata.recordGeneratedBy != TMC Executed: SELECT COUNT(*) from wydot_bsm_core SELECT COUNT(*) from wydot_tim_metadata | BSM: 0 TIM: 226,457 | |
4 | badBsmRecordCount | SELECT COUNT(*) WHERE metadata.odeReceivedAt >= 12/3/2018 AND metadata.odeReceivedAt < dateOfBugFix AND metadata.payloadType == us.dot.its.jpo.ode.model.OdeBsmPayload Executed: SELECT COUNT(*) from wydot_bsm_core | 6,458,749 | |
5 | badOtherRecordCount | SELECT COUNT(*) WHERE metadata.odeReceivedAt >= 12/3/2018 AND metadata.odeReceivedAt <= 2/12/2019 AND metadata.recordGeneratedBy != TMC AND metadata.payloadType != us.dot.its.jpo.ode.model.OdeBsmPayload Executed: SELECT COUNT(*) from wydot_tim_metadata | BSM: 0 TIM: 115,463 | BSM covered by #3 |
6 | removableRecordCount | removableRecordCount = badBsmRecordCount + badOtherRecordCount | BSM: 6,458,749 TIM: 115,463 | SDC/CVP has BSM and TIM in separate tables |
7 | totalRecordCount (correctness) | totalRecordCount equal to (==) (timBroadcastRecordCount + removableRecordCount + goodOtherRecordCount)
| EQUAL | Left side: 416,763 Right side: 416,763 |
8 | earliestGeneratedAt |
| ||
9 | latestGeneratedAt |
| ||
10 | invalidRecordCount | invalidRecordCount = removableRecordCount | ||
11 | invalidS3FileCount |
| ||
12 | invalidS3Files |
| ||
13 | listOfLogFilesBefore |
| ||
14 | validRecordCount | SELECT COUNT(*) WHERE metadata.odeReceivedAt >= 2/13/2019 AND AND metadata.recordGeneratedBy != TMC AND
| ||
15 | validS3FileCount | SELECT COUNT(s3-filename) WHERE metadata.odeReceivedAt >= 2/13/2019 AND AND metadata.recordGeneratedBy != TMC AND [ | ||
16 | validS3Files | SELECT s3FileName WHERE AND metadata.recordGeneratedBy != TMC [AND metadata.serialId.bundleSize > 1 ] | ||
17 | listOfLogFilesAfter | SELECT metadata.logFilename WHERE AND metadata.recordGeneratedBy != TMC [ |
Data Cleanup Actions
The data cleanup process has three main steps: identifying the bad data, deleting it, and confirming success. The above queries have accomplished the first step of identifying the bad data. In this section we will discuss the process of safely and correctly deleting the bad data.
Step 1: Identify scope of bad data
This has been completed by the queries above and in particular the list of files generated by query 12.
Step 2: Delete the bad data
TBD
Step 3: Verify cleanup
TBD