Overview
This document outlines the results obtained by running the pseudoqueries specified in the ODE Metadata Bug Impact, Resolution, and Prevention document.
...
Query Number | Query Name | Pseudoquery | Result | Notes |
---|---|---|---|---|
1 | totalRecordCount | SELECT COUNT(*) WHERE AND
| 3,601,067 | 28,730 records found in these files DID NOT satisfy the query constraint. |
2 | timBroadcastRecordCount | SELECT COUNT(*) WHERE | 95,065 | 3,534,732 did not satisfy constraint |
3 | goodOtherRecordCount | SELECT COUNT(*) WHERE metadata.odeReceivedAt >= 2/13/2019 AND metadata.odeReceivedAt < dateOfBugFix AND metadata.payloadType != us.dot.its.jpo.ode.model.OdeBsmPayload AND metadata.recordGeneratedBy != TMC | 0 | 3,629,797 did not satisfy constraint |
4 | badBsmRecordCount | SELECT COUNT(*) WHERE metadata.odeReceivedAt >= 12/3/2018 AND metadata.odeReceivedAt < dateOfBugFix AND metadata.payloadType == us.dot.its.jpo.ode.model.OdeBsmPayload | 3,506,002 | 123,795 did not satisfy constraint |
5 | badOtherRecordCount | SELECT COUNT(*) WHERE metadata.odeReceivedAt >= 12/3/2018 AND metadata.odeReceivedAt <= 2/12/2019 AND metadata.recordGeneratedBy != TMC AND metadata.payloadType != us.dot.its.jpo.ode.model.OdeBsmPayload | 0 | 3,629,797 did not satisfy constraint |
6 | removableRecordCount | removableRecordCount = badBsmRecordCount + badOtherRecordCount | 3,506,002 + 0 = 3,506,002 | |
7 | totalRecordCount (correctness) | totalRecordCount equal to (==) (timBroadcastRecordCount + removableRecordCount + goodOtherRecordCount) | totalRecordCount = 3,601,067 timBroadcastRecordCount + removableRecordCount + goodOtherRecordCount = 95,065 + 3,506,002 + 0 = 3,601,067 | (Correct) |
8 | earliestGeneratedAt |
| 1970-01-01 00:00:00 | From null data. |
9 | latestGeneratedAt |
| 2019-04-11 15:45:58.766000 | |
10 | invalidRecordCount | invalidRecordCount = removableRecordCount | 3,506,002 | For clarification only. |
11 | invalidS3FileCount |
| 1,488 files | |
12 | invalidS3Files |
| ||
13 | listOfLogFilesBefore |
| 4,754 files | |
14 | validRecordCount | SELECT COUNT(*) WHERE metadata.odeReceivedAt >= 2/13/2019 AND AND metadata.recordGeneratedBy != TMC AND
| 2,012,382 | 1,617,415 records did not satisfy query constraints |
15 | validS3FileCount | SELECT COUNT(s3-filename) WHERE metadata.odeReceivedAt >= 2/13/2019 AND AND metadata.recordGeneratedBy != TMC AND [ | 920 | |
16 | validS3Files | SELECT s3FileName WHERE AND metadata.recordGeneratedBy != TMC [AND metadata.serialId.bundleSize > 1 ] | valid_s3_file_list_fixed.txt | |
17 | listOfLogFilesAfter | SELECT metadata.logFilename WHERE AND metadata.recordGeneratedBy != TMC [ | 3,087 files |
...
Step 2: Delete the bad data (dry run on local data)
For each file listed in the invalid_s3_file_list_fixed.txt
above, the file in the local clone of the S3 bucket will be deleted.
Step 3: Verify cleanup (dry run on local data)
The queries in the Pseudo-queries for Validating the Clean Up section of the ODE Metadata Bug Impact, Resolution, and Prevention document will be run locally and asserted that actual outputs match expected outputs.
...
Step 4: Delete the bad data (real data in on S3 Sandbox)
For each file listed in the invalid_s3_file_list_fixed.txt
above, the file in the S3 bucket will be deleted.
Step 5: Verify cleanup (real data in on S3 Sandbox)
The queries in the Pseudo-queries for Validating the Clean Up section of the ODE Metadata Bug Impact, Resolution, and Prevention document will be run against the S3 bucket and asserted that actual outputs match expected outputs.
...