Versions Compared

Key

  • This line was added.
  • This line was removed.
  • Formatting was changed.

Overview

This document outlines the results obtained by running the pseudoqueries specified in the ODE Metadata Bug Impact, Resolution, and Prevention document.

...

Query NumberQuery NamePseudoqueryResultNotes
1totalRecordCountSELECT COUNT(*) WHERE metadata.odeReceivedAt >= 12/3/2018 AND metadata.odeReceivedAt < dateOfBugFix3,601,06728,730 records found in these files DID NOT satisfy the query constraint.
2timBroadcastRecordCountSELECT COUNT(*) WHERE metadata.odeReceivedAt >= 12/3/2018 AND metadata.odeReceivedAt < dateOfBugFix AND metadata.recordGeneratedBy == TMC 95,0653,534,732 did not satisfy constraint
3goodOtherRecordCountSELECT COUNT(*) WHERE metadata.odeReceivedAt >= 2/13/2019 AND metadata.odeReceivedAt < dateOfBugFix AND metadata.payloadType != us.dot.its.jpo.ode.model.OdeBsmPayload AND metadata.recordGeneratedBy != TMC03,629,797 did not satisfy constraint
4badBsmRecordCountSELECT COUNT(*) WHERE metadata.odeReceivedAt >= 12/3/2018 AND metadata.odeReceivedAt < dateOfBugFix AND metadata.payloadType == us.dot.its.jpo.ode.model.OdeBsmPayload3,506,002123,795 did not satisfy constraint
5badOtherRecordCountSELECT COUNT(*) WHERE metadata.odeReceivedAt >= 12/3/2018 AND metadata.odeReceivedAt <= 2/12/2019 AND metadata.recordGeneratedBy != TMC AND metadata.payloadType != us.dot.its.jpo.ode.model.OdeBsmPayload03,629,797 did not satisfy constraint
6removableRecordCountremovableRecordCount = badBsmRecordCount + badOtherRecordCount3,506,002 + 0 = 3,506,002

7

totalRecordCount (correctness)totalRecordCount equal to (==) (timBroadcastRecordCount  + removableRecordCount + goodOtherRecordCount)

totalRecordCount = 3,601,067

timBroadcastRecordCount  + removableRecordCount + goodOtherRecordCount = 95,065 + 3,506,002 + 0 = 3,601,067

3,601,067 ==  3,601,067

(Correct)
8earliestGeneratedAt

SELECT MIN(metadata.recordGeneratedAt) WHERE 

(metadata.odeReceivedAt >= 12/3/2018 AND metadata.odeReceivedAt <= 2/12/2019 AND metadata.recordGeneratedBy != TMC AND metadata.payloadType != us.dot.its.jpo.ode.model.OdeBsmPayload) OR

(metadata.odeReceivedAt >= 12/3/2018 AND metadata.odeReceivedAt < dateOfBugFix AND metadata.payloadType == us.dot.its.jpo.ode.model.OdeBsmPayload)

1970-01-01 00:00:00From null data.
9latestGeneratedAt

SELECT MAX(metadata.recordGeneratedAt) WHERE 

(metadata.odeReceivedAt >= 12/3/2018 AND metadata.odeReceivedAt <= 2/12/2019 AND metadata.recordGeneratedBy != TMC AND metadata.payloadType != us.dot.its.jpo.ode.model.OdeBsmPayload) OR

(metadata.odeReceivedAt >= 12/3/2018 AND metadata.odeReceivedAt < dateOfBugFix AND metadata.payloadType == us.dot.its.jpo.ode.model.OdeBsmPayload)

2019-04-11 15:45:58.766000
10 invalidRecordCountinvalidRecordCount = removableRecordCount3,506,002For clarification only.
11invalidS3FileCount

SELECT COUNT(s3-filename) WHERE 

(metadata.odeReceivedAt >= 12/3/2018 AND metadata.odeReceivedAt <= 2/12/2019 AND metadata.recordGeneratedBy != TMC AND metadata.payloadType != us.dot.its.jpo.ode.model.OdeBsmPayload) OR

(metadata.odeReceivedAt >= 12/3/2018 AND metadata.odeReceivedAt < dateOfBugFix AND metadata.payloadType == us.dot.its.jpo.ode.model.OdeBsmPayload)

1,488 files
12invalidS3Files

SELECT s3FileName WHERE 

(metadata.odeReceivedAt >= 12/3/2018 AND metadata.odeReceivedAt <= 2/12/2019 AND metadata.recordGeneratedBy != TMC AND metadata.payloadType != us.dot.its.jpo.ode.model.OdeBsmPayload) OR

(metadata.odeReceivedAt >= 12/3/2018 AND metadata.odeReceivedAt < dateOfBugFix AND metadata.payloadType == us.dot.its.jpo.ode.model.OdeBsmPayload)

invalid_s3_file_list_fixed.txt


13listOfLogFilesBefore

SELECT metadata.logFilename WHERE 

(metadata.odeReceivedAt >= 12/3/2018 AND metadata.odeReceivedAt <= 2/12/2019 AND metadata.recordGeneratedBy != TMC AND metadata.payloadType != us.dot.its.jpo.ode.model.OdeBsmPayload) OR

(metadata.odeReceivedAt >= 12/3/2018 AND metadata.odeReceivedAt < dateOfBugFix AND metadata.payloadType == us.dot.its.jpo.ode.model.OdeBsmPayload)

4,754 files


invalid_log_file_list.txt


14validRecordCountSELECT COUNT(*) WHERE metadata.odeReceivedAt >= 2/13/2019 AND metadata.odeReceivedAt < dateOfBugFix AND metadata.recordGeneratedBy != TMC AND metadata.recordGeneratedAt >= earliestGeneratedAt AND metadata.recordGeneratedAt <= latestGeneratedAt [AND metadata.serialId.bundleSize > 1]2,012,3821,617,415 records did not satisfy query constraints
15validS3FileCountSELECT COUNT(s3-filename) WHERE metadata.odeReceivedAt >= 2/13/2019 AND metadata.odeReceivedAt < dateOfBugFix AND metadata.recordGeneratedBy != TMC AND metadata.recordGeneratedAt >= earliestGeneratedAt AND metadata.recordGeneratedAt <= latestGeneratedAt [AND metadata.serialId.bundleSize > 1]920
16validS3FilesSELECT s3FileName WHERE metadata.odeReceivedAt >= 2/13/2019 AND metadata.odeReceivedAt < dateOfBugFix AND metadata.recordGeneratedBy != TMC [AND metadata.serialId.bundleSize > 1]valid_s3_file_list_fixed.txt
17listOfLogFilesAfterSELECT metadata.logFilename WHERE metadata.odeReceivedAt >= 2/13/2019 AND metadata.odeReceivedAt < dateOfBugFix AND metadata.recordGeneratedBy != TMC [AND metadata.serialId.bundleSize > 1]

3,087 files

valid_log_file_list.txt


...

Step 2: Delete the bad data (dry run on local data)

For each file listed in the invalid_s3_file_list_fixed.txt above, the file in the local clone of the S3 bucket will be deleted.

Step 3: Verify cleanup (dry run on local data)

The queries in the Pseudo-queries for Validating the Clean Up section of the  ODE Metadata Bug Impact, Resolution, and Prevention document will be run locally and asserted that actual outputs match expected outputs.

...

Step 4: Delete the bad data (real data in on S3 Sandbox)

For each file listed in the invalid_s3_file_list_fixed.txt above, the file in the S3 bucket will be deleted.

Step 5: Verify cleanup (real data in on S3 Sandbox)

The queries in the Pseudo-queries for Validating the Clean Up section of the  ODE Metadata Bug Impact, Resolution, and Prevention document will be run against the S3 bucket and asserted that actual outputs match expected outputs.

...