...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
Overview
This document outlines the results obtained by running the pseudoqueries specified in the ODE Metadata Bug Impact, Resolution, and Prevention document.
...
Note that these timestamps are inclusive; all data on December 3rd and April 11th is analyzed.
Timeline
High-Level Information
Data Store:
Data Provider Source Analyzed: wydot
Query Number | Query Name | Pseudoquery | Result | Notes |
---|
1 | totalRecordCount | SELECT COUNT(*) WHERE metadata.odeReceivedAt >= 12/3/2018 AND metadata.odeReceivedAt < dateOfBugFix
Executed: SELECT COUNT(*) from wydot_bsm_core WHERE metadataodeReceivedAt >= '2018-12-03' AND metadataodeReceivedAt < '2019-04-12'; SELECT COUNT(*) from wydot_tim_metadata WHERE metadataodeReceivedAt >= '2018-12-03' AND metadataodeReceivedAt < '2019-04-12'; | BSM: 6,458,749 TIM: 416,763 | Pay attention to dates in query: has to be "less than", not "less or equal to" as these are string comparisons with times appended (would exclude 04/11/2019 if done "less than or equal" 04/11/2019) |
2 | timBroadcastRecordCount | SELECT COUNT(*) WHERE metadata.odeReceivedAt >= 12/3/2018 AND metadata.odeReceivedAt < dateOfBugFix AND metadata.recordGeneratedBy == TMC
Executed: SELECT COUNT(*) from wydot_tim_metadata WHERE metadataodeReceivedAt >= '2018-12-03' AND metadataodeReceivedAt < '2019-04-12' AND metadatarecordGeneratedBy = 'TMC';
|
74,843 |
|
3 | goodOtherRecordCount | SELECT COUNT(*) WHERE metadata.odeReceivedAt >= 2/13/2019 AND metadata.odeReceivedAt < dateOfBugFix AND metadata.payloadType != us.dot.its.jpo.ode.model.OdeBsmPayload AND metadata.recordGeneratedBy != TMC
Executed: SELECT COUNT(*) from wydot_bsm_core WHERE metadataodeReceivedAt >= '2019-02-13' AND metadataodeReceivedAt < '2019-04-12' AND metadatapayloadType != 'us.dot.its.jpo.ode.model.OdeBsmPayload' AND metadatarecordGeneratedBy != 'TMC'; SELECT COUNT(*) from wydot_tim_metadata WHERE metadataodeReceivedAt >= '2019-02-13' AND metadataodeReceivedAt < '2019-04-12' AND metadatapayloadType != 'us.dot.its.jpo.ode.model.OdeBsmPayload' AND metadatarecordGeneratedBy != 'TMC';
| BSM: 0 TIM: 226,457 |
|
4 | badBsmRecordCount | SELECT COUNT(*) WHERE metadata.odeReceivedAt >= 12/3/2018 AND metadata.odeReceivedAt < dateOfBugFix AND metadata.payloadType == us.dot.its.jpo.ode.model.OdeBsmPayload
Executed: SELECT COUNT(*) from wydot_bsm_core WHERE metadataodeReceivedAt >= '2018-12-03' AND metadataodeReceivedAt < '2019-04-12' AND metadatapayloadType = 'us.dot.its.jpo.ode.model.OdeBsmPayload';
| 6,458,749 |
|
5 | badOtherRecordCount | SELECT COUNT(*) WHERE metadata.odeReceivedAt >= 12/3/2018 AND metadata.odeReceivedAt <= 2/12/2019 AND metadata.recordGeneratedBy != TMC AND metadata.payloadType != us.dot.its.jpo.ode.model.OdeBsmPayload
Executed: SELECT COUNT(*) from wydot_tim_metadata WHERE metadataodeReceivedAt >= '2018-12-03' AND metadataodeReceivedAt < '2019-02-13' AND metadatapayloadType != 'us.dot.its.jpo.ode.model.OdeBsmPayload' AND metadatarecordGeneratedBy != 'TMC';
| BSM: 0 TIM: 115,463 | BSM covered by #3
|
6 | removableRecordCount | removableRecordCount = badBsmRecordCount + badOtherRecordCount
| BSM: 6,458,749 TIM: 115,463 | SDC/CVP has BSM and TIM in separate tables |
7 | totalRecordCount (correctness) | totalRecordCount equal to (==) (timBroadcastRecordCount + removableRecordCount + goodOtherRecordCount)
6,458,749 + 416,763 ==? 74,843 + 6,458,749 + 115,463 + 226,457
| EQUAL | Left side: 416,763 Right side: 416,763 |
8 | earliestGeneratedAt | SELECT MIN(metadata.recordGeneratedAt) WHERE
(metadata.odeReceivedAt >= 12/3/2018 AND metadata.odeReceivedAt <= 2/12/2019 AND metadata.recordGeneratedBy != TMC AND
metadata.payloadType != us.dot.its.jpo.ode.model.OdeBsmPayload
) OR
(metadata.odeReceivedAt >= 12/3/2018 AND metadata.odeReceivedAt < dateOfBugFix AND metadata.payloadType == us.dot.its.jpo.ode.model.OdeBsmPayload
)
|
|
|
9 | latestGeneratedAt | SELECT MAX(metadata.recordGeneratedAt) WHERE
(metadata.odeReceivedAt >= 12/3/2018 AND metadata.odeReceivedAt <= 2/12/2019 AND metadata.recordGeneratedBy != TMC AND
metadata.payloadType != us.dot.its.jpo.ode.model.OdeBsmPayload
) OR
(metadata.odeReceivedAt >= 12/3/2018 AND metadata.odeReceivedAt < dateOfBugFix AND metadata.payloadType == us.dot.its.jpo.ode.model.OdeBsmPayload
)
|
|
|
10 | invalidRecordCount | invalidRecordCount = removableRecordCount |
|
|
11 | invalidS3FileCount | SELECT COUNT(s3-filename) WHERE
(metadata.odeReceivedAt >= 12/3/2018 AND metadata.odeReceivedAt <= 2/12/2019 AND metadata.recordGeneratedBy != TMC AND
metadata.payloadType != us.dot.its.jpo.ode.model.OdeBsmPayload) OR
(metadata.odeReceivedAt >= 12/3/2018 AND metadata.odeReceivedAt < dateOfBugFix AND metadata.payloadType == us.dot.its.jpo.ode.model.OdeBsmPayload
)
|
|
|
12 | invalidS3Files | SELECT s3FileName WHERE
(metadata.odeReceivedAt >= 12/3/2018 AND metadata.odeReceivedAt <= 2/12/2019 AND metadata.recordGeneratedBy != TMC AND
metadata.payloadType != us.dot.its.jpo.ode.model.OdeBsmPayload) OR
(metadata.odeReceivedAt >= 12/3/2018 AND metadata.odeReceivedAt < dateOfBugFix AND metadata.payloadType == us.dot.its.jpo.ode.model.OdeBsmPayload
)
|
|
|
13 | listOfLogFilesBefore | SELECT metadata.logFilename WHERE
(metadata.odeReceivedAt >= 12/3/2018 AND metadata.odeReceivedAt <= 2/12/2019 AND metadata.recordGeneratedBy != TMC AND
metadata.payloadType != us.dot.its.jpo.ode.model.OdeBsmPayload) OR
(metadata.odeReceivedAt >= 12/3/2018 AND metadata.odeReceivedAt < dateOfBugFix AND metadata.payloadType == us.dot.its.jpo.ode.model.OdeBsmPayload
)
|
|
|
14 | validRecordCount | SELECT COUNT(*) WHERE metadata.odeReceivedAt >= 2/13/2019 AND metadata.odeReceivedAt < dateOfBugFix
AND metadata.recordGeneratedBy != TMC AND metadata.recordGeneratedAt >= earliestGeneratedAt AND metadata.recordGeneratedAt <= latestGeneratedAt [AND metadata.serialId.bundleSize > 1 ]
|
|
|
15 | validS3FileCount | SELECT COUNT(s3-filename) WHERE metadata.odeReceivedAt >= 2/13/2019 AND metadata.odeReceivedAt < dateOfBugFix AND metadata.recordGeneratedBy != TMC AND metadata.recordGeneratedAt >= earliestGeneratedAt AND metadata.recordGeneratedAt <= latestGeneratedAt
[AND metadata.serialId.bundleSize > 1 ] |
|
|
16 | validS3Files | SELECT s3FileName WHERE metadata.odeReceivedAt >= 2/13/2019 AND metadata.odeReceivedAt < dateOfBugFix
AND metadata.recordGeneratedBy != TMC [AND metadata.serialId.bundleSize > 1 ] |
|
|
17 | listOfLogFilesAfter | SELECT metadata.logFilename WHERE metadata.odeReceivedAt >= 2/13/2019 AND metadata.odeReceivedAt < dateOfBugFix
AND metadata.recordGeneratedBy != TMC [AND metadata.serialId.bundleSize > 1 ] |
|
|
The data cleanup process has three main steps: identifying the bad data, deleting it, and confirming success. The above queries have accomplished the first step of identifying the bad data. In this section we will discuss the process of safely and correctly deleting the bad data.
...