SDC Metadata Bug Analysis and Recovery Report

Overview

This document outlines the results obtained by running the pseudoqueries specified in the ODE Metadata Bug Impact, Resolution, and Prevention document.

WYDOT's production ODE instance was updated to the new version with the metadata bug fix (release 1.0.6) on April 11, 2019 at 6:30PM Mountain Time which corresponds to a timezone-naive UTC timestamp of April 12, 2019, 12:30AM. For the purposes of this document, dateOfBugFix is considered to be the end of the day on 04/11/2019 (midnight 4/12, in UTC time, i.e. 2019-04-12T00:00:00.000Z ISO format).


Note that these timestamps are inclusive; all data on December 3rd and April 11th is analyzed.

Timeline


High-Level Information

Data Store:

Data Provider Source Analyzed: wydot

Findings

Query NumberQuery NamePseudoqueryResultNotes
1totalRecordCount

SELECT COUNT(*) WHERE metadata.odeReceivedAt >= 12/3/2018 AND metadata.odeReceivedAt < dateOfBugFix


Executed:

SELECT COUNT(*) from wydot_bsm_core
WHERE metadataodeReceivedAt >= '2018-12-03' AND metadataodeReceivedAt < '2019-04-12';

SELECT COUNT(*) from wydot_tim_metadata
WHERE metadataodeReceivedAt >= '2018-12-03' AND metadataodeReceivedAt < '2019-04-12';

BSM: 6,458,749

TIM: 416,763

Pay attention to dates in query: has to be "less than", not "less or equal to" as these are string comparisons with times appended (would exclude 04/11/2019 if done "less than or equal" 04/11/2019)
2timBroadcastRecordCount

SELECT COUNT(*) WHERE metadata.odeReceivedAt >= 12/3/2018 AND metadata.odeReceivedAt < dateOfBugFix AND metadata.recordGeneratedBy == TMC 


Executed:

SELECT COUNT(*) from wydot_tim_metadata
WHERE metadataodeReceivedAt >= '2018-12-03' AND metadataodeReceivedAt < '2019-04-12'
AND metadatarecordGeneratedBy = 'TMC';



74,843


3goodOtherRecordCount

SELECT COUNT(*) WHERE metadata.odeReceivedAt >= 2/13/2019 AND metadata.odeReceivedAt < dateOfBugFix AND metadata.payloadType != us.dot.its.jpo.ode.model.OdeBsmPayload AND metadata.recordGeneratedBy != TMC


Executed:

SELECT COUNT(*) from wydot_bsm_core
WHERE metadataodeReceivedAt >= '2019-02-13' AND metadataodeReceivedAt < '2019-04-12'
AND metadatapayloadType != 'us.dot.its.jpo.ode.model.OdeBsmPayload' AND metadatarecordGeneratedBy != 'TMC';

SELECT COUNT(*) from wydot_tim_metadata
WHERE metadataodeReceivedAt >= '2019-02-13' AND metadataodeReceivedAt < '2019-04-12'
AND metadatapayloadType != 'us.dot.its.jpo.ode.model.OdeBsmPayload' AND metadatarecordGeneratedBy != 'TMC';


BSM: 0

TIM: 226,457


4badBsmRecordCount

SELECT COUNT(*) WHERE metadata.odeReceivedAt >= 12/3/2018 AND metadata.odeReceivedAt < dateOfBugFix AND metadata.payloadType == us.dot.its.jpo.ode.model.OdeBsmPayload


Executed:

SELECT COUNT(*) from wydot_bsm_core
WHERE metadataodeReceivedAt >= '2018-12-03' AND metadataodeReceivedAt < '2019-04-12'
AND metadatapayloadType = 'us.dot.its.jpo.ode.model.OdeBsmPayload';


6,458,749
5badOtherRecordCount

SELECT COUNT(*) WHERE metadata.odeReceivedAt >= 12/3/2018 AND metadata.odeReceivedAt <= 2/12/2019 AND metadata.recordGeneratedBy != TMC AND metadata.payloadType != us.dot.its.jpo.ode.model.OdeBsmPayload


Executed:

SELECT COUNT(*) from wydot_tim_metadata
WHERE metadataodeReceivedAt >= '2018-12-03' AND metadataodeReceivedAt < '2019-02-13'
AND metadatapayloadType != 'us.dot.its.jpo.ode.model.OdeBsmPayload' AND metadatarecordGeneratedBy != 'TMC';


BSM: 0

TIM: 115,463

BSM covered by #3



6removableRecordCount

removableRecordCount = badBsmRecordCount + badOtherRecordCount


BSM: 6,458,749

TIM: 115,463

SDC/CVP has BSM and TIM in separate tables

7

totalRecordCount (correctness)

totalRecordCount equal to (==) (timBroadcastRecordCount  + removableRecordCount + goodOtherRecordCount)


6,458,749 + 416,763 ==?  74,843 + 6,458,749 + 115,463 + 226,457


EQUAL

Left side: 416,763

Right side: 416,763

8earliestGeneratedAt

SELECT MIN(metadata.recordGeneratedAt) WHERE 

(metadata.odeReceivedAt >= 12/3/2018 AND metadata.odeReceivedAt <= 2/12/2019 AND metadata.recordGeneratedBy != TMC AND metadata.payloadType != us.dot.its.jpo.ode.model.OdeBsmPayload) OR

(metadata.odeReceivedAt >= 12/3/2018 AND metadata.odeReceivedAt < dateOfBugFix AND metadata.payloadType == us.dot.its.jpo.ode.model.OdeBsmPayload)



9latestGeneratedAt

SELECT MAX(metadata.recordGeneratedAt) WHERE 

(metadata.odeReceivedAt >= 12/3/2018 AND metadata.odeReceivedAt <= 2/12/2019 AND metadata.recordGeneratedBy != TMC AND metadata.payloadType != us.dot.its.jpo.ode.model.OdeBsmPayload) OR

(metadata.odeReceivedAt >= 12/3/2018 AND metadata.odeReceivedAt < dateOfBugFix AND metadata.payloadType == us.dot.its.jpo.ode.model.OdeBsmPayload)



10 invalidRecordCountinvalidRecordCount = removableRecordCount

11invalidS3FileCount

SELECT COUNT(s3-filename) WHERE 

(metadata.odeReceivedAt >= 12/3/2018 AND metadata.odeReceivedAt <= 2/12/2019 AND metadata.recordGeneratedBy != TMC AND metadata.payloadType != us.dot.its.jpo.ode.model.OdeBsmPayload) OR

(metadata.odeReceivedAt >= 12/3/2018 AND metadata.odeReceivedAt < dateOfBugFix AND metadata.payloadType == us.dot.its.jpo.ode.model.OdeBsmPayload)



12invalidS3Files

SELECT s3FileName WHERE 

(metadata.odeReceivedAt >= 12/3/2018 AND metadata.odeReceivedAt <= 2/12/2019 AND metadata.recordGeneratedBy != TMC AND metadata.payloadType != us.dot.its.jpo.ode.model.OdeBsmPayload) OR

(metadata.odeReceivedAt >= 12/3/2018 AND metadata.odeReceivedAt < dateOfBugFix AND metadata.payloadType == us.dot.its.jpo.ode.model.OdeBsmPayload)



13listOfLogFilesBefore

SELECT metadata.logFilename WHERE 

(metadata.odeReceivedAt >= 12/3/2018 AND metadata.odeReceivedAt <= 2/12/2019 AND metadata.recordGeneratedBy != TMC AND metadata.payloadType != us.dot.its.jpo.ode.model.OdeBsmPayload) OR

(metadata.odeReceivedAt >= 12/3/2018 AND metadata.odeReceivedAt < dateOfBugFix AND metadata.payloadType == us.dot.its.jpo.ode.model.OdeBsmPayload)





14validRecordCountSELECT COUNT(*) WHERE metadata.odeReceivedAt >= 2/13/2019 AND metadata.odeReceivedAt < dateOfBugFix AND metadata.recordGeneratedBy != TMC AND metadata.recordGeneratedAt >= earliestGeneratedAt AND metadata.recordGeneratedAt <= latestGeneratedAt [AND metadata.serialId.bundleSize > 1]

15validS3FileCountSELECT COUNT(s3-filename) WHERE metadata.odeReceivedAt >= 2/13/2019 AND metadata.odeReceivedAt < dateOfBugFix AND metadata.recordGeneratedBy != TMC AND metadata.recordGeneratedAt >= earliestGeneratedAt AND metadata.recordGeneratedAt <= latestGeneratedAt [AND metadata.serialId.bundleSize > 1]

16validS3FilesSELECT s3FileName WHERE metadata.odeReceivedAt >= 2/13/2019 AND metadata.odeReceivedAt < dateOfBugFix AND metadata.recordGeneratedBy != TMC [AND metadata.serialId.bundleSize > 1]

17listOfLogFilesAfterSELECT metadata.logFilename WHERE metadata.odeReceivedAt >= 2/13/2019 AND metadata.odeReceivedAt < dateOfBugFix AND metadata.recordGeneratedBy != TMC [AND metadata.serialId.bundleSize > 1]



Data Cleanup Actions


The data cleanup process has three main steps: identifying the bad data, deleting it, and confirming success. The above queries have accomplished the first step of identifying the bad data. In this section we will discuss the process of safely and correctly deleting the bad data.


Step 1: Identify scope of bad data

This has been completed by the queries above and in particular the list of files generated by query 12.

Step 2: Delete the bad data

TBD

Step 3: Verify cleanup

TBD