"The temptation to form premature theories upon insufficient data is the bane of our profession" - Sherlock Holmes
Step 3: Quality control
We were getting close, but visualizing the data showed the need for a row-by-row quality review. Multiple pins appeared more than 100 miles from the municipality we were analyzing. We focused there first and uncovered ZIP codes with transposed digits.
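As a rough illustration of how far-away pins can be flagged (not the team's actual tooling), a haversine distance check against the municipality's centroid works well; the centroid coordinates, field names, and 100-mile threshold below are assumptions:

```python
import math

# Hypothetical municipality centroid; example coordinates, not from the article.
MUNI_LAT, MUNI_LON = 40.7128, -74.0060
THRESHOLD_MILES = 100  # pins farther than this get flagged for review

def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points, in miles."""
    r = 3958.8  # mean Earth radius in miles
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

def flag_outliers(rows):
    """Return rows whose geocoded pin lands over THRESHOLD_MILES away."""
    flagged = []
    for row in rows:
        d = haversine_miles(MUNI_LAT, MUNI_LON, row["lat"], row["lon"])
        if d > THRESHOLD_MILES:
            flagged.append({**row, "distance_miles": round(d, 1)})
    return flagged
```

Rows that survive this filter still need eyeballing, but it shrinks the row-by-row review to the handful of pins worth investigating, which is where the transposed ZIP digits surfaced.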
Because we still had the original data and had only extended it, we could review and compare outliers easily. Other data elements were self-correcting: cross streets listed as either "street A / street B" or "street B / street A" resolved to the same coordinates.
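A quick sketch of why the ordering is self-correcting: if the geocoder treats both orderings identically, sorting the two street names yields one canonical key for matching and deduplication. The separator and field format here are assumptions:

```python
def canonical_cross_street(cross: str) -> str:
    """Normalize 'Street A / Street B' and 'Street B / Street A' to one key."""
    parts = [p.strip().lower() for p in cross.split("/")]
    return " / ".join(sorted(parts))

# Both orderings collapse to the same canonical key.
assert canonical_cross_street("Main St / Oak Ave") == canonical_cross_street("Oak Ave / Main St")
```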
Quality control for data requires constant, repetitive assessment. Perfect data, like perfect testing, comes at too high a cost; directionally significant insights can be drawn from imperfect data with far less time and effort. Given the data elements and the errors in the source data, we looped through the review cycle six times. Each time we identified the root cause of a data error, we automated a correction, reducing the errors in each subsequent run.
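That review cycle can be framed as a loop: apply the correction rules accumulated so far, re-flag suspect rows, investigate what remains, and add a new rule for the next pass. A minimal sketch of that shape, with hypothetical rule and flagging functions (the six-pass cap mirrors the count above):

```python
def review_cycle(rows, rules, flag_fn, max_passes=6):
    """Repeatedly apply automated corrections and re-check for errors.

    rules:   list of functions row -> row, each fixing one known root cause
    flag_fn: function rows -> list of suspect rows,
             e.g. the flag_outliers distance check sketched earlier
    """
    for pass_num in range(1, max_passes + 1):
        for rule in rules:
            rows = [rule(row) for row in rows]
        suspects = flag_fn(rows)
        print(f"pass {pass_num}: {len(suspects)} rows still flagged")
        if not suspects:
            break
        # Manual step between passes: investigate the suspects, identify
        # the root cause, and append a new correction rule.
    return rows
```

Each new rule encodes a root cause found by hand, which is why every subsequent run flags fewer rows.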
Our data is not error-free, but after review we are confident that the remaining errors are not materially significant.
By: Patrick Grant, Director of Public Sector Sales