Data Quality stories

Yesterday I went to an IDC conference in London on business intelligence and integration, but fate seemed not to be on my side. I had gone mainly to see a presentation on MDM by Deloitte, but when I turned up it was missing from the agenda entirely (the presenter had pulled out with less than two days’ notice). The presenter of the next talk I wanted to see had been taken ill, so someone else read his slides (which never ends well), and the third presentation I fancied was also cancelled and replaced with an unrelated substitute. This was in no way the organiser’s fault, but it was a shame.

The most entertaining thing I saw was a presentation by a consultancy firm on some real-life data quality situations they had encountered at clients, showing just how tricky data quality algorithms can be. There was the “false positive” of two customer records, both A. Smith, both at the same address and both with the same date of birth. Not unreasonably, the data quality algorithm rejected the second record as an obvious duplicate. It turned out that they were twins, Alice and Anthea.
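To make the twins case concrete, here is a minimal sketch in Python. The presentation did not say how the client's algorithm actually matched records, so this assumes it compared initial, surname, address and date of birth. The `Customer` fields, the two key functions, and the address and date used are all hypothetical, made up for illustration.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Customer:
    # Hypothetical record layout, not the client's real schema.
    first_name: str
    surname: str
    address: str
    date_of_birth: date


def initial_key(c: Customer) -> tuple:
    """Match key that uses only the first initial (assumed behaviour)."""
    return (c.first_name[:1].upper(), c.surname.upper(),
            c.address.upper(), c.date_of_birth)


def full_name_key(c: Customer) -> tuple:
    """Stricter match key that uses the full first name."""
    return (c.first_name.upper(), c.surname.upper(),
            c.address.upper(), c.date_of_birth)


def is_duplicate(a: Customer, b: Customer, key) -> bool:
    """Two records count as duplicates when their match keys are equal."""
    return key(a) == key(b)


# Invented sample data for the twins in the story.
alice = Customer("Alice", "Smith", "1 High Street", date(1980, 5, 1))
anthea = Customer("Anthea", "Smith", "1 High Street", date(1980, 5, 1))

print(is_duplicate(alice, anthea, initial_key))    # flagged as duplicates
print(is_duplicate(alice, anthea, full_name_key))  # treated as distinct
```

Matching on the full first name would have let the second twin through here, but at the cost of missing genuine duplicates entered as "A. Smith" and "Alice Smith", which is presumably why such rules end up being a judgment call.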

The most amusing story concerned a company that had installed a new data quality system alongside its call centre records system. An obscure feature of this system automatically removed any records containing profanity by searching for swear-words within the text fields, presumably to protect the delicate feelings of the call centre manager. All seemed to be going well until, after a few weeks, it became clear that every single call from a certain English town had been blocked by the system. That town was Scunthorpe (think about it). Such are the joys of unintended consequences.
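The failure is easy to reproduce. The sketch below is not the vendor's code; it assumes the filter did a plain substring search, and it uses the mild stand-in word "hell" and an invented call note rather than the real thing. The function names are hypothetical.

```python
import re

# Mild stand-in for a real blocklist (assumption for illustration only).
BLOCKED_WORDS = {"hell"}


def contains_blocked_substring(text: str) -> bool:
    """Naive check: flags the text if a blocked word appears anywhere in it."""
    lowered = text.lower()
    return any(word in lowered for word in BLOCKED_WORDS)


def contains_blocked_word(text: str) -> bool:
    """Safer check: flags only blocked words that stand alone as whole words."""
    alternatives = "|".join(re.escape(w) for w in sorted(BLOCKED_WORDS))
    pattern = r"\b(?:" + alternatives + r")\b"
    return re.search(pattern, text, flags=re.IGNORECASE) is not None


# Invented call note: "Shell" contains the blocked word as a substring.
note = "Customer calling from Shell Road about a late delivery"

print(contains_blocked_substring(note))  # the innocent note is flagged
print(contains_blocked_word(note))       # the innocent note passes
```

Whole-word matching is only a partial fix: a place name that is itself on the blocklist would still be flagged, so an allow-list of known place names may be needed as well.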

Separately, I discovered that a well-known high street bank has identified 8,000 active customers who, according to its systems, are over 160 years old. At least the bank knows it has a problem, and has an active data quality program to address it.
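A check of this kind is simple to sketch. I have no idea how the bank actually found these records; the version below assumes a stored date of birth and a cut-off of 120 years, and both the threshold and the function names are my own choices for illustration.

```python
from datetime import date

# Assumed upper bound on a believable customer age, in years.
MAX_PLAUSIBLE_AGE = 120


def age_in_years(date_of_birth: date, today: date) -> int:
    """Completed years between date_of_birth and today."""
    years = today.year - date_of_birth.year
    # Subtract one if this year's birthday has not happened yet.
    if (today.month, today.day) < (date_of_birth.month, date_of_birth.day):
        years -= 1
    return years


def has_implausible_age(date_of_birth: date, today: date,
                        max_age: int = MAX_PLAUSIBLE_AGE) -> bool:
    """True for ages above max_age or for dates of birth in the future."""
    age = age_in_years(date_of_birth, today)
    return age < 0 or age > max_age


# Invented example: a date of birth that gives an age of 170.
print(has_implausible_age(date(1850, 1, 1), date(2020, 1, 1)))
```

Flagging a record is the easy part; deciding whether the date is a typo, a default value or a placeholder still needs someone to look at it.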

One thought on “Data Quality stories”

  1. Ah, good old Scunthorpe again! We had a similar problem once (though less subtle) with “Tit Lane”. Actually it turns out someone has even written a book called “Rude Britain: The 100 Rudest Place Names in Britain” – surely essential reading for data cleansing developers?! Sorry to pollute such a serious blog with trivia though…
