There is an interesting article in CIO Insight by Peter Fade, a professor of marketing at the top-rated Wharton Business School. in this he discusses the limitations of data mining, and it is an article that anyone contemplating investing in this technology should read carefully. I set up a small data mining practice when I was running a consulting division at Shell, and found it a thankless job. Although I had an articulate and smart data mining expert and we invested in what at the time was a high quality data mining tool, we found time and again that it was very hard to find real-world problems where the benefits of data mining could be shown. Either the data was such a mess that little sense could be made of it, or the insights shown by the data mining technology were, as Homer Simpson might say, of the “well, duh” variety.
Professor Faber argues that in most cases the best you can hope for is to develop simple probabilistic models of aggregate behaviour, and you simply cannot get down to the level of predicting individual behaviour using the level of data that we typically have, however alluring the sales demonstrations may be. Moreover, such models can mostly be built in Excel and don’t need large investments in sophisticated data mining tools.
While I am sure there are some very real examples where data mining can work well e.g. why some groups of people are better credit risks than others, the main point he makes is that the vision of 1-1 marketing via a data mining tool is a fantasy, and that the tools have been seriously oversold. Well, that is something that we in the software industry really do understand. We all want technology to provide magical insights into a messy and complex world that is hard to predict. Unfortunately the technology at present is generally as useful as a crystal ball when it comes to predicting individual behaviour. Yet there is still that urge to go into the tent and peer into the mists of the crystal ball in search of patterns.