One big problem with big data is the hype. Champions of big data promote it as a revolutionary advance. But even the examples that people give of the successes of big data, like Google Flu Trends, though useful, are small potatoes in the larger scheme of things. They are far less important than the great innovations of the 19th and 20th centuries, like antibiotics, automobiles and the airplane. Big data is here to stay, as it should be. But let’s be realistic: It’s an important resource for anyone analyzing data, not a silver bullet. So says an editorial in today’s Times.
Is big data really all it’s cracked up to be? There is no doubt that big data is a valuable tool that has already had a critical impact in certain areas. But we need to be levelheaded about what big data can, and can’t, do.
The first thing to note is that although big data is very good at detecting correlations, especially subtle correlations that an analysis of smaller data sets might miss, it never tells us which correlations are meaningful. A big data analysis might reveal, for instance, that from 2006 to 2011 the United States murder rate was well correlated with the market share of Internet Explorer: Both went down sharply. But it’s hard to imagine there is any causal relationship between the two.
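The murder-rate example works because any two series that both trend downward over the same years will correlate strongly, whether or not they are related. A minimal sketch of that effect, assuming two entirely synthetic series (the numbers are made up for illustration and are not the actual murder-rate or browser-share figures), each built from its own linear decline plus independent noise:

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def declining_series(start, slope, noise, years, rng):
    """A synthetic series: a linear downward trend plus independent Gaussian noise."""
    return [start + slope * t + rng.gauss(0, noise) for t in range(years)]


rng = random.Random(2014)
years = 6  # six yearly observations, e.g. 2006 through 2011
# Two hypothetical series generated independently: neither influences the other.
series_a = declining_series(start=5.8, slope=-0.3, noise=0.05, years=years, rng=rng)
series_b = declining_series(start=80.0, slope=-8.0, noise=1.0, years=years, rng=rng)

print(f"correlation between two unrelated declining series: r = {pearson(series_a, series_b):.3f}")
```

The high coefficient comes entirely from the shared downward trend; nothing in the code links one series to the other, which is exactly why a strong correlation alone says nothing about causation.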
Second, big data can work well as an adjunct to scientific inquiry but rarely succeeds as a wholesale replacement.
Third, many tools that are based on big data can be easily gamed. Google’s celebrated search engine, rightly seen as a big data success story, is not immune to “Google bombing” and “spamdexing,” wily techniques for artificially elevating website search placement.
Fourth, even when the results of a big data analysis aren’t intentionally gamed, they often turn out to be less robust than they initially seem.
A fifth concern might be called the echo-chamber effect, which stems from the fact that much of big data comes from the web. Whenever the source of information for a big data analysis is itself a product of big data, opportunities for vicious cycles abound.
A sixth worry is the risk of too many correlations. If you look 100 times for correlations between two variables, using the conventional 5 percent threshold for statistical significance, you risk finding, purely by chance, about five bogus correlations that appear statistically significant, even though there is no actual meaningful connection between the variables. Absent careful supervision, the sheer scale of big data can greatly amplify such errors.
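The "about five in 100" figure follows from the 5 percent threshold: each test on unrelated variables has roughly a 1-in-20 chance of passing by luck. A minimal simulation of this, assuming pairs of independent Gaussian variables with 50 observations each and using the Fisher z-transform approximation rather than an exact test (both are illustrative choices, not anything from the op-ed). The last line applies a Bonferroni correction, one standard way of "careful supervision" that divides the threshold by the number of tests:

```python
import math
import random
from statistics import NormalDist


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def z_critical(alpha):
    """Two-sided critical value of the standard normal for significance level alpha."""
    return NormalDist().inv_cdf(1 - alpha / 2)


def looks_significant(r, n, alpha=0.05):
    """Approximate two-sided test of zero correlation via the Fisher z-transform:
    under the null hypothesis, atanh(r) * sqrt(n - 3) is roughly standard normal."""
    return abs(math.atanh(r)) * math.sqrt(n - 3) > z_critical(alpha)


def count_false_positives(num_tests, n, rng, alpha=0.05):
    """Test num_tests pairs of independent random variables and count how many
    pass the significance test even though no real relationship exists."""
    hits = 0
    for _ in range(num_tests):
        xs = [rng.gauss(0, 1) for _ in range(n)]
        ys = [rng.gauss(0, 1) for _ in range(n)]  # generated independently of xs
        if looks_significant(pearson(xs, ys), n, alpha):
            hits += 1
    return hits


rng = random.Random(0)
print("bogus hits at alpha = 0.05:", count_false_positives(100, 50, rng))
# Bonferroni correction: divide alpha by the number of tests performed.
print("bogus hits at alpha = 0.05 / 100:", count_false_positives(100, 50, rng, alpha=0.05 / 100))
```

Running it with different seeds, the first count hovers around five while the corrected count is usually zero; the price of the correction is that genuine but weak correlations also become harder to detect.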
Seventh, big data is prone to giving scientific-sounding solutions to hopelessly imprecise questions.
Finally, big data is at its best when analyzing things that are extremely common, but often falls short when analyzing things that are less common. For instance, programs that use big data to deal with text, such as search engines and translation programs, often rely heavily on something called trigrams: sequences of three words in a row (like “in a row”). Reliable statistical information can be compiled about common trigrams, precisely because they appear frequently. But no existing body of data will ever be large enough to include all the trigrams that people might use, because of the continuing inventiveness of language.
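A toy illustration of why trigram statistics thin out so quickly, assuming a crude whitespace tokenizer and a two-sentence sample text that paraphrases this post (real systems use vastly larger corpora plus smoothing techniques, which this sketch does not attempt). Even in this tiny sample, most distinct trigrams occur only once, and that long tail of rare sequences is where statistical estimates become unreliable:

```python
from collections import Counter


def trigrams(text):
    """Split text on whitespace (a deliberately crude tokenizer) and return
    every run of three consecutive words."""
    words = text.lower().split()
    return list(zip(words, words[1:], words[2:]))


def singleton_share(counts):
    """Fraction of distinct trigrams that occur exactly once."""
    return sum(1 for c in counts.values() if c == 1) / len(counts)


sample = (
    "big data is at its best when analyzing things that are extremely common "
    "but big data often falls short when analyzing things that are less common"
)
counts = Counter(trigrams(sample))
print(counts.most_common(3))
print(f"{singleton_share(counts):.0%} of distinct trigrams appear only once")
```

Adding more text raises the counts of common trigrams like "things that are", but every new sentence also tends to introduce fresh trigrams never seen before, so the tail never disappears.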
It’s good to read that another marketing buzzword is being pulled down to earth. Some journalists who masquerade as marketing analysts, for example, have said that big data is the lifeblood of marketing, but before they say things like that perhaps they should get up off their asses and actually do some work in marketing.
The reality is that big data is not the golden road for marketers; it’s just another piece of the puzzle to gain a better understanding of illogical consumer behavior. This was a very good article, and it’s good to see that the hype cycle around things like big data and social media is quickly coming back to earth.