As part of WAN-IFRA’s upcoming Shaping the Future of News Publishing report on data, we spoke to Kenneth Cukier, data editor of The Economist and co-author of Big Data: A Revolution That Will Transform How We Live, Work and Think, about how story analytics are impacting editorial decision making. Below is an edited transcript of the phone interview.
WAN-IFRA: How can news organizations use data to better understand their audiences?
Kenneth Cukier: It’s very popular for some of the online outlets like Gawker, the Huffington Post and Forbes to look at what articles are doing well, or even what’s trending on Twitter and other platforms, and then write articles to meet what looks like the thirst of the marketplace. This is a sort of “follow the herd” mentality for journalism. It’s not necessarily wrong; lots of news organizations do this. But they also need to use more patience, judgement and wisdom when they do it, and think about what their readers need to know (it’s not the break-up or the plastic surgery of a Hollywood celebrity that happens to be trending).
That’s one example of how news organizations can use the technology for their purposes, but there’s a lot more. Let me give you another example. Lots of companies are now doing interactive charts. A wise news organization would take all the data exhaust coming from users’ interactions and see what’s most popular. The first thing you’d be able to do is rank the results and create default settings, such as tabs. If everyone’s using the Big Mac Index to compare America and Britain, but we designed it with 30 different countries, maybe we should have a little tab that says, “compare the pound with the dollar.” It’s about learning from the data and then feeding it back to people: finally being able to examine how they read or how they look at things, which in the past you couldn’t do very well.
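The ranking step Cukier describes can be sketched in a few lines. This is a minimal illustration, assuming each interaction with a Big Mac Index chart is logged as a pair of country codes the reader compared; the log format and the `rank_comparisons` function are hypothetical and not The Economist’s actual system.

```python
from collections import Counter


def rank_comparisons(events, top_n=3):
    """Rank country pairs by how often readers compared them.

    Each event is a (country_a, country_b) tuple from a hypothetical
    interaction log. Pairs are sorted so that ("US", "GB") and
    ("GB", "US") count as the same comparison.
    """
    counts = Counter(tuple(sorted(pair)) for pair in events)
    return counts.most_common(top_n)


# Hypothetical log of chart interactions
log = [
    ("US", "GB"), ("GB", "US"), ("US", "GB"),
    ("US", "JP"), ("BR", "AR"), ("GB", "US"),
]
ranked = rank_comparisons(log, top_n=2)
print(ranked)
```

The most frequent pair (here Britain versus America) is the natural candidate for a default tab; the less popular comparisons stay available but need not be the first thing a reader sees.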
Is data a more effective way of learning about readers than, for instance, focus groups? Why?
Absolutely, 100 percent. But there’s room for both. The most important thing to realize is that when you ask people questions about what they read, what they liked and how long they stayed, they almost always get it wrong. They tend to think they’ve seen a lot more material than they truly have. They also tend to think they have spent more time with it, in terms of engagement, than they truly have. This is well documented in the literature.
So focus groups are ridiculous for certain sorts of things. They’re really good if you want to ask more nuanced questions, but they’re no good at all where you can collect observed data rather than reported data. Reported data is a survey, a focus group. Observed data is what’s actually happening; it’s passively collected from the actual habits of people. In economics we call that revealed preference. Being able to get the revealed preference is critical.
Here’s a great little aphorism. Michael Kinsley was the editor of Slate. He was publishing a lot of articles about the Monica Lewinsky scandal and Bill Clinton, and he was getting lots of letters criticizing his coverage. But his conclusion was that “the letters all say, ‘No, no, no,’ but the clicks all say, ‘Yes, yes, yes.’”
Likewise, the nice thing about comparing observed data with reported data is that people might say, “Oh, I subscribe to the newspaper because I really, really like the diversity.” But everybody reads the U.S. section and the sports section and doesn’t read anything else. So we would learn that.
But the signal can be misleading. People might like to subscribe to a publication that aspires to give them the world. On the once or twice a year that they actually look at the international section, they’re happy that it’s there. At The Economist, the last two pages each week have long tables of numbers, and part of the reason is that that’s who we are. We recognize that most of our subscribers don’t look at those pages, but a lot of them like having them, so if they ever want to know what GDP growth is in Brazil, they can get it at a moment’s notice. Publications still do that. They can’t listen to the data so much that they take off their thinking caps, because some of the data is a false signal.
Another example: many media organizations are creating video, but we don’t really know how people want to interact with that video. So how do we want to depict things? If we found out that three-quarters of the people watching our videos were watching them on their mobile phones rather than on their desktops, it would absolutely affect how we would choose to shoot those videos, because we would have to assume a very small screen that can’t show much detail, rather than a wide screen that can. So we would have to use only close-up shots, not distant ones.
In the past, we could never determine that sort of information. Now we can. First, we have to be the sort of media organization that is smart enough to collect the data, and that costs money. Second, you have to be smart enough to analyze the data. And third, you need a feedback loop, so that you feed that learning back into what you’re actually producing.
Other than tabloidization, do you see any downsides to using data to make editorial decisions?
The data is always imperfect. Data is not reality, in the same way that the map is not the territory. It’s simply a simulacrum of reality. So we might not collect the right metric, the data could mislead us, or we could be giving the data too much importance. Because the data is imperfect, we need more data and better data to understand something more fully.
For example, with videos, the concept of “average time” is actually ridiculous. What if you found out that 30 percent of your visitors clicked away after one to three seconds? Then you have this huge cohort, nearly one-third of people, who get rid of it immediately. And almost everybody else, 70 percent, actually watches the video to the very end. [If the average viewing time, then, was 1 minute 30 seconds], you would think you have to produce all your videos at 1 minute 30 seconds. But in fact, no, you see this clustering effect.
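The arithmetic behind this point is easy to check. The sketch below uses made-up watch times, assuming a hypothetical 128-second video and ten viewers: three (30 percent) leave within two seconds and seven (70 percent) watch to the end. The numbers are chosen only so the mean lands exactly on the 1 minute 30 seconds in the example.

```python
from statistics import mean, median

# Hypothetical watch times in seconds for a 128-second video:
# 3 viewers (30%) click away almost immediately, 7 (70%) watch to the end.
watch_times = [1, 1, 2] + [128] * 7

avg = mean(watch_times)    # (1 + 1 + 2 + 7 * 128) / 10 = 90 seconds
mid = median(watch_times)  # middle of the sorted list: 128 seconds

# How many viewers actually stopped anywhere near the "average"?
near_average = [t for t in watch_times if 60 <= t <= 120]

print(f"mean: {avg} s, median: {mid} s")
print(f"viewers who stopped between 1 and 2 minutes: {len(near_average)}")
```

The mean says 90 seconds, yet no viewer in this sample stopped anywhere between one and two minutes: the audience splits into two clusters, and the single average describes neither of them.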
I think the biggest problems for news organizations go beyond tabloidization. One is that the data can be misleading and you don’t realize it. The second is that the data is always about the past. And originality comes from looking ahead, coming up with ideas that could never be justified by data because they’re so original that there is no data to support them. An example of that is Steve Jobs’ launch of the iPad. How was he going to gather data, how was he going to do market research, on a product that didn’t yet exist?
An argument against using data to make editorial decisions is that computers don’t have the news judgement of seasoned journalists. What do you think about that?
I think that’s true, but at the same time I don’t think we’re going to hand over all of our decision-making to an algorithm. I think the algorithm can actually help our decision-making dramatically. The algorithm can tell you the best time to put up a new piece of content or, more importantly, when to send your Twitter ad. How would an editor know that if not by listening to the data? If he uses his gumption, his instinct, his experience and his judgement to decide when to send out a tweet, that’s a very stupid editor. It’s a complex problem; there are a lot of factors involved, and computers are really good at crunching numbers like that. Human beings simply can’t do it.
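As a deliberately simple illustration of letting the data suggest a posting time, the sketch below picks the hour whose past posts averaged the most engagement. It assumes a hypothetical log of (hour posted, engagements) pairs; the function name and data are invented for illustration, and a real scheduling system would weigh many more factors than this.

```python
from collections import defaultdict


def best_posting_hour(history):
    """Return the hour (0-23) whose past posts averaged the most engagement.

    `history` is a list of (hour, engagements) pairs from a hypothetical
    post log. Ties go to the earliest hour.
    """
    totals = defaultdict(int)
    counts = defaultdict(int)
    for hour, engagements in history:
        totals[hour] += engagements
        counts[hour] += 1
    averages = {h: totals[h] / counts[h] for h in totals}
    # Iterating over sorted hours makes max() keep the earliest hour on ties.
    return max(sorted(averages), key=averages.get)


# Hypothetical history: 08:00 averages 100, 12:00 averages 180,
# 18:00 averages 150, 22:00 averages 40.
history = [(8, 120), (8, 80), (12, 200), (12, 160), (18, 150), (22, 40)]
print(best_posting_hour(history))
```

With only a couple of posts per hour, these averages are noisy, and, as the video example above shows, an average can hide very different behaviour, so a sketch like this is a starting point for judgement rather than a replacement for it.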
So the argument needs to move up a notch. We’re all big boys and girls now. It’s not man versus machine. It’s about how we best apply this tool to get the best outcome we can.
As technology advances and story metrics become more sophisticated, how do you see the relationship between analytics and journalism evolving?
I think in the future, every newsroom is going to have to watch the data, collect the data and listen to the data. And if they don’t, they’re not going to be as effective as they could be at understanding what their readers want. [Data technology] is being adopted at different rates, but I think that everyone is going to get to the same base level of using data to inform their decisions.
Today there is a real prejudice against using data in the newsroom, because we’re afraid that if we quantify things too much we’ll lose what’s most valuable about what we do: as journalists, we’re for the underdog, for the story that needs to be told even if it may not appeal to the whole world. We’re afraid that if we rely only on clicks, we’re going to distort our values and end up showing just the stuff that is eye candy, catnip, to the Internet audience. I think that is a real fear. As newsrooms adopt this technology, it’s going to be important that they bring their values to the data.
Photo credit: Cukier.com