Tamedia’s curation algorithm promising but not a panacea

Can a machine replace a journalist’s gut feeling? This is the question Swiss media group Tamedia set out to answer when it built an algorithm to potentially take over the curation process for one of its news apps.

by Simone Flueckiger simone.flueckiger@wan-ifra.org | November 16, 2017

With an algorithm named Octopus, Tamedia aimed to determine whether a machine can replace a journalist’s editorial judgment when it comes to curating stories from various platforms. Spoiler alert: it can’t, at least not yet, but it complements the editors’ work.

The media group’s “12-app”, which gives users access to 12 top stories of the day in one edition, served as the platform for the test.

Launched two years ago, the app has been downloaded (for free) some 200,000 times, and is accessed by 80,000 to 90,000 unique monthly users, said Michael Marti, member of the chief editorial team at Tages-Anzeiger, speaking at WAN-IFRA’s International Newsroom Summit in London last week.

The app pursues two parallel business goals: first, to serve as a free reward for subscribers to one of the group’s titles, who can access all of its content; and second, to convert new users of the app into subscribers paying CHF6 (€5.15) per month (users have access to one free article per month before they hit the paywall). According to Marti, the app has between 5,000 and 6,000 paid subscribers.

‘Time-consuming and complex’ manual curation

The content for each daily 12-app edition is sourced from across the group’s portfolio, which includes the brand’s flagship daily newspaper Tages-Anzeiger, around 20 regional newspapers, and women’s magazines.

Typically, from 7:00 to 8:30 a.m., the 12-app team scans Tamedia’s websites, newspapers, and magazines to put together the day’s edition, then meets with editorial management to seek final approval for the selection.

“It’s a quite traditional way of selecting stories, and a time-consuming and complex process,” Marti said.

“So, we soon began to think there must be a way to develop a system that could offer support or even complete the selection process automatically.”

The team approached Tamedia’s data scientists, who offered to build an algorithm to automate the curation process, kicking off the Octopus project.

Bringing automation into the mix

“We wanted the algorithm to choose typical 12-app articles, which are articles that are similar to those published in previous editions,” Marti said. “So, we had to train a neural network to distinguish the ‘DNA’ of these 12-app articles from other articles.”

They built a database of 3,000 articles that had been selected by editors in the past, to teach the algorithm to recognise the properties of 12-app articles.

The ‘DNA’ of a 12-app article, meaning its statistical properties, is then converted into a machine-readable format via automated text analysis and natural language processing. This representation forms the basis for teaching the artificial neural network to distinguish 12-app articles from other articles.

Articles are awarded a score that indicates how similar they are to previously published stories, internally referred to as the “12-appishness”.
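To make the idea of a learned similarity score concrete, here is a minimal sketch in Python. It is not Tamedia's implementation, whose details were not disclosed: it assumes a simple bag-of-words representation and uses a single logistic unit as a stand-in for the neural network. The example texts and all function names are invented for illustration.

```python
import math
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"\w+", text.lower())

def build_vocabulary(texts):
    """Map every distinct token in the training texts to a column index."""
    tokens = sorted({tok for text in texts for tok in tokenize(text)})
    return {tok: i for i, tok in enumerate(tokens)}

def vectorize(text, vocab):
    """Turn a text into word counts, one entry per vocabulary token."""
    counts = Counter(tokenize(text))
    return [counts[tok] for tok in vocab]  # dict order matches index order

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train(texts, labels, vocab, epochs=200, lr=0.1):
    """Fit one logistic unit with plain stochastic gradient descent."""
    weights = [0.0] * len(vocab)
    bias = 0.0
    for _ in range(epochs):
        for text, label in zip(texts, labels):
            x = vectorize(text, vocab)
            pred = sigmoid(bias + sum(w * xi for w, xi in zip(weights, x)))
            err = pred - label
            weights = [w - lr * err * xi for w, xi in zip(weights, x)]
            bias -= lr * err
    return weights, bias

def appishness(text, vocab, weights, bias):
    """Score between 0 and 1: how much a text resembles past selections."""
    x = vectorize(text, vocab)
    return sigmoid(bias + sum(w * xi for w, xi in zip(weights, x)))

# Invented toy data: 1 = previously selected by editors, 0 = not selected.
selected = ["an in-depth portrait of a mountain farmer",
            "a long report on the housing market in zurich"]
others = ["football results from the weekend",
          "weather forecast for tuesday"]
texts = selected + others
labels = [1, 1, 0, 0]

vocab = build_vocabulary(texts)
weights, bias = train(texts, labels, vocab)
print(appishness("a portrait of a zurich farmer", vocab, weights, bias))
print(appishness("football results and weather", vocab, weights, bias))
```

In this toy setup the first text scores above 0.5 and the second below it. A production system would train on thousands of articles, as Tamedia did with its database of 3,000, and would use far richer text features.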

Based on that score, as well as some hard-coded filters, the algorithm runs overnight and generates a list of the 10 most suitable stories for the app, selected from a pool of more than 400 stories. The list is then posted to a Slack channel, where an editor can open and review it.
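The selection step can be sketched in a few lines of Python. This is an illustration only: the article does not say which hard-coded filters Tamedia applies, so the minimum word count and the blocked source used here are hypothetical, as are the field names and the synthetic pool of 400 candidates. Posting to Slack is left out; the sketch only formats the text an editor would review.

```python
from dataclasses import dataclass

@dataclass
class Candidate:
    title: str
    source: str
    word_count: int
    score: float  # "12-appishness" between 0 and 1

def passes_filters(article, min_words=300, blocked_sources=("ticker",)):
    """Hypothetical hard-coded rules; the real filters are not public."""
    return (article.word_count >= min_words
            and article.source not in blocked_sources)

def shortlist(pool, size=10):
    """Drop filtered articles, then keep the highest-scoring ones."""
    eligible = [a for a in pool if passes_filters(a)]
    return sorted(eligible, key=lambda a: a.score, reverse=True)[:size]

def format_for_review(articles):
    """Render the shortlist as a numbered plain-text message."""
    return "\n".join(
        f"{rank}. {a.title} ({a.source}, score {a.score:.2f})"
        for rank, a in enumerate(articles, start=1)
    )

# Synthetic pool of 400 candidates with made-up sources and scores.
pool = [
    Candidate(
        title=f"Story {i}",
        source="regional" if i % 2 else "ticker",
        word_count=200 + i * 10,
        score=(i * 37 % 100) / 100,
    )
    for i in range(400)
]

top_ten = shortlist(pool)
print(format_for_review(top_ten))
```

Applying the filters before ranking means an article that breaks a hard rule never reaches the editor, however high its score.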

The list also includes a feedback function for editors, allowing them to indicate whether or not they liked a suggestion, in a bid to continuously optimise the algorithm.

‘Fully automated curation is unrealistic’

As for the results, Marti said the algorithm draws content from a wider variety of sources than is humanly possible, and that editors consider around 50 percent of its suggestions appropriate and use them to compose editions. It also halves the time needed to select stories, to roughly 45 minutes, without affecting the overall quality of the edition.

“Fully automated curation is unrealistic for quality-based products such as best-of apps,” he said, summarising the results.

“This system should ideally be used to aid in the decision-making process, so we have a combination of human and machine.”

The algorithm mainly struggles with identifying topicality, as it relies solely on text without taking metadata or other contextual information into account.

“It also has trouble with articles from the culture section where we publish a lot of reviews, because it misidentifies them as portraits or reports,” Marti said. He remained optimistic, however: “It’s an ongoing project, so we’re always training and optimising the algorithm.”
