The team of 20 sits at the intersection of BBC News, BBC R&D, Digital Products and CPW Engineering. It is an incubation hub with a mix of editorial, software development, product and design roles, and it hands ideas over to other teams to turn them into fully stable products for newsroom and audience consumption.
The team works on an eight-week project cycle, rapidly building prototypes of new tools and injecting them into newsroom functions.
Conor Molumby (Senior Journalist, BBC, UK) and Joe Whitwell (Journalist, BBC, UK) joined WAN-IFRA’s recent Middle East Media Leaders eSummit to talk about automating story formats.
News Labs looks at personalisation as a structural requirement for universal service in a digital ecosystem defined by infinite choice and a diverse audience: the team sees it as core to serving the BBC’s global audience through its app and website.
Multiple story formats for different consumption patterns
The team has dedicated the past couple of years to working on automation. Most recently, they have developed multiple story formats that suit the different consumption patterns of a diverse audience, with the help of artificial intelligence. Here’s how they have done it:
Graphical storytelling: This idea is targeted at young audiences who are often:
- reading only headlines or very short articles
- consuming news through social media, and
- desiring more accessible story formats, particularly more visual ones.
Inspired partly by Instagram and comic books, the team has created a prototype that lets journalists automatically generate graphical stories from text. The idea is for journalists to input a text story into the tool and get back a first draft of a visual version, which can be used on social media platforms.
The tool uses entity extraction to analyse the text, pull out keywords from within it and match those keywords to an experimental in-house library of about 150 images.
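To make the mechanics concrete, here is a minimal sketch of that keyword-to-image matching step. It assumes spaCy as a stand-in for whatever entity-extraction model News Labs actually uses, and a small, invented keyword-tagged image library in place of the in-house repository of roughly 150 images.

```python
# A minimal sketch of keyword-to-image matching, assuming spaCy as a stand-in
# for the (unspecified) News Labs entity-extraction model and a hypothetical
# keyword-tagged image library.
import spacy

nlp = spacy.load("en_core_web_sm")  # small English pipeline with NER and tagging

# Hypothetical in-house image library: each asset is tagged with keywords.
IMAGE_LIBRARY = {
    "vaccine_vial.png": {"vaccine", "dose", "jab"},
    "hospital_ward.png": {"hospital", "patient", "nhs"},
    "scientist_lab.png": {"trial", "study", "researchers"},
}

def suggest_images(story_text: str) -> list[str]:
    """Extract entities/keywords from a story and match them to library images."""
    doc = nlp(story_text)
    keywords = {ent.text.lower() for ent in doc.ents}
    keywords |= {tok.lemma_.lower() for tok in doc if tok.pos_ == "NOUN"}

    matches = []
    for image, tags in IMAGE_LIBRARY.items():
        if keywords & tags:  # any overlap between story keywords and image tags
            matches.append(image)
    return matches

print(suggest_images("A new vaccine trial at the hospital recruited 2,000 patients."))
```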
The team picked the health domain because its stories tend to be formulaic and structured, typically based on clinical trials and the latest findings. The team took a sample of 500 articles from the BBC’s health team and scoured them to identify recurring keywords.
“The image repository was actually commissioned as a bespoke way to communicate health stories in particular, the origins of which were actually pre-COVID,” says Molumby. “We also have the option to break down the 150 images and take a combinatorial approach to create more versions of visual representations of text from just those 150 images.”
Another feature aids sentence analysis by inferring a panel type, a mobile-phone-screen-sized image generated from a block of text. The purpose of this semi-automated tool is to represent numbers, proportions or ratios that appear in the text in a visual way.
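What that inference might look like can be sketched with a few regular-expression heuristics; the panel-type names and rules below are assumptions for illustration, not News Labs’ actual taxonomy.

```python
# A rough sketch of mapping a sentence to a panel type using simple regex
# heuristics; the real panel taxonomy and inference logic are not public.
import re

def infer_panel_type(sentence: str) -> str:
    """Guess a visual panel type from the numeric patterns in a sentence."""
    if re.search(r"\b\d+(\.\d+)?\s*%", sentence) or " percent" in sentence:
        return "proportion_panel"      # e.g. "42% of adults..."
    if re.search(r"\b\d+\s+in\s+\d+\b", sentence):
        return "ratio_panel"           # e.g. "1 in 4 people..."
    if re.search(r"\b\d[\d,]*\b", sentence):
        return "number_panel"          # e.g. "2,000 participants..."
    return "text_panel"                # no figures: plain graphical text

print(infer_panel_type("1 in 4 people reported side effects."))    # ratio_panel
print(infer_panel_type("About 42% of adults have had a booster.")) # proportion_panel
```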
The semi-automation approach provides an intervening editing interface where journalists can approve the automatic results of the first draft and choose from variations offered by the tool, but above all it gives them full editorial control in case the generated content is misleading.
So far, the team has published a couple of test Facebook Stories and conducted research among focus groups within that target young audience demographic. Ninety-six percent of the focus group found the format engaging and 86 percent said they would share it with their friends.
Summarising stories with language models
A majority of online news stories still look like print newspaper articles – simply reset for the internet, with a standard headline, image and wall of text format.
“Our audiences are looking for a social media update, an email bulletin or a voice bulletin through a smart speaker or in their cars,” says Whitwell. “And while it would be fantastic to create these formats for various channels, there is a shortage of journalists and their time.”
To bridge this gap, the News Labs team uses a language model that takes a block of text and transforms it into abstractive summaries suited to different platform specifications. For instance, the model can supply a single line for social media, a couple of sentences for an email update, and bullet points for a voice bulletin.
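The talk does not name the model, so the sketch below uses a publicly available summarisation model from the Hugging Face transformers library purely to illustrate the one-text, several-platform-lengths idea; the per-platform length budgets are invented.

```python
# Illustrative only: a public summarisation model standing in for the BBC's
# unnamed language model, producing one abstractive summary per platform spec.
from transformers import pipeline

summariser = pipeline("summarization", model="facebook/bart-large-cnn")

# Hypothetical per-platform length budgets (in tokens).
PLATFORM_SPECS = {
    "social_post":    {"min_length": 10, "max_length": 30},   # a single line
    "email_update":   {"min_length": 30, "max_length": 70},   # a couple of sentences
    "voice_bulletin": {"min_length": 60, "max_length": 120},  # longer, read aloud
}

def summarise_for_platforms(article_text: str) -> dict[str, str]:
    """Produce one abstractive summary per platform specification."""
    outputs = {}
    for platform, spec in PLATFORM_SPECS.items():
        result = summariser(article_text, do_sample=False, **spec)
        outputs[platform] = result[0]["summary_text"]
    return outputs
```

The design point is that the story is written once and the per-channel variation is handled by length constraints rather than by separate manual rewrites, which is where the time saving comes from.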
This technology also allows the team to automate image galleries.
“Anyone who has worked in a newsroom knows that finding good images is time consuming. So, we use Extractive Summarisation here,” says Whitwell.
In this type of summarisation, the language model scans the text and pulls out the important points, which are then run through entity extraction. The extracted entities are used as keywords and fed into the image repository to find the best visual matches for the given text.
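A bare-bones illustration of the extractive step is a frequency-based sentence scorer like the one below; the chosen sentences could then feed the same entity-extraction and image-matching step sketched earlier. News Labs’ actual pipeline is not public, so treat this only as a sketch of the idea.

```python
# Bare-bones extractive summarisation: score sentences by word frequency and
# keep the top ones, preserving their original order.
import re
from collections import Counter

def extractive_summary(text: str, top_n: int = 2) -> list[str]:
    """Return the top_n sentences whose words are most frequent in the text."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    words = re.findall(r"[a-z']+", text.lower())
    freq = Counter(words)

    def score(sentence: str) -> float:
        tokens = re.findall(r"[a-z']+", sentence.lower())
        return sum(freq[t] for t in tokens) / max(len(tokens), 1)

    ranked = sorted(sentences, key=score, reverse=True)
    chosen = set(ranked[:top_n])
    return [s for s in sentences if s in chosen]
```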
Final decisions still made by journalists
“We still have a journalist in the loop to make the final editorial call. We’re trying to semi-automate as many steps to save a journalist’s already limited time in the newsroom and reach as many members of the audience as possible on as many forums as possible,” Whitwell says.
The team is also working on audio automation with a tool called Slicer, which takes production data that is no longer needed after a broadcast and uses it to chapterise the linear output into coherent audio blocks.
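The production-data format is not described in the talk, but the chapterising idea can be sketched as cutting a broadcast recording along a running order. The field names, file names and the use of the pydub library below are illustrative assumptions, not details of Slicer itself.

```python
# A speculative sketch of chapterising a linear broadcast: slice the recording
# into one audio block per running-order item. All metadata here is invented.
from pydub import AudioSegment

# Hypothetical production metadata left over after broadcast (times in ms).
RUNNING_ORDER = [
    {"title": "headlines",     "start_ms": 0,       "end_ms": 90_000},
    {"title": "health_report", "start_ms": 90_000,  "end_ms": 300_000},
    {"title": "sport",         "start_ms": 300_000, "end_ms": 420_000},
]

def chapterise(broadcast_path: str) -> None:
    """Slice a linear broadcast into one audio block per running-order item."""
    audio = AudioSegment.from_file(broadcast_path)
    for item in RUNNING_ORDER:
        chapter = audio[item["start_ms"]:item["end_ms"]]  # pydub slices in ms
        chapter.export(f"{item['title']}.mp3", format="mp3")

chapterise("evening_bulletin.mp3")
```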
Another tool, called Skippy, which the team worked on a couple of years ago, is now the BBC Interactive News service on smart speakers.