How The Guardian and AFP used machine learning to understand quotes

Quotes have always been used in news articles to bring life to a story and, more importantly, to add authenticity, accuracy and balance. Data scientists at The Guardian and AFP have found a way to give quotes a life of their own and to ensure that the accuracy and attribution of the sources are ironclad – thanks in part to AI.

by Neha Gupta neha.gupta@wan-ifra.org | May 11, 2022

In 2021, The Guardian took part in the Journalism AI Collab Challenges, a project connecting global newsrooms to understand how artificial intelligence can improve journalism.

One particular challenge was to answer the question “How might we use modular journalism and AI to assemble new storytelling formats and reach underserved audiences?”

Anna Vissens, Lead Scientist, and Michel Schammel, Senior Data Scientist, at Guardian News & Media, United Kingdom, joined WAN-IFRA’s virtual Newsroom Summit in late April to talk about the learnings from this project.

What are quotes?

The team defined modules as fragments of a story that live independently but can be repurposed, or even replaced by another fragment. By this definition, quotes qualify strongly as modules.

Taking Wikipedia as the starting point, here’s how the team defined a quote:

A quotation is the repetition of a sentence, phrase, or passage from speech or text that someone has said or written. In oral speech, it is the representation of an utterance that is introduced by a quotative marker, such as a verb of saying. For example: John said: "I saw Mary today."

In written text, quotations are signaled by quotation marks.

“It looks simple but we wrestled with questions like – what about song lyrics? Or poems? Are they quotes? What if someone doesn’t say it but thinks about it? Do we treat thoughts as we would speech?” said Vissens.
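
To make the working definition concrete, here is a minimal rule-based sketch in Python. It is not the machine-learning approach the Guardian and AFP teams built; it only illustrates the two signals named in the definition above: quotation marks and a nearby quotative marker such as a verb of saying. The verb list, the 40-character window and the function name `find_quotes` are assumptions made for illustration.

```python
import re

# Assumed, illustrative list of "verbs of saying" (quotative markers).
SAYING_VERBS = ("said", "says", "told", "added", "wrote")

# Matches text inside straight or curly double quotation marks.
QUOTE_RE = re.compile(r'["“]([^"“”]+)["”]')


def find_quotes(text):
    """Return a list of (quote, has_quotative_marker) pairs."""
    results = []
    for match in QUOTE_RE.finditer(text):
        # Look at a small window of text on either side of the quotation.
        start = max(0, match.start() - 40)
        end = min(len(text), match.end() + 40)
        window = text[start:match.start()] + " " + text[match.end():end]
        words = set(re.findall(r"[a-z]+", window.lower()))
        has_marker = any(verb in words for verb in SAYING_VERBS)
        results.append((match.group(1), has_marker))
    return results


print(find_quotes('John said: "I saw Mary today."'))
# → [('I saw Mary today.', True)]
```

A rule like this cannot tell a spoken quote from song lyrics, a poem or a reported thought, which are exactly the edge cases Vissens describes. It also makes no attempt to identify the speaker.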

Why are they doing this?

There are several use cas...
