12 December 2024
Virtual

"Augmented Journalism": Predicting how articles perform

DATA SCIENCE MONTHLY MEETUP

Replay the meetup

Yoann Péron, Data Science Manager at Le Télégramme, presented a project on “augmented journalism”, a collaboration between the French local newspaper and the IRISA research center. The project aims to help journalists understand the different capacities of their articles, estimate their potential success, and improve them for various platforms.

Improving Journalist Articles With Data

Yoann described the two-year project, which is designed to help journalists improve their articles by estimating their potential success and suggesting ways to enhance them. The project analyzed data on article views, engagement, and conversions. Four capacities were defined for articles, including audience and engagement metrics. Text analysis was combined with numerical data to provide more insight into article content. The team found it challenging to define what constitutes an angle for an article, and also ran into difficulties with ChatGPT during development.

Model Development and Performance Discussion

Yoann discussed the development of a model that predicts the number of page views an article will receive, based on factors such as the number of words, uniqueness, length, and quotes. The team tried adding journalist classifications as an input, but this did not improve the model. They also experimented with using ChatGPT to make predictions on new data sets and found their own model to be more effective.
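As a rough illustration of how factors like these can be turned into model inputs, the sketch below computes a few numeric features from an article's text. It is not the team's code: the function name `extract_features`, the feature names, and the definition of "uniqueness" as the share of distinct words are all assumptions made for this example.

```python
import re


def extract_features(text: str) -> dict:
    """Compute simple numeric features of an article body.

    Hypothetical helper: the feature definitions are illustrative
    assumptions, not the ones used in the project.
    """
    words = re.findall(r"\w+", text.lower())
    n_words = len(words)
    # Share of distinct words, a rough proxy for lexical "uniqueness".
    uniqueness = len(set(words)) / n_words if n_words else 0.0
    # Quoted passages delimited by straight, curly, or French quotes.
    quotes = re.findall(r'"[^"]+"|“[^”]+”|«[^»]+»', text)
    return {
        "n_words": n_words,
        "n_chars": len(text),
        "uniqueness": uniqueness,
        "n_quotes": len(quotes),
    }


sample = 'The mayor said "we will rebuild" and « nous reconstruirons ». The mayor left.'
features = extract_features(sample)
```

A feature dictionary like this could be fed to any regression model that predicts page views; which model the team used was not stated in the recap.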

Le Télégramme’s Data Analysis and App

Yoann discussed the infrastructure and processes used for data analysis and model training at Le Télégramme. He explained that the team primarily uses Python and AWS SageMaker for development, with data held in an AWS data lake and accessed through Amazon Athena. The models are trained on Amazon DynamoDB and sent to Amazon Elastic File System (EFS) for inference. He also mentioned that R is already used by the newsroom for data journalism, notably for an app covering elections. Yoann then presented how the app is currently used: comparing an article with similar ones, predicting its capacities, and providing suggestions for improvement. He demonstrated the app on an article about a musical program, showing how it compares to similar articles and what suggestions the app offers.
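The recap does not say how the app measures similarity between articles. One common baseline is cosine similarity over bag-of-words counts, sketched below as a minimal example. The function names (`bag_of_words`, `cosine_similarity`, `most_similar`) and the sample archive are hypothetical and are not taken from the presentation.

```python
import math
import re
from collections import Counter


def bag_of_words(text: str) -> Counter:
    """Lowercased word counts for a text."""
    return Counter(re.findall(r"\w+", text.lower()))


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine of the angle between two word-count vectors (0.0 if either is empty)."""
    dot = sum(a[term] * b[term] for term in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def most_similar(query: str, archive: dict, top_k: int = 3) -> list:
    """Return up to top_k (score, title) pairs, most similar first."""
    q = bag_of_words(query)
    scored = [(cosine_similarity(q, bag_of_words(body)), title)
              for title, body in archive.items()]
    scored.sort(reverse=True)
    return scored[:top_k]


# Hypothetical archive, for illustration only.
archive = {
    "concert": "music festival concert program announced",
    "budget": "city council votes on budget",
}
ranking = most_similar("new musical program for the summer festival", archive)
```

A production system would more likely use TF-IDF weighting or text embeddings, but the ranking logic stays the same: score the draft against each archived article and show the closest matches.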
