DATA SCIENCE MONTHLY MEETUP
Yoann Péron, Data Science Manager at Le Télégramme, presented a project on “augmented journalism”, a collaboration between the French local newspaper and the IRISA research center. The project aims to help journalists understand the different capacities of their articles, estimate their potential success, and improve them for various platforms.
Improving Journalist Articles With Data
Yoann described the two-year project, which gives journalists information on an article’s potential success and suggests ways to enhance it. The team analyzed data on article views, engagement, and conversions, and defined four capacities for articles, including audience and engagement metrics. Text analysis was combined with numerical data to provide more insight into article content. The team found it challenging to define what constitutes an article’s angle, and also ran into difficulties with ChatGPT during the project’s development.
Model Development and Performance Discussion
Yoann discussed the development of a model that predicts the number of page views for an article from factors such as the number of words, uniqueness, length, and quotes. He mentioned that they tried adding the journalists’ own classifications as an input, but this did not improve the model. They also experimented with using ChatGPT to make predictions on new data sets and found their own model to be more effective.
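To make the kind of inputs mentioned above more concrete, here is a minimal sketch of how such features might be computed from an article's text. It is not the team's implementation: the function name `extract_features`, the feature names, and in particular the definition of "uniqueness" as the share of distinct words are assumptions made for illustration.

```python
import re


def extract_features(text: str) -> dict:
    """Compute a few simple numeric features from an article body.

    Hypothetical helper: the feature definitions below are illustrative
    assumptions, not the ones used in the project.
    """
    # Lowercased word tokens (Unicode-aware, so accented letters are kept)
    words = re.findall(r"\w+", text.lower())
    n_words = len(words)

    # Assumed proxy for "uniqueness": share of distinct words in the text
    uniqueness = len(set(words)) / n_words if n_words else 0.0

    # Count quoted passages in straight quotes or French guillemets
    n_quotes = len(re.findall(r'"[^"]+"|«[^»]+»', text))

    return {
        "n_words": n_words,
        "n_chars": len(text),  # length in characters
        "uniqueness": uniqueness,
        "n_quotes": n_quotes,
    }


sample = 'The mayor said "we will rebuild" and the the crowd cheered.'
features = extract_features(sample)
```

A dictionary like this could then be turned into a row of a training table, with the observed page views as the target of whatever regression model is chosen.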
Le Télégramme’s Data Analysis and App
Yoann discussed the infrastructure and processes used for data analysis and model training at Le Télégramme. He explained that they primarily use Python and AWS SageMaker for their developments, with data stored in an AWS data lake and queried through Amazon Athena. The models are trained on Amazon DynamoDB and sent to Amazon Elastic File System (EFS) for inference. He also mentioned that R is used for data journalism, and that the newsroom already uses it for an app covering elections. Yoann then presented the current usage of their app, which includes comparing articles for similarities, predicting their capacities, and providing suggestions. He demonstrated the app using an article about a musical program, showing how it compares to similar articles and which suggestions the app provides.
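The recap does not say how the app measures similarity between articles. As one common approach, and purely as an assumption, the sketch below ranks articles by cosine similarity over word counts. The function names (`bag_of_words`, `cosine_similarity`, `most_similar`) and the toy corpus are hypothetical.

```python
import math
import re
from collections import Counter


def bag_of_words(text: str) -> Counter:
    """Map each lowercased word to its number of occurrences."""
    return Counter(re.findall(r"\w+", text.lower()))


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine of the angle between two word-count vectors (0.0 if either is empty)."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def most_similar(query: str, corpus: dict, top_k: int = 3) -> list:
    """Return up to top_k (score, title) pairs, most similar first."""
    q = bag_of_words(query)
    scored = [
        (cosine_similarity(q, bag_of_words(body)), title)
        for title, body in corpus.items()
    ]
    return sorted(scored, reverse=True)[:top_k]


# Toy corpus, invented for illustration only
corpus = {
    "festival": "concert music festival",
    "football": "football match result",
}
ranking = most_similar("music festival program", corpus, top_k=1)
```

Word counts are a deliberately simple representation; a production system would more likely weight terms or use text embeddings, but the ranking logic stays the same.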