Drawing ideas from stories: a new AI solution for newsrooms

By Marc Górriz Blanch (BBC), Gabriela Gruszynski Sanseverino (UPS) and Mathias Felipe de Lima Santos (UNAV)

At a recent workshop, JOLT ESRs worked in teams to develop ideas for research projects that address a major issue surrounding media and technology. Here, three ESRs consider how AI can be ethically used to generate illustrations for news stories.

Artificial Intelligence (AI) is far from what we have seen on movie screens. The machines have not taken over the world, a relief in a year that brought us a global pandemic, wildfires, hurricanes, massive protests, and even ‘murder hornets’. Nonetheless, AI is far more present in our daily lives than many people realise. There will always be ethical questions regarding the application of AI, but technological developments and solutions are changing in response and shaping how AI is used.

Journalism is one area that has been transformed by technology. Rapid technological and social changes have revolutionised journalism with the emergence of new tools, practices, norms, and routines. In some cases, this has meant experimenting with AI solutions to address the demands of digital journalism.

News workers have been incorporating AI into a wide range of areas across the news value chain, such as targeting audience demographics, metadata creation, dynamic advertising, and content personalisation. For example, The New York Times and El País are using the AI-based Perspective tool to analyse the level of toxicity in their comment sections through keyword recognition.

Illustrating the news

In this context, we asked ourselves: how else could AI help with newsroom practices? What demands still exist? No doubt, journalists on the daily grind could create a list of key areas in which AI technologies might facilitate their work or provide resources and tools for creating more engaging news stories. Drawing on our expertise in the visual, we decided to focus on the images that illustrate news stories.

The advent of digital photography put into question the survival of press photographers and photojournalists in news media organizations. These practitioners no longer fit the reality of news companies. In fact, many news organizations now rely on freelance photographers or image brokers to illustrate their content through image-stock portals such as Getty Images and iStockPhoto.

However, in an industry that is facing stiff competition and navigating through business model changes, image-stock portals are not widely used. This is especially true for news outlets that are already struggling to pay their essential bills. Consequently, many news outlets tend to reuse the same image for several news stories. Moreover, the overall set of available images is limited, which compromises the ability of editors or newsrooms to accurately illustrate their content. This clearly limits the impact of a story in engaging readers on online platforms and social media, where visual content plays a crucial role in drawing the audience’s attention.

Considering these technological challenges and financial constraints, we propose a product based on a computational model that aims to respond to these issues in the context of the media ecosystem. Our method applies a threefold approach to a database of news articles and multimedia content. First, we apply text mining (also known as text analytics) centred on a natural language processing (NLP) system, which transforms the text in documents and databases into normalized, structured data suitable for analysis or for driving a machine learning method. Second, a neural network takes the text of a news article and identifies the most informative keywords, based on a global analysis of its semantic information, to summarise its content. Third, an AI generative model outputs an image that best illustrates the story based on the semantic information of the processed keywords.
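To make the keyword step more concrete, here is a minimal sketch in Python. It is not the model described in this article: it swaps the neural network for a plain TF-IDF ranking, which scores a term highly when it is frequent in the article but rare in a reference corpus. The function names, the tiny stopword list, and the smoothing formula are all assumptions made for illustration.

```python
import math
import re
from collections import Counter

# Small illustrative stopword list (an assumption, not taken from the project).
STOPWORDS = {
    "the", "a", "an", "of", "in", "on", "to", "and", "is", "are", "for",
    "with", "that", "as", "by", "at", "from", "after", "across",
}


def tokenize(text):
    """Lowercase the text and keep alphabetic tokens that are not stopwords."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOPWORDS]


def top_keywords(article, corpus, k=5):
    """Rank the article's terms by TF-IDF against a reference corpus.

    TF-IDF is a simple stand-in for the neural keyword extractor.
    """
    docs = [set(tokenize(d)) for d in corpus]
    n_docs = len(docs)
    counts = Counter(tokenize(article))
    total = sum(counts.values())
    scores = {}
    for term, count in counts.items():
        # Document frequency: how many reference documents contain the term.
        df = sum(1 for d in docs if term in d)
        idf = math.log((1 + n_docs) / (1 + df)) + 1.0  # smoothed IDF
        scores[term] = (count / total) * idf
    # Sort by score (descending), then alphabetically for a stable order.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, _ in ranked[:k]]
```

Calling `top_keywords(article_text, archive_texts, k=5)` returns up to five terms that could then be passed on to the image generation step. A neural extractor would additionally capture meaning beyond word counts, which this sketch does not attempt.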

Many works propose end-to-end solutions to multimodal tasks. For example, the visual question answering (VQA) task aims to answer natural-language questions about an image, such as the number of objects of a specific colour or the events happening on the left-hand side of a street, among other applications. In addition, other scholars propose video object segmentation with referring expressions: given a linguistic phrase and a video clip, a neural network generates binary masks for the object to which the phrase refers.

Inspired by this work, we propose a multimodal Generative Adversarial Network (GAN), a class of machine learning frameworks, capable of generating images based on the semantic information of a given set of keywords. We therefore introduce a neural network that integrates the bidirectional transformer model BERT as a language encoder, which takes the input article and yields an embedding for each of the main keywords. The resulting linguistic features serve as a condition for the GAN to decode an output image. Conditional GANs have proven highly successful in image-to-image translation tasks, such as image colorization (from black and white to colour) or map generation (from satellite images to maps), but have rarely been applied to multimodal tasks such as the text-to-image problem proposed in this work.
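The idea of conditioning can be illustrated with a toy sketch in Python. It shows only the forward pass of a generator that receives a noise vector concatenated with a vector derived from the keywords. It is not the proposed model: the hash-based `embed_keyword` function is a hypothetical stand-in for BERT embeddings, the weights are random and untrained, and the discriminator and adversarial training loop are left out entirely. All names and sizes are assumptions.

```python
import hashlib
import math
import random

EMBED_DIM = 8    # size of each keyword embedding (assumed)
NOISE_DIM = 4    # size of the random noise vector (assumed)
IMAGE_SIDE = 4   # output is a tiny IMAGE_SIDE x IMAGE_SIDE greyscale image


def embed_keyword(word):
    """Hypothetical stand-in for a BERT embedding: hash bytes mapped to [-1, 1]."""
    digest = hashlib.sha256(word.encode("utf-8")).digest()
    return [b / 127.5 - 1.0 for b in digest[:EMBED_DIM]]


def condition_vector(keywords):
    """Average the keyword embeddings into a single conditioning vector."""
    vectors = [embed_keyword(w) for w in keywords]
    return [sum(col) / len(vectors) for col in zip(*vectors)]


class ToyConditionalGenerator:
    """One linear layer followed by tanh, applied to [noise ; condition]."""

    def __init__(self, seed=0):
        rng = random.Random(seed)
        in_dim = NOISE_DIM + EMBED_DIM
        out_dim = IMAGE_SIDE * IMAGE_SIDE
        # Random, untrained weights: a real GAN would learn these adversarially.
        self.weights = [
            [rng.gauss(0.0, 0.5) for _ in range(in_dim)] for _ in range(out_dim)
        ]

    def generate(self, keywords, noise):
        # Conditioning: concatenate the noise with the keyword-derived vector.
        x = list(noise) + condition_vector(keywords)
        pixels = [
            math.tanh(sum(w * v for w, v in zip(row, x))) for row in self.weights
        ]
        # Reshape the flat pixel list into IMAGE_SIDE rows.
        return [
            pixels[r * IMAGE_SIDE:(r + 1) * IMAGE_SIDE] for r in range(IMAGE_SIDE)
        ]
```

With the same noise vector, different keyword sets produce different outputs, which is the property a conditional generator relies on. In a trained system the output would be a meaningful image rather than a small grid of arbitrary values.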

However, this process is similar to the way deep learning is used to generate deepfakes. As such, we recognize that this raises questions about ethics and the wider risks AI poses for the future of the planet. Deepfakes are AI-generated fake images, videos, or audio recordings that look and sound like real ones. They leverage powerful techniques that combine machine learning (ML) and artificial intelligence to manipulate or create visual content in order to deceive people.

In our case, instead of using synthetic media, in which a person in an existing image is replaced with someone else’s likeness, we rely on drawings or cartoon images. This has already proven a promising avenue in the media industry: some news outlets now rely on drawings rather than photographs to represent their interviewees. This experiment adds to a growing corpus of research and practical approaches by showing how AI can be applied to journalism and by finding a “healthy” application of deepfake techniques in the industry.

This project is funded by the European Union’s Horizon 2020 research and innovation programme under the Marie Skłodowska-Curie grant agreement No 765140.