By Marc Górriz Blanch
Archived broadcast content, such as previously broadcast shows and historical event coverage, is valuable to journalism because it supports incoming news stories and provides context for current events. JOLT partners BBC R&D are investigating how to harness such visual content by automatically enhancing its colour using some of the most recent breakthroughs in machine learning. Adding colour information to images has become an area of significant interest, including within the broadcasting sector, where the approach can be especially beneficial for the restoration of archive material. In this project, we introduce a novel colourisation technique based on Generative Adversarial Networks (GANs), which improves on related AI-based computer-assisted technologies and leads to more natural, realistic and plausible colour content.
Open source software for this work is now available via the BBC GitHub. You can also read our research paper, presented at ‘When AI meets Multimedia’, a Multimedia Signal Processing workshop from the Institute of Electrical and Electronics Engineers.
Colourisation refers to the process of adding colours to greyscale images so that the coloured results are perceptually meaningful and visually appealing. The task is complex, however: because it has so many degrees of freedom, the results may bear little resemblance to the colours in real life. Our colourisation GAN addresses this problem by means of an adversarial algorithm which aims to mimic the natural colour distribution of a given dataset of colour images (the training dataset) by forcing the generated results to be indistinguishable from natural content. The algorithm comprises two neural networks, called the “generator” and the “discriminator”. The generator tries to produce realistic (but fake) colours from a black and white image, while the discriminator acts as a judge and tries to identify whether the results are fake or not. A competition takes place: the generator gets better at producing realistically coloured images, and the discriminator gets better at detecting fake images. This video shows the generator competing with the discriminator to colourise greyscale images.
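The competition described above can be sketched with the textbook GAN objective. This is a minimal illustration only, assuming a binary cross-entropy loss and a discriminator that outputs the probability that an image is natural; the function names and the example scores are hypothetical, and the published model may use a different loss.

```python
import math

def bce(prob, target):
    """Binary cross-entropy for a single prediction in (0, 1)."""
    eps = 1e-12  # clamp to avoid log(0)
    prob = min(max(prob, eps), 1.0 - eps)
    return -(target * math.log(prob) + (1.0 - target) * math.log(1.0 - prob))

def discriminator_loss(d_real, d_fake):
    """The judge wants natural images scored 1 and generated ones scored 0."""
    return bce(d_real, 1.0) + bce(d_fake, 0.0)

def generator_loss(d_fake):
    """The generator wants its colourised output to be scored as natural."""
    return bce(d_fake, 1.0)

# Hypothetical discriminator scores for a generated image.
fooled = generator_loss(0.9)  # the judge believes the fake colours: low loss
caught = generator_loss(0.1)  # the judge spots the fake colours: high loss
```

Training alternates between lowering `discriminator_loss` and lowering `generator_loss`, so each network improves against the other.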
GAN training is typically unstable, and the quality of its results is hard to predict, which is particularly problematic for the colourisation task. Our approach uses a training workflow and neural network design that aim to minimise the risk of GAN failure while maximising the quality of automatic colourisation. The main contributions of our work include:
- The integration of instance and batch normalisation in both the generator and the discriminator, with the aim of improving the network's ability to generalise across changes in content style while encouraging stability during training. We propose a novel generator architecture that introduces both normalisation techniques into the popular U-Net architecture.
- The use of spectral normalisation to regularise the learned parameters, aiming to improve the generalisation of the network and to prevent instability during training. This technique rescales the weights of each layer by their largest singular value, keeping them bounded during training and hence preventing small changes in the input from leading to large changes in the output.
- The use of multiple discriminators at different scales to increase the precision of the GAN framework when colourising small areas and local details, making it possible to tackle high-resolution images without varying the discriminator architecture.
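Two of these ideas can be sketched in a few lines of plain Python. This is an illustration under stated assumptions, not the released implementation: `spectral_norm` estimates the largest singular value of a weight matrix by power iteration (a common way to apply spectral normalisation), and `image_pyramid` builds the downscaled copies that identical discriminators could judge at different scales. The function names, the 2x2 average pooling and the number of scales are all hypothetical choices.

```python
import math
import random

def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]

def norm(vector):
    """Euclidean length of a vector."""
    return math.sqrt(sum(x * x for x in vector))

def spectral_norm(weights, iterations=50, seed=0):
    """Estimate the largest singular value by power iteration.

    Assumes a non-zero weight matrix.
    """
    rng = random.Random(seed)
    transposed = [list(col) for col in zip(*weights)]
    v = [rng.gauss(0.0, 1.0) for _ in weights[0]]
    for _ in range(iterations):
        u = matvec(weights, v)
        u_len = norm(u)
        u = [x / u_len for x in u]
        v = matvec(transposed, u)
        v_len = norm(v)
        v = [x / v_len for x in v]
    return norm(matvec(weights, v))

def spectrally_normalise(weights):
    """Rescale the weights so their largest singular value is about 1."""
    sigma = spectral_norm(weights)
    return [[x / sigma for x in row] for row in weights]

def downsample(image):
    """Halve the resolution by averaging each 2x2 block of pixels."""
    return [
        [
            (image[y][x] + image[y][x + 1]
             + image[y + 1][x] + image[y + 1][x + 1]) / 4.0
            for x in range(0, len(image[0]) - 1, 2)
        ]
        for y in range(0, len(image) - 1, 2)
    ]

def image_pyramid(image, scales=3):
    """Return the image at full, half, quarter, ... resolution."""
    levels = [image]
    for _ in range(scales - 1):
        levels.append(downsample(levels[-1]))
    return levels

# Hypothetical usage: a tiny weight matrix and an 8x8 greyscale image.
layer = spectrally_normalise([[3.0, 0.0], [0.0, 1.0]])
levels = image_pyramid([[float(x + y) for x in range(8)] for y in range(8)])
```

Because every level of the pyramid is just a smaller image, the same discriminator design can be reused at each scale: the full-resolution copy exposes local detail, while the coarser copies summarise the overall colour layout.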
After achieving promising results with still images, we are now aiming to adapt our system to more realistic broadcasting uses, addressing the inconsistencies that arise when our research is applied to video footage. We also want to move into ‘style transfer’, improving results by transferring colour information from other similar material. For instance, if we have a black and white image of a car in a forest, we could transfer colour information from a similar colour image. In this example, a colour image of a red car in a forest on a cloudy day would serve as a point of reference: our algorithm would copy over the colour of the vehicle, the grey of the sky, the dark tones of the lighting and so on. This kind of style transfer reduces the ambiguity of the colours chosen by the algorithm.