Temporal Consistent Automatic Video Colorization via Semantic Correspondence
Yu Zhang
Siqi Chen*
Mingdao Wang
Xianlin Zhang
Chuang Zhu
Yue Zhang
Xueming Li
[Paper]
[GitHub]


Abstract

The video colorization task has recently attracted wide attention. Recent methods mainly focus on temporal consistency between adjacent frames or frames separated by a small interval. However, they still face the severe challenge of inconsistency between frames with a large interval. To address this issue, we propose a novel video colorization framework that incorporates semantic correspondence into automatic video colorization to maintain long-range temporal consistency. First, a reference colorization network is designed to automatically colorize the first frame of each video, obtaining a reference image that supervises the colorization of the entire video. Such an automatically colorized reference image not only avoids labor-intensive and time-consuming manual selection, but also enhances the similarity between the reference and the grayscale input frames. Afterwards, a semantic correspondence network and an image colorization network are introduced to colorize the remaining frames with the help of the reference. Each frame is supervised by both the reference image and the immediately preceding colorized frame to improve both short-range and long-range temporal consistency. Extensive experiments demonstrate that our method outperforms other methods in maintaining temporal consistency, both qualitatively and quantitatively. In the NTIRE 2023 Video Colorization Challenge, our method ranks 3rd in the Color Distribution Consistency (CDC) Optimization track.
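
The CDC metric is not defined on this page. As a rough illustration only, the sketch below assumes one common formulation: the Jensen-Shannon divergence between color histograms of frame pairs at temporal intervals of 1, 2, and 4 frames, averaged over the video. The function names and histogram settings here are hypothetical, not the official challenge code.

```python
import numpy as np
from scipy.spatial.distance import jensenshannon


def color_histogram(frame, bins=32):
    """Normalized joint RGB histogram of an (H, W, 3) uint8 frame."""
    hist, _ = np.histogramdd(
        frame.reshape(-1, 3),
        bins=(bins, bins, bins),
        range=((0, 256), (0, 256), (0, 256)),
    )
    return hist.ravel() / hist.sum()


def cdc(frames, intervals=(1, 2, 4)):
    """Lower is better: mean JS divergence between the color histograms
    of frame pairs separated by each temporal interval."""
    hists = [color_histogram(f) for f in frames]
    scores = []
    for t in intervals:
        # jensenshannon returns the JS distance; square it for divergence.
        pairs = [
            jensenshannon(hists[i], hists[i + t]) ** 2
            for i in range(len(hists) - t)
        ]
        scores.append(np.mean(pairs))
    return float(np.mean(scores))
```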


Method Comparison

Comparison of different frameworks in video colorization: (a) Colorization with post-processing, (b) Colorization with dense long-term loss, (c) Colorization with semantic correspondence (Ours).


Overall Framework

The overall framework of our method. There are three main components: a reference colorization network, an image colorization network, and a semantic correspondence network. The reference colorization network generates a colorized reference image from the first grayscale frame of the video. The semantic correspondence network and the image colorization network then leverage this reference to supervise the colorization of the remaining frames.
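
As a rough sketch of the inference loop described above (the module names and interfaces below are hypothetical placeholders, not the released code):

```python
import torch


@torch.no_grad()
def colorize_video(gray_frames, ref_net, corr_net, color_net):
    """Colorize a grayscale video with a self-generated reference.

    gray_frames: list of (1, 1, H, W) grayscale tensors.
    ref_net / corr_net / color_net stand in for the reference
    colorization, semantic correspondence, and image colorization
    networks; their call signatures here are illustrative only.
    """
    # 1. Automatically colorize the first frame to obtain the reference.
    reference = ref_net(gray_frames[0])

    outputs = [reference]
    for gray in gray_frames[1:]:
        # 2. Align reference colors to the current frame via semantic
        #    correspondence (long-range consistency).
        warped_ref = corr_net(gray, reference)
        # 3. Colorize the frame, also conditioning on the previously
        #    colorized frame (short-range consistency).
        outputs.append(color_net(gray, warped_ref, outputs[-1]))
    return outputs
```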



Visualization Results

Visual comparison with state-of-the-art methods on the Videvo test set.