LLVIP: A Visible-infrared Paired Dataset for Low-light Vision
Xinyu Jia
Chuang Zhu*
Minzhen Li
Wenqi Tang
Wenli Zhou


Various visual tasks, such as image fusion, pedestrian detection and image-to-image translation, are very challenging in low-light conditions due to the loss of effective target areas. In such cases, infrared and visible images can be used together to provide both rich detail information and effective target areas. In this paper, we present LLVIP, a visible-infrared paired dataset for low-light vision. The dataset contains 30976 images (15488 pairs), most of which were taken in very dark scenes, and all of the image pairs are strictly aligned in time and space. Pedestrians in the dataset are labeled. We compare the dataset with other visible-infrared datasets and evaluate the performance of several popular visual algorithms, including image fusion, pedestrian detection and image-to-image translation, on the dataset. The experimental results demonstrate the complementary effect of fusion on image information, and reveal the deficiencies of existing algorithms for these three visual tasks under very low-light conditions. We believe the LLVIP dataset will contribute to the computer vision community by promoting image fusion, pedestrian detection and image-to-image translation in very low-light applications. (The infrared images are thermal infrared, with a wavelength of 8–14 μm.)
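Since the visible and infrared frames come in aligned pairs, a natural first step when working with the dataset is to match each visible image to its infrared counterpart. The sketch below pairs images by shared filename; the directory names `visible/` and `infrared/` are an assumption for illustration, so adjust the paths to however the extracted dataset is laid out on your machine.

```python
from pathlib import Path

def pair_llvip_images(root):
    """Pair visible and infrared frames that share the same filename.

    Assumes a (hypothetical) layout of the form:
        root/visible/<name>.jpg
        root/infrared/<name>.jpg
    and returns a sorted list of (visible_path, infrared_path) tuples,
    keeping only names that exist in both modalities.
    """
    root = Path(root)
    visible = {p.name: p for p in (root / "visible").glob("*.jpg")}
    infrared = {p.name: p for p in (root / "infrared").glob("*.jpg")}
    shared_names = sorted(visible.keys() & infrared.keys())
    return [(visible[n], infrared[n]) for n in shared_names]
```

Pairing by filename (rather than by index) makes the loader robust to missing or extra files in either folder.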

Dataset Overview

Data Collection and Annotation

Video and PPT

Download LLVIP

Provide your information to get access to the dataset; before doing so, please read and agree to the terms of use and license.

If you encounter any problems (e.g., your access is blocked), please contact us (czhu@bupt.edu.cn) without hesitation.


The LLVIP dataset is made freely available to academic and non-academic entities for non-commercial purposes such as academic research, teaching, scientific publications, or personal experimentation. Permission is granted to use the data provided that you agree to our license terms.


@article{jia2021llvip,
  title={LLVIP: A Visible-infrared Paired Dataset for Low-light Vision},
  author={Jia, Xinyu and Zhu, Chuang and Li, Minzhen and Tang, Wenqi and Zhou, Wenli},
  journal={arXiv preprint arXiv:2108.10831},
  year={2021}
}

Applications and Baselines

Image Translations

It can be seen that neither the quality of the generated images nor their similarity to the real images is satisfactory. Specifically, the background in the generated image is messy, the contours of the pedestrians and the car are not clear, the details are wrong, and there are many artifacts in the image.
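Similarity to the real infrared image can be quantified with standard full-reference metrics. Below is a minimal sketch of PSNR (peak signal-to-noise ratio) in NumPy, comparing a generated image to its real counterpart; the function name and interface are illustrative, and in practice one would also report SSIM or other perceptual metrics.

```python
import numpy as np

def psnr(generated, reference, max_val=255.0):
    """Peak signal-to-noise ratio between a generated image and the
    real reference image (same shape); higher means closer to the
    reference. Returns +inf for identical images."""
    generated = np.asarray(generated, dtype=np.float64)
    reference = np.asarray(reference, dtype=np.float64)
    mse = np.mean((generated - reference) ** 2)
    if mse == 0:
        return float("inf")
    return 10.0 * np.log10(max_val ** 2 / mse)
```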

Image Fusion

From the first column on the left, we can clearly see that when the lighting conditions are poor, visible images can hardly distinguish the human body from the background. In infrared images, objects such as human bodies can be easily distinguished with clear outlines, but there is no internal texture information. Fusion algorithms combine the information of the two kinds of images to varying degrees, so that human bodies are highlighted and the fused images retain some texture information.
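The idea of combining the two modalities can be illustrated with the simplest possible baseline: a pixel-wise weighted average of co-registered visible and infrared images. This is only a sketch to show the complementarity; the fusion algorithms evaluated on LLVIP (multi-scale and learning-based methods) are considerably more elaborate.

```python
import numpy as np

def weighted_fusion(visible, infrared, alpha=0.5):
    """Naive pixel-wise fusion of strictly aligned visible and
    infrared images of the same shape. alpha weights the visible
    image; (1 - alpha) weights the infrared image."""
    visible = np.asarray(visible, dtype=np.float64)
    infrared = np.asarray(infrared, dtype=np.float64)
    fused = alpha * visible + (1.0 - alpha) * infrared
    return np.clip(fused, 0, 255).astype(np.uint8)
```

Because LLVIP pairs are strictly aligned in time and space, even this naive average already brightens pedestrians (from the infrared channel) while keeping some visible-light texture.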

(Low-light) Pedestrian Detection

Many pedestrians are missed in the visible images. The infrared images highlight pedestrians and achieve better results in the detection task; this discrepancy between the visible and infrared results not only proves the necessity of infrared images but also indicates that the performance of pedestrian detection algorithms is not good enough under low-light conditions. The dataset can therefore be used to study and improve the performance of pedestrian detection algorithms at night.
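Missed detections can be counted by matching each labeled pedestrian box against the detector's output with an IoU threshold. The sketch below uses plain Python and the usual [x1, y1, x2, y2] box convention; the function names are illustrative, and a full evaluation would also handle confidence scores and one-to-one matching as in standard detection metrics.

```python
def iou(box_a, box_b):
    """Intersection-over-union of two [x1, y1, x2, y2] boxes."""
    x1 = max(box_a[0], box_b[0])
    y1 = max(box_a[1], box_b[1])
    x2 = min(box_a[2], box_b[2])
    y2 = min(box_a[3], box_b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def count_misses(gt_boxes, det_boxes, thr=0.5):
    """Number of ground-truth pedestrians with no detection whose
    IoU reaches the threshold (i.e., missed detections)."""
    return sum(1 for g in gt_boxes
               if all(iou(g, d) < thr for d in det_boxes))
```

Running this on the visible and infrared detections of the same scene makes the gap between the two modalities concrete.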