Provide your information to get access to the dataset. Before doing so, please read and agree to the terms of use and the license.

If you run into a problem (for example, your request may be blocked), please do not hesitate to contact us.


This dataset is made freely available to academic and non-academic entities for non-commercial purposes such as academic research, teaching, scientific publications, or personal experimentation. Permission is granted to use the data provided that you agree to our license terms below:

  1. That you include a reference to the dataset in any work that makes use of the dataset. For research papers, cite our preferred publication as listed on our website; for other media cite our preferred publication as listed on our website or link to the website.
  2. That you do not distribute this dataset or modified versions. It is permissible to distribute derivative works insofar as they are abstract representations of this dataset (such as models trained on it or additional annotations that do not directly include any of our data).
  3. That you may not use the dataset or any derivative work for commercial purposes, for example licensing or selling the data, or using the data with the purpose of procuring a commercial gain.
  4. That all rights not expressly granted to you are reserved by us.


title={Predicting Axillary Lymph Node Metastasis in Early Breast Cancer Using Deep Learning on Primary Tumor Biopsy Slides},
author={Xu, Feng and Zhu, Chuang and Tang, Wenqi and Wang, Ying and Zhang, Yu and Li, Jie and Jiang, Hongchuan and Shi, Zhongyue and Liu, Jun and Jin, Mulan},
journal={Frontiers in Oncology},


The Early Breast Cancer Core-Needle Biopsy WSI (BCNB) Dataset includes core-needle biopsy whole slide images (WSIs) of early breast cancer patients and the corresponding clinical data. The WSIs have been examined and annotated by two independent and experienced pathologists blinded to all patient-related information.

This dataset is introduced in our paper "Predicting Axillary Lymph Node Metastasis in Early Breast Cancer Using Deep Learning on Primary Tumor Biopsy Slides", which has been accepted by Frontiers in Oncology; the paper is also available on arXiv and medRxiv. For more information about the dataset, please visit the GitHub repo and the grand-challenge page. Based on the BCNB Dataset, we studied a deep learning algorithm that uses multiple instance learning (MIL) to predict the metastatic status of the axillary lymph nodes (ALN) preoperatively, and achieved a best AUC of 0.831 in the independent test cohort. For details of the method, please see our paper.


There are WSIs of 1058 patients. Only part of the tumor regions are annotated in the WSIs; any additional annotations you need must be made by yourself. In addition to the WSIs, we provide clinical data for each patient, including age, tumor size, tumor type, ER, PR, HER2, HER2 expression, histological grading, surgical, Ki67, molecular subtype, number of lymph node metastases, and the metastatic status of the axillary lymph nodes (ALN). The dataset has been de-identified and contains no private patient information.

The slides were scanned with an Iscan Coreo pathologic scanner, and the WSIs were viewed at 200x magnification using Image Viewer software.

The WSIs are provided in .jpg format and the clinical data in .xlsx format. The dataset was collected and organized by experienced doctors in our research group.

Based on this dataset, our paper studies the prediction of the metastatic status of ALN, which is a weakly supervised classification task. Other research based on the dataset is also feasible, such as the prediction of histological grading, molecular subtype, HER2, ER, and PR. We do not restrict the topic of your research, and any research based on our dataset is welcome.

Please note that the dataset may be used only for education and research; use in commercial or clinical applications is not allowed. Any use of this dataset must follow the license.


For your convenience, we have split the BCNB Dataset into a training cohort, a validation cohort, and an independent test cohort in a 6:2:2 ratio. The overall clinical characteristics of the BCNB Dataset are as follows:


Clinical Data


Annotation information is stored in .json files in the following format, where "vertices" records the coordinates of each point of the annotated polygonal area.

        "positive": [
                "name": "Annotation 0",
                "vertices": [
        "negative": []
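
As an illustration of how such a file might be read, here is a minimal Python sketch that computes a bounding box for each annotated polygon. It assumes that each entry under "positive" or "negative" is an object with a "name" and a "vertices" list, and that each vertex is an [x, y] pair in pixel coordinates; the helper name `polygon_bounding_boxes` and the example values are hypothetical and not taken from the dataset.

```python
import json


def polygon_bounding_boxes(annotation, label="positive"):
    """Return (name, (x_min, y_min, x_max, y_max)) for each polygon under `label`."""
    boxes = []
    for region in annotation.get(label, []):
        # Assumption: every vertex is an [x, y] pair in pixel coordinates.
        xs = [x for x, _ in region["vertices"]]
        ys = [y for _, y in region["vertices"]]
        boxes.append((region["name"], (min(xs), min(ys), max(xs), max(ys))))
    return boxes


# Hypothetical annotation, made up for illustration only.
example = json.loads("""
{
  "positive": [
    {"name": "Annotation 0",
     "vertices": [[10, 20], [110, 25], [90, 140], [15, 120]]}
  ],
  "negative": []
}
""")

print(polygon_bounding_boxes(example))
```

If the real files use a different vertex layout, only the two list comprehensions need to change.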

Code for data preprocessing

We provide some code for data preprocessing, which can be used to extract the annotated tumor regions from all WSIs and to cut fixed-size patches from the extracted regions. It may be helpful for your own work. Please check the code for more details.
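
To give a rough idea of what cutting fixed-size patches involves, the following Python sketch lists the top-left corners of all patches that fit entirely inside an extracted region. It is not the provided preprocessing code: the function name `patch_origins`, the default patch size of 256 pixels, and the non-overlapping stride are assumptions chosen for illustration.

```python
def patch_origins(width, height, patch_size=256, stride=None):
    """List (x, y) top-left corners of patches fully inside a width x height region."""
    # Assumption: patches do not overlap unless a smaller stride is given.
    stride = stride or patch_size
    return [
        (x, y)
        for y in range(0, height - patch_size + 1, stride)
        for x in range(0, width - patch_size + 1, stride)
    ]


# A hypothetical 600 x 300 pixel region yields two non-overlapping patches.
print(patch_origins(600, 300))
```

Each origin (x, y) defines the crop box (x, y, x + patch_size, y + patch_size), which can be passed to the crop function of whichever image library you use. Regions smaller than the patch size yield no patches in this sketch.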