Our Datasets

CirrMRI600+

This dataset comprises 628 volumetric abdominal MRI scans (310 T1-weighted (T1W) and 318 T2-weighted (T2W)) with corresponding segmentation masks annotated by physicians. CirrMRI600+ is a single-center, multivendor, multisequence dataset. To our knowledge, CirrMRI600+ is the first dataset designed explicitly for liver cirrhosis research and incorporates both T1W and T2W MRI images.

Summary: [600+ T1- and T2-weighted liver cirrhosis volumetric scans with corresponding ground truth]

Link: https://osf.io/cuk24/

PanSegData

Public datasets for pancreas MRI segmentation are scarce. We collected 767 MRI and 1,350 CT scans. Our PanSegNet model, which combines nnUNet and Transformers, achieved Dice scores of 88.3% (CT), 85.0% (T1W), and 86.3% (T2W). Strong volume correlation and high intra-observer agreement indicate reliable segmentation across modalities and centers.

Summary: [700+ T1 and T2 MRI pancreas volumetric scans with corresponding ground truth]

Link: https://osf.io/kysnj/
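
For readers unfamiliar with the Dice scores quoted above, the following is a minimal sketch of how the metric is commonly computed for a pair of binary masks. It is an illustration only, not the PanSegNet evaluation code: the function name `dice_score`, the flat 0/1 list representation, and the convention that two empty masks score 1.0 are all assumptions made for this example.

```python
def dice_score(pred, truth):
    """Dice coefficient between two binary masks given as flat 0/1 sequences."""
    if len(pred) != len(truth):
        raise ValueError("masks must have the same number of voxels")
    # Voxels marked as foreground in both masks.
    overlap = sum(1 for p, t in zip(pred, truth) if p and t)
    # Foreground voxels in each mask, added together.
    total = sum(1 for p in pred if p) + sum(1 for t in truth if t)
    # Assumed convention: two empty masks agree perfectly.
    if total == 0:
        return 1.0
    return 2.0 * overlap / total


# Toy example: 4 predicted voxels, 4 reference voxels, 3 shared.
pred = [1, 1, 1, 1, 0, 0]
truth = [0, 1, 1, 1, 1, 0]
print(f"Dice = {dice_score(pred, truth):.3f}")  # Dice = 0.750
```

A reported score such as 88.3% corresponds to a value of 0.883 on this scale, typically averaged over the test volumes.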

Peri-Pancreatic Edema Dataset

We collected 255 CT scans of pancreatitis patients with IRB approval. The dataset includes 179 cases with peri-pancreatic edema and 76 without it, labeled as 1 and 0, respectively. Expert radiologists annotated pancreas masks, ensuring accuracy. This dataset supports deep learning advancements in pancreatic research, providing valuable insights for future studies and clinical applications.

Summary: [Volumetric CT scans of 255 pancreatitis patients along with their labels]

Link: https://osf.io/cuk24/
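
Because the two labels are imbalanced (179 versus 76 cases), a classifier trained on this data may benefit from class weighting. The sketch below shows one common heuristic, inverse-frequency weights, applied to the counts stated above. The helper name `inverse_frequency_weights` is hypothetical, and this weighting scheme is an assumption for illustration rather than something prescribed by the dataset.

```python
def inverse_frequency_weights(counts):
    """Map each class label to n_samples / (n_classes * class_count)."""
    n_samples = sum(counts.values())
    n_classes = len(counts)
    return {label: n_samples / (n_classes * n) for label, n in counts.items()}


# Counts from the dataset description: label 1 = edema, label 0 = no edema.
counts = {1: 179, 0: 76}
weights = inverse_frequency_weights(counts)
for label, weight in sorted(weights.items()):
    print(f"label {label}: weight {weight:.3f}")
```

The minority class (label 0) receives the larger weight, so each of its cases contributes more to a weighted loss.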

GastroVision

We present GastroVision, a multi-center open-access GI endoscopy dataset with 27 classes, including anatomical landmarks, abnormalities, and polyp removals. It contains 8,000 images from hospitals in Norway and Sweden, annotated by expert endoscopists. Extensive benchmarking validates its significance for advancing AI-based GI disease detection and classification.

Summary: (Detection and classification) [publication]

Link: https://drive.google.com/drive/u/1/folders/1T35gqO7jIKNxC-gVA2YVOMdsL7PSqeAa

PolypGen Video Sequence Dataset

We curated PolypGen, a dataset from six centers with 300+ patients, including single-frame and sequence data. It features 3,762 annotated polyp labels with precise boundaries, verified by six senior gastroenterologists. This is the most comprehensive detection and segmentation dataset, developed by computational scientists and expert gastroenterologists.

Summary: (Video polyp segmentation and detection) [publication]

Link: https://drive.google.com/drive/u/2/folders/16uL9n84SrMt7IiQFzTUQNaJ9TbHJ8DhW

PolypGen Still Frames

PolypGen is a polyp segmentation and detection dataset with 8,037 frames, including 3,762 positive and 4,275 negative samples from six hospitals. It covers diverse populations, endoscopic systems, and treatment procedures. Initially used in EndoCV2021, it promotes generalizable deep learning models through multicenter collaboration and benchmarking challenges.

Summary: [Polyp segmentation][publication]

Link: https://www.synapse.org/Synapse:syn26376615/wiki/613312

Polyp segmentation & detection dataset

Kvasir-SEG

Kvasir-SEG (46.2 MB) contains 1,000 polyp images with ground truth masks from Kvasir Dataset v2. Resolutions range from 332×487 to 1920×1072 pixels. Images and masks are stored separately in JPEG format, with bounding box coordinates in a JSON file. This open-access dataset supports polyp detection research.

Summary: (segmentation, detection, and localization) [publication]

Link: https://datasets.simula.no/kvasir-seg/
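
Since images and masks are stored separately and the bounding boxes live in a JSON file, a loader has to pair the files and parse the boxes. The following is a minimal sketch under stated assumptions: that image and mask files share a filename, that they carry a `.jpg` extension, and that the JSON maps each image ID to an entry with a `bbox` list of `xmin`/`ymin`/`xmax`/`ymax` objects. The function names and the example record are hypothetical; check the downloaded archive for the actual folder names and schema.

```python
import json
from pathlib import Path


def pair_images_and_masks(image_dir, mask_dir):
    """Return (image, mask) path pairs that share a filename."""
    image_dir, mask_dir = Path(image_dir), Path(mask_dir)
    pairs = []
    for image_path in sorted(image_dir.glob("*.jpg")):
        mask_path = mask_dir / image_path.name
        if mask_path.exists():  # skip images that have no mask
            pairs.append((image_path, mask_path))
    return pairs


def load_boxes(json_text):
    """Parse boxes from JSON text in the ASSUMED schema:
    {"<image id>": {"bbox": [{"xmin": .., "ymin": .., "xmax": .., "ymax": ..}]}}
    """
    data = json.loads(json_text)
    return {
        image_id: [(b["xmin"], b["ymin"], b["xmax"], b["ymax"]) for b in entry["bbox"]]
        for image_id, entry in data.items()
    }


# Hypothetical record, used only to exercise the parser.
example = '{"img_0001": {"bbox": [{"xmin": 10, "ymin": 20, "xmax": 110, "ymax": 220}]}}'
print(load_boxes(example))  # {'img_0001': [(10, 20, 110, 220)]}
```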

EndoCV 2021

We aim to create a comprehensive dataset from 6 global centers, offering 5 dataset types: i) multi-centre train-test split, ii) polyp size-based split, iii) data center-wise split, iv) modality split (test only), and v) one hidden center test. Participants will be evaluated on all types. The dataset includes detection bounding boxes, pixel-wise segmentation, and negative samples.

Summary: (segmentation, detection, and localization) [publication]

Link: https://endocv2021.grand-challenge.org/
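
The center-wise and hidden-center splits described above can be pictured as holding out every sample from one center for testing. The sketch below illustrates that idea only; it is not the challenge's official split code. The center IDs `C1` to `C6`, the frame names, and the helper `hold_out_center` are placeholders invented for this example.

```python
def hold_out_center(samples, held_out):
    """Split (center, item) records into train and test items by center."""
    train = [item for center, item in samples if center != held_out]
    test = [item for center, item in samples if center == held_out]
    if not test:
        raise ValueError(f"no samples from center {held_out!r}")
    return train, test


# Hypothetical records: six centers with three frames each.
samples = [(f"C{c}", f"frame_{c}_{i}") for c in range(1, 7) for i in range(3)]
train, test = hold_out_center(samples, "C6")
print(len(train), len(test))  # 15 3
```

Evaluating on a center that never appears in training gives a rough estimate of how well a model generalizes to a new hospital or endoscopy system.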

Kvasir-Instrument

The Kvasir-Instrument dataset (170 MB) includes 590 endoscopic tool images with corresponding ground truth masks. Image resolutions range from 720×576 to 1280×1024 and are encoded in JPEG format. It is the first GI tract organ tools dataset, offering a train-test split for method development. Bounding box coordinates are stored in a JSON file for automatic tool segmentation research.

Summary: (segmentation, detection, and localization) [publication]

Link: https://datasets.simula.no/kvasir-instrument/
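
To show how a bounding box relates to a segmentation mask, the sketch below derives the tightest box around the foreground pixels of a small binary mask. The dataset ships its own boxes in the JSON file, so this is an illustration of the relationship, not a description of how those boxes were produced. The function name `mask_to_bbox`, the nested-list mask, and the inclusive pixel-index convention are assumptions; note that it returns a single box even when the mask contains several separate tools.

```python
def mask_to_bbox(mask):
    """Return (xmin, ymin, xmax, ymax) of the nonzero pixels in a 2D mask,
    or None if the mask is empty. Coordinates are inclusive pixel indices."""
    # Row indices that contain at least one foreground pixel.
    rows = [y for y, row in enumerate(mask) if any(row)]
    if not rows:
        return None
    # Column indices of every foreground pixel.
    cols = [x for row in mask for x, value in enumerate(row) if value]
    return min(cols), rows[0], max(cols), rows[-1]


mask = [
    [0, 0, 0, 0, 0],
    [0, 0, 1, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 0, 0, 0],
]
print(mask_to_bbox(mask))  # (1, 1, 3, 2)
```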

Kvasir-Sessile

The Kvasir-SEG dataset (46.2 MB) contains 1,000 polyp images and their corresponding ground truth from the Kvasir Dataset v2. Image resolutions range from 332×487 to 1920×1072 pixels. The images and their corresponding masks are stored in two separate folders under the same filenames. The image files are JPEG-encoded and can be browsed online. The open-access dataset can be downloaded for research and educational purposes.

Summary: (segmentation, detection, and localization) [publication]

Link: https://endocv2021.grand-challenge.org/

Endotect 2020 Challenge Dataset

We aim to create a comprehensive dataset from 6 global centers, offering 5 dataset types: i) multi-centre train-test split, ii) polyp size-based split, iii) data center-wise split, iv) modality split (test only), and v) one hidden center test. Participants will be evaluated on all types. The dataset includes detection bounding boxes, pixel-wise segmentation, and negative samples.

Summary: (classification, segmentation, detection and localization) [publication]

Link: http://home.simula.no/~paalh/publications/files/icpr2020-endotect.pdf

Medico automatic polyp segmentation dataset

The Medico Automatic Polyp Segmentation Challenge aims to benchmark semantic segmentation algorithms for accurate and efficient detection of all polyp types using a publicly available dataset. By providing 1,000 segmented images and a separate test set, the challenge emphasizes robustness, speed, and generalization, encouraging multimedia researchers to contribute impactful solutions to medical diagnostics.

Summary: (segmentation, detection, and localization) [publication]

Link: https://www.kaggle.com/datasets/debeshjha1/medico-automatic-polyp-segmentation-challenge

Wireless capsule endoscopy dataset

Kvasir-Capsule

This paper introduces **Kvasir-Capsule**, a large-scale video capsule endoscopy dataset containing over 4.7 million frames, including 47,238 medically verified and annotated images from 14 anomaly classes. By addressing the lack of accessible, labeled medical data, this dataset aims to advance AI-driven diagnostic tools for VCE, helping improve anomaly detection and reduce manual workload in clinical practice. [Publication]

Link: https://osf.io/dv2ag/

KvasirCapsule-SEG

This paper proposes **NanoNet**, a lightweight deep learning model designed for real-time segmentation in video capsule endoscopy and colonoscopy, achieving high accuracy with significantly fewer parameters. With only about 36,000 parameters, NanoNet balances speed, model complexity, and performance, making it well-suited for deployment on low-end endoscope hardware in clinical settings.

Link: https://datasets.simula.no/kvasir-capsule-seg/

Sports Data

House activity Data

Overall Datasets Information

These are our publicly available datasets, which can be used for medical image segmentation, detection, localization, and classification tasks. More details about each dataset can be found on its official webpage and in the corresponding paper.

CirrMRI600+ [600+ T1- and T2-weighted liver cirrhosis volumetric scans with corresponding ground truth]

PanSegData [700+ T1 and T2 MRI pancreas volumetric scans with corresponding ground truth]

Peri-Pancreatic Edema Dataset [Volumetric CT scans of 255 pancreatitis patients along with their labels]

New Classification dataset

Polyp segmentation & detection dataset

Gastrointestinal tract classification datasets

Wireless capsule endoscopy dataset

Sports Data

The PMData Dataset

House activity Data

The HTAD Dataset

More datasets can be found on my Kaggle webpage. For more health- and medicine-related datasets, please visit this webpage.