Our Datasets

CirrMRI600+

This dataset comprises 628 abdominal MRI volumetric scans (310 T1-weighted (T1W) and 318 T2-weighted (T2W)), along with corresponding segmentation masks annotated by physicians. CirrMRI600+ is a single-center, multivendor, multisequence dataset. To our knowledge, it is the first dataset designed explicitly for liver cirrhosis research that incorporates both T1W and T2W MRI images.

Summary: [600+ T1- and T2-weighted liver cirrhosis volumetric scans with corresponding ground truth]

Link: https://osf.io/cuk24/

PanSegData

Public datasets for pancreas MRI segmentation are scarce. We collected 767 MRI and 1,350 CT scans. Our PanSegNet model, which combines nnUNet and Transformers, achieved Dice scores of 88.3% (CT), 85.0% (T1W), and 86.3% (T2W). Strong volume correlation and high intra-observer agreement indicate reliable segmentation across modalities and centers.

Summary: [700+ T1- and T2-weighted MRI pancreas volumetric scans with corresponding ground truth]

Link: https://osf.io/kysnj/
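
The Dice scores quoted for PanSegNet measure the overlap between a predicted mask and the ground truth mask. As a minimal sketch of the standard Dice formula (this is not the PanSegNet evaluation code; the function name and the toy masks are illustrative assumptions), the metric can be computed like this:

```python
def dice_score(pred, truth):
    """Dice coefficient between two binary masks given as flat sequences of 0/1."""
    if len(pred) != len(truth):
        raise ValueError("masks must have the same number of voxels")
    # Voxels labeled foreground in both masks.
    intersection = sum(1 for p, t in zip(pred, truth) if p and t)
    # Foreground voxels in each mask, summed.
    total = sum(1 for p in pred if p) + sum(1 for t in truth if t)
    # Convention assumed here: two empty masks count as a perfect match.
    if total == 0:
        return 1.0
    return 2.0 * intersection / total


# Toy flattened masks (hypothetical data, not taken from PanSegData).
pred = [0, 1, 1, 1, 0, 0]
truth = [0, 1, 1, 0, 1, 0]
score = dice_score(pred, truth)
print(f"Dice = {score:.3f}")  # Dice = 0.667
```

A 3D volume would first be flattened (or the same sums taken over all voxels); a reported score such as 88.3% corresponds to a value of 0.883 on this scale.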

Peri-Pancreatic Edema Dataset

We collected 255 CT scans of pancreatitis patients with IRB approval. The dataset includes 179 cases with peri-pancreatic edema and 76 without it, labeled as 1 and 0, respectively. Expert radiologists annotated pancreas masks, ensuring accuracy. This dataset supports deep learning advancements in pancreatic research, providing valuable insights for future studies and clinical applications.

Summary: [Volumetric CT scans of 255 pancreatitis patients along with their labels]

Link: https://osf.io/cuk24/
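
With 179 positive and 76 negative cases, roughly 70% of the scans carry label 1, so a classifier trained on this data may benefit from class weighting. The sketch below derives inverse-frequency weights from the counts stated above. The weighting scheme and the function name are assumptions chosen for illustration; the dataset itself does not prescribe them.

```python
def inverse_frequency_weights(counts):
    """Return a weight per class: total / (number of classes * class count)."""
    total = sum(counts.values())
    n_classes = len(counts)
    return {label: total / (n_classes * n) for label, n in counts.items()}


# Label counts as stated in the dataset description: 1 = edema, 0 = no edema.
counts = {1: 179, 0: 76}
weights = inverse_frequency_weights(counts)

for label in sorted(counts):
    share = counts[label] / sum(counts.values())
    print(f"label {label}: {counts[label]} cases ({share:.1%}), weight {weights[label]:.3f}")
```

The minority class (label 0) receives the larger weight, which is the usual intent of this kind of rebalancing.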

GastroVision

We present GastroVision, a multi-center open-access GI endoscopy dataset with 27 classes, including anatomical landmarks, abnormalities, and polyp removals. It contains 8,000 images from hospitals in Norway and Sweden, annotated by expert endoscopists. Extensive benchmarking validates its significance for advancing AI-based GI disease detection and classification.

Summary: (detection and classification) [publication]

Link: https://drive.google.com/drive/u/1/folders/1T35gqO7jIKNxC-gVA2YVOMdsL7PSqeAa

PolypGen Video Sequence Dataset

We curated PolypGen, a dataset from six centers covering more than 300 patients and including both single-frame and sequence data. It features 3,762 annotated polyp labels with precise boundaries, verified by six senior gastroenterologists. To our knowledge, this is the most comprehensive polyp detection and segmentation dataset, developed jointly by computational scientists and expert gastroenterologists.

Summary: (video polyp segmentation and detection) [publication]

Link: https://drive.google.com/drive/u/2/folders/16uL9n84SrMt7IiQFzTUQNaJ9TbHJ8DhW

PolypGen Still Frames

PolypGen is a polyp segmentation and detection dataset with 8,037 frames, including 3,762 positive and 4,275 negative samples from six hospitals. It covers diverse populations, endoscopic systems, and treatment procedures. Initially used in EndoCV2021, it promotes generalizable deep learning models through multicenter collaboration and benchmarking challenges.

Summary: [Polyp segmentation][publication]

Link: https://www.synapse.org/Synapse:syn26376615/wiki/613312

Polyp segmentation & detection dataset

Kvasir-SEG

Kvasir-SEG (46.2 MB) contains 1,000 polyp images with ground truth masks from Kvasir Dataset v2. Resolutions range from 332×487 to 1920×1072 pixels. Images and masks are stored separately in JPEG format, with bounding box coordinates in a JSON file. This open-access dataset supports polyp detection research.

Summary: (segmentation, detection, and localization) [publication]

Link: https://datasets.simula.no/kvasir-seg/
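
Because the images and masks are stored separately, a loader has to match each image to its mask. The sketch below pairs files by name. It assumes that each mask shares its image's filename and that both use a `.jpg` extension; the folder names `images` and `masks` and the function name are hypothetical and should be adapted to the downloaded archive.

```python
import tempfile
from pathlib import Path


def pair_images_and_masks(images_dir, masks_dir, suffix=".jpg"):
    """Return (image_path, mask_path) pairs for filenames present in both folders."""
    images = {p.name: p for p in Path(images_dir).glob(f"*{suffix}")}
    masks = {p.name: p for p in Path(masks_dir).glob(f"*{suffix}")}
    # Keep only names found in both folders, in a stable order.
    shared = sorted(images.keys() & masks.keys())
    return [(images[name], masks[name]) for name in shared]


# Demo on empty placeholder files; a real run would point at the dataset folders.
with tempfile.TemporaryDirectory() as root:
    images_dir = Path(root) / "images"
    masks_dir = Path(root) / "masks"
    images_dir.mkdir()
    masks_dir.mkdir()
    for name in ("a.jpg", "b.jpg", "c.jpg"):
        (images_dir / name).touch()
    for name in ("a.jpg", "b.jpg"):  # c.jpg deliberately has no mask
        (masks_dir / name).touch()
    pairs = pair_images_and_masks(images_dir, masks_dir)
    names = [image.name for image, _ in pairs]

print(names)  # ['a.jpg', 'b.jpg']
```

Images without a matching mask are skipped silently here; a stricter loader might raise an error instead.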

EndoCV 2021

We aim to create a comprehensive dataset from 6 global centers, offering 5 dataset types: i) multi-centre train-test split, ii) polyp size-based split, iii) data center-wise split, iv) modality split (test only), and v) one hidden center test. Participants will be evaluated on all types. The dataset includes detection bounding boxes, pixel-wise segmentation, and negative samples.

Summary: (segmentation, detection, and localization) [publication]

Link: https://endocv2021.grand-challenge.org/
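
The center-wise split and the hidden-center test both keep every sample from one center out of training. A minimal sketch of that idea follows, assuming each sample is tagged with the center it came from. The sample IDs, the center codes `C1` to `C3`, and the function name are hypothetical and do not reflect the challenge's actual file layout.

```python
def hold_out_center(samples, held_out_center):
    """Split (sample_id, center) pairs so one center is used only for testing."""
    train = [sid for sid, center in samples if center != held_out_center]
    test = [sid for sid, center in samples if center == held_out_center]
    if not test:
        raise ValueError(f"no samples found for center {held_out_center!r}")
    return train, test


# Hypothetical samples tagged with their originating center.
samples = [
    ("img_001", "C1"),
    ("img_002", "C1"),
    ("img_003", "C2"),
    ("img_004", "C3"),
    ("img_005", "C3"),
]
train_ids, test_ids = hold_out_center(samples, "C3")
print(train_ids)  # ['img_001', 'img_002', 'img_003']
print(test_ids)   # ['img_004', 'img_005']
```

Splitting by center rather than by image keeps near-duplicate frames from the same site out of both sets, which is what makes the test a measure of generalization to unseen hospitals.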

Kvasir-Instrument

The Kvasir-Instrument dataset (170 MB) includes 590 endoscopic tool images with corresponding ground truth masks. Image resolutions range from 720×576 to 1280×1024 pixels, and the images are encoded in JPEG format. It is the first dataset of tools used in GI tract endoscopy and offers a train-test split for method development. Bounding box coordinates are stored in a JSON file to support automatic tool segmentation research.

Summary: (segmentation, detection, and localization) [publication]

Link: https://datasets.simula.no/kvasir-instrument/

Kvasir-sessile

The Kvasir-SEG dataset (46.2 MB) contains 1,000 polyp images and their corresponding ground truth from the Kvasir Dataset v2. Image resolutions range from 332×487 to 1920×1072 pixels. The images and their corresponding masks are stored in two separate folders under the same filenames. The image files are JPEG-encoded and can be browsed online. The open-access dataset can be downloaded for research and educational purposes.

Summary: (segmentation, detection, and localization) [publication]

Link: https://endocv2021.grand-challenge.org/

Endotect 2020 Challenge Dataset

We aim to create a comprehensive dataset from 6 global centers, offering 5 dataset types: i) multi-centre train-test split, ii) polyp size-based split, iii) data center-wise split, iv) modality split (test only), and v) one hidden center test. Participants will be evaluated on all types. The dataset includes detection bounding boxes, pixel-wise segmentation, and negative samples.

Summary: (classification, segmentation, detection and localization) [publication]

Link: http://home.simula.no/~paalh/publications/files/icpr2020-endotect.pdf

Wireless capsule endoscopy dataset

Sports Data

House activity Data

Overall Datasets Information

These are our publicly available datasets. They can be used for medical image segmentation, detection, localization, and classification tasks. More details about each dataset can be found on its official webpage and in the corresponding paper.

CirrMRI600+ [600+ T1- and T2-weighted liver cirrhosis volumetric scans with corresponding ground truth]

PanSegData [700+ T1- and T2-weighted MRI pancreas volumetric scans with corresponding ground truth]

Peri-Pancreatic Edema Dataset [Volumetric CT scans of 255 pancreatitis patients along with their labels]

New Classification dataset

Polyp segmentation & detection dataset

Gastrointestinal tract classification datasets

Wireless capsule endoscopy dataset

Sports Data

The PMData Dataset

House activity Data

The HTAD Dataset

More datasets can be found on my Kaggle webpage. For more health- and medicine-related datasets, please visit this webpage.