Gastrointestinal (GI) cancers are among the most common cancers worldwide. Colorectal cancer in particular is among the most lethal: it is the third most commonly diagnosed cancer and the second most common cause of cancer-related death. Colonoscopy is the gold standard for colorectal cancer screening. During a colonoscopy, gastroenterologists examine the large bowel, detect precancerous abnormal tissue growths (for example, polyps and precancerous tumors), and remove them through the scope if necessary. Although colonoscopy is considered the gold standard, it is an operator-dependent procedure. Previous research has shown significant miss rates for GI abnormalities; for example, the polyp miss rate is around 22%-28%. Early detection of GI lesions and cancers at a curable stage can help reduce mortality. The development of automated, accurate, and efficient methods for detecting GI cancers could benefit gastroenterologists and patients. In this thesis, we have developed algorithms for a computer-aided diagnosis (CADx) system that can highlight suspicious lesions on the screen and alert gastroenterologists in real time, which can improve clinical outcomes irrespective of operator experience. This could save millions of lives when integrated into screening programs. Our proposed architectures are based on machine learning (ML) and deep learning (DL) algorithms for GI tract abnormality detection and classification, polyp segmentation, and surgical instrument segmentation.
The medical field is becoming more interdisciplinary, and the importance of medical image data is increasing rapidly. Medical image analysis can play a central role in disease detection, diagnosis, and treatment. With the increasing number of medical images, there is enormous potential to improve screening quality. Deep learning (DL), and convolutional neural network (CNN) based models in particular, has tremendous potential to automate and enhance medical image analysis and to support accurate diagnosis. Automated analysis of medical images could reduce the burden on medical experts and provide quality, accessible healthcare to a larger population. In medical imaging, classification, detection, and semantic segmentation tasks are crucial for clinical practice. Thus, we aim to develop deep learning based models for classification, segmentation, and detection that use imaging datasets to identify abnormalities at an early stage.
The limited availability of public datasets was one of the significant challenges for the development of automated methods for GI abnormality classification and segmentation. We addressed this problem by collecting, curating, annotating, and publicly releasing several datasets, including the world’s largest publicly available GI endoscopy and video capsule endoscopy datasets. These releases cover gastrointestinal endoscopy and colonoscopy and include datasets from multiple clinical centers in different countries. All of our datasets can be freely downloaded under an open-source license for academic research (no prior consent required) and industrial purposes (prior consent required). We have released open-access datasets such as HyperKvasir, Kvasir-Capsule, PolypGen, Kvasir-SEG, Kvasir-Instrument, and KvasirCapsule-SEG, as well as the datasets for the Medico automatic polyp segmentation challenge, the EndoTect challenge, and the 3rd International Endoscopy Computer Vision Challenge and Workshop (EndoCV2021). All of these datasets were collected from different centers in Norway except the PolypGen dataset, which was collected in collaboration with hospitals in Norway, other parts of Europe (United Kingdom, France, Italy), and Africa (Egypt) to address the lack of multi-institutional datasets in the field. This multi-center dataset consists of still frames and video polyp sequences with both positive and negative samples. The video sequence data is important because it mimics real-world colonoscopy scenarios. Senior gastroenterologists identified the problems in detecting lesions in the hospitals and provided feedback on the datasets and results. To the best of our knowledge, HyperKvasir and Kvasir-Capsule are the world’s largest and most diverse publicly available gastrointestinal tract and video capsule endoscopy datasets, respectively.
Similarly, PolypGen is the world’s most comprehensive publicly available multi-center polyp still-frame and video sequence dataset for academic research and innovation. Kvasir-SEG is one of the most widely used polyp segmentation benchmark datasets. Once we had developed the datasets, we provided benchmarks on them to encourage other researchers to use them and to develop novel and reproducible methods on publicly available data.
At first, we designed automated methods for multi-class classification of GI tract findings using global features and ML techniques. Our best method achieved a Matthews correlation coefficient (MCC) of 0.8353 on the Medico 2018 challenge dataset. We performed a cross-dataset bias study on four GI endoscopy datasets with five ML techniques in the context of GI findings and abnormalities classification. Our experimental results suggested that multi-center or cross-dataset evaluation is important for a realistic understanding of how ML models perform in real-world settings. Moreover, we performed a comprehensive study in which we evaluated, compared, dissected, ranked, and summarized 23 automated classification methods presented in different GI endoscopy competitions. The methods analyzed ranged from ML methods using global image features to recent CNN-based approaches using transfer learning and specialized data augmentation. Our study showed significant improvement in results for the GI tract findings classification, efficiency, and automatic reporting tasks over three consecutive years.
We advocate organizing more competitions and analyzing the clinical applicability of the developed methods based on merits such as accuracy, speed, robustness, and transparency. Additionally, we organized the 2020 EndoTect challenge, for which we proposed the classification tasks, evaluation metrics, and datasets, and evaluated the participants’ results. In GI endoscopy, crowdsourcing is a popular technique for solving complex problems. We designed the training and test datasets for the participants. In our competition, Team Howard achieved an MCC score of 0.9030 and a processing speed of 129.74 frames per second (FPS). Besides that, we have built algorithms and highlighted results showing that they have the potential to identify, detect, and segment potential lesions with high accuracy and high speed. We have shown that the designed algorithms also perform well on other medical imaging datasets. We further conjecture that they will also perform well on non-medical (natural) imaging datasets.
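To make the MCC scores quoted above concrete, the following is a minimal sketch of the multi-class Matthews correlation coefficient computed from a confusion matrix. It is an illustration only, not the evaluation code used in the challenges; the function name `multiclass_mcc` and the example class labels are hypothetical.

```python
import math


def multiclass_mcc(y_true, y_pred):
    """Multi-class MCC from two equal-length label sequences (illustrative sketch)."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    # Build a confusion matrix: rows are true classes, columns are predictions.
    labels = sorted(set(y_true) | set(y_pred))
    index = {label: i for i, label in enumerate(labels)}
    k = len(labels)
    cm = [[0] * k for _ in range(k)]
    for t, p in zip(y_true, y_pred):
        cm[index[t]][index[p]] += 1

    total = sum(sum(row) for row in cm)                 # number of samples
    correct = sum(cm[i][i] for i in range(k))           # correctly classified
    true_counts = [sum(cm[i]) for i in range(k)]        # occurrences per true class
    pred_counts = [sum(cm[i][j] for i in range(k)) for j in range(k)]

    numerator = correct * total - sum(t * p for t, p in zip(true_counts, pred_counts))
    denominator = math.sqrt(
        (total ** 2 - sum(p * p for p in pred_counts))
        * (total ** 2 - sum(t * t for t in true_counts))
    )
    # By convention, return 0.0 when the denominator vanishes
    # (for example, when every sample is predicted as one class).
    return numerator / denominator if denominator else 0.0


if __name__ == "__main__":
    # Hypothetical labels for three finding classes.
    truth = ["polyp", "normal", "polyp", "ulcer", "normal"]
    guess = ["polyp", "normal", "ulcer", "ulcer", "normal"]
    print(f"MCC = {multiclass_mcc(truth, guess):.4f}")
```

MCC ranges from -1 to 1 and takes all cells of the confusion matrix into account, which makes it a reasonable choice for imbalanced multi-class data such as GI findings.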
After designing classification algorithms, we focused our research on polyp segmentation, which remained a critical and unsolved issue in the field. We studied and proposed DL-based architectures for automatic polyp segmentation and other medical image segmentation tasks (for example, ResUNet++, DoubleU-Net, and ResUNet++ + CRF + TTA) that provided improved results on publicly available datasets. All three algorithms are popular benchmarks in the medical imaging field. One example contribution is the combination of ResUNet++, a conditional random field (CRF), and test-time augmentation (TTA), with which the proposed method achieved a Dice similarity coefficient (DSC) of ≥ 0.8500 on three still-image datasets and ≥ 0.8800 on two video sequence datasets. The proposed architectures can identify and segment flat and sessile polyps with high accuracy, which is one of the significant contributions of our semantic segmentation architectures. Additionally, in our papers, we have analyzed the best-performing cases (where the proposed models perform well) and failure cases (where the proposed models fail). Thus, we have achieved promising results for colorectal polyp segmentation. Multi-center datasets and randomized trials with information from thousands of patients are essential to better evaluate whether our methods are clinically significant. The results suggest that our architectures could improve clinical outcomes and help endoscopists, as our methods can identify multiple polyps simultaneously, including lesions such as flat or sessile polyps that are often overlooked by endoscopists during colonoscopy. The architectures presented in this study could make endoscopy more efficient, easier, and more accessible, and reduce the miss rate and the overall load on endoscopists, nurses, and hospitals. We conjecture that our architectures can help detect missed lesions regardless of the endoscopist’s experience and current attentiveness.
Additionally, the designed models for CADx (classification, detection and segmentation) have the potential to minimize the eye movement of the endoscopists, which would make the endoscopy procedure easier. As the endoscopy procedure is time-consuming, convolutional neural network models have a large potential to provide convenient support during an examination.
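The DSC values reported for the segmentation models, and the intersection over union (IoU) that underlies mIoU, both measure the overlap between a predicted mask and a ground-truth mask. The sketch below shows one common way to compute them for a single binary mask; the function name `dice_and_iou`, the smoothing term `eps`, and the toy masks are assumptions made for illustration, not the evaluation code of the thesis.

```python
def dice_and_iou(pred_mask, true_mask, eps=1e-7):
    """Return (DSC, IoU) for two binary masks given as nested lists of 0/1."""
    pred = [bool(v) for row in pred_mask for v in row]
    true = [bool(v) for row in true_mask for v in row]
    if len(pred) != len(true):
        raise ValueError("masks must contain the same number of pixels")

    intersection = sum(p and t for p, t in zip(pred, true))
    pred_area = sum(pred)
    true_area = sum(true)
    union = pred_area + true_area - intersection

    # eps avoids division by zero when both masks are empty;
    # in that case both scores evaluate to 1.0.
    dice = (2 * intersection + eps) / (pred_area + true_area + eps)
    iou = (intersection + eps) / (union + eps)
    return dice, iou


if __name__ == "__main__":
    # Toy 2x3 masks: 1 marks a polyp pixel, 0 marks background.
    predicted = [[1, 1, 0],
                 [0, 1, 0]]
    reference = [[1, 0, 0],
                 [0, 1, 1]]
    dice, iou = dice_and_iou(predicted, reference)
    print(f"DSC = {dice:.4f}, IoU = {iou:.4f}")
```

A dataset-level mIoU is then typically obtained by averaging the per-image (or per-class) IoU values; which averaging convention applies depends on the benchmark.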
In polyp segmentation tasks, processing speed was often neglected. Once we achieved promising results for automatic colon polyp segmentation, we concentrated our research on building lightweight models that perform well in terms of both accuracy and processing speed. To this end, we have also designed architectures capable of segmenting polyps in real time at high frames per second (for example, ColonSegNet, NanoNet, PNS-Net, and DDANet). One of our polyp segmentation architectures designed with processing speed in mind is ColonSegNet. It achieved a real-time processing speed of 182.38 FPS, with decent DSC and mean intersection over union (mIoU) scores. The results suggest that ColonSegNet could be helpful to the clinician during a live examination. Similarly, we have developed NanoNet, a lightweight architecture with a small model size and low computational cost. NanoNet has only 36,561 parameters, so the model could be integrated into low-end endoscope hardware devices. Moreover, we performed extensive studies on the generalizability of our DL models on publicly available polyp datasets. For this, we trained DL-based models on datasets from different cohort populations, collected using different types of scopes, modalities, and protocols, to understand the generalizability of the methods. We also trained our algorithms on an in-distribution dataset and tested them on out-of-distribution polyp data under multi-center cross-dataset testing. Moreover, through open-access datasets, challenges, competitions, and workshops, we invite researchers from the multimedia and medical image analysis communities to provide solutions that automatically segment polyps with high accuracy and high processing speed and that tackle generalizability problems in polyp segmentation tasks.
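Processing speed in FPS, as quoted for the real-time models above, is the number of frames processed divided by the elapsed wall-clock time. The following is a minimal sketch of such a measurement; `measure_fps`, its `warmup` parameter, and the stand-in `fake_model` are hypothetical and do not reproduce the benchmarking setup used for ColonSegNet or NanoNet.

```python
import time


def measure_fps(predict, frames, warmup=2):
    """Estimate frames per second of `predict` over a list of frames."""
    if not frames:
        raise ValueError("at least one frame is required")

    # Warm-up calls are excluded from timing so that one-off setup costs
    # (caching, lazy initialization) do not distort the estimate.
    for frame in frames[:warmup]:
        predict(frame)

    start = time.perf_counter()
    for frame in frames:
        predict(frame)
    elapsed = time.perf_counter() - start

    # Note: with GPU inference the device would normally need to be
    # synchronized before reading the clock; that step is omitted here.
    return len(frames) / elapsed if elapsed > 0 else float("inf")


if __name__ == "__main__":
    # Stand-in for a segmentation model: thresholds every pixel of a frame.
    def fake_model(frame):
        return [[1 if pixel > 127 else 0 for pixel in row] for row in frame]

    dummy_frames = [[[(x * y) % 256 for x in range(64)] for y in range(64)]
                    for _ in range(50)]
    print(f"Approximate speed: {measure_fps(fake_model, dummy_frames):.2f} FPS")
```

Reported FPS depends heavily on hardware, input resolution, and batch size, so figures are only comparable when measured under the same conditions.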
As an important part of the procedure, we also performed surgical instrument segmentation to detect which instruments are used during the examination. The developed method is able to segment different types of surgical instruments in real time. For surgical instrument segmentation in laparoscopy, we have researched and proposed two methods. The first is a RASNet-based approach. The second comes from a comprehensive comparison of state-of-the-art medical image segmentation architectures, in which the DDANet-based approach outperforms the others. DDANet achieved a DSC of 0.8739, an mIoU of 0.8183, and a real-time processing speed of 101.36 FPS on the ROBUST-MIS challenge dataset. Thus, although our algorithms are designed for polyp and surgical instrument segmentation, they can also be exploited for other medical or non-medical image segmentation tasks. All of our work and algorithms are open-sourced and have been very well received by the community. We have made the source code and train-test splits of the datasets available for most of the proposed methods, which helps with reproducibility and with comparing results against other recent state-of-the-art (SOTA) methods. Our results suggest that the proposed DL-based CADx systems could greatly assist clinicians in the future.
In summary, we aim to solve a real-world problem related to GI disease classification, detection, and segmentation, and to surgical instrument segmentation. Figure 1 shows an example of colonoscopy datasets and classification, segmentation, and detection algorithms, along with the paper-wise contributions organized by objective. Computer-aided detection (CADe) and CADx systems can address the current challenge of missed detections in the field and significantly impact the current healthcare system. Our goal is to provide technology that assists clinicians in improving the healthcare system by meeting their requirements for high accuracy at real-time speed. To achieve this, we have collected and curated many GI tract datasets and proposed different ML and DL based architectures for each task. Our models showed promising results on real data obtained from different endoscopic equipment. Thus, clinicians could test our architectures to verify their usefulness in the clinical setting. Moreover, our architectures can be extended to other medical image analysis and natural image segmentation tasks.