ColonSegNet is a popular lightweight polyp segmentation architecture that NVIDIA has adopted in an industrial application, the Clara Holoscan sample app for colonoscopy polyp segmentation. With an input size of 512×512, ColonSegNet achieves a Dice coefficient of 82.06% on polyp segmentation and an average precision of 80.00% on polyp detection. A key strength of the algorithm is its real-time processing speed of 182.38 frames per second for polyp segmentation and 180 fps for polyp detection, which makes it well suited for integration into clinical settings and mobile devices.
ColonSegNet is an encoder-decoder architecture for the segmentation of colonoscopy and other medical images. The architecture is very efficient in terms of processing speed, producing colonoscopic polyp segmentations in real time, while remaining competitive in accuracy. The paper presents a comprehensive comparison of state-of-the-art computer vision baselines on the Kvasir-SEG dataset; the best approaches show real-time performance for polyp detection, localisation, and segmentation.
The figure shows the block diagram of the proposed ColonSegNet. It is an encoder-decoder architecture that uses a residual block with a squeeze-and-excitation module as its main component. The network is designed to have very few trainable parameters compared with baseline networks such as U-Net, PSPNet, DeepLabV3+, and others. The small parameter count makes the architecture very lightweight, which enables real-time performance.
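The channel-attention idea behind the squeeze-and-excitation module can be sketched in a few lines of NumPy. This is a simplified illustration, not the repository's implementation: the weights `w1` and `w2` are random stand-ins for the learned bottleneck layers, and the real network operates on PyTorch tensors.

```python
import numpy as np

def squeeze_excite(feature_map, w1, w2):
    """Channel attention in the style of a squeeze-and-excitation block.

    feature_map: (C, H, W) array; w1: (C//r, C) and w2: (C, C//r) are
    hypothetical bottleneck weights (reduction ratio r) standing in for
    the learned layers of the actual network.
    """
    # Squeeze: global average pooling collapses each channel to a scalar.
    z = feature_map.mean(axis=(1, 2))            # shape (C,)
    # Excitation: bottleneck MLP followed by a sigmoid gate per channel.
    s = np.maximum(w1 @ z, 0.0)                  # ReLU
    gate = 1.0 / (1.0 + np.exp(-(w2 @ s)))       # sigmoid, shape (C,)
    # Scale: reweight each channel of the feature map by its gate value.
    return feature_map * gate[:, None, None]

rng = np.random.default_rng(0)
x = rng.standard_normal((8, 4, 4))               # 8 channels, 4x4 spatial
w1 = rng.standard_normal((2, 8)) * 0.1           # reduction ratio r = 4
w2 = rng.standard_normal((8, 2)) * 0.1
y = squeeze_excite(x, w1, w2)
print(y.shape)  # (8, 4, 4): same shape, channels rescaled
```

Because the gate is a sigmoid, every channel is multiplied by a value in (0, 1), so the block can only attenuate channels, letting the network emphasise informative ones relative to the rest.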
The network consists of two encoder blocks and two decoder blocks. The encoder learns to extract the necessary information from the input image, which is then passed to the decoder. Each decoder block receives two skip connections from the encoder: the first is a simple concatenation, and the second passes through a transpose convolution to incorporate multi-scale features into the decoder. These multi-scale features help the decoder generate more semantic and meaningful information in the form of a segmentation mask.
The input image is fed to the first encoder, which consists of two residual blocks with a 3×3 strided convolution between them, followed by 2×2 max-pooling. At this point, the spatial dimensions of the feature map are reduced to 1/4 of the input image. The second encoder likewise consists of two residual blocks with a 3×3 strided convolution between them.
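The 1/4 reduction follows from the standard output-size formula for convolution and pooling. The sketch below assumes kernel 3, stride 2, padding 1 for the strided convolution and a 2×2 max-pool with stride 2; these hyperparameters are an assumption chosen to reproduce the reduction described above, and the repository defines the exact values.

```python
def conv_out(size, kernel, stride, padding):
    """Standard output-size formula for a convolution or pooling layer."""
    return (size + 2 * padding - kernel) // stride + 1

size = 512
size = conv_out(size, kernel=3, stride=2, padding=1)  # strided conv -> 256
size = conv_out(size, kernel=2, stride=2, padding=0)  # 2x2 max-pool -> 128
print(size)  # 128, i.e. 1/4 of the 512-pixel input
```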
The decoder starts with a transpose convolution: the first decoder uses a stride of 4, which increases the spatial dimensions of the feature map by a factor of 4, and the second decoder uses a stride of 2, increasing them by a factor of 2. The result is concatenated with the first skip connection and passed through a residual block, then concatenated with the second skip connection and passed through another residual block. The output of the last decoder block passes through a 1×1 convolution and a sigmoid activation function to generate the binary segmentation mask.
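The upsampling factors follow from the transpose-convolution output-size formula (shown here in the PyTorch `ConvTranspose2d` convention). With kernel size equal to the stride and no padding, which is an assumption for illustration, a transpose convolution scales the spatial dimensions by exactly its stride:

```python
def tconv_out(size, kernel, stride, padding, output_padding=0):
    """Output-size formula for a transpose convolution (PyTorch convention)."""
    return (size - 1) * stride - 2 * padding + kernel + output_padding

# Hypothetical feature-map sizes chosen for illustration only.
print(tconv_out(32, kernel=4, stride=4, padding=0))  # 128: first decoder, x4
print(tconv_out(64, kernel=2, stride=2, padding=0))  # 128: second decoder, x2
```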
Quantitative and qualitative results
The proposed ColonSegNet detected and localised polyps at 180 frames per second, and segmented polyps at 182.38 frames per second. The automatic polyp detection, localisation, and segmentation algorithms showed good performance, as evidenced by high average precision, IoU, and FPS for detection, and high DSC, IoU, precision, recall, F2-score, and FPS for segmentation. While the algorithms investigated in the paper show clear strengths for use in clinical settings to help gastroenterologists with polyp detection, localisation, and segmentation, computational scientists can build upon these methods to further improve accuracy, speed, and robustness.
Paper link: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7968127/
GitHub link: https://github.com/DebeshJha/ColonSegNet