Low Powered Models for Disease Detection and Classification for Radiology Images
Project Description -
The aim of this project is to create Deep Learning models for detection and classification of radiology images. The models must be compressed such that they can be deployed to low powered devices like ARM devices, Android devices, etc. Compression techniques such as Quantization and Pruning can be used.
Geeta Priya Padmanabhan
Tech Stack -
Project Link - Click here
Commits - Click here
Merge Requests - Click here
Why to do this -
There has been substantial progress in developing Machine Learning models that predict a patient's medical condition from inputs relevant to the diagnosis of that condition. However, these models have drawbacks when deployed in real time on edge devices. First, they are trained on high-end GPUs that consume a lot of power and offer large computational capacity, whereas edge devices run on limited power and have considerably lower computational capacity. Second, these models are extremely large, usually a few hundred megabytes: ample storage is available during training, but edge devices have low storage capacity. Healthcare professionals do not have high-end machines available for immediate use of these models, but edge devices, being low-cost, are easily available. To tackle the deployment problem, we use model compression techniques that reduce four factors for detection models in the healthcare category - power consumption, storage usage, computational cost and latency.
What have you done -
For the purpose of this project, 2 datasets were used -
RSNA Pneumonia Detection Dataset
Chest-XRay 14 Dataset
The compression techniques used were -
Model Pruning + Dynamic Quantization
Model Pruning + Float16 Quantization
Model Pruning + Int8 Quantization
RSNA Pneumonia Detection -
Two models were trained on this dataset - DenseNet201 and InceptionV3. The following results were obtained for model accuracy and size.
Accuracies comparing original and compressed models -
Size comparing original and compressed models -
Accuracies comparing pruned and quantized-pruned models -
Size comparing pruned and quantized-pruned models -
Chest XRay14 -
A pretrained CheXNet model from Bruce Chou's GitHub repository (link in references) was used for this dataset. The following results were obtained.
AUROC Score Comparison between original and compressed models -
AUROC Score Comparison between Pruned and Quantized Pruned models -
Model Size Comparison -
How have you done it -
The general pipeline goes like this -
Step 1 - Data Exploration and Cleaning
In this step, we take the raw data and explore it. We find the number of classes, the number of data items per class and the general distribution of data points. After deriving these insights, we clean the raw data to remove unnecessary features or data entries. We also restructure tabular data so that it can be fed to the models. This involves steps like one-hot encoding the labels, creating extra columns and modifying the path variable to point to the images. For images, operations such as augmentation, resizing and shearing are performed.
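The tabular restructuring above can be sketched as follows, assuming pandas; the column names and file layout here are illustrative toy stand-ins, not the dataset's actual schema:

```python
import pandas as pd

# Toy label table standing in for the dataset's metadata CSV.
df = pd.DataFrame({
    "patientId": ["p1", "p2", "p3"],
    "label": ["Normal", "Pneumonia", "Normal"],
})

# One-hot encode the class labels so they can be fed to the model.
one_hot = pd.get_dummies(df["label"])
df = pd.concat([df, one_hot], axis=1)

# Create a path column that redirects each row to its image file.
df["path"] = "images/" + df["patientId"] + ".png"

print(df[["patientId", "Normal", "Pneumonia", "path"]])
```

The same frame then feeds the data generators in the next step.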
Step 2 - Modelling
Next, we initialize data generators that yield preprocessed images and labels in fixed batches. The data is split into train-val-test subsets and the model architectures are initialized. We used three architectures for this project - DenseNet201, InceptionV3 and CheXNet. We also initialize the callbacks, checkpoints and optimizers that will be used during training.
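The model setup can be sketched as follows, assuming a Keras workflow; `weights=None` is used here only to keep the example self-contained (the project would start from pretrained weights), and the two-class head, optimizer settings and checkpoint filename are illustrative:

```python
import tensorflow as tf

# DenseNet201 backbone with a small classification head.
base = tf.keras.applications.DenseNet201(
    include_top=False, weights=None, input_shape=(224, 224, 3), pooling="avg")
model = tf.keras.Sequential([
    base,
    tf.keras.layers.Dense(2, activation="softmax"),  # 2 classes (illustrative)
])
model.compile(optimizer=tf.keras.optimizers.Adam(1e-4),
              loss="categorical_crossentropy", metrics=["accuracy"])

# Callbacks and checkpoints used later during training.
callbacks = [
    tf.keras.callbacks.EarlyStopping(patience=3, restore_best_weights=True),
    tf.keras.callbacks.ModelCheckpoint("best_model.h5", save_best_only=True),
]
```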
Step 3 - Training and Model Evaluation
Here, we train the models until we achieve acceptable performance; the model should be neither underfit nor overfit. After training, we evaluate the models. DenseNet201 and InceptionV3, trained on the RSNA Pneumonia Detection Dataset, were evaluated on accuracy because they directly output the class of the input image. CheXNet, trained on the Chest-XRay14 dataset, was evaluated on AUROC score because its output is not a fixed class but a class probability score. We also record the size of the original model.
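The two metrics can be illustrated on toy labels, assuming scikit-learn for the metric functions (the project's actual evaluation runs on the held-out test data):

```python
import numpy as np
from sklearn.metrics import accuracy_score, roc_auc_score

y_true = np.array([0, 1, 1, 0])

# RSNA models output a hard class label, so we score accuracy:
y_pred = np.array([0, 1, 0, 0])
acc = accuracy_score(y_true, y_pred)

# CheXNet outputs per-class probability scores, so we score AUROC instead:
y_prob = np.array([0.1, 0.9, 0.4, 0.2])
auroc = roc_auc_score(y_true, y_prob)

print(acc, auroc)  # 0.75 1.0
```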
Step 4 - Model Pruning
In model pruning, we trim unnecessary connections in the neural network. Here, we use Polynomial Decay as the sparsity schedule: pruning starts at 50% and goes up to 80% of the total weights in the model. After this, we remove the excess connections and compress the layers of the network. The pruned model is saved in .h5 format.
Step 5 - Post-training Quantization
After the models are trained, we quantize them using TensorFlow Lite converters. We perform three types of quantization in this project - dynamic, float16 and int8. We initialize the converter as per our requirement and pass the pretrained or pruned model to it. The output is a quantized model in the form of a TFLite FlatBuffer. We evaluate the quantized models on accuracy/AUROC score (matching the original model's metric) and size.
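The three converter configurations can be sketched as follows, using a tiny stand-in model in place of the trained/pruned Keras models; the representative dataset for int8 calibration is random here purely for illustration, where the project would yield real preprocessed images:

```python
import numpy as np
import tensorflow as tf

# Tiny stand-in for the trained/pruned Keras model.
model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(4,)),
    tf.keras.layers.Dense(2, activation="softmax"),
])

# Dynamic-range quantization: weights to int8, activations stay float.
conv = tf.lite.TFLiteConverter.from_keras_model(model)
conv.optimizations = [tf.lite.Optimize.DEFAULT]
dynamic_tflite = conv.convert()

# Float16 quantization: weights stored as float16.
conv = tf.lite.TFLiteConverter.from_keras_model(model)
conv.optimizations = [tf.lite.Optimize.DEFAULT]
conv.target_spec.supported_types = [tf.float16]
fp16_tflite = conv.convert()

# Int8 quantization needs a representative dataset for calibration.
def representative_data():
    for _ in range(10):
        yield [np.random.rand(1, 4).astype(np.float32)]

conv = tf.lite.TFLiteConverter.from_keras_model(model)
conv.optimizations = [tf.lite.Optimize.DEFAULT]
conv.representative_dataset = representative_data
int8_tflite = conv.convert()
```

Each `convert()` call returns the TFLite FlatBuffer as bytes, which is written to a .tflite file and whose length gives the quantized model size.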
Step 6 - Inference
For inference with the normal .h5 models, we use the model.predict() function. With TFLite models, we initialize an interpreter that sets up the input and output tensors. We invoke the interpreter on an input image and retrieve the output tensor it returns. The inference script was run for all models - original, pruned, quantized and hybrid.
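The TFLite inference path can be sketched as follows; a tiny model is converted inline so the example is self-contained, whereas the project loads its saved .tflite files and feeds real preprocessed images:

```python
import numpy as np
import tensorflow as tf

# Build and convert a tiny stand-in model (the project loads .tflite files).
model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(4,)),
    tf.keras.layers.Dense(2, activation="softmax"),
])
tflite_model = tf.lite.TFLiteConverter.from_keras_model(model).convert()

# Initialize the interpreter and allocate its input/output tensors.
interpreter = tf.lite.Interpreter(model_content=tflite_model)
interpreter.allocate_tensors()
input_det = interpreter.get_input_details()[0]
output_det = interpreter.get_output_details()[0]

# Invoke the interpreter on one input and retrieve the output tensor.
x = np.random.rand(1, 4).astype(np.float32)
interpreter.set_tensor(input_det["index"], x)
interpreter.invoke()
probs = interpreter.get_tensor(output_det["index"])
print(probs.shape)  # (1, 2)
```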
Future Scope -
1. Testing these models on actual hardware such as Raspberry Pi and Android phones.
2. Compressing object detection/segmentation/UV based models.
3. Creating a UI to serve these models on the frontend.
References -
1. CheXNet - https://github.com/brucechou1983/CheXNet-Keras