Chest X-Ray Pneumonia Classification

From PCA + SVM baseline to multiclass ResNet-50 with TTA and Grad-CAM

Python & PyTorch | PCA & SVM | ResNet-18 / ResNet-50 | TTA & Grad-CAM

This project presents a single, coherent journey through pneumonia classification on chest X-rays, starting with a classical machine learning baseline and progressing to deep convolutional models capable of multiclass diagnosis. All experiments use the Kaggle Chest X-Ray (Pneumonia) dataset.

The workflow is organized into two major stages:

Phase 1: Normal vs Pneumonia (binary)
Phase 2: Normal vs Viral vs Bacterial pneumonia (multiclass)

Motivation

Chest X-rays are among the most widely used imaging modalities for respiratory illness. In many clinical settings, the first question is:

“Does this patient have pneumonia or not?”

This project tackles that core question first with a classical PCA + SVM baseline and a deep ResNet-18 binary classifier. It then extends to a harder, clinically more nuanced task: differentiating Normal, Viral pneumonia, and Bacterial pneumonia from X-ray images alone.

The goals are to:

Show a clear progression from classical ML to deep learning
Quantitatively compare PCA + SVM vs ResNet-18 on the same binary task
Explore multiclass performance with ResNet-34 and ResNet-50
Use Grad-CAM to visualize how the network reasons
Discuss results in the context of real-world radiology practice

Dataset

Source: Kaggle Chest X-Ray Images (Pneumonia)
Modalities: Frontal chest X-rays
Tasks:
- Binary: Normal vs Pneumonia
- Multiclass: Normal vs Viral vs Bacterial pneumonia
Preprocessing:
- Resize to 224×224 pixels
- Optional grayscale conversion
- Normalization to ImageNet mean and standard deviation
Data splits: train / validation / test, with a stratified 10% validation split created from the train set.
Data augmentation (train only):
- Random horizontal flip
- Small rotations
- Random affine transforms (translation, shear)
- Light color jitter

These augmentations encourage the models to become robust to common imaging variations such as patient positioning, orientation, and brightness differences.

Phase 1 – Classical ML and Binary Deep Learning (Normal vs Pneumonia)

1.1 Classical Baseline: PCA → SVM

The first part of the project builds a classical ML pipeline for the binary task of distinguishing Normal vs Pneumonia:

Convert images to grayscale (if needed) and resize to a fixed resolution
Flatten and standardize pixel intensities
Apply Principal Component Analysis (PCA) to retain ~95–97% of the variance
Train an RBF-kernel SVM with hyperparameter tuning for C and gamma
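
The four steps above map naturally onto a scikit-learn pipeline. The sketch below is an assumption about how such a pipeline might be assembled, not the project's code: it runs on synthetic data standing in for flattened images, and the parameter grid, fold count, and 97% variance target are illustrative choices.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(0)

# Synthetic stand-in for flattened grayscale images (32x32 = 1024 pixels).
n_per_class = 100
normal = rng.normal(0.0, 1.0, size=(n_per_class, 1024))
pneumonia = rng.normal(0.3, 1.0, size=(n_per_class, 1024))
X = np.vstack([normal, pneumonia])
y = np.array([0] * n_per_class + [1] * n_per_class)

pipeline = Pipeline([
    ("scale", StandardScaler()),                         # standardize pixel intensities
    ("pca", PCA(n_components=0.97, svd_solver="full")),  # keep ~97% of the variance
    ("svm", SVC(kernel="rbf")),
])

# Illustrative grid over the two RBF-SVM hyperparameters named in the text.
param_grid = {
    "svm__C": [0.1, 1, 10],
    "svm__gamma": ["scale", 0.001, 0.01],
}
search = GridSearchCV(pipeline, param_grid, cv=3, scoring="roc_auc")
search.fit(X, y)

n_components = search.best_estimator_.named_steps["pca"].n_components_
print("best params:", search.best_params_)
print("components kept:", n_components)
print("cross-validated ROC-AUC:", round(search.best_score_, 3))
```

Placing the scaler and PCA inside the pipeline means they are refit on each training fold, which avoids leaking validation data into the dimensionality reduction.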

Figure: Scree plot (left) and cumulative explained variance (right) for PCA. The scree plot shows that the first few principal components capture the majority of variance, with subsequent components contributing minimally. The cumulative variance curve indicates that retaining ~97% of the dataset’s variance requires 497 principal components. This provides an efficient reduced-dimensional representation for classical machine learning while preserving most of the structure present in the original images.

Binary PCA → SVM performance (Normal vs Pneumonia):

Accuracy: ≈ 88%
ROC-AUC: ≈ 0.92

Figure: ROC curve (left) and Precision–Recall curve (right) for the PCA → SVM baseline on the Normal vs Pneumonia task. The SVM achieves an AUC of 0.92, indicating strong discrimination between classes, while the PR curve (AP = 0.95) shows high precision across most recall values. Together, these curves demonstrate that PCA-reduced features support solid classical ML performance before transitioning to deep CNNs.

This baseline is fast and interpretable, but it has clear limitations: PCA produces global, linear combinations of pixel intensities, and these features struggle to capture subtle lung opacities and diffuse infiltrates.

1.2 Deep Learning: ResNet-18 Binary Classifier

The same binary classification problem is then tackled with a deep convolutional model:

Backbone: ResNet-18 pretrained on ImageNet
Modification: Replace final fully connected layer with a 2-class output head
Loss: Cross-entropy with class weights to address class imbalance
Optimization: Adam optimizer with learning rate scheduling
Training tricks:
- Data augmentation (flip, rotation, affine, color jitter)
- Validation-based early stopping

Binary ResNet-18 performance (Normal vs Pneumonia):

AUC-ROC: ≈ 0.95
F1-score: ≈ 0.90

Compared to the PCA + SVM pipeline, the ResNet-18 model raises ROC-AUC from ≈ 0.92 to ≈ 0.95. It learns spatial and texture patterns directly from the images, which is especially important for subtle or atypical pneumonia cases. This stage demonstrates why CNNs are now the standard for medical image classification tasks.

Phase 2 – Multiclass Deep Learning (Normal / Viral / Bacterial)

The second stage extends the task to a more clinically nuanced and challenging setting: Normal vs Viral pneumonia vs Bacterial pneumonia.

2.1 ResNet-34: Establishing a Multiclass Baseline

The multiclass experiments begin with ResNet-34:

Pretrained on ImageNet
Final fully connected layer adapted for 3 classes
Class-weighted cross-entropy loss to balance Normal, Viral, and Bacterial samples
Fine-tuning strategies:
- Unfreezing only the last residual block (layer4)
- Unfreezing both layer3 and layer4
Regularization via weight decay (values between 5e-4 and 1e-3 explored)

ResNet-34 multiclass performance (typical best run):

Weighted F1: ≈ 0.839
Weighted AUC-ROC: ≈ 0.952

These results are strong, but improvements from additional fine-tuning eventually saturate, suggesting that the bottleneck is not purely model capacity but also the intrinsic visual overlap between Viral and Bacterial pneumonia on X-ray.

2.2 ResNet-50 with Strong Regularization and TTA (Final Model)

To explore whether a deeper model can capture finer distinctions, the project moves to ResNet-50:

Backbone: ResNet-50 pretrained on ImageNet
Output head: 3-class fully connected layer
Loss: Class-weighted cross-entropy
Regularization: Increased weight_decay to 2e-3 to control overfitting
Training scheme: same augmentation and early stopping pipeline as before
Test-Time Augmentation (TTA):
- At inference, predictions are made on multiple augmented versions of each image (e.g., original and horizontally flipped)
- Logits are averaged before computing the final softmax probabilities
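
The TTA procedure described above, with the original and horizontally flipped views, might look like this in PyTorch. The function name `predict_with_tta` and the tiny stand-in classifier are hypothetical; in the project the model would be the trained ResNet-50.

```python
import torch
import torch.nn as nn


@torch.no_grad()
def predict_with_tta(model, images):
    """Average logits over the original and flipped views, then apply softmax."""
    model.eval()
    views = [images, torch.flip(images, dims=[3])]  # dim 3 is width in NCHW layout
    logits = torch.stack([model(view) for view in views]).mean(dim=0)
    return torch.softmax(logits, dim=1)


# Tiny stand-in classifier with 3 outputs (Normal / Viral / Bacterial).
torch.manual_seed(0)
toy_model = nn.Sequential(
    nn.Conv2d(3, 4, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.AdaptiveAvgPool2d(1),
    nn.Flatten(),
    nn.Linear(4, 3),
)

batch = torch.randn(2, 3, 224, 224)
probs = predict_with_tta(toy_model, batch)
print(probs)
```

Because both views are always evaluated, the averaged prediction is the same whether the input arrives in its original or mirrored orientation.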

Final Multiclass Performance (ResNet-50 + Strong Regularization + TTA)

Metric	Score
Accuracy	85.10%
Weighted F1	0.8525
Weighted AUC-ROC	0.955
Virus F1	0.760
Normal F1	0.8598
Bacteria F1	0.9018

Confusion Matrix (Test Set)

Figure: Multiclass confusion matrix for the final ResNet-50 model with TTA. The classifier achieves strong performance on Normal and Bacterial pneumonia, with most samples correctly assigned to their respective classes. Viral pneumonia remains the most challenging category, showing moderate confusion with Bacterial cases, a pattern consistent with the clinical overlap in radiographic appearance between viral and bacterial pneumonia.

In this configuration, Normal and Bacterial pneumonia are well-separated, while Viral pneumonia remains the most ambiguous class. This is aligned with clinical expectations: viral and bacterial pneumonia often share overlapping radiographic patterns, making them difficult to distinguish even for experienced radiologists.

Effect of TTA: Evaluating the ResNet-50 model with Test-Time Augmentation further improves the weighted F1 score and AUC, indicating that the model’s predictions are stable under natural image variations and that averaging across augmented views reduces prediction noise near decision boundaries.

Grad-CAM Interpretability

To understand where the network is looking when it predicts a given class, Grad-CAM heatmaps are generated from the final convolutional layers of the trained models.

For a chosen target class (e.g., Normal, Viral, or Bacterial), gradients of the class score are backpropagated to the last conv layer.
Channel-wise importance weights are computed via global average pooling of these gradients.
A weighted sum of the feature maps is taken, followed by a ReLU and upsampling to the input image size.
The resulting heatmap is normalized and overlaid on the original chest X-ray.
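
The four steps above can be sketched in PyTorch with a forward hook and `torch.autograd.grad`. This is an illustrative implementation under stated assumptions, not the project's code: the function name `grad_cam` is hypothetical, and a small toy CNN stands in for the trained ResNet (where the target layer would typically be `model.layer4`). The overlay on the original X-ray is omitted.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


def grad_cam(model, target_layer, image, target_class):
    """Return a Grad-CAM heatmap in [0, 1] with the spatial size of `image` (1xCxHxW)."""
    captured = {}

    def save_activation(module, inputs, output):
        captured["maps"] = output

    handle = target_layer.register_forward_hook(save_activation)
    try:
        model.eval()
        logits = model(image)
    finally:
        handle.remove()

    feature_maps = captured["maps"]                       # 1 x K x h x w
    score = logits[0, target_class]
    # Step 1: gradients of the class score w.r.t. the feature maps.
    grads = torch.autograd.grad(score, feature_maps)[0]
    # Step 2: channel weights via global average pooling of the gradients.
    weights = grads.mean(dim=(2, 3), keepdim=True)
    # Step 3: weighted sum of feature maps, ReLU, upsample to input size.
    cam = F.relu((weights * feature_maps).sum(dim=1, keepdim=True))
    cam = F.interpolate(cam, size=image.shape[2:], mode="bilinear", align_corners=False)
    # Step 4: normalize to [0, 1] so the map can be overlaid on the image.
    cam = cam - cam.min()
    cam = cam / (cam.max() + 1e-8)
    return cam[0, 0].detach()


# Toy CNN standing in for the trained network.
torch.manual_seed(0)
toy_model = nn.Sequential(
    nn.Conv2d(3, 8, kernel_size=3, padding=1), nn.ReLU(),
    nn.MaxPool2d(2),
    nn.Conv2d(8, 16, kernel_size=3, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(16, 3),
)

image = torch.randn(1, 3, 64, 64)
heatmap = grad_cam(toy_model, toy_model[4], image, target_class=2)
print(heatmap.shape, float(heatmap.min()), float(heatmap.max()))
```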

Figure: Grad-CAM for a Normal chest X-ray. The heatmap shows low activation in clear lung fields, consistent with a normal finding.

Figure: Grad-CAM for a Pneumonia (Viral/Bacterial) chest X-ray. The heatmap highlights regions of opacity and consolidation that drive the model’s decision.

In these examples, the visualizations suggest that the model is focusing on anatomically meaningful lung regions rather than on irrelevant artifacts, which is essential for trust and clinical interpretability.

Clinical Interpretation and Limitations

The results of this project highlight an important distinction between what is technically achievable with deep learning and what is clinically realistic with chest X-ray data alone:

Normal vs Pneumonia: The ResNet-18 binary model and the ResNet-50 multiclass model both show strong capability for detecting pneumonia, making them potentially useful as triage or decision-support tools.
Viral vs Bacterial: Even with deeper networks, careful regularization, and TTA, the separation between Viral and Bacterial pneumonia remains imperfect. This is expected because pathogen-specific information is only partially encoded in plain radiographs.

In real-world practice, differentiating viral from bacterial pneumonia typically requires:

Laboratory tests (e.g., PCR, antigen tests, cultures)
Blood biomarkers
Detailed clinical history and physical examination

Therefore, the most clinically impactful outcome of this work is a robust, interpretable Normal vs Pneumonia classifier, with the multiclass extension providing further insight into the limitations of using imaging alone for etiologic diagnosis.

Summary

This project provides a compact but thorough case study in medical image classification, starting from a classical PCA + SVM baseline and progressing to modern deep CNNs with multiclass outputs and interpretability:

Classical ML baseline (PCA → SVM) establishes a reference point
ResNet-18 improves Normal vs Pneumonia performance (ROC-AUC ≈ 0.95 vs ≈ 0.92 for PCA → SVM)
ResNet-34 and ResNet-50 handle a more challenging 3-class task
Test-Time Augmentation and weight decay are key for stable generalization
Grad-CAM visualizations offer clinically meaningful insight into model behavior
Final ResNet-50 + TTA model achieves:
- 85.1% accuracy
- 0.8525 weighted F1
- 0.955 weighted AUC-ROC

Overall, the project demonstrates both technical depth (model design, training strategy, interpretability) and scientific reasoning about what chest X-rays can and cannot tell us about the underlying disease process.

Acknowledgments

The development of this project was supported by several influential resources: Andrew Ng’s Deep Learning Specialization, Aurélien Géron’s Hands-On Machine Learning, and Jeremy Howard and Sylvain Gugger’s Deep Learning for Coders.