Chest X-Ray Pneumonia Classification

From PCA + SVM baseline to multiclass ResNet-50 with TTA and Grad-CAM

Python & PyTorch · PCA & SVM · ResNet-18 / ResNet-50 · TTA & Grad-CAM

This project presents a single, coherent journey through pneumonia classification on chest X-rays, starting with a classical machine learning baseline and progressing to deep convolutional models capable of multiclass diagnosis. All experiments use the Kaggle Chest X-Ray (Pneumonia) dataset.

The workflow is organized into two major stages:

  • Phase 1: Normal vs Pneumonia (binary)
  • Phase 2: Normal vs Viral vs Bacterial pneumonia (multiclass)

Motivation

Chest X-rays are among the most widely used imaging modalities for respiratory illness. In many clinical settings, the first question is:

“Does this patient have pneumonia or not?”

This project tackles that core question first with a classical PCA + SVM baseline and a deep ResNet-18 binary classifier. It then extends to a more challenging—but clinically nuanced—task: differentiating Normal, Viral pneumonia, and Bacterial pneumonia from X-ray images alone.

The goals are to:

  • Show a clear progression from classical ML to deep learning
  • Quantitatively compare PCA + SVM vs ResNet-18 on the same binary task
  • Explore multiclass performance with ResNet-34 and ResNet-50
  • Use Grad-CAM to visualize which image regions drive the network’s predictions
  • Discuss results in the context of real-world radiology practice

Dataset

  • Source: Kaggle Chest X-Ray Images (Pneumonia)
  • Modalities: Frontal chest X-rays
  • Tasks:
    • Binary: Normal vs Pneumonia
    • Multiclass: Normal vs Viral vs Bacterial pneumonia
  • Preprocessing:
    • Resize to 224×224 pixels
    • Optional grayscale conversion
    • Normalization to ImageNet mean and standard deviation
  • Data splits: train / validation / test, with a stratified 10% validation split created from the train set.
  • Data augmentation (train only):
    • Random horizontal flip
    • Small rotations
    • Random affine transforms (translation, shear)
    • Light color jitter

These augmentations encourage the models to become robust to common imaging variations such as patient positioning, orientation, and brightness differences.


Phase 1 – Classical ML and Binary Deep Learning (Normal vs Pneumonia)

1.1 Classical Baseline: PCA → SVM

The first part of the project builds a classical ML pipeline for the binary task of distinguishing Normal vs Pneumonia:

  • Convert images to grayscale (if needed) and resize to a fixed resolution
  • Flatten and standardize pixel intensities
  • Apply Principal Component Analysis (PCA) to retain ~95–97% of the variance
  • Train an RBF-kernel SVM with hyperparameter tuning for C and gamma

Figure: Scree plot (left) and cumulative explained variance (right) for PCA. The scree plot shows that the first few principal components capture the majority of variance, with subsequent components contributing minimally. The cumulative variance curve indicates that retaining ~97% of the dataset’s variance requires 497 principal components. This provides an efficient reduced-dimensional representation for classical machine learning while preserving most of the structure present in the original images.

Binary PCA → SVM performance (Normal vs Pneumonia):

  • Accuracy: ≈ 88%
  • ROC-AUC: ≈ 0.92

Figure: ROC curve (left) and Precision–Recall curve (right) for the PCA → SVM baseline on the Normal vs Pneumonia task. The SVM achieves an AUC of 0.92, indicating strong discrimination between classes, while the PR curve (AP = 0.95) shows high precision across most recall values. Together, these curves demonstrate that PCA-reduced features support solid classical ML performance before transitioning to deep CNNs.

This baseline is fast and interpretable, but it has clear limitations: PCA produces global linear projections of raw pixel intensities rather than learned spatial features, so it struggles to capture subtle lung opacities and diffuse infiltrates.
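
The PCA → SVM pipeline described in this section could be sketched with scikit-learn as below. The data is a synthetic stand-in for flattened grayscale images. The grid values for C and gamma, the 3-fold cross-validation, and ROC-AUC as the tuning metric are assumptions, since the write-up does not list them.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(0)

# Synthetic stand-in: 200 "images" of 32x32 pixels, already flattened.
X = rng.normal(size=(200, 32 * 32))
y = rng.integers(0, 2, size=200)
X[y == 1] += 0.15  # small shift so the two classes are separable

pipe = Pipeline([
    ("scale", StandardScaler()),                         # standardize pixel intensities
    ("pca", PCA(n_components=0.97, svd_solver="full")),  # keep ~97% of the variance
    ("svm", SVC(kernel="rbf")),                          # RBF-kernel SVM
])

# Hypothetical search grid; the original values are not stated in the write-up.
param_grid = {"svm__C": [0.1, 1, 10], "svm__gamma": ["scale", 1e-3, 1e-4]}
search = GridSearchCV(pipe, param_grid, scoring="roc_auc", cv=3)
search.fit(X, y)

pca = search.best_estimator_.named_steps["pca"]
n_components = pca.n_components_
retained = pca.explained_variance_ratio_.sum()
print(f"components kept: {n_components}, variance retained: {retained:.3f}")
print("best params:", search.best_params_)
```

Passing a float to `n_components` lets PCA choose the number of components from the variance target, which is how a figure such as "497 components for ~97% variance" would be read off a fitted model.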

1.2 Deep Learning: ResNet-18 Binary Classifier

The same binary classification problem is then tackled with a deep convolutional model:

  • Backbone: ResNet-18 pretrained on ImageNet
  • Modification: Replace final fully connected layer with a 2-class output head
  • Loss: Cross-entropy with class weights to address class imbalance
  • Optimization: Adam optimizer with learning rate scheduling
  • Training tricks:
    • Data augmentation (flip, rotation, affine, color jitter)
    • Validation-based early stopping

Binary ResNet-18 performance (Normal vs Pneumonia):

  • AUC-ROC: ≈ 0.95
  • F1-score: ≈ 0.90

Compared to the PCA + SVM pipeline, the ResNet-18 model raises ROC-AUC from ≈ 0.92 to ≈ 0.95. It learns spatial and texture patterns directly from the images, which is especially important for subtle or atypical pneumonia cases. This stage illustrates why CNNs have become the standard approach for medical image classification tasks.


Phase 2 – Multiclass Deep Learning (Normal / Viral / Bacterial)

The second stage extends the task to a more clinically nuanced and challenging setting: Normal vs Viral pneumonia vs Bacterial pneumonia.

2.1 ResNet-34: Establishing a Multiclass Baseline

The multiclass experiments begin with ResNet-34:

  • Pretrained on ImageNet
  • Final fully connected layer adapted for 3 classes
  • Class-weighted cross-entropy loss to balance Normal, Viral, and Bacterial samples
  • Fine-tuning strategies:
    • Unfreezing only the last residual block (layer4)
    • Unfreezing both layer3 and layer4
  • Regularization via weight decay (values between 5e-4 and 1e-3 explored)

ResNet-34 multiclass performance (typical best run):

  • Weighted F1: ≈ 0.839
  • Weighted AUC-ROC: ≈ 0.952

These results are strong, but improvements from additional fine-tuning eventually saturate, suggesting that the bottleneck is not purely model capacity—it is also the intrinsic visual overlap between Viral and Bacterial pneumonia on X-ray.

2.2 ResNet-50 with Strong Regularization and TTA (Final Model)

To explore whether a deeper model can capture finer distinctions, the project moves to ResNet-50:

  • Backbone: ResNet-50 pretrained on ImageNet
  • Output head: 3-class fully connected layer
  • Loss: Class-weighted cross-entropy
  • Regularization: Increased weight_decay to 2e-3 to control overfitting
  • Training scheme: same augmentation and early stopping pipeline as before
  • Test-Time Augmentation (TTA):
    • At inference, predictions are made on multiple augmented versions of each image (e.g., original and horizontally flipped)
    • Logits are averaged before computing the final softmax probabilities
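
The TTA procedure just described (original plus horizontal flip, logits averaged before softmax) can be sketched as below. The function name `predict_tta` and the tiny linear model are illustrative stand-ins, not the project's code.

```python
import torch
import torch.nn as nn


@torch.no_grad()
def predict_tta(model, images):
    """Average logits over the original and horizontally flipped views, then softmax."""
    model.eval()
    views = [images, torch.flip(images, dims=[3])]  # dim 3 is width in (N, C, H, W)
    logits = torch.stack([model(v) for v in views]).mean(dim=0)
    return logits.softmax(dim=1)


torch.manual_seed(0)
# Toy 3-class model standing in for the trained ResNet-50.
toy_model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 8 * 8, 3))
batch = torch.randn(5, 3, 8, 8)

probs = predict_tta(toy_model, batch)
probs_flipped_input = predict_tta(toy_model, torch.flip(batch, dims=[3]))
```

Because both views are always included, the averaged prediction is the same whether the input arrives flipped or not, which is the kind of stability TTA is meant to provide.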

Final Multiclass Performance (ResNet-50 + Strong Regularization + TTA)

Metric              Score
Accuracy            85.10%
Weighted F1         0.8525
Weighted AUC-ROC    0.955
Virus F1            0.760
Normal F1           0.8598
Bacteria F1         0.9018
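
For reference, metrics of this kind can be computed with scikit-learn as sketched below. The labels and probabilities are synthetic, so the printed values do not reproduce the table. Using one-vs-rest averaging for the multiclass AUC is an assumption, as the write-up does not say which scheme was used.

```python
import numpy as np
from sklearn.metrics import accuracy_score, f1_score, roc_auc_score

rng = np.random.default_rng(0)
classes = ["Normal", "Virus", "Bacteria"]

# Synthetic ground truth and noisy class probabilities (stand-ins for model output).
y_true = rng.integers(0, 3, size=300)
logits = rng.normal(size=(300, 3))
logits[np.arange(300), y_true] += 2.0
probs = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)
y_pred = probs.argmax(axis=1)

accuracy = accuracy_score(y_true, y_pred)
per_class_f1 = f1_score(y_true, y_pred, average=None, labels=[0, 1, 2])
weighted_f1 = f1_score(y_true, y_pred, average="weighted")
weighted_auc = roc_auc_score(y_true, probs, multi_class="ovr", average="weighted")

# Weighted F1 is the support-weighted mean of the per-class F1 scores.
support = np.bincount(y_true, minlength=3)
manual_weighted_f1 = (per_class_f1 * support).sum() / support.sum()

for name, score in zip(classes, per_class_f1):
    print(f"{name} F1: {score:.3f}")
print(f"accuracy={accuracy:.3f} weighted_f1={weighted_f1:.3f} weighted_auc={weighted_auc:.3f}")
```

Because the weighted F1 is dominated by the classes with the most test samples, a weaker class such as Virus can sit well below the weighted figure.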

Confusion Matrix (Test Set)

Figure: Multiclass confusion matrix for the final ResNet-50 model with TTA. The classifier achieves strong performance on Normal and Bacterial pneumonia, with most samples correctly assigned to their respective classes. Viral pneumonia remains the most challenging category, showing moderate confusion with Bacterial cases—a pattern consistent with the clinical overlap in radiographic appearance between viral and bacterial pneumonia.

In this configuration, Normal and Bacterial pneumonia are well-separated, while Viral pneumonia remains the most ambiguous class. This is aligned with clinical expectations: viral and bacterial pneumonia often share overlapping radiographic patterns, making them difficult to distinguish even for experienced radiologists.

Effect of TTA: Evaluating the ResNet-50 model with Test-Time Augmentation further improves the weighted F1 score and AUC, indicating that the model’s predictions are stable under natural image variations and that averaging across augmented views reduces prediction noise near decision boundaries.

Grad-CAM Interpretability

To understand where the network is looking when it predicts a given class, Grad-CAM heatmaps are generated from the final convolutional layers of the trained models.

  • For a chosen target class (e.g., Normal, Viral, or Bacterial), gradients of the class score are backpropagated to the last conv layer.
  • Channel-wise importance weights are computed via global average pooling of these gradients.
  • A weighted sum of the feature maps is taken, followed by a ReLU and upsampling to the input image size.
  • The resulting heatmap is normalized and overlaid on the original chest X-ray.

Figure: Normal chest X-ray. Grad-CAM shows low activation in clear lung fields, consistent with a normal finding.

Figure: Pneumonia (viral/bacterial). Grad-CAM highlights regions of opacity and consolidation that drive the model’s decision.

These visualizations suggest that, for the examples shown, the model attends to anatomically meaningful lung regions rather than irrelevant artifacts, which is essential for trust and clinical interpretability.


Clinical Interpretation and Limitations

The results of this project highlight an important distinction between what is technically achievable with deep learning and what is clinically realistic with chest X-ray data alone:

  • Normal vs Pneumonia: The ResNet-18 binary model and the ResNet-50 multiclass model both show strong capability for detecting pneumonia, making them potentially useful as triage or decision-support tools.
  • Viral vs Bacterial: Even with deeper networks, careful regularization, and TTA, the separation between Viral and Bacterial pneumonia remains imperfect. This is expected because pathogen-specific information is only partially encoded in plain radiographs.

In real-world practice, differentiating viral from bacterial pneumonia typically requires:

  • Laboratory tests (e.g., PCR, antigen tests, cultures)
  • Blood biomarkers
  • Detailed clinical history and physical examination

Therefore, the most clinically impactful outcome of this work is a robust, interpretable Normal vs Pneumonia classifier, with the multiclass extension providing further insight into the limitations of using imaging alone for etiologic diagnosis.


Summary

This project provides a compact but thorough case study in medical image classification, starting from a classical PCA + SVM baseline and progressing to modern deep CNNs with multiclass outputs and interpretability:

  • Classical ML baseline (PCA → SVM) establishes a reference point
  • ResNet-18 significantly improves Normal vs Pneumonia performance
  • ResNet-34 and ResNet-50 handle a more challenging 3-class task
  • Test-Time Augmentation and weight decay are key for stable generalization
  • Grad-CAM visualizations offer clinically meaningful insight into model behavior
  • Final ResNet-50 + TTA model achieves:
    • 85.1% accuracy
    • 0.8525 weighted F1
    • 0.955 weighted AUC-ROC

Overall, the project demonstrates both technical depth (model design, training strategy, interpretability) and scientific reasoning about what chest X-rays can and cannot tell us about the underlying disease process.

Acknowledgments

The development of this project was supported by several influential resources: Andrew Ng’s Deep Learning Specialization, Aurélien Géron’s Hands-On Machine Learning, and Jeremy Howard and Sylvain Gugger’s Deep Learning for Coders.

© 2025 Imran Khan