SEM Microstructure Phase Segmentation using Deep Learning

Overview

Accurate phase separation in Scanning Electron Microscope (SEM) imagery is essential for downstream quantitative metrology tasks such as morphology analysis, grain boundary characterization, and defect detection. Traditional segmentation approaches based on thresholding or manual annotation are often unreliable in the presence of imaging noise, low-contrast regions, and overlapping material phases.

This project investigates whether deep learning-based semantic segmentation can reliably separate microstructural phases from grayscale SEM imagery under severe class imbalance conditions. A multi-class U-Net architecture was developed and evaluated using both a custom implementation and a transfer learning approach with a pretrained ResNet34 encoder.

The final model achieved:

  • Test Mean IoU: ~0.87
  • Test Mean Dice Score: ~0.91

indicating strong agreement between predicted and ground truth microstructural phase boundaries.

The dataset used in this study was obtained from a publicly available SEM microstructure dataset hosted on Zenodo (see Acknowledgement).


Dataset Engineering & Label Integrity

Each SEM sample is provided with a corresponding pixel-wise segmentation mask where integer IDs represent different material phases.

Before model development, several dataset integrity checks were performed:

  • Remapping raw label IDs to contiguous class indices
  • Verification of valid mask values across the dataset
  • Overlay visualization of masks with input SEM images
  • Pixel-frequency analysis of each phase class
  • Group-wise training/validation split to prevent leakage from augmented patches
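Two of these checks (label remapping with value verification, and the group-wise split) can be sketched as below. This is a minimal illustration, not the project's code: the raw IDs are whatever the masks contain, and the filename convention in `base_image_id` (a base name like `sem700_x256_y288` followed by an augmentation suffix) is a hypothetical assumption used only to group augmented variants of the same source patch.

```python
import re
from collections import defaultdict

import numpy as np


def build_label_map(raw_ids):
    """Map arbitrary raw mask IDs (e.g. 0, 64, 128, ...) to contiguous indices 0..K-1."""
    return {int(raw): idx for idx, raw in enumerate(sorted(set(raw_ids)))}


def remap_mask(mask, label_map):
    """Return `mask` with raw IDs replaced by contiguous class indices.

    Raises ValueError if the mask contains an ID not present in `label_map`,
    which doubles as the "valid mask values" check.
    """
    unknown = set(np.unique(mask).tolist()) - set(label_map)
    if unknown:
        raise ValueError(f"unexpected mask values: {sorted(unknown)}")
    lut = np.zeros(max(label_map) + 1, dtype=np.int64)  # lookup table: raw ID -> class index
    for raw, idx in label_map.items():
        lut[raw] = idx
    return lut[mask]


def base_image_id(filename):
    # Hypothetical naming convention: "sem700_x256_y288_VerticalFlip.png" -> "sem700_x256_y288"
    match = re.match(r"(sem\d+_x\d+_y\d+)", filename)
    return match.group(1) if match else filename


def group_split(filenames, val_fraction=0.2, seed=0):
    """Split files so all augmented variants of one base image land on the same side."""
    groups = defaultdict(list)
    for name in filenames:
        groups[base_image_id(name)].append(name)
    keys = sorted(groups)
    rng = np.random.default_rng(seed)
    keys = [keys[i] for i in rng.permutation(len(keys))]  # shuffle groups, not files
    n_val = max(1, round(len(keys) * val_fraction))
    val_keys = set(keys[:n_val])
    train = [f for k in keys if k not in val_keys for f in groups[k]]
    val = [f for k in keys if k in val_keys for f in groups[k]]
    return train, val
```

Splitting at the group level rather than the file level is what prevents a flipped or contrast-shifted copy of a training patch from appearing in validation.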

Pixel fraction analysis revealed a highly imbalanced distribution across classes:

Phase      Approx. Pixel Fraction
Class 0    ~34%
Class 1    <1%
Class 2    ~41%
Class 3    ~11%
Class 4    ~13%

This imbalance was expected to negatively affect model performance for minority phases and motivated the use of class-weighted loss functions during training.
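A pixel-fraction table like the one above, and class weights derived from it, could be computed roughly as follows. The source only says weights were "computed from pixel frequency distribution"; the inverse-frequency scheme normalized to a mean of 1 used here is an assumption (median-frequency balancing is a common alternative).

```python
import numpy as np


def pixel_fractions(masks, num_classes):
    """Fraction of all pixels belonging to each class across remapped masks (IDs 0..num_classes-1)."""
    counts = np.zeros(num_classes, dtype=np.int64)
    for mask in masks:
        # Masks are assumed already remapped; an out-of-range ID makes the shapes mismatch and raises.
        counts += np.bincount(mask.ravel(), minlength=num_classes)
    return counts / counts.sum()


def inverse_frequency_weights(fractions, eps=1e-6):
    """Weights proportional to 1 / pixel fraction, rescaled so they average to 1."""
    w = 1.0 / (np.asarray(fractions, dtype=np.float64) + eps)  # eps guards empty classes
    return w / w.mean()
```

With a class at <1% of pixels, raw inverse-frequency weights can be very large relative to the majority classes, so in practice they are often clipped or softened (e.g. with a square root) before being passed to the loss.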

Figure: Example visualization from the SEM microstructure segmentation dataset. Left: original grayscale SEM image. Middle: ground-truth segmentation mask with remapped class IDs (0–4). Right: overlay visualization combining the mask with the original image.

Baseline Model: U-Net from Scratch

A minimal U-Net architecture was implemented from scratch using PyTorch to establish a performance baseline.

Training Setup:

  • Input: Grayscale SEM image (1 channel)
  • Output: 5-class segmentation mask
  • Loss Function: Hybrid Cross Entropy + Dice Loss
  • Class Weights: Computed from pixel frequency distribution
  • Optimizer: AdamW
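For reference, a minimal U-Net matching this input/output setup (1-channel SEM in, 5-class logits out) might look like the sketch below. The depth (two pooling stages), channel widths, and use of BatchNorm are illustrative assumptions; the source does not describe the baseline's exact layers.

```python
import torch
import torch.nn as nn


def conv_block(in_ch, out_ch):
    """Two 3x3 conv + BatchNorm + ReLU layers, the basic U-Net building block."""
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1, bias=False),
        nn.BatchNorm2d(out_ch),
        nn.ReLU(inplace=True),
        nn.Conv2d(out_ch, out_ch, kernel_size=3, padding=1, bias=False),
        nn.BatchNorm2d(out_ch),
        nn.ReLU(inplace=True),
    )


class MiniUNet(nn.Module):
    """Small U-Net with two pooling stages; H and W must be divisible by 4."""

    def __init__(self, in_channels=1, num_classes=5, base=32):
        super().__init__()
        self.enc1 = conv_block(in_channels, base)
        self.enc2 = conv_block(base, base * 2)
        self.bottleneck = conv_block(base * 2, base * 4)
        self.pool = nn.MaxPool2d(2)
        self.up2 = nn.ConvTranspose2d(base * 4, base * 2, kernel_size=2, stride=2)
        self.dec2 = conv_block(base * 4, base * 2)  # input = upsampled + skip connection
        self.up1 = nn.ConvTranspose2d(base * 2, base, kernel_size=2, stride=2)
        self.dec1 = conv_block(base * 2, base)
        self.head = nn.Conv2d(base, num_classes, kernel_size=1)

    def forward(self, x):
        e1 = self.enc1(x)                    # (N, base, H, W)
        e2 = self.enc2(self.pool(e1))        # (N, 2*base, H/2, W/2)
        b = self.bottleneck(self.pool(e2))   # (N, 4*base, H/4, W/4)
        d2 = self.dec2(torch.cat([self.up2(b), e2], dim=1))
        d1 = self.dec1(torch.cat([self.up1(d2), e1], dim=1))
        return self.head(d1)                 # raw logits, (N, num_classes, H, W)
```

The network returns unnormalized logits, which is what `nn.CrossEntropyLoss` expects; a softmax is applied only inside the Dice term or at inference time.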

Initial training results indicated:

  • Strong performance on dominant classes
  • Near-zero IoU for minority phases

This behavior suggested that a randomly initialized encoder, trained on limited data, could not learn features that capture rare microstructural patterns, even with a class-weighted loss.


Transfer Learning: U-Net with Pretrained ResNet34 Encoder

To improve generalization across underrepresented phases, transfer learning was applied using a ResNet34 encoder pretrained on ImageNet.

Adaptations included:

  • Replicating grayscale SEM input to 3 channels
  • Freezing encoder weights for initial training epochs
  • Training decoder layers independently
  • Gradual encoder unfreezing
  • Reduced learning rate during fine-tuning
  • ReduceLROnPlateau scheduler
  • Early stopping based on validation IoU
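Several of these adaptations can be sketched in a few helpers. This assumes the model exposes its backbone as `model.encoder` (as, for example, segmentation_models_pytorch's `Unet` does; the source does not say which library was used). The learning rates, patience values, and the "unfreeze the last n encoder stages" strategy are illustrative assumptions.

```python
import torch
import torch.nn as nn


def gray_to_rgb(x):
    """Replicate a (N, 1, H, W) grayscale batch to (N, 3, H, W) for an ImageNet-pretrained encoder."""
    return x.repeat(1, 3, 1, 1)


def set_encoder_trainable(model, trainable):
    """Freeze or unfreeze every parameter under `model.encoder` (assumed attribute name)."""
    for p in model.encoder.parameters():
        p.requires_grad = trainable


def unfreeze_last_children(model, n):
    """Gradual unfreezing: make the last `n` top-level encoder children (deepest stages) trainable."""
    children = list(model.encoder.children())
    for child in children[max(0, len(children) - n):]:
        for p in child.parameters():
            p.requires_grad = True


def build_optimizer(model, decoder_lr=1e-3, encoder_lr=1e-4):
    """AdamW with a lower learning rate for the pretrained encoder than for the decoder."""
    encoder_params = list(model.encoder.parameters())
    encoder_ids = {id(p) for p in encoder_params}
    decoder_params = [p for p in model.parameters() if id(p) not in encoder_ids]
    # Frozen params get no gradient, so AdamW skips them until they are unfrozen.
    return torch.optim.AdamW([
        {"params": decoder_params, "lr": decoder_lr},
        {"params": encoder_params, "lr": encoder_lr},
    ])


def build_scheduler(optimizer):
    # mode="max" because the monitored quantity (validation mIoU) should increase.
    return torch.optim.lr_scheduler.ReduceLROnPlateau(optimizer, mode="max", factor=0.5, patience=3)


class EarlyStopping:
    """Signal a stop when the monitored score (higher is better) has not improved for `patience` epochs."""

    def __init__(self, patience=10, min_delta=1e-4):
        self.patience, self.min_delta = patience, min_delta
        self.best, self.bad_epochs = float("-inf"), 0

    def step(self, score):
        if score > self.best + self.min_delta:
            self.best, self.bad_epochs = score, 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience  # True -> stop training
```

A typical schedule would call `set_encoder_trainable(model, False)` for the first few epochs, then `unfreeze_last_children` with increasing `n`, calling `scheduler.step(val_miou)` and `early_stopping.step(val_miou)` once per epoch.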

Loss Function:

$\mathcal{L} = 0.6 \cdot \text{CrossEntropy} + 0.4 \cdot \text{DiceLoss}$
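One way to implement this hybrid loss in PyTorch is sketched below, with the Dice term taken as 1 minus the mean soft Dice over classes. Computing Dice over the whole batch, the `smooth` constant, and passing class weights only to the cross-entropy term are assumptions; the source gives only the 0.6/0.4 weighting.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class CEDiceLoss(nn.Module):
    """L = ce_weight * CrossEntropy + dice_weight * (1 - mean soft Dice)."""

    def __init__(self, class_weights=None, ce_weight=0.6, dice_weight=0.4, smooth=1.0):
        super().__init__()
        # class_weights: optional float tensor of shape (C,), on the same device as the logits
        self.ce = nn.CrossEntropyLoss(weight=class_weights)
        self.ce_weight = ce_weight
        self.dice_weight = dice_weight
        self.smooth = smooth

    def forward(self, logits, target):
        # logits: (N, C, H, W) raw scores; target: (N, H, W) int64 class indices
        ce = self.ce(logits, target)
        probs = logits.softmax(dim=1)
        one_hot = F.one_hot(target, num_classes=logits.shape[1]).permute(0, 3, 1, 2).float()
        dims = (0, 2, 3)  # sum over batch and spatial dims -> one Dice value per class
        intersection = (probs * one_hot).sum(dims)
        cardinality = probs.sum(dims) + one_hot.sum(dims)
        dice = (2 * intersection + self.smooth) / (cardinality + self.smooth)
        return self.ce_weight * ce + self.dice_weight * (1 - dice.mean())
```

The cross-entropy term gives stable per-pixel gradients, while the Dice term is normalized per class, so a phase covering <1% of pixels contributes as much to it as a dominant phase.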

This training schedule resulted in improved convergence and substantially higher IoU values across previously underperforming classes.


Evaluation & Model Validation

Model performance was evaluated using several complementary metrics:

Pixel-wise Metrics

  • Mean Intersection over Union (mIoU)
  • Mean Dice Score
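Per-image mIoU and mean Dice can both be derived from a per-image confusion matrix, as in the sketch below. Skipping classes that appear in neither the prediction nor the ground truth of an image is an assumed convention; the source does not state how absent classes were handled.

```python
import numpy as np


def confusion_counts(pred, target, num_classes):
    """(C, C) pixel counts: rows = ground-truth class, columns = predicted class."""
    idx = target.ravel().astype(np.int64) * num_classes + pred.ravel().astype(np.int64)
    return np.bincount(idx, minlength=num_classes ** 2).reshape(num_classes, num_classes)


def image_miou_mdice(pred, target, num_classes):
    cm = confusion_counts(pred, target, num_classes)
    tp = np.diag(cm).astype(np.float64)
    fp = cm.sum(axis=0) - tp      # predicted as class c but belonging to another class
    fn = cm.sum(axis=1) - tp      # class c pixels predicted as something else
    present = (tp + fp + fn) > 0  # skip classes absent from both masks (assumed convention)
    iou = tp[present] / (tp + fp + fn)[present]
    dice = 2 * tp[present] / (2 * tp + fp + fn)[present]
    return float(iou.mean()), float(dice.mean())
```

Per class, Dice = 2·IoU / (1 + IoU), so Dice is always at least as large as IoU, which is consistent with the mean Dice exceeding the mIoU in the reported results.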

Confusion Matrix (Normalized %)

Provides insight into inter-class confusion at the pixel level.

Example observations:

  • Class 1 occasionally misclassified as Class 0
  • Class 3 shows boundary confusion with adjacent phases
  • Majority classes exhibit >90% prediction accuracy
Figure: Global pixel-wise confusion matrix (%) for the trained 5-class segmentation model. Diagonal elements represent correct predictions for each microstructural phase, while off-diagonal entries correspond to misclassified pixels. High diagonal dominance demonstrates strong class separability for most phases, whereas moderate confusion between certain classes reflects visually similar texture or boundary regions in SEM microstructure images.
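A global, row-normalized confusion matrix of this kind can be built by pooling pixel counts over all test images. Normalizing each row by the number of true pixels of that class (so the diagonal reads as "% correctly classified") is an assumption, though it matches how the per-class table in the conclusion is phrased.

```python
import numpy as np


def normalized_confusion(preds, targets, num_classes):
    """Pool pixel counts over all images, then express each row as % of that true class."""
    cm = np.zeros((num_classes, num_classes), dtype=np.int64)
    for pred, target in zip(preds, targets):
        idx = target.ravel().astype(np.int64) * num_classes + pred.ravel().astype(np.int64)
        cm += np.bincount(idx, minlength=num_classes ** 2).reshape(num_classes, num_classes)
    row_totals = cm.sum(axis=1, keepdims=True)
    return 100.0 * cm / np.maximum(row_totals, 1)  # guard against classes with no true pixels


def primary_confusion(cm_pct):
    """For each true class, the most frequent wrong prediction and its percentage."""
    off = cm_pct.copy()
    np.fill_diagonal(off, -1.0)  # exclude correct predictions
    return off.argmax(axis=1), off.max(axis=1)
```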

Boundary F1 Score

To assess morphological edge fidelity, a boundary-aware F1 score (BF1) was computed between predicted and ground-truth segmentation masks.

This metric is particularly relevant in SEM-based metrology workflows where accurate phase boundary delineation directly influences morphology and dimensional measurements.
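A common formulation of BF1 matches boundary pixels within a pixel tolerance and combines boundary precision and recall, then averages over classes. The sketch below follows that formulation in plain NumPy; the tolerance value, the 4-neighbour boundary definition, and the handling of classes with no boundary are assumptions, since the source does not specify them.

```python
import numpy as np


def boundary_map(mask):
    """True where a pixel differs from its right or lower neighbour (both sides are marked)."""
    edge = np.zeros(mask.shape, dtype=bool)
    diff_v = mask[1:, :] != mask[:-1, :]
    diff_h = mask[:, 1:] != mask[:, :-1]
    edge[1:, :] |= diff_v
    edge[:-1, :] |= diff_v
    edge[:, 1:] |= diff_h
    edge[:, :-1] |= diff_h
    return edge


def dilate(binary, radius):
    """Binary dilation with a (2r+1) x (2r+1) square, implemented with padded shifts."""
    if radius == 0:
        return binary.copy()
    h, w = binary.shape
    padded = np.pad(binary, radius)  # pads with False
    out = np.zeros_like(binary)
    for dy in range(2 * radius + 1):
        for dx in range(2 * radius + 1):
            out |= padded[dy:dy + h, dx:dx + w]
    return out


def boundary_f1(pred_bin, gt_bin, tolerance=2):
    pb, gb = boundary_map(pred_bin), boundary_map(gt_bin)
    if not pb.any() and not gb.any():
        return 1.0  # neither mask has a boundary for this class (assumed convention)
    if not pb.any() or not gb.any():
        return 0.0
    precision = (pb & dilate(gb, tolerance)).sum() / pb.sum()  # predicted edges near true edges
    recall = (gb & dilate(pb, tolerance)).sum() / gb.sum()     # true edges near predicted edges
    if precision + recall == 0:
        return 0.0
    return float(2 * precision * recall / (precision + recall))


def mean_boundary_f1(pred, target, num_classes, tolerance=2):
    """Average per-class BF1 over classes present in either mask."""
    scores = [boundary_f1(pred == c, target == c, tolerance)
              for c in range(num_classes) if (pred == c).any() or (target == c).any()]
    return float(np.mean(scores))
```

Because a one-pixel boundary shift can halve BF1 at zero tolerance while barely changing IoU, BF1 is typically much stricter than region metrics on thin or low-contrast interfaces.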


Class Frequency vs IoU Analysis

A correlation plot between class pixel frequency and IoU was generated to examine model sensitivity to data imbalance.

Results indicate:

  • Improved IoU for minority classes after transfer learning
  • Reduced performance gap between dominant and rare phases
  • More balanced segmentation performance across phase types, although Carbide remains the weakest class (see Per-Class Analysis)
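The relationship plotted above can also be summarized as a single correlation coefficient. The sketch below uses a Pearson correlation on log-scaled pixel fractions; the log scale is an assumption, chosen because the fractions span from <1% to ~41%. With only five classes, such a coefficient is indicative at best.

```python
import numpy as np


def frequency_iou_correlation(pixel_fractions, per_class_iou):
    """Pearson correlation between log10(class pixel fraction) and per-class IoU."""
    freq = np.log10(np.asarray(pixel_fractions, dtype=np.float64))
    iou = np.asarray(per_class_iou, dtype=np.float64)
    return float(np.corrcoef(freq, iou)[0, 1])
```

A coefficient that drops toward zero between the baseline and the transfer-learning model would quantify the "reduced performance gap" described above.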

Qualitative Overlay Results

Model predictions were visually compared against ground truth masks using alpha-overlay visualization.

Displayed examples include:

  1. Ground Truth segmentation
  2. Model prediction
  3. Overlay comparison

These qualitative results demonstrate consistent alignment between predicted and actual phase boundaries.
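An alpha overlay of this kind blends a colour-coded class mask onto the grayscale image. The sketch below is illustrative; the palette and default `alpha` are hypothetical choices, not the colours used in the project's figures.

```python
import numpy as np

# Hypothetical RGB palette, one colour per class ID 0-4 (not the project's actual colours).
PALETTE = np.array([
    [230, 25, 75],
    [60, 180, 75],
    [255, 225, 25],
    [0, 130, 200],
    [245, 130, 48],
], dtype=np.float64)


def overlay(gray, mask, alpha=0.4, palette=PALETTE):
    """Alpha-blend a colour-coded class mask onto a grayscale SEM image.

    gray: (H, W) uint8 image; mask: (H, W) integer class IDs. Returns (H, W, 3) uint8.
    """
    rgb = np.repeat(gray[..., None].astype(np.float64), 3, axis=2)  # gray -> 3 identical channels
    colors = palette[mask]                                          # (H, W, 3) colour per pixel
    blended = (1 - alpha) * rgb + alpha * colors
    return np.clip(np.rint(blended), 0, 255).astype(np.uint8)
```

Rendering the ground truth and the prediction with the same palette and `alpha` keeps the two overlays directly comparable side by side.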


Artifacts & Reproducibility

  • Trained model weights (.pt) available for download
  • Full implementation and training pipeline provided in GitHub repository
  • Evaluation scripts for IoU, Dice, Confusion Matrix, and Boundary F1 included

Conclusion

This project developed and evaluated a deep learning pipeline for multi-class semantic segmentation of Scanning Electron Microscopy (SEM) images, targeting five microstructural phases: Matrix, Carbide, Void, Reprecipitate, and Dilution zone. A U-Net architecture with a pretrained ResNet34 encoder was trained on an augmented dataset and evaluated on a held-out test set of 54 images.

Performance Summary

The model demonstrated strong segmentation performance across the test set. Metrics were computed per image (rather than per batch) to avoid batch-size bias and then aggregated across the test images:

Metric   Mean    Std     Min     Max
mIoU     0.872   0.088   0.723   0.958
mDice    0.912   0.079   0.759   0.978
mBF1     0.728   0.027   0.678   0.773
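Image-level aggregation means each test image contributes one score, and the summary statistics are taken over those scores. A minimal sketch, where `metric_fn` is any hypothetical per-image metric (for example, a per-image mIoU function); whether the reported Std used the population (ddof=0) or sample (ddof=1) estimator is not stated, and ddof=0 is assumed here.

```python
import numpy as np


def evaluate_per_image(preds, targets, metric_fn):
    """Score each image independently, then summarize as Mean / Std / Min / Max."""
    scores = np.array([metric_fn(p, t) for p, t in zip(preds, targets)], dtype=np.float64)
    return {
        "Mean": float(scores.mean()),
        "Std": float(scores.std(ddof=0)),  # population std (assumed)
        "Min": float(scores.min()),
        "Max": float(scores.max()),
    }
```

Unlike averaging per-batch scores, this result does not change with the batch size or with how images happen to be grouped into batches.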

The mean Intersection over Union (mIoU) of 0.872 and mean Dice score of 0.912 confirm that the model produces highly accurate pixel-wise segmentation masks across the majority of test images. The mean Boundary F1 (mBF1) of 0.728 reflects moderate boundary accuracy (BF1 combines boundary precision and recall), which is expected given the fine structural detail and low contrast between adjacent phases in SEM imagery.

Per-Class Analysis

The pixel-wise confusion matrix revealed notable differences in per-class difficulty:

Class           Correctly Classified (%)   Primary Confusion (%)
Matrix          97.3                       Dilution (1.97)
Carbide         75.4                       Matrix (22.3)
Void            98.9                       Reprecipitate (1.14)
Reprecipitate   94.6                       Void (2.78)
Dilution        93.3                       Matrix (4.84)

The Void and Matrix classes achieved the highest classification rates (98.9% and 97.3%), likely due to their visually distinct appearance in SEM images. The Carbide class was the most challenging, with 22.3% of Carbide pixels misclassified as Matrix. This is consistent with the visual similarity between fine carbide precipitates and the surrounding matrix under certain imaging conditions, and is reflected in the lowest per-class IoU observed during evaluation.

Qualitative Evaluation on Held-Out Test Data

The best performing image group (mIoU = 0.961) consisted of contrast- and brightness-augmented variants of the same base image (sem700_x256_y288), suggesting the model handles photometric variation well.

Figure: Best-performing prediction on the held-out test set (mIoU ≈ 0.961). Left: original grayscale SEM microstructure image. Middle: ground-truth segmentation mask with remapped class IDs (0–4). Right: predicted segmentation overlay. The strong agreement between the predicted regions and the ground-truth mask indicates that the model successfully captures the dominant structural boundaries and phase distributions present in the microstructure.

The lowest-scoring images (sem800_x0_y256 with VerticalFlip and GridDistortion augmentations) achieved mIoU = 0.710, suggesting that geometric distortions, particularly grid distortion, remain a challenge for precise boundary delineation. This is consistent with the relatively low mBF1 scores observed overall.

Figure: Lower-performing prediction example from the held-out test set (mIoU ≈ 0.71). Left: original grayscale SEM image. Middle: ground-truth segmentation mask. Right: predicted overlay visualization. The overall microstructure is visually similar to the high-performing case, but subtle boundary differences lead to localized misclassification of certain regions. These errors likely arise from limited representation of similar morphological patterns in the training dataset, suggesting that additional labeled samples could further improve segmentation performance.

Limitations and Future Work

Despite strong overall performance, several limitations remain. The Carbide class is a persistent source of confusion with the Matrix class, which could be addressed through targeted strategies such as increased augmentation for underrepresented classes, hard negative mining, or higher-resolution inputs that preserve fine-grained carbide morphology. The mBF1 scores, while acceptable, suggest that boundary precision could be improved through boundary-aware loss functions or post-processing refinement such as conditional random fields (CRF).

Future work could explore applying the pretrained pipeline to new SEM imaging conditions, extending the class set to additional microstructural phases, and deploying the model as a real-time inference service, for which the FastAPI and Docker integration planned as the next stage of this project provides a natural foundation.


Acknowledgement

The SEM microstructure dataset used in this work was obtained from the publicly available Zenodo repository.

Safdar, M. (2025). Scanning Electron Microscopy (SEM) Dataset of Additively Manufactured Ni-WC Metal Matrix Composites for Semantic Segmentation (Version 1) [Data set]. Zenodo. https://doi.org/10.5281/zenodo.17315241

All credit for dataset creation and annotation belongs to the original contributors.


© 2026 Imran Khan