Severstal: Steel Defect Detection
12th place solution overview
The Team:
Ilya Dobrynin @IlyaDo
Pavel Yakubovskiy @qubvel
Denis Kolpakov @kolpakovd
ML trainings, 16.11.2019

Content
1. Task overview
2. Our team solution
3. Other teams' solutions

Task Overview
Images: 256x1600 (~1 GB)
● 12658 train
● 1801 test (public)
● ~3600 test (private)
Defect masks:
● 4 classes (the meaning of each class is not disclosed)
● RLE-encoded
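The masks are stored as run-length encoded strings. Below is a minimal numpy sketch of decoding and encoding. It assumes the usual Kaggle convention for this competition: space-separated `start length` pairs, 1-based starts, and pixels numbered top to bottom, then left to right. `rle_decode` and `rle_encode` are hypothetical helper names, not code from the team.

```python
import numpy as np

def rle_decode(rle: str, shape=(256, 1600)) -> np.ndarray:
    """Decode 'start length start length ...' into a 0/1 mask of the given shape."""
    flat = np.zeros(shape[0] * shape[1], dtype=np.uint8)
    nums = np.asarray(rle.split(), dtype=int)
    starts, lengths = nums[0::2] - 1, nums[1::2]  # starts are 1-based in the file
    for start, length in zip(starts, lengths):
        flat[start:start + length] = 1
    # Pixels are numbered top to bottom, then left to right: column-major order.
    return flat.reshape(shape, order="F")

def rle_encode(mask: np.ndarray) -> str:
    """Encode a 0/1 mask as 1-based, column-major 'start length' pairs."""
    flat = mask.flatten(order="F").astype(np.int8)
    padded = np.concatenate([[0], flat, [0]])
    # Positions where the value changes are run starts and run ends (1-based after padding).
    runs = np.flatnonzero(padded[1:] != padded[:-1]) + 1
    runs[1::2] -= runs[0::2]  # convert run ends into run lengths
    return " ".join(str(x) for x in runs)

mask = np.zeros((4, 3), dtype=np.uint8)
mask[1:3, 0] = 1  # rows 1-2 of column 0
mask[0, 2] = 1    # row 0 of column 2
print(rle_encode(mask))  # 2 2 9 1
assert (rle_decode(rle_encode(mask), shape=(4, 3)) == mask).all()
```

An empty string decodes to an all-zero mask, so empty classes need no special handling.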
Metric: mean Dice = 2TP / (2TP + FP + FN)
● 85.56% of the masks are empty (an all-empty submission scores 0.85560 on the LB)
● 247 images with a defect of class “2”
● no intersection between masks of different classes
● same class distribution in train and test (checked by LB probing)
● different distribution of “black areas” in train and test
● different steel rolls for train/public/private *
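The mean Dice is computed per (image, class) pair and then averaged. A pair scores 1 when both the prediction and the ground truth are empty, so an all-empty submission scores exactly the share of empty pairs. Below is a small numpy sketch on made-up toy data. `dice` and `mean_dice` are illustrative names.

```python
import numpy as np

def dice(pred: np.ndarray, truth: np.ndarray) -> float:
    """Dice = 2TP / (2TP + FP + FN); defined as 1.0 when both masks are empty."""
    pred, truth = pred.astype(bool), truth.astype(bool)
    tp = np.sum(pred & truth)
    fp = np.sum(pred & ~truth)
    fn = np.sum(~pred & truth)
    denom = 2 * tp + fp + fn
    return 1.0 if denom == 0 else float(2 * tp / denom)

def mean_dice(preds, truths) -> float:
    """Average the per-pair Dice over all (image, class) pairs."""
    return float(np.mean([dice(p, t) for p, t in zip(preds, truths)]))

# Toy data (made up): 10 (image, class) pairs, 8 of them with an empty ground truth.
truths = [np.zeros((4, 4), dtype=bool) for _ in range(8)]
for _ in range(2):
    defect = np.zeros((4, 4), dtype=bool)
    defect[0, :2] = True
    truths.append(defect)

empty_submission = [np.zeros((4, 4), dtype=bool) for _ in truths]
print(mean_dice(empty_submission, truths))  # 0.8, the share of empty pairs
```

With 85.56% of the pairs empty, the same arithmetic gives the 0.85560 LB score of an empty submission.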
● inaccurate masks
● image duplicates
● image duplicates with different masks
https://www.kaggle.com/c/severstal-steel-defect-detection/discussion/107053#latest-621775
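Exact duplicates like these can be found by hashing the decoded pixel data and grouping identical hashes. Near-duplicates, such as re-compressed copies, would need a perceptual hash instead. Below is a hypothetical sketch with images given as numpy arrays. `find_duplicates` and the file names are illustrative, not the team's code.

```python
import hashlib
from collections import defaultdict

import numpy as np

def find_duplicates(images: dict) -> list:
    """Return groups of image ids whose pixel arrays are identical."""
    groups = defaultdict(list)
    for image_id, pixels in images.items():
        arr = np.ascontiguousarray(pixels)
        # Hash shape and dtype together with the raw bytes to avoid accidental matches.
        header = f"{arr.shape}{arr.dtype}".encode()
        key = hashlib.md5(header + arr.tobytes()).hexdigest()
        groups[key].append(image_id)
    return [ids for ids in groups.values() if len(ids) > 1]

images = {
    "a.jpg": np.zeros((2, 3), dtype=np.uint8),
    "b.jpg": np.zeros((2, 3), dtype=np.uint8),
    "c.jpg": np.ones((2, 3), dtype=np.uint8),
}
print(find_duplicates(images))  # [['a.jpg', 'b.jpg']]
```

Within each group, comparing the RLE strings of the duplicates then shows which ones carry conflicting masks.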
Synchronous kernel-only competition
Limitations:
1. No internet
2. 1-hour run time
3. 30 GPU hours per week
The hard part: don't forget about the “fair” submission code.
Holdout: 451 samples from the train set, stratified by class.
Local validation: ~0.95
Public leaderboard: ~0.91
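A holdout stratified by class can be drawn by taking the same fraction from every stratum. In this numpy sketch, the stratum of an image is assumed to be the string of its defect classes (e.g. "13" for classes 1 and 3, "" for a defect-free image). The team's exact stratification key is not given, and `stratified_holdout` is a hypothetical helper.

```python
import numpy as np

def stratified_holdout(strata, n_holdout: int, seed: int = 42):
    """Return (train_idx, holdout_idx) with ~n_holdout samples spread proportionally over strata."""
    strata = np.asarray(strata)
    rng = np.random.default_rng(seed)
    frac = n_holdout / len(strata)
    holdout = []
    for value in np.unique(strata):
        idx = np.flatnonzero(strata == value)
        rng.shuffle(idx)
        k = int(round(frac * len(idx)))  # proportional share of this stratum
        holdout.extend(idx[:k].tolist())
    holdout = np.sort(np.array(holdout, dtype=int))
    train = np.setdiff1d(np.arange(len(strata)), holdout)
    return train, holdout

strata = [""] * 80 + ["1"] * 10 + ["3"] * 10
train_idx, holdout_idx = stratified_holdout(strata, n_holdout=20)
print(len(train_idx), len(holdout_idx))  # 80 20
```

Because of per-stratum rounding, the holdout size can differ from the requested number by a few samples. With 12658 train images, 451 holdout images correspond to about 3.6% of each stratum.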
Adversarial validation: a classifier can distinguish train images from test images with 85% accuracy.

Solution overview

Best Solution Overview
3-step pipeline:
1. Multilabel Classification (MC): classes predicted as empty get an empty mask.
2. Multilabel Segmentation (MS): works as an additional classifier; classes whose mask is empty stay empty.
3. Binary Segmentation (BS): the remaining non-empty classes are segmented by per-class models BS1, BS2, BS3, BS4.
[Figure: Images → MC → MS → BS1, BS2, BS3, BS4 → RLE → Submission]

1. Multilabel Classification
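The classification tips listed for this stage are pseudolabels selected with the confidence score `np.mean(np.abs(np.subtract(prob_cls, 0.5)))`, and a binarization threshold chosen by F1. Both can be sketched in numpy as follows. `prob_cls` is assumed to hold the 4 per-class sigmoid outputs of one image. The helper names, the 0.4 confidence cut-off and the threshold grid are hypothetical, not the team's values.

```python
import numpy as np

def confidence(prob_cls: np.ndarray) -> float:
    """Mean distance of the class probabilities from 0.5 (0 = unsure, 0.5 = certain)."""
    return float(np.mean(np.abs(np.subtract(prob_cls, 0.5))))

def select_pseudolabels(probs: np.ndarray, min_conf: float = 0.4) -> np.ndarray:
    """Indices of test images (rows of an (N, 4) array) confident enough to pseudolabel."""
    conf = np.mean(np.abs(probs - 0.5), axis=1)
    return np.flatnonzero(conf >= min_conf)

def f1(y_true: np.ndarray, y_pred: np.ndarray) -> float:
    """F1 for boolean vectors; 1.0 if both are all-False."""
    tp = np.sum(y_true & y_pred)
    fp = np.sum(~y_true & y_pred)
    fn = np.sum(y_true & ~y_pred)
    denom = 2 * tp + fp + fn
    return 1.0 if denom == 0 else float(2 * tp / denom)

def best_thresholds(probs: np.ndarray, labels: np.ndarray, grid=np.arange(0.05, 1.0, 0.05)) -> list:
    """Per class, the grid threshold that maximizes F1 on holdout predictions."""
    labels = labels.astype(bool)
    return [
        float(max(grid, key=lambda t: f1(labels[:, c], probs[:, c] >= t)))
        for c in range(probs.shape[1])
    ]

probs = np.array([[0.9], [0.3], [0.8], [0.2]])  # holdout probabilities for one class
labels = np.array([[1], [0], [1], [0]])
print(best_thresholds(probs, labels))  # a threshold between 0.3 and 0.8
print(select_pseudolabels(np.array([[0.95, 0.02, 0.97, 0.03], [0.6, 0.5, 0.4, 0.5]])))  # [0]
```

In practice, `best_thresholds` would run on holdout predictions with all 4 classes as columns, giving one threshold per class.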
Model: Senet154, 3 folds
Size: resize to 128x800
Augs: Normalize, HFlip
Optim: RAdam
Loss: BCE
TTA: None
Tips:
1. Pseudolabels (+0.002 on LB), with per-image confidence:
Confidence = np.mean(np.abs(np.subtract(prob_cls, 0.5)))
2. Binarization threshold search based on F1 score.

2. Multilabel Segmentation
Models: PSPNet on se-ResNext101_32x4d, FPN on Senet154, PSPNet on Senet154
Size: full images, crops 256x256
Augs: Normalize, HFlip, VFlip, CropNonEmptyMaskIfExists
Optim: Adam, RAdam
Loss: BCE, BCE + Jaccard
TTA: HFlip
Tips:
1. Custom augmentation CropNonEmptyMaskIfExists: crop the image around a defect area if one exists, otherwise make a random crop.
2. Use the Senet154 classifier from Stage 1 as a backbone for the segmentation.
3. Remove small holes and objects < 128 px.
4. Mask vanishing: a mask is dropped if the sum of its pixels is less than (700; 1300; 1000; 1300).
5. Thresholds (0.45; 0.45; 0.45; 0.45).

3. Binary Segmentation
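Only image/class pairs that survive the first two stages reach the per-class binary models of this stage. Below is a numpy sketch of that gating for one image, assuming every stage outputs per-class sigmoid probabilities. The 0.45 pixel threshold and the per-class minimum areas (700; 1300; 1000; 1300) are the stage-2 values listed above. The 0.5 classifier and binary-segmentation thresholds, the array layout and `merge_stages` are hypothetical. Removal of small holes and objects is left out for brevity.

```python
import numpy as np

MS_THRESHOLD = 0.45                           # stage-2 pixel threshold (all 4 classes)
MIN_AREA = np.array([700, 1300, 1000, 1300])  # stage-2 mask vanishing, per class

def merge_stages(cls_prob, ms_prob, bs_prob, cls_threshold=0.5, bs_threshold=0.5):
    """Combine the three stages for one image.

    cls_prob: (4,) stage-1 classifier probabilities
    ms_prob:  (4, H, W) stage-2 multilabel segmentation probabilities
    bs_prob:  (4, H, W) stage-3 probabilities, one binary model per class
    Returns a (4, H, W) boolean mask.
    """
    final = np.zeros(bs_prob.shape, dtype=bool)
    for c in range(len(cls_prob)):
        if cls_prob[c] < cls_threshold:                      # stage 1 says "empty"
            continue
        if np.sum(ms_prob[c] > MS_THRESHOLD) < MIN_AREA[c]:  # stage-2 mask vanishes
            continue
        final[c] = bs_prob[c] > bs_threshold                 # stage 3 draws the mask
    return final

# Hypothetical 64x64 example (real images are 256x1600).
h, w = 64, 64
cls_prob = np.array([0.9, 0.2, 0.9, 0.9])
ms_prob = np.zeros((4, h, w))
ms_prob[0] = 0.6            # 4096 px above 0.45 -> survives (>= 700)
ms_prob[2, :10, :10] = 0.6  # 100 px -> vanishes (< 1000)
ms_prob[3] = 0.6            # survives, but stage 3 finds nothing
bs_prob = np.zeros((4, h, w))
bs_prob[0] = 0.7
result = merge_stages(cls_prob, ms_prob, bs_prob)
print(result.reshape(4, -1).any(axis=1))  # [ True False False False]
```

The non-empty masks in `result` are what gets RLE-encoded for the submission.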
● Unet / FPN (SE-ResNeXt50 + SCSE attention in decoder)
● Only images with defects
● CropNonEmptyMaskIfExists 256x768
● Train 60 epochs with MultiStepLR
● RAdam optimizer
● Loss: (1 - Dice) + Focal(gamma=2)
● Batch size: 8 images, no accumulation
● 5 best checkpoints saved for SWA
● Train augmentations: Flip, Scale, Brightness, Illumination, JpegCompression
● Test augmentations: Flip
[Figure: Unet and FPN architectures with an SE-ResNext50 encoder]

LB score improvement over time
16 - multi-label classification senet154 + multi-label segmentation (6 models)
14 - add pseudolabels for classifier
13 - add 2 more models for multi-label segmentation
10 - mask min size threshold optimization
04 - add binary segmentation (1 model per class) as third stage
03 - ensemble of binary segmentation models on the third stage

What did not work
● multi-task training (classification + segmentation)
● binary classification (1 vs all)
● EfficientNets
● soft labels
● hard augmentations
● complex LR scheduling schemes
● hard example mining
● batchnorm fusion
● classification and segmentation soft merging
○ mean_prob = mean(mask[mask > threshold])
[Figure: multi-task network: Encoder → Decoder → mask; Encoder → Global Pooling → Linear, Sigmoid → label]

1st place solution
● Score (public / private LB): 0.92124 / 0.90883
● Classification
○ EfficientNet-B1
○ ResNet34
● Large crops 224 x 1558
● Segmentation
○ 3 x Unet(EfficientNet-B3)
○ Public kernel models
● Defect blackout augmentation
● 2 rounds of pseudolabels
[Figure: example images labeled “1” and “0”]

What if “1st place solution”
● Score 0.91854 / 0.91023 (best submission)
● Classification + Segmentation
● Custom merging approach:
1. When the classifier output is high (fault), we leave the pixel thresholds at their normal level.
2. When the classifier output is low (no fault), we raise the pixel threshold by some factor.
● Pseudolabels
● RMS averaging cost 1st place
● Grayscale images
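The custom merging rule and RMS averaging from this list can be sketched as follows. The slide does not give the cut-off between high and low classifier output, the base threshold or the raising factor, so `cls_cutoff`, `base` and `factor` are hypothetical parameters. RMS averaging is taken to mean the square root of the mean squared probability across models. That is the common reading, but the slide does not define it.

```python
import numpy as np

def merge_with_classifier(mask_prob, cls_prob, base=0.5, cls_cutoff=0.5, factor=1.5):
    """Binarize an (H, W) probability map with a classifier-dependent pixel threshold."""
    if cls_prob >= cls_cutoff:
        threshold = base                     # fault expected: normal pixel threshold
    else:
        threshold = min(base * factor, 1.0)  # no fault expected: stricter threshold
    return mask_prob > threshold

def rms_average(probs):
    """RMS ensembling over axis 0 (models): sqrt of the mean squared probability."""
    return np.sqrt(np.mean(np.square(probs), axis=0))

mask_prob = np.array([[0.6, 0.8]])
print(merge_with_classifier(mask_prob, cls_prob=0.9))  # [[ True  True]]
print(merge_with_classifier(mask_prob, cls_prob=0.1))  # [[False  True]]
print(rms_average(np.array([0.2, 0.8])), np.mean([0.2, 0.8]))  # RMS about 0.583 vs mean 0.5
```

RMS is never smaller than the plain mean, so it pushes averaged probabilities up. In the table below, the plain mean scores better on the private LB.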
Averaging technique | Public LB | Private LB
RMS | 0.91844 | 0.90274
Mean | 0.91699 | 0.90975

4th place solution
● Two heads: segmentation + classification
● Inference soft gating: mask.sigmoid() * classifier.sigmoid()
● FPN, Unet
○ densenet201
○ efficientnetb5
○ resnet34
○ seresnext50
● Focal loss / BCE + Dice loss
● Training on grayscale images
● Catalyst
● FP16 (Apex)

Efficiency round

Useful Tools
[PyTorch] https://github.com/qubvel/segmentation_models.pytorch
[PyTorch] https://github.com/qubvel/ttach
[Keras] https://github.com/qubvel/segmentation_models
[Keras] https://github.com/qubvel/efficientnet
[Keras] https://github.com/qubvel/tta_wrapper
[Everywhere] https://github.com/albu/albumentations

Useful Papers
Gradient accumulation: https://medium.com/huggingface/training-larger-batches-practical-tips-on-1-gpu-multi-gpu-distributed-setups-ec88c3e51255
Stochastic Weight Averaging: https://towardsdatascience.com/stochastic-weight-averaging-a-new-way-to-get-state-of-the-art-results-in-deep-learning-c639ccf36a
Pseudolabels implementation: https://arxiv.org/abs/1904.04445

Contacts
Ilya Dobrynin: @llyaDo, @IlyaDo, iliyadobrynin@yandex.ru
Pavel Yakubovskiy: @qubvel
Denis Kolpakov: @kolpakovd, @kolpakovdenis