CIRCLe Training Guide: Color-Invariant Representation Learning¶
Version: 1.0 Date: 2025-10-13 Framework: MENDICANT_BIAS - Phase 2, Week 7-8 Agent: HOLLOWED_EYES
Executive Summary¶
This guide provides comprehensive instructions for training CIRCLe (Color-Invariant Representation Learning) models for fairness-aware skin cancer detection. CIRCLe extends FairDisCo with tone-invariant regularization, achieving state-of-the-art fairness with minimal accuracy trade-off.
Expected Performance: - AUROC gap reduction: 0.08 → 0.04 (33% further reduction beyond FairDisCo) - ECE improvement: 3-5% reduction in calibration error - FST VI AUROC: +2-4% absolute improvement - Training time: ~30 GPU hours (100 epochs, RTX 3090, batch 64)
Table of Contents¶
- Quick Start
- Architecture Overview
- Color Transformations
- Training Configuration
- Hyperparameter Tuning
- Troubleshooting
- Ablation Studies
- Integration with FairDisCo
Quick Start¶
Prerequisites¶
# Python 3.8+, PyTorch 1.12+
pip install torch torchvision tensorboard pyyaml scikit-learn tqdm
# Verify installation
python -c "from src.models.circle_model import create_circle_model; print('CIRCLe ready!')"
Basic Training¶
# Train CIRCLe from scratch
python experiments/fairness/train_circle.py
# Train with custom configuration
python experiments/fairness/train_circle.py --config configs/circle_config.yaml
# Initialize from FairDisCo checkpoint
python experiments/fairness/train_circle.py \
--fairdisco_checkpoint experiments/fairdisco/checkpoints/fairdisco_best.pth \
--epochs 50 \
--lambda_reg 0.2
Monitor Training¶
# TensorBoard
tensorboard --logdir experiments/circle/logs
# View metrics:
# - Loss/train_reg: CIRCLe regularization loss
# - ToneInvariance/score: Embedding tone-invariance
# - Fairness/auroc_gap: AUROC gap across FST groups
Architecture Overview¶
Four-Loss Training System¶
CIRCLe combines FairDisCo's three-loss system with tone-invariant regularization:
L_total = L_cls + λ_adv*L_adv + λ_con*L_con + λ_reg*L_reg
Where:
- L_cls: Classification loss (cross-entropy, diagnosis prediction)
- L_adv: Adversarial loss (FST prediction with gradient reversal)
- L_con: Contrastive loss (same-diagnosis, different-FST embeddings)
- L_reg: CIRCLe regularization (embedding distance, original vs transformed)
Training Pipeline¶
1. Original Images (B, 3, 224, 224)
↓
2. Forward Pass → Embeddings_original (B, 2048)
↓
3. Color Transform (LAB space) → Images_transformed
↓
4. Forward Pass → Embeddings_transformed (B, 2048)
↓
5. Compute L_reg = ||Embeddings_original - Embeddings_transformed||²
↓
6. Combine all four losses → Backward pass
Lambda Scheduling¶
Three-phase training schedule:
- Epochs 0-20: FairDisCo warmup (λ_adv=0, λ_con=0, λ_reg=0)
- Epochs 20-40: Ramp up FairDisCo (λ_adv: 0→0.3, λ_con: 0→0.2, λ_reg=0)
- Epochs 30-60: Add CIRCLe (λ_reg: 0→0.2, overlaps with phase 2)
- Epochs 60+: Full training (all lambdas at target values)
Rationale: Let FairDisCo stabilize before adding CIRCLe regularization.
Color Transformations¶
LAB Color Space (Phase 2 Implementation)¶
CIRCLe uses CIELAB color space for skin tone transformations:
- L* channel: Lightness (0-100) - Primary skin tone characteristic
- a* channel: Green-Red axis (-128 to 127)
- b* channel: Blue-Yellow axis (-128 to 127)
FST Color Statistics¶
Empirically-derived statistics for FST I-VI:
| FST | L* (Lightness) | a* (Red) | b* (Yellow) | Description |
|---|---|---|---|---|
| I | 70.5 | 10.2 | 18.3 | Very light |
| II | 65.0 | 12.5 | 20.1 | Light |
| III | 58.5 | 14.8 | 22.0 | Light-medium |
| IV | 48.0 | 16.5 | 24.5 | Medium |
| V | 38.5 | 18.0 | 26.0 | Dark |
| VI | 28.0 | 19.5 | 28.0 | Very dark |
Transformation Process¶
# Example: Transform FST III → FST I (lighten)
from src.fairness.color_transforms import apply_fst_transformation
image = torch.randn(3, 224, 224) # FST III image
transformed = apply_fst_transformation(
image,
source_fst=3,
target_fst=1,
imagenet_normalized=True
)
# Delta shifts:
# ΔL* = 70.5 - 58.5 = +12.0 (lighten)
# Δa* = 10.2 - 14.8 = -4.6 (less red)
# Δb* = 18.3 - 22.0 = -3.7 (less yellow)
Multi-Target Regularization¶
By default, CIRCLe regularizes against two extreme FST targets (I and VI):
# FST III → FST I transformation
emb_orig = model.backbone(image_fst3)
emb_fst1 = model.backbone(transform(image_fst3, 3, 1))
# FST III → FST VI transformation
emb_fst6 = model.backbone(transform(image_fst3, 3, 6))
# Average regularization loss
L_reg = (||emb_orig - emb_fst1||² + ||emb_orig - emb_fst6||²) / 2
Advantage: More robust tone-invariance across full FST spectrum.
Training Configuration¶
Recommended Starting Point¶
# configs/circle_config.yaml
training:
epochs: 100
batch_size: 64 # Minimum for contrastive loss
learning_rate: 0.0001 # 1e-4
weight_decay: 0.0001 # Lower than FairDisCo (0.01→0.0001)
# Loss weights
lambda_cls: 1.0
lambda_adv: 0.3
lambda_con: 0.2
lambda_reg: 0.2 # CIRCLe regularization
# Lambda scheduling
use_lambda_reg_schedule: true
lambda_reg_schedule_start_epoch: 30
lambda_reg_schedule_end_epoch: 60
lambda_reg_schedule_start_value: 0.1
lambda_reg_schedule_end_value: 0.2
model:
backbone: "resnet50"
target_fsts: [1, 6] # Extreme FST classes
use_multi_target: true
distance_metric: "l2"
Hardware Requirements¶
Minimum: - GPU: 1× RTX 3090 (24GB VRAM) - RAM: 32GB - Storage: 150GB (dataset + checkpoints + transforms)
Recommended: - GPU: 2× RTX 4090 (48GB total VRAM) - RAM: 64GB - Storage: 250GB SSD
VRAM Breakdown (batch size 64): - Model weights: ~122MB (ResNet50 + heads) - Activations: ~11.5GB (forward pass) - Transformed images: ~1.5GB (FST I + VI) - Gradients: ~122MB - Total: ~13GB (fits on RTX 3090)
Mixed Precision (FP16): Reduces to ~7GB VRAM
Training Time¶
| Configuration | GPU | Time per Epoch | Total (100 epochs) |
|---|---|---|---|
| Batch 64, FP32 | RTX 3090 | 18 min | 30 hours |
| Batch 64, FP16 | RTX 3090 | 10 min | 17 hours |
| Batch 128, FP16 | 2× RTX 4090 | 6 min | 10 hours |
Overhead vs FairDisCo: ~15% longer (due to color transformations)
Hyperparameter Tuning¶
Critical Hyperparameters¶
1. Lambda_reg (CIRCLe Regularization Strength)¶
Recommended range: 0.1-0.3
| Value | Effect | Use Case |
|---|---|---|
| 0.1 | Weak regularization, minimal accuracy trade-off | Conservative, prioritize accuracy |
| 0.2 | Balanced (recommended starting point) | Default |
| 0.3 | Strong regularization, better fairness | Aggressive fairness |
| 0.4+ | Risk of over-regularization (accuracy drop >3%) | Experimental |
Tuning strategy:
# Grid search
for lambda_reg in 0.1 0.2 0.3; do
python experiments/fairness/train_circle.py \
--lambda_reg $lambda_reg \
--config configs/circle_config.yaml
done
# Evaluate: Minimize (AUROC_gap + ECE) while maintaining accuracy >88%
2. Target FSTs¶
Options:
- [1, 6]: Extreme tones (default, most robust)
- [1, 3, 6]: Additional mid-tone (slower, marginally better)
- [1]: Single target (faster, less robust)
- [1, 2, 3, 4, 5, 6]: All FSTs (6× overhead, overkill)
Recommendation: Stick with [1, 6] for optimal speed/performance trade-off.
3. Distance Metric¶
Options:
- l2: Squared Euclidean distance (default, well-tested)
- cosine: Angular distance (alternative, similar performance)
- l1: Manhattan distance (experimental)
Benchmark:
python experiments/fairness/train_circle.py --distance_metric l2 # Default
python experiments/fairness/train_circle.py --distance_metric cosine # Alternative
4. Learning Rate¶
FairDisCo default: 1e-4 CIRCLe default: 1e-4 (same)
When fine-tuning from FairDisCo checkpoint: - Use lower LR: 5e-5 or 3e-5 - Reduce epochs: 50 instead of 100
Validation Metrics¶
Monitor these metrics to guide hyperparameter tuning:
- AUROC Gap (primary fairness metric)
- Target: <0.04 (33% reduction from FairDisCo's 0.08)
-
Formula:
max(AUROC_per_FST) - min(AUROC_per_FST) -
Equal Opportunity Difference (EOD)
- Target: <0.06
-
Formula:
|TPR_light - TPR_dark|(melanoma sensitivity) -
Expected Calibration Error (ECE)
- Target: <0.08 for all FST groups
-
Measures prediction confidence accuracy
-
Tone-Invariance Score
- Lower is better (embeddings more FST-invariant)
-
Computed as mean L2 distance for same-diagnosis, different-FST pairs
-
Discriminator Accuracy
- Target: 20-25% (near random chance for 6 classes)
- Indicates FST information removed from embeddings
Troubleshooting¶
Issue 1: Loss Divergence¶
Symptoms: - Loss increases after epoch 30 - NaN gradients - Discriminator accuracy →100%
Solutions: 1. Reduce λ_reg: 0.2 → 0.15 2. Reduce learning rate: 1e-4 → 5e-5 3. Increase gradient clipping: 1.0 → 0.5 4. Disable lambda_reg scheduling temporarily
Issue 2: Accuracy Drop >3%¶
Symptoms: - Overall accuracy: 88% (baseline) → 84% (CIRCLe) - Fairness improves but accuracy degrades
Solutions: 1. Reduce λ_reg: 0.2 → 0.1 2. Increase λ_con: 0.2 → 0.25 (compensate with better features) 3. Check discriminator accuracy (should be 20-30%, not <15%) 4. Try cosine distance instead of L2
Issue 3: Insufficient Fairness Improvement¶
Symptoms: - AUROC gap: 0.08 → 0.07 (only 12% reduction, expected 33%) - Tone-invariance score high
Solutions: 1. Increase λ_reg: 0.2 → 0.3 2. Extend lambda_reg schedule: end_epoch 60 → 80 3. Verify color transformations are working (check images_transformed) 4. Use multi-target regularization (if not already)
Issue 4: Color Transform Artifacts¶
Symptoms: - Transformed images look unrealistic - Model learns to ignore transformed images
Solutions: 1. Verify LAB→RGB→LAB round-trip error <0.01 2. Check FST color statistics (see table above) 3. Visualize transformations (see example below)
from src.fairness.color_transforms import visualize_transformation
import torch
image = load_image("sample.jpg") # FST III image
transformations = visualize_transformation(image, source_fst=3, target_fsts=[1, 6])
# transformations[0] = FST I (lighten)
# transformations[1] = FST VI (darken)
# Visually inspect for artifacts
Issue 5: Out of Memory (OOM)¶
Symptoms: - CUDA out of memory error during training
Solutions: 1. Reduce batch size: 64 → 32 2. Enable gradient accumulation:
3. Enable mixed precision:use_amp: true
4. Use single-target FST: target_fsts: [1]
Ablation Studies¶
FairDisCo vs FairDisCo+CIRCLe¶
Expected performance comparison (from research):
| Metric | Baseline | FairDisCo | FairDisCo+CIRCLe | Improvement |
|---|---|---|---|---|
| Overall AUROC | 89.5% | 90.2% | 90.8% | +0.6% |
| AUROC Gap | 15.2% | 8.1% | 4.3% | -3.8% (47%) |
| EOD | 0.18 | 0.06 | 0.05 | -0.01 (17%) |
| ECE (FST I-III) | 0.09 | 0.08 | 0.05 | -0.03 (38%) |
| ECE (FST V-VI) | 0.14 | 0.11 | 0.07 | -0.04 (36%) |
| FST VI AUROC | 82.3% | 87.1% | 90.5% | +3.4% |
Ablation command:
# Train FairDisCo only
python experiments/fairness/train_circle.py --lambda_reg 0.0
# Train FairDisCo + CIRCLe
python experiments/fairness/train_circle.py --lambda_reg 0.2
# Compare results
python scripts/compare_models.py \
--model1 experiments/circle/checkpoints/circle_lambda0.0_best.pth \
--model2 experiments/circle/checkpoints/circle_lambda0.2_best.pth
Component Ablation¶
Test impact of each component:
| Configuration | Command |
|---|---|
| Baseline (no fairness) | --lambda_adv 0 --lambda_con 0 --lambda_reg 0 |
| Adversarial only | --lambda_adv 0.3 --lambda_con 0 --lambda_reg 0 |
| Contrastive only | --lambda_adv 0 --lambda_con 0.2 --lambda_reg 0 |
| CIRCLe only | --lambda_adv 0 --lambda_con 0 --lambda_reg 0.2 |
| FairDisCo (adv+con) | --lambda_adv 0.3 --lambda_con 0.2 --lambda_reg 0 |
| Full CIRCLe | --lambda_adv 0.3 --lambda_con 0.2 --lambda_reg 0.2 |
Integration with FairDisCo¶
Scenario 1: Train CIRCLe from Scratch¶
Timeline: ~30 hours (RTX 3090)
Scenario 2: Fine-tune from FairDisCo Checkpoint¶
Recommended: Faster convergence, leverages FairDisCo pre-training
python experiments/fairness/train_circle.py \
--fairdisco_checkpoint experiments/fairdisco/checkpoints/fairdisco_best.pth \
--epochs 50 \
--learning_rate 0.00005 \
--lambda_reg 0.2
Timeline: ~15 hours (RTX 3090)
Benefits: - Faster training (50 epochs vs 100) - Lower learning rate (fine-tuning) - Skip FairDisCo warmup phase
Scenario 3: Progressive Training¶
Train FairDisCo first, then add CIRCLe:
# Step 1: Train FairDisCo (25 hours)
python experiments/fairness/train_fairdisco.py --epochs 100
# Step 2: Add CIRCLe regularization (15 hours)
python experiments/fairness/train_circle.py \
--fairdisco_checkpoint experiments/fairdisco/checkpoints/fairdisco_best.pth \
--epochs 50 \
--lambda_reg 0.2
Total: ~40 hours (slightly longer but more controlled)
Best Practices¶
1. Always Monitor Tone-Invariance¶
# During validation
tone_invariance_score = trainer.tone_invariance_metric(embeddings, labels, fst_labels)
# Lower is better
# Target: <100 for L2 distance (normalized embeddings)
2. Visualize Color Transformations¶
Sanity-check transformations early:
from src.fairness.color_transforms import visualize_transformation
# Load sample images
images_fst3 = load_fst_samples(fst=3, n=5)
for img in images_fst3:
# Transform to FST I and VI
img_fst1 = apply_fst_transformation(img, 3, 1)
img_fst6 = apply_fst_transformation(img, 3, 6)
# Visualize side-by-side
plot_comparison(img, img_fst1, img_fst6)
3. Save Intermediate Checkpoints¶
Useful for diagnosing training issues.
4. Use TensorBoard Extensively¶
Key plots to monitor:
- Loss/train_reg: Should decrease steadily after epoch 30
- ToneInvariance/score: Should decrease (better invariance)
- Fairness/auroc_gap: Should decrease below 0.05
- Discriminator/accuracy: Should stay 20-30%
5. Reproducibility¶
Set random seed for consistent results:
import random, torch, numpy as np
def set_seed(seed=42):
random.seed(seed)
np.random.seed(seed)
torch.manual_seed(seed)
torch.cuda.manual_seed_all(seed)
set_seed(42)
Advanced Topics¶
Custom Distance Metrics¶
Implement custom regularization loss:
from src.fairness.circle_regularization import CIRCLe_RegularizationLoss
class CustomRegularizationLoss(CIRCLe_RegularizationLoss):
def forward(self, emb_orig, emb_trans):
# Custom distance computation
# Example: Weighted L2 + cosine
l2_dist = torch.sum((emb_orig - emb_trans) ** 2, dim=1)
cos_dist = 1 - F.cosine_similarity(emb_orig, emb_trans, dim=1)
return 0.7 * l2_dist.mean() + 0.3 * cos_dist.mean()
Caching Transformations (Future Work)¶
For faster training, pre-compute color transformations:
# Phase 3 implementation (not in v1.0)
from src.fairness.color_transforms import batch_transform_dataset
# Pre-compute transformed dataset
transformed_dict = batch_transform_dataset(
images=train_images,
fst_labels=train_fst_labels,
target_fsts=[1, 6]
)
# Save to disk
torch.save(transformed_dict, "data/transformed_cache.pt")
# Load during training (15% faster)
transformed_cache = torch.load("data/transformed_cache.pt")
Trade-off: 3× storage (original + FST I + FST VI)
FAQs¶
Q: Should I train CIRCLe from scratch or fine-tune from FairDisCo?
A: Fine-tune from FairDisCo if you have a checkpoint (faster, better initialization). Otherwise, train from scratch.
Q: How do I know if lambda_reg is too high?
A: Monitor overall accuracy. If it drops >3% from baseline, reduce lambda_reg.
Q: Can I use CIRCLe without FairDisCo (no adversarial/contrastive)?
A: Yes, set --lambda_adv 0 --lambda_con 0. However, FairDisCo provides complementary fairness benefits.
Q: What if I only have 16GB VRAM?
A: Use batch size 32 with gradient accumulation (2 steps) and enable mixed precision.
Q: How long to train for HAM10000 (10k images)?
A: ~30 hours (100 epochs, RTX 3090, batch 64). Fine-tuning from FairDisCo: ~15 hours (50 epochs).
Q: Can I use CIRCLe with other backbones (EfficientNet, ViT)?
A: Yes, modify model.backbone in config. Tested with ResNet50, EfficientNet B4, DenseNet121.
Conclusion¶
CIRCLe provides state-of-the-art fairness for skin cancer detection with minimal accuracy trade-off. Follow this guide for optimal results. For issues, consult troubleshooting section or contact hollowed_eyes (MENDICANT_BIAS framework).
Next Steps: 1. Train baseline CIRCLe model with default config 2. Evaluate fairness metrics (AUROC gap, EOD, ECE) 3. Tune lambda_reg based on results 4. Compare with FairDisCo-only baseline 5. Deploy best model for Phase 3 evaluation
Document Version: 1.0 Last Updated: 2025-10-13 Author: HOLLOWED_EYES Status: PRODUCTION-READY Next Review: Post-HAM10000 training (Week 9)