Skip to content

CIRCLe Training Guide: Color-Invariant Representation Learning

Version: 1.0 Date: 2025-10-13 Framework: MENDICANT_BIAS - Phase 2, Week 7-8 Agent: HOLLOWED_EYES


Executive Summary

This guide provides comprehensive instructions for training CIRCLe (Color-Invariant Representation Learning) models for fairness-aware skin cancer detection. CIRCLe extends FairDisCo with tone-invariant regularization, achieving state-of-the-art fairness with minimal accuracy trade-off.

Expected Performance: - AUROC gap reduction: 0.08 → 0.04 (33% further reduction beyond FairDisCo) - ECE improvement: 3-5% reduction in calibration error - FST VI AUROC: +2-4% absolute improvement - Training time: ~30 GPU hours (100 epochs, RTX 3090, batch 64)


Table of Contents

  1. Quick Start
  2. Architecture Overview
  3. Color Transformations
  4. Training Configuration
  5. Hyperparameter Tuning
  6. Troubleshooting
  7. Ablation Studies
  8. Integration with FairDisCo

Quick Start

Prerequisites

# Python 3.8+, PyTorch 1.12+
pip install torch torchvision tensorboard pyyaml scikit-learn tqdm

# Verify installation
python -c "from src.models.circle_model import create_circle_model; print('CIRCLe ready!')"

Basic Training

# Train CIRCLe from scratch
python experiments/fairness/train_circle.py

# Train with custom configuration
python experiments/fairness/train_circle.py --config configs/circle_config.yaml

# Initialize from FairDisCo checkpoint
python experiments/fairness/train_circle.py \
    --fairdisco_checkpoint experiments/fairdisco/checkpoints/fairdisco_best.pth \
    --epochs 50 \
    --lambda_reg 0.2

Monitor Training

# TensorBoard
tensorboard --logdir experiments/circle/logs

# View metrics:
# - Loss/train_reg: CIRCLe regularization loss
# - ToneInvariance/score: Embedding tone-invariance
# - Fairness/auroc_gap: AUROC gap across FST groups

Architecture Overview

Four-Loss Training System

CIRCLe combines FairDisCo's three-loss system with tone-invariant regularization:

L_total = L_cls + λ_adv*L_adv + λ_con*L_con + λ_reg*L_reg

Where:
- L_cls: Classification loss (cross-entropy, diagnosis prediction)
- L_adv: Adversarial loss (FST prediction with gradient reversal)
- L_con: Contrastive loss (same-diagnosis, different-FST embeddings)
- L_reg: CIRCLe regularization (embedding distance, original vs transformed)

Training Pipeline

1. Original Images (B, 3, 224, 224)
2. Forward Pass → Embeddings_original (B, 2048)
3. Color Transform (LAB space) → Images_transformed
4. Forward Pass → Embeddings_transformed (B, 2048)
5. Compute L_reg = ||Embeddings_original - Embeddings_transformed||²
6. Combine all four losses → Backward pass

Lambda Scheduling

Three-phase training schedule:

  • Epochs 0-20: FairDisCo warmup (λ_adv=0, λ_con=0, λ_reg=0)
  • Epochs 20-40: Ramp up FairDisCo (λ_adv: 0→0.3, λ_con: 0→0.2, λ_reg=0)
  • Epochs 30-60: Add CIRCLe (λ_reg: 0→0.2, overlaps with phase 2)
  • Epochs 60+: Full training (all lambdas at target values)

Rationale: Let FairDisCo stabilize before adding CIRCLe regularization.


Color Transformations

LAB Color Space (Phase 2 Implementation)

CIRCLe uses CIELAB color space for skin tone transformations:

  • L* channel: Lightness (0-100) - Primary skin tone characteristic
  • a* channel: Green-Red axis (-128 to 127)
  • b* channel: Blue-Yellow axis (-128 to 127)

FST Color Statistics

Empirically-derived statistics for FST I-VI:

FST L* (Lightness) a* (Red) b* (Yellow) Description
I 70.5 10.2 18.3 Very light
II 65.0 12.5 20.1 Light
III 58.5 14.8 22.0 Light-medium
IV 48.0 16.5 24.5 Medium
V 38.5 18.0 26.0 Dark
VI 28.0 19.5 28.0 Very dark

Transformation Process

# Example: Transform FST III → FST I (lighten)
from src.fairness.color_transforms import apply_fst_transformation

image = torch.randn(3, 224, 224)  # FST III image
transformed = apply_fst_transformation(
    image,
    source_fst=3,
    target_fst=1,
    imagenet_normalized=True
)

# Delta shifts:
# ΔL* = 70.5 - 58.5 = +12.0 (lighten)
# Δa* = 10.2 - 14.8 = -4.6 (less red)
# Δb* = 18.3 - 22.0 = -3.7 (less yellow)

Multi-Target Regularization

By default, CIRCLe regularizes against two extreme FST targets (I and VI):

# FST III → FST I transformation
emb_orig = model.backbone(image_fst3)
emb_fst1 = model.backbone(transform(image_fst3, 3, 1))

# FST III → FST VI transformation
emb_fst6 = model.backbone(transform(image_fst3, 3, 6))

# Average regularization loss
L_reg = (||emb_orig - emb_fst1||² + ||emb_orig - emb_fst6||²) / 2

Advantage: More robust tone-invariance across full FST spectrum.


Training Configuration

# configs/circle_config.yaml

training:
  epochs: 100
  batch_size: 64  # Minimum for contrastive loss
  learning_rate: 0.0001  # 1e-4
  weight_decay: 0.0001  # Lower than FairDisCo (0.01→0.0001)

  # Loss weights
  lambda_cls: 1.0
  lambda_adv: 0.3
  lambda_con: 0.2
  lambda_reg: 0.2  # CIRCLe regularization

  # Lambda scheduling
  use_lambda_reg_schedule: true
  lambda_reg_schedule_start_epoch: 30
  lambda_reg_schedule_end_epoch: 60
  lambda_reg_schedule_start_value: 0.1
  lambda_reg_schedule_end_value: 0.2

model:
  backbone: "resnet50"
  target_fsts: [1, 6]  # Extreme FST classes
  use_multi_target: true
  distance_metric: "l2"

Hardware Requirements

Minimum: - GPU: 1× RTX 3090 (24GB VRAM) - RAM: 32GB - Storage: 150GB (dataset + checkpoints + transforms)

Recommended: - GPU: 2× RTX 4090 (48GB total VRAM) - RAM: 64GB - Storage: 250GB SSD

VRAM Breakdown (batch size 64): - Model weights: ~122MB (ResNet50 + heads) - Activations: ~11.5GB (forward pass) - Transformed images: ~1.5GB (FST I + VI) - Gradients: ~122MB - Total: ~13GB (fits on RTX 3090)

Mixed Precision (FP16): Reduces to ~7GB VRAM

Training Time

Configuration GPU Time per Epoch Total (100 epochs)
Batch 64, FP32 RTX 3090 18 min 30 hours
Batch 64, FP16 RTX 3090 10 min 17 hours
Batch 128, FP16 2× RTX 4090 6 min 10 hours

Overhead vs FairDisCo: ~15% longer (due to color transformations)


Hyperparameter Tuning

Critical Hyperparameters

1. Lambda_reg (CIRCLe Regularization Strength)

Recommended range: 0.1-0.3

Value Effect Use Case
0.1 Weak regularization, minimal accuracy trade-off Conservative, prioritize accuracy
0.2 Balanced (recommended starting point) Default
0.3 Strong regularization, better fairness Aggressive fairness
0.4+ Risk of over-regularization (accuracy drop >3%) Experimental

Tuning strategy:

# Grid search
for lambda_reg in 0.1 0.2 0.3; do
    python experiments/fairness/train_circle.py \
        --lambda_reg $lambda_reg \
        --config configs/circle_config.yaml
done

# Evaluate: Minimize (AUROC_gap + ECE) while maintaining accuracy >88%

2. Target FSTs

Options: - [1, 6]: Extreme tones (default, most robust) - [1, 3, 6]: Additional mid-tone (slower, marginally better) - [1]: Single target (faster, less robust) - [1, 2, 3, 4, 5, 6]: All FSTs (6× overhead, overkill)

Recommendation: Stick with [1, 6] for optimal speed/performance trade-off.

3. Distance Metric

Options: - l2: Squared Euclidean distance (default, well-tested) - cosine: Angular distance (alternative, similar performance) - l1: Manhattan distance (experimental)

Benchmark:

python experiments/fairness/train_circle.py --distance_metric l2  # Default
python experiments/fairness/train_circle.py --distance_metric cosine  # Alternative

4. Learning Rate

FairDisCo default: 1e-4 CIRCLe default: 1e-4 (same)

When fine-tuning from FairDisCo checkpoint: - Use lower LR: 5e-5 or 3e-5 - Reduce epochs: 50 instead of 100

Validation Metrics

Monitor these metrics to guide hyperparameter tuning:

  1. AUROC Gap (primary fairness metric)
  2. Target: <0.04 (33% reduction from FairDisCo's 0.08)
  3. Formula: max(AUROC_per_FST) - min(AUROC_per_FST)

  4. Equal Opportunity Difference (EOD)

  5. Target: <0.06
  6. Formula: |TPR_light - TPR_dark| (melanoma sensitivity)

  7. Expected Calibration Error (ECE)

  8. Target: <0.08 for all FST groups
  9. Measures prediction confidence accuracy

  10. Tone-Invariance Score

  11. Lower is better (embeddings more FST-invariant)
  12. Computed as mean L2 distance for same-diagnosis, different-FST pairs

  13. Discriminator Accuracy

  14. Target: 20-25% (near random chance for 6 classes)
  15. Indicates FST information removed from embeddings

Troubleshooting

Issue 1: Loss Divergence

Symptoms: - Loss increases after epoch 30 - NaN gradients - Discriminator accuracy →100%

Solutions: 1. Reduce λ_reg: 0.2 → 0.15 2. Reduce learning rate: 1e-4 → 5e-5 3. Increase gradient clipping: 1.0 → 0.5 4. Disable lambda_reg scheduling temporarily

training:
  lambda_reg: 0.15
  gradient_clip_norm: 0.5
  use_lambda_reg_schedule: false

Issue 2: Accuracy Drop >3%

Symptoms: - Overall accuracy: 88% (baseline) → 84% (CIRCLe) - Fairness improves but accuracy degrades

Solutions: 1. Reduce λ_reg: 0.2 → 0.1 2. Increase λ_con: 0.2 → 0.25 (compensate with better features) 3. Check discriminator accuracy (should be 20-30%, not <15%) 4. Try cosine distance instead of L2

python experiments/fairness/train_circle.py --lambda_reg 0.1 --lambda_con 0.25

Issue 3: Insufficient Fairness Improvement

Symptoms: - AUROC gap: 0.08 → 0.07 (only 12% reduction, expected 33%) - Tone-invariance score high

Solutions: 1. Increase λ_reg: 0.2 → 0.3 2. Extend lambda_reg schedule: end_epoch 60 → 80 3. Verify color transformations are working (check images_transformed) 4. Use multi-target regularization (if not already)

training:
  lambda_reg: 0.3
  lambda_reg_schedule_end_epoch: 80

Issue 4: Color Transform Artifacts

Symptoms: - Transformed images look unrealistic - Model learns to ignore transformed images

Solutions: 1. Verify LAB→RGB→LAB round-trip error <0.01 2. Check FST color statistics (see table above) 3. Visualize transformations (see example below)

from src.fairness.color_transforms import visualize_transformation
import torch

image = load_image("sample.jpg")  # FST III image
transformations = visualize_transformation(image, source_fst=3, target_fsts=[1, 6])

# transformations[0] = FST I (lighten)
# transformations[1] = FST VI (darken)
# Visually inspect for artifacts

Issue 5: Out of Memory (OOM)

Symptoms: - CUDA out of memory error during training

Solutions: 1. Reduce batch size: 64 → 32 2. Enable gradient accumulation:

training:
  batch_size: 32
  gradient_accumulation_steps: 2  # Effective batch size = 64
3. Enable mixed precision: use_amp: true 4. Use single-target FST: target_fsts: [1]


Ablation Studies

FairDisCo vs FairDisCo+CIRCLe

Expected performance comparison (from research):

Metric Baseline FairDisCo FairDisCo+CIRCLe Improvement
Overall AUROC 89.5% 90.2% 90.8% +0.6%
AUROC Gap 15.2% 8.1% 4.3% -3.8% (47%)
EOD 0.18 0.06 0.05 -0.01 (17%)
ECE (FST I-III) 0.09 0.08 0.05 -0.03 (38%)
ECE (FST V-VI) 0.14 0.11 0.07 -0.04 (36%)
FST VI AUROC 82.3% 87.1% 90.5% +3.4%

Ablation command:

# Train FairDisCo only
python experiments/fairness/train_circle.py --lambda_reg 0.0

# Train FairDisCo + CIRCLe
python experiments/fairness/train_circle.py --lambda_reg 0.2

# Compare results
python scripts/compare_models.py \
    --model1 experiments/circle/checkpoints/circle_lambda0.0_best.pth \
    --model2 experiments/circle/checkpoints/circle_lambda0.2_best.pth

Component Ablation

Test impact of each component:

Configuration Command
Baseline (no fairness) --lambda_adv 0 --lambda_con 0 --lambda_reg 0
Adversarial only --lambda_adv 0.3 --lambda_con 0 --lambda_reg 0
Contrastive only --lambda_adv 0 --lambda_con 0.2 --lambda_reg 0
CIRCLe only --lambda_adv 0 --lambda_con 0 --lambda_reg 0.2
FairDisCo (adv+con) --lambda_adv 0.3 --lambda_con 0.2 --lambda_reg 0
Full CIRCLe --lambda_adv 0.3 --lambda_con 0.2 --lambda_reg 0.2

Integration with FairDisCo

Scenario 1: Train CIRCLe from Scratch

python experiments/fairness/train_circle.py \
    --config configs/circle_config.yaml \
    --epochs 100

Timeline: ~30 hours (RTX 3090)

Scenario 2: Fine-tune from FairDisCo Checkpoint

Recommended: Faster convergence, leverages FairDisCo pre-training

python experiments/fairness/train_circle.py \
    --fairdisco_checkpoint experiments/fairdisco/checkpoints/fairdisco_best.pth \
    --epochs 50 \
    --learning_rate 0.00005 \
    --lambda_reg 0.2

Timeline: ~15 hours (RTX 3090)

Benefits: - Faster training (50 epochs vs 100) - Lower learning rate (fine-tuning) - Skip FairDisCo warmup phase

Scenario 3: Progressive Training

Train FairDisCo first, then add CIRCLe:

# Step 1: Train FairDisCo (25 hours)
python experiments/fairness/train_fairdisco.py --epochs 100

# Step 2: Add CIRCLe regularization (15 hours)
python experiments/fairness/train_circle.py \
    --fairdisco_checkpoint experiments/fairdisco/checkpoints/fairdisco_best.pth \
    --epochs 50 \
    --lambda_reg 0.2

Total: ~40 hours (slightly longer but more controlled)


Best Practices

1. Always Monitor Tone-Invariance

# During validation
tone_invariance_score = trainer.tone_invariance_metric(embeddings, labels, fst_labels)

# Lower is better
# Target: <100 for L2 distance (normalized embeddings)

2. Visualize Color Transformations

Sanity-check transformations early:

from src.fairness.color_transforms import visualize_transformation

# Load sample images
images_fst3 = load_fst_samples(fst=3, n=5)

for img in images_fst3:
    # Transform to FST I and VI
    img_fst1 = apply_fst_transformation(img, 3, 1)
    img_fst6 = apply_fst_transformation(img, 3, 6)

    # Visualize side-by-side
    plot_comparison(img, img_fst1, img_fst6)

3. Save Intermediate Checkpoints

checkpointing:
  save_best_only: false  # Save all checkpoints
  save_frequency: 10  # Every 10 epochs

Useful for diagnosing training issues.

4. Use TensorBoard Extensively

Key plots to monitor: - Loss/train_reg: Should decrease steadily after epoch 30 - ToneInvariance/score: Should decrease (better invariance) - Fairness/auroc_gap: Should decrease below 0.05 - Discriminator/accuracy: Should stay 20-30%

5. Reproducibility

Set random seed for consistent results:

import random, torch, numpy as np

def set_seed(seed=42):
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)

set_seed(42)

Advanced Topics

Custom Distance Metrics

Implement custom regularization loss:

from src.fairness.circle_regularization import CIRCLe_RegularizationLoss

class CustomRegularizationLoss(CIRCLe_RegularizationLoss):
    def forward(self, emb_orig, emb_trans):
        # Custom distance computation
        # Example: Weighted L2 + cosine
        l2_dist = torch.sum((emb_orig - emb_trans) ** 2, dim=1)
        cos_dist = 1 - F.cosine_similarity(emb_orig, emb_trans, dim=1)
        return 0.7 * l2_dist.mean() + 0.3 * cos_dist.mean()

Caching Transformations (Future Work)

For faster training, pre-compute color transformations:

# Phase 3 implementation (not in v1.0)
from src.fairness.color_transforms import batch_transform_dataset

# Pre-compute transformed dataset
transformed_dict = batch_transform_dataset(
    images=train_images,
    fst_labels=train_fst_labels,
    target_fsts=[1, 6]
)

# Save to disk
torch.save(transformed_dict, "data/transformed_cache.pt")

# Load during training (15% faster)
transformed_cache = torch.load("data/transformed_cache.pt")

Trade-off: 3× storage (original + FST I + FST VI)


FAQs

Q: Should I train CIRCLe from scratch or fine-tune from FairDisCo?

A: Fine-tune from FairDisCo if you have a checkpoint (faster, better initialization). Otherwise, train from scratch.

Q: How do I know if lambda_reg is too high?

A: Monitor overall accuracy. If it drops >3% from baseline, reduce lambda_reg.

Q: Can I use CIRCLe without FairDisCo (no adversarial/contrastive)?

A: Yes, set --lambda_adv 0 --lambda_con 0. However, FairDisCo provides complementary fairness benefits.

Q: What if I only have 16GB VRAM?

A: Use batch size 32 with gradient accumulation (2 steps) and enable mixed precision.

Q: How long to train for HAM10000 (10k images)?

A: ~30 hours (100 epochs, RTX 3090, batch 64). Fine-tuning from FairDisCo: ~15 hours (50 epochs).

Q: Can I use CIRCLe with other backbones (EfficientNet, ViT)?

A: Yes, modify model.backbone in config. Tested with ResNet50, EfficientNet B4, DenseNet121.


Conclusion

CIRCLe provides state-of-the-art fairness for skin cancer detection with minimal accuracy trade-off. Follow this guide for optimal results. For issues, consult troubleshooting section or contact hollowed_eyes (MENDICANT_BIAS framework).

Next Steps: 1. Train baseline CIRCLe model with default config 2. Evaluate fairness metrics (AUROC gap, EOD, ECE) 3. Tune lambda_reg based on results 4. Compare with FairDisCo-only baseline 5. Deploy best model for Phase 3 evaluation

Document Version: 1.0 Last Updated: 2025-10-13 Author: HOLLOWED_EYES Status: PRODUCTION-READY Next Review: Post-HAM10000 training (Week 9)