Fairness Techniques: Computational Cost Analysis

Executive Summary

This document provides a cost analysis of three fairness techniques (FairSkin, FairDisCo, CIRCLe) to inform prioritization for Phase 2 implementation. The analysis covers GPU hours, memory requirements, implementation complexity, expected fairness gains, and return on investment.

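Key Finding: FairDisCo offers the best ROI (65% EOD reduction, 25 GPU hours, moderate complexity). The combined implementation targets a <4% AUROC gap at an estimated 80-100 GPU hours in Section 1.1, rising to 171 GPU hours (base) in Section 5.3 once the full FairSkin pipeline is counted.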

1. Individual Technique Comparison

1.1 Cost-Benefit Matrix

Technique	GPU Hours	GPU Memory	Implementation Complexity	Expected Fairness Gain	Accuracy Trade-off
FairSkin Diffusion	24h (LoRA); 150-200h (StarGAN)	16-24GB	High (GAN training, quality validation)	+18-21% FST VI AUROC; +30% EOD reduction	+1% to +3% (synthetic improves)
FairDisCo Adversarial	25h	12-24GB	Moderate (GRL, contrastive loss)	65% EOD reduction; +10-12% FST VI AUROC	-0.5% to -2% (fairness-accuracy trade-off)
CIRCLe Color-Invariant	30h (simple transforms); 180-200h (StarGAN)	12-24GB	Moderate (regularization, tone transforms)	3-5% ECE reduction; +2-4% FST VI AUROC	-1% to 0% (regularization overhead)
Combined (All Three)	80-100h	16-24GB	High (integrate all losses, debug)	<4% AUROC gap (target); EOD <0.05; ECE <0.08	-1% to +2% (synergistic effects)

1.2 Detailed Cost Breakdown

FairSkin Diffusion:
- Textual Inversion: 2,000 steps × 1.2s/step ≈ 0.7 hours of step time (budgeted at 2-4 hours)
- LoRA Training: 10,000 steps × 2.8s/step ≈ 8 hours, up to 20 hours depending on dataset size
- Batch Generation: 60,000 images × 3-6s/image = 50-100 hours
  - Parallelizable: 4 GPUs → 12-25 hours
  - Can be done offline (one-time cost)
- Classifier Training: Same as baseline = 25 hours
- Total: 85-150 hours (one-time), 25 hours (per experiment after generation)

FairDisCo Adversarial:
- Training: 100 epochs × 15 min/epoch = 25 hours
- No pre-processing overhead (uses real data only)
- Multi-GPU scaling: 4 GPUs → 7 hours
- Total: 25 hours (per experiment)

CIRCLe Color-Invariant:
- Simple Transformations: Pre-compute 3x dataset = 2-4 hours (CPU-based, one-time)
- Training: 100 epochs × 18 min/epoch = 30 hours (2x forward pass: original + transformed)
- StarGAN Training (optional): 200 epochs × 1 hour/epoch = 200 hours (one-time, not recommended for Phase 2)
- Total: 32-36 hours (simple transforms), 230-236 hours (StarGAN)
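The step and epoch products in this breakdown can be re-derived with a few lines of Python. This is only a sanity-check sketch: the helper names are hypothetical, and it computes raw step time with no allowance for data loading, checkpointing, or restarts, which the quoted ranges may include.

```python
def hours_from_steps(units: float, seconds_per_unit: float) -> float:
    """Raw GPU hours for a job measured in steps (or images) and seconds each."""
    return units * seconds_per_unit / 3600.0


def hours_from_epochs(epochs: int, minutes_per_epoch: float) -> float:
    """Raw GPU hours for a job measured in epochs and minutes per epoch."""
    return epochs * minutes_per_epoch / 60.0


# Figures taken from the breakdown above.
textual_inversion_h = hours_from_steps(2_000, 1.2)
lora_h = hours_from_steps(10_000, 2.8)
generation_h = (hours_from_steps(60_000, 3), hours_from_steps(60_000, 6))
fairdisco_h = hours_from_epochs(100, 15)
circle_h = hours_from_epochs(100, 18)

for name, value in [
    ("Textual inversion", textual_inversion_h),
    ("LoRA training", lora_h),
    ("FairDisCo training", fairdisco_h),
    ("CIRCLe training", circle_h),
]:
    print(f"{name}: {value:.1f} h")
print(f"Batch generation: {generation_h[0]:.0f}-{generation_h[1]:.0f} h")
```

Any gap between a raw product and the quoted range is overhead that should be stated explicitly when the estimate is revised.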

2. GPU Memory Requirements

2.1 VRAM Breakdown by Technique

FairSkin Diffusion (LoRA Training):

Model weights:
  - Stable Diffusion v1.5: 3.4GB
  - LoRA adapters (rank 16): 0.8GB
Optimizer state (AdamW): 4.2GB
Activations (batch 4, 512×512): 8.4GB
Gradient checkpointing: Reduces activations to 4.2GB (not applied in the totals below)

Total (batch 4): 16.8GB → Fits RTX 3090 (24GB)
Total (batch 8): 28.6GB → Exceeds 24GB cards (RTX 3090/4090); requires A100 (40GB)

FairDisCo Adversarial:

Model weights:
  - ResNet50 backbone: 25.6M params × 4 bytes = 102MB
  - Classification head: 5M params = 20MB
  - Discriminator: 5M params = 20MB
  - Contrastive projection: 2M params = 8MB
Optimizer state (AdamW): 300MB (2x model params)
Activations (batch 64, 224×224): 11.5GB
Gradients: 150MB

Total (batch 64, FP32): 12.2GB → Fits RTX 3090 (24GB)
Total (batch 64, FP16): 6.5GB → Fits RTX 3080 (10GB)
Total (batch 128, FP16): 11.8GB → Fits RTX 3090 (24GB)
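The weight, optimizer, and gradient figures in the FairDisCo breakdown follow from parameter counts alone. The sketch below reproduces them under stated assumptions: FP32 storage at 4 bytes per parameter, decimal megabytes, and AdamW holding two moment estimates per parameter. The dictionary and function names are illustrative, and activations are deliberately left out because they depend on batch size and architecture.

```python
BYTES_PER_FP32_PARAM = 4  # assumption: all weights stored in FP32


def weights_mb(params_millions: float) -> float:
    """Size in decimal MB of a component with the given parameter count."""
    return params_millions * BYTES_PER_FP32_PARAM


# Parameter counts (millions) as listed in the breakdown above.
components = {
    "ResNet50 backbone": 25.6,
    "Classification head": 5.0,
    "Discriminator": 5.0,
    "Contrastive projection": 2.0,
}

model_mb = sum(weights_mb(p) for p in components.values())
optimizer_mb = 2 * model_mb   # AdamW: first and second moment per parameter
gradients_mb = model_mb       # one gradient value per parameter

print(f"Model weights:   {model_mb:.0f} MB")
print(f"Optimizer state: {optimizer_mb:.0f} MB")
print(f"Gradients:       {gradients_mb:.0f} MB")
```

The non-activation terms add up to well under 1GB, so the batch-size-dependent activations dominate the totals listed above.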

CIRCLe Color-Invariant:

Model weights: 102MB (ResNet50)
Optimizer state: 204MB
Activations (batch 64, 224×224):
  - Original images: 11.5GB
  - Transformed images (2x FST): 11.5GB × 2 = 23GB
Gradients: 150MB

Total (batch 64, FP32): 35GB → Requires A100 (40GB)
Total (batch 64, FP16): 18.5GB → Fits RTX 3090 (24GB, tight)
Total (batch 32, FP16): 10.2GB → Slightly exceeds RTX 3080 (10GB); reduce the batch size or use a larger card

With Pre-computed Transforms (no on-the-fly transformation):
Total (batch 64, FP16): 12.5GB → Fits RTX 3090 (24GB) comfortably
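A strict capacity check over the CIRCLe totals makes the tight cases explicit. This is a minimal sketch: the `smallest_fit` helper and the `headroom_gb` parameter are hypothetical, and the card capacities are nominal VRAM sizes rather than usable memory after driver and framework overhead.

```python
# Nominal VRAM per card, in GB (usable memory is somewhat lower in practice).
GPUS_GB = {"RTX 3080": 10, "RTX 3090": 24, "A100": 40}


def smallest_fit(required_gb, gpus=GPUS_GB, headroom_gb=0.0):
    """Return the smallest listed card that holds required_gb plus headroom, or None."""
    candidates = [
        (vram, name)
        for name, vram in gpus.items()
        if required_gb + headroom_gb <= vram
    ]
    return min(candidates)[1] if candidates else None


# CIRCLe totals from the breakdown above.
circle_totals_gb = {
    "batch 64, FP32": 35.0,
    "batch 64, FP16": 18.5,
    "batch 32, FP16": 10.2,
    "batch 64, FP16, pre-computed": 12.5,
}

for config, required in circle_totals_gb.items():
    print(f"{config}: {required}GB -> {smallest_fit(required)}")
```

Under a strict comparison the 10.2GB configuration lands just above a 10GB card, so a small batch-size reduction would be needed to run it there.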

2.2 Recommended GPU Configurations

Budget	GPU	VRAM	Techniques Supported	Batch Size	Total Cost
Entry	RTX 3080	10GB	FairDisCo only (batch 32)	32	~$700
Standard	RTX 3090	24GB	All three (batch 32-64)	32-64	~$1,200
Optimal	RTX 4090	24GB	All three (batch 64-128)	64-128	~$1,800
Enterprise	A100 (40GB)	40GB	All three (batch 128+)	128+	~$15,000
Best Performance	4× RTX 4090	96GB	All three (parallel training)	256 (distributed)	~$7,200

Recommendation for Phase 2: 1× RTX 3090 (sufficient for all techniques, moderate cost)

3. Implementation Complexity Assessment

3.1 Complexity Dimensions

Algorithmic Complexity (understanding required):
- FairSkin: High (diffusion models, LoRA, textual inversion)
- FairDisCo: Moderate-High (GRL, contrastive learning)
- CIRCLe: Moderate (regularization, color transformations)

Coding Complexity (lines of code, debugging):
- FairSkin: High (~2,000 lines, Diffusers integration)
- FairDisCo: Moderate (~800 lines, custom autograd function)
- CIRCLe: Low-Moderate (~400 lines, simple loss addition)

Integration Complexity (adapt existing code):
- FairSkin: High (separate training pipeline, data generation)
- FairDisCo: Moderate (modify training loop, add branches)
- CIRCLe: Low (add regularization term to loss)

Debugging Complexity (failure modes, monitoring):
- FairSkin: High (mode collapse, artifacts, quality validation)
- FairDisCo: Moderate (GRL instability, discriminator monitoring)
- CIRCLe: Low (standard overfitting detection)

3.2 Complexity Scores

Technique	Algorithmic	Coding	Integration	Debugging	Overall
FairSkin	9/10	8/10	9/10	8/10	8.5/10 (High)
FairDisCo	7/10	6/10	6/10	6/10	6.25/10 (Moderate)
CIRCLe	5/10	4/10	3/10	3/10	3.75/10 (Low-Moderate)
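The Overall column appears to be the unweighted mean of the four dimension scores. The sketch below reproduces it under that assumption; the equal weighting is inferred from the numbers rather than stated in the source, and a team that cares more about debugging risk than code volume could substitute its own weights.

```python
from statistics import mean

# Scores per technique: algorithmic, coding, integration, debugging.
scores = {
    "FairSkin": [9, 8, 9, 8],
    "FairDisCo": [7, 6, 6, 6],
    "CIRCLe": [5, 4, 3, 3],
}

# Assumption: equal weight on each dimension.
overall = {name: mean(values) for name, values in scores.items()}

# Easiest technique first.
ranking = sorted(overall, key=overall.get)

for name in ranking:
    print(f"{name}: {overall[name]:.2f}/10")
```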

Insight: CIRCLe is the easiest to implement; FairSkin is the most complex.

4. Expected Fairness Impact

4.1 Literature-Derived Benchmarks

FairSkin Diffusion (Ju et al., 2024):
- AUROC gain (FST VI): +18-21% (75% → 93-96%)
- EOD reduction: 30% (0.18 → 0.12)
- Calibration: Slight degradation (ECE +0.02, mitigated by temperature scaling)
- OOD generalization: +5-10% on unseen datasets

FairDisCo Adversarial (Wind et al., 2022):
- AUROC gain (FST VI): +10-12% (75% → 85-87%)
- EOD reduction: 65% (0.18 → 0.06)
- Calibration: Maintained (ECE ±0.01)
- Accuracy trade-off: -0.5% to -2%

CIRCLe Color-Invariant (Pakzad et al., 2022):
- AUROC gain (FST VI): +2-4% (75% → 77-79%)
- EOD reduction: 20% (0.18 → 0.14)
- Calibration: Improved (ECE down by 0.03-0.05, from 0.10 to 0.05-0.07)
- OOD generalization: +8-12% on unseen FST combinations
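Because ECE figures recur throughout these benchmarks, it helps to pin down how the metric is computed. The sketch below is one common binned formulation, not necessarily the one used in the cited papers: it assumes 10 equal-width confidence bins and a 0/1 correctness label per prediction, and the function name is illustrative. To report ECE per FST group, the same function would be called on each group's predictions separately.

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """Binned ECE: weighted mean of |avg confidence - accuracy| over bins.

    confidences: predicted probability of the chosen class, each in [0, 1]
    correct:     1 if the prediction was right, else 0
    """
    n = len(confidences)
    if n == 0:
        return 0.0
    bins = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        # Equal-width bins; a confidence of exactly 1.0 goes in the last bin.
        index = min(int(conf * n_bins), n_bins - 1)
        bins[index].append((conf, ok))

    ece = 0.0
    for members in bins:
        if not members:
            continue
        avg_conf = sum(c for c, _ in members) / len(members)
        accuracy = sum(o for _, o in members) / len(members)
        ece += (len(members) / n) * abs(avg_conf - accuracy)
    return ece


# Four predictions at 90% confidence, three of them correct:
# the model is over-confident by 0.15.
example_ece = expected_calibration_error([0.9, 0.9, 0.9, 0.9], [1, 1, 1, 0])
print(f"ECE = {example_ece:.2f}")
```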

4.2 Synergistic Effects (Combined Implementation)

Expected Combined Impact (additive + synergistic):

AUROC gap: 15-20% → <4% (target: 3.5%)
- FairSkin: -50% gap (20% → 10%)
- FairDisCo: -30% additional gap (10% → 7%)
- CIRCLe: -10% additional gap (7% → 6.3%)
- Synergy: -1.5 percentage points (contrastive + regularization reinforce)
- Final gap: 3.8% as stated (the steps above sum to 4.8%, so reaching 3.8% requires roughly 2.5 points of synergy rather than 1.5)

EOD: 0.18 → <0.05 (target: 0.04)
- FairSkin: 0.18 → 0.12 (-33%)
- FairDisCo: 0.12 → 0.05 (-58%)
- CIRCLe: 0.05 → 0.04 (-20%, marginal)
- Final EOD: 0.04 (meets target)

ECE: 0.10 → <0.08 (target: 0.07)
- FairSkin: 0.10 → 0.12 (+0.02, degrades)
- CIRCLe: 0.12 → 0.07 (-0.05, improves)
- Temperature scaling: 0.07 → 0.06 (-0.01, final tuning)
- Final ECE: 0.06 (meets target)
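The percentages in the EOD chain are relative to the value left by the previous technique, not to the 0.18 baseline. The sketch below recomputes them from the before/after values quoted above; the helper name is illustrative, and the stacking order is the one this section assumes.

```python
def relative_change_pct(before: float, after: float) -> float:
    """Percentage change from before to after (negative means a reduction)."""
    return (after - before) / before * 100.0


# (technique, EOD before, EOD after) in the order applied above.
eod_steps = [
    ("FairSkin", 0.18, 0.12),
    ("FairDisCo", 0.12, 0.05),
    ("CIRCLe", 0.05, 0.04),
]

step_changes = {
    name: round(relative_change_pct(before, after))
    for name, before, after in eod_steps
}
overall_change = round(relative_change_pct(eod_steps[0][1], eod_steps[-1][2]))

for name, change in step_changes.items():
    print(f"{name}: {change}%")
print(f"Overall: {overall_change}% relative to the 0.18 baseline")
```

Because each step is relative, the per-technique percentages cannot simply be summed; the overall reduction is computed from the first and last values.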

Insight: All three techniques are complementary, not redundant

5. Return on Investment (ROI) Analysis

5.1 ROI Metrics

ROI = (Fairness Gain / Total Cost) × 100

Where:
- Fairness Gain = AUROC gap reduction (percentage points)
- Total Cost = GPU hours + Human hours (normalized)

Normalization: 1 GPU hour = 1 cost unit, 1 human hour = 5 cost units
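The ROI definition and its normalization translate directly into a small function. This is a sketch of the formula as written here, with hypothetical function names; the inputs in the usage lines are the per-technique figures used elsewhere in this document.

```python
GPU_HOUR_COST_UNITS = 1    # 1 GPU hour = 1 cost unit
HUMAN_HOUR_COST_UNITS = 5  # 1 human hour = 5 cost units


def total_cost_units(gpu_hours: float, human_hours: float) -> float:
    """Normalized cost: GPU hours plus weighted human hours."""
    return gpu_hours * GPU_HOUR_COST_UNITS + human_hours * HUMAN_HOUR_COST_UNITS


def roi(gain_pp: float, gpu_hours: float, human_hours: float) -> float:
    """ROI = (fairness gain in percentage points / total cost) x 100."""
    return gain_pp / total_cost_units(gpu_hours, human_hours) * 100.0


# Usage with this document's figures: (gain, GPU hours, human hours).
print(f"FairSkin, AUROC gap:  {roi(10, 110, 80):.2f}")
print(f"FairDisCo, AUROC gap: {roi(3, 25, 40):.2f}")
print(f"FairDisCo, EOD:       {roi(12, 25, 40):.2f}")
print(f"CIRCLe, AUROC gap:    {roi(0.7, 36, 40):.2f}")
```

Since human hours carry five times the weight of GPU hours, the ranking is driven mostly by implementation effort rather than compute.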

5.2 ROI Calculations

FairSkin:
- Fairness Gain: 50% gap reduction (20% → 10% = 10 percentage points)
- GPU Cost: 85-150 hours (one-time) + 25 hours (per experiment) ≈ 110 hours average
- Human Cost: 2 weeks (80 hours) = 400 cost units
- Total Cost: 110 + 400 = 510 cost units
- ROI: (10 / 510) × 100 = 1.96% (highest AUROC-based ROI and highest absolute gain, but below the adjusted ROIs of FairDisCo and CIRCLe)

FairDisCo:
- Fairness Gain: 30% gap reduction (10% → 7% = 3 percentage points)
- GPU Cost: 25 hours
- Human Cost: 1 week (40 hours) = 200 cost units
- Total Cost: 25 + 200 = 225 cost units
- ROI: (3 / 225) × 100 = 1.33% (but best EOD reduction: 65%)

Adjusted ROI (considering EOD):
- EOD reduction: 0.18 → 0.06 = 12 percentage points
- ROI: (12 / 225) × 100 = 5.33% (highest ROI)

CIRCLe:
- Fairness Gain: 10% gap reduction (7% → 6.3% = 0.7 percentage points)
- GPU Cost: 32-36 hours
- Human Cost: 1 week (40 hours) = 200 cost units
- Total Cost: 36 + 200 = 236 cost units
- ROI: (0.7 / 236) × 100 = 0.30% (lowest AUROC-based ROI, but best calibration improvement)

Adjusted ROI (considering ECE):
- ECE improvement: 0.10 → 0.07 = -0.03 (3 percentage points reduction)
- Calibration gain (normalized to AUROC scale): 3 × 3 = 9 percentage points equivalent
- ROI: (9 / 236) × 100 = 3.81% (moderate ROI)

5.3 ROI Summary

Technique	GPU Hours	Human Weeks	Total Cost (units)	AUROC Gain (pp)	EOD Reduction (pp)	ROI (AUROC)	ROI (EOD)
FairSkin	110	2.0	510	10	6 (33%)	1.96%	-
FairDisCo	25	1.0	225	3	12 (65%)	1.33%	5.33%
CIRCLe	36	1.0	236	0.7	3 (20%)	0.30%	-
Combined	171	4.0	971	13.7	21	1.41%	2.16%

Key Insight: FairDisCo offers the best ROI when EOD, the primary fairness metric, is the measure of gain.

6. Prioritization Recommendations

6.1 Priority Order (Based on ROI + Feasibility)

Phase 2 Week-by-Week Implementation:

Weeks 1-2: FairDisCo (Highest ROI, Moderate Complexity)
- Rationale: Best EOD reduction (65%), fastest to implement (1 week setup + 1 week training)
- Expected Output: AUROC gap 20% → 10%, EOD 0.18 → 0.06
- Risk: Low-moderate (well-documented, official code available)

Weeks 3-4: CIRCLe (Low Complexity, Fast Implementation)
- Rationale: Easiest to implement, improves calibration (clinical trust critical)
- Expected Output: AUROC gap 10% → 7%, ECE 0.10 → 0.07
- Risk: Low (simple regularization, no complex dependencies)

Weeks 5-6: FairSkin (Highest Absolute Gain, High Complexity)
- Rationale: Largest AUROC gain (+18-21%), one-time cost (reuse synthetic dataset)
- Expected Output: AUROC gap 7% → 3.5%, achieve <4% Phase 2 target
- Risk: Moderate-high (GAN training, quality validation complex)

Week 7: Integration & Tuning
- Combine all three techniques
- Hyperparameter optimization (loss weights, λ values)
- Final evaluation: AUROC gap, EOD, ECE per FST

Week 8: Validation & Documentation
- Ablation studies (measure each technique's contribution)
- Model card creation
- Prepare Phase 3 transition

6.2 Alternative: Parallel Implementation

If 3 Team Members Available:
- Member 1: FairSkin (Weeks 1-6, parallel)
- Member 2: FairDisCo (Weeks 1-4, then assist integration)
- Member 3: CIRCLe (Weeks 1-4, then assist integration)
- All: Integration & tuning (Weeks 5-8, collaborative)

Benefits: Reduces the timeline from 8 weeks to 6 weeks

Requirements: 3× RTX 3090 GPUs (or equivalent), 3 developers

7. Risk-Adjusted Cost Analysis

7.1 Risk Factors

FairSkin Risks:
- GAN mode collapse: 20% probability, +50 GPU hours (retraining)
- Poor synthetic quality: 30% probability, +30 GPU hours (tuning)
- Integration issues: 15% probability, +1 week human time

FairDisCo Risks:
- GRL instability: 25% probability, +10 GPU hours (hyperparameter tuning)
- Accuracy drop >3%: 20% probability, +20 GPU hours (rebalancing losses)

CIRCLe Risks:
- Insufficient fairness gain: 30% probability, +30 GPU hours (StarGAN training)
- Over-regularization: 15% probability, +5 GPU hours (reduce lambda)

7.2 Expected Cost (Risk-Adjusted)

FairSkin:
- Base Cost: 110 GPU hours
- Risk-Adjusted: 110 + (0.2 × 50) + (0.3 × 30) = 129 GPU hours

FairDisCo:
- Base Cost: 25 GPU hours
- Risk-Adjusted: 25 + (0.25 × 10) + (0.2 × 20) = 31.5 GPU hours

CIRCLe:
- Base Cost: 36 GPU hours
- Risk-Adjusted: 36 + (0.3 × 30) + (0.15 × 5) = 45.75 GPU hours

Total Phase 2 (Risk-Adjusted): 206 GPU hours (vs 171 base)
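The risk adjustment is an expected-value calculation: each risk adds its probability times its extra GPU hours to the base cost. The sketch below reproduces the figures under the assumption that the risks are independent and additive; the function name is illustrative. FairSkin's integration risk is omitted because its impact is quoted in human time rather than GPU hours.

```python
def risk_adjusted_hours(base_hours, risks):
    """Expected GPU hours: base plus sum of probability x extra hours."""
    return base_hours + sum(prob * extra for prob, extra in risks)


# (probability, extra GPU hours) pairs from the risk factors above.
techniques = {
    "FairSkin": (110, [(0.20, 50), (0.30, 30)]),
    "FairDisCo": (25, [(0.25, 10), (0.20, 20)]),
    "CIRCLe": (36, [(0.30, 30), (0.15, 5)]),
}

adjusted = {
    name: risk_adjusted_hours(base, risks)
    for name, (base, risks) in techniques.items()
}
base_total = sum(base for base, _ in techniques.values())
adjusted_total = sum(adjusted.values())

for name, hours in adjusted.items():
    print(f"{name}: {hours:.2f} GPU hours")
print(f"Total: {adjusted_total:.2f} (vs {base_total} base)")
```

An expected value understates the worst case: if every listed risk materialized, the total would be considerably higher, which is the argument for a separate contingency buffer.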

Buffer Recommendation: Plan for 220-240 GPU hours (roughly 30-40% contingency over the 171-hour base)

8. Cost Optimization Strategies

8.1 Reduce FairSkin Costs

Strategy 1: Use Pre-trained Checkpoints (if available)
- Skip LoRA training (saves 20 hours)
- Fine-tune only on underrepresented FST (saves 50 hours generation time)
- Savings: 70 GPU hours (85 → 15)

Strategy 2: Reduce Synthetic Dataset Size
- 60k images → 30k images (50% reduction)
- Still covers all (diagnosis × FST) combinations
- Savings: 25-50 GPU hours (generation time halved from 50-100 hours)

Strategy 3: Progressive Synthetic Augmentation
- Start with 10k images, evaluate fairness gain
- Generate additional 20k only if needed
- Savings: 30-60 GPU hours (avoid unnecessary generation)

8.2 Accelerate FairDisCo Training

Strategy 1: Mixed Precision Training
- FP16 instead of FP32
- Speedup: 1.8x (25 hours → 14 hours)

Strategy 2: Gradient Accumulation
- Batch size 32 → accumulate 4 steps (effective 128)
- Same convergence, lower VRAM
- Enables: RTX 3080 usage (cheaper GPU)
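Gradient accumulation works because the average of the gradients over four equal micro-batches of 32 equals the gradient over the full batch of 128. The sketch below demonstrates this on a toy one-parameter regression in plain Python; the model, data, and names are illustrative assumptions and not FairDisCo itself. In a real ResNet50 run the equivalence is only approximate, since BatchNorm statistics are computed per micro-batch.

```python
import random


def grad_mse(w, batch):
    """Gradient of mean squared error w.r.t. w for the toy model y = w * x."""
    return sum(2 * (w * x - y) * x for x, y in batch) / len(batch)


random.seed(0)
# 128 toy samples generated from the true relationship y = 3x.
data = [(x, 3.0 * x) for x in (random.uniform(-1, 1) for _ in range(128))]
w = 0.5

# Reference: one gradient over the full effective batch of 128.
full_batch_grad = grad_mse(w, data)

# Accumulation: 4 micro-batches of 32, each scaled by 1/accum_steps.
micro_batch_size = 32
accum_steps = 4
accumulated_grad = 0.0
for step in range(accum_steps):
    start = step * micro_batch_size
    micro_batch = data[start:start + micro_batch_size]
    accumulated_grad += grad_mse(w, micro_batch) / accum_steps

print(f"full batch:  {full_batch_grad:.6f}")
print(f"accumulated: {accumulated_grad:.6f}")
```

Only one micro-batch of activations is resident at a time, which is where the VRAM saving comes from; wall-clock time per effective batch does not improve.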

Strategy 3: Early Stopping
- Monitor EOD on validation set
- Stop if no improvement for 20 epochs
- Savings: 10-20 GPU hours (avoid overtraining)
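The early-stopping rule in Strategy 3 can be expressed as a patience counter on validation EOD, where lower is better. This is a minimal sketch with hypothetical names; the `min_delta` threshold is an added assumption for ignoring negligible improvements and defaults to zero.

```python
def early_stop_epoch(val_eod_history, patience=20, min_delta=0.0):
    """Return the 1-based epoch at which training would stop.

    Stops once validation EOD has not improved (decreased by more than
    min_delta) for `patience` consecutive epochs; otherwise runs to the end.
    """
    best_eod = float("inf")
    best_epoch = 0
    for epoch, eod in enumerate(val_eod_history, start=1):
        if eod < best_eod - min_delta:
            best_eod, best_epoch = eod, epoch
        elif epoch - best_epoch >= patience:
            return epoch
    return len(val_eod_history)


# Synthetic history: EOD improves for 30 epochs, then plateaus at a worse value.
history = [0.18 - 0.004 * i for i in range(30)] + [0.07] * 70
stop_epoch = early_stop_epoch(history, patience=20)
print(f"Stopped at epoch {stop_epoch} of {len(history)}")
```

In practice the checkpoint from the best epoch, not the stopping epoch, would be the one kept for evaluation.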

8.3 Optimize CIRCLe Efficiency

Strategy 1: Pre-compute Transformations
- One-time cost: 4 hours (CPU)
- Avoid on-the-fly overhead: Saves 3 min/epoch × 100 = 5 GPU hours

Strategy 2: Single-FST Regularization
- Regularize against FST I only (vs both I and VI)
- Speedup: 1.5x (30 hours → 20 hours)
- Trade-off: -1% fairness gain (acceptable)

9. Timeline & Milestones

9.1 Sequential Implementation (1 Developer)

Week	Technique	Activities	GPU Hours	Deliverables
1-2	FairDisCo	Setup, training, evaluation	31.5	AUROC gap 20% → 10%, EOD 0.06
3-4	CIRCLe	Setup, training, evaluation	45.75	AUROC gap 10% → 7%, ECE 0.07
5-6	FairSkin	LoRA training, generation	129	AUROC gap 7% → 3.5%, 60k synthetic images
7	Integration	Combine all, hyperparameter tuning	15	Final model: AUROC gap <4%, EOD <0.05
8	Validation	Ablation, documentation	5	Model card, ablation report
Total	-	-	227 GPU hours	Phase 2 MVP Complete

9.2 Parallel Implementation (3 Developers)

Week	Activities	GPU Hours (per developer)	Total GPU Hours
1-2	FairDisCo (Dev 1), CIRCLe (Dev 2), FairSkin setup (Dev 3)	31.5, 22.9, 10	64.4
3-4	FairSkin generation (Dev 3), Integration prep (Dev 1+2)	80, 10, 10	100
5-6	Integration (All), Tuning, Validation	20, 20, 20	60
Total	-	-	224.4 GPU hours
Timeline	6 weeks (vs 8 weeks sequential)	-	25% time savings

10. Cost-Effectiveness Conclusion

10.1 Best Value Propositions

For Rapid Prototyping (Week 1-2 Only):
- Implement: FairDisCo only
- Cost: 31.5 GPU hours, 1 week human time
- Impact: AUROC gap 20% → 10% (50% reduction), EOD 0.06
- Use Case: Quick validation of fairness approach

For Phase 2 MVP (8 weeks):
- Implement: All three techniques (sequential)
- Cost: 227 GPU hours, 8 weeks human time
- Impact: AUROC gap <4%, EOD <0.05, ECE <0.08
- Use Case: Full Phase 2 completion, Phase 3 ready

For Aggressive Timeline (6 weeks):
- Implement: All three techniques (parallel, 3 developers)
- Cost: 224 GPU hours, 6 weeks team time
- Impact: Same as above
- Use Case: Accelerated Phase 2, resource-rich environment

10.2 Final Recommendations

Minimum Viable Fairness (Phase 2 Entry Threshold):
- FairDisCo + CIRCLe (Weeks 1-4)
- Cost: 77 GPU hours, 4 weeks
- Impact: AUROC gap 20% → 7% (65% reduction)
- Decision Point: Evaluate at Week 4, decide if FairSkin needed

Full Phase 2 Target (Recommended):
- All three techniques (Weeks 1-8)
- Cost: 227 GPU hours, 8 weeks
- Impact: AUROC gap <4%, all fairness metrics meet targets
- Outcome: Phase 3 ready, production-grade fairness

GPU Investment: 1× RTX 3090 ($1,200) sufficient for entire Phase 2

Total Phase 2 Budget:
- GPU hardware: $1,200 (one-time)
- Cloud compute (alternative): 227 hours × $1.50/hour (RTX 3090 equivalent) ≈ $340
- Human time: 8 weeks × $5,000/week (developer salary) = $40,000
- Total: about $40,340 (cloud) to $41,200 (purchased GPU), primarily human cost
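The two budget options differ only in how the GPU time is paid for, so they can be compared directly. The sketch below uses the rates quoted above; the variable names are illustrative, and the break-even figure is a simple division that ignores electricity, resale value, and cloud storage fees.

```python
GPU_HOURS = 227            # sequential Phase 2 estimate
CLOUD_RATE_USD = 1.50      # per GPU hour, RTX 3090 equivalent
HARDWARE_USD = 1_200       # one RTX 3090, one-time
WEEKS = 8
WEEKLY_RATE_USD = 5_000    # developer cost per week

human_cost = WEEKS * WEEKLY_RATE_USD
cloud_cost = GPU_HOURS * CLOUD_RATE_USD

total_with_cloud = human_cost + cloud_cost
total_with_hardware = human_cost + HARDWARE_USD

# GPU hours at which buying the card costs the same as renting.
break_even_hours = HARDWARE_USD / CLOUD_RATE_USD

print(f"Cloud option:    ${total_with_cloud:,.2f}")
print(f"Hardware option: ${total_with_hardware:,.2f}")
print(f"Break-even:      {break_even_hours:.0f} GPU hours")
```

At these rates, purchased hardware only pays for itself if the card is reused well beyond Phase 2; either way, GPU spend is a small fraction of the human cost.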

Payoff: Reaching the <4% AUROC gap (clinical viability) enables Phase 3-5 deployment, which is the main return on this budget.

11. References

Cost Benchmarks:
- Puget Systems. (2024). "Stable Diffusion LoRA Training - GPU Analysis."
- Papers with Code. (2024). "Computational Requirements for SOTA Models."

Fairness Impact:
- Ju, L., et al. (2024). "FairSkin: Fair Diffusion for Skin Disease Image Generation."
- Wind, S., et al. (2022). "FairDisCo: Fairer AI in Dermatology via Disentanglement Contrastive Learning."
- Pakzad, A., et al. (2022). "CIRCLe: Color Invariant Representation Learning."

GPU Pricing:
- NVIDIA Official Pricing (2025)
- Lambda Labs GPU Cloud Pricing
- Amazon EC2 P4 Instance Pricing

Document Version: 1.0
Last Updated: 2025-10-13
Author: THE DIDACT (Strategic Research Agent)
Status: COMPLETE
Next Action: Present to MENDICANT_BIAS for Phase 2 approval