Skip to content

Fairness Techniques: Computational Cost Analysis

Executive Summary

This document provides comprehensive cost analysis for three fairness techniques (FairSkin, FairDisCo, CIRCLe), enabling informed prioritization for Phase 2 implementation. Analysis covers GPU hours, memory requirements, implementation complexity, expected fairness gains, and return-on-investment.

Key Finding: FairDisCo offers best ROI (65% EOD reduction, 25 GPU hours, moderate complexity). Combined implementation achieves <4% AUROC gap target within 80-100 total GPU hours.


1. Individual Technique Comparison

1.1 Cost-Benefit Matrix

Technique GPU Hours GPU Memory Implementation Complexity Expected Fairness Gain Accuracy Trade-off
FairSkin Diffusion 24h (LoRA)
150-200h (StarGAN)
16-24GB High
(GAN training, quality validation)
+18-21% FST VI AUROC
+30% EOD reduction
+1% to +3%
(synthetic improves)
FairDisCo Adversarial 25h 12-24GB Moderate
(GRL, contrastive loss)
65% EOD reduction
+10-12% FST VI AUROC
-0.5% to -2%
(fairness-accuracy trade-off)
CIRCLe Color-Invariant 30h (simple transforms)
180-200h (StarGAN)
12-24GB Moderate
(regularization, tone transforms)
3-5% ECE reduction
+2-4% FST VI AUROC
-1% to 0%
(regularization overhead)
Combined (All Three) 80-100h 16-24GB High
(integrate all losses, debug)
<4% AUROC gap (target)
EOD <0.05
ECE <0.08
-1% to +2%
(synergistic effects)

1.2 Detailed Cost Breakdown

FairSkin Diffusion: - Textual Inversion: 2000 steps × 1.2s/step = 2-4 hours - LoRA Training: 10,000 steps × 2.8s/step = 8-20 hours (depends on dataset size) - Batch Generation: 60,000 images × 3-6s/image = 50-100 hours - Parallelizable: 4 GPUs → 12-25 hours - Can be done offline (one-time cost) - Classifier Training: Same as baseline = 25 hours - Total: 85-150 hours (one-time), 25 hours (per experiment after generation)

FairDisCo Adversarial: - Training: 100 epochs × 15 min/epoch = 25 hours - No pre-processing overhead (uses real data only) - Multi-GPU scaling: 4 GPUs → 7 hours - Total: 25 hours (per experiment)

CIRCLe Color-Invariant: - Simple Transformations: Pre-compute 3x dataset = 2-4 hours (CPU-based, one-time) - Training: 100 epochs × 18 min/epoch = 30 hours (2x forward pass: original + transformed) - StarGAN Training (optional): 200 epochs × 1 hour/epoch = 200 hours (one-time, not recommended Phase 2) - Total: 32-36 hours (simple transforms), 230-236 hours (StarGAN)


2. GPU Memory Requirements

2.1 VRAM Breakdown by Technique

FairSkin Diffusion (LoRA Training):

Model weights:
  - Stable Diffusion v1.5: 3.4GB
  - LoRA adapters (rank 16): 0.8GB
Optimizer state (AdamW): 4.2GB
Activations (batch 4, 512×512): 8.4GB
Gradient checkpointing: Reduces to 4.2GB

Total (batch 4): 16.8GB → Fits RTX 3090 (24GB)
Total (batch 8): 28.6GB → Requires RTX 4090 (24GB, tight) or A100 (40GB)

FairDisCo Adversarial:

Model weights:
  - ResNet50 backbone: 25.6M params × 4 bytes = 102MB
  - Classification head: 5M params = 20MB
  - Discriminator: 5M params = 20MB
  - Contrastive projection: 2M params = 8MB
Optimizer state (AdamW): 300MB (2x model params)
Activations (batch 64, 224×224): 11.5GB
Gradients: 150MB

Total (batch 64, FP32): 12.2GB → Fits RTX 3090 (24GB)
Total (batch 64, FP16): 6.5GB → Fits RTX 3080 (10GB)
Total (batch 128, FP16): 11.8GB → Fits RTX 3090 (24GB)

CIRCLe Color-Invariant:

Model weights: 102MB (ResNet50)
Optimizer state: 204MB
Activations (batch 64, 224×224):
  - Original images: 11.5GB
  - Transformed images (2x FST): 11.5GB × 2 = 23GB
Gradients: 150MB

Total (batch 64, FP32): 35GB → Requires A100 (40GB)
Total (batch 64, FP16): 18.5GB → Fits RTX 3090 (24GB, tight)
Total (batch 32, FP16): 10.2GB → Fits RTX 3080 (10GB)

With Pre-computed Transforms (no on-the-fly transformation):
Total (batch 64, FP16): 12.5GB → Fits RTX 3090 (24GB) comfortably

Budget GPU VRAM Techniques Supported Batch Size Total Cost
Entry RTX 3080 10GB FairDisCo only (batch 32) 32 ~$700
Standard RTX 3090 24GB All three (batch 32-64) 32-64 ~$1,200
Optimal RTX 4090 24GB All three (batch 64-128) 64-128 ~$1,800
Enterprise A100 (40GB) 40GB All three (batch 128+) 128+ ~$15,000
Best Performance 4× RTX 4090 96GB All three (parallel training) 256 (distributed) ~$7,200

Recommendation for Phase 2: 1× RTX 3090 (sufficient for all techniques, moderate cost)


3. Implementation Complexity Assessment

3.1 Complexity Dimensions

Algorithmic Complexity (understanding required): - FairSkin: High (diffusion models, LoRA, textual inversion) - FairDisCo: Moderate-High (GRL, contrastive learning) - CIRCLe: Moderate (regularization, color transformations)

Coding Complexity (lines of code, debugging): - FairSkin: High (~2,000 lines, Diffusers integration) - FairDisCo: Moderate (~800 lines, custom autograd function) - CIRCLe: Low-Moderate (~400 lines, simple loss addition)

Integration Complexity (adapt existing code): - FairSkin: High (separate training pipeline, data generation) - FairDisCo: Moderate (modify training loop, add branches) - CIRCLe: Low (add regularization term to loss)

Debugging Complexity (failure modes, monitoring): - FairSkin: High (mode collapse, artifacts, quality validation) - FairDisCo: Moderate (GRL instability, discriminator monitoring) - CIRCLe: Low (standard overfitting detection)

3.2 Complexity Scores

Technique Algorithmic Coding Integration Debugging Overall
FairSkin 9/10 8/10 9/10 8/10 8.5/10 (High)
FairDisCo 7/10 6/10 6/10 6/10 6.25/10 (Moderate)
CIRCLe 5/10 4/10 3/10 3/10 3.75/10 (Low-Moderate)

Insight: CIRCLe is easiest to implement, FairSkin is most complex


4. Expected Fairness Impact

4.1 Literature-Derived Benchmarks

FairSkin Diffusion (Ju et al., 2024): - AUROC gain (FST VI): +18-21% (75% → 93-96%) - EOD reduction: 30% (0.18 → 0.12) - Calibration: Slight degradation (ECE +0.02, mitigated by temperature scaling) - OOD generalization: +5-10% on unseen datasets

FairDisCo Adversarial (Wind et al., 2022): - AUROC gain (FST VI): +10-12% (75% → 85-87%) - EOD reduction: 65% (0.18 → 0.06) - Calibration: Maintained (ECE ±0.01) - Accuracy trade-off: -0.5% to -2%

CIRCLe Color-Invariant (Pakzad et al., 2022): - AUROC gain (FST VI): +2-4% (75% → 77-79%) - EOD reduction: 20% (0.18 → 0.14) - Calibration: Improved (ECE -3-5%, 0.10 → 0.05-0.07) - OOD generalization: +8-12% on unseen FST combinations

4.2 Synergistic Effects (Combined Implementation)

Expected Combined Impact (additive + synergistic): - AUROC gap: 15-20% → <4% (target: 3.5%) - FairSkin: -50% gap (20% → 10%) - FairDisCo: -30% additional gap (10% → 7%) - CIRCLe: -10% additional gap (7% → 6.3%) - Synergy: -1.5% (contrastive + regularization reinforce) = 3.8% final gap

  • EOD: 0.18 → <0.05 (target: 0.04)
  • FairSkin: 0.18 → 0.12 (-33%)
  • FairDisCo: 0.12 → 0.05 (-58%)
  • CIRCLe: 0.05 → 0.04 (-20%, marginal)
  • Final EOD: 0.04 (meets target)

  • ECE: 0.10 → <0.08 (target: 0.07)

  • FairSkin: 0.10 → 0.12 (+0.02, degrades)
  • CIRCLe: 0.12 → 0.07 (-0.05, improves)
  • Temperature scaling: 0.07 → 0.06 (-0.01, final tuning)
  • Final ECE: 0.06 (meets target)

Insight: All three techniques are complementary, not redundant


5. Return on Investment (ROI) Analysis

5.1 ROI Metrics

ROI = (Fairness Gain / Total Cost) × 100

Where: - Fairness Gain = AUROC gap reduction (percentage points) - Total Cost = GPU hours + Human hours (normalized)

Normalization: 1 GPU hour = 1 cost unit, 1 human hour = 5 cost units

5.2 ROI Calculations

FairSkin: - Fairness Gain: 50% gap reduction (20% → 10% = 10 percentage points) - GPU Cost: 85-150 hours (one-time) + 25 hours (per experiment) ≈ 110 hours average - Human Cost: 2 weeks (80 hours) = 400 cost units - Total Cost: 110 + 400 = 510 cost units - ROI: (10 / 510) × 100 = 1.96% (lowest ROI, but highest absolute gain)

FairDisCo: - Fairness Gain: 30% gap reduction (10% → 7% = 3 percentage points) - GPU Cost: 25 hours - Human Cost: 1 week (40 hours) = 200 cost units - Total Cost: 25 + 200 = 225 cost units - ROI: (3 / 225) × 100 = 1.33% (but best EOD reduction: 65%)

Adjusted ROI (considering EOD): - EOD reduction: 0.18 → 0.06 = 12 percentage points - ROI: (12 / 225) × 100 = 5.33% (highest ROI)

CIRCLe: - Fairness Gain: 10% gap reduction (7% → 6.3% = 0.7 percentage points) - GPU Cost: 32-36 hours - Human Cost: 1 week (40 hours) = 200 cost units - Total Cost: 36 + 200 = 236 cost units - ROI: (0.7 / 236) × 100 = 0.30% (lowest ROI, but best calibration improvement)

Adjusted ROI (considering ECE): - ECE improvement: 0.10 → 0.07 = -0.03 (3 percentage points reduction) - Calibration gain (normalized to AUROC scale): 3 × 3 = 9 percentage points equivalent - ROI: (9 / 236) × 100 = 3.81% (moderate ROI)

5.3 ROI Summary

Technique GPU Hours Human Weeks Total Cost (units) AUROC Gain (pp) EOD Reduction (pp) ROI (AUROC) ROI (EOD)
FairSkin 110 2.0 510 10 6 (33%) 1.96% -
FairDisCo 25 1.0 225 3 12 (65%) 1.33% 5.33%
CIRCLe 36 1.0 236 0.7 3 (20%) 0.30% -
Combined 171 4.0 971 13.7 21 1.41% 2.16%

Key Insight: FairDisCo offers best ROI when considering EOD (primary fairness metric)


6. Prioritization Recommendations

6.1 Priority Order (Based on ROI + Feasibility)

Phase 2 Week-by-Week Implementation:

Weeks 1-2: FairDisCo (Highest ROI, Moderate Complexity) - Rationale: Best EOD reduction (65%), fastest to implement (1 week setup + 1 week training) - Expected Output: AUROC gap 20% → 10%, EOD 0.18 → 0.06 - Risk: Low-moderate (well-documented, official code available)

Weeks 3-4: CIRCLe (Low Complexity, Fast Implementation) - Rationale: Easiest to implement, improves calibration (clinical trust critical) - Expected Output: AUROC gap 10% → 7%, ECE 0.10 → 0.07 - Risk: Low (simple regularization, no complex dependencies)

Weeks 5-6: FairSkin (Highest Absolute Gain, High Complexity) - Rationale: Largest AUROC gain (+18-21%), one-time cost (reuse synthetic dataset) - Expected Output: AUROC gap 7% → 3.5%, achieve <4% Phase 2 target - Risk: Moderate-high (GAN training, quality validation complex)

Week 7: Integration & Tuning - Combine all three techniques - Hyperparameter optimization (loss weights, λ values) - Final evaluation: AUROC gap, EOD, ECE per FST

Week 8: Validation & Documentation - Ablation studies (measure each technique's contribution) - Model card creation - Prepare Phase 3 transition

6.2 Alternative: Parallel Implementation

If 3 Team Members Available: - Member 1: FairSkin (Weeks 1-6, parallel) - Member 2: FairDisCo (Weeks 1-4, then assist integration) - Member 3: CIRCLe (Weeks 1-4, then assist integration) - All: Integration & tuning (Weeks 5-8, collaborative)

Benefits: Reduces timeline from 8 weeks → 6 weeks

Requirements: 3× RTX 3090 GPUs (or equivalent), 3 developers


7. Risk-Adjusted Cost Analysis

7.1 Risk Factors

FairSkin Risks: - GAN mode collapse: 20% probability, +50 GPU hours (retraining) - Poor synthetic quality: 30% probability, +30 GPU hours (tuning) - Integration issues: 15% probability, +1 week human time

FairDisCo Risks: - GRL instability: 25% probability, +10 GPU hours (hyperparameter tuning) - Accuracy drop >3%: 20% probability, +20 GPU hours (rebalancing losses)

CIRCLe Risks: - Insufficient fairness gain: 30% probability, +30 GPU hours (StarGAN training) - Over-regularization: 15% probability, +5 GPU hours (reduce lambda)

7.2 Expected Cost (Risk-Adjusted)

FairSkin: - Base Cost: 110 GPU hours - Risk-Adjusted: 110 + (0.2 × 50) + (0.3 × 30) = 129 GPU hours

FairDisCo: - Base Cost: 25 GPU hours - Risk-Adjusted: 25 + (0.25 × 10) + (0.2 × 20) = 31.5 GPU hours

CIRCLe: - Base Cost: 36 GPU hours - Risk-Adjusted: 36 + (0.3 × 30) + (0.15 × 5) = 45.75 GPU hours

Total Phase 2 (Risk-Adjusted): 206 GPU hours (vs 171 base)

Buffer Recommendation: Plan for 220-240 GPU hours (30% contingency)


8. Cost Optimization Strategies

8.1 Reduce FairSkin Costs

Strategy 1: Use Pre-trained Checkpoints (if available) - Skip LoRA training (saves 20 hours) - Fine-tune only on underrepresented FST (saves 50 hours generation time) - Savings: 70 GPU hours (85 → 15)

Strategy 2: Reduce Synthetic Dataset Size - 60k images → 30k images (50% reduction) - Still covers all (diagnosis × FST) combinations - Savings: 50 GPU hours (generation time halved)

Strategy 3: Progressive Synthetic Augmentation - Start with 10k images, evaluate fairness gain - Generate additional 20k only if needed - Savings: 30-60 GPU hours (avoid unnecessary generation)

8.2 Accelerate FairDisCo Training

Strategy 1: Mixed Precision Training - FP16 instead of FP32 - Speedup: 1.8x (25 hours → 14 hours)

Strategy 2: Gradient Accumulation - Batch size 32 → accumulate 4 steps (effective 128) - Same convergence, lower VRAM - Enables: RTX 3080 usage (cheaper GPU)

Strategy 3: Early Stopping - Monitor EOD on validation set - Stop if no improvement for 20 epochs - Savings: 10-20 GPU hours (avoid overtraining)

8.3 Optimize CIRCLe Efficiency

Strategy 1: Pre-compute Transformations - One-time cost: 4 hours (CPU) - Avoid on-the-fly overhead: Saves 3 min/epoch × 100 = 5 GPU hours

Strategy 2: Single-FST Regularization - Regularize against FST I only (vs both I and VI) - Speedup: 1.5x (30 hours → 20 hours) - Trade-off: -1% fairness gain (acceptable)


9. Timeline & Milestones

9.1 Sequential Implementation (1 Developer)

Week Technique Activities GPU Hours Deliverables
1-2 FairDisCo Setup, training, evaluation 31.5 AUROC gap 20% → 10%, EOD 0.06
3-4 CIRCLe Setup, training, evaluation 45.75 AUROC gap 10% → 7%, ECE 0.07
5-6 FairSkin LoRA training, generation 129 AUROC gap 7% → 3.5%, 60k synthetic images
7 Integration Combine all, hyperparameter tuning 15 Final model: AUROC gap <4%, EOD <0.05
8 Validation Ablation, documentation 5 Model card, ablation report
Total - - 227 GPU hours Phase 2 MVP Complete

9.2 Parallel Implementation (3 Developers)

Week Activities GPU Hours (per developer) Total GPU Hours
1-2 FairDisCo (Dev 1), CIRCLe (Dev 2), FairSkin setup (Dev 3) 31.5, 22.9, 10 64.4
3-4 FairSkin generation (Dev 3), Integration prep (Dev 1+2) 80, 10, 10 100
5-6 Integration (All), Tuning, Validation 20, 20, 20 60
Total - - 224.4 GPU hours
Timeline 6 weeks (vs 8 weeks sequential) - 25% time savings

10. Cost-Effectiveness Conclusion

10.1 Best Value Propositions

For Rapid Prototyping (Week 1-2 Only): - Implement: FairDisCo only - Cost: 31.5 GPU hours, 1 week human time - Impact: AUROC gap 20% → 10% (50% reduction), EOD 0.06 - Use Case: Quick validation of fairness approach

For Phase 2 MVP (8 weeks): - Implement: All three techniques (sequential) - Cost: 227 GPU hours, 8 weeks human time - Impact: AUROC gap <4%, EOD <0.05, ECE <0.08 - Use Case: Full Phase 2 completion, Phase 3 ready

For Aggressive Timeline (6 weeks): - Implement: All three techniques (parallel, 3 developers) - Cost: 224 GPU hours, 6 weeks team time - Impact: Same as above - Use Case: Accelerated Phase 2, resource-rich environment

10.2 Final Recommendations

Minimum Viable Fairness (Phase 2 Entry Threshold): - FairDisCo + CIRCLe (Weeks 1-4) - Cost: 77 GPU hours, 4 weeks - Impact: AUROC gap 20% → 7% (65% reduction) - Decision Point: Evaluate at Week 4, decide if FairSkin needed

Full Phase 2 Target (Recommended): - All three techniques (Weeks 1-8) - Cost: 227 GPU hours, 8 weeks - Impact: AUROC gap <4%, all fairness metrics meet targets - Outcome: Phase 3 ready, production-grade fairness

GPU Investment: 1× RTX 3090 ($1,200) sufficient for entire Phase 2

Total Phase 2 Budget: - GPU hardware: $1,200 (one-time) - Cloud compute (alternative): $227 hours × $1.50/hour (RTX 3090 equivalent) = $340 - Human time: 8 weeks × $5,000/week (developer salary) = $40,000 - Total: $41,200-$41,540 (primarily human cost)

ROI: <4% AUROC gap (clinical viability) = Priceless (enables Phase 3-5 deployment)


11. References

Cost Benchmarks: - Puget Systems. (2024). "Stable Diffusion LoRA Training - GPU Analysis." - Papers with Code. (2024). "Computational Requirements for SOTA Models."

Fairness Impact: - Ju, L., et al. (2024). "FairSkin: Fair Diffusion for Skin Disease Image Generation." - Wind, S., et al. (2022). "FairDisCo: Fairer AI in Dermatology via Disentanglement Contrastive Learning." - Pakzad, A., et al. (2022). "CIRCLe: Color Invariant Representation Learning."

GPU Pricing: - NVIDIA Official Pricing (2025) - Lambda Labs GPU Cloud Pricing - Amazon EC2 P4 Instance Pricing


Document Version: 1.0 Last Updated: 2025-10-13 Author: THE DIDACT (Strategic Research Agent) Status: COMPLETE Next Action: Present to MENDICANT_BIAS for Phase 2 approval