Testing Guide¶
Overview¶
This guide provides comprehensive instructions for running tests, interpreting results, and contributing new tests to the skin cancer detection fairness project.
Table of Contents¶
- Test Infrastructure
- Running Tests
- Test Categories
- Coverage Reports
- Writing New Tests
- Continuous Integration
- Troubleshooting
Test Infrastructure¶
Test Framework¶
We use pytest as our testing framework with the following plugins:
- pytest-cov: Code coverage reporting
- pytest-xdist: Parallel test execution (optional)
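If the plugins are not already installed (e.g., via requirements.txt), they can be added with pip:

```bash
pip install pytest pytest-cov pytest-xdist
```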
Directory Structure¶
```
tests/
├── __init__.py
├── conftest.py                     # Shared fixtures
├── unit/                           # Unit tests
│   ├── __init__.py
│   ├── test_data_preprocessing.py
│   ├── test_fairness_metrics.py
│   ├── test_models.py
│   └── test_utils.py
├── integration/                    # Integration tests
│   ├── __init__.py
│   ├── test_training_pipeline.py
│   └── test_evaluation_pipeline.py
└── fixtures/                       # Test data and mocks
    ├── __init__.py
    └── sample_data.py
```
Configuration Files¶
- pytest.ini: Main pytest configuration
- .coveragerc: Coverage reporting configuration
- conftest.py: Shared fixtures and pytest hooks
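A minimal pytest.ini sketch registering the markers used throughout this guide (the project's actual file may differ):

```ini
[pytest]
testpaths = tests
markers =
    unit: fast, isolated unit tests
    integration: end-to-end workflow tests
    fairness: fairness metric and bias-detection tests
    pipeline: full pipeline tests
    slow: tests taking longer than 5 seconds
    requires_gpu: tests that need a CUDA device
```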
Running Tests¶
Basic Test Execution¶
Run all tests:
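```bash
pytest
```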
Run with verbose output:
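```bash
pytest -v
```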
Run with detailed output and print statements:
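```bash
pytest -v -s
```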
Running Specific Test Categories¶
Run only unit tests:
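```bash
pytest -m unit          # or: pytest tests/unit/
```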
Run only integration tests:
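```bash
pytest -m integration   # or: pytest tests/integration/
```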
Run only fairness-related tests:
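```bash
pytest -m fairness
```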
Run only model tests:
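```bash
pytest tests/unit/test_models.py   # or: pytest -k model
```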
Running Specific Test Files¶
Run a specific test file:
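```bash
pytest tests/unit/test_fairness_metrics.py
```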
Run a specific test class:
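(The class name below is illustrative, following the Test&lt;Functionality&gt; convention.)

```bash
pytest tests/unit/test_models.py::TestModelInitialization
```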
Run a specific test function:
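(Class and function names below are illustrative.)

```bash
pytest tests/unit/test_models.py::TestModelInitialization::test_output_shape
```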
Excluding Slow Tests¶
Skip slow tests (>5 seconds):
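```bash
pytest -m "not slow"
```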
Parallel Execution¶
Run tests in parallel (requires pytest-xdist):
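```bash
pytest -n auto   # one worker per CPU core
```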
Run with 4 parallel workers:
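```bash
pytest -n 4
```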
Running Tests Requiring GPU¶
Run GPU-specific tests (requires CUDA):
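```bash
pytest -m requires_gpu
```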
Skip GPU tests:
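```bash
pytest -m "not requires_gpu"
```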
Test Categories¶
Unit Tests¶
Purpose: Test individual functions and classes in isolation.
Characteristics:
- Fast execution (<1 second per test)
- No external dependencies
- Use mock data
- High code coverage
Examples:
- Image normalization functions
- Fairness metric calculations
- Model initialization
- Configuration loading
Markers: @pytest.mark.unit
Integration Tests¶
Purpose: Test end-to-end workflows and component interactions.
Characteristics:
- Slower execution (1-30 seconds per test)
- Test multiple components together
- Verify complete pipelines
- Use realistic data flows
Examples:
- Full training loop
- Evaluation pipeline
- Checkpoint saving/loading
- Multi-model comparison
Markers: @pytest.mark.integration, @pytest.mark.pipeline
Fairness Tests¶
Purpose: Validate fairness metrics and bias detection.
Characteristics:
- Test demographic parity
- Verify equalized odds
- Validate calibration
- FST-stratified analysis
Examples:
- AUROC per FST group
- Equalized Odds Difference (EOD)
- Expected Calibration Error (ECE)
- Confusion matrix per demographic
Markers: @pytest.mark.fairness
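A minimal, self-contained sketch of this style of test (the toy data and inline TPR helper are illustrative, not the project's actual metric implementation):

```python
import numpy as np
import pytest

@pytest.mark.unit
@pytest.mark.fairness
def test_tpr_gap_is_zero_for_identical_groups():
    # Two demographic groups with identical labels and predictions
    y_true = np.array([1, 1, 0, 0, 1, 1, 0, 0])
    y_pred = np.array([1, 0, 0, 0, 1, 0, 0, 0])
    group = np.array([0, 0, 0, 0, 1, 1, 1, 1])

    def tpr(mask):
        # True-positive rate within one group
        positives = (y_true == 1) & mask
        return (y_pred[positives] == 1).mean()

    # Equalized odds requires equal TPRs across groups;
    # identical groups must therefore yield a zero gap
    gap = abs(tpr(group == 0) - tpr(group == 1))
    assert gap == pytest.approx(0.0)
```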
Slow Tests¶
Purpose: Comprehensive tests that take >5 seconds.
Characteristics:
- Multi-epoch training
- Large dataset processing
- Extensive model evaluation
Markers: @pytest.mark.slow
Coverage Reports¶
Generating Coverage Reports¶
Run tests with coverage:
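```bash
pytest --cov=src
```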
Coverage Report Types¶
Terminal Report:
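```bash
pytest --cov=src --cov-report=term-missing
```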
Shows coverage with missing line numbers in the terminal.
HTML Report:
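```bash
pytest --cov=src --cov-report=html
```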
Generates a detailed HTML report at htmlcov/index.html.
XML Report (for CI):
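```bash
pytest --cov=src --cov-report=xml
```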
Generates coverage.xml for tools like Codecov.
Coverage Targets¶
Project-wide targets:
- Overall: >80% code coverage
- Critical modules (fairness, models): >90%
Viewing Coverage:
```bash
# Open HTML report
open htmlcov/index.html      # macOS
xdg-open htmlcov/index.html  # Linux
start htmlcov/index.html     # Windows
```
Interpreting Coverage¶
Coverage metrics:
- Statements: Lines executed
- Branches: Conditional paths taken
- Missing: Lines not covered by tests
Example output:
```
Name                         Stmts   Miss  Cover   Missing
-----------------------------------------------------------
src/data/preprocessing.py       45      3    93%   12, 45, 67
src/fairness/metrics.py         78      5    94%   23, 89-92
src/models/resnet.py            56      8    86%   34, 67-73
-----------------------------------------------------------
TOTAL                          179     16    91%
```
Writing New Tests¶
Test Naming Conventions¶
Files: test_<module_name>.py
Classes: Test<Functionality>
Functions: test_<what_is_being_tested>
Test Structure (AAA Pattern)¶
```python
def test_example():
    # Arrange: Set up test data and conditions
    model = create_model()
    data = load_test_data()

    # Act: Execute the functionality being tested
    result = model.predict(data)

    # Assert: Verify expected outcomes
    assert result.shape == (10, 7)
    assert torch.isfinite(result).all()
```
Using Fixtures¶
Accessing shared fixtures (from conftest.py):
```python
def test_with_fixture(mock_dataloader, device):
    # Use pre-configured dataloader and device
    batch = next(iter(mock_dataloader))
    assert batch['image'].device.type == device.type
```
Creating local fixtures:
```python
@pytest.fixture
def custom_dataset():
    return MockHAM10000(num_samples=50, seed=42)

def test_custom(custom_dataset):
    assert len(custom_dataset) == 50
```
Parametrized Tests¶
Test multiple inputs efficiently:
```python
@pytest.mark.parametrize("batch_size,expected", [
    (8, (8, 3, 224, 224)),
    (16, (16, 3, 224, 224)),
    (32, (32, 3, 224, 224)),
])
def test_batch_shapes(batch_size, expected):
    # Build the batch from batch_size so the assertion actually checks something
    data = torch.randn(batch_size, 3, 224, 224)
    assert data.shape == expected
```
Marking Tests¶
Add markers to categorize tests:
```python
@pytest.mark.unit
@pytest.mark.fairness
def test_fairness_metric():
    pass

@pytest.mark.integration
@pytest.mark.slow
def test_full_training():
    pass

@pytest.mark.requires_gpu
def test_cuda_operations():
    pass
```
Testing Exceptions¶
Verify error handling:
```python
def test_invalid_input_raises_error():
    with pytest.raises(ValueError, match="Expected 4D tensor"):
        process_image(torch.randn(3, 224))  # Wrong dimensions
```
Skipping Tests¶
Skip tests conditionally:
```python
@pytest.mark.skipif(not torch.cuda.is_available(),
                    reason="CUDA not available")
def test_gpu_training():
    pass
```
Approximate Comparisons¶
For floating-point comparisons:
```python
def test_loss_value():
    loss = compute_loss(predictions, labels)
    assert loss == pytest.approx(0.5, abs=0.01)  # Within 0.01
```
Continuous Integration¶
GitHub Actions Example¶
```yaml
name: Tests
on: [push, pull_request]

jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - name: Set up Python
        uses: actions/setup-python@v4
        with:
          python-version: '3.9'
      - name: Install dependencies
        run: |
          pip install -r requirements.txt
          pip install pytest pytest-cov
      - name: Run tests
        run: |
          pytest --cov=src --cov-report=xml --cov-report=term
      - name: Upload coverage
        uses: codecov/codecov-action@v3
        with:
          files: ./coverage.xml
```
Pre-commit Hooks¶
Run tests before committing:
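One option is a local pre-commit hook; a minimal .pre-commit-config.yaml sketch (the hook id and marker filter are illustrative):

```yaml
repos:
  - repo: local
    hooks:
      - id: pytest-fast
        name: Run fast tests
        entry: pytest -m "not slow"
        language: system
        pass_filenames: false
```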
Troubleshooting¶
Common Issues¶
Issue: Tests fail with "ModuleNotFoundError"
Solution: Install the project in editable mode (pip install -e .) or add the repository root to PYTHONPATH so that src is importable
Issue: CUDA out of memory in GPU tests
Solution: Reduce batch sizes in GPU tests and avoid running them in parallel; calling torch.cuda.empty_cache() between tests can also help
Issue: Tests are too slow
Solution: Skip slow tests with pytest -m "not slow", or run in parallel with pytest -n auto
Issue: Fixture not found
Solution: Check the fixture is defined in conftest.py or the test file, and ensure the fixture name matches the function parameter exactly
Debugging Tests¶
Run with Python debugger:
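```bash
pytest --pdb   # drop into pdb on the first failure
```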
Run with verbose traceback:
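```bash
pytest --tb=long
```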
Print output during test:
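```bash
pytest -s   # disable output capturing
```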
Performance Profiling¶
Profile test execution time:
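```bash
pytest --durations=10   # show the 10 slowest tests
```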
Best Practices¶
DO:¶
- Write tests for all new features
- Keep tests independent (no shared state)
- Use descriptive test names
- Test edge cases and error conditions
- Mock external dependencies
- Keep tests fast (<1s for unit tests)
- Aim for >80% code coverage
DON'T:¶
- Write tests that depend on external services
- Use hard-coded file paths
- Test implementation details (test behavior, not internals)
- Write overly complex test logic
- Skip writing tests for "simple" code
Resources¶
- Pytest Documentation: https://docs.pytest.org/
- Coverage.py Documentation: https://coverage.readthedocs.io/
- Python Testing Best Practices: https://realpython.com/pytest-python-testing/
Contact¶
For questions about testing:
- Check existing tests for examples
- Review this guide
- Consult team documentation
Last Updated: 2025-10-13