Testing Guide¶
Overview¶
This guide provides comprehensive instructions for running tests, interpreting results, and contributing new tests to the skin cancer detection fairness project.
Table of Contents¶
- Test Infrastructure
- Running Tests
- Test Categories
- Coverage Reports
- Writing New Tests
- Continuous Integration
- Troubleshooting
Test Infrastructure¶
Test Framework¶
We use pytest as our testing framework with the following plugins:
- pytest-cov: Code coverage reporting
- pytest-xdist: Parallel test execution (optional)
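If the plugins are not already installed (e.g., via requirements.txt), they can be added with pip:

```bash
pip install pytest pytest-cov pytest-xdist
```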
Directory Structure¶
```
tests/
├── __init__.py
├── conftest.py                     # Shared fixtures
├── unit/                           # Unit tests
│   ├── __init__.py
│   ├── test_data_preprocessing.py
│   ├── test_fairness_metrics.py
│   ├── test_models.py
│   └── test_utils.py
├── integration/                    # Integration tests
│   ├── __init__.py
│   ├── test_training_pipeline.py
│   └── test_evaluation_pipeline.py
└── fixtures/                       # Test data and mocks
    ├── __init__.py
    └── sample_data.py
```
Configuration Files¶
- pytest.ini: Main pytest configuration
- .coveragerc: Coverage reporting configuration
- conftest.py: Shared fixtures and pytest hooks
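A minimal pytest.ini sketch registering the markers used throughout this guide (the project's actual file may differ):

```ini
[pytest]
testpaths = tests
markers =
    unit: fast, isolated unit tests
    integration: end-to-end workflow tests
    fairness: fairness metric and bias-detection tests
    pipeline: full pipeline tests
    slow: tests taking longer than 5 seconds
    requires_gpu: tests that need a CUDA device
```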
Running Tests¶
Basic Test Execution¶
Run all tests:
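```bash
pytest
```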
Run with verbose output:
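```bash
pytest -v
```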
Run with detailed output and print statements:
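```bash
pytest -v -s
```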
Running Specific Test Categories¶
Run only unit tests:
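```bash
pytest -m unit          # or: pytest tests/unit/
```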
Run only integration tests:
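```bash
pytest -m integration   # or: pytest tests/integration/
```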
Run only fairness-related tests:
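```bash
pytest -m fairness
```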
Run only model tests:
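```bash
pytest tests/unit/test_models.py   # or: pytest -k model
```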
Running Specific Test Files¶
Run a specific test file:
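```bash
pytest tests/unit/test_fairness_metrics.py
```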
Run a specific test class:
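(The class name below is illustrative, following the Test&lt;Functionality&gt; convention.)

```bash
pytest tests/unit/test_models.py::TestModelInitialization
```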
Run a specific test function:
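(Class and function names below are illustrative.)

```bash
pytest tests/unit/test_models.py::TestModelInitialization::test_output_shape
```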
Excluding Slow Tests¶
Skip slow tests (>5 seconds):
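```bash
pytest -m "not slow"
```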
Parallel Execution¶
Run tests in parallel (requires pytest-xdist):
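```bash
pytest -n auto   # one worker per CPU core
```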
Run with 4 parallel workers:
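```bash
pytest -n 4
```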
Running Tests Requiring GPU¶
Run GPU-specific tests (requires CUDA):
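```bash
pytest -m requires_gpu
```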
Skip GPU tests:
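```bash
pytest -m "not requires_gpu"
```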
Test Categories¶
Unit Tests¶
Purpose: Test individual functions and classes in isolation.
Characteristics:
- Fast execution (<1 second per test)
- No external dependencies
- Use mock data
- High code coverage
Examples:
- Image normalization functions
- Fairness metric calculations
- Model initialization
- Configuration loading
Markers: @pytest.mark.unit
Integration Tests¶
Purpose: Test end-to-end workflows and component interactions.
Characteristics:
- Slower execution (1-30 seconds per test)
- Test multiple components together
- Verify complete pipelines
- Use realistic data flows
Examples:
- Full training loop
- Evaluation pipeline
- Checkpoint saving/loading
- Multi-model comparison
Markers: @pytest.mark.integration, @pytest.mark.pipeline
Fairness Tests¶
Purpose: Validate fairness metrics and bias detection.
Characteristics:
- Test demographic parity
- Verify equalized odds
- Validate calibration
- FST-stratified analysis
Examples:
- AUROC per FST group
- Equalized Odds Difference (EOD)
- Expected Calibration Error (ECE)
- Confusion matrix per demographic
Markers: @pytest.mark.fairness
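A minimal, self-contained sketch of this style of test (the toy data and inline TPR helper are illustrative, not the project's actual metric implementation):

```python
import numpy as np
import pytest

@pytest.mark.unit
@pytest.mark.fairness
def test_tpr_gap_is_zero_for_identical_groups():
    # Two demographic groups with identical labels and predictions
    y_true = np.array([1, 1, 0, 0, 1, 1, 0, 0])
    y_pred = np.array([1, 0, 0, 0, 1, 0, 0, 0])
    group = np.array([0, 0, 0, 0, 1, 1, 1, 1])

    def tpr(mask):
        # True-positive rate within one group
        positives = (y_true == 1) & mask
        return (y_pred[positives] == 1).mean()

    # Equalized odds requires equal TPRs across groups;
    # identical groups must therefore yield a zero gap
    gap = abs(tpr(group == 0) - tpr(group == 1))
    assert gap == pytest.approx(0.0)
```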
Slow Tests¶
Purpose: Comprehensive tests that take >5 seconds.
Characteristics:
- Multi-epoch training
- Large dataset processing
- Extensive model evaluation
Markers: @pytest.mark.slow
Coverage Reports¶
Generating Coverage Reports¶
Run tests with coverage:
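```bash
pytest --cov=src
```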
Coverage Report Types¶
Terminal Report:
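```bash
pytest --cov=src --cov-report=term-missing
```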
Shows coverage with missing line numbers in the terminal.
HTML Report:
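```bash
pytest --cov=src --cov-report=html
```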
Generates a detailed HTML report at htmlcov/index.html.
XML Report (for CI):
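```bash
pytest --cov=src --cov-report=xml
```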
Generates coverage.xml for tools like Codecov.
Coverage Targets¶
Project-wide targets:
- Overall: >80% code coverage
- Critical modules (fairness, models): >90%
Viewing Coverage:
```bash
# Open HTML report
open htmlcov/index.html      # macOS
xdg-open htmlcov/index.html  # Linux
start htmlcov/index.html     # Windows
```
Interpreting Coverage¶
Coverage metrics:
- Statements: Lines executed
- Branches: Conditional paths taken
- Missing: Lines not covered by tests
Example output:
```
Name                         Stmts   Miss  Cover   Missing
-----------------------------------------------------------
src/data/preprocessing.py       45      3    93%   12, 45, 67
src/fairness/metrics.py         78      5    94%   23, 89-92
src/models/resnet.py            56      8    86%   34, 67-73
-----------------------------------------------------------
TOTAL                          179     16    91%
```
Writing New Tests¶
Test Naming Conventions¶
Files: test_<module_name>.py
Classes: Test<Functionality>
Functions: test_<what_is_being_tested>
Test Structure (AAA Pattern)¶
```python
def test_example():
    # Arrange: Set up test data and conditions
    model = create_model()
    data = load_test_data()

    # Act: Execute the functionality being tested
    result = model.predict(data)

    # Assert: Verify expected outcomes
    assert result.shape == (10, 7)
    assert torch.isfinite(result).all()
```
Using Fixtures¶
Accessing shared fixtures (from conftest.py):
```python
def test_with_fixture(mock_dataloader, device):
    # Use pre-configured dataloader and device
    batch = next(iter(mock_dataloader))
    assert batch['image'].device.type == device.type
```
Creating local fixtures:
```python
@pytest.fixture
def custom_dataset():
    return MockHAM10000(num_samples=50, seed=42)

def test_custom(custom_dataset):
    assert len(custom_dataset) == 50
```
Parametrized Tests¶
Test multiple inputs efficiently:
```python
@pytest.mark.parametrize("batch_size,expected", [
    (8, (8, 3, 224, 224)),
    (16, (16, 3, 224, 224)),
    (32, (32, 3, 224, 224)),
])
def test_batch_shapes(batch_size, expected):
    # Build the batch from batch_size so the assertion actually checks something
    data = torch.randn(batch_size, 3, 224, 224)
    assert data.shape == expected
```
Marking Tests¶
Add markers to categorize tests:
```python
@pytest.mark.unit
@pytest.mark.fairness
def test_fairness_metric():
    pass

@pytest.mark.integration
@pytest.mark.slow
def test_full_training():
    pass

@pytest.mark.requires_gpu
def test_cuda_operations():
    pass
```
Testing Exceptions¶
Verify error handling:
```python
def test_invalid_input_raises_error():
    with pytest.raises(ValueError, match="Expected 4D tensor"):
        process_image(torch.randn(3, 224))  # Wrong dimensions
```
Skipping Tests¶
Skip tests conditionally:
```python
@pytest.mark.skipif(not torch.cuda.is_available(),
                    reason="CUDA not available")
def test_gpu_training():
    pass
```
Approximate Comparisons¶
For floating-point comparisons:
```python
def test_loss_value():
    loss = compute_loss(predictions, labels)
    assert loss == pytest.approx(0.5, abs=0.01)  # Within 0.01
```
Continuous Integration¶
GitHub Actions Example¶
```yaml
name: Tests
on: [push, pull_request]

jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - name: Set up Python
        uses: actions/setup-python@v4
        with:
          python-version: '3.9'
      - name: Install dependencies
        run: |
          pip install -r requirements.txt
          pip install pytest pytest-cov
      - name: Run tests
        run: |
          pytest --cov=src --cov-report=xml --cov-report=term
      - name: Upload coverage
        uses: codecov/codecov-action@v3
        with:
          files: ./coverage.xml
```
Pre-commit Hooks¶
Run tests before committing:
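One option is a local pre-commit hook; a minimal .pre-commit-config.yaml sketch (the hook id and marker filter are illustrative):

```yaml
repos:
  - repo: local
    hooks:
      - id: pytest-fast
        name: Run fast tests
        entry: pytest -m "not slow"
        language: system
        pass_filenames: false
```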
Troubleshooting¶
Common Issues¶
Issue: Tests fail with "ModuleNotFoundError"
Solution: Install the project in editable mode (pip install -e .) or add the repository root to PYTHONPATH so that src is importable
Issue: CUDA out of memory in GPU tests
Solution: Reduce batch sizes in GPU tests and avoid running them in parallel; calling torch.cuda.empty_cache() between tests can also help
Issue: Tests are too slow
Solution: Skip slow tests with pytest -m "not slow", or run in parallel with pytest -n auto
Issue: Fixture not found
Solution: Check the fixture is defined in conftest.py or the test file, and ensure the fixture name matches the function parameter exactly
Debugging Tests¶
Run with Python debugger:
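```bash
pytest --pdb   # drop into pdb on the first failure
```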
Run with verbose traceback:
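```bash
pytest --tb=long
```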
Print output during test:
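```bash
pytest -s   # disable output capturing
```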
Performance Profiling¶
Profile test execution time:
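```bash
pytest --durations=10   # show the 10 slowest tests
```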
Best Practices¶
DO:¶
- Write tests for all new features
- Keep tests independent (no shared state)
- Use descriptive test names
- Test edge cases and error conditions
- Mock external dependencies
- Keep tests fast (<1s for unit tests)
- Aim for >80% code coverage
DON'T:¶
- Write tests that depend on external services
- Use hard-coded file paths
- Test implementation details (test behavior, not internals)
- Write overly complex test logic
- Skip writing tests for "simple" code
Resources¶
- Pytest Documentation: https://docs.pytest.org/
- Coverage.py Documentation: https://coverage.readthedocs.io/
- Python Testing Best Practices: https://realpython.com/pytest-python-testing/
Contact¶
For questions about testing:
- Check existing tests for examples
- Review this guide
- Consult team documentation
Last Updated: 2025-10-13