RAG Service API Reference

Retrieval-Augmented Generation (RAG) service that grounds LLM responses in security knowledge bases, using ChromaDB vector search over semantic embeddings.

Service Overview

Property	Value
Base URL	`http://rag-service:8002` (internal), `https://api.ai-soc.example.com:8300` (external)
Protocol	HTTP/HTTPS (REST)
Content Type	`application/json`
Authentication	API Key (Bearer token) or JWT
Vector Database	ChromaDB (persistent storage)
Embedding Model	nomic-embed-text (137M parameters, 768 dimensions)
Search Algorithm	HNSW (Hierarchical Navigable Small World)
Latency	50-200 ms typical (query embedding + retrieval)

Authentication

All endpoints except /health require authentication.

API Key Authentication

POST /retrieve HTTP/1.1
Host: rag-service:8002
Authorization: Bearer aisoc_<your-api-key>
Content-Type: application/json

JWT Authentication

POST /retrieve HTTP/1.1
Host: rag-service:8002
Authorization: Bearer eyJhbGc...
Content-Type: application/json

Endpoints

GET /health

Health check endpoint for monitoring vector database connectivity.

Request

GET /health HTTP/1.1
Host: rag-service:8002

Response

Status: 200 OK

{
  "status": "healthy",
  "service": "rag-service",
  "version": "1.0.0",
  "chromadb_connected": true
}

Response Fields:

Field	Type	Description
`status`	string	Service status: `healthy`, `degraded`, `unhealthy`
`service`	string	Service identifier
`version`	string	API version
`chromadb_connected`	boolean	Whether ChromaDB is reachable
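For readiness probes or client startup checks, the /health payload can be reduced to a single yes/no. The sketch below is an illustration under stated assumptions: it treats only `healthy` with `chromadb_connected: true` as ready (counting `degraded` as not ready is a policy choice, not documented behavior), and the helper names `fetch_health` and `is_ready` are hypothetical.

```python
import json
import urllib.request


def fetch_health(base_url: str = "http://rag-service:8002", timeout: float = 2.0) -> dict:
    """GET /health (no authentication required) and decode the JSON body.

    urllib raises HTTPError for non-2xx statuses, so callers should catch it.
    """
    with urllib.request.urlopen(f"{base_url}/health", timeout=timeout) as resp:
        return json.load(resp)


def is_ready(health: dict) -> bool:
    """True only when the service is healthy and ChromaDB is reachable."""
    # Treating "degraded" as not ready is a choice made for this sketch
    return health.get("status") == "healthy" and health.get("chromadb_connected") is True
```

Usage: `is_ready(fetch_health())` returns `True` for the sample response above.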

POST /retrieve

Retrieve relevant context from a knowledge base collection using semantic search.

Request

POST /retrieve HTTP/1.1
Host: rag-service:8002
Authorization: Bearer aisoc_<your-api-key>
Content-Type: application/json

{
  "query": "What are common brute force attack techniques?",
  "collection": "mitre_attack",
  "top_k": 3,
  "min_similarity": 0.7
}

Request Body Schema:

{
  "type": "object",
  "required": ["query"],
  "properties": {
    "query": {
      "type": "string",
      "minLength": 1,
      "description": "Search query for semantic matching"
    },
    "collection": {
      "type": "string",
      "default": "mitre_attack",
      "enum": ["mitre_attack", "cve_database", "incident_history", "security_runbooks"],
      "description": "Knowledge base collection name"
    },
    "top_k": {
      "type": "integer",
      "minimum": 1,
      "maximum": 10,
      "default": 3,
      "description": "Maximum number of results to return"
    },
    "min_similarity": {
      "type": "number",
      "minimum": 0.0,
      "maximum": 1.0,
      "default": 0.7,
      "description": "Minimum cosine similarity threshold (0.0-1.0)"
    }
  }
}
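Checking a request locally before sending it turns a 400 `invalid_query` round trip into an immediate error. The sketch mirrors the constraints in the schema above; the function name `build_retrieve_payload` is hypothetical, and the server remains the authority on validation.

```python
from typing import Any

# Collection names from the `enum` in the schema above
COLLECTIONS = frozenset({"mitre_attack", "cve_database", "incident_history", "security_runbooks"})


def build_retrieve_payload(
    query: str,
    collection: str = "mitre_attack",
    top_k: int = 3,
    min_similarity: float = 0.7,
) -> dict[str, Any]:
    """Check arguments against the /retrieve schema and return a JSON-ready body."""
    if not isinstance(query, str) or len(query) < 1:
        raise ValueError("query must be a non-empty string (minLength 1)")
    if collection not in COLLECTIONS:
        raise ValueError(f"unknown collection: {collection!r}")
    # bool is a subclass of int, so exclude it explicitly
    if isinstance(top_k, bool) or not isinstance(top_k, int) or not 1 <= top_k <= 10:
        raise ValueError("top_k must be an integer between 1 and 10")
    if isinstance(min_similarity, bool) or not isinstance(min_similarity, (int, float)):
        raise ValueError("min_similarity must be a number")
    if not 0.0 <= min_similarity <= 1.0:
        raise ValueError("min_similarity must be between 0.0 and 1.0")
    return {
        "query": query,
        "collection": collection,
        "top_k": top_k,
        "min_similarity": float(min_similarity),
    }
```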

Response (Success)

Status: 200 OK

{
  "query": "What are common brute force attack techniques?",
  "results": [
    {
      "document": "MITRE ATT&CK T1110: Brute Force - Adversaries may use brute force techniques to gain access to accounts when passwords are unknown or when password hashes are obtained. Without knowledge of the password for an account or set of accounts, an adversary may systematically guess the password using a repetitive or iterative mechanism.",
      "metadata": {
        "technique_id": "T1110",
        "tactic": "Credential Access",
        "sub_techniques": ["T1110.001", "T1110.002", "T1110.003", "T1110.004"],
        "data_sources": ["Authentication logs", "Application logs"],
        "mitigations": ["M1036", "M1027", "M1032"]
      },
      "similarity_score": 0.92
    },
    {
      "document": "T1110.001 - Password Guessing: Adversaries may use brute force techniques to attempt access to accounts by guessing passwords. This technique involves trying common passwords, dictionary words, or systematically generated passwords until the correct one is found.",
      "metadata": {
        "technique_id": "T1110.001",
        "parent_technique": "T1110",
        "tactic": "Credential Access",
        "detection": "Monitor authentication logs for multiple failed attempts"
      },
      "similarity_score": 0.89
    },
    {
      "document": "T1110.003 - Password Spraying: Adversaries may use a single or small list of commonly used passwords against many different accounts to attempt to acquire valid account credentials. This technique avoids account lockouts by trying one password against multiple accounts.",
      "metadata": {
        "technique_id": "T1110.003",
        "parent_technique": "T1110",
        "tactic": "Credential Access"
      },
      "similarity_score": 0.85
    }
  ],
  "total_results": 3
}

Response Fields:

Field	Type	Description
`query`	string	Original search query (echo)
`results`	array	Matching documents with metadata
`results[].document`	string	Retrieved text content
`results[].metadata`	object	Document metadata (technique IDs, tactics, etc.)
`results[].similarity_score`	float	Cosine similarity (0.0-1.0)
`total_results`	integer	Number of results returned

Response (No Results Found)

Status: 200 OK (empty results)

{
  "query": "quantum entanglement in cybersecurity",
  "results": [],
  "total_results": 0
}

Interpretation: No document scored above the min_similarity threshold (default 0.7). An empty result set is returned with 200 OK, not as an error.
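The interaction between `top_k` and `min_similarity` explains why a query can return fewer than `top_k` results, or none at all. The sketch below illustrates those semantics with exact, brute-force cosine scoring over toy 2-dimensional vectors; the real service uses 768-dimensional nomic-embed-text vectors and an approximate HNSW index, and whether its threshold test is `>=` or `>` is an assumption here. If the service relies on ChromaDB's `cosine` space, ChromaDB reports cosine distance, which converts to similarity as `1 - distance`. The names `cosine_similarity` and `rank` are hypothetical.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank(
    query_vec: list[float],
    docs: list[tuple[str, list[float]]],
    top_k: int = 3,
    min_similarity: float = 0.7,
) -> list[dict]:
    """Exact (brute-force) search: score every document, drop those below the
    threshold, then return the top_k best matches in descending order."""
    scored = [(text, cosine_similarity(query_vec, vec)) for text, vec in docs]
    kept = [(text, score) for text, score in scored if score >= min_similarity]
    kept.sort(key=lambda item: item[1], reverse=True)
    return [{"document": text, "similarity_score": round(score, 4)} for text, score in kept[:top_k]]
```

With the default threshold, a document scoring 0.6 is dropped even when fewer than `top_k` results remain.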

Response (Collection Not Found)

Status: 404 Not Found

{
  "error": "Collection not found",
  "detail": "Collection 'invalid_collection' does not exist",
  "available_collections": ["mitre_attack", "cve_database", "incident_history", "security_runbooks"]
}

POST /ingest

Ingest documents into a knowledge base collection.

Request

POST /ingest HTTP/1.1
Host: rag-service:8002
Authorization: Bearer aisoc_<your-api-key>
Content-Type: application/json

{
  "collection": "incident_history",
  "documents": [
    {
      "text": "Ransomware incident on 2025-10-15 affecting file servers. Attack vector: phishing email with malicious attachment. Impact: 50 workstations encrypted. Response: Restored from backups, implemented email filtering.",
      "metadata": {
        "incident_id": "INC-2025-001",
        "date": "2025-10-15",
        "severity": "HIGH",
        "attack_type": "Ransomware",
        "resolution_status": "Resolved"
      }
    },
    {
      "text": "SQL injection attempt detected on web application. Attack blocked by WAF. No data exfiltration occurred. Patched vulnerability in login form.",
      "metadata": {
        "incident_id": "INC-2025-002",
        "date": "2025-10-18",
        "severity": "MEDIUM",
        "attack_type": "SQL Injection",
        "resolution_status": "Resolved"
      }
    }
  ]
}

Request Body Schema:

{
  "type": "object",
  "required": ["collection", "documents"],
  "properties": {
    "collection": {
      "type": "string",
      "description": "Target collection name"
    },
    "documents": {
      "type": "array",
      "minItems": 1,
      "maxItems": 100,
      "items": {
        "type": "object",
        "properties": {
          "text": {
            "type": "string",
            "minLength": 1,
            "description": "Document text content"
          },
          "metadata": {
            "type": "object",
            "description": "Arbitrary metadata fields"
          }
        },
        "required": ["text"]
      }
    }
  }
}
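Because a single /ingest call accepts at most 100 documents, larger imports have to be split client-side. A minimal sketch; the generator name `ingest_batches` is hypothetical, and sending each yielded body with an HTTP client is left to the caller.

```python
from typing import Any, Iterator

MAX_BATCH = 100  # `maxItems` for `documents` in the /ingest schema


def ingest_batches(
    documents: list[dict[str, Any]],
    collection: str,
    batch_size: int = MAX_BATCH,
) -> Iterator[dict[str, Any]]:
    """Yield /ingest request bodies holding at most batch_size documents each."""
    if not 1 <= batch_size <= MAX_BATCH:
        raise ValueError(f"batch_size must be between 1 and {MAX_BATCH}")
    # Reject documents the schema would refuse (`text` is required, minLength 1)
    for i, doc in enumerate(documents):
        text = doc.get("text")
        if not isinstance(text, str) or not text:
            raise ValueError(f"document {i} has no text")
    for start in range(0, len(documents), batch_size):
        yield {"collection": collection, "documents": documents[start:start + batch_size]}
```

Because this is a generator, validation errors surface on the first iteration rather than at call time.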

Response (Success)

Status: 201 Created

{
  "status": "success",
  "collection": "incident_history",
  "documents_added": 2,
  "embedding_time_ms": 45,
  "indexing_time_ms": 23,
  "total_time_ms": 68
}

Response (Partial Success)

Status: 207 Multi-Status

{
  "status": "partial_success",
  "collection": "incident_history",
  "documents_added": 8,
  "documents_failed": 2,
  "failed_documents": [
    {
      "index": 3,
      "error": "Empty text field"
    },
    {
      "index": 7,
      "error": "Text exceeds maximum length (10,000 characters)"
    }
  ]
}
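A 207 response reports failures by position, so a client has to map each `index` back to its own batch before fixing or logging the rejected documents. The sketch assumes `index` is the 0-based position in the request's `documents` array (the example, with failures at 3 and 7 out of 10 documents, fits either convention); the helper name `failed_documents` is hypothetical.

```python
def failed_documents(batch: list[dict], response: dict) -> list[tuple[dict, str]]:
    """Pair each entry in `failed_documents` with the document originally sent.

    Assumes `index` is the 0-based position in the request's `documents` array.
    """
    if response.get("status") != "partial_success":
        return []
    return [(batch[f["index"]], f["error"]) for f in response.get("failed_documents", [])]
```

Neither error in the example is fixed by resending the same text, so the returned documents need correcting (or splitting) before a retry.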

GET /collections

List available knowledge base collections with statistics.

Request

GET /collections HTTP/1.1
Host: rag-service:8002
Authorization: Bearer aisoc_<your-api-key>

Response

Status: 200 OK

{
  "collections": [
    {
      "name": "mitre_attack",
      "description": "MITRE ATT&CK techniques and tactics (version 14.0)",
      "document_count": 793,
      "status": "ready",
      "last_updated": "2025-10-20T12:00:00Z",
      "embedding_dimensions": 768,
      "index_type": "HNSW"
    },
    {
      "name": "cve_database",
      "description": "High-severity vulnerabilities (CVSS >= 7.0)",
      "document_count": 2547,
      "status": "ready",
      "last_updated": "2025-10-23T08:00:00Z",
      "embedding_dimensions": 768,
      "index_type": "HNSW"
    },
    {
      "name": "incident_history",
      "description": "Resolved security incidents from TheHive",
      "document_count": 156,
      "status": "ready",
      "last_updated": "2025-10-24T10:15:30Z",
      "embedding_dimensions": 768,
      "index_type": "HNSW"
    },
    {
      "name": "security_runbooks",
      "description": "Incident response playbooks and procedures",
      "document_count": 42,
      "status": "ready",
      "last_updated": "2025-10-15T14:30:00Z",
      "embedding_dimensions": 768,
      "index_type": "HNSW"
    }
  ],
  "total_collections": 4,
  "total_documents": 3538
}

DELETE /collections/{collection_name}

Delete an entire collection (admin only).

Request

DELETE /collections/test_collection HTTP/1.1
Host: rag-service:8002
Authorization: Bearer aisoc_<admin-api-key>

Response

Status: 204 No Content

Error Codes

HTTP Status	Error Code	Description
400	`invalid_query`	Query validation failed (empty or malformed)
401	`unauthorized`	Missing or invalid authentication
404	`collection_not_found`	Specified collection does not exist
413	`payload_too_large`	Document batch exceeds size limit
429	`rate_limit_exceeded`	Request quota exhausted
500	`internal_error`	Unexpected server error
503	`chromadb_unavailable`	Vector database unreachable
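Not every status in the table is worth retrying. A reasonable client policy, sketched below as an assumption rather than documented service behavior, retries only 429 and 503 using capped exponential backoff with random ("full") jitter, and treats 500 as non-retryable (a judgment call; some clients retry it once). All names are hypothetical.

```python
import random

# 429 rate_limit_exceeded and 503 chromadb_unavailable are usually transient;
# validation and auth errors (4xx) fail the same way on every retry.
RETRYABLE_STATUSES = frozenset({429, 503})


def should_retry(status: int) -> bool:
    return status in RETRYABLE_STATUSES


def backoff_ceilings(attempts: int, base: float = 0.5, cap: float = 8.0) -> list[float]:
    """Exponential backoff upper bounds: base * 2**n seconds, capped at `cap`."""
    return [min(cap, base * 2 ** n) for n in range(attempts)]


def jittered(ceiling: float) -> float:
    """Full jitter: sleep a uniformly random time in [0, ceiling]."""
    return random.uniform(0.0, ceiling)
```

For a 429, the reset time in the rate-limit headers is a better wait hint than a blind backoff when it is available.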

Knowledge Base Collections

mitre_attack

Description: Complete MITRE ATT&CK framework (version 14.0)

Contents:
- 793 techniques and sub-techniques
- Tactics, data sources, mitigations
- Platform-specific information
- Detection guidance

Example Queries:
- "How do adversaries escalate privileges?"
- "What are common lateral movement techniques?"
- "Reconnaissance tactics in cyber attacks"

Metadata Fields:

{
  "technique_id": "T1110",
  "tactic": "Credential Access",
  "sub_techniques": ["T1110.001", "T1110.002"],
  "platforms": ["Windows", "Linux", "macOS"],
  "data_sources": ["Authentication logs"],
  "mitigations": ["M1036", "M1027"]
}
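Results often mix a technique with its own sub-techniques, as in the T1110 example returned by /retrieve. Because a sub-technique ID extends its parent ID with a three-digit suffix (T1110.001 under T1110), results can be grouped client-side without an extra lookup. A minimal sketch with hypothetical helper names; it relies only on the `technique_id` metadata field.

```python
import re

# ATT&CK technique IDs: T + four digits, optionally .NNN for a sub-technique
TECHNIQUE_ID = re.compile(r"^T\d{4}(?:\.\d{3})?$")


def parent_technique(technique_id: str) -> str:
    """Map a sub-technique ID (T1110.001) to its parent (T1110); parents map to themselves."""
    if not TECHNIQUE_ID.match(technique_id):
        raise ValueError(f"not an ATT&CK technique ID: {technique_id!r}")
    return technique_id.split(".", 1)[0]


def group_by_parent(results: list[dict]) -> dict[str, list[str]]:
    """Group /retrieve results by parent technique, preserving result order."""
    groups: dict[str, list[str]] = {}
    for result in results:
        tid = result["metadata"]["technique_id"]
        groups.setdefault(parent_technique(tid), []).append(tid)
    return groups
```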

cve_database

Description: High-severity CVE database (CVSS >= 7.0)

Contents:
- 2,547 vulnerabilities rated HIGH or CRITICAL (CVSS >= 7.0)
- CVE descriptions, affected software
- Exploit availability, remediation

Example Queries:
- "Recent remote code execution vulnerabilities"
- "Critical Apache web server CVEs"
- "Vulnerabilities affecting Windows Server 2019"

Metadata Fields:

{
  "cve_id": "CVE-2025-1234",
  "cvss_score": 9.8,
  "severity": "CRITICAL",
  "affected_software": "Apache HTTP Server 2.4.x",
  "exploit_available": true,
  "published_date": "2025-09-15"
}
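The collection's CVSS >= 7.0 cutoff spans two qualitative bands of the CVSS v3.x rating scale: HIGH (7.0-8.9) and CRITICAL (9.0-10.0). The mapping below follows that published scale and could be used, for example, to keep only CRITICAL hits client-side. It assumes `cvss_score` holds a CVSS v3.x base score, which the example (9.8, CRITICAL) is consistent with; the function name is hypothetical.

```python
def cvss_v3_severity(score: float) -> str:
    """Qualitative severity rating from the CVSS v3.x specification."""
    if not 0.0 <= score <= 10.0:
        raise ValueError("CVSS scores range from 0.0 to 10.0")
    if score == 0.0:
        return "NONE"
    if score < 4.0:
        return "LOW"       # 0.1-3.9
    if score < 7.0:
        return "MEDIUM"    # 4.0-6.9
    if score < 9.0:
        return "HIGH"      # 7.0-8.9
    return "CRITICAL"      # 9.0-10.0
```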

incident_history

Description: Resolved security incidents from TheHive case management

Contents:
- 156 historical incidents
- Attack patterns, resolution procedures
- Lessons learned, indicators of compromise

Example Queries:
- "How was the ransomware incident resolved?"
- "Previous SQL injection attempts"
- "Incidents involving phishing emails"

Metadata Fields:

{
  "incident_id": "INC-2025-001",
  "date": "2025-10-15",
  "severity": "HIGH",
  "attack_type": "Ransomware",
  "resolution_status": "Resolved",
  "resolution_time_hours": 4.5
}

security_runbooks

Description: Incident response playbooks and SOC procedures

Contents:
- 42 response playbooks
- NIST-aligned procedures
- Escalation guidelines, checklists

Example Queries:
- "Malware infection response procedure"
- "DDoS attack mitigation steps"
- "Data breach notification requirements"

Metadata Fields:

{
  "playbook_id": "PB-RANSOMWARE-001",
  "incident_type": "Ransomware",
  "severity_level": "P0",
  "estimated_duration_minutes": 30,
  "required_tools": ["EDR", "Backup System"]
}

Rate Limiting

Profile	Default (other endpoints)	Retrieve Endpoint	Ingest Endpoint
Strict	30 req/min	20 req/min	5 req/min
Moderate	100 req/min	50 req/min	10 req/min
Permissive	300 req/min	150 req/min	50 req/min

Rate Limit Headers:

X-RateLimit-Limit: 50
X-RateLimit-Remaining: 42
X-RateLimit-Reset: 1698765432
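When `X-RateLimit-Remaining` reaches 0, a client can pause until the reset time instead of collecting 429 responses. The sketch assumes `X-RateLimit-Reset` is a Unix timestamp in seconds (the sample value is one, but the format is not stated here) and that headers arrive as a case-insensitive mapping such as `httpx.Headers`; a plain `dict` only works with the exact casing shown. The function name is hypothetical.

```python
import time
from collections.abc import Mapping
from typing import Optional


def seconds_until_reset(headers: Mapping[str, str], now: Optional[float] = None) -> float:
    """Seconds to wait before sending again; 0.0 while requests remain in the window."""
    # A missing Remaining header is treated as "quota available"
    if int(headers.get("X-RateLimit-Remaining", "1")) > 0:
        return 0.0
    current = time.time() if now is None else now
    return max(0.0, float(headers["X-RateLimit-Reset"]) - current)
```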

Example Usage

Python (RAG Integration with Alert Triage)

import asyncio

import httpx

RAG_URL = "http://rag-service:8002/retrieve"
API_KEY = "aisoc_your_api_key"  # placeholder; load from a secret store in practice


async def enrich_alert_with_context(alert_description: str) -> list[dict]:
    """Retrieve MITRE ATT&CK context for an alert; returns [] if the call fails."""
    headers = {"Authorization": f"Bearer {API_KEY}"}  # json= below sets Content-Type
    payload = {
        "query": alert_description,
        "collection": "mitre_attack",
        "top_k": 3,
        "min_similarity": 0.7,
    }

    try:
        async with httpx.AsyncClient(timeout=5.0) as client:
            response = await client.post(RAG_URL, json=payload, headers=headers)
            response.raise_for_status()
    except httpx.HTTPError as exc:
        # Timeouts, connection failures, and HTTP error statuses all land here
        print(f"RAG error: {exc}")
        return []

    # Keep only the fields the LLM prompt needs
    return [
        {
            "technique": result["metadata"].get("technique_id"),
            "description": result["document"],
            "similarity": result["similarity_score"],
        }
        for result in response.json()["results"]
    ]


async def main() -> None:
    alert = "Multiple failed SSH login attempts from 192.168.1.50"
    context = await enrich_alert_with_context(alert)

    # One line per technique instead of a raw Python list repr
    context_text = "\n".join(
        f"- {c['technique']} ({c['similarity']:.2f}): {c['description']}"
        for c in context
    )

    llm_prompt = f"""
Analyze this security alert: {alert}

Relevant threat intelligence:
{context_text}

Provide verdict, severity, and recommendations.
"""
    print(llm_prompt)


asyncio.run(main())  # top-level `await` only works in notebooks and async REPLs

cURL (Retrieve)

curl -X POST http://rag-service:8002/retrieve \
  -H "Authorization: Bearer aisoc_your_api_key" \
  -H "Content-Type: application/json" \
  -d '{
    "query": "What are common brute force techniques?",
    "collection": "mitre_attack",
    "top_k": 3,
    "min_similarity": 0.7
  }'

cURL (Ingest Incident)

curl -X POST http://rag-service:8002/ingest \
  -H "Authorization: Bearer aisoc_your_api_key" \
  -H "Content-Type: application/json" \
  -d '{
    "collection": "incident_history",
    "documents": [
      {
        "text": "Phishing campaign detected targeting finance department...",
        "metadata": {
          "incident_id": "INC-2025-042",
          "severity": "HIGH",
          "attack_type": "Phishing"
        }
      }
    ]
  }'

Embedding Model Information

nomic-embed-text

Specialization: Text embeddings optimized for semantic search
Parameters: 137 million
Dimensions: 768
Context Window: 8,192 tokens

Performance:
- Embedding latency: 20-40ms (batch of 10)
- Quality: MTEB score 62.4
- Memory: 550MB model size

Advantages:
- Fast inference: small enough to run well on CPU
- Strong semantic understanding
- Efficient batching
- No GPU required

Vector Search Configuration

HNSW Index Parameters

index_configuration:
  algorithm: HNSW        # Hierarchical Navigable Small World
  space: cosine          # Cosine similarity metric
  ef_construction: 200   # Higher = better index quality, slower indexing
  M: 16                  # Max bidirectional links per node in each graph layer
  ef_search: 100         # Higher = better recall, slower search

Performance Characteristics:
- Build time: ~1 second per 1,000 documents
- Search time: <50ms for 10,000 documents
- Recall@10: >95% for typical queries
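If these settings are applied through ChromaDB, they have to be expressed as collection metadata when the collection is created. The sketch below translates the `index_configuration` block into the `hnsw:*` metadata keys used by chromadb 0.x releases; how this service actually passes its settings is an assumption, and the function name is hypothetical.

```python
from typing import Any


def chroma_hnsw_metadata(cfg: dict[str, Any]) -> dict[str, Any]:
    """Translate the index_configuration block into ChromaDB `hnsw:*` metadata keys."""
    if cfg.get("algorithm", "HNSW") != "HNSW":
        raise ValueError("only HNSW settings are translated")
    return {
        # With "cosine", ChromaDB reports distance = 1 - cosine similarity
        "hnsw:space": cfg["space"],
        "hnsw:construction_ef": cfg["ef_construction"],
        "hnsw:M": cfg["M"],
        "hnsw:search_ef": cfg["ef_search"],
    }
```

With chromadb 0.x, the result would be passed as `client.get_or_create_collection(name="mitre_attack", metadata=chroma_hnsw_metadata(cfg))`. HNSW settings are fixed at collection creation, and newer Chroma releases expose a separate collection configuration API, so check the documentation for the deployed version.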

Production Considerations

Scaling

Horizontal Scaling (Read Replicas):

# docker-compose.yml
services:
  rag-service:
    deploy:
      replicas: 3
      resources:
        limits:
          cpus: '2.0'
          memory: 4G

Throughput:
- Single instance: 200-300 retrievals/minute
- 3 replicas: 600-900 retrievals/minute
- Ingestion: 50-100 documents/minute (single instance)
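The throughput figures scale linearly with replica count, so sizing reduces to a ceiling division. The sketch uses the conservative end of the single-instance range (200 retrievals/minute) and inherits the linear-scaling assumption; in practice the shared ChromaDB service may become the bottleneck first. The function name is hypothetical.

```python
import math


def replicas_needed(target_per_min: float, per_instance_per_min: float = 200.0) -> int:
    """Smallest replica count whose combined throughput meets the target.

    Assumes throughput grows linearly with replicas, as the figures above do.
    """
    if per_instance_per_min <= 0:
        raise ValueError("per-instance throughput must be positive")
    if target_per_min <= 0:
        return 1
    return max(1, math.ceil(target_per_min / per_instance_per_min))
```

For example, a sustained 600 retrievals/minute needs 3 replicas at the conservative rate, while 601 needs 4.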

ChromaDB Persistence

Data Directory:

services:
  chromadb:
    volumes:
      - chromadb-data:/chroma/chroma

volumes:
  chromadb-data:

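Backup Strategy:

# Stop ChromaDB so its SQLite and index files are not modified mid-copy
docker compose stop chromadb

# Back up the volume contents; "-C <dir> ." stores paths relative to the volume root
# (Compose may prefix the volume name with the project name; check `docker volume ls`)
tar -czf chromadb-backup-$(date +%Y%m%d).tar.gz -C /var/lib/docker/volumes/chromadb-data .
docker compose start chromadb

# Restore from backup (stop ChromaDB first, as above)
tar -xzf chromadb-backup-20251024.tar.gz -C /var/lib/docker/volumes/chromadb-data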

Monitoring

Prometheus Metrics:

# Retrieval latency (p95 over 5-minute windows, aggregated across instances)
histogram_quantile(0.95, sum by (le) (rate(rag_retrieve_duration_seconds_bucket[5m])))

# Embedding throughput
rate(rag_embeddings_generated_total[5m])

# Collection size growth
rag_collection_documents_total{collection="mitre_attack"}

Changelog

Version 1.0.0 (Current)

- Initial production release
- ChromaDB vector storage
- nomic-embed-text embeddings
- 4 knowledge base collections
- HNSW index for fast retrieval
- RESTful API with OpenAPI schema

Version 1.1.0 (Planned - Week 5)

- MITRE ATT&CK v14.0 update
- CVE database auto-sync (NVD feeds)
- TheHive incident auto-ingestion
- Hybrid search (dense + sparse)
- Re-ranking with cross-encoder
- Query expansion with synonyms

Support

API Issues: api-support@ai-soc.example.com
RAG Questions: rag-team@ai-soc.example.com
Documentation: https://docs.ai-soc.example.com/api/rag-service

Document Version: 1.0
Last Updated: October 24, 2025
Maintained By: AI-SOC RAG Team