CICIDS2017 Baseline Models - Evaluation Report

Generated: 2025-10-13 18:51:02
Mission: OPERATION ML-BASELINE
Agent: HOLLOWED_EYES

Executive Summary

This report presents the performance evaluation of three baseline machine learning models trained on the CICIDS2017 intrusion detection dataset for binary classification (BENIGN vs ATTACK).
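The report does not describe how the dataset's traffic labels were reduced to two classes. A minimal sketch of one common approach follows, assuming every label other than BENIGN is treated as ATTACK; the function name `to_binary` and the sample labels are illustrative, not taken from the evaluated pipeline.

```python
def to_binary(label: str) -> int:
    """Map a raw traffic label to 0 (BENIGN) or 1 (ATTACK)."""
    # Assumption: any label other than BENIGN counts as an attack.
    # strip() guards against stray whitespace around label values.
    return 0 if label.strip().upper() == "BENIGN" else 1


# Illustrative label values only; a real run would read them from the dataset.
raw_labels = ["BENIGN", "DDoS", "PortScan", " BENIGN ", "DoS Hulk"]
binary = [to_binary(label) for label in raw_labels]
print(binary)
```

Encoding BENIGN as 0 and ATTACK as 1 keeps BENIGN in the first row and column of the confusion matrices, which matches how the matrices in this report are laid out.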

Model Performance Comparison

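| Model         | Accuracy | Precision | Recall | F1-Score | FP Rate | Inference Time (ms/sample) |
|---------------|----------|-----------|--------|----------|---------|----------------------------|
| Random Forest | 99.28%   | 99.29%    | 99.28% | 99.28%   | 0.25%   | 0.0008                     |
| XGBoost       | 99.21%   | 99.23%    | 99.21% | 99.21%   | 0.09%   | 0.0003                     |
| Decision Tree | 99.10%   | 99.13%    | 99.10% | 99.11%   | 0.24%   | 0.0002                     |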

Detailed Model Results

Random Forest

Classification Metrics:

  • Accuracy: 99.28%
  • Precision: 99.29%
  • Recall: 99.28%
  • F1-Score: 99.28%
  • False Positive Rate: 0.25%

Performance Characteristics:

  • Training Time: 2.57s
  • Average Inference Time: 0.0008ms/sample
  • Model Size: 2.93MB
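The report does not say how inference time and model size were measured. Sub-microsecond per-sample figures usually come from timing one batched prediction call and dividing by the number of samples, so the sketch below assumes that approach. The helpers `average_inference_ms` and `model_size_mb`, the pickled-size definition of model size, and the stand-in threshold model are all hypothetical.

```python
import pickle
import time


def average_inference_ms(predict, samples, repeats=5):
    """Best-of-`repeats` average latency per sample, in milliseconds."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        predict(samples)  # one batched call over all samples
        best = min(best, time.perf_counter() - start)
    return best * 1000.0 / len(samples)


def model_size_mb(model):
    """Size of the pickled model in MB (assumes 1 MB = 1024 * 1024 bytes)."""
    return len(pickle.dumps(model)) / (1024 * 1024)


# Stand-in model: a single threshold rule, used only to keep the sketch runnable.
model = {"threshold": 0.5}


def predict(rows):
    return [1 if row[0] > model["threshold"] else 0 for row in rows]


rows = [[i / 1000.0] for i in range(1000)]
latency_ms = average_inference_ms(predict, rows)
size_mb = model_size_mb(model)
print(f"{latency_ms:.4f} ms/sample, {size_mb:.6f} MB")
```

Taking the best of several repeats reduces noise from other processes. Per-sample latency for single-row calls would typically be much higher than a batched average.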

Confusion Matrix:

[[ 8840    22]
 [  282 32858]]

  • True Negatives (BENIGN correctly identified): 8,840
  • False Positives (BENIGN incorrectly flagged as ATTACK): 22
  • False Negatives (ATTACK missed): 282
  • True Positives (ATTACK correctly detected): 32,858
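The summary metrics can be recomputed from the four counts above. The sketch below assumes that Precision, Recall, and F1-Score are support-weighted averages over both classes. The report does not state this, but weighted recall always equals accuracy, which is consistent with the identical Accuracy and Recall figures. The function name `weighted_metrics` is illustrative.

```python
def weighted_metrics(tn, fp, fn, tp):
    """Summary metrics from a 2x2 confusion matrix (rows = actual class)."""
    total = tn + fp + fn + tp
    n_benign, n_attack = tn + fp, fn + tp  # actual samples per class

    # Per-class precision and recall
    prec_benign, prec_attack = tn / (tn + fn), tp / (tp + fp)
    rec_benign, rec_attack = tn / n_benign, tp / n_attack

    def f1(p, r):
        return 2 * p * r / (p + r)

    def weighted(benign_value, attack_value):
        # Average the two per-class values, weighted by class support.
        return (n_benign * benign_value + n_attack * attack_value) / total

    return {
        "accuracy": (tn + tp) / total,
        "precision": weighted(prec_benign, prec_attack),
        "recall": weighted(rec_benign, rec_attack),
        "f1": weighted(f1(prec_benign, rec_benign), f1(prec_attack, rec_attack)),
        "fp_rate": fp / n_benign,
    }


# Random Forest counts from the confusion matrix above.
rf = weighted_metrics(tn=8840, fp=22, fn=282, tp=32858)
for name, value in rf.items():
    print(f"{name}: {value:.2%}")
```

Under the weighted-average assumption, the results round to the Random Forest figures listed above: 99.28% accuracy, 99.29% precision, 99.28% recall, 99.28% F1, and a 0.25% false positive rate.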

XGBoost

Classification Metrics:

  • Accuracy: 99.21%
  • Precision: 99.23%
  • Recall: 99.21%
  • F1-Score: 99.21%
  • False Positive Rate: 0.09%

Performance Characteristics:

  • Training Time: 0.79s
  • Average Inference Time: 0.0003ms/sample
  • Model Size: 0.18MB

Confusion Matrix:

[[ 8854     8]
 [  325 32815]]

  • True Negatives (BENIGN correctly identified): 8,854
  • False Positives (BENIGN incorrectly flagged as ATTACK): 8
  • False Negatives (ATTACK missed): 325
  • True Positives (ATTACK correctly detected): 32,815

Decision Tree

Classification Metrics:

  • Accuracy: 99.10%
  • Precision: 99.13%
  • Recall: 99.10%
  • F1-Score: 99.11%
  • False Positive Rate: 0.24%

Performance Characteristics:

  • Training Time: 5.22s
  • Average Inference Time: 0.0002ms/sample
  • Model Size: 0.03MB

Confusion Matrix:

[[ 8841    21]
 [  355 32785]]

  • True Negatives (BENIGN correctly identified): 8,841
  • False Positives (BENIGN incorrectly flagged as ATTACK): 21
  • False Negatives (ATTACK missed): 355
  • True Positives (ATTACK correctly detected): 32,785
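The headline metrics differ by less than 0.2 percentage points, so the three models are easier to tell apart on the ATTACK class alone. The sketch below computes the attack detection rate (recall on the ATTACK class) and the miss rate directly from the three confusion matrices in this report. The function name `attack_rates` is illustrative.

```python
# Confusion matrices copied from this report, as (TN, FP, FN, TP).
matrices = {
    "Random Forest": (8840, 22, 282, 32858),
    "XGBoost": (8854, 8, 325, 32815),
    "Decision Tree": (8841, 21, 355, 32785),
}


def attack_rates(tn, fp, fn, tp):
    """Return (detection rate, miss rate) for the ATTACK class."""
    n_attack = tp + fn
    return tp / n_attack, fn / n_attack


rates = {name: attack_rates(*cm) for name, cm in matrices.items()}
for name, (detection, miss) in rates.items():
    print(f"{name}: detected {detection:.2%} of attacks, missed {miss:.2%}")
```

On these counts, Random Forest misses the fewest attacks and Decision Tree the most, while XGBoost produces the fewest false positives.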

Best Model Recommendation

Highest Accuracy: Random Forest (99.28%)

Best F1-Score: Random Forest (99.28%)

Fastest Inference: Decision Tree (0.0002ms/sample)

Recommendation for Production:

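All three models meet all performance targets: Random Forest, XGBoost, and Decision Tree. Among them, Random Forest misses the fewest attacks (282 false negatives), XGBoost raises the fewest false alarms (8 false positives), and Decision Tree is the smallest (0.03MB) and fastest at inference (0.0002ms/sample).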
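The performance targets themselves are not listed in this report. The sketch below shows how such a check could be expressed. The thresholds in `targets` and `strict_targets` are hypothetical placeholders, not the values actually used. The metric values are copied from the comparison table.

```python
# Metrics copied from the comparison table in this report.
results = {
    "Random Forest": {"accuracy": 0.9928, "fp_rate": 0.0025, "inference_ms": 0.0008},
    "XGBoost": {"accuracy": 0.9921, "fp_rate": 0.0009, "inference_ms": 0.0003},
    "Decision Tree": {"accuracy": 0.9910, "fp_rate": 0.0024, "inference_ms": 0.0002},
}

# HYPOTHETICAL thresholds: the report does not state its actual targets.
targets = {"min_accuracy": 0.99, "max_fp_rate": 0.01, "max_inference_ms": 1.0}


def meets_targets(metrics, targets):
    """True if a model satisfies every threshold in `targets`."""
    return (
        metrics["accuracy"] >= targets["min_accuracy"]
        and metrics["fp_rate"] <= targets["max_fp_rate"]
        and metrics["inference_ms"] <= targets["max_inference_ms"]
    )


passing = [name for name, m in results.items() if meets_targets(m, targets)]
print("Meets all targets:", ", ".join(passing))

# A stricter, equally hypothetical false-positive budget narrows the field.
strict_targets = dict(targets, max_fp_rate=0.001)
strict_passing = [n for n, m in results.items() if meets_targets(m, strict_targets)]
print("Meets strict targets:", ", ".join(strict_passing))
```

With the placeholder thresholds all three models pass, which is consistent with the recommendation above. Tightening the false-positive budget to 0.1% would leave only XGBoost.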