Added an Isolation Forest algorithm for anomaly detection #12264

Closed
69 changes: 69 additions & 0 deletions machine_learning/isolation_forest_for_anamoly_prediction.py
@@ -0,0 +1,69 @@
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.ensemble import IsolationForest
from sklearn.model_selection import train_test_split


def isolation_forest(features: np.ndarray, test_features: np.ndarray) -> np.ndarray:
    """
    This function trains an Isolation Forest algorithm and predicts anomalies.

    More on Isolation Forest:
    https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.IsolationForest.html
    https://en.wikipedia.org/wiki/Isolation_forest

    Parameters:
    features (np.ndarray): The training data (features) on which the Isolation Forest model will be trained.

Check failure on line 16 in machine_learning/isolation_forest_for_anamoly_prediction.py (GitHub Actions / ruff): E501 Line too long (108 > 88)
    test_features (np.ndarray): The test data (features) to predict whether they are anomalies or not.

Check failure on line 17 in machine_learning/isolation_forest_for_anamoly_prediction.py (GitHub Actions / ruff): E501 Line too long (102 > 88)

    Returns:
    np.ndarray: Array of predictions where -1 indicates an anomaly.

    Raises:
    ValueError: If `features` or `test_features` are not two-dimensional arrays or have mismatched feature sizes.

Check failure on line 23 in machine_learning/isolation_forest_for_anamoly_prediction.py (GitHub Actions / ruff): E501 Line too long (113 > 88)

    Examples:
    >>> features, _ = make_blobs(n_samples=100, centers=3, cluster_std=0.60, random_state=0)

Check failure on line 26 in machine_learning/isolation_forest_for_anamoly_prediction.py (GitHub Actions / ruff): E501 Line too long (92 > 88)
    >>> test_features = np.array([[0, 0], [10, 10], [-1, -1]])  # Adjusted to ensure variability

Check failure on line 27 in machine_learning/isolation_forest_for_anamoly_prediction.py (GitHub Actions / ruff): E501 Line too long (96 > 88)
    >>> predictions = isolation_forest(features, test_features)
    >>> np.unique(predictions)  # Expecting some normal points and some anomalies
    array([-1,  1])

    >>> isolation_forest(np.array([[1, 2]]), np.array([[1]]))  # Test for ValueError
    Traceback (most recent call last):
        ...
    ValueError: `features` and `test_features` must have the same number of features.
    """
    # Validate input shapes
    if features.ndim != 2 or test_features.ndim != 2:
        raise ValueError(
            "`features` and `test_features` must be two-dimensional arrays."
        )
    if features.shape[1] != test_features.shape[1]:
        raise ValueError(
            "`features` and `test_features` must have the same number of features."
        )

    iso_forest = IsolationForest(n_estimators=100, random_state=42)
    iso_forest.fit(features)

    predictions = iso_forest.predict(test_features)
    return predictions
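
Reviewer note: `isolation_forest` returns only the hard labels from `predict`. As a rough illustration of where those labels come from, the sketch below fits the same model as the diff and prints scikit-learn's `decision_function` score next to each label; `predict` reports -1 for rows whose score is negative. The data and hyperparameters are copied from the docstring example, and the variable names `labels` and `scores` are assumptions made for this sketch, not part of the PR.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.ensemble import IsolationForest

# Same synthetic data and test points as the docstring example in the diff.
features, _ = make_blobs(n_samples=100, centers=3, cluster_std=0.60, random_state=0)
test_features = np.array([[0, 0], [10, 10], [-1, -1]])

iso_forest = IsolationForest(n_estimators=100, random_state=42)
iso_forest.fit(features)

labels = iso_forest.predict(test_features)            # -1 or 1 per row
scores = iso_forest.decision_function(test_features)  # negative score -> label -1

for point, score, label in zip(test_features, scores, labels):
    print(f"{point.tolist()}: score={score:+.3f}, label={label}")
```

Returning the scores alongside the labels would be one way to let callers choose their own threshold, but that would change the function's signature and is only a suggestion.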


def main() -> None:
    """
    Main function to demonstrate the use of Isolation Forest on a synthetic dataset.
    """
    features, _ = make_blobs(n_samples=100, centers=3, cluster_std=0.60, random_state=0)
    x_train, x_test = train_test_split(features, test_size=0.2, random_state=42)

    test_features = np.array([[1, 1], [5, 5], [10, 10], [6, 6], [-1, -1]])
    predictions = isolation_forest(x_train, test_features)

Check failure on line 62 in machine_learning/isolation_forest_for_anamoly_prediction.py (GitHub Actions / ruff): F841 Local variable `predictions` is assigned to but never used
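
Reviewer note: the F841 failure is raised because `main()` assigns `predictions` and never reads it. One possible way to clear it, sketched here under the assumption that the demo is meant to show its results, is to print and return the labels. The compact `isolation_forest` below is a stand-in so the snippet runs on its own (input validation omitted for brevity); the `status` strings and the `np.ndarray` return type on `main()` are choices made for this sketch, not what the PR contains.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.ensemble import IsolationForest
from sklearn.model_selection import train_test_split


def isolation_forest(features: np.ndarray, test_features: np.ndarray) -> np.ndarray:
    """Compact stand-in for the diff's function (validation omitted here)."""
    iso_forest = IsolationForest(n_estimators=100, random_state=42)
    iso_forest.fit(features)
    return iso_forest.predict(test_features)


def main() -> np.ndarray:
    """Demo that reports the predictions instead of discarding them."""
    features, _ = make_blobs(n_samples=100, centers=3, cluster_std=0.60, random_state=0)
    x_train, x_test = train_test_split(features, test_size=0.2, random_state=42)

    test_features = np.array([[1, 1], [5, 5], [10, 10], [6, 6], [-1, -1]])
    predictions = isolation_forest(x_train, test_features)

    # Reading the variable (printing and returning it) is what clears F841.
    for point, label in zip(test_features, predictions):
        status = "anomaly" if label == -1 else "normal"
        print(f"{point.tolist()} -> {status}")
    return predictions


if __name__ == "__main__":
    main()
```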


if __name__ == "__main__":
    import doctest

    doctest.testmod(verbose=True)
    main()
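
Reviewer note: the file wraps scikit-learn's `IsolationForest` rather than implementing the algorithm. For readers who want the underlying idea, here is a toy, standard-library-only sketch of the isolation step: split the data repeatedly on a random feature at a random value and count how many splits it takes until a point is alone. It is not scikit-learn's implementation (no subsampling, no score normalization), and every name in it (`path_length`, `mean_path_length`, `max_depth`, `n_trees`) is hypothetical.

```python
import random


def path_length(point, data, rng, depth=0, max_depth=12):
    """Count random axis-aligned splits needed to isolate `point` in `data`."""
    if len(data) <= 1 or depth >= max_depth:
        return depth
    dim = rng.randrange(len(point))  # pick a random feature
    lo = min(row[dim] for row in data)
    hi = max(row[dim] for row in data)
    if lo == hi:  # this toy version gives up on a constant feature
        return depth
    split = rng.uniform(lo, hi)  # pick a random split value
    # Keep only the rows that fall on the same side of the split as `point`.
    same_side = [row for row in data if (row[dim] < split) == (point[dim] < split)]
    return path_length(point, same_side, rng, depth + 1, max_depth)


def mean_path_length(point, data, n_trees=200, seed=0):
    """Average the path length over several independent random trees."""
    rng = random.Random(seed)
    return sum(path_length(point, data, rng) for _ in range(n_trees)) / n_trees


rng = random.Random(1)
cluster = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(100)]
outlier = (10.0, 10.0)
data = cluster + [outlier]
# Use the cluster point nearest the origin as a clear inlier.
inlier = min(cluster, key=lambda p: p[0] ** 2 + p[1] ** 2)

print("outlier:", mean_path_length(outlier, data))  # few splits needed
print("inlier: ", mean_path_length(inlier, data))   # many more splits needed
```

Points that are isolated after only a few splits are the ones an isolation forest treats as anomalous, which is why the far-away point ends up with the shorter average path.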