
Catboost regressor #11877


Closed
wants to merge 11 commits into from
100 changes: 100 additions & 0 deletions machine_learning/catboost_regressor.py
@@ -0,0 +1,100 @@
"""
CatBoost Regressor Example.

This script demonstrates the usage of the CatBoost Regressor for a simple regression task.

Check failure on line 4 in machine_learning/catboost_regressor.py

GitHub Actions / ruff

Ruff (E501)

machine_learning/catboost_regressor.py:4:89: E501 Line too long (90 > 88)
CatBoost is a powerful gradient boosting library that handles categorical features
automatically and is highly efficient.

Make sure to install CatBoost using:
pip install catboost

Contributed by: @AHuzail
"""

import numpy as np
from sklearn.datasets import fetch_california_housing
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from catboost import CatBoostRegressor


def data_handling() -> tuple:

Check failure on line 21 in machine_learning/catboost_regressor.py

GitHub Actions / ruff

Ruff (I001)

machine_learning/catboost_regressor.py:14:1: I001 Import block is un-sorted or un-formatted
    """
    Loads and handles the California Housing dataset (replacement for deprecated Boston dataset).

Check failure on line 23 in machine_learning/catboost_regressor.py

GitHub Actions / ruff

Ruff (E501)

machine_learning/catboost_regressor.py:23:89: E501 Line too long (97 > 88)

    Returns:
        tuple: A tuple of (features, target), where both are numpy arrays.

    Example:
        >>> features, target = data_handling()
        >>> isinstance(features, np.ndarray)
        True
        >>> isinstance(target, np.ndarray)
        True
        >>> features.shape
        (20640, 8)
        >>> target.shape
        (20640,)
    """
    housing = fetch_california_housing()
    features = housing.data
    target = housing.target
    return features, target


def catboost_regressor(features: np.ndarray, target: np.ndarray) -> CatBoostRegressor:
    """
    Trains a CatBoostRegressor using the provided features and target values.

    Args:
        features (np.ndarray): The input features for the regression model.
        target (np.ndarray): The target values for the regression model.

    Returns:
        CatBoostRegressor: A trained CatBoost regressor model.

    Example:
        >>> features, target = data_handling()
        >>> model = catboost_regressor(features, target)
        >>> isinstance(model, CatBoostRegressor)
        True
    """
    regressor = CatBoostRegressor(iterations=100, learning_rate=0.1, depth=6, verbose=0)
    regressor.fit(features, target)
    return regressor


def main() -> None:

As there is no test file in this pull request nor any test function or class in the file machine_learning/catboost_regressor.py, please provide doctest for the function main

    """
    Main function to run the CatBoost Regressor example.

    It loads the data, splits it into training and testing sets,
    trains the regressor on the training data, and evaluates its performance
    on the test data.

    Example:
        >>> main()  # doctest: +ELLIPSIS
        Mean Squared Error on Test Set: ...
    """
    # Load the dataset and hold out 25% of it for testing
    features, target = data_handling()
    x_train, x_test, y_train, y_test = train_test_split(
        features, target, test_size=0.25, random_state=42
    )

    # Train the CatBoost regressor on the training split only
    regressor = catboost_regressor(x_train, y_train)

    # Predict on the held-out test set
    predictions = regressor.predict(x_test)

    # Evaluate the performance using Mean Squared Error
    mse = mean_squared_error(y_test, predictions)
    print(f"Mean Squared Error on Test Set: {mse:.4f}")
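
A doctest for a function that prints a computed metric has to cope with the numeric part of the output, which doctest compares character by character. One common approach is the `ELLIPSIS` directive, where `...` in the expected output matches any substring. The sketch below uses a hypothetical helper, `report_mse`, that is not part of this pull request; it only mirrors the print format used in `main` so it can run without CatBoost installed.

```python
def report_mse(mse: float) -> None:
    """
    Print a metric whose exact value may differ between runs or versions.

    `report_mse` is a hypothetical stand-in for the print step in `main`.
    With the ELLIPSIS directive, "..." matches the numeric part of the line.

    >>> report_mse(0.2834)  # doctest: +ELLIPSIS
    Mean Squared Error on Test Set: ...
    """
    print(f"Mean Squared Error on Test Set: {mse:.4f}")
```

Without the directive, an expected line that ends at the colon would not match output that also contains the number, so the doctest would fail. The docstring examples can be checked with `python -m doctest -v` on the file that contains them.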


if __name__ == "__main__":
    import doctest

    doctest.testmod(verbose=True)
    main()
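
The evaluation step in `main` reports mean squared error, the average of the squared differences between true and predicted values. As a rough illustration of what that number means, the sketch below computes it in plain Python; `mean_squared_error_manual` and the sample values are made up for this example and are not part of the pull request or of scikit-learn.

```python
import math


def mean_squared_error_manual(y_true: list[float], y_pred: list[float]) -> float:
    """Return the mean of squared residuals (hypothetical helper for illustration)."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    # Square each residual, then average over all samples
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


# Made-up values: residuals are 1, 0, -2, -1, so squared errors sum to 6
y_true = [3.0, 5.0, 2.0, 7.0]
y_pred = [2.0, 5.0, 4.0, 8.0]

mse = mean_squared_error_manual(y_true, y_pred)
rmse = math.sqrt(mse)  # square root puts the error back in the target's units
print(f"MSE: {mse:.4f}")  # prints "MSE: 1.5000"
```

Because MSE is expressed in squared target units, the root (RMSE) is often easier to read next to the target values themselves.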
1 change: 1 addition & 0 deletions requirements.txt
@@ -1,4 +1,5 @@
beautifulsoup4
catboost
fake_useragent
imageio
keras