Added Ridge Regression To Machine Learning #12108 #12126

Closed
Changes from 4 commits
104 changes: 104 additions & 0 deletions machine_learning/ridge_regression.py
@@ -0,0 +1,104 @@
"""
Ridge Regression with L2 Regularization using Gradient Descent.

Ridge Regression is a type of linear regression that includes an L2 regularization
term to prevent overfitting and improve generalization. It is commonly used when
multicollinearity is present in the data.

More on Ridge Regression: https://en.wikipedia.org/wiki/Tikhonov_regularization
"""

import numpy as np
import pandas as pd


def load_data(file_path: str) -> tuple[np.ndarray, np.ndarray]:
    """
    Load data from a CSV file and return features and target arrays.

    Args:
        file_path: Path to the CSV file.

    Returns:
        A tuple containing features and target as numpy arrays.

    Example:
        >>> data = pd.DataFrame({'ADR': [200, 220], 'Rating': [1.2, 1.4]})
        >>> data.to_csv('sample.csv', index=False)
        >>> features, target = load_data('sample.csv')
        >>> features.shape == (2, 1) and target.shape == (2,)
        True
    """
    data = pd.read_csv(file_path)
    features = data[['Rating']].to_numpy()  # .to_numpy() rather than .values (PD011)
    target = data['ADR'].to_numpy()
    return features, target

def ridge_gradient_descent(
    features: np.ndarray,
    target: np.ndarray,
    reg_lambda: float,
    learning_rate: float,
    num_iters: int = 1000,
) -> np.ndarray:
    """
    Perform Ridge Regression using gradient descent.

    Args:
        features: Feature matrix.
        target: Target vector.
        reg_lambda: Regularization parameter (lambda).
        learning_rate: Learning rate for gradient descent.
        num_iters: Number of iterations for gradient descent.

    Returns:
        Optimized weights (coefficients) for predicting ADR from Rating.

    Example:
        >>> features = np.array([[1.2], [1.4]])
        >>> target = np.array([200, 220])
        >>> weights = ridge_gradient_descent(
        ...     features, target, reg_lambda=0.1, learning_rate=0.01
        ... )
        >>> weights.shape == (1,)
        True
    """
    weights = np.zeros(features.shape[1])
    m = len(target)

    for _ in range(num_iters):
        predictions = features @ weights
        error = predictions - target
        # Gradient of the L2-regularized squared error, averaged over samples
        gradient = (features.T @ error + reg_lambda * weights) / m
        weights -= learning_rate * gradient

    return weights
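Ridge regression also has a closed-form (Tikhonov) solution, which is handy for sanity-checking the gradient-descent loop above. A minimal sketch: the helper names `closed_form_ridge` and `gd_ridge` and the sample values are illustrative, not part of this PR.

```python
import numpy as np


def closed_form_ridge(features, target, reg_lambda):
    """Direct ridge solution: solve (X^T X + lambda * I) w = X^T y."""
    identity = np.eye(features.shape[1])
    return np.linalg.solve(
        features.T @ features + reg_lambda * identity, features.T @ target
    )


def gd_ridge(features, target, reg_lambda, learning_rate=0.01, num_iters=100_000):
    """Tiny gradient-descent version mirroring the loop in the PR."""
    weights = np.zeros(features.shape[1])
    m = len(target)
    for _ in range(num_iters):
        error = features @ weights - target
        weights -= learning_rate * (features.T @ error + reg_lambda * weights) / m
    return weights


# Illustrative data: both solvers should agree to high precision
features = np.array([[1.2], [1.4], [1.6]])
target = np.array([200.0, 220.0, 240.0])
direct = closed_form_ridge(features, target, 0.1)
iterative = gd_ridge(features, target, 0.1)
```

With enough iterations and a small enough learning rate, the iterative weights converge to the direct solution, so a large gap between the two usually signals a learning-rate or gradient bug.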

def mean_absolute_error(y_true: np.ndarray, y_pred: np.ndarray) -> float:
    """
    Calculate the Mean Absolute Error (MAE) between true and predicted values.

    Args:
        y_true: Actual values.
        y_pred: Predicted values.

    Returns:
        Mean absolute error.

    Example:
        >>> mean_absolute_error(np.array([200, 220]), np.array([205, 215]))
        5.0
    """
    return float(np.mean(np.abs(y_true - y_pred)))

if __name__ == "__main__":
    import doctest

    doctest.testmod()

    # Load the data
    features, target = load_data("sample.csv")

    # Fit the Ridge Regression model
    optimized_weights = ridge_gradient_descent(
        features, target, reg_lambda=0.1, learning_rate=0.01
    )

    # Make predictions
    predictions = features @ optimized_weights

    # Calculate Mean Absolute Error
    mae = mean_absolute_error(target, predictions)
    print("Optimized Weights:", optimized_weights)
    print("Mean Absolute Error:", mae)
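The demo block reads `sample.csv`, which is not shipped with the PR. A minimal sketch for generating a compatible file: the values below are made up; only the column names `Rating` and `ADR` are taken from the code above.

```python
import pandas as pd

# Illustrative values; only the column names match what load_data expects
demo = pd.DataFrame(
    {"Rating": [1.0, 1.5, 2.0, 2.5, 3.0], "ADR": [100.0, 150.0, 200.0, 250.0, 300.0]}
)
demo.to_csv("sample.csv", index=False)
```

Running this once before the script makes the `__main__` demo reproducible end to end.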