Commit 44cb411

feat: Implement Principal Component Analysis (PCA)
- Added a Python implementation of PCA using NumPy and scikit-learn
- Standardizes the dataset before applying PCA for better performance
- Computes principal components and explained variance ratio
- Uses the Iris dataset as a sample for demonstration
- Provides a modular structure for easy extension and dataset modification
1 parent f528ce3 commit 44cb411
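
The standardization step listed in the commit message can be sketched in plain NumPy. This is a minimal illustration, not the committed code: the helper name `standardize` and the toy data are hypothetical, and the sketch assumes the population standard deviation (`ddof=0`), which is the convention `StandardScaler` follows.

```python
import numpy as np


def standardize(data_x):
    """Scale each column to zero mean and unit variance (hypothetical helper)."""
    data_x = np.asarray(data_x, dtype=float)
    mean = data_x.mean(axis=0)
    std = data_x.std(axis=0)  # population standard deviation (ddof=0)
    std[std == 0] = 1.0  # leave constant columns unscaled to avoid dividing by zero
    return (data_x - mean) / std


# Toy data (assumed for illustration): the second column is on a much larger scale.
toy = np.array([[1.0, 1000.0], [2.0, 3000.0], [3.0, 2000.0], [4.0, 4000.0]])
scaled = standardize(toy)
print(scaled.mean(axis=0))  # each column mean is approximately 0
print(scaled.std(axis=0))   # each column standard deviation is approximately 1
```

Without this step, a feature measured on a larger scale would contribute most of the total variance and would tend to dominate the first principal component.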

1 file changed: 66 additions & 0 deletions
@@ -0,0 +1,66 @@
+"""
+Principal Component Analysis (PCA) is a dimensionality reduction technique
+commonly used in machine learning. It transforms high-dimensional data into
+lower dimensions while retaining most of the variance in the data.
+
+Here, we use a dataset (Iris dataset) and apply PCA to reduce the
+dimensionality. We compute the principal components and transform the dataset
+into a lower-dimensional space.
+
+We reduce the number of columns from 4 to 3 (the value of n_components in main).
+
+"""
+
+import numpy as np
+import requests  # NOTE: not used anywhere in this file
+from sklearn.decomposition import PCA
+from sklearn.preprocessing import StandardScaler
+from sklearn.datasets import load_iris
+
+
+def collect_dataset():
+    """Collect dataset (Iris dataset)
+    :return: Feature matrix and target values
+    """
+    data = load_iris()
+    return np.array(data.data), np.array(data.target)
+
+
+def apply_pca(data_x, n_components):
+    """Apply Principal Component Analysis (PCA)
+    :param data_x: Original dataset
+    :param n_components: Number of principal components
+    :return: Transformed dataset and explained variance ratio
+    """
+    # Standardizing the features
+    scaler = StandardScaler()
+    data_x_scaled = scaler.fit_transform(data_x)
+
+    # Applying PCA
+    pca = PCA(n_components=n_components)
+    principal_components = pca.fit_transform(data_x_scaled)
+
+    # Explained variance ratio
+    explained_variance = pca.explained_variance_ratio_
+
+    return principal_components, explained_variance
+
+
+def main():
+    """Driver function"""
+    data_x, data_y = collect_dataset()
+    # Set number of principal components
+    n_components = 3
+
+    # Apply PCA
+    transformed_data, variance_ratio = apply_pca(data_x, n_components)
+
+    print("Transformed Dataset (First 5 rows):")
+    print(transformed_data[:5])
+
+    print("\nExplained Variance Ratio:")
+    print(variance_ratio)
+
+
+if __name__ == "__main__":
+    main()
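
For readers who want to see roughly what `apply_pca` computes, the same pipeline can be sketched with NumPy alone using a singular value decomposition. This is a sketch under stated assumptions, not the committed implementation: the function name `pca_svd` and the random sample data are hypothetical, and the standardization again uses the population standard deviation.

```python
import numpy as np


def pca_svd(data_x, n_components):
    """Project standardized data onto its top principal components (sketch)."""
    data_x = np.asarray(data_x, dtype=float)

    # Standardize each column: zero mean, unit variance (ddof=0).
    std = data_x.std(axis=0)
    std[std == 0] = 1.0  # guard against constant columns
    scaled = (data_x - data_x.mean(axis=0)) / std

    # Rows of vt are the principal directions, ordered by decreasing singular value.
    _, singular_values, vt = np.linalg.svd(scaled, full_matrices=False)
    projected = scaled @ vt[:n_components].T

    # Variance along each direction is proportional to the squared singular value.
    variances = singular_values**2
    ratio = variances[:n_components] / variances.sum()
    return projected, ratio


rng = np.random.default_rng(0)
sample = rng.normal(size=(50, 4))  # stand-in for a 4-feature dataset (assumed)
projected, ratio = pca_svd(sample, 2)
print(projected.shape)  # (50, 2)
print(ratio)
```

The explained variance ratios should agree with scikit-learn's `explained_variance_ratio_` up to floating-point error. Individual columns of the projection may differ in sign, because the sign of each singular vector is arbitrary.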
