
Update run.py #11967


Closed
wants to merge 2 commits into from

Conversation

Aryan-Rajesh-Python

Key Improvements:

Hyperparameter Configuration:

Introduced a CONFIG dictionary to easily manage and tune hyperparameters for various models (SVR, Random Forest, XGBoost, and SARIMAX).
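
A minimal sketch of what such a CONFIG dictionary might look like. The keys, values, and the `get_params` helper are illustrative assumptions, not the ones used in this PR; the parameter names follow the usual scikit-learn, XGBoost, and statsmodels keyword arguments.

```python
# Hypothetical CONFIG layout: one sub-dictionary of keyword arguments per model.
CONFIG = {
    "svr": {"kernel": "rbf", "C": 1.0, "gamma": 0.1, "epsilon": 0.1},
    "random_forest": {"n_estimators": 100, "max_depth": None, "random_state": 42},
    "xgboost": {"n_estimators": 100, "learning_rate": 0.1, "max_depth": 3},
    "sarimax": {"order": (1, 2, 1), "seasonal_order": (1, 1, 1, 7)},
}


def get_params(model_name: str, **overrides) -> dict:
    """Return a copy of a model's hyperparameters, with optional overrides."""
    if model_name not in CONFIG:
        raise KeyError(f"Unknown model: {model_name!r}")
    # Merge into a new dict so the shared CONFIG is never mutated.
    return {**CONFIG[model_name], **overrides}
```

With this layout a model could be built as `SVR(**get_params("svr", C=10.0))`, which keeps tuning in one place instead of scattered through the script.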

Model Variety:

Added Random Forest and XGBoost regressors, enhancing the predictive capability through ensemble learning methods.

Feature Engineering:

Included a feature_engineering function that creates new features such as day of the week and week of the year from the date, potentially improving model performance.
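
The date-derived features described above can be sketched with the standard library alone. The signature and the feature names are assumptions for illustration; the PR's own function may work on a DataFrame instead.

```python
from datetime import date


def feature_engineering(dates: list[date]) -> list[dict]:
    """Derive calendar features from each date (hypothetical signature)."""
    features = []
    for d in dates:
        features.append(
            {
                "day_of_week": d.weekday(),  # Monday = 0 ... Sunday = 6
                "week_of_year": d.isocalendar()[1],  # ISO 8601 week number
            }
        )
    return features


# 11 October 2024 is a Friday in ISO week 41.
print(feature_engineering([date(2024, 10, 11)]))
# → [{'day_of_week': 4, 'week_of_year': 41}]
```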

Advanced Prediction Methods:

Implemented SARIMAX for time series forecasting, providing a more comprehensive approach to predict total users based on historical data and external factors.

Enhanced Data Normalization:

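Incorporated normalization of input data using Normalizer from sklearn before training the models. Note that Normalizer rescales each sample (row) to unit norm; it does not standardize each feature to zero mean and unit variance, which is what StandardScaler does.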
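
For reference, sklearn's Normalizer with its default `norm="l2"` scales every sample (row) independently to unit Euclidean length. The plain-Python sketch below reproduces that row-wise operation for illustration only; the function name is hypothetical and this is not the PR's code.

```python
import math


def l2_normalize_rows(rows: list[list[float]]) -> list[list[float]]:
    """Scale each row to unit Euclidean length; all-zero rows are left unchanged."""
    normalized = []
    for row in rows:
        norm = math.sqrt(sum(x * x for x in row))
        # Guard against division by zero for all-zero rows.
        normalized.append([x / norm for x in row] if norm else list(row))
    return normalized
```

Because each row is scaled on its own, differences in magnitude between rows are discarded, which may or may not be desirable for a user-count forecast.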

Model Evaluation Metrics:

Added an evaluate_predictions function that calculates and logs multiple evaluation metrics: Mean Squared Error (MSE), Mean Absolute Error (MAE), and R² score for better insights into model performance.
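
A sketch of how the three metrics could be computed and logged, written in plain Python so the formulas are visible. The signature and return shape are assumptions; the PR may rely on `sklearn.metrics` instead.

```python
import logging

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)


def evaluate_predictions(actual: list[float], predicted: list[float]) -> dict:
    """Compute and log MSE, MAE, and R² for two equal-length sequences."""
    if len(actual) == 0 or len(actual) != len(predicted):
        raise ValueError("actual and predicted must be non-empty and equal in length")

    n = len(actual)
    errors = [a - p for a, p in zip(actual, predicted)]
    ss_res = sum(e * e for e in errors)  # residual sum of squares
    mse = ss_res / n
    mae = sum(abs(e) for e in errors) / n

    mean_actual = sum(actual) / n
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)  # total sum of squares
    # R² is undefined when the actual values are constant; report NaN in that case.
    r2 = 1 - ss_res / ss_tot if ss_tot else float("nan")

    logger.info("MSE=%.4f MAE=%.4f R2=%.4f", mse, mae, r2)
    return {"mse": mse, "mae": mae, "r2": r2}
```

For example, `evaluate_predictions([1.0, 2.0, 3.0], [1.0, 2.0, 4.0])` gives an MSE and MAE of 1/3 and an R² of 0.5.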

Improved Data Safety Check:

Refined the data_safety_checker function to assess the safety of predictions compared to actual results, using a more comprehensive logic to determine safety status.

Command-Line Interface:

Implemented argument parsing for loading CSV data through command-line arguments, enhancing usability and flexibility.
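
One way the command-line handling might look with `argparse`. The argument names `csv_path` and `--delimiter` are hypothetical; the PR does not state which flags it defines.

```python
import argparse


def parse_args(argv=None) -> argparse.Namespace:
    """Parse command-line arguments; argv=None falls back to sys.argv[1:]."""
    parser = argparse.ArgumentParser(
        description="Forecast total users from a CSV file."
    )
    parser.add_argument("csv_path", help="Path to the input CSV file")
    parser.add_argument(
        "--delimiter", default=",", help="Field delimiter used in the CSV file"
    )
    return parser.parse_args(argv)
```

The script would then be invoked as `python run.py data.csv`, and accepting `argv` as a parameter keeps the parser easy to exercise in tests.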

Model Persistence:

Added functionality to save trained models using joblib, allowing for easy reuse and deployment without the need for retraining.

Visualization Improvements:

Enhanced the plot_results function to visualize predictions against actual results clearly, making it easier to interpret model performance visually.

Aryan-Rajesh-Python and others added 2 commits October 11, 2024 13:19
Hyperparameter Optimization: The CONFIG dictionary allows easy tuning of model parameters. You can expand it to include other models or settings as needed.

Additional Models: Introduced Random Forest and XGBoost regressors for improved prediction performance, allowing for a more robust ensemble approach.

Feature Engineering: Added a new feature_engineering function that generates new features from the date, like the day of the week and week of the year, which may improve model performance when the data has weekly or seasonal patterns.

Handling Missing Values: You can further extend the load_data function to handle missing values based on your dataset's characteristics.
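
As a sketch of the extension suggested above, load_data could skip rows with empty fields. This assumes every column is numeric and that dropping incomplete rows is acceptable for the dataset; the column names in the sample are made up for illustration.

```python
import csv
import io


def load_data(csv_file) -> list[list[float]]:
    """Read numeric rows from an open CSV file, skipping the header and incomplete rows."""
    reader = csv.reader(csv_file)
    next(reader, None)  # skip the header row
    rows = []
    for record in reader:
        # Drop blank lines and rows with any missing field.
        if not record or any(field.strip() == "" for field in record):
            continue
        rows.append([float(field) for field in record])
    return rows


# Hypothetical sample: the second data row is missing its first value.
sample = io.StringIO("total_user,total_event,days\n100,200,1\n,300,2\n150,250,3\n")
rows = load_data(sample)
```

Dropping rows is the simplest policy; imputing a mean or carrying the previous value forward may suit a time series better.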

Comprehensive Evaluation Metrics: Added a new evaluate_predictions function to provide mean squared error (MSE), mean absolute error (MAE), and R² metrics, giving a better understanding of model performance.

Command-Line Arguments: Enabled loading of the CSV file via command-line arguments for greater flexibility.

Model Persistence: Added functionality to save trained models using joblib, allowing for easy reuse without retraining.

Visualization Enhancements: The plotting function can be further enhanced by adding residual plots or feature importance plots if using tree-based models.
@algorithms-keeper bot added the "tests are failing" label (Do not merge until tests pass) on Oct 11, 2024
@Aryan-Rajesh-Python Aryan-Rajesh-Python deleted the patch-1 branch October 11, 2024 07:54