Categoricalimputer improvements #89


Merged on Apr 29, 2017 (2 commits)

@dukebody (Collaborator) commented Apr 17, 2017

From #87 with a minor fix.

Enhancements:

  • Make CategoricalImputer also inherit from BaseEstimator.
  • Add a missing_values param to specify the placeholder used for missing values.
  • Add a copy param to specify whether to perform the imputation on a copy of X or in place.
  • Add a y param to fit for Pipeline compatibility.
  • Raise NotFittedError in transform if the imputer has not been fitted first.
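A minimal sketch of how the enhancements listed above can fit together, assuming a 1-D input column and scikit-learn's BaseEstimator/TransformerMixin. This is not the merged sklearn-pandas code: the 'NaN' default for missing_values, the fill_ attribute and the _mask/_as_array helpers are illustrative assumptions.

```python
import numpy as np
import pandas as pd
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.exceptions import NotFittedError


class CategoricalImputer(BaseEstimator, TransformerMixin):
    """Sketch: fill missing entries of a 1-D column with its most frequent value."""

    def __init__(self, missing_values='NaN', copy=True):
        # BaseEstimator.get_params()/set_params() read these attributes,
        # so they are stored under the same names as the __init__ arguments.
        self.missing_values = missing_values
        self.copy = copy

    def _mask(self, X):
        # Assumed convention: 'NaN' (or any null placeholder) means
        # "use pandas null detection"; otherwise compare elementwise.
        if self.missing_values == 'NaN' or pd.isnull(self.missing_values):
            return pd.isnull(X)
        return X == self.missing_values

    @staticmethod
    def _as_array(X, copy):
        # Non-array input becomes an object array so None/NaN stay null
        # instead of being coerced to the string 'nan'.
        if isinstance(X, np.ndarray):
            return X.copy() if copy else X
        return np.array(X, dtype=object)

    def fit(self, X, y=None):
        # y is accepted and ignored so the imputer can sit inside a Pipeline.
        X = self._as_array(X, copy=False)
        modes = pd.Series(X[~self._mask(X)]).mode()
        if modes.shape[0] == 0:
            raise ValueError('Could not compute a mode for this column.')
        self.fill_ = modes.iloc[0]
        return self

    def transform(self, X):
        if not hasattr(self, 'fill_'):
            raise NotFittedError('This CategoricalImputer instance is not fitted '
                                 'yet. Call fit before transform.')
        # With copy=False an ndarray input is modified in place.
        X = self._as_array(X, copy=self.copy)
        X[self._mask(X)] = self.fill_
        return X
```

With copy=False only a NumPy array input is modified in place; list input is always converted to a new object array, since a Python list cannot be imputed in place by this sketch.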

Fix bugs:

@dukebody (Collaborator, Author):

@arnau126

@dukebody force-pushed the categoricalimputer-improvements branch from 4c68601 to 5eb6c50 on April 17, 2017 09:54
@dukebody force-pushed the categoricalimputer-improvements branch from 5eb6c50 to e0d7c00 on April 17, 2017 10:17
@dukebody (Collaborator, Author):

Rebased over flake8 fixes.


modes = pd.Series(X).mode()
if modes.shape[0] == 0:
    raise ValueError('No value is repeteated more than '
Review comment (Collaborator):

Typo: repeteated -> repeated.

modes = pd.Series(X).mode()
if modes.shape[0] == 0:
    raise ValueError('No value is repeteated more than '
                     'twice in the column')
Review comment (Collaborator):

Looking at the pandas docs, I think it should be:

"No value is repeated more than once in the column."

or

"No value is repeated at least twice in the column."

@arnau126 (Collaborator):

About the commit "CategoricalImputer: Error out when no mode is found": it seems this will no longer be a problem in pandas 0.20.0, since mode() is changing to always return at least one value:
pandas-dev/pandas#15744
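A small sketch of the mode check under discussion, assuming pandas >= 0.20, where (as far as we know) a column whose values are all distinct returns every value as a mode, sorted. Series.mode() drops missing values by default, so a column with no non-missing values can still produce an empty result, which is why the sketch keeps the guard. safe_mode is a hypothetical helper name, not part of sklearn-pandas.

```python
import numpy as np
import pandas as pd


def safe_mode(values):
    """Return the first mode of `values`, or raise if pandas finds none."""
    # mode() ignores NaN/None by default (dropna=True) and returns ties sorted.
    modes = pd.Series(values).mode()
    if modes.shape[0] == 0:
        # With pandas >= 0.20 this mainly happens when every value is missing.
        raise ValueError('No mode found in the column.')
    return modes.iloc[0]
```

For example, safe_mode(['a', 'b', 'c']) returns 'a' on pandas >= 0.20 (all values tie, the first sorted one wins), while safe_mode([np.nan, np.nan]) still raises.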

@dukebody (Collaborator, Author) commented Apr 19, 2017 via email

@dukebody (Collaborator, Author) commented Apr 19, 2017 via email

@dukebody force-pushed the categoricalimputer-improvements branch from e0d7c00 to 5ce407c on April 29, 2017 16:43
@dukebody merged commit b0df3dd into master on Apr 29, 2017
@dukebody deleted the categoricalimputer-improvements branch on April 29, 2017 16:44
2 participants