CategoricalImputer improvements #89
4c68601 to 5eb6c50
5eb6c50 to e0d7c00
Rebased over flake8 fixes.
modes = pd.Series(X).mode()
if modes.shape[0] == 0:
    raise ValueError('No value is repeteated more than '
Typo: repeteated -> repeated.
modes = pd.Series(X).mode()
if modes.shape[0] == 0:
    raise ValueError('No value is repeteated more than '
                     'twice in the column')
Looking at the pandas docs, I think that should be:
"No value is repeated more than once in the column."
or
"No value is repeated at least twice in the column."
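A minimal sketch of the reviewed check using the first suggested message, written as a standalone helper. The function name `most_frequent` is hypothetical (the imputer's actual method body is not shown in this thread); it wraps `pd.Series(X).mode()` as in the diff and, as an assumption about how ties are resolved, takes the first mode returned.

```python
import numpy as np
import pandas as pd


def most_frequent(X):
    """Hypothetical helper: return the most frequent value of a 1-D array-like.

    Raises ValueError when Series.mode() returns an empty result, so the
    caller never indexes into an empty Series.
    """
    modes = pd.Series(X).mode()
    if modes.shape[0] == 0:
        raise ValueError('No value is repeated more than once in the column.')
    # mode() can return several values when they tie; assume the first wins.
    return modes.iloc[0]


print(most_frequent(['a', 'b', 'b']))  # b
```

Positional access with `iloc[0]` is used instead of `modes[0]` so the lookup does not depend on the index labels of the returned Series.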
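About "CategoricalImputer: Error out when no mode is found": it seems this will no longer be a problem in pandas 0.20.0, since they will change mode behavior to always return at least one value (pandas-dev/pandas#15744).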
Not sure. The issue discussed seems to be about the mode of a Series with
only one item. What will be the behavior when there are multiple items but
none of them appears more than once?
On 2017-04-19 10:45 GMT+02:00, Arnau wrote:
> About "CategoricalImputer: Error out when no mode is found": it seems this will no longer be a problem in pandas 0.20.0, since they will change mode behavior to always return at least one value: pandas-dev/pandas#15744
Self-answer: from the tests, I understand that if there are multiple items
and none appears more than once, the mode will be an array with all the
items, inversely sorted.
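A quick way to check this behavior, assuming a recent pandas version (releases before 0.20.0 behaved differently, as discussed in this thread). It shows that when no value repeats, every item is returned as a mode, and that missing values are dropped by default, so an all-NaN column still produces an empty result; that is one case where the `modes.shape[0] == 0` check could still fire. The values are sorted before printing so the example does not rely on the order in which `mode()` returns them.

```python
import numpy as np
import pandas as pd

# No value repeats: every item ties for the mode, so all of them come back.
all_unique = pd.Series(['c', 'a', 'b']).mode()

# One value repeats: it is the single mode.
one_winner = pd.Series(['a', 'b', 'b']).mode()

# NaN is ignored by default, so an all-missing column has no mode at all.
all_missing = pd.Series([np.nan, np.nan]).mode()

print(sorted(all_unique))  # ['a', 'b', 'c']
print(list(one_winner))    # ['b']
print(len(all_missing))    # 0
```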
On 2017-04-19 10:50 GMT+02:00, Israel Saeta Pérez wrote:
> Not sure. The issue discussed seems to be about the mode of a Series with
> only one item. What will be the behavior when there are multiple items but
> none of them appears more than once?
e0d7c00 to 5ce407c
From #87 with a minor fix.
Enhancements:
- CategoricalImputer also inherits from BaseEstimator.
- missing_values param: specifies the placeholder for the missing values.
- copy param: specifies whether to perform the imputation on a copy of X or in place.
- y param in fit, for Pipeline compatibility.
- NotFittedError in transform if the imputer was not previously fitted.
Fix bugs: