Skip to content

inverse_transform() in BaseNEncoder does not raise an exception #121

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
janmotl opened this issue Sep 6, 2018 · 3 comments
Closed

inverse_transform() in BaseNEncoder does not raise an exception #121

janmotl opened this issue Sep 6, 2018 · 3 comments

Comments

@janmotl
Copy link
Collaborator

janmotl commented Sep 6, 2018

inverse_transform() in BaseNEncoder does not raise an exception when the test set contains a new value:

        X = create_dataset(n_rows=100, has_none=False)
        X_t = create_dataset(n_rows=50, has_none=False)
        X_t_extra = create_dataset(n_rows=50, extras=True, has_none=False)
        cols = ['underscore', 'none', 'extra', 321]

         enc = category_encoders.BaseNEncoder(verbose=1, cols=cols)
         enc.fit(X, y)
         with self.assertRaises(ValueError):
                _ = enc.inverse_transform(enc.transform(X_t_extra))
@janmotl
Copy link
Collaborator Author

janmotl commented Sep 11, 2018

It looks like an error could be in the remaining encoders. OneHot, Binary and Ordinal use following code:

for col in self.cols:
    if any(X[col] == 0):
        raise ValueError("inverse_transform is not supported because transform impute "
                                     "the unknown category -1 when encode %s"%(col,))

While BaseN uses:

for col in self.cols:
    if any(X[col] == -1):
        raise ValueError("inverse_transform is not supported because transform impute "
                                     "the unknown category -1 when encode %s"%(col,))

Note the difference between X[col] == 0 and X[col] == -1.

My hypothesis is that there should be -1 everywhere. 0 is merely a historical artifact from the time when 0 used to be used to mark unknown categories. Am I right?

@wdm0006
Copy link
Collaborator

wdm0006 commented Sep 14, 2018

Ah yes, should be -1 everywhere, 0 is legacy and causes issues. Should switch it to -1.

janmotl added a commit that referenced this issue Sep 15, 2018
@janmotl
Copy link
Collaborator Author

janmotl commented Sep 16, 2018

Resolved with commit 037c7cd

@janmotl janmotl closed this as completed Sep 16, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants