ENH: One-hot decoding #34260

Closed
clbarnes opened this issue May 19, 2020 · 2 comments
Comments

@clbarnes
Contributor

Is your feature request related to a problem?

pd.get_dummies provides a way to turn a sequence of category-like data into a one-hot encoded dataframe. However, there is no easy way (to my knowledge) of going in the other direction: given a boolean dataframe where every row sums to 1, produce a categorical series. This round trip is particularly valuable for serialisation.
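
To make the round trip concrete, here is a minimal sketch of the encoding step and one manual way to undo it. It assumes every row has exactly one truthy entry; `idxmax` is only a workaround used for illustration, not the API requested here, and it would silently pick the first column for a row with no category set.

```python
import pandas as pd

labels = pd.Series(["a", "b", "a", "c"])

# Forward direction: one column per category, one truthy entry per row.
dummies = pd.get_dummies(labels)

# Manual inverse: with exactly one truthy entry per row, idxmax along
# axis=1 returns the label of the column holding it.
decoded = dummies.idxmax(axis=1)
restored = pd.Categorical(decoded, categories=dummies.columns)
```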

Describe the solution you'd like

Some way of constructing a Categorical array from a one-hot encoded dataframe (view). To avoid piling extra functionality into the existing constructor, a class method could be used.

Scratch implementation:

import numpy as np
import pandas as pd

class Categorical:
    ...

    @classmethod
    def from_dummies(cls, df: pd.DataFrame, **kwargs):
        # Treat any truthy value as membership of that column's category.
        onehot = df.astype(bool)

        if (onehot.sum(axis=1) > 1).any():
            raise ValueError("Some rows belong to >1 category")

        # Position 0 holds NaN, so rows with no category set decode to missing.
        index_into = pd.Series([np.nan] + list(onehot.columns))
        mult_by = np.arange(1, len(index_into))

        # At most one True per row, so the weighted sum is that column's 1-based position.
        indexes = (onehot.astype(int) * mult_by).sum(axis=1)
        values = index_into[indexes]

        return cls(values, df.columns, **kwargs)
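
The scratch implementation above is written as a method on a placeholder class, so it cannot be run as is. A standalone sketch of the same idea follows, written as a free function purely so that it is runnable. The name `categorical_from_dummies` is hypothetical, and going through `Categorical.from_codes` with `-1` for rows that have no category set is my substitution for the NaN-prefixed lookup table, not part of the proposal.

```python
import numpy as np
import pandas as pd


def categorical_from_dummies(df: pd.DataFrame, **kwargs) -> pd.Categorical:
    """Hypothetical helper: decode a one-hot frame into a Categorical."""
    onehot = df.astype(bool).to_numpy()

    row_sums = onehot.sum(axis=1)
    if (row_sums > 1).any():
        raise ValueError("Some rows belong to >1 category")

    # argmax gives the position of the single True in each row. Rows with
    # no True get code -1, which Categorical.from_codes treats as missing.
    codes = np.where(row_sums == 1, onehot.argmax(axis=1), -1)
    return pd.Categorical.from_codes(codes, categories=df.columns, **kwargs)


dummies = pd.DataFrame(
    {"a": [1, 0, 0, 0], "b": [0, 1, 0, 0], "c": [0, 0, 1, 0]}
)
decoded = categorical_from_dummies(dummies)
```

Here the last row of `dummies` has no category set, so it decodes to a missing value rather than raising. Extra keyword arguments such as `ordered=True` are forwarded to `Categorical.from_codes`.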

Describe alternatives you've considered

  • A free function (less discoverable, less self-documenting)
  • Importing scikit-learn

Additional context

sklearn.preprocessing.OneHotEncoder.inverse_transform

@clbarnes clbarnes added Enhancement Needs Triage Issue that has not been reviewed by a pandas team member labels May 19, 2020
@dsaxton
Member

dsaxton commented May 19, 2020

Duplicate of #8745, and I think @MarcoGorelli is working on this in #31795.

@dsaxton dsaxton removed the Needs Triage Issue that has not been reviewed by a pandas team member label May 19, 2020
@clbarnes
Contributor Author

Sorry, my bad! I searched for various combinations of "one hot" and the first couple of pages of "categorical" to no avail.
