
BUG: astype('category', categories=...) fails on a series of categorical type #10696

Closed
jorisvandenbossche opened this issue Jul 29, 2015 · 8 comments · Fixed by #18710
Labels
Bug Categorical Categorical Data Type

@jorisvandenbossche
Member

s.astype('category', categories=['a', 'b', 'c']) fails when the series is already of Categorical dtype:

TypeError: _astype() got an unexpected keyword argument 'categories'

I am not sure if this should work (it would then be equivalent to set_categories?), but in any case the current error message is not informative:

In [49]: s = pd.Series(['a', 'b', 'a'])

In [50]: s
Out[50]:
0    a
1    b
2    a
dtype: object

In [51]: s.astype('category')
Out[51]:
0    a
1    b
2    a
dtype: category
Categories (2, object): [a, b]

In [52]: s.astype('category', categories=['a', 'b', 'c'])
Out[52]:
0    a
1    b
2    a
dtype: category
Categories (3, object): [a, b, c]

In [53]: scat = s.astype('category')

In [54]: scat.astype('category', categories=['a', 'b', 'c'])
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-54-f955e6286a85> in <module>()
----> 1 scat.astype('category', categories=['a', 'b', 'c'])

c:\users\vdbosscj\scipy\pandas-joris\pandas\core\generic.pyc in astype(self, dtype, copy, raise_on_error, **kwargs)
   2415
   2416         mgr = self._data.astype(
-> 2417             dtype=dtype, copy=copy, raise_on_error=raise_on_error, **kwargs)
   2418         return self._constructor(mgr).__finalize__(self)
   2419

c:\users\vdbosscj\scipy\pandas-joris\pandas\core\internals.pyc in astype(self, dtype, **kwargs)
   2516
   2517     def astype(self, dtype, **kwargs):
-> 2518         return self.apply('astype', dtype=dtype, **kwargs)
   2519
   2520     def convert(self, **kwargs):

c:\users\vdbosscj\scipy\pandas-joris\pandas\core\internals.pyc in apply(self, f, axes, filter, do_integrity_check, **kwargs)
   2471                                                  copy=align_copy)
   2472
-> 2473             applied = getattr(b, f)(**kwargs)
   2474
   2475             if isinstance(applied, list):

c:\users\vdbosscj\scipy\pandas-joris\pandas\core\internals.pyc in astype(self, dtype, copy, raise_on_error, values, **kwargs)
    371     def astype(self, dtype, copy=False, raise_on_error=True, values=None, **kwargs):
    372         return self._astype(dtype, copy=copy, raise_on_error=raise_on_error,
--> 373                             values=values, **kwargs)
    374
    375     def _astype(self, dtype, copy=False, raise_on_error=True, values=None,

TypeError: _astype() got an unexpected keyword argument 'categories'
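For comparison, the existing API the report alludes to, set_categories, already conforms a categorical series to an explicit list of categories. A minimal sketch on the same data, assuming a current pandas version (where set_categories returns a new series rather than taking inplace=):

```python
import pandas as pd

s = pd.Series(['a', 'b', 'a'])
scat = s.astype('category')  # categories inferred from the data: ['a', 'b']

# Conform the existing categorical to an explicit set of categories.
# The values are kept; 'c' is added as an unused category.
widened = scat.cat.set_categories(['a', 'b', 'c'])
print(widened.cat.categories.tolist())  # ['a', 'b', 'c']
print(widened.tolist())                 # ['a', 'b', 'a']
```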

@jorisvandenbossche jorisvandenbossche added Bug Categorical Categorical Data Type labels Jul 29, 2015
@jorisvandenbossche jorisvandenbossche added this to the 0.17.0 milestone Jul 29, 2015
@jreback
Contributor

jreback commented Jul 29, 2015

Actually I think we can just blow away this entire function, _astype, here: https://github.com/pydata/pandas/blob/master/pandas/core/internals.py#L1768

The top-level astype is fine for this.

I think an astype to a different categorical dtype (with different categories) is ok, though not efficient.
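Converting a categorical to one with different categories does not have to touch the values themselves: it is enough to remap the integer codes, which is the idea behind set_categories. A minimal sketch of that remapping, assuming a hypothetical helper recode_codes (not a pandas API; pandas' internal implementation differs in detail):

```python
import numpy as np
import pandas as pd


def recode_codes(codes, old_categories, new_categories):
    """Hypothetical helper: remap categorical codes onto new categories.

    Only the integer code array is touched; the values are never
    materialized as Python objects. Old categories that are missing
    from new_categories map to -1, which pandas treats as NaN.
    """
    # For each old category, its position in new_categories (-1 if absent).
    indexer = pd.Index(new_categories).get_indexer(old_categories)
    # Append -1 so that an existing missing code (-1) indexes this last
    # slot and stays -1.
    lookup = np.append(indexer, -1)
    return lookup[codes]


scat = pd.Series(['a', 'b', 'a'], dtype='category')
new_categories = ['a', 'b', 'c']
new_codes = recode_codes(scat.cat.codes.to_numpy(), scat.cat.categories, new_categories)
result = pd.Series(pd.Categorical.from_codes(new_codes, categories=new_categories))
print(result.tolist())                 # ['a', 'b', 'a']
print(result.cat.categories.tolist())  # ['a', 'b', 'c']
```

The lookup-table approach costs one pass over the codes, independent of how expensive the category values are to compare.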

@jreback jreback modified the milestones: Next Major Release, 0.17.0 Aug 20, 2015
@jreback jreback modified the milestones: 0.18.1, Next Major Release Mar 12, 2016
@jreback jreback modified the milestones: 0.18.2, 0.18.1 Apr 26, 2016
@jorisvandenbossche jorisvandenbossche modified the milestones: Next Major Release, 0.19.0 Aug 21, 2016
has2k1 added a commit to has2k1/plotnine that referenced this issue Jun 20, 2017
The workaround was due to a bug in pandas,
pandas-dev/pandas#10409 that has been fixed.
When that was fixed upstream, the local fix led to another
bug, pandas-dev/pandas#10696!!
@Aylr

Aylr commented Oct 31, 2017

For users waiting for a fix, I'm using this inefficient hack: convert the categorical to object, then immediately back to a category with the new levels:

 X[col] = X[col].astype(object).astype('category', categories=self.categorical_levels[col])
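The categories= keyword used in this workaround was later deprecated and is not accepted by recent pandas versions. A sketch of the same object round-trip that passes a CategoricalDtype instead, assuming a DataFrame X with a categorical column 'col' and a hypothetical levels list standing in for self.categorical_levels[col]:

```python
import pandas as pd

X = pd.DataFrame({'col': pd.Series(['a', 'b', 'a'], dtype='category')})
levels = ['a', 'b', 'c']  # hypothetical stand-in for self.categorical_levels[col]

# Round-trip through object dtype, then build the categorical from an
# explicit dtype instead of the categories= keyword.
X['col'] = X['col'].astype(object).astype(pd.CategoricalDtype(categories=levels))
print(X['col'].cat.categories.tolist())  # ['a', 'b', 'c']
```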

@jreback
Contributor

jreback commented Oct 31, 2017

this is fixed in 0.21.0; @Aylr would you like to put up a validation test?

In [21]: scat.astype(pd.api.types.CategoricalDtype(categories=['a', 'b', 'c']))
Out[21]: 
0    a
1    b
2    a
dtype: category
Categories (2, object): [a, b]

The original usage with passing categories should actually show a deprecation warning, however.

@jreback jreback modified the milestones: Next Major Release, 0.21.1 Oct 31, 2017
@Aylr
Copy link

Aylr commented Nov 4, 2017

@jreback happy to write some tests. Should I create a new issue number? What branch should I submit a PR to?

@jreback
Contributor

jreback commented Nov 4, 2017

No need to create a new issue. Submit to master (we will then backport to 0.21.1).

@jorisvandenbossche jorisvandenbossche modified the milestones: 0.21.1, 0.22.0 Nov 30, 2017
@jorisvandenbossche
Member Author

jorisvandenbossche commented Nov 30, 2017

This is actually not really fixed: the resulting categories differ from the ones specified in the dtype passed to astype; see #10696 (comment) (the output has categories [a, b], but ['a', 'b', 'c'] was specified).

I am not fully sure what the best behaviour is. Either the output categories should be conformed to the passed categories (like set_categories, which I think is a sensible thing to do), or a more helpful error message should be raised.

(Anyhow, since it is not a regression and has been present for a long time, I removed it from the 0.21.1 milestone.)
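The first option (conform to the passed categories, as set_categories does) is what astype with a CategoricalDtype does in recent pandas versions, which include the fix referenced at the top of this issue (#18710). A minimal sketch, assuming such a version, showing both widening and narrowing the categories:

```python
import pandas as pd

scat = pd.Series(['a', 'b', 'a'], dtype='category')

# Widening: values unchanged, categories conformed to the dtype.
wider = scat.astype(pd.CategoricalDtype(categories=['a', 'b', 'c']))
print(wider.cat.categories.tolist())  # ['a', 'b', 'c']

# Narrowing: values whose category is dropped become missing,
# as they would with set_categories.
narrower = scat.astype(pd.CategoricalDtype(categories=['a']))
print(narrower.isna().tolist())  # [False, True, False]
```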

@jreback
Contributor

jreback commented Nov 30, 2017

@jorisvandenbossche this is not fixed at all

@jorisvandenbossche
Member Author

> @jorisvandenbossche this is not fixed at all

That's what I am saying?


3 participants