Pandas 1.0 no longer handles `numpy.str_`s as categories #31499

flying-sheep · 2020-01-31T16:18:34Z

Code Sample

import numpy as np
import pandas as pd
pd.Categorical(['1', '0', '1'], [np.str_('0'), np.str_('1')])

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/angerer/Dev/Python/venvs/env-pandas-1/lib/python3.8/site-packages/pandas/core/arrays/categorical.py", line 385, in __init__
    codes = _get_codes_for_values(values, dtype.categories)
  File "/home/angerer/Dev/Python/venvs/env-pandas-1/lib/python3.8/site-packages/pandas/core/arrays/categorical.py", line 2576, in _get_codes_for_values
    t.map_locations(cats)
  File "pandas/_libs/hashtable_class_helper.pxi", line 1403, in pandas._libs.hashtable.StringHashTable.map_locations
TypeError: Expected unicode, got numpy.str_

Problem description

I know that having a list of numpy.str_s seems weird, but it easily happens when you use non-numpy algorithms on numpy arrays (e.g. natsort.natsorted in our case), or when you iterate over an array in a comprehension:

>>> np.array(['1', '0'])[0].__class__
<class 'numpy.str_'>
>>> [type(s) for s in np.array(['1', '0'])]
[<class 'numpy.str_'>, <class 'numpy.str_'>]
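Until a fix lands, one possible user-side workaround is to convert the categories to built-in `str` before constructing the categorical. This is only a minimal sketch: `sorted()` stands in for `natsort.natsorted`, and the variable names are made up for illustration.

```python
import numpy as np
import pandas as pd

# Iterating over a NumPy string array yields numpy.str_ scalars, so a
# pure-Python algorithm (sorted() stands in for natsort.natsorted here)
# returns a list of numpy.str_ objects.
raw_categories = sorted(np.array(["1", "0"]))
assert all(type(c) is np.str_ for c in raw_categories)

# Workaround sketch: convert every category to a built-in str before
# handing the list to pandas.
plain_categories = [str(c) for c in raw_categories]
assert all(type(c) is str for c in plain_categories)

cat = pd.Categorical(["1", "0", "1"], categories=plain_categories)
print(cat.categories.tolist())  # ['0', '1']
print(cat.codes.tolist())       # [1, 0, 1]
```

The conversion copies each string once, which should be negligible for typical category counts.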

Expected Output

A normal pd.Categorical

Pandas version

pandas 1.0

TomAugspurger · 2020-01-31T16:30:41Z

This changed from 0.25.3?

Are you able to pin down what change caused it?

jorisvandenbossche · 2020-01-31T17:03:21Z

Yes, in 0.25.3 this worked.

At least, it gives a categorical with object categories with those numpy strings (but with Series constructor, we also preserve the numpy strings, and don't convert to python strings, so that seems the "expected" behaviour).

jorisvandenbossche · 2020-01-31T17:18:15Z

My guess is that it's related to #30419, which changed the get_c_string implementation (which is used in StringHashTable to get the C string from the string object). cc @jbrockmendel

jbrockmendel · 2020-01-31T20:46:49Z

Maybe if we're lucky it will be good enough to change L706 in hashtable_class_helper.pxi.in from v = get_c_string(val) to v = get_c_string(<str>val), but this is really a PITA because the previous line is precisely a check for isinstance(val, str), which is True for np.str_ objects.
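The subclass relationship behind this can be checked from plain Python. The snippet below is only an illustration of why an `isinstance` guard lets `np.str_` through; it is not the Cython code in pandas, and the variable names are hypothetical.

```python
import numpy as np

val = np.str_("0")

# numpy.str_ subclasses the built-in str, so an isinstance check passes ...
print(isinstance(val, str))  # True
# ... while an exact-type check does not.
print(type(val) is str)      # False

# An explicit str() call produces an exact built-in str with the same contents.
exact = str(val)
print(type(exact) is str)    # True
```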

flying-sheep · 2020-02-01T17:04:04Z

Yeah, it’s kinda shitty. I think implicit conversion would be better than a hard-to-interpret error here though.

flying-sheep mentioned this issue Jan 31, 2020

Fix test breakages scverse/scanpy#1015

Merged

3 tasks

jorisvandenbossche added the Categorical (Categorical Data Type) and Regression (Functionality that used to work in a prior pandas version) labels Jan 31, 2020

jorisvandenbossche added this to the 1.0.1 milestone Jan 31, 2020

jbrockmendel mentioned this issue Feb 1, 2020

REGR: Categorical with np.str_ categories #31528

Merged

5 tasks

TomAugspurger closed this as completed in #31528 Feb 4, 2020

AtomScott mentioned this issue Aug 28, 2020

OmegaConf.create() error: 'str_' is not a supported primitive type omry/omegaconf#344

Closed
