`series.dt.tz_localize()` on Categorical operates on categories, not values #27952

adamhooper · 2019-08-16T17:56:52Z

Code Sample, a copy-pastable example if possible

import pandas as pd

datetimes = pd.Series(['2019-01-01', '2019-01-01', '2019-01-02'], dtype='datetime64[ns]')
categorical = datetimes.astype('category')
categorical.dt.tz_localize(None)

Produces:

0   2019-01-01
1   2019-01-02
dtype: datetime64[ns]

Problem description

.dt.tz_localize() is operating on categorical.cat.categories, so the result has one row per distinct category (two here) instead of one row per value (three). It should be operating on categorical.astype('datetime64[ns]').values. This is just plain wrong.
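To make the distinction concrete, here is a small sketch (reusing the data from the code sample above) that contrasts a categorical Series' categories with its per-row values. It only uses the public `.cat` accessor and is meant as an illustration, not as a description of pandas internals.

```python
import pandas as pd

datetimes = pd.Series(
    ["2019-01-01", "2019-01-01", "2019-01-02"], dtype="datetime64[ns]"
)
categorical = datetimes.astype("category")

# The categories hold each distinct value once;
# the codes map every row to one of those categories.
categories = categorical.cat.categories  # DatetimeIndex with 2 entries
codes = categorical.cat.codes            # one integer code per row: 0, 0, 1

# Casting back to datetime64[ns] recovers one value per row.
values = categorical.astype("datetime64[ns]")

print(len(categories), len(values))  # prints "2 3"
```

Any method applied to `categories` alone therefore returns two rows, which matches the truncated output shown above.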

Expected Output

According to the Categorical docs, "The returned Series (or DataFrame) is of the same type as if you used the .str. / .dt. on a Series of that type (and not of type category!)." So I think the expected value is:

>>> datetimes.dt.tz_localize(None)
0   2019-01-01
1   2019-01-01
2   2019-01-02
dtype: datetime64[ns]

Output of `pd.show_versions()`

INSTALLED VERSIONS
------------------
commit : None
python : 3.7.2.final.0
python-bits : 64
OS : Linux
OS-release : 5.2.8-200.fc30.x86_64
machine : x86_64
processor :
byteorder : little
LC_ALL : None
LANG : C.UTF-8
LOCALE : en_US.UTF-8

pandas : 0.25.0
numpy : 1.17.0
pytz : 2019.2
dateutil : 2.8.0
pip : 19.0.2
setuptools : 40.8.0
Cython : None
pytest : None
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : None
lxml.etree : 4.3.0
html5lib : 1.0.1
pymysql : None
psycopg2 : 2.8.3 (dt dec pq3 ext lo64)
jinja2 : None
IPython : None
pandas_datareader: None
bs4 : 4.7.1
bottleneck : None
fastparquet : 0.3.2
gcsfs : None
lxml.etree : 4.3.0
matplotlib : None
numexpr : None
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : None
pytables : None
s3fs : None
scipy : None
sqlalchemy : None
tables : None
xarray : None
xlrd : 1.2.0
xlwt : None
xlsxwriter : None

Found a bug, pandas-dev/pandas#27952, which made our module output wrong results when there are lots of duplicates. We work around the wrong Pandas behavior by writing a special code path that avoids the buggy code. [finishes #167858839]
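The referenced commit does not show its workaround here, so the following is only a guess at what such a code path could look like: cast a categorical Series back to the dtype of its categories before touching `.dt`. The helper name `tz_localize_values` is hypothetical and not part of pandas.

```python
import pandas as pd


def tz_localize_values(series: pd.Series, tz) -> pd.Series:
    """Hypothetical helper: localize per-row values, avoiding the categorical path."""
    if isinstance(series.dtype, pd.CategoricalDtype):
        # Cast to the categories' dtype so .dt sees one value per row.
        series = series.astype(series.cat.categories.dtype)
    return series.dt.tz_localize(tz)


datetimes = pd.Series(
    ["2019-01-01", "2019-01-01", "2019-01-02"], dtype="datetime64[ns]"
)
result = tz_localize_values(datetimes.astype("category"), "UTC")
print(len(result))  # prints 3
```

The cast gives up the memory savings of the categorical representation, which may matter for large inputs with many duplicates.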

mroeschke · 2019-08-18T00:15:20Z

The line that needs fixing is here:

pandas/pandas/core/indexes/accessors.py

Line 327 in 9f93d57

data = Series(orig.values.categories, name=orig.name, copy=False)

PRs welcome!
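For anyone picking this up, one possible direction (an assumption, not necessarily how pandas should or does fix it) is to keep transforming the categories once each, then expand the result back to one value per row through the integer codes. The sketch below uses only public API; the variable names are illustrative.

```python
import pandas as pd

orig = pd.Series(
    ["2019-01-01", "2019-01-01", None, "2019-01-02"], dtype="datetime64[ns]"
).astype("category")

# Transform each distinct category once...
localized_categories = pd.Series(orig.cat.categories.tz_localize("UTC"))

# ...then expand back to one value per row via the integer codes.
# A code of -1 marks a missing value; reindex turns it into NaT.
per_row = localized_categories.reindex(orig.cat.codes.to_numpy())
per_row.index = orig.index

print(len(orig.cat.categories), len(per_row))  # prints "2 4"
```

The simpler alternative is to build the intermediate Series from the per-row values instead of `orig.values.categories`, at the cost of transforming duplicates repeatedly.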

jbrockmendel · 2019-09-16T19:56:32Z

@mroeschke are you sure that's the problem? I think that might be a problem, but it looks like _delegate_method might be doing something weird too.

mroeschke · 2019-09-16T21:16:32Z

When I tried debugging quickly, that's the first spot where the behavior was incorrect. You're probably right that the behavior is also incorrect somewhere upstream or downstream.

mroeschke added the Categorical, Datetime, and Timezones labels Aug 16, 2019

TomAugspurger added this to the Contributions Welcome milestone Aug 20, 2019

MarcoGorelli mentioned this issue Sep 5, 2019

BUG: make tz_localize operate on values rather than categories #28300

Merged

5 tasks

jreback modified the milestones: Contributions Welcome, 1.0 Nov 2, 2019

jreback closed this as completed in #28300 Nov 2, 2019
