-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
BUG: make tz_localize operate on values rather than categories #28300
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
BUG: make tz_localize operate on values rather than categories #28300
Conversation
pandas/core/indexes/accessors.py
Outdated
@@ -324,7 +324,9 @@ def __new__(cls, data): | |||
|
|||
orig = data if is_categorical_dtype(data) else None | |||
if orig is not None: | |||
data = Series(orig.values.categories, name=orig.name, copy=False) | |||
data = Series( | |||
orig.values.astype("datetime64[ns]"), name=orig.name, copy=False |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why the astype? what if we get here with e.g. timedelta64 or Period?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed, it was just a quick attempt based on what I'd read in the issue (but without having fully understood the issue), sorry for having put this up so soon, will work on it further
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No sweat. ping me when this is ready for review
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jbrockmendel thanks - I'm happy to have this reviewed, whenever you have time
pandas/core/indexes/accessors.py
Outdated
@@ -324,8 +321,7 @@ def __new__(cls, data): | |||
|
|||
orig = data if is_categorical_dtype(data) else None | |||
if orig is not None: | |||
data = Series(orig.values.categories, name=orig.name, copy=False) | |||
|
|||
data = Series(orig.values.to_list(), name=orig.name, copy=False) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
so we don't want to convert to a list, I think rather we should still operate on the categories
and just return a new Categorical
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure I understand - is data
supposed to be a pandas.Categorical
? If so, then it won't pass the
if not isinstance(data, ABCSeries):
raise TypeError(
"cannot convert an object of type {0} to a "
"datetimelike index".format(type(data))
)
test when we get to
if is_datetime64_dtype(data.dtype):
return DatetimeProperties(data, orig)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is data supposed to be a pandas.Categorical?
At this point data
is a Series with CategoricalDtype.
I think @jreback's point is that we don't want to call to_list()
because that is costly
@@ -373,6 +373,17 @@ def test_tz_localize_convert_copy_inplace_mutate(self, copy, method, tz): | |||
) | |||
tm.assert_series_equal(result, expected) | |||
|
|||
def test_tz_localize_categorical(self, tz_aware_fixture): | |||
# GH 27952 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this should be in the test_categorical tests
doc/source/whatsnew/v1.0.0.rst
Outdated
@@ -88,7 +88,7 @@ Categorical | |||
^^^^^^^^^^^ | |||
|
|||
- Added test to assert the :func:`fillna` raises the correct ValueError message when the value isn't a value from categories (:issue:`13628`) | |||
- | |||
- Bug in :meth:`CombinedDatetimelikeProperties.__new__` and :meth:`Properties._delegate_property_get` was resulting in properties being applied to categories rather than values (:issue:`27952`) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it is not clear to a user what you are actually doing, try to use a little more obvious language, like
using date accessors on a categorical dtyped series of datetimes was incorrectly returning a ....
this also needs a bit more general testing, try to add a few more accessor functions that you are testing. |
03fd748
to
22002e6
Compare
Thanks @jbrockmendel and @jreback for your comments. I've moved the location of the tests to indexes/test_category and have removed the I can't see how to keep |
pandas/core/indexes/accessors.py
Outdated
@@ -324,7 +321,12 @@ def __new__(cls, data): | |||
|
|||
orig = data if is_categorical_dtype(data) else None | |||
if orig is not None: | |||
data = Series(orig.values.categories, name=orig.name, copy=False) | |||
data = Series( | |||
orig.values.__array__(), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
dont use __array__()
directly. this is meant to be used as np.asarray(orig.values)
expected = datetimes.dt.tz_localize(tz) | ||
tm.assert_series_equal(result, expected) | ||
|
||
@pytest.mark.parametrize("accessor", [("year"), ("month"), ("day")]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
parens around year/month/day look unnecessary
categorical = datetimes.astype("category") | ||
result = categorical.dt.tz_localize(tz) | ||
expected = datetimes.dt.tz_localize(tz) | ||
tm.assert_series_equal(result, expected) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you do the same with tz_convert (starting from a tz-aware dtype)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should the result here be datetime64tz or categorical[datetime64tz]? My initial intuition is that it would stay categorical
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jbrockmendel thanks for your feedback! I've made the changes you suggested.
Regarding the result dtype - the example from #27952 returns
dtype: datetime64[ns]
and so this isn't affected by my changes. Perhaps it should be a separate issue?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, the docs say
Note The returned Series (or DataFrame) is of the same type as if you used the .str. / .dt. on a Series of that type (and not of type category!).
so this is consistent with that - but I'm happy to work on changing this if that's what you prefer
doc/source/whatsnew/v1.0.0.rst
Outdated
@@ -110,7 +110,7 @@ Categorical | |||
^^^^^^^^^^^ | |||
|
|||
- Added test to assert the :func:`fillna` raises the correct ValueError message when the value isn't a value from categories (:issue:`13628`) | |||
- | |||
- Using date accessors on a categorical dtyped series of datetimes was incorrectly operating only on the unique categories (:issue: `27952`) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
mention Series.dt.tz_localize
here as an example
tm.assert_series_equal(result, expected) | ||
|
||
def test_dt_tz_convert(self, tz_aware_fixture): | ||
# GH 27952 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i think these should go in pandas/tests/series/tests_datetime_values.py
e5b8208
to
e5545a7
Compare
doc/source/whatsnew/v1.0.0.rst
Outdated
@@ -135,7 +135,9 @@ Categorical | |||
|
|||
- Added test to assert the :func:`fillna` raises the correct ValueError message when the value isn't a value from categories (:issue:`13628`) | |||
- Bug in :meth:`Categorical.astype` where ``NaN`` values were handled incorrectly when casting to int (:issue:`28406`) | |||
- | |||
- Using date accessors on a categorical dtyped series of datetimes was not returning a Series (or DataFrame) of the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm having a hard time following this. Is the "not returning a Series / DataFrame" the same issue as the "skipping duplicates?"
If they're distinct issue, break it into multiple lines.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@TomAugspurger the only issue is that it's acting on the unique categories rather than on the values. I've tried rephrasing this more clearly
doc/source/whatsnew/v1.0.0.rst
Outdated
@@ -165,7 +165,9 @@ Categorical | |||
|
|||
- Added test to assert the :func:`fillna` raises the correct ValueError message when the value isn't a value from categories (:issue:`13628`) | |||
- Bug in :meth:`Categorical.astype` where ``NaN`` values were handled incorrectly when casting to int (:issue:`28406`) | |||
- | |||
- Using date accessors on a categorical dtyped :class:`Series` of datetimes was not returning the same object that would |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These :meth:
aren't correct, since they don't resolve to a real method.
How about
- Using date accessors on a categorical dtyped :class:`Series` of datetimes was not returning the same object that would | |
- Bug in :class:`Series.str` and :class:`Series.dt` accessor for categorical-dtype returning results with the incorrect length (:issue:`27952`) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure - I've rephrased the first part as you've suggested, but have kept the Series.dt.tz_localize
example as suggested by @jreback
5d903a0
to
4b0c2a8
Compare
@TomAugspurger @jbrockmendel @jreback thanks for your reviews. Now that #28762 has been closed, I've been able to push a solution to this which doesn't involve Is this something you're still interested in? Should the return type (and thus the documentation) be changed as well? If not, no worries, thanks for the time you've already dedicated to helping me, feel free to close it :) |
sure this still makes sense |
@MarcoGorelli can you rebase and we'll take a fresh look and try to get this through |
4b0c2a8
to
4d0320c
Compare
Sure, done |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm. some stylistic commments, ping on green.
pandas/core/indexes/accessors.py
Outdated
@@ -324,7 +321,12 @@ def __new__(cls, data): | |||
|
|||
orig = data if is_categorical_dtype(data) else None | |||
if orig is not None: | |||
data = Series(orig.values.categories, name=orig.name, copy=False) | |||
data = Series( | |||
orig.values, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
make this orig.array (doesn't make a practical difference), but is more idiomatic to our usecase here.
@@ -344,6 +344,39 @@ def test_dt_namespace_accessor_categorical(self): | |||
expected = Series([2017, 2017, 2018, 2018], name="foo") | |||
tm.assert_series_equal(result, expected) | |||
|
|||
def test_dt_tz_localize(self, tz_aware_fixture): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you add categorical to these test names to indicate what is going onhere.
@jreback Sure, here you go |
thanks @MarcoGorelli nice patch! |
series.dt.tz_localize()
on Categorical operates on categories, not values #27952black pandas
git diff upstream/master -u -- "*.py" | flake8 --diff