-
-
Notifications
You must be signed in to change notification settings - Fork 18.5k
ENH: Support EAs in Series.unstack #23284
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from 13 commits
ced299f
3b63fcb
756dde9
90f84ef
942db1b
36a4450
ee330d6
2fcaf4d
4f46364
e9498a1
72b5a0d
f6b2050
4d679cb
ff7aba7
91587cb
49bdb50
cf8ed73
5902b5b
17d3002
a75806a
2397e89
8ed7c73
b23234c
29a6bb1
19b7cfa
254fe52
2d78d42
a9e6263
ca286f7
2f28638
967c674
f6aa4b9
32bc3de
56e5f2f
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -102,7 +102,7 @@ def copy(self, deep=False): | |
def astype(self, dtype, copy=True): | ||
if isinstance(dtype, type(self.dtype)): | ||
return type(self)(self._data, context=dtype.context) | ||
return super(DecimalArray, self).astype(dtype, copy) | ||
return np.asarray(self, dtype=dtype) | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. OT: how much testing do we have on this? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Which bit specifically? We have a base test for I'm not actually sure why I changed this... |
||
|
||
def __setitem__(self, key, value): | ||
if pd.api.types.is_list_like(value): | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,5 +1,6 @@ | ||
import operator | ||
import decimal | ||
import math | ||
import operator | ||
|
||
import numpy as np | ||
import pandas as pd | ||
|
@@ -63,9 +64,23 @@ def data_for_grouping(): | |
class BaseDecimal(object): | ||
|
||
def assert_series_equal(self, left, right, *args, **kwargs): | ||
|
||
left_na = left.isna() | ||
right_na = right.isna() | ||
def convert(x): | ||
# need to convert array([Decimal(NaN)], dtype='object') to np.NaN | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. hmm, we could add this support I think though, create an issue? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Right now we have a pretty clear answer of "decimal isn't supported". I'm hesitant to even partially support it :) |
||
# because Series[object].isnan doesn't recognize decimal(NaN) as | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. where does this come up now? e.g. whats an example There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. In [21]: import pandas as pd
In [22]: import pandas.util.testing as tm
In [23]: from pandas.tests.extension.decimal import to_decimal
In [24]: ser = pd.Series(to_decimal(['1.0', 'NaN']))
In [25]: tm.assert_series_equal(ser.astype(object), ser.astype(object))
---------------------------------------------------------------------------
AssertionError Traceback (most recent call last)
<ipython-input-25-5356adf50e72> in <module>
----> 1 tm.assert_series_equal(ser.astype(object), ser.astype(object))
~/sandbox/pandas/pandas/util/testing.py in assert_series_equal(left, right, check_dtype, check_index_type, check_series_type, check_less_precise, check_names, check_exact, check_datetimelike_compat, check_categorical, obj)
1293 check_less_precise=check_less_precise,
1294 check_dtype=check_dtype,
-> 1295 obj='{obj}'.format(obj=obj))
1296
1297 # metadata comparison
~/sandbox/pandas/pandas/_libs/testing.pyx in pandas._libs.testing.assert_almost_equal()
64
65
---> 66 cpdef assert_almost_equal(a, b,
67 check_less_precise=False,
68 bint check_dtype=True,
~/sandbox/pandas/pandas/_libs/testing.pyx in pandas._libs.testing.assert_almost_equal()
178 msg = '{0} values are different ({1} %)'.format(
179 obj, np.round(diff * 100.0 / na, 5))
--> 180 raise_assert_detail(obj, msg, lobj, robj)
181
182 return True
~/sandbox/pandas/pandas/util/testing.py in raise_assert_detail(obj, message, left, right, diff)
1080 msg += "\n[diff]: {diff}".format(diff=diff)
1081
-> 1082 raise AssertionError(msg)
1083
1084
AssertionError: Series are different
Series values are different (50.0 %)
[left]: [1.0, NaN]
[right]: [1.0, NaN] We do the astype(object) to build the expected. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I think this is a bug, and would solve the aboe i think.
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. @TomAugspurger actually, can you explain why the current code in master is not working fine? Why do you need to convert to object? Because before, there was already the calls to There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. this is my point. I think there is a bug somewhere here, e.g. isna is maybe not dispatching to the EA? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. So it's all about creating the expected result. When we go to do the final assert that the values match, we do result = result.astype(object)
self.assert_frame_equal(result, expected) but There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Ah, OK, I somehow thought you were doing the For me it is fine to keep this hack in here for now. In the end that is somehow the purpose of using the class instances for There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I think the
on the these should be converted to There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. object-dtype can store anything, including decimal objects. It'd be strange to only convert There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Yeah, this seems correct to me. Going to merge, if we need to further discuss this, we can do that in another issue (it's not really related anymore with actually fixing unstack) |
||
# NA. | ||
try: | ||
return math.isnan(x) | ||
except TypeError: | ||
return False | ||
|
||
if left.dtype == 'object': | ||
left_na = left.apply(convert) | ||
else: | ||
left_na = left.isna() | ||
if right.dtype == 'object': | ||
right_na = right.apply(convert) | ||
else: | ||
right_na = right.isna() | ||
|
||
tm.assert_series_equal(left_na, right_na) | ||
return tm.assert_series_equal(left[~left_na], | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
really? what does this change for Categorical?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Previously
Series[Categorical].unstack()
returnedDataFrame[object
].Now it'll be a
DataFrame[Categorical]
, i.e.unstack()
preserves the CategoricalDtype.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, I forget. Previously, we went internally went
Categorical -> object -> Categorical
. Now we avoid the conversion to categorical.So the changes from 0.23.4 will be
Onces DatetimeTZ is an ExtensionArray, then we'll presumably preserve that as well. On 0.23.4, we convert to datetime64ns
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok, this might be need a larger note then