-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
BUG: Inconsistent return type for downsampling on resample of empty DataFrame #15093
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
index=pd.date_range('1/1/2000', periods=100, freq="M")) | ||
df2 = df1[df1.a < 0] | ||
result = df2.resample("Q").size() | ||
assertIsInstance(result, pd.Series) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
construct the anticipated series and use assert_series_equal
instead
update this test as well (IOW add size)
|
pandas/tseries/resample.py
Outdated
@@ -686,6 +686,10 @@ def _downsample(self, how, **kwargs): | |||
if not len(ax): | |||
# reset to the new freq | |||
obj = obj.copy() | |||
if how == "size" and isinstance(obj, pd.DataFrame): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is not a good way of fixing. you will have to trace thru the .size
implementation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jreback the size
method of resampler of the empty dataframe must always return the empty series for each frequency, doesn't it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the easiest I think is to pass the result thru .wrap_result(..)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
needs a whatsnew note, but more importantly needs work on the impl.
Codecov Report
@@ Coverage Diff @@
## master #15093 +/- ##
==========================================
- Coverage 90.95% 90.92% -0.03%
==========================================
Files 161 161
Lines 49276 49282 +6
==========================================
- Hits 44817 44811 -6
- Misses 4459 4471 +12
Continue to review full report at Codecov.
|
46e9282
to
c41d339
Compare
doc/source/whatsnew/v0.19.0.txt
Outdated
@@ -1563,3 +1563,4 @@ Bug Fixes | |||
- ``PeriodIndex`` can now accept ``list`` and ``array`` which contains ``pd.NaT`` (:issue:`13430`) | |||
- Bug in ``df.groupby`` where ``.median()`` returns arbitrary values if grouped dataframe contains empty bins (:issue:`13629`) | |||
- Bug in ``Index.copy()`` where ``name`` parameter was ignored (:issue:`14302`) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
moves to 0.20.0
give a user friendly function here, _downsample
is internal.
pandas/tseries/resample.py
Outdated
@@ -756,6 +756,15 @@ def _wrap_result(self, result): | |||
# convert if needed | |||
if self.kind == 'period' and not isinstance(result.index, PeriodIndex): | |||
result.index = result.index.to_period(self.freq) | |||
|
|||
# Make consistent type of result. GH14962 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is all this necessary? should be simpler
@@ -730,7 +730,7 @@ def test_resample_empty_series(self): | |||
assert_series_equal(result, expected, check_dtype=False, | |||
check_names=False) | |||
# this assert will break when fixed | |||
self.assertTrue(result.name is None) | |||
# self.assertTrue(result.name is None) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what are you doing here?
for method in methods: | ||
result = getattr(f.resample(freq), method)() | ||
|
||
expected = f.copy() | ||
expected = pd.Series([]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is not correct, it doesn't preserve things like name.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jreback. Could you please explain the name you are talking about? The name of the Series
? How could exist it if the Series
created after DataFrame resampling?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
right, you are resampling an empty dataframe here, so the result must be an empty dataframe with the new freq.
its not a Series at all.
@@ -806,6 +806,16 @@ def test_resample_loffset_arg_type(self): | |||
assert_frame_equal(result_agg, expected) | |||
assert_frame_equal(result_how, expected) | |||
|
|||
def test_resample_empty_dataframe_with_size(self): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is repeating the above tests
c41d339
to
427234f
Compare
@jreback could you please take a look, that's solution is okay or not? |
doc/source/whatsnew/v0.19.0.txt
Outdated
@@ -1562,4 +1562,4 @@ Bug Fixes | |||
- Bugs in ``stack``, ``get_dummies``, ``make_axis_dummies`` which don't preserve categorical dtypes in (multi)indexes (:issue:`13854`) | |||
- ``PeriodIndex`` can now accept ``list`` and ``array`` which contains ``pd.NaT`` (:issue:`13430`) | |||
- Bug in ``df.groupby`` where ``.median()`` returns arbitrary values if grouped dataframe contains empty bins (:issue:`13629`) | |||
- Bug in ``Index.copy()`` where ``name`` parameter was ignored (:issue:`14302`) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
revert this file
doc/source/whatsnew/v0.20.0.txt
Outdated
|
||
|
||
- Bug in ``DataFrame.boxplot`` where ``fontsize`` was not applied to the tick labels on both axes (:issue:`15108`) | ||
- Bug in ``resample().size()``. Inconsistent return type on resample of empty DataFrame (:issue:`14962`) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
just move this in the bug fix section up higher, that way you won't have conflicts
pandas/tseries/resample.py
Outdated
@@ -567,8 +579,13 @@ def f(self, _method=method, *args, **kwargs): | |||
# groupby & aggregate methods | |||
for method in ['count', 'size']: | |||
|
|||
def f(self, _method=method): | |||
return self._downsample(_method) | |||
if method == "size": |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think any of this is necessary, _wrap_results
can handle all of this
pandas/tseries/resample.py
Outdated
|
||
# Make consistent type of result. GH14962 | ||
if not len(self.ax) and isinstance(result, ABCDataFrame) and \ | ||
isinstance(result.index, (DatetimeIndex, TimedeltaIndex)): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is not necessary
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you can do this logic in _wrap_results
, but I again don't think this is ncessary, simply define the size
method as it 'seems' special and needs extra handling.
for freq in ['M', 'D', 'H']: | ||
result = f.resample(freq).size() | ||
|
||
if isinstance(result.index, PeriodIndex) and freq == 'H': |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why is this any different?
427234f
to
986f7c2
Compare
986f7c2
to
766a77f
Compare
can you rebase / update |
766a77f
to
e5ab684
Compare
can you rebase / update |
@discort sorry been MIA on this one. can you rebase. |
expected.index = f.index._shallow_copy(freq=freq) | ||
assert_index_equal(result.index, expected.index) | ||
self.assertEqual(result.index.freq, expected.index.freq) | ||
assert_frame_equal(result, expected, check_dtype=False) | ||
assert_equal(result, expected, check_dtype=False) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we don't use assert_equal
can you rebase |
can you rebase and update? |
a98536a
to
f73a3da
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm minor comments. ping on green.
pandas/tests/test_resample.py
Outdated
expected.index = f.index._shallow_copy(freq=freq) | ||
assert_index_equal(result.index, expected.index) | ||
assert result.index.freq == expected.index.freq | ||
assert_frame_equal(result, expected, check_dtype=False) | ||
assert_type_equal(result, expected, check_dtype=False) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you can use assert_almost_equal
here (it dispatches) rather than rolling your own
pandas/core/resample.py
Outdated
@@ -549,6 +550,14 @@ def var(self, ddof=1, *args, **kwargs): | |||
nv.validate_resampler_func('var', args, kwargs) | |||
return self._downsample('var', ddof=ddof) | |||
|
|||
@Appender(GroupBy.size.__doc__) | |||
def size(self): | |||
# It 'seems' special and needs extra handling. GH14962 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
better comment here. say its a special case as higher level does returns a copy of 0-len objects.
doc/source/whatsnew/v0.21.0.txt
Outdated
@@ -112,6 +112,7 @@ Plotting | |||
|
|||
Groupby/Resample/Rolling | |||
^^^^^^^^^^^^^^^^^^^^^^^^ | |||
- Bug in ``resample().size()``. Inconsistent return type on resample of empty DataFrame (:issue:`14962`) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Bug in DataFrame.resample().size()
where an empty DataFrame did not return a Series.
thanks! |
git diff upstream/master | flake8 --diff