Skip to content

BUG: Inconsistent return type for downsampling on resample of empty DataFrame #15093

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 6 commits into from
Jun 13, 2017

Conversation

discort
Copy link
Contributor

@discort discort commented Jan 9, 2017

@jreback jreback changed the title BUG #14962 BUG: Inconsistent return type for downsampling on resample of empty DataFrame Jan 9, 2017
index=pd.date_range('1/1/2000', periods=100, freq="M"))
df2 = df1[df1.a < 0]
result = df2.resample("Q").size()
assertIsInstance(result, pd.Series)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

construct the anticipated series and use assert_series_equal instead

@jreback
Copy link
Contributor

jreback commented Jan 9, 2017

update this test as well (IOW add size)

def test_resample_empty_dataframe(self):
        # GH13212
        index = self.create_series().index[:0]
        f = DataFrame(index=index)

        for freq in ['M', 'D', 'H']:
            # count retains dimensions too
            methods = downsample_methods + ['count']
            for method in methods:
                result = getattr(f.resample(freq), method)()

                expected = f.copy()
                expected.index = f.index._shallow_copy(freq=freq)
                assert_index_equal(result.index, expected.index)
                self.assertEqual(result.index.freq, expected.index.freq)
                assert_frame_equal(result, expected, check_dtype=False)

@@ -686,6 +686,10 @@ def _downsample(self, how, **kwargs):
if not len(ax):
# reset to the new freq
obj = obj.copy()
if how == "size" and isinstance(obj, pd.DataFrame):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is not a good way of fixing. you will have to trace thru the .size implementation.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jreback the size method of resampler of the empty dataframe must always return the empty series for each frequency, doesn't it?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the easiest I think is to pass the result thru .wrap_result(..)

@jreback jreback added Bug Resample resample method Reshaping Concat, Merge/Join, Stack/Unstack, Explode labels Jan 9, 2017
Copy link
Contributor

@jreback jreback left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

needs a whatsnew note, but more importantly needs work on the impl.

@codecov-io
Copy link

codecov-io commented Jan 10, 2017

Codecov Report

Merging #15093 into master will decrease coverage by 0.02%.
The diff coverage is 100%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master   #15093      +/-   ##
==========================================
- Coverage   90.95%   90.92%   -0.03%     
==========================================
  Files         161      161              
  Lines       49276    49282       +6     
==========================================
- Hits        44817    44811       -6     
- Misses       4459     4471      +12
Flag Coverage Δ
#multiple 88.68% <100%> (-0.03%) ⬇️
#single 40.18% <42.85%> (-0.01%) ⬇️
Impacted Files Coverage Δ
pandas/core/resample.py 96.13% <100%> (+0.03%) ⬆️
pandas/plotting/_converter.py 63.23% <0%> (-1.82%) ⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 344cec7...44f52be. Read the comment docs.

@@ -1563,3 +1563,4 @@ Bug Fixes
- ``PeriodIndex`` can now accept ``list`` and ``array`` which contains ``pd.NaT`` (:issue:`13430`)
- Bug in ``df.groupby`` where ``.median()`` returns arbitrary values if grouped dataframe contains empty bins (:issue:`13629`)
- Bug in ``Index.copy()`` where ``name`` parameter was ignored (:issue:`14302`)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

moves to 0.20.0

give a user friendly function here, _downsample is internal.

@@ -756,6 +756,15 @@ def _wrap_result(self, result):
# convert if needed
if self.kind == 'period' and not isinstance(result.index, PeriodIndex):
result.index = result.index.to_period(self.freq)

# Make consistent type of result. GH14962
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is all this necessary? should be simpler

@@ -730,7 +730,7 @@ def test_resample_empty_series(self):
assert_series_equal(result, expected, check_dtype=False,
check_names=False)
# this assert will break when fixed
self.assertTrue(result.name is None)
# self.assertTrue(result.name is None)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what are you doing here?

for method in methods:
result = getattr(f.resample(freq), method)()

expected = f.copy()
expected = pd.Series([])
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is not correct, it doesn't preserve things like name.

Copy link
Contributor Author

@discort discort Jan 16, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jreback. Could you please explain the name you are talking about? The name of the Series? How could exist it if the Series created after DataFrame resampling?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

right, you are resampling an empty dataframe here, so the result must be an empty dataframe with the new freq.

its not a Series at all.

@@ -806,6 +806,16 @@ def test_resample_loffset_arg_type(self):
assert_frame_equal(result_agg, expected)
assert_frame_equal(result_how, expected)

def test_resample_empty_dataframe_with_size(self):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is repeating the above tests

@discort
Copy link
Contributor Author

discort commented Jan 23, 2017

@jreback could you please take a look, that's solution is okay or not?

@@ -1562,4 +1562,4 @@ Bug Fixes
- Bugs in ``stack``, ``get_dummies``, ``make_axis_dummies`` which don't preserve categorical dtypes in (multi)indexes (:issue:`13854`)
- ``PeriodIndex`` can now accept ``list`` and ``array`` which contains ``pd.NaT`` (:issue:`13430`)
- Bug in ``df.groupby`` where ``.median()`` returns arbitrary values if grouped dataframe contains empty bins (:issue:`13629`)
- Bug in ``Index.copy()`` where ``name`` parameter was ignored (:issue:`14302`)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

revert this file



- Bug in ``DataFrame.boxplot`` where ``fontsize`` was not applied to the tick labels on both axes (:issue:`15108`)
- Bug in ``resample().size()``. Inconsistent return type on resample of empty DataFrame (:issue:`14962`)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just move this in the bug fix section up higher, that way you won't have conflicts

@@ -567,8 +579,13 @@ def f(self, _method=method, *args, **kwargs):
# groupby & aggregate methods
for method in ['count', 'size']:

def f(self, _method=method):
return self._downsample(_method)
if method == "size":
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think any of this is necessary, _wrap_results can handle all of this


# Make consistent type of result. GH14962
if not len(self.ax) and isinstance(result, ABCDataFrame) and \
isinstance(result.index, (DatetimeIndex, TimedeltaIndex)):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is not necessary

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you can do this logic in _wrap_results, but I again don't think this is ncessary, simply define the size method as it 'seems' special and needs extra handling.

for freq in ['M', 'D', 'H']:
result = f.resample(freq).size()

if isinstance(result.index, PeriodIndex) and freq == 'H':
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why is this any different?

@jreback
Copy link
Contributor

jreback commented Feb 16, 2017

can you rebase / update

@jreback
Copy link
Contributor

jreback commented Apr 3, 2017

can you rebase / update

@discort
Copy link
Contributor Author

discort commented Apr 5, 2017

@jreback

@jreback
Copy link
Contributor

jreback commented Apr 20, 2017

@discort sorry been MIA on this one. can you rebase.

expected.index = f.index._shallow_copy(freq=freq)
assert_index_equal(result.index, expected.index)
self.assertEqual(result.index.freq, expected.index.freq)
assert_frame_equal(result, expected, check_dtype=False)
assert_equal(result, expected, check_dtype=False)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we don't use assert_equal

@jreback
Copy link
Contributor

jreback commented May 13, 2017

can you rebase

@jreback
Copy link
Contributor

jreback commented Jun 10, 2017

can you rebase and update?

@discort discort force-pushed the fix_14962_bug branch 2 times, most recently from a98536a to f73a3da Compare June 12, 2017 20:08
@discort
Copy link
Contributor Author

discort commented Jun 13, 2017

@jreback

Copy link
Contributor

@jreback jreback left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm minor comments. ping on green.

expected.index = f.index._shallow_copy(freq=freq)
assert_index_equal(result.index, expected.index)
assert result.index.freq == expected.index.freq
assert_frame_equal(result, expected, check_dtype=False)
assert_type_equal(result, expected, check_dtype=False)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you can use assert_almost_equal here (it dispatches) rather than rolling your own

@@ -549,6 +550,14 @@ def var(self, ddof=1, *args, **kwargs):
nv.validate_resampler_func('var', args, kwargs)
return self._downsample('var', ddof=ddof)

@Appender(GroupBy.size.__doc__)
def size(self):
# It 'seems' special and needs extra handling. GH14962
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

better comment here. say its a special case as higher level does returns a copy of 0-len objects.

@@ -112,6 +112,7 @@ Plotting

Groupby/Resample/Rolling
^^^^^^^^^^^^^^^^^^^^^^^^
- Bug in ``resample().size()``. Inconsistent return type on resample of empty DataFrame (:issue:`14962`)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bug in DataFrame.resample().size() where an empty DataFrame did not return a Series.

@jreback jreback added this to the 0.21.0 milestone Jun 13, 2017
@discort
Copy link
Contributor Author

discort commented Jun 13, 2017

@jreback

@jreback jreback merged commit d02ef6f into pandas-dev:master Jun 13, 2017
@jreback
Copy link
Contributor

jreback commented Jun 13, 2017

thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Bug Resample resample method Reshaping Concat, Merge/Join, Stack/Unstack, Explode
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Inconsistent return type for downsampling on resample of empty DataFrame
3 participants