Inconsistent return type for downsampling on resample of empty DataFrame #14962

rwarren · 2016-12-22T17:55:07Z

Code Sample, a copy-pastable example if possible

>>> import pandas as pd
>>> df1 = pd.DataFrame(dict(a=range(100)), index=pd.date_range('1/1/2000', periods=100, freq="M"))
>>> df2 = df1[df1.a < 0]
>>> df1.shape
(100, 1)
>>> df2.shape
(0, 1)
>>> df2.empty
True
>>> type(df1.resample("Q").size())
<class 'pandas.core.series.Series'>
>>> type(df2.resample("Q").size())
<class 'pandas.core.frame.DataFrame'>

Problem description

Code that resamples a filtered DataFrame should be able to rely on a consistent return type. In my case, the size() output was passed on to other code that calls Series.tolist(), which fails when the output is a DataFrame.

Expected Output

Expectation is that .size() should always return a Series.
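Until a fixed pandas version is in use, calling code can normalise the result itself. The following is only a sketch of such a workaround: the helper name `size_as_series` is hypothetical (it is not part of pandas), and daily data with weekly bins is used instead of the "M"/"Q" aliases above purely to keep the snippet independent of alias changes between pandas versions.

```python
import pandas as pd


def size_as_series(resampler):
    """Return resampler.size() as a Series (hypothetical helper, not a pandas API)."""
    result = resampler.size()
    if isinstance(result, pd.DataFrame) and result.empty:
        # Affected versions hand back an empty DataFrame here; rebuild an
        # empty integer Series on the same resampled index.
        result = pd.Series(index=result.index, dtype="int64")
    return result


index = pd.date_range("2000-01-01", periods=100, freq="D")
df1 = pd.DataFrame({"a": range(100)}, index=index)
df2 = df1[df1.a < 0]  # a filter that happens to match nothing

full_counts = size_as_series(df1.resample("W"))
empty_counts = size_as_series(df2.resample("W"))

# Both results now support Series-only calls such as tolist().
print(type(full_counts).__name__, type(empty_counts).__name__)
print(empty_counts.tolist())
```

On versions where size() already returns a Series, the helper passes the result through unchanged.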

Output of `pd.show_versions()`

INSTALLED VERSIONS

commit: None
python: 3.5.2.final.0
python-bits: 64
OS: Linux
OS-release: 3.16.0-4-amd64
machine: x86_64
processor:
byteorder: little
LC_ALL: None
LANG: en_CA.utf8
LOCALE: en_CA.UTF-8

pandas: 0.19.1
nose: None
pip: 9.0.1
setuptools: 27.2.0
Cython: None
numpy: 1.11.2
scipy: 0.18.1
statsmodels: None
xarray: None
IPython: None
sphinx: None
patsy: None
dateutil: 2.6.0
pytz: 2016.7
blosc: None
bottleneck: None
tables: None
numexpr: 2.6.1
matplotlib: 1.5.3
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: None
httplib2: None
apiclient: None
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: None
boto: None
pandas_datareader: None


jreback · 2016-12-22T23:36:53Z

we do this correctly for groupby, so will mark this as a bug.
pull requests welcome!

In [9]: df = pd.DataFrame({'A' : [1,1,2],'B':[1,2,3]})

In [10]: df
Out[10]: 
   A  B
0  1  1
1  1  2
2  2  3

In [11]: df.groupby('A').size()
Out[11]: 
A
1    2
2    1
dtype: int64

In [12]: df[df.A>3].groupby('A').size()
Out[12]: Series([], dtype: int64)

souravsingh · 2016-12-25T17:04:20Z

@jreback I am interested in working on the issue. How do I start?

kamal94 · 2016-12-28T02:15:07Z

@souravsingh I followed this guide provided by the project website.

ghost · 2016-12-31T15:21:26Z

I'm a novice and have traced the DatetimeIndexResampler class hierarchy as follows:
DatetimeIndexResampler -> Resampler -> _GroupBy -> (PandasObject, SelectionMixin)

I couldn't find the size method, apart from the one defined in the groupby class. Any help with this would be welcome.

jorisvandenbossche · 2017-01-02T09:01:07Z

Search for size in the tseries/resample.py. You will see that it is added to the Resampler object using _downsample(method='size')
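A search for "def size" turns up nothing because the method is generated rather than written out in the class body. The toy model below illustrates that general pattern only; `ToyResampler`, `_make_method`, and the dict-based `_downsample` are made up for illustration and are not the pandas implementation.

```python
class ToyResampler:
    """Toy stand-in for a resampler; NOT the pandas class."""

    def __init__(self, groups):
        # Mapping of bin label -> list of values that fell into that bin.
        self.groups = groups

    def _downsample(self, how):
        # Single dispatch point: every generated method funnels through here.
        funcs = {"size": len, "sum": sum, "max": max}
        return {label: funcs[how](values) for label, values in self.groups.items()}


def _make_method(name):
    # Build a method that forwards to _downsample with a fixed name.
    def method(self):
        return self._downsample(name)

    method.__name__ = name
    return method


# The methods are attached in a loop, so no "def size" appears in the class.
for _name in ["size", "sum", "max"]:
    setattr(ToyResampler, _name, _make_method(_name))

r = ToyResampler({"2000Q1": [1, 2, 3], "2000Q2": [4]})
sizes = r.size()
print(sizes)
```

In this pattern, any fix to the return type for empty input belongs in the shared dispatch function, since that is the one place all generated methods pass through.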

discort · 2017-01-06T12:12:10Z

@jreback @rwarren @jorisvandenbossche is this really a bug? There is a test, test_resample_empty_dataframe (pandas/tseries/tests/test_resample.py), which checks that resampling an empty DataFrame returns an equal empty DataFrame.

def test_resample_empty_dataframe(self):
        # GH13212
        index = self.create_series().index[:0]
        f = DataFrame(index=index)

        for freq in ['M', 'D', 'H']:
            # count retains dimensions too
            methods = downsample_methods + ['count']
            for method in methods:
                result = getattr(f.resample(freq), method)()

                expected = f.copy()
                expected.index = f.index._shallow_copy(freq=freq)
                assert_index_equal(result.index, expected.index)
                self.assertEqual(result.index.freq, expected.index.freq)
                assert_frame_equal(result, expected, check_dtype=False)

jreback · 2017-01-06T13:42:19Z

size is not in the downsample_methods :<, so this should be

methods = downsample_methods + upsample_methods (nunique needs to be excluded and tested separately, as it is Series-only)
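A quick way to see why size needs its own expectation in that test is to look at the return type of each method on an empty frame. This is a rough standalone check, not the test from the pandas suite: the method list is an illustrative subset, and the comment about size describes the behaviour requested in this issue, which assumes a pandas version that includes the fix.

```python
import pandas as pd

# Empty DatetimeIndex that still carries a daily frequency.
index = pd.date_range("2000-01-01", periods=3, freq="D")[:0]
empty = pd.DataFrame(index=index)

# Illustrative subset of the downsample/upsample method names.
methods = ["sum", "mean", "max", "count", "size"]

result_types = {}
for method in methods:
    result = getattr(empty.resample("D"), method)()
    result_types[method] = type(result).__name__

# With the fix applied, size is the only method here expected to give a
# Series; the others keep the DataFrame shape of their input.
for method, type_name in result_types.items():
    print(method, type_name)
```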

jreback added the Bug, Difficulty Novice, Resample, and Reshaping labels Dec 22, 2016

jreback added this to the Next Major Release milestone Dec 22, 2016

discort added a commit to discort/pandas that referenced this issue Jan 9, 2017

BUG pandas-dev#14962

46e9282

discort mentioned this issue Jan 9, 2017

BUG: Inconsistent return type for downsampling on resample of empty DataFrame #15093

Merged

4 tasks

discort added a commit to discort/pandas that referenced this issue Jan 11, 2017

BUG pandas-dev#14962

b717b4e

jreback modified the milestones: 0.20.0, Next Major Release Jan 14, 2017

discort added a commit to discort/pandas that referenced this issue Jan 23, 2017

BUG pandas-dev#14962

c1d7dd6

discort added a commit to discort/pandas that referenced this issue Jan 25, 2017

BUG pandas-dev#14962

2fc473c

discort added a commit to discort/pandas that referenced this issue Feb 13, 2017

BUG pandas-dev#14962

d14d821

discort added a commit to discort/pandas that referenced this issue Feb 16, 2017

BUG pandas-dev#14962

95306d5

jreback modified the milestones: 0.20.0, Next Major Release Mar 23, 2017

discort added a commit to discort/pandas that referenced this issue Apr 5, 2017

BUG pandas-dev#14962

1cb1d55

discort added a commit to discort/pandas that referenced this issue Jun 12, 2017

BUG pandas-dev#14962

6002d5a

jreback modified the milestones: 0.21.0, Next Major Release Jun 13, 2017

discort added a commit to discort/pandas that referenced this issue Jun 13, 2017

BUG pandas-dev#14962

95d2008

jreback closed this as completed in #15093 Jun 13, 2017
