Inconsistent return type for downsampling on resample of empty DataFrame #14962

Closed
rwarren opened this issue Dec 22, 2016 · 7 comments · Fixed by #15093
Labels
Bug, Resample (resample method), Reshaping (Concat, Merge/Join, Stack/Unstack, Explode)

@rwarren

rwarren commented Dec 22, 2016

Code Sample, a copy-pastable example if possible

>>> import pandas as pd
>>> df1 = pd.DataFrame(dict(a=range(100)), index=pd.date_range('1/1/2000', periods=100, freq="M"))
>>> df2 = df1[df1.a < 0]
>>> df1.shape
(100, 1)
>>> df2.shape
(0, 1)
>>> df2.empty
True
>>> type(df1.resample("Q").size())
<class 'pandas.core.series.Series'>
>>> type(df2.resample("Q").size())
<class 'pandas.core.frame.DataFrame'>

Problem description

Code that resamples a filtered DataFrame should be able to rely on a consistent return type. In my case, the size() output was passed on to other code that calls Series.tolist(), which fails when the output is a DataFrame.

Expected Output

Expectation is that .size() should always return a Series.
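
Until the return type is consistent, a caller can normalize the result on its own side. The following is only a sketch of one possible workaround, not something taken from pandas: `resample_size` is a hypothetical helper name, and the example uses daily data with a weekly rule rather than the "M"/"Q" aliases above, since those aliases are deprecated in recent pandas releases.

```python
import pandas as pd


def resample_size(df, rule):
    """Return per-bin row counts as a Series, even for empty input.

    Hypothetical helper: on affected pandas versions, resampling an empty
    DataFrame and calling size() yields an empty DataFrame instead.
    """
    result = df.resample(rule).size()
    if isinstance(result, pd.DataFrame) and result.empty:
        # Rebuild an empty integer Series on the same (empty) index.
        result = pd.Series([], index=result.index, dtype="int64")
    return result


index = pd.date_range("2000-01-01", periods=100, freq="D")
df1 = pd.DataFrame({"a": range(100)}, index=index)
df2 = df1[df1.a < 0]  # empty, but keeps its DatetimeIndex

full_sizes = resample_size(df1, "W").tolist()
empty_sizes = resample_size(df2, "W").tolist()
```

With this wrapper, downstream code that calls `.tolist()` receives a list in both cases, regardless of which pandas version is installed.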

Output of pd.show_versions()

INSTALLED VERSIONS

commit: None
python: 3.5.2.final.0
python-bits: 64
OS: Linux
OS-release: 3.16.0-4-amd64
machine: x86_64
processor:
byteorder: little
LC_ALL: None
LANG: en_CA.utf8
LOCALE: en_CA.UTF-8

pandas: 0.19.1
nose: None
pip: 9.0.1
setuptools: 27.2.0
Cython: None
numpy: 1.11.2
scipy: 0.18.1
statsmodels: None
xarray: None
IPython: None
sphinx: None
patsy: None
dateutil: 2.6.0
pytz: 2016.7
blosc: None
bottleneck: None
tables: None
numexpr: 2.6.1
matplotlib: 1.5.3
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: None
httplib2: None
apiclient: None
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: None
boto: None
pandas_datareader: None

@jreback
Contributor

jreback commented Dec 22, 2016

we do this correctly for groupby, so will mark this as a bug.
pull requests welcome!

In [9]: df = pd.DataFrame({'A' : [1,1,2],'B':[1,2,3]})

In [10]: df
Out[10]: 
   A  B
0  1  1
1  1  2
2  2  3

In [11]: df.groupby('A').size()
Out[11]: 
A
1    2
2    1
dtype: int64

In [12]: df[df.A>3].groupby('A').size()
Out[12]: Series([], dtype: int64)

@jreback added the Bug, Difficulty Novice, Resample, and Reshaping labels Dec 22, 2016
@jreback added this to the Next Major Release milestone Dec 22, 2016

@souravsingh

@jreback I am interested in working on the issue. How do I start?

@kamal94
Contributor

kamal94 commented Dec 28, 2016

@souravsingh I followed this guide provided by the project website.

@ghost

ghost commented Dec 31, 2016

I'm a novice and have traced DatetimeIndexResampler as follows:
DatetimeIndexResampler -> Resampler -> _GroupBy -> (PandasObject, SelectionMixin)

I couldn't find a size method other than the one in the groupby class. Any help with this would be welcome.

@jorisvandenbossche
Member

Search for size in tseries/resample.py. You will see that it is added to the Resampler object using _downsample(method='size').
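
For readers who, like the previous commenter, go looking for a `def size`: a method attached this way never appears as a normal definition in the class body. The sketch below shows the general pattern on a toy class. `ToyResampler` and its `_downsample` are made up for illustration and are not the pandas implementation.

```python
class ToyResampler:
    """Toy stand-in for a resampler; not the pandas class."""

    def __init__(self, values):
        self.values = list(values)

    def _downsample(self, how):
        # Single dispatch point that the generated methods delegate to.
        if how == "size":
            return len(self.values)
        if how == "count":
            return sum(v is not None for v in self.values)
        raise ValueError("unsupported method: " + how)


# Attach one thin wrapper per method name. The default argument binds the
# current name at definition time, avoiding the late-binding closure pitfall.
for _name in ["count", "size"]:

    def _method(self, _how=_name):
        return self._downsample(_how)

    _method.__name__ = _name
    setattr(ToyResampler, _name, _method)


toy = ToyResampler([1, None, 3])
toy_size = toy.size()    # counts every entry
toy_count = toy.count()  # counts only non-None entries
```

Because the wrappers are created in a loop, searching for the string `size` in the module finds the list of names rather than a method definition.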

@discort
Contributor

discort commented Jan 6, 2017

@jreback @rwarren @jorisvandenbossche is this really a bug? There is a test, test_resample_empty_dataframe (pandas/tseries/tests/test_resample.py), which checks that resampling an empty DataFrame returns an empty DataFrame equal to the input.

def test_resample_empty_dataframe(self):
    # GH13212
    index = self.create_series().index[:0]
    f = DataFrame(index=index)

    for freq in ['M', 'D', 'H']:
        # count retains dimensions too
        methods = downsample_methods + ['count']
        for method in methods:
            result = getattr(f.resample(freq), method)()

            expected = f.copy()
            expected.index = f.index._shallow_copy(freq=freq)
            assert_index_equal(result.index, expected.index)
            self.assertEqual(result.index.freq, expected.index.freq)
            assert_frame_equal(result, expected, check_dtype=False)

@jreback
Contributor

jreback commented Jan 6, 2017

size is not in downsample_methods :<, so this should be

methods = downsample_methods + upsample_methods (nunique needs to be excluded and tested separately, as it is Series-only)
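
To see what such a test would need to cover, it can help to compare the result type of each method for empty and non-empty input. This is a rough standalone sketch, not the test that went into pandas: `result_types` is a hypothetical helper, and the data and the weekly rule are chosen only for illustration.

```python
import pandas as pd


def result_types(frame, rule, methods=("count", "size")):
    """Map each resample method name to the type name of its result."""
    types = {}
    for method in methods:
        result = getattr(frame.resample(rule), method)()
        types[method] = type(result).__name__
    return types


index = pd.date_range("2000-01-01", periods=10, freq="D")
full = pd.DataFrame({"a": range(10)}, index=index)
empty = full.iloc[:0]  # same columns and index type, zero rows

full_types = result_types(full, "W")
empty_types = result_types(empty, "W")

# With consistent return types the two dicts are equal; on versions
# affected by this issue, the "size" entry differs for the empty frame.
consistent = full_types == empty_types
```

Comparing type names instead of asserting a fixed type keeps the check usable for both `count` (which returns a DataFrame) and `size` (which returns a Series).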

discort added a commit to discort/pandas that referenced this issue Jan 9, 2017
discort added a commit to discort/pandas that referenced this issue Jan 11, 2017
@jreback jreback modified the milestones: 0.20.0, Next Major Release Jan 14, 2017
discort added a commit to discort/pandas that referenced this issue Jan 23, 2017
discort added a commit to discort/pandas that referenced this issue Jan 25, 2017
discort added a commit to discort/pandas that referenced this issue Feb 13, 2017
discort added a commit to discort/pandas that referenced this issue Feb 16, 2017
@jreback jreback modified the milestones: 0.20.0, Next Major Release Mar 23, 2017
discort added a commit to discort/pandas that referenced this issue Apr 5, 2017
discort added a commit to discort/pandas that referenced this issue Jun 12, 2017
@jreback jreback modified the milestones: 0.21.0, Next Major Release Jun 13, 2017
discort added a commit to discort/pandas that referenced this issue Jun 13, 2017

6 participants