Pandas 0.18.1 df.groupby().count() throws "ValueError: Buffer has wrong number of dimensions" when one of the counted columns has dtype datetime64 #13393

Closed
jfries opened this issue Jun 7, 2016 · 10 comments · Fixed by #18167
Labels
Bug Groupby Timezones Timezone data dtype


jfries commented Jun 7, 2016

I've confirmed the error does not occur in Pandas 0.16 on a similar machine.

Code Sample, a copy-pastable example if possible

import pandas as pd

# 'y' holds timezone-aware timestamps (offset +00:00)
df = pd.DataFrame({'x': ['a', 'a', 'b'],
                   'y': [pd.Timestamp('2016-05-07 20:09:25+00:00'),
                         pd.Timestamp('2016-05-07 20:09:29+00:00'),
                         pd.Timestamp('2016-05-07 20:09:29+00:00')]})
df.groupby('x').count()  # raises ValueError on pandas 0.18.1

Observed output


ValueError                                Traceback (most recent call last)
<ipython-input-5-3119045de5b1> in <module>()
      2 df = pd.DataFrame({'x': ['a', 'a', 'b'],
      3                    'y': [pd.Timestamp('2016-05-07 20:09:25+00:00'), pd.Timestamp('2016-05-07 20:09:29+00:00'), pd.Timestamp('2016-05-07 20:09:29+00:00')]})
----> 4 print df.groupby('x').count()

/usr/local/lib/python2.7/dist-packages/pandas/core/groupby.pyc in count(self)
   3754         blk = map(make_block, map(counter, val), loc)
   3755 
-> 3756         return self._wrap_agged_blocks(data.items, list(blk))
   3757 
   3758 

pandas/lib.pyx in pandas.lib.count_level_2d (pandas/lib.c:23068)()

ValueError: Buffer has wrong number of dimensions (expected 2, got 1)

Expected Output

x  y
a  2
b  1
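
The expected result above can also be written as an explicit check. This is a minimal sketch, assuming a pandas release that already includes the fix referenced in this thread (#18167); it is expected to fail with the ValueError shown above on 0.18.1. The `expected` frame is constructed here for illustration and is not part of the original report.

```python
import pandas as pd

# Same frame as in the report: 'y' holds timezone-aware timestamps.
df = pd.DataFrame({
    "x": ["a", "a", "b"],
    "y": [
        pd.Timestamp("2016-05-07 20:09:25+00:00"),
        pd.Timestamp("2016-05-07 20:09:29+00:00"),
        pd.Timestamp("2016-05-07 20:09:29+00:00"),
    ],
})

result = df.groupby("x").count()

# Hypothetical expected frame, built by hand from the "Expected Output" above.
expected = pd.DataFrame({"y": [2, 1]}, index=pd.Index(["a", "b"], name="x"))

# Raises AssertionError if the counts, dtypes, or index differ.
pd.testing.assert_frame_equal(result, expected)
```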

output of pd.show_versions()

INSTALLED VERSIONS
------------------
commit: None
python: 2.7.6.final.0
python-bits: 64
OS: Linux
OS-release: 3.13.0-32-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8

pandas: 0.18.1
nose: 1.3.1
pip: 8.1.2
setuptools: 21.2.1
Cython: None
numpy: 1.11.0
scipy: 0.16.0
statsmodels: 0.5.0
xarray: None
IPython: 4.2.0
sphinx: None
patsy: 0.2.1
dateutil: 2.5.3
pytz: 2016.4
blosc: None
bottleneck: 0.8.0
tables: 3.1.1
numexpr: 2.2.2
matplotlib: 1.5.1
openpyxl: 1.7.0
xlrd: 0.9.2
xlwt: 0.7.5
xlsxwriter: 0.5.8
lxml: None
bs4: 4.2.1
html5lib: 0.999
httplib2: None
apiclient: None
sqlalchemy: 0.9.7
pymysql: None
psycopg2: 2.5.3 (dt dec pq3 ext)
jinja2: 2.8
boto: 2.36.0
pandas_datareader: None

jreback (Contributor) commented Jun 7, 2016

Yeah, this is an issue. I thought it was a duplicate, but I can't seem to find an open one.

It's a bit of work to fix, but pull requests are welcome.

As a workaround, you can do this:

In [4]: df.groupby('x').y.count()
Out[4]: 
x
a    2
b    1
Name: y, dtype: int64
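
If a frame has several value columns, the same per-column workaround can be applied to each one and the results reassembled. This is a sketch only, written against a current pandas release and not verified on 0.18.1; the names `value_cols` and `counts` are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "x": ["a", "a", "b"],
    "y": [
        pd.Timestamp("2016-05-07 20:09:25+00:00"),
        pd.Timestamp("2016-05-07 20:09:29+00:00"),
        pd.Timestamp("2016-05-07 20:09:29+00:00"),
    ],
})

# Workaround from this thread: count a single selected column.
per_column = df.groupby("x")["y"].count()

# Assumed extension: repeat the per-column count for every non-key column
# and concatenate the resulting Series side by side.
value_cols = [c for c in df.columns if c != "x"]
counts = pd.concat(
    {c: df.groupby("x")[c].count() for c in value_cols},
    axis=1,
)

print(counts)
```

Selecting the column first means each count is taken on a single column at a time, which is the form that already worked in the session shown above.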

jreback added this to the Next Major Release milestone Jun 7, 2016
jreback (Contributor) commented Jun 7, 2016

As an FYI, please paste code as Markdown. I edited the above.

jfries (Author) commented Jun 7, 2016

Sorry about the lack of Markdown; I will use it going forward.
Agreed that your workaround is correct, but I'm pretty sure that a lot of existing pandas code uses the idiom in my example, as it's common in a lot of tutorials and example code.

jreback (Contributor) commented Jun 7, 2016

@jfries Sure, and that's why it's marked as a bug.


alan-wong commented Jun 29, 2016

This seems related to why the following no longer works in 0.18.1:

df = pd.DataFrame({'a': list('abssbab')})
df.groupby('a').count()

This also used to work.

jorisvandenbossche (Member) commented

@alan-wong I don't think that is related. In your example, there are no columns left to count, so I think it is correct that it does not work. But you are right that the behaviour did change: previously it returned an empty dataframe, and now it raises an error.
The empty dataframe seems more correct. @alan-wong, do you want to open a new issue about that?
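
For the single-column case discussed here, where grouping by 'a' leaves no other columns to count, counting group sizes is one way to get value frequencies. This is a minimal sketch assuming a current pandas release; `size()` and `value_counts()` are standard pandas methods, but this is an illustration and not the fix for this issue.

```python
import pandas as pd

df = pd.DataFrame({"a": list("abssbab")})

# size() counts rows per group, so it does not depend on any
# remaining columns being available to count.
group_sizes = df.groupby("a").size()

# value_counts() on the column gives the same frequencies,
# ordered by count instead of by key.
frequencies = df["a"].value_counts()

print(group_sizes)
```

Unlike `count()`, `size()` includes rows with missing values, which makes no difference here because the column being grouped is the only column.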

alan-wong commented

@jorisvandenbossche I get an empty dataframe on version 0.18.1. I'm not sure what version I was running when I posted this answer: http://stackoverflow.com/questions/22391433/count-the-frequency-that-a-value-occurs-in-a-dataframe-column/22391554#22391554 but it used to work. It does raise an error if you pass the as_index=False argument to the groupby call.

jorisvandenbossche (Member) commented

@alan-wong Ah, indeed, on 0.18.1 it does work correctly, but on master it no longer does. So that does look like a bug. Do you want to open a new issue?

alan-wong commented

@jorisvandenbossche But is it correct behaviour that it's empty? Are you saying that in older versions it shouldn't have worked? I'll post an issue.

alan-wong commented

@jorisvandenbossche Posted issue: #13530

jreback modified the milestones: Next Minor Release, Next Major Release Apr 3, 2017
jreback modified the milestones: Interesting Issues, 0.21.1 Nov 8, 2017