Pandas 0.18.1 df.groupby().count() throws "ValueError: Buffer has wrong number of dimensions" when one of the counted columns has dtype datetime64 #13393

jfries · 2016-06-07T22:11:53Z

I've confirmed the error does not occur in Pandas 0.16 on a similar machine.

Code Sample, a copy-pastable example if possible

import pandas as pd
df = pd.DataFrame({'x': ['a', 'a', 'b'],
                   'y': [pd.Timestamp('2016-05-07 20:09:25+00:00'), pd.Timestamp('2016-05-07 20:09:29+00:00'), pd.Timestamp('2016-05-07 20:09:29+00:00')]})
df.groupby('x').count()

Observed output

ValueError                                Traceback (most recent call last)
<ipython-input-5-3119045de5b1> in <module>()
      2 df = pd.DataFrame({'x': ['a', 'a', 'b'],
      3                    'y': [pd.Timestamp('2016-05-07 20:09:25+00:00'), pd.Timestamp('2016-05-07 20:09:29+00:00'), pd.Timestamp('2016-05-07 20:09:29+00:00')]})
----> 4 print df.groupby('x').count()

/usr/local/lib/python2.7/dist-packages/pandas/core/groupby.pyc in count(self)
   3754         blk = map(make_block, map(counter, val), loc)
   3755 
-> 3756         return self._wrap_agged_blocks(data.items, list(blk))
   3757 
   3758 

pandas/lib.pyx in pandas.lib.count_level_2d (pandas/lib.c:23068)()

ValueError: Buffer has wrong number of dimensions (expected 2, got 1)

Expected Output

x  y
a  2
b  1

output of `pd.show_versions()`

INSTALLED VERSIONS
------------------
commit: None
python: 2.7.6.final.0
python-bits: 64
OS: Linux
OS-release: 3.13.0-32-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8

pandas: 0.18.1
nose: 1.3.1
pip: 8.1.2
setuptools: 21.2.1
Cython: None
numpy: 1.11.0
scipy: 0.16.0
statsmodels: 0.5.0
xarray: None
IPython: 4.2.0
sphinx: None
patsy: 0.2.1
dateutil: 2.5.3
pytz: 2016.4
blosc: None
bottleneck: 0.8.0
tables: 3.1.1
numexpr: 2.2.2
matplotlib: 1.5.1
openpyxl: 1.7.0
xlrd: 0.9.2
xlwt: 0.7.5
xlsxwriter: 0.5.8
lxml: None
bs4: 4.2.1
html5lib: 0.999
httplib2: None
apiclient: None
sqlalchemy: 0.9.7
pymysql: None
psycopg2: 2.5.3 (dt dec pq3 ext)
jinja2: 2.8
boto: 2.36.0
pandas_datareader: None


jreback · 2016-06-07T22:15:18Z

Yeah this is an issue, I thought it was a duplicate, but can't seem to find an open one.

It's a bit of work to fix, but pull requests are welcome.

As a workaround, you can select the column before counting:

In [4]: df.groupby('x').y.count()
Out[4]: 
x
a    2
b    1
Name: y, dtype: int64
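For readers who need counts for several columns at once, the same workaround can be applied column by column and the results stitched back together. This is only a sketch under stated assumptions: `count_per_column` is a hypothetical helper name, not part of pandas, and it assumes grouping on a single key column as in the report.

```python
import pandas as pd

# Same frame as in the report: a grouping column and a tz-aware timestamp column.
df = pd.DataFrame({
    'x': ['a', 'a', 'b'],
    'y': [pd.Timestamp('2016-05-07 20:09:25+00:00'),
          pd.Timestamp('2016-05-07 20:09:29+00:00'),
          pd.Timestamp('2016-05-07 20:09:29+00:00')],
})

# The workaround from the comment above: select the column, then count.
counts = df.groupby('x')['y'].count()


def count_per_column(frame, key):
    """Hypothetical helper: count non-null values per group, one column at a time."""
    value_columns = [col for col in frame.columns if col != key]
    # Each entry is a Series of counts; concat along axis=1 rebuilds a DataFrame
    # whose column names are the dict keys.
    return pd.concat(
        {col: frame.groupby(key)[col].count() for col in value_columns},
        axis=1,
    )


per_column = count_per_column(df, 'x')
print(per_column)
```

Counting one column at a time sidesteps the whole-frame `count()` call that raises in the report, at the cost of one groupby pass per column.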

jreback · 2016-06-07T22:16:35Z

as an FYI, pls paste code as markdown. I edited the above.

jfries · 2016-06-07T22:28:48Z

Sorry about the lack of markdown; I will use it going forward.
Agreed that your workaround is correct, but I'm pretty sure that a lot of existing pandas code uses the idiom in my example, as it's common in many tutorials and example code.

jreback · 2016-06-07T22:38:35Z

@jfries sure, and that's why it's marked as a bug.

alan-wong · 2016-06-29T09:03:27Z

This seems related to why the following no longer works in 0.18.1:

df = pd.DataFrame({'a': list('abssbab')})
df.groupby('a').count()

This also used to work.

jorisvandenbossche · 2016-06-29T12:38:33Z

@alan-wong I don't think that is related. In your example, there are no columns left to count once 'a' is used as the grouping key, so I think it is correct that it does not work. But you are right that the behaviour changed: previously it returned an empty dataframe, and now it raises an error.
The empty dataframe seems more correct. @alan-wong, do you want to open a new issue about that?
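If the goal is the frequency of each value in a single column, `size()` counts rows per group and does not depend on any other column being present. The snippet below is a minimal sketch, not something proposed in this thread; it reuses the one-column frame from the comment above.

```python
import pandas as pd

# One-column frame from the comment above: nothing is left to count after grouping on 'a'.
df = pd.DataFrame({'a': list('abssbab')})

# size() counts the rows in each group, so it needs no remaining columns.
freq = df.groupby('a').size()
print(freq)

# value_counts() gives the same frequencies, sorted by count rather than by key.
freq_sorted = df['a'].value_counts()
```

Note that `size()` includes rows with missing values in other columns, while `count()` reports non-null values per column, so the two are not interchangeable in general.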

alan-wong · 2016-06-29T12:50:45Z

@jorisvandenbossche I get an empty dataframe on version 0.18.1. I'm not sure what version I was running when I posted this answer: http://stackoverflow.com/questions/22391433/count-the-frequency-that-a-value-occurs-in-a-dataframe-column/22391554#22391554 but it used to work. It does raise an error if you pass the as_index=False argument to the groupby call.

jorisvandenbossche · 2016-06-29T13:16:57Z

@alan-wong Ah, indeed, on 0.18.1 it does work correctly, but on master it no longer does. So that indeed looks like a bug. Do you want to open a new issue?

alan-wong · 2016-06-29T13:51:07Z

@jorisvandenbossche But is it correct behaviour that it's empty? Are you saying that in older versions it shouldn't have worked? I'll post an issue.

alan-wong · 2016-06-29T13:58:15Z

@jorisvandenbossche posted issue: #13530

jreback added Bug Groupby Timezones Timezone data dtype Difficulty Intermediate labels Jun 7, 2016

jreback added this to the Next Major Release milestone Jun 7, 2016

alan-wong mentioned this issue Jun 29, 2016

groupby on single col df results in ValueError on Master, works on 0.18.1 #13530

Closed

jreback mentioned this issue Mar 8, 2017

Pandas crash with : ValueError: Buffer has wrong number of dimensions (expected 2, got 1) #15612

Closed

jreback modified the milestones: Next Minor Release, Next Major Release Apr 3, 2017

watercrossing mentioned this issue Nov 8, 2017

Fix groupby().count() for datetime columns #18167

Merged


jreback modified the milestones: Interesting Issues, 0.21.1 Nov 8, 2017

jreback closed this as completed in #18167 Nov 8, 2017
