Pandas crashes with ValueError: Buffer has wrong number of dimensions (expected 2, got 1) #15612

Closed
24hours opened this issue Mar 8, 2017 · 2 comments
Labels: Bug, Duplicate Report, Groupby, Timezones

24hours commented Mar 8, 2017

Code Sample, a copy-pastable example if possible

import pandas as pd
import datetime
import numpy as np

base = datetime.datetime.today()
date_list = [base - datetime.timedelta(days=x) for x in range(0, 365)]
score_list = list(np.random.randint(low=1, high=1000, size=365))
df = pd.DataFrame()

df['datetime'] = date_list
df['datetime'] = pd.to_datetime(df['datetime'])
df['datetime'] = df['datetime'].astype('datetime64[ns, Asia/Kuala_Lumpur]')
df['score'] = score_list

print(df.groupby([ lambda x: df.loc[x]['datetime'].year ]).count())

Problem description

Pandas crashes with no obvious cause.
It works as expected when the following line is removed:

df['datetime'] = df['datetime'].astype('datetime64[ns, Asia/Kuala_Lumpur]')

Expected Output

Aggregated row counts, grouped by year.
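
For reference, a minimal sketch of one way to get those per-year counts while sidestepping the tz-aware column. It assumes the local wall-clock year is what matters, so the timezone can be dropped before grouping. The fixed start date (instead of `datetime.today()`) and the names `naive`, `year`, and `per_year` are illustrative only, and this has not been verified against pandas 0.19.2.

```python
import numpy as np
import pandas as pd

# Fixed start date so the result is reproducible (the report uses today()).
df = pd.DataFrame({
    "datetime": pd.date_range("2016-01-01", periods=365, tz="Asia/Kuala_Lumpur"),
    "score": np.random.randint(low=1, high=1000, size=365),
})

# Assumption: only the local calendar year matters, so the timezone can be
# dropped. tz_localize(None) keeps the local wall-clock time unchanged.
naive = df.copy()
naive["datetime"] = naive["datetime"].dt.tz_localize(None)

# Group by a precomputed key instead of a per-row lambda over df.loc[x].
year = naive["datetime"].dt.year.rename("year")
per_year = naive.groupby(year).count()
print(per_year)
```

Grouping by a precomputed `.dt.year` key also avoids the row-by-row `df.loc[x]` lookup in the original lambda.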

Output of pd.show_versions()

INSTALLED VERSIONS
------------------
commit: None
python: 3.5.2.final.0
python-bits: 64
OS: Linux
OS-release: 4.4.0-64-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8

pandas: 0.19.2
nose: None
pip: 9.0.1
setuptools: 28.8.0
Cython: 0.25.1
numpy: 1.12.0
scipy: 0.18.1
statsmodels: None
xarray: None
IPython: None
sphinx: None
patsy: None
dateutil: 2.6.0
pytz: 2016.10
blosc: None
bottleneck: None
tables: None
numexpr: None
matplotlib: 1.5.3
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: 0.7.3
lxml: None
bs4: 4.4.1
html5lib: 0.999
httplib2: 0.9.1
apiclient: None
sqlalchemy: 1.1.4
pymysql: 0.7.9.None
psycopg2: 2.6.2 (dt dec pq3 ext lo64)
jinja2: 2.8
boto: None
pandas_datareader: None

@jorisvandenbossche
Member

Simplified example:

In [68]: df = pd.DataFrame({'datetime': pd.date_range('2012-03-01', periods=365, tz='Asia/Kuala_Lumpur'),
                            'score': np.arange(365)})

In [69]: df.groupby(df['datetime'].dt.year)[['score']].count()
Out[69]: 
          score
datetime       
2012        306
2013         59

In [70]: df.groupby(df['datetime'].dt.year).count()
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-70-90b6e370b9fa> in <module>()
----> 1 df.groupby(df['datetime'].dt.year).count()

/home/joris/scipy/pandas/pandas/core/groupby.py in count(self)
   4009         blk = map(make_block, map(counter, val), loc)
   4010 
-> 4011         return self._wrap_agged_blocks(data.items, list(blk))
   4012 
   4013     def nunique(self, dropna=True):

/home/joris/scipy/pandas/pandas/lib.pyx in pandas.lib.count_level_2d (pandas/lib.c:23708)()

ValueError: Buffer has wrong number of dimensions (expected 2, got 1)
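
Since In [69] succeeds while In [70] fails, one possible stopgap is to keep the tz-aware column out of the counted columns, or to ask for group sizes instead of per-column counts. The sketch below reuses the simplified frame. The names `year`, `counts`, and `sizes` are illustrative, and that `size()` avoids the failing code path on 0.19.2 is an assumption, not something confirmed in this thread.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "datetime": pd.date_range("2012-03-01", periods=365, tz="Asia/Kuala_Lumpur"),
    "score": np.arange(365),
})
year = df["datetime"].dt.year

# Count only the non-tz-aware columns, mirroring In [69] above.
counts = df[["score"]].groupby(year).count()

# Number of rows per group; unlike count(), this includes missing values.
sizes = df.groupby(year).size()

print(counts)
print(sizes)
```

Note that `count()` and `size()` differ when a column contains missing values: `count()` skips them, `size()` does not.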

Contributor

jreback commented Mar 8, 2017

This is a duplicate of #13393.

jreback closed this as completed Mar 8, 2017
jreback added the Bug, Groupby, Timezones, and Duplicate Report labels Mar 8, 2017
jreback added this to the No action milestone Mar 8, 2017