BUG: Kernel crashing when using shift with groupby when there are NaNs in group column #13976

Closed
nickderobertis opened this issue Aug 12, 2016 · 4 comments
Labels
Bug, Duplicate Report, Groupby

Comments

@nickderobertis

When I shift a variable within groups and the group column contains NaNs, my kernel sometimes crashes. The operation occasionally completes successfully, but it crashes more than half the time.

Code Sample, a copy-pastable example if possible

from numpy import nan
import pandas as pd
from pandas import Timestamp

df = pd.DataFrame(data = [
    (Timestamp('2003-01-15 00:00:00'), 1, 1),
    (Timestamp('2003-01-15 00:00:00'), nan, nan),
    (Timestamp('2003-02-14 00:00:00'), 1, 2),
    ], columns=['Date','ID','var'])

df.groupby('ID')['var'].shift(1)  # crashes kernel sometimes
df.dropna(subset=['ID']).groupby('ID')['var'].shift(1)  # does not crash kernel
df.groupby('ID')['var'].apply(lambda x: x)  # does not crash kernel

Expected Output

0    NaN
1    NaN
2    1.0
Name: var, dtype: float64

output of pd.show_versions()

INSTALLED VERSIONS
------------------
commit: None
python: 3.5.1.final.0
python-bits: 64
OS: Windows
OS-release: 10
machine: AMD64
processor: Intel64 Family 6 Model 60 Stepping 3, GenuineIntel
byteorder: little
LC_ALL: None
LANG: None

pandas: 0.18.1
nose: 1.3.7
pip: 8.1.1
setuptools: 24.0.2
Cython: 0.23.4
numpy: 1.11.1
scipy: 0.17.0
statsmodels: 0.6.1
xarray: None
IPython: 5.0.0
sphinx: 1.3.1
patsy: 0.4.0
dateutil: 2.5.3
pytz: 2016.6.1
blosc: None
bottleneck: 1.0.0
tables: 3.2.2
numexpr: 2.6.0
matplotlib: 1.5.1
openpyxl: 2.3.2
xlrd: 1.0.0
xlwt: 1.0.0
xlsxwriter: 0.8.4
lxml: 3.5.0
bs4: 4.4.1
html5lib: 0.9999999
httplib2: None
apiclient: None
sqlalchemy: 1.0.12
pymysql: None
psycopg2: None
jinja2: 2.8
boto: 2.39.0
pandas_datareader: None
@jreback
Contributor

jreback commented Aug 12, 2016

This is a duplicate of #13813, which was fixed in #13819; the fix will be in 0.19.0.
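
To see whether an installed pandas is at least the 0.19.0 release mentioned above, something like the following may help. This is only a sketch: the helper name `at_least_0_19` is hypothetical, and it assumes the version string starts with `major.minor`.

```python
import re

import pandas as pd


def at_least_0_19(version):
    """Return True if a version string is 0.19 or newer (hypothetical helper)."""
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError("unrecognised version string: %r" % version)
    major, minor = int(match.group(1)), int(match.group(2))
    # Tuple comparison handles both 0.x releases and later major versions.
    return (major, minor) >= (0, 19)


# The report above was filed against pandas 0.18.1.
print(pd.__version__, at_least_0_19(pd.__version__))
```

A version at or above 0.19.0 only indicates that the referenced fix should be present; it does not rule out other causes of a crash.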

@jreback jreback closed this as completed Aug 12, 2016
@jreback jreback added Bug, Groupby, Duplicate Report labels Aug 12, 2016
@jreback jreback added this to the No action milestone Aug 12, 2016
@nickderobertis
Author

Thanks, I didn't find it because I only searched open issues... my mistake.

@rojour

rojour commented Mar 23, 2017

I have all the latest versions as of today, and the kernel still crashes when one of the records in the groupby has a NaN. I had to work around it by filtering first with .notnull().
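
A minimal sketch of that kind of workaround, assuming the example frame from the original report: rows with a NaN group key are filtered out before the groupby, and the shifted result is then reindexed to the original index. The names `mask`, `shifted` and `result` are illustrative only.

```python
import numpy as np
import pandas as pd

# Same shape as the frame in the original report.
df = pd.DataFrame(
    {
        "Date": pd.to_datetime(["2003-01-15", "2003-01-15", "2003-02-14"]),
        "ID": [1, np.nan, 1],
        "var": [1, np.nan, 2],
    }
)

# Keep only rows whose group key is present, then shift within each group.
mask = df["ID"].notnull()
shifted = df.loc[mask].groupby("ID")["var"].shift(1)

# Align back to the original index; rows with a NaN key come back as NaN.
result = shifted.reindex(df.index)
print(result)
```

Reindexing keeps the result the same length as the input, so it can be assigned straight back as a new column.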

@jreback
Contributor

jreback commented Mar 23, 2017

@rojour if you are having an issue, you can create a new issue with a reproducible example.
