BUG: Runtime warning with groupby/tail when None appears in group column #46814

Closed
3 tasks done
ian-r-rose opened this issue Apr 20, 2022 · 6 comments
Labels
Bug · Closing Candidate (May be closeable, needs more eyeballs) · Groupby · Upstream issue (Issue related to pandas dependency) · Warnings (Warnings that appear or should be added to pandas)

Comments

@ian-r-rose

Pandas version checks

  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest version of pandas.

  • I have confirmed this bug exists on the main branch of pandas.

Reproducible Example

import pandas

# Small frame whose grouping column "x" contains a null key.
df = pandas.DataFrame(
    [
        ["a", 1],
        ["a", 2],
        [None, 0],
        ["b", 2],
        ["b", 3],
    ],
    columns=["x", "y"],
)
df.groupby("x", dropna=False).tail(1)   # No warning
df.groupby("x", dropna=True).tail(1)    # Emits a RuntimeWarning

Issue Description

👋 In pandas main, a groupby/tail on a DataFrame which contains nulls in the grouped column produces a RuntimeWarning suggesting a mishandled case in indexing logic:

/lib/python3.8/site-packages/pandas/core/groupby/indexing.py:217: RuntimeWarning: invalid value encountered in remainder
  mask &= offset_array % step == 0

If I set dropna=False, the snippet succeeds without producing a warning.

Based on the warning location and git blame, it may be related to #42947
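
For anyone who wants to poke at this without pandas, here is a minimal NumPy-only sketch of the expression named in the warning. It assumes (this is a guess, not confirmed from the pandas source) that rows with a null group key show up as NaN in the array; the array values and the `step` value are made up for illustration. Whether a warning is actually recorded will depend on the installed NumPy version.

```python
import warnings

import numpy as np

# Hypothetical stand-in for the per-row offsets computed in indexing.py.
# NaN marks a row whose group key is null (an assumption, not pandas code).
offset_array = np.array([0.0, 1.0, np.nan, 0.0])
step = 1

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")  # record warnings instead of printing them
    mask = offset_array % step == 0  # same expression as in the warning message

print(mask.tolist())                      # the NaN entry compares as False
print([str(w.message) for w in caught])   # contents depend on the NumPy version
```

The resulting mask is the same either way; only the presence of the warning differs.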

Expected Behavior

The above groupby should succeed without producing a warning.

Installed Versions

Pandas main, as well as 1.4.

@ian-r-rose ian-r-rose added the Bug and Needs Triage (Issue that has not been reviewed by a pandas team member) labels Apr 20, 2022
@rhshadrach
Member

rhshadrach commented Apr 21, 2022

Thanks for the report! I'm not seeing this on main or 1.4.x; can you post the output of pd.show_versions()?

versions
pandas           : 1.4.2+11.gd1717e98a2
numpy            : 1.22.3
pytz             : 2022.1
dateutil         : 2.8.2
pip              : 22.0.4
setuptools       : 62.1.0
Cython           : 0.29.28
pytest           : 7.1.1
hypothesis       : 6.43.3
sphinx           : 4.5.0
blosc            : None
feather          : None
xlsxwriter       : 3.0.3
lxml.etree       : 4.8.0
html5lib         : 1.1
pymysql          : None
psycopg2         : None
jinja2           : 3.0.3
IPython          : 8.2.0
pandas_datareader: None
bs4              : 4.11.1
bottleneck       : 1.3.4
brotli           : 
fastparquet      : 0.8.0
fsspec           : 2021.11.0
gcsfs            : 2021.11.0
markupsafe       : 2.1.1
matplotlib       : 3.5.1
numba            : 0.53.1
numexpr          : 2.8.0
odfpy            : None
openpyxl         : 3.0.9
pandas_gbq       : None
pyarrow          : 3.0.0
pyreadstat       : 1.1.4
pyxlsb           : None
s3fs             : 2021.11.0
scipy            : 1.8.0
snappy           : 
sqlalchemy       : 1.4.35
tables           : 3.7.0
tabulate         : 0.8.9
xarray           : 0.18.2
xlrd             : 2.0.1
xlwt             : 1.3.0
zstandard        : None

@rhshadrach rhshadrach added the Groupby, Warnings (Warnings that appear or should be added to pandas), and Needs Info (Clarification about behavior needed to assess issue) labels Apr 21, 2022
@ian-r-rose
Author

Ah, my mistake, I can't reproduce on 1.4.x, so I must have messed up my test environment yesterday (sorry to omit show_versions). I can still reproduce this on main:

pd.show_versions():

INSTALLED VERSIONS
------------------
commit : a8968bf
python : 3.8.8.final.0
python-bits : 64
OS : Linux
OS-release : 5.13.0-39-generic
Version : #44~20.04.1-Ubuntu SMP Thu Mar 24 16:43:35 UTC 2022
machine : x86_64
processor : x86_64
byteorder : little
LC_ALL : None
LANG : en_US.UTF-8
LOCALE : en_US.UTF-8

pandas : 1.5.0.dev0+694.ga8968bfa69
numpy : 1.21.5
pytz : 2021.3
dateutil : 2.8.2
pip : 21.3.1
setuptools : 59.8.0
Cython : 0.29.22
pytest : 6.2.2
hypothesis : None
sphinx : 4.4.0
blosc : 1.10.2
feather : None
xlsxwriter : None
lxml.etree : None
html5lib : 1.1
pymysql : None
psycopg2 : None
jinja2 : 2.11.3
IPython : 7.30.1
pandas_datareader: None
bs4 : 4.10.0
bottleneck : None
brotli :
fastparquet : 0.7.2
fsspec : 2021.06.1
gcsfs : 0.7.2
markupsafe : 1.1.1
matplotlib : 3.3.4
numba : 0.53.1
numexpr : 2.8.0
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : 5.0.0
pyreadstat : None
pyxlsb : None
s3fs : 2021.06.1
scipy : 1.7.3
snappy :
sqlalchemy : 1.4.20
tables : 3.6.1
tabulate : 0.8.9
xarray : 0.21.1
xlrd : None
xlwt : None
zstandard : None

@ian-r-rose
Author

Update: I ran a git bisect, and for me the warning first shows up at 4d7a03a.

@rhshadrach
Member

Thanks, I'm able to reproduce with numpy 1.21.6; no warning appears with numpy 1.22.0. This was a bug that was fixed in 1.22.0:

numpy/numpy#18170

We could add a workaround to avoid the warning for numpy < 1.22 by computing a Boolean mask where the array is null.
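
A rough sketch of what such a workaround might look like, written against plain NumPy rather than the actual pandas internals. The helper name `remainder_is_zero_skipping_nulls` is hypothetical, and it assumes null rows appear as NaN in a float array. The idea is to build the null mask first and apply `%` only to the non-null entries, so NaN never reaches the remainder operation.

```python
import numpy as np


def remainder_is_zero_skipping_nulls(offsets, step):
    """Return a Boolean array that is True where offsets % step == 0.

    Hypothetical helper, not part of pandas. Null (NaN) entries are
    excluded from the remainder computation and reported as False,
    which matches what `nan % step == 0` evaluates to.
    """
    offsets = np.asarray(offsets, dtype=float)
    not_null = ~np.isnan(offsets)                # Boolean mask of usable entries
    result = np.zeros(offsets.shape, dtype=bool)
    result[not_null] = offsets[not_null] % step == 0
    return result


print(remainder_is_zero_skipping_nulls([0.0, 1.0, np.nan, 2.0], 2).tolist())
```

This costs an extra pass over the array, so it would only be worth doing if supporting numpy < 1.22 without the warning matters.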

@rhshadrach rhshadrach added the Bug label and removed the Bug, Needs Info (Clarification about behavior needed to assess issue), and Needs Triage (Issue that has not been reviewed by a pandas team member) labels Apr 22, 2022
@rhshadrach rhshadrach added this to the Contributions Welcome milestone Apr 22, 2022
@ian-r-rose
Author

Thanks @rhshadrach! Upgrading numpy is a fine workaround from my perspective.
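
In case it helps others landing here, a small check for whether the installed NumPy is at least 1.22.0, the version reported above as not emitting the warning. The function name `numpy_has_remainder_fix` is made up for this example; `np.lib.NumpyVersion` is NumPy's own version-comparison helper.

```python
import numpy as np


def numpy_has_remainder_fix(version: str = np.__version__) -> bool:
    """True if `version` is at least 1.22.0 (hypothetical helper)."""
    return np.lib.NumpyVersion(version) >= "1.22.0"


print(np.__version__, numpy_has_remainder_fix())
```

If this prints False, upgrading NumPy should make the warning go away, per the discussion above.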

@simonjayhawkins simonjayhawkins added the Upstream issue (Issue related to pandas dependency) and Closing Candidate (May be closeable, needs more eyeballs) labels May 29, 2022
@mroeschke
Member

Closing as it seems this was a numpy issue.

4 participants