BUG: Mysterious Series.get() with Int64Index bug #33439
expected behavior:
|
this is the full step-through of the code that gives the unexpected None:
|
@sam-cohan I can't directly help based on the above, but if you could test with the latest release or pandas master, that would also be helpful (there have been some fixes related to groupby recently). For getting a reproducible example:
I would indeed try to create a reproducible example with the groupby operation, not the isolated .get, as it is quite likely to be related to some groupby internals. I suppose you can't share the original data you are experiencing this with, so here are some tips, as you can typically create similar data that you can share. Start by making your data smaller (e.g. taking a part of it) and see if you still get the error. Remove all columns you don't need to reproduce the example. Use dummy names for the columns. Try to replace the values in the columns with dummy data that has the same characteristics. Etc.
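The tips above could be sketched roughly as follows. This is only an illustration, not a pandas utility: `make_shareable`, its parameters, and the `col_N` / `vN` dummy names are all made up for this example. The original index is kept on purpose, since the report involves index lookups, and it is worth rerunning the failing code after each step to check that the error is still there.

```python
import pandas as pd


def make_shareable(df, keep_columns, n_rows=100):
    """Hypothetical helper: return a smaller, anonymised copy of ``df``.

    The original index labels are preserved, because the behaviour being
    chased may depend on them.
    """
    # Step 1: keep only the needed columns and a small slice of the rows.
    small = df.loc[:, keep_columns].head(n_rows).copy()

    # Step 2: replace the real column names with dummy names.
    small.columns = [f"col_{i}" for i in range(len(keep_columns))]

    # Step 3: replace non-numeric values with dummy labels that keep the
    # same grouping structure (equal values stay equal).
    # Note: missing values get code -1 from factorize and become "v-1".
    for col in small.columns:
        if not pd.api.types.is_numeric_dtype(small[col]):
            codes, _ = pd.factorize(small[col])
            small[col] = [f"v{code}" for code in codes]

    return small
```

If the error disappears after one of the steps, that step removed something the bug depends on, which is useful information in itself.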
Thanks for the feedback. I will try to see if I can get a subset of the data to provide a repro.
@jorisvandenbossche I have updated the description with a minimal repro. Please let me know if it is not clear.
Hmm. Strange, it seems like I cannot update that. Here is the minimal repro code:
|
@sam-cohan thanks for the reproducer! So the good news is that this is fixed again on master (it prints "232131096 232131096" instead of "None 232131096"). But it would still be good to add a test to ensure this keeps working. For a test we still need a more reduced example though (a small dataframe that can be created in the tests).
@jorisvandenbossche here is a proper minimal repro:
|
Interestingly, if you rename "S" to "foo" and "W" to "bar", it won't have the bug, so the values of the string column "A" are significant to the bug!
Better still:
|
@jorisvandenbossche @simonjayhawkins I added a PR with the test. Who can I follow up with to get it merged?
Fixed in #32611 (i.e. 1.1); fa48f5f is the first new commit.
|
@sam-cohan thanks for the better reproducible example and the PR!
I have checked that this issue has not already been reported.
I have confirmed this bug exists on the latest version of pandas.
(optional) I have confirmed this bug exists on the master branch of pandas.
Note: Please read this guide detailing how to provide the necessary information for us to reproduce your bug.
Code Sample, a copy-pastable example
Problem description
I am not able to get a minimal repro of this, as it seems to be data dependent. Instead, I am capturing the bug by showing you my pdb debugging statements in the hope that someone who knows the code can figure out where the problem is.
Basically, I am doing a custom agg function which needs to grab an element from a Series object, and even though the value clearly exists in the index, it returns None. If I first convert to dict, then it does get the value.
I was not able to repro this by simply creating a new series and calling .get on it... that works just fine. And in fact, if I filter the dataframe for just that device, then it works just fine. It is definitely some sort of internal state issue which happens as a result of groupby having more records...
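For readers who want to see the shape of the pattern described here, the following is a minimal sketch on made-up data. It is not the reporter's data and it is not claimed to trigger the bug; the column names `device` and `value`, the index labels, and the helper names are all assumptions for illustration. It runs the same lookup two ways inside a custom agg function, once with `Series.get` and once through `to_dict()`, so the two results can be compared.

```python
import pandas as pd

# Hypothetical data: two groups that both contain the index label 24.
df = pd.DataFrame(
    {
        "device": ["a", "a", "b", "b", "b"],
        "value": [10, 20, 30, 40, 50],
    },
    index=pd.Index([3, 24, 7, 24, 99], dtype="int64"),
)

KEY = 24  # the label looked up inside each group


def lookup_with_get(x):
    # Direct label lookup on the group's Series.
    return x.get(KEY)


def lookup_with_dict(x):
    # Same lookup, but through a plain dict built from the Series.
    return x.to_dict().get(KEY)


via_get = df.groupby("device")["value"].agg(lookup_with_get)
via_dict = df.groupby("device")["value"].agg(lookup_with_dict)

# On a pandas version without the bug, both columns should agree.
print(pd.DataFrame({"via_get": via_get, "via_dict": via_dict}))
```

A mismatch between the two columns, with `via_get` missing where `via_dict` has a value, would correspond to the symptom described above.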
Expected Output
Obviously, I expect x.get(24) to return the correct value instead of None.
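As a point of reference, this is how `Series.get` behaves on a standalone Series with an integer index, which is the case that works fine. The values and labels below are made up for illustration. `get` returns the default (`None` unless another default is passed) only when the label is not found, which is why a `None` for a label that is visibly in the index is unexpected.

```python
import pandas as pd

# Hypothetical standalone Series with an int64 index.
s = pd.Series([100, 200, 300], index=pd.Index([3, 24, 99], dtype="int64"))

present = s.get(24)      # label 24 is in the index, so its value is returned
missing = s.get(5)       # label 5 is not in the index, so the default None is returned
fallback = s.get(5, -1)  # an explicit default is returned instead of None

print(present, missing, fallback)
```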
Output of pd.show_versions()
INSTALLED VERSIONS
commit : None
python : 3.7.6.final.0
python-bits : 64
OS : Darwin
OS-release : 17.7.0
machine : x86_64
processor : i386
byteorder : little
LC_ALL : None
LANG : en_US.UTF-8
LOCALE : en_US.UTF-8
pandas : 1.0.1
numpy : 1.18.1
pytz : 2019.3
dateutil : 2.8.1
pip : 20.0.2
setuptools : 41.2.0
Cython : None
pytest : 5.4.1
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : None
lxml.etree : None
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : 2.11.1
IPython : 7.13.0
pandas_datareader: None
bs4 : None
bottleneck : None
fastparquet : None
gcsfs : None
lxml.etree : None
matplotlib : 3.1.3
numexpr : None
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : 0.16.0
pytables : None
pytest : 5.4.1
pyxlsb : None
s3fs : None
scipy : 1.4.1
sqlalchemy : None
tables : None
tabulate : None
xarray : None
xlrd : None
xlwt : None
xlsxwriter : None
numba : 0.48.0