Skip to content

Regression in DataFrame indexing in 1.5.3.230227. #585

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
jaccarmac opened this issue Mar 21, 2023 · 6 comments
Closed

Regression in DataFrame indexing in 1.5.3.230227. #585

jaccarmac opened this issue Mar 21, 2023 · 6 comments

Comments

@jaccarmac
Copy link

pandas-stubs versions 1.5.3.230227 and later (also tested the most recent and most recent yanked versions) treat [] indexing as producing a Series, even when the result is a DataFrame.

To Reproduce

pandas-stubs-bug.py

from pandas import DataFrame

df = DataFrame({"a": (1,)})
reveal_type(df["a"])
reveal_type(df[["a"]])

python -m mypy pandas-stubs-bug.py

The correct output from pandas-stubs==1.5.3.230214:

pandas-stubs-bug.py:4: note: Revealed type is "pandas.core.series.Series[Any]"
pandas-stubs-bug.py:5: note: Revealed type is "pandas.core.frame.DataFrame"

The output from pandas-stubs==1.5.3.230227 or later:

pandas-stubs-bug.py:4: note: Revealed type is "pandas.core.series.Series[Any]"
pandas-stubs-bug.py:5: note: Revealed type is "pandas.core.series.Series[Any]"

Please complete the following information:

  • OS: GuixSD on WSL2
  • N/A / Windows 11
  • Python 3.8.2
  • mypy 1.1.1
  • Pandas 1.1.0
  • pandas-stubs as described above.

Additional context

#537 seems a likely culprit, but I'm not sure how to bisect to make sure (or how to write a test suite to catch the regression). I'm happy to do either if pointed in the right direction.

@Dr-Irv
Copy link
Collaborator

Dr-Irv commented Mar 21, 2023

I've tested your code with mypy 1.1.1, python 3.9.13, pandas 1.5.3 and it doesn't report that error.

We only test the stubs with the indicated release of pandas (in this case 1.5.3.230227 is tested with pandas 1.5.3).

So the issue might be that you are on pandas 1.1.0. Can you try your test with pandas 1.5.3?

@jaccarmac
Copy link
Author

I can confirm that the issue does not appear in that version of Pandas. I was under the impression that the stubs had some tested level of backwards compatibility, but see that the version checks are minimal and restricted to the test suite. Is the intention to only support indicated releases? Would a change to restore the 1.5.3.230214 behavior for this particular case be accepted?

@Dr-Irv
Copy link
Collaborator

Dr-Irv commented Mar 23, 2023

I can confirm that the issue does not appear in that version of Pandas. I was under the impression that the stubs had some tested level of backwards compatibility, but see that the version checks are minimal and restricted to the test suite. Is the intention to only support indicated releases? Would a change to restore the 1.5.3.230214 behavior for this particular case be accepted?

We only test the stubs against the latest released version of pandas, especially since the pandas API evolves with each new minor version.

You can try to restore the behavior, but you'd have to do so without changing any of the existing tests, and make sure it works with 1.5.3. In particular, the PR you pointed to as the possible culprit has nothing to do with DataFrame.__getitem__(), which is what your examples are looking for when type checking occurs.

I think the reason that this occurred with an older version of pandas is because of how mypy works. It looks at the stubs, but it will also look inside the python source if it thinks something is missing. We try to keep the organization of the file hierarchy to match the pandas source, so you also may be running into an issue where files got reorganized in pandas between 1.1.0 and 1.5.3, so mypy can easily get confused. One thing you might try is to try using pyright and your old version of pandas to see if the error is reported there. AFAIK, pyright doesn't look inside the code to do type checking (unless a module is completely missing).

@jaccarmac
Copy link
Author

In particular, the PR you pointed to as the possible culprit has nothing to do with DataFrame.__getitem__(), which is what your examples are looking for when type checking occurs.

You're absolutely correct; It was a bad guess and bisecting to figure that out was nice and easy.

I think the reason that this occurred with an older version of pandas is because of how mypy works. It looks at the stubs, but it will also look inside the python source if it thinks something is missing.

There's some deep magic working because #550 is the patch which broke this for Pandas 1.1.0.

mypy can easily get confused

It sure looks like that's happening, inside or outside the Pandas source. Is there a way to set the equivalent of a breakpoint at those two lines to see how they're affecting the checks in my example?

One thing you might try is to try using pyright and your old version of pandas to see if the error is reported there.

Thanks, I'll look into this. As you might guess based on the Python and library versions, the environment this issue affects is rather weird, and that prevents the obvious installation of pyright. Suspect I'll need to compile Node for myself or somesuch.

Anyway, I can probably hack together a monkey-patch for this since it's just two lines. No idea if #571 is at all related, but I noticed that issue title the other day and seeing time types pop up again here seems awfully coincidental.

@Dr-Irv
Copy link
Collaborator

Dr-Irv commented Mar 23, 2023

It sure looks like that's happening, inside or outside the Pandas source. Is there a way to set the equivalent of a breakpoint at those two lines to see how they're affecting the checks in my example?

These things can be really hard to track down. I've done it by incrementally changing the stubs (sometimes line by line) to figure out what particular change causes mypy to change its results. The challenge here is that it takes about a minute for mypy to check the stubs. mypy can sometimes have behaviors with a small change that are hard to understand what is the actual root cause.

If you want to fix it in your own code without messing with the stubs, you could write cast(pd.DataFrame, df[["a"]]) to get the type checker to know the right type.

@Dr-Irv
Copy link
Collaborator

Dr-Irv commented Apr 1, 2023

I'm going to close this issue, because we won't be able to support using pandas-stubs with older versions of pandas.

@Dr-Irv Dr-Irv closed this as completed Apr 1, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants