ENH: Allow type declaration of dataframes with index other than Index
#54378
Comments
You might want to try installing edit: I believe your examples will still fail with pandas-stubs, but if I get pandas-dev/pandas-stubs#723 (comment) working, there might be a future where DataFrame is generic in terms of the index.
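To make "generic in terms of the index" concrete, here is a minimal sketch using stand-in classes. Nothing here is pandas or pandas-stubs API: `Index`, `DatetimeIndex`, `Frame` and `load_daily` are hypothetical toy definitions, used only to show how a type parameter could let a checker track the index class.

```python
from __future__ import annotations

from datetime import date
from typing import Generic, Sequence, TypeVar


class Index:
    """Stand-in for a plain index (hypothetical, not pandas.Index)."""

    def __init__(self, labels: Sequence[object]) -> None:
        self.labels = list(labels)


class DatetimeIndex(Index):
    """Stand-in for a date-based index (hypothetical)."""

    def __init__(self, labels: Sequence[date]) -> None:
        super().__init__(labels)
        self._dates = list(labels)

    @property
    def day(self) -> list[int]:
        # Only exists on the date-based index, like the real DatetimeIndex.day.
        return [d.day for d in self._dates]


IndexT = TypeVar("IndexT", bound=Index)


class Frame(Generic[IndexT]):
    """Toy frame whose static type records which index class backs it."""

    def __init__(self, index: IndexT) -> None:
        self.index: IndexT = index


def load_daily() -> Frame[DatetimeIndex]:
    # The return annotation carries the index type to every caller.
    return Frame(DatetimeIndex([date(2023, 8, 1), date(2023, 8, 2)]))


frame = load_daily()
days = frame.index.day  # accepted: the checker sees index as DatetimeIndex
print(days)
```

With an unparameterized frame type, the checker would only know that `index` is some `Index`, so `.day` would be flagged. That is the situation the issue describes.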
Thanks for your response. So it seems that for now at least, I need to turn off type checking if I'm working with dataframes that don't have a plain index (or manually type everything, but I'm constantly switching from multi-index to single, so this is not tenable). Is that a reasonable conclusion? I must admit I'm a bit confused by the whole 'stubs' situation with VS Code. Like, how do I tell what stubs packages are being used? The Also, I'm a little bit confused about types and Pandas. It seems like a lot of the objects have types defined in the core library, yet
I'm not sure what is shipped with VSCode (@Dr-Irv might know more). I would recommend explicitly installing pandas-stubs in your python environment if you want to use type checkers with pandas code (even if it is already part of VSCode).
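One way to see whether pandas-stubs is explicitly installed in the active environment is to ask the standard library for the installed distribution versions. This is only a sketch: `installed_version` is a hypothetical helper name, and it reports what is installed in the interpreter that runs it, not any stubs an editor may bundle separately.

```python
from __future__ import annotations

from importlib.metadata import PackageNotFoundError, version


def installed_version(distribution: str) -> str | None:
    """Return the installed version of a distribution, or None if absent."""
    try:
        return version(distribution)
    except PackageNotFoundError:
        return None


# "pandas-stubs" is the distribution name of the stubs package on PyPI.
report = {name: installed_version(name) for name in ("pandas", "pandas-stubs")}
for name, found in report.items():
    print(f"{name}: {found or 'not installed'}")
```

Run it with the same interpreter that VS Code has selected for the workspace, otherwise the answer describes a different environment.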
I think VS Code has a "Go to Definition" option, which might help. I know pyright (the core type engine behind VS Code/Pylance) by default(?) analyzes library code, e.g., pandas, if no explicit stubs are installed, e.g., pandas-stubs.
If/when pandas reaches the py.typed state (this will take years, unless it is more actively pushed), pandas-stubs will be obsolete (because type checkers will not look for external stubs if a library declares itself as py.typed). If you want to improve type annotations, you are welcome to open PRs for pandas and/or pandas-stubs :)
I think the main divide between the two is whether the annotations are focused on end users (pandas-stubs) or on pandas developers. I dream of a future where we can just have one: retroactively adding type annotations that are both internally (pandas code) and externally (end user code) consistent is challenging - it is also challenging to write stubs for an API that was created before Python's type system existed (but it is easier).
If you require generic objects (Series[int], ...), you will have to use pandas-stubs for now (has a generic Series and hopefully also soon a generic Index). If you mainly use popular methods, such as read_csv, and don't care too much about having a lot of Anys or Unions, you will be fine with pyright using the pandas annotations. If you use other type checkers (mypy), you have to use pandas-stubs - I believe pyright is the only type checker that is eager enough to analyze non-py.typed code.
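The py.typed state mentioned above refers to a marker file named `py.typed` inside the package directory (PEP 561). A rough way to check whether an installed package ships that marker is sketched below. `ships_py_typed` is a hypothetical helper, and it only looks for the file next to the package, which is an assumption about how the package is laid out on disk.

```python
from __future__ import annotations

from importlib.util import find_spec
from pathlib import Path


def ships_py_typed(package: str) -> bool | None:
    """True/False if the package is found; None if it is not installed."""
    try:
        spec = find_spec(package)
    except (ImportError, ValueError):
        return None
    if spec is None:
        return None
    # Single-file modules have no search locations, so they yield False.
    locations = spec.submodule_search_locations or []
    return any((Path(loc) / "py.typed").is_file() for loc in locations)


for name in ("pandas", "json"):
    print(f"{name}: {ships_py_typed(name)}")
```

A result of `False` means a checker that honours PEP 561 strictly will not use the inline annotations of that package, which is where a separate stubs package comes in.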
That's great, thanks for the clarifications. It sounds like work on this sort of thing is well under way so I'll close this.
Maybe. The issue here is that in a static typing context, we can't track what kind of index is backing the dataframe. For example, methods like
from pandas._version import _stub_version
reveal_type(_stub_version)
Then in VS Code, the
The types in the pandas source are there for internal type checking of the pandas code for pandas development. The stubs are meant for users. There are a few advantages of using the stubs, IMHO:
On a personal note, my team and I have been using the stubs on our code, and they have picked up numerous bugs and saved us a lot of time.
Feature Type
Adding new functionality to pandas
Changing existing functionality in pandas
Removing existing functionality in pandas
Problem Description
I'm a long time Pandas user, just switched to VS Code which has stronger type checking. I'm struggling to get error-free code when using DataFrames with MultiIndex or DatetimeIndex. E.g. this pretty basic code results in an error in VS Code.
Feature Description
A few options I can think of (I'm not sure of the viability of any)
DatetimeDataFrame, MultiIndexDataFrame types, that can be used to annotate the return value of anything that returns a DataFrame
Index and raise errors/noop when called on the wrong type. E.g. Index.day is a valid property, but returns None. Not great as it clutters the auto-complete list.
Alternative Solutions
I don't know. How do people who use Pandas with VS Code do this? Does everyone just turn off type checking? Is there some obvious step I'm missing?
My workaround is to create a type/class with the right index and assign that as the type, but that in itself is an error (Expression of type "DataFrame" cannot be assigned to declared type "DataFrameDatetimeIndex"), so at the point where I define the type I have to turn off type checking. But at least then I get auto-complete for a DatetimeIndex.
The other workaround is to use typing.cast but that means an extra import and seems hacky.
Additional Context
No response
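For reference, the typing.cast workaround from Alternative Solutions can be paired with a runtime check so that a wrong assumption fails loudly instead of silently. The sketch below uses only the standard library and a `datetime` value as a stand-in. `checked_cast` is a hypothetical helper, not part of `typing` or pandas.

```python
from __future__ import annotations

from datetime import datetime
from typing import TypeVar, cast

T = TypeVar("T")


def checked_cast(expected: type[T], value: object) -> T:
    """Like typing.cast, but verifies the type at runtime."""
    if not isinstance(value, expected):
        raise TypeError(
            f"expected {expected.__name__}, got {type(value).__name__}"
        )
    return value


raw: object = datetime(2023, 8, 1)

# typing.cast only informs the type checker; it performs no check at runtime.
unchecked = cast(datetime, raw)

# checked_cast narrows the static type and validates the value as well.
checked = checked_cast(datetime, raw)
print(checked.year)
```

Assuming pandas is installed, the same helper could presumably be called as `checked_cast(pd.DatetimeIndex, df.index)` to get both auto-complete and a runtime guarantee, at the cost of the extra helper.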