-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
TYP: Series.name is Hashable - should it be str? #42534
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Comments
but can be any hashable object, most often a string, sometimes an int, but can also be a tuple.
|
So users that always name their |
yep. if they know better than mypy they can cast, or to be sure use an assert. Or better still make their code generic enough to handle any hashable as a name. From the pandas side, we could make Series generic, like numpy have done with np.array[ndim, dtype] The generic type parameters of a Series have not even been touched on yet. We would probably want dtype. and something for Index? and then you could maybe have something for the name, but imo that is much less important than the first two. |
Should we consider that when we name a I've got code where I have created a bunch of series and named them all with strings, and then have other parts of the code that refer to |
hmm, that would be a breaking change. probably easier to widen the accepted types in the api docs to match the actual behavior.
not sure what to suggest. Since Series.name can be any hashable, that's the type it is given. |
But do you think it would break user code? In other words, do you think that users depend on It just seems more intuitive from a design standpoint that if the constructor assumes I went back in the old docs, and we didn't document the return type of |
you mean if we change, say, the return type of DataFrame.xs to be a string instead of tuple?
would need a deprecation cycle. Any idea how many functions would affected?
Our types are not public. we use types for internal consistency and robustness. We are adding/changing/improving/refining types gradually. The discussion on making types public is #28142 and once we do that, we maybe would need a policy on stability. if you do
with a released version of pandas you get
so I don't think there are any user facing changes to be concerned with here. |
If the annotation or docstring says string it is wrong. if a frame's index is Int64Index then frame.iloc[0] will be a Series with an integer name. If it has a DatetimeIndex then frame.iloc[0] will have a Timestamp for its name. |
Here's another possibility. What if we do this in @property
def name(self) -> str:
if name is not None:
return str(self._name)
else:
return None In other words, we always return a Regarding your example, using pyright from within Visual Studio Code: import pandas as pd
s = pd.Series([1,2,3], name="foo")
reveal_type(s.name) you get:
That is because pylance is providing pandas type stubs that correspond to what is in the pandas source. So because I am writing "user" code in VSC and using the supplied type checking, I now have to cast |
That's what TypeVars are for. see my earlier comment on what could be done on the pandas side. i.e. use Generic https://docs.python.org/3/library/typing.html#user-defined-generic-types so that a Series created with a str name in the example above would be say Series[np.int64, RangeIndex, str] and Series.name would be typed as str, using a TypeVar to tie the attributes, return types and constructor parameters together.
so you are using 3rd party stubs. pandas source does not implement PEP 561. so this is not a pandas issue. |
But it will be a pandas issue in the future unless we do the I still don't see why we don't change the implementation of |
and that's probably a good argument for having official stubs until we progress to having the source code fully annotated. writing stubs (for just the public api) is a lot easier than typing a complete codebase as complex and dynamic as pandas. I use dynamic in the programming sense since some of our code (methods and attributes) are dynamically generated. stubs would allow us to experiment with allowing classes to accept type parameters a lot sooner than if we did inline. but this is for the discussion in #28142. till then our types are experimental. |
Let's change the perspective here. I'd like to argue that from an API design standpoint, Can you provide a counterargument? |
probably best to open a new issue to discuss that as getting off track. From the OP
so let's keep the discussion to the type annotation for Series.name (which has been addressed in #42534 (comment) and #42534 (comment)) |
Fair enough. #42550 |
the constructor takes any hashable, the docstring is incorrect. |
I think this one can be closed. wider discussion in #42550 |
I have checked that this issue has not already been reported.
I have confirmed this bug exists on the latest version of pandas.
(optional) I have confirmed this bug exists on the master branch of pandas.
mypy then reports:
It's natural to assume that the names of a Series are
str
So why do we have it typed as
Hashable
?Seems like @simonjayhawkins changed this in #39107
The text was updated successfully, but these errors were encountered: