Skip to content

TYP: Series.name is Hashable - should it be str? #42534

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
3 tasks done
Dr-Irv opened this issue Jul 14, 2021 · 17 comments
Closed
3 tasks done

TYP: Series.name is Hashable - should it be str? #42534

Dr-Irv opened this issue Jul 14, 2021 · 17 comments
Labels
Bug Closing Candidate May be closeable, needs more eyeballs Typing type annotations, mypy/pyright type checking

Comments

@Dr-Irv
Copy link
Contributor

Dr-Irv commented Jul 14, 2021

  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest version of pandas.

  • (optional) I have confirmed this bug exists on the master branch of pandas.


import pandas as pd

s = pd.Series([1,2,3], name="foo")

reveal_type(s.name)

nam: str = s.name

mypy then reports:

seriesnam.py:5: note: Revealed type is "typing.Hashable*"
seriesnam.py:7: error: Incompatible types in assignment (expression has type "Hashable", variable has type "str")  [assignment]

It's natural to assume that the names of a Series are str

So why do we have it typed as Hashable ?

Seems like @simonjayhawkins changed this in #39107

@Dr-Irv Dr-Irv added Bug Typing type annotations, mypy/pyright type checking labels Jul 14, 2021
@simonjayhawkins
Copy link
Member

It's natural to assume that the names of a Series are str

but can be any hashable object, most often a string, sometimes an int, but can also be a tuple.

>>> df = pd.DataFrame(
...     np.ones((3, 4)), columns=pd.MultiIndex.from_product([[1, 2], ["A", "B"]])
... )
>>> df
     1         2     
     A    B    A    B
0  1.0  1.0  1.0  1.0
1  1.0  1.0  1.0  1.0
2  1.0  1.0  1.0  1.0
>>> 
>>> df.xs((1, "A"), axis=1)
0    1.0
1    1.0
2    1.0
Name: (1, A), dtype: float64
>>> 

@simonjayhawkins simonjayhawkins added the Closing Candidate May be closeable, needs more eyeballs label Jul 14, 2021
@Dr-Irv
Copy link
Contributor Author

Dr-Irv commented Jul 14, 2021

but can be any hashable object, most often a string, sometimes an int, but can also be a tuple.

So users that always name their Series with strings will have to always cast the result of Series.name when they know the names are strings?

@simonjayhawkins
Copy link
Member

So users that always name their Series with strings will have to always cast the result of Series.name when they know the names are strings?

yep. if they know better than mypy they can cast, or to be sure use an assert. Or better still make their code generic enough to handle any hashable as a name.

From the pandas side, we could make Series generic, like numpy have done with np.array[ndim, dtype]

The generic type parameters of a Series have not even been touched on yet. We would probably want dtype. and something for Index? and then you could maybe have something for the name, but imo that is much less important than the first two.

@Dr-Irv
Copy link
Contributor Author

Dr-Irv commented Jul 14, 2021

yep. if they know better than mypy they can cast, or to be sure use an assert. Or better still make their code generic enough to handle any hashable as a name.

Should we consider that when we name a Series internally, we name it with the string representation? Note that we document the constructor for Series as allowing name to be str or None, so should we do that in internal code?

I've got code where I have created a bunch of series and named them all with strings, and then have other parts of the code that refer to Series.name and I know it is a string, so doing casts to make mypy (and pyright) happy seems non-intuitive

@simonjayhawkins
Copy link
Member

Note that we document the constructor for Series as allowing name to be str or None, so should we do that in internal code?

hmm, that would be a breaking change. probably easier to widen the accepted types in the api docs to match the actual behavior.

I've got code where I have created a bunch of series and named them all with strings, and then have other parts of the code that refer to Series.name and I know it is a string, so doing casts to make mypy (and pyright) happy seems non-intuitive

not sure what to suggest. Since Series.name can be any hashable, that's the type it is given.

@Dr-Irv
Copy link
Contributor Author

Dr-Irv commented Jul 14, 2021

Note that we document the constructor for Series as allowing name to be str or None, so should we do that in internal code?

hmm, that would be a breaking change. probably easier to widen the accepted types in the api docs to match the actual behavior.

But do you think it would break user code? In other words, do you think that users depend on Series.name being a Hashable instead of just a str ?

It just seems more intuitive from a design standpoint that if the constructor assumes name is a str or None, you'd expect that Series.name would return a str or None as a result.

I went back in the old docs, and we didn't document the return type of Series.name in 1.0.0. The "hashable" appeared in 1.1.0.

@simonjayhawkins
Copy link
Member

But do you think it would break user code?

you mean if we change, say, the return type of DataFrame.xs to be a string instead of tuple?

It just seems more intuitive from a design standpoint that if the constructor assumes name is a str or None, you'd expect that Series.name would return a str or None as a result.

would need a deprecation cycle. Any idea how many functions would affected?

I went back in the old docs, and we didn't document the return type of Series.name in 1.0.0. The "hashable" appeared in 1.1.0.

Our types are not public. we use types for internal consistency and robustness. We are adding/changing/improving/refining types gradually. The discussion on making types public is #28142 and once we do that, we maybe would need a policy on stability.

if you do

import pandas as pd

reveal_type(pd.Series.name)

with a released version of pandas you get

t.py:1: error: Skipping analyzing "pandas": found module but no type hints or library stubs
t.py:1: note: See https://mypy.readthedocs.io/en/stable/running_mypy.html#missing-imports
t.py:3: note: Revealed type is "Any"

so I don't think there are any user facing changes to be concerned with here.

@jbrockmendel
Copy link
Member

If the annotation or docstring says string it is wrong. if a frame's index is Int64Index then frame.iloc[0] will be a Series with an integer name. If it has a DatetimeIndex then frame.iloc[0] will have a Timestamp for its name.

@Dr-Irv
Copy link
Contributor Author

Dr-Irv commented Jul 15, 2021

Here's another possibility. What if we do this in Series:

@property
def name(self) -> str:
    if name is not None:
        return str(self._name)
    else:
        return None

In other words, we always return a str.

Regarding your example, using pyright from within Visual Studio Code:

import pandas as pd

s = pd.Series([1,2,3], name="foo")
reveal_type(s.name)

you get:

Type of "s.name" is "Hashable | None"

That is because pylance is providing pandas type stubs that correspond to what is in the pandas source.

So because I am writing "user" code in VSC and using the supplied type checking, I now have to cast s.name when I know it is a str

@simonjayhawkins
Copy link
Member

So because I am writing "user" code in VSC and using the supplied type checking, I now have to cast s.name when I know it is a str

That's what TypeVars are for. see my earlier comment on what could be done on the pandas side. i.e. use Generic https://docs.python.org/3/library/typing.html#user-defined-generic-types

so that a Series created with a str name in the example above would be say Series[np.int64, RangeIndex, str] and Series.name would be typed as str, using a TypeVar to tie the attributes, return types and constructor parameters together.

That is because pylance is providing pandas type stubs that correspond to what is in the pandas source.

so you are using 3rd party stubs. pandas source does not implement PEP 561. so this is not a pandas issue.

@Dr-Irv
Copy link
Contributor Author

Dr-Irv commented Jul 15, 2021

so you are using 3rd party stubs. pandas source does not implement PEP 561. so this is not a pandas issue.

But it will be a pandas issue in the future unless we do the Generic thing.

I still don't see why we don't change the implementation of Series.name to return a str (or None).

@simonjayhawkins
Copy link
Member

But it will be a pandas issue in the future unless we do the Generic thing.

and that's probably a good argument for having official stubs until we progress to having the source code fully annotated. writing stubs (for just the public api) is a lot easier than typing a complete codebase as complex and dynamic as pandas. I use dynamic in the programming sense since some of our code (methods and attributes) are dynamically generated.

stubs would allow us to experiment with allowing classes to accept type parameters a lot sooner than if we did inline.

but this is for the discussion in #28142. till then our types are experimental.

@Dr-Irv
Copy link
Contributor Author

Dr-Irv commented Jul 15, 2021

I still don't see why we don't change the implementation of Series.name to return a str (or None).

Let's change the perspective here. I'd like to argue that from an API design standpoint, Series.name should return a str, since the constructor of a Series takes only str or None for the name argument.

Can you provide a counterargument?

@simonjayhawkins
Copy link
Member

Can you provide a counterargument?

probably best to open a new issue to discuss that as getting off track.

From the OP

So why do we have it typed as Hashable ?

so let's keep the discussion to the type annotation for Series.name

(which has been addressed in #42534 (comment) and #42534 (comment))

@Dr-Irv
Copy link
Contributor Author

Dr-Irv commented Jul 15, 2021

probably best to open a new issue to discuss that as getting off track.

Fair enough. #42550

@jbrockmendel
Copy link
Member

since the constructor of a Series takes only str or None for the name argument.

the constructor takes any hashable, the docstring is incorrect.

@simonjayhawkins
Copy link
Member

so let's keep the discussion to the type annotation for Series.name

(which has been addressed in #42534 (comment) and #42534 (comment))

I think this one can be closed. wider discussion in #42550

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Bug Closing Candidate May be closeable, needs more eyeballs Typing type annotations, mypy/pyright type checking
Projects
None yet
Development

No branches or pull requests

3 participants