TYP: Series.name is Hashable - should it be str? #42534

Dr-Irv · 2021-07-14T16:44:28Z

I have checked that this issue has not already been reported.
I have confirmed this bug exists on the latest version of pandas.
(optional) I have confirmed this bug exists on the master branch of pandas.

import pandas as pd

s = pd.Series([1,2,3], name="foo")

reveal_type(s.name)

nam: str = s.name

mypy then reports:

seriesnam.py:5: note: Revealed type is "typing.Hashable*"
seriesnam.py:7: error: Incompatible types in assignment (expression has type "Hashable", variable has type "str")  [assignment]

It's natural to assume that the names of a Series are str

So why do we have it typed as Hashable ?

Seems like @simonjayhawkins changed this in #39107

The text was updated successfully, but these errors were encountered:

simonjayhawkins · 2021-07-14T17:19:18Z

It's natural to assume that the names of a Series are str

but can be any hashable object, most often a string, sometimes an int, but can also be a tuple.

>>> df = pd.DataFrame(
...     np.ones((3, 4)), columns=pd.MultiIndex.from_product([[1, 2], ["A", "B"]])
... )
>>> df
     1         2     
     A    B    A    B
0  1.0  1.0  1.0  1.0
1  1.0  1.0  1.0  1.0
2  1.0  1.0  1.0  1.0
>>> 
>>> df.xs((1, "A"), axis=1)
0    1.0
1    1.0
2    1.0
Name: (1, A), dtype: float64
>>>

Dr-Irv · 2021-07-14T17:28:42Z

but can be any hashable object, most often a string, sometimes an int, but can also be a tuple.

So users that always name their Series with strings will have to always cast the result of Series.name when they know the names are strings?

simonjayhawkins · 2021-07-14T17:40:12Z

So users that always name their Series with strings will have to always cast the result of Series.name when they know the names are strings?

yep. if they know better than mypy they can cast, or to be sure use an assert. Or better still make their code generic enough to handle any hashable as a name.

From the pandas side, we could make Series generic, like numpy have done with np.array[ndim, dtype]

The generic type parameters of a Series have not even been touched on yet. We would probably want dtype. and something for Index? and then you could maybe have something for the name, but imo that is much less important than the first two.

Dr-Irv · 2021-07-14T19:12:20Z

yep. if they know better than mypy they can cast, or to be sure use an assert. Or better still make their code generic enough to handle any hashable as a name.

Should we consider that when we name a Series internally, we name it with the string representation? Note that we document the constructor for Series as allowing name to be str or None, so should we do that in internal code?

I've got code where I have created a bunch of series and named them all with strings, and then have other parts of the code that refer to Series.name and I know it is a string, so doing casts to make mypy (and pyright) happy seems non-intuitive

simonjayhawkins · 2021-07-14T19:45:37Z

Note that we document the constructor for Series as allowing name to be str or None, so should we do that in internal code?

hmm, that would be a breaking change. probably easier to widen the accepted types in the api docs to match the actual behavior.

I've got code where I have created a bunch of series and named them all with strings, and then have other parts of the code that refer to Series.name and I know it is a string, so doing casts to make mypy (and pyright) happy seems non-intuitive

not sure what to suggest. Since Series.name can be any hashable, that's the type it is given.

Dr-Irv · 2021-07-14T21:54:59Z

Note that we document the constructor for Series as allowing name to be str or None, so should we do that in internal code?

hmm, that would be a breaking change. probably easier to widen the accepted types in the api docs to match the actual behavior.

But do you think it would break user code? In other words, do you think that users depend on Series.name being a Hashable instead of just a str ?

It just seems more intuitive from a design standpoint that if the constructor assumes name is a str or None, you'd expect that Series.name would return a str or None as a result.

I went back in the old docs, and we didn't document the return type of Series.name in 1.0.0. The "hashable" appeared in 1.1.0.

simonjayhawkins · 2021-07-14T22:25:34Z

But do you think it would break user code?

you mean if we change, say, the return type of DataFrame.xs to be a string instead of tuple?

It just seems more intuitive from a design standpoint that if the constructor assumes name is a str or None, you'd expect that Series.name would return a str or None as a result.

would need a deprecation cycle. Any idea how many functions would affected?

I went back in the old docs, and we didn't document the return type of Series.name in 1.0.0. The "hashable" appeared in 1.1.0.

Our types are not public. we use types for internal consistency and robustness. We are adding/changing/improving/refining types gradually. The discussion on making types public is #28142 and once we do that, we maybe would need a policy on stability.

if you do

import pandas as pd

reveal_type(pd.Series.name)

with a released version of pandas you get

t.py:1: error: Skipping analyzing "pandas": found module but no type hints or library stubs
t.py:1: note: See https://mypy.readthedocs.io/en/stable/running_mypy.html#missing-imports
t.py:3: note: Revealed type is "Any"

so I don't think there are any user facing changes to be concerned with here.

jbrockmendel · 2021-07-15T03:31:39Z

If the annotation or docstring says string it is wrong. if a frame's index is Int64Index then frame.iloc[0] will be a Series with an integer name. If it has a DatetimeIndex then frame.iloc[0] will have a Timestamp for its name.

Dr-Irv · 2021-07-15T14:12:33Z

Here's another possibility. What if we do this in Series:

@property
def name(self) -> str:
    if name is not None:
        return str(self._name)
    else:
        return None

In other words, we always return a str.

Regarding your example, using pyright from within Visual Studio Code:

import pandas as pd

s = pd.Series([1,2,3], name="foo")
reveal_type(s.name)

you get:

Type of "s.name" is "Hashable | None"

That is because pylance is providing pandas type stubs that correspond to what is in the pandas source.

So because I am writing "user" code in VSC and using the supplied type checking, I now have to cast s.name when I know it is a str

simonjayhawkins · 2021-07-15T14:49:59Z

So because I am writing "user" code in VSC and using the supplied type checking, I now have to cast s.name when I know it is a str

That's what TypeVars are for. see my earlier comment on what could be done on the pandas side. i.e. use Generic https://docs.python.org/3/library/typing.html#user-defined-generic-types

so that a Series created with a str name in the example above would be say Series[np.int64, RangeIndex, str] and Series.name would be typed as str, using a TypeVar to tie the attributes, return types and constructor parameters together.

That is because pylance is providing pandas type stubs that correspond to what is in the pandas source.

so you are using 3rd party stubs. pandas source does not implement PEP 561. so this is not a pandas issue.

Dr-Irv · 2021-07-15T14:52:47Z

so you are using 3rd party stubs. pandas source does not implement PEP 561. so this is not a pandas issue.

But it will be a pandas issue in the future unless we do the Generic thing.

I still don't see why we don't change the implementation of Series.name to return a str (or None).

simonjayhawkins · 2021-07-15T15:03:44Z

But it will be a pandas issue in the future unless we do the Generic thing.

and that's probably a good argument for having official stubs until we progress to having the source code fully annotated. writing stubs (for just the public api) is a lot easier than typing a complete codebase as complex and dynamic as pandas. I use dynamic in the programming sense since some of our code (methods and attributes) are dynamically generated.

stubs would allow us to experiment with allowing classes to accept type parameters a lot sooner than if we did inline.

but this is for the discussion in #28142. till then our types are experimental.

Dr-Irv · 2021-07-15T15:24:49Z

I still don't see why we don't change the implementation of Series.name to return a str (or None).

Let's change the perspective here. I'd like to argue that from an API design standpoint, Series.name should return a str, since the constructor of a Series takes only str or None for the name argument.

Can you provide a counterargument?

simonjayhawkins · 2021-07-15T15:55:21Z

Can you provide a counterargument?

probably best to open a new issue to discuss that as getting off track.

From the OP

So why do we have it typed as Hashable ?

so let's keep the discussion to the type annotation for Series.name

(which has been addressed in #42534 (comment) and #42534 (comment))

Dr-Irv · 2021-07-15T16:02:07Z

probably best to open a new issue to discuss that as getting off track.

Fair enough. #42550

jbrockmendel · 2021-07-15T16:11:15Z

since the constructor of a Series takes only str or None for the name argument.

the constructor takes any hashable, the docstring is incorrect.

simonjayhawkins · 2021-07-19T11:54:28Z

so let's keep the discussion to the type annotation for Series.name

(which has been addressed in #42534 (comment) and #42534 (comment))

I think this one can be closed. wider discussion in #42550

Dr-Irv added Bug Typing type annotations, mypy/pyright type checking labels Jul 14, 2021

simonjayhawkins added the Closing Candidate May be closeable, needs more eyeballs label Jul 14, 2021

Dr-Irv mentioned this issue Jul 15, 2021

ENH: Series.name should return a string or None #42550

Closed

simonjayhawkins closed this as completed Jul 19, 2021

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

TYP: Series.name is Hashable - should it be str? #42534

TYP: Series.name is Hashable - should it be str? #42534

Dr-Irv commented Jul 14, 2021

simonjayhawkins commented Jul 14, 2021

Dr-Irv commented Jul 14, 2021

simonjayhawkins commented Jul 14, 2021

Dr-Irv commented Jul 14, 2021

simonjayhawkins commented Jul 14, 2021

Dr-Irv commented Jul 14, 2021

simonjayhawkins commented Jul 14, 2021

jbrockmendel commented Jul 15, 2021

Dr-Irv commented Jul 15, 2021

simonjayhawkins commented Jul 15, 2021

Dr-Irv commented Jul 15, 2021

simonjayhawkins commented Jul 15, 2021

Dr-Irv commented Jul 15, 2021

simonjayhawkins commented Jul 15, 2021

Dr-Irv commented Jul 15, 2021

jbrockmendel commented Jul 15, 2021

simonjayhawkins commented Jul 19, 2021

TYP: Series.name is Hashable - should it be str? #42534

TYP: Series.name is Hashable - should it be str? #42534

Comments

Dr-Irv commented Jul 14, 2021

simonjayhawkins commented Jul 14, 2021

Dr-Irv commented Jul 14, 2021

simonjayhawkins commented Jul 14, 2021

Dr-Irv commented Jul 14, 2021

simonjayhawkins commented Jul 14, 2021

Dr-Irv commented Jul 14, 2021

simonjayhawkins commented Jul 14, 2021

jbrockmendel commented Jul 15, 2021

Dr-Irv commented Jul 15, 2021

simonjayhawkins commented Jul 15, 2021

Dr-Irv commented Jul 15, 2021

simonjayhawkins commented Jul 15, 2021

Dr-Irv commented Jul 15, 2021

simonjayhawkins commented Jul 15, 2021

Dr-Irv commented Jul 15, 2021

jbrockmendel commented Jul 15, 2021

simonjayhawkins commented Jul 19, 2021