
Column lookups using str subclasses fail on DataFrames with DateTime indexes #37366


Closed
bkurtz opened this issue Oct 23, 2020 · 1 comment
Labels
Bug Datetime Datetime data dtype

bkurtz commented Oct 23, 2020

Consider the following code:

import pandas as pd

x = pd.DataFrame({"a": [1]}, index=pd.DatetimeIndex(["2020-10-22 01:21:00+00:00"]))
x["b"] = 2

class mystring(str):
    pass

x[mystring("b")]      # works: the column already exists
x[mystring("c")] = 3  # error: raises TypeError

We are specifically interested in this because we've been moving towards enum.Enum classes to remove column-name string constants from our code. This works well in many cases, but fails badly when there's a datetime index. As an example, we might do something like:

import enum
@enum.unique
class ColNames(str, enum.Enum):
    NAME = "name"
    POWER_W = "power (watts)"
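
To make the str-subclass aspect concrete, here is a small pandas-free sketch. It only shows that members of such an Enum are str instances whose exact type is not str, and that they still compare and hash like their values. The `row` dict is a made-up stand-in for any mapping keyed by plain strings, not a DataFrame.

```python
import enum


@enum.unique
class ColNames(str, enum.Enum):
    NAME = "name"
    POWER_W = "power (watts)"


# A member is a str instance, but its exact type is the Enum subclass.
member = ColNames.POWER_W
is_str_instance = isinstance(member, str)
is_exact_str = type(member) is str

# It compares and hashes like its value, so plain mapping lookups work.
row = {"name": "pump", "power (watts)": 750}  # illustrative data only
looked_up = row[ColNames.POWER_W]

print(is_str_instance, is_exact_str, looked_up)
```

Any code path that checks for exactly str rather than using isinstance will therefore treat these members differently from ordinary string constants.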

As far as I can tell, what's going on is:

  1. When indexing, pandas (for reasons I'm not sure I understand) first tries to interpret the key as a row slice.
  2. That function happily accepts str subclasses (and works as expected if the column already exists).
  3. But with a DatetimeIndex, it eventually reaches a point where str subclasses are apparently no longer accepted, and a TypeError is raised.
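
The chain described in these three steps can be imitated with a toy model. This is not pandas code: `parse_strict` and `toy_setitem` are hypothetical stand-ins, and the assumption that the inner check rejects anything that is not exactly str is only one way such a TypeError could arise. The sketch covers just the assignment of a new column.

```python
class MyString(str):
    """A trivial str subclass, standing in for the issue's `mystring`."""


def parse_strict(key):
    # Hypothetical low-level parser: only an exact str is accepted.
    if type(key) is not str:
        raise TypeError(f"expected str, got {type(key).__name__}")
    # In this toy model no key is ever a valid timestamp.
    raise KeyError(key)


def toy_setitem(columns, key, value):
    # Steps 1 and 2: try the key as a row slice first; the entry check
    # is loose (isinstance), so str subclasses get through.
    if isinstance(key, str):
        try:
            parse_strict(key)  # Step 3: the strict check deeper down
        except KeyError:
            pass  # known failure mode: fall through to the column path
    columns[key] = value


columns = {}
toy_setitem(columns, "c", 3)  # plain str: falls through, column is set
try:
    toy_setitem(columns, MyString("d"), 4)
    subclass_failed = False
except TypeError:
    subclass_failed = True  # the TypeError is not among the caught types
print(columns, subclass_failed)
```

In this model the plain string reaches the column path because its failure is one the caller expects, while the subclass triggers an exception type the caller does not handle.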

I see two easy solutions (i.e. that I could easily submit a PR for):

  1. Convert the key to a normal string either here or in the subclass implementations thereof
  2. Watch for TypeErrors here in addition to the other known types there, allowing it to gracefully continue on and try the key as a column name (which will then succeed)
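
As a rough illustration of solution 1, the conversion itself could look like the sketch below. `to_plain_str` is a hypothetical helper name, not an existing pandas function, and where it would be called inside pandas is left open here.

```python
import enum


class ColNames(str, enum.Enum):
    POWER_W = "power (watts)"


def to_plain_str(key):
    """Hypothetical helper: return an exact str for str subclasses."""
    if isinstance(key, str) and type(key) is not str:
        # str.__str__ bypasses any overridden __str__ (Enum's, for
        # example) and yields the underlying text as an exact str.
        return str.__str__(key)
    return key  # exact str and non-string keys pass through unchanged


plain = to_plain_str(ColNames.POWER_W)
print(type(plain) is str, plain)
```

Calling `str.__str__` directly rather than `str(key)` matters for the Enum case, since an Enum member's own `__str__` may return the member name instead of its value.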

I can also envision fixing this by:

  3. Extending the low-level parsing function to work better with this case, or
  4. Finishing the deprecation of row lookups using regular frame[indx] notation

However, solution 3 is outside my comfort zone, and 4 seems like it might be more involved anyway.

It feels to me like solution 3 is the "correct" resolution to this problem, and that 1 is almost as good (if applied in all the right places). Since it's not my project, though, I wanted to get feedback before jumping into any of these.

mortada added 3 commits to mortada/pandas that referenced this issue May 10, 2021
jreback added Bug Datetime Datetime data dtype labels May 10, 2021
jreback added this to the 1.3 milestone May 10, 2021
lithomas1 (Member) commented

Fixed by #41406

JulianWgs pushed a commit to JulianWgs/pandas that referenced this issue Jul 3, 2021