Skip to content

CLN: Move capitalize_first_letter to where it's used #57096

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 11 commits into from
Feb 2, 2024
9 changes: 6 additions & 3 deletions pandas/core/dtypes/dtypes.py
Original file line number Diff line number Diff line change
Expand Up @@ -62,8 +62,6 @@
is_list_like,
)

from pandas.util import capitalize_first_letter

if not pa_version_under10p1:
import pyarrow as pa

Expand Down Expand Up @@ -104,6 +102,11 @@
str_type = str


def capitalize_first_letter(s: str) -> str:
# For PeriodDtype.__eq__
return s[:1].upper() + s[1:]


class PandasExtensionDtype(ExtensionDtype):
"""
A np.dtype duck-typed class, suitable for holding a custom dtype.
Expand Down Expand Up @@ -1087,7 +1090,7 @@ def na_value(self) -> NaTType:

def __eq__(self, other: object) -> bool:
if isinstance(other, str):
return other in [self.name, capitalize_first_letter(self.name)]
return other in {self.name, capitalize_first_letter(self.name)}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, I'm still a bit confused.

First thing is that self.name is always going to be period[... in lower case, this is the implementation (just two methods above this part):

    @property
    def name(self) -> str_type:
        return f"period[{self._freqstr}]"

Then, if we want to compare strings, why not simply:

Suggested change
return other in {self.name, capitalize_first_letter(self.name)}
return other in {self.name, capitalize_first_letter(self.name)}
return other[0].lower() + other[1:] == self.name

This will make:

assert PeriodDtype('3D') == 'Period[3D]'
assert PeriodDtype('3D') == 'period[3D]'
assert PeriodDtype('3D') != 'Period[3d]'

Which for what understand is what we want, no?

Then, there is another case to consider:

>>> PeriodDtype(freq='1s1ms') == PeriodDtype(freq='1001ms')
True
>>> PeriodDtype(freq='1s1ms') == '1001ms'
False

Is this what we want? I personally think this is a bug. I'd say that a better implementation here is simply:

Suggested change
return other in {self.name, capitalize_first_letter(self.name)}
other = self.__class__(other)

So the comparison with string is consistent with the comparison with the types directly. If we want to do this, we probably don't want to raise when the string is not a valid frequency string, so I guess this makes more sense:

Suggested change
return other in {self.name, capitalize_first_letter(self.name)}
try:
other = self.__class__(other)
except ValueError:
return False

If we go with this option, we probably should add a test, as this would be a bug fix, not a clean up anymore.

What do you think? I think it's a quick change, but if you prefer that I open a new PR for this I can do it.

Copy link
Contributor

@natmokval natmokval Feb 1, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we can use pd.PeriodDtype(freq='1s1ms').freq.freqstrt to make the comparison

>>> pd.PeriodDtype(freq='1s1ms') == pd.PeriodDtype(freq='1001ms')
True
>>> pd.PeriodDtype(freq='1s1ms').freq.freqstr == '1001ms'
True

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I updated this PR to use return other[:1].lower() + other[1:] == self.name

For the PeriodDtype(freq='1s1ms') == '1001ms', I would prefer not to make a call about this case in this PR. My gut would prefer to be stricter in eq and require period[...], while just the frequency string can be compared exactly to freqstr as @natmokval suggested. For example, 1001ms could be "timedelta-like" for a user which shouldn't be made equal to a "period-like`"


return super().__eq__(other)

Expand Down
4 changes: 0 additions & 4 deletions pandas/util/__init__.py
Original file line number Diff line number Diff line change
Expand Up @@ -27,7 +27,3 @@ def __getattr__(key: str):

def __dir__():
return list(globals().keys()) + ["hash_array", "hash_pandas_object"]


def capitalize_first_letter(s: str) -> str:
return s[:1].upper() + s[1:]