CLN: Move capitalize_first_letter to where it's used #57096

mroeschke · 2024-01-26T23:08:47Z

No description provided.

datapythonista · 2024-01-29T04:41:13Z

pandas/core/dtypes/dtypes.py

@@ -1087,7 +1085,7 @@ def na_value(self) -> NaTType:

    def __eq__(self, other: object) -> bool:
        if isinstance(other, str):
-            return other in [self.name, capitalize_first_letter(self.name)]
+            return other in {self.name, self.name.title()}


The equivalent of the deleted function is str.capitalize(). str.title() makes the first letter of every word uppercase, not only the first letter of the whole string. Not sure if in this case it makes a difference, but maybe worth keeping the same exact behavior.

Suggested change

return other in {self.name, self.name.title()}

return other in {self.name, self.name.capitalize()}

In any case, nice clean up.

Looks like neither is correct

In [2]: "period[D]".capitalize() Out[2]: 'Period[d]' # we don't want to lowercase the [D] In [4]: "Period[us]".title() Out[4]: 'Period[Us]' # we don't want to uppercase the [Us]

I found string.capwords works for what we want to compare here so I'll use that

I like the current implementation better. But I think it simply does a case insensitive comparison, which is fine. But if that's what we really want, I'd just apply .lower() to both sides. The code with string.capwords does exactly the same I think, but when reading it feels like there must be a reason to use string.capwords in particular, so I find it a bit misleading.

@MarcoGorelli @natmokval any thoughts on this?

string.capwords allows a case insensitive match for the first letter only which is what we want. lower would do a case insensitive match on the whole string which is incorrect because we don't want to do that for the frequency in the brackets.

seems like, if we replace string.capwords with .lower all tests pass, but I prefer not to change the frequency in the brackets.
Thouhg for dtype we use both "period[D]" and "Period[D]", it's better to keep the correct case for the frequency.

The reason: currently we are working on creating a consistent naming for aliases (xref #56346).
Deprecated lowercase strings in favour of uppercase strings denoting:

period aliases daily, weekly, monthly and bigger frequency

offsets aliases representing DateOffset subclasses that are a week or bigger (Week, MonthBegin, MonthEnd, etc.)

timedelta units for week, month or bigger

Deprecated uppercase strings in favour of lowercase strings denoting:

period aliases hourly, minutely or smaller frequency

offsets aliases representing DateOffset subclasses that are an hour or smaller (Hour, Minute, etc.)

timedelta units for hour, minute or smaller

For some frequencies the deprecation is done, for example we show FutureWarning if freq="q"

>>> pd.Period("2015-01-01", freq="q") <stdin>:1: FutureWarning: 'q' is deprecated and will be removed in a future version, please use 'Q' instead.

For others, for example Day/Daily we can use both 'D' and 'd'

>>> pd.Period("2015-01-01", freq="d") Period('2015-01-01', 'D')

Sorry to insist @mroeschke, but I don't think the code in the PR does what you think it does (or maybe I'm missing something, but I can't see what).

string.capwords is making the comparison fully case insensitive:

>>> string.capwords('Period[US]') == string.capwords('Period[us]') True

Any case variation of period[us] would be converted to Period[us] and the comparison will be true regardless on which part of the string has different capitalization.

If the resulting string is going to be used for anything else than comparing, I agree the exact function used is relevant. But in this case, any of .upper(), .lower(), .title(), string.capwords()... would be equivalent, and I think .lower() is the one that makes the code clearer.

Or maybe we know that one of the sides of the comparison is in the Period[ms] or period[ms] capitalization? Then we should remove the call to string.capwords in that side of the comparison, and string.capwords will do what we want.

Ah OK thanks for talking that through. I understand the situation clearer now. I think with the requirement to keep the frequency in the brackets unmodified, I think the only solution is to use the original helper function to the latest commit changed it back to that

datapythonista · 2024-02-01T11:30:33Z

pandas/core/dtypes/dtypes.py

@@ -1087,7 +1090,7 @@ def na_value(self) -> NaTType:

    def __eq__(self, other: object) -> bool:
        if isinstance(other, str):
-            return other in [self.name, capitalize_first_letter(self.name)]
+            return other in {self.name, capitalize_first_letter(self.name)}


Sorry, I'm still a bit confused.

First thing is that self.name is always going to be period[... in lower case, this is the implementation (just two methods above this part):

@property def name(self) -> str_type: return f"period[{self._freqstr}]"

Then, if we want to compare strings, why not simply:

Suggested change

return other in {self.name, capitalize_first_letter(self.name)}

return other in {self.name, capitalize_first_letter(self.name)}

return other[0].lower() + other[1:] == self.name

This will make:

assert PeriodDtype('3D') == 'Period[3D]' assert PeriodDtype('3D') == 'period[3D]' assert PeriodDtype('3D') != 'Period[3d]'

Which for what understand is what we want, no?

Then, there is another case to consider:

>>> PeriodDtype(freq='1s1ms') == PeriodDtype(freq='1001ms') True >>> PeriodDtype(freq='1s1ms') == '1001ms' False

Is this what we want? I personally think this is a bug. I'd say that a better implementation here is simply:

Suggested change

return other in {self.name, capitalize_first_letter(self.name)}

other = self.__class__(other)

So the comparison with string is consistent with the comparison with the types directly. If we want to do this, we probably don't want to raise when the string is not a valid frequency string, so I guess this makes more sense:

Suggested change

return other in {self.name, capitalize_first_letter(self.name)}

try:

other = self.__class__(other)

except ValueError:

return False

If we go with this option, we probably should add a test, as this would be a bug fix, not a clean up anymore.

What do you think? I think it's a quick change, but if you prefer that I open a new PR for this I can do it.

Maybe we can use pd.PeriodDtype(freq='1s1ms').freq.freqstrt to make the comparison

>>> pd.PeriodDtype(freq='1s1ms') == pd.PeriodDtype(freq='1001ms') True >>> pd.PeriodDtype(freq='1s1ms').freq.freqstr == '1001ms' True

I updated this PR to use return other[:1].lower() + other[1:] == self.name

For the PeriodDtype(freq='1s1ms') == '1001ms', I would prefer not to make a call about this case in this PR. My gut would prefer to be stricter in eq and require period[...], while just the frequency string can be compared exactly to freqstr as @natmokval suggested. For example, 1001ms could be "timedelta-like" for a user which shouldn't be made equal to a "period-like`"

datapythonista

Looks good to me. Nice clean up, thanks for taking care of this.

I don't think the other option I gave would equal to true when the dtype is a timedelta or anything that it's not a PeriodDtype, but happy to get this merge, and we can consider the other case separately if it makes sense.

@natmokval if you want to have a look here and add any comment before we merge...

natmokval · 2024-02-02T13:38:38Z

lgtm, thanks @mroeschke for the clean up.

Use str.captialize instead of helper

9a5ffd2

mroeschke added the Clean label Jan 26, 2024

Use title instead

00f0992

datapythonista reviewed Jan 29, 2024

View reviewed changes

Use string capwords

b775e21

mroeschke changed the title ~~CLN: Use str.captialize instead of helper~~ CLN: Use string.capwords instead of helper Jan 29, 2024

mroeschke added this to the 3.0 milestone Jan 29, 2024

mroeschke added 5 commits January 30, 2024 12:02

Merge remote-tracking branch 'upstream/main' into cln/helper

088b988

Merge remote-tracking branch 'upstream/main' into cln/helper

9208bee

Pylint

be1df81

Merge remote-tracking branch 'upstream/main' into cln/helper

61e6b14

Just use helper function

19fa575

mroeschke changed the title ~~CLN: Use string.capwords instead of helper~~ CLN: Move capitalize_first_letter to where it's used Jan 31, 2024

Merge remote-tracking branch 'upstream/main' into cln/helper

2d65f02

datapythonista reviewed Feb 1, 2024

View reviewed changes

mroeschke added 2 commits February 1, 2024 10:59

Merge remote-tracking branch 'upstream/main' into cln/helper

fd76f47

Remove helper function

8ce54ce

datapythonista approved these changes Feb 2, 2024

View reviewed changes

datapythonista merged commit 680b215 into pandas-dev:main Feb 2, 2024

mroeschke deleted the cln/helper branch February 2, 2024 17:42

pmhatre1 pushed a commit to pmhatre1/pandas-pmhatre1 that referenced this pull request May 7, 2024

CLN: Move capitalize_first_letter to where it's used (pandas-dev#57096)

484b328

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Uh oh!

CLN: Move capitalize_first_letter to where it's used #57096

CLN: Move capitalize_first_letter to where it's used #57096

Uh oh!

mroeschke commented Jan 26, 2024

Uh oh!

datapythonista Jan 29, 2024

Uh oh!

mroeschke Jan 29, 2024

Uh oh!

datapythonista Jan 30, 2024

Uh oh!

mroeschke Jan 30, 2024

Uh oh!

natmokval Jan 30, 2024

Uh oh!

datapythonista Jan 31, 2024

Uh oh!

mroeschke Jan 31, 2024

Uh oh!

datapythonista Feb 1, 2024

Uh oh!

natmokval Feb 1, 2024 •

edited

Loading

Uh oh!

mroeschke Feb 1, 2024

Uh oh!

datapythonista left a comment

Uh oh!

natmokval commented Feb 2, 2024

Uh oh!

Uh oh!

	return other in {self.name, self.name.title()}
	return other in {self.name, self.name.capitalize()}

	return other in {self.name, capitalize_first_letter(self.name)}
	return other in {self.name, capitalize_first_letter(self.name)}
	return other[0].lower() + other[1:] == self.name

	return other in {self.name, capitalize_first_letter(self.name)}
	other = self.__class__(other)

-            return other in {self.name, capitalize_first_letter(self.name)}
+            try:
+                other = self.__class__(other)
+            except ValueError:
+                return False

Uh oh!

CLN: Move capitalize_first_letter to where it's used #57096

CLN: Move capitalize_first_letter to where it's used #57096

Uh oh!

Conversation

mroeschke commented Jan 26, 2024

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

natmokval Feb 1, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

datapythonista left a comment

Choose a reason for hiding this comment

Uh oh!

natmokval commented Feb 2, 2024

Uh oh!

Uh oh!

natmokval Feb 1, 2024 •

edited

Loading