-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
CLN refactor core indexes #37582
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
CLN refactor core indexes #37582
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm one comment.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One comment, looks good to me.
pandas/core/indexes/datetimelike.py
Outdated
|
||
# quick check | ||
if len(self) and self.is_monotonic and i8[0] != iNaT: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we put both of tehse outside the try/except (I think this is ok to do), see L298
can you rebase this again to make sure failures are not systemtic |
The problem is that
gets to if self.hasnans:
if skipna:
min_stamp = self[~self._isnan].asi8.min()
else:
return self._na_value
else:
min_stamp = i8.min()
try:
return self._data._box_func(min_stamp)
except ValueError:
return self._na_value and throws a ValueError on latest commit addresses that |
pandas/core/indexes/datetimelike.py
Outdated
return self._data._box_func(max_stamp) | ||
except ValueError: | ||
|
||
# quick check |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we could just return self._data.max(...)
. Downside is that doesn't take advantage of caching
can you merge master ping on green |
pandas/core/indexes/base.py
Outdated
@@ -5847,7 +5834,7 @@ def trim_front(strings: List[str]) -> List[str]: | |||
Trims zeros and decimal points. | |||
""" | |||
trimmed = strings | |||
while len(strings) > 0 and all(x[0] == " " for x in trimmed): | |||
while trimmed and all(x.startswith(" ") for x in trimmed): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this looks like it changes the logic by checking trimmed instead of strings?
perf difference between startswith vs x[0] == " "?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
True, but AFAIKT this condition is only needed if trimmed
is an empty list (because all([])
will always be True
), so if we do an early return then we can remove it.
Regarding perf differences:
In [15]: %timeit 'foobarfdsfsdfsdafsdafhdlsafhgsdlafhsdlafhsda'.startswith('f')
110 ns ± 3.31 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)
In [16]: %timeit 'foobarfdsfsdfsdafsdafhdlsafhgsdlafhsdlafhsda'[0] == 'f'
24.1 ns ± 1.05 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)
so I'll go back to [0] == ' '
(cc @ivanovmg )
pandas/core/indexes/datetimes.py
Outdated
and (other.tz is None) | ||
or (self.tz is None) | ||
and (other.tz is not None) | ||
): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(self.tz is None) ^ (other.tz is None)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(self.tz is None ^ other.tz is not None)
? AFAIKT the check is to see if one of them is None
but the other one isn't - I've tried regrouping the parens to make it clearer anyway, thanks
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
was this not viable?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
as in,
if (self.tz is None) or (other.tz is None):
raise TypeError("Cannot join tz-naive with tz-aware DatetimeIndex")
?
I don't think that would work because we don't want to raise if both self.tz
and other.tz
are None
, just if one is but the other isn't
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not (self.tz is None) or (other.tz is None)
, but (self.tz is None) ^ (other.tz is None)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ah sorry, I didn't actually know that was a Python command - that should work then, thanks so much!
@MarcoGorelli if you rebase and fix this up we can get into 1.2 |
sure, have fixed conflicts and responded to review comments |
pandas/core/indexes/base.py
Outdated
@@ -3403,9 +3399,7 @@ def _convert_listlike_indexer(self, keyarr): | |||
keyarr : numpy.ndarray | |||
Return tuple-safe keys. | |||
""" | |||
if isinstance(keyarr, Index): | |||
pass | |||
else: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i think we did it this way to make coverage obvious
pandas/core/indexes/base.py
Outdated
return trimmed | ||
if not strings: | ||
return strings | ||
while all(x[0] == " " for x in strings): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is this going to break when one of the strings becomes empty?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, you're absolutely right, thanks! It fails on master
too, but still, worth fixing while we're modifying these lines
In [1]: from pandas.core.indexes.base import trim_front
In [2]: trim_front([' ', ' a'])
---------------------------------------------------------------------------
IndexError Traceback (most recent call last)
<ipython-input-2-1d23255f182c> in <module>
----> 1 trim_front([' ', ' a'])
~/pandas-dev/pandas/core/indexes/base.py in trim_front(strings)
5848 """
5849 trimmed = strings
-> 5850 while len(strings) > 0 and all(x[0] == " " for x in trimmed):
5851 trimmed = [x[1:] for x in trimmed]
5852 return trimmed
~/pandas-dev/pandas/core/indexes/base.py in <genexpr>(.0)
5848 """
5849 trimmed = strings
-> 5850 while len(strings) > 0 and all(x[0] == " " for x in trimmed):
5851 trimmed = [x[1:] for x in trimmed]
5852 return trimmed
IndexError: string index out of range
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking at this again, I don't think it's an issue, because trim_front
is only ever called with a list of strings which are all of the same length.
It's only ever called from pd.Index._format_with_header
:
result = trim_front(format_array(values, None, justify="left"))
and format_array
from pandas/io/formats/format.py
returns
fmt_obj.get_result()
which in turn returns
_make_fixed_width(fmt_values, self.justify)
Nonetheless I can make the condition
while all(strings) and all(x[0] == " " for x in strings):
and add a tiny test for which that'd be necessary
can you rebase |
thanks @MarcoGorelli |
Some refactorings found by Sourcery https://sourcery.ai/
I've removed the ones of the kind