-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
BUG: preserve freq in DTI/TDI factorize #33836
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from all commits
6d741b1
e466d71
a553174
23911ef
516d232
0e51930
678251d
abb5913
c96c1ac
7c66389
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -602,6 +602,15 @@ def _shallow_copy(self, values=None, name: Label = lib.no_default): | |
result._cache = cache | ||
return result | ||
|
||
def factorize(self, sort=False, na_sentinel=-1): | ||
if self.freq is not None and sort is False: | ||
# we are unique, so can short-circuit, also can preserve freq | ||
codes = np.arange(len(self), dtype=np.intp) | ||
return codes, self.copy() | ||
# TODO: In the sort=True case we could check for montonic_decreasing | ||
# and operate on self[::-1] | ||
return super().factorize(sort=sort, na_sentinel=na_sentinel) | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This seems to be duplicating the array version, can't that be reused? (directly, or by having There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Can you check this comment? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Its close; the main difference is that the Index version returns an Index for uniques There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. But that is handled in There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. im not clear on what youre suggesting? that we dont need to override this here at all? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Well, you might need to update (or we could decide that There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
If we're getting into Index-subclass-specific logic, I think that belongs on the Index subclass. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
I'd be OK with this There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Not necessarily, I would say, as it already has index-specific handling as well. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
There used to be, but it was being used in places where freq shouldnt be retained, so was changed as it was a footgun. |
||
|
||
# -------------------------------------------------------------------- | ||
# Set Operation Methods | ||
|
||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we don't want to expose factorize on Index at all.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is already there in the status quo. If you want to remove it, that needs a deprecation cycle, is a separate issue
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
pls show where
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hmm i guess its in the base class.
In any event. I don't think we actually want to change this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Comment in existing tests seems to think this is already the behavior: https://github.com/pandas-dev/pandas/pull/33836/files#diff-8cf55ac38b6988b09a7f7f5d7280eb0fL360
Also should improve perf since this short-circuits the expensive part of factorize