-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
ENH: ExtensionArray.fillna #19909
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Merged
Merged
ENH: ExtensionArray.fillna #19909
Changes from 1 commit
Commits
Show all changes
26 commits
Select commit
Hold shift + click to select a range
74614cb
ENH: ExtensionArray.fillna
TomAugspurger 67a19ba
REF: cast to object
TomAugspurger 280ac94
Revert "REF: cast to object"
TomAugspurger 6705712
Merge remote-tracking branch 'upstream/master' into fu1+fillna
TomAugspurger 69a0f9b
Simpler implementation
TomAugspurger 4d6846b
Support limit
TomAugspurger f3b81dc
Test backfill with limit
TomAugspurger 70efbb8
BUG: ensure array-like for indexer
TomAugspurger 8fc3851
Refactor tolist
TomAugspurger 39096e5
Test Series[ea].tolist
TomAugspurger 9e44f88
Merge remote-tracking branch 'upstream/master' into fu1+fillna
TomAugspurger e902f18
Fixed Categorical unwrapping
TomAugspurger 17126e6
updated
TomAugspurger 1106ef2
Back to getvalues list
TomAugspurger 744c381
Simplified
TomAugspurger 9a3fa55
As a method
TomAugspurger 2bc43bc
Merge remote-tracking branch 'upstream/master' into fu1+fillna
TomAugspurger f22fd7b
Removed tolist fully
TomAugspurger c26d261
Merge remote-tracking branch 'upstream/master' into fu1+fillna
TomAugspurger b342efe
Linting
TomAugspurger a609f48
Merge remote-tracking branch 'upstream/master' into fu1+fillna
TomAugspurger a35f93c
Just apply to objects
TomAugspurger 3f78dec
Merge remote-tracking branch 'upstream/master' into fu1+fillna
TomAugspurger 1160e15
Linting
TomAugspurger c9c7a48
Merge remote-tracking branch 'upstream/master' into fu1+fillna
TomAugspurger 05fced6
Merge remote-tracking branch 'upstream/master' into fu1+fillna
TomAugspurger File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a reason this does
self.get_values().tolist()
instead of directlyself.tolist()
?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think to avoid an infinite loop.
Categorical.tolist
callstolist
which callsCategorical.__iter__
.Need to do
get_values
break that cycle / extract the python scalar types.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It just feels that we are doing to much work. To iterate, we first create an array, iterate through it to create a list, and then iterate through the list ..
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So get_values either returns a array or a Datetime(like)Index. For the array we can just iterate over it, but for the Datetime(like)Index I think as well . The tolist implementation there is
list(self.astype(object))
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, that previous comment was of course wrong, as that is exactly checking the
get_values().tolist()
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, yes, therefore we need the tolist.
But then moving the logic is maybe not best, as then
tolist
would consume an iterator coming from a list ..It's all just complex with the different unboxing depending on the type :-)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Simplified slightly if you want to take one last look. Basically, we didn't have to worry about the different unboxing for different types, since
Categorical.get_values()
with datetimes returns a DatetimeIndex, and when we.tolist()
on that we get the right unboxing.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is the whole
.get_values()
fiasco (which I created a while back :). I think should be addressed sooner rather than later.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Any issues with the current implementation? Currently there's a tradeoff between how expensive
it = iter(Categorical)
is vs.next(it)
.Somewhere, we have to go from numpy scalars to Python scalars. The fastest way to do that is with
tolist()
, but that has upfront memory overhead. We could avoid that by doing it in thenext
, but that has time overhead forso that
next(it)
becomes much slower. I don't think there's much more to be done right now.