-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
BUG: fix NDFrame.as_blocks() for sparse containers #6748
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Merged
+21
−6
Merged
Changes from all commits
Commits
File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
had meant to move the guts of this to internals anyhow; e.g. maybe
get_as_blocks
can return the dict of str(dtype) to block manager (already combined), so in generic needs to just iterate and _constructor....finalize...if it works better to refactor this later (in your internal changes checklist), ok 2
lmk
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This will most likely get overwritten during refactoring.
And it did occur to me that this is very much like "group blocks by ftype" operation that happens during merging/concatenation. And the implementation does look rather trivial:
Such one-liner functions are indeed a natural extension of blockmanager external API, but I'm not sure if they deserve to be part of it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok....maybe just reference this issue then in your refactoring...
will merge then
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've decided to separate this into a separate comment.
Which brings me to another topic, minimalism in API design. I mean, I tend to share the opinion that an API is perfect not when you can't add anything else but rather when you can't take anything from it. It happens quite often that an interface gets overburdened with those small details, e.g. it just hurts my eyes to wade through
numpy.ndarray
attrs/methods when I forget the exact spelling of the thing I want to use and I know is there:And, unfortunately, pandas containers add more on top of that:
The desire to have a bit of everything at your fingertip is tempting indeed, but 225 (two hundred!!) methods and properties is a bit too many for my liking. And it is a lot to wrap your head around when you're only starting.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
that's a valid point, except that you have search in the docs, lots of docs, and you can always start the possibilities with a letter (to narrow down the search).
pandas provides a lot of functionaility (as does numpy). not sure that this is a problem per se.
the alternative is cryptic functions that do too much with too much overloading. which IMHO is a bigger/worse problem.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like how overloading is handled in BLAS interfaces: you have a convention about prefixes/suffixes that correspond to certain variation of the core algorithm, so you as a programmer should make an educated guess about what the data is going to look like or just use the generic implementation.
As for the naming, I've gone through the reference a bit and found out a "light of hope": string functions that obviously form a cluster of functionality available for series containers are conveniently put under
.str
attribute, that's a very nice application of namespacing and that's a very viable approach to the API cluttering problem. Following the Zen, I'd suggest that Namespaces are one honking great idea -- let's do more of those!There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Whenever I monkey patch new functionality I try to create namespaces. Since we control the ipython autocomplete, we could always move methods from the root to namespaces. The original methods would still work, they just wouldn't autocomplete in the root. I do this for subclasses of pandas objects. If I created a subclass like
OHLC
, I'd rather only see the custom methods I made.