-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
[ArrayManager] Add SingleArrayManager to back a Series #40152
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
||
def astype_array(values, dtype, copy): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This added code here (this function and the one below) is only temporary until #40141 is merged (that PR is moving the Block.astype implementation so it can be reused here)
pandas/core/internals/base.py
Outdated
@@ -98,3 +98,9 @@ def equals(self, other: object) -> bool: | |||
return False | |||
|
|||
return self._equal_values(other) | |||
|
|||
|
|||
class SingleManager(DataManager): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I initially added this class to have a common subclass for SingleBlockManager and SingleArrayManager.
But in the end, the only way that it is actually being used are a few places that do isinstance(mgr, SingleManager)
, while I could also replace those with isinstance(mgr, (SingleBlockManager, SingleArrayManager))
instead of having this additional class injected in the class hierarchy.
manager = get_option("mode.data_manager") | ||
if manager == "block": | ||
data = SingleBlockManager.from_array(data, index) | ||
elif manager == "array": | ||
data = SingleArrayManager.from_array(data, index) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is something I have below another time, so could move it to a helper function to deduplicate
pandas/core/series.py
Outdated
arr = self._mgr.array | ||
if isinstance(arr, np.ndarray): | ||
arr = PandasArray(arr) | ||
return arr |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could also move this to SingleArrayManager and add a array_values()
method there (for SingleBlockManager this lives on the Block, not the Manager)
182f75d
to
69cf7bb
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is there a way to just have a SingleManager class that works for block / array. a SingleBlockManager after all is a single array.
if not take_split_path and self.obj._mgr.blocks and self.ndim > 1: | ||
if ( | ||
not take_split_path | ||
and getattr(self.obj._mgr, "blocks", False) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ideally shouldn't reach into the mgr here (followon)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Indeed, I am planning to do some PRs to move some of the block-custom logic from indexing.py to the internals (but it's quite complex code ..)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think so, AFAIK. A Block and an array have a quite different API (also, the SingleArrayManager is inheriting quite a bit from ArrayManager, so it's more different from SingleBlockManager than just the code added in the SingleArrayManager class here). There certainly might be some trivial properties that could be shared (like |
pandas/_libs/reduction.pyx
Outdated
slice(len(vslider.buf))) | ||
if self.has_block: | ||
object.__setattr__(cached_typ._mgr._block, 'values', vslider.buf) | ||
object.__setattr__(cached_typ._mgr._block, 'mgr_locs', |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This change is being done separately in #40199
@jreback are you OK with this approach (see my response at #40152 (comment)) |
@@ -98,3 +98,7 @@ def equals(self, other: object) -> bool: | |||
return False | |||
|
|||
return self._equal_values(other) | |||
|
|||
|
|||
class SingleDataManager(DataManager): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
tiny!
thanks @jorisvandenbossche ok generally ok with adding code, but more code == more complexity, so ultimately we want to simplify as much as possible. of course we are in the middle of new development on array manager so ok for now. but we want to consolidate. |
xref #39146 (comment)
This implements a
SingleArrayManager
class for backing a Series, consistent with how we have a SingleBlockManager.cc @jreback @jbrockmendel