[ArrayManager] Add SingleArrayManager to back a Series #40152

Merged
merged 21 commits into from
Mar 5, 2021
Changes from 6 commits
69cf7bb
[ArrayManager] Add SingleArrayManager to back a Series
jorisvandenbossche Mar 1, 2021
758df4a
fix fast_xs
jorisvandenbossche Mar 2, 2021
2c62473
fix astype tests
jorisvandenbossche Mar 2, 2021
50fe6e0
add SingleArrayManager.set_values - fix apply
jorisvandenbossche Mar 2, 2021
8747d25
Merge remote-tracking branch 'upstream/master' into am-series
jorisvandenbossche Mar 2, 2021
dd2511a
use slice to avoid copy
jorisvandenbossche Mar 2, 2021
0412e36
Merge remote-tracking branch 'upstream/master' into am-series
jorisvandenbossche Mar 2, 2021
805c866
Merge remote-tracking branch 'upstream/master' into am-series
jorisvandenbossche Mar 3, 2021
e7e0cad
avoid PandasDtype in construction
jorisvandenbossche Mar 3, 2021
c3c00f4
fix groupby
jorisvandenbossche Mar 3, 2021
022da5a
fix cumulative ops
jorisvandenbossche Mar 3, 2021
ffcbf37
fix putmask
jorisvandenbossche Mar 3, 2021
4d8a029
rename SingleManager -> SingleDataManager, add typing union alias
jorisvandenbossche Mar 3, 2021
606f467
move logic from Series.array into manager
jorisvandenbossche Mar 3, 2021
26186a3
fixup return
jorisvandenbossche Mar 3, 2021
0f4bf41
correct check for PandasArray
jorisvandenbossche Mar 3, 2021
2b2fa84
fix typing
jorisvandenbossche Mar 3, 2021
22295e0
Merge remote-tracking branch 'upstream/master' into am-series
jorisvandenbossche Mar 3, 2021
3e13fed
Merge remote-tracking branch 'upstream/master' into am-series
jorisvandenbossche Mar 4, 2021
44bd1f4
Merge remote-tracking branch 'upstream/master' into am-series
jorisvandenbossche Mar 4, 2021
26648c7
simplify py fallback function
jorisvandenbossche Mar 4, 2021
3 changes: 3 additions & 0 deletions .github/workflows/ci.yml
@@ -161,6 +161,9 @@ jobs:
         pytest pandas/tests/resample/ --array-manager
         pytest pandas/tests/reshape/merge --array-manager

+        pytest pandas/tests/series/methods --array-manager
+        pytest pandas/tests/series/test_* --array-manager
+
         # indexing subset (temporary since other tests don't pass yet)
         pytest pandas/tests/frame/indexing/test_indexing.py::TestDataFrameIndexing::test_setitem_boolean --array-manager
         pytest pandas/tests/frame/indexing/test_where.py --array-manager
3 changes: 2 additions & 1 deletion pandas/core/generic.py
@@ -138,6 +138,7 @@
 from pandas.core.internals import (
     ArrayManager,
     BlockManager,
+    SingleArrayManager,
 )
 from pandas.core.missing import find_valid_index
 from pandas.core.ops import align_method_FRAME
@@ -5562,7 +5563,7 @@ def _protect_consolidate(self, f):
         Consolidate _mgr -- if the blocks have changed, then clear the
         cache
         """
-        if isinstance(self._mgr, ArrayManager):
+        if isinstance(self._mgr, (ArrayManager, SingleArrayManager)):
             return f()
         blocks_before = len(self._mgr.blocks)
         result = f()
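The effect of the widened `isinstance` check can be sketched outside pandas. The classes and the `clear_cache` callback below are hypothetical stand-ins, not the real managers: a manager without blocks skips the before/after block count entirely, while a block-based one clears its cache only when the number of blocks changed.

```python
class ToyBlockManager:
    """Hypothetical stand-in for a block-based manager."""

    def __init__(self, blocks):
        self.blocks = list(blocks)


class ToyArrayManager:
    """Hypothetical stand-in for a 2D array-based manager (no `blocks`)."""


class ToySingleArrayManager:
    """Hypothetical stand-in for a 1D array-based manager (no `blocks`)."""


def protect_consolidate(mgr, f, clear_cache):
    # Array-based managers have nothing to consolidate: just run f.
    if isinstance(mgr, (ToyArrayManager, ToySingleArrayManager)):
        return f()
    # Block-based managers: clear the cache if f changed the block count.
    blocks_before = len(mgr.blocks)
    result = f()
    if len(mgr.blocks) != blocks_before:
        clear_cache()
    return result
```

Without the second class in the tuple, a 1D array-based manager would fall through to `len(mgr.blocks)` and fail on the missing attribute.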
6 changes: 5 additions & 1 deletion pandas/core/indexing.py
@@ -1575,7 +1575,11 @@ def _setitem_with_indexer(self, indexer, value, name="iloc"):

         # if there is only one block/type, still have to take split path
         # unless the block is one-dimensional or it can hold the value
-        if not take_split_path and self.obj._mgr.blocks and self.ndim > 1:
+        if (
+            not take_split_path
+            and getattr(self.obj._mgr, "blocks", False)

Contributor

ideally shouldn't reach into the mgr here (follow-on)

Member Author

Indeed, I am planning to do some PRs to move some of the block-custom logic from indexing.py to the internals (but it's quite complex code ..)

Member

> Indeed, I am planning to do some PRs to move some of the block-custom logic from indexing.py to the internals (but it's quite complex code ..)

I've been looking at this for a while; it (at least my preferred approach) is blocked by #39510 and #39163

+            and self.ndim > 1
+        ):
             # in case of dict, keys are indices
             val = list(value.values()) if isinstance(value, dict) else value
             blk = self.obj._mgr.blocks[0]
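The `getattr(..., "blocks", False)` guard in this hunk can be illustrated in isolation. This is a minimal sketch with made-up classes (`ToyBlockManager`, `ToyArrayManager`) and a made-up helper name; it only shows why the default argument matters when some managers have no `blocks` attribute.

```python
class ToyBlockManager:
    """Hypothetical manager that stores its data in `blocks`."""

    def __init__(self, blocks):
        self.blocks = tuple(blocks)


class ToyArrayManager:
    """Hypothetical manager that stores its data in `arrays` (no `blocks`)."""

    def __init__(self, arrays):
        self.arrays = list(arrays)


def has_blocks(mgr) -> bool:
    # With a default, getattr returns False for managers lacking `blocks`
    # instead of raising AttributeError, so the condition short-circuits.
    return bool(getattr(mgr, "blocks", False))
```

An empty `blocks` tuple and a missing `blocks` attribute both evaluate as falsy here, which is the same outcome the original `self.obj._mgr.blocks` truthiness test gave for block-based managers.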
12 changes: 10 additions & 2 deletions pandas/core/internals/__init__.py
@@ -1,5 +1,11 @@
-from pandas.core.internals.array_manager import ArrayManager
-from pandas.core.internals.base import DataManager
+from pandas.core.internals.array_manager import (
+    ArrayManager,
+    SingleArrayManager,
+)
+from pandas.core.internals.base import (
+    DataManager,
+    SingleManager,
+)
 from pandas.core.internals.blocks import (  # io.pytables, io.packers
     Block,
     CategoricalBlock,
@@ -35,6 +41,8 @@
     "ArrayManager",
     "BlockManager",
     "SingleBlockManager",
+    "SingleManager",
+    "SingleArrayManager",
     "concatenate_managers",
     # those two are preserved here for downstream compatibility (GH-33892)
     "create_block_manager_from_arrays",
216 changes: 180 additions & 36 deletions pandas/core/internals/array_manager.py
Original file line number Diff line number Diff line change
Expand Up @@ -4,7 +4,6 @@
from __future__ import annotations

from typing import (
TYPE_CHECKING,
Any,
Callable,
List,
Expand Down Expand Up @@ -34,6 +33,7 @@
)
from pandas.core.dtypes.common import (
is_bool_dtype,
is_datetime64_ns_dtype,
is_dtype_equal,
is_extension_array_dtype,
is_numeric_dtype,
Expand All @@ -54,7 +54,13 @@
)

import pandas.core.algorithms as algos
from pandas.core.arrays import ExtensionArray
from pandas.core.arrays import (
DatetimeArray,
ExtensionArray,
IntervalArray,
PeriodArray,
TimedeltaArray,
)
from pandas.core.arrays.sparse import SparseDtype
from pandas.core.construction import (
ensure_wrapped_if_datetimelike,
Expand All @@ -66,13 +72,12 @@
Index,
ensure_index,
)
from pandas.core.internals.base import DataManager
from pandas.core.internals.base import (
DataManager,
SingleManager,
)
from pandas.core.internals.blocks import make_block

if TYPE_CHECKING:
from pandas.core.internals.managers import SingleBlockManager


T = TypeVar("T", bound="ArrayManager")


@@ -114,6 +119,7 @@ def __init__(

         if verify_integrity:
             self._axes = [ensure_index(ax) for ax in axes]
+            self.arrays = [ensure_wrapped_if_datetimelike(arr) for arr in arrays]
             self._verify_integrity()

     def make_empty(self: T, axes=None) -> T:
@@ -126,7 +132,7 @@ def make_empty(self: T, axes=None) -> T:

     @property
     def items(self) -> Index:
-        return self._axes[1]
+        return self._axes[-1]

     @property
     def axes(self) -> List[Index]:  # type: ignore[override]
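The switch from `_axes[1]` to `_axes[-1]` in `items` presumably exists so the property also works for the 1D subclass, whose `_axes` list holds a single entry. A small sketch with plain lists (the axis labels and helper names are placeholders, not pandas objects):

```python
def items_fixed_position(axes):
    # Old form: only valid when there are two axes.
    return axes[1]


def items_last(axes):
    # New form: valid for one axis or two.
    return axes[-1]


frame_axes = ["row_index", "columns"]  # 2D manager: two axes
series_axes = ["row_index"]            # 1D manager: one axis
```

For the two-element list both helpers return the same entry; for the one-element list only `items_last` succeeds, while `items_fixed_position` raises `IndexError`.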
@@ -401,7 +407,10 @@ def apply_with_block(self: T, f, align_keys=None, **kwargs) -> T:
                         # The caller is responsible for ensuring that
                         # obj.axes[-1].equals(self.items)
                         if obj.ndim == 1:
-                            kwargs[k] = obj.iloc[[i]]
+                            if self.ndim == 2:
+                                kwargs[k] = obj.iloc[slice(i, i + 1)]._values
+                            else:
+                                kwargs[k] = obj.iloc[:]._values
                         else:
                             kwargs[k] = obj.iloc[:, [i]]._values
                     else:
@@ -414,15 +423,21 @@ def apply_with_block(self: T, f, align_keys=None, **kwargs) -> T:
             elif arr.dtype.kind == "m" and not isinstance(arr, np.ndarray):
                 # TimedeltaArray needs to be converted to ndarray for TimedeltaBlock
                 arr = arr._data  # type: ignore[union-attr]
-            if isinstance(arr, np.ndarray):
-                arr = np.atleast_2d(arr)
-            block = make_block(arr, placement=slice(0, 1, 1), ndim=2)
+
+            if self.ndim == 2:
+                if isinstance(arr, np.ndarray):
+                    arr = np.atleast_2d(arr)
+                block = make_block(arr, placement=slice(0, 1, 1), ndim=2)
+            else:
+                block = make_block(arr, placement=slice(0, len(self), 1), ndim=1)
+
             applied = getattr(block, f)(**kwargs)
             if isinstance(applied, list):
                 applied = applied[0]
             arr = applied.values
-            if isinstance(arr, np.ndarray):
-                arr = arr[0, :]
+            if self.ndim == 2:
+                if isinstance(arr, np.ndarray):
+                    arr = arr[0, :]
             result_arrays.append(arr)

         return type(self)(result_arrays, self._axes)
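One commit in this PR is titled "use slice to avoid copy", which plausibly refers to replacing `obj.iloc[[i]]` with `obj.iloc[slice(i, i + 1)]` in this hunk. The general idea, that a slice can hand back a view while selecting by an explicit list of positions builds a new object, can be shown with an analogy from the standard library. Nothing below is pandas or NumPy code, and the helper names are made up for illustration.

```python
def select_by_slice(buf: bytearray, i: int) -> memoryview:
    # Slicing a memoryview returns another view on the same memory.
    return memoryview(buf)[i : i + 1]


def select_by_positions(buf: bytearray, positions: list) -> bytes:
    # Picking elements one by one builds a new, independent object.
    return bytes(buf[p] for p in positions)


buf = bytearray([1, 2, 3])
view = select_by_slice(buf, 1)
copy = select_by_positions(buf, [1])
buf[1] = 9  # mutate the original after both selections were made
```

After the mutation, `view` reflects the new value because it shares memory with `buf`, while `copy` still holds the old one.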
@@ -716,32 +731,24 @@ def fast_xs(self, loc: int) -> ArrayLike:
         """
         dtype = _interleaved_dtype(self.arrays)

-        if isinstance(dtype, SparseDtype):
-            temp_dtype = dtype.subtype
-        elif isinstance(dtype, PandasDtype):
-            temp_dtype = dtype.numpy_dtype
-        elif is_extension_array_dtype(dtype):
-            temp_dtype = "object"
-        elif is_dtype_equal(dtype, str):
-            temp_dtype = "object"
-        else:
-            temp_dtype = dtype
-
-        result = np.array([arr[loc] for arr in self.arrays], dtype=temp_dtype)
+        values = [arr[loc] for arr in self.arrays]
         if isinstance(dtype, ExtensionDtype):
-            result = dtype.construct_array_type()._from_sequence(result, dtype=dtype)
+            result = dtype.construct_array_type()._from_sequence(values, dtype=dtype)
+        # for datetime64/timedelta64, the np.ndarray constructor cannot handle pd.NaT
+        elif is_datetime64_ns_dtype(dtype):
+            result = DatetimeArray._from_sequence(values, dtype=dtype)._data
+        elif is_timedelta64_ns_dtype(dtype):
+            result = TimedeltaArray._from_sequence(values, dtype=dtype)._data
+        else:
+            result = np.array(values, dtype=dtype)
         return result

-    def iget(self, i: int) -> SingleBlockManager:
+    def iget(self, i: int) -> SingleArrayManager:
         """
-        Return the data as a SingleBlockManager.
+        Return the data as a SingleArrayManager.
         """
-        from pandas.core.internals.managers import SingleBlockManager
-
         values = self.arrays[i]
-        block = make_block(values, placement=slice(0, len(values)), ndim=1)
-
-        return SingleBlockManager(block, self._axes[0])
+        return SingleArrayManager([values], [self._axes[0]])

     def iget_values(self, i: int) -> ArrayLike:
         """
@@ -901,8 +908,8 @@ def _reindex_indexer(
         if not allow_dups:
             self._axes[axis]._validate_can_reindex(indexer)

-        # if axis >= self.ndim:
-        #     raise IndexError("Requested axis not found in manager")
+        if axis >= self.ndim:
+            raise IndexError("Requested axis not found in manager")

         if axis == 1:
             new_arrays = []
@@ -1031,3 +1038,140 @@ def _interleaved_dtype(blocks) -> Optional[DtypeObj]:
         return None

     return find_common_type([b.dtype for b in blocks])
+
+
+class SingleArrayManager(ArrayManager, SingleManager):
+
+    __slots__ = [
+        "_axes",  # private attribute, because 'axes' has different order, see below
+        "arrays",
+    ]
+
+    arrays: List[Union[np.ndarray, ExtensionArray]]
+    _axes: List[Index]
+
+    ndim = 1
+
+    def __init__(
+        self,
+        arrays: Union[np.ndarray, ExtensionArray],
+        axes: List[Index],
+        verify_integrity: bool = True,
+    ):
+        self._axes = axes
+        self.arrays = arrays
+
+        if verify_integrity:
+            assert len(axes) == 1
+            assert len(arrays) == 1
+            self._axes = [ensure_index(ax) for ax in self._axes]
+            self.arrays = [ensure_wrapped_if_datetimelike(arr) for arr in arrays]
+            self._verify_integrity()
+
+    def _verify_integrity(self) -> None:
+        (n_rows,) = self.shape
+        assert len(self.arrays) == 1
+        assert len(self.arrays[0]) == n_rows
+
+    @staticmethod
+    def _normalize_axis(axis):
+        return axis
+
+    def make_empty(self: T, axes=None) -> T:
+        """Return an empty ArrayManager with index/array of length 0"""
+        if axes is None:
+            axes = [Index([], dtype=object)]
+        array = np.array([], dtype=self.dtype)
+        return type(self)([array], axes)
+
+    @classmethod
+    def from_array(cls, array, index):
+        return cls([array], [index])
+
+    @property
+    def axes(self):
+        return self._axes
+
+    @property
+    def index(self) -> Index:
+        return self._axes[0]
+
+    @property
+    def array(self):
+        return self.arrays[0]
+
+    @property
+    def dtype(self):
+        return self.array.dtype
+
+    def external_values(self):
+        """The array that Series.values returns"""
+        if isinstance(self.array, (PeriodArray, IntervalArray)):
+            return self.array.astype(object)
+        elif isinstance(self.array, (DatetimeArray, TimedeltaArray)):
+            return self.array._data
+        else:
+            return self.array
+
+    def internal_values(self):
+        """The array that Series._values returns"""
+        return self.array
+
+    @property
+    def _can_hold_na(self) -> bool:
+        if isinstance(self.array, np.ndarray):
+            return self.array.dtype.kind not in ["b", "i", "u"]
+        else:
+            # ExtensionArray
+            return self.array._can_hold_na
+
+    @property
+    def is_single_block(self) -> bool:
+        return True
+
+    def _consolidate_check(self):
+        pass
+
+    def get_slice(self, slobj: slice, axis: int = 0) -> SingleArrayManager:
+        if axis >= self.ndim:
+            raise IndexError("Requested axis not found in manager")
+
+        new_array = self.array[slobj]
+        new_index = self.index[slobj]
+        return type(self)([new_array], [new_index])
+
+    def apply(self, func, **kwargs):
+        if callable(func):
+            new_array = func(self.array, **kwargs)
+        else:
+            new_array = getattr(self.array, func)(**kwargs)
+        return type(self)([new_array], self._axes)
+
+    def setitem(self, indexer, value):
+        return self.apply_with_block("setitem", indexer=indexer, value=value)
+
+    def idelete(self, indexer):
+        """
+        Delete selected locations in-place (new array, same ArrayManager)
+        """
+        to_keep = np.ones(self.shape[0], dtype=np.bool_)
+        to_keep[indexer] = False
+
+        self.arrays = [self.arrays[0][to_keep]]
+        self._axes = [self._axes[0][to_keep]]
+
+    def _get_data_subset(self, predicate: Callable) -> ArrayManager:
+        # used in get_numeric_data / get_bool_data
+        if predicate(self.array):
+            return type(self)(self.arrays, self._axes, verify_integrity=False)
+        else:
+            return self.make_empty()
+
+    def set_values(self, values: ArrayLike):
+        """
+        Set (replace) the values of the SingleArrayManager in place.
+
+        Use at your own risk! This does not check if the passed values are
+        valid for the current SingleArrayManager (length, dtype, etc).
+        """
+        self.arrays[0] = values
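To make the shape of the new class easier to follow, here is a list-backed toy version of four of its methods (`get_slice`, `apply`, `idelete`, `_get_data_subset`). It is a sketch only: `ToySingleArrayManager` is a hypothetical name, plain Python lists stand in for the NumPy/extension array and the `Index`, and none of the pandas validation or dtype handling is modelled.

```python
from typing import Callable, List, Sequence


class ToySingleArrayManager:
    """Hypothetical 1D manager: exactly one value list and one label list."""

    ndim = 1

    def __init__(self, arrays: List[list], axes: List[list]):
        assert len(arrays) == 1 and len(axes) == 1
        assert len(arrays[0]) == len(axes[0])
        self.arrays = arrays
        self._axes = axes

    @property
    def array(self) -> list:
        return self.arrays[0]

    @property
    def index(self) -> list:
        return self._axes[0]

    def get_slice(self, slobj: slice, axis: int = 0) -> "ToySingleArrayManager":
        if axis >= self.ndim:
            raise IndexError("Requested axis not found in manager")
        # Slice values and labels together so they stay aligned.
        return type(self)([self.array[slobj]], [self.index[slobj]])

    def apply(self, func, **kwargs) -> "ToySingleArrayManager":
        if callable(func):
            new_array = func(self.array, **kwargs)
        else:
            # A string names a method of the underlying array object.
            new_array = getattr(self.array, func)(**kwargs)
        return type(self)([new_array], self._axes)

    def idelete(self, indexer: Sequence[int]) -> None:
        # Build a boolean keep-mask, then filter values and labels with it.
        to_keep = [True] * len(self.array)
        for i in indexer:
            to_keep[i] = False
        old_values, old_labels = self.array, self.index
        self.arrays = [[v for v, keep in zip(old_values, to_keep) if keep]]
        self._axes = [[lab for lab, keep in zip(old_labels, to_keep) if keep]]

    def get_data_subset(self, predicate: Callable[[list], bool]) -> "ToySingleArrayManager":
        # With a single array the subset is all-or-nothing.
        if predicate(self.array):
            return type(self)(self.arrays, self._axes)
        return type(self)([[]], [[]])
```

As in the diff, `idelete` replaces the stored lists on the same manager object, whereas `get_slice` and `apply` return a new manager.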
4 changes: 4 additions & 0 deletions pandas/core/internals/base.py
@@ -98,3 +98,7 @@ def equals(self, other: object) -> bool:
             return False

         return self._equal_values(other)
+
+
+class SingleManager(DataManager):

Member Author

I initially added this class to have a common subclass for SingleBlockManager and SingleArrayManager.

But in the end, it is only actually used in a few places that do isinstance(mgr, SingleManager), and I could also replace those with isinstance(mgr, (SingleBlockManager, SingleArrayManager)) instead of having this additional class injected into the class hierarchy.

+    ndim = 1
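The two alternatives weighed in this comment (a shared base class versus a tuple of concrete classes in `isinstance`) can be compared in a few lines. The classes below reuse the names from the diff but are empty stand-ins, not the pandas implementations; the two helper functions are made up for illustration.

```python
class DataManager:
    """Empty stand-in for the shared base class."""


class SingleManager(DataManager):
    ndim = 1


class BlockManager(DataManager):
    ndim = 2


class ArrayManager(DataManager):
    ndim = 2


class SingleBlockManager(BlockManager, SingleManager):
    ndim = 1


class SingleArrayManager(ArrayManager, SingleManager):
    ndim = 1


def is_single_via_base(mgr) -> bool:
    # Option 1: rely on the extra class in the hierarchy.
    return isinstance(mgr, SingleManager)


def is_single_via_tuple(mgr) -> bool:
    # Option 2: list the concrete 1D managers explicitly.
    return isinstance(mgr, (SingleBlockManager, SingleArrayManager))
```

Both checks give the same answer for these four manager classes; they would diverge only if another `SingleManager` subclass were added without updating the tuple.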
7 changes: 5 additions & 2 deletions pandas/core/internals/managers.py
@@ -61,7 +61,10 @@
     Index,
     ensure_index,
 )
-from pandas.core.internals.base import DataManager
+from pandas.core.internals.base import (
+    DataManager,
+    SingleManager,
+)
 from pandas.core.internals.blocks import (
     Block,
     CategoricalBlock,
@@ -1525,7 +1528,7 @@ def unstack(self, unstacker, fill_value) -> BlockManager:
         return bm


-class SingleBlockManager(BlockManager):
+class SingleBlockManager(BlockManager, SingleManager):
     """ manage a single block with """

     ndim = 1