Skip to content

BUG: Fixed block placment from reindex(?) #24149

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Dec 7, 2018
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
1 change: 1 addition & 0 deletions doc/source/whatsnew/v0.24.0.rst
Original file line number Diff line number Diff line change
Expand Up @@ -1297,6 +1297,7 @@ Datetimelike
- Bug in the :class:`Series` repr with period-dtype data missing a space before the data (:issue:`23601`)
- Bug in :func:`date_range` when decrementing a start date to a past end date by a negative frequency (:issue:`23270`)
- Bug in :meth:`Series.min` which would return ``NaN`` instead of ``NaT`` when called on a series of ``NaT`` (:issue:`23282`)
- Bug in :meth:`Series.combine_first` not properly aligning categoricals, so that missing values in ``self`` where not filled by valid values from ``other`` (:issue:`24147`)
- Bug in :func:`DataFrame.combine` with datetimelike values raising a TypeError (:issue:`23079`)
- Bug in :func:`date_range` with frequency of ``Day`` or higher where dates sufficiently far in the future could wrap around to the past instead of raising ``OutOfBoundsDatetime`` (:issue:`14187`)
- Bug in :class:`PeriodIndex` with attribute ``freq.n`` greater than 1 where adding a :class:`DateOffset` object would return incorrect results (:issue:`23215`)
Expand Down
3 changes: 2 additions & 1 deletion pandas/_libs/internals.pyx
Original file line number Diff line number Diff line change
Expand Up @@ -64,7 +64,8 @@ cdef class BlockPlacement:

return '%s(%r)' % (self.__class__.__name__, v)

__repr__ = __str__
def __repr__(self):
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure if this is a cython bug, but I wasn't getting a nice repr. without this change.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

usually you would just do
__repr__ = __str__

Copy link
Contributor Author

@TomAugspurger TomAugspurger Dec 7, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indeed, that's what I deleted. But on master I get this for the repr.

In [1]: import pandas as pd; import numpy as np; import pickle

In [2]: pd.Series([1,2])._data._block.mgr_locs
Out[2]: <pandas._libs.internals.BlockPlacement object at 0x111b65048>

In [10]: x.__repr__
Out[10]: <method-wrapper '__str__' of pandas._libs.internals.BlockPlacement object at 0x10ed26278>

In [11]: repr(x)
Out[11]: '<pandas._libs.internals.BlockPlacement object at 0x10ed26278>'

In [12]: x.__repr__()
Out[12]: 'BlockPlacement(slice(0, 2, 1))'

I assume it's some behavior difference between Cython and Python.

return str(self)

def __len__(self):
cdef:
Expand Down
2 changes: 1 addition & 1 deletion pandas/core/internals/blocks.py
Original file line number Diff line number Diff line change
Expand Up @@ -1887,7 +1887,7 @@ def take_nd(self, indexer, axis=0, new_mgr_locs=None, fill_tuple=None):
allow_fill=True)

# if we are a 1-dim object, then always place at 0
if self.ndim == 1:
if self.ndim == 1 and new_mgr_locs is None:
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have no idea if this is going to break things.

Oh and I notice that the comment is out of date now.

new_mgr_locs = [0]
else:
if new_mgr_locs is None:
Expand Down
9 changes: 9 additions & 0 deletions pandas/tests/extension/base/methods.py
Original file line number Diff line number Diff line change
Expand Up @@ -164,6 +164,15 @@ def test_combine_add(self, data_repeated):
orig_data1._from_sequence([a + val for a in list(orig_data1)]))
self.assert_series_equal(result, expected)

@pytest.mark.xfail(reason="GH-24147", strict=True)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a test for the original bug from the issue. Unfortunately, it doesn't run yet since Series.where loses the extension dtype. #24147 is fixing that.

def test_combine_first(self, data):
# https://github.com/pandas-dev/pandas/issues/24147
a = pd.Series(data[:3])
b = pd.Series(data[2:5], index=[2, 3, 4])
result = a.combine_first(b)
expected = pd.Series(data[:5])
self.assert_series_equal(result, expected)

@pytest.mark.parametrize('frame', [True, False])
@pytest.mark.parametrize('periods, indices', [
(-2, [2, 3, 4, -1, -1]),
Expand Down
9 changes: 9 additions & 0 deletions pandas/tests/internals/test_internals.py
Original file line number Diff line number Diff line change
Expand Up @@ -1283,3 +1283,12 @@ def test_validate_ndim():

with pytest.raises(ValueError, match=msg):
make_block(values, placement, ndim=2)


def test_block_shape():
idx = pd.Index([0, 1, 2, 3, 4])
a = pd.Series([1, 2, 3]).reindex(idx)
b = pd.Series(pd.Categorical([1, 2, 3])).reindex(idx)

assert (a._data.blocks[0].mgr_locs.indexer ==
b._data.blocks[0].mgr_locs.indexer)