BUG: NonConsolidatableBlock with ndim 1 'take_nd' with len 1 gives wrong ndim #27785

Closed
jorisvandenbossche opened this issue Aug 6, 2019 · 2 comments · Fixed by #27786
Labels
Bug ExtensionArray Extending pandas with custom dtypes or arrays. Internals Related to non-user accessible pandas implementation
Comments

@jorisvandenbossche
Member

When doing an indexing operation on an EA column that results in a single row, the resulting SingleBlockManager / Block has an inconsistent internal state:

In [62]: df = pd.DataFrame({'a': pd.Series([1, 2, 3], dtype='Int64'), 'b': [1, 2, 3]})

In [63]: df.loc[[0], 'a']._data
Out[63]:
SingleBlockManager
Items: Int64Index([0], dtype='int64')
ExtensionBlock: slice(0, 1, 1), 1 x 1, dtype: Int64

In [64]: df.loc[[0], 'a']._data.ndim
Out[64]: 1

In [65]: df.loc[[0], 'a']._data._block.ndim
Out[65]: 2

In [66]: df.loc[[0], 'b']._data
Out[66]:
SingleBlockManager
Items: Int64Index([0], dtype='int64')
IntBlock: 1 dtype: int64

In [67]: df.loc[[0], 'b']._data._block.ndim
Out[67]: 1

So df.loc[[0], 'a'] returns a Series that holds a 2D block, which leads to other bugs (although this is not directly visible in the repr). Example: geopandas/geopandas#1078

The reason is that Block.take_nd does not specify the ndim of the new Block it creates, so ndim is inferred in NonConsolidatableBlock.__init__:

# Maybe infer ndim from placement
if ndim is None:
    if len(placement) != 1:
        ndim = 1
    else:
        ndim = 2

But len(placement) is 1 not only for a 2D block; it is also 1 for a 1D block whose values have length 1, so such a block is wrongly inferred to be 2D.
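To make the ambiguity concrete, here is a minimal standalone sketch of that heuristic. It does not import pandas; `infer_ndim` and its `placement_len` argument are hypothetical names that only mirror the snippet above, not the actual pandas API.

```python
from typing import Optional


def infer_ndim(placement_len: int, ndim: Optional[int] = None) -> int:
    """Hypothetical stand-in for the ndim inference quoted above."""
    if ndim is None:
        # Same heuristic as the snippet: a placement of length 1 is taken to mean 2D.
        ndim = 1 if placement_len != 1 else 2
    return ndim


# 1D block holding three values: placement has length 3, so it is inferred as 1D.
print(infer_ndim(3))
# 1D block holding a single value: placement has length 1, so it is misread as 2D.
print(infer_ndim(1))
# Passing ndim explicitly bypasses the heuristic entirely.
print(infer_ndim(1, ndim=1))
```

The second call is the case from this report: the length of the placement alone cannot distinguish a single-column 2D block from a 1D block with one element, so the caller has to supply ndim when it knows it.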

This bug shows up with any EA, so also with e.g. Categorical. It seems to go back to pandas 0.23 and still manifests on master.

@jbrockmendel
Member

@jorisvandenbossche closed by #27786?

@jorisvandenbossche
Member Author

Yes, strange that it was not closed automatically. Apparently, if you write "closes #xxxx", the issue reference needs to be an actual # + number and not a full URL ...
