Skip to content

BUG: inconsistent behavior on multi-dimensional slicing based on type #45303

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
2 of 3 tasks
tacaswell opened this issue Jan 10, 2022 · 4 comments
Open
2 of 3 tasks
Labels
Bug Closing Candidate May be closeable, needs more eyeballs Indexing Related to indexing on series/frames, not to indexes themselves

Comments

@tacaswell
Copy link
Contributor

Pandas version checks

  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest version of pandas.

  • I have confirmed this bug exists on the master branch of pandas.

Reproducible Example

import numpy as np
import matplotlib.pyplot as plt
import pandas as pd

x, y = np.random.randn(2, 100)
x = pd.Series(x, dtype="Float32")
z = pd.Series(range(5))

x[:, None] # fails with ValueError
z[:, None] # warns but promised to be IndexError

Issue Description

Per discussion around #35527 / https://github.com/matplotlib/matplotlib/issues/18158and related links there have been a bunch of issues about multi-dimensional indexing in to Series and objects that have inconsistent .ndims and .shape.

In a departure from numpy it was my understanding that the implicit broadcasting to higher dimensions was going to be dropped by pandas at some point in the future (although it seems to still warn for builtin types). However, for the new missing-data types trying to do this 2D slicing raises a ValueError which was reported as a bug to Matplotlib matplotlib/matplotlib#22125

In [15]: x[:, None]
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
File ~/.virtualenvs/sys310/lib/python3.10/site-packages/pandas/core/series.py, in Series._get_values(self, indexer)
   1025 try:
-> 1026     new_mgr = self._mgr.getitem_mgr(indexer)
   1027     return self._constructor(new_mgr).__finalize__(self)

File ~/.virtualenvs/sys310/lib/python3.10/site-packages/pandas/core/internals/managers.py, in SingleBlockManager.getitem_mgr(self, indexer)
   1635 blk = self._block
-> 1636 array = blk._slice(indexer)
   1637 if array.ndim > 1:
   1638     # This will be caught by Series._get_values

File ~/.virtualenvs/sys310/lib/python3.10/site-packages/pandas/core/internals/blocks.py, in ExtensionBlock._slice(self, slicer)
   1553         raise AssertionError(
   1554             "invalid slicing for a 1-ndim ExtensionArray", slicer
   1555         )
-> 1557 return self.values[slicer]

File ~/.virtualenvs/sys310/lib/python3.10/site-packages/pandas/core/arrays/masked.py, in BaseMaskedArray.__getitem__(self, item)
    144 item = check_array_indexer(self, item)
--> 146 return type(self)(self._data[item], self._mask[item])

File ~/.virtualenvs/sys310/lib/python3.10/site-packages/pandas/core/arrays/floating.py, in FloatingArray.__init__(self, values, mask, copy)
    251     raise TypeError(
    252         "values should be floating numpy array. Use "
    253         "the 'pd.array' function instead"
    254     )
--> 255 super().__init__(values, mask, copy=copy)

File ~/.virtualenvs/sys310/lib/python3.10/site-packages/pandas/core/arrays/masked.py, in BaseMaskedArray.__init__(self, values, mask, copy)
    122 if values.ndim != 1:
--> 123     raise ValueError("values must be a 1D array")
    124 if mask.ndim != 1:

ValueError: values must be a 1D array

During handling of the above exception, another exception occurred:

ValueError                                Traceback (most recent call last)
Input In [15], in <module>
----> 1 x[:, None]

File ~/.virtualenvs/sys310/lib/python3.10/site-packages/pandas/core/series.py, in Series.__getitem__(self, key)
    963     key = np.asarray(key, dtype=bool)
    964     return self._get_values(key)
--> 966 return self._get_with(key)

File ~/.virtualenvs/sys310/lib/python3.10/site-packages/pandas/core/series.py, in Series._get_with(self, key)
    976     raise TypeError(
    977         "Indexing a Series with DataFrame is not "
    978         "supported, use the appropriate DataFrame column"
    979     )
    980 elif isinstance(key, tuple):
--> 981     return self._get_values_tuple(key)
    983 elif not is_list_like(key):
    984     # e.g. scalars that aren't recognized by lib.is_scalar, GH#32684
    985     return self.loc[key]

File ~/.virtualenvs/sys310/lib/python3.10/site-packages/pandas/core/series.py, in Series._get_values_tuple(self, key)
   1008 def _get_values_tuple(self, key):
   1009     # mpl hackaround
   1010     if com.any_none(*key):
-> 1011         result = self._get_values(key)
   1012         deprecate_ndim_indexing(result, stacklevel=5)
   1013         return result

File ~/.virtualenvs/sys310/lib/python3.10/site-packages/pandas/core/series.py, in Series._get_values(self, indexer)
   1027     return self._constructor(new_mgr).__finalize__(self)
   1028 except ValueError:
   1029     # mpl compat if we look up e.g. ser[:, np.newaxis];
   1030     #  see tests.series.timeseries.test_mpl_compat_hack
   1031     # the asarray is needed to avoid returning a 2D DatetimeArray
-> 1032     return np.asarray(self._values[indexer])

File ~/.virtualenvs/sys310/lib/python3.10/site-packages/pandas/core/arrays/masked.py, in BaseMaskedArray.__getitem__(self, item)
    142     return self._data[item]
    144 item = check_array_indexer(self, item)
--> 146 return type(self)(self._data[item], self._mask[item])

File ~/.virtualenvs/sys310/lib/python3.10/site-packages/pandas/core/arrays/floating.py, in FloatingArray.__init__(self, values, mask, copy)
    250 if not (isinstance(values, np.ndarray) and values.dtype.kind == "f"):
    251     raise TypeError(
    252         "values should be floating numpy array. Use "
    253         "the 'pd.array' function instead"
    254     )
--> 255 super().__init__(values, mask, copy=copy)

File ~/.virtualenvs/sys310/lib/python3.10/site-packages/pandas/core/arrays/masked.py, in BaseMaskedArray.__init__(self, values, mask, copy)
    118     raise TypeError(
    119         "mask should be boolean numpy array. Use "
    120         "the 'pd.array' function instead"
    121     )
    122 if values.ndim != 1:
--> 123     raise ValueError("values must be a 1D array")
    124 if mask.ndim != 1:
    125     raise ValueError("mask must be a 1D array")

ValueError: values must be a 1D array

Given

   1008 def _get_values_tuple(self, key):
   1009     # mpl hackaround

I think we need to revisit this and make sure we are not both grumbling about the other and putting in dueling workarounds ;)

Expected Behavior

Behavior of the container to be independent of the contained type.

Installed Versions

I have broken distutils and pd.show_versions() fails in a weird way?

@tacaswell tacaswell added Bug Needs Triage Issue that has not been reviewed by a pandas team member labels Jan 10, 2022
@jbrockmendel
Copy link
Member

x[:, None] # fails with ValueError

This goes through FloatingArray.__getitem__, which raises bc 2D FloatingArrays are not supported (much to my consternation). (Actually on master this returns a 2D object-dtype ndarray, which i could imagine being worse)

z[:, None]

This goes through ndarray.__getitem__ which works just how you would expect.

IIRC the plan is to enforce the deprecation in 2.0 so these would both raise.

I think we need to revisit this and make sure we are not both grumbling about the other and putting in dueling workarounds ;)

Do you need series[:, None] to work, or can will this deprecation being enforced be OK on your end?

@tacaswell
Copy link
Contributor Author

We currently have a bunch of complicated warning contexts / try ... excepts to handle this, if both raised we would be perfectly happy! The issue is that we were not catching ValueError so a spurious error made it out to our user. I would want either in the FloatingArray case the series __getitem__ to catch the ValueError and re-raise an InedxingError or to have assurances that when the deprecation go through both raise ValueError.

The context we are using this in is https://github.com/matplotlib/matplotlib/blob/8d7a2b9d2a38f01ee0d6802dd4f9e98aec812322/lib/matplotlib/cbook/__init__.py#L1301-L1343 . We are using the failure of series[:, None] to identify we have been passed a pandas object and make no further use of the result.

We are also open to better suggestions about how to handle this (without importing pandas). I guess we would do some if 'pandas' in sys.modules logic, but that seems a bit messy.

#30588 is probably also relevant.

@jbrockmendel
Copy link
Member

if both raised we would be perfectly happy [...] or to have assurances that when the deprecation go through both raise ValueError

Both will raise in 2.0. I'm not sure if we have pinned down exactly what type of exception will be raised, but ValueError seems like a reasonable option.

We are also open to better suggestions about how to handle this (without importing pandas). I guess we would do some if 'pandas' in sys.modules logic, but that seems a bit messy.

With the appropriate caveats about relying on implementation details: most (non-scalar) pandas classes have a "_typ: str" attribute that are used to bootstrap isinstance checks without circular imports.

@mroeschke mroeschke added Indexing Related to indexing on series/frames, not to indexes themselves Closing Candidate May be closeable, needs more eyeballs and removed Needs Triage Issue that has not been reviewed by a pandas team member labels Jan 14, 2022
@Pavanmahaveer7
Copy link

To add a new dimension to the existing data frame, by using values method we could do it. series will be failed to add dimension, Here everything depends on the series type we use. Float series type has that issue not the Integer.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Bug Closing Candidate May be closeable, needs more eyeballs Indexing Related to indexing on series/frames, not to indexes themselves
Projects
None yet
Development

No branches or pull requests

4 participants