Error Indexing DataFrame with a 0-d array #21946

Closed
floatn opened this issue Jul 17, 2018 · 29 comments
Labels
Error Reporting (Incorrect or improved errors from pandas); Indexing (Related to indexing on series/frames, not to indexes themselves); Numeric Operations (Arithmetic, Comparison, and Logical operations)

Comments

@floatn

floatn commented Jul 17, 2018

>>> df = pd.DataFrame([[1, 2], [3, 4]], columns=['a', 'b'])
>>> ar = np.array(0)
>>> df.iloc[ar]
Traceback (most recent call last):
  ...
  ...
  File "/home/float2/Documents/Python/Pandas/pandas/pandas/core/indexes/base.py", line 529, in _shallow_copy_with_infer
    if not len(values) and 'dtype' not in kwargs:
TypeError: object of type 'numpy.int64' has no len()

A 0-d array passes this check because it is an Iterable, but a TypeError is raised later, when len() is called on it.
It would be nice to implement PEP 357 (the __index__ protocol) to prevent this error.
The same matter has been raised in the PyTorch project: #9237.
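
The mismatch described above can be reproduced with NumPy alone. This is a minimal sketch, not pandas code: it only shows that a 0-d array is an Iterable by type, has no len(), and still converts to an integer through the __index__ protocol from PEP 357. The variable names are illustrative.

```python
from collections.abc import Iterable
import operator

import numpy as np

ar = np.array(0)

# ndarray defines __iter__, so the isinstance check succeeds even for 0-d arrays
is_iterable = isinstance(ar, Iterable)

# len() on a 0-d array raises TypeError because the array is unsized
try:
    len(ar)
    has_len = True
except TypeError:
    has_len = False

# PEP 357: operator.index() calls __index__, which a 0-d integer array supports
as_int = operator.index(ar)
```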

@gfyoung gfyoung added Indexing Related to indexing on series/frames, not to indexes themselves Numeric Operations Arithmetic, Comparison, and Logical operations labels Jul 17, 2018
@gfyoung
Member

gfyoung commented Jul 17, 2018

cc @jreback

@chris-b1
Contributor

FYI #21861 brought this closer: is_list_like no longer returns True for 0-d arrays, but indexing still errors. I think unpacking the scalar would be the right thing to do here, as strange as 0-d arrays are.

In [3]: df.iloc[ar]
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-3-9eb33c655c7c> in <module>()
----> 1 df.iloc[ar]

~\Documents\python-dev\pandas\pandas\core\indexing.py in __getitem__(self, key)
   1508
   1509             maybe_callable = com._apply_if_callable(key, self.obj)
-> 1510             return self._getitem_axis(maybe_callable, axis=axis)
   1511
   1512     def _is_scalar_access(self, key):

~\Documents\python-dev\pandas\pandas\core\indexing.py in _getitem_axis(self, key, axis)
   2216         # a single integer
   2217         else:
-> 2218             key = self._convert_scalar_indexer(key, axis)
   2219
   2220             if not is_integer(key):

~\Documents\python-dev\pandas\pandas\core\indexing.py in _convert_scalar_indexer(self, key, axis)
    264         ax = self.obj._get_axis(min(axis, self.ndim - 1))
    265         # a scalar
--> 266         return ax._convert_scalar_indexer(key, kind=self.name)
    267
    268     def _convert_slice_indexer(self, key, axis):

~\Documents\python-dev\pandas\pandas\core\indexes\numeric.py in _convert_scalar_indexer(self, key, kind)
    185             key = self._maybe_cast_indexer(key)
    186         return (super(Int64Index, self)
--> 187                 ._convert_scalar_indexer(key, kind=kind))
    188
    189     def _wrap_joined_index(self, joined, other):

~\Documents\python-dev\pandas\pandas\core\indexes\base.py in _convert_scalar_indexer(self, key, kind)
   1540
   1541         if kind == 'iloc':
-> 1542             return self._validate_indexer('positional', key, kind)
   1543
   1544         if len(self) and not isinstance(self, ABCMultiIndex,):

~\Documents\python-dev\pandas\pandas\core\indexes\base.py in _validate_indexer(self, form, key, kind)
   4158             pass
   4159         elif kind in ['iloc', 'getitem']:
-> 4160             self._invalid_indexer(form, key)
   4161         return key
   4162

~\Documents\python-dev\pandas\pandas\core\indexes\base.py in _invalid_indexer(self, form, key)
   1751                         "indexers [{key}] of {kind}".format(
   1752                             form=form, klass=type(self), key=key,
-> 1753                             kind=type(key)))
   1754
   1755     def get_duplicates(self):

TypeError: cannot do positional indexing on <class 'pandas.core.indexes.range.RangeIndex'> with these indexers [0] of <class 'numpy.ndarray'>

@floatn
Author

floatn commented Jul 19, 2018

I don't think #21861 is a good fix here.

    return (isinstance(obj, Iterable) and
            # we do not count strings/unicode/bytes as list-like
            not isinstance(obj, string_and_binary_types) and
            # exclude zero-dimensional numpy arrays, effectively scalars
            not (isinstance(obj, np.ndarray) and obj.ndim == 0))

The last line looks like a bad decision.
A 0-d np.ndarray no longer passes is_list_like, but take, for example, a PyTorch 0-d tensor:

>>> import torch
>>> tn = torch.tensor(0)
>>> df.iloc[tn]
Traceback (most recent call last):
...
...
  File "/home/float2/Documents/Python/Pandas/pandas/pandas/core/indexes/base.py", line 528, in _shallow_copy_with_infer
    if not len(values) and 'dtype' not in kwargs:
TypeError: object of type 'numpy.int64' has no len()

The error hasn't changed, and now, after #21861, two issues of the same nature produce different errors.
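
To make the objection concrete, here is a hedged sketch comparing a check that excludes only np.ndarray with one that duck-types on an ndim attribute. ZeroDimDuck is a hypothetical stand-in for a 0-d tensor from another library, and both function names are invented for illustration. That a given duck array type exposes ndim is an assumption here, not something the thread confirms.

```python
from collections.abc import Iterable

import numpy as np


class ZeroDimDuck:
    """Hypothetical 0-d tensor from a non-NumPy library (illustration only)."""

    ndim = 0

    def __init__(self, value):
        self._value = value

    def __iter__(self):
        # defined on the class, so isinstance(obj, Iterable) is True
        raise TypeError("iteration over a 0-d tensor")

    def __index__(self):
        return self._value


def list_like_ndarray_only(obj):
    # same shape as the check quoted above: only np.ndarray is special-cased
    return (isinstance(obj, Iterable)
            and not isinstance(obj, (str, bytes))
            and not (isinstance(obj, np.ndarray) and obj.ndim == 0))


def list_like_duck(obj):
    # assumption: any array-like exposes .ndim, so 0-d objects of any type are excluded
    return (isinstance(obj, Iterable)
            and not isinstance(obj, (str, bytes))
            and getattr(obj, "ndim", None) != 0)
```

With these definitions, the ndarray-only check still treats ZeroDimDuck(0) as list-like, while the duck-typed variant rejects both it and np.array(0).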

@chris-b1
Contributor

chris-b1 commented Jul 19, 2018

cc @jbrockmendel

@floatn, we'd welcome a PR if you have a suggestion on how we should handle this. 0-d arrays are an odd case by themselves, and duck array types complicate things further.

@jbrockmendel
Member

@floatn what behavior do you think makes sense with a zero-dim array?

Does torch.tensor subclass ndarray or otherwise present "standard" attributes that we can use to tell we are looking at a pseudo-scalar?

@floatn
Author

floatn commented Jul 19, 2018

@jbrockmendel
I suggest using the __index__() method: try calling it on the key within _validate_indexer. This may replace the is_integer(key) check there too, I guess. Also add a len() check to is_list_like_indexer to prevent 0-d iterables from passing through. Something like that.
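
The len() part of this suggestion could look roughly like the sketch below. is_sized_list_like is a hypothetical helper name, not the pandas is_list_like_indexer implementation. It only illustrates the idea of requiring a working len() on top of iterability.

```python
from collections.abc import Iterable

import numpy as np


def is_sized_list_like(key):
    """Hypothetical check: iterable, not a string, and len() must succeed."""
    if isinstance(key, (str, bytes)) or not isinstance(key, Iterable):
        return False
    try:
        len(key)
    except TypeError:
        # 0-d arrays are Iterable by type but unsized, so they end up here
        return False
    return True
```

Under this sketch a 0-d array is rejected as a list-like indexer, while lists and 1-d arrays (including empty ones) are still accepted.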

@illegalnumbers
Contributor

Is anyone working on this? I'd love to make a contribution if it's as simple as what you discussed.

@gfyoung
Member

gfyoung commented Jul 23, 2018

@illegalnumbers : I don't think so. Go for it!

@illegalnumbers
Contributor

Cool I'm gonna start working on this now and hopefully have a PR today.

@illegalnumbers
Contributor

It seems that calling __index__() might not be the right answer, since it raises a TypeError when the key is a correctly shaped array object. It might be better to do something else; otherwise I have to handle that exception for valid input, which seems wrong. Any other ideas?

@gfyoung
Member

gfyoung commented Jul 23, 2018

There are two ways to look at this:

  • Perhaps a special-case conditional to avoid the Exception?
  • Follow the Zen of Python: It is better to ask for forgiveness than permission, in which case, Exception handling is fine 🙂

Your call!
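
The two options above can be sketched side by side. Both helper names are hypothetical and neither is what pandas ended up merging. The first special-cases 0-d objects via an ndim attribute (assumed to exist on array-likes), and the second simply attempts operator.index() and falls back to the original key when that raises TypeError.

```python
import operator

import numpy as np


def unpack_with_conditional(key):
    # special case: look before you leap, only touch objects that report ndim == 0
    if getattr(key, "ndim", None) == 0:
        return key.item()
    return key


def unpack_with_exception(key):
    # ask forgiveness: try the __index__ protocol, keep the key if it does not apply
    try:
        return operator.index(key)
    except TypeError:
        return key
```

For example, both helpers turn np.array(2) into the plain integer 2 and return a 1-d array such as np.array([0, 1]) unchanged. The exception-based variant also leaves non-integer keys like 1.5 untouched, since operator.index() rejects them.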

@illegalnumbers
Contributor

illegalnumbers commented Jul 23, 2018

Calling .ndim seems more appropriate since it doesn't raise, but I'm not sure whether there is more to __index__ that I'm missing context on.

@illegalnumbers
Contributor

Well in that case calling is_list_like also seems like a more apt solution, good to reuse code for checks.

@illegalnumbers
Contributor

Hmm is_list_like breaks df.ix[[], :] indexing so I guess ndim will have to do.

illegalnumbers added a commit to illegalnumbers/pandas that referenced this issue Jul 23, 2018
closes pandas-dev#21946

Change-Id: I79cb68527820adddd3a8e3f4a71c12ee677eec60
illegalnumbers added a commit to illegalnumbers/pandas that referenced this issue Jul 23, 2018
illegalnumbers added a commit to illegalnumbers/pandas that referenced this issue Jul 23, 2018
@illegalnumbers
Contributor

I think #22032 should do it :) (pending full specs pass)

illegalnumbers added a commit to illegalnumbers/pandas that referenced this issue Jul 23, 2018
@gfyoung gfyoung added the Error Reporting Incorrect or improved errors from pandas label Jul 23, 2018
@illegalnumbers
Contributor

As I mention in my pull request, this appears to be fixed on master. The error we get is TypeError: Cannot index by location index with a non-integer key. I think we can close this issue.

@corneels

Is there any reason this issue should not be closed?

@tamuhey
Contributor

tamuhey commented Jan 29, 2019

@jreback @jorisvandenbossche I have no permission to close this issue, so please close.

@jreback
Contributor

jreback commented Jan 29, 2019

@tamuhey issues are closed when the corresponding PR is actually merged

@illegalnumbers
Contributor

When I tried to repro in #22032 I was unable to on master?

@gfyoung
Member

gfyoung commented Mar 5, 2019

@illegalnumbers : You can still add a test for this error message to close this out.

@illegalnumbers
Contributor

illegalnumbers commented Mar 5, 2019 via email

@illegalnumbers
Contributor

illegalnumbers commented Mar 22, 2019

While writing a test for this, using just about the same code I added to test_indexing.py last year, I wasn't able to get the TypeError to come up.

def test_error_for_zero_index(self):
    # GH-21946
    df = pd.DataFrame([[1, 2], [3, 4]], columns=['a', 'b'])
    ar = np.array(0)
    msg = 'Cannot index by location index with a non-integer key'
    with tm.assert_raises_regex(TypeError, msg):
        df.iloc[ar]

@illegalnumbers
Contributor

So this might need further review and debugging.

@illegalnumbers
Contributor

illegalnumbers commented Mar 22, 2019

In fact I get this:

>>> df = pd.DataFrame([[1, 2], [3, 4]], columns=['a', 'b'])
>>> ar = np.array(0)
>>> df.iloc[ar]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/bytenel/.pyenv/versions/3.6.6/lib/python3.6/site-packages/pandas/core/indexing.py", line 1500, in __getitem__
    return self._getitem_axis(maybe_callable, axis=axis)
  File "/Users/bytenel/.pyenv/versions/3.6.6/lib/python3.6/site-packages/pandas/core/indexing.py", line 2226, in _getitem_axis
    raise TypeError("Cannot index by location index with a "
TypeError: Cannot index by location index with a non-integer key

But my local version is

>>> pd.__version__
'0.24.2'

@illegalnumbers
Contributor

So the result should not be a TypeError then? I'm not sure what this test should assert, since the code in the repo now seems to treat this as supported behavior. If you print the result you get:

------------------------------------------------------------------------------ Captured stdout call ------------------------------------------------------------------------------
a    1
b    2
Name: 0, dtype: int64

In the test harness.

@illegalnumbers
Contributor

Not entirely sure if that's right? If we get a 0 dim array should it just be the first element?
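
For comparison, plain Python sequences and NumPy arrays already treat a 0-d integer array as the integer it holds. The sketch below shows that precedent only. It does not exercise pandas, and the variable names are illustrative.

```python
import numpy as np

ar = np.array(0)

# list indexing goes through __index__, so a 0-d integer array acts like the int 0
rows = ["row 0", "row 1"]
first_from_list = rows[ar]

# indexing a 2-d ndarray with a 0-d integer array selects the first row
table = np.array([[1, 2], [3, 4]])
first_from_array = table[ar]
```

By that analogy, returning the first row for df.iloc[np.array(0)] matches how these other containers behave.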

illegalnumbers added a commit to illegalnumbers/pandas that referenced this issue Mar 24, 2019
illegalnumbers added a commit to illegalnumbers/pandas that referenced this issue Mar 24, 2019
@illegalnumbers
Contributor

So I put a test for that functionality here: #25856 and I hope it gets through review. Always happy to contribute, wish I had more time!

illegalnumbers added a commit to illegalnumbers/pandas that referenced this issue Mar 24, 2019
@jreback
Contributor

jreback commented Mar 24, 2019

this is a duplicate of #24919 and closed by #24924

@jreback jreback closed this as completed Mar 24, 2019