BUG: Groupby.apply raises KeyError for Float64Index depending on order of index and column #34455

fjetter · 2020-05-29T11:54:42Z

When performing a groupby.apply on a dataframe with a float index, I receive a KeyError, depending on whether or not the index has the same ordering as the column I am grouping on.

Code Sample

broken

import pandas as pd

df = pd.DataFrame({"col": [1, 2, 3,]}, index=[0.1, 0.3, 0.2,])
df.groupby("col").apply(lambda x: x)

working (mind the sorting of col vs index)

import pandas as pd

df = pd.DataFrame({"col": [1, 2, 3,]}, index=[0.1, 0.2, 0.3,])
df.groupby("col").apply(lambda x: x)

import pandas as pd

df = pd.DataFrame({"col": [1, 4, 3,]}, index=[0.1, 0.4, 0.2,])
df.groupby("col").apply(lambda x: x)

Example traceback

KeyError                                  Traceback (most recent call last)
<ipython-input-14-fac5566be46e> in <module>
----> 1 df.groupby("col").apply(lambda x: x.copy())

~/workspace/pandas/pandas/core/groupby/groupby.py in apply(self, func, *args, **kwargs)
    854         with option_context("mode.chained_assignment", None):
    855             try:
--> 856                 result = self._python_apply_general(f, self._selected_obj)
    857             except TypeError:
    858                 # gh-20949

~/workspace/pandas/pandas/core/groupby/groupby.py in _python_apply_general(self, f, data)
    887             data after applying f
    888         """
--> 889         keys, values, mutated = self.grouper.apply(f, data, self.axis)
    890
    891         return self._wrap_applied_output(

~/workspace/pandas/pandas/core/groupby/ops.py in apply(self, f, data, axis)
    176         ):
    177             try:
--> 178                 result_values, mutated = splitter.fast_apply(f, sdata, group_keys)
    179
    180             except libreduction.InvalidApply as err:

~/workspace/pandas/pandas/core/groupby/ops.py in fast_apply(self, f, sdata, names)
    962         # must return keys::list, values::list, mutated::bool
    963         starts, ends = lib.generate_slices(self.slabels, self.ngroups)
--> 964         return libreduction.apply_frame_axis0(sdata, f, names, starts, ends)
    965
    966     def _chop(self, sdata: DataFrame, slice_obj: slice) -> DataFrame:

~/workspace/pandas/pandas/_libs/reduction.pyx in pandas._libs.reduction.apply_frame_axis0()
    482     results = []
    483
--> 484     slider = BlockSlider(frame)
    485
    486     mutated = False

~/workspace/pandas/pandas/_libs/reduction.pyx in pandas._libs.reduction.BlockSlider.__init__()
    549
    550         self.frame = frame
--> 551         self.dummy = frame[:0]
    552         self.index = self.dummy.index
    553

~/workspace/pandas/pandas/core/frame.py in __getitem__(self, key)
   2810
   2811         # Do we have a slicer (on rows)?
-> 2812         indexer = convert_to_index_sliceable(self, key)
   2813         if indexer is not None:
   2814             # either we have a slice or we have a string that can be converted

~/workspace/pandas/pandas/core/indexing.py in convert_to_index_sliceable(obj, key)
   2113     idx = obj.index
   2114     if isinstance(key, slice):
-> 2115         return idx._convert_slice_indexer(key, kind="getitem")
   2116
   2117     elif isinstance(key, str):

~/workspace/pandas/pandas/core/indexes/numeric.py in _convert_slice_indexer(self, key, kind)
    382         # We always treat __getitem__ slicing as label-based
    383         # translate to locations
--> 384         return self.slice_indexer(key.start, key.stop, key.step, kind=kind)
    385
    386     # ----------------------------------------------------------------

~/workspace/pandas/pandas/core/indexes/base.py in slice_indexer(self, start, end, step, kind)
   4940         slice(1, 3, None)
   4941         """
-> 4942         start_slice, end_slice = self.slice_locs(start, end, step=step, kind=kind)
   4943
   4944         # return a slice

~/workspace/pandas/pandas/core/indexes/base.py in slice_locs(self, start, end, step, kind)
   5147         end_slice = None
   5148         if end is not None:
-> 5149             end_slice = self.get_slice_bound(end, "right", kind)
   5150         if end_slice is None:
   5151             end_slice = len(self)

~/workspace/pandas/pandas/core/indexes/base.py in get_slice_bound(self, label, side, kind)
   5063             except ValueError:
   5064                 # raise the original KeyError
-> 5065                 raise err
   5066
   5067         if isinstance(slc, np.ndarray):

~/workspace/pandas/pandas/core/indexes/base.py in get_slice_bound(self, label, side, kind)
   5057         # we need to look up the label
   5058         try:
-> 5059             slc = self.get_loc(label)
   5060         except KeyError as err:
   5061             try:

~/workspace/pandas/pandas/core/indexes/numeric.py in get_loc(self, key, method, tolerance)
    444             return nan_idxs
    445
--> 446         return super().get_loc(key, method=method, tolerance=tolerance)
    447
    448     @cache_readonly

~/workspace/pandas/pandas/core/indexes/base.py in get_loc(self, key, method, tolerance)
   2876                 return self._engine.get_loc(casted_key)
   2877             except KeyError as err:
-> 2878                 raise KeyError(key) from err
   2879
   2880         if tolerance is not None:

KeyError: 0

Output of `pd.show_versions()`

INSTALLED VERSIONS

commit : 043b609
python : 3.8.2.final.0
python-bits : 64
OS : Darwin
OS-release : 18.7.0
Version : Darwin Kernel Version 18.7.0: Mon Feb 10 21:08:45 PST 2020; root:xnu-4903.278.28~1/RELEASE_X86_64
machine : x86_64
processor : i386
byteorder : little
LC_ALL : None
LANG : None
LOCALE : None.UTF-8

pandas : 1.1.0.dev0+1708.g043b60920
numpy : 1.18.4
pytz : 2020.1
dateutil : 2.8.1
pip : 20.1.1
setuptools : 47.1.0.post20200528
Cython : 0.29.19
pytest : 5.4.2
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : None
lxml.etree : None
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : 2.11.2
IPython : 7.14.0
pandas_datareader: None
bs4 : None
bottleneck : None
fastparquet : None
gcsfs : None
matplotlib : 3.2.1
numexpr : None
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : None
pytables : None
pyxlsb : None
s3fs : None
scipy : 1.4.1
sqlalchemy : None
tables : None
tabulate : None
xarray : None
xlrd : None
xlwt : 1.3.0
numba : None

I have checked that this issue has not already been reported.
I have confirmed this bug exists on the latest version of pandas.
(optional) I have confirmed this bug exists on the master branch of pandas (043b609)

The text was updated successfully, but these errors were encountered:

CloseChoice · 2020-05-30T09:37:38Z

take

pushkarnimkar · 2020-06-10T06:51:21Z

I'm facing the same issue. This seems to be introduced in version 1.0.0 I've checked that this doesn't break with pandas 0.25.3.

piotrjurkiewicz · 2020-10-24T17:06:08Z

I confirm the bug and that this is a regression from 0.25. My code worked in 0.25 and after update to 1.1.3 the bug appeared. I did not change my code nor data.

fjetter added Bug Needs Triage Issue that has not been reviewed by a pandas team member Apply Apply, Aggregate, Transform, Map Groupby labels May 29, 2020

github-actions bot assigned CloseChoice May 30, 2020

rhshadrach mentioned this issue Jun 14, 2020

BUG: pandas.groupby.apply does not work with floating point indexed dataframe since 1.0.0 #34675

Closed

3 tasks

dchigarev mentioned this issue Jul 17, 2020

[FEATURE] Pivot implementation modin-project/modin#1645

Merged

5 tasks

TomAugspurger removed the Needs Triage Issue that has not been reviewed by a pandas team member label Sep 4, 2020

mroeschke mentioned this issue Dec 28, 2021

TST: Add regression tests for old issues #45095

Merged

7 tasks

jreback added this to the 1.4 milestone Dec 28, 2021

jreback closed this as completed in #45095 Dec 28, 2021

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

BUG: Groupby.apply raises KeyError for Float64Index depending on order of index and column #34455

BUG: Groupby.apply raises KeyError for Float64Index depending on order of index and column #34455

fjetter commented May 29, 2020

INSTALLED VERSIONS

CloseChoice commented May 30, 2020

pushkarnimkar commented Jun 10, 2020

piotrjurkiewicz commented Oct 24, 2020

BUG: Groupby.apply raises KeyError for Float64Index depending on order of index and column #34455

BUG: Groupby.apply raises KeyError for Float64Index depending on order of index and column #34455

Comments

fjetter commented May 29, 2020

Code Sample

Example traceback

Output of pd.show_versions()

INSTALLED VERSIONS

CloseChoice commented May 30, 2020

pushkarnimkar commented Jun 10, 2020

piotrjurkiewicz commented Oct 24, 2020

Output of `pd.show_versions()`