After grouping and then resampling as shown above, the returned data frame seems to lose all of its metadata, such as the column names and the type of index it uses. This leads to a KeyError when trying to access columns that existed, but were empty, before the grouping and resampling.
Perhaps somewhere in the code a "generic" empty data frame is returned without the attached information about column names and indices?
or similar, and empty_df["b"] == resampled_df["b"] would make sense to me.
Actual Output
Empty DataFrame
Columns: []
Index: []
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python2.7/dist-packages/pandas/core/frame.py", line 2927, in __getitem__
    indexer = self.columns.get_loc(key)
  File "/usr/local/lib/python2.7/dist-packages/pandas/core/indexes/base.py", line 2659, in get_loc
    return self._engine.get_loc(self._maybe_cast_indexer(key))
  File "pandas/_libs/index.pyx", line 108, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/index.pyx", line 132, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/hashtable_class_helper.pxi", line 1601, in pandas._libs.hashtable.PyObjectHashTable.get_item
  File "pandas/_libs/hashtable_class_helper.pxi", line 1608, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: 'b'
I have traced this down to this bit of code in core/groupby/generic.py (line 273):
    if len(keys) == 0:
        return DataFrame(index=keys)
I have managed to preserve columns and dtypes by replacing it with this:
    if len(keys) == 0:
        result = DataFrame(index=keys, columns=self.obj.columns)
        result = result.astype(self.obj.dtypes.to_dict())
        return result
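To make the effect of that replacement easier to see outside the pandas source, here is a minimal standalone sketch. The helper name `empty_result_like` and the sample frame are assumptions made for illustration only; they are not part of pandas or of the proposed patch, which lives inside the groupby code and uses `self.obj`.

```python
import pandas as pd


def empty_result_like(obj, keys):
    """Hypothetical helper mirroring the proposed change: build an empty
    result that keeps the column names and dtypes of ``obj``."""
    result = pd.DataFrame(index=keys, columns=obj.columns)
    # A frame built this way starts with object columns, so cast each
    # column back to the dtype it had in the source frame.
    return result.astype(obj.dtypes.to_dict())


# Illustrative empty frame that still carries column names and dtypes.
source = pd.DataFrame(
    {"a": [1, 2], "b": [0.5, 1.5]},
    index=pd.date_range("2019-01-01", periods=2, freq="D"),
).iloc[:0]

bare = pd.DataFrame(index=[])              # what the current code returns
kept = empty_result_like(source, keys=[])  # what the replacement returns

print(list(bare.columns))  # no columns, so bare["b"] raises KeyError
print(kept.dtypes)         # columns "a" and "b" with their source dtypes
```

With the column names preserved, `kept["b"]` returns an empty Series instead of raising.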
Regarding the index, it could be kept with something like:
result = DataFrame(index=self.obj.index[:0], columns=self.obj.columns)
but this (even after modifications) breaks test_agg_apply_corner() in tests/groupby/aggregate/test_aggregate.py, which expects a groupby performed on float64 values to result in a Float64Index. In other words, the test expects the index to be consistent with the values, not with the index of the starting DataFrame.
This feels like a trade-off, and I am not sure how to proceed. Any thoughts?
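The index-preserving variant can be sketched in isolation as well. This is only an illustration under assumed data: the frame below and its DatetimeIndex are made up, and the snippet does not reproduce the groupby internals or the failing test.

```python
import pandas as pd

# Illustrative source frame with a DatetimeIndex, sliced down to zero rows.
source = pd.DataFrame(
    {"a": [1, 2], "b": [0.5, 1.5]},
    index=pd.date_range("2019-01-01", periods=2, freq="D"),
).iloc[:0]

# Slicing the index with [:0] keeps its type while dropping every entry.
result = pd.DataFrame(index=source.index[:0], columns=source.columns)

print(type(source.index).__name__)
print(type(result.index).__name__)
```

Here the empty result inherits the index type of the starting frame, which is exactly the behaviour that conflicts with a test expecting the index type to follow the values.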