BUG: Missing keys using aggregation dictionary that are unsortable raise TypeError instead of SpecificationError #39025

Closed
simonjayhawkins opened this issue Jan 7, 2021 · 2 comments · Fixed by #39028
Labels: Bug, Error Reporting, Regression, Resample

simonjayhawkins commented Jan 7, 2021

  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest version of pandas.

  • (optional) I have confirmed this bug exists on the master branch of pandas.


Code Sample, a copy-pastable example

>>> import numpy as np
>>> import pandas as pd
>>>
>>> pd.__version__
'1.3.0.dev0+358.g68db2d26dd'
>>>
>>> df = pd.DataFrame(
...     np.random.randn(1000, 3),
...     index=pd.date_range("1/1/2012", freq="S", periods=1000),
...     columns=[1, "foo", None],
... )
>>> r = df.resample("3T")
>>> r.agg({1: "mean", "foo": "sum"})
                            1        foo
2012-01-01 00:00:00 -0.070340   9.142790
2012-01-01 00:03:00 -0.048538  31.041777
2012-01-01 00:06:00 -0.058046  -0.394660
2012-01-01 00:09:00  0.014268  -8.947485
2012-01-01 00:12:00  0.040080  -6.869409
2012-01-01 00:15:00  0.020587   0.225159
>>>
>>> r.agg({2: "mean", "bar": "sum"})
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Users\simon\pandas\pandas\core\resample.py", line 298, in aggregate
    result, how = aggregate(self, func, *args, **kwargs)
  File "C:\Users\simon\pandas\pandas\core\aggregation.py", line 566, in aggregate
    return agg_dict_like(obj, arg, _axis), True
  File "C:\Users\simon\pandas\pandas\core\aggregation.py", line 741, in agg_dict_like
    cols = sorted(set(keys) - set(selected_obj.columns.intersection(keys)))
TypeError: '<' not supported between instances of 'str' and 'int'
>>>
>>> df = pd.DataFrame(
...     np.random.randn(1000, 3),
...     index=pd.date_range("1/1/2012", freq="S", periods=1000),
...     columns=["A", "B", "C"],
... )
>>>
>>> r = df.resample("3T")
>>> r.agg({"r1": "mean", "r2": "sum"})
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Users\simon\pandas\pandas\core\resample.py", line 298, in aggregate
    result, how = aggregate(self, func, *args, **kwargs)
  File "C:\Users\simon\pandas\pandas\core\aggregation.py", line 566, in aggregate
    return agg_dict_like(obj, arg, _axis), True
  File "C:\Users\simon\pandas\pandas\core\aggregation.py", line 742, in agg_dict_like
    raise SpecificationError(f"Column(s) {cols} do not exist")
pandas.core.base.SpecificationError: Column(s) ['r1', 'r2'] do not exist
>>>

The code sample is based on test_agg_consistency in pandas/tests/resample/test_resample_api.py.

Problem description

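The missing keys are passed to sorted() before the error message is built, so missing keys of mixed types (here 2 and "bar") raise TypeError instead of SpecificationError. mypy 0.790 flags the same line: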

pandas/core/aggregation.py:741: error: Value of type variable "_LT" of "sorted" cannot be "Optional[Hashable]" [type-var]
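The root cause can be shown without pandas. The following is a minimal sketch in plain Python, using the key and column values from the code sample above; the variable names are illustrative and are not taken from the pandas source.

```python
# Illustration only: plain Python, no pandas required.
keys = [2, "bar"]            # dictionary keys passed to agg()
columns = [1, "foo", None]   # column labels of the frame

# Set arithmetic works for any hashable labels, including None.
missing = set(keys) - set(columns)

# Sorting needs "<" between every pair of elements, which int and str lack.
try:
    sorted(missing)
    error_message = None
except TypeError as exc:
    error_message = str(exc)

print(error_message)
```

The set difference itself succeeds; only the sorted() call fails, which matches the last frame of the first traceback.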

Expected Output

pandas.core.base.SpecificationError: Column(s) [2, 'bar'] do not exist
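One way to build this message without sorting is to keep the missing keys in the order they were passed. The sketch below is an assumption for illustration, not the change made in pandas; find_missing_keys and missing_columns_message are hypothetical helper names.

```python
def find_missing_keys(keys, columns):
    """Return the keys absent from columns, in the order they were given.

    Hypothetical helper: relies only on hashing, never on "<",
    so mixed-type labels are fine.
    """
    present = set(columns)
    return [key for key in keys if key not in present]


def missing_columns_message(keys, columns):
    """Build the error text for missing keys, or None if all keys exist."""
    missing = find_missing_keys(keys, columns)
    if missing:
        return f"Column(s) {missing} do not exist"
    return None


print(missing_columns_message([2, "bar"], [1, "foo", None]))
# prints: Column(s) [2, 'bar'] do not exist
```

Preserving insertion order keeps the message deterministic without requiring the labels to be comparable. A sort that tolerates mixed types would be an alternative if sorted output is preferred.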

simonjayhawkins added the Bug, Error Reporting, and Resample labels Jan 7, 2021
phofl commented Jan 7, 2021

This seems to be a regression from 1.0.5, which raises a SpecificationError for the same call:

Traceback (most recent call last):
  File "/home/developer/.config/JetBrains/PyCharm2020.3/scratches/scratch_4.py", line 390, in <module>
    r.agg({2: "mean", "bar": "sum"})
  File "/home/developer/anaconda3/envs/omi/lib/python3.8/site-packages/pandas/core/resample.py", line 281, in aggregate
    result, how = self._aggregate(func, *args, **kwargs)
  File "/home/developer/anaconda3/envs/omi/lib/python3.8/site-packages/pandas/core/base.py", line 357, in _aggregate
    raise SpecificationError("nested renamer is not supported")
pandas.core.base.SpecificationError: nested renamer is not supported

simonjayhawkins added a commit to simonjayhawkins/pandas that referenced this issue Jan 8, 2021
simonjayhawkins added the Regression label Jan 8, 2021
simonjayhawkins commented

> This seems to be a regression from 1.0.5

first bad commit: [8f90046] ERR: Better error message for missing columns in aggregate (#32836)

jreback added this to the 1.3 milestone Jan 8, 2021