REF: Consolidate validation of dictionary argument in agg/transform #40004

Merged: 8 commits, Feb 25, 2021

@rhshadrach rhshadrach commented Feb 23, 2021

  • Ensure all linting tests pass, see here for how to run them

Currently, the apply tests have no coverage for a dict-like argument to agg when the DataFrame is missing a column (edit: tests for this case do exist in groupby, resample, and window). This PR changes the error message slightly from

f"Column '{k}' does not exist!"

to

f"Column(s) {cols} do not exist"

to make it consistent with the error message from transform (and, I think, more helpful when multiple columns are missing).
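
To illustrate the difference between the two messages, here is a minimal sketch in plain Python. The helper names `old_message` and `new_message` are hypothetical and not part of pandas; the sketch only builds the message strings and leaves the exception type out, and sorting the missing keys is an assumption made to keep the output deterministic.

```python
from typing import Iterable, Mapping, Optional


def old_message(columns: Iterable[str], func: Mapping[str, object]) -> Optional[str]:
    """Hypothetical sketch: report only the first missing key (old wording)."""
    available = set(columns)
    for k in func:
        if k not in available:
            return f"Column '{k}' does not exist!"
    return None


def new_message(columns: Iterable[str], func: Mapping[str, object]) -> Optional[str]:
    """Hypothetical sketch: report every missing key at once (new wording)."""
    # Sorting is an assumption here, used only for a stable ordering.
    cols = sorted(set(func) - set(columns))
    if cols:
        return f"Column(s) {cols} do not exist"
    return None


columns = ["A"]
func = {"B": "sum", "C": "max", "A": "min"}

print(old_message(columns, func))  # Column 'B' does not exist!
print(new_message(columns, func))  # Column(s) ['B', 'C'] do not exist
```

With two missing keys, the old wording surfaces only `'B'`, while the new wording lists both in one message.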

@rhshadrach added the Refactor, Error Reporting, and Apply labels Feb 23, 2021
rhshadrach (Member, Author) commented

@jreback - added test_invalid_arg to start reorganizing the apply tests to something with a better layout. If this seems like a good idea, will move other relevant tests here in a followup.

@jreback added this to the 1.3 milestone Feb 24, 2021

jreback commented Feb 24, 2021

Great. This is a very small API change (e.g. some cases now raise SpecificationError rather than KeyError); can you add a whatsnew note? Ping on green.

rhshadrach (Member, Author) commented

@jreback Ah - thanks for catching this. Before adding the note - is it clear that we want a SpecificationError here and not KeyError? Looking throughout pandas.core, I only see SpecificationError used in three places:

  • groupby.generic._aggregate_multiple_funcs
  • aggregation.reconstruct_func - only used by DataFrame.agg and DataFrameGroupBy.agg
  • apply for transform (and with this PR, agg too)

In each case, I believe either KeyError or ValueError would also be appropriate. It seems to me that changing transform to raise KeyError here might be better, with SpecificationError removed altogether in the future. This would also have the benefit of being a smaller API change (transform is only used by Series/DataFrame here, whereas agg is used by GroupBy/Resampler/Rolling).

# GH 35964
obj = box({"A": [1]})
match = "nested renamer is not supported"
with pytest.raises(SpecificationError, match=match):
Review comment (Contributor):

this error is good

# GH 40004
obj = DataFrame({"A": [1]})
match = re.escape("Column(s) ['B'] do not exist")
with pytest.raises(SpecificationError, match=match):
Review comment (Contributor):

A KeyError would be OK here (which I think is the same as the current behavior).
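
For context on the `re.escape` call in the test snippet above: `pytest.raises(match=...)` checks the message with `re.search`, and the new message contains regex metacharacters. A small standard-library sketch of why escaping matters (the variable names are illustrative only):

```python
import re

message = "Column(s) ['B'] do not exist"

# Unescaped, "(s)" is read as a capture group and "['B']" as a character
# class, so the pattern does not match the literal message text.
unescaped = re.search(message, message)

# Escaped, every metacharacter is treated literally and the search succeeds.
escaped = re.search(re.escape(message), message)

print(unescaped is None)    # True
print(escaped is not None)  # True
```

The same escaping applies regardless of whether the test ends up expecting SpecificationError or KeyError.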

@jreback jreback merged commit 8bdd081 into pandas-dev:master Feb 25, 2021
jreback commented Feb 25, 2021

very nice!

@rhshadrach rhshadrach deleted the apply_validate_dict branch February 25, 2021 02:54