-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
ERR: raise on missing values in pd.pivot_table #14965
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
# GH14938 Make sure values are in data | ||
for i in values: | ||
if i not in data: | ||
raise KeyError(i) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if set(values) - set(data):
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@MaximilianR That won't let the user know the specific values that were missing. Not sure if we want to raise a KeyError with a text of all the values, as it seems with other functions, pandas raises a KeyError for the first missing key.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point. Something like this would list them all:
missing_values = set(values) - set(data)
if missing_values:
raise KeyError('{} missing in [better message]'.format(', '.join(missing_values)
But if you have strong view on looping then OK
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IMHO, it's better to be consistent with behavior on the other arguments. For example, with groupby
, if you do this:
df=pd.DataFrame({'a' : [i % 3 for i in range(10)], 'b': [i % 2 for i in range(10)], 'c': np.random.randn(10)})
df.groupby(['y','z']).sum()
The KeyError
is raised on the value of 'y'
, not on 'z'
. So I think the error raised on the values
argument should be consistent. Otherwise it begs the question why we don't show all of the KeyError
issues on other arguments for many other functions.
Current coverage is 84.66% (diff: 66.66%)@@ master #14965 diff @@
==========================================
Files 144 144
Lines 51043 51046 +3
Methods 0 0
Messages 0 0
Branches 0 0
==========================================
+ Hits 43214 43216 +2
- Misses 7829 7830 +1
Partials 0 0
|
@@ -300,3 +300,4 @@ Bug Fixes | |||
- Bug in ``Series.unique()`` in which unsigned 64-bit integers were causing overflow (:issue:`14721`) | |||
- Require at least 0.23 version of cython to avoid problems with character encodings (:issue:`14699`) | |||
- Bug in converting object elements of array-like objects to unsigned 64-bit integers (:issue:`4471`) | |||
- Bug in ``pivot_table()`` where no error was raised when values argument was not in df.columns (:issue:`14938`) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
pd.pivot_table()
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
say not in the columns
@@ -106,6 +106,10 @@ def pivot_table(data, values=None, index=None, columns=None, aggfunc='mean', | |||
else: | |||
values_multi = False | |||
values = [values] | |||
# GH14938 Make sure values are in data |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
blank line before the comment.
say valuess indexer (or labels) (to clear these are not actual values, but the indexers of the labels)
very minor comments. ping on green. |
@jreback Changes made as you requested and all green. |
thanks! |
git diff upstream/master | flake8 --diff