Skip to content

ERR: raise on missing values in pd.pivot_table #14965

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Dec 23, 2016

Conversation

Dr-Irv
Copy link
Contributor

@Dr-Irv Dr-Irv commented Dec 22, 2016

# GH14938 Make sure values are in data
for i in values:
if i not in data:
raise KeyError(i)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if set(values) - set(data):

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@MaximilianR That won't let the user know the specific values that were missing. Not sure if we want to raise a KeyError with a text of all the values, as it seems with other functions, pandas raises a KeyError for the first missing key.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point. Something like this would list them all:

missing_values = set(values) - set(data)
if missing_values:
    raise KeyError('{} missing in [better message]'.format(', '.join(missing_values)

But if you have strong view on looping then OK

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMHO, it's better to be consistent with behavior on the other arguments. For example, with groupby, if you do this:

df=pd.DataFrame({'a' : [i % 3 for i in range(10)], 'b': [i % 2 for i in range(10)], 'c': np.random.randn(10)})
df.groupby(['y','z']).sum()

The KeyError is raised on the value of 'y', not on 'z'. So I think the error raised on the values argument should be consistent. Otherwise it begs the question why we don't show all of the KeyError issues on other arguments for many other functions.

@codecov-io
Copy link

codecov-io commented Dec 22, 2016

Current coverage is 84.66% (diff: 66.66%)

Merging #14965 into master will decrease coverage by <.01%

@@             master     #14965   diff @@
==========================================
  Files           144        144          
  Lines         51043      51046     +3   
  Methods           0          0          
  Messages          0          0          
  Branches          0          0          
==========================================
+ Hits          43214      43216     +2   
- Misses         7829       7830     +1   
  Partials          0          0          

Powered by Codecov. Last update 45910ae...ff4df5a

@@ -300,3 +300,4 @@ Bug Fixes
- Bug in ``Series.unique()`` in which unsigned 64-bit integers were causing overflow (:issue:`14721`)
- Require at least 0.23 version of cython to avoid problems with character encodings (:issue:`14699`)
- Bug in converting object elements of array-like objects to unsigned 64-bit integers (:issue:`4471`)
- Bug in ``pivot_table()`` where no error was raised when values argument was not in df.columns (:issue:`14938`)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pd.pivot_table()

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

say not in the columns

@@ -106,6 +106,10 @@ def pivot_table(data, values=None, index=None, columns=None, aggfunc='mean',
else:
values_multi = False
values = [values]
# GH14938 Make sure values are in data
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

blank line before the comment.

say valuess indexer (or labels) (to clear these are not actual values, but the indexers of the labels)

@jreback jreback changed the title BUG: Fix #14938 ERR: raise on missing values in pd.pivot_table Dec 23, 2016
@jreback jreback added the Error Reporting Incorrect or improved errors from pandas label Dec 23, 2016
@jreback jreback added this to the 0.20.0 milestone Dec 23, 2016
@jreback
Copy link
Contributor

jreback commented Dec 23, 2016

very minor comments. ping on green.

@Dr-Irv
Copy link
Contributor Author

Dr-Irv commented Dec 23, 2016

@jreback Changes made as you requested and all green.

@jreback jreback merged commit 8f7ba1b into pandas-dev:master Dec 23, 2016
@jreback
Copy link
Contributor

jreback commented Dec 23, 2016

thanks!

@Dr-Irv Dr-Irv deleted the error14938 branch December 23, 2016 20:52
ShaharBental pushed a commit to ShaharBental/pandas that referenced this pull request Dec 26, 2016
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Error Reporting Incorrect or improved errors from pandas
Projects
None yet
Development

Successfully merging this pull request may close these issues.

ERR: No Error when values argument in pivot_table is not in df.columns
4 participants