
ERR: raise on missing values in pd.pivot_table #14965

Merged: 1 commit merged on Dec 23, 2016
1 change: 1 addition & 0 deletions doc/source/whatsnew/v0.20.0.txt
@@ -300,3 +300,4 @@ Bug Fixes
- Bug in ``Series.unique()`` in which unsigned 64-bit integers were causing overflow (:issue:`14721`)
- Require at least 0.23 version of cython to avoid problems with character encodings (:issue:`14699`)
- Bug in converting object elements of array-like objects to unsigned 64-bit integers (:issue:`4471`)
- Bug in ``pd.pivot_table()`` where no error was raised when values argument was not in the columns (:issue:`14938`)
14 changes: 14 additions & 0 deletions pandas/tests/groupby/test_groupby.py
@@ -5725,6 +5725,20 @@ def test_group_shift_with_null_key(self):

        assert_frame_equal(result, expected)

    def test_pivot_table_values_key_error(self):
        # This test is designed to replicate the error in issue #14938
        df = pd.DataFrame({'eventDate':
                           pd.date_range(pd.datetime.today(),
                                         periods=20, freq='M').tolist(),
                           'thename': range(0, 20)})

        df['year'] = df.set_index('eventDate').index.year
        df['month'] = df.set_index('eventDate').index.month

        with self.assertRaises(KeyError):
            df.reset_index().pivot_table(index='year', columns='month',
                                         values='badname', aggfunc='count')

    def test_agg_over_numpy_arrays(self):
        # GH 3788
        df = pd.DataFrame([[1, np.array([10, 20, 30])],
5 changes: 5 additions & 0 deletions pandas/tools/pivot.py
@@ -107,6 +107,11 @@ def pivot_table(data, values=None, index=None, columns=None, aggfunc='mean',
            values_multi = False
            values = [values]

        # GH14938 Make sure value labels are in data
        for i in values:
            if i not in data:
                raise KeyError(i)
Contributor

if set(values) - set(data):

Contributor Author

@MaximilianR That won't tell the user which specific values were missing. I'm not sure we want to raise a KeyError whose message lists all of the values; with other functions, pandas seems to raise a KeyError for the first missing key.
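A minimal sketch of the two checks discussed so far, using a plain list of column labels as a hypothetical stand-in for `data`. The loop merged in this PR names the first missing label, while the bare set-difference test only says that something is missing. The helper names are illustrative only, not pandas API.

```python
def check_first_missing(values, columns):
    """Merged approach: raise KeyError naming the first label not in columns."""
    for label in values:
        if label not in columns:
            raise KeyError(label)


def has_missing(values, columns):
    """Reviewer's set-difference test: detects missing labels but does not name them."""
    return bool(set(values) - set(columns))


# Hypothetical column labels standing in for `data` in pivot_table.
columns = ['year', 'month', 'thename']

try:
    check_first_missing(['thename', 'badname', 'other'], columns)
except KeyError as err:
    first_missing = err.args[0]  # 'badname'; 'other' is never mentioned

detected = has_missing(['thename', 'badname'], columns)  # True, but which label?
```

The loop stops at the first miss, so a caller with several typos fixes them one at a time; the set test would need a second step to build a useful message.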

Contributor

Good point. Something like this would list them all:

missing_values = set(values) - set(data)
if missing_values:
    raise KeyError('{} missing in [better message]'.format(', '.join(missing_values)))

But if you have a strong view on looping, then OK.
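If one error listing every missing label is preferred, two details of the set-difference version matter: a set loses the order in which the caller passed `values`, and `', '.join` raises `TypeError` for non-string labels such as integer column names. A minimal sketch that avoids both, assuming `columns` is any container of hashable labels; the helper name and message text are hypothetical, not pandas behaviour.

```python
def check_all_missing(values, columns):
    """Raise one KeyError listing every label in `values` that is not in `columns`.

    Unlike set(values) - set(columns), this keeps the caller's order and
    formats each label with repr(), so non-string labels (e.g. ints) work too.
    """
    present = set(columns)
    missing = [label for label in values if label not in present]
    if missing:
        raise KeyError('{} not found in columns'.format(', '.join(map(repr, missing))))


# Hypothetical labels, including an integer column name.
columns = ['year', 'month', 0]

try:
    check_all_missing(['badname', 'year', 1, 'other'], columns)
except KeyError as err:
    message = err.args[0]  # "'badname', 1, 'other' not found in columns"
```

`str(err)` on a single-argument `KeyError` returns the repr of that argument, so the message comes back wrapped in an extra pair of quotes; `err.args[0]` gives the raw text.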

Contributor Author

IMHO, it's better to be consistent with the behavior for the other arguments. For example, with groupby, if you do this:

import numpy as np
import pandas as pd
df = pd.DataFrame({'a': [i % 3 for i in range(10)], 'b': [i % 2 for i in range(10)], 'c': np.random.randn(10)})
df.groupby(['y', 'z']).sum()  # KeyError: 'y'

The KeyError is raised for 'y' only, not for 'z'. So I think the error raised for the values argument should be consistent with that. Otherwise, it raises the question of why we don't report all of the missing keys for other arguments in many other functions.


        to_filter = []
        for x in keys + values:
            if isinstance(x, Grouper):
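The hunk is cut off at the `to_filter` loop. Assuming the remainder of that loop keeps only the labels it finds in `data` (an assumption about code not shown in this diff, and ignoring the `Grouper` branch), a misspelled value label would simply be filtered out instead of failing, which fits the "no error was raised" symptom in GH14938. A pure-Python sketch of that effect, with a hypothetical `build_filter` helper and a plain list of labels standing in for `data`:

```python
def build_filter(keys, values, columns, check_values=True):
    """Hypothetical model of the column filtering in pivot_table.

    With check_values=True it mirrors the GH14938 loop added in this PR;
    with check_values=False it models the assumed pre-fix behaviour, where
    labels not found in `columns` are silently skipped.
    """
    if check_values:
        for label in values:
            if label not in columns:
                raise KeyError(label)
    # Assumed continuation of the truncated loop: keep only labels present in the data.
    return [x for x in keys + values if x in columns]


columns = ['year', 'month', 'thename']
keys = ['year', 'month']

# Without the check, 'badname' just disappears from the filter list.
before = build_filter(keys, ['badname'], columns, check_values=False)  # ['year', 'month']

# With the check, the same call fails loudly.
try:
    build_filter(keys, ['badname'], columns)
except KeyError as err:
    after = err.args[0]  # 'badname'
```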