Skip to content

Make pivot values work with most iterables #12017

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 2 commits into from

Conversation

cscanlin
Copy link
Contributor

Previously the values check for pivot tables was only looking for lists or tuples. This patch allows for any generators or other iterables (such as dict.keys() in python 3) to be passed as values. They will be converted to lists when they are identified as iterables, excluding strings. The performance hit from the extra list type assignment to iterables that are already lists or tuples should be minor.

@jreback
Copy link
Contributor

jreback commented Jan 11, 2016

what issue is this supposed to address?

need tests

@jreback jreback added the Reshaping Concat, Merge/Join, Stack/Unstack, Explode label Jan 11, 2016
@@ -89,8 +89,9 @@ def pivot_table(data, values=None, index=None, columns=None, aggfunc='mean',

values_passed = values is not None
if values_passed:
if isinstance(values, (list, tuple)):
if hasattr(values, '__iter__') and not isinstance(values, str):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

use is_list_like?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes that's exactly what I was going for, thanks for the tip!

@cscanlin
Copy link
Contributor Author

The problem is that right now the only objects that can be passed in as values to a pivot table are lists and tuples. I would like to expand this so that other iterable objects like generators and (in python 3) dictionary methods like .keys() can be passed in as well. My use case is something like this:

import pandas as pd
import numpy as np

df =pd.DataFrame({'A': ['foo', 'foo', 'foo', 'foo',
                             'bar', 'bar', 'bar', 'bar',
                             'foo', 'foo', 'foo'],
                       'B': np.random.randn(11),
                       'C': np.random.randn(11)})

print(df)

aggs = {'B':np.sum, 'C':np.mean}

pivot_output = pd.pivot_table(
    df,
    index=['A'],
    values=aggs.keys(),
    aggfunc=aggs,
)

print(pivot_output)

This works in Python 2, but returns an error in Python 3 because .keys() returns a dict_keys object instead of a list of keys. This forces you to explicity change it to a list before passing it through which feels clunky: values=list(aggs.keys()),

@MaximilianR pointed out that using is_list_like is probably preferable in the source, instead of the conditional I wrote. I am also wondering if its neccessary to expicitly convert other iterable objects to a list, or if they would work appropriately as is. I need to test more.

As far as unit tests go, I'm still learning and I'm not sure how the best way to write those for this would be, but I'll continue looking into it.

I may have jumped the gun on this PR a bit, so please let me know if theres a better way I should proceed.

@max-sixty
Copy link
Contributor

In lieu of anyone more experienced answering your questions:

  • If you can use one of the standard checks, that's great. is_list_like looks good.
  • Unit tests are awesome. You can start with that use case if you want - just assert that the result is what you want. If you haven't run any nosetests before, have a look at the contributor docs. If you need help writing the tests, have a look at the existing tests, or post questions here.
  • Coercing to a list might be necessary, but probably isn't. Write the tests; if you have problems see if coercing helps.

Congrats on being halfway to your first PR.

@jreback
Copy link
Contributor

jreback commented Jan 22, 2016

this needs tests

@@ -736,6 +736,24 @@ def test_pivot_dtaccessor(self):
index=['X', 'Y'], columns=exp_col)
tm.assert_frame_equal(result, expected)

def test_pivot_table_with_iterator_values(self):
aggs = {'D': 'sum', 'E': 'mean'}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

add the issue number as a comment here

@jreback
Copy link
Contributor

jreback commented Jan 30, 2016

lgtm. pls add a whatsnew note in enhancements section, use this PR number as the issue number

@jreback jreback added this to the 0.18.0 milestone Jan 30, 2016
@jreback
Copy link
Contributor

jreback commented Feb 1, 2016

thanks!

@cscanlin
Copy link
Contributor Author

cscanlin commented Feb 3, 2016

Thanks @jreback and @MaximilianR for all your help and suggestions on my first PR!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
API Design Reshaping Concat, Merge/Join, Stack/Unstack, Explode
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants