TypeError: unhashable type: 'dict' when using apply/transform? #17309

Open
randomgambit opened this issue Aug 22, 2017 · 29 comments
Labels
Apply Apply, Aggregate, Transform, Map Enhancement Error Reporting Incorrect or improved errors from pandas Groupby

Comments

@randomgambit

randomgambit commented Aug 22, 2017

Hello!

I am quite puzzled by some inconsistencies when using apply. Consider this simple example

import pandas as pd

idx=[pd.to_datetime('2012-02-01 14:00:00'),
     pd.to_datetime('2012-02-01 14:01:00'),
     pd.to_datetime('2012-03-05 14:04:00'),
     pd.to_datetime('2012-03-05 14:01:00'),
     pd.to_datetime('2012-03-10 14:02:00'),
     pd.to_datetime('2012-03-11 14:07:50')
     ]

test=pd.DataFrame({'value1':[1,2,3,4,5,6],
                   'value2':[10,20,30,40,50,60],
                   'groups' : ['A','A','A','B','B','B']},
    index=idx)

test
Out[22]: 
                    groups  value1  value2
2012-02-01 14:00:00      A       1      10
2012-02-01 14:01:00      A       2      20
2012-03-05 14:04:00      A       3      30
2012-03-05 14:01:00      B       4      40
2012-03-10 14:02:00      B       5      50
2012-03-11 14:07:50      B       6      60

Now, this WORKS

test.groupby('groups').apply(lambda x: x.resample('1 T', label='left', closed='left').apply(
        {'value1' : 'mean',
         'value2' : 'mean'}))

but this FAILS

test.groupby('groups').apply(
        {'value1' : 'mean',
         'value2' : 'mean'})

Traceback (most recent call last):

  File "<ipython-input-24-741304ecf105>", line 3, in <module>
    'value2' : 'mean'})

  File "C:\Users\\AppData\Local\Continuum\Anaconda2\lib\site-packages\pandas\core\groupby.py", line 696, in apply
    func = self._is_builtin_func(func)

  File "C:\Users\\AppData\Local\Continuum\Anaconda2\lib\site-packages\pandas\core\base.py", line 730, in _is_builtin_func
    return self._builtin_table.get(arg, arg)

TypeError: unhashable type: 'dict'

This worked in prior versions of Pandas. What is the new syntax then? Some very useful variant of the code above I used to use was:

test.groupby('groups').apply(
        {'newname1' : {'value1' : 'mean'},
         'newname2' : {'value2' : 'mean'}})

to rename the new variables on the fly. Is this still possible now? Is this a bug?

Many thanks!

@randomgambit
Author

@jorisvandenbossche @jreback same bug with transform

test.groupby('groups').transform(
        {'value1' : 'mean',
         'value2' : 'mean'})

only agg works

test.groupby('groups').agg(
        {'value1' : 'mean',
         'value2' : 'mean'})

is this a nasty bug?
thanks again!
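
For readers landing here, a minimal sketch of one workaround for the dict case: transform each column separately and reassemble the result. It assumes a default integer index instead of the datetime index above, and the `funcs` mapping is purely illustrative, not a pandas API.

```python
import pandas as pd

test = pd.DataFrame(
    {
        "value1": [1, 2, 3, 4, 5, 6],
        "value2": [10, 20, 30, 40, 50, 60],
        "groups": ["A", "A", "A", "B", "B", "B"],
    }
)

# Illustrative mapping of column name -> function name (an assumption,
# mirroring the dict the issue would like transform to accept).
funcs = {"value1": "mean", "value2": "max"}

grouped = test.groupby("groups")

# SeriesGroupBy.transform accepts a single function or function name,
# so apply it column by column and rebuild a like-indexed DataFrame.
result = pd.DataFrame({col: grouped[col].transform(f) for col, f in funcs.items()})
```

The result keeps the original row index, which is the property that distinguishes transform from agg.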

@randomgambit randomgambit changed the title TypeError: unhashable type: 'dict' when using apply? TypeError: unhashable type: 'dict' when using apply/transform? Aug 22, 2017
@jreback
Contributor

jreback commented Aug 22, 2017

agg is more general than apply

In [7]: test.groupby('groups').agg(
   ...:         {'value1' : 'mean',
   ...:          'value2' : 'mean'})
   ...: 
Out[7]: 
        value1  value2
groups                
A            2      20
B            5      50

I guess it should work

@randomgambit
Author

@jreback yes, thanks, that's correct. That is what I am saying as well: it works with agg.

However, I do not want to aggregate; I want to use a transform. The documentation https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.transform.html says we should be able to pass a dict mapping columns to functions.

What do you think? Thanks again!

@jreback
Contributor

jreback commented Aug 23, 2017

If you want to submit a PR to fix it, by all means. (Your example did not indicate transform.)

@jreback jreback added this to the Next Major Release milestone Aug 23, 2017
@znwang25

Has this been fixed yet? I think transform after groupby is a very useful feature to have.

@TomAugspurger
Contributor

Still open. Please let us know if you want to start a PR to fix this.

@zeromh

zeromh commented Jun 7, 2018

Is there any reason the documentation says that transform takes a dictionary, when it doesn't?

@zeromh

zeromh commented Jun 7, 2018

Transform also doesn't take a list, as the documentation says it does. To use the above example:

test.groupby('groups').value1.transform(['cumsum', 'cummax'])

...returns "TypeError: unhashable type: 'list'"
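
A possible stopgap for the list case, sketched under the assumption that each function name is individually accepted by SeriesGroupBy.transform (as 'cumsum' and 'cummax' are): call transform once per name and collect the columns. The `names` list is illustrative, and a default integer index is used for brevity.

```python
import pandas as pd

test = pd.DataFrame(
    {
        "value1": [1, 2, 3, 4, 5, 6],
        "groups": ["A", "A", "A", "B", "B", "B"],
    }
)

names = ["cumsum", "cummax"]  # illustrative list of function names
grouped = test.groupby("groups")["value1"]

# One transform call per function name; each returns a Series aligned
# with the original index, so they can be combined column-wise.
result = pd.DataFrame({name: grouped.transform(name) for name in names})
```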

@xx396

xx396 commented Aug 4, 2018

Would like to see this fixed too as an aggregate variant of transform would be very handy

@gsmafra

gsmafra commented Aug 8, 2018

I'm also confused by the documentation. Isn't there an easy way to transform just one column of a grouped DataFrame?

@Alxe1

Alxe1 commented Sep 18, 2018

In pandas version 0.23.4, after grouping a DataFrame, you cannot pass a list of functions to the transform method, and you cannot rename the columns of the transformed DataFrame using a nested dictionary. Both would be very useful!

@colinalexander

@zeromh The referenced documentation where transform accepts lists and dictionaries is for the dataframe method of transform, not its groupby cousin version. The doc string for the groupby version correctly states that it accepts a function:

Signature: gb.transform(func, *args, **kwargs)
Docstring:
Call function producing a like-indexed DataFrame on each group and
return a DataFrame having the same indexes as the original object
filled with the transformed values

Parameters
----------
f : function
    Function to apply to each subframe
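
To illustrate the DataFrame side of that distinction, here is a small sketch of the dict and list forms that DataFrame.transform documents. The frame and the lambdas are made up for the example; the functions are deliberately non-aggregating, since DataFrame.transform requires output of the same length as the input.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"value1": [1, 2, 3], "value2": [10, 20, 30]})

# Dict form: one function per column.
by_dict = df.transform({"value1": lambda s: s * 2, "value2": lambda s: s + 1})

# List form: every function is applied to every column, which yields
# hierarchical (column, function) column labels.
by_list = df.transform([np.sqrt, np.exp])
```

The same two call shapes are what fail on the grouped object in this issue.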

@sainathadapa

Can this then be taken as a feature request, so that the same kind of apply/transform usage be used on both DataFrame and GroupBy objects?

@Alxe1

Alxe1 commented Sep 18, 2018

Can this then be taken as a feature request, so that the same kind of apply/transform usage be used on both DataFrame and GroupBy objects?

Vote it! It is very useful

@zeromh

zeromh commented Sep 18, 2018

@colin1alexander
Ah, my bad. Thanks for the clarification.

@brianhuey
Contributor

@jreback @TomAugspurger
I'm interested in tackling this. My understanding is that NDFrameGroupBy.transform() and SeriesGroupBy.transform() would need to be rewritten to accept a dict with column names as keys and functions as values, similar to NDFrameGroupBy.aggregate(). Using SeriesGroupBy._aggregate_multiple_funcs() as a guideline for writing a multiple-func transform method might be a good idea?

@TomAugspurger
Contributor

Yeah, that sounds about right. @WillAyd may have better thoughts on how to start.

Keep in mind, doing this for .apply may be difficult / impossible because it doesn't place any restrictions on the output shape.

With .agg and .transform we at least know what the return shape should be, so we can know ahead of time what the output shape of a dict of functions will be.
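
A rough illustration of the shape point, using a single column and a default integer index for brevity. The `head(2)` function passed to apply is just an arbitrary example of a function whose output shape is neither one row per group nor one row per input row.

```python
import pandas as pd

test = pd.DataFrame(
    {
        "value1": [1, 2, 3, 4, 5, 6],
        "groups": ["A", "A", "A", "B", "B", "B"],
    }
)
grouped = test.groupby("groups")["value1"]

agg_out = grouped.agg("mean")              # one row per group
transform_out = grouped.transform("mean")  # one row per input row
apply_out = grouped.apply(lambda s: s.head(2))  # shape decided by the function
```

Because agg and transform fix the output shape in advance, a dict of functions has an unambiguous layout for them; apply gives no such guarantee.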

@WillAyd
Member

WillAyd commented Oct 13, 2018

Reading through the comments here I think there have been quite a few things talked about, but just so we are on the same page I assume we are explicitly talking about changing transform to allow a dict where the key is the column name and the value(s) are the functions to be applied.

I don't object to it, though I think it makes more sense to update transform to accept a sequence first, as I don't think users will expect the values of a dict to be limited to just one function. @brianhuey if you want to try your hand at that, it would make sense to open it as a separate PR first, get that one through, and then come back to this.

@randomgambit
Author

guys, as the original OP and lifelong pandas supporter, let me reiterate that it would be very useful to have apply, transform, and agg be able to work like this:

test.groupby('groups').transform(
        {'value1' : {'value1_mean' : 'mean', 'value1_max' : 'max'},
         'value2' : {'value2_mean' : 'mean'}})

This used to work back in the days with the good old agg. It does not anymore.

This is very unfortunate because in one go I was able to use multiple functions on a single column (here mean and max on value1) as well as rename them on the fly (so that these variables have the names I have chosen and the dataframe does not have some weird multicolumn index)

Do you think that syntax could be used in apply, transform and agg? This syntax was just a great idea.

Thanks!!
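
For the agg half of this request, later pandas releases (0.25 and newer) added named aggregation, which covers both multiple functions per column and renaming on the fly with flat column names. A minimal sketch, assuming such a version is installed; it does not apply to transform or apply.

```python
import pandas as pd

test = pd.DataFrame(
    {
        "value1": [1, 2, 3, 4, 5, 6],
        "value2": [10, 20, 30, 40, 50, 60],
        "groups": ["A", "A", "A", "B", "B", "B"],
    }
)

# Named aggregation: output_name=(input_column, function).
out = test.groupby("groups").agg(
    value1_mean=("value1", "mean"),
    value1_max=("value1", "max"),
    value2_mean=("value2", "mean"),
)
```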

@TomAugspurger
Contributor

TomAugspurger commented Oct 14, 2018 via email

@randomgambit
Author

@TomAugspurger thanks, but we're talking about extending that to apply, transform and agg, right?

@brianhuey
Contributor

@WillAyd
Just so I'm clear, you're suggesting something like:

test.groupby('groups').transform({'value1': [np.mean, max], 'value2': max})

which should return something like:

                    value1     value2    
                      mean max   max
2012-02-01 14:00:00      2   3    30
2012-02-01 14:01:00      2   3    30
2012-03-05 14:04:00      2   3    30
2012-03-05 14:01:00      5   6    60
2012-03-10 14:02:00      5   6    60
2012-03-11 14:07:50      5   6    60
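
Until something like that exists, a comparable hierarchical result can be assembled by hand. This is only a sketch of the proposed output, not an implementation of the feature: the `spec` mapping is hypothetical, string function names are used instead of np.mean and max, and a default integer index replaces the datetime index.

```python
import pandas as pd

test = pd.DataFrame(
    {
        "value1": [1, 2, 3, 4, 5, 6],
        "value2": [10, 20, 30, 40, 50, 60],
        "groups": ["A", "A", "A", "B", "B", "B"],
    }
)

# Hypothetical spec: column name -> list of function names.
spec = {"value1": ["mean", "max"], "value2": ["max"]}
grouped = test.groupby("groups")

# Tuple keys passed to concat become a two-level column index
# of (column, function).
out = pd.concat(
    {(col, f): grouped[col].transform(f) for col, fs in spec.items() for f in fs},
    axis=1,
)
```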

@WillAyd
Member

WillAyd commented Oct 15, 2018

My point is that it would make more sense to make sure this works:

test.groupby('groups').transform([np.mean, max])

Before attempting:

test.groupby('groups').transform({'value1': [np.mean, max]})

Because the mechanisms that ensure the list of functions is acceptable will probably be reused when it comes time to accept a dictionary value that is a list.

Somewhat of a side note, but the hierarchical column structure of the result is going to be somewhat entangled with #18366 (comment). I don't believe that should be a blocker, just a consideration for devs.

@FelixAntonSchneider

Hi everyone,
I just stumbled upon the same issue. It would be very important imo to cover this in the documentation. At least I have been very confused by it, since the only entry in the docs regarding transform clearly says that lists and dicts of functions can be passed as an argument. It was not clear to me that the same syntax does not apply to grouped objects.

@elpablete

I just stumbled upon this, and after checking the docs for pandas 0.24.2 DataFrame.transform I see that it still says a dict is supported as the func value. I'm guessing from this discussion that this is because DataFrame.transform does accept it but GroupBy.transform does not. It's very confusing. Is there any quick fix for this (documentation issue)?

@elpablete

Also, is there any progress on getting the desired feature into an upcoming release? I've been using pandas for a while now but have never actually attempted to contribute. I can try to implement this with a little guidance if someone is willing to help me out.

@TomAugspurger
Contributor

@elpablete you linked to DataFrame.transform. That would be a different issue. This is about DataFrameGroupBy.transform.

@elpablete

@TomAugspurger I cannot find the docs for "DataFrameGroupBy.transform". I found pandas.core.groupby.GroupBy.transform which I would think are the same, but still, those are empty and thus, one would be inclined to think they have the same interface as pandas.DataFrame.transform.

That's my point when I say it's very confusing.

@simonjayhawkins
Member

Maybe we could provide a more helpful error message (with a link to the groupby.transform/apply docs), and maybe raise NotImplementedError in the short term.
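
In user code, the kind of check suggested here could look roughly like the following. `checked_group_transform` is a hypothetical helper written for this sketch, not a pandas function, and the wording of the message is only an example.

```python
import pandas as pd


def checked_group_transform(grouped, func, *args, **kwargs):
    """Hypothetical wrapper: fail early with a readable message
    instead of "unhashable type: 'dict'"."""
    if isinstance(func, (dict, list, tuple)):
        raise NotImplementedError(
            f"GroupBy.transform does not accept a {type(func).__name__} of "
            "functions; pass a single function or function name, or "
            "transform each column separately."
        )
    return grouped.transform(func, *args, **kwargs)


test = pd.DataFrame(
    {
        "value1": [1, 2, 3, 4, 5, 6],
        "value2": [10, 20, 30, 40, 50, 60],
        "groups": ["A", "A", "A", "B", "B", "B"],
    }
)
grouped = test.groupby("groups")

ok = checked_group_transform(grouped, "mean")  # single function name passes through

try:
    checked_group_transform(grouped, {"value1": "mean"})
    message = None
except NotImplementedError as err:
    message = str(err)
```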

@simonjayhawkins simonjayhawkins added the Error Reporting Incorrect or improved errors from pandas label Apr 6, 2020
@mroeschke mroeschke added the Apply Apply, Aggregate, Transform, Map label Jun 12, 2021
@mroeschke mroeschke removed this from the Contributions Welcome milestone Oct 13, 2022