API: DataFrameGroupBy column subset selection with single list? #23566
Comments
You do have ambiguity with tuples though (not that anyone should do that):
In [14]: df = pd.DataFrame(np.random.randint(10, size=(10, 4)), columns=['a', 'b', 'c', ('a', 'b')])
In [15]: df.groupby('c')['a', 'b'].sum()
Out[15]:
a b
c
0 1 7
1 6 9
2 7 7
5 9 11
6 8 6
7 10 6
8 11 8
In [16]: df.groupby('c')[('a', 'b')].sum()
Out[16]:
a b
c
0 1 7
1 6 9
2 7 7
5 9 11
6 8 6
7 10 6
8 11 8
I think both of those are incorrect. It should rather be:
In [19]: df.groupby('c').sum()[('a', 'b')]
Out[19]:
c
0 7
1 3
2 5
5 7
6 8
7 16
8 11
Name: (a, b), dtype: int64 |
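The ambiguity above comes from Python itself rather than from pandas: a comma-separated subscript is packed into a tuple before `__getitem__` sees it. A minimal sketch, using a hypothetical `KeyProbe` class (not part of pandas) that simply returns whatever key it receives:

```python
class KeyProbe:
    """Hypothetical helper (not pandas code): returns the key passed to []."""

    def __getitem__(self, key):
        return key


probe = KeyProbe()

bare = probe["a", "b"]              # comma-separated keys are packed into a tuple
explicit_tuple = probe[("a", "b")]  # an explicit tuple looks exactly the same
as_list = probe[["a", "b"]]         # a list stays a list

# The first two are indistinguishable inside __getitem__; only the list differs.
print(type(bare).__name__, type(explicit_tuple).__name__, type(as_list).__name__)
# prints: tuple tuple list
```

So an object that accepts `['a', 'b']` as "two columns" has no way to tell that apart from a request for the single column labelled `('a', 'b')`.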
I don't disagree here. There is a difference when selecting only one column (specifically returning a Series vs a DataFrame), but when selecting multiple columns it would be more consistent if we ALWAYS required double brackets. I assume this would also yield a simpler implementation. Maybe a conversation piece for 1.0? It would be a breaking change for sure, so it is probably best served in a major release like that |
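To illustrate the Series vs DataFrame distinction mentioned here, a small sketch with made-up data (the frame and its values are assumptions for illustration only). It uses only the single-label and double-bracket forms:

```python
import pandas as pd

# Made-up data, chosen so the group sums are easy to check by hand.
df = pd.DataFrame({"a": [1, 1, 2], "b": [10, 20, 30], "c": [1, 2, 3]})

# A single label selects one column and the aggregation returns a Series.
single = df.groupby("a")["b"].sum()

# A list inside the brackets (double brackets) returns a DataFrame.
multi = df.groupby("a")[["b", "c"]].sum()

print(type(single).__name__)  # Series
print(type(multi).__name__)   # DataFrame
```

A one-element list such as `[["b"]]` is expected to give a one-column DataFrame rather than a Series, which is the same rule DataFrame indexing follows.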
I think the hope is for 1.0 to be backwards compatible with 0.25.x. Do we have a chance to detect this case and throw a FutureWarning (assuming we want to change)? |
Yeah, if we want, I would think it should be possible with a deprecation cycle. |
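As a rough sketch of what such a deprecation cycle could look like, assuming a hypothetical `ColumnSelector` class that stands in for a groupby-like object (this is not the pandas implementation, and the warning text is invented for illustration):

```python
import warnings


class ColumnSelector:
    """Hypothetical stand-in for a groupby-like object; not pandas code."""

    def __init__(self, columns):
        self.columns = list(columns)

    def __getitem__(self, key):
        if isinstance(key, tuple) and len(key) > 1:
            # obj['a', 'b'] arrives here as the tuple ('a', 'b').
            # Keep the old behaviour for now, but tell the caller it will change.
            warnings.warn(
                "Selecting multiple columns with a bare tuple is deprecated; "
                "use a list instead, e.g. obj[['a', 'b']].",
                FutureWarning,
                stacklevel=2,
            )
            key = list(key)
        if isinstance(key, list):
            return [c for c in self.columns if c in key]
        return key


selector = ColumnSelector(["a", "b", "c"])

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    legacy = selector["a", "b"]       # tuple key: still works, but warns
    preferred = selector[["a", "b"]]  # list key: no warning
```

Both calls return the same selection during the deprecation period; only the tuple form emits a `FutureWarning`, so existing code keeps running while users are nudged toward the list form.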
I suspect this is actually very common in the wild (not using the double brackets), but I agree we should deprecate it as it is inconsistent |
Can I take a crack at this or has it already been fixed? Also, will this be a part of the 1.0 or other milestones? |
Go for it!
|
Thanks will do. |
take |
So this is my first time working on pandas code, and I'm a little confused here, so please bear with me. I'm also new to linking to code on GitHub. As I understand, when an object calls I'm having trouble in tracing the code path to figure out where exactly the
Any help here would be greatly appreciated. Thanks. |
@WillAyd @jorisvandenbossche are you able to help point me in the right direction? ☝️ |
Just to close the loop on my earlier question, I never fully figured out how the slicing happens, but it seems it happens at some Cython layer via the |
Although I will confess to being a bit surprised (disappointed?) in the total lack of response from the pandas developers to my question above. Pandas has a reputation as being a welcoming OSS community, and the question was well-researched and clearly stated, so I thought I'd get a bit more feedback than that. Guess I'll attribute it to the holiday season. |
@yehoshuadimarsky we have 3000+ issues and constant comments. To be honest, we barely have time to triage the PRs; even really important things are not necessarily discussed at length. Just like everyone else, we have limited time. The best way to prompt a discussion is to push a change |
Totally understand. Thanks for acknowledging, and more importantly, thanks for the incredibly important work you do in maintaining pandas. |
* MNT keyword only in examples * MNT pandas 1.0.0 deprectation See pandas-dev/pandas#23566 * MNT new keyword in 0.23
@yehoshuadimarsky , this was closed, right? |
yes |
Thanks. In my version of pandas I am using group[[colone_name,]], which is useful and makes the code clearer.
…uble [[]] https://stackoverflow.com/questions/60999753/pandas-future-warning-indexing-with-multiple-keys pandas-dev/pandas#23566 Verification: ./test_l3.py --lfmgr 192.168.0.104 --test_duration 20s --polling_interval 5s --upstream_port 1.1.eth2 --radio 'radio==wiphy2,stations==1,ssid==axe11000_5g,ssid_pw==lf_axe11000_5g,security==wpa2,wifi_mode==0,wifi_settings==wifi_settings,enable_flags==(ht160_enable&&wpa2_enable)' --endp_type mc_udp --rates_are_totals --side_a_min_bps=20000 --side_b_min_bps=3000000 --tos BK --log_level debug --csv_data_to_report Signed-off-by: Chuck SmileyRekiere <[email protected]>
I wouldn't be surprised if there is already an issue about this, but couldn't directly find one.
When doing a subselection of columns on a DataFrameGroupBy object, both a plain list (so a tuple within the __getitem__ [] brackets) and the double square brackets (a list inside the __getitem__ [] brackets) seem to work.
Personally I find this df.groupby('a')['b', 'c'].sum() a bit strange, and inconsistent with how DataFrame indexing works.
Of course, on a DataFrameGroupBy you don't have the possible confusion with indexing multiple dimensions (rows, columns), but still.
cc @jreback @WillAyd