Skip to content

CLN: Orderdict -> defaultdict(list) #30468

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 2 commits into from
Dec 25, 2019
Merged
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
11 changes: 3 additions & 8 deletions pandas/core/groupby/generic.py
Original file line number Diff line number Diff line change
Expand Up @@ -5,7 +5,7 @@
These are user facing as the result of the ``df.groupby(...)`` operations,
which here returns a DataFrameGroupBy object.
"""
from collections import OrderedDict, abc, namedtuple
from collections import OrderedDict, abc, defaultdict, namedtuple
import copy
from functools import partial
from textwrap import dedent
Expand Down Expand Up @@ -1894,20 +1894,15 @@ def _normalize_keyword_aggregation(kwargs):
"""
# Normalize the aggregation functions as Mapping[column, List[func]],
# process normally, then fixup the names.
# TODO(Py35): When we drop python 3.5, change this to
# defaultdict(list)
# TODO: aggspec type: typing.OrderedDict[str, List[AggScalar]]
# May be hitting https://github.com/python/mypy/issues/5958
# saying it doesn't have an attribute __name__
aggspec = OrderedDict()
aggspec = defaultdict(list)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The change is more complex than this. The code below can be simplified if this is a defaultdict.

Also, is OrderedDict still being used after this, or can it be removed from the imports?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Quite a few other uses of OrderedDict -> i'm aiming to eliminate most in this issue #30469

Copy link
Member Author

@ShaharNaveh ShaharNaveh Dec 25, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, is OrderedDict still being used after this, or can it be removed from the imports?

Yes OrderedDIct is still being used.

I am really new to developing (I just greped for the words 'droped' '35' 'support' etc..), can you please explain what is the problem with OrderedDict ?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With a regular dict:

groupping_values = {}
for value in [1, 3, 1, 1, 2, 3]:
    if value not in groupping_values:
        groupping_values[value] = [value]
    else:
        groupping_values[value].append(value)

The if is needed, because when we create the key in the dictionary, we need to create the list with its value. The other times we can append to it.

With an example, if we have an empty dictionary {} and we want to add the Python programming language, we can do stuff['languages'] = ['python']. If now that we already have languages with a list, if we want to add javascript we use a different instruction stuff['languages'].append('javascript'). This is what the if in the example does.

Python defaultdict exists to make things simpler:

import collections
groupping_values = collections.defaultdict(list)
for value in [1, 3, 1, 1, 2, 3]:
    groupping_values[value].append(value)

A defaultdict is "smart", since in the creation we tell its elements will be list. So, when we call the append, it'll create an empty list automatically before. We don't need the if anymore, and the code is simpler.

This is what it's expected here.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you so much for the detailed explanation.

order = []
columns, pairs = list(zip(*kwargs.items()))

for name, (column, aggfunc) in zip(columns, pairs):
if column in aggspec:
aggspec[column].append(aggfunc)
else:
aggspec[column] = [aggfunc]
aggspec[column].append(aggfunc)
order.append((column, com.get_callable_name(aggfunc) or aggfunc))

# uniquify aggfunc name if duplicated in order list
Expand Down