Skip to content

Fix concat not respecting order of OrderedDict #25224

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 23 commits into from
Mar 13, 2019
Merged
Show file tree
Hide file tree
Changes from 2 commits
Commits
Show all changes
23 commits
Select commit Hold shift + click to select a range
c54e3fe
Fix concat not respecting order of OrderedDict
alexander-ponomaroff Feb 8, 2019
073ade8
Merge conflict fix
alexander-ponomaroff Feb 8, 2019
d6cec5c
Fix pep8 issues
alexander-ponomaroff Feb 8, 2019
d4e70ef
Fix import order
alexander-ponomaroff Feb 8, 2019
db82b07
Checks reversed to PY36
alexander-ponomaroff Feb 12, 2019
f1e90ea
Merge conflict
alexander-ponomaroff Feb 12, 2019
0772212
Forgot to remove PY37 from imports
alexander-ponomaroff Feb 12, 2019
06d3472
Change order of expected result in test_groupby_agg_ohlc_non_first() …
alexander-ponomaroff Feb 12, 2019
2aee611
Fixed accidental empty line removal
alexander-ponomaroff Feb 12, 2019
feb7a36
Fix test_groupby_agg_ohlc_non_first()
alexander-ponomaroff Feb 12, 2019
a11cfc8
Testing if this will pass tests in <3.6
alexander-ponomaroff Feb 14, 2019
bbe7019
Merge
alexander-ponomaroff Feb 14, 2019
4f6fc06
Revert back to compat check for now
alexander-ponomaroff Feb 14, 2019
48cce24
Merge master
alexander-ponomaroff Feb 14, 2019
451c43b
Requested changes
alexander-ponomaroff Feb 17, 2019
5e47944
Merge master
alexander-ponomaroff Feb 17, 2019
262208b
Merge
alexander-ponomaroff Feb 19, 2019
2fb7214
Merge
alexander-ponomaroff Feb 19, 2019
154c639
Remaining small requested fixes
alexander-ponomaroff Mar 4, 2019
8faccf8
Merge
alexander-ponomaroff Mar 4, 2019
b74dc54
Fix unwanted import error
alexander-ponomaroff Mar 4, 2019
3a20a1d
Use OrderedDict in groupby/generic.py
alexander-ponomaroff Mar 12, 2019
1557c21
Merge
alexander-ponomaroff Mar 12, 2019
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
2 changes: 1 addition & 1 deletion doc/source/whatsnew/v0.25.0.rst
Original file line number Diff line number Diff line change
Expand Up @@ -241,7 +241,7 @@ Reshaping
- Bug in :func:`pandas.merge` adds a string of ``None`` if ``None`` is assigned in suffixes instead of remain the column name as-is (:issue:`24782`).
- Bug in :func:`merge` when merging by index name would sometimes result in an incorrectly numbered index (:issue:`24212`)
- :func:`to_records` now accepts dtypes to its `column_dtypes` parameter (:issue:`24895`)
- Bug in :func:`concat` where order of ``OrderedDict`` is not respected (:issue:`21510`)
- Bug in :func:`concat` where order of ``OrderedDict`` (and ``dict`` in Python 3.6+) is not respected, when passed in as ``objs`` argument (:issue:`21510`)


Sparse
Expand Down
2 changes: 1 addition & 1 deletion pandas/core/groupby/generic.py
Original file line number Diff line number Diff line change
Expand Up @@ -822,7 +822,7 @@ def _aggregate_multiple_funcs(self, arg, _level):
columns.append(com.get_callable_name(f))
arg = lzip(columns, arg)

results = {}
results = collections.OrderedDict()
for name, func in arg:
obj = self
if name in results:
Expand Down
9 changes: 3 additions & 6 deletions pandas/tests/groupby/test_groupby.py
Original file line number Diff line number Diff line change
Expand Up @@ -1690,14 +1690,11 @@ def test_groupby_agg_ohlc_non_first():
[1, 1, 1, 1, 1],
[1, 1, 1, 1, 1]
], columns=pd.MultiIndex.from_tuples((
('foo', 'ohlc', 'open'), ('foo', 'ohlc', 'high'),
('foo', 'ohlc', 'low'), ('foo', 'ohlc', 'close'),
('foo', 'sum', 'foo'))), index=pd.date_range(
('foo', 'sum', 'foo'), ('foo', 'ohlc', 'open'),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So the only change in behavior is that the output order is now (consistently) the arguments passed to .agg?

Since we did ['sum', 'ohlc'] the output order is 'sum', 'open', 'high', 'low', 'close'?

If so, let's add an entry to the release notes, under the Groupby bug fix section, saying that .agg now respects the order of arguments when 'ohlc' is one of the aggfuncs.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This might be rather difficult to express in a whatsnew since it's really just applicable to _aggregate_multiple_funcs here.

Perhaps a follow up to consistently use OrderedDict's in the groupby module is in order?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is _aggregate_multiple_funcs called by just by functions like ohlc that have multiple outputs per input? Or is it also called by .agg([funcs])?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The latter. So this has implications to ordering which previously would have been non-deterministic in Py35.

So I think it might make sense as a follow up to use a OrderedDict and make better guarantees about the returned column order, but don't think that should hold this one up

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See #25692 as follow up

('foo', 'ohlc', 'high'), ('foo', 'ohlc', 'low'),
('foo', 'ohlc', 'close'))), index=pd.date_range(
'2018-01-01', periods=2, freq='D'))

if compat.PY36:
expected = expected[sorted(expected, key=lambda x: x[1], reverse=True)]

result = df.groupby(pd.Grouper(freq='D')).agg(['sum', 'ohlc'])

tm.assert_frame_equal(result, expected)
Expand Down