-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
Maintain Dict Ordering with Concat #21512
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from 4 commits
87bb1ac
f9d105e
b044e04
eaa7cc0
6869e2c
e9561fe
53aca81
726755f
a7aa052
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -83,4 +83,5 @@ Bug Fixes | |
**Other** | ||
|
||
- Bug in :meth:`Series.nlargest` for signed and unsigned integer dtypes when the minimum value is present (:issue:`21426`) | ||
- | ||
- Bug in :class:`_Concatenator` should not sort Ordered Dictionaries (:issue:`21510`) | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Can you come up with some less technical? This is too focused on the technical aspect of the change, but you really should be writing something that a casual user could read and understand in the whatsnew There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Agreed. I would just say that the bug was causing There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Please, check my last commit. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Getting closer but you should remove any reference to _Concatenator as it is a private class. I would be fine with almost a copy/paste of what @gfyoung suggested above as well |
||
- | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. this needs to go into 0.24.0 as even though its a bug fix, its a jaring one, should be in reshaping. |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,7 +1,7 @@ | ||
""" | ||
concat routines | ||
""" | ||
|
||
from collections import OrderedDict | ||
import numpy as np | ||
from pandas import compat, DataFrame, Series, Index, MultiIndex | ||
from pandas.core.index import (_get_objs_combined_axis, | ||
|
@@ -250,7 +250,10 @@ def __init__(self, objs, axis=0, join='outer', join_axes=None, | |
|
||
if isinstance(objs, dict): | ||
if keys is None: | ||
keys = sorted(objs) | ||
if not isinstance(objs, OrderedDict): | ||
keys = sorted(objs) | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Normal dicts are ordered too in python3.6+, that must be checked too. There is a There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Not sure if we already took a stance on this as a project, but the orderedness of dicts in 3.6 is considered an implementation detail of cpython, while it will be a language feature starting from 3.7 . So in principle in 3.6 users should not rely on it, and we should not assume they do. @topper-123 's comment still holds I think, but for 3.7 There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Yeah, given that Python 3.6 is at 3.6.5 now, and its's being formalised in 3.7, I think this is quite safe. They're not going to change implementation now... There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This is very simliar to in structure to There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. @jreback you mean use: There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. yes that's the point i am raising, I want a refactor here. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. So, from |
||
else: | ||
keys = objs | ||
objs = [objs[k] for k in keys] | ||
else: | ||
objs = list(objs) | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,3 +1,4 @@ | ||
from collections import OrderedDict | ||
from warnings import catch_warnings | ||
from itertools import combinations, product | ||
|
||
|
@@ -1294,6 +1295,16 @@ def test_concat_rename_index(self): | |
tm.assert_frame_equal(result, exp) | ||
assert result.index.names == exp.index.names | ||
|
||
def test_concat_with_ordered_dict(self): | ||
# GH 21510 | ||
ps = pd.concat(OrderedDict([('First', pd.Series(range(3))), | ||
('Another', pd.Series(range(4)))])) | ||
ps_list = list(ps.keys()) | ||
exp_list = [('First', 0), ('First', 1), ('First', 2), | ||
('Another', 0), ('Another', 1), | ||
('Another', 2), ('Another', 3)] | ||
assert ps_list == exp_list | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Typically with test cases we have a pattern of using There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. It is ok if I just change the variables names (expected and result), or I should also create the Series object? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Create the Series There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I'm trying to figure it out how I can create the Series with different sizes without using the concat by itself. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I tried to create like this:
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Done. |
||
|
||
def test_crossed_dtypes_weird_corner(self): | ||
columns = ['A', 'B', 'C', 'D'] | ||
df1 = DataFrame({'A': np.array([1, 2, 3, 4], dtype='f8'), | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you picked up some extra text here. also move to 0.24