-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
BUG: from_dict ignored order of OrderedDict (#8425) #26875
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Codecov Report
@@ Coverage Diff @@
## master #26875 +/- ##
==========================================
- Coverage 91.88% 90.46% -1.42%
==========================================
Files 179 179
Lines 50696 50699 +3
==========================================
- Hits 46581 45867 -714
- Misses 4115 4832 +717
Continue to review full report at Codecov.
|
Codecov Report
@@ Coverage Diff @@
## master #26875 +/- ##
==========================================
- Coverage 92.79% 91.99% -0.81%
==========================================
Files 180 180
Lines 50417 50776 +359
==========================================
- Hits 46784 46709 -75
- Misses 3633 4067 +434
Continue to review full report at Codecov.
|
pandas/core/indexes/api.py
Outdated
@@ -125,7 +125,7 @@ def _get_combined_index(indexes, intersect=False, sort=False): | |||
return index | |||
|
|||
|
|||
def _union_indexes(indexes, sort=True): | |||
def _union_indexes(indexes, sort=True, ordered=False): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems wrong: why not set sort=False
at the caller?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
agree with @topper-123 , we already pass thru sort=
, so pls edit to use that.
d765439
to
18ccb11
Compare
not sure what caused this error. |
18ccb11
to
3929945
Compare
Azure pipelines raised some test errors in Python 3.5 and I checked some of them. The change in this PR made the difference in order of DataFrame index if it is constructed from a dict. I suppose this is expected behavior, because |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm. can you merge master and ping on green.
doc/source/whatsnew/v0.25.0.rst
Outdated
@@ -633,6 +633,7 @@ Indexing | |||
- Bug in which :meth:`DataFrame.to_csv` caused a segfault for a reindexed data frame, when the indices were single-level :class:`MultiIndex` (:issue:`26303`). | |||
- Fixed bug where assigning a :class:`arrays.PandasArray` to a :class:`pandas.core.frame.DataFrame` would raise error (:issue:`26390`) | |||
- Allow keyword arguments for callable local reference used in the :meth:`DataFrame.query` string (:issue:`26426`) | |||
- Bug in which :meth:`DataFrame.from_dict` ignored order of OrderedDict when orient='index' (:issue:`8425`). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you use double backticks around OrderedDict, also around orient='index'
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you move this to Reshaping section.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated whatsnew and merged upstream/master.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You may want to set up a python 3.5-based environment locally to debug. You can use, e.g.
conda env create -n pandas-azure-macos-35 --file=ci/deps/azure-macos-35.yaml
I'm not sure whether the failure is because the test was relying on that being / not being sorted, or whether concat was relying on that. It'd be good to match the behavior of master, unless it's clearly a bug.
@TomAugspurger, thank you. >>> import pandas as pd
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/Users/masayo/Documents/work/contributions/pandas/pandas/__init__.py", line 25, in <module>
from pandas._libs import (hashtable as _hashtable,
File "/Users/masayo/Documents/work/contributions/pandas/pandas/_libs/__init__.py", line 3, in <module>
from .tslibs import (
File "/Users/masayo/Documents/work/contributions/pandas/pandas/_libs/tslibs/__init__.py", line 3, in <module>
from .conversion import normalize_date, localize_pydatetime
File "__init__.pxd", line 918, in init pandas._libs.tslibs.conversion
ValueError: numpy.ufunc size changed, may indicate binary incompatibility. Expected 216 from C header, got 192 from PyObject FYR, I created another env and installed Python 3.5.6 and pandas ( >>> expected = pd.DataFrame(
... {'A': {('bee', 'bah', 0): 1.0,
... ('bee', 'bah', 1): 1.0,
... ('bee', 'boo', 0): 2.0,
... ('bee', 'boo', 1): 2.0,
... ('bee', 'boo', 2): 2.0},
... 'B': {('bee', 'bah', 0): 1.0,
... ('bee', 'bah', 1): 1.0,
... ('bee', 'boo', 0): 2.0,
... ('bee', 'boo', 1): 2.0,
... ('bee', 'boo', 2): 2.0}})
>>> expected
A B
bee boo 1 2.0 2.0
2 2.0 2.0
bah 1 1.0 1.0
0 1.0 1.0
boo 0 2.0 2.0 |
Not sure, but if you're in the same git repo, you would need to rebuild the C extensions for that new environment. Locally I have my main
Can you clarify: is that on this branch? When I run that in a 3.5 env on master (from a few weeks ago) I get Python 3.5.6 |Anaconda, Inc.| (default, Aug 26 2018, 16:30:03)
[GCC 4.2.1 Compatible Clang 4.0.1 (tags/RELEASE_401/final)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import pandas as pd
>>> pd.DataFrame(
... {'A': {('bee', 'bah', 0): 1.0,
... ('bee', 'bah', 1): 1.0,
... ('bee', 'boo', 0): 2.0,
... ('bee', 'boo', 1): 2.0,
... ('bee', 'boo', 2): 2.0},
... 'B': {('bee', 'bah', 0): 1.0,
... ('bee', 'bah', 1): 1.0,
... ('bee', 'boo', 0): 2.0,
... ('bee', 'boo', 1): 2.0,
... ('bee', 'boo', 2): 2.0}})
A B
bee bah 0 1.0 1.0
1 1.0 1.0
boo 0 2.0 2.0
1 2.0 2.0
2 2.0 2.0 |
Yes it is. No change have been made after commit ae19e6c. You’re right. I didn’t rebuild pandas after merging master. >>> import pandas as pd
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/Users/masayo/Documents/work/contributions/pandas/pandas/__init__.py", line 25, in <module>
from pandas._libs import (hashtable as _hashtable,
File "/Users/masayo/Documents/work/contributions/pandas/pandas/_libs/__init__.py", line 3, in <module>
from .tslibs import (
File "/Users/masayo/Documents/work/contributions/pandas/pandas/_libs/tslibs/__init__.py", line 3, in <module>
from .conversion import normalize_date, localize_pydatetime
File "pandas/_libs/tslibs/c_timestamp.pxd", line 7, in init pandas._libs.tslibs.conversion
cdef class _Timestamp(datetime):
File "__init__.pxd", line 918, in init pandas._libs.tslibs.c_timestamp
ValueError: numpy.ufunc size changed, may indicate binary incompatibility. Expected 216 from C header, got 192 from PyObject |
I think `python setup.py clean` will do the trick.
…On Tue, Jun 25, 2019 at 4:33 PM mazayo ***@***.***> wrote:
returns a DataFrame whose index is not sorted.
Can you clarify: is that on this branch?
Yes it is. No change have been made after commit ae19e6c
<ae19e6c>
.
You’re right. I didn’t rebuild pandas after merging master.
But even after python setup.py build_ext --inplace -j 4, I still get
below error when importing pandas. Maybe it it because the files (pandas
/_libs/tslibs/c_timestamp.*) related to this error is not updated after
the build. Could you tell me how to full build?
>>> import pandas as pd
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/Users/masayo/Documents/work/contributions/pandas/pandas/__init__.py", line 25, in <module>
from pandas._libs import (hashtable as _hashtable,
File "/Users/masayo/Documents/work/contributions/pandas/pandas/_libs/__init__.py", line 3, in <module>
from .tslibs import (
File "/Users/masayo/Documents/work/contributions/pandas/pandas/_libs/tslibs/__init__.py", line 3, in <module>
from .conversion import normalize_date, localize_pydatetime
File "pandas/_libs/tslibs/c_timestamp.pxd", line 7, in init pandas._libs.tslibs.conversion
cdef class _Timestamp(datetime):
File "__init__.pxd", line 918, in init pandas._libs.tslibs.c_timestamp
ValueError: numpy.ufunc size changed, may indicate binary incompatibility. Expected 216 from C header, got 192 from PyObject
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
<#26875?email_source=notifications&email_token=AAKAOIUG7S3E6AHWVYUOYTTP4KFKBA5CNFSM4HYQIEH2YY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGODYRVDJI#issuecomment-505631141>,
or mute the thread
<https://github.com/notifications/unsubscribe-auth/AAKAOIUYAMYSLZSB5O74Q6DP4KFKBANCNFSM4HYQIEHQ>
.
|
Thank you. I could successfully full build pandas by Here is what I got in this branch and in the the environment created by >>> import pandas as pd
>>> pd.__version__
'0.25.0.dev0+784.gae19e6c85'
>>> expected = pd.DataFrame(
... {'A': {('bee', 'bah', 0): 1.0,
... ('bee', 'bah', 1): 1.0,
... ('bee', 'boo', 0): 2.0,
... ('bee', 'boo', 1): 2.0,
... ('bee', 'boo', 2): 2.0},
... 'B': {('bee', 'bah', 0): 1.0,
... ('bee', 'bah', 1): 1.0,
... ('bee', 'boo', 0): 2.0,
... ('bee', 'boo', 1): 2.0,
... ('bee', 'boo', 2): 2.0}})
>>> expected
A B
bee boo 1 2.0 2.0
2 2.0 2.0
bah 1 1.0 1.0
0 1.0 1.0
boo 0 2.0 2.0
>>>
>>>
>>> import numpy as np
>>> df1 = pd.DataFrame(np.ones((2, 2)), columns=list('AB'))
>>> df2 = pd.DataFrame(np.ones((3, 2)) * 2, columns=list('AB'))
>>> results = pd.concat((df1, df2), keys=[('bee', 'bah'), ('bee', 'boo')])
>>> results
A B
bee bah 0 1.0 1.0
1 1.0 1.0
boo 0 2.0 2.0
1 2.0 2.0
2 2.0 2.0 Sorry it took me a while to get back to you. I was about to go work when I received you message. |
OK, does that output make sense to out? I haven't looked to closely, but it seems like the old dict-of-dicts constructor sorted unordered dictionaries? |
Right. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@mazayo really sorry about the delay. Hopefully my suggested change will help. This looks quite close.
index = _union_indexes(indexes) | ||
elif have_dicts: | ||
index = _union_indexes(indexes, sort=False) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So I suppose this is the key one. IIUC, we want to sort for Python=3.5. So this should be
index = _union_indexes(indexes, sort=not compat.PY36)
That will be True for python 3.5 and False for 3.6 and newer.
You'll need to import pandas.compat
at the top.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I haven't checked yet if you need to inspect the contents of the dicts to see if they're ordered or not. Hopefully you won't have to.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK. I'll take care of this later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead of
index = _union_indexes(indexes, sort=not compat.PY36)
changed to
index = _union_indexes(indexes, sort=not (compat.PY36 or have_ordered))
to prevent sorting a OrderedDict in Python3.5
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
small tests comments, otherwise lgtm. ping on green.
@@ -517,7 +517,7 @@ def test_constructor_subclass_dict(self, float_frame): | |||
dct.update(v.to_dict()) | |||
data[k] = dct | |||
frame = DataFrame(data) | |||
tm.assert_frame_equal(float_frame.sort_index(), frame) | |||
tm.assert_frame_equal(float_frame, frame.reindex(float_frame.index)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
expected = frame.reindex(index=float_frame.index)
@@ -1342,15 +1342,28 @@ def test_constructor_list_of_namedtuples(self): | |||
def test_constructor_orient(self, float_string_frame): | |||
data_dict = float_string_frame.T._series | |||
recons = DataFrame.from_dict(data_dict, orient="index") | |||
expected = float_string_frame.sort_index() | |||
tm.assert_frame_equal(recons, expected) | |||
expected = float_string_frame |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
put the reindex here
thanks @mazayo |
Thanks for offering good first issues here and your support |
git diff upstream/master -u -- "*.py" | flake8 --diff
Updated some existing test codes because index order in some DataFrames constructed by from_dict have changed after this fix.