BUG: from_dict ignored order of OrderedDict (#8425) #26875

mazayo · 2019-06-16T03:15:53Z

closes BUG: DataFrame from_dict constructor ignores Ordered dict when orient='index' #8425
tests added / passed
passes git diff upstream/master -u -- "*.py" | flake8 --diff
whatsnew entry

Updated some existing tests because the index order of some DataFrames constructed by from_dict has changed after this fix.
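
A minimal sketch of the behavior this PR targets, assuming Python 3.6 or newer and a pandas build that includes the fix. The keys and values are made up for illustration and are not taken from the linked issue:

```python
from collections import OrderedDict

import pandas as pd

# Keys are deliberately inserted in non-alphabetical order.
data = OrderedDict(
    [
        ("b", {"x": 1, "y": 2}),
        ("c", {"x": 3, "y": 4}),
        ("a", {"x": 5, "y": 6}),
    ]
)

df = pd.DataFrame.from_dict(data, orient="index")

# With the fix, row labels follow insertion order instead of being sorted.
index_labels = list(df.index)
print(index_labels)
```

Before the fix, the same call would be expected to return the rows sorted by label, which is the behavior reported in #8425.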

codecov · 2019-06-16T03:55:33Z

Codecov Report

Merging #26875 into master will decrease coverage by 1.41%.
The diff coverage is 100%.

@@            Coverage Diff             @@
##           master   #26875      +/-   ##
==========================================
- Coverage   91.88%   90.46%   -1.42%     
==========================================
  Files         179      179              
  Lines       50696    50699       +3     
==========================================
- Hits        46581    45867     -714     
- Misses       4115     4832     +717

Flag	Coverage Δ
#multiple	`90.46% <100%> (ø)`	⬆️
#single	`?`

Impacted Files	Coverage Δ
pandas/core/indexes/api.py	`99% <100%> (ø)`	⬆️
pandas/core/internals/construction.py	`95.96% <100%> (+0.03%)`	⬆️
pandas/core/computation/pytables.py	`62.5% <0%> (-27.75%)`	⬇️
pandas/io/pytables.py	`63.82% <0%> (-26.48%)`	⬇️
pandas/io/gbq.py	`88.88% <0%> (-11.12%)`	⬇️
pandas/core/computation/common.py	`84.21% <0%> (-5.27%)`	⬇️
pandas/core/computation/expr.py	`94.78% <0%> (-3.03%)`	⬇️
pandas/io/clipboard/clipboards.py	`31.88% <0%> (-2.9%)`	⬇️
pandas/io/formats/printing.py	`84.49% <0%> (-1.07%)`	⬇️
pandas/core/indexes/datetimes.py	`96.21% <0%> (-0.17%)`	⬇️
... and 3 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 430f0fd...d765439. Read the comment docs.

codecov · 2019-06-16T03:55:33Z

Codecov Report

Merging #26875 into master will decrease coverage by 0.8%.
The diff coverage is 100%.

@@            Coverage Diff             @@
##           master   #26875      +/-   ##
==========================================
- Coverage   92.79%   91.99%   -0.81%     
==========================================
  Files         180      180              
  Lines       50417    50776     +359     
==========================================
- Hits        46784    46709      -75     
- Misses       3633     4067     +434

Flag	Coverage Δ
#multiple	`90.63% <100%> (-0.84%)`	⬇️
#single	`41.83% <100%> (-0.6%)`	⬇️

Impacted Files	Coverage Δ
pandas/core/internals/construction.py	`95.97% <100%> (-0.78%)`	⬇️
pandas/io/gbq.py	`88.88% <0%> (-11.12%)`	⬇️
pandas/io/excel/_openpyxl.py	`84.71% <0%> (-3.23%)`	⬇️
pandas/core/internals/blocks.py	`94.38% <0%> (-1.01%)`	⬇️
pandas/core/tools/datetimes.py	`85.05% <0%> (-0.74%)`	⬇️
pandas/core/arrays/integer.py	`96.3% <0%> (-0.59%)`	⬇️
pandas/io/formats/printing.py	`86.72% <0%> (-0.48%)`	⬇️
pandas/core/dtypes/concat.py	`96.58% <0%> (-0.46%)`	⬇️
pandas/core/config_init.py	`95.8% <0%> (-0.4%)`	⬇️
... and 154 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 2efb607...b379d8a. Read the comment docs.

topper-123 · 2019-06-16T08:27:32Z

pandas/core/indexes/api.py

@@ -125,7 +125,7 @@ def _get_combined_index(indexes, intersect=False, sort=False):
    return index


-def _union_indexes(indexes, sort=True):
+def _union_indexes(indexes, sort=True, ordered=False):


This seems wrong: why not set sort=False at the caller?

pandas/core/internals/construction.py

jreback

agree with @topper-123 , we already pass thru sort=, so pls edit to use that.

mazayo · 2019-06-16T22:16:32Z

not sure what caused this error.
I’ll work on this later tonight.

mazayo · 2019-06-20T21:16:16Z

Azure Pipelines reported some test failures on Python 3.5, and I checked some of them. The change in this PR alters the index order of a DataFrame constructed from a dict. I suppose this is expected behavior, because from_dict no longer sorts the index and a dict is not ordered in Python 3.5.

jreback

lgtm. can you merge master and ping on green.

jreback · 2019-06-21T01:45:37Z

doc/source/whatsnew/v0.25.0.rst

@@ -633,6 +633,7 @@ Indexing
 - Bug in which :meth:`DataFrame.to_csv` caused a segfault for a reindexed data frame, when the indices were single-level :class:`MultiIndex` (:issue:`26303`).
 - Fixed bug where assigning a :class:`arrays.PandasArray` to a :class:`pandas.core.frame.DataFrame` would raise error (:issue:`26390`)
 - Allow keyword arguments for callable local reference used in the :meth:`DataFrame.query` string (:issue:`26426`)
+- Bug in which :meth:`DataFrame.from_dict` ignored order of OrderedDict when orient='index' (:issue:`8425`).


can you use double backticks around OrderedDict, also around orient='index'

can you move this to Reshaping section.

Updated whatsnew and merged upstream/master.

TomAugspurger

Test failure: https://dev.azure.com/pandas-dev/pandas/_build/results?buildId=13216&view=logs&jobId=521b7dfd-2989-5ff8-bc8c-7481906480fa&taskId=07b8d9d4-6363-5e2d-bc2b-146a30521256&lineStart=70&lineEnd=70&colStart=1&colEnd=38

You may want to set up a python 3.5-based environment locally to debug. You can use, e.g.

conda env create -n pandas-azure-macos-35  --file=ci/deps/azure-macos-35.yaml

I'm not sure whether the failure is because the test was relying on that being / not being sorted, or whether concat was relying on that. It'd be good to match the behavior of master, unless it's clearly a bug.

mazayo · 2019-06-24T21:45:44Z

@TomAugspurger, thank you.
I followed your instructions to create a new env, but it raised the error below when importing pandas.

>>> import pandas as pd
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/masayo/Documents/work/contributions/pandas/pandas/__init__.py", line 25, in <module>
    from pandas._libs import (hashtable as _hashtable,
  File "/Users/masayo/Documents/work/contributions/pandas/pandas/_libs/__init__.py", line 3, in <module>
    from .tslibs import (
  File "/Users/masayo/Documents/work/contributions/pandas/pandas/_libs/tslibs/__init__.py", line 3, in <module>
    from .conversion import normalize_date, localize_pydatetime
  File "__init__.pxd", line 918, in init pandas._libs.tslibs.conversion
ValueError: numpy.ufunc size changed, may indicate binary incompatibility. Expected 216 from C header, got 192 from PyObject

For your reference, I created another env and installed Python 3.5.6 and pandas (python -m pip install -e). Building expected from the test function test_concat_tuple_keys returns a DataFrame whose index is not sorted.

>>> expected = pd.DataFrame(
...             {'A': {('bee', 'bah', 0): 1.0,
...                    ('bee', 'bah', 1): 1.0,
...                    ('bee', 'boo', 0): 2.0,
...                    ('bee', 'boo', 1): 2.0,
...                    ('bee', 'boo', 2): 2.0},
...              'B': {('bee', 'bah', 0): 1.0,
...                    ('bee', 'bah', 1): 1.0,
...                    ('bee', 'boo', 0): 2.0,
...                    ('bee', 'boo', 1): 2.0,
...                    ('bee', 'boo', 2): 2.0}})
>>> expected
             A    B
bee boo 1  2.0  2.0
        2  2.0  2.0
    bah 1  1.0  1.0
        0  1.0  1.0
    boo 0  2.0  2.0

TomAugspurger · 2019-06-25T16:41:28Z

I followed your instructions to create a new env, but it raised the error below when importing pandas.

Not sure, but if you're in the same git repo, you would need to rebuild the C extensions for that new environment. Locally I have my main pandas git repo, and a second pandas-35 repo.

returns a DataFrame whose index is not sorted.

Can you clarify: is that on this branch? When I run that in a 3.5 env on master (from a few weeks ago) I get

Python 3.5.6 |Anaconda, Inc.| (default, Aug 26 2018, 16:30:03)
[GCC 4.2.1 Compatible Clang 4.0.1 (tags/RELEASE_401/final)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import pandas as pd
>>> pd.DataFrame(
...  {'A': {('bee', 'bah', 0): 1.0,
...         ('bee', 'bah', 1): 1.0,
...         ('bee', 'boo', 0): 2.0,
...         ('bee', 'boo', 1): 2.0,
...         ('bee', 'boo', 2): 2.0},
...   'B': {('bee', 'bah', 0): 1.0,
...         ('bee', 'bah', 1): 1.0,
...         ('bee', 'boo', 0): 2.0,
...         ('bee', 'boo', 1): 2.0,
...         ('bee', 'boo', 2): 2.0}})
             A    B
bee bah 0  1.0  1.0
        1  1.0  1.0
    boo 0  2.0  2.0
        1  2.0  2.0
        2  2.0  2.0

mazayo · 2019-06-25T21:33:11Z

returns a DataFrame whose index is not sorted.

Can you clarify: is that on this branch?

Yes it is. No changes have been made after commit ae19e6c.

You’re right. I didn’t rebuild pandas after merging master.
But even after python setup.py build_ext --inplace -j 4, I still get the error below when importing pandas. Maybe it is because the files related to this error (pandas/_libs/tslibs/c_timestamp.*) are not updated by the build. Could you tell me how to do a full build?

>>> import pandas as pd
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/masayo/Documents/work/contributions/pandas/pandas/__init__.py", line 25, in <module>
    from pandas._libs import (hashtable as _hashtable,
  File "/Users/masayo/Documents/work/contributions/pandas/pandas/_libs/__init__.py", line 3, in <module>
    from .tslibs import (
  File "/Users/masayo/Documents/work/contributions/pandas/pandas/_libs/tslibs/__init__.py", line 3, in <module>
    from .conversion import normalize_date, localize_pydatetime
  File "pandas/_libs/tslibs/c_timestamp.pxd", line 7, in init pandas._libs.tslibs.conversion
    cdef class _Timestamp(datetime):
  File "__init__.pxd", line 918, in init pandas._libs.tslibs.c_timestamp
ValueError: numpy.ufunc size changed, may indicate binary incompatibility. Expected 216 from C header, got 192 from PyObject

TomAugspurger · 2019-06-25T22:05:53Z

I think `python setup.py clean` will do the trick.


mazayo · 2019-06-26T11:39:31Z

Thank you. I was able to do a full build of pandas by running python setup.py clean followed by python setup.py build_ext --inplace -j 4.

Here is what I got on this branch, in the environment created by conda env create -n pandas-azure-macos-35 --file=ci/deps/azure-macos-35.yaml.

>>> import pandas as pd
>>> pd.__version__
'0.25.0.dev0+784.gae19e6c85'
>>> expected = pd.DataFrame(
...             {'A': {('bee', 'bah', 0): 1.0,
...                    ('bee', 'bah', 1): 1.0,
...                    ('bee', 'boo', 0): 2.0,
...                    ('bee', 'boo', 1): 2.0,
...                    ('bee', 'boo', 2): 2.0},
...              'B': {('bee', 'bah', 0): 1.0,
...                    ('bee', 'bah', 1): 1.0,
...                    ('bee', 'boo', 0): 2.0,
...                    ('bee', 'boo', 1): 2.0,
...                    ('bee', 'boo', 2): 2.0}})
>>> expected
             A    B
bee boo 1  2.0  2.0
        2  2.0  2.0
    bah 1  1.0  1.0
        0  1.0  1.0
    boo 0  2.0  2.0
>>>
>>>
>>> import numpy as np
>>> df1 = pd.DataFrame(np.ones((2, 2)), columns=list('AB'))
>>> df2 = pd.DataFrame(np.ones((3, 2)) * 2, columns=list('AB'))
>>> results = pd.concat((df1, df2), keys=[('bee', 'bah'), ('bee', 'boo')])
>>> results
             A    B
bee bah 0  1.0  1.0
        1  1.0  1.0
    boo 0  2.0  2.0
        1  2.0  2.0
        2  2.0  2.0

Sorry it took me a while to get back to you. I was about to go to work when I received your message.

TomAugspurger · 2019-06-26T11:45:13Z

OK, does that output make sense to you? I haven't looked too closely, but it seems like the old dict-of-dicts constructor sorted unordered dictionaries?

mazayo · 2019-06-26T12:50:06Z

Right. extract_index is called when constructing a DataFrame from a dict. It used to sort keys before the update in this branch.

TomAugspurger

@mazayo really sorry about the delay. Hopefully my suggested change will help. This looks quite close.

TomAugspurger · 2019-07-02T21:08:12Z

pandas/core/internals/construction.py

            index = _union_indexes(indexes)
+        elif have_dicts:
+            index = _union_indexes(indexes, sort=False)


So I suppose this is the key one. IIUC, we want to sort for Python=3.5. So this should be

index = _union_indexes(indexes, sort=not compat.PY36)

That will be True for python 3.5 and False for 3.6 and newer.

You'll need to import pandas.compat at the top.

I haven't checked yet if you need to inspect the contents of the dicts to see if they're ordered or not. Hopefully you won't have to.

OK. I'll take care of this later.

Instead of

index = _union_indexes(indexes, sort=not compat.PY36)

I changed it to

index = _union_indexes(indexes, sort=not (compat.PY36 or have_ordered))

to prevent sorting an OrderedDict in Python 3.5.
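
The condition above can be read as a small truth table. The sketch below is a standalone illustration in plain Python, not pandas code: union_keys, should_sort, and the py36 argument are hypothetical stand-ins for _union_indexes, the sort expression, and compat.PY36 from the thread.

```python
from collections import OrderedDict


def union_keys(dicts, sort):
    """Union the keys of several mappings, keeping first-seen order."""
    seen = []
    for d in dicts:
        for key in d:
            if key not in seen:
                seen.append(key)
    return sorted(seen) if sort else seen


def should_sort(py36, have_ordered):
    # Sort only when key order cannot be trusted: an interpreter older
    # than 3.6 and no OrderedDict among the inputs.
    return not (py36 or have_ordered)


columns = [
    OrderedDict([("b", 1), ("a", 2)]),
    OrderedDict([("c", 3), ("a", 4)]),
]

# OrderedDict input on Python 3.5: insertion order is kept.
ordered_on_35 = union_keys(columns, sort=should_sort(py36=False, have_ordered=True))
# Plain dict input on Python 3.5: keys are sorted, matching master.
plain_on_35 = union_keys(columns, sort=should_sort(py36=False, have_ordered=False))
# Any dict on Python 3.6+: insertion order is kept.
plain_on_36 = union_keys(columns, sort=should_sort(py36=True, have_ordered=False))
```

Sorting is kept only for the one case where key order carries no meaning, which keeps Python 3.5 results for plain dicts identical to master.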

jreback

small tests comments, otherwise lgtm. ping on green.

jreback · 2019-07-05T18:42:04Z

pandas/tests/frame/test_constructors.py

@@ -517,7 +517,7 @@ def test_constructor_subclass_dict(self, float_frame):
            dct.update(v.to_dict())
            data[k] = dct
        frame = DataFrame(data)
-        tm.assert_frame_equal(float_frame.sort_index(), frame)
+        tm.assert_frame_equal(float_frame, frame.reindex(float_frame.index))


expected = frame.reindex(index=float_frame.index)

jreback · 2019-07-05T18:42:19Z

pandas/tests/frame/test_constructors.py

@@ -1342,15 +1342,28 @@ def test_constructor_list_of_namedtuples(self):
    def test_constructor_orient(self, float_string_frame):
        data_dict = float_string_frame.T._series
        recons = DataFrame.from_dict(data_dict, orient="index")
-        expected = float_string_frame.sort_index()
-        tm.assert_frame_equal(recons, expected)
+        expected = float_string_frame


put the reindex here
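
The test changes requested in these two comments compare frames whose rows may come back in a different order. A minimal sketch of that pattern, assuming made-up frames instead of the float_frame fixtures from the pandas test suite, and the public pandas.testing.assert_frame_equal in place of the internal tm helper:

```python
import pandas as pd
from pandas.testing import assert_frame_equal

# Frame whose rows are in a deliberately unsorted order.
source = pd.DataFrame({"A": [1.0, 2.0, 3.0]}, index=["c", "a", "b"])

# Simulate a reconstruction that holds the same data in another row order.
recons = pd.DataFrame({"A": [2.0, 3.0, 1.0]}, index=["a", "b", "c"])

# Align on the source's index rather than sorting both sides, so the
# comparison does not depend on any implicit sort in the constructor.
expected = source
result = recons.reindex(index=source.index)

assert_frame_equal(result, expected)
```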

jreback · 2019-07-08T01:26:55Z

thanks @mazayo

mazayo · 2019-07-08T22:24:41Z

Thanks for offering good first issues here and for your support.

topper-123 reviewed Jun 16, 2019

topper-123 added the DataFrame (DataFrame data structure) label Jun 16, 2019

jreback requested changes Jun 16, 2019

mazayo force-pushed the from_ordered_dict branch from d765439 to 18ccb11 June 16, 2019 15:25

Commit 3929945: BUG: from_dict ignored order of OrderedDict (pandas-dev#8425)

mazayo force-pushed the from_ordered_dict branch from 18ccb11 to 3929945 June 17, 2019 12:57

jreback requested changes Jun 21, 2019

jreback added this to the 0.25.0 milestone Jun 21, 2019

jreback added the Reshaping (Concat, Merge/Join, Stack/Unstack, Explode) label Jun 21, 2019

mazayo added 2 commits June 22, 2019 06:44

Commit 28f3e72: Merge remote-tracking branch 'upstream/master' into from_ordered_dict

Commit ae19e6c: Merged upstream/master

TomAugspurger reviewed Jun 24, 2019

TomAugspurger reviewed Jul 2, 2019

jreback removed this from the 0.25.0 milestone Jul 3, 2019

Commit de4406b: Changed for compatibility. Python=3.5 needs sort.

mazayo force-pushed the from_ordered_dict branch from a9ec3fa to de4406b July 4, 2019 19:55

Commit 8720697: Merge remote-tracking branch 'upstream/master' into from_ordered_dict

mazayo force-pushed the from_ordered_dict branch from b379d8a to 8720697 July 4, 2019 21:14

Commit 50c5467: Fixed linting error

Commit 7bc716b: Resolved the error found in black

jreback requested changes Jul 5, 2019

jreback added this to the 0.25.0 milestone Jul 5, 2019

Commit 74966cb: Fixed according to the review comments

jreback approved these changes Jul 8, 2019

jreback merged commit 5422807 into pandas-dev:master Jul 8, 2019
