BUG: DataFrame from_dict constructor ignores Ordered dict when orient='index' #8425

aimboden · 2014-09-30T08:43:29Z

Hello,
I have been experimenting with OrderedDicts lately and found a bug in the DataFrame.from_dict constructor. Here is some sample code:

import collections
import pandas as pd

firstrow={}
firstrow['foo'] = 'bar'
firstrow['baz'] = 'buzz'

row1 = pd.Series(firstrow)

secondrow={}
secondrow['foo'] = 'bar2'
secondrow['baz'] = 'buzz2'

row2 = pd.Series(secondrow)

roworder = collections.OrderedDict()

roworder['zShould be first'] = row1
roworder['Should be second'] = row2

# Key order is respected with orient='columns'
df = pd.DataFrame.from_dict(data=roworder, orient='columns')

# But not with orient='index'
incorrectdf = pd.DataFrame.from_dict(data=roworder, orient='index')
correctdf = df.transpose()
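
To make the difference visible, the resulting row labels can be compared against the insertion order of the dict. The following is a minimal self-contained sketch of the same reproduction; the helper name `row_order_matches` is hypothetical and not part of pandas.

```python
import collections
import pandas as pd


def row_order_matches(frame, ordered_data):
    """Return True if the frame's row labels follow the dict's insertion order."""
    return list(frame.index) == list(ordered_data.keys())


roworder = collections.OrderedDict()
roworder['zShould be first'] = pd.Series({'foo': 'bar', 'baz': 'buzz'})
roworder['Should be second'] = pd.Series({'foo': 'bar2', 'baz': 'buzz2'})

# Transposing the orient='columns' result keeps the insertion order.
via_transpose = pd.DataFrame.from_dict(roworder, orient='columns').transpose()

# On the pandas version reported here (0.14.1) this is where the order is lost.
via_index = pd.DataFrame.from_dict(roworder, orient='index')

print(row_order_matches(via_transpose, roworder))
print(row_order_matches(via_index, roworder))
```

The second result depends on the installed pandas version, so no expected output is given for it.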

INSTALLED VERSIONS

commit: None
python: 3.3.5.final.0
python-bits: 64
OS: Windows
OS-release: 7
machine: AMD64
processor: Intel64 Family 6 Model 26 Stepping 5, GenuineIntel
byteorder: little
LC_ALL: None
LANG: fr_CH

pandas: 0.14.1
nose: 1.3.4
Cython: 0.20.1
numpy: 1.9.0
scipy: 0.13.3
statsmodels: 0.5.0
IPython: 2.2.0
sphinx: 1.2.3
patsy: 0.3.0
scikits.timeseries: None
dateutil: 2.2
pytz: 2013.9
bottleneck: None
tables: 3.1.1
numexpr: 2.4
matplotlib: 1.4.0
openpyxl: None
xlrd: 0.9.3
xlwt: None
xlsxwriter: 0.5.7
lxml: None
bs4: None
html5lib: None
httplib2: None
apiclient: None
rpy2: None
sqlalchemy: 0.9.4
pymysql: None
psycopg2: None

jreback · 2014-09-30T18:20:28Z

Can you make your code runnable (so it can simply be copied and pasted)? You have some undefined variables.

aimboden · 2014-09-30T20:32:16Z

Sorry about that! Should be fine now. If not, will check when back in the office tomorrow.

EDIT: the code now reproduces the above mentioned bug

jreback · 2014-10-01T17:37:15Z

@Gimli510 that does look buggy.

A pull request to fix this would be welcome.

You can use your test example above: just step through the code, see where it's breaking, and try a fix.

aimboden · 2014-10-02T07:51:34Z

@jreback I think I found where the bug comes from.
The function _union_indexes calls lib.fast_unique_multiple_list(indexes), which sorts the keys before returning them. Should we carry a flag telling this Cython function not to sort the keys when the indexes list was created from an ordered dict? There is probably a cleaner way to do this, but I don't really have any idea how to go about it.

# Up to this point, the future index is ordered as it should be.
indexes = [['zShould be first', 'Should be second'], ['zShould be first', 'Should be second']]
# When indexes is a list with more than one item, we hit this path:
# return Index(lib.fast_unique_multiple_list(indexes))

# However, 
lib.fast_unique_multiple_list(indexes)

returns

['Should be second', 'zShould be first']
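
The effect can be reproduced without the Cython helper. The sketch below is a pure-Python stand-in, not the actual implementation in lib.pyx, and the function name `sorted_unique_labels` is hypothetical. It collects unique labels in first-seen order and then sorts them, which is the step that discards the original order.

```python
def sorted_unique_labels(lists):
    """Stand-in for the behaviour described above: de-duplicate, then sort."""
    seen = set()
    uniques = []
    for labels in lists:
        for label in labels:
            if label not in seen:
                seen.add(label)
                uniques.append(label)
    # Up to here 'uniques' is still in first-seen order; the sort changes that.
    uniques.sort()
    return uniques


indexes = [['zShould be first', 'Should be second'],
           ['zShould be first', 'Should be second']]
print(sorted_unique_labels(indexes))
# ['Should be second', 'zShould be first']
```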

jreback · 2014-10-02T11:54:09Z

I think this should be handled in pandas/core/frame.py, in extract_index. It needs to differentiate between a dict and an OrderedDict.

Maybe add a have_ordered flag in addition to setting have_dict. Then you can pass this to _union_indexes(indexes, ordered=have_ordered).

If ordered=True is passed (the default being False), do an order-preserving unique instead, i.e. pass the flag on to fast_unique_multiple so that it doesn't sort.
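
The suggestion above could look roughly like this. It is a pure-Python sketch and not the pandas code: `union_labels` and `labels_for` are hypothetical stand-ins for _union_indexes and its caller, and `ordered` is the proposed flag.

```python
from collections import OrderedDict
from itertools import chain


def union_labels(indexes, ordered=False):
    """Union of several label lists; keeps first-seen order only if ordered=True."""
    # OrderedDict.fromkeys de-duplicates while remembering first-seen order.
    uniques = list(OrderedDict.fromkeys(chain.from_iterable(indexes)))
    if not ordered:
        uniques.sort()  # the default mirrors the existing sorting behaviour
    return uniques


def labels_for(data, indexes):
    """Hypothetical caller: only request ordering when the input is an OrderedDict."""
    have_ordered = isinstance(data, OrderedDict)
    return union_labels(indexes, ordered=have_ordered)


indexes = [['zShould be first', 'Should be second'],
           ['zShould be first', 'Should be second']]
print(labels_for(OrderedDict(), indexes))
print(labels_for({}, indexes))
```

Keeping the default at False means plain dict input behaves as before, and only OrderedDict input opts into the order-preserving path.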

hamedhsn · 2015-09-29T21:55:07Z

@jreback
I have made the changes you described, but I am stuck on the last part: how can I pass the flag to fast_unique_multiple? It calls fast_unique_multiple_list(*args, **kwargs), and looking at lib.pyx, it always sorts the list at the end (uniques.sort()).
Any idea?

alichaudry · 2017-02-17T20:05:05Z

@jreback is this still an issue in the current version of pandas? I'm seeing the problem on an older version (v0.16.2) and I'm not sure if it's been addressed in the current one.

df = pd.DataFrame.from_dict(ordered_dict_data, orient='index')

sorts the index alphabetically. I've been using the following hack to address it:

df = pd.DataFrame.from_dict(ordered_dict_data, orient='columns').T

My hack, however, sorts the columns alphabetically.

For the data that I have, it's easier for me to re-order these columns so the latter solution works better. To be precise, my data is an OrderedDict of OrderedDicts so I expect the sort order of both the index and columns to be respected. It looks something like this:

from collections import OrderedDict

data = OrderedDict([
    ('a', OrderedDict([('aa', 5), ('bb', 10)])),
    ('b', OrderedDict([('aa', 7), ('bb', 14)])),
    # ...
])

If it's not fixed, I can take a stab at it.
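
Until the constructor respects the order, one possible workaround for the nested OrderedDict case is to reindex the result explicitly on both axes. This is a sketch under stated assumptions: the sample keys and values are made up for illustration, and the column order is taken from the first inner dict, which assumes every row has the same keys.

```python
from collections import OrderedDict

import pandas as pd

# Illustrative data whose keys are deliberately not in alphabetical order.
data = OrderedDict([
    ('b', OrderedDict([('bb', 14), ('aa', 7)])),
    ('a', OrderedDict([('bb', 10), ('aa', 5)])),
])

row_order = list(data.keys())
# Assumption: all inner dicts share the key order of the first one.
col_order = list(next(iter(data.values())).keys())

df = pd.DataFrame.from_dict(data, orient='index')
# Restore the insertion order regardless of what the constructor produced.
df = df.reindex(index=row_order, columns=col_order)
print(df)
```

On versions where the order is already preserved, the reindex call simply leaves the frame unchanged.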

TomAugspurger · 2018-07-06T21:50:50Z

Still an open issue.

This removes the deprecation warnings introduced in pandas-dev#18262, by reimplementing DataFrame.from_items() in the recommended way using DataFrame.from_dict() and collections.OrderedDict. This eliminates the maintenance burden of separate code for from_items(), while allowing existing uses to keep working. A small cleanup can be done once pandas-dev#8425 is fixed.
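
The approach described in that reference can be sketched as follows. `from_items_compat` is a hypothetical name, not the code from the referenced change, and it deliberately leaves out the `columns` argument that from_items also accepted.

```python
from collections import OrderedDict

import pandas as pd


def from_items_compat(items, orient='columns'):
    """Hypothetical stand-in for DataFrame.from_items, built on from_dict."""
    # OrderedDict keeps the (key, value) pairs in the order they were given.
    return pd.DataFrame.from_dict(OrderedDict(items), orient=orient)


items = [('b', [1, 2]), ('a', [3, 4])]
df = from_items_compat(items)
print(list(df.columns))
```

With orient='index' the same call goes through the code path this issue is about, which is presumably why the cleanup is tied to the fix.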

jreback added Bug Reshaping Concat, Merge/Join, Stack/Unstack, Explode labels Oct 1, 2014

jreback added this to the 0.16 milestone Oct 1, 2014

jreback modified the milestones: 0.16, 0.15.1 Oct 7, 2014

jreback modified the milestones: 0.16.0, Next Major Release Mar 6, 2015

jreback mentioned this issue Jul 6, 2015

DataFrame constructor ignores key order when data is an OrderedDict and orient is 'columns' #10514

Closed

jreback added Difficulty Novice labels Jul 6, 2015

TomAugspurger added the good first issue label Oct 11, 2017

jreback added good first issue and removed good first issue Difficulty Novice labels Dec 15, 2017

datapythonista modified the milestones: Contributions Welcome, Someday Jul 8, 2018

jzwinck mentioned this issue Jul 28, 2018

MAINT: refactor from_items() using from_dict() #22094

Closed

4 tasks

jzwinck mentioned this issue Jul 30, 2018

Un-deprecate DataFrame.from_items() #21850

Closed

mazayo added a commit to mazayo/pandas that referenced this issue Jun 15, 2019

BUG: from_dict ignored order of OrderedDict (pandas-dev#8425)

fc48439

mazayo added a commit to mazayo/pandas that referenced this issue Jun 16, 2019

BUG: from_dict ignored order of OrderedDict (pandas-dev#8425)

1c3b0dc

mazayo added a commit to mazayo/pandas that referenced this issue Jun 16, 2019

BUG: from_dict ignored order of OrderedDict (pandas-dev#8425)

ef8bbf8

mazayo added a commit to mazayo/pandas that referenced this issue Jun 16, 2019

BUG: from_dict ignored order of OrderedDict (pandas-dev#8425)

d765439

mazayo mentioned this issue Jun 16, 2019

BUG: from_dict ignored order of OrderedDict (#8425) #26875

Merged

4 tasks

mazayo added a commit to mazayo/pandas that referenced this issue Jun 16, 2019

BUG: from_dict ignored order of OrderedDict (pandas-dev#8425)

18ccb11

mazayo added a commit to mazayo/pandas that referenced this issue Jun 17, 2019

BUG: from_dict ignored order of OrderedDict (pandas-dev#8425)

3929945

jreback modified the milestones: Someday, 0.25.0 Jun 21, 2019

jreback modified the milestones: 0.25.0, Contributions Welcome Jul 3, 2019

jreback closed this as completed in #26875 Jul 8, 2019

jreback pushed a commit that referenced this issue Jul 8, 2019

BUG: from_dict ignored order of OrderedDict (#8425) (#26875)

5422807
