BUG: DataFrame from_dict constructor ignores Ordered dict when orient='index' #8425

Closed
aimboden opened this issue Sep 30, 2014 · 8 comments · Fixed by #26875
Labels
Bug good first issue Reshaping Concat, Merge/Join, Stack/Unstack, Explode
Milestone

Comments

@aimboden

Hello,
I have been experimenting with OrderedDicts lately and found a bug in the DataFrame.from_dict constructor. Here is some sample code:

import collections
import pandas as pd

firstrow={}
firstrow['foo'] = 'bar'
firstrow['baz'] = 'buzz'

row1 = pd.Series(firstrow)

secondrow={}
secondrow['foo'] = 'bar2'
secondrow['baz'] = 'buzz2'

row2 = pd.Series(secondrow)

roworder = collections.OrderedDict()

roworder['zShould be first'] = row1
roworder['Should be second'] = row2

# Insertion order is respected with orient='columns'
df = pd.DataFrame.from_dict(data=roworder, orient='columns')

# But not with orient='index': the rows come back sorted by key
incorrectdf = pd.DataFrame.from_dict(data=roworder, orient='index')
# Transposing the column-oriented frame gives the expected row order
correctdf = df.transpose()
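
A compact way to check whether an installed pandas version is affected is to compare the resulting index with the dict's own key order. The following is a sketch that is not part of the original report; the helper name `rows_keep_insertion_order` is made up for illustration and is not a pandas API.

```python
import collections

import pandas as pd


def rows_keep_insertion_order(data):
    """Return True if from_dict(orient='index') keeps the dict's key order.

    Hypothetical helper for illustration only.
    """
    frame = pd.DataFrame.from_dict(data, orient='index')
    return list(frame.index) == list(data)


roworder = collections.OrderedDict()
roworder['zShould be first'] = pd.Series({'foo': 'bar', 'baz': 'buzz'})
roworder['Should be second'] = pd.Series({'foo': 'bar2', 'baz': 'buzz2'})

# The transpose route from the report, used here as the reference ordering.
reference = pd.DataFrame.from_dict(roworder, orient='columns').transpose()

print('reference row order:', list(reference.index))
print("orient='index' keeps order:", rows_keep_insertion_order(roworder))
```

The second line printed depends on the pandas version in use, so no expected value is given for it.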

INSTALLED VERSIONS

commit: None
python: 3.3.5.final.0
python-bits: 64
OS: Windows
OS-release: 7
machine: AMD64
processor: Intel64 Family 6 Model 26 Stepping 5, GenuineIntel
byteorder: little
LC_ALL: None
LANG: fr_CH

pandas: 0.14.1
nose: 1.3.4
Cython: 0.20.1
numpy: 1.9.0
scipy: 0.13.3
statsmodels: 0.5.0
IPython: 2.2.0
sphinx: 1.2.3
patsy: 0.3.0
scikits.timeseries: None
dateutil: 2.2
pytz: 2013.9
bottleneck: None
tables: 3.1.1
numexpr: 2.4
matplotlib: 1.4.0
openpyxl: None
xlrd: 0.9.3
xlwt: None
xlsxwriter: 0.5.7
lxml: None
bs4: None
html5lib: None
httplib2: None
apiclient: None
rpy2: None
sqlalchemy: 0.9.4
pymysql: None
psycopg2: None

@jreback
Contributor

jreback commented Sep 30, 2014

Can you make your code runnable (so it can simply be copy/pasted)? You have some undefined variables.

@aimboden
Author

Sorry about that! Should be fine now. If not, will check when back in the office tomorrow.

EDIT: the code now reproduces the above mentioned bug

@jreback jreback added Bug Reshaping Concat, Merge/Join, Stack/Unstack, Explode labels Oct 1, 2014
@jreback jreback added this to the 0.16 milestone Oct 1, 2014
@jreback
Contributor

jreback commented Oct 1, 2014

@Gimli510 that does look buggy.

welcome a pull-request to fix.

You can use your test example above: just step through the code, see where it's breaking, and try a fix.

@aimboden
Author

aimboden commented Oct 2, 2014

@jreback I think I found where the bug comes from.
The function _union_indexes calls lib.fast_unique_multiple_list(indexes), which sorts the keys before returning them. Should we carry a flag telling this Cython function not to sort the keys when the indexes list was created from an ordered dict? I guess there is a cleaner way to do this, but I don't really have any idea how to go about it.

# Up to this point, the future index is ordered as it should.
indexes = [['zShould be first', 'Should be second'], ['zShould be first', 'Should be second']]
# When indexes is a list with more than one item, we hit this path:
# return Index(lib.fast_unique_multiple_list(indexes))

# However, 
lib.fast_unique_multiple_list(indexes)

returns

['Should be second', 'zShould be first']
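
The difference between a sorting union and an order-preserving union can be illustrated in plain Python. This is a sketch only: `sorted_union` and `ordered_union` are hypothetical names, and neither is the actual Cython code in lib.pyx.

```python
def sorted_union(indexes):
    """Union of all labels, sorted at the end (the behaviour described above)."""
    uniques = set()
    for labels in indexes:
        uniques.update(labels)
    return sorted(uniques)


def ordered_union(indexes):
    """Union of all labels in first-seen order, with no final sort."""
    seen = set()
    result = []
    for labels in indexes:
        for label in labels:
            if label not in seen:
                seen.add(label)
                result.append(label)
    return result


indexes = [['zShould be first', 'Should be second'],
           ['zShould be first', 'Should be second']]

print(sorted_union(indexes))   # ['Should be second', 'zShould be first']
print(ordered_union(indexes))  # ['zShould be first', 'Should be second']
```

Uppercase 'S' sorts before lowercase 'z', which is why the sorted variant swaps the two labels.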

@jreback
Contributor

jreback commented Oct 2, 2014

I think this should be handled in extract_index in pandas/core/frame.py. Need to differentiate between a dict and an OrderedDict.

maybe add in a have_ordered in addition to setting have_dict. Then you can pass this to _union_indexes(indexes,ordered=have_ordered)

Then, if ordered=True is passed (the default is False), you can do a unique that preserves order (so pass the flag into fast_unique_multiple, in other words don't sort).
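
The flag-passing idea above might look roughly like the following. It is a pure-Python sketch under stated assumptions, not the pandas implementation: `extract_index` and `union_indexes` here are simplified stand-ins that only borrow the names from the discussion, and the input is assumed to be a dict of dicts whose inner keys become the index labels.

```python
from collections import OrderedDict


def union_indexes(indexes, ordered=False):
    """Stand-in for the union step; sorts unless ordered=True."""
    flat = [label for labels in indexes for label in labels]
    if ordered:
        # OrderedDict.fromkeys drops duplicates and keeps first-seen order.
        return list(OrderedDict.fromkeys(flat))
    return sorted(set(flat))


def extract_index(data):
    """Stand-in: gather the inner keys and choose the union mode."""
    # Assumption: treat the input as ordered only if every inner
    # mapping is an OrderedDict.
    have_ordered = all(isinstance(inner, OrderedDict) for inner in data.values())
    indexes = [list(inner) for inner in data.values()]
    return union_indexes(indexes, ordered=have_ordered)


ordered_input = OrderedDict([
    ('foo', OrderedDict([('z', 1), ('a', 2)])),
    ('baz', OrderedDict([('z', 3), ('a', 4)])),
])
plain_input = {'foo': {'z': 1, 'a': 2}}

print(extract_index(ordered_input))  # ['z', 'a']
print(extract_index(plain_input))    # ['a', 'z']
```

Keeping the default at ordered=False leaves the behaviour for plain dicts unchanged, which is the point of threading a flag rather than removing the sort outright.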

@hamedhsn

@jreback
I have made the changes you described. For the last part, how can I pass the flag to fast_unique_multiple? It calls fast_unique_multiple_list(*args, **kwargs), and when I look at lib.pyx it always sorts the list at the end (uniques.sort()).
Any idea?

@alichaudry

@jreback is this still an issue in the current version of pandas? I'm seeing the problem on an older version (v0.16.2) and I'm not sure if it's been addressed in the current one.

df = pd.DataFrame.from_dict(ordered_dict_data, orient='index') 

sorts the index alphabetically. I've been using the following hack to address it:

df = pd.DataFrame.from_dict(ordered_dict_data, orient='columns').T

My hack, however, sorts the columns alphabetically.

For the data that I have, it's easier for me to re-order these columns so the latter solution works better. To be precise, my data is an OrderedDict of OrderedDicts so I expect the sort order of both the index and columns to be respected. It looks something like this:

from collections import OrderedDict

data = OrderedDict([
    ('a', OrderedDict([('aa', 5), ('bb', 10)])),
    ('b', OrderedDict([('aa', 7), ('bb', 14)])),
    # ...
])

If it's not fixed, I can take a stab at it.
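
Another workaround that does not depend on how from_dict orders either axis is to reindex explicitly from the dict's own keys. The sketch below is an assumption-laden example rather than a recommended fix: the sample data is invented, and the column order is taken from the first inner dict on the assumption that all inner dicts share the same keys.

```python
from collections import OrderedDict

import pandas as pd

# Invented sample data with deliberately non-alphabetical key order.
data = OrderedDict([
    ('b', OrderedDict([('bb', 14), ('aa', 7)])),
    ('a', OrderedDict([('bb', 10), ('aa', 5)])),
])

row_order = list(data)
# Assumption: every inner dict has the same keys, so the first one
# is enough to define the column order.
column_order = list(next(iter(data.values())))

df = pd.DataFrame.from_dict(data, orient='index')
# Reindexing restores the intended order on both axes, whatever
# order from_dict produced.
df = df.reindex(index=row_order, columns=column_order)

print(df)
```

Because every label already exists in the frame, the reindex only reorders rows and columns and does not introduce missing values.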

@TomAugspurger
Contributor

Still an open issue.

@datapythonista datapythonista modified the milestones: Contributions Welcome, Someday Jul 8, 2018
jzwinck added a commit to jzwinck/pandas that referenced this issue Jul 28, 2018
This removes the deprecation warnings introduced in pandas-dev#18262,
by reimplementing DataFrame.from_items() in the recommended
way using DataFrame.from_dict() and collections.OrderedDict.

This eliminates the maintenance burden of separate code for
from_items(), while allowing existing uses to keep working.

A small cleanup can be done once pandas-dev#8425 is fixed.
mazayo added a commit to mazayo/pandas that referenced this issue Jun 15, 2019
mazayo added a commit to mazayo/pandas that referenced this issue Jun 16, 2019
mazayo added a commit to mazayo/pandas that referenced this issue Jun 17, 2019
@jreback jreback modified the milestones: Someday, 0.25.0 Jun 21, 2019
@jreback jreback modified the milestones: 0.25.0, Contributions Welcome Jul 3, 2019
6 participants