`DataFrame.from_dict()` does not behave as documented. #12387

cswarth · 2016-02-19T00:44:06Z

DataFrame.from_dict does not seem to behave according to the documentation.

import pandas as pd

print("pandas.from_dict(<dict>, orient='index')")
print(pd.DataFrame.from_dict(dict([ ['key1', 1],['key2',2] ]), orient='index'))

print("pandas.from_dict(<list [<list>]>, orient='columns')")
print(pd.DataFrame.from_dict([ ['key1', 1],['key2',2] ], orient='columns'))

print("pandas.from_dict(<dict>, orient='columns')")
print(pd.DataFrame.from_dict(dict([ ['key1', 1],['key2',2] ]), orient='columns'))

Produces

pandas.from_dict(<dict>, orient='index')
      0
key2  2
key1  1
pandas.from_dict(<list [<list>]>, orient='columns')
      0  1
0  key1  1
1  key2  2
pandas.from_dict(<dict>, orient='columns')
Traceback (most recent call last):
  File "../../bin/pandaserr.py", line 10, in <module>
    print(pd.DataFrame.from_dict(dict([ ['key1', 1],['key2',2] ]), orient='columns'))
  File "/home/cwarth/.local/lib/python2.7/site-packages/pandas/core/frame.py", line 804, in from_dict
    return cls(data, index=index, columns=columns, dtype=dtype)
  File "/home/cwarth/.local/lib/python2.7/site-packages/pandas/core/frame.py", line 226, in __init__
    mgr = self._init_dict(data, index, columns, dtype=dtype)
  File "/home/cwarth/.local/lib/python2.7/site-packages/pandas/core/frame.py", line 363, in _init_dict
    dtype=dtype)
  File "/home/cwarth/.local/lib/python2.7/site-packages/pandas/core/frame.py", line 5158, in _arrays_to_mgr
    index = extract_index(arrays)
  File "/home/cwarth/.local/lib/python2.7/site-packages/pandas/core/frame.py", line 5197, in extract_index
    raise ValueError('If using all scalar values, you must pass'
ValueError: If using all scalar values, you must pass an index

According to the reference documentation, "If the keys of the passed dict should be the columns of the resulting DataFrame, pass ‘columns’." That doesn't appear to be what is happening.

Expected Output

from_dict(<dict>, orient='index') works as expected.
I would expect from_dict(..., orient='columns') to return a dataframe with the dictionary keys forming the column index, like this:

   key1  key2
0     1     2

I would expect from_dict() to take a dict as a parameter in either case. Instead it appears to take a dict for orient='index' and a list of lists (or tuples) for orient='columns'. Passing a dict with integer values when orient='columns' raises the ValueError shown above.
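For readers who want the one-row frame described here from a dict of scalars, the following is a minimal sketch of two approaches using the plain constructor. It assumes a reasonably recent pandas; the variable names are illustrative only and nothing here is specific to the version reported in this issue.

```python
import pandas as pd

data = {"key1": 1, "key2": 2}

# Option 1: wrap the dict in a list, so the dict is treated as one row (record).
row_from_list = pd.DataFrame([data])

# Option 2: keep the dict of scalars, but supply an explicit one-element index
# so pandas knows how many rows to create.
row_from_index = pd.DataFrame(data, index=[0])

print(row_from_list)
print(row_from_index)
```

Both calls should give a single row with the dict keys as column labels.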

output of `pd.show_versions()`

INSTALLED VERSIONS
------------------
commit: None
python: 2.7.6.final.0
python-bits: 64
OS: Linux
OS-release: 3.13.0-45-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8

pandas: 0.17.1
nose: 1.3.6
pip: 8.0.2
setuptools: 18.8
Cython: 0.23.2
numpy: 1.10.4
scipy: 0.15.1
statsmodels: 0.6.1
IPython: 4.0.3
sphinx: 1.2.3
patsy: 0.3.0
dateutil: 2.4.2
pytz: 2015.7
blosc: None
bottleneck: None
tables: 3.1.1
numexpr: 2.2.2
matplotlib: 1.5.0
openpyxl: 2.2.0-b1
xlrd: 0.9.4
xlwt: 0.7.5
xlsxwriter: None
lxml: 3.3.3
bs4: 4.3.2
html5lib: 0.999
httplib2: 0.9
apiclient: None
sqlalchemy: 1.0.11
pymysql: None
psycopg2: None
Jinja2: None


jreback · 2016-02-19T01:13:41Z

No, it's acting correctly. You are just passing in scalars; the values need to be lists. Otherwise you have values but no index, just like the error message says.

In [15]: pd.DataFrame.from_dict(dict([ ['key1', [1]],['key2',[2]] ]), orient='columns')
Out[15]: 
   key1  key2
0     1     2

Note that this is equivalent to the more commonly used basic constructor.

In [19]: pd.DataFrame(dict([ ['key1', [1]],['key2',[2]] ]))
Out[19]: 
   key1  key2
0     1     2

cswarth · 2016-02-19T05:10:24Z

Thanks for the explanation. I still have to ask: why the different behavior for the two values of orient? For orient='index', passing scalars is fine. For orient='columns', scalars are suddenly forbidden. I suspect you are so familiar with this code that this behavior doesn't surprise you, but it certainly surprises me. I would have a difficult time rationalizing this seemingly arbitrary distinction.

BTW, perhaps it is time to revisit #4916 and deprecate .from_dict as its functionality is duplicated in the basic constructor?

jreback · 2016-02-19T13:22:54Z

We certainly could revisit #4916. Do you want to put forth a proposal on that issue (e.g. show the new constructor) so people can comment?

We don't allow construction of a DataFrame with values when you don't have an index.

It is exactly this:

In [14]: pd.DataFrame({'A' : 1})
ValueError: If using all scalar values, you must pass an index

It could simply create an index of length 1, but this is invariably a user mistake: they didn't say how long to make it.
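To illustrate the point about length, here is a small sketch (assuming a recent pandas; the labels "x", "y", "z" are made up for the example). Without an index the constructor cannot tell how many rows are wanted, while with one it repeats the scalar for every label.

```python
import pandas as pd

# No index: pandas cannot infer the number of rows from a scalar.
try:
    pd.DataFrame({"A": 1})
except ValueError as exc:
    message = str(exc)
    print(message)

# With an explicit index, the scalar is broadcast to every row label.
broadcast = pd.DataFrame({"A": 1}, index=["x", "y", "z"])
print(broadcast)
```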

zhangysh1995 · 2017-08-01T08:37:14Z

@jreback why do we need an index to construct a DataFrame? Does this mean I have to reconstruct my data with an index? Or could I use pandas to add an index to my data?

jeswcollins · 2019-05-16T22:57:49Z

Error message could say "If using all scalar values, use orient='index'."

I read the error message "you must pass an index" to imply I should set an index parameter, then I got another error:

In [14]: pd.DataFrame.from_dict(df, index=df.index)
TypeError: from_dict() got an unexpected keyword argument 'index'
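Since from_dict() has no index keyword, one possible workaround that stays within from_dict() is sketched below. It assumes a recent pandas and a dict of scalars named data (a hypothetical example, not the df from the comment above): build the frame with orient='index', then transpose.

```python
import pandas as pd

data = {"key1": 1, "key2": 2}

# With orient='index' the keys become row labels and the scalars
# land in a single column labelled 0.
by_key = pd.DataFrame.from_dict(data, orient="index")

# Transposing turns the keys into column labels, leaving one row labelled 0.
one_row = by_key.T

print(by_key)
print(one_row)
```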

TomAugspurger · 2019-05-17T14:46:19Z

This is a closed issue, so people probably aren't monitoring it. If you have a suggestion for improving the error message, I'd recommend opening a new issue.


gennaro-tedesco · 2020-02-26T13:52:31Z

This behaviour is extremely counter-intuitive and should be strongly documented at least, if this is how things are intended to be kept.

There is no reason why the DataFrame constructor should not create the index automatically when passing scalars, and asking to manually pass the index requires the user to already know the structure of the dictionary they are intending to parse, which is not the case in most practical applications (and should not be).

focusmediaproperties · 2020-04-30T18:34:36Z

If the values are scalars, aren't the keys implied as the index?

exeptionerror · 2021-06-22T13:16:09Z

All Possible working solution [Solved] ValueError: If using all scalar values, you must pass an index

jreback added the Reshaping (Concat, Merge/Join, Stack/Unstack, Explode) and Usage Question labels Feb 19, 2016

jreback closed this as completed Feb 19, 2016
