Inconsistent type casting between DataFrame and Series #14216

Closed
dwyatte opened this issue Sep 13, 2016 · 7 comments
Labels
API Design; Dtype Conversions (Unexpected or buggy dtype conversions); Duplicate Report (Duplicate issue or pull request)

Comments

@dwyatte

dwyatte commented Sep 13, 2016

There is an inconsistency in how Series and DataFrame export value types to dicts. It only shows up when the DataFrame has multiple columns with different dtypes. I suspect the root cause lies deeper, in how Series and DataFrame handle dtypes, and the export to dict merely exposes it.

Code Sample, a copy-pastable example if possible

import pandas as pd

orig_dict = [{0: 'a', 1: 1}]

# does not happen with
# orig_dict =[{0: 0, 1: 1}]

print 'Original dict'
for k in orig_dict[0]:
    print type(orig_dict[0][k])
print

df = pd.DataFrame(orig_dict)
dict_exported_from_dataframe = [df.loc[[x]].to_dict(orient='records')[0] for x in df.index]
dict_exported_from_series = [df.loc[x].to_dict() for x in df.index]

print 'Dict exported from DataFrame'
for k in dict_exported_from_dataframe[0]:
    print type(dict_exported_from_dataframe[0][k])
print

print 'Dict exported from Series'
for k in dict_exported_from_series[0]:
    print type(dict_exported_from_series[0][k])
print

Expected Output

Original dict
<type 'str'>
<type 'int'>

Dict exported from DataFrame
<type 'str'>
<type 'int'>

Dict exported from Series
<type 'str'>
<type 'int'>

OR

Original dict
<type 'str'>
<type 'int'>

Dict exported from DataFrame
<type 'str'>
<type 'numpy.int64'>

Dict exported from Series
<type 'str'>
<type 'numpy.int64'>
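
For readers on Python 3, where the `print` statements in the code sample above no longer run, the same comparison might be sketched as follows. The helper `value_types` is a hypothetical name introduced here, not part of pandas, and what each export path reports depends on the installed pandas version, so no output is shown.

```python
import pandas as pd


def value_types(record):
    """Map each key of a dict to the type name of its value."""
    return {key: type(value).__name__ for key, value in record.items()}


# Same mixed-dtype input as the original report.
df = pd.DataFrame([{0: 'a', 1: 1}])

# Path 1: one-row DataFrame exported as a list of records.
from_frame = df.loc[[0]].to_dict(orient='records')[0]

# Path 2: the same row taken as a Series and exported directly.
from_series = df.loc[0].to_dict()

print('DataFrame export:', value_types(from_frame))
print('Series export:   ', value_types(from_series))
```

If the two printed mappings differ, the inconsistency described in this issue is present in that pandas version.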

output of pd.show_versions()

INSTALLED VERSIONS

commit: None
python: 2.7.10.final.0
python-bits: 64
OS: Darwin
OS-release: 14.5.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8

pandas: 0.18.1
nose: None
pip: 8.1.2
setuptools: 25.2.0
Cython: None
numpy: 1.11.1
scipy: 0.13.0b1
statsmodels: None
xarray: None
IPython: 5.1.0
sphinx: None
patsy: None
dateutil: 2.5.3
pytz: 2016.6.1
blosc: None
bottleneck: None
tables: None
numexpr: None
matplotlib: None
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: None
httplib2: None
apiclient: None
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: None
boto: None
pandas_datareader: None

@chris-b1
Contributor

Slightly simpler repro.

d = {'a':[1], 'b':['b']}

pd.DataFrame(d).to_dict()
Out[92]: {'a': {0: 1}, 'b': {0: 'b'}}

pd.DataFrame(d).to_dict(orient='records')
Out[93]: [{'a': 1L, 'b': 'b'}]
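
In the output above, the `L` suffix is what gives away the Python `long`. Python 3 has no such suffix, so a version of this repro for Python 3 would need an explicit type check. A minimal sketch, with the variable names `by_column` and `by_record` chosen here for illustration; the printed types depend on the pandas version and are therefore not listed:

```python
import pandas as pd

d = {'a': [1], 'b': ['b']}
df = pd.DataFrame(d)

# Default orientation: {column: {index: value}}
by_column = df.to_dict()

# Records orientation: [{column: value}, ...]
by_record = df.to_dict(orient='records')

# The values compare equal either way; only their types may differ.
print(type(by_column['a'][0]).__name__)
print(type(by_record[0]['a']).__name__)
```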

@jreback
Contributor

jreback commented Sep 13, 2016

this is a duplicate of #13236

so this could be a test for that issue

jreback added the Dtype Conversions, API Design, and Duplicate Report labels Sep 14, 2016
jreback added this to the No action milestone Sep 14, 2016
jreback closed this as completed Sep 14, 2016

@jorisvandenbossche
Member

@jreback I don't think this is a duplicate of #13236. You repurposed that issue to document the behaviour of map (mapping on python/pandas types and not numpy scalars), whereas this issue shows that to_dict still returns numpy scalars in certain cases.

@jreback
Contributor

jreback commented Sep 14, 2016

right I didn't repurpose that issue. The underlying issue, which is slated for 0.20, is #13258

@jorisvandenbossche
Member

jorisvandenbossche commented Sep 14, 2016

Ah yes, so it is a duplicate of that one (#13258). Will add it there as a case to test.

@dwyatte
Author

dwyatte commented Sep 12, 2017

I checked out #17491, and the bug in my original code sample, where to_dict() returns different types for DataFrame and Series, is still reproducible.

@chris-b1's code snippet is used as a test case and works correctly now, so I think something in Series.to_dict() still needs to change to return Python types, assuming that is the expected behavior.
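
Until `Series.to_dict()` returns Python types, one possible workaround is to convert NumPy scalars after the export. The following is a minimal sketch, not a pandas feature: `to_native` and `native_dict` are hypothetical helper names, and the input dict is built by hand to stand in for an export that contains a `numpy.int64`. It relies on `np.generic` being the base class of NumPy scalars and on their `.item()` method.

```python
import numpy as np


def to_native(value):
    """Return a built-in Python scalar for NumPy scalars; pass others through."""
    if isinstance(value, np.generic):
        return value.item()
    return value


def native_dict(mapping):
    """Apply to_native to every key and value of a flat dict."""
    return {to_native(k): to_native(v) for k, v in mapping.items()}


# Hand-built stand-in for an exported row holding a NumPy scalar.
record = {0: 'a', 1: np.int64(1)}
converted = native_dict(record)

print({k: type(v).__name__ for k, v in converted.items()})
# {0: 'str', 1: 'int'}
```

This only handles flat dicts; nested output such as the default `DataFrame.to_dict()` orientation would need the conversion applied one level down.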


4 participants