Problem with to_dict('records') in 0.24 #25050


Closed
Javert899 opened this issue Jan 31, 2019 · 3 comments
Labels: Duplicate Report

Comments

@Javert899

Hi,

When I run the following code in pandas 0.24, the keys in the output of to_dict('records') are completely wrong, although df.columns and df.to_dict().keys() are correct. Could it be that to_dict('records') is broken when a column name contains a ':'?

CODE:

import pandas as pd

# Raw string so that "\r" in the Windows path is not read as a carriage return
df = pd.read_csv(r"C:\running-example.csv")
print(df.columns)
print(df.to_dict().keys())
print(df.to_dict('records'))

OUTPUT:

Index(['Unnamed: 0', 'Activity', 'Costs', 'Resource', 'case:concept:name',
       'case:creator', 'concept:name', 'org:resource', 'time:timestamp'],
      dtype='object')
dict_keys(['Unnamed: 0', 'Activity', 'Costs', 'Resource', 'case:concept:name', 'case:creator', 'concept:name', 'org:resource', 'time:timestamp'])
[{'_0': 0, 'Activity': 'register request', 'Costs': 50, 'Resource': 'Pete', '_4': 3, '_5': 'Fluxicon Nitro', '_6': 'register request', '_7': 'Pete', '_8': '2010-12-30 14:32:00+01:00'}, {'_0': 1, 'Activity': 'examine casually', 'Costs': 400, 'Resource': 'Mike', '_4': 3, '_5': 'Fluxicon Nitro', '_6': 'examine casually', '_7': 'Mike', '_8': '2010-12-30 15:06:00+01:00'}, {'_0': 2, 'Activity': 'check ticket', 'Costs': 100, 'Resource': 'Ellen', '_4': 3, '_5': 'Fluxicon Nitro', '_6': 'check ticket', '_7': 'Ellen', '_8': '2010-12-30 16:34:00+01:00'}, {'_0': 3, 'Activity': 'decide', 'Costs': 200, 'Resource': 'Sara', '_4': 3, '_5': 'Fluxicon Nitro', '_6': 'decide', '_7': 'Sara', '_8': '2011-01-06 09:18:00+01:00'}, {'_0': 4, 'Activity': 'reinitiate request', 'Costs': 200, 'Resource': 'Sara', '_4': 3, '_5': 'Fluxicon Nitro', '_6': 'reinitiate request', '_7': 'Sara', '_8': '2011-01-06 12:18:00+01:00'}, {'_0': 5, 'Activity': 'examine thoroughly', 'Costs': 400, 'Resource': 'Sean', '_4': 3, '_5': 'Fluxicon Nitro', '_6': 'examine thoroughly', '_7': 'Sean', '_8': '2011-01-06 13:06:00+01:00'}, {'_0': 6, 'Activity': 'check ticket', 'Costs': 100, 'Resource': 'Pete', '_4': 3, '_5': 'Fluxicon Nitro', '_6': 'check ticket', '_7': 'Pete', '_8': '2011-01-08 11:43:00+01:00'}, {'_0': 7, 'Activity': 'decide', 'Costs': 200, 'Resource': 'Sara', '_4': 3, '_5': 'Fluxicon Nitro', '_6': 'decide', '_7': 'Sara', '_8': '2011-01-09 09:55:00+01:00'}, {'_0': 8, 'Activity': 'pay compensation', 'Costs': 200, 'Resource': 'Ellen', '_4': 3, '_5': 'Fluxicon Nitro', '_6': 'pay compensation', '_7': 'Ellen', '_8': '2011-01-15 10:45:00+01:00'}, {'_0': 9, 'Activity': 'register request', 'Costs': 50, 'Resource': 'Mike', '_4': 2, '_5': 'Fluxicon Nitro', '_6': 'register request', '_7': 'Mike', '_8': '2010-12-30 11:32:00+01:00'}, {'_0': 10, 'Activity': 'check ticket', 'Costs': 100, 'Resource': 'Mike', '_4': 2, '_5': 'Fluxicon Nitro', '_6': 'check ticket', '_7': 'Mike', '_8': '2010-12-30 12:12:00+01:00'}, {'_0': 11, 
'Activity': 'examine casually', 'Costs': 400, 'Resource': 'Sean', '_4': 2, '_5': 'Fluxicon Nitro', '_6': 'examine casually', '_7': 'Sean', '_8': '2010-12-30 14:16:00+
................

@hfwittmann

hfwittmann commented Jan 31, 2019

The problem seems to be related to complicated headers, for example column names that contain spaces.

This works:

pandas 0.23.0

import pandas as pd

df = pd.read_csv('https://raw.githubusercontent.com/plotly/datasets/master/solar.csv').iloc[:,:2]

print(df)
print(df.to_dict("rows"))
            State  Number of Solar Plants
0      California                     289
1         Arizona                      48
2          Nevada                      11
3      New Mexico                      33
4        Colorado                      20
5           Texas                      12
6  North Carolina                     148
7        New York                      13
[{'State': 'California', 'Number of Solar Plants': 289}, {'State': 'Arizona', 'Number of Solar Plants': 48}, {'State': 'Nevada', 'Number of Solar Plants': 11}, {'State': 'New Mexico', 'Number of Solar Plants': 33}, {'State': 'Colorado','Number of Solar Plants': 20}, {'State': 'Texas', 'Number of Solar Plants': 12}, {'State': 'North Carolina', 'Number of Solar Plants': 148}, {'State': 'New York', 'Number of Solar Plants': 13}]
DataTable(columns=[{'name': 'State', 'id': 'State'}, {'name': 'Number of Solar Plants', 'id': 'Number of Solar Plants'}], data=[{'State': 'California', 'Number of Solar Plants': 289}, {'State': 'Arizona', 'Number of Solar Plants': 48}, {'State': 'Nevada', 'Number of Solar Plants': 11}, {'State': 'New Mexico', 'Number of Solar Plants': 33}, {'State': 'Colorado', 'Number of Solar Plants': 20}, {'State': 'Texas', 'Number of Solar Plants': 12}, {'State': 'North Carolina', 'Number of Solar Plants': 148}, {'State': 'New York', 'Number of Solar Plants': 13}], id='table')

The same code does not work in pandas 0.24.0: the 'Number of Solar Plants' key comes back as '_1'.

pandas 0.24.0

import pandas as pd

df = pd.read_csv('https://raw.githubusercontent.com/plotly/datasets/master/solar.csv').iloc[:,:2]

print(df)
print(df.to_dict("rows"))
            State  Number of Solar Plants
0      California                     289
1         Arizona                      48
2          Nevada                      11
3      New Mexico                      33
4        Colorado                      20
5           Texas                      12
6  North Carolina                     148
7        New York                      13
[{'State': 'California', '_1': 289}, {'State': 'Arizona', '_1': 48}, {'State': 'Nevada', '_1': 11}, {'State': 'New Mexico', '_1': 33}, {'State': 'Colorado', '_1': 20}, {'State': 'Texas', '_1': 12}, {'State': 'North Carolina', '_1': 148}, {'State': 'New York', '_1': 13}]
DataTable(columns=[{'name': 'State', 'id': 'State'}, {'name': 'Number of Solar Plants', 'id': 'Number of Solar Plants'}], data=[{'State': 'California', '_1': 289}, {'State': 'Arizona', '_1': 48}, {'State': 'Nevada', '_1': 11}, {'State':'New Mexico', '_1': 33}, {'State': 'Colorado', '_1': 20}, {'State': 'Texas', '_1': 12}, {'State': 'North Carolina', '_1': 148}, {'State': 'New York', '_1': 13}], id='table')

However, after the headers are cleaned up (stripped, lowercased, spaces replaced with underscores), the same code also works in pandas 0.24.0:

pandas 0.24.0

import pandas as pd

df = pd.read_csv('https://raw.githubusercontent.com/plotly/datasets/master/solar.csv').iloc[:,:2]

df.columns = df.columns.str.strip().str.lower().str.replace(' ', '_').str.replace('(', '').str.replace(')', '')

print(df)
print(df.to_dict("rows"))
            state  number_of_solar_plants
0      California                     289
1         Arizona                      48
2          Nevada                      11
3      New Mexico                      33
4        Colorado                      20
5           Texas                      12
6  North Carolina                     148
7        New York                      13
[{'state': 'California', 'number_of_solar_plants': 289}, {'state': 'Arizona', 'number_of_solar_plants': 48}, {'state': 'Nevada', 'number_of_solar_plants': 11}, {'state': 'New Mexico', 'number_of_solar_plants': 33}, {'state': 'Colorado','number_of_solar_plants': 20}, {'state': 'Texas', 'number_of_solar_plants': 12}, {'state': 'North Carolina', 'number_of_solar_plants': 148}, {'state': 'New York', 'number_of_solar_plants': 13}]
DataTable(columns=[{'name': 'state', 'id': 'state'}, {'name': 'number_of_solar_plants', 'id': 'number_of_solar_plants'}], data=[{'state': 'California', 'number_of_solar_plants': 289}, {'state': 'Arizona', 'number_of_solar_plants': 48}, {'state': 'Nevada', 'number_of_solar_plants': 11}, {'state': 'New Mexico', 'number_of_solar_plants': 33}, {'state': 'Colorado', 'number_of_solar_plants': 20}, {'state': 'Texas', 'number_of_solar_plants': 12}, {'state': 'North Carolina', 'number_of_solar_plants': 148}, {'state': 'New York', 'number_of_solar_plants': 13}], id='table')
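
For anyone who needs to keep the original column names, a possible alternative to renaming the headers is to build the records by hand. The sketch below is only an illustration: `records_with_original_names` is a hypothetical helper name, and the two-row frame stands in for the solar dataset above. It relies on `DataFrame.itertuples(index=False, name=None)`, which yields plain tuples, so no field names are involved and nothing can be renamed.

```python
import pandas as pd

# Small stand-in for the solar dataset used above (values copied from the output).
df = pd.DataFrame(
    {"State": ["California", "Arizona"], "Number of Solar Plants": [289, 48]}
)


def records_with_original_names(frame):
    """Hypothetical helper: one dict per row, keyed by the untouched column labels."""
    columns = list(frame.columns)
    # name=None makes itertuples return plain tuples instead of namedtuples.
    return [
        dict(zip(columns, row))
        for row in frame.itertuples(index=False, name=None)
    ]


records = records_with_original_names(df)
print(records)
```

Note that the helper assumes unique column labels; with duplicate labels, later columns would overwrite earlier ones in each dict.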

@jorisvandenbossche
Member

This is a known problem in 0.24.0 (see #25012, #25023, #24991, #24940, #24965) and will be fixed in 0.24.1, which will be released today or within a couple of days.
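
The '_0', '_4', ... keys in the reports above match what Python's `collections.namedtuple` produces with `rename=True`: a field name that is not a valid identifier is replaced by an underscore followed by its position. That 0.24.0 builds the records through namedtuples is an assumption here, not something stated in this thread; the stdlib-only sketch below merely reproduces the renaming pattern, using the column names from the original report.

```python
from collections import namedtuple

# Column names copied from the original report in this issue.
columns = [
    "Unnamed: 0", "Activity", "Costs", "Resource", "case:concept:name",
    "case:creator", "concept:name", "org:resource", "time:timestamp",
]

# rename=True swaps every invalid field name for "_<position>".
Row = namedtuple("Row", columns, rename=True)
print(Row._fields)
# ('_0', 'Activity', 'Costs', 'Resource', '_4', '_5', '_6', '_7', '_8')

# In this list, the names that survive are the ones that are valid identifiers.
kept = [name for name in columns if name.isidentifier()]
print(kept)
# ['Activity', 'Costs', 'Resource']
```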

jorisvandenbossche added the Duplicate Report label Jan 31, 2019
jorisvandenbossche added this to the No action milestone Jan 31, 2019
@lmxia

lmxia commented Feb 2, 2019

I encountered the same error.
