Skip to content

How should I store frames with multiindex columns in CSV? #21976

Closed
@alexitkes

Description

@alexitkes

Hello.

I tried to save a dataframe with MultiIndex used as columns to a CSV file and load it back, but I had no luck.

# Create a fame with multiindex columns
frame = pd.DataFrame({('AAPL', 'OPEN'): [1, 2, 3, 4], ('AAPL', 'CLOSE'): [1, 2, 3, 4], ('MSFT', 'OPEN'): [1, 2, 3, 4], ('MSFT', 'CLOSE'): [1, 2, 3, 4]})
# Make sure it was created as wanted.
frame
#   AAPL       MSFT     
#   CLOSE OPEN CLOSE OPEN
# 0     1    1     1    1
# 1     2    2     2    2
# 2     3    3     3    3
# 3     4    4     4    4

# Try to convert the frame to CSV
s1 = frame.to_csv()
s2 = frame.to_csv(tupleize_cols=True)
# FutureWarning displayed - tupleize_cols is deprecated.

print(s1)
# ,AAPL,AAPL,MSFT,MSFT
# ,CLOSE,OPEN,CLOSE,OPEN
# 0,1,1,1,1
# 1,2,2,2,2
# 2,3,3,3,3
# 3,4,4,4,4

print(s2)
# ,"('AAPL', 'CLOSE')","('AAPL', 'OPEN')","('MSFT', 'CLOSE')","('MSFT', 'OPEN')"
# 0,1,1,1,1
# 1,2,2,2,2
# 2,3,3,3,3
# 3,4,4,4,4

# Read the CSV strings back to DataFrames
f1 = pd.read_csv(StringIO(s1))
f2 = pd.read_csv(StringIO(s2), tupleize_cols=True)
# Warning about tupleize_cols here

# Both frames does not look like the original one.
f1
#    Unnamed: 0   AAPL AAPL.1   MSFT MSFT.1
# 0         NaN  CLOSE   OPEN  CLOSE   OPEN
# 1         0.0      1      1      1      1
# 2         1.0      2      2      2      2
# 3         2.0      3      3      3      3
# 4         3.0      4      4      4      4

f2
#    Unnamed: 0  ('AAPL', 'CLOSE')  ('AAPL', 'OPEN')  ('MSFT', 'CLOSE')  ('MSFT', 'OPEN')
# 0           0                  1                 1                  1                 1
# 1           1                  2                 2                  2                 2
# 2           2                  3                 3                  3                 3
# 3           3                  4                 4                  4                 4

As you see, both frames don't have multiindexed columns as original one. So, how should I save a DataFrame with multiindexed columns to CSV file and load it back to get a frame same to the original one?

I also tried to save as JSON, but also encountered problems. Here is what the frame shown above is converted to.

frame.to_json()
'{"["AAPL","CLOSE"]":{"0":1,"1":2,"2":3,"3":4},"["AAPL","OPEN"]":{"0":1,"1":2,"2":3,"3":4},"["MSFT","CLOSE"]":{"0":1,"1":2,"2":3,"3":4},"["MSFT","OPEN"]":{"0":1,"1":2,"2":3,"3":4}}'

So, tupleized multiindexed column names are obviously incorrectly quoted.

With best regards,

Alex.

INSTALLED VERSIONS

commit: None
python: 3.4.2.final.0
python-bits: 32
OS: Linux
OS-release: 3.16.0-6-686-pae
machine: i686
processor:
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8

pandas: 0.24.0.dev0+318.g272bbdc
pytest: 3.6.3
pip: 1.5.6
setuptools: 5.5.1
Cython: 0.28.4
numpy: 1.14.5
scipy: None
pyarrow: None
xarray: None
IPython: 6.4.0
sphinx: None
patsy: None
dateutil: 2.7.3
pytz: 2018.5
blosc: None
bottleneck: None
tables: None
numexpr: None
feather: None
matplotlib: None
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: 0.999
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: None
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None
gcsfs: None

Metadata

Metadata

Assignees

No one assigned

    Labels

    IO DataIO issues that don't fit into a more specific label

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions