Description
Hello.
I tried to save a dataframe with MultiIndex used as columns to a CSV file and load it back, but I had no luck.
# Create a fame with multiindex columns
frame = pd.DataFrame({('AAPL', 'OPEN'): [1, 2, 3, 4], ('AAPL', 'CLOSE'): [1, 2, 3, 4], ('MSFT', 'OPEN'): [1, 2, 3, 4], ('MSFT', 'CLOSE'): [1, 2, 3, 4]})
# Make sure it was created as wanted.
frame
# AAPL MSFT
# CLOSE OPEN CLOSE OPEN
# 0 1 1 1 1
# 1 2 2 2 2
# 2 3 3 3 3
# 3 4 4 4 4
# Try to convert the frame to CSV
s1 = frame.to_csv()
s2 = frame.to_csv(tupleize_cols=True)
# FutureWarning displayed - tupleize_cols is deprecated.
print(s1)
# ,AAPL,AAPL,MSFT,MSFT
# ,CLOSE,OPEN,CLOSE,OPEN
# 0,1,1,1,1
# 1,2,2,2,2
# 2,3,3,3,3
# 3,4,4,4,4
print(s2)
# ,"('AAPL', 'CLOSE')","('AAPL', 'OPEN')","('MSFT', 'CLOSE')","('MSFT', 'OPEN')"
# 0,1,1,1,1
# 1,2,2,2,2
# 2,3,3,3,3
# 3,4,4,4,4
# Read the CSV strings back to DataFrames
f1 = pd.read_csv(StringIO(s1))
f2 = pd.read_csv(StringIO(s2), tupleize_cols=True)
# Warning about tupleize_cols here
# Both frames does not look like the original one.
f1
# Unnamed: 0 AAPL AAPL.1 MSFT MSFT.1
# 0 NaN CLOSE OPEN CLOSE OPEN
# 1 0.0 1 1 1 1
# 2 1.0 2 2 2 2
# 3 2.0 3 3 3 3
# 4 3.0 4 4 4 4
f2
# Unnamed: 0 ('AAPL', 'CLOSE') ('AAPL', 'OPEN') ('MSFT', 'CLOSE') ('MSFT', 'OPEN')
# 0 0 1 1 1 1
# 1 1 2 2 2 2
# 2 2 3 3 3 3
# 3 3 4 4 4 4
As you see, both frames don't have multiindexed columns as original one. So, how should I save a DataFrame with multiindexed columns to CSV file and load it back to get a frame same to the original one?
I also tried to save as JSON, but also encountered problems. Here is what the frame shown above is converted to.
frame.to_json()
'{"["AAPL","CLOSE"]":{"0":1,"1":2,"2":3,"3":4},"["AAPL","OPEN"]":{"0":1,"1":2,"2":3,"3":4},"["MSFT","CLOSE"]":{"0":1,"1":2,"2":3,"3":4},"["MSFT","OPEN"]":{"0":1,"1":2,"2":3,"3":4}}'
So, tupleized multiindexed column names are obviously incorrectly quoted.
With best regards,
Alex.
INSTALLED VERSIONS
commit: None
python: 3.4.2.final.0
python-bits: 32
OS: Linux
OS-release: 3.16.0-6-686-pae
machine: i686
processor:
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8
pandas: 0.24.0.dev0+318.g272bbdc
pytest: 3.6.3
pip: 1.5.6
setuptools: 5.5.1
Cython: 0.28.4
numpy: 1.14.5
scipy: None
pyarrow: None
xarray: None
IPython: 6.4.0
sphinx: None
patsy: None
dateutil: 2.7.3
pytz: 2018.5
blosc: None
bottleneck: None
tables: None
numexpr: None
feather: None
matplotlib: None
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: 0.999
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: None
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None
gcsfs: None