cannot output csv with IntervalIndex #28210

AlJohri · 2019-08-29T03:26:31Z

Code Sample, a copy-pastable example if possible

import numpy as np
import pandas as pd

Using pd.interval_range:

pd.DataFrame({'a': [1, 2, 3]}, index=pd.interval_range(0, 1, periods=3)).to_csv('hello.csv')

Using pd.IntervalIndex.from_arrays:

pd.DataFrame({'a': [1, 2, 3]}, index=pd.IntervalIndex.from_arrays(np.array([0, 1, 2]), np.array([1, 2, 3]))).to_csv('hello.csv')

Problem description

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-2-6f913f6c0211> in <module>
----> 1 pd.DataFrame({'a': [1, 2, 3]}, index=pd.IntervalIndex.from_arrays(np.array([0, 1, 2]), np.array([1, 2, 3]))).to_csv('hello.csv')

~/Development/propensity-to-subscribe-modeling/.venv/lib/python3.7/site-packages/pandas/core/generic.py in to_csv(self, path_or_buf, sep, na_rep, float_format, columns, header, index, index_label, mode, encoding, compression, quoting, quotechar, line_terminator, chunksize, date_format, doublequote, escapechar, decimal)
   3226             decimal=decimal,
   3227         )
-> 3228         formatter.save()
   3229 
   3230         if path_or_buf is None:

~/Development/propensity-to-subscribe-modeling/.venv/lib/python3.7/site-packages/pandas/io/formats/csvs.py in save(self)
    200                 self.writer = UnicodeWriter(f, **writer_kwargs)
    201 
--> 202             self._save()
    203 
    204         finally:

~/Development/propensity-to-subscribe-modeling/.venv/lib/python3.7/site-packages/pandas/io/formats/csvs.py in _save(self)
    322                 break
    323 
--> 324             self._save_chunk(start_i, end_i)
    325 
    326     def _save_chunk(self, start_i, end_i):

~/Development/propensity-to-subscribe-modeling/.venv/lib/python3.7/site-packages/pandas/io/formats/csvs.py in _save_chunk(self, start_i, end_i)
    354         )
    355 
--> 356         libwriters.write_csv_rows(self.data, ix, self.nlevels, self.cols, self.writer)

TypeError: Argument 'data_index' has incorrect type (expected numpy.ndarray, got list)

Expected Output

Output of `pd.show_versions()`

INSTALLED VERSIONS
------------------
commit           : None
python           : 3.7.4.final.0
python-bits      : 64
OS               : Darwin
OS-release       : 18.7.0
machine          : x86_64
processor        : i386
byteorder        : little
LC_ALL           : None
LANG             : en_US.UTF-8
LOCALE           : en_US.UTF-8

pandas           : 0.25.1
numpy            : 1.17.0
pytz             : 2019.2
dateutil         : 2.8.0
pip              : 19.2.2
setuptools       : 41.1.0
Cython           : None
pytest           : None
hypothesis       : None
sphinx           : None
blosc            : None
feather          : None
xlsxwriter       : None
lxml.etree       : None
html5lib         : None
pymysql          : None
psycopg2         : None
jinja2           : 2.10.1
IPython          : 7.7.0
pandas_datareader: None
bs4              : None
bottleneck       : None
fastparquet      : None
gcsfs            : None
lxml.etree       : None
matplotlib       : 3.1.1
numexpr          : None
odfpy            : None
openpyxl         : None
pandas_gbq       : None
pyarrow          : 0.14.1
pytables         : None
s3fs             : None
scipy            : 1.3.1
sqlalchemy       : None
tables           : None
xarray           : None
xlrd             : None
xlwt             : None
xlsxwriter       : None

jschendel · 2019-08-29T14:11:30Z

Thanks for the report. I can confirm this behavior on master:

In [2]: df = pd.DataFrame({'a': [1, 2, 3]}, index=pd.interval_range(0, 3))

In [3]: df.to_csv()
---------------------------------------------------------------------------
TypeError: Argument 'data_index' has incorrect type (expected numpy.ndarray, got list)

A temporary workaround is to cast the index to string prior to writing the output:

In [4]: df.index = df.index.astype(str)

In [5]: df.to_csv()
Out[5]: ',a\n"(0, 1]",1\n"(1, 2]",2\n"(2, 3]",3\n'

MarcoGorelli · 2019-08-29T15:05:08Z

In the example

import pandas

df = pandas.DataFrame({'a': [1, 2, 3]}, index=pandas.interval_range(0, 3))

is the expected output

',a\n"(0, 1]",1\n"(1, 2]",2\n"(2, 3]",3\n'

or is that just a temporary workaround to avoid an error?

AlJohri · 2019-08-29T16:22:13Z

I think the string representation would be fine as the expected output. If a user wants a non-string output they can break the IntervalIndex out into index.left and index.right.
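
A minimal sketch of the index.left / index.right suggestion, using the frame from the examples in this thread. The column names `left` and `right` and the variable `flat` are illustrative choices, not something prescribed here. The two columns do not record which side of each interval is closed, so that would need to be tracked separately (it is available as `df.index.closed`).

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3]}, index=pd.interval_range(0, 3))

# Replace the IntervalIndex with two plain numeric columns holding the bounds.
flat = df.reset_index(drop=True)
flat.insert(0, "left", df.index.left)
flat.insert(1, "right", df.index.right)

# With index=False only the ordinary columns are written, so no interval
# formatting is involved at all.
csv_text = flat.to_csv(index=False)
print(csv_text)
```

Written this way, the bounds can be read back with a plain read_csv call and no custom parsing.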

jschendel · 2019-08-29T16:24:45Z

Yes, that should be the expected output, or at the very least is consistent with previous versions where this was working:

In [1]: import pandas as pd; pd.__version__
Out[1]: '0.23.4'

In [2]: df = pd.DataFrame({'a': [1, 2, 3]}, index=pd.interval_range(0, 3))

In [3]: df.to_csv()
Out[3]: ',a\n"(0, 1]",1\n"(1, 2]",2\n"(2, 3]",3\n'

Note that my example produces string output because I didn't pass anything for path_or_buf in to_csv; in that case to_csv returns the text that would otherwise be written to a file. This may not be common knowledge, but it is useful for debugging and the like.
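
A small sketch of that path_or_buf behavior, for illustration only. It uses a frame with a default integer index so that it does not depend on whether the IntervalIndex fix is installed; the names `as_text`, `buffer` and `result` are made up for this example.

```python
import io

import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3]})

# No path_or_buf: to_csv returns the CSV text as a str.
as_text = df.to_csv()

# With a buffer (or a file path): the text is written there and None is returned.
buffer = io.StringIO()
result = df.to_csv(buffer)

print(repr(as_text))
print(result)
```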

korakot · 2020-11-18T08:51:39Z

I searched everywhere for how to read a csv file with an IntervalIndex. Eventually I found a solution, so I want to share it here.

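import pandas as pd

def to_interval(istr):
    # Parse a string such as "(0, 1]" back into a pd.Interval.
    # The first and last characters tell which sides are closed.
    c_left = istr[0] == '['
    c_right = istr[-1] == ']'
    closed = {(True, False): 'left',
              (False, True): 'right',
              (True, True): 'both',
              (False, False): 'neither'
              }[c_left, c_right]
    # Bounds are parsed as floats, so integer bounds come back as 0.0, 1.0, ...
    left, right = map(float, istr[1:-1].split(','))
    return pd.Interval(left, right, closed)

# the IntervalIndex is the first column
df = pd.read_csv('data.csv', index_col=0, converters={0: to_interval})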
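
For reference, a hedged round-trip sketch that combines the string-cast workaround from earlier in the thread with a converter like the one above. It reads from an in-memory buffer instead of data.csv, casts the index to str before writing so that it also runs on releases affected by this issue, and wraps the parsed index in pd.IntervalIndex explicitly rather than assuming read_csv infers that type. The compact `to_interval` here is a restatement for self-containment, and `as_strings`, `csv_text` and `restored` are illustrative names.

```python
import io

import pandas as pd


def to_interval(text):
    """Parse a string such as "(0, 1]" into a pd.Interval."""
    closed = {
        ("[", ")"): "left",
        ("(", "]"): "right",
        ("[", "]"): "both",
        ("(", ")"): "neither",
    }[text[0], text[-1]]
    left, right = map(float, text[1:-1].split(","))
    return pd.Interval(left, right, closed)


df = pd.DataFrame({"a": [1, 2, 3]}, index=pd.interval_range(0, 3))

# Write: cast the index to strings first (the workaround from this thread).
as_strings = df.copy()
as_strings.index = df.index.astype(str)
csv_text = as_strings.to_csv()

# Read: convert the first column back to Interval objects.
restored = pd.read_csv(
    io.StringIO(csv_text), index_col=0, converters={0: to_interval}
)
# Wrap explicitly in case the parsed index is a plain object Index.
restored.index = pd.IntervalIndex(restored.index)

print(restored)
```

Because the converter parses the bounds with float, the restored intervals have float endpoints even though the original index used integers.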

jschendel added the labels Interval (Interval data type), IO CSV (read_csv, to_csv), and Regression (functionality that used to work in a prior pandas version) Aug 29, 2019

jschendel added this to the Contributions Welcome milestone Aug 29, 2019

jschendel mentioned this issue Aug 30, 2019

REGR: Fix to_csv with IntervalIndex #28229

Merged


jschendel modified the milestones: Contributions Welcome, 0.25.2 Aug 30, 2019

TomAugspurger closed this as completed in #28229 Aug 30, 2019
