cannot output csv with IntervalIndex #28210

Closed
AlJohri opened this issue Aug 29, 2019 · 5 comments · Fixed by #28229
Labels: Interval (Interval data type); IO CSV (read_csv, to_csv); Regression (functionality that used to work in a prior pandas version)


@AlJohri

AlJohri commented Aug 29, 2019

Code Sample, a copy-pastable example if possible

import numpy as np
import pandas as pd

Using pd.interval_range:

pd.DataFrame({'a': [1, 2, 3]}, index=pd.interval_range(0, 1, periods=3)).to_csv('hello.csv')

Using pd.IntervalIndex.from_arrays:

pd.DataFrame({'a': [1, 2, 3]}, index=pd.IntervalIndex.from_arrays(np.array([0, 1, 2]), np.array([1, 2, 3]))).to_csv('hello.csv')

Problem description

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-2-6f913f6c0211> in <module>
----> 1 pd.DataFrame({'a': [1, 2, 3]}, index=pd.IntervalIndex.from_arrays(np.array([0, 1, 2]), np.array([1, 2, 3]))).to_csv('hello.csv')

~/Development/propensity-to-subscribe-modeling/.venv/lib/python3.7/site-packages/pandas/core/generic.py in to_csv(self, path_or_buf, sep, na_rep, float_format, columns, header, index, index_label, mode, encoding, compression, quoting, quotechar, line_terminator, chunksize, date_format, doublequote, escapechar, decimal)
   3226             decimal=decimal,
   3227         )
-> 3228         formatter.save()
   3229 
   3230         if path_or_buf is None:

~/Development/propensity-to-subscribe-modeling/.venv/lib/python3.7/site-packages/pandas/io/formats/csvs.py in save(self)
    200                 self.writer = UnicodeWriter(f, **writer_kwargs)
    201 
--> 202             self._save()
    203 
    204         finally:

~/Development/propensity-to-subscribe-modeling/.venv/lib/python3.7/site-packages/pandas/io/formats/csvs.py in _save(self)
    322                 break
    323 
--> 324             self._save_chunk(start_i, end_i)
    325 
    326     def _save_chunk(self, start_i, end_i):

~/Development/propensity-to-subscribe-modeling/.venv/lib/python3.7/site-packages/pandas/io/formats/csvs.py in _save_chunk(self, start_i, end_i)
    354         )
    355 
--> 356         libwriters.write_csv_rows(self.data, ix, self.nlevels, self.cols, self.writer)

TypeError: Argument 'data_index' has incorrect type (expected numpy.ndarray, got list)

Expected Output

Output of pd.show_versions()

INSTALLED VERSIONS
------------------
commit           : None
python           : 3.7.4.final.0
python-bits      : 64
OS               : Darwin
OS-release       : 18.7.0
machine          : x86_64
processor        : i386
byteorder        : little
LC_ALL           : None
LANG             : en_US.UTF-8
LOCALE           : en_US.UTF-8

pandas           : 0.25.1
numpy            : 1.17.0
pytz             : 2019.2
dateutil         : 2.8.0
pip              : 19.2.2
setuptools       : 41.1.0
Cython           : None
pytest           : None
hypothesis       : None
sphinx           : None
blosc            : None
feather          : None
xlsxwriter       : None
lxml.etree       : None
html5lib         : None
pymysql          : None
psycopg2         : None
jinja2           : 2.10.1
IPython          : 7.7.0
pandas_datareader: None
bs4              : None
bottleneck       : None
fastparquet      : None
gcsfs            : None
lxml.etree       : None
matplotlib       : 3.1.1
numexpr          : None
odfpy            : None
openpyxl         : None
pandas_gbq       : None
pyarrow          : 0.14.1
pytables         : None
s3fs             : None
scipy            : 1.3.1
sqlalchemy       : None
tables           : None
xarray           : None
xlrd             : None
xlwt             : None
xlsxwriter       : None
@jschendel
Member

Thanks for the report. I can confirm this behavior on master:

In [2]: df = pd.DataFrame({'a': [1, 2, 3]}, index=pd.interval_range(0, 3))

In [3]: df.to_csv()
---------------------------------------------------------------------------
TypeError: Argument 'data_index' has incorrect type (expected numpy.ndarray, got list)

A temporary workaround is to cast the index to string prior to writing the output:

In [4]: df.index = df.index.astype(str)

In [5]: df.to_csv()
Out[5]: ',a\n"(0, 1]",1\n"(1, 2]",2\n"(2, 3]",3\n'

@jschendel added the Interval, IO CSV, and Regression labels Aug 29, 2019
@jschendel added this to the Contributions Welcome milestone Aug 29, 2019
@MarcoGorelli
Member

In the example

import pandas

df = pandas.DataFrame({'a': [1, 2, 3]}, index=pandas.interval_range(0, 3))

is the expected output

',a\n"(0, 1]",1\n"(1, 2]",2\n"(2, 3]",3\n'

or is that just a temporary workaround to avoid an error?

@AlJohri
Author

AlJohri commented Aug 29, 2019

I think the string representation would be fine as the expected output. If a user wants a non-string output they can break the IntervalIndex out into index.left and index.right.
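A minimal sketch of the alternative mentioned above, splitting the IntervalIndex into its endpoints before writing. The column names "left" and "right" are chosen here for illustration and are not something pandas requires.

```python
import io

import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3]}, index=pd.interval_range(0, 3))

# Store the interval endpoints as two plain numeric columns.
# "left" and "right" are illustrative column names (an assumption).
flat = pd.DataFrame({
    "left": df.index.left.to_numpy(),
    "right": df.index.right.to_numpy(),
    "a": df["a"].to_numpy(),
})
csv_text = flat.to_csv(index=False)

# Reading it back: rebuild the IntervalIndex from the two columns.
# The closed side is not stored in the CSV, so it is taken from the original.
restored = pd.read_csv(io.StringIO(csv_text))
left = restored.pop("left").to_numpy()
right = restored.pop("right").to_numpy()
restored.index = pd.IntervalIndex.from_arrays(left, right, closed=df.index.closed)
```

This keeps the endpoints numeric in the file, at the cost of having to record the closed side separately if it can vary.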

@jschendel
Member

jschendel commented Aug 29, 2019

Yes, that should be the expected output, or at the very least is consistent with previous versions where this was working:

In [1]: import pandas as pd; pd.__version__
Out[1]: '0.23.4'

In [2]: df = pd.DataFrame({'a': [1, 2, 3]}, index=pd.interval_range(0, 3))

In [3]: df.to_csv()
Out[3]: ',a\n"(0, 1]",1\n"(1, 2]",2\n"(2, 3]",3\n'

Note that my example produces string output because I didn't pass anything for path_or_buf in to_csv; in that case to_csv returns a string of what would be written to a file. Maybe not common knowledge, but useful for debugging and the like.
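A small sketch of that behavior, comparing the returned string with what ends up in a buffer. The index is cast to strings first (the workaround from earlier in this thread) so the snippet should also run on a version affected by this bug.

```python
import io

import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3]}, index=pd.interval_range(0, 3))
df.index = df.index.astype(str)  # workaround: string index instead of IntervalIndex

# No path_or_buf: to_csv returns the CSV content as a string.
text = df.to_csv()

# With a buffer: to_csv writes into it and returns None.
buffer = io.StringIO()
result = df.to_csv(buffer)
```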

@jschendel jschendel modified the milestones: Contributions Welcome, 0.25.2 Aug 30, 2019
@korakot

korakot commented Nov 18, 2020

I searched everywhere for how to read a CSV file with an IntervalIndex. I eventually found a solution, so I want to share it here.

import pandas as pd

def to_interval(istr):
    # Parse a string such as "(0, 1]" back into a pd.Interval.
    c_left = istr[0] == '['
    c_right = istr[-1] == ']'
    closed = {(True, False): 'left',
              (False, True): 'right',
              (True, True): 'both',
              (False, False): 'neither'
              }[c_left, c_right]
    # Endpoints are parsed as floats, even if they were written as integers.
    left, right = map(float, istr[1:-1].split(','))
    return pd.Interval(left, right, closed)

# the IntervalIndex is the first column
df = pd.read_csv('data.csv', index_col=0, converters={0: to_interval})
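A round-trip sketch of the converter approach, using an in-memory buffer instead of data.csv. The helper name parse_interval is hypothetical and follows the same idea as to_interval above. The explicit pd.IntervalIndex(...) call at the end is a precaution, on the assumption that read_csv may hand back an object-dtype index of Interval values rather than an IntervalIndex.

```python
import io

import pandas as pd


def parse_interval(text):
    """Parse a string such as "(0, 1]" into a pd.Interval (hypothetical helper)."""
    closed = {
        ("[", ")"): "left",
        ("(", "]"): "right",
        ("[", "]"): "both",
        ("(", ")"): "neither",
    }[text[0], text[-1]]
    left, right = map(float, text[1:-1].split(","))
    return pd.Interval(left, right, closed)


original = pd.DataFrame({"a": [1, 2, 3]}, index=pd.interval_range(0, 3))

# Write the index as strings so the example does not depend on the to_csv fix.
as_text = original.copy()
as_text.index = as_text.index.astype(str)
csv_text = as_text.to_csv()

# Column 0 holds the interval strings; convert them while reading.
restored = pd.read_csv(io.StringIO(csv_text), index_col=0,
                       converters={0: parse_interval})
restored.index = pd.IntervalIndex(restored.index)
```

Because the endpoints are parsed with float, the restored intervals have float endpoints even though the original index used integers.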
