to_csv() fail on 0.11.dev #3163

gdraps · 2013-03-25T01:57:36Z

Hit this after updating to '0.11.0.dev-da54321' from master. Haven't had a chance to dig any deeper, other than to isolate frame length as a factor.

import pandas.util.testing

df = pandas.util.testing.makeTimeDataFrame(25000)
df.to_csv("save.csv")  # works
df = pandas.util.testing.makeTimeDataFrame(25001)
df.to_csv("save.csv")  # raises the exception below

---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
<ipython-input-83-12cc25e3eafd> in <module>()
----> 1 df.to_csv("save.csv")

/usr/local/lib/python2.7/dist-packages/pandas-0.11.0.dev_da54321-py2.7-linux-i686.egg/pandas/core/frame.pyc in to_csv(self, path_or_buf, sep, na_rep, float_format, cols, header, index, index_label, mode, nanRep, encoding, quoting, line_terminator, chunksize, **kwds)
   1348                                          index_label=index_label,
   1349                                          chunksize=chunksize,legacy=kwds.get("legacy",False) )
-> 1350             formatter.save()
   1351 
   1352     def to_excel(self, excel_writer, sheet_name='sheet1', na_rep='',

/usr/local/lib/python2.7/dist-packages/pandas-0.11.0.dev_da54321-py2.7-linux-i686.egg/pandas/core/format.pyc in save(self)
    936 
    937             else:
--> 938                 self._save()
    939 
    940 

/usr/local/lib/python2.7/dist-packages/pandas-0.11.0.dev_da54321-py2.7-linux-i686.egg/pandas/core/format.pyc in _save(self)
   1008                 break
   1009 
-> 1010             self._save_chunk(start_i, end_i)
   1011 
   1012     def _save_chunk(self, start_i, end_i):

/usr/local/lib/python2.7/dist-packages/pandas-0.11.0.dev_da54321-py2.7-linux-i686.egg/pandas/core/format.pyc in _save_chunk(self, start_i, end_i)
   1029         ix = data_index.to_native_types(slicer=slicer, na_rep=self.na_rep, float_format=self.float_format)
   1030 
-> 1031         lib.write_csv_rows(self.data, ix, self.nlevels, self.cols, self.writer)
   1032 
   1033 # from collections import namedtuple

/usr/local/lib/python2.7/dist-packages/pandas-0.11.0.dev_da54321-py2.7-linux-i686.egg/pandas/lib.so in pandas.lib.write_csv_rows (pandas/lib.c:13152)()

IndexError: list index out of range

Current workaround:

df.to_csv("save.csv", legacy=True)


jreback · 2013-03-25T02:02:12Z

platform and numpy version?

gdraps · 2013-03-25T02:08:01Z

i386 GNU/Linux and numpy 1.6.2

ghost · 2013-03-25T02:13:37Z

22f258f fixed a very similar boundary+1 issue for MultiIndex, caused by the slicer argument being ignored
in MultiIndex.to_native_types. Maybe DatetimeIndex has the same issue?

jreback · 2013-03-25T02:18:38Z

it's 1 chunk plus 1 row
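To make "1 chunk plus 1 row" concrete: a chunked writer walks the frame in half-open [start, end) row windows. Assuming a chunk size of 25000 rows (consistent with the 25000/25001 observation in the report; the real value is computed inside the CSV formatter), the 25001-row frame splits into one full chunk and a one-row tail. The helper chunk_bounds is hypothetical, not a pandas function:

```python
def chunk_bounds(nrows, chunksize):
    """Return the half-open (start, end) row windows covering nrows rows."""
    bounds = []
    start = 0
    while start < nrows:
        end = min(start + chunksize, nrows)  # last window may be short
        bounds.append((start, end))
        start = end
    return bounds

print(chunk_bounds(25000, 25000))  # [(0, 25000)]
print(chunk_bounds(25001, 25000))  # [(0, 25000), (25000, 25001)]
```

With exactly 25000 rows the single window covers the whole index, so any code that accidentally processes the whole index instead of the window still produces matching lengths; the mismatch only shows up once there is a second chunk.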

will take a look

ghost · 2013-03-25T02:22:14Z

I got it.

In core/index.py, to_native_types has:
        if self.is_all_dates:
            return _date_formatter(self)
        else:
            values[mask] = na_rep

should be

        if self.is_all_dates:
            return _date_formatter(self[slicer])
        else:
            values[mask] = na_rep
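A simplified pure-Python model of why ignoring the slicer ends in "IndexError: list index out of range". This is not pandas code: format_index_buggy, format_index_fixed, write_rows and save are hypothetical stand-ins for to_native_types and lib.write_csv_rows, assuming the writer loops over the formatted index and looks up each row in column data that has already been cut down to the current chunk:

```python
def format_index_buggy(index, slicer):
    # Mirrors the bug: the slicer is ignored and the whole index is formatted.
    return [str(v) for v in index]

def format_index_fixed(index, slicer):
    # Only the rows belonging to the current chunk are formatted.
    return [str(v) for v in index[slicer]]

def write_rows(columns, ix, out):
    # Loop over the formatted index; each column holds only this chunk's rows,
    # so a too-long ix makes col[i] run past the end of the chunk.
    for i in range(len(ix)):
        out.append([ix[i]] + [col[i] for col in columns])

def save(index, columns, chunksize, format_index):
    out = []
    for start in range(0, len(index), chunksize):
        slicer = slice(start, min(start + chunksize, len(index)))
        chunk_cols = [col[slicer] for col in columns]
        write_rows(chunk_cols, format_index(index, slicer), out)
    return out

index = list(range(5))
columns = [[v * 10 for v in index]]
save(index, columns, 5, format_index_buggy)  # one chunk: happens to work
save(index, columns, 4, format_index_fixed)  # two chunks: works
# save(index, columns, 4, format_index_buggy) raises IndexError on the first chunk
```

As in the report, the buggy version only fails once the row count exceeds a single chunk.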

jreback · 2013-03-25T02:22:46Z

By visual inspection:

in to_native_types in core/index, the is_all_dates branch returns the formatted result for self (the whole index); it should format values (the sliced array) instead.

Will add a test and fix tomorrow.

@gdraps thanks for the report

ghost · 2013-03-25T02:23:25Z

take it away jeff.... :)

jreback · 2013-03-25T02:23:58Z

faster than me!

Use values instead of self[slicer]; it's already computed.

ghost · 2013-03-25T02:25:28Z

It's a view, isn't it? You do it. I'll beef up the torture test; I'm not sure I tested DatetimeIndex,
maybe just Timestamp objects.
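One way to shape a boundary torture test like the one mentioned here: exercise row counts just below, at, and just above each multiple of the chunk size, and check that every row comes out exactly once. This is a sketch against a hypothetical chunked writer (write_chunked, boundary_sizes and torture_test are illustrative names), not the actual pandas test suite:

```python
def write_chunked(rows, chunksize):
    """Hypothetical chunked writer: emits rows one window at a time."""
    out = []
    for start in range(0, len(rows), chunksize):
        out.extend(rows[start:start + chunksize])
    return out

def boundary_sizes(chunksize, nchunks=3):
    """Row counts straddling each boundary: k*c - 1, k*c, k*c + 1."""
    sizes = {0, 1}
    for k in range(1, nchunks + 1):
        sizes.update({k * chunksize - 1, k * chunksize, k * chunksize + 1})
    return sorted(sizes)

def torture_test(writer, chunksize):
    """Return the row counts for which writer loses, duplicates or crashes."""
    failures = []
    for n in boundary_sizes(chunksize):
        rows = list(range(n))
        try:
            ok = writer(rows, chunksize) == rows
        except IndexError:
            ok = False
        if not ok:
            failures.append(n)
    return failures

print(torture_test(write_chunked, 4))  # []
```

Using a small chunk size keeps such a test fast while still hitting the same boundary logic that 25000 vs 25001 rows exercises.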

jreback · 2013-03-25T02:26:25Z

it's a view
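For context on "it's a view": basic slicing of a NumPy array returns a view that shares memory with the original, so slicing once and reusing the result costs no copy, while fancy (list) indexing returns a copy. A standalone NumPy illustration, not pandas internals:

```python
import numpy as np

values = np.arange(10)
window = values[2:5]                     # basic slice: a view, no data copied
print(np.shares_memory(values, window))  # True
print(window.base is values)             # True

window[0] = -1                           # writing through the view...
print(values[2])                         # -1  ...changes the original

copied = values[[2, 3, 4]]               # fancy indexing: a copy
print(np.shares_memory(values, copied))  # False
```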

ghost · 2013-03-25T03:02:47Z

test at 8386da9

jreback · 2013-03-25T12:45:52Z

closed by #3166

@y-p I put in a separate test for this (marked slow), but pls merge yours as well

ghost · 2013-03-25T14:26:08Z

Thanks, will do.

EDIT: 886c3c7

ghost · 2013-03-26T13:22:00Z

fyi, legacy=True has been replaced by engine='python', to be consistent with the c_parser convention.

jreback mentioned this issue Mar 25, 2013

BUG: GH3163 fixed to_csv with a boundry condition issue at the chunksize break #3166

Merged

jreback closed this as completed Mar 25, 2013

ghost mentioned this issue Jan 24, 2014

Handling of duplicate columns in pandas.io.sql.read_frame #2738

Closed
