BUG/API: can't pass parameters to csv module via df.to_csv #4528

Closed
brechea opened this issue Aug 9, 2013 · 6 comments
Labels
API Design Bug IO CSV read_csv, to_csv Output-Formatting __repr__ of pandas objects, to_string

brechea commented Aug 9, 2013

Trying to print a data frame as plain, strict TSV (i.e., no quoting and no escaping, because I know none of the fields will contain tabs), I wanted to use the "quoting" option, which is documented in pandas and is passed through to csv, as well as the "quotechar" option, which is not documented in pandas but is also a csv option. But it doesn't work:

In [1]: import sys, csv

In [2]: from pandas import DataFrame

In [3]: data = {'col1': ['contents of col1 row1', 'contents " of col1 row2'], 'col2': ['contents of col2 row1', 'contents " of col2 row2'] }

In [4]: df = DataFrame(data)

In [5]: df.to_csv(sys.stdout, sep='\t', quoting=csv.QUOTE_NONE, quotechar=None)
        col1    col2
0       contents of col1 row1   contents of col2 row1
---------------------------------------------------------------------------
Error                                     Traceback (most recent call last)
<ipython-input-5-a30d32266fb4> in <module>()
----> 1 df.to_csv(sys.stdout, sep='\t', quoting=csv.QUOTE_NONE, quotechar=None)

/home/brechea/.local/lib/python2.6/site-packages/pandas-0.12.0-py2.6-linux-x86_64.egg/pandas/core/frame.pyc in to_csv(self, path_or_buf, sep, na_rep, float_format, cols, header, index, index_label, mode, nanRep, encoding, quoting, line_terminator, chunksize, tupleize_cols, **kwds)
   1409                                      tupleize_cols=tupleize_cols,
   1410                                      )
-> 1411         formatter.save()
   1412
   1413     def to_excel(self, excel_writer, sheet_name='sheet1', na_rep='',

/home/brechea/.local/lib/python2.6/site-packages/pandas-0.12.0-py2.6-linux-x86_64.egg/pandas/core/format.pyc in save(self)
    974
    975             else:
--> 976                 self._save()
    977
    978

/home/brechea/.local/lib/python2.6/site-packages/pandas-0.12.0-py2.6-linux-x86_64.egg/pandas/core/format.pyc in _save(self)
   1080                 break
   1081
-> 1082             self._save_chunk(start_i, end_i)
   1083
   1084     def _save_chunk(self, start_i, end_i):

/home/brechea/.local/lib/python2.6/site-packages/pandas-0.12.0-py2.6-linux-x86_64.egg/pandas/core/format.pyc in _save_chunk(self, start_i, end_i)
   1098         ix = data_index.to_native_types(slicer=slicer, na_rep=self.na_rep, float_format=self.float_format)
   1099
-> 1100         lib.write_csv_rows(self.data, ix, self.nlevels, self.cols, self.writer)
   1101
   1102 # from collections import namedtuple

/home/brechea/.local/lib/python2.6/site-packages/pandas-0.12.0-py2.6-linux-x86_64.egg/pandas/lib.so in pandas.lib.write_csv_rows (pandas/lib.c:13871)()

Error: need to escape, but no escapechar set
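The same error can be reproduced with the standard csv module alone, which suggests that the quotechar=None passed to to_csv() never reached the writer. The following is a minimal sketch, assuming Python 3 and using a hypothetical helper named write_row (not part of pandas or csv). It writes the second data row twice: once with the default quote character and once with quotechar=None.

```python
import csv
import io

# Second data row from the report: each field contains a double quote.
ROW = ['contents " of col1 row2', 'contents " of col2 row2']


def write_row(row, **fmtparams):
    """Hypothetical helper: write one row as TSV, return (text, error)."""
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter="\t", lineterminator="\n", **fmtparams)
    try:
        writer.writerow(row)
    except csv.Error as exc:
        return None, str(exc)
    return buf.getvalue(), None


# QUOTE_NONE with the default quotechar ('"'): the writer wants to escape
# the embedded quote, but no escapechar is set, so it raises csv.Error.
default_text, default_error = write_row(ROW, quoting=csv.QUOTE_NONE)

# QUOTE_NONE with quotechar=None: the double quote is an ordinary character.
strict_text, strict_error = write_row(ROW, quoting=csv.QUOTE_NONE, quotechar=None)

print(default_error)
print(strict_text, end="")
```

Only the second call produces output, which matches the behavior described in this report once quotechar is actually forwarded to the csv writer.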

Adding the parameter

quotechar=kwds.get("quotechar")

to the

formatter = fmt.CSVFormatter(...

call in to_csv(), and making the corresponding changes to format.CSVFormatter's __init__() and save(), produces the expected output:

In [1]: import sys, csv

In [2]: from pandas import DataFrame

In [3]: data = {'col1': ['contents of col1 row1', 'contents " of col1 row2'], 'col2': ['contents of col2 row1', 'contents " of col2 row2'] }

In [4]: df = DataFrame(data)

In [5]: df.to_csv(sys.stdout, sep='\t', quoting=csv.QUOTE_NONE, quotechar=None)
        col1    col2
0       contents of col1 row1   contents of col2 row1
1       contents " of col1 row2 contents " of col2 row2

i.e., unescaped, unquoted tsv.

More generally, there could be many reasons to want more control over the underlying csv writer, so a generic mechanism (as opposed to adding each param one by one) might be called for (e.g., allowing for a csv dialect object or at least a dictionary holding dialect attributes).
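To illustrate what such a generic mechanism could look like, here is a rough sketch that uses only the standard csv module under Python 3. The names StrictTSV and write_table are hypothetical and are not pandas API. The idea is that a caller supplies either a csv.Dialect subclass or individual dialect attributes as keyword arguments, and both are handed to csv.writer unchanged.

```python
import csv
import io


class StrictTSV(csv.Dialect):
    """Hypothetical dialect: tab-separated, no quoting, no escaping."""
    delimiter = "\t"
    quoting = csv.QUOTE_NONE
    quotechar = None
    escapechar = None
    doublequote = False
    skipinitialspace = False
    lineterminator = "\n"


def write_table(header, rows, buf, dialect="excel", **fmtparams):
    """Hypothetical helper: forward dialect and overrides to csv.writer."""
    writer = csv.writer(buf, dialect, **fmtparams)
    writer.writerow(header)
    writer.writerows(rows)


header = ["col1", "col2"]
rows = [
    ["contents of col1 row1", "contents of col2 row1"],
    ['contents " of col1 row2', 'contents " of col2 row2'],
]

# Variant 1: pass a dialect object.
via_dialect = io.StringIO()
write_table(header, rows, via_dialect, dialect=StrictTSV)

# Variant 2: pass a dictionary of dialect attributes as overrides.
overrides = {
    "delimiter": "\t",
    "quoting": csv.QUOTE_NONE,
    "quotechar": None,
    "lineterminator": "\n",
}
via_kwargs = io.StringIO()
write_table(header, rows, via_kwargs, **overrides)

print(via_dialect.getvalue(), end="")
```

Both variants yield the same unquoted, unescaped TSV. With this design, new csv options would not each need a dedicated parameter.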

Contributor

jreback commented Aug 10, 2013

yep...would be nice to add this parameter (and you are right, dialect would also be nice to pass, which if not None could control the values of other params). Can you do a PR to add those (with tests!)?

also I believe the docstring needs to be updated in to_csv and the docs in io.rst.

thanks!

Author

brechea commented Aug 13, 2013

I'm not sure what a PR is, but I assume it means a pull request, given this is GitHub? I confess I don't have a git repo of pandas, and did the above very quickly. I simply hacked the above-mentioned modules directly so that I could at least demonstrate the before and after behavior. I'll look into doing things properly, but it may take a while.

Contributor

jreback commented Aug 13, 2013

have a look at this: http://pandas.pydata.org/developers.html

Contributor

jreback commented Sep 20, 2013

@brechea giving this a shot?

@jreback jreback closed this as completed Sep 21, 2013
@jreback jreback reopened this Sep 21, 2013
Contributor

jreback commented Sep 30, 2013

@brechea ping!

Contributor

jreback commented Feb 18, 2014

this is closed by #5414

@brechea pls try this again on master...if you are still experiencing issues...let us know

@jreback jreback closed this as completed Feb 18, 2014