ENH: Store comments from read_csv in an property (`df.comment`) and allow saving #15283

DSLituiev · 2017-02-01T19:05:32Z

Problem description

This is an enhancement request to allow reading and saving comment strings as part of DataFrame text file IO. Two related Stack Overflow questions ask for such a feature:
http://stackoverflow.com/questions/39724298/pandas-extract-comment-lines
http://stackoverflow.com/questions/29233496/write-comments-in-csv-file-with-pandas

Code Sample, a copy-pastable example if possible

df = pd.read_csv("mydata.csv", comment="#")
df.comment
"this is a comment from mydata.csv file"
df.to_csv("output.csv", comment="#")  # writes the `comment` string above the table, prefixing each of its lines with "#"
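Until something like the above exists, the round trip can be approximated outside of pandas. The following is only a minimal sketch under stated assumptions: `prepend_comment` and `extract_leading_comment` are hypothetical helper names (not pandas API), and only a comment block at the very top of the file is handled.

```python
def prepend_comment(csv_text, comment_text, prefix="#"):
    """Return csv_text with comment_text placed before it.

    Each line of comment_text gets its own prefixed line.
    Hypothetical helper, not part of pandas.
    """
    header = "".join(f"{prefix} {line}\n" for line in comment_text.splitlines())
    return header + csv_text


def extract_leading_comment(text, prefix="#"):
    """Collect the comment lines at the top of a CSV text, prefix removed.

    Stops at the first line that does not start with the prefix.
    Hypothetical helper, not part of pandas.
    """
    lines = []
    for line in text.splitlines():
        if not line.startswith(prefix):
            break
        lines.append(line[len(prefix):].strip())
    return "\n".join(lines)


csv_text = "a,b\n1,2\n"
out = prepend_comment(csv_text, "this is a comment\nsecond line")
recovered = extract_leading_comment(out)
```

With a DataFrame, `csv_text` could come from `df.to_csv(index=False)`, which returns a string when no path is given, and the written file can still be loaded with `pd.read_csv("output.csv", comment="#")`.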

Output of `pd.show_versions()`

INSTALLED VERSIONS
------------------
commit: None
python: 3.5.1.final.0
python-bits: 64
OS: Darwin
OS-release: 14.5.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8

pandas: 0.19.0
nose: 1.3.7
pip: 9.0.1
setuptools: 29.0.1
Cython: 0.24
numpy: 1.12.0
scipy: 0.18.1
statsmodels: 0.6.1
xarray: None
IPython: 5.0.0
sphinx: 1.4.5
patsy: 0.4.1
dateutil: 2.5.3
pytz: 2016.7
blosc: None
bottleneck: None
tables: 3.2.2
numexpr: 2.6.0
matplotlib: 1.5.3
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: 4.4.1
html5lib: None
httplib2: 0.9.2
apiclient: 1.5.1
sqlalchemy: 1.0.14
pymysql: None
psycopg2: None
jinja2: 2.8
boto: 2.42.0
pandas_datareader: None


jreback · 2017-02-01T19:08:54Z

xref #5686

Attaching metadata (which is mostly not propagated) is generally not a good idea. Rather, this could be something like:

df, comments, bad_lines = pd.read_csv(...., return_comments=True, return_bad_lines=True)

or some such.

or, if we simply want to disallow both processing comments and returning them at the same time (which is probably preferable):

df, comments, bad_lines = pd.read_csv(...., comments='return', errors='return')
(this uses the new `errors` keyword that has been proposed for handling or warning on bad lines)

or probably better

df, return_data = ...

where

return_data = {
    'comments': [list_of_tuples(line_number, comment)],
    'errors': [list_of_tuples(line_number, text, error)],
}

or some such (this would be nicer, since there is always just one extra return value, and it has optional keys)
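To make the shape of that `return_data` concrete, here is a rough sketch that separates comment lines from the data before parsing. It is not how the pandas parsers work internally: `split_comments` is a hypothetical helper, line numbers are assumed to be 0-based positions in the original text, only whole-line comments are recognized, and the optional `'errors'` key is simply left out.

```python
def split_comments(text, comment="#"):
    """Split CSV text into data text and a return_data dict.

    Hypothetical helper, not pandas API. Only lines whose first
    non-blank character sequence is the comment marker are treated
    as comments. Line numbers are 0-based positions in the input.
    """
    data_lines = []
    comments = []
    for line_number, line in enumerate(text.splitlines()):
        stripped = line.lstrip()
        if stripped.startswith(comment):
            comments.append((line_number, stripped[len(comment):].strip()))
        else:
            data_lines.append(line)
    data_text = "".join(line + "\n" for line in data_lines)
    # 'errors' is omitted: the keys are optional in the proposal above.
    return data_text, {"comments": comments}


raw = "# source: sensor A\na,b\n1,2\n# recalibrated here\n3,4\n"
data_text, return_data = split_comments(raw)
```

The remaining `data_text` can then be parsed as usual, for example with `pd.read_csv(io.StringIO(data_text))`, giving the `df, return_data` pair described above.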

TrigonaMinima · 2017-02-10T09:14:08Z

@jreback I'd like to work on this.

Is there any target file other than pandas/io/parsers.py?

jreback · 2017-02-10T14:38:06Z

The main file is pandas/parser.pyx. pandas/io/parsers.py houses the top-level interface and the Python parser, while parser.pyx houses the C parser. Both would need to be modified.

joeflack4 · 2023-01-31T23:15:05Z

Not a huge help to me but adding a +1 just because of current need

jreback added API Design Difficulty Intermediate IO CSV read_csv, to_csv labels Feb 1, 2017

jreback added this to the Next Major Release milestone Feb 1, 2017

jbrockmendel removed Effort Medium labels Oct 21, 2019

mroeschke added Enhancement Needs Discussion Requires discussion from core team before further action and removed API Design labels May 8, 2021

jbrockmendel added the metadata _metadata, .attrs label Dec 21, 2021

mroeschke removed this from the Contributions Welcome milestone Oct 13, 2022
