ENH: Store comments from read_csv in an property (df.comment) and allow saving #15283

Open
DSLituiev opened this issue Feb 1, 2017 · 4 comments
Labels
Enhancement IO CSV read_csv, to_csv metadata _metadata, .attrs Needs Discussion Requires discussion from core team before further action

Comments

@DSLituiev

DSLituiev commented Feb 1, 2017

Problem description

This is an enhancement request to support reading and saving comment strings with DataFrame text-file IO. Two related Stack Overflow questions about such a feature:
http://stackoverflow.com/questions/39724298/pandas-extract-comment-lines
http://stackoverflow.com/questions/29233496/write-comments-in-csv-file-with-pandas

Code Sample, a copy-pastable example if possible

import pandas as pd

# Proposed API (not available in pandas today)
df = pd.read_csv("mydata.csv", comment="#")
df.comment
"this is a comment from mydata.csv file"
df.to_csv("output.csv", comment="#")  # writes each line of `comment`, prefixed with "#", before the table
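Until something like this exists, the requested round trip can be approximated with the current API. A minimal sketch, assuming the comments are whole lines at the top of the file that start with "#". The helper names `read_csv_with_comment` and `to_csv_with_comment` are hypothetical, and storing the text in `DataFrame.attrs` (available since pandas 1.0, still marked experimental) is an assumption standing in for the proposed `df.comment`:

```python
import pandas as pd


def read_csv_with_comment(path, prefix="#", **kwargs):
    """Hypothetical helper: read a CSV and keep its leading comment block.

    Leading lines that start with `prefix` are collected (prefix and one
    following space removed) and stored in df.attrs["comment"]. `prefix`
    must be a single character, because read_csv's `comment` accepts one.
    Note that read_csv(comment=...) also drops text after `prefix` inside
    data rows.
    """
    comment_lines = []
    with open(path, encoding="utf-8") as f:
        for line in f:
            if not line.startswith(prefix):
                break  # the first non-comment line ends the comment block
            text = line[len(prefix):].rstrip("\r\n")
            if text.startswith(" "):
                text = text[1:]
            comment_lines.append(text)
    df = pd.read_csv(path, comment=prefix, **kwargs)
    # attrs is experimental; it may not survive every DataFrame operation.
    df.attrs["comment"] = "\n".join(comment_lines)
    return df


def to_csv_with_comment(df, path, comment=None, prefix="#", **kwargs):
    """Hypothetical helper: write each comment line with `prefix`, then the table."""
    if comment is None:
        comment = df.attrs.get("comment", "")
    # newline="" stops Python from translating the line endings to_csv writes.
    with open(path, "w", encoding="utf-8", newline="") as f:
        for line in comment.splitlines():
            f.write(f"{prefix} {line}\n")
        df.to_csv(f, **kwargs)
```

For example, `to_csv_with_comment(df, "output.csv", index=False)` followed by `read_csv_with_comment("output.csv").attrs["comment"]` should give back the same text, provided the comment lines sit at the top of the file.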

Output of pd.show_versions()

INSTALLED VERSIONS
------------------
commit: None
python: 3.5.1.final.0
python-bits: 64
OS: Darwin
OS-release: 14.5.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8

pandas: 0.19.0
nose: 1.3.7
pip: 9.0.1
setuptools: 29.0.1
Cython: 0.24
numpy: 1.12.0
scipy: 0.18.1
statsmodels: 0.6.1
xarray: None
IPython: 5.0.0
sphinx: 1.4.5
patsy: 0.4.1
dateutil: 2.5.3
pytz: 2016.7
blosc: None
bottleneck: None
tables: 3.2.2
numexpr: 2.6.0
matplotlib: 1.5.3
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: 4.4.1
html5lib: None
httplib2: 0.9.2
apiclient: 1.5.1
sqlalchemy: 1.0.14
pymysql: None
psycopg2: None
jinja2: 2.8
boto: 2.42.0
pandas_datareader: None

@jreback
Contributor

jreback commented Feb 1, 2017

xref #5686

Attaching metadata (which is mostly not propagated) is generally not a good idea; rather, this could be something like:

df, comments, bad_lines = pd.read_csv(...., return_comments=True, return_bad_lines=True)

or some such.

or, if we simply want to disallow doing both comment processing and returning at once (which is probably preferable):

df, comments, bad_lines = pd.read_csv(...., comments='return', errors='return')
(going with the new `errors` keyword that has been proposed for handling/warning on bad lines)

or probably better

df, return_data = ...

where

return_data = {
    'comments': [(line_number, comment), ...],
    'errors': [(line_number, text, error), ...],
}

or some such (this would be nicer, as you always have just one extra return value, and it has optional keys)
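The single-return-value shape could be prototyped outside the parsers to try it out before touching the C or Python engines. A minimal sketch in plain Python plus `pd.read_csv`, assuming whole-line `#` comments, no quoted fields spanning several lines, and treating any row whose field count differs from the header as an error. `read_csv_with_return_data` is a hypothetical name, not a pandas API:

```python
import csv
import io

import pandas as pd


def read_csv_with_return_data(path, comment="#"):
    """Hypothetical helper returning (df, return_data), where

    return_data = {
        "comments": [(line_number, comment_text), ...],
        "errors": [(line_number, text, error_message), ...],
    }

    Line numbers are 1-based physical lines. Only whole-line comments are
    collected, quoted fields spanning several lines are not supported, and
    any row whose field count differs from the header's is reported as an
    error (read_csv itself pads short rows with NaN instead).
    """
    comments, errors, good_lines = [], [], []
    n_fields = None
    with open(path, encoding="utf-8", newline="") as f:
        for line_number, line in enumerate(f, start=1):
            text = line.rstrip("\r\n")
            if text.startswith(comment):
                comments.append((line_number, text[len(comment):].strip()))
                continue
            if not text:
                continue  # skip blank lines, as read_csv does by default
            fields = next(csv.reader([text]))
            if n_fields is None:
                n_fields = len(fields)  # first data line is the header
            elif len(fields) != n_fields:
                errors.append(
                    (line_number, text, f"expected {n_fields} fields, saw {len(fields)}")
                )
                continue
            good_lines.append(text)
    df = pd.read_csv(io.StringIO("\n".join(good_lines))) if good_lines else pd.DataFrame()
    return df, {"comments": comments, "errors": errors}
```

Keeping the extras in one dict means further keys (warnings, for instance) could be added later without changing how many values the call returns.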

@jreback jreback added this to the Next Major Release milestone Feb 1, 2017
@TrigonaMinima

@jreback I'd like to work on this.

Are there any target files other than pandas/io/parsers.py?

@jreback
Contributor

jreback commented Feb 10, 2017

The main file is pandas/parser.pyx. pandas/io/parsers.py houses the top-level interface and the Python parser, while parser.pyx houses the C parser. Both would need to be modified.

@mroeschke mroeschke added Enhancement Needs Discussion Requires discussion from core team before further action and removed API Design labels May 8, 2021
@jbrockmendel jbrockmendel added the metadata _metadata, .attrs label Dec 21, 2021
@mroeschke mroeschke removed this from the Contributions Welcome milestone Oct 13, 2022
@joeflack4

Not much help, I know, but adding a +1 because I currently need this.
