pd.read_csv prefix parameter does not seem to work #27394

Sniperq2 · 2019-07-15T08:06:33Z

Code Sample, a copy-pastable example if possible

Contents of test.csv:

field,data,values
text1,34,hh,fail
text2,76,tt,fail2

import pandas as pd

if __name__ == "__main__":
    data_list = pd.read_csv('test.csv', prefix="X")
    print(data_list)

Problem description

The documentation says:

prefix : str, optional
    Prefix to add to column numbers when no header, e.g. ‘X’ for X0, X1, …

If I understand the documentation correctly, I should get this:

Expected Output

field   data   values   X0
text1   34     hh       fail
text2   76     tt       fail2

But I got this instead:

Current Output

       field  data  values
text1     34    hh    fail
text2     76    tt   fail2
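
The output above can be reproduced without a file on disk. The following is a minimal sketch, assuming the CSV content from the code sample is passed through `io.StringIO`. Because the header row has one fewer field than the data rows, `read_csv` uses the first data column as the index, which is why `text1` and `text2` appear as row labels rather than under `field`.

```python
import io

import pandas as pd

# Same content as test.csv: 3 header fields, 4 fields per data row.
raw = "field,data,values\ntext1,34,hh,fail\ntext2,76,tt,fail2\n"

df = pd.read_csv(io.StringIO(raw))

# The extra leading column is used as the index, so the three header
# names are shifted onto the remaining three columns.
print(df.index.tolist())    # row labels taken from the first column
print(df.columns.tolist())  # header names applied to columns 2..4
print(df)
```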

Output of `pd.show_versions()`

INSTALLED VERSIONS

commit: None
python: 3.6.3.final.0
python-bits: 64
OS: Windows
OS-release: 8.1
machine: AMD64
processor: Intel64 Family 6 Model 42 Stepping 7, GenuineIntel
byteorder: little
LC_ALL: None
LANG: None
LOCALE: None.None

pandas: 0.24.2
pytest: None
pip: 19.1.1
setuptools: 40.1.0
Cython: None
numpy: 1.15.0
scipy: None
pyarrow: None
xarray: None
IPython: None
sphinx: None
patsy: None
dateutil: 2.7.5
pytz: 2018.5
blosc: None
bottleneck: None
tables: None
numexpr: None
feather: None
matplotlib: 2.2.3
openpyxl: 2.5.5
xlrd: None
xlwt: None
xlsxwriter: None
lxml.etree: 4.1.1
bs4: 4.6.0
html5lib: None
sqlalchemy: 1.2.19
pymysql: None
psycopg2: None
jinja2: 2.10
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None
gcsfs: None
None


TomAugspurger · 2019-07-15T12:03:11Z

I think you misunderstand. That's only used when there's no header and the default range(n) header is used.

In [9]: pd.read_csv(io.StringIO('0,1\n2,3'), header=None, prefix="X_")
Out[9]:
   X_0  X_1
0    0    1
1    2    3

In [10]: pd.read_csv(io.StringIO('0,1\n2,3'), prefix="X_")
Out[10]:
   0  1
0  2  3

We should probably raise when header and prefix are both not None.
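
The difference between the two calls above can also be shown without relying on `prefix`, whose availability depends on the installed pandas version. This is a rough sketch, not the library's implementation: it reads with `header=None` and renames the integer column labels by hand, which gives the same column names as `prefix="X_"` in the session above.

```python
import io

import pandas as pd

raw = "0,1\n2,3"

# header=None: no row is consumed as a header, so the columns get the
# default integer labels 0, 1 and both rows are kept as data.
no_header = pd.read_csv(io.StringIO(raw), header=None)

# Emulate prefix="X_" by prepending the prefix to each integer label.
no_header.columns = [f"X_{i}" for i in no_header.columns]

# Default header handling: the first row "0,1" becomes the header,
# so there are no integer labels left for a prefix to apply to.
with_header = pd.read_csv(io.StringIO(raw))

print(no_header.columns.tolist())
print(with_header.columns.tolist())
```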

Sniperq2 · 2019-07-15T12:22:24Z

Ah. That's it. "Prefix to add to column numbers when no header at all, e.g. ‘X’ for X0, X1, …"
So I consider it as NOTABUG

I looked around for a function to drop columns without a header. df.dropna() did not help. Could I create a feature request?

TomAugspurger · 2019-07-15T13:25:30Z

> I looked around for a function to drop columns without a header

I don't understand. Every column in a DataFrame always has a name. You can't have a "column without a header".

sameshl · 2019-07-15T14:15:50Z

@TomAugspurger, what do you mean by

> We should probably raise when header and prefix are both not None.

Could you explain the issue to me? I would like to take it up.

TomAugspurger · 2019-07-15T15:15:27Z

@sameshl great!

If you look at #27394 (comment), In[10] should raise a TypeError saying that prefix can only be used when header=None.

@WillAyd @simonjayhawkins does that sound right to you?
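
The proposed check could look roughly like the sketch below. The function name `check_prefix_usage` is hypothetical and is not part of pandas; the exception type follows the suggestion in this comment, and the `"infer"` default mirrors the default of `header` in `read_csv`.

```python
def check_prefix_usage(header="infer", prefix=None):
    """Hypothetical validation helper, not pandas code.

    Raises if a prefix is given while a header row is still expected,
    because in that case the prefix would be silently ignored.
    """
    if prefix is not None and header is not None:
        raise TypeError("prefix can only be used when header=None")


# Allowed: no header row, so the prefix is applied to the column numbers.
check_prefix_usage(header=None, prefix="X_")

# Allowed: no prefix requested at all.
check_prefix_usage()

# Rejected: corresponds to the In [10] call, where prefix has no effect.
try:
    check_prefix_usage(header="infer", prefix="X_")
except TypeError as exc:
    print(exc)
```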

WillAyd · 2019-07-15T15:26:29Z

Makes sense to me

sameshl · 2019-07-15T15:41:53Z

@TomAugspurger I see. Thanks for the info. Will work on it!

Sniperq2 · 2019-07-15T15:54:55Z

> > I looked around for a function to drop columns without a header

> I don't understand. Every column in a DataFrame always has a name. You can't have a "column without a header".

Yes. I totally agree with you. But some CSV files have columns without headers. People make mistakes... It would be nice if pandas.read_csv had an option to cut off such columns (those without a header).

For example: https://stackoverflow.com/questions/50301544/how-to-delete-columns-without-headers-in-python-pandas-read-csv

TomAugspurger · 2019-07-15T16:00:55Z

The answers there seem reasonable to me.


Sniperq2 · 2019-07-15T16:57:59Z

> The answers there seem reasonable to me.

Yes, but that is not the same as what I am suggesting.

pandas.read_csv currently has this:

error_bad_lines : bool, default True
    Lines with too many fields (e.g. a csv line with too many commas) will by default cause an exception to be raised, and no DataFrame will be returned. If False, then these “bad lines” will be dropped from the DataFrame that is returned.

Maybe do the same for columns?

error_bad_columns : bool, default True
    Columns with an empty header and/or [other "damages"] will by default cause an exception to be raised, and no DataFrame will be returned. If False, then these **“bad columns” will be dropped from the DataFrame that is returned.**

What do you think?
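
For illustration only, the suggested behavior can be approximated in user code today. In the sketch below, `read_csv_checked` and its `error_bad_columns` argument are hypothetical and do not exist in pandas. It assumes the "bad" columns are those with an empty header cell, which `read_csv` labels `Unnamed: N`; a real column whose name starts with `Unnamed:` would be treated as bad too.

```python
import io

import pandas as pd


def read_csv_checked(source, error_bad_columns=True, **kwargs):
    """Hypothetical wrapper emulating the suggested error_bad_columns flag."""
    df = pd.read_csv(source, **kwargs)
    # Columns with an empty header cell are labelled "Unnamed: N" by read_csv.
    bad = [c for c in df.columns if str(c).startswith("Unnamed:")]
    if bad and error_bad_columns:
        raise ValueError(f"columns without a header: {bad}")
    # With error_bad_columns=False the headerless columns are dropped.
    return df.drop(columns=bad)


# The header row ends with a comma, so the fourth column has no name.
raw = "field,data,values,\ntext1,34,hh,fail\ntext2,76,tt,fail2\n"

cleaned = read_csv_checked(io.StringIO(raw), error_bad_columns=False)
print(cleaned.columns.tolist())
```

Note that this does not cover the file from the original report, where the header row is simply one field short and the first column becomes the index instead.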

TomAugspurger · 2019-07-15T17:20:55Z

That behavior seems confusing to me.


samyak-jn · 2019-10-19T11:16:51Z

Hey, can I work on this issue if no one is working on it now?

TomAugspurger · 2019-10-19T13:18:43Z

Sure. It looks like #27998 started, but stalled. You may want to read through that PR first.


Pandas 1.1 has broken Records Mover's usage of the read_csv() function by adding error checking in cases where a certain argument would be unused.

Details of the Pandas change:
* pandas-dev/pandas#27394
* pandas-dev/pandas#31383

See Records Mover test failures here:
* https://app.circleci.com/pipelines/github/bluelabsio/records-mover/1089/workflows/e62f1cf0-f8d0-4e22-9652-112df72b02b8/jobs/9439

TomAugspurger added Effort Low labels Jul 15, 2019

TomAugspurger added this to the Contributions Welcome milestone Jul 15, 2019

TomAugspurger added IO CSV read_csv, to_csv Error Reporting Incorrect or improved errors from pandas good first issue and removed Difficulty Intermediate good first issue labels Jul 15, 2019

PraneethKhanna mentioned this issue Aug 18, 2019

Raise exception in read_csv when prefix is set, but not used because a header exists #27998

Closed

5 tasks

jbrockmendel removed the Effort Low label Oct 21, 2019

rushabh-v mentioned this issue Jan 28, 2020

Raise error in read_csv when arguments header and prefix both are not None #31383

Merged

5 tasks

jreback modified the milestones: Contributions Welcome, 1.1 Feb 1, 2020

gfyoung closed this as completed in #31383 Feb 3, 2020

gfyoung pushed a commit that referenced this issue Feb 3, 2020

Raise error in read_csv when arguments header and prefix both are not None (#31383) Closes #27394

a2721fd

vinceatbluelabs mentioned this issue Aug 11, 2020

Adjust to Pandas breaking change, bump Python version bluelabsio/records-mover#101

Merged

malinkallen mentioned this issue Jan 12, 2021

Raise an error in read_csv when names and prefix both are not None #39123

Closed
