pd.read_csv prefix parameter does not seem to work #27394
Comments
I think you misunderstand. That's only used when there's no header and the default column names apply:

```python
In [9]: pd.read_csv(io.StringIO('0,1\n2,3'), header=None, prefix="X_")
Out[9]:
   X_0  X_1
0    0    1
1    2    3

In [10]: pd.read_csv(io.StringIO('0,1\n2,3'), prefix="X_")
Out[10]:
   0  1
0  2  3
```

We should probably raise when prefix is passed but a header is being used.
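For readers on newer pandas: the prefix parameter was later deprecated (pandas 1.4) and removed (pandas 2.0), so a version-proof way to get the same naming is to read with header=None and rename afterwards. A minimal sketch:

```python
import io

import pandas as pd

# Read a headerless CSV; pandas assigns the integer labels 0, 1, ...
df = pd.read_csv(io.StringIO("0,1\n2,3"), header=None)

# DataFrame.add_prefix turns those labels into "X_0", "X_1", ...,
# matching what prefix="X_" used to produce
df = df.add_prefix("X_")
print(df.columns.tolist())  # ['X_0', 'X_1']
```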
Ah. That's it. "Prefix to add to column numbers when no header at all, e.g. 'X' for X0, X1, …" I looked around for a function to drop columns without a header; df.dropna() did not help. Could I create a feature request?
I don't understand. Every column in a DataFrame always has a name. You can't have a "column without a header".
@TomAugspurger, what do you mean by that? Could you explain the issue to me? I would like to take it up.
@sameshl great! If you look at #27394 (comment), @WillAyd @simonjayhawkins does that sound right to you?
Makes sense to me
@TomAugspurger I see. Thanks for the info. Will work on it!
Yes, I totally agree with you. But some CSV files have columns without headers. People make mistakes... It would be nice if the pandas.read_csv method had an option to cut off such columns (without a header), e.g. https://stackoverflow.com/questions/50301544/how-to-delete-columns-without-headers-in-python-pandas-read-csv
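There is no built-in read_csv option for this, but a common workaround is to filter the auto-named columns after reading: read_csv labels any empty header field "Unnamed: N", so those columns can be selected out. A sketch (not an official pandas feature, just post-processing):

```python
import io

import pandas as pd

# The second column has an empty header field; read_csv names it "Unnamed: 1"
csv_text = "a,,c\n1,2,3\n4,5,6"
df = pd.read_csv(io.StringIO(csv_text))

# Keep only columns whose name does not start with the auto-generated prefix
df = df.loc[:, ~df.columns.str.startswith("Unnamed")]
print(df.columns.tolist())  # ['a', 'c']
```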
The answers there seem reasonable to me.
Yes, but it is not the same as what I suggest. pandas.read_csv currently has this:

error_bad_lines : bool, default True
Lines with too many fields (e.g. a csv line with too many commas) will by default cause an exception to be raised, and no DataFrame will be returned. If False, then these "bad lines" will be dropped from the DataFrame that is returned.

Maybe do the same for columns?

error_bad_columns : bool, default True
Columns with an empty header and/or [other "damages"] will by default cause an exception to be raised, and no DataFrame will be returned. If False, then these "bad columns" will be dropped from the DataFrame that is returned.

What do you think?
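For comparison, the error_bad_lines behavior described above looks like this in practice. In pandas 1.3 the option was superseded by on_bad_lines, so the sketch below uses the newer spelling:

```python
import io

import pandas as pd

# The second data row has three fields but the header declares two columns
csv_text = "a,b\n1,2\n3,4,5\n6,7"

# on_bad_lines="skip" (the successor of error_bad_lines=False) drops the
# malformed row instead of raising a ParserError
df = pd.read_csv(io.StringIO(csv_text), on_bad_lines="skip")
print(len(df))  # 2 - the bad row was dropped
```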
That behavior seems confusing to me.
Hey, can I work on this issue if no one is working on it now?
Sure. It looks like #27998 started, but stalled. You may want to read through that PR first.
Pandas 1.1 has broken Records Mover's usage of the read_csv() function by adding error checking in cases where a certain argument would be unused. Details of the Pandas change:
* pandas-dev/pandas#27394
* pandas-dev/pandas#31383
See Records Mover test failures here:
* https://app.circleci.com/pipelines/github/bluelabsio/records-mover/1089/workflows/e62f1cf0-f8d0-4e22-9652-112df72b02b8/jobs/9439
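The error checking mentioned above can be observed directly. The exception type differs by version (ValueError from the pandas 1.1 check in #31383; TypeError on pandas 2.x, where the prefix parameter was removed entirely), so this sketch just catches both:

```python
import io

import pandas as pd

# Passing prefix while a header row is still being inferred is exactly the
# situation this issue asked to flag. Since pandas 1.1 it raises ValueError;
# on pandas 2.x the parameter no longer exists, so the call raises TypeError.
try:
    pd.read_csv(io.StringIO("0,1\n2,3"), prefix="X_")
    raised = False
except (ValueError, TypeError):
    raised = True
print(raised)  # True on pandas >= 1.1
```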
Code Sample, a copy-pastable example if possible
Problem description
The documentation says:
If I understood the documentation correctly, I should get this:
Expected Output
But I got this instead:
Current Output
Output of
pd.show_versions()
INSTALLED VERSIONS
commit: None
python: 3.6.3.final.0
python-bits: 64
OS: Windows
OS-release: 8.1
machine: AMD64
processor: Intel64 Family 6 Model 42 Stepping 7, GenuineIntel
byteorder: little
LC_ALL: None
LANG: None
LOCALE: None.None
pandas: 0.24.2
pytest: None
pip: 19.1.1
setuptools: 40.1.0
Cython: None
numpy: 1.15.0
scipy: None
pyarrow: None
xarray: None
IPython: None
sphinx: None
patsy: None
dateutil: 2.7.5
pytz: 2018.5
blosc: None
bottleneck: None
tables: None
numexpr: None
feather: None
matplotlib: 2.2.3
openpyxl: 2.5.5
xlrd: None
xlwt: None
xlsxwriter: None
lxml.etree: 4.1.1
bs4: 4.6.0
html5lib: None
sqlalchemy: 1.2.19
pymysql: None
psycopg2: None
jinja2: 2.10
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None
gcsfs: None