read_csv skiprows vs. read_excel #16084

Closed
ragesz opened this issue Apr 21, 2017 · 2 comments
Labels: Duplicate Report (duplicate issue or pull request), IO Excel (read_excel, to_excel)

ragesz commented Apr 21, 2017

As far as I can see, pandas.read_csv() and pandas.read_excel() handle the skiprows argument differently. I have the same data in a CSV file and in an Excel file:

abc  def
  1   10
  2   11
  3   12
  4   13
  5   14

I want to use different column names when I read the data, so I pass the desired column names in the names argument and skip the first (header) row of my CSV and Excel files.
pd.read_excel('test.xlsx', skiprows=0, names=['foo', 'bar']) returns my expected result:

   foo  bar
0    1   10
1    2   11
2    3   12
3    4   13
4    5   14

I get the same expected result with pd.read_csv('test.csv', skiprows=1, names=['foo', 'bar']). But pd.read_csv('test.csv', skiprows=0, names=['foo', 'bar']) keeps the first (header) row of the input file:

   foo  bar
0  abc  def
1    1   10
2    2   11
3    3   12
4    4   13
5    5   14
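The read_csv half of this can be reproduced without any files. The sketch below assumes test.csv is comma-separated with the contents shown above (the issue only shows the parsed table), and uses io.StringIO as a stand-in for the file. When names is passed without header, read_csv behaves as if header=None, so with skiprows=0 the original header line is read as a data row.

```python
import io

import pandas as pd

# Assumed contents of test.csv; the delimiter is a guess, the values come from the issue.
CSV_TEXT = "abc,def\n1,10\n2,11\n3,12\n4,13\n5,14\n"

# skiprows=0 skips nothing. With names= given and no header=, every line is data,
# so the "abc,def" line ends up as row 0.
kept = pd.read_csv(io.StringIO(CSV_TEXT), skiprows=0, names=["foo", "bar"])

# skiprows=1 drops the original header line before parsing.
skipped = pd.read_csv(io.StringIO(CSV_TEXT), skiprows=1, names=["foo", "bar"])

print(kept)
print(skipped)
```

Because the first frame contains the strings "abc" and "def", both of its columns are parsed as text rather than integers.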

Is this the expected behavior of skiprows, or is something wrong with pandas.read_csv()?

@TomAugspurger
Contributor

I think I would do this using the header argument.

In [17]: pd.read_excel("foo.xlsx", names=['foo', 'bar'], header=0)
Out[17]:
   foo  bar
0    1   10
1    2   11
2    3   12
3    4   13
4    5   14

In [18]: pd.read_csv("foo.csv", names=['foo', 'bar'], header=0)
Out[18]:
   foo  bar
0    1   10
1    2   11
2    3   12
3    4   13
4    5   14

This is a symptom of #11889
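For anyone who wants to try the header=0 suggestion without creating files, here is a minimal sketch for the read_csv case only. It assumes foo.csv is comma-separated with the data from the original report; io.StringIO stands in for the file. Passing header=0 together with names tells read_csv that line 0 is a header to be replaced, so it is dropped from the data.

```python
import io

import pandas as pd

# Assumed contents of foo.csv; the delimiter is a guess, the values come from the issue.
CSV_TEXT = "abc,def\n1,10\n2,11\n3,12\n4,13\n5,14\n"

# header=0 marks the first line as the header; names= then replaces those labels.
renamed = pd.read_csv(io.StringIO(CSV_TEXT), names=["foo", "bar"], header=0)

# For comparison: with neither argument, the file's own header is used.
original = pd.read_csv(io.StringIO(CSV_TEXT))

print(renamed)
print(original.columns.tolist())
```

This avoids skiprows entirely, so the call does not depend on how each reader counts skipped rows relative to the header.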

@TomAugspurger TomAugspurger added IO Excel read_excel, to_excel Duplicate Report Duplicate issue or pull request labels Apr 21, 2017
@TomAugspurger TomAugspurger added this to the No action milestone Apr 21, 2017
@TomAugspurger
Contributor

Since you actually hit that API inconsistency, your input would be appreciated in #11889
