Pandas v0.15.2 breaks read_csv with skiprows, delim_whitespace=True and explicit naming of columns #9079

Closed
cpaulik opened this issue Dec 15, 2014 · 3 comments · Fixed by #9102
Labels
Bug IO CSV read_csv, to_csv

@cpaulik

cpaulik commented Dec 15, 2014

Hi,

The latest version of pandas, 0.15.2, can no longer read a file that earlier versions read without problems.

The file has the following format:

SMOSMANIA  SMOSMANIA       Narbonne          43.15000     2.95670  112.00    0.05    0.05 ThetaProbe-ML2X 
2007/01/01 01:00   0.2140 U M 
2007/01/01 02:00   0.2140 U M 
2007/01/01 03:00   0.2140 U M 

The file can be found at the URL assigned to fname below.

Since the first line has a different number of fields than the data rows below it, I use skiprows=1 and specify the column names explicitly.

import pandas as pd

fname = 'https://raw.githubusercontent.com/TUW-GEO/pytesmo/master/tests/test_ismn/test_data/format_header_values/SMOSMANIA/SMOSMANIA_SMOSMANIA_Narbonne_sm_0.050000_0.050000_ThetaProbe-ML2X_20070101_20070131.stm'

pd.read_csv(fname, skiprows=1, delim_whitespace=True, names=['date', 'time', 'variable', 'flag', 'orig_flag'])
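
For trying the same call without network access, here is a minimal sketch on an in-memory sample. The sample text is a synthetic excerpt modelled on the format shown above (LF line endings, no trailing spaces), not the real file, and `sep=r"\s+"` is used as the documented equivalent of `delim_whitespace=True`, which newer pandas releases deprecate.

```python
import io

import pandas as pd

# Synthetic excerpt modelled on the file format above; NOT the real file.
# It uses LF line endings and has no trailing spaces.
SAMPLE = (
    "SMOSMANIA  SMOSMANIA  Narbonne  43.15000  2.95670  112.00  0.05  0.05 ThetaProbe-ML2X\n"
    "2007/01/01 01:00   0.2140 U M\n"
    "2007/01/01 02:00   0.2140 U M\n"
    "2007/01/01 03:00   0.2140 U M\n"
)
NAMES = ["date", "time", "variable", "flag", "orig_flag"]

# skiprows=1 drops the 9-field station header line;
# names= labels the 5 whitespace-separated data columns.
df = pd.read_csv(io.StringIO(SAMPLE), skiprows=1, sep=r"\s+", names=NAMES)
print(df)
```

Because the sample uses LF line endings, it only illustrates how skiprows and names interact; it is not expected to trigger the empty-DataFrame result reported here.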

Please compare the two code samples below. The first uses pandas 0.15.2, the second 0.15.1.

0.15.2

In [1]: import pandas as pd

In [2]: from pandas.util.print_versions import show_versions

In [3]: show_versions()

INSTALLED VERSIONS
------------------
commit: None
python: 2.7.6.final.0
python-bits: 64
OS: Linux
OS-release: 3.13.0-24-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8

pandas: 0.15.2
nose: 1.3.4
Cython: 0.21.1
numpy: 1.9.1
scipy: 0.14.0
statsmodels: 0.6.0
IPython: 2.3.1
sphinx: 1.2.3
patsy: 0.3.0
dateutil: 2.2
pytz: 2014.9
bottleneck: None
tables: None
numexpr: None
matplotlib: 1.4.2
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: None
httplib2: None
apiclient: None
rpy2: None
sqlalchemy: None
pymysql: None
psycopg2: None

In [4]: pd.read_csv('https://raw.githubusercontent.com/TUW-GEO/pytesmo/master/tests/test_ismn/test_data/format_header_values/SMOSMANIA/SMOSMANIA_SMOSMANIA_Narbonne_sm_0.050000_0.050000_ThetaProbe-ML2X_20070101_20070131.stm', skiprows=1, delim_whitespace=True, names=['date', 'time', 'variable','flag','orig_flag'])
Out[4]: 
Empty DataFrame
Columns: [date, time, variable, flag, orig_flag]
Index: []

0.15.1

In [1]: import pandas as pd

In [2]: from pandas.util.print_versions import show_versions

In [3]: show_versions()

INSTALLED VERSIONS
------------------
commit: None
python: 2.7.6.final.0
python-bits: 64
OS: Linux
OS-release: 3.13.0-24-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8

pandas: 0.15.1
nose: 1.3.4
Cython: 0.21.1
numpy: 1.9.1
scipy: 0.14.0
statsmodels: 0.6.0
IPython: 2.3.1
sphinx: 1.2.3
patsy: 0.3.0
dateutil: 2.2
pytz: 2014.9
bottleneck: None
tables: None
numexpr: None
matplotlib: 1.4.2
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: None
httplib2: None
apiclient: None
rpy2: None
sqlalchemy: None
pymysql: None
psycopg2: None

In [4]: pd.read_csv('https://raw.githubusercontent.com/TUW-GEO/pytesmo/master/tests/test_ismn/test_data/format_header_values/SMOSMANIA/SMOSMANIA_SMOSMANIA_Narbonne_sm_0.050000_0.050000_ThetaProbe-ML2X_20070101_20070131.stm', skiprows=1, delim_whitespace=True, names=['date', 'time', 'variable','flag','orig_flag'])
Out[4]: 
           date   time  variable flag orig_flag
0    2007/01/01  01:00    0.2140    U         M
1    2007/01/01  02:00    0.2140    U         M
2    2007/01/01  03:00    0.2140    U         M
3    2007/01/01  04:00    0.2140    U         M
4    2007/01/01  05:00    0.2140    U         M
5    2007/01/01  06:00    0.2140    U         M
6    2007/01/01  07:00    0.2135    U         M
7    2007/01/01  08:00    0.2135    U         M
8    2007/01/01  09:00    0.2135    U         M
9    2007/01/01  10:00    0.2140    U         M
10   2007/01/01  11:00    0.2140    U         M
11   2007/01/01  12:00    0.2145    U         M
12   2007/01/01  13:00    0.2149    U         M
13   2007/01/01  14:00    0.2149    U         M
14   2007/01/01  15:00    0.2149    U         M
15   2007/01/01  16:00    0.2145    U         M
16   2007/01/01  17:00    0.2135    U         M
17   2007/01/01  18:00    0.2130    U         M
18   2007/01/01  19:00    0.2130    U         M
19   2007/01/01  20:00    0.2126    U         M
20   2007/01/01  21:00    0.2121    U         M
21   2007/01/01  22:00    0.2121    U       NaN
22   2007/01/01  23:00    0.2116    U         M
23   2007/01/02  00:00    0.2116    U         M
24   2007/01/02  01:00    0.2112    U         M
25   2007/01/02  02:00    0.2107    U         M
26   2007/01/02  03:00    0.2107    U         M
27   2007/01/02  04:00    0.2102    U         M
28   2007/01/02  05:00    0.2098    U         M
29   2007/01/02  06:00    0.2098    U         M
..          ...    ...       ...  ...       ...
711  2007/01/30  18:00    0.1538    U         M
712  2007/01/30  19:00    0.1534    U         M
713  2007/01/30  20:00    0.1534    U         M
714  2007/01/30  21:00    0.1534    U         M
715  2007/01/30  22:00    0.1534    U         M
716  2007/01/30  23:00    0.1534    U         M
717  2007/01/31  00:00    0.1534    U         M
718  2007/01/31  01:00    0.1531    U         M
719  2007/01/31  02:00    0.1531    U         M
720  2007/01/31  03:00    0.1527    U         M
721  2007/01/31  04:00    0.1527    U         M
722  2007/01/31  05:00    0.1524    U         M
723  2007/01/31  06:00    0.1524    U         M
724  2007/01/31  07:00    0.1524    U         M
725  2007/01/31  08:00    0.1521    U         M
726  2007/01/31  09:00    0.1521    U         M
727  2007/01/31  10:00    0.1521    U         M
728  2007/01/31  11:00    0.1524    U         M
729  2007/01/31  12:00    0.1527    U         M
730  2007/01/31  13:00    0.1534    U         M
731  2007/01/31  14:00    0.1541    U         M
732  2007/01/31  15:00    0.1545    U         M
733  2007/01/31  16:00    0.1541    U         M
734  2007/01/31  17:00    0.1538    U         M
735  2007/01/31  18:00    0.1534    U         M
736  2007/01/31  19:00    0.1531    U         M
737  2007/01/31  20:00    0.1527    U         M
738  2007/01/31  21:00    0.1527    U         M
739  2007/01/31  22:00    0.1524    U         M
740  2007/01/31  23:00    0.1524    U         M

[741 rows x 5 columns]
@jreback
Contributor

jreback commented Dec 16, 2014

same issue here: bokeh/bokeh#1556

jreback added the IO CSV (read_csv, to_csv) and Bug labels on Dec 16, 2014
jreback added this to the 0.16.0 milestone on Dec 16, 2014
@jreback
Contributor

jreback commented Dec 16, 2014

cc @selasley

Can you have a look? This might be related to the changes in #8984.

Note that this example DOES work with the Python parser.
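
One way to check whether a problem is specific to the C parser is to read the same input with both engines and compare the results. The sketch below does that through the `engine` argument of `read_csv`. The sample data and the helper name `read_with` are illustrative assumptions, and the sample uses LF line endings, so it shows the comparison technique rather than reproducing the failure.

```python
import io

import pandas as pd

# Synthetic LF-terminated sample modelled on the reported file; NOT the real file.
SAMPLE = (
    "SMOSMANIA  SMOSMANIA  Narbonne  43.15000  2.95670  112.00  0.05  0.05 ThetaProbe-ML2X\n"
    "2007/01/01 01:00   0.2140 U M\n"
    "2007/01/01 02:00   0.2140 U M\n"
    "2007/01/01 03:00   0.2140 U M\n"
)
NAMES = ["date", "time", "variable", "flag", "orig_flag"]


def read_with(engine):
    """Hypothetical helper: parse SAMPLE with the given read_csv engine."""
    return pd.read_csv(
        io.StringIO(SAMPLE),
        skiprows=1,
        sep=r"\s+",  # equivalent of delim_whitespace=True
        names=NAMES,
        engine=engine,
    )


frames = {engine: read_with(engine) for engine in ("c", "python")}

# Compare cell values row by row; a mismatch points at an engine-specific issue.
same = frames["c"].values.tolist() == frames["python"].values.tolist()
print("parsers agree:", same)
```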

@selasley
Contributor

The problem is the file's CR line endings: read_csv works if I change the line endings to LF or CRLF. I have some time this week, so I'll work on fixing it.
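
Until a fixed release is available, one possible workaround is to inspect the raw bytes for their line-ending style and convert them to LF before parsing. The following is a minimal sketch of that idea, not the fix made in pandas itself. The helper names `count_line_endings` and `normalize_newlines` and the CR-terminated sample are assumptions made for illustration.

```python
import io

import pandas as pd

NAMES = ["date", "time", "variable", "flag", "orig_flag"]

# Synthetic sample with bare CR ("\r") line endings; NOT the real file.
RAW = (
    b"SMOSMANIA  SMOSMANIA  Narbonne  43.15000  2.95670  112.00  0.05  0.05 ThetaProbe-ML2X\r"
    b"2007/01/01 01:00   0.2140 U M\r"
    b"2007/01/01 02:00   0.2140 U M\r"
    b"2007/01/01 03:00   0.2140 U M\r"
)


def count_line_endings(raw: bytes) -> dict:
    """Hypothetical helper: count CRLF, bare CR and bare LF terminators."""
    crlf = raw.count(b"\r\n")
    return {
        "CRLF": crlf,
        "CR": raw.count(b"\r") - crlf,
        "LF": raw.count(b"\n") - crlf,
    }


def normalize_newlines(raw: bytes) -> bytes:
    """Hypothetical helper: rewrite CRLF and bare CR terminators as LF."""
    return raw.replace(b"\r\n", b"\n").replace(b"\r", b"\n")


counts = count_line_endings(RAW)
print(counts)

# Parse the LF-normalized bytes; sep=r"\s+" stands in for delim_whitespace=True.
df = pd.read_csv(
    io.BytesIO(normalize_newlines(RAW)), skiprows=1, sep=r"\s+", names=NAMES
)
print(df)
```

For a real file, the bytes would first have to be read in binary mode (or downloaded) so that no newline translation happens before the check.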
