Dates are parsed with read_csv thousand separator #4678

hayd · 2013-08-26T19:37:21Z

When reading a csv with a date column, the date is sometimes parsed as a number:

In [1]: s = '06.02.2013;13:00;1.000,215;0,215;0,185;0,205;0,00'

In [2]: pd.read_csv(StringIO(s), sep=';', header=None, parse_dates={'Dates': [0, 1]}, index_col=0, decimal=',', thousands='.')
Out[2]:
                        2      3      4      5  6
Dates
6022013 13:00   1.000,215  0.215  0.185  0.205  0

Here 06.02.2013 is read as the number 6022013 (the . is stripped as a thousands separator) before the date is parsed, so the date parsing fails. I think dates are sometimes written this way in continental Europe (along with . as the thousands separator).
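To make the failure mode concrete, here is a minimal pure-Python sketch of it. It is a simplified model, not pandas' actual parser code: the helper names strip_thousands and naive_convert are hypothetical, and the only assumption is that separator substitution happens before any type inference.

```python
def strip_thousands(field, thousands=".", decimal=","):
    """Drop thousands separators, then normalise the decimal mark to '.'."""
    return field.replace(thousands, "").replace(decimal, ".")


def naive_convert(field, thousands=".", decimal=","):
    """Hypothetical converter: try int, then float, else keep the raw string."""
    cleaned = strip_thousands(field, thousands, decimal)
    try:
        return int(cleaned)
    except ValueError:
        pass
    try:
        return float(cleaned)
    except ValueError:
        return field


row = "06.02.2013;13:00;1.000,215;0,215".split(";")
converted = [naive_convert(f) for f in row]
# The date field loses its dots and is swallowed as an integer,
# while the time field survives only because ':' is not a separator.
print(converted)
```

In this model the date string never reaches a date parser in its original form, which matches the 6022013 seen in the output above.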

This was found in #4322 (but that issue was more about . being ignored), I guess another test case would be with -:

In [3]: s = '06-02-2013;13:00;1.000,215;0,215;0,185;0,205;0,00'

In [4]: pd.read_csv(StringIO(s), sep=';', header=None, parse_dates={'Dates': [0, 1]}, decimal=',', thousands='-')
Out[4]: 
           Dates          2      3      4      5  6
0  6022013 13:00  1.000,215  0.215  0.185  0.205  0

@jreback suggests:

but it should ignore dates columns entirely (for thousands parsing...)

cc #4598 @guyrt

guyrt · 2013-08-26T21:27:00Z

I'm not an expert on this IO code just yet, but it would seem that maybe the numeric parser is running first? In that case, we wouldn't even try the datetime converter, would we?

https://github.com/pydata/pandas/blob/master/pandas/parser.pyx#L1648

jreback · 2013-08-26T21:37:39Z

things are parsed (with thousands/decimal substitutions) then passed to the dtype converter (and na converter), so I think this would have to change based on whether parse_dates is True for a particular column; might be tricky (or not)
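A rough sketch of that idea, again in plain Python rather than the real pandas internals: columns named as date columns bypass the thousands/decimal substitution and go straight to a date parser. The functions convert_field and convert_row, the date_cols argument, and the day-first format string are all assumptions made for illustration.

```python
from datetime import datetime


def convert_field(field, thousands=".", decimal=","):
    """Numeric conversion with separator substitution; keeps the raw string on failure."""
    cleaned = field.replace(thousands, "").replace(decimal, ".")
    try:
        return float(cleaned)
    except ValueError:
        return field


def convert_row(fields, date_cols, date_format="%d.%m.%Y %H:%M"):
    """Parse the date columns untouched; apply numeric conversion to the rest."""
    # Date columns are joined verbatim, so '.' is never treated as a thousands mark.
    date_text = " ".join(fields[i] for i in date_cols)
    stamp = datetime.strptime(date_text, date_format)  # assumes day-first dates
    values = [convert_field(f) for i, f in enumerate(fields) if i not in date_cols]
    return stamp, values


row = "06.02.2013;13:00;1.000,215;0,215;0,185;0,205;0,00".split(";")
stamp, values = convert_row(row, date_cols=[0, 1])
print(stamp)
print(values)
```

The point of the sketch is only the ordering: the decision to skip substitution has to be made per column, before the numeric conversion runs.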

jreback · 2013-09-21T17:30:05Z

@guyrt having a look at this?

guyrt · 2013-09-23T02:32:36Z

@jreback I am. Got sidetracked on a few other things, but I'll carve out some time to look at it over the next few days. What I know so far is that the second example works on the Python parser. It's not clear yet what is causing it to fail on the C parser, but I'll keep digging.

The first example is a problem with the date parser, which doesn't parse the day part correctly.

guyrt · 2013-09-23T03:44:41Z

Fix for C parser submitted, but I found an error in Python parser as well. That one will come in next commit.

#4945

Fixes issue where thousands separator could conflict with date parsing. This is only fixed in the C parser. Closes issue pandas-dev#4678

guyrt mentioned this issue Sep 23, 2013

BUG: Conflict between thousands sep and date parser. #4945

Merged

guyrt added a commit to guyrt/pandas that referenced this issue Sep 23, 2013

BUG: Conflict between thousands sep and date parser.

c6bf2eb

Fixes issue where thousands separator could conflict with date parsing. This is only fixed in the C parser. Closes issue pandas-dev#4678

guyrt added a commit to guyrt/pandas that referenced this issue Sep 24, 2013

BUG: fix issue pandas-dev#4678 for Python parser

fedb26d

jreback closed this as completed in #4945 Sep 26, 2013
