PERF: parse non-ambiguous date formats in c

Date parsing is slow on dates of the form "%m.%d.%Y" compared to "%Y-%m-%d".  I am using 0.17.1 so maybe it's improved in 0.18 but I didn't see anything about it.  I provide a benchmark below to compare the two datetime formats.

Some observations:
- Letting pandas figure it out is unacceptably slow.  In small files it won't make much of a difference but it became a massive bottleneck in an application.
- In my particular use case, I know each timestamp in my file will have the same date so it is very easy to do a string replace. And this turns out to be by far the fastest option here.  Even with the actual string replacement time.
- Regex is slow.  I'm sure that would have been obvious but I tried it out for comparison.
- `infer_datetime_format` is the next best option but it appears to be a massive slowdown.  Compare the relative times on the large file to the relative times of the pure read on the small file and date parsing with `infer_datetime_format`.
- A custom date parser doesn't help.  Not sure why it doesn't help.  Maybe it's not compiled?
#### Code Sample

`test.csv` (1 million lines)

```
2016-03-12, 1
2016-03-12, 1
2016-03-12, 1
2016-03-12, 1
...
```

`test2.csv` (5000 lines)

```
03.12.2016, 1
03.12.2016, 1
03.12.2016, 1
03.12.2016, 1
...
```

```
import pandas as pd
from cStringIO import StringIO
import datetime
import re
print "Large file: pure read"
%timeit pd.read_csv('test.csv')

print "Large file: parse dates"
%timeit pd.read_csv('test.csv', parse_dates=[0])

print "Small file: pure read"
%timeit pd.read_csv('test2.csv')

print "Small file: parse dates"
%timeit pd.read_csv('test2.csv', parse_dates=[0])

print "Small file: parse dates with infer datetime format"
%timeit pd.read_csv('test2.csv', parse_dates=[0], infer_datetime_format=True)

print "Small file: string replace then parse dates "
def test_replace():
    s = file('test2.csv', 'r').read()
    s = s.replace('03.12.2016', '2016-03-12')
    pd.read_csv(StringIO(s), parse_dates=[0])
%timeit test_replace()

print "Small file: parse dates with custom date parser"
date_parser=lambda t: datetime.datetime.strptime(t, "%m.%d.%Y")
%timeit pd.read_csv('test2.csv', parse_dates=[0], date_parser=date_parser)

print "Small file: regex sub then parse dates"
def test_re_sub():
    s = file('test2.csv', 'r').read()
    p = re.compile('(?P<month> \d{2}).(?P<day> \d{2}).(?P<year> \d{4})')
    s = p.sub('\g<year>-\g<month>-\g<day>', s)
    pd.read_csv(StringIO(s), parse_dates=[0])
%timeit test_re_sub()
```
#### Output (2012 MacBook Air)

```
Large file: pure read
1 loop, best of 3: 253 ms per loop
Large file: parse dates
1 loop, best of 3: 790 ms per loop

Small file: pure read
100 loops, best of 3: 2.9 ms per loop
Small file: parse dates
1 loop, best of 3: 1.22 s per loop
Small file: parse dates with infer datetime format
10 loops, best of 3: 32.7 ms per loop
Small file: string replace then parse dates 
100 loops, best of 3: 4.97 ms per loop
Small file: parse dates with custom date parser
10 loops, best of 3: 90.6 ms per loop
Small file: regex sub then parse dates
1 loop, best of 3: 1.08 s per loop
```
#### output of `pd.show_versions()`

```
>> pd.show_versions()

INSTALLED VERSIONS
------------------
commit: None
python: 2.7.11.final.0
python-bits: 64
OS: Darwin
OS-release: 14.5.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8

pandas: 0.17.1
nose: 1.3.7
pip: 8.1.0
setuptools: 20.2.2
Cython: 0.23.4
numpy: 1.10.4
scipy: 0.17.0
statsmodels: None
IPython: 4.1.2
sphinx: 1.3.6
patsy: None
dateutil: 2.5.1
pytz: 2016.1
blosc: None
bottleneck: None
tables: None
numexpr: 2.4.6
matplotlib: 1.5.1
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: None
httplib2: None
apiclient: None
sqlalchemy: 1.0.12
pymysql: None
psycopg2: 2.6.1 (dt dec pq3 ext lo64)
Jinja2: None
```


Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Uh oh!

PERF: parse non-ambiguous date formats in c #12667

Code Sample

Output (2012 MacBook Air)

output of `pd.show_versions()`

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Uh oh!

PERF: parse non-ambiguous date formats in c #12667

Description

Code Sample

Output (2012 MacBook Air)

output of pd.show_versions()

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions

output of `pd.show_versions()`