Commit c2ea0d4

BUG: each date parsing funcs results differently
1 parent e660c05 commit c2ea0d4

10 files changed, +713 -310 lines changed

doc/source/timeseries.rst

+53-10
@@ -71,37 +71,69 @@ Resample:
    ts.resample('D', how='mean')
 
 
+.. _timeseries.overview:
+
+Overview
+--------
+
+The following table shows the time-related classes pandas can handle and
+how to create them.
+
+================= ============================== ==================================================
+Class             Remarks                        How to create
+================= ============================== ==================================================
+``Timestamp``     Represents a single time stamp ``to_datetime``, ``Timestamp``
+``DatetimeIndex`` Index of ``Timestamps``        ``to_datetime``, ``date_range``, ``DatetimeIndex``
+``Period``        Represents a single time span  ``Period``
+``PeriodIndex``   Index of ``Period``            ``period_range``, ``PeriodIndex``
+================= ============================== ==================================================
+
 .. _timeseries.representation:
 
 Time Stamps vs. Time Spans
 --------------------------
 
 Time-stamped data is the most basic type of timeseries data that associates
 values with points in time. For pandas objects it means using the points in
-time to create the index
+time.
 
 .. ipython:: python
 
-   dates = [datetime(2012, 5, 1), datetime(2012, 5, 2), datetime(2012, 5, 3)]
-   ts = Series(np.random.randn(3), dates)
-
-   type(ts.index)
-
-   ts
+   Timestamp(datetime(2012, 5, 1))
+   Timestamp('2012-05-01')
 
 However, in many cases it is more natural to associate things like change
-variables with a time span instead.
+variables with a time span instead. The span represented by ``Period`` can be
+specified explicitly or inferred from the datetime string format.
 
 For example:
 
 .. ipython:: python
 
-   periods = PeriodIndex([Period('2012-01'), Period('2012-02'),
-                          Period('2012-03')])
+   Period('2011-01')
+
+   Period('2012-05', freq='D')
+
+``Timestamp`` and ``Period`` can serve as an index. Lists of ``Timestamp`` and
+``Period`` are automatically coerced to ``DatetimeIndex`` and ``PeriodIndex``
+respectively.
+
+.. ipython:: python
+
+   dates = [Timestamp('2012-05-01'), Timestamp('2012-05-02'), Timestamp('2012-05-03')]
+   ts = Series(np.random.randn(3), dates)
+
+   type(ts.index)
+   ts.index
+
+   ts
+
+   periods = [Period('2012-01'), Period('2012-02'), Period('2012-03')]
 
    ts = Series(np.random.randn(3), periods)
 
    type(ts.index)
+   ts.index
 
    ts
 
@@ -150,6 +182,17 @@ you can pass the ``dayfirst`` flag:
 considerably and on versions later than 0.13.0 explicitly specifying
 a format string of '%Y%m%d' takes a faster path still.
 
+If you pass a single string to ``to_datetime``, it returns a single ``Timestamp``.
+``Timestamp`` can also accept string input.
+Note that ``Timestamp`` doesn't accept string parsing options like ``dayfirst``
+or ``format``; use ``to_datetime`` if these are required.
+
+.. ipython:: python
+
+   to_datetime('2010/11/12')
+
+   Timestamp('2010/11/12')
+
 
 Invalid Data
 ~~~~~~~~~~~~
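The behavior documented in the ``timeseries.rst`` hunks above can be checked outside the ipython directives with a short script. This is a minimal sketch that assumes a recent pandas release imported as ``pd`` (the docs use bare ``Timestamp``/``Period`` names), so reprs may differ from the 0.17-era output.

```python
import numpy as np
import pandas as pd

# A Timestamp is a single point in time; a Period is a span of time.
stamp = pd.Timestamp("2012-05-01")
month_span = pd.Period("2011-01")          # frequency inferred from the string (monthly)
day_span = pd.Period("2012-05", freq="D")  # explicit frequency narrows the span to one day

# Lists of Timestamp / Period objects are coerced to the matching index type.
dates = [pd.Timestamp("2012-05-01"), pd.Timestamp("2012-05-02"), pd.Timestamp("2012-05-03")]
ts_points = pd.Series(np.random.randn(3), index=dates)

periods = [pd.Period("2012-01"), pd.Period("2012-02"), pd.Period("2012-03")]
ts_spans = pd.Series(np.random.randn(3), index=periods)

# A single string passed to to_datetime returns a scalar Timestamp.
parsed = pd.to_datetime("2010/11/12")

# Parsing options such as dayfirst belong to to_datetime, not the Timestamp constructor.
day_first = pd.to_datetime("10/11/12", dayfirst=True)

print(type(ts_points.index).__name__, type(ts_spans.index).__name__)
```

The print line shows which index class each Series ended up with; the values themselves are random.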

doc/source/whatsnew/v0.17.0.txt

+39
@@ -33,6 +33,45 @@ New features
 Other enhancements
 ^^^^^^^^^^^^^^^^^^
 
+- ``DatetimeIndex`` can be instantiated using strings containing ``NaT`` (:issue:`7599`)
+- The string parsing of ``to_datetime``, ``Timestamp`` and ``DatetimeIndex`` has been made consistent (:issue:`7599`)
+
+Prior to v0.17.0, ``Timestamp`` and ``to_datetime`` could parse a year-only datetime string incorrectly, using today's date, whereas ``DatetimeIndex`` used the beginning of the year.
+``Timestamp`` and ``to_datetime`` could also raise ``ValueError`` on some types of datetime strings that ``DatetimeIndex`` can parse, such as quarterly strings.
+
+Previous Behavior
+
+.. code-block:: python
+
+   In [1]: Timestamp('2012Q2')
+   Traceback
+      ...
+   ValueError: Unable to parse 2012Q2
+
+   # Results in today's date.
+   In [2]: Timestamp('2014')
+   Out[2]: 2014-08-12 00:00:00
+
+v0.17.0 can parse them as below. The same applies to ``DatetimeIndex``.
+
+New Behavior
+
+.. ipython:: python
+
+   Timestamp('2012Q2')
+   Timestamp('2014')
+   DatetimeIndex(['2012Q2', '2014'])
+
+.. note:: If you want to perform calculations based on today's date, use ``Timestamp.now()`` and ``pandas.tseries.offsets``.
+
+.. ipython:: python
+
+   import pandas.tseries.offsets as offsets
+   Timestamp.now()
+   Timestamp.now() + offsets.DateOffset(years=1)
+
+- ``to_datetime`` can now accept the ``yearfirst`` keyword (:issue:`7599`)
+
 - ``.as_blocks`` will now take a ``copy`` optional argument to return a copy of the data, default is to copy (no change in behavior from prior versions), (:issue:`9607`)
 
 - ``regex`` argument to ``DataFrame.filter`` now handles numeric column names instead of raising ``ValueError`` (:issue:`10384`).
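The consistency change described in this whatsnew entry can be sketched with a few concrete values. This assumes a recent pandas release imported as ``pd``; a fixed reference date stands in for ``Timestamp.now()`` so the result is reproducible, and ``pd.DateOffset`` is the same class as ``offsets.DateOffset`` used in the note.

```python
import pandas as pd

# Quarterly strings resolve to the first day of the quarter.
q2 = pd.Timestamp("2012Q2")

# Year-only strings resolve to January 1 of that year, not to today's month and day.
year_only = pd.Timestamp("2014")
year_only_via_to_datetime = pd.to_datetime("2014")

# Relative date arithmetic: a fixed reference date is used here instead of
# pd.Timestamp.now() so that the result is deterministic.
reference = pd.Timestamp("2014-08-12")
one_year_later = reference + pd.DateOffset(years=1)

print(q2, year_only, one_year_later)
```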

pandas/io/parsers.py

+2-1
@@ -2047,8 +2047,9 @@ def _make_date_converter(date_parser=None, dayfirst=False,
     def converter(*date_cols):
         if date_parser is None:
             strs = _concat_date_cols(date_cols)
+
             try:
-                return tools.to_datetime(
+                return tools._to_datetime(
                     com._ensure_object(strs),
                     utc=None,
                     box=False,

pandas/tseries/frequencies.py

+6-18
@@ -314,14 +314,12 @@ def _get_freq_str(base, mult=1):
 }
 
 need_suffix = ['QS', 'BQ', 'BQS', 'AS', 'BA', 'BAS']
-_months = ['JAN', 'FEB', 'MAR', 'APR', 'MAY', 'JUN', 'JUL', 'AUG', 'SEP',
-           'OCT', 'NOV', 'DEC']
 for __prefix in need_suffix:
-    for _m in _months:
+    for _m in tslib._MONTHS:
         _offset_to_period_map['%s-%s' % (__prefix, _m)] = \
             _offset_to_period_map[__prefix]
 for __prefix in ['A', 'Q']:
-    for _m in _months:
+    for _m in tslib._MONTHS:
         _alias = '%s-%s' % (__prefix, _m)
         _offset_to_period_map[_alias] = _alias
 
@@ -1188,12 +1186,7 @@ def is_superperiod(source, target):
         return target in ['N']
 
 
-def _get_rule_month(source, default='DEC'):
-    source = source.upper()
-    if '-' not in source:
-        return default
-    else:
-        return source.split('-')[1]
+_get_rule_month = tslib._get_rule_month
 
 
 def _is_annual(rule):
@@ -1224,15 +1217,10 @@ def _is_weekly(rule):
 
 DAYS = ['MON', 'TUE', 'WED', 'THU', 'FRI', 'SAT', 'SUN']
 
-MONTHS = ['JAN', 'FEB', 'MAR', 'APR', 'MAY', 'JUN', 'JUL',
-          'AUG', 'SEP', 'OCT', 'NOV', 'DEC']
-
-_month_numbers = dict((k, i) for i, k in enumerate(MONTHS))
-
-
+MONTHS = tslib._MONTHS
+_month_numbers = tslib._MONTH_NUMBERS
+_month_aliases = tslib._MONTH_ALIASES
 _weekday_rule_aliases = dict((k, v) for k, v in enumerate(DAYS))
-_month_aliases = dict((k + 1, v) for k, v in enumerate(MONTHS))
-
 
 def _is_multiple(us, mult):
     return us % mult == 0
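The frequencies.py hunks replace local month tables and ``_get_rule_month`` with shared names from ``tslib`` (``_MONTHS``, ``_MONTH_NUMBERS``, ``_MONTH_ALIASES``, ``_get_rule_month``). As a sketch of what those shared names are expected to hold, here is a pure-Python reconstruction built only from the code this commit removes. It assumes, without checking the Cython source, that the ``tslib`` versions behave the same. The 1-based alias mapping also matches the literal ``_int_to_month`` dict removed from offsets.py.

```python
# Pure-Python reconstruction of the month helpers removed from frequencies.py.
# Assumption: the tslib names that replace them behave identically.

MONTHS = ['JAN', 'FEB', 'MAR', 'APR', 'MAY', 'JUN', 'JUL',
          'AUG', 'SEP', 'OCT', 'NOV', 'DEC']

# 0-based lookup, as in the removed _month_numbers: 'JAN' -> 0, ..., 'DEC' -> 11
MONTH_NUMBERS = dict((k, i) for i, k in enumerate(MONTHS))

# 1-based lookup, as in the removed _month_aliases: 1 -> 'JAN', ..., 12 -> 'DEC'
MONTH_ALIASES = dict((k + 1, v) for k, v in enumerate(MONTHS))

# Inverse of the 1-based lookup, built the way offsets.py builds _month_to_int
MONTH_TO_INT = dict((v, k) for k, v in MONTH_ALIASES.items())


def get_rule_month(source, default='DEC'):
    """Return the month suffix of a rule string such as 'A-JUN'.

    Rules without a '-suffix' fall back to ``default``; matching is
    case-insensitive because the input is upper-cased first.
    """
    source = source.upper()
    if '-' not in source:
        return default
    return source.split('-')[1]


# Anchored aliases 'A-JAN' ... 'Q-DEC', generated like the loop over
# ['A', 'Q'] in frequencies.py
ANCHORED = ['%s-%s' % (prefix, m) for prefix in ['A', 'Q'] for m in MONTHS]
```

Sharing one definition removes the risk of the three copies (frequencies.py, offsets.py, tslib) drifting apart.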

pandas/tseries/index.py

+13-21
@@ -239,8 +239,9 @@ def __new__(cls, data=None,
 
             # try a few ways to make it datetime64
             if lib.is_string_array(data):
-                data = _str_to_dt_array(data, freq, dayfirst=dayfirst,
-                                        yearfirst=yearfirst)
+                data = tslib.parse_str_array_to_datetime(data, freq=freq,
+                                                         dayfirst=dayfirst,
+                                                         yearfirst=yearfirst)
             else:
                 data = tools.to_datetime(data, errors='raise')
                 data.offset = freq
@@ -254,8 +255,9 @@ def __new__(cls, data=None,
                     return data
 
         if issubclass(data.dtype.type, compat.string_types):
-            data = _str_to_dt_array(data, freq, dayfirst=dayfirst,
-                                    yearfirst=yearfirst)
+            data = tslib.parse_str_array_to_datetime(data, freq=freq,
+                                                     dayfirst=dayfirst,
+                                                     yearfirst=yearfirst)
 
         if issubclass(data.dtype.type, np.datetime64):
             if isinstance(data, ABCSeries):
@@ -288,8 +290,9 @@ def __new__(cls, data=None,
             values = data
 
         if lib.is_string_array(values):
-            subarr = _str_to_dt_array(values, freq, dayfirst=dayfirst,
-                                      yearfirst=yearfirst)
+            subarr = tslib.parse_str_array_to_datetime(values, freq=freq, dayfirst=dayfirst,
+                                                       yearfirst=yearfirst)
+
         else:
             try:
                 subarr = tools.to_datetime(data, box=False)
@@ -298,11 +301,11 @@ def __new__(cls, data=None,
                 if isinstance(subarr, ABCSeries):
                     subarr = subarr.values
                 if subarr.dtype == np.object_:
-                    subarr = tools.to_datetime(subarr, box=False)
+                    subarr = tools._to_datetime(subarr, box=False)
 
             except ValueError:
                 # tz aware
-                subarr = tools.to_datetime(data, box=False, utc=True)
+                subarr = tools._to_datetime(data, box=False, utc=True)
 
             if not np.issubdtype(subarr.dtype, np.datetime64):
                 raise ValueError('Unable to convert %s to datetime dtype'
@@ -332,7 +335,7 @@ def __new__(cls, data=None,
             if inferred != freq.freqstr:
                 on_freq = cls._generate(subarr[0], None, len(subarr), None, freq, tz=tz)
                 if not np.array_equal(subarr.asi8, on_freq.asi8):
-                    raise ValueError('Inferred frequency {0} from passed dates does not'
+                    raise ValueError('Inferred frequency {0} from passed dates does not '
                                      'conform to passed frequency {1}'.format(inferred, freq.freqstr))
 
         if freq_infer:
@@ -534,7 +537,7 @@ def _cached_range(cls, start=None, end=None, periods=None, offset=None,
             xdr = generate_range(offset=offset, start=_CACHE_START,
                                  end=_CACHE_END)
 
-            arr = tools.to_datetime(list(xdr), box=False)
+            arr = tools._to_datetime(list(xdr), box=False)
 
             cachedRange = DatetimeIndex._simple_new(arr)
             cachedRange.offset = offset
@@ -1926,17 +1929,6 @@ def _to_m8(key, tz=None):
     return np.int64(tslib.pydt_to_i8(key)).view(_NS_DTYPE)
 
 
-def _str_to_dt_array(arr, offset=None, dayfirst=None, yearfirst=None):
-    def parser(x):
-        result = parse_time_string(x, offset, dayfirst=dayfirst,
-                                   yearfirst=yearfirst)
-        return result[0]
-
-    arr = np.asarray(arr, dtype=object)
-    data = _algos.arrmap_object(arr, parser)
-    return tools.to_datetime(data)
-
-
 _CACHE_START = Timestamp(datetime(1950, 1, 1))
 _CACHE_END = Timestamp(datetime(2030, 1, 1))
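The index.py hunks drop the element-by-element ``_str_to_dt_array`` helper in favor of ``tslib.parse_str_array_to_datetime``. The sketch below only illustrates the element-wise shape of the removed helper using public pandas calls; ``parse_strings_elementwise`` is a hypothetical name, not the Cython routine the commit introduces, and a recent pandas release is assumed. It also exercises the ``NaT`` case covered by the new constructor test.

```python
import numpy as np
import pandas as pd


def parse_strings_elementwise(strings):
    """Hypothetical helper: parse each string separately, then build an index.

    Mirrors the structure of the removed _str_to_dt_array (one parse call per
    element) but uses the public pd.Timestamp constructor instead of
    parse_time_string.
    """
    arr = np.asarray(strings, dtype=object)
    # pd.Timestamp("NaT") returns pd.NaT, so missing values pass through.
    return pd.DatetimeIndex([pd.Timestamp(s) for s in arr])


strings = np.array(['2000-01-01', '2000-01-02', 'NaT'])
by_hand = parse_strings_elementwise(strings)
by_constructor = pd.DatetimeIndex(strings)  # strings containing NaT are accepted directly

print(by_hand)
print(by_constructor)
```

Moving the per-element loop into a single compiled routine is what lets ``DatetimeIndex``, ``Timestamp`` and ``to_datetime`` share one parsing path.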

pandas/tseries/offsets.py

+1-16
@@ -1508,22 +1508,7 @@ def onOffset(self, dt):
         modMonth = (dt.month - self.startingMonth) % 3
         return BMonthEnd().onOffset(dt) and modMonth == 0
 
-
-_int_to_month = {
-    1: 'JAN',
-    2: 'FEB',
-    3: 'MAR',
-    4: 'APR',
-    5: 'MAY',
-    6: 'JUN',
-    7: 'JUL',
-    8: 'AUG',
-    9: 'SEP',
-    10: 'OCT',
-    11: 'NOV',
-    12: 'DEC'
-}
-
+_int_to_month = tslib._MONTH_ALIASES
 _month_to_int = dict((v, k) for k, v in _int_to_month.items())
 

pandas/tseries/tests/test_timeseries.py

+28
@@ -1432,6 +1432,25 @@ def test_dti_constructor_preserve_dti_freq(self):
         rng2 = DatetimeIndex(rng)
         self.assertEqual(rng.freq, rng2.freq)
 
+    def test_dti_constructor_years_only(self):
+        # GH 6961
+        for tz in [None, 'UTC', 'Asia/Tokyo', 'dateutil/US/Pacific']:
+            rng1 = date_range('2014', '2015', freq='M', tz=tz)
+            expected1 = date_range('2014-01-31', '2014-12-31', freq='M', tz=tz)
+
+            rng2 = date_range('2014', '2015', freq='MS', tz=tz)
+            expected2 = date_range('2014-01-01', '2015-01-01', freq='MS', tz=tz)
+
+            rng3 = date_range('2014', '2020', freq='A', tz=tz)
+            expected3 = date_range('2014-12-31', '2019-12-31', freq='A', tz=tz)
+
+            rng4 = date_range('2014', '2020', freq='AS', tz=tz)
+            expected4 = date_range('2014-01-01', '2020-01-01', freq='AS', tz=tz)
+
+            for rng, expected in [(rng1, expected1), (rng2, expected2),
+                                  (rng3, expected3), (rng4, expected4)]:
+                tm.assert_index_equal(rng, expected)
+
     def test_normalize(self):
         rng = date_range('1/1/2000 9:30', periods=10, freq='D')
 
@@ -2146,6 +2165,15 @@ def test_constructor_coverage(self):
         from_ints = DatetimeIndex(expected.asi8)
         self.assertTrue(from_ints.equals(expected))
 
+        # string with NaT
+        strings = np.array(['2000-01-01', '2000-01-02', 'NaT'])
+        result = DatetimeIndex(strings)
+        expected = DatetimeIndex(strings.astype('O'))
+        self.assertTrue(result.equals(expected))
+
+        from_ints = DatetimeIndex(expected.asi8)
+        self.assertTrue(from_ints.equals(expected))
+
         # non-conforming
         self.assertRaises(ValueError, DatetimeIndex,
                           ['2000-01-01', '2000-01-02', '2000-01-04'],
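The new ``test_dti_constructor_years_only`` and the existing non-conforming check can be mirrored with public calls, as a minimal sketch assuming a recent pandas release. Only ``freq='MS'`` is used because the ``'M'``, ``'A'`` and ``'AS'`` aliases in the test have been deprecated in favor of new spellings in later pandas versions.

```python
import pandas as pd

# Year-only bounds are read as January 1 of that year, so a month-start range
# from '2014' to '2015' includes both endpoints.
rng = pd.date_range('2014', '2015', freq='MS')
expected = pd.date_range('2014-01-01', '2015-01-01', freq='MS')

# Same check with a time zone, as the test loops over several.
rng_utc = pd.date_range('2014', '2015', freq='MS', tz='UTC')


def rejects_nonconforming():
    """Return True if DatetimeIndex refuses dates that do not fit freq='D'."""
    try:
        pd.DatetimeIndex(['2000-01-01', '2000-01-02', '2000-01-04'], freq='D')
    except ValueError:
        return True
    return False


print(len(rng), rng[0], rng[-1])
```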
