
Commit 50930a9

ischurov authored and jreback committed
API/BUG: Fix inconsistency in Partial String Index with 'second' resolution
Closes pandas-dev#14826

Fix inconsistency in Partial String Index with 'second' resolution. See pandas-dev#14826.

Now if the timestamp and the index both have resolution `second`, timestamp is considered as an exact match try and not a slice. Therefore, for `Series`, scalar will be returned, for `DataFrame` `KeyError` raised.

Author: Ilya V. Schurov <[email protected]>

Closes pandas-dev#14856 from ischurov/datetimeindex-slices and squashes the following commits:

2881a53 [Ilya V. Schurov] Merge branch 'datetimeindex-slices' of https://github.com/ischurov/pandas into datetimeindex-slices
ac8758e [Ilya V. Schurov] resolved merge conflict in whatsnew/v0.20.0.txt
0e87874 [Ilya V. Schurov] resolved merge conflict in whatsnew/v0.20.0.txt
0814e5b [Ilya V. Schurov] - Addressing code review: added reference to new docs section in whatsnew.
d215905 [Ilya V. Schurov] - Addressing code review: documentation clarification.
c287845 [Ilya V. Schurov] conflict PR pandas-dev#14856 resolved
40eddc3 [Ilya V. Schurov] - Documentation fixes
e17d210 [Ilya V. Schurov] - Whatsnew section added - Documentation section added
67e6bab [Ilya V. Schurov] Addressing code review: more comments added
c901588 [Ilya V. Schurov] Addressing code review: testing different combinations with the loop instead of copy-pasting of the code
9b55117 [Ilya V. Schurov] Addressing code review
b30039d [Ilya V. Schurov] Make flake8 happy.
cc86bdd [Ilya V. Schurov] Fix inconsistency in Partial String Index with 'second' resolution
ea51437 [Ilya V. Schurov] Made this code clearer.
1 parent 02906ce commit 50930a9
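The behavior described in the commit message can be illustrated with a short sketch. It assumes a pandas version that includes this change (0.20.0 or later) and reuses the second-resolution index from the whatsnew entry in this commit; the variable names `exact`, `sliced`, `frame_raises` and `row` are illustrative only.

```python
import pandas as pd

# Index with 'second' resolution, as in the whatsnew example of this commit.
index = pd.DatetimeIndex(['2011-12-31 23:59:59',
                          '2012-01-01 00:00:00',
                          '2012-01-01 00:00:01'])
df = pd.DataFrame({'a': [1, 2, 3]}, index=index)

# String as precise as the index: exact match, so a scalar comes back.
exact = df['a']['2011-12-31 23:59:59']

# String less precise than the index: treated as a slice, so a Series
# comes back.
sliced = df['a'].loc['2011-12-31 23:59']

# On a DataFrame, [] with an exact-match string is a column lookup and
# fails because no column has that name.
try:
    df['2011-12-31 23:59:59']
    frame_raises = False
except KeyError:
    frame_raises = True

# .loc selects the matching row instead.
row = df.loc['2011-12-31 23:59:59']
```

Using `.loc` for row selection avoids the ambiguity of `[]` on a `DataFrame`, which is why the documentation added in this commit recommends it for selecting a single row.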

4 files changed: +173 -28 lines changed

doc/source/timeseries.rst

+70-17
@@ -457,22 +457,6 @@ We are stopping on the included end-point as it is part of the index
457457
458458
dft['2013-1-15':'2013-1-15 12:30:00']
459459
460-
.. warning::
461-
462-
The following selection will raise a ``KeyError``; otherwise this selection methodology
463-
would be inconsistent with other selection methods in pandas (as this is not a *slice*, nor does it
464-
resolve to one)
465-
466-
.. code-block:: python
467-
468-
dft['2013-1-15 12:30:00']
469-
470-
To select a single row, use ``.loc``
471-
472-
.. ipython:: python
473-
474-
dft.loc['2013-1-15 12:30:00']
475-
476460
.. versionadded:: 0.18.0
477461

478462
DatetimeIndex Partial String Indexing also works on DataFrames with a ``MultiIndex``. For example:
@@ -491,10 +475,79 @@ DatetimeIndex Partial String Indexing also works on DataFrames with a ``MultiInd
491475
dft2 = dft2.swaplevel(0, 1).sort_index()
492476
dft2.loc[idx[:, '2013-01-05'], :]
493477
478+
.. _timeseries.slice_vs_exact_match:
479+
480+
Slice vs. exact match
481+
^^^^^^^^^^^^^^^^^^^^^
482+
483+
The same string used as an indexing parameter can be treated either as a slice or as an exact match depending on the resolution of an index. If the string is less accurate than the index, it will be treated as a slice, otherwise as an exact match.
484+
485+
For example, consider a ``Series`` object whose index has minute resolution.
486+
487+
.. ipython:: python
488+
489+
series_minute = pd.Series([1, 2, 3],
490+
pd.DatetimeIndex(['2011-12-31 23:59:00',
491+
'2012-01-01 00:00:00',
492+
'2012-01-01 00:02:00']))
493+
series_minute.index.resolution
494+
495+
A timestamp string less accurate than a minute gives a ``Series`` object.
496+
497+
.. ipython:: python
498+
499+
series_minute['2011-12-31 23']
500+
501+
A timestamp string with minute resolution (or more accurate) gives a scalar instead, i.e. it is not cast to a slice.
502+
503+
.. ipython:: python
504+
505+
series_minute['2011-12-31 23:59']
506+
series_minute['2011-12-31 23:59:00']
507+
508+
If the index resolution is second, the minute-accurate timestamp gives a ``Series``.
509+
510+
.. ipython:: python
511+
512+
series_second = pd.Series([1, 2, 3],
513+
pd.DatetimeIndex(['2011-12-31 23:59:59',
514+
'2012-01-01 00:00:00',
515+
'2012-01-01 00:00:01']))
516+
series_second.index.resolution
517+
series_second['2011-12-31 23:59']
518+
519+
If the timestamp string is treated as a slice, it can be used to index ``DataFrame`` with ``[]`` as well.
520+
521+
.. ipython:: python
522+
523+
dft_minute = pd.DataFrame({'a': [1, 2, 3], 'b': [4, 5, 6]},
524+
index=series_minute.index)
525+
dft_minute['2011-12-31 23']
526+
527+
However, if the string is treated as an exact match, the selection in ``DataFrame``'s ``[]`` will be column-wise and not row-wise, see :ref:`Indexing Basics <indexing.basics>`. For example, ``dft_minute['2011-12-31 23:59']`` will raise ``KeyError``, as ``'2011-12-31 23:59'`` has the same resolution as the index and there is no column with such a name.
528+
529+
To select a single row, use ``.loc``.
530+
531+
.. ipython:: python
532+
533+
dft_minute.loc['2011-12-31 23:59']
534+
535+
Note also that ``DatetimeIndex`` resolution cannot be less precise than day.
536+
537+
.. ipython:: python
538+
539+
series_monthly = pd.Series([1, 2, 3],
540+
pd.DatetimeIndex(['2011-12',
541+
'2012-01',
542+
'2012-02']))
543+
series_monthly.index.resolution
544+
series_monthly['2011-12'] # returns Series
545+
546+
494547
Datetime Indexing
495548
~~~~~~~~~~~~~~~~~
496549

497-
Indexing a ``DateTimeIndex`` with a partial string depends on the "accuracy" of the period, in other words how specific the interval is in relation to the frequency of the index. In contrast, indexing with datetime objects is exact, because the objects have exact meaning. These also follow the semantics of *including both endpoints*.
550+
As discussed in the previous section, indexing a ``DateTimeIndex`` with a partial string depends on the "accuracy" of the period, in other words how specific the interval is in relation to the resolution of the index. In contrast, indexing with datetime objects is exact, because the objects have exact meaning. These also follow the semantics of *including both endpoints*.
498551

499552
These ``datetime`` objects are specific ``hours, minutes,`` and ``seconds`` even though they were not explicitly specified (they are ``0``).
500553

doc/source/whatsnew/v0.20.0.txt

+30-2
@@ -193,14 +193,42 @@ in prior versions of pandas) (:issue:`11915`).
193193

194194
.. _whatsnew_0200.api:
195195

196+
Other API Changes
197+
^^^^^^^^^^^^^^^^^
198+
196199
- ``CParserError`` has been renamed to ``ParserError`` in ``pd.read_csv`` and will be removed in the future (:issue:`12665`)
197200
- ``SparseArray.cumsum()`` and ``SparseSeries.cumsum()`` will now always return ``SparseArray`` and ``SparseSeries`` respectively (:issue:`12855`)
201+
- :ref:`DatetimeIndex Partial String Indexing <timeseries.partialindexing>` now works as exact match provided that string resolution coincides with index resolution, including a case when both are seconds (:issue:`14826`). See :ref:`Slice vs. Exact Match <timeseries.slice_vs_exact_match>` for details.
198202

203+
.. ipython:: python
199204

205+
df = DataFrame({'a': [1, 2, 3]}, DatetimeIndex(['2011-12-31 23:59:59',
206+
'2012-01-01 00:00:00',
207+
'2012-01-01 00:00:01']))
208+
Previous Behavior:
200209

210+
.. code-block:: ipython
201211

202-
Other API Changes
203-
^^^^^^^^^^^^^^^^^
212+
In [4]: df['2011-12-31 23:59:59']
213+
Out[4]:
214+
a
215+
2011-12-31 23:59:59 1
216+
217+
In [5]: df['a']['2011-12-31 23:59:59']
218+
Out[5]:
219+
2011-12-31 23:59:59 1
220+
Name: a, dtype: int64
221+
222+
223+
New Behavior:
224+
225+
.. code-block:: ipython
226+
227+
In [4]: df['2011-12-31 23:59:59']
228+
KeyError: '2011-12-31 23:59:59'
229+
230+
In [5]: df['a']['2011-12-31 23:59:59']
231+
Out[5]: 1
204232

205233
.. _whatsnew_0200.deprecations:
206234

pandas/tseries/index.py

+4-6
@@ -1293,14 +1293,12 @@ def _parsed_string_to_bounds(self, reso, parsed):
12931293

12941294
def _partial_date_slice(self, reso, parsed, use_lhs=True, use_rhs=True):
12951295
is_monotonic = self.is_monotonic
1296-
if ((reso in ['day', 'hour', 'minute'] and
1297-
not (self._resolution < Resolution.get_reso(reso) or
1298-
not is_monotonic)) or
1299-
(reso == 'second' and
1300-
not (self._resolution <= Resolution.RESO_SEC or
1301-
not is_monotonic))):
1296+
if (is_monotonic and reso in ['day', 'hour', 'minute', 'second'] and
1297+
self._resolution >= Resolution.get_reso(reso)):
13021298
# These resolution/monotonicity validations came from GH3931,
13031299
# GH3452 and GH2369.
1300+
1301+
# See also GH14826
13041302
raise KeyError
13051303

13061304
if reso == 'microsecond':
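The simplified condition in the hunk above can be sketched in plain Python. This is a standalone illustration, not the pandas implementation: the `RESOLUTIONS` list, the `RANK` mapping and the `is_exact_match` helper are hypothetical names, and the ordering (finest resolution has the lowest rank) is an assumption chosen so that the `>=` comparison reads the same way as in the diff.

```python
# Hypothetical ordering from finest to coarsest resolution. The real
# pandas Resolution codes are internal; this list only mirrors the idea.
RESOLUTIONS = ['microsecond', 'millisecond', 'second',
               'minute', 'hour', 'day']
RANK = {name: i for i, name in enumerate(RESOLUTIONS)}


def is_exact_match(string_reso, index_reso, is_monotonic=True):
    """Return True when the string is looked up as a single key.

    Mirrors the new condition: the index is monotonic, the string has
    day/hour/minute/second resolution, and the index is no finer than
    the string. In that case the slicing path raises KeyError and the
    caller falls back to an exact lookup.
    """
    return (is_monotonic
            and string_reso in ('day', 'hour', 'minute', 'second')
            and RANK[index_reso] >= RANK[string_reso])


# Same resolution on both sides, including 'second': exact match.
same_second = is_exact_match('second', 'second')
# String coarser than the index: treated as a slice.
minute_on_second = is_exact_match('minute', 'second')
# Month-level strings are never covered by this check.
month_on_day = is_exact_match('month', 'day')
```

Under this reading, the old code handled `'second'` with a separate `<=` comparison, which is what made the second-resolution case behave differently from day, hour and minute.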

pandas/tseries/tests/test_timeseries.py

+69-3
@@ -266,16 +266,15 @@ def test_indexing(self):
266266
expected = ts['2013']
267267
assert_series_equal(expected, ts)
268268

269-
# GH 3925, indexing with a seconds resolution string / datetime object
269+
# GH14826, indexing with a seconds resolution string / datetime object
270270
df = DataFrame(randn(5, 5),
271271
columns=['open', 'high', 'low', 'close', 'volume'],
272272
index=date_range('2012-01-02 18:01:00',
273273
periods=5, tz='US/Central', freq='s'))
274274
expected = df.loc[[df.index[2]]]
275-
result = df['2012-01-02 18:01:02']
276-
assert_frame_equal(result, expected)
277275

278276
# this is a single date, so will raise
277+
self.assertRaises(KeyError, df.__getitem__, '2012-01-02 18:01:02', )
279278
self.assertRaises(KeyError, df.__getitem__, df.index[2], )
280279

281280
def test_recreate_from_data(self):
@@ -4953,6 +4952,73 @@ def test_partial_slice_second_precision(self):
49534952
self.assertRaisesRegexp(KeyError, '2005-1-1 00:00:00',
49544953
lambda: s['2005-1-1 00:00:00'])
49554954

4955+
def test_partial_slicing_dataframe(self):
4956+
# GH14856
4957+
# Test various combinations of string slicing resolution vs.
4958+
# index resolution
4959+
# - If string resolution is less precise than index resolution,
4960+
# string is considered a slice
4961+
# - If string resolution is equal to or more precise than index
4962+
# resolution, string is considered an exact match
4963+
formats = ['%Y', '%Y-%m', '%Y-%m-%d', '%Y-%m-%d %H',
4964+
'%Y-%m-%d %H:%M', '%Y-%m-%d %H:%M:%S']
4965+
resolutions = ['year', 'month', 'day', 'hour', 'minute', 'second']
4966+
for rnum, resolution in enumerate(resolutions[2:], 2):
4967+
# we check only 'day', 'hour', 'minute' and 'second'
4968+
unit = Timedelta("1 " + resolution)
4969+
middate = datetime(2012, 1, 1, 0, 0, 0)
4970+
index = DatetimeIndex([middate - unit,
4971+
middate, middate + unit])
4972+
values = [1, 2, 3]
4973+
df = DataFrame({'a': values}, index, dtype=np.int64)
4974+
self.assertEqual(df.index.resolution, resolution)
4975+
4976+
# Timestamp with the same resolution as index
4977+
# Should be exact match for Series (return scalar)
4978+
# and raise KeyError for Frame
4979+
for timestamp, expected in zip(index, values):
4980+
ts_string = timestamp.strftime(formats[rnum])
4981+
# make ts_string as precise as index
4982+
result = df['a'][ts_string]
4983+
self.assertIsInstance(result, np.int64)
4984+
self.assertEqual(result, expected)
4985+
self.assertRaises(KeyError, df.__getitem__, ts_string)
4986+
4987+
# Timestamp with resolution less precise than index
4988+
for fmt in formats[:rnum]:
4989+
for element, theslice in [[0, slice(None, 1)],
4990+
[1, slice(1, None)]]:
4991+
ts_string = index[element].strftime(fmt)
4992+
4993+
# Series should return slice
4994+
result = df['a'][ts_string]
4995+
expected = df['a'][theslice]
4996+
assert_series_equal(result, expected)
4997+
4998+
# Frame should return slice as well
4999+
result = df[ts_string]
5000+
expected = df[theslice]
5001+
assert_frame_equal(result, expected)
5002+
5003+
# Timestamp with resolution more precise than index
5004+
# Compatible with existing key
5005+
# Should return scalar for Series
5006+
# and raise KeyError for Frame
5007+
for fmt in formats[rnum + 1:]:
5008+
ts_string = index[1].strftime(fmt)
5009+
result = df['a'][ts_string]
5010+
self.assertIsInstance(result, np.int64)
5011+
self.assertEqual(result, 2)
5012+
self.assertRaises(KeyError, df.__getitem__, ts_string)
5013+
5014+
# Not compatible with existing key
5015+
# Should raise KeyError
5016+
for fmt, res in list(zip(formats, resolutions))[rnum + 1:]:
5017+
ts = index[1] + Timedelta("1 " + res)
5018+
ts_string = ts.strftime(fmt)
5019+
self.assertRaises(KeyError, df['a'].__getitem__, ts_string)
5020+
self.assertRaises(KeyError, df.__getitem__, ts_string)
5021+
49565022
def test_partial_slicing_with_multiindex(self):
49575023

49585024
# GH 4758
