Fix inconsistency in Partial String Index with 'second' resolution #14856
Changes from all commits
ea51437
cc86bdd
b30039d
9b55117
c901588
67e6bab
e17d210
40eddc3
c287845
d215905
0814e5b
0e87874
ac8758e
2881a53
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -457,22 +457,6 @@ We are stopping on the included end-point as it is part of the index | |
|
||
dft['2013-1-15':'2013-1-15 12:30:00'] | ||
|
||
.. warning:: | ||
|
||
The following selection will raise a ``KeyError``; otherwise this selection methodology | ||
would be inconsistent with other selection methods in pandas (as this is not a *slice*, nor does it | ||
resolve to one) | ||
|
||
.. code-block:: python | ||
|
||
dft['2013-1-15 12:30:00'] | ||
|
||
To select a single row, use ``.loc`` | ||
|
||
.. ipython:: python | ||
|
||
dft.loc['2013-1-15 12:30:00'] | ||
|
||
.. versionadded:: 0.18.0 | ||
|
||
DatetimeIndex Partial String Indexing also works on DataFrames with a ``MultiIndex``. For example: | ||
|
@@ -491,10 +475,79 @@ DatetimeIndex Partial String Indexing also works on DataFrames with a ``MultiInd | |
dft2 = dft2.swaplevel(0, 1).sort_index() | ||
dft2.loc[idx[:, '2013-01-05'], :] | ||
|
||
.. _timeseries.slice_vs_exact_match: | ||
|
||
Slice vs. exact match | ||
^^^^^^^^^^^^^^^^^^^^^ | ||
|
||
The same string used as an indexing parameter can be treated either as a slice or as an exact match, depending on the resolution of the index. If the string is less accurate than the index, it will be treated as a slice, otherwise as an exact match. | ||
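The rule above can be sketched in plain Python. This is only an illustration, not pandas' implementation: ``classify_key`` is a hypothetical helper, and the ordered resolution list is borrowed from the test added in this PR.

```python
# Resolutions ordered from least to most precise, as listed in the PR's test.
RESOLUTIONS = ['year', 'month', 'day', 'hour', 'minute', 'second']


def classify_key(string_resolution, index_resolution):
    """Return 'slice' or 'exact' for a partial string key.

    Hypothetical helper for illustration only; it is not part of pandas.
    """
    string_rank = RESOLUTIONS.index(string_resolution)
    index_rank = RESOLUTIONS.index(index_resolution)
    # A key less precise than the index spans a range of index values,
    # so it is treated as a slice; otherwise it is an exact match.
    return 'slice' if string_rank < index_rank else 'exact'


print(classify_key('hour', 'minute'))    # slice
print(classify_key('minute', 'minute'))  # exact
print(classify_key('minute', 'second'))  # slice
```

The same minute-accurate key is an exact match against a minute-resolution index but a slice against a second-resolution index, which is the distinction the examples in this section walk through.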
|
||
For example, consider a ``Series`` object whose index has minute resolution. | ||
|
||
.. ipython:: python | ||
|
||
series_minute = pd.Series([1, 2, 3], | ||
pd.DatetimeIndex(['2011-12-31 23:59:00', | ||
'2012-01-01 00:00:00', | ||
Review comment: Can you add some more commentary? Typically you use smaller ipython blocks, something like: "this is seconds resolution", "this key is treated like this", "this key is treated differently". It just reads better.
Reply: Done. d215905
||
'2012-01-01 00:02:00'])) | ||
series_minute.index.resolution | ||
|
||
A timestamp string less accurate than a minute gives a ``Series`` object. | ||
|
||
.. ipython:: python | ||
|
||
series_minute['2011-12-31 23'] | ||
|
||
A timestamp string with minute resolution (or more accurate) gives a scalar instead, i.e. it is not cast to a slice. | ||
|
||
.. ipython:: python | ||
|
||
series_minute['2011-12-31 23:59'] | ||
series_minute['2011-12-31 23:59:00'] | ||
|
||
If the index resolution is second, then a minute-accurate timestamp gives a ``Series``. | ||
|
||
.. ipython:: python | ||
|
||
series_second = pd.Series([1, 2, 3], | ||
pd.DatetimeIndex(['2011-12-31 23:59:59', | ||
'2012-01-01 00:00:00', | ||
'2012-01-01 00:00:01'])) | ||
series_second.index.resolution | ||
series_second['2011-12-31 23:59'] | ||
|
||
If the timestamp string is treated as a slice, it can be used to index a ``DataFrame`` with ``[]`` as well. | ||
|
||
.. ipython:: python | ||
|
||
dft_minute = pd.DataFrame({'a': [1, 2, 3], 'b': [4, 5, 6]}, | ||
index=series_minute.index) | ||
dft_minute['2011-12-31 23'] | ||
|
||
However, if the string is treated as an exact match, the selection in ``DataFrame``'s ``[]`` will be column-wise and not row-wise, see :ref:`Indexing Basics <indexing.basics>`. For example, ``dft_minute['2011-12-31 23:59']`` will raise ``KeyError``, as ``'2011-12-31 23:59'`` has the same resolution as the index and there is no column with such a name. | ||
|
||
To select a single row, use ``.loc``. | ||
|
||
Review comment: Clarify this a bit (as compared to the previous section). IOW this warning was in isolation before, but now you have a whole section on what works / doesn't work, so the warning needs some reworking to avoid being redundant.
Reply: Good point, thanks. Done. d215905
||
.. ipython:: python | ||
|
||
dft_minute.loc['2011-12-31 23:59'] | ||
|
||
Note also that ``DatetimeIndex`` resolution cannot be less precise than day. | ||
|
||
.. ipython:: python | ||
|
||
series_monthly = pd.Series([1, 2, 3], | ||
pd.DatetimeIndex(['2011-12', | ||
'2012-01', | ||
'2012-02'])) | ||
series_monthly.index.resolution | ||
series_monthly['2011-12'] # returns Series | ||
|
||
|
||
Datetime Indexing | ||
~~~~~~~~~~~~~~~~~ | ||
|
||
Indexing a ``DateTimeIndex`` with a partial string depends on the "accuracy" of the period, in other words how specific the interval is in relation to the frequency of the index. In contrast, indexing with datetime objects is exact, because the objects have exact meaning. These also follow the semantics of *including both endpoints*. | ||
As discussed in the previous section, indexing a ``DatetimeIndex`` with a partial string depends on the "accuracy" of the period, in other words how specific the interval is in relation to the resolution of the index. In contrast, indexing with datetime objects is exact, because the objects have exact meaning. These also follow the semantics of *including both endpoints*. | ||
|
||
These ``datetime`` objects are specific ``hours, minutes,`` and ``seconds`` even though they were not explicitly specified (they are ``0``). | ||
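As a small standard-library illustration of the point above (the variable name ``start`` is chosen here only for the example), a ``datetime`` built from a date alone still carries concrete time fields, all set to ``0``:

```python
from datetime import datetime

# Only year, month and day are given; the time fields default to 0.
start = datetime(2013, 1, 1)

print(start.hour, start.minute, start.second)  # 0 0 0
print(start == datetime(2013, 1, 1, 0, 0, 0))  # True
```

This is why such an object denotes one exact instant (midnight) rather than the whole day, unlike the partial string ``'2013-1-1'``.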
|
||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -266,16 +266,15 @@ def test_indexing(self): | |
expected = ts['2013'] | ||
assert_series_equal(expected, ts) | ||
|
||
# GH 3925, indexing with a seconds resolution string / datetime object | ||
# GH14826, indexing with a seconds resolution string / datetime object | ||
df = DataFrame(randn(5, 5), | ||
columns=['open', 'high', 'low', 'close', 'volume'], | ||
index=date_range('2012-01-02 18:01:00', | ||
periods=5, tz='US/Central', freq='s')) | ||
expected = df.loc[[df.index[2]]] | ||
result = df['2012-01-02 18:01:02'] | ||
assert_frame_equal(result, expected) | ||
|
||
# this is a single date, so will raise | ||
self.assertRaises(KeyError, df.__getitem__, '2012-01-02 18:01:02', ) | ||
self.assertRaises(KeyError, df.__getitem__, df.index[2], ) | ||
|
||
def test_recreate_from_data(self): | ||
|
@@ -4953,6 +4952,73 @@ def test_partial_slice_second_precision(self): | |
self.assertRaisesRegexp(KeyError, '2005-1-1 00:00:00', | ||
lambda: s['2005-1-1 00:00:00']) | ||
|
||
def test_partial_slicing_dataframe(self): | ||
# GH14856 | ||
Review comment: Can you give 1-2 lines about what you are asserting here?
Reply: Done. 67e6bab
||
# Test various combinations of string slicing resolution vs. | ||
# index resolution | ||
# - If string resolution is less precise than index resolution, | ||
# string is considered a slice | ||
# - If string resolution is equal to or more precise than index | ||
# resolution, string is considered an exact match | ||
formats = ['%Y', '%Y-%m', '%Y-%m-%d', '%Y-%m-%d %H', | ||
'%Y-%m-%d %H:%M', '%Y-%m-%d %H:%M:%S'] | ||
resolutions = ['year', 'month', 'day', 'hour', 'minute', 'second'] | ||
for rnum, resolution in enumerate(resolutions[2:], 2): | ||
# we check only 'day', 'hour', 'minute' and 'second' | ||
unit = Timedelta("1 " + resolution) | ||
middate = datetime(2012, 1, 1, 0, 0, 0) | ||
index = DatetimeIndex([middate - unit, | ||
middate, middate + unit]) | ||
values = [1, 2, 3] | ||
df = DataFrame({'a': values}, index, dtype=np.int64) | ||
self.assertEqual(df.index.resolution, resolution) | ||
|
||
# Timestamp with the same resolution as index | ||
# Should be exact match for Series (return scalar) | ||
# and raise KeyError for Frame | ||
for timestamp, expected in zip(index, values): | ||
ts_string = timestamp.strftime(formats[rnum]) | ||
# make ts_string as precise as index | ||
result = df['a'][ts_string] | ||
self.assertIsInstance(result, np.int64) | ||
self.assertEqual(result, expected) | ||
self.assertRaises(KeyError, df.__getitem__, ts_string) | ||
|
||
# Timestamp with resolution less precise than index | ||
for fmt in formats[:rnum]: | ||
for element, theslice in [[0, slice(None, 1)], | ||
[1, slice(1, None)]]: | ||
ts_string = index[element].strftime(fmt) | ||
|
||
# Series should return slice | ||
result = df['a'][ts_string] | ||
expected = df['a'][theslice] | ||
assert_series_equal(result, expected) | ||
|
||
# Frame should return slice as well | ||
result = df[ts_string] | ||
expected = df[theslice] | ||
assert_frame_equal(result, expected) | ||
|
||
# Timestamp with resolution more precise than index | ||
# Compatible with existing key | ||
# Should return scalar for Series | ||
# and raise KeyError for Frame | ||
for fmt in formats[rnum + 1:]: | ||
ts_string = index[1].strftime(fmt) | ||
result = df['a'][ts_string] | ||
self.assertIsInstance(result, np.int64) | ||
self.assertEqual(result, 2) | ||
self.assertRaises(KeyError, df.__getitem__, ts_string) | ||
|
||
# Not compatible with existing key | ||
# Should raise KeyError | ||
for fmt, res in list(zip(formats, resolutions))[rnum + 1:]: | ||
ts = index[1] + Timedelta("1 " + res) | ||
ts_string = ts.strftime(fmt) | ||
self.assertRaises(KeyError, df['a'].__getitem__, ts_string) | ||
self.assertRaises(KeyError, df.__getitem__, ts_string) | ||
|
||
def test_partial_slicing_with_multiindex(self): | ||
|
||
# GH 4758 | ||
|
Review comment: I have a small reorg to do after we merge this (on the doc sections). But I think just easier for me to do it than explain. :>