-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
Fix inconsistency in Partial String Index with 'second' resolution #14856
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
See pandas-dev#14826. Now the following logic applies: - If timestamp resolution is strictly less precise than index resolution, timetamp is a slice as it can (in theory) correspond to more than one elements in the index. For `Series`, `[]` should return `Series`, for `DataFrame` — `DataFrame`. - If timestamp resolution is equal to index resolution, then timestamp is considered as an attempt to get a kind of "exact match". For `Series`, `[]` should return scalar, for `DataFrame` — try to find column with this key (if any), and most probably raise `KeyError`. - If timestamp resolution is strictly more precise than index resolution and does not resolve to exact match, `KeyError` have to be raised in both cases. Testsuite is updated as well.
@@ -0,0 +1,163 @@ | |||
import numpy as np | |||
|
|||
import pandas.util.testing as tm |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
instead of creating a new files, these should go with the existing tests in test_timeseries.py
further, don't create new ways of testing, just follow along the existing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks. I'll move it to test_timeseries.py
. Should I keep it in the separate class or unify with one of the existing? What do you mean under "new ways of testing"? Is it okay to add functions like assert_exact
that I did?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
put it with the existing tests
pls following the existing testing methodologies
iow don't create your own helpers it's not standard
I am +1 on the idea of this PR. |
@jorisvandenbossche no was fine with it. It was a fix designed for DataFrame, but broke Series (which was never tested), so for sure a bug. I don't think there is a problem with simply changing this, it would give the wrong shape back previously. |
- new tests moved to test_timeseries.py and refactored to avoid creating non-standard helper functions
Current coverage is 84.64% (diff: 100%)@@ master #14856 diff @@
==========================================
Files 144 144
Lines 51016 51016
Methods 0 0
Messages 0 0
Branches 0 0
==========================================
- Hits 43185 43184 -1
- Misses 7831 7832 +1
Partials 0 0
|
I addressed code review comments by @jreback. |
'2012-01-01 00', | ||
'2012-01-01 01']), | ||
dtype=np.int64) | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you are repeating lots of tests with only a slight variation
can u so it instead with a loop
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree. Actually, it was the reason why I wrote those helper functions initially.
Okay, I united tests for all resolutions into one loop. IMO it become less readable, but reflects the logic properly.
…instead of copy-pasting of the code
@@ -4941,6 +4940,62 @@ def test_partial_slice_second_precision(self): | |||
self.assertRaisesRegexp(KeyError, '2005-1-1 00:00:00', | |||
lambda: s['2005-1-1 00:00:00']) | |||
|
|||
def test_partial_slicing_dataframe(self): | |||
# GH14856 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you give a 1-2 lines about what are asserting here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done. 67e6bab
pls add a whatsnew entry (0.20.0) in other api changes. a small sub-section, showing the change. |
- Documentation section added
bc96430
to
40eddc3
Compare
@jreback I added whatsnew subsection and also edited documentation a little bit. |
|
||
series_minute = pd.Series([1, 2, 3], | ||
pd.DatetimeIndex(['2011-12-31 23:59:00', | ||
'2012-01-01 00:00:00', |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you add some more commentary, typically you use smaller ipython blocks, somethign like
this is seconds resolution
example here
this key is treated like this
example here
this key is treated differently
e.g.
just reads better
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done. d215905
dft_minute['2011-12-31 23'] | ||
|
||
.. warning:: | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
clarify this a bit (as compared to the previous section). IOW this warning was in isolation before, but now you have a whole section on what works / doesn't work, so the warning needs some reworking to avoid being redundant.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point, thanks. Done. d215905
@@ -193,6 +193,37 @@ Map on Index types now return other Index types | |||
Other API Changes | |||
^^^^^^^^^^^^^^^^^ | |||
|
|||
- :ref:`DatetimeIndex Partial String Indexing <timeseries.partialindexing>` now works as exact match provided that string resolution coincides with index resolution (:issue:`14826`). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
put a ref to the new code section as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You mean new docs section? Added. 0814e5b
…ndas into datetimeindex-slices
@@ -491,10 +475,79 @@ DatetimeIndex Partial String Indexing also works on DataFrames with a ``MultiInd | |||
dft2 = dft2.swaplevel(0, 1).sort_index() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have a small reorg to do after we merge this (on the doc sections). But I think just easier for me to do it than explain. :>
thanks! pls review the docs when built (will prob take a few hours) and issue a follow up if needed. |
…lution Closes pandas-dev#14826 Fix inconsistency in Partial String Index with 'second' resolution. See pandas-dev#14826. Now if the timestamp and the index both have resolution `second`, timestamp is considered as an exact match try and not a slice. Therefore, for `Series`, scalar will be returned, for `DataFrame` `KeyError` raised. Author: Ilya V. Schurov <[email protected]> Closes pandas-dev#14856 from ischurov/datetimeindex-slices and squashes the following commits: 2881a53 [Ilya V. Schurov] Merge branch 'datetimeindex-slices' of https://github.com/ischurov/pandas into datetimeindex-slices ac8758e [Ilya V. Schurov] resolved merge conflict in whatsnew/v0.20.0.txt 0e87874 [Ilya V. Schurov] resolved merge conflict in whatsnew/v0.20.0.txt 0814e5b [Ilya V. Schurov] - Addressing code review: added reference to new docs section in whatsnew. d215905 [Ilya V. Schurov] - Addressing code review: documentation clarification. c287845 [Ilya V. Schurov] conflict PR pandas-dev#14856 resolved 40eddc3 [Ilya V. Schurov] - Documentation fixes e17d210 [Ilya V. Schurov] - Whatsnew section added - Documentation section added 67e6bab [Ilya V. Schurov] Addressing code review: more comments added c901588 [Ilya V. Schurov] Addressing code review: testing different combinations with the loop instead of copy-pasting of the code 9b55117 [Ilya V. Schurov] Addressing code review b30039d [Ilya V. Schurov] Make flake8 happy. cc86bdd [Ilya V. Schurov] Fix inconsistency in Partial String Index with 'second' resolution ea51437 [Ilya V. Schurov] Made this code clearer.
Whatsnew entry
Fix inconsistency in Partial String Index with 'second' resolution. See #14826. Now if the timestamp and the index both have resolution
second
, timestamp is considered as an exact match try and not a slice. Therefore, forSeries
, scalar will be returned, forDataFrame
KeyError
raised.