Skip to content

Fix inconsistency in Partial String Index with 'second' resolution #14856

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 14 commits into from

Conversation

ischurov
Copy link
Contributor

Whatsnew entry

Fix inconsistency in Partial String Index with 'second' resolution. See #14826. Now if the timestamp and the index both have resolution second, timestamp is considered as an exact match try and not a slice. Therefore, for Series, scalar will be returned, for DataFrame KeyError raised.

See pandas-dev#14826.

Now the following logic applies:

- If timestamp resolution is strictly less precise than index resolution, timetamp is a slice as it can (in theory) correspond to more than one elements in the index. For `Series`, `[]` should return `Series`, for `DataFrame` — `DataFrame`.
- If timestamp resolution is equal to index resolution, then timestamp is considered as an attempt to get a kind of "exact match". For `Series`, `[]` should return scalar, for `DataFrame` — try to find column with this key (if any), and most probably raise `KeyError`.
- If timestamp resolution is strictly more precise than index resolution and does not resolve to exact match, `KeyError` have to be raised in both cases.

Testsuite is updated as well.
@@ -0,0 +1,163 @@
import numpy as np

import pandas.util.testing as tm
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

instead of creating a new files, these should go with the existing tests in test_timeseries.py

further, don't create new ways of testing, just follow along the existing.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks. I'll move it to test_timeseries.py. Should I keep it in the separate class or unify with one of the existing? What do you mean under "new ways of testing"? Is it okay to add functions like assert_exact that I did?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

put it with the existing tests
pls following the existing testing methodologies

iow don't create your own helpers it's not standard

@jreback jreback added Bug Indexing Related to indexing on series/frames, not to indexes themselves Datetime Datetime data dtype labels Dec 11, 2016
@jorisvandenbossche
Copy link
Member

I am +1 on the idea of this PR.
@jreback in the discussion on the issue, I interpreted your comments as opposed to this change. Is that still the case?
The only problem is whether we should just break this behaviour, or are able to first raise a FutureWarning for this to deprecate it.

@jreback
Copy link
Contributor

jreback commented Dec 14, 2016

@jorisvandenbossche no was fine with it. It was a fix designed for DataFrame, but broke Series (which was never tested), so for sure a bug. I don't think there is a problem with simply changing this, it would give the wrong shape back previously.

- new tests moved to test_timeseries.py and refactored to avoid creating non-standard helper functions
@codecov-io
Copy link

codecov-io commented Dec 18, 2016

Current coverage is 84.64% (diff: 100%)

Merging #14856 into master will decrease coverage by <.01%

@@             master     #14856   diff @@
==========================================
  Files           144        144          
  Lines         51016      51016          
  Methods           0          0          
  Messages          0          0          
  Branches          0          0          
==========================================
- Hits          43185      43184     -1   
- Misses         7831       7832     +1   
  Partials          0          0          

Powered by Codecov. Last update 8e630b6...2881a53

@ischurov
Copy link
Contributor Author

I addressed code review comments by @jreback.

'2012-01-01 00',
'2012-01-01 01']),
dtype=np.int64)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you are repeating lots of tests with only a slight variation

can u so it instead with a loop

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree. Actually, it was the reason why I wrote those helper functions initially.

Okay, I united tests for all resolutions into one loop. IMO it become less readable, but reflects the logic properly.

@@ -4941,6 +4940,62 @@ def test_partial_slice_second_precision(self):
self.assertRaisesRegexp(KeyError, '2005-1-1 00:00:00',
lambda: s['2005-1-1 00:00:00'])

def test_partial_slicing_dataframe(self):
# GH14856
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you give a 1-2 lines about what are asserting here

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done. 67e6bab

@jreback
Copy link
Contributor

jreback commented Dec 18, 2016

pls add a whatsnew entry (0.20.0) in other api changes. a small sub-section, showing the change.

@ischurov ischurov force-pushed the datetimeindex-slices branch from bc96430 to 40eddc3 Compare December 19, 2016 01:12
@ischurov
Copy link
Contributor Author

@jreback I added whatsnew subsection and also edited documentation a little bit.


series_minute = pd.Series([1, 2, 3],
pd.DatetimeIndex(['2011-12-31 23:59:00',
'2012-01-01 00:00:00',
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you add some more commentary, typically you use smaller ipython blocks, somethign like

this is seconds resolution
example here

this key is treated like this
example here

this key is treated differently

e.g.

just reads better

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done. d215905

dft_minute['2011-12-31 23']

.. warning::

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

clarify this a bit (as compared to the previous section). IOW this warning was in isolation before, but now you have a whole section on what works / doesn't work, so the warning needs some reworking to avoid being redundant.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point, thanks. Done. d215905

@@ -193,6 +193,37 @@ Map on Index types now return other Index types
Other API Changes
^^^^^^^^^^^^^^^^^

- :ref:`DatetimeIndex Partial String Indexing <timeseries.partialindexing>` now works as exact match provided that string resolution coincides with index resolution (:issue:`14826`).
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

put a ref to the new code section as well.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You mean new docs section? Added. 0814e5b

@@ -491,10 +475,79 @@ DatetimeIndex Partial String Indexing also works on DataFrames with a ``MultiInd
dft2 = dft2.swaplevel(0, 1).sort_index()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have a small reorg to do after we merge this (on the doc sections). But I think just easier for me to do it than explain. :>

@jreback jreback added this to the 0.20.0 milestone Dec 20, 2016
@jreback jreback closed this in 50930a9 Dec 20, 2016
@jreback
Copy link
Contributor

jreback commented Dec 20, 2016

thanks!

pls review the docs when built (will prob take a few hours) and issue a follow up if needed.

@jreback
Copy link
Contributor

jreback commented Dec 20, 2016

@ischurov ischurov deleted the datetimeindex-slices branch December 22, 2016 20:08
@ischurov ischurov mentioned this pull request Dec 22, 2016
@ischurov
Copy link
Contributor Author

@jreback Thanks for the corrections. Articles are not my strong side :) I created a follow-up PR #14966 to correct a couple of typos.

Btw, this is my first PR merged to large open source project. Thanks for your guiding :)

ShaharBental pushed a commit to ShaharBental/pandas that referenced this pull request Dec 26, 2016
…lution

Closes pandas-dev#14826

Fix inconsistency in Partial String Index with 'second' resolution.
See pandas-dev#14826. Now if the timestamp and the index both have resolution
`second`, timestamp is considered as an exact match try and not a
slice. Therefore, for `Series`, scalar will be returned, for
`DataFrame` `KeyError` raised.

Author: Ilya V. Schurov <[email protected]>

Closes pandas-dev#14856 from ischurov/datetimeindex-slices and squashes the following commits:

2881a53 [Ilya V. Schurov] Merge branch 'datetimeindex-slices' of https://github.com/ischurov/pandas into datetimeindex-slices
ac8758e [Ilya V. Schurov] resolved merge conflict in whatsnew/v0.20.0.txt
0e87874 [Ilya V. Schurov] resolved merge conflict in whatsnew/v0.20.0.txt
0814e5b [Ilya V. Schurov] - Addressing code review: added reference to new docs section in whatsnew.
d215905 [Ilya V. Schurov] - Addressing code review: documentation clarification.
c287845 [Ilya V. Schurov] conflict PR pandas-dev#14856 resolved
40eddc3 [Ilya V. Schurov] - Documentation fixes
e17d210 [Ilya V. Schurov] - Whatsnew section added - Documentation section added
67e6bab [Ilya V. Schurov] Addressing code review: more comments added
c901588 [Ilya V. Schurov] Addressing code review: testing different combinations with the loop instead of copy-pasting of the code
9b55117 [Ilya V. Schurov] Addressing code review
b30039d [Ilya V. Schurov] Make flake8 happy.
cc86bdd [Ilya V. Schurov] Fix inconsistency in Partial String Index with 'second' resolution
ea51437 [Ilya V. Schurov] Made this code clearer.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Bug Datetime Datetime data dtype Indexing Related to indexing on series/frames, not to indexes themselves
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants