-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
BUG: Implement PeriodEngine to fix PeriodIndex truncate bug #17755
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Does this make sense though? Even if it fixes that issue - why should a |
@MaximilianR We are already treating Period and int as same. |
11090de
to
165566d
Compare
@Licht-T no this is adding a well-defined value
this is NOT the same thing. We do not treat Periods and integers the same. |
@Licht-T if you want to revise to the issue ok |
@MaximilianR @jreback Thanks for your comment. I am making another solution. |
Codecov Report
@@ Coverage Diff @@
## master #17755 +/- ##
==========================================
- Coverage 91.25% 91.21% -0.05%
==========================================
Files 163 163
Lines 49856 49856
==========================================
- Hits 45496 45475 -21
- Misses 4360 4381 +21
Continue to review full report at Codecov.
|
Codecov Report
@@ Coverage Diff @@
## master #17755 +/- ##
==========================================
+ Coverage 91.25% 91.26% +<.01%
==========================================
Files 163 163
Lines 50120 50123 +3
==========================================
+ Hits 45737 45744 +7
+ Misses 4383 4379 -4
Continue to review full report at Codecov.
|
@jreback @MaximilianR Now revised. I implemented |
@@ -267,6 +269,10 @@ def __new__(cls, data=None, ordinal=None, freq=None, start=None, end=None, | |||
data = period.extract_ordinals(data, freq) | |||
return cls._from_ordinals(data, name=name, freq=freq) | |||
|
|||
@cache_readonly |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this might already be implemented by a superclass
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jreback Yes, but the 1st argument is not same.
https://github.com/Licht-T/pandas/blob/6255812c30978c6a55f1758927f9c300e35cc82b/pandas/core/indexes/base.py#L1558
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I c. ok, we should make this much more generic (IOW just pass in self and have the engine extract what it needs), but that's another issue.
|
||
def test_truncate(self): | ||
# GH 17717 | ||
idx1 = pd.PeriodIndex([ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
these should go with the Series tests for truncate.
pd.Period('2017-09-03') | ||
]) | ||
series1 = pd.Series([1, 2, 3], index=idx1) | ||
result1 = series1.truncate(after='2017-09-02') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would like to have a few period index specific tests that exercise the engine. you can look at what tests we have for example for DatetimeIndex / TimedeltaIndex for the same.
@jreback Thank you! I'll add tests for |
@jreback Added some tests. |
Fixed conflicts. |
can you rebase |
pls add a note for 0.22 |
a4f971b
to
f01bd95
Compare
@jreback Rebased & added what's new note. |
pandas/_libs/index.pyx
Outdated
def _call_monotonic(self, values): | ||
return super(PeriodEngine, self)._call_monotonic(values.view('i8')) | ||
|
||
cdef _maybe_get_bool_indexer(self, object val): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is this duplicating some of the super code?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jreback Almost same, but where getting the values
is not same.
https://github.com/Licht-T/pandas/blob/master/pandas/_libs/index_class_helper.pxi.in#L69
https://github.com/pandas-dev/pandas/pull/17755/files#diff-495f440aad68721ad5ca1f35087fc450R520
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
so provide a helper function to avoid repeating all of this code again.
pandas/_libs/index.pyx
Outdated
def get_indexer(self, values): | ||
cdef ndarray[int64_t, ndim=1] ordinals | ||
|
||
self._ensure_mapping_populated() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you call super in most of these cases?
pandas/_libs/index.pyx
Outdated
cdef _get_index_values(self): | ||
return self.vgetter() | ||
|
||
cpdef _call_map_locations(self, values): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is this necessary to create another way of calling this? e.g. _call_map_locations
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jreback So you mean that we don't have to define _call_map_locations
by cpdef
, instead of cdef
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jreback I defined _call_map_locations
as cdef
, but some tests fail.
This is because _ensure_mapping_populated
is inline
.
@@ -267,6 +269,10 @@ def __new__(cls, data=None, ordinal=None, freq=None, start=None, end=None, | |||
data = period.extract_ordinals(data, freq) | |||
return cls._from_ordinals(data, name=name, freq=freq) | |||
|
|||
@cache_readonly |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I c. ok, we should make this much more generic (IOW just pass in self and have the engine extract what it needs), but that's another issue.
idx0 = pd.PeriodIndex([p0, p1, p2]) | ||
expected_idx1_p1 = 1 | ||
expected_idx1_p2 = 2 | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you add some comments on these cases
|
||
ps0 = [p0, p1, p2] | ||
idx0 = pd.PeriodIndex(ps0) | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
cc @jorisvandenbossche @MaximilianR @shoyer these are similar to Interval discussions, though conceptually much simpler.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, and similar to that discussion, I don't like the current (and tested on the lines below) behaviour.
I don't think that:
In [97]: p0 = pd.Period('2017-09-01')
...: p1 = pd.Period('2017-09-02')
In [98]: idx = pd.PeriodIndex([p0, p1])
In [99]: idx.contains("2017-09-01 09:00")
Out[99]: True
should be the behaviour of contains. But let's leave that discussion out of this PR, as I don't think the PR is making any changes to the behaviour of __contains__
/ contains
? (or does it?)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jorisvandenbossche @jreback Yes. No changes in __contains__
/ contains
.
7c1f2f3
to
151fa5f
Compare
151fa5f
to
b2a281a
Compare
@jreback Almost fixed, except this. |
@@ -3,10 +3,11 @@ | |||
import pytest | |||
|
|||
import numpy as np | |||
from numpy import testing as ntm |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
NO, we never use numpy.testing.! remove!
use tm.assert_numpy_array_equal
.
@jreback Removed numpy test module. |
thanks @Licht-T if you'd like to see what can be simplified in index.pyx from the type hierarchy and/or general refactoring would be welcome. |
one more indexer issue |
git diff upstream/master -u -- "*.py" | flake8 --diff