-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
(WIP) ENH: Add PeriodBlock #13755
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
(WIP) ENH: Add PeriodBlock #13755
Conversation
Current coverage is 85.51% (diff: 84.37%)@@ master #13755 diff @@
==========================================
Files 145 145
Lines 51404 51655 +251
Methods 0 0
Messages 0 0
Branches 0 0
==========================================
+ Hits 43975 44172 +197
- Misses 7429 7483 +54
Partials 0 0
|
@@ -2451,6 +2462,182 @@ def shift(self, periods, axis=0, mgr=None): | |||
placement=self.mgr_locs)] | |||
|
|||
|
|||
class PeriodBlock(NonConsolidatableMixIn, DatetimeBlock): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you might be better actoring out some of the holding Index suff from DatetimeTZBLock
and creating IndexHolderBlock
(or better name). then sub-classing for DatetimeTZBlock
and PeriodBlock
. Should be in theory less duplication.
dc92e4a
to
8b5ea28
Compare
# delegate | ||
return super(PeriodBlock, self)._astype(dtype=dtype, **kwargs) | ||
|
||
def external_values(self): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can move to IndexHolder
bcfb433
to
5596c3d
Compare
if values.ndim == 1: | ||
if isinstance(values, Categorical): | ||
self.is_categorical = values | ||
values = np.array(values) | ||
elif is_period(values): | ||
self.is_period = values | ||
values = values.values |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess not handling other i8 here ATM? (e.g. datetime/datetimetz)?
@sinhrks looks pretty good. certainly can merge mostly as is and just make a to-do on other issues. Looks like pretty much works and doesn't change the api much. Just on a small whatsnew section to we can keep updating. |
661e4d2
to
8d8ef2f
Compare
ideally I'd like this to go in. I think most of the prep work has been done. |
321e337
to
3cdb115
Compare
@sinhrks Is this ready for review? (no docs yet, and some todo's listed in the code) Just to prioritize my time :-) |
Not yet. I'll make a notice once ready, hopefully this weekend... :) |
0ac8737
to
e57c684
Compare
@@ -69,10 +69,14 @@ def __init__(self, values, index, level=-1, value_columns=None, | |||
fill_value=None): | |||
|
|||
self.is_categorical = None | |||
self.is_period = None | |||
if values.ndim == 1: | |||
if isinstance(values, Categorical): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
-> is_categorical
@@ -84,7 +84,7 @@ def test_unsupported(self): | |||
|
|||
# period | |||
df = pd.DataFrame({'a': pd.period_range('2013', freq='M', periods=3)}) | |||
self.check_error_on_write(df, ValueError) | |||
self.check_error_on_write(df, feather.FeatherError) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah I don't think feather supports Periods ATM.
@@ -488,6 +506,8 @@ cdef class StringHashTable(HashTable): | |||
if k != self.table.n_buckets: | |||
val = self.table.vals[k] | |||
|
|||
@cython.wraparound(False) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this file changes can do as a separate PR?
@@ -773,7 +772,7 @@ def infer_freq(index, warn=True): | |||
"dtype on a Series of {0}".format(index.dtype)) | |||
index = values | |||
|
|||
if is_period_arraylike(index): | |||
if isinstance(index, pd.PeriodIndex): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is_period_dtype
from pandas.tseries.period import Period | ||
if isnull(fill_value): | ||
fill_value = tslib.iNaT | ||
elif isinstance(fill_value, Period): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you can use ABCPeriod here (or is_period) maybe better
@@ -73,7 +73,7 @@ def is_datetimetz(array): | |||
|
|||
def is_period(array): | |||
""" return if we are a period array """ | |||
return isinstance(array, ABCPeriodIndex) or is_period_arraylike(array) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hmm so this is now the same as is_period_dtype
, maybe we should just deprecate this then, unless this is really equiv to isinstance(val, Period)
? IOW a scalar
let's close for now. If you want to rebase and fixup lmk. Would like to do this (maybe in concert with a |
git diff upstream/master | flake8 --diff
This PR is NOT ready for review, but to show basic direction. Especially, supporting period arithmetic is quite unreliable because current
_TimeOp
class is basically fornp.ndarray
operations. I think refactoring_TimeOp
is needed prior to this PR.Would like to split the PR to 3 parts:
PeriodDtype
definition, letPeriodIndex
to havePeriodDtype
likeperiod[M]
. (ENH: PeriodIndex now has period dtype #13941)_TimeOp
splitting to_DatetimeOp
and_TimedeltaOp
, mergingDatetimeIndex
andTimedeltaIndex
arithmetics._DatetimeOp
is for lhs or rhs isdatetime64
ordatetimetz
dtypes._TimedeltaOp
is for othertimedelta
related types.PeriodBlock
, adding_PeriodOp
for arithmetic (this PR).