Skip to content

(WIP) ENH: Add PeriodBlock #13755

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 4 commits into from
Closed

Conversation

sinhrks
Copy link
Member

@sinhrks sinhrks commented Jul 22, 2016

This PR is NOT ready for review, but to show basic direction. Especially, supporting period arithmetic is quite unreliable because current _TimeOp class is basically for np.ndarray operations. I think refactoring _TimeOp is needed prior to this PR.

Would like to split the PR to 3 parts:

    1. Add PeriodDtype definition, let PeriodIndex to have PeriodDtype like period[M]. (ENH: PeriodIndex now has period dtype #13941)
    1. Refactor _TimeOp splitting to _DatetimeOp and _TimedeltaOp, merging DatetimeIndex and TimedeltaIndex arithmetics.
    • _DatetimeOp is for lhs or rhs is datetime64 or datetimetz dtypes.
    • _TimedeltaOp is for other timedelta related types.
    1. Add PeriodBlock, adding _PeriodOp for arithmetic (this PR).

@sinhrks sinhrks added Enhancement Performance Memory or execution speed performance Period Period data type labels Jul 22, 2016
@sinhrks sinhrks added this to the 0.19.0 milestone Jul 22, 2016
@codecov-io
Copy link

codecov-io commented Jul 22, 2016

Current coverage is 85.51% (diff: 84.37%)

Merging #13755 into master will decrease coverage by 0.03%

@@             master     #13755   diff @@
==========================================
  Files           145        145          
  Lines         51404      51655   +251   
  Methods           0          0          
  Messages          0          0          
  Branches          0          0          
==========================================
+ Hits          43975      44172   +197   
- Misses         7429       7483    +54   
  Partials          0          0          

Powered by Codecov. Last update 8e13da2...80e9caf

@@ -2451,6 +2462,182 @@ def shift(self, periods, axis=0, mgr=None):
placement=self.mgr_locs)]


class PeriodBlock(NonConsolidatableMixIn, DatetimeBlock):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you might be better actoring out some of the holding Index suff from DatetimeTZBLock and creating IndexHolderBlock (or better name). then sub-classing for DatetimeTZBlock and PeriodBlock. Should be in theory less duplication.

@sinhrks sinhrks force-pushed the periodblock6 branch 4 times, most recently from dc92e4a to 8b5ea28 Compare August 2, 2016 11:51
# delegate
return super(PeriodBlock, self)._astype(dtype=dtype, **kwargs)

def external_values(self):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can move to IndexHolder

@sinhrks sinhrks force-pushed the periodblock6 branch 2 times, most recently from bcfb433 to 5596c3d Compare August 7, 2016 12:50
if values.ndim == 1:
if isinstance(values, Categorical):
self.is_categorical = values
values = np.array(values)
elif is_period(values):
self.is_period = values
values = values.values
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess not handling other i8 here ATM? (e.g. datetime/datetimetz)?

@jreback
Copy link
Contributor

jreback commented Aug 7, 2016

@sinhrks looks pretty good. certainly can merge mostly as is and just make a to-do on other issues. Looks like pretty much works and doesn't change the api much.

Just on a small whatsnew section to we can keep updating.

@sinhrks sinhrks force-pushed the periodblock6 branch 2 times, most recently from 661e4d2 to 8d8ef2f Compare August 17, 2016 22:50
@jorisvandenbossche
Copy link
Member

@sinhrks @jreback What is the status of this one? Still try to include it in 0.19.0 or push to 0.20?

@jreback
Copy link
Contributor

jreback commented Aug 31, 2016

ideally I'd like this to go in. I think most of the prep work has been done.

@jorisvandenbossche jorisvandenbossche modified the milestones: 0.19.0rc, 0.19.0 Sep 1, 2016
@jreback jreback modified the milestones: 0.20.0, 0.19.0rc Sep 5, 2016
@jorisvandenbossche
Copy link
Member

@sinhrks Is this ready for review? (no docs yet, and some todo's listed in the code) Just to prioritize my time :-)

@sinhrks
Copy link
Member Author

sinhrks commented Nov 24, 2016

Not yet. I'll make a notice once ready, hopefully this weekend... :)

@@ -69,10 +69,14 @@ def __init__(self, values, index, level=-1, value_columns=None,
fill_value=None):

self.is_categorical = None
self.is_period = None
if values.ndim == 1:
if isinstance(values, Categorical):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

-> is_categorical

@@ -84,7 +84,7 @@ def test_unsupported(self):

# period
df = pd.DataFrame({'a': pd.period_range('2013', freq='M', periods=3)})
self.check_error_on_write(df, ValueError)
self.check_error_on_write(df, feather.FeatherError)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah I don't think feather supports Periods ATM.

@@ -488,6 +506,8 @@ cdef class StringHashTable(HashTable):
if k != self.table.n_buckets:
val = self.table.vals[k]

@cython.wraparound(False)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this file changes can do as a separate PR?

@@ -773,7 +772,7 @@ def infer_freq(index, warn=True):
"dtype on a Series of {0}".format(index.dtype))
index = values

if is_period_arraylike(index):
if isinstance(index, pd.PeriodIndex):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is_period_dtype

from pandas.tseries.period import Period
if isnull(fill_value):
fill_value = tslib.iNaT
elif isinstance(fill_value, Period):
Copy link
Contributor

@jreback jreback Jan 19, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you can use ABCPeriod here (or is_period) maybe better

@@ -73,7 +73,7 @@ def is_datetimetz(array):

def is_period(array):
""" return if we are a period array """
return isinstance(array, ABCPeriodIndex) or is_period_arraylike(array)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmm so this is now the same as is_period_dtype, maybe we should just deprecate this then, unless this is really equiv to isinstance(val, Period) ? IOW a scalar

@jreback jreback removed this from the 0.20.0 milestone Mar 23, 2017
@jreback
Copy link
Contributor

jreback commented May 7, 2017

let's close for now. If you want to rebase and fixup lmk. Would like to do this (maybe in concert with a IntervalBlock).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Enhancement Performance Memory or execution speed performance Period Period data type
Projects
None yet
Development

Successfully merging this pull request may close these issues.

API/ENH: allow Period to be a first-class type via a PeriodBlockinternal type
4 participants