-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
PERF: PeriodIndex.dropna, difference, take #23095
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -519,7 +519,7 @@ def _simple_new(cls, values, name=None, dtype=None, **kwargs): | |
@Appender(_index_shared_docs['_shallow_copy']) | ||
def _shallow_copy(self, values=None, **kwargs): | ||
if values is None: | ||
values = self.values | ||
values = self._take_values | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Can you explain this change? What has take to do with shallow_copy? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This is about passing i8 instead of object to _simple_new |
||
attributes = self._get_attributes_dict() | ||
attributes.update(kwargs) | ||
if not len(values) and 'dtype' not in kwargs: | ||
|
@@ -763,6 +763,17 @@ def get_values(self): | |
""" | ||
return self.values | ||
|
||
@property | ||
def _take_values(self): | ||
""" | ||
Values to use in `take` operation, suitable to being passed back to | ||
_shallow_copy/_simple_new. | ||
""" | ||
if is_period_dtype(self): | ||
# Avoid casting to object-dtype | ||
return self._ndarray_values | ||
return self.values | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. @TomAugspurger @jorisvandenbossche is there a better name for this that might fit with EA naming conventions? @jschendel should IntervalIndex be special-cased here? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Once we have all EAs, this will always dispatch to the EA, and then we don't need a There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
The implementation should be good as-is for There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. this is basically .asi8 on Indexes now, yes? Is there a reason we are not aligning on that? or changing .asi8 to this? |
||
|
||
@Appender(IndexOpsMixin.memory_usage.__doc__) | ||
def memory_usage(self, deep=False): | ||
result = super(Index, self).memory_usage(deep=deep) | ||
|
@@ -2158,7 +2169,7 @@ def take(self, indices, axis=0, allow_fill=True, | |
if allow_fill and fill_value is not None: | ||
msg = 'Unable to fill values because {0} cannot contain NA' | ||
raise ValueError(msg.format(self.__class__.__name__)) | ||
taken = self.values.take(indices) | ||
taken = self._take_values.take(indices) | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. PeriodIndex can hold NA values, so how is influencing PeriodIndex? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. _take_values will be iNaT in locations where values are NaT, after passing to shallow_copy/simple_new, we get NaT back |
||
return self._shallow_copy(taken) | ||
|
||
def _assert_take_fillable(self, values, indices, allow_fill=True, | ||
|
@@ -2929,7 +2940,8 @@ def difference(self, other): | |
self._assert_can_do_setop(other) | ||
|
||
if self.equals(other): | ||
return self._shallow_copy([]) | ||
# pass an empty array with the appropriate dtype | ||
return self._shallow_copy(self[:0]) | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. did you add a test for this? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Or this was needed due to the change in shallow_copy? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This is just because There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. No, I did not add a test, but can confirm this line is hit in the tests. |
||
|
||
other, result_name = self._convert_can_do_setop(other) | ||
|
||
|
@@ -2940,12 +2952,14 @@ def difference(self, other): | |
|
||
label_diff = np.setdiff1d(np.arange(this.size), indexer, | ||
assume_unique=True) | ||
the_diff = this.values.take(label_diff) | ||
the_diff = this._take_values.take(label_diff) | ||
try: | ||
the_diff = sorting.safe_sort(the_diff) | ||
except TypeError: | ||
pass | ||
|
||
if is_period_dtype(this): | ||
return this._shallow_copy(the_diff, name=result_name) | ||
return this._shallow_copy(the_diff, name=result_name, freq=None) | ||
|
||
def symmetric_difference(self, other, result_name=None): | ||
|
@@ -2994,11 +3008,11 @@ def symmetric_difference(self, other, result_name=None): | |
common_indexer = indexer.take((indexer != -1).nonzero()[0]) | ||
left_indexer = np.setdiff1d(np.arange(this.size), common_indexer, | ||
assume_unique=True) | ||
left_diff = this.values.take(left_indexer) | ||
left_diff = this._take_values.take(left_indexer) | ||
|
||
# {other} minus {this} | ||
right_indexer = (indexer == -1).nonzero()[0] | ||
right_diff = other.values.take(right_indexer) | ||
right_diff = other._take_values.take(right_indexer) | ||
|
||
the_diff = _concat._concat_compat([left_diff, right_diff]) | ||
try: | ||
|
@@ -3008,7 +3022,7 @@ def symmetric_difference(self, other, result_name=None): | |
|
||
attribs = self._get_attributes_dict() | ||
attribs['name'] = result_name | ||
if 'freq' in attribs: | ||
if 'freq' in attribs and not is_period_dtype(self): | ||
attribs['freq'] = None | ||
return self._shallow_copy_with_infer(the_diff, **attribs) | ||
|
||
|
@@ -3028,7 +3042,7 @@ def _get_unique_index(self, dropna=False): | |
if self.is_unique and not dropna: | ||
return self | ||
|
||
values = self.values | ||
values = self._take_values | ||
|
||
if not self.is_unique: | ||
values = self.unique() | ||
|
@@ -4678,7 +4692,7 @@ def dropna(self, how='any'): | |
raise ValueError("invalid how option: {0}".format(how)) | ||
|
||
if self.hasnans: | ||
return self._shallow_copy(self.values[~self._isnan]) | ||
return self._shallow_copy(self._take_values[~self._isnan]) | ||
return self._shallow_copy() | ||
|
||
def _evaluate_with_timedelta_like(self, other, op): | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -17,6 +17,7 @@ | |
from pandas.core.dtypes.common import ( | ||
ensure_int64, | ||
ensure_platform_int, | ||
is_period_dtype, | ||
is_categorical_dtype, | ||
is_object_dtype, | ||
is_hashable, | ||
|
@@ -1036,7 +1037,14 @@ def _get_level_values(self, level, unique=False): | |
labels = self.labels[level] | ||
if unique: | ||
labels = algos.unique(labels) | ||
filled = algos.take_1d(values._values, labels, | ||
|
||
if is_period_dtype(values): | ||
take_vals = values._ndarray_values | ||
else: | ||
# TODO: Why is this _values where elsewhere it is values? | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I'm probably responsible for using There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. OK. I'll clear this out in the next pass, assuming this goes through. |
||
# if it were values here we could use _take_values | ||
take_vals = values._values | ||
filled = algos.take_1d(take_vals, labels, | ||
fill_value=values._na_value) | ||
values = values._shallow_copy(filled) | ||
return values | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
setup
gets called with every round oftime_foo
. Doing this here is analogous to callingsetup_class
once.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is what
setup_cache
is for: https://asv.readthedocs.io/en/stable/writing_benchmarks.html#setup-and-teardown-functionsThere was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
setup_cache pickles and then loads in each step of the loop