-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
BUG: Empty PeriodIndex issues #13079
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -37,7 +37,7 @@ | |
is_datetime_or_timedelta_dtype, is_bool, | ||
is_bool_dtype, AbstractMethodError, | ||
_maybe_fill) | ||
from pandas.core.config import option_context | ||
from pandas.core.config import option_context, is_callable | ||
import pandas.lib as lib | ||
from pandas.lib import Timestamp | ||
import pandas.tslib as tslib | ||
|
@@ -643,9 +643,20 @@ def apply(self, func, *args, **kwargs): | |
|
||
func = self._is_builtin_func(func) | ||
|
||
@wraps(func) | ||
def f(g): | ||
return func(g, *args, **kwargs) | ||
# this is needed so we don't try and wrap strings. If we could | ||
# resolve functions to their callable functions prior, this | ||
# wouldn't be needed | ||
if args or kwargs: | ||
if is_callable(func): | ||
|
||
@wraps(func) | ||
def f(g): | ||
return func(g, *args, **kwargs) | ||
else: | ||
raise ValueError('func must be a callable if args or ' | ||
'kwargs are supplied') | ||
else: | ||
f = func | ||
|
||
# ignore SettingWithCopy here in case the user mutates | ||
with option_context('mode.chained_assignment', None): | ||
|
@@ -2675,7 +2686,7 @@ def _wrap_transformed_output(self, output, names=None): | |
def _wrap_applied_output(self, keys, values, not_indexed_same=False): | ||
if len(keys) == 0: | ||
# GH #6265 | ||
return Series([], name=self.name) | ||
return Series([], name=self.name, index=keys) | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This works for There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. yeah this is fine as this is different for |
||
|
||
def _get_index(): | ||
if self.grouper.nkeys > 1: | ||
|
@@ -3222,8 +3233,7 @@ def _wrap_applied_output(self, keys, values, not_indexed_same=False): | |
from pandas.core.index import _all_indexes_same | ||
|
||
if len(keys) == 0: | ||
# XXX | ||
return DataFrame({}) | ||
return DataFrame(index=keys) | ||
|
||
key_names = self.grouper.names | ||
|
||
|
@@ -3646,17 +3656,12 @@ def _gotitem(self, key, ndim, subset=None): | |
def _wrap_generic_output(self, result, obj): | ||
result_index = self.grouper.levels[0] | ||
|
||
if result: | ||
if self.axis == 0: | ||
result = DataFrame(result, index=obj.columns, | ||
columns=result_index).T | ||
else: | ||
result = DataFrame(result, index=obj.index, | ||
columns=result_index) | ||
if self.axis == 0: | ||
return DataFrame(result, index=obj.columns, | ||
columns=result_index).T | ||
else: | ||
result = DataFrame(result) | ||
|
||
return result | ||
return DataFrame(result, index=obj.index, | ||
columns=result_index) | ||
|
||
def _get_data_to_aggregate(self): | ||
obj = self._obj_with_exclusions | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -4250,7 +4250,8 @@ def test_series_partial_set_period(self): | |
pd.Period('2011-01-03', freq='D')] | ||
exp = Series([np.nan, 0.2, np.nan], | ||
index=pd.PeriodIndex(keys, name='idx'), name='s') | ||
assert_series_equal(ser.loc[keys], exp, check_index_type=True) | ||
result = ser.loc[keys] | ||
assert_series_equal(result, exp) | ||
|
||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. you don't need the |
||
def test_partial_set_invalid(self): | ||
|
||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -17,9 +17,9 @@ | |
from pandas.core.base import _shared_docs | ||
|
||
import pandas.core.common as com | ||
from pandas.core.common import (isnull, _INT64_DTYPE, _maybe_box, | ||
_values_from_object, ABCSeries, | ||
is_integer, is_float, is_object_dtype) | ||
from pandas.core.common import ( | ||
isnull, _INT64_DTYPE, _maybe_box, _values_from_object, ABCSeries, | ||
is_integer, is_float) | ||
from pandas import compat | ||
from pandas.compat.numpy import function as nv | ||
from pandas.util.decorators import Appender, cache_readonly, Substitution | ||
|
@@ -271,10 +271,15 @@ def _from_arraylike(cls, data, freq, tz): | |
|
||
@classmethod | ||
def _simple_new(cls, values, name=None, freq=None, **kwargs): | ||
if not getattr(values, 'dtype', None): | ||
|
||
if not com.is_integer_dtype(values): | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. hmm:
I think that we keep the data (internally) as int64, not real sure if that is a guarantee or not. cc @sinhrks There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Yes internal data must be I think coercing float from int is OK, as long as it is consistent. I haven't noticed float coercion in below case. Let us split it to separate issue.
ATM, slight modification of @jereback suggestion should work?
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I think that'll recursively error - Although I'm not sure what's wrong with the current suggestion? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. the problem is if someone calls There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I've added an int64 coercion. Will that have performance implications? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. no that is fine; if its already int64 it will just return (which is what you want), and coerce int32 -> int64 (which is also what we want) |
||
values = np.array(values, copy=False) | ||
if is_object_dtype(values): | ||
return PeriodIndex(values, name=name, freq=freq, **kwargs) | ||
if (len(values) > 0 and com.is_float_dtype(values)): | ||
raise TypeError("PeriodIndex can't take floats") | ||
else: | ||
return PeriodIndex(values, name=name, freq=freq, **kwargs) | ||
|
||
values = np.array(values, dtype='int64', copy=False) | ||
|
||
result = object.__new__(cls) | ||
result._data = values | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -21,6 +21,7 @@ | |
from pandas.compat.numpy import function as nv | ||
|
||
from pandas.lib import Timestamp | ||
from pandas._period import IncompatibleFrequency | ||
import pandas.lib as lib | ||
import pandas.tslib as tslib | ||
|
||
|
@@ -795,27 +796,27 @@ def _downsample(self, how, **kwargs): | |
ax = self.ax | ||
|
||
new_index = self._get_new_index() | ||
if len(new_index) == 0: | ||
return self._wrap_result(self._selected_obj.reindex(new_index)) | ||
|
||
# Start vs. end of period | ||
memb = ax.asfreq(self.freq, how=self.convention) | ||
|
||
if is_subperiod(ax.freq, self.freq): | ||
# Downsampling | ||
rng = np.arange(memb.values[0], memb.values[-1] + 1) | ||
bins = memb.searchsorted(rng, side='right') | ||
if len(new_index) == 0: | ||
bins = [] | ||
else: | ||
rng = np.arange(memb.values[0], memb.values[-1] + 1) | ||
bins = memb.searchsorted(rng, side='right') | ||
grouper = BinGrouper(bins, new_index) | ||
return self._groupby_and_aggregate(how, grouper=grouper) | ||
elif is_superperiod(ax.freq, self.freq): | ||
return self.asfreq() | ||
elif ax.freq == self.freq: | ||
return self.asfreq() | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Use There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Thanks for the change. Is this tested in somewhere? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Done |
||
|
||
raise ValueError('Frequency {axfreq} cannot be ' | ||
'resampled to {freq}'.format( | ||
axfreq=ax.freq, | ||
freq=self.freq)) | ||
raise IncompatibleFrequency( | ||
'Frequency {} cannot be resampled to {}, as they are not ' | ||
'sub or super periods'.format(ax.freq, self.freq)) | ||
|
||
def _upsample(self, method, limit=None): | ||
""" | ||
|
@@ -838,9 +839,6 @@ def _upsample(self, method, limit=None): | |
obj = self.obj | ||
new_index = self._get_new_index() | ||
|
||
if len(new_index) == 0: | ||
return self._wrap_result(self._selected_obj.reindex(new_index)) | ||
|
||
# Start vs. end of period | ||
memb = ax.asfreq(self.freq, how=self.convention) | ||
|
||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is messy. I think would be better to resolve to functions before this (or throughout - not sure why we go to strings at all). But that's for discussion / another PR
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
because the user can pass strings of course. eg.
.transform('mean')
This is fine, is there a test where it hits the exception? IOW what is a non-callable that actually gets passed here?
It shouldn't
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
'median'
was getting passed here, once the if / then branching upstream was removed for zero-length indexes.Weirdly, the wrapping didn't fail on Python3.
I don't think so, because I don't know anything that passes a string and args.
What do you think about changing
func = self._is_builtin_func(func)
to convert strings to functions? Is there a function that would do that? Then we could do away with this hack.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am all for consolidating things a bit, sure
_is_builtin_func
could just return wrappers; have to be actual functions though (e.g. only return a wrapper if its a function that is defined and NOT a callable already).Its done but might be done in several places.
You might want to do that in another PR. (as these types of conversions happen in groupby,resample,window functions); though did try to make it all in one place.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The wrapping here is probably OK.
I'm suggesting that we convert strings to functions here, rather than passing strings onto the next function. So the
func = self._is_builtin_func(func)
call converts'median'
to the function.median
etc (rather than justsum
).Or do we need to resolve which
median
function to convert into downstream?And yup, will do in another PR. Overall some of this stuff feels pretty messy, but maybe that's just because I'm new to it.