BUG: Empty PeriodIndex issues #13079


Status: Closed (wants to merge 1 commit)
6 changes: 6 additions & 0 deletions doc/source/whatsnew/v0.18.2.txt
@@ -176,4 +176,10 @@ Bug Fixes
- Bug in ``Period`` addition raises ``TypeError`` if ``Period`` is on right hand side (:issue:`13069`)
- Bug in ``Period`` and ``Series`` or ``Index`` comparison raises ``TypeError`` (:issue:`13200`)
- Bug in ``pd.set_eng_float_format()`` that would prevent NaN's from formatting (:issue:`11981`)
- Bug in ``.resample(..)`` with a ``PeriodIndex`` not changing its ``freq``
appropriately when empty (:issue:`13067`)
- Bug in ``PeriodIndex`` construction returning a ``float64`` index in some
circumstances (:issue:`13067`)
- Bug in ``.resample(..)`` with a ``PeriodIndex`` not retaining its type or name with an
  empty ``DataFrame`` (:issue:`13212`)
- Bug in ``groupby`` where ``apply`` returns different result depending on whether first result is ``None`` or not (:issue:`12824`)
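As a quick illustration of the construction fix above, an empty ``PeriodIndex`` should keep a period dtype rather than degrading to a ``float64`` index; a minimal check (output shown as on a recent pandas, the 0.18.x repr differed):

```python
import pandas as pd

# An empty PeriodIndex must keep its period dtype and freq; before the
# fix, some code paths returned a float64 index instead.
idx = pd.PeriodIndex([], freq='M')
print(len(idx), idx.dtype)   # 0 period[M] on a recent pandas
```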
39 changes: 22 additions & 17 deletions pandas/core/groupby.py
@@ -37,7 +37,7 @@
                          is_datetime_or_timedelta_dtype, is_bool,
                          is_bool_dtype, AbstractMethodError,
                          _maybe_fill)
-from pandas.core.config import option_context
+from pandas.core.config import option_context, is_callable
 import pandas.lib as lib
 from pandas.lib import Timestamp
 import pandas.tslib as tslib
@@ -643,9 +643,20 @@ def apply(self, func, *args, **kwargs):

         func = self._is_builtin_func(func)
 
-        @wraps(func)
-        def f(g):
-            return func(g, *args, **kwargs)
+        # this is needed so we don't try and wrap strings. If we could
+        # resolve functions to their callable functions prior, this
+        # wouldn't be needed
+        if args or kwargs:
+            if is_callable(func):
+
+                @wraps(func)
+                def f(g):
+                    return func(g, *args, **kwargs)
+            else:
+                raise ValueError('func must be a callable if args or '
+                                 'kwargs are supplied')
+        else:
+            f = func
Contributor Author:

This is messy. I think it would be better to resolve to functions before this (or throughout; not sure why we go to strings at all). But that's for discussion / another PR.

Contributor:

because the user can pass strings of course, e.g. `.transform('mean')`

This is fine. Is there a test where it hits the exception? IOW, what is a non-callable that actually gets passed here?
It shouldn't.

Contributor Author:

> what is a non-callable that actually gets passed here?

`'median'` was getting passed here, once the if/then branching upstream was removed for zero-length indexes.
Weirdly, the wrapping didn't fail on Python 3.

> is there a test where it hits the exception?

I don't think so, because I don't know of anything that passes a string and args.

What do you think about changing `func = self._is_builtin_func(func)` to convert strings to functions? Is there a function that would do that? Then we could do away with this hack.

Contributor (@jreback, May 18, 2016):

I am all for consolidating things a bit; sure, `_is_builtin_func` could just return wrappers. They have to be actual functions though (e.g. only return a wrapper if it's a function that is defined and NOT already a callable).

It's done, but might be done in several places.

You might want to do that in another PR (as these types of conversions happen in groupby, resample, and window functions); though do try to make it all in one place.

Contributor Author:

The wrapping here is probably OK.

I'm suggesting that we convert strings to functions here, rather than passing strings on to the next function. So the `func = self._is_builtin_func(func)` call converts `'median'` to the function `.median`, etc. (rather than just `sum`).

Or do we need to resolve which `median` function to convert into downstream?

And yup, will do in another PR. Overall some of this stuff feels pretty messy, but maybe that's just because I'm new to it.
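The string-vs-callable dispatch being discussed can be sketched standalone. This is a hypothetical mock: `make_applier` is an invented name, and the builtin `callable` stands in for pandas' internal `is_callable`:

```python
from functools import wraps

def make_applier(func, *args, **kwargs):
    # Only wrap real callables: a string like 'mean' is resolved to a
    # method later, so closing over it here would be wrong.
    if args or kwargs:
        if callable(func):
            @wraps(func)
            def f(g):
                return func(g, *args, **kwargs)
            return f
        raise ValueError('func must be a callable if args or '
                         'kwargs are supplied')
    return func

add5 = make_applier(lambda g, n: g + n, 5)
print(add5(10))              # 15
print(make_applier('mean'))  # bare string passes through: mean
```

A string combined with extra args is the one case that has to raise, since it can neither be wrapped nor passed through.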


         # ignore SettingWithCopy here in case the user mutates
         with option_context('mode.chained_assignment', None):
@@ -2675,7 +2686,7 @@ def _wrap_transformed_output(self, output, names=None):
     def _wrap_applied_output(self, keys, values, not_indexed_same=False):
         if len(keys) == 0:
             # GH #6265
-            return Series([], name=self.name)
+            return Series([], name=self.name, index=keys)
Contributor Author:

This works for Series. Will add an issue for DataFrame, which is materially more complicated.

Contributor:

Yeah, this is fine, as this is different for `DataFrameGroupBy`.
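The difference the `index=keys` argument makes can be seen directly; a small check, assuming a reasonably recent pandas:

```python
import pandas as pd

keys = pd.PeriodIndex([], freq='M', name='idx')

# Without an explicit index, an empty Series falls back to a default
# (Range)Index, so the PeriodIndex type and name are lost.
lost = pd.Series([], dtype=float, name='s')
kept = pd.Series([], dtype=float, name='s', index=keys)

print(type(lost.index).__name__)   # RangeIndex on a recent pandas
print(type(kept.index).__name__)   # PeriodIndex
```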


         def _get_index():
             if self.grouper.nkeys > 1:
@@ -3222,8 +3233,7 @@ def _wrap_applied_output(self, keys, values, not_indexed_same=False):
         from pandas.core.index import _all_indexes_same
 
         if len(keys) == 0:
-            # XXX
-            return DataFrame({})
+            return DataFrame(index=keys)
 
         key_names = self.grouper.names
 
@@ -3646,17 +3656,12 @@ def _gotitem(self, key, ndim, subset=None):
     def _wrap_generic_output(self, result, obj):
         result_index = self.grouper.levels[0]
 
-        if result:
-            if self.axis == 0:
-                result = DataFrame(result, index=obj.columns,
-                                   columns=result_index).T
-            else:
-                result = DataFrame(result, index=obj.index,
-                                   columns=result_index)
-        else:
-            result = DataFrame(result)
-
-        return result
+        if self.axis == 0:
+            return DataFrame(result, index=obj.columns,
+                             columns=result_index).T
+        else:
+            return DataFrame(result, index=obj.index,
+                             columns=result_index)

     def _get_data_to_aggregate(self):
         obj = self._obj_with_exclusions
3 changes: 2 additions & 1 deletion pandas/tests/indexing/test_indexing.py
@@ -4250,7 +4250,8 @@ def test_series_partial_set_period(self):
                 pd.Period('2011-01-03', freq='D')]
         exp = Series([np.nan, 0.2, np.nan],
                      index=pd.PeriodIndex(keys, name='idx'), name='s')
-        assert_series_equal(ser.loc[keys], exp, check_index_type=True)
+        result = ser.loc[keys]
+        assert_series_equal(result, exp)

Contributor:

you don't need the `check_index_type`; this defaults to `'equiv'`, which just equates Int64/RangeIndex, e.g. exact equivalents.
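That default is easy to demonstrate: with `check_index_type='equiv'` (the default), a `RangeIndex` and an equivalent integer `Index` compare equal (checked against a recent pandas):

```python
import pandas as pd
from pandas.testing import assert_series_equal

a = pd.Series([1, 2, 3])                             # RangeIndex
b = pd.Series([1, 2, 3], index=pd.Index([0, 1, 2]))  # integer Index

# Passes: the default 'equiv' treats the two index types as the same.
assert_series_equal(a, b)
```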

     def test_partial_set_invalid(self):

6 changes: 3 additions & 3 deletions pandas/tests/test_groupby.py
@@ -775,11 +775,11 @@ def test_agg_apply_corner(self):
         # DataFrame
         grouped = self.tsframe.groupby(self.tsframe['A'] * np.nan)
         exp_df = DataFrame(columns=self.tsframe.columns, dtype=float,
-                           index=pd.Index(
-                               [], dtype=np.float64))
+                           index=pd.Index([], dtype=np.float64))
         assert_frame_equal(grouped.sum(), exp_df, check_names=False)
         assert_frame_equal(grouped.agg(np.sum), exp_df, check_names=False)
-        assert_frame_equal(grouped.apply(np.sum), DataFrame({}, dtype=float))
+        assert_frame_equal(grouped.apply(np.sum), exp_df.iloc[:, :0],
+                           check_names=False)
 
     def test_agg_grouping_is_list_tuple(self):
         from pandas.core.groupby import Grouping
17 changes: 11 additions & 6 deletions pandas/tseries/period.py
@@ -17,9 +17,9 @@
 from pandas.core.base import _shared_docs
 
 import pandas.core.common as com
-from pandas.core.common import (isnull, _INT64_DTYPE, _maybe_box,
-                                _values_from_object, ABCSeries,
-                                is_integer, is_float, is_object_dtype)
+from pandas.core.common import (
+    isnull, _INT64_DTYPE, _maybe_box, _values_from_object, ABCSeries,
+    is_integer, is_float)
 from pandas import compat
 from pandas.compat.numpy import function as nv
 from pandas.util.decorators import Appender, cache_readonly, Substitution
@@ -271,10 +271,15 @@ def _from_arraylike(cls, data, freq, tz):

     @classmethod
     def _simple_new(cls, values, name=None, freq=None, **kwargs):
-        if not getattr(values, 'dtype', None):
+
+        if not com.is_integer_dtype(values):
Contributor:

hmm:

    if com.is_integer_dtype(values):
        values = np.array(values, dtype='int64', copy=False)
    else:
        ...  # what you have below

I think that we keep the data (internally) as int64; not real sure if that is a guarantee or not.

cc @sinhrks

Member:

Yes, internal data must be int dtype. So it should be either coerced or raised.

I think coercing float from int is OK, as long as it is consistent. I hadn't noticed the float coercion in the case below. Let us split it to a separate issue.

    pd.PeriodIndex([1.1], freq='M')
    # ValueError: Given date string not likely a datetime.
    pd.PeriodIndex(np.array([1.1]), freq='M')
    # PeriodIndex(['1970-02'], dtype='int64', freq='M')

ATM, a slight modification of @jreback's suggestion should work?

    if com.is_integer_dtype(values) or len(values) == 0:
        values = np.array(values, dtype='int64', copy=False)
    else:
        return PeriodIndex...

Contributor Author:

I think that'll recursively error: `PeriodIndex.__new__` will call `_simple_new` with the floats.

Although I'm not sure what's wrong with the current suggestion?

Contributor:

the problem is that if someone calls `_simple_new` with, say, `np.array([1, 2, 3], dtype='int32')`, it will work, but it shouldn't; it should be cast to int64

Contributor Author:

I've added an int64 coercion. Will that have performance implications?

Contributor:

no, that is fine; if it's already int64 it will just return (which is what you want), and coerce int32 -> int64 (which is also what we want)

             values = np.array(values, copy=False)
-        if is_object_dtype(values):
-            return PeriodIndex(values, name=name, freq=freq, **kwargs)
+            if (len(values) > 0 and com.is_float_dtype(values)):
+                raise TypeError("PeriodIndex can't take floats")
+            else:
+                return PeriodIndex(values, name=name, freq=freq, **kwargs)
+
+        values = np.array(values, dtype='int64', copy=False)
 
         result = object.__new__(cls)
         result._data = values
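The int64 coercion settled on above is cheap in the already-int64 case. With `np.asarray` (which behaves like `np.array(..., copy=False)` did on older NumPy; NumPy 2.0 made `copy=False` raise when a copy is required), the input comes back unchanged when no cast is needed, and only a widening like int32 -> int64 allocates a new array:

```python
import numpy as np

a64 = np.arange(3, dtype='int64')
a32 = np.arange(3, dtype='int32')

same = np.asarray(a64, dtype='int64')  # no cast needed: same object back
cast = np.asarray(a32, dtype='int64')  # int32 is widened into a new array

print(same is a64)   # True
print(cast.dtype)    # int64
```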
20 changes: 9 additions & 11 deletions pandas/tseries/resample.py
@@ -21,6 +21,7 @@
 from pandas.compat.numpy import function as nv
 
 from pandas.lib import Timestamp
+from pandas._period import IncompatibleFrequency
 import pandas.lib as lib
 import pandas.tslib as tslib

@@ -795,27 +796,27 @@ def _downsample(self, how, **kwargs):
         ax = self.ax
 
         new_index = self._get_new_index()
-        if len(new_index) == 0:
-            return self._wrap_result(self._selected_obj.reindex(new_index))
 
         # Start vs. end of period
         memb = ax.asfreq(self.freq, how=self.convention)
 
         if is_subperiod(ax.freq, self.freq):
             # Downsampling
-            rng = np.arange(memb.values[0], memb.values[-1] + 1)
-            bins = memb.searchsorted(rng, side='right')
+            if len(new_index) == 0:
+                bins = []
+            else:
+                rng = np.arange(memb.values[0], memb.values[-1] + 1)
+                bins = memb.searchsorted(rng, side='right')
             grouper = BinGrouper(bins, new_index)
             return self._groupby_and_aggregate(how, grouper=grouper)
         elif is_superperiod(ax.freq, self.freq):
             return self.asfreq()
         elif ax.freq == self.freq:
             return self.asfreq()
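The `searchsorted` binning in the downsampling branch can be shown in isolation; `memb` here stands in for the source periods expressed as target-frequency ordinals:

```python
import numpy as np

# Five source periods falling into target periods 3, 4 and 5.
memb = np.array([3, 3, 4, 4, 5])
rng = np.arange(memb[0], memb[-1] + 1)        # one entry per target period
bins = memb.searchsorted(rng, side='right')   # right edge of each bin
print(bins.tolist())   # [2, 4, 5]

# With no data, memb[0] would raise IndexError, which is why the change
# above short-circuits to bins = [] for an empty index.
```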
Member:

Use the `IncompatibleFrequency` error (it inherits from `ValueError`).

Member:

Thanks for the change. Is this tested somewhere?

Contributor Author:

Done


-        raise ValueError('Frequency {axfreq} cannot be '
-                         'resampled to {freq}'.format(
-                             axfreq=ax.freq,
-                             freq=self.freq))
+        raise IncompatibleFrequency(
+            'Frequency {} cannot be resampled to {}, as they are not '
+            'sub or super periods'.format(ax.freq, self.freq))
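The overall dispatch in `_downsample` reduces to a three-way check on the two frequencies. A hypothetical, simplified model (the `SUB` table and `resample_kind` are illustrative stand-ins, not pandas API; pandas uses `is_subperiod`/`is_superperiod` and its own error type):

```python
# Toy sub-period table: (finer, coarser) frequency pairs.
SUB = {('D', 'M'), ('D', 'Q'), ('M', 'Q'), ('M', 'A')}

def resample_kind(ax_freq, freq):
    if (ax_freq, freq) in SUB:     # finer -> coarser: aggregate into bins
        return 'downsample'
    if (freq, ax_freq) in SUB:     # coarser -> finer: asfreq then fill
        return 'upsample'
    if ax_freq == freq:
        return 'asfreq'
    raise ValueError('Frequency {} cannot be resampled to {}, as they are '
                     'not sub or super periods'.format(ax_freq, freq))

print(resample_kind('D', 'M'))   # downsample
print(resample_kind('M', 'D'))   # upsample
```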

     def _upsample(self, method, limit=None):
         """
@@ -838,9 +839,6 @@ def _upsample(self, method, limit=None):
         obj = self.obj
         new_index = self._get_new_index()
 
-        if len(new_index) == 0:
-            return self._wrap_result(self._selected_obj.reindex(new_index))
-
         # Start vs. end of period
         memb = ax.asfreq(self.freq, how=self.convention)

34 changes: 33 additions & 1 deletion pandas/tseries/tests/test_period.py
@@ -1742,13 +1742,45 @@ def test_constructor_datetime64arr(self):
         self.assertRaises(ValueError, PeriodIndex, vals, freq='D')
 
     def test_constructor_simple_new(self):
-        idx = period_range('2007-01', name='p', periods=20, freq='M')
+        idx = period_range('2007-01', name='p', periods=2, freq='M')
         result = idx._simple_new(idx, 'p', freq=idx.freq)
         self.assertTrue(result.equals(idx))
 
         result = idx._simple_new(idx.astype('i8'), 'p', freq=idx.freq)
         self.assertTrue(result.equals(idx))
 
+        result = idx._simple_new(
+            [pd.Period('2007-01', freq='M'), pd.Period('2007-02', freq='M')],
+            'p', freq=idx.freq)
+        self.assertTrue(result.equals(idx))
+
+        result = idx._simple_new(
+            np.array([pd.Period('2007-01', freq='M'),
+                      pd.Period('2007-02', freq='M')]),
+            'p', freq=idx.freq)
+        self.assertTrue(result.equals(idx))
+
+    def test_constructor_simple_new_empty(self):
+        # GH13079
+        idx = PeriodIndex([], freq='M', name='p')
+        result = idx._simple_new(idx, name='p', freq='M')
+        assert_index_equal(result, idx)
+
+    def test_constructor_simple_new_floats(self):
+        # GH13079
+        for floats in [[1.1], np.array([1.1])]:
+            with self.assertRaises(TypeError):
+                pd.PeriodIndex._simple_new(floats, freq='M')
+
+    def test_shallow_copy_empty(self):
+
+        # GH13067
+        idx = PeriodIndex([], freq='M')
+        result = idx._shallow_copy()
+        expected = idx
+
+        assert_index_equal(result, expected)
+
     def test_constructor_nat(self):
         self.assertRaises(ValueError, period_range, start='NaT',
                           end='2011-01-01', freq='M')