Skip to content

BUG: Empty PeriodIndex issues #13079

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 1 commit into from

Conversation

max-sixty
Copy link
Contributor

@max-sixty max-sixty commented May 4, 2016

closes #13067
closes #13212

@max-sixty max-sixty force-pushed the period_resample_0 branch from 8ba87d0 to db8b3ad Compare May 4, 2016 02:54
def test_shallow_copy_empty(self):

# GH13067
idx = PeriodIndex([], freq='M')
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can u also test _simple_new adding name attribute?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The test above does that, no?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

above doesn't check empty case. Though _shallow_copy internally uses _simple_new, I meant to guarantee _simple_new API.

@sinhrks sinhrks added Bug Period Period data type Resample resample method labels May 4, 2016
@sinhrks sinhrks added this to the 0.18.2 milestone May 4, 2016
@max-sixty max-sixty force-pushed the period_resample_0 branch from db8b3ad to 724c587 Compare May 4, 2016 06:02
try: # assume values / ints
values = np.array(values, copy=False, dtype='int64')
except TypeError: # objects / Periods
return PeriodIndex(values, name=name, freq=freq, **kwargs)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jreback Is this the best way to do this? Previously it was creating float64 arrays.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

that would probably work. IIUC, this is a non ndarray, that could be ints, or objects (meaning Periods, or coercable to).

have a look at com.is_period_arraylike as well. Though that actually traverses.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

com.is_period_arraylike didn't work, so we're back to the try except

@max-sixty max-sixty force-pushed the period_resample_0 branch 2 times, most recently from a486745 to 2cac538 Compare May 4, 2016 11:56
@@ -74,3 +74,7 @@ Performance Improvements

Bug Fixes
~~~~~~~~~

- Bug in ``PeriodIndex.resample`` not changing ``freq`` when empty
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

explain what changing is.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

also say: .resample(..) with a PeriodIndex

@max-sixty max-sixty force-pushed the period_resample_0 branch 3 times, most recently from c23aafb to cca6cd3 Compare May 4, 2016 16:27
@max-sixty
Copy link
Contributor Author

@jreback @sinhrks:
Any idea what this might be? It only happens on 2.7 and so I would need to rebuild to step through. No prob if nothing comes to mind though:

======================================================================
ERROR: test_resample_empty (pandas.tseries.tests.test_resample.TestPeriodIndex)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/travis/build/pydata/pandas/pandas/tseries/tests/test_resample.py", line 2054, in test_resample_empty
    result = getattr(s.resample(freq), method)()
  File "/home/travis/build/pydata/pandas/pandas/tseries/resample.py", line 511, in f
    return self._downsample(_method)
  File "/home/travis/build/pydata/pandas/pandas/tseries/resample.py", line 805, in _downsample
    return self._groupby_and_aggregate(how, grouper=grouper)
  File "/home/travis/build/pydata/pandas/pandas/tseries/resample.py", line 379, in _groupby_and_aggregate
    result = grouped.apply(how, *args, **kwargs)
  File "/home/travis/build/pydata/pandas/pandas/core/groupby.py", line 645, in apply
    @wraps(func)
  File "/home/travis/miniconda/envs/pandas/lib/python2.7/functools.py", line 33, in update_wrapper
    setattr(wrapper, attr, getattr(wrapped, attr))
AttributeError: 'str' object has no attribute '__module__'
----------------------------------------------------------------------
Ran 10105 tests in 537.776s
FAILED (SKIP=177, errors=1)

@jreback
Copy link
Contributor

jreback commented May 4, 2016

I remember seeing those. I think an attribute is being passed which isn't defined for that object
e.g. nunique for a DataFrame will cause this.

@@ -74,3 +74,8 @@ Performance Improvements

Bug Fixes
~~~~~~~~~

- Bug in ``.resample(..)`` with a ``PeriodIndex`` not changing its ``freq``
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

doesn't make sense; the bug is returning an incorrect freq when empty, right?

return PeriodIndex(values, name=name, freq=freq, **kwargs)
try: # assume values / ints
values = np.array(values, copy=False, dtype='int64')
except TypeError: # objects / Periods
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks to change the current behavior below.

pd.PeriodIndex._simple_new(np.array([pd.Period('2011-01', freq='M')]))
# PeriodIndex(['2011-01'], dtype='int64', freq='M')

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that is correct though, any reason we don't want that?

@MaximilianR as an aside pls run asv on --bench period to verify no changes (under py3) in perf

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is ugly and I would love a better way of doing it.
I do think the outcomes are correct though - @sinhrks do you disagree?
Open to nice implementation though

Copy link
Member

@sinhrks sinhrks May 7, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just wondering whether the following behavior (on current master) are the same. Also is no inconsistencies between list and ndarray.

pd.PeriodIndex._simple_new([pd.Period('2011-01', freq='M')])
# PeriodIndex(['2011-01'], dtype='int64', freq='M')

pd.PeriodIndex._simple_new(np.array([pd.Period('2011-01', freq='M')]))
# PeriodIndex(['2011-01'], dtype='int64', freq='M')
pd.PeriodIndex._simple_new([1.1], freq='M')
# ValueError: Ordinal must be an integer

pd.PeriodIndex._simple_new(np.array([1.1]), freq='M')
# ValueError: Ordinal must be an integer

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@sinhrks not sure I understand, those seem right.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think @sinhrks is asking whether the new implementation is consistent with those examples, from master. I'll check

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It wasn't working well. Changed & tested.

@jreback
Copy link
Contributor

jreback commented May 14, 2016

want to rebase/update

@max-sixty
Copy link
Contributor Author

max-sixty commented May 18, 2016

is that #13212 ?

Yup!

@max-sixty max-sixty force-pushed the period_resample_0 branch 2 times, most recently from 19d5178 to 5982e21 Compare May 19, 2016 17:58

result = getattr(s.resample(freq), method)()
expected = s
assert_index_equal(result.index, expected_index)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can u explicitly compare .freq also, as assert_index_equal doesn't (i'll submit another PR once this is fixed).

@max-sixty max-sixty force-pushed the period_resample_0 branch 4 times, most recently from 0672f28 to c65e83e Compare May 20, 2016 16:16
@max-sixty
Copy link
Contributor Author

Should I remove this test now @jreback? I think it's covered by the Base tests now https://github.com/pydata/pandas/blob/master/pandas/tseries/tests/test_resample.py#L1411

@max-sixty max-sixty force-pushed the period_resample_0 branch from c65e83e to 5bd2f40 Compare May 20, 2016 16:29
@jreback
Copy link
Contributor

jreback commented May 20, 2016

yes (just inspect it to make sure that we are covering the bases)

@max-sixty max-sixty force-pushed the period_resample_0 branch 2 times, most recently from 81cb7a3 to 1e539df Compare May 20, 2016 16:39
@max-sixty
Copy link
Contributor Author

OK, I moved a couple around to ensure we weren't losing anything

@@ -572,12 +573,18 @@ def test_agg_consistency(self):
result = r.agg({'r1': 'mean', 'r2': 'sum'})
assert_frame_equal(result, expected)

def test_raises_on_non_datetimelike_index(self):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you put this in Base? (e.g. its a generic test)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done. I had thought generic was to run tests across a number of types, while this doesn't depend on the index type.

@max-sixty max-sixty force-pushed the period_resample_0 branch from 4a5fa12 to 8c7b9db Compare May 21, 2016 00:14
@max-sixty
Copy link
Contributor Author

@jreback @sinhrks Green!

@jreback
Copy link
Contributor

jreback commented May 21, 2016

thanks @MaximilianR @sinhrks this is great8!

small issue. I am not convinced that we are testing IncompatibleFrequency fully, IOW we may be just testing for ValueError. But that's a minor point & can be done as a followup.

@sinhrks
Copy link
Member

sinhrks commented May 21, 2016

thanks @MaximilianR :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Bug Period Period data type Resample resample method
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Empty DataFrame with PeriodIndex doesn't maintain index type on resample
3 participants