Skip to content

BUG: Period resample with length=0 doesn't set freq #13067

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 1 commit into from

Conversation

sinhrks
Copy link
Member

@sinhrks sinhrks commented May 3, 2016

  • tests added / passed
  • passes git diff upstream/master | flake8 --diff
  • whatsnew entry

given freq is not applied if data is empty.

s = pd.Series(index=pd.PeriodIndex([], freq='D'))
s.resample('M').mean()
# Series([], Freq: D, dtype: float64)

Expected

s.resample('M').mean()
# Series([], Freq: M, dtype: float64)

@sinhrks sinhrks added Bug Period Period data type Resample resample method labels May 3, 2016
@sinhrks sinhrks added this to the 0.18.2 milestone May 3, 2016
@jreback
Copy link
Contributor

jreback commented May 3, 2016

related to any of: #12871 ?

@sinhrks
Copy link
Member Author

sinhrks commented May 3, 2016

Related to #12868. It must change freq also.

CC: @MaximilianR

@max-sixty
Copy link
Contributor

Thanks for spotting this @sinhrks. This was my change so mea culpa.

Had a quick look and there are a few odd things going on:

  1. Resampling an empty Series with an empty PeriodIndex fails to convert freq for PeriodIndex-ed Series, but works fine for non-PeriodIndex-ed Series:
In [34]: pd.Series(range(4)).reindex(pd.PeriodIndex([],freq='M'))
Out[34]: Series([], Freq: M, dtype: int64)
# works

In [43]: pd.Series([], index=pd.PeriodIndex([], freq='D')).reindex(pd.PeriodIndex([], freq='M'))
Out[43]: Series([], Freq: D, dtype: float64)
# freq is still 'D' - doesn't work

...if this worked, the current resampler implementation would work as expected

  1. _get_new_index is creating a float64 index, where it should be int64:
In [6]: s = pd.Series(index=pd.PeriodIndex([], freq='D'))
In [9]: resampler=s.resample('M')
In [13]: resampler._get_new_index()
Out[13]: PeriodIndex([], dtype='float64', freq='M')

...not sure if this is having any real consequences though

Would be better to have these root issues solved rather than add a fix for the symptoms here, but a fix for the symptoms is better than nothing.

@codecov-io
Copy link

codecov-io commented May 3, 2016

Current coverage is 84.14%

Merging #13067 into master will increase coverage by +<.01%

@@             master     #13067   diff @@
==========================================
  Files           137        137          
  Lines         50227      50230     +3   
  Methods           0          0          
  Messages          0          0          
  Branches          0          0          
==========================================
+ Hits          42256      42264     +8   
+ Misses         7971       7966     -5   
  Partials          0          0          
  1. File pandas/_version.py (not in diff) was modified. more
    • Misses -5
    • Partials 0
    • Hits +5

Powered by Codecov. Last updated by 87b0f4d...df13ccc

@sinhrks
Copy link
Member Author

sinhrks commented May 3, 2016

@MaximilianR I think .reindex should raise IncompatibleFrequency if freq are mismatched, even if its length is 0.

# on current master
s = pd.Series(index=pd.PeriodIndex(['1970-01', '1970-02', '1970-03'], freq='M'))
s.reindex(pd.PeriodIndex(['1970-01-01'], freq='D'))
# IncompatibleFrequency: Input has different freq=D from PeriodIndex(freq=M)

@max-sixty
Copy link
Contributor

I think .reindex shuld raise IncompatibleFrequency if freq are mismatched, even if its length is 0.

Agreed

@max-sixty
Copy link
Contributor

max-sixty commented May 4, 2016

I removed a bunch of the if/thens and generally cleared some stuff up, including solving the issues above. The thrust is to solve the root causes of why empty indexes need to be handled differently, rather than adding more special casing

@sinhrks do you want to use this PR for this, or do you want me to start another? Assuming you think it's an improvement

From f1e9a6eaa8608cc21a6ca52d5ce6888ced7f4181 Mon Sep 17 00:00:00 2001
From: Maximilian Roos 
Date: Tue, 3 May 2016 21:18:21 -0400
Subject: [PATCH] Lots of clearing up PeriodIndex resampling behaviour

---
 pandas/core/groupby.py                |  2 +-
 pandas/tseries/period.py              |  2 +-
 pandas/tseries/resample.py            | 18 ++++++------------
 pandas/tseries/tests/test_resample.py |  4 +++-
 4 files changed, 11 insertions(+), 15 deletions(-)

diff --git a/pandas/core/groupby.py b/pandas/core/groupby.py
index 7a47911..3d197ac 100644
--- a/pandas/core/groupby.py
+++ b/pandas/core/groupby.py
@@ -2669,7 +2669,7 @@ class SeriesGroupBy(GroupBy):
     def _wrap_applied_output(self, keys, values, not_indexed_same=False):
         if len(keys) == 0:
             # GH #6265
-            return Series([], name=self.name)
+            return Series([], name=self.name, index=Index(keys))

         def _get_index():
             if self.grouper.nkeys > 1:
diff --git a/pandas/tseries/period.py b/pandas/tseries/period.py
index 478b255..61e0e85 100644
--- a/pandas/tseries/period.py
+++ b/pandas/tseries/period.py
@@ -271,7 +271,7 @@ class PeriodIndex(DatelikeOps, DatetimeIndexOpsMixin, Int64Index):
     @classmethod
     def _simple_new(cls, values, name=None, freq=None, **kwargs):
         if not getattr(values, 'dtype', None):
-            values = np.array(values, copy=False)
+            values = np.array(values, copy=False, dtype='int64')
         if is_object_dtype(values):
             return PeriodIndex(values, name=name, freq=freq, **kwargs)

diff --git a/pandas/tseries/resample.py b/pandas/tseries/resample.py
index 167c6c1..2240ab6 100644
--- a/pandas/tseries/resample.py
+++ b/pandas/tseries/resample.py
@@ -789,19 +789,17 @@ class PeriodIndexResampler(DatetimeIndexResampler):
         ax = self.ax

         new_index = self._get_new_index()
-        if len(new_index) == 0:
-            result = self._selected_obj
-            if isinstance(self._selected_obj.index, PeriodIndex):
-                result = result.asfreq(self.freq, how=self.convention)
-            return self._wrap_result(result.reindex(new_index))

         # Start vs. end of period
         memb = ax.asfreq(self.freq, how=self.convention)

         if is_subperiod(ax.freq, self.freq):
             # Downsampling
-            rng = np.arange(memb.values[0], memb.values[-1] + 1)
-            bins = memb.searchsorted(rng, side='right')
+            if len(new_index) == 0:
+                bins = []
+            else:
+                rng = np.arange(memb.values[0], memb.values[-1] + 1)
+                bins = memb.searchsorted(rng, side='right')
             grouper = BinGrouper(bins, new_index)
             return self._groupby_and_aggregate(how, grouper=grouper)
         elif is_superperiod(ax.freq, self.freq):
@@ -811,8 +809,7 @@ class PeriodIndexResampler(DatetimeIndexResampler):

         raise ValueError('Frequency {axfreq} cannot be '
                          'resampled to {freq}'.format(
-                             axfreq=ax.freq,
-                             freq=self.freq))
+                             axfreq=ax.freq, freq=self.freq))

     def _upsample(self, method, limit=None):
         """
@@ -835,9 +832,6 @@ class PeriodIndexResampler(DatetimeIndexResampler):
         obj = self.obj
         new_index = self._get_new_index()

-        if len(new_index) == 0:
-            return self._wrap_result(self._selected_obj.reindex(new_index))
-
         # Start vs. end of period
         memb = ax.asfreq(self.freq, how=self.convention)

diff --git a/pandas/tseries/tests/test_resample.py b/pandas/tseries/tests/test_resample.py
index c796277..346c546 100644
--- a/pandas/tseries/tests/test_resample.py
+++ b/pandas/tseries/tests/test_resample.py
@@ -2053,9 +2053,11 @@ class TestPeriodIndex(Base, tm.TestCase):

             for method in resample_methods:
                 result = getattr(s.resample(freq), method)()
-                assert_series_equal(result, expected)
+                if method!='ohlc':  # returns df
+                    assert_series_equal(result, expected, check_dtype=False)
                 self.assertEqual(result.index.freq, freq)

+
     def test_resample_count(self):

         # GH12774
-- 
2.6.3

@sinhrks
Copy link
Member Author

sinhrks commented May 4, 2016

If yours includes the same fix in a better way, please issue another PR. I'll close this.

@max-sixty max-sixty mentioned this pull request May 4, 2016
4 tasks
@jreback
Copy link
Contributor

jreback commented May 5, 2016

superseded by #13079

@jreback jreback closed this May 5, 2016
@sinhrks sinhrks deleted the period_resample_0 branch May 5, 2016 20:37
jreback pushed a commit that referenced this pull request May 21, 2016
closes #13067
closes #13212

Author: Maximilian Roos <[email protected]>

Closes #13079 from MaximilianR/period_resample_0 and squashes the following commits:

8c7b9db [Maximilian Roos] empty PeriodIndex issues
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Bug Period Period data type Resample resample method
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants