reindex does not work for groupby series with DateTimeIndex #26209

Closed
ptmminh opened this issue Apr 25, 2019 · 4 comments · Fixed by #33638
Labels
Datetime Datetime data dtype good first issue Groupby Indexing Related to indexing on series/frames, not to indexes themselves Needs Tests Unit test(s) needed to prevent regressions

Comments

@ptmminh

ptmminh commented Apr 25, 2019

Code Sample, a copy-pastable example if possible

import pandas as pd
import numpy as np

# generate a simple df with weekly DateTimeIndex, group and a value
df = pd.DataFrame({
    'group':['Group1','Group2','Group3']*3, 
    'value':np.random.randint(100, 1000, size=9)}, 
    index=pd.date_range('1991-10-2',periods=3, freq='W-MON').repeat(3)
)

# this works as expected
new_value = df.groupby('group')[['value']].apply(
    lambda x: x.reindex(pd.date_range(x.index.min(), x.index.max(), freq='W-MON'), fill_value=0)
)

# this fails
new_value = df.groupby('group').value.apply(
    lambda x: x.reindex(pd.date_range(x.index.min(), x.index.max(), freq='W-MON'), fill_value=0)
)

Problem description

We have a DataFrame with a group column and a DatetimeIndex, and we want to reindex the values per group so that every group covers all the appropriate weekly dates. This works as expected when there is a gap for the reindexing to fill. However, it fails when there is no gap to fill:

ValueError: cannot reindex from a duplicate axis

The error can be worked around by selecting the column with the [[...]] syntax, which makes each group a DataFrame instead of a Series.

Expected Output

The original Series.
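
To illustrate why the original Series is the expected output, here is a minimal sketch of the per-group reindex on a single group, without any groupby. The helper name fill_weeks and the fixed values are assumptions made for illustration; they are not part of the report. With a missing week the reindex inserts a 0, and with no missing week it is effectively an identity operation.

```python
import pandas as pd

# One group's values on a weekly (Monday) DatetimeIndex with no missing weeks.
full_idx = pd.date_range("1991-10-07", periods=3, freq="W-MON")
no_gap = pd.Series([10, 20, 30], index=full_idx, name="value")

# The same group with the middle week missing.
with_gap = no_gap.drop(full_idx[1])


def fill_weeks(s):
    """Reindex a Series onto every Monday between its first and last date."""
    weeks = pd.date_range(s.index.min(), s.index.max(), freq="W-MON")
    return s.reindex(weeks, fill_value=0)


filled = fill_weeks(with_gap)    # the middle week is added with value 0
unchanged = fill_weeks(no_gap)   # nothing to fill, values come back as-is

print(filled.tolist())      # [10, 0, 30]
print(unchanged.tolist())   # [10, 20, 30]
```

The failure reported here concerns the second case, where every group's reindex returns its input unchanged.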

Output of pd.show_versions()

installed versions
------------------
commit: none
python: 3.6.8.final.0
python-bits: 64
os: linux
os-release: 4.4.0-43-microsoft
machine: x86_64
processor: x86_64
byteorder: little
lc_all: none
lang: en_us.utf-8
locale: en_us.utf-8

pandas: 0.24.2
pytest: 3.5.1
pip: 19.0.3
setuptools: 41.0.0
cython: 0.28.2
numpy: 1.14.3
scipy: 1.1.0
pyarrow: none
xarray: none
ipython: 6.4.0
sphinx: 1.7.4
patsy: 0.5.0
dateutil: 2.8.0
pytz: 2019.1
blosc: none
bottleneck: 1.2.1
tables: 3.4.3
numexpr: 2.6.5
feather: none
matplotlib: 2.2.2
openpyxl: 2.5.3
xlrd: 1.1.0
xlwt: 1.3.0
xlsxwriter: 1.0.4
lxml.etree: 4.2.1
bs4: 4.6.0
html5lib: 1.0.1
sqlalchemy: 1.2.7
pymysql: none
psycopg2: none
jinja2: 2.10
s3fs: none
fastparquet: none
pandas_gbq: none
pandas_datareader: none
gcsfs: None

@WillAyd
Member

WillAyd commented Apr 25, 2019

Yeah, it's strange that the Series selection won't allow you to do this. It must be something to do with how apply is executed on the groups. The code for this lives in pandas.core.groupby if you want to take a look; PRs would certainly be welcome.

(FYI I edited your original example to make it smaller and easier to debug)

@WillAyd WillAyd added Bug Groupby Datetime Datetime data dtype labels Apr 25, 2019
@mroeschke
Member

This looks to be fixed on master. It could use a test.

In [154]: new_value
Out[154]:
1991-10-07    840
1991-10-07    790
1991-10-07    378
1991-10-14    439
1991-10-14    824
1991-10-14    146
1991-10-21    397
1991-10-21    726
1991-10-21    343
Name: value, dtype: int64

In [155]: pd.__version__
Out[155]: '1.1.0.dev0+1068.g49bc8d8c9'

@mroeschke mroeschke added good first issue Needs Tests Unit test(s) needed to prevent regressions and removed Bug Groupby Datetime Datetime data dtype labels Apr 1, 2020
@CloseChoice
Member

CloseChoice commented Apr 14, 2020

This is not limited to DatetimeIndex. I could reproduce the error with pandas 0.24.2 and the following code:

import numpy as np
import pandas as pd

values = [1, 2, 3, 4]
indices = [1, 1, 2, 2]  # duplicate index labels, one per group
df = pd.DataFrame(
    {'group': ['Group1', 'Group2'] * 2, 'value': values},
    index=indices,
)
# raises ValueError on pandas 0.24.2
srs_grouped = df.groupby('group').value.apply(
    lambda x: x.reindex(np.arange(x.index.min(), x.index.max() + 1))
)

I created a branch and added a test with similar code. If that's fine and the issue is renamed, I will create an MR. Alternatively, I can add a test with a DatetimeIndex.
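
A regression test along these lines might look roughly like the sketch below. The test name and the helper reindex_helper are hypothetical, and this is not necessarily the test that was added to pandas. It checks only properties that should hold regardless of how a given pandas version labels or orders the combined result.

```python
import numpy as np
import pandas as pd


def reindex_helper(x):
    # Reindex one group onto every integer between its smallest and largest label.
    return x.reindex(np.arange(x.index.min(), x.index.max() + 1))


def test_apply_reindex_with_duplicate_index():
    # Hypothetical test name; the data mirrors the snippet above.
    values = [1, 2, 3, 4]
    indices = [1, 1, 2, 2]  # duplicate labels across the two groups
    df = pd.DataFrame(
        {"group": ["Group1", "Group2"] * 2, "value": values}, index=indices
    )

    # This call raised "ValueError: cannot reindex from a duplicate axis"
    # on pandas 0.24.2.
    result = df.groupby("group")["value"].apply(reindex_helper)

    # Version-independent checks: a Series comes back and no value is lost.
    assert isinstance(result, pd.Series)
    assert sorted(result.tolist()) == values


test_apply_reindex_with_duplicate_index()
```

A stricter test could compare against an exact expected Series, but the index of the result depends on the group_keys behavior of the pandas version in use, so that comparison is left out of this sketch.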

@simonjayhawkins
Member

This looks to be fixed on master.

fixed in #30679

be6a3bc is the first new commit
commit be6a3bc
Author: Jiaxiang
Date: Tue Jan 21 00:28:00 2020 +0800

BUG: groupby apply raises ValueError when groupby axis has duplicates and applied identity function (#30679)

@jreback jreback added this to the 1.1 milestone Apr 25, 2020
@jreback jreback added Groupby Indexing Related to indexing on series/frames, not to indexes themselves Datetime Datetime data dtype labels Apr 25, 2020