BUG: I got empty Dataframe with index from the summation of empty Dataframe with MultiIndex #40626

Closed
tkaewplik opened this issue Mar 25, 2021 · 1 comment
Labels
Bug, Duplicate Report, Groupby

Comments

@tkaewplik
tkaewplik commented Mar 25, 2021

I'm not sure whether the current behavior is right, so I would like to ask.

With a non-empty DataFrame, I can use sum(level=['Project', 'Task']) to get a DataFrame with a MultiIndex, as I expect. But when I sum an empty DataFrame, I get a DataFrame with a plain Index instead of a MultiIndex, and it has no index names or column names.

Is this the correct behavior, or is it a bug? If it is correct, do you have any suggestions for how to handle it?

Thank you in advance.

Code Sample

import pandas as pd

print(pd.__version__)
# non-empty data
data = pd.DataFrame([(1,1,1,1,1,1),(2,2,2,2,2,2)],columns=('Project', 'Task', 'Increasing Length', 'Start Period', 'Period', 'Value'))
data.set_index(['Project', 'Task', 'Increasing Length', 'Start Period', 'Period'], inplace=True)
sum_data = data.sum(level=['Project', 'Task'])
print(sum_data)

# empty data
empty_data = pd.DataFrame([],columns=('Project', 'Task', 'Increasing Length', 'Start Period', 'Period', 'Value'))
empty_data.set_index(['Project', 'Task', 'Increasing Length', 'Start Period', 'Period'], inplace=True)
# sum of empty data
sum_empty_data = empty_data.sum(level=['Project', 'Task'])
print(sum_empty_data)

# my expectation
expected_sum_empty_data = pd.DataFrame([],columns=('Project', 'Task', 'Value'))
expected_sum_empty_data.set_index(['Project', 'Task'], inplace=True)
print(expected_sum_empty_data)

Expected Output

image

Installed Version

python : 3.8.8
pandas : 1.2.3
numpy : 1.20.1
OS: Ubuntu 20.04.1 LTS (docker)

@rhshadrach
Member

Thanks for the report! This is a duplicate of #26411 which was fixed and will be released in 1.3. Running on master:

empty_data = pd.DataFrame([],columns=('Project', 'Task', 'Increasing Length', 'Start Period', 'Period', 'Value'))
empty_data.set_index(['Project', 'Task', 'Increasing Length', 'Start Period', 'Period'], inplace=True)
sum_empty_data = empty_data.sum(level=['Project', 'Task'])
print(sum_empty_data.index)
print(sum_empty_data.columns)

gives

MultiIndex([], names=['Project', 'Task'])
Index(['Value'], dtype='object')

Reply here if I've missed something and we can reopen.
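For readers on newer pandas: the level argument of DataFrame.sum was deprecated in pandas 1.3 and removed in 2.0, and the documented replacement is a groupby on the index levels. Below is a minimal sketch of that spelling applied to the frames from this report. The helper name sum_by_levels is hypothetical (it is not a pandas function), and the empty frame is built with explicit int64 dtypes, which is an assumption made here for a predictable result; the original report used untyped columns.

```python
import pandas as pd

INDEX_COLS = ["Project", "Task", "Increasing Length", "Start Period", "Period"]


def sum_by_levels(df, levels):
    """Sum rows that share the same values in the given index levels.

    Hypothetical helper for illustration; it only wraps groupby(level=...).sum().
    """
    return df.groupby(level=levels).sum()


# Non-empty frame with the same two rows as in the report.
data = pd.DataFrame(
    [(1, 1, 1, 1, 1, 1), (2, 2, 2, 2, 2, 2)],
    columns=INDEX_COLS + ["Value"],
).set_index(INDEX_COLS)

# Empty frame with the same layout. The int64 dtypes are an assumption;
# the report created the columns without specifying dtypes.
empty_data = pd.DataFrame(
    {name: pd.Series(dtype="int64") for name in INDEX_COLS + ["Value"]}
).set_index(INDEX_COLS)

sum_data = sum_by_levels(data, ["Project", "Task"])
sum_empty_data = sum_by_levels(empty_data, ["Project", "Task"])

print(sum_data)
print(sum_empty_data.index.names)
print(list(sum_empty_data.columns))
```

On pandas 1.3 or later, the empty result should keep the two index names and the Value column, matching the output shown above for master.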

@rhshadrach added the Duplicate Report label on Mar 25, 2021