BUG: inconsistent behaviour for empty DataFrames #48327
ntachukwu commented Aug 31, 2022
- Closes API: Specify the behaviour for operating on empty objects #47959
- Tests added
Hello @Th3nn3ss! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found: There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻 Comment last updated at 2022-09-08 07:55:53 UTC
@Th3nn3ss Thanks for the PR. Can you give a bit of context? Is the test you are adding currently failing (because the behaviour is not fully consistent)? Was showing that the goal, and would you like to fix the behaviour here as well?
pandas/tests/groupby/test_apply.py (outdated)
@@ -1,3 +1,4 @@
import datetime as dt
The datetime module is already imported below as well (so you can update your code to use the existing import)
pandas/tests/groupby/test_apply.py (outdated)
df3 = df2.set_index("date").groupby("a", group_keys=True).apply(lambda x: x)
expected = df3.iloc[:0]  # An empty dataframe

tm.assert_equal(result, expected)
Suggested change:
- tm.assert_equal(result, expected)
+ tm.assert_frame_equal(result, expected)
@jorisvandenbossche Yes, the idea is to show some inconsistencies with empty inputs through failing tests. I would love to fix them.
result_empty = pd.DataFrame([])
expected_empty = pd.DataFrame(pd.Series([]))

tm.assert_frame_equal(result_empty, expected_empty)
Error returned
E AssertionError: DataFrame are different
E
E DataFrame shape mismatch
E [left]: (0, 0)
E [right]: (0, 1)
pandas/tests/frame/test_constructors.py:3211: AssertionError
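The shape mismatch above can be reproduced in isolation. A minimal sketch, assuming the constructor behaviour shown in the error; the explicit `dtype="float64"` is added here only to avoid the empty-Series default-dtype warning and is not part of the original test:

```python
import pandas as pd

# An empty list yields a frame with no rows and no columns.
from_list = pd.DataFrame([])

# An unnamed empty Series is treated as a single column (labelled 0) with no rows.
from_series = pd.DataFrame(pd.Series([], dtype="float64"))

print(from_list.shape)            # (0, 0)
print(from_series.shape)          # (0, 1)
print(list(from_series.columns))  # [0]
```

This matches the `[left]: (0, 0)` / `[right]: (0, 1)` mismatch reported by the test.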
result = pd.DataFrame({"a": [], "b": []})
expected = pd.DataFrame(columns=["a", "b"])

tm.assert_frame_equal(result, expected)
Error
E AssertionError: DataFrame.index are different
E
E DataFrame.index classes are different
E [left]: RangeIndex(start=0, stop=0, step=1)
E [right]: Index([], dtype='object')
pandas/_testing/asserters.py:318: AssertionError
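To see which constructor produces which row index, one can print the index class of each frame. A sketch; the class printed for the `columns=` spelling depends on the pandas version (the error above, from the version used in this PR, shows an object-dtype `Index`), so the comments only state what does not vary:

```python
import pandas as pd

# Dict of empty lists: the row index is a default RangeIndex.
from_dict = pd.DataFrame({"a": [], "b": []})

# Column labels only, no data: the row index class has varied across pandas versions.
from_columns = pd.DataFrame(columns=["a", "b"])

print(type(from_dict.index).__name__)     # RangeIndex
print(type(from_columns.index).__name__)  # version dependent

# Shape and column labels agree regardless of the index class.
print(from_dict.shape == from_columns.shape)                  # True
print(list(from_dict.columns) == list(from_columns.columns))  # True
```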
with tm.assert_produces_warning(FutureWarning):
    # Lists do not warn the user
    result = pd.DataFrame({"a": [], "b": []})
Error
E AssertionError: Did not see expected warning of class 'FutureWarning'
pandas/_testing/_warnings.py:143: AssertionError
>>> df_with_series = pd.DataFrame({"a": pd.Series([]), "b": pd.Series([])}) # Prints a warning
<stdin>:1: FutureWarning: The default dtype for empty Series will be 'object' instead of 'float64' in a future version. Specify a dtype explicitly to silence this warning.
>>> df_with_list = pd.DataFrame({"a": [], "b": []}) # Does not print a warning
>>> tm.assert_frame_equal(df_with_series, df_with_list)  # Both DataFrames are equal
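One way to sidestep the warning, and the dependence on the default dtype of an empty Series, is to pass the dtype explicitly in both spellings. A sketch; `float64` is an arbitrary choice for illustration, and escalating `FutureWarning` to an error is only there to make any default-dtype warning visible:

```python
import warnings

import pandas as pd

with warnings.catch_warnings():
    # Turn FutureWarning into an exception so a default-dtype warning would fail loudly.
    warnings.simplefilter("error", FutureWarning)
    df_with_series = pd.DataFrame(
        {"a": pd.Series([], dtype="float64"), "b": pd.Series([], dtype="float64")}
    )
    df_with_list = pd.DataFrame({"a": [], "b": []}, dtype="float64")

# With the dtype fixed explicitly, both constructions compare equal.
pd.testing.assert_frame_equal(df_with_series, df_with_list)
print(df_with_series.dtypes.tolist())
```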
df3 = df2.set_index("date").groupby("a", group_keys=True).apply(lambda x: x)
expected = df3.iloc[:0]  # An empty dataframe

tm.assert_frame_equal(result, expected)
Error
E AssertionError: DataFrame.index are different
E
E DataFrame.index classes are different
E [left]: Int64Index([], dtype='int64', name='a')
E [right]: MultiIndex([], names=['a', 'date'])
pandas/_testing/asserters.py:318: AssertionError
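For reference, this is the index the test expects: an identity `apply` with `group_keys=True` prepends the group key, so the non-empty result carries a two-level `MultiIndex`, and slicing it with `iloc[:0]` keeps those levels. A sketch with made-up data standing in for `df2` (its definition is not shown in the thread); column `b` is selected explicitly so the grouping column is not passed to the function:

```python
import pandas as pd

# Hypothetical stand-in for df2: a date column, a grouping key "a", and a value "b".
df2 = pd.DataFrame(
    {
        "date": pd.to_datetime(["2022-01-01", "2022-01-02", "2022-01-03"]),
        "a": [1, 1, 2],
        "b": [10.0, 20.0, 30.0],
    }
)

# Identity apply with group_keys=True: "a" becomes the outer index level.
df3 = df2.set_index("date").groupby("a", group_keys=True)[["b"]].apply(lambda x: x)

# Slicing to zero rows keeps the MultiIndex and its level names.
expected = df3.iloc[:0]
print(list(expected.index.names))  # ['a', 'date']
```

The empty input in the failing test instead produced a single-level index named `a`, which is the inconsistency reported above.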
def test_groupby_sample_with_empty_inputs():
    df = DataFrame({"a": [], "b": []})

    gb_df = df.groupby("a").sample()
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
Input In [1], in <cell line: 3>()
1 empty_df = pd.DataFrame({'a': [], 'b': []})
----> 2 empty_df.groupby('a').sample()
File ~.../pandas/core/groupby/groupby.py:4284, in sample(self, n, frac, replace, weights, random_state)
4275 assert frac is not None
4276 sample_size = round(frac * group_size)
4278 grp_sample = sample.sample(
4279 group_size,
4280 size=sample_size,
4281 replace=replace,
4282 weights=None if weights is None else weights_arr[grp_indices],
4283 random_state=random_state,
-> 4284 )
4285 sampled_indices.append(grp_indices[grp_sample])
4287 sampled_indices = np.concatenate(sampled_indices)
File <__array_function__ internals>:180, in concatenate(*args, **kwargs)
ValueError: need at least one array to concatenate
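Until empty input is handled inside `sample` itself, a caller can guard against it. A minimal sketch; `sample_per_group` is a hypothetical helper, not a pandas API, and newer pandas versions may no longer raise here:

```python
import pandas as pd


def sample_per_group(df: pd.DataFrame, by: str, n: int = 1, random_state=None) -> pd.DataFrame:
    """Hypothetical helper: draw n rows per group, tolerating an empty frame."""
    if df.empty:
        # The pandas version in this thread raises
        # "ValueError: need at least one array to concatenate" for empty input,
        # so return a zero-row frame with the same columns instead.
        return df.iloc[:0]
    return df.groupby(by).sample(n=n, random_state=random_state)


empty_df = pd.DataFrame({"a": [], "b": []})
print(sample_per_group(empty_df, "a").shape)  # (0, 2)

df = pd.DataFrame({"a": [1, 1, 2], "b": [10, 20, 30]})
print(sample_per_group(df, "a", random_state=0).shape)  # (2, 2)
```

Returning `df.iloc[:0]` keeps the column labels and dtypes of the input, so downstream code sees the same schema whether or not any rows were sampled.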
This pull request is stale because it has been open for thirty days with no activity. Please update and respond to this comment if you're still interested in working on this.
Still working on this @Th3nn3ss or has this been superseded with your other work?
Thanks for the pull request, but it appears to have gone stale. If interested in continuing, please merge in the main branch, address any review comments and/or failing tests, and we can reopen.