
BUG: inconsistent behaviour for empty DataFrames #48327



Closed
wants to merge 6 commits into from


@ntachukwu (Contributor) commented Aug 31, 2022

@pep8speaks commented Aug 31, 2022

Hello @Th3nn3ss! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻

Comment last updated at 2022-09-08 07:55:53 UTC

@jorisvandenbossche (Member) left a comment


@Th3nn3ss Thanks for the PR. Can you give a bit of context? Is the test you are adding currently failing (because the behaviour is not fully consistent)? Was that what you wanted to show, and would you like to fix the behaviour here as well?

@@ -1,3 +1,4 @@
import datetime as dt
Review comment (Member):

The datetime module is already imported further down as well, so you can update your code to use the existing import.

df3 = df2.set_index("date").groupby("a", group_keys=True).apply(lambda x: x)
expected = df3.iloc[:0] # An empty dataframe

tm.assert_equal(result, expected)
Review comment (Member):

Suggested change:
- tm.assert_equal(result, expected)
+ tm.assert_frame_equal(result, expected)
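For context on the suggestion: `tm.assert_equal` in pandas' test utilities is a wrapper that dispatches to the matching `assert_*_equal` function based on the argument types, whereas `assert_frame_equal` (public as `pandas.testing.assert_frame_equal`) accepts only DataFrames, so it states the test's intent explicitly. A minimal sketch, using a small hypothetical frame and a hypothetical helper name:

```python
import pandas as pd
from pandas.testing import assert_frame_equal


def frame_check_rejects_series() -> bool:
    """Return True if assert_frame_equal refuses a Series argument (hypothetical helper)."""
    left = pd.DataFrame({"x": [1, 2]})  # hypothetical example data
    # Two identical frames compare equal, so no exception is raised here.
    assert_frame_equal(left, left.copy())
    try:
        # A Series is not a DataFrame, so the up-front type check fails.
        assert_frame_equal(left, left["x"])
    except AssertionError:
        return True
    return False


print(frame_check_rejects_series())  # True
```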

@ntachukwu (Contributor, Author):

@jorisvandenbossche Yes, the idea is to demonstrate some inconsistencies with empty inputs through failing tests.

I would love to fix it.

result_empty = pd.DataFrame([])
expected_empty = pd.DataFrame(pd.Series([]))

tm.assert_frame_equal(result_empty, expected_empty)
@ntachukwu (Contributor, Author):

Error returned:

E       AssertionError: DataFrame are different
E       
E       DataFrame shape mismatch
E       [left]:  (0, 0)
E       [right]: (0, 1)

pandas/tests/frame/test_constructors.py:3211: AssertionError
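The shape mismatch comes from how the constructor treats each input: an empty list yields a frame with no rows and no columns, whereas a Series always becomes a single column (labelled `0` when the Series is unnamed), even if it has no rows. A minimal sketch; the explicit `dtype` is an addition here, used only to avoid the empty-Series `FutureWarning` that appears later in this thread:

```python
import pandas as pd

# An empty list gives a frame with neither rows nor columns.
from_list = pd.DataFrame([])
# A Series always becomes exactly one column, even when it has no rows.
from_series = pd.DataFrame(pd.Series([], dtype="float64"))

print(from_list.shape)            # (0, 0)
print(from_series.shape)          # (0, 1)
print(list(from_series.columns))  # [0]
```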

result = pd.DataFrame({"a": [], "b": []})
expected = pd.DataFrame(columns=["a", "b"])

tm.assert_frame_equal(result, expected)
@ntachukwu (Contributor, Author):

Error:

E       AssertionError: DataFrame.index are different
E       
E       DataFrame.index classes are different
E       [left]:  RangeIndex(start=0, stop=0, step=1)
E       [right]: Index([], dtype='object')

pandas/_testing/asserters.py:318: AssertionError
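Both frames have zero rows and the same column labels; the assertion fails on the class of the empty row index, and depending on the pandas version the inferred column dtypes may differ too. As a sketch, relaxing those two checks through the `check_index_type` and `check_dtype` parameters of `assert_frame_equal` shows that nothing else differs. This only locates the inconsistency and is not a fix for it; later pandas releases changed how empty axes are created, so the printed index class names may vary by version:

```python
import pandas as pd
from pandas.testing import assert_frame_equal

from_dict = pd.DataFrame({"a": [], "b": []})
from_columns = pd.DataFrame(columns=["a", "b"])

# Same (empty) shape and the same column labels...
print(from_dict.shape, from_columns.shape)  # (0, 2) (0, 2)
print(list(from_dict.columns) == list(from_columns.columns))  # True

# ...but the row-index class can differ (version-dependent).
print(type(from_dict.index).__name__, type(from_columns.index).__name__)

# With index class and dtype checks switched off, the frames compare equal.
assert_frame_equal(from_dict, from_columns, check_index_type=False, check_dtype=False)
```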


with tm.assert_produces_warning(FutureWarning):
    # Lists do not warn the user
    result = pd.DataFrame({"a": [], "b": []})
@ntachukwu (Contributor, Author):

Error:

E           AssertionError: Did not see expected warning of class 'FutureWarning'

pandas/_testing/_warnings.py:143: AssertionError
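The failure reflects the observed behaviour: building a frame from a dict of empty lists emits no `FutureWarning`. If a test is meant to pin that behaviour down instead, a sketch could pass `None` to the pandas test-suite helper `tm.assert_produces_warning`, which then asserts that no warning is raised. Note that `pandas._testing` is a private module intended for pandas' own tests, imported as `tm` as in the snippets in this thread:

```python
import pandas as pd
import pandas._testing as tm

# Passing None asserts that the block emits no warning at all.
with tm.assert_produces_warning(None):
    # Lists do not trigger the empty-Series dtype FutureWarning.
    result = pd.DataFrame({"a": [], "b": []})

print(result.shape)  # (0, 2)
```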

@ntachukwu (Contributor, Author):

>>> df_with_series = pd.DataFrame({"a": pd.Series([]), "b": pd.Series([])}) # Prints a warning
<stdin>:1: FutureWarning: The default dtype for empty Series will be 'object' instead of 'float64' in a future version. Specify a dtype explicitly to silence this warning.
>>> df_with_list = pd.DataFrame({"a": [], "b": []}) # Does not print a warning
>>> tm.assert_frame_equal(df_with_series, df_with_list) # Both df are equal
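As the warning text itself suggests, giving each empty Series an explicit dtype silences it. A minimal sketch, assuming `float64` (the old default) is the dtype wanted; `warnings.simplefilter("error")` turns any warning raised inside the block into an exception, so the absence of a warning is actually checked:

```python
import warnings

import pandas as pd

with warnings.catch_warnings():
    # Escalate warnings to errors inside this block only.
    warnings.simplefilter("error")
    df_with_series = pd.DataFrame(
        {"a": pd.Series([], dtype="float64"), "b": pd.Series([], dtype="float64")}
    )

print(df_with_series.dtypes.tolist())  # [dtype('float64'), dtype('float64')]
```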

df3 = df2.set_index("date").groupby("a", group_keys=True).apply(lambda x: x)
expected = df3.iloc[:0] # An empty dataframe

tm.assert_frame_equal(result, expected)
@ntachukwu (Contributor, Author):

Error:

E       AssertionError: DataFrame.index are different
E       
E       DataFrame.index classes are different
E       [left]:  Int64Index([], dtype='int64', name='a')
E       [right]: MultiIndex([], names=['a', 'date'])

pandas/_testing/asserters.py:318: AssertionError

@mroeschke added the Testing and DataFrame labels Sep 6, 2022
def test_groupby_sample_with_empty_inputs():
    df = DataFrame({"a": [], "b": []})

    gb_df = df.groupby("a").sample()
@ntachukwu (Contributor, Author):

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Input In [1], in <cell line: 3>()
      1 empty_df = pd.DataFrame({'a': [], 'b': []})
----> 2 empty_df.groupby('a').sample()

File ~.../pandas/core/groupby/groupby.py:4284, in sample(self, n, frac, replace, weights, random_state)
   4275         assert frac is not None
   4276         sample_size = round(frac * group_size)
   4278     grp_sample = sample.sample(
   4279         group_size,
   4280         size=sample_size,
   4281         replace=replace,
   4282         weights=None if weights is None else weights_arr[grp_indices],
   4283         random_state=random_state,
-> 4284     )
   4285     sampled_indices.append(grp_indices[grp_sample])
   4287 sampled_indices = np.concatenate(sampled_indices)

File <__array_function__ internals>:180, in concatenate(*args, **kwargs)

ValueError: need at least one array to concatenate
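The traceback points at the last step of `sample`: it collects one array of sampled positions per group and joins them with `np.concatenate`, but an empty frame has no groups, so the list is empty and NumPy refuses to concatenate nothing. A sketch of one possible guard, written as standalone helpers (`concat_positions` and `groupby_sample_or_empty` are hypothetical names, not pandas API). Whether pandas itself should return an empty frame here is the API question raised in the linked issue:

```python
import numpy as np
import pandas as pd


def concat_positions(chunks: list) -> np.ndarray:
    """Join per-group position arrays, tolerating zero groups (hypothetical helper)."""
    if not chunks:
        # np.concatenate([]) raises "need at least one array to concatenate".
        return np.array([], dtype=np.intp)
    return np.concatenate(chunks)


def groupby_sample_or_empty(df: pd.DataFrame, by: str, **kwargs) -> pd.DataFrame:
    """Hypothetical caller-side workaround: skip groupby().sample() on empty input."""
    if df.empty:
        # Keep the columns but return no rows.
        return df.iloc[:0]
    return df.groupby(by).sample(**kwargs)


empty_df = pd.DataFrame({"a": [], "b": []})
print(concat_positions([]).size)                     # 0
print(groupby_sample_or_empty(empty_df, "a").shape)  # (0, 2)
```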

@github-actions (bot) commented Oct 9, 2022

This pull request is stale because it has been open for thirty days with no activity. Please update and respond to this comment if you're still interested in working on this.

github-actions bot added the Stale label Oct 9, 2022
@mroeschke (Member):

Still working on this @Th3nn3ss or has this been superseded with your other work?

@mroeschke (Member):

Thanks for the pull request, but it appears to have gone stale. If interested in continuing, please merge in the main branch, address any review comments and/or failing tests, and we can reopen.

@mroeschke closed this Nov 29, 2022
Labels: DataFrame, Stale, Testing
Projects: None yet
Development: Successfully merging this pull request may close these issues:
API: Specify the behaviour for operating on empty objects
4 participants