BUG: inconsistent behaviour for empty DataFrames #48327

ntachukwu · 2022-08-31T13:03:00Z

Closes API: Specify the behaviour for operating on empty objects #47959
Tests added

pep8speaks · 2022-08-31T13:03:03Z

Hello @Th3nn3ss! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻

Comment last updated at 2022-09-08 07:55:53 UTC

jorisvandenbossche

@Th3nn3ss Thanks for the PR. Can you give a bit of context? Currently, the test you are adding is failing (because the behaviour is not fully consistent)? Was that what you wanted to show, and would you like to fix the behaviour here as well?

jorisvandenbossche · 2022-09-05T15:54:22Z

pandas/tests/groupby/test_apply.py

@@ -1,3 +1,4 @@
+import datetime as dt


The datetime module is already imported below as well (so you can update your code to use the existing import)

jorisvandenbossche · 2022-09-05T15:54:35Z

pandas/tests/groupby/test_apply.py

+    df3 = df2.set_index("date").groupby("a", group_keys=True).apply(lambda x: x)
+    expected = df3.iloc[:0]  # An empty dataframe
+
+    tm.assert_equal(result, expected)


Suggested change

-    tm.assert_equal(result, expected)
+    tm.assert_frame_equal(result, expected)

ntachukwu · 2022-09-06T01:50:47Z

@jorisvandenbossche Yes, the idea is to demonstrate some inconsistencies with empty inputs through failing tests.

I would love to fix it.

ntachukwu · 2022-09-06T04:02:50Z

pandas/tests/frame/test_constructors.py

+    result_empty = pd.DataFrame([])
+    expected_empty = pd.DataFrame(pd.Series([]))
+
+    tm.assert_frame_equal(result_empty, expected_empty)


Error returned

E       AssertionError: DataFrame are different
E
E       DataFrame shape mismatch
E       [left]:  (0, 0)
E       [right]: (0, 1)

pandas/tests/frame/test_constructors.py:3211: AssertionError
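A minimal sketch of the shape mismatch above, using the public `pd.testing.assert_frame_equal` (the tests use the internal `tm` alias for the same helpers). The explicit `dtype="float64"` on the empty Series is an assumption added only to sidestep the default-dtype FutureWarning discussed further down; the shapes do not depend on it.

```python
import pandas as pd

# An empty list gives a frame with no rows and no columns
from_list = pd.DataFrame([])

# A single empty Series becomes one empty column, labelled 0 because the Series is unnamed
from_series = pd.DataFrame(pd.Series([], dtype="float64"))

print(from_list.shape)    # (0, 0)
print(from_series.shape)  # (0, 1)

# The same comparison the test performs, captured as a boolean instead of failing
try:
    pd.testing.assert_frame_equal(from_list, from_series)
    frames_equal = True
except AssertionError:
    frames_equal = False

print(frames_equal)  # False
```

Both objects are "empty" in the sense of having zero rows, but they differ in the number of columns, which is exactly what the shape check in `assert_frame_equal` reports.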

ntachukwu · 2022-09-06T04:03:35Z

pandas/tests/frame/test_constructors.py

+    result = pd.DataFrame({"a": [], "b": []})
+    expected = pd.DataFrame(columns=["a", "b"])
+
+    tm.assert_frame_equal(result, expected)


Error

E       AssertionError: DataFrame.index are different
E
E       DataFrame.index classes are different
E       [left]:  RangeIndex(start=0, stop=0, step=1)
E       [right]: Index([], dtype='object')

pandas/_testing/asserters.py:318: AssertionError
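A small sketch for inspecting the two constructors side by side. It deliberately only prints the index class and column dtypes instead of asserting them: the error above reflects the pandas version used in this PR, and later releases may have changed some of these defaults, so treat the printed values as version-dependent.

```python
import pandas as pd

from_dict = pd.DataFrame({"a": [], "b": []})
from_columns = pd.DataFrame(columns=["a", "b"])

# Both constructors agree on the shape and the column labels ...
print(from_dict.shape, from_columns.shape)
print(list(from_dict.columns), list(from_columns.columns))

# ... but the index class (and possibly the column dtypes) is what
# assert_frame_equal compared and found different in the test above.
print(type(from_dict.index).__name__, type(from_columns.index).__name__)
print(from_dict.dtypes.tolist(), from_columns.dtypes.tolist())
```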

ntachukwu · 2022-09-06T04:04:52Z

pandas/tests/frame/test_constructors.py

+
+    with tm.assert_produces_warning(FutureWarning):
+        # Lists do not warn the user
+        result = pd.DataFrame({"a": [], "b": []})


Error

E       AssertionError: Did not see expected warning of class 'FutureWarning'

pandas/_testing/_warnings.py:143: AssertionError

>>> df_with_series = pd.DataFrame({"a": pd.Series([]), "b": pd.Series([])})  # Prints a warning
<stdin>:1: FutureWarning: The default dtype for empty Series will be 'object' instead of 'float64' in a future version. Specify a dtype explicitly to silence this warning.
>>> df_with_list = pd.DataFrame({"a": [], "b": []})  # Does not print a warning
>>> tm.assert_frame_equal(df_with_series, df_with_list)  # Both df are equal
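A sketch that records which of the three ways of building the same empty frame emits a warning, using the standard-library `warnings.catch_warnings(record=True)`. The helper `warning_names` is hypothetical, introduced only for this illustration. Whether the implicit-dtype case warns depends on the pandas version (the REPL session above is from a release that still warned); passing an explicit `dtype` is the route the warning message itself recommends.

```python
import warnings

import pandas as pd


def warning_names(build):
    """Hypothetical helper: run build() and return the class names of any warnings it emitted."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        build()
    return [type(w.message).__name__ for w in caught]


# Dict of empty Series without a dtype (warned in the REPL session above)
implicit = warning_names(
    lambda: pd.DataFrame({"a": pd.Series([]), "b": pd.Series([])})
)
# Dict of empty lists (no warning was observed in the PR)
from_lists = warning_names(lambda: pd.DataFrame({"a": [], "b": []}))
# Dict of empty Series with an explicit dtype (no default-dtype warning to emit)
explicit = warning_names(
    lambda: pd.DataFrame(
        {"a": pd.Series([], dtype="float64"), "b": pd.Series([], dtype="float64")}
    )
)

print("implicit:", implicit)
print("lists:", from_lists)
print("explicit:", explicit)
```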

ntachukwu · 2022-09-06T04:06:21Z

pandas/tests/groupby/test_apply.py

+    df3 = df2.set_index("date").groupby("a", group_keys=True).apply(lambda x: x)
+    expected = df3.iloc[:0]  # An empty dataframe
+
+    tm.assert_frame_equal(result, expected)


Error

E       AssertionError: DataFrame.index are different
E
E       DataFrame.index classes are different
E       [left]:  Int64Index([], dtype='int64', name='a')
E       [right]: MultiIndex([], names=['a', 'date'])

pandas/_testing/asserters.py:318: AssertionError
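A sketch that reproduces the shape of this comparison on a small, made-up frame: index by `date`, group by `a` with `group_keys=True`, and return each group unchanged. Two assumptions differ from the PR's test: the data is invented, and only column `b` is selected before `apply`, so the grouping column is not handed to the lambda. The non-empty result gets the `(a, date)` MultiIndex; the empty case is only printed, since the error above shows it came back with a single-level index in the pandas version used here.

```python
import pandas as pd

# Hypothetical sample data; the empty case is derived from it so dtypes match
df = pd.DataFrame(
    {
        "a": [1, 1, 2],
        "b": [10.0, 20.0, 30.0],
        "date": pd.date_range("2022-01-01", periods=3),
    }
)


def identity_apply(frame):
    # Index by "date", group by "a", and return every group unchanged.
    # group_keys=True asks pandas to prepend the group key to the result index.
    return (
        frame.set_index("date")
        .groupby("a", group_keys=True)[["b"]]
        .apply(lambda g: g)
    )


non_empty = identity_apply(df)
empty = identity_apply(df.iloc[:0])  # same columns and dtypes, zero rows

print(non_empty.index.nlevels, list(non_empty.index.names))
print(empty.index.nlevels, list(empty.index.names))
```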

ntachukwu · 2022-09-08T08:02:47Z

pandas/tests/groupby/test_sample.py

+def test_groupby_sample_with_empty_inputs():
+    df = DataFrame({"a": [], "b": []})
+
+    gb_df = df.groupby("a").sample()


---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Input In [1], in <cell line: 3>()
      1 empty_df = pd.DataFrame({'a': [], 'b': []})
----> 2 empty_df.groupby('a').sample()

File ~.../pandas/core/groupby/groupby.py:4284, in sample(self, n, frac, replace, weights, random_state)
   4275         assert frac is not None
   4276         sample_size = round(frac * group_size)
   4278     grp_sample = sample.sample(
   4279         group_size,
   4280         size=sample_size,
   4281         replace=replace,
   4282         weights=None if weights is None else weights_arr[grp_indices],
   4283         random_state=random_state,
-> 4284     )
   4285     sampled_indices.append(grp_indices[grp_sample])
   4287 sampled_indices = np.concatenate(sampled_indices)

File <__array_function__ internals>:180, in concatenate(*args, **kwargs)

ValueError: need at least one array to concatenate
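The traceback shows the failure comes from `np.concatenate` receiving an empty list, because an empty frame has no groups and so nothing is ever appended to `sampled_indices`. A minimal sketch of two possible guards follows; both function names are hypothetical and this is not the fix pandas adopted. `concat_sampled_indices` illustrates the idea at the point of failure, and `sample_per_group` is a user-side workaround that short-circuits on empty input.

```python
import numpy as np
import pandas as pd


def concat_sampled_indices(sampled_indices):
    # np.concatenate([]) raises "need at least one array to concatenate",
    # which is what happens when the frame has no groups at all.
    if not sampled_indices:
        return np.array([], dtype=np.intp)
    return np.concatenate(sampled_indices)


def sample_per_group(df, by, **kwargs):
    # Hypothetical workaround: return an empty slice (same columns and dtypes)
    # instead of letting GroupBy.sample reach np.concatenate.
    if df.empty:
        return df.iloc[:0]
    return df.groupby(by).sample(**kwargs)


try:
    np.concatenate([])
except ValueError as err:
    print(err)

print(concat_sampled_indices([]))
print(concat_sampled_indices([np.array([0, 2]), np.array([5])]))

empty_df = pd.DataFrame({"a": [], "b": []})
print(sample_per_group(empty_df, "a").shape)  # (0, 2)
```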

github-actions · 2022-10-09T00:08:12Z

This pull request is stale because it has been open for thirty days with no activity. Please update and respond to this comment if you're still interested in working on this.

mroeschke · 2022-11-17T01:30:06Z

Still working on this @Th3nn3ss or has this been superseded with your other work?

mroeschke · 2022-11-29T03:00:01Z

Thanks for the pull request, but it appears to have gone stale. If interested in continuing, please merge in the main branch, address any review comments and/or failing tests, and we can reopen.

add test for inconsistent dataframe representation for empty inputs · c4e83b2

precomit · 399e209

jorisvandenbossche mentioned this pull request Sep 5, 2022: API: Specify the behaviour for operating on empty objects #47959 (Open, 1 task)

jorisvandenbossche reviewed Sep 5, 2022


ntachukwu added 3 commits September 6, 2022 04:15

add test for empty df constructs · 0d7223a

precommit · 8f7bc39

add test for empty df without column name · ea79bd9

ntachukwu commented Sep 6, 2022


ntachukwu mentioned this pull request Sep 6, 2022: BUG: test for empty input construct #48330 (Closed, 2 tasks)

mroeschke added the Testing (pandas testing functions or related to the test suite) and DataFrame (DataFrame data structure) labels Sep 6, 2022

add test for groupby sample on empty inputs · 5c945d5

ntachukwu commented Sep 8, 2022


github-actions bot added the Stale label Oct 9, 2022

mroeschke closed this Nov 29, 2022
