TST: Dask failure in the CI #57900

Closed
datapythonista opened this issue Mar 18, 2024 · 2 comments
Labels
CI (Continuous Integration), Testing (pandas testing functions or related to the test suite)

Comments

@datapythonista
Member

We have a failure in pandas/tests/test_downstream.py in our CI, which I guess is caused by a change in Dask.

The reason is that when a Series is created from a Dask dataframe, the object is treated as a dict because it is dict-like. We then check whether the object is empty by calling bool() on it, and Dask raises. This is the behavior causing the issue:

>>> import dask.dataframe
>>> import numpy
>>> from pandas.core.dtypes.common import is_dict_like
>>> data = dask.dataframe.from_array(numpy.array([1, 2, 3]))
>>> is_dict_like(data)
True
>>> bool(data)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/mgarcia/.miniforge3/envs/pandas-dev/lib/python3.10/site-packages/dask_expr/_collection.py", line 435, in __bool__
    raise ValueError(
ValueError: The truth value of a Series is ambiguous. Use a.any() or a.all().
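
To make the mechanism concrete without needing Dask installed, here is a minimal sketch in plain Python. Everything in it is an assumption for illustration: `LazyTable` is a hypothetical stand-in for a lazy collection (not Dask's real class), `looks_dict_like` is a simplified duck-typing check that I believe resembles `is_dict_like`, and `init_from_dict_like` is a hypothetical reduction of a constructor path that truth-tests its input.

```python
class LazyTable:
    """Hypothetical stand-in for a lazy, dict-like collection (not Dask's real class)."""

    def __init__(self, columns):
        self._columns = dict(columns)

    def keys(self):
        return self._columns.keys()

    def __getitem__(self, key):
        return self._columns[key]

    def __contains__(self, key):
        return key in self._columns

    def __bool__(self):
        # Mirrors the behavior in the traceback: truth-testing is refused.
        raise ValueError("The truth value of a LazyTable is ambiguous.")


def looks_dict_like(obj):
    """Simplified duck-typing check, assumed to resemble pandas' is_dict_like."""
    required = ("__getitem__", "keys", "__contains__")
    return all(hasattr(obj, attr) for attr in required) and not isinstance(obj, type)


def init_from_dict_like(data):
    """Hypothetical reduction of a constructor path that truth-tests its input."""
    if data:  # calls data.__bool__(), which raises for LazyTable
        return list(data.keys())
    return []


table = LazyTable({"col1": [1, 2], "col2": [3, 4]})
dict_like = looks_dict_like(table)  # the object passes the duck-typing check

try:
    init_from_dict_like(table)
    error_message = None
except ValueError as exc:
    error_message = str(exc)  # the emptiness check is what fails
```

The point of the sketch is that the object passes the dict-like check and only fails later, at the `if data:` emptiness check, which is the same two-step sequence as in the session above.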

I personally don't understand this test much. Creating a Series from a DataFrame seems a bit odd. What's the expected output, a Series where each element is another Series?

This also seems to fail when the dataframe is a pandas DataFrame instead of a Dask one, but we don't seem to have a test that checks that case:

>>> import pandas
>>> df = pandas.DataFrame({'col1': [1, 2], 'col2': [3, 4]})
>>> pandas.Series(df)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/mgarcia/src/pandas/pandas/core/series.py", line 456, in __init__
    data, index = self._init_dict(data, index, dtype)
  File "/home/mgarcia/src/pandas/pandas/core/series.py", line 546, in _init_dict
    if data:
  File "/home/mgarcia/src/pandas/pandas/core/generic.py", line 1482, in __nonzero__
    raise ValueError(
ValueError: The truth value of a DataFrame is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
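
For reference, a small sketch of why the `if data:` line fails for a pandas DataFrame, and of the explicit emptiness check that the error message itself suggests. It only illustrates the truth-testing behavior and says nothing about how the Series constructor should handle a DataFrame; the variable names are just for this example.

```python
import pandas as pd

df = pd.DataFrame({"col1": [1, 2], "col2": [3, 4]})

# Truth-testing a DataFrame is refused, which is what `if data:` runs into.
try:
    bool(df)
    truth_test_raised = False
except ValueError:
    truth_test_raised = True

# Explicit checks that do not rely on truth-testing.
is_empty = df.empty
has_columns = len(df.columns) > 0
```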

Any objection to just getting rid of the test, @mroeschke @phofl?

@datapythonista added the Testing and CI labels Mar 18, 2024
@phofl
Member

phofl commented Mar 18, 2024

Fixed by #57889, so closing here. The test was actually helpful.

@phofl phofl closed this as completed Mar 18, 2024
@datapythonista
Member Author

Ah, cool, I missed that PR, thanks!
