TST: Dask failure in the CI #57900

datapythonista · 2024-03-18T20:49:45Z

We have a failure in pandas/tests/test_downstream.py in our CI, which I guess is caused by a change in Dask.

The reason is that when creating a Series from a Dask DataFrame, the object is treated as a dict because it is dict-like. We then check whether the object is empty by calling bool() on it, and Dask raises. This is the behavior causing the issue:

>>> import dask.dataframe
>>> import numpy
>>> from pandas.core.dtypes.common import is_dict_like
>>> data = dask.dataframe.from_array(numpy.array([1, 2, 3]))
>>> is_dict_like(data)
True
>>> bool(data)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/mgarcia/.miniforge3/envs/pandas-dev/lib/python3.10/site-packages/dask_expr/_collection.py", line 435, in __bool__
    raise ValueError(
ValueError: The truth value of a Series is ambiguous. Use a.any() or a.all().
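
The same failure mode can be sketched without Dask installed. Everything in the block below is illustrative: `LazyFrame` is a hypothetical stand-in for a frame-like object, `looks_dict_like` only approximates the duck-typed check that `is_dict_like` performs, and `truthiness_error` is a helper made up for this example. The `isinstance(..., Mapping)` line shows one stricter check for comparison; it is not necessarily what any fix in pandas does.

```python
from collections.abc import Mapping


def looks_dict_like(obj):
    """Duck-typed check; a rough approximation of pandas' is_dict_like."""
    attrs = ("__getitem__", "keys", "__contains__")
    return all(hasattr(obj, a) for a in attrs) and not isinstance(obj, type)


class LazyFrame:
    """Hypothetical frame-like object whose truth value is ambiguous."""

    def __init__(self, columns):
        self._columns = dict(columns)

    def keys(self):
        return self._columns.keys()

    def __getitem__(self, key):
        return self._columns[key]

    def __contains__(self, key):
        return key in self._columns

    def __bool__(self):
        # Mirrors how DataFrame-like objects refuse to be used in `if data:`.
        raise ValueError("The truth value of a LazyFrame is ambiguous.")


def truthiness_error(obj):
    """Return the message raised by bool(obj), or None if bool() succeeds."""
    try:
        bool(obj)
    except ValueError as exc:
        return str(exc)
    return None


data = LazyFrame({"col1": [1, 2], "col2": [3, 4]})
print(looks_dict_like(data))      # passes the duck-typed check
print(isinstance(data, Mapping))  # but is not a registered Mapping
print(truthiness_error(data))     # so an `if data:` emptiness check raises
```

So any code path that first accepts an object because it looks dict-like and then tests emptiness with `if data:` will hit this error for frame-like objects.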

I personally don't understand this test much. Creating a Series from a DataFrame seems a bit odd: what's the expected output, a Series where each element is another Series?

This also seems to be failing when the dataframe is a pandas dataframe instead of a dask one, but we don't seem to have a test to check that it works:

>>> import pandas
>>> df = pandas.DataFrame({'col1': [1, 2], 'col2': [3, 4]})
>>> pandas.Series(df)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/mgarcia/src/pandas/pandas/core/series.py", line 456, in __init__
    data, index = self._init_dict(data, index, dtype)
  File "/home/mgarcia/src/pandas/pandas/core/series.py", line 546, in _init_dict
    if data:
  File "/home/mgarcia/src/pandas/pandas/core/generic.py", line 1482, in __nonzero__
    raise ValueError(
ValueError: The truth value of a DataFrame is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

Any objections to just getting rid of the test, @mroeschke @phofl?

phofl · 2024-03-18T20:50:57Z

Fixed by #57889, so closing here. The test was actually helpful.

datapythonista · 2024-03-18T20:52:39Z

Ah, cool, I missed that PR, thanks!

datapythonista added the Testing and CI labels Mar 18, 2024

phofl closed this as completed Mar 18, 2024
