CI Failing - Linux py37_np_dev - test_constructor_list_frames #32289

Closed
simonjayhawkins opened this issue Feb 27, 2020 · 8 comments · Fixed by #34991
Labels
Blocker Blocking issue or pull request for an upcoming release CI Continuous Integration Linux Linux OS Unreliable Test Unit tests that occasionally fail
@simonjayhawkins simonjayhawkins added CI Continuous Integration Compat pandas objects compatability with Numpy or Python functions labels Feb 27, 2020
@mroeschke
Member

Looks like this build isn't available anymore. Happy to reopen if we continue to see this issue.

@simonjayhawkins
Member Author

I'll leave this open since #32284 didn't fix the underlying issue; the tests were just skipped.

@mroeschke mroeschke added Linux Linux OS Unreliable Test Unit tests that occasionally fail and removed Compat pandas objects compatability with Numpy or Python functions labels Apr 10, 2020
@simonjayhawkins simonjayhawkins added this to the 1.1 milestone May 19, 2020
@simonjayhawkins
Member Author

This occurred with NumPy 1.19.0rc1 (xref #34239), and the tests are skipped again for 1.20.0.dev.

This may become an issue shortly.

cc @jbrockmendel

@simonjayhawkins simonjayhawkins removed this from the 1.1 milestone May 19, 2020
@simonjayhawkins simonjayhawkins added the Blocker Blocking issue or pull request for an upcoming release label May 19, 2020
@TomAugspurger
Contributor

Here's the root cause

(Pdb) np.__version__
'1.18.5'
(Pdb) np.array([pd.DataFrame()])
array([], shape=(1, 0), dtype=float64)

(Pdb) np.__version__
'1.20.0.dev0+5345c25'
(Pdb) np.array([pd.DataFrame()])
array([], shape=(1, 0, 0), dtype=float64)

I think we should consider deprecating passing a list of DataFrames to the DataFrame constructor. We don't allow passing a list of 2D ndarrays, so we likely shouldn't allow passing a list of 2D DataFrames either.

@jreback
Contributor

jreback commented Jun 12, 2020

+1 on deprecating passing list of 2D things to the constructor.

@rebecca-palmer
Contributor

Nested DataFrames in general do still work, just not this method of creating them:

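import pandas as pd

df_inner = pd.DataFrame({"a": [1, 2]})  # example inner frame (hypothetical values)

# df_outer = pd.DataFrame([df_inner, df_inner + 10])  # fails in numpy 1.19
# mixed-type list works
df_outer = pd.DataFrame([8, df_inner, df_inner + 10])
# or, setting the element later works
df_outer = pd.DataFrame([None, None])
df_outer.at[0, 0] = df_inner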

As noted above, it seems to happen because numpy.array now returns a 3D array that stacks the DataFrames, instead of a 1D array of objects.
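
The stacking-versus-boxing distinction described above can be seen with plain NumPy. This is only a sketch: it uses 2-D ndarrays as stand-ins for DataFrames, and `to_object_array` is a hypothetical helper, not something pandas or NumPy provides.

```python
import numpy as np


def to_object_array(items):
    """Hypothetical helper: 1-D object array whose elements are the items themselves."""
    out = np.empty(len(items), dtype=object)
    for i, item in enumerate(items):
        # Assigning through a scalar index stores the object as-is,
        # so a 2-D item stays one opaque element instead of being unpacked.
        out[i] = item
    return out


blocks = [np.zeros((2, 3)), np.ones((2, 3))]

stacked = np.array(blocks)       # equal-shaped 2-D inputs are stacked into 3-D
boxed = to_object_array(blocks)  # each 2-D input remains a single element

print(stacked.shape)  # (2, 2, 3)
print(boxed.shape)    # (2,)
```

Filling a pre-allocated object array element by element is what keeps NumPy from inspecting the dimensions of the items, which is roughly what the "set the element later" workaround relies on.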

The two failing tests were introduced in #3243 and #5324. We don't document anywhere obvious that nested DataFrames are supposed to work, but #5324 was reported by an actual user of this.

These two are not the only tests marked _is_numpy_dev, but are the only tests that fail with numpy 1.19 in Debian experimental's pandas 1.0.

The exception message it raises (pandas/core/internals/construction.py:324) could be changed to add a note about this issue.
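
For illustration only, a dimension check with a more explanatory message might look like the sketch below. `ensure_2d` is a hypothetical stand-alone function, not the pandas code at that line; the `ndim == 1` branch and the wording of the extra note are assumptions.

```python
import numpy as np


def ensure_2d(values):
    """Hypothetical helper: return a 2-D array or raise with the offending shape."""
    values = np.asarray(values)
    if values.ndim == 1:
        # Assumption: a 1-D input is treated as a single column.
        values = values.reshape((values.shape[0], 1))
    elif values.ndim != 2:
        raise ValueError(
            f"Must pass 2-d input. shape={values.shape}. "
            "Note: NumPy stacks a list of 2-D objects into a higher-dimensional array."
        )
    return values


print(ensure_2d([1, 2, 3]).shape)  # (3, 1)
```

Including the shape in the message makes it obvious at a glance that the input was interpreted as 3-D rather than as a list of opaque objects.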

@TomAugspurger
Contributor

TomAugspurger commented Jun 24, 2020

This is failing the macpython builds now.

https://dev.azure.com/pandas-dev/pandas-wheels/_build/results?buildId=38048&view=logs&j=79f3a53a-4a6d-509c-98b5-ff6e9111f67c&t=25a63239-9c30-5cf0-cb4c-3d01da7b47c5&l=1049

___________ TestDataFrameConstructors.test_constructor_list_frames ____________
[gw0] win32 -- Python 3.6.8 D:\a\1\s\test_venv\Scripts\python.exe

self = <pandas.tests.frame.test_constructors.TestDataFrameConstructors object at 0x2882F090>

    @pytest.mark.xfail(_is_numpy_dev, reason="Interprets list of frame as 3D")
    def test_constructor_list_frames(self):
        # see gh-3243
>       result = DataFrame([DataFrame()])

test_venv\lib\site-packages\pandas\tests\frame\test_constructors.py:151: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
test_venv\lib\site-packages\pandas\core\frame.py:488: in __init__
    mgr = init_ndarray(data, index, columns, dtype=dtype, copy=copy)
test_venv\lib\site-packages\pandas\core\internals\construction.py:169: in init_ndarray
    values = prep_ndarray(values, copy=copy)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

values = array([], shape=(1, 0, 0), dtype=float64), copy = False

    def prep_ndarray(values, copy=True) -> np.ndarray:
        if not isinstance(values, (np.ndarray, ABCSeries, Index)):
            if len(values) == 0:
                return np.empty((0, 0), dtype=object)
            elif isinstance(values, range):
                arr = np.arange(values.start, values.stop, values.step, dtype="int64")
                return arr[..., np.newaxis]
    
            def convert(v):
                return maybe_convert_platform(v)
    
            # we could have a 1-dim or 2-dim list here
            # this is equiv of np.asarray, but does object conversion
            # and platform dtype preservation
            try:
                if is_list_like(values[0]) or hasattr(values[0], "len"):
                    values = np.array([convert(v) for v in values])
                elif isinstance(values[0], np.ndarray) and values[0].ndim == 0:
                    # GH#21861
                    values = np.array([convert(v) for v in values])
                else:
                    values = convert(values)
            except (ValueError, TypeError):
                values = convert(values)
    
        else:
    
            # drop subclass info, do not copy data
            values = np.asarray(values)
            if copy:
                values = values.copy()
    
>           raise ValueError("Must pass 2-d input")
E           ValueError: Must pass 2-d input

test_venv\lib\site-packages\pandas\core\internals\construction.py:295: ValueError

@TomAugspurger TomAugspurger added this to the 1.1 milestone Jun 25, 2020
@TomAugspurger
Contributor

Looking at this again, I think we should just remove the test, and perhaps add a whatsnew note saying that, because of upstream changes in NumPy, this isn't supported anymore. DataFrames are now consistently treated as 2D objects by NumPy, so we can't pretend that they're 1D in the constructor like we were.

Old behavior

In [28]: np.__version__
Out[28]: '1.18.5'

In [29]: a = np.ones((0, 0))

In [30]: b = pd.DataFrame()

In [31]: np.array([a]).shape
Out[31]: (1, 0, 0)

In [32]: np.array([b]).shape
Out[32]: (1, 0)

In [33]: pd.DataFrame([a])
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-33-49139b5fa086> in <module>
----> 1 pd.DataFrame([a])

~/miniconda3/envs/pandas=1.0.4/lib/python3.8/site-packages/pandas/core/frame.py in __init__(self, data, index, columns, dtype, copy)
    486                     mgr = arrays_to_mgr(arrays, columns, index, columns, dtype=dtype)
    487                 else:
--> 488                     mgr = init_ndarray(data, index, columns, dtype=dtype, copy=copy)
    489             else:
    490                 mgr = init_dict({}, index, columns, dtype=dtype)

~/miniconda3/envs/pandas=1.0.4/lib/python3.8/site-packages/pandas/core/internals/construction.py in init_ndarray(values, index, columns, dtype, copy)
    167     # by definition an array here
    168     # the dtypes will be coerced to a single dtype
--> 169     values = prep_ndarray(values, copy=copy)
    170
    171     if dtype is not None:

~/miniconda3/envs/pandas=1.0.4/lib/python3.8/site-packages/pandas/core/internals/construction.py in prep_ndarray(values, copy)
    293         values = values.reshape((values.shape[0], 1))
    294     elif values.ndim != 2:
--> 295         raise ValueError("Must pass 2-d input")
    296
    297     return values

ValueError: Must pass 2-d input

In [34]: pd.DataFrame([b])
Out[34]:
Empty DataFrame
Columns: []
Index: [0]

New behavior

In [4]: a = np.ones((0, 0))

In [5]: b = pd.DataFrame()

In [6]: np.array([a]).shape
Out[6]: (1, 0, 0)

In [7]: np.array([b]).shape
Out[7]: (1, 0, 0)

In [8]: pd.DataFrame([a])
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-8-49139b5fa086> in <module>
----> 1 pd.DataFrame([a])

~/sandbox/pandas/pandas/core/frame.py in __init__(self, data, index, columns, dtype, copy)
    513                     mgr = arrays_to_mgr(arrays, columns, index, columns, dtype=dtype)
    514                 else:
--> 515                     mgr = init_ndarray(data, index, columns, dtype=dtype, copy=copy)
    516             else:
    517                 mgr = init_dict({}, index, columns, dtype=dtype)

~/sandbox/pandas/pandas/core/internals/construction.py in init_ndarray(values, index, columns, dtype, copy)
    188     # by definition an array here
    189     # the dtypes will be coerced to a single dtype
--> 190     values = _prep_ndarray(values, copy=copy)
    191
    192     if dtype is not None:

~/sandbox/pandas/pandas/core/internals/construction.py in _prep_ndarray(values, copy)
    322         values = values.reshape((values.shape[0], 1))
    323     elif values.ndim != 2:
--> 324         raise ValueError(f"Must pass 2-d input. shape={values.shape}")
    325
    326     return values

ValueError: Must pass 2-d input. shape=(1, 0, 0)

In [9]: pd.DataFrame([b])
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-9-338077bdea83> in <module>
----> 1 pd.DataFrame([b])

~/sandbox/pandas/pandas/core/frame.py in __init__(self, data, index, columns, dtype, copy)
    513                     mgr = arrays_to_mgr(arrays, columns, index, columns, dtype=dtype)
    514                 else:
--> 515                     mgr = init_ndarray(data, index, columns, dtype=dtype, copy=copy)
    516             else:
    517                 mgr = init_dict({}, index, columns, dtype=dtype)

~/sandbox/pandas/pandas/core/internals/construction.py in init_ndarray(values, index, columns, dtype, copy)
    188     # by definition an array here
    189     # the dtypes will be coerced to a single dtype
--> 190     values = _prep_ndarray(values, copy=copy)
    191
    192     if dtype is not None:

~/sandbox/pandas/pandas/core/internals/construction.py in _prep_ndarray(values, copy)
    322         values = values.reshape((values.shape[0], 1))
    323     elif values.ndim != 2:
--> 324         raise ValueError(f"Must pass 2-d input. shape={values.shape}")
    325
    326     return values

ValueError: Must pass 2-d input. shape=(1, 0, 0)
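
A version-independent way to check which of the two behaviors an environment has is sketched below. It assumes only that NumPy and pandas are importable; the variable names are illustrative.

```python
import numpy as np
import pandas as pd

frame = pd.DataFrame()

# Per the sessions above: shape (1, 0) on NumPy 1.18.5, (1, 0, 0) on newer NumPy.
stacked = np.array([frame])
print(np.__version__, stacked.shape)

try:
    pd.DataFrame([frame])
except ValueError as exc:
    outcome = f"ValueError: {exc}"
else:
    outcome = "constructed"

print(outcome)
```

If `stacked` comes back 3-D, the constructor call is expected to end in the `ValueError` branch, matching the "New behavior" session.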

TomAugspurger added a commit to TomAugspurger/pandas that referenced this issue Jun 25, 2020
TomAugspurger added a commit that referenced this issue Jun 30, 2020
* DOC/TST: DataFrame constructor with a list of DataFrames

Closes #32289
raspbian-autopush pushed a commit to raspbian-packages/pandas that referenced this issue Jul 2, 2020
Creating a nested DataFrame (which was already not recommended)
via the constructor no longer works.
Give a clearer error and xfail the tests.

Author: Rebecca N. Palmer <[email protected]>
Bug: pandas-dev/pandas#32289
Forwarded: no


Gbp-Pq: Name numpy119_compat.patch
raspbian-autopush pushed a commit to raspbian-packages/pandas that referenced this issue Aug 20, 2020
Creating a nested DataFrame (which was already not recommended)
via the constructor no longer works.
Give a clearer error and xfail the tests.

Author: Rebecca N. Palmer <[email protected]>
Bug: pandas-dev/pandas#32289
Forwarded: no


Gbp-Pq: Name numpy119_compat.patch
raspbian-autopush pushed a commit to raspbian-packages/pandas that referenced this issue Aug 31, 2020
Creating a nested DataFrame (which was already not recommended)
via the constructor no longer works.
Give a clearer error and xfail the tests.

Author: Rebecca N. Palmer <[email protected]>
Bug: pandas-dev/pandas#32289
Bug-Debian: https://bugs.debian.org/963817
Forwarded: no


Gbp-Pq: Name numpy119_compat.patch

5 participants