BUG: inconsistent behavior and crash in DataFrame.__setitem__ when >=3d ndarray is used #53366
Comments
I had the same issue; is there anything new on this? Also, note that it only happens when an ndarray is assigned:
df = pd.DataFrame({"a": np.zeros(4)})
df["b"] = np.zeros((4, 2, 3))
crashes as reported, while
df = pd.DataFrame({"a": np.zeros(4)})
df["b"] = np.zeros((4, 2, 3)).tolist()
works, and the DataFrame is:
In [2]: df
Out[2]:
a b
0 0.0 [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
1 0.0 [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
2 0.0 [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
3 0.0 [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
which is what I wanted to achieve before I ran into this issue. Finally, the constructor detects the problem and fails with a more understandable error:
pd.DataFrame({"a": np.zeros(4), "b": np.zeros((4, 2, 3))})
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
Cell In[3], line 1
----> 1 pd.DataFrame({"a": np.zeros(4), "b": np.zeros((4, 2, 3))})
File ~/.virtualenvs/AxonSynthesis/lib/python3.10/site-packages/pandas/core/frame.py:664, in DataFrame.__init__(self, data, index, columns, dtype, copy)
658 mgr = self._init_mgr(
659 data, axes={"index": index, "columns": columns}, dtype=dtype, copy=copy
660 )
662 elif isinstance(data, dict):
663 # GH#38939 de facto copy defaults to False only in non-dict cases
--> 664 mgr = dict_to_mgr(data, index, columns, dtype=dtype, copy=copy, typ=manager)
665 elif isinstance(data, ma.MaskedArray):
666 import numpy.ma.mrecords as mrecords
File ~/.virtualenvs/AxonSynthesis/lib/python3.10/site-packages/pandas/core/internals/construction.py:493, in dict_to_mgr(data, index, columns, dtype, typ, copy)
489 else:
490 # dtype check to exclude e.g. range objects, scalars
491 arrays = [x.copy() if hasattr(x, "dtype") else x for x in arrays]
--> 493 return arrays_to_mgr(arrays, columns, index, dtype=dtype, typ=typ, consolidate=copy)
File ~/.virtualenvs/AxonSynthesis/lib/python3.10/site-packages/pandas/core/internals/construction.py:118, in arrays_to_mgr(arrays, columns, index, dtype, verify_integrity, typ, consolidate)
115 if verify_integrity:
116 # figure out the index, if necessary
117 if index is None:
--> 118 index = _extract_index(arrays)
119 else:
120 index = ensure_index(index)
File ~/.virtualenvs/AxonSynthesis/lib/python3.10/site-packages/pandas/core/internals/construction.py:653, in _extract_index(data)
651 raw_lengths.append(len(val))
652 elif isinstance(val, np.ndarray) and val.ndim > 1:
--> 653 raise ValueError("Per-column arrays must each be 1-dimensional")
655 if not indexes and not raw_lengths:
656 raise ValueError("If using all scalar values, you must pass an index")
ValueError: Per-column arrays must each be 1-dimensional
(and again, casting to a list also works properly in this case)
EDIT: I can reproduce this issue with
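For readers who want to try the workaround described in this comment, here is a minimal sketch. It assumes only the behaviour reported above (a nested list is stored as one object per row, and the constructor rejects the raw ndarray); the variable names `arr`, `restored` and `constructor_error` are illustrative.

```python
import numpy as np
import pandas as pd

arr = np.zeros((4, 2, 3))
df = pd.DataFrame({"a": np.zeros(4)})

# Workaround from the comment above: a nested list gives one object per row.
df["b"] = arr.tolist()

# Each cell now holds a 2x3 nested list. When the 3-D array is needed again,
# it can be rebuilt from the column.
restored = np.array(df["b"].tolist())

# The constructor rejects the raw ndarray up front, as in the traceback above.
try:
    pd.DataFrame({"a": np.zeros(4), "b": arr})
    constructor_error = None
except ValueError as exc:
    constructor_error = str(exc)
```

Note that the resulting column has object dtype, so vectorised numeric operations on it are not available without converting back to an ndarray first.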
The last code snippet I didn't dig into how pandas deals with memory when PR #53367 was opened, but managed to block invalid
I have started working with CryoSat-2 data, where the RADAR waveform is 3D. While I can convert it with ndarray.tolist, it would be nice if there were some way to use non-1-dimensional data within a DataFrame. For reference, here is a trivial script to replicate. The data is available from ESA: https://earth.esa.int/eogateway/missions/cryosat/data
import pandas as pd
fname = "CS_OFFL_SIR_SAR_1B_20220302T004121_20220302T004737_E001.nc"
This gives the error: "ValueError: Per-column arrays must each be 1-dimensional"
If anyone knows of any tricks to get pandas to work with multidimensional data without casting to a list, please let me know.
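One possible trick, offered as a sketch rather than an endorsed pandas pattern: flatten the trailing dimensions into columns and label them with a MultiIndex, so every column stays 1-dimensional. The array below is a stand-in (the real CryoSat-2 shapes are not assumed), and the level names `burst` and `sample` are hypothetical.

```python
import numpy as np
import pandas as pd

# Stand-in for a (records, d1, d2) waveform array.
waveform = np.arange(4 * 2 * 3, dtype=float).reshape(4, 2, 3)
n_rows, d1, d2 = waveform.shape

# One column per (d1, d2) position; from_product follows the same row-major
# order that reshape uses, so labels and values line up.
columns = pd.MultiIndex.from_product(
    [range(d1), range(d2)], names=["burst", "sample"]
)
wide = pd.DataFrame(waveform.reshape(n_rows, d1 * d2), columns=columns)

# A single position across all records is an ordinary numeric Series.
one_series = wide[(1, 2)]

# The original 3-D array can be recovered by reshaping the values.
restored = wide.to_numpy().reshape(n_rows, d1, d2)
```

This keeps a numeric dtype at the cost of a wide frame; whether that trade-off suits a given dataset depends on the sizes of the trailing dimensions.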
Pandas version checks
I have checked that this issue has not already been reported.
I have confirmed this bug exists on the latest version of pandas.
I have confirmed this bug exists on the main branch of pandas.
Reproducible Example
Issue Description
Input array dimension is only checked in BlockManager.insert.
pandas/pandas/core/internals/managers.py
Lines 1404 to 1409 in f5a5c8d
This issue shares the same cause as #51925.
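To illustrate the kind of up-front dimension check discussed above, here is a user-side sketch. `set_column_checked` is a hypothetical helper, not part of pandas and not the change proposed in any PR; it simply refuses ndarrays with more than one dimension before delegating to the normal assignment.

```python
import numpy as np
import pandas as pd


def set_column_checked(df, key, values):
    """Hypothetical helper: reject multi-dimensional ndarrays before assigning."""
    if isinstance(values, np.ndarray) and values.ndim > 1:
        raise ValueError(
            f"Cannot set column {key!r} from a {values.ndim}-dimensional array"
        )
    df[key] = values
    return df


df = pd.DataFrame({"a": np.zeros(4)})
set_column_checked(df, "b", np.ones(4))  # 1-D input is assigned as usual

try:
    set_column_checked(df, "c", np.zeros((4, 2, 3)))
    rejected = False
except ValueError:
    rejected = True  # the frame is left untouched
```

The check runs the same way whether or not the key already exists, which is the consistency the issue asks for; it is deliberately stricter than pandas may need to be.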
Expected Behavior
ValueError should be raised for both situations.
Installed Versions
pandas : 2.1.0.dev0+828.gf5a5c8d7f0
numpy : 1.24.3
pytz : 2023.3
dateutil : 2.8.2
setuptools : 67.7.2
pip : 23.1.2
Cython : 0.29.33
pytest : 7.3.1
hypothesis : 6.75.3
sphinx : 6.2.1
blosc : None
feather : None
xlsxwriter : 3.1.1
lxml.etree : 4.9.2
html5lib : 1.1
pymysql : 1.0.3
psycopg2 : 2.9.3
jinja2 : 3.1.2
IPython : 8.13.2
pandas_datareader: None
bs4 : 4.12.2
bottleneck : 1.3.7
brotli :
fastparquet : 2023.4.0
fsspec : 2023.5.0
gcsfs : 2023.5.0
matplotlib : 3.7.1
numba : 0.57.0
numexpr : 2.8.4
odfpy : None
openpyxl : 3.1.2
pandas_gbq : None
pyarrow : 12.0.0
pyreadstat : 1.2.1
pyxlsb : 1.0.10
s3fs : 2023.5.0
scipy : 1.10.1
snappy :
sqlalchemy : 2.0.15
tables : 3.8.0
tabulate : 0.9.0
xarray : 2023.5.0
xlrd : 2.0.1
zstandard : 0.19.0
tzdata : 2023.3
qtpy : None
pyqt5 : None