-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
BUG: to_excel() cuts off list of values when creating Excel file #58849
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Comments
Thanks for the report! In the example you provided, |
Excel has a hard limit on the size of the contents in a single cell: up to 32,767 characters: https://github.com/pandas-dev/pandas/blob/main/pandas/io/excel/_base.py#L1360 The example provided actually works, because the length of In that case, Excel will cut off the list right at the limit, and it seems @josemariagarcia95 suggests that the Pandas Excel writer should gracefully truncate the written data to fit the maximum Excel cell size. However, it seems like a pathological use case of the Excel writer, because it's not normal to try to store an entire large data series in a single Excel cell; the point of Excel is to save structured data, so the more reasonable way to use it would be to write the series as a proper Excel column, by replacing the line: b.loc[0] = [a] with b['a'] = a which works perfectly. |
Can be closed as a duplicate of #56954 This is a known Excel limitation so round-tripping is not possible here. This is going to warn in pandas 3.0 (see the linked PR of that issue). |
I wasn't aware of the Excel limitation, sorry for the disturbance and thank you all for the help! @chaoyihu that structure comes from a DB in my company where each row holds information about a physical signal and the column "Value" contains a huge array with all the values that were read for that signal during a short period of time (I think it's a couple of seconds but it's a high-frequency signal). I was hoping to export it to Excel so non-technies colleagues could open certain snapshots I provide them with using Excel (since the alternative was to handle them really big CSV files that they would open with a regular Notebook app and that would traumatize them instantly). Again, sorry for the misunderstanding. Thanks! |
Pandas version checks
I have checked that this issue has not already been reported.
I have confirmed this bug exists on the latest version of pandas.
I have confirmed this bug exists on the main branch of pandas.
Reproducible Example
Issue Description
When trying to save a Dataframe which contains a column whose values are LARGE lists of values (each cell of the column hold a list of 16.384 elements) as an Excel with the
to_excel()
method, pandas cuts off each list of the column and save it as a 32766 character-long string. Since each list is cut off, they don't end in a square bracket. When closing them and casting them as list, they hold just 1425 elements.Expected Behavior
The resulting Excel column should hold as much values as the original Dataframe from which it has been produced.
Installed Versions
C:\Program Files\WindowsApps\PythonSoftwareFoundation.Python.3.11_3.11.2544.0_x64__qbz5n2kfra8p0\Lib\site-packages_distutils_hack_init_.py:33: UserWarning: Setuptools is replacing distutils.
warnings.warn("Setuptools is replacing distutils.")
INSTALLED VERSIONS
commit : d9cdd2e
python : 3.11.9.final.0
python-bits : 64
OS : Windows
OS-release : 10
Version : 10.0.22631
machine : AMD64
processor : Intel64 Family 6 Model 140 Stepping 1, GenuineIntel
byteorder : little
LC_ALL : None
LANG : None
LOCALE : es_ES.cp1252
pandas : 2.2.2
numpy : 1.24.4
pytz : 2023.3
dateutil : 2.8.2
setuptools : 65.5.0
pip : 23.2.1
Cython : 0.29.36
pytest : None
hypothesis : None
sphinx : 6.2.1
blosc : None
feather : None
xlsxwriter : None
lxml.etree : 5.1.0
html5lib : None
pymysql : 1.0.3
psycopg2 : None
jinja2 : 3.1.2
IPython : 8.14.0
pandas_datareader : None
adbc-driver-postgresql: None
adbc-driver-sqlite : None
bs4 : 4.12.2
bottleneck : None
dataframe-api-compat : None
fastparquet : None
fsspec : 2023.5.0
gcsfs : None
matplotlib : 3.7.1
numba : 0.57.1
numexpr : 2.8.4
odfpy : None
openpyxl : 3.1.2
pandas_gbq : None
pyarrow : 12.0.1
pyreadstat : None
python-calamine : None
pyxlsb : None
s3fs : None
scipy : 1.11.3
sqlalchemy : None
tables : None
tabulate : None
xarray : None
xlrd : 2.0.1
zstandard : None
tzdata : 2023.3
qtpy : None
pyqt5 : None
The text was updated successfully, but these errors were encountered: