Memory leak in Dataframe.memory_usage #29411
Comments
I can confirm that there is indeed an increased memory use after running this script. Does this happen for you if you pass in different values to [...]?
Yes. I've tested several cases, and all of them show a significant increase in memory use.
I don't think what we are seeing is a "memory leak"; it is just the overhead of creating Series objects, which are kept alive because of the [...]. To illustrate this: first, if you adapt the script to have columns of 10,000 elements instead of 10 elements (and create 100 dataframes of 20 columns instead of 1000 of 50, to keep memory somewhat limited), you still see the memory usage increase after calling [...]
I don't know how accurate this estimate of the size of the Series object is, but it is at least much bigger than the actual data in the case of 10 rows (and based on this estimate, it is in any case many MBs for the example script). For this example case of many wide dataframes (n_cols > n_rows), this Series creation overhead starts to count. But if you have the (more typical) use case of more rows than columns, you will often not notice this overhead. That said, it should still be relatively easy to fix this for [...]
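To make the comparison above concrete, here is a minimal sketch (not the commenter's original estimate) that compares the raw data size of a 10-row int64 column with the approximate allocation per Series object. The helper names `data_bytes_per_column` and `approx_bytes_per_series` are hypothetical, and the use of `tracemalloc` is an assumption about how one might measure this. The exact overhead depends on the pandas and Python versions.

```python
import tracemalloc

import numpy as np
import pandas as pd

N_ROWS = 10       # rows per column, as in the example discussed above
N_SERIES = 1000   # number of Series objects to average over (arbitrary)


def data_bytes_per_column(n_rows=N_ROWS):
    """Bytes of actual data in one int64 column, as reported by pandas."""
    df = pd.DataFrame({"a": np.zeros(n_rows, dtype="int64")})
    return int(df.memory_usage(index=False)["a"])


def approx_bytes_per_series(n_rows=N_ROWS, n_series=N_SERIES):
    """Rough average of traced allocations per Series kept alive."""
    values = np.zeros(n_rows, dtype="int64")
    tracemalloc.start()
    before, _ = tracemalloc.get_traced_memory()
    # Keep every Series referenced so none is freed during the measurement.
    kept = [pd.Series(values) for _ in range(n_series)]
    after, _ = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    assert len(kept) == n_series
    return (after - before) / n_series


if __name__ == "__main__":
    print("data bytes per column:", data_bytes_per_column())
    print("approx. bytes per Series:", approx_bytes_per_series())
```

With few rows, the second number is expected to dominate the first, which is why the overhead only becomes visible for many wide, short dataframes.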
Code Sample, a copy-pastable example if possible
Problem description
DataFrame's memory_usage function appears to leak memory. Memory usage after executing 'memory_usage' should be the same as before the call.
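The reporter's original script is not reproduced in this text. As a rough sketch of how such a check could look, the following measures traced allocations before and after calling `memory_usage` on many small, wide dataframes. The function name `growth_after_memory_usage`, the default sizes, and the use of `tracemalloc` are all assumptions made for illustration. The measured growth will vary with the pandas version and may be close to zero on releases where the behaviour was changed.

```python
import gc
import tracemalloc

import numpy as np
import pandas as pd


def growth_after_memory_usage(n_frames=100, n_cols=50, n_rows=10):
    """Return the change in traced memory (bytes) caused by memory_usage calls."""
    # Build the dataframes first so their own allocation is not counted.
    frames = [
        pd.DataFrame(np.zeros((n_rows, n_cols)))
        for _ in range(n_frames)
    ]
    gc.collect()

    tracemalloc.start()
    before, _ = tracemalloc.get_traced_memory()
    for df in frames:
        # The result is discarded; only memory retained afterwards is measured.
        df.memory_usage(index=True, deep=True)
    gc.collect()
    after, _ = tracemalloc.get_traced_memory()
    tracemalloc.stop()

    return after - before


if __name__ == "__main__":
    print("growth in bytes:", growth_after_memory_usage())
```

If the growth scales with the number of columns rather than with the number of rows, that points to retained per-column objects rather than copied data.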
Expected Output
None
Output of pd.show_versions()
INSTALLED VERSIONS
commit: None
python: 2.7.16.final.0
python-bits: 64
OS: Darwin
OS-release: 19.0.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: zh_CN.UTF-8
LOCALE: None.None
pandas: 0.24.2
pytest: None
pip: 19.3.1
setuptools: 19.6.1
Cython: 0.29.13
numpy: 1.16.5
scipy: None
pyarrow: None
xarray: None
IPython: None
sphinx: None
patsy: None
dateutil: 2.8.1
pytz: 2019.3
blosc: None
bottleneck: None
tables: None
numexpr: None
feather: None
matplotlib: None
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml.etree: None
bs4: None
html5lib: None
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: None
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None
gcsfs: None