High Memory Usage when Concatenating Partially Indexed Series to Multiindexed Dataframe #20803
Labels: Indexing, MultiIndex, Performance
Code Sample, a copy-pastable example if possible
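A scaled-down sketch along the lines of what I'm doing (the sizes, the index shape, and the `psutil`-based memory readout here are illustrative stand-ins, not my real data):

```python
import os

import numpy as np
import pandas as pd
import psutil


def mem_mb():
    """Resident memory of this process, in MB."""
    return psutil.Process(os.getpid()).memory_info().rss / 1e6


# Scaled-down stand-in for the real ~100M-row frame.
n = 10_000_000
idx = pd.MultiIndex.from_arrays(
    [np.arange(n) // 100, np.arange(n) % 100], names=["a", "b"]
)
df = pd.DataFrame({"x": np.random.randn(n)}, index=idx)

# A Series whose index is a subset of df's index.
s = pd.Series(np.random.randn(n // 2), index=idx[: n // 2], name="new")

print(f"start:               {mem_mb():7.0f} MB")
df["scalar"] = "hi"  # adding a scalar column: no problem
print(f"after scalar column: {mem_mb():7.0f} MB")
df["new"] = s  # memory usage roughly doubles on this operation
print(f"after series column: {mem_mb():7.0f} MB")
```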
Problem description
I have a big DataFrame `df` (~100M rows) with a MultiIndex, and I want to concat a Series `s` to it as a new column. This Series' index is a subset of `df`'s index. If I simply do `df['new'] = s` or `df = pd.concat([df, s], axis=1)`, the operation takes up a massive amount of memory (as much as `df` itself) and crashes my computer. Adding a scalar column with `df['new'] = "hi"` doesn't cause this problem. See the code sample above: memory usage doubles on the last operation, even though very little new data is actually being added. (There is a small sketch of the alignment semantics below.)

Expected Output
n/a
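As background on the alignment involved: column assignment aligns the Series to `df`'s index, so rows that `s` does not cover become NaN and the new column is always full length, even for a partially indexed Series. A tiny sketch of that documented pandas behavior (the toy index and names here are illustrative):

```python
import numpy as np
import pandas as pd

idx = pd.MultiIndex.from_product([range(3), list("ab")], names=["i", "j"])
df = pd.DataFrame({"x": np.arange(6)}, index=idx)
s = pd.Series([10.0, 20.0], index=idx[:2])  # covers only part of df's index

df["new"] = s  # aligned to df.index; uncovered rows become NaN

# The assignment behaves like an explicit reindex of s onto df's index:
assert df["new"].equals(s.reindex(df.index))
```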
Output of `pd.show_versions()`
INSTALLED VERSIONS
commit: None
python: 3.6.3.final.0
python-bits: 64
OS: Linux
OS-release: 4.13.0-16-lowlatency
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8
pandas: 0.22.0
pytest: None
pip: 9.0.3
setuptools: 39.0.1
Cython: 0.28.1
numpy: 1.14.2
scipy: 1.0.1
pyarrow: None
xarray: None
IPython: 6.2.1
sphinx: None
patsy: 0.5.0
dateutil: 2.7.2
pytz: 2018.4
blosc: None
bottleneck: 1.2.1
tables: 3.4.2
numexpr: 2.6.4
feather: None
matplotlib: 2.1.2
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: 1.0.1
sqlalchemy: 1.2.6
pymysql: None
psycopg2: None
jinja2: 2.10
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: 0.6.0+16.g1763dbd