Pandas MultiIndex causes out of memory error #36074
Comments
Can you give a copy/paste-able example (i.e. one that doesn't require downloading a zip file)? See https://matthewrocklin.com/blog/work/2018/02/28/minimal-bug-reports
Sure, I just updated it.
From your data:
So the MultiIndex built by from_product needs almost 4 million rows. Alternatively:
There are only 69710 unique tuples in the MultiIndex derived from your data. I think what you are trying to achieve is doable, but I would ask on StackOverflow; GitHub issues are not an ideal place for this.
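To make the size difference concrete, here is a minimal sketch on a small synthetic table (hypothetical values, not the reporter's dataset). It compares the length of the full Cartesian product that from_product builds with the number of (user_id, date) pairs that actually occur. The column names follow the issue; everything else is assumed for illustration.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the ratings table (hypothetical values).
df = pd.DataFrame({
    "user_id": [1, 1, 2, 3, 3, 3],
    "date": ["2005-01-01", "2005-01-02", "2005-01-01",
             "2005-01-03", "2005-01-04", "2005-01-05"],
    "item_id": [10, 11, 10, 12, 10, 11],
    "rating": [5, 3, 4, 2, 5, 1],
})

# Full Cartesian product: every user paired with every date.
full = pd.MultiIndex.from_product(
    [np.unique(df["user_id"]), np.unique(df["date"])]
)

# Only the (user_id, date) pairs that actually occur in the data.
observed = pd.MultiIndex.from_frame(df[["user_id", "date"]].drop_duplicates())

print(len(full))      # n_users * n_dates
print(len(observed))  # number of distinct observed pairs
```

Reindexing a pivot table against the full product allocates one row per product entry, however few pairs are actually observed.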
This is the right answer. Closing.
I use a MultiIndex in my code, and it causes an out-of-memory error.
import pandas as pd
import numpy as np
import io
import requests

url = "https://raw.githubusercontent.com/mahsa-ebrahimian/netflix_project/master/netflix_sample_complete.csv"
movie_db = pd.read_csv(url, error_bad_lines=False)
del movie_db['Unnamed: 0']

iix_n = pd.MultiIndex.from_product([np.unique(movie_db.user_id), np.unique(movie_db.date)])
arr = (
    movie_db.pivot_table('rating', ['user_id', 'date'], 'item_id', aggfunc='sum')
    .reindex(iix_n, copy=False)
    .to_numpy()
    .reshape(movie_db.user_id.nunique(), movie_db.date.nunique(), -1)
)
Any performance tip or alternative way to reshape my data into the desired 3D form would be appreciated.
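One possible approach, sketched here on synthetic data (hypothetical values) and not confirmed anywhere in this thread, is to skip the pivot and reindex steps and write the summed ratings straight into a preallocated array using integer codes from pd.factorize. The names summed, u_codes, d_codes and i_codes are illustrative. The dense result still has n_users * n_dates * n_items cells, so whether it fits in memory depends on the data; float32 is an assumption made here to halve the footprint relative to float64.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the ratings table (hypothetical values).
df = pd.DataFrame({
    "user_id": [1, 1, 2, 3, 3, 3],
    "date": ["2005-01-01", "2005-01-02", "2005-01-01",
             "2005-01-03", "2005-01-04", "2005-01-05"],
    "item_id": [10, 11, 10, 12, 10, 11],
    "rating": [5, 3, 4, 2, 5, 1],
})

# Sum duplicate (user, date, item) entries, mirroring aggfunc='sum'.
summed = df.groupby(["user_id", "date", "item_id"], as_index=False)["rating"].sum()

# Integer position of each key along its axis; sort=True gives sorted uniques.
u_codes, users = pd.factorize(summed["user_id"], sort=True)
d_codes, dates = pd.factorize(summed["date"], sort=True)
i_codes, items = pd.factorize(summed["item_id"], sort=True)

# Preallocate the dense cube with NaN for unobserved cells, then fill in place.
arr = np.full((len(users), len(dates), len(items)), np.nan, dtype=np.float32)
arr[u_codes, d_codes, i_codes] = summed["rating"].to_numpy()

print(arr.shape)
```

If most cells are empty, keeping the data in long form or in a sparse structure may be a better fit than a dense cube.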