Lambda: Unable to allocate array with shape (133271, 319) and data type object: MemoryError #29596

Closed
dcostelloe2019 opened this issue Nov 13, 2019 · 5 comments
Labels
Needs Info Clarification about behavior needed to assess issue

Comments

@dcostelloe2019
Code Sample, a copy-pastable example if possible

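import io
import pandas as pd

# s3 is a boto3 S3 client; bucket, file_name, use_cols and col_Names are defined elsewhere
obj = s3.get_object(Bucket=bucket, Key=file_name)
body = obj['Body'].read()
df = pd.read_excel(io.BytesIO(body), encoding='utf-8', sheet_name='Sheet1', skiprows=4, usecols=use_cols, header=None, names=col_Names)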

Problem description

pd.read_excel crashes while reading the file. The Excel sheet has about 300 columns and 130,000 records; I reduced the columns to 40 by defining a column filter (usecols).

Error Reported:
Unable to allocate array with shape (133271, 319) and data type object: MemoryError
Traceback (most recent call last):
File "/var/task/lambda_function.py", line 49, in lambda_handler
df = pd.read_excel(io.BytesIO(body),encoding='utf-8',sheet_name='Products',skiprows=4,usecols=use_cols,header=None,names=col_Names,)
File "/opt/python/lib/python3.6/site-packages/pandas/util/_decorators.py", line 208, in wrapper
return func(*args, **kwargs)
File "/opt/python/lib/python3.6/site-packages/pandas/io/excel/_base.py", line 340, in read_excel
**kwds
File "/opt/python/lib/python3.6/site-packages/pandas/io/excel/_base.py", line 883, in parse
**kwds
File "/opt/python/lib/python3.6/site-packages/pandas/io/excel/_base.py", line 516, in parse
output[asheetname] = parser.read(nrows=nrows)
File "/opt/python/lib/python3.6/site-packages/pandas/io/parsers.py", line 1154, in read
ret = self._engine.read(nrows)
File "/opt/python/lib/python3.6/site-packages/pandas/io/parsers.py", line 2493, in read
alldata = self._rows_to_cols(content)
File "/opt/python/lib/python3.6/site-packages/pandas/io/parsers.py", line 3160, in _rows_to_cols
zipped_content = list(lib.to_object_array(content, min_width=col_len).T)
File "pandas/_libs/lib.pyx", line 2279, in pandas._libs.lib.to_object_array
MemoryError: Unable to allocate array with shape (133271, 319) and data type object

Expected Output

Output of pd.show_versions()

[paste the output of pd.show_versions() here below this line]
pandas 0.25.3

@jbrockmendel
Member

Does decreasing the number of rows/columns read solve the problem? If so, what is the minimal size at which the problem occurs?
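
One way to answer the "minimal size" question is to bisect on the number of rows read. The sketch below is only an illustration: `smallest_failing_size` and `try_read` are hypothetical names, and it assumes that a read which fails at some row count also fails at every larger one. In practice `try_read` could wrap the `pd.read_excel` call from the report with an added `nrows=n` argument.

```python
def smallest_failing_size(try_read, low, high):
    """Return the smallest n in (low, high] for which try_read(n) raises MemoryError.

    Assumes try_read(low) succeeds, try_read(high) fails, and failures are
    monotonic in n (hypothetical helper, not part of pandas).
    """
    while high - low > 1:
        mid = (low + high) // 2
        try:
            try_read(mid)
        except MemoryError:
            high = mid   # mid rows already fail, so the threshold is at or below mid
        else:
            low = mid    # mid rows still fit, so the threshold is above mid
    return high


# Possible use against the workbook from the report (not executed here):
# smallest_failing_size(
#     lambda n: pd.read_excel(io.BytesIO(body), sheet_name='Sheet1',
#                             skiprows=4, header=None, nrows=n),
#     0, 133271)
```

Each probe re-reads the workbook, so this takes roughly 17 reads for a sheet of this size.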

@TomAugspurger TomAugspurger added the Needs Info Clarification about behavior needed to assess issue label Nov 14, 2019
@dcostelloe2019
Author

Decreasing the number of columns made no difference. I'm not sure how I would get the number of processed rows?

@Liam3851
Contributor

@dcostelloe2019 If you're running on AWS Lambda are you allocating enough memory to the process? The default memory allocation on Lambda is just 128 MB, and an array that size would be about 340 MB (133271 * 319 * 8 / 1e6). My guess is you probably need to configure your lambda function with more RAM (try running it locally to make sure).
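
The estimate above can be reproduced with a few lines. This is a rough sketch that assumes 8-byte pointers (a 64-bit Python build) and counts only the object array itself; the Python objects the pointers refer to take additional memory, so actual usage would be higher. `object_array_bytes` is a hypothetical helper name.

```python
def object_array_bytes(rows, cols, pointer_size=8):
    """Bytes needed for an object-dtype array, which stores one pointer per cell."""
    return rows * cols * pointer_size


# Shape taken from the MemoryError message in the report
rows, cols = 133271, 319
estimate = object_array_bytes(rows, cols)
print(f"{estimate / 1e6:.0f} MB")  # prints "340 MB"
```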

@dcostelloe2019
Author

Thanks @Liam3851. I gave it the maximum memory available, 3008 MB.
I split the workbook into two separate Excel workbooks:

  1. 60711 Rows
  2. 72567 Rows

Both processed successfully.

Still the same error with the full-sized workbook :-(

@simonjayhawkins
Member

@dcostelloe2019 Thanks for the report. Closing, as this doesn't look actionable. We would need a minimal reproducible example to help debug the issue; see https://matthewrocklin.com/blog/work/2018/02/28/minimal-bug-reports. Ping to reopen.