Lambda: Unable to allocate array with shape (133271, 319) and data type object: MemoryError #29596

dcostelloe2019 · 2019-11-13T13:08:11Z

Code Sample, a copy-pastable example if possible

# Your code here
obj=s3.get_object(Bucket=bucket,Key=file_name)
 body = obj['Body'].read()
 df = pd.read_excel(io.BytesIO(body),encoding='utf-8',sheet_name='Sheet1',skiprows=4,usecols=use_cols,header=None,names=col_Names,)

Problem description

Excel read file crashes when reading. Specific excel sheet has 300 columns and 130,000 records, reduced columns to 40 by defining filter.

Error Reported:
Unable to allocate array with shape (133271, 319) and data type object: MemoryError
Traceback (most recent call last):
File "/var/task/lambda_function.py", line 49, in lambda_handler
df = pd.read_excel(io.BytesIO(body),encoding='utf-8',sheet_name='Products',skiprows=4,usecols=use_cols,header=None,names=col_Names,)
File "/opt/python/lib/python3.6/site-packages/pandas/util/_decorators.py", line 208, in wrapper
return func(*args, **kwargs)
File "/opt/python/lib/python3.6/site-packages/pandas/io/excel/_base.py", line 340, in read_excel
**kwds
File "/opt/python/lib/python3.6/site-packages/pandas/io/excel/_base.py", line 883, in parse
**kwds
File "/opt/python/lib/python3.6/site-packages/pandas/io/excel/_base.py", line 516, in parse
output[asheetname] = parser.read(nrows=nrows)
File "/opt/python/lib/python3.6/site-packages/pandas/io/parsers.py", line 1154, in read
ret = self._engine.read(nrows)
File "/opt/python/lib/python3.6/site-packages/pandas/io/parsers.py", line 2493, in read
alldata = self._rows_to_cols(content)
File "/opt/python/lib/python3.6/site-packages/pandas/io/parsers.py", line 3160, in _rows_to_cols
zipped_content = list(lib.to_object_array(content, min_width=col_len).T)
File "pandas/_libs/lib.pyx", line 2279, in pandas._libs.lib.to_object_array
MemoryError: Unable to allocate array with shape (133271, 319) and data type object

If the issue has not been resolved there, go ahead and file it in the issue tracker.

Expected Output

Output of `pd.show_versions()`

[paste the output of pd.show_versions() here below this line]
pandas 0.25.3

The text was updated successfully, but these errors were encountered:

jbrockmendel · 2019-11-13T19:50:21Z

Does decreasing the number of rows/columns read solve the problem? If so, what is the minimal size at which the problem occurs?

dcostelloe2019 · 2019-11-14T15:31:21Z

No change in decreased column, not sure how would I get the number of processed rows?

Liam3851 · 2019-11-14T19:23:25Z

@dcostelloe2019 If you're running on AWS Lambda are you allocating enough memory to the process? The default memory allocation on Lambda is just 128 MB, and an array that size would be about 340 MB (133271 * 319 * 8 / 1e6). My guess is you probably need to configure your lambda function with more RAM (try running it locally to make sure).

dcostelloe2019 · 2019-11-16T16:56:37Z

Thanks @Liam3851: I gave it the max memory available 3008 MB
I split the workbook into two separate Excel workbooks:

60711 Rows
72567 Rows
Both processed successfully

Still same error with full sized workbook :-(

simonjayhawkins · 2020-04-01T13:47:50Z

@dcostelloe2019 Thanks for the report. closing as this doesn't look actionable. We would need a minimal reproducible example to help debug the issue. see https://matthewrocklin.com/blog/work/2018/02/28/minimal-bug-reports ping to reopen

TomAugspurger added the Needs Info Clarification about behavior needed to assess issue label Nov 14, 2019

simonjayhawkins closed this as completed Apr 1, 2020

impredicative mentioned this issue Feb 7, 2021

BUG: MemoryError: Unable to allocate #39629

Closed

3 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Lambda: Unable to allocate array with shape (133271, 319) and data type object: MemoryError #29596

Lambda: Unable to allocate array with shape (133271, 319) and data type object: MemoryError #29596

dcostelloe2019 commented Nov 13, 2019

jbrockmendel commented Nov 13, 2019

dcostelloe2019 commented Nov 14, 2019

Liam3851 commented Nov 14, 2019

dcostelloe2019 commented Nov 16, 2019

simonjayhawkins commented Apr 1, 2020

Lambda: Unable to allocate array with shape (133271, 319) and data type object: MemoryError #29596

Lambda: Unable to allocate array with shape (133271, 319) and data type object: MemoryError #29596

Comments

dcostelloe2019 commented Nov 13, 2019

Code Sample, a copy-pastable example if possible

Problem description

Expected Output

Output of pd.show_versions()

jbrockmendel commented Nov 13, 2019

dcostelloe2019 commented Nov 14, 2019

Liam3851 commented Nov 14, 2019

dcostelloe2019 commented Nov 16, 2019

simonjayhawkins commented Apr 1, 2020

Output of `pd.show_versions()`