
FeatureGroup().ingest() throws "OSError" - "Function not implemented" inside Lambda Function #2844


Closed
makennedy626 opened this issue Jan 11, 2022 · 8 comments
Labels
component: feature store Relates to the SageMaker Feature Store Platform type: bug

Comments

@makennedy626

Describe the bug
This error is thrown when running a Docker Lambda Function in the first step of my State Machine. Please see below for additional information.

To reproduce

  1. Create a Lambda Function that uses a Docker image configured as described under System information.
  2. In your function code, import FeatureGroup (from sagemaker.feature_store.feature_group import FeatureGroup) and call FeatureGroup().ingest().
  3. Test the function - I was able to reproduce the error in the Lambda Console via the Test tab.

Expected behavior
Feature Group should successfully ingest the data.

Screenshots or logs

{
  "error": "OSError",
  "cause": {
    "errorMessage": "[Errno 38] Function not implemented",
    "errorType": "OSError",
    "requestId": <REDACTED>,
    "stackTrace": [
      "  File \"/var/task/app.py\", line 298, in lambda_handler\n    master()\n",
      "  File \"/var/task/app.py\", line 268, in master\n    <REDACTED_FEATURE_GROUP_NAME>.ingest(data_frame=df2, max_workers=1, wait=True)\n",
      "  File \"/var/task/sagemaker/feature_store/feature_group.py\", line 627, in ingest\n    manager.run(data_frame=data_frame, wait=wait, timeout=timeout)\n",
      "  File \"/var/task/sagemaker/feature_store/feature_group.py\", line 371, in run\n    self._run_multi_process(data_frame=data_frame, wait=wait, timeout=timeout)\n",
      "  File \"/var/task/sagemaker/feature_store/feature_group.py\", line 297, in _run_multi_process\n    self._processing_pool = ProcessingPool(self.max_processes, init_worker)\n",
      "  File \"/var/task/pathos/multiprocessing.py\", line 111, in __init__\n    self._serve()\n",
      "  File \"/var/task/pathos/multiprocessing.py\", line 123, in _serve\n    _pool = Pool(nodes)\n",
      "  File \"/var/task/multiprocess/pool.py\", line 191, in __init__\n    self._setup_queues()\n",
      "  File \"/var/task/multiprocess/pool.py\", line 343, in _setup_queues\n    self._inqueue = self._ctx.SimpleQueue()\n",
      "  File \"/var/task/multiprocess/context.py\", line 113, in SimpleQueue\n    return SimpleQueue(ctx=self.get_context())\n",
      "  File \"/var/task/multiprocess/queues.py\", line 345, in __init__\n    self._rlock = ctx.Lock()\n",
      "  File \"/var/task/multiprocess/context.py\", line 68, in Lock\n    return Lock(ctx=self.get_context())\n",
      "  File \"/var/task/multiprocess/synchronize.py\", line 168, in __init__\n    SemLock.__init__(self, SEMAPHORE, 1, 1, ctx=ctx)\n",
      "  File \"/var/task/multiprocess/synchronize.py\", line 63, in __init__\n    sl = self._semlock = _multiprocessing.SemLock(\n"
    ]
  }
}
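
The trace ends in _multiprocessing.SemLock, which suggests the failure happens as soon as any multiprocessing lock or queue is created. A minimal sketch for probing this from inside the function, assuming only the standard library (semlock_available is a hypothetical helper name, not part of the SageMaker SDK):

```python
import multiprocessing


def semlock_available() -> bool:
    """Return True if this environment can create multiprocessing locks.

    multiprocessing.Lock() is backed by _multiprocessing.SemLock, the call
    at the bottom of the stack trace. Where semaphores are unsupported,
    creating the lock raises OSError (or ImportError on some platforms).
    """
    try:
        lock = multiprocessing.Lock()
    except (OSError, ImportError):
        return False
    # Acquire and release once to confirm the lock is usable.
    with lock:
        pass
    return True
```

If this returns False, anything built on multiprocessing pools or queues can be expected to fail the same way in that environment.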

System information
A description of your system. Please provide:

  • SageMaker Python SDK version: 2.70.0
  • Framework name (eg. PyTorch) or algorithm (eg. KMeans): N/A
  • Framework version: N/A
  • Python version: 3.9
  • CPU or GPU: CPU
  • Custom Docker image (Y/N): Y - public.ecr.aws/lambda/python:3.9

Additional context

  1. No issues running this locally in debugger (not in Docker container).
  2. I have seen similar _multiprocessing.SemLock errors in the kedro repository, where Lambda Functions were connected via a State Machine. As far as I know, they were unable to resolve or circumvent them, so a resolution to this issue might help many users of different packages.
@makennedy626
Author

I also ran this in a container with Python 3.7 and had the same result.

@DFuller134

Same error. I tried downgrading to Python 3.6 and still got the same [Errno 38] Function not implemented.

@makennedy626
Author

@DFuller134 in case this helps you:

I was using this in a Lambda Function as a step in my State Machine. To work around the issue, I used the arn:aws:states:::aws-sdk:sagemakerfeaturestoreruntime:putRecord resource instead of a Lambda Function. Of course, I had to move the logic that builds the records into a new Lambda Function that runs first and passes its formatted data to that resource.
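
For anyone who prefers to call PutRecord directly from Python rather than from the state machine, here is a rough sketch of the same idea. It assumes a client created with boto3.client("sagemaker-featurestore-runtime") is passed in; to_record and put_rows are hypothetical helper names, and converting every value with str() is an assumption that may not suit all feature types.

```python
def to_record(row):
    """Convert a {feature_name: value} dict into the PutRecord format.

    PutRecord expects a list of {"FeatureName", "ValueAsString"} items.
    None values are skipped here (an assumption: treat them as missing).
    """
    return [
        {"FeatureName": name, "ValueAsString": str(value)}
        for name, value in row.items()
        if value is not None
    ]


def put_rows(client, feature_group_name, rows):
    """Write each row with one PutRecord call, with no multiprocessing."""
    count = 0
    for row in rows:
        client.put_record(
            FeatureGroupName=feature_group_name,
            Record=to_record(row),
        )
        count += 1
    return count
```

With a DataFrame, the rows could come from df.to_dict(orient="records"). This is sequential, so it is slower than ingest() but does not touch SemLock.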

@DFuller134

DFuller134 commented Mar 2, 2022

Thanks @makennedy626. I used the same PutRecord approach. I figured the issue comes from Lambda's limited support for Python multiprocessing, since it does not provide shared memory for processes. So I ended up writing my own parallel Feature Store ingest function that uses pipes instead of queues. Very fast!
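
To illustrate the pipes-instead-of-queues idea in general terms, here is a minimal sketch using only multiprocessing.Process and multiprocessing.Pipe, which do not create a SemLock. It is not the code from this thread: pipe_map is a hypothetical name, and the "fork" start method is an assumption that holds on Linux (which Lambda uses) but not on Windows.

```python
import multiprocessing

# Assumption: a POSIX host where the "fork" start method is available.
_ctx = multiprocessing.get_context("fork")


def _worker(func, chunk, conn):
    """Apply func to every item in chunk and send the results back."""
    try:
        conn.send([func(item) for item in chunk])
    finally:
        conn.close()


def pipe_map(func, items, n_procs=2):
    """Order-preserving parallel map built on Process + Pipe only."""
    items = list(items)
    if not items:
        return []
    n_procs = max(1, min(n_procs, len(items)))
    size = -(-len(items) // n_procs)  # ceiling division
    chunks = [items[i:i + size] for i in range(0, len(items), size)]

    jobs = []
    for chunk in chunks:
        # duplex=False: first end only receives, second end only sends.
        parent_conn, child_conn = _ctx.Pipe(duplex=False)
        proc = _ctx.Process(target=_worker, args=(func, chunk, child_conn))
        proc.start()
        # Close the parent's copy so recv() sees EOF if the child dies.
        child_conn.close()
        jobs.append((proc, parent_conn))

    results = []
    for proc, parent_conn in jobs:
        # Receive before join() so a large payload cannot block the child.
        results.extend(parent_conn.recv())
        parent_conn.close()
        proc.join()
    return results
```

If func raises inside a worker, that worker exits without sending, and the parent's recv() raises EOFError. For ingestion, func would be whatever writes one record or one batch, for example a PutRecord call.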

@makennedy626
Author

@DFuller134 that's impressive! Do you mind sharing the code with me? If it's too much trouble (too much IP to remove etc.) then no worries.

@DFuller134

DFuller134 commented Mar 2, 2022

Sure! @makennedy626. Here is the link to a gist of my approach: https://gist.github.com/DFuller134/dbf8f17918823e281c03d5b46f61bd4f

You can use the lambda_multiprocessing function for any multiprocessing workload in Lambda. The code for the parallelization of multi-threaded calls to the Feature Store PutRecord function serves as an example.

If I can find the time, I might attempt a PR that fixes this in the SageMaker SDK.

@psnilesh
Contributor

This PR adds a workaround that avoids forking another process when max_processes = 1 and max_workers = 1. It should unblock ingestion from environments where multiprocessing isn't fully implemented, albeit at the cost of lower throughput.
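
A short sketch of a call that takes this path, assuming an SDK version that includes the workaround. ingest_single_process is a hypothetical wrapper; feature_group is expected to be a sagemaker.feature_store.feature_group.FeatureGroup and data_frame a pandas DataFrame.

```python
def ingest_single_process(feature_group, data_frame):
    """Ingest with one process and one worker thread.

    With max_processes=1 and max_workers=1 the SDK is described above as
    not forking, so no multiprocessing pool (and no SemLock) is needed.
    """
    return feature_group.ingest(
        data_frame=data_frame,
        max_workers=1,
        max_processes=1,
        wait=True,
    )
```

Note that the original report already passed max_workers=1 on SDK 2.70.0 and still failed, so the setting alone is not enough on versions that predate the workaround.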

@martinRenou martinRenou added the component: feature store Relates to the SageMaker Feature Store Platform label Sep 22, 2023
@knikure
Contributor

knikure commented Dec 14, 2023

This issue with the FeatureStore session in non-concurrent ingestion was resolved in PR #3617.
Please use sagemaker version >= v2.131.0 to pick up the fix.
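
A small sketch for checking at start-up whether the installed SDK meets that minimum, using only the standard library. has_fix and parse_version are hypothetical helpers, and the parsing is deliberately simple (it reads the first three numeric components and ignores pre-release suffixes).

```python
import re
from importlib.metadata import PackageNotFoundError, version

MIN_VERSION = (2, 131, 0)  # minimum sagemaker version named in this thread


def parse_version(text):
    """Turn '2.131.0' or 'v2.131' into a 3-tuple of ints."""
    numbers = []
    for piece in text.lstrip("v").split(".")[:3]:
        match = re.match(r"\d+", piece)
        numbers.append(int(match.group()) if match else 0)
    while len(numbers) < 3:
        numbers.append(0)
    return tuple(numbers)


def has_fix(installed=None):
    """Return True if the given (or installed) sagemaker version is new enough."""
    if installed is None:
        try:
            installed = version("sagemaker")
        except PackageNotFoundError:
            return False
    return parse_version(installed) >= MIN_VERSION
```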

@knikure knikure closed this as completed Dec 14, 2023