Update feature_group.py ingest() description #3615

Merged
merged 2 commits into from
Jan 28, 2023
Changes from 1 commit
35 changes: 20 additions & 15 deletions src/sagemaker/feature_store/feature_group.py
@@ -702,21 +702,26 @@ def ingest(
) -> IngestionManagerPandas:
"""Ingest the content of a pandas DataFrame to feature store.

- ``max_worker`` number of thread will be created to work on different partitions of
- the ``data_frame`` in parallel.
-
- ``max_processes`` number of processes will be created to work on different partitions
- of the ``data_frame`` in parallel, each with ``max_worker`` threads.
-
- The ingest function will attempt to ingest all records in the data frame. If ``wait``
- is True, then an exception is thrown after all records have been processed. If ``wait``
- is False, then a later call to the returned instance IngestionManagerPandas' ``wait()``
- function will throw an exception.
-
- Zero based indices of rows that failed to be ingested can be found in the exception.
- They can also be found from the IngestionManagerPandas' ``failed_rows`` function after
- the exception is thrown.
-
+ ``max_worker`` the number of threads created to work on different partitions of the
+ ``data_frame`` in parallel.
+
+ ``max_processes`` the number of processes will be created to work on different
+ partitions of the ``data_frame`` in parallel, each with ``max_worker`` threads.
+
+ The ingest function attempts to ingest all records in the data frame. SageMaker
+ Feature Store throws an exception if it fails to ingest any records.
+
+ If ``wait`` is ``True``, Feature Store runs the ``ingest`` function synchronously.
+ You receive an ``IngestionError`` if there are any records that can't be ingested.
+ If ``wait`` is ``False``, Feature Store runs the ``ingest`` function asynchronously.
+
+ Instead of setting ``wait`` to ``True`` in the ``ingest`` function, you can invoke
+ the ``wait`` function on the returned instance of ``IngestionManagerPandas`` to run
+ the ``ingest`` function synchronously.
+
+ To access the rows that failed to ingest, set ``wait`` to ``False``. The
+ ``IngestionError.failed_rows`` object saves all of the rows that failed to ingest.
+
`profile_name` argument is an optional one. It will use the default credential if None is
passed. This `profile_name` is used in the sagemaker_featurestore_runtime client only. See
https://boto3.amazonaws.com/v1/documentation/api/latest/guide/credentials.html for more
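
For readers who want to see the behaviour the docstring describes, here is a minimal, self-contained sketch. It is not the SDK's implementation and does not call SageMaker: `ingest_rows`, `IngestionFailure`, and `fake_put_record` are hypothetical names made up for illustration. It only mimics the documented semantics, assuming a synchronous (``wait=True``-style) call: the rows are split into partitions, each partition is processed on its own thread, every row is attempted, and a single exception carrying the zero-based indices of the failed rows is raised at the end.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable, List, Sequence


class IngestionFailure(Exception):
    """Hypothetical stand-in for an error that carries the failed row indices."""

    def __init__(self, failed_rows: List[int]):
        super().__init__(f"Failed to ingest rows: {failed_rows}")
        self.failed_rows = failed_rows


def ingest_rows(
    rows: Sequence[Any],
    put_record: Callable[[Any], None],
    max_workers: int = 2,
) -> None:
    """Ingest rows in contiguous partitions, one thread per partition.

    All rows are attempted. If any fail, IngestionFailure is raised after
    every partition has finished, listing zero-based indices of failed rows.
    """
    n = len(rows)
    if n == 0:
        return
    size = -(-n // max_workers)  # ceiling division: rows per partition

    def ingest_partition(start: int, end: int) -> List[int]:
        failed = []
        for i in range(start, end):
            try:
                put_record(rows[i])
            except Exception:
                failed.append(i)  # remember the row, keep going
        return failed

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [
            pool.submit(ingest_partition, start, min(start + size, n))
            for start in range(0, n, size)
        ]
        failed_rows = sorted(i for f in futures for i in f.result())

    if failed_rows:
        raise IngestionFailure(failed_rows)


def fake_put_record(row: dict) -> None:
    """Hypothetical writer that rejects negative values."""
    if row["value"] < 0:
        raise ValueError("negative value")


rows = [{"value": v} for v in (1, -2, 3, -4, 5)]
try:
    ingest_rows(rows, fake_put_record, max_workers=2)
except IngestionFailure as err:
    print("failed rows:", err.failed_rows)
```

Collecting failures per partition and raising once at the end is what lets the caller retry only the rows that failed, which is the use case the updated docstring points to with ``failed_rows``.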