Skip to content

feat: New methods to ingest and create Feature Groups #3707

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 30 commits into from
Sep 22, 2023

Conversation

JoseJuan98
Copy link
Contributor

@JoseJuan98 JoseJuan98 commented Mar 9, 2023

Issue #, if available:

Description of changes: added a new methods to automate processes to work with Feature Groups, one to prepare a FG before creation and another to extract directly a dataframe from a FG faster. Most relevant methods:

  • get_feature_group_as_dataframe(...): method that automates the extraction of the data in a FeatureGroup selecting by version, by latest ingestion or by all data ingested.

  • prepare_fg_from_dataframe_or_file(...): method that automates all the logic to prepare the new batch data to be ingested into a FeatureGroup.

Also, I extended the method FeatureGroup.as_dataframe() to accept kwargs to specify some parameters in the pandas.read_csv use to create the dataframe as I encounter some problems with encoding and dtypes that could be fix specifying low_memory=False and the encoding used.

Testing done:
Local tests with tox following the contribution guide.

Merge Checklist

Put an x in the boxes that apply. You can also fill these out after creating the PR. If you're unsure about any of them, don't hesitate to ask. We're here to help! This is simply a reminder of what we are going to look for before merging your pull request.

General

  • I have read the CONTRIBUTING doc
  • I certify that the changes I am introducing will be backward compatible, and I have discussed concerns about this, if any, with the Python SDK team
  • I used the commit message format described in CONTRIBUTING
  • I have passed the region in to all S3 and STS clients that I've initialized as part of this change.
  • I have updated any necessary documentation, including READMEs and API docs (if appropriate)

Tests

  • I have added tests that prove my fix is effective or that my feature works (if appropriate)
  • I have added unit and/or integration tests as appropriate to ensure backward compatibility of the changes
  • I have checked that my tests are not configured for a specific region or account (if appropriate)
  • I have used unique_name_from_base to create resource names in integ tests (if appropriate)

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

@JoseJuan98 JoseJuan98 requested a review from a team as a code owner March 9, 2023 11:42
@JoseJuan98 JoseJuan98 requested review from claytonparnell and removed request for a team March 9, 2023 11:42
@JoseJuan98 JoseJuan98 mentioned this pull request Mar 9, 2023
9 tasks
@JoseJuan98
Copy link
Contributor Author

Hi @claytonparnell I open this PR because I had one previously that I didn't rebase and it was messy now to rebase, so I closed it and open this one. I have check the tox tests (sphinx,flake8,pylint,docstyle,black-check,twine) and they pass succesfully and also I have check the unit tests and the integration tests in my AWS account.

Could you please run the bot to run the tests?

Copy link
Contributor

@navinsoni navinsoni left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/bot run all

@sagemaker-bot
Copy link
Collaborator

AWS CodeBuild CI Report

  • CodeBuild project: sagemaker-python-sdk-unit-tests
  • Commit ID: d2ff02a
  • Result: FAILED
  • Build Logs (available for 30 days)

Powered by github-codebuild-logs, available on the AWS Serverless Application Repository

@sagemaker-bot
Copy link
Collaborator

AWS CodeBuild CI Report

  • CodeBuild project: sagemaker-python-sdk-pr
  • Commit ID: d2ff02a
  • Result: FAILED
  • Build Logs (available for 30 days)

Powered by github-codebuild-logs, available on the AWS Serverless Application Repository

@sagemaker-bot
Copy link
Collaborator

AWS CodeBuild CI Report

  • CodeBuild project: sagemaker-python-sdk-local-mode-tests
  • Commit ID: d2ff02a
  • Result: SUCCEEDED
  • Build Logs (available for 30 days)

Powered by github-codebuild-logs, available on the AWS Serverless Application Repository

@sagemaker-bot
Copy link
Collaborator

AWS CodeBuild CI Report

  • CodeBuild project: sagemaker-python-sdk-notebook-tests
  • Commit ID: d2ff02a
  • Result: SUCCEEDED
  • Build Logs (available for 30 days)

Powered by github-codebuild-logs, available on the AWS Serverless Application Repository

Copy link
Contributor Author

@JoseJuan98 JoseJuan98 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Made changes

@sagemaker-bot
Copy link
Collaborator

AWS CodeBuild CI Report

  • CodeBuild project: sagemaker-python-sdk-slow-tests
  • Commit ID: d2ff02a
  • Result: SUCCEEDED
  • Build Logs (available for 30 days)

Powered by github-codebuild-logs, available on the AWS Serverless Application Repository

@JoseJuan98 JoseJuan98 force-pushed the feature/fg_utils branch 4 times, most recently from 58d85b6 to 9e45be9 Compare March 10, 2023 08:36
@JoseJuan98
Copy link
Contributor Author

Hi @navinsoni , I made the changes requested, could you run test again and review them to check everything is in order?

@navinsoni
Copy link
Contributor

Hi @navinsoni , I made the changes requested, could you run test again and review them to check everything is in order?

starting.

Copy link
Contributor

@navinsoni navinsoni left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/bot run all

@sagemaker-bot
Copy link
Collaborator

AWS CodeBuild CI Report

  • CodeBuild project: sagemaker-python-sdk-unit-tests
  • Commit ID: 9e45be9
  • Result: FAILED
  • Build Logs (available for 30 days)

Powered by github-codebuild-logs, available on the AWS Serverless Application Repository

@claytonparnell claytonparnell self-assigned this Mar 10, 2023
@claytonparnell
Copy link
Collaborator

@JoseJuan98 please run tox -e black-format to resolve formatting issues

@sagemaker-bot
Copy link
Collaborator

AWS CodeBuild CI Report

  • CodeBuild project: sagemaker-python-sdk-pr
  • Commit ID: 9e45be9
  • Result: FAILED
  • Build Logs (available for 30 days)

Powered by github-codebuild-logs, available on the AWS Serverless Application Repository

@sagemaker-bot
Copy link
Collaborator

AWS CodeBuild CI Report

  • CodeBuild project: sagemaker-python-sdk-slow-tests
  • Commit ID: 9e45be9
  • Result: SUCCEEDED
  • Build Logs (available for 30 days)

Powered by github-codebuild-logs, available on the AWS Serverless Application Repository

@sagemaker-bot
Copy link
Collaborator

AWS CodeBuild CI Report

  • CodeBuild project: sagemaker-python-sdk-local-mode-tests
  • Commit ID: 9e45be9
  • Result: SUCCEEDED
  • Build Logs (available for 30 days)

Powered by github-codebuild-logs, available on the AWS Serverless Application Repository

@sagemaker-bot
Copy link
Collaborator

AWS CodeBuild CI Report

  • CodeBuild project: sagemaker-python-sdk-notebook-tests
  • Commit ID: 9e45be9
  • Result: SUCCEEDED
  • Build Logs (available for 30 days)

Powered by github-codebuild-logs, available on the AWS Serverless Application Repository

@JoseJuan98
Copy link
Contributor Author

JoseJuan98 commented Mar 10, 2023

@JoseJuan98 please run tox -e black-format to resolve formatting issues

@navinsoni and @claytonparnell, I fixed it, and in the integration tests I guess I don't understand what the fixture region_name returns, it returns feature_store_session.boto_session.region_name, in the docs is shown as Any and my method need a string, that's why it's failing.
I changed it now to str(region_name), if you have any suggestions please let me know, if everything is alright would you mind rerunning the tests, please?

Copy link
Contributor

@benieric benieric left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/bot run all

@sagemaker-bot
Copy link
Collaborator

AWS CodeBuild CI Report

  • CodeBuild project: sagemaker-python-sdk-local-mode-tests
  • Commit ID: ca543d9
  • Result: FAILED
  • Build Logs (available for 30 days)

Powered by github-codebuild-logs, available on the AWS Serverless Application Repository

@sagemaker-bot
Copy link
Collaborator

AWS CodeBuild CI Report

  • CodeBuild project: sagemaker-python-sdk-notebook-tests
  • Commit ID: ca543d9
  • Result: SUCCEEDED
  • Build Logs (available for 30 days)

Powered by github-codebuild-logs, available on the AWS Serverless Application Repository

@sagemaker-bot
Copy link
Collaborator

AWS CodeBuild CI Report

  • CodeBuild project: sagemaker-python-sdk-pr
  • Commit ID: ca543d9
  • Result: FAILED
  • Build Logs (available for 30 days)

Powered by github-codebuild-logs, available on the AWS Serverless Application Repository

@sagemaker-bot
Copy link
Collaborator

AWS CodeBuild CI Report

  • CodeBuild project: sagemaker-python-sdk-pr
  • Commit ID: ca543d9
  • Result: SUCCEEDED
  • Build Logs (available for 30 days)

Powered by github-codebuild-logs, available on the AWS Serverless Application Repository

@sagemaker-bot
Copy link
Collaborator

AWS CodeBuild CI Report

  • CodeBuild project: sagemaker-python-sdk-slow-tests
  • Commit ID: ca543d9
  • Result: FAILED
  • Build Logs (available for 30 days)

Powered by github-codebuild-logs, available on the AWS Serverless Application Repository

@sagemaker-bot
Copy link
Collaborator

AWS CodeBuild CI Report

  • CodeBuild project: sagemaker-python-sdk-unit-tests
  • Commit ID: ca543d9
  • Result: SUCCEEDED
  • Build Logs (available for 30 days)

Powered by github-codebuild-logs, available on the AWS Serverless Application Repository

Copy link
Contributor

@benieric benieric left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/bot run all

@sagemaker-bot
Copy link
Collaborator

AWS CodeBuild CI Report

  • CodeBuild project: sagemaker-python-sdk-local-mode-tests
  • Commit ID: ca543d9
  • Result: SUCCEEDED
  • Build Logs (available for 30 days)

Powered by github-codebuild-logs, available on the AWS Serverless Application Repository

@sagemaker-bot
Copy link
Collaborator

AWS CodeBuild CI Report

  • CodeBuild project: sagemaker-python-sdk-notebook-tests
  • Commit ID: ca543d9
  • Result: SUCCEEDED
  • Build Logs (available for 30 days)

Powered by github-codebuild-logs, available on the AWS Serverless Application Repository

@sagemaker-bot
Copy link
Collaborator

AWS CodeBuild CI Report

  • CodeBuild project: sagemaker-python-sdk-pr
  • Commit ID: ca543d9
  • Result: SUCCEEDED
  • Build Logs (available for 30 days)

Powered by github-codebuild-logs, available on the AWS Serverless Application Repository

@sagemaker-bot
Copy link
Collaborator

AWS CodeBuild CI Report

  • CodeBuild project: sagemaker-python-sdk-unit-tests
  • Commit ID: ca543d9
  • Result: SUCCEEDED
  • Build Logs (available for 30 days)

Powered by github-codebuild-logs, available on the AWS Serverless Application Repository

@sagemaker-bot
Copy link
Collaborator

AWS CodeBuild CI Report

  • CodeBuild project: sagemaker-python-sdk-slow-tests
  • Commit ID: ca543d9
  • Result: SUCCEEDED
  • Build Logs (available for 30 days)

Powered by github-codebuild-logs, available on the AWS Serverless Application Repository

@JoseJuan98
Copy link
Contributor Author

hi @benieric , I see all the checks have passed. Do you know the next steps to follow to merge the PR? Please let me know if there is anything required from my side.

Thank you,
Jose.

@benieric
Copy link
Contributor

@JoseJuan98 Just need to get approval from a reviewer and address any comments they may have.

@knikure knikure merged commit 65ca6ba into aws:master Sep 22, 2023
@JoseJuan98 JoseJuan98 deleted the feature/fg_utils branch November 24, 2023 21:09
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Status: Merged
Development

Successfully merging this pull request may close these issues.

9 participants