Describe the bug
I was trying to define a feature group from a sample pandas DataFrame through the load_feature_definitions method.
The method failed because of the way the accepted pandas types are hardcoded in the FeatureGroup class.
The attribute lists integer pandas data types as 'int...', whereas pandas reports them as 'Int...' after convert_dtypes.
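The mismatch described above comes down to a case-sensitive string comparison. Here is a minimal sketch in plain Python; `ACCEPTED_INTEGER_TYPES` and `is_accepted_integer` are hypothetical stand-ins for illustration, not the SDK's actual attribute or code:

```python
# Hypothetical stand-in for a hardcoded list of accepted integer dtype names.
# This is NOT the SDK's real attribute; it only mirrors the 'int...' spelling
# described in the report.
ACCEPTED_INTEGER_TYPES = ["int8", "int16", "int32", "int64"]


def is_accepted_integer(dtype_name: str) -> bool:
    """Case-sensitive membership test against the hardcoded names."""
    return dtype_name in ACCEPTED_INTEGER_TYPES


# NumPy-backed pandas columns report lowercase names such as 'int64'.
print(is_accepted_integer("int64"))  # True

# Nullable pandas columns report capitalized names such as 'Int64',
# so the same check rejects them.
print(is_accepted_integer("Int64"))  # False
```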
To reproduce
SageMaker notebook on an ml.t2.medium instance with the conda_python3 kernel.
SageMaker version 2.72.1
pandas version 1.1.5
Build a pandas DataFrame with integers.
Apply the convert_dtypes method (from pandas.DataFrame) to that DataFrame.
Attempt to build feature definitions with the load_feature_definitions method, passing the converted DataFrame as the parameter.
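The pandas side of these steps can be reproduced without SageMaker. This sketch only shows how convert_dtypes changes the reported dtype name; the column name `feature` is made up for illustration, and the final load_feature_definitions call is left out because it needs a SageMaker session:

```python
import pandas as pd

# Build a DataFrame with a NumPy-backed integer column.
# The dtype is pinned to int64 so the result does not depend on the platform default.
df = pd.DataFrame({"feature": pd.Series([1, 2, 3], dtype="int64")})
print(df["feature"].dtype)  # int64

# convert_dtypes switches the column to the nullable extension dtype,
# whose name is capitalized.
converted = df.convert_dtypes()
print(converted["feature"].dtype)  # Int64
```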
Expected behavior
Success on loading feature definitions.
Ran into this today. Thankfully @tomasosorio did a great job of capturing the issue and the fix. I've tried the fix out locally and it works great, so I submitted a PR (#3740).
For folks looking to code around this until #3740 gets merged, here's a sloppy workaround:
@staticmethod
def convert_nullable_types(df: pd.DataFrame) -> pd.DataFrame:
    """Convert the new Pandas 'nullable types' since AWS SageMaker code doesn't currently support them
    See: https://github.com/aws/sagemaker-python-sdk/pull/3740"""
    for column in list(df.select_dtypes(include=[pd.Int64Dtype]).columns):
        df[column] = df[column].astype('int64')
    for column in list(df.select_dtypes(include=[pd.Float64Dtype]).columns):
        df[column] = df[column].astype('float64')
    return df
Simply call this right before you send the DataFrame to load_feature_definitions():
# Convert Int64 and Float64 types (see: https://github.com/aws/sagemaker-python-sdk/pull/3740)
self.input_df = self.convert_nullable_types(self.input_df)
# Create a Feature Group and load our Feature Definitions
my_feature_group = FeatureGroup(name=self.output_uuid, sagemaker_session=self.sm_session)
my_feature_group.load_feature_definitions(data_frame=self.input_df)
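One caveat with casting to 'int64': NumPy integers cannot hold missing values, so the cast raises if a nullable integer column contains pd.NA. Below is a standalone sketch of the same idea that covers that case. The function name `to_numpy_dtypes` is hypothetical, and falling back to float64 for integer columns with missing values is my own assumption, not something the PR or the SDK prescribes:

```python
import pandas as pd
from pandas.api.extensions import ExtensionDtype
from pandas.api.types import is_float_dtype, is_integer_dtype


def to_numpy_dtypes(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df with nullable integer/float columns cast to NumPy dtypes."""
    out = df.copy()
    for column in out.columns:
        dtype = out[column].dtype
        if not isinstance(dtype, ExtensionDtype):
            continue  # already NumPy-backed; leave untouched
        if is_integer_dtype(dtype):
            # Assumption: integer columns with missing values fall back to
            # float64 (NaN), because int64 cannot represent pd.NA.
            target = "float64" if out[column].isna().any() else "int64"
            out[column] = out[column].astype(target)
        elif is_float_dtype(dtype):
            out[column] = out[column].astype("float64")
    return out


example = pd.DataFrame(
    {
        "complete": pd.array([1, 2], dtype="Int64"),
        "with_gap": pd.array([1, None], dtype="Int64"),
        "ratio": pd.array([1.5, None], dtype="Float64"),
    }
)
result = to_numpy_dtypes(example)
print(result.dtypes)
```

Unlike the in-place version above, this returns a copy, so the caller's DataFrame keeps its original dtypes.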
Additional context
Swapping (in load_feature_definitions method)
TO
Might solve the issue.