@@ -380,6 +380,99 @@ location for the data set to be saved there.
From here you can train a model using this data set and then perform
inference.
+ .. rubric:: Using the Offline Store SDK: Getting Started
+    :name: bCe9CA61b79
+
+ The Feature Store Offline SDK provides the ability to quickly and easily
+ build ML-ready datasets for use in ML model training or pre-processing.
+ The SDK makes it easy to build datasets from a SQL join, a point-in-time
+ accurate join, or an event time range, all without writing any SQL code.
+ This functionality is accessed through the ``DatasetBuilder`` class, which
+ is the primary entry point for the SDK.
+
+ .. code:: python
+
+     from sagemaker.feature_store.feature_store import FeatureStore
+
+     feature_store = FeatureStore(sagemaker_session=feature_store_session)
+
+ .. code:: python
+
+     base_feature_group = identity_feature_group
+     target_feature_group = transaction_feature_group
+
+ You can create a dataset using the ``create_dataset`` method of the
+ Feature Store API. ``base`` can be either a feature group or a pandas
+ dataframe.
+
+ .. code:: python
+
+     result_df, query = feature_store.create_dataset(
+         base=base_feature_group,
+         output_path=f"s3://{s3_bucket_name}"
+     ).to_dataframe()
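+
+ If ``base`` is a pandas dataframe instead, the builder needs to be told
+ which columns hold the record identifier and the event time. A minimal
+ sketch, where ``base_data_df``, ``"base_id"``, and ``"base_time"`` are
+ hypothetical, and the ``record_identifier_feature_name`` and
+ ``event_time_identifier_feature_name`` parameter names should be checked
+ against the ``create_dataset`` API reference:
+
+ .. code:: python
+
+     # base_data_df is a hypothetical pandas dataframe whose "base_id"
+     # column identifies records and whose "base_time" column holds the
+     # event time of each record.
+     result_df, query = feature_store.create_dataset(
+         base=base_data_df,
+         record_identifier_feature_name="base_id",
+         event_time_identifier_feature_name="base_time",
+         output_path=f"s3://{s3_bucket_name}"
+     ).to_dataframe()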
+
+ If you want to join another feature group, you can specify it using
+ the ``with_feature_group`` method.
+
+ .. code:: python
+
+     dataset_builder = feature_store.create_dataset(
+         base=base_feature_group,
+         output_path=f"s3://{s3_bucket_name}"
+     ).with_feature_group(target_feature_group, record_identifier_name)
+
+     result_df, query = dataset_builder.to_dataframe()
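+
+ If you would rather write the results to S3 as a CSV file than load a
+ dataframe into memory, the builder also provides a ``to_csv_file``
+ method. A minimal sketch, assuming it returns the S3 path of the CSV
+ along with the query string:
+
+ .. code:: python
+
+     # Writes the query results under the configured output_path.
+     csv_path, query = dataset_builder.to_csv_file()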
+
+ .. rubric:: Using the Offline Store SDK: Configuring the DatasetBuilder
+    :name: bCe9CA61b80
+
+ You can configure how the ``DatasetBuilder`` produces the resulting
+ dataframe in several ways.
+
+ By default, the Python SDK excludes all deleted and duplicate records.
+ However, if you need either of them in the returned dataset, you can call
+ ``include_duplicated_records`` or ``include_deleted_records`` when creating
+ the dataset builder.
+
+ .. code:: python
+
+     dataset_builder.include_duplicated_records().to_dataframe()
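+
+ Similarly, a sketch of keeping deleted records in the returned dataset,
+ using the ``include_deleted_records`` method named above:
+
+ .. code:: python
+
+     dataset_builder.include_deleted_records().to_dataframe()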
+
+ The DatasetBuilder provides the ``with_number_of_records_from_query_results``
+ and ``with_number_of_recent_records_by_record_identifier`` methods to limit
+ the number of records returned for the offline snapshot.
+
+ .. code:: python
+
+     dataset_builder.with_number_of_recent_records_by_record_identifier(
+         number_of_recent_records=1
+     ).to_dataframe()
+
+ ``with_number_of_records_from_query_results`` limits the total number of
+ records in the output. For example, when N = 100, only 100 records are
+ returned in either the CSV or the dataframe.
+
+ .. code:: python
+
+     dataset_builder.with_number_of_records_from_query_results(
+         number_of_records=100
+     ).to_dataframe()
+
+ On the other hand, ``with_number_of_recent_records_by_record_identifier``
+ deals with records that share the same record identifier: they are sorted
+ by ``event_time``, and at most the N most recent records per identifier
+ are returned in the output.
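+
+ For instance, a sketch that keeps the two most recent records for each
+ record identifier:
+
+ .. code:: python
+
+     dataset_builder.with_number_of_recent_records_by_record_identifier(
+         number_of_recent_records=2
+     ).to_dataframe()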
+
+ Since these methods return the dataset builder, they can be chained.
+
+ .. code:: python
+
+     result_df, query = (
+         dataset_builder
+         .with_number_of_records_from_query_results(number_of_records=100)
+         .include_duplicated_records()
+         .to_dataframe()
+     )
+
+ Additional configurations are available for other use cases, such as
+ time travel and point-in-time accurate joins. These are outlined in the
+ Feature Store DatasetBuilder API Reference.
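+
+ A minimal sketch of those two configurations, assuming the ``as_of`` and
+ ``point_in_time_accurate_join`` builder methods as described in the
+ DatasetBuilder API Reference (check there for exact signatures):
+
+ .. code:: python
+
+     import datetime
+
+     # Time travel: only include records written on or before the cutoff
+     # (the date here is a hypothetical example).
+     cutoff = datetime.datetime(2022, 7, 31, tzinfo=datetime.timezone.utc)
+     result_df, query = dataset_builder.as_of(cutoff).to_dataframe()
+
+     # Point-in-time accurate join: for each base record, join only target
+     # records whose event time is not later than the base record's.
+     result_df, query = dataset_builder.point_in_time_accurate_join().to_dataframe()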
+
.. rubric:: Delete a feature group
   :name: bCe9CA61b78
@@ -395,3 +488,4 @@ The following code example is from the fraud detection example.
identity_feature_group.delete()
transaction_feature_group.delete()
+