@@ -436,38 +436,40 @@ dataset builder.
 
 .. code:: python
 
-   dataset_builder.include_duplicated_records().to_dataframe()
+   dataset_builder.include_duplicated_records()
+   dataset_builder.include_deleted_records()
 
 The DatasetBuilder provides `with_number_of_records_from_query_results` and
 `with_number_of_recent_records_by_record_identifier` methods to limit the
 number of records returned for the offline snapshot.
 
-.. code:: python
-
-   dataset_builder.with_number_of_recent_records_by_record_identifier(number_of_recent_records = 1).to_dataframe()
-
 `with_number_of_records_from_query_results` will limit the number of records
 in the output. For example, when N = 100, only 100 records are going to be
 returned in either the csv or dataframe.
 
 .. code:: python
 
-   dataset_builder.with_number_of_records_from_query_results(number_of_records = 100).to_dataframe()
+   dataset_builder.with_number_of_records_from_query_results(number_of_records = N)
 
 On the other hand, `with_number_of_recent_records_by_record_identifier` is
 used to deal with records which have the same identifier. They are going
 to be sorted according to `event_time` and return at most N recent records
 in the output.
 
+.. code:: python
+
+   dataset_builder.with_number_of_recent_records_by_record_identifier(number_of_recent_records = N)
+
 Since these functions return the dataset builder, these functions can
 be chained.
 
 .. code:: python
 
    dataset_builder
-   .with_number_of_records_from_query_results(number_of_records = 100)
+   .with_number_of_records_from_query_results(number_of_records = N)
    .include_duplicated_records()
-   .with_number_of_recent_records_by_record_identifier(number_of_recent_records = 1)
+   .with_number_of_recent_records_by_record_identifier(number_of_recent_records = N)
+   .to_dataframe()
 
 There are additional configurations that can be made for various use cases,
 such as time travel and point-in-time join. These are outlined in the
0 commit comments
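Note on the hunk above: the two limiting methods and the chaining behaviour it documents can be illustrated without AWS access. What follows is a minimal pure-Python sketch, not the SageMaker SDK implementation. `ToyDatasetBuilder`, its `records` argument, the record shape (`id`, `event_time`) and the `to_list` method are hypothetical stand-ins; only the two `with_number_of_...` method names come from the diff. The sketch also assumes the per-identifier filter is applied before the overall row limit, which the diff does not state.

```python
from collections import defaultdict


class ToyDatasetBuilder:
    """Hypothetical stand-in for the builder in the diff; not the SDK class."""

    def __init__(self, records):
        # Assumed record shape: dicts with "id" and "event_time" keys.
        self._records = list(records)
        self._limit = None
        self._recent_per_id = None

    def with_number_of_records_from_query_results(self, number_of_records):
        self._limit = number_of_records
        return self  # returning the builder is what makes chaining possible

    def with_number_of_recent_records_by_record_identifier(self, number_of_recent_records):
        self._recent_per_id = number_of_recent_records
        return self

    def to_list(self):
        rows = self._records
        if self._recent_per_id is not None:
            by_id = defaultdict(list)
            for row in rows:
                by_id[row["id"]].append(row)
            rows = []
            for group in by_id.values():
                # Newest first, then keep at most N rows for this identifier.
                group.sort(key=lambda r: r["event_time"], reverse=True)
                rows.extend(group[: self._recent_per_id])
        if self._limit is not None:
            # Assumed order: the overall cap is applied last.
            rows = rows[: self._limit]
        return rows


records = [
    {"id": "a", "event_time": 1},
    {"id": "a", "event_time": 3},
    {"id": "a", "event_time": 2},
    {"id": "b", "event_time": 5},
]

# Chained calls, mirroring the style of the documented example.
latest = (
    ToyDatasetBuilder(records)
    .with_number_of_recent_records_by_record_identifier(number_of_recent_records=1)
    .with_number_of_records_from_query_results(number_of_records=100)
    .to_list()
)
# latest holds the newest "a" record (event_time 3) and the single "b" record.
```

Wrapping the chain in parentheses is what lets the leading-dot style span several lines in plain Python; the real builder would end the chain with `to_dataframe()` rather than the hypothetical `to_list()`.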