Commit 28e2040

heitorlessa authored and rubenfonseca committed
docs: fix seek positioning and byte size
1 parent 23f672a commit 28e2040

File tree

3 files changed: +6, -8 lines changed


docs/utilities/streaming.md (+2, -2)

@@ -107,7 +107,7 @@ For example, let's imagine you have a large CSV file, each row has a non-uniform
 
 You found out the last row has exactly 30 bytes. We can use `seek()` to skip to the end of the file, read 30 bytes, then transform to CSV.
 
-```python title="Reading only the last CSV row" hl_lines="16 18"
+```python title="Reading only the last CSV row" hl_lines="16 19"
 --8<-- "examples/streaming/src/s3_csv_stream_non_uniform_seek.py"
 ```

@@ -121,7 +121,7 @@ You can also solve with `seek`, but let's take a large uniform CSV file to make
 --8<-- "examples/streaming/src/uniform_sample.csv"
 ```
 
-You found out that each row has 8 bytes, the header line has 22 bytes, and every new line has 1 byte.
+You found out that each row has 8 bytes, the header line has 21 bytes, and every new line has 1 byte.
 
 You want to skip the first 100 lines.

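The 22 → 21 correction can be sanity-checked in plain Python: joining the example's `CSV_HEADERS` with commas reproduces the header row, and its length is 21 bytes before the newline (a minimal sketch, assuming the header row is exactly the comma-joined field names):

```python
# Sanity check for the byte sizes the corrected docs quote.
CSV_HEADERS = ["reading", "position", "type"]

header_row = ",".join(CSV_HEADERS)
print(len(header_row))  # 21 -> header data, excluding the 1-byte newline
```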
examples/streaming/src/s3_csv_stream_non_uniform_seek.py (+2, -4)

@@ -12,10 +12,8 @@
 def lambda_handler(event: Dict[str, str], context: LambdaContext):
     sample_csv = S3Object(bucket=event["bucket"], key="sample.csv")
 
-    # Jump to the end of the file
-    sample_csv.seek(0, io.SEEK_END)
-    # From the current position, jump exactly 30 bytes
-    sample_csv.seek(sample_csv.tell() - LAST_ROW_SIZE, io.SEEK_SET)
+    # From the end of the file, jump exactly 30 bytes backwards
+    sample_csv.seek(-LAST_ROW_SIZE, io.SEEK_END)
 
     # Transform portion of data into CSV with our headers
     sample_csv.transform(CsvTransform(fieldnames=CSV_HEADERS), in_place=True)

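The replacement relies on standard Python file-object semantics: `seek()` with a negative offset and `io.SEEK_END` positions relative to the end of the stream, so the previous seek-to-end-then-`tell()` arithmetic collapses into a single call. A minimal sketch with `io.BytesIO` standing in for the S3-backed object (the data is made up; only the file-like `seek`/`read` interface matters):

```python
import io

LAST_ROW_SIZE = 30

# 970 bytes of filler followed by a 30-byte "last row".
buf = io.BytesIO(b"x" * 970 + b"y" * LAST_ROW_SIZE)

# Negative offset + SEEK_END: position 30 bytes before the end in one call.
buf.seek(-LAST_ROW_SIZE, io.SEEK_END)
last_row = buf.read()
print(len(last_row))  # 30
```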
examples/streaming/src/s3_csv_stream_seek.py (+2, -2)

@@ -17,7 +17,7 @@
 
 CSV_HEADERS = ["reading", "position", "type"]
 ROW_SIZE = 8 + 1 # 1 byte newline
-HEADER_SIZE = 22 + 1 # 1 byte newline
+HEADER_SIZE = 21 + 1 # 1 byte newline
 LINES_TO_JUMP = 100
 

@@ -28,7 +28,7 @@ def lambda_handler(event: Dict[str, str], context: LambdaContext):
     sample_csv.seek(HEADER_SIZE, io.SEEK_SET)
 
     # Jump 100 lines of 9 bytes each (8 bytes of data + 1 byte newline)
-    sample_csv.seek(LINES_TO_JUMP * ROW_SIZE, io.SEEK_SET)
+    sample_csv.seek(LINES_TO_JUMP * ROW_SIZE, io.SEEK_CUR)
 
     sample_csv.transform(CsvTransform(), in_place=True)
     for row in sample_csv:

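The `SEEK_SET` → `SEEK_CUR` change matters because the handler has already seeked past the header: `io.SEEK_CUR` advances relative to the current position, while `io.SEEK_SET` measures from byte 0 and would land `HEADER_SIZE` bytes short, in the middle of a row. A minimal sketch with `io.BytesIO` and made-up 8-byte rows in place of the S3 object:

```python
import io

ROW_SIZE = 8 + 1      # 8 bytes of data + 1 byte newline
HEADER_SIZE = 21 + 1  # 21 bytes of header + 1 byte newline
LINES_TO_JUMP = 2     # the docs use 100; 2 keeps the sketch small

# Hypothetical uniform CSV: every data row is exactly 9 bytes.
header = b"reading,position,type\n"
rows = [f"{i},10,bar\n".encode() for i in range(5)]
buf = io.BytesIO(header + b"".join(rows))

buf.seek(HEADER_SIZE, io.SEEK_SET)  # skip the header first
# SEEK_CUR keeps the header offset; SEEK_SET would restart from byte 0
# and land HEADER_SIZE bytes short, mid-row.
buf.seek(LINES_TO_JUMP * ROW_SIZE, io.SEEK_CUR)
print(buf.readline())  # b'2,10,bar\n'
```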