
Commit 1e1218e

fix: remove json transformation
1 parent: ce9e75c

File tree: 5 files changed (+55, -45 lines)

aws_lambda_powertools/utilities/streaming/transformations/base.py

+2 -1

```diff
@@ -16,7 +16,8 @@ def __init__(self, *args, **kwargs):
     @abstractmethod
     def transform(self, input_stream: IO[bytes]) -> T:
         """
-        Transform the data from input_stream into something that implements IO[bytes].
+        Transforms the data from input_stream into an implementation of IO[bytes].
+
         This allows you to return your own object while still conforming to a protocol
         that allows transformations to be nested.
         """
```

aws_lambda_powertools/utilities/streaming/transformations/json.py

-39
This file was deleted.

docs/utilities/streaming.md

+11 -5

````diff
@@ -9,13 +9,13 @@ The streaming utility handles streaming data from AWS for processing data sets b
 
 * Simple interface to stream data from S3, even when the data is larger than memory
 * Read your S3 file using the patterns you already know to deal with files in Python
-* Includes common transformations to data stored in S3, like Gzip and Json deserialization
+* Includes common transformations to data stored in S3, like Gzip and CSV deserialization
 * Build your own data transformation and add it to the pipeline
 
 ## Background
 
 Processing S3 files inside your Lambda function presents challenges when the file is bigger than the allocated
-amount of memory. Your data may also be stored using a set of encapsulation layers (gzip, JSON strings, etc).
+amount of memory. Your data may also be stored using a set of encapsulation layers (gzip, CSV, zip files, etc).
 
 This utility makes it easy to process data coming from S3 files, while applying data transformations transparently
 to the data stream.
@@ -87,14 +87,20 @@ For instance, if you want to unzip an S3 file compressed using `LZMA` you could
 --8<-- "examples/streaming/src/s3_transform_lzma.py"
 ```
 
+Or, if you want to load a `TSV` file, you can just change the delimiter on the `CSV` transform:
+
+```python hl_lines="12"
+--8<-- "examples/streaming/src/s3_transform_tsv.py"
+```
+
 ### Building your own data transformation
 
 You can build your own custom data transformation by extending the `BaseTransform` class.
-The `transform` method receives an `io.RawIOBase` object, and you are responsible for returning an object that is also
-a `io.RawIOBase`.
+The `transform` method receives an `IO[bytes]` object, and you are responsible for returning an object that is also
+an `IO[bytes]`.
 
 ```python hl_lines="9 37 38"
---8<-- "aws_lambda_powertools/utilities/streaming/transformations/json.py"
+--8<-- "examples/streaming/src/s3_json_transform.py"
 ```
 
 ## Testing your code
````
examples/streaming/src/s3_json_transform.py

+29

```diff
@@ -0,0 +1,29 @@
+import io
+from typing import IO, Optional
+
+import ijson
+
+from aws_lambda_powertools.utilities.streaming.transformations import BaseTransform
+
+
+# Using io.RawIOBase gets us default implementations of many of the common IO methods
+class JsonDeserializer(io.RawIOBase):
+    def __init__(self, input_stream: IO[bytes]):
+        self.input = ijson.items(input_stream, "", multiple_values=True)
+
+    def read(self, size: int = -1) -> Optional[bytes]:
+        raise NotImplementedError(f"{__name__} does not implement read")
+
+    def readline(self, size: Optional[int] = None) -> bytes:
+        raise NotImplementedError(f"{__name__} does not implement readline")
+
+    def read_object(self) -> dict:
+        return self.input.__next__()
+
+    def __next__(self):
+        return self.read_object()
+
+
+class JsonTransform(BaseTransform):
+    def transform(self, input_stream: IO[bytes]) -> JsonDeserializer:
+        return JsonDeserializer(input_stream=input_stream)
```
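For orientation, here is a short usage sketch of the `JsonTransform` defined above, mirroring the TSV example in this commit; it assumes `S3Object.transform` accepts a custom transform instance, and the bucket/key values are placeholders:

```python
from aws_lambda_powertools.utilities.streaming.s3_object import S3Object

# JsonDeserializer yields whole deserialized JSON objects via __next__,
# so the transformed stream can be consumed with a plain for loop.
s3 = S3Object(bucket="example-bucket", key="data.json")  # placeholder bucket/key
data = s3.transform(JsonTransform())
for obj in data:  # each obj is one JSON document (multiple_values=True)
    print(obj)
```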
examples/streaming/src/s3_transform_tsv.py

+13

```diff
@@ -0,0 +1,13 @@
+from typing import Dict
+
+from aws_lambda_powertools.utilities.streaming.s3_object import S3Object
+from aws_lambda_powertools.utilities.streaming.transformations import CsvTransform
+from aws_lambda_powertools.utilities.typing import LambdaContext
+
+
+def lambda_handler(event: Dict[str, str], context: LambdaContext):
+    s3 = S3Object(bucket=event["bucket"], key=event["key"])
+
+    tsv_stream = s3.transform(CsvTransform(delimiter="\t"))
+    for obj in tsv_stream:
+        print(obj)
```
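One note on the example above: since `delimiter` reaches the underlying CSV reader, other reader options can plausibly be tuned the same way, assuming `CsvTransform` forwards its keyword arguments to `csv.DictReader` (an assumption here, not confirmed by this commit):

```python
# Hypothetical variation, assuming CsvTransform passes keyword arguments
# through to csv.DictReader: name the columns of a headerless TSV file.
tsv_stream = s3.transform(CsvTransform(delimiter="\t", fieldnames=["id", "name", "value"]))
```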
