RFC: Buffer low-level logs and flush on high-level log #3410
Replies: 8 comments 9 replies
-
Hi @zirkelc, thank you so much for taking the time to open this RFC. I'll make sure to review it properly by early next week at the latest.
-
Hi @zirkelc! Thank you very much for kickstarting this RFC and helping us design this buffering logic; this will be beneficial for all versions of Powertools, not just TypeScript. In this initial feedback I will not focus on TypeScript (the code is just pseudocode), but on this feature as a whole, because we need to find a way to standardize it across all runtimes.

New methods

I'm not sure about creating new methods like:

```typescript
// pseudocode
const bufferConfig = new LoggerBuffer(buffering=true, bufferSizer=1024);
const logger = new Logger(loggerBuffer=bufferConfig);
```

This might work better for TypeScript and Java/dotnet than for Python, where we probably need to configure this in the Lambda handler decorator. I need to test it, but the concept is the same: make this a default behavior instead of having different functions and different flush methods.

Limit buffer size

I like the idea of using a fixed default circular buffer size and allowing the customer to change that default based on their needs. This highlights the shared responsibility model, where we provide secure default values as a safe guardrail, but the customer can change them and understand that doing so is their own responsibility. Also, the buffer size should be in bytes, not in number of log lines. What we need to understand better: inspect the log lines across all runtimes and see the average size when we log the different events that Lambda receives.

Flush buffer on runtime errors

This is one of the hardest parts of this implementation. Due to Lambda's execution model, and in cases where the customer is not using extensions - and we can't assume that customers always use extensions - we don't have access to the
I'm not sure I agree that we should rely on a dependency to do this. Correct me if I'm wrong, but it seems that not everyone uses Middy in their Lambda, and this way we would force customers to install Middy to get this feature, which may be challenging for adoption. An alternative is to wrap the Lambda handler in a decorator (is this a possibility in TS and dotnet?), the same as we do in Python when the customer uses inject_lambda_context. In my opinion, if we can't solve this problem, I don't see customers adopting this feature, for reasons such as:

1 - Storing DEBUG/ERROR logs in a buffer and only flushing them in case of an exception is the best benefit of this feature, in my opinion. This way, customers will have logs for debugging in case of exceptions and will not pollute their logs with unnecessary debug logs during a normal execution.

2 - Customers can store information logs in a buffer, for example, and only release them at the end of the execution, but what is the benefit? Even though printing to stdout is a blocking operation, we have no reports that this affects Lambda execution in normal situations.

Clearing buffer after successful execution

We need your opinion here on some aspects for dotnet @hjgraca. Again, thanks so much for starting this RFC @zirkelc, and please let me know what you think.
-
Just so I don't forget, I'm linking this issue from the Python repository here: aws-powertools/powertools-lambda-python#4432
-
Hello everyone! Just a quick heads-up: I'm running a PoC with some scenarios and validating critical points before we move forward with this RFC. We'll have some updates in the coming weeks.
-
Hi @zirkelc - thank you for your patience. It took me longer than I would have liked to get back to you on this. First of all, thank you for putting time and effort into writing this thought-out proposal. Here are a few comments that I have after a couple of reads:

Feature should be opt-in

My first comment is that the feature should be completely opt-in. I am unsure if this was already the intention behind the RFC, but since I expect this feature to have some kind of impact on either performance or resource consumption, I would want only customers who are actually interested in it to have this overhead. Doing this would also allow us to release the feature in a minor release rather than having to wait for a major one.

Log level changed at runtime

I'd like us to discuss what happens when customers change the log level at runtime. Specifically, the Logger allows customers to set a new log level after the Logger has been initialized, via `setLevel()`. With this in mind, I'd like the RFC to have an agreement on what happens to buffered logs when the log level is changed to one that is above (aka more verbose) than the original one. For example:

```typescript
const logger = new Logger({ logLevel: 'warn', bufferLogs: true });
logger.info('foo'); // buffered
logger.setLevel('info'); // what happens to the buffered logs here?
```

Log sampling

I'd also like to discuss how this new feature integrates with the existing log sampling feature. For context, this existing feature allows customers to dynamically change the log level to `DEBUG` for a sample of requests. This addresses a similar concern to the proposed feature, albeit from a different angle, since for log sampling the assumption is that if there's a consistent issue in my workload it'll show up in a statistically relevant sample of my requests. With this in mind, what happens when a request is sampled (aka the log level is set to `DEBUG`)?

Data structure

The proposal suggests we should implement our own data structure for the buffer. For example, in this case I could see us using an existing built-in structure instead.

When to flush logs

If I understand the proposal well, especially in the section that details the `Logger.processLogItem()` method, the buffer is flushed as soon as a high-level log is emitted:

```typescript
const logger = new Logger({ logLevel: 'info' });
logger.debug('foo');
logger.debug('bar');
logger.info('baz'); // all logs are emitted here
```

Since the original assumption behind this feature is that, as a customer, I want to have all the logs to diagnose an issue, I would rather do one of the following two options. This would cover most of the cases where, as a developer, I want to get extra info because I know there was an error. The only case not covered by this mechanism is unhandled exceptions, which can't be addressed today and will be addressed by a separate feature.

Cleanup buffer after successful execution

I don't think we should rely on a Middy.js middleware or class method decorator for this purpose, since not everyone uses these ways of wrapping a Lambda handler. I think we should instead modify the data structure we use to hold the buffered logs to be namespaced, so that log entries from one request can never interfere with others. This would also make the cleaning process more straightforward. For example, if we had a structure like a Map where the keys are a request identifier (TBD) and the value is an Array/Set of logs, then cleaning up a request would be as simple as removing its key.

The main issue I see with this approach is that we'd need to find a request identifier that is widely available and easy to access. The request ID could be one candidate, but it's not present outside the request cycle, and it also requires customers to explicitly pass a reference of the invocation context to the Logger. Something I mentioned during our initial discussion in the linked issue would be to leverage language features to scope the buffer to each request.

Limit buffer size

I share @leandrodamascena's sentiment about having a circular buffer with a configurable size in bytes. In addition to his comment about analyzing the average log item size to define a default size, I would also like to understand the performance impact of actually inspecting the log size and whether there's anything we can do to limit this impact.

In terms of questions from my side, I have mainly one: should we buffer logs emitted outside the Lambda handler if buffering is enabled? I can see us wanting to do this, but doing so complicates even further the problem of "how do we clean up after each request", since these logs are inherently not linked to any invocation.
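The namespaced Map described above could look roughly like this. This is only a sketch: the choice of request identifier is the open question, so a plain string key stands in for whatever identifier is eventually chosen.

```typescript
// Sketch of a per-request, namespaced log buffer: keys are some request
// identifier (which identifier to use is the open question), values are
// the log entries buffered during that request.
class NamespacedBuffer {
  private byRequest = new Map<string, string[]>();

  add(requestId: string, serialized: string): void {
    const bucket = this.byRequest.get(requestId) ?? [];
    bucket.push(serialized);
    this.byRequest.set(requestId, bucket);
  }

  // Cleaning up one request never touches the entries of another.
  clear(requestId: string): void {
    this.byRequest.delete(requestId);
  }

  get(requestId: string): readonly string[] {
    return this.byRequest.get(requestId) ?? [];
  }
}

const nb = new NamespacedBuffer();
nb.add('req-1', 'log a');
nb.add('req-2', 'log b');
nb.clear('req-1'); // req-2 logs are unaffected
```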
-
Hey @zirkelc, thanks for the RFC. Leandro and Andrea have already dived deep into the most critical questions I had in mind. One addition I noticed is missing (or maybe I missed it) is information about buffer overflow. If the buffer has a fixed size and we replace logs, I would suggest adding a warning to indicate that this happened. This will help customers see that some logs are missing and that they might need to increase the buffer size.
-
I have a PoC running in Python with most of the things we discussed here. The experience is something like this:

```python
from aws_lambda_powertools import Logger
from aws_lambda_powertools.logging.buffer import BufferConfig

buffer_config = BufferConfig(max_size=10240, compress=False)
logger = Logger(service="payment", buffer_config=buffer_config)
```

I'm reviewing the RFC again and I have 2 final questions:

1 - Should we allow customers to set max_size and/or max_lines, discarding the oldest entries when either limit is reached?

2 - Does it make sense to compress the buffer? I ran a test with 1k log entries in debug mode: they are 100 KB uncompressed and 45 KB compressed. But I'm not sure that someone buffering 1k log entries will want to flush them all in case of 1 error, what do you think? The CPU consumption doesn't change much, but I still need to run some more detailed tests.

I'm just waiting for @dreamorosi's final considerations to submit the PR in Python. Thanks
-
Just wanted to update the RFC sharing that the feature was released in v2.16.0 - let us know what you think!
-
Note
The RFC is now closed, the latest spec for the feature can be found here, while implementation can be tracked here (#519) for TypeScript and here (aws-powertools/powertools-lambda-python#6060) for Python.
Is this related to an existing feature request or issue?
#519
#3142
Which area does this RFC relate to?
Logger
Summary
Log items which are below the configured log level should be buffered in memory and flushed to CloudWatch on error.
Use case
The Logger log level controls which log items are written to stdout and which are discarded.
There are five log levels: TRACE, DEBUG, INFO, WARN, ERROR.
The log level is set on instantiation, for example:
The Logger has methods for each log level to create log items:
Log items can be categorized into low-level and high-level logs.
Low-level log items are those which are below the configured log level (log < log level).
High-level log items are those which are at or above the configured log level (log >= log level).
Low-level logs are discarded while high-level logs are written to stdout:
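To make this behavior concrete, here is a minimal stand-in, not the actual Powertools implementation, that mimics the shape of the API: a log level set on instantiation and one method per level, with low-level logs discarded and high-level logs written.

```typescript
type LogLevel = 'TRACE' | 'DEBUG' | 'INFO' | 'WARN' | 'ERROR';
const LEVELS: Record<LogLevel, number> = { TRACE: 0, DEBUG: 1, INFO: 2, WARN: 3, ERROR: 4 };

// Simplified stand-in for the Powertools Logger: one method per level,
// and anything below the configured level is discarded.
class MiniLogger {
  constructor(
    private logLevel: LogLevel,
    private sink: (line: string) => void = console.log // stdout stand-in
  ) {}
  trace(msg: string) { this.emit('TRACE', msg); }
  debug(msg: string) { this.emit('DEBUG', msg); }
  info(msg: string)  { this.emit('INFO', msg); }
  warn(msg: string)  { this.emit('WARN', msg); }
  error(msg: string) { this.emit('ERROR', msg); }
  private emit(level: LogLevel, message: string) {
    if (LEVELS[level] < LEVELS[this.logLevel]) return; // low-level: discarded
    this.sink(JSON.stringify({ level, message }));     // high-level: written
  }
}

// With logLevel INFO, debug is dropped and warn is written:
const lines: string[] = [];
const miniLogger = new MiniLogger('INFO', (l) => lines.push(l));
miniLogger.debug('low-level, discarded');
miniLogger.warn('high-level, written');
```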
In a development environment, we typically set the log level to DEBUG or INFO since we use these logs for debugging purposes.
On the other hand, in production, we usually set the log level to WARN or ERROR, because we have much more traffic and want to avoid the noise in the CloudWatch logs.
However, when an error occurs, we want to have as much context as possible to understand and diagnose the root cause. Therefore, we want to write all low-level logs to stdout, regardless of their log level.
For example, a public Lambda Function URL serving as backend for a web application:
Since such a Lambda function is invoked frequently, it should avoid writing unnecessary logs.
However, in case of an error we want to log everything like the request and response.
This includes the original `event` that was not logged because of the `INFO` level.

Proposal
The following is a high-level design proposal, with pseudo-TypeScript code, of how it could be implemented:

- `Logger.processLogItem()` method to buffer and flush low-level logs
- `Buffer` class to store buffered log items on the `Logger` instance
- `Logger.writeBuffer()` to write a log item to the buffer
- `Logger.flushBuffer()` to flush the buffered log items to stdout
- `Logger.printLog()` method to accept a serialized log item from the buffer

Logger.processLogItem()

The `Logger.processLogItem()` method is called by the different public log methods (`info`, `warn`, `error`, etc.). Therefore it would make sense to extend this method to implement the proposed behavior:
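The method names `writeBuffer`, `flushBuffer`, and `printLog` come from the proposal itself; everything else in the following sketch, including the buffer-then-flush ordering, is an assumption about how the pieces could fit together.

```typescript
type Level = 'DEBUG' | 'INFO' | 'WARN' | 'ERROR';
const ORDER: Record<Level, number> = { DEBUG: 0, INFO: 1, WARN: 2, ERROR: 3 };

// Sketch: low-level items go to an in-memory buffer; the first high-level
// item flushes the buffered items before being written itself.
class BufferedLogger {
  private buffer: string[] = [];
  // `out` stands in for stdout so the behavior is easy to observe.
  constructor(private logLevel: Level, public out: string[] = []) {}

  processLogItem(level: Level, message: string): void {
    const item = JSON.stringify({ level, message });
    if (ORDER[level] < ORDER[this.logLevel]) {
      this.writeBuffer(item); // low-level: buffered instead of discarded
    } else {
      this.flushBuffer();     // high-level: flush buffered context first...
      this.printLog(item);    // ...then write the item itself
    }
  }

  private writeBuffer(serialized: string): void { this.buffer.push(serialized); }
  private flushBuffer(): void {
    for (const item of this.buffer) this.printLog(item);
    this.buffer = [];
  }
  private printLog(serialized: string): void { this.out.push(serialized); }
}

const bufLogger = new BufferedLogger('ERROR');
bufLogger.processLogItem('DEBUG', 'context 1'); // buffered
bufLogger.processLogItem('INFO', 'context 2');  // buffered
bufLogger.processLogItem('ERROR', 'boom');      // flushes both, then writes 'boom'
```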
Buffer class

The `Buffer` class stores buffered log items on the `Logger` instance. The exact implementation is yet to be determined.
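Since the exact implementation is open, one possible shape, purely as an assumption, is a thin wrapper around an array of already-serialized entries:

```typescript
// One possible shape for the Buffer class (implementation intentionally open):
// it stores serialized strings so later mutation of the inputs cannot leak in.
class LogBuffer {
  private items: string[] = [];

  add(serialized: string): void {
    this.items.push(serialized);
  }

  // Hand out the items and clear the buffer in one step, for flushing.
  drain(): string[] {
    const drained = this.items;
    this.items = [];
    return drained;
  }

  get length(): number {
    return this.items.length;
  }
}

const buf = new LogBuffer();
buf.add(JSON.stringify({ level: 'DEBUG', message: 'a' }));
buf.add(JSON.stringify({ level: 'DEBUG', message: 'b' }));
```

Combining drain-and-clear into one step keeps flushing atomic: the buffer can never be flushed twice or left holding stale entries.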
Logger.writeBuffer()

The `Logger.writeBuffer()` method writes a log item to the buffer. Since the log items contain `extraInput`, which is an object, we need to make sure that the object is serialized to a string when it is buffered, in case it is later mutated by the application.

Logger.flushBuffer()

The `Logger.flushBuffer()` method flushes all buffered log items to stdout. Since the log items were serialized when they were buffered, we can write them directly to stdout without the need to serialize them again.
Logger.printLog()

The `Logger.printLog()` method is changed to accept a serialized log item as input. In this case, the log item is written directly to stdout.
Out of scope
Code outside of this project is out of scope, like Lambda Runtime APIs, etc.
Potential challenges
The following challenges could be identified:
Cleanup buffer after successful execution
The Lambda execution environment will be re-used for multiple invocations. If the `Logger` instance is created outside of the handler function, the buffer could contain stale log items from previous invocations. Ideally, the buffer will be cleared after each invocation.

Possible solutions

- Use a middleware like `middy` to clear the buffer after each invocation.

Flush buffer on runtime errors
If the Lambda function throws an unhandled error, the buffer could contain important log items that were not written to stdout.
Possible solutions

- Use a middleware like `middy` to flush the buffer on error.

Limit buffer to avoid memory issues
Since the buffer stores serialized log items, it could grow very large and consume a lot of memory, potentially causing an out of memory error. Therefore, the buffer should be limited depending on certain metrics like number of entries and total buffer size.
Limit buffer length
The buffer could be implemented as a sliding window of fixed length.
New items are added to the buffer until its length reaches the threshold; after that, every new item drops the oldest item from the buffer, so the buffer length remains constant.
Limit buffer size
The buffer keeps track of the total size of serialized log items.
If the total size of serialized log items reaches the threshold, the buffer drops the oldest log items until the total size is below the threshold.
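As a sketch of this size-based eviction: the `maxBytes` limit and measuring entries by UTF-8 byte length are assumptions for illustration, not part of the proposal.

```typescript
// Sketch of a size-limited buffer: tracks the total serialized size in bytes
// and evicts the oldest entries once a new item pushes it past the limit.
class SizeLimitedBuffer {
  private items: string[] = [];
  private totalBytes = 0;

  constructor(private maxBytes: number) {}

  add(serialized: string): void {
    this.items.push(serialized);
    this.totalBytes += Buffer.byteLength(serialized, 'utf8');
    // Drop the oldest items until we are back under the threshold,
    // always keeping at least the newest item.
    while (this.totalBytes > this.maxBytes && this.items.length > 1) {
      const oldest = this.items.shift()!;
      this.totalBytes -= Buffer.byteLength(oldest, 'utf8');
    }
  }

  get size(): number { return this.totalBytes; }
  get entries(): readonly string[] { return this.items; }
}

const sized = new SizeLimitedBuffer(40);
sized.add('x'.repeat(20)); // 20 bytes total
sized.add('y'.repeat(20)); // 40 bytes total: at the limit
sized.add('z'.repeat(20)); // 60 bytes: oldest entry is evicted, back to 40
```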
Dependencies and Integrations

This feature could be integrated with a middleware like Middy.js to clear the buffer after each invocation and to flush the buffer on error.
Alternative solutions
No response
Acknowledgment
Future readers
Please react with 👍 and your use case to help us understand customer demand.