Supporting byte streams #47
Comments
This is the same problem with Lists, Maps, Sets, and basically every non-trivial data type. If you want to control the size of the emitted ByteStrings you can always add a chunker element. While I agree that the use case of ByteString streams is common (I am writing a streaming decoder tool right now), I don't believe the problem outlined here is very relevant in practice. As for communication over the network, the emitted ByteString sizes will be limited by send/receive buffers anyway; it does not matter how large the originally sent chunk was.
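The "chunker element" mentioned above can be sketched as a simple re-chunking stage. This is a minimal, synchronous sketch, not code from any Reactive Streams implementation; the class and method names are invented for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical chunker stage: normalizes a stream of arbitrarily sized
// byte chunks into chunks of at most maxSize bytes. A real stage would
// wrap this logic in a Processor; the splitting logic is the same.
public class Rechunker {
    public static List<byte[]> rechunk(List<byte[]> chunks, int maxSize) {
        List<byte[]> out = new ArrayList<>();
        for (byte[] chunk : chunks) {
            for (int off = 0; off < chunk.length; off += maxSize) {
                int end = Math.min(off + maxSize, chunk.length);
                out.add(Arrays.copyOfRange(chunk, off, end));
            }
        }
        return out;
    }
}
```

Note that this only enforces an upper bound; coalescing undersized chunks up to a minimum would additionally require buffering across elements.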
I don't think it's quite the same problem. Lists, Maps and other data structures have semantic meaning: you often really do want a stream of Map[String, Int] and not just a stream of (String, Int). Byte streams, on the other hand, only use arrays because a stream of actual bytes would be very inefficient. On the network point: I'm not sure I understand you correctly. If a 'reactive streams over sockets' protocol is implemented as proposed in #45, would it not need to communicate applicative messages equivalent to […]
TCP's semantics are problematic here. One problem is pointed out by the difference in the units of […]
@danarmak The problems of buffering and of back pressure are not quite as coupled as you portray them to be: the […]

One important thing to note is that back pressure is mediated locally at every asynchronous boundary, so your example of the ZIP bomb can be fixed quite easily by making the extraction stream-based and using a Publisher to hand out the resulting byte stream piece by piece, as requested. That way you retain full control over the memory usage. Having requested one unit on the input side can very well generate a large number of units on the output side.
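The stream-based extraction described above can be sketched under a simplified pull model, using only java.util.zip (the class name and the pull-style next() API are my assumptions, not any implementation's API): each downstream request triggers one bounded read from the Inflater, so a single compressed input element is handed out as many small output elements and the decompressed payload is never materialized at once.

```java
import java.util.zip.DataFormatException;
import java.util.zip.Inflater;

// Demand-driven decompression sketch: one input unit (the compressed
// buffer) fans out into many bounded output units, pulled one at a time.
public class BoundedInflate {
    private final Inflater inflater = new Inflater();
    private final int chunkSize;

    public BoundedInflate(byte[] compressed, int chunkSize) {
        this.chunkSize = chunkSize;
        inflater.setInput(compressed); // the single "input unit"
    }

    /** One downstream request: at most chunkSize decompressed bytes, or null when done. */
    public byte[] next() {
        if (inflater.finished()) return null;
        byte[] buf = new byte[chunkSize];
        try {
            int n = inflater.inflate(buf);
            if (n == 0) return null; // stream end reached
            return java.util.Arrays.copyOf(buf, n);
        } catch (DataFormatException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Memory use stays proportional to chunkSize no matter how large the decompressed payload is, which is the point being made about retaining control at the asynchronous boundary.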
@rkuhn In the case of the zip bomb, we need to manually configure each producer to produce ByteStrings of the correct size (or in the correct size range). Consumers will also need to know this size in order to use request(n) correctly. Maybe manual configuration will be enough; I worried that doing so across different producer/consumer implementations might be a problem.

The TCP problem is different. IIUC, you would rely on TCP's own back pressure mechanism and just ignore the 'n' parameter of request(n) over the TCP link. That still wouldn't work over UDP.

The question is really whether semi-manual configuration of a 'block size range' for each producer/consumer will be enough. Probably it will be; it just won't be as automatic or convenient as with non-byte streams. And maybe that's good enough to keep the API simple. If everyone feels that way, then please close this ticket.
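If producer and consumer do agree on a maximum chunk size by configuration, translating the consumer's byte budget into an element count for request(n) is simple arithmetic. A hypothetical helper (the names are mine, not from the API):

```java
public class DemandCalc {
    /** How many elements to request so that at most roughly byteBudget bytes
     *  can arrive, given an agreed upper bound on chunk size.
     *  Always requests at least one element to keep the stream moving. */
    public static long elementsToRequest(long byteBudget, int maxChunkSize) {
        return Math.max(1L, byteBudget / maxChunkSize);
    }
}
```

The fragility discussed above is visible here: the result is only meaningful if every producer in the pipeline actually honors the agreed maxChunkSize.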
Related to this, I could have a […]. I see byte streams as similar, because in reality a stream is broken into discrete chunks (say […]). There was discussion at some point to do something like […]. Considering this, is there an API change that could be made to […]?
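The mismatch between transport chunks and semantic units can be made concrete: a record (here, a delimiter-terminated frame) may straddle the arbitrary chunk boundaries a byte stream arrives in, so a decoder has to carry a partial record from one chunk to the next. A minimal sketch with invented names:

```java
import java.io.ByteArrayOutputStream;
import java.util.ArrayList;
import java.util.List;

// Reassembles delimiter-terminated frames from a stream of byte chunks
// whose boundaries are unrelated to the frame boundaries.
public class Reframer {
    private ByteArrayOutputStream carry = new ByteArrayOutputStream();

    /** Feed one chunk; returns all frames completed by it. */
    public List<byte[]> onChunk(byte[] chunk, byte delim) {
        List<byte[]> frames = new ArrayList<>();
        for (byte b : chunk) {
            if (b == delim) {
                frames.add(carry.toByteArray()); // frame complete
                carry = new ByteArrayOutputStream();
            } else {
                carry.write(b); // partial frame carried to the next chunk
            }
        }
        return frames;
    }
}
```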
I agree that there isn't a simple change to the API that would solve this problem, and a complex change is not warranted, so I'm closing this. Thanks for discussing. Also, I'm writing a Reactive Streams implementation based on Futures (for internal use, but I'm trying to get permission to open-source it). We will use byte streams in many places, so I'll see for myself how this issue resolves itself. Maybe it will turn out to be enough to use sane defaults everywhere.
I'm concerned that the API doesn't support a very common use case: streams of bytes. (Or, less commonly, chars, ints, bits, etc.) These types all have in common the fact that they are transported not one by one, but in arrays, buffers, strings, etc. And these buffer types don't have size limits as part of the type.
This creates a problem: if a Subscriber[ByteString] calls request(1), it will get one more ByteString - but of unknown size. This conflicts with the basic requirement of back pressure. Imagine a stream with an 'unzip' processor: a naive implementation would be vulnerable to zip bombs.
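To put a number on the hazard, using only java.util.zip: a megabyte of zeros deflates to roughly a kilobyte, so the single element delivered in response to request(1) could carry on the order of a thousand times its wire size in decompressed data. The class name below is made up for illustration, and the exact ratio depends on the data.

```java
import java.io.ByteArrayOutputStream;
import java.util.zip.Deflater;

// Demonstrates that request(1) bounds the element count, not the bytes
// that a naive decompression stage would emit for that one element.
public class ExpansionRatio {
    public static byte[] deflate(byte[] data) {
        Deflater d = new Deflater();
        d.setInput(data);
        d.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[4096];
        while (!d.finished()) out.write(buf, 0, d.deflate(buf));
        return out.toByteArray();
    }

    public static void main(String[] args) {
        byte[] payload = new byte[1 << 20]; // 1 MiB of zeros
        byte[] compressed = deflate(payload);
        System.out.println("expansion ratio ~= " + payload.length / compressed.length);
    }
}
```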
The programmer could manually configure all components in a reactive stream pipeline to emit arrays within min-max size limits. But this would mean hand-tuning for performance (since optimal buffer size varies with component), instead of relying on automatic back pressure communication.
Worse, if the programmer doesn't control both ends of a channel, he won't be able to rely on the behavior of the other side - which might be written using a different Reactive Streams implementation, a different language, or be across a network. This also limits the usability of a language-specific custom type like a size-limited ByteString.
I think this use case will be common, so I'm suggesting it should be addressed in the API. What do you think?