In Python 2 the default string type (`<str>`) is a binary string, not
unicode. We receive data from a socket, from a Popen stream or from a
file as a string and operate on those strings without any conversions.
Python 3 draws a line here. We usually operate on unicode strings in the
code (because this is the default string type, `<str>`), but receive
bytes from a socket and a Popen stream. We can use unicode or binary
streams for files (unicode by default[^1]).
This commit decouples bytes and strings. In most cases it means that we
convert data from bytes to a string after receiving it from a socket /
Popen stream and convert it back from a string to bytes before writing
to a socket. Those operations are no-ops on Python 2.
So, the general rule for our APIs is to accept and return `<str>`
regardless of the Python version. Not `<bytes>`, not `<unicode>`.
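
The conversions at the socket / Popen boundary can be sketched roughly as
follows. The helper names `bytes_to_str()` and `str_to_bytes()` and the
choice of UTF-8 are assumptions made for illustration; the commit message
itself does not name the helpers or the codec.

```python
import sys

PY3 = sys.version_info[0] >= 3


def bytes_to_str(data):
    """Return <str> for data received from a socket or a Popen stream.

    Hypothetical helper. On Python 2 <str> already is a binary string,
    so the value is returned unchanged (a no-op).
    """
    if PY3 and isinstance(data, bytes):
        # Assumption: the payload is UTF-8 encoded text.
        return data.decode('utf-8')
    return data


def str_to_bytes(data):
    """Return <bytes> suitable for writing to a socket.

    Hypothetical helper. On Python 2 this is a no-op as well.
    """
    if PY3 and isinstance(data, str):
        return data.encode('utf-8')
    return data
```

With helpers like these, the rest of the code can keep working with `<str>`
on both Python versions and convert only at the I/O boundary.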
The only non-trivial change is around `FilteredStream` and writes into
`sys.stdout`. The `FilteredStream` instance replaces `sys.stdout` during
execution of a test, so it should follow the usual convention and accept
`<str>` in the `write()` method. This is both intuitive and necessary,
because `*.py` tests rely on `print('bla bla')` to write into a result
file.
However, the stream should also accept `<bytes>`, because we have a unit
test (`unit/json.test`) that produces binary output, which does not
conform to the UTF-8 encoding. The separate `write_bytes()` method was
introduced for this purpose. UnittestServer and AppServer write test
output as bytes directly, while TarantoolServer relies on the usual
string output.
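
A minimal sketch of a stream with this pair of methods is shown here. The
class `SketchStream` is hypothetical and is not the real `FilteredStream`
(which also filters the output); it only illustrates the `write()` /
`write_bytes()` split, assuming the underlying result file is opened in
binary mode and text is encoded as UTF-8.

```python
import io


class SketchStream(object):
    """Hypothetical stand-in for a stream that replaces sys.stdout."""

    def __init__(self, raw):
        # raw is a binary file-like object, for example a result file
        # opened with mode 'wb'.
        self.raw = raw

    def write(self, text):
        # Usual convention: accept <str>, so print('bla bla') works.
        # On Python 2 <str> is already bytes and is written as is.
        if not isinstance(text, bytes):
            text = text.encode('utf-8')
        self.raw.write(text)

    def write_bytes(self, data):
        # Binary output is passed through untouched, even when it is
        # not valid UTF-8.
        self.raw.write(data)

    def flush(self):
        self.raw.flush()


raw = io.BytesIO()
stream = SketchStream(raw)
stream.write('ok\n')              # text path
stream.write_bytes(b'\xff\xfe')   # binary path, not valid UTF-8
```

Keeping `write_bytes()` separate means `write()` never has to guess how to
decode arbitrary binary data.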
We also use bytes directly when writing from one stream to another:
in `app_server.py` for stderr (writing to a log file) and in
`tarantool_server.py` for the log destination property (because it is
the destination for Popen).
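
As an illustration of the stream-to-stream case, a log file opened in binary
mode can be handed to Popen directly, so no decoding happens on the way. This
is only a sketch: the temporary log path and the child command are made up for
the example and do not come from `app_server.py` or `tarantool_server.py`.

```python
import os
import subprocess
import sys
import tempfile

# Hypothetical log file; the real code uses its own log path.
fd, log_path = tempfile.mkstemp(suffix='.log')
os.close(fd)

# Hypothetical child process that writes one line to stderr.
child_cmd = [sys.executable, '-c',
             'import sys; sys.stderr.write("warn\\n")']

# The destination is opened in binary mode: the child's stderr bytes
# land in the file without any bytes <-> str conversion.
with open(log_path, 'wb') as log_file:
    proc = subprocess.Popen(child_cmd, stderr=log_file)
    proc.wait()

with open(log_path, 'rb') as f:
    logged = f.read()

os.remove(log_path)
```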
[^1]: Technically it depends on a system locale, but, hey, does anyone
see a non UTF-8 locale after the millennium?
Part of #20
Co-authored-by: Sergey Bronnikov <[email protected]>