This repository was archived by the owner on Mar 13, 2022. It is now read-only.

select should not be used in WSClient because python-websockets does its own buffering #106

Closed
@PaulFurtado

I'm hitting an issue where some exec calls made via the Python client hang.

For background, I'm in the process of migrating my cluster from Docker to cri-o, and I have a suite of acceptance tests written in Python that I run against the cluster to verify functionality. A good deal of the tests exec commands inside pods, and after switching to cri-o, a certain command in the tests hangs/times out every time. Initially, I thought there was a bug in cri-o, but if the same command is executed via kubectl, it does not hang, so I started diving into the Python client.

I believe I've tracked the problem down to the select call that is made in WSClient:

    r, _, _ = select.select(
        (self.sock.sock, ), (), (), timeout)

If I remove that select call, commands never hang.

I think that it's invalid to call select there. select is a system call that checks whether a socket is ready to be read or written, but with websockets we're several layers of abstraction away from the underlying system socket that select is checking, and buffering occurs at each of those layers. I suspect what is happening is:

  1. recv_data_frame is called. The underlying recv call on the socket uses a fixed buffer size and receives multiple frames, but recv_data_frame returns only one of them.
  2. On the next iteration, we check whether the underlying socket has any bytes available using select. It does not, but there is a frame waiting in the websocket-client buffer that recv_data_frame would return instantly.
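
The two steps above can be reproduced in miniature with only the standard library. This is a rough sketch of the suspected mechanism, not websocket-client's actual code: `BufferedFrameReader`, its one-byte length prefix, and `demo_buffered_frame` are all hypothetical names invented for the illustration, and a local `socket.socketpair()` stands in for the connection to the API server.

```python
import select
import socket


class BufferedFrameReader:
    """Toy length-prefixed frame reader (hypothetical, NOT websocket-client)."""

    def __init__(self, sock, chunk_size=4096):
        self.sock = sock
        self.chunk_size = chunk_size
        self.buffer = b""  # bytes already pulled off the socket

    def _frame_ready(self):
        # A toy frame is 1 length byte followed by that many payload bytes.
        return len(self.buffer) >= 1 and len(self.buffer) >= 1 + self.buffer[0]

    def recv_frame(self):
        # Read fixed-size chunks until one whole frame is buffered.
        while not self._frame_ready():
            chunk = self.sock.recv(self.chunk_size)
            if not chunk:
                raise ConnectionError("socket closed")
            self.buffer += chunk
        length = self.buffer[0]
        frame = self.buffer[1:1 + length]
        self.buffer = self.buffer[1 + length:]  # keep any extra frames
        return frame


def demo_buffered_frame():
    writer, reader_sock = socket.socketpair()
    try:
        # Two frames arrive together, so a single recv picks up both.
        writer.sendall(b"\x05hello" + b"\x05world")
        reader = BufferedFrameReader(reader_sock)
        first = reader.recv_frame()
        # The OS socket is now empty, so select reports nothing to read...
        readable, _, _ = select.select((reader_sock,), (), (), 0.1)
        # ...even though a second frame is already sitting in the buffer.
        second = reader.recv_frame()
        return first, bool(readable), second
    finally:
        writer.close()
        reader_sock.close()
```

In this sketch, a loop that gates each `recv_frame` call on `select` would stall after the first frame until more bytes happened to arrive on the socket, which matches the hang described above.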

So I think the solution is to simply remove that select call, but I'm curious why it was added in the first place and whether removing it breaks some expectation. The only thing I can think of is that it's being used to make read timeouts possible. If so, that is still buggy: select can return as soon as 1 byte is available to read, but the recv calls in websocket-client would then still block waiting for a complete frame.
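
The timeout concern can be illustrated the same way. This is a minimal stdlib-only sketch, assuming a made-up frame format (one length byte plus payload) and a hypothetical helper name `demo_partial_frame`; it is not how websocket-client parses frames. It shows select reporting the socket as readable while a read of the full frame still cannot complete.

```python
import select
import socket


def demo_partial_frame(timeout=0.2):
    writer, reader_sock = socket.socketpair()
    try:
        # The header announces 5 payload bytes, but only 2 have arrived.
        writer.sendall(b"\x05he")

        # select only knows that *some* bytes are readable.
        readable, _, _ = select.select((reader_sock,), (), (), timeout)
        select_says_ready = bool(readable)

        # Reading the whole 6-byte frame would block forever without a
        # socket-level timeout; one is set here so the demo terminates.
        reader_sock.settimeout(timeout)
        data = b""
        timed_out = False
        try:
            while len(data) < 6:
                chunk = reader_sock.recv(6 - len(data))
                if not chunk:
                    break
                data += chunk
        except socket.timeout:
            timed_out = True
        return select_says_ready, timed_out, data
    finally:
        writer.close()
        reader_sock.close()
```

Here the read is only bounded because of the timeout set on the socket itself, which suggests that a read timeout belongs at the layer that does the frame-level recv and not in a select in front of it.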

@mbohlool what do you think?

    Labels

    lifecycle/rotten: Denotes an issue or PR that has aged beyond stale and will be auto-closed.
