
Implement measures to avoid blocking when flushing batch #350


Closed
btasker opened this issue Oct 19, 2021 · 1 comment · Fixed by #356
Labels
enhancement New feature or request
Milestone

Comments


btasker commented Oct 19, 2021

Proposal:

Create a "no-block" batch flush mode

Current behavior:

Currently, when writing in batch mode, there can be a noticeable delay when the batch is flushed and points are written to a distant InfluxDB instance: the interpreter has to context-switch between threads to perform the write, which has knock-on effects for the calling application.

For latency sensitive uses, this is problematic as it delays the calling application.

For example, a service offering an API might choose to write analytic data out to InfluxDB whenever an endpoint is called - so their responses to their own API consumers may be delayed if network conditions between the calling app and InfluxDB are sub-optimal.

With a batch size of 500, calculating the time (in ns) between iterations shows the impact of the write (this is with a deliberately distant InfluxDB instance to really highlight the difference):

    import time

    from influxdb_client import InfluxDBClient
    from influxdb_client.client.write_api import WriteOptions

    client = InfluxDBClient(url="...", token="...", org="...")

    last = time.time_ns()
    with client.write_api(write_options=WriteOptions(batch_size=500)) as wa:
        for x in range(1, 1000):
            point = f"foo,bar=sed fieldval={x} {time.time_ns()}"
            wa.write(bucket="btasker+cloud2's Bucket", record=point)
            now = time.time_ns()
            delta = now - last
            print(f"{x}: {delta}")
            last = now
....

495: 6312
496: 6053
497: 6186
498: 6059
499: 6057
500: 2906668
501: 124634
502: 12492
503: 9888

Desired behavior:

It's possible to work around this by using multiprocessing and/or similar approaches.

It would be good, though, if the client library could implement this itself so that it is abstracted away from developers; that way they won't need to write boilerplate to address the issue.

In effect, in the example above there should be no more overhead/delay for the calling application on iteration 500 than there is on iteration 1, 2, 3, etc.
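The multiprocessing workaround mentioned above can be sketched roughly as follows. This is a hypothetical illustration, not part of the client library: points are pushed onto a queue, and a separate process drains the queue and performs the (potentially slow) write, so the caller never blocks on a batch flush. The actual InfluxDB write is stubbed out here as a simple counter.

```python
import multiprocessing as mp


def writer_worker(queue: mp.Queue, written: mp.Queue) -> None:
    # Drain points until the sentinel arrives; a real worker would call
    # write_api.write() here instead of just counting records.
    count = 0
    while True:
        record = queue.get()
        if record is None:  # sentinel: shut down
            break
        count += 1
    written.put(count)


def main() -> int:
    queue: mp.Queue = mp.Queue()
    written: mp.Queue = mp.Queue()
    worker = mp.Process(target=writer_worker, args=(queue, written))
    worker.start()
    for x in range(1, 501):
        # queue.put() returns almost immediately, regardless of how slow
        # the downstream write is, so the caller is never blocked.
        queue.put(f"foo,bar=sed fieldval={x}")
    queue.put(None)  # tell the worker to finish
    worker.join()
    return written.get()


if __name__ == "__main__":
    print(main())
```

The point of the pattern is that the caller's per-iteration cost is a queue insert, which stays flat even on the iteration where the worker happens to be flushing.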

Use case:

The time impact of context switching will affect particularly latency sensitive applications.


bednar commented Nov 3, 2021

The PR with MultiprocessingWriter is in review - #356. It will be published in the upcoming v1.24 release.

If you would like to use this feature before the regular release, you can install the client via:

pip install git+https://github.com/influxdata/influxdb-client-python.git@feat/multiprocessing-writer

or, once the PR is merged, via:

pip install git+https://github.com/influxdata/influxdb-client-python.git@master
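Once installed, usage would look something like the sketch below (assuming the API shape from #356; the url, token, org, and bucket values are placeholders to substitute with your own):

```python
from influxdb_client import WriteOptions
from influxdb_client.client.util.multiprocessing_helper import MultiprocessingWriter

# Placeholder connection details - replace with your own.
with MultiprocessingWriter(url="http://localhost:8086", token="my-token",
                           org="my-org",
                           write_options=WriteOptions(batch_size=500)) as writer:
    # write() hands the record off to a child process, so the caller
    # does not block while the batch is flushed over the network.
    writer.write(bucket="my-bucket", record="foo,bar=sed fieldval=1")
```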

@bednar bednar added this to the 1.24.0 milestone Nov 12, 2021