
Implement measures to avoid blocking when flushing batch #350


Closed
btasker opened this issue Oct 19, 2021 · 1 comment · Fixed by #356
Labels
enhancement New feature or request
Milestone

Comments


btasker commented Oct 19, 2021

Proposal:

Create a "no-block" batch flush mode

Current behavior:

Currently, when writing in batch mode, there can be a noticeable delay when the batch is flushed and points are written to a distant InfluxDB instance: the interpreter has to context-switch between threads to perform the write, which has knock-on effects for the calling application.

For latency sensitive uses, this is problematic as it delays the calling application.

For example, a service offering an API might choose to write analytic data out to InfluxDB whenever an endpoint is called - so their responses to their own API consumers may be delayed if network conditions between the calling app and InfluxDB are sub-optimal.

With a batch size of 500, calculating the time (in ns) between iterations shows the impact of the write (this is with a deliberately distant InfluxDB instance to really highlight the difference):

    import time

    from influxdb_client import InfluxDBClient
    from influxdb_client.client.write_api import WriteOptions

    client = InfluxDBClient(url="...", token="...", org="...")

    last = time.time_ns()
    with client.write_api(write_options=WriteOptions(batch_size=500)) as wa:
        for x in range(1, 1000):
            point = f"foo,bar=sed fieldval={x} {time.time_ns()}"
            wa.write(bucket="btasker+cloud2's Bucket", record=point)
            now = time.time_ns()
            delta = now - last
            print(f"{x}: {delta}")
            last = now
....

495: 6312
496: 6053
497: 6186
498: 6059
499: 6057
500: 2906668
501: 124634
502: 12492
503: 9888

Desired behavior:

It's possible to work around this by using multiprocessing and/or similar approaches.

It would be good, though, if the client library could implement this itself so that it is abstracted away from developers; that way they won't need to write boilerplate to address the issue.

In effect, in the example above there should be no more overhead/delay for the calling application on iteration 500 than there is on iteration 1, 2, 3, etc.
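The multiprocessing workaround mentioned above can be sketched roughly as follows. This is a hypothetical illustration, not part of the client library: points are pushed onto a queue, and a separate process drains the queue and performs the (potentially slow) write, so the caller never blocks on a batch flush. The actual InfluxDB write is stubbed out here as a simple counter.

```python
import multiprocessing as mp


def writer_worker(queue: mp.Queue, written: mp.Queue) -> None:
    # Drain points until the sentinel arrives; a real worker would call
    # write_api.write() here instead of just counting records.
    count = 0
    while True:
        record = queue.get()
        if record is None:  # sentinel: shut down
            break
        count += 1
    written.put(count)


def main() -> int:
    queue: mp.Queue = mp.Queue()
    written: mp.Queue = mp.Queue()
    worker = mp.Process(target=writer_worker, args=(queue, written))
    worker.start()
    for x in range(1, 501):
        # queue.put() returns almost immediately, regardless of how slow
        # the downstream write is, so the caller is never blocked.
        queue.put(f"foo,bar=sed fieldval={x}")
    queue.put(None)  # tell the worker to finish
    worker.join()
    return written.get()


if __name__ == "__main__":
    print(main())
```

The point of the pattern is that the caller's per-iteration cost is a queue insert, which stays flat even on the iteration where the worker happens to be flushing.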

Use case:

The time impact of context switching will affect particularly latency sensitive applications.


bednar commented Nov 3, 2021

The PR with MultiprocessingWriter is in review - #356. It will be published in the upcoming v1.24 release.

If you would like to use this feature before the regular release, you can install the client via:

pip install git+https://github.com/influxdata/influxdb-client-python.git@feat/multiprocessing-writer

or, once the PR is merged, via:

pip install git+https://github.com/influxdata/influxdb-client-python.git@master
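Once installed, usage would look something like the sketch below (assuming the API shape from #356; the url, token, org, and bucket values are placeholders to substitute with your own):

```python
from influxdb_client import WriteOptions
from influxdb_client.client.util.multiprocessing_helper import MultiprocessingWriter

# Placeholder connection details - replace with your own.
with MultiprocessingWriter(url="http://localhost:8086", token="my-token",
                           org="my-org",
                           write_options=WriteOptions(batch_size=500)) as writer:
    # write() hands the record off to a child process, so the caller
    # does not block while the batch is flushed over the network.
    writer.write(bucket="my-bucket", record="foo,bar=sed fieldval=1")
```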

@bednar bednar added this to the 1.24.0 milestone Nov 12, 2021