Skip to content

Read optimization #1697

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 3 commits into from
Closed

Conversation

rusher
Copy link
Contributor

@rusher rusher commented Apr 18, 2025

readPacket Optimization Summary

Problem

The readPacket function is called very frequently during database operations, but it repeatedly checks configuration options that don't change after a connection is established (such as compression and ReadTimeout settings).

Solution

Create a dedicated function to handle packet reading after connection establishment, eliminating redundant checks of unchanging configuration options.

Performance Impact

This is not just a micro-optimization, as readPacket is called extremely frequently during database operations.

Benchmark Results

Before Optimization:

goos: linux
goarch: amd64
pkg: github.com/go-sql-driver/mysql
cpu: 11th Gen Intel(R) Core(TM) i9-11900K @ 3.50GHz
BenchmarkSelect10000rows
BenchmarkSelect10000rows-16    	     832	   1397179 ns/op

After Optimization:

BenchmarkSelect10000rows-16    	    1350	    886636 ns/op

Checklist

  • Code compiles correctly
  • Created tests which fail without the change (if possible)
  • All tests passing
  • Extended the README / documentation, if necessary
  • Added myself / the copyright holder to the AUTHORS file

Summary by CodeRabbit

  • New Features

    • Added a new benchmark for selecting 10,000 rows from a MariaDB database to measure performance.
  • Refactor

    • Streamlined how data is read from the database connection, centralizing read logic and improving handling of compression and timeouts.
    • Improved test setup for connection and packet reading logic, ensuring consistent and explicit configuration in tests.
  • Style

    • Fixed a minor comment typo for clarity.

rusher added 2 commits March 31, 2025 18:04
* ensuring performance schema is enabled when testing some performance schema results
* Added logic to check if the default collation is overridden by the server character_set_collations
* ensure using IANA timezone in test, since tzinfo depending on system won't have deprecated tz like "US/Central" and "US/Pacific"
Since configuration options doesn't change at runtime, after connection is established, use dedicated function, in order to avoid multiple test test compress, checking ReadTimeout configuration option
Copy link

coderabbitai bot commented Apr 18, 2025

Walkthrough

The changes introduce a refactor of the mysqlConn struct's data reading mechanism by adding two new function fields, readFunc and readNextFunc, to centralize and abstract how data is read, including support for timeouts and compression. The logic for selecting the appropriate reading function is moved from scattered conditional branches into these fields, which are set during connection setup and when compression or TLS is negotiated. Associated tests are updated to explicitly set these function fields, ensuring consistent and predictable behavior. Additionally, a new benchmark for selecting 10,000 rows from MariaDB is added.

Changes

File(s) Change Summary
benchmark_test.go Added BenchmarkSelect10000rows to benchmark selecting 10,000 rows from MariaDB, including setup and error checks.
compress_test.go Set readNextFunc and readFunc on mysqlConn in uncompressHelper before calling readPacket().
connection.go Added readFunc and readNextFunc fields to mysqlConn; removed readWithTimeout method; fixed comment typo.
connection_test.go Updated TestInterpolateParams to create/reuse buffer and TCPConn; explicitly set readNextFunc and readFunc.
connector.go Set readFunc and readNextFunc on mysqlConn during connection setup; handle timeout and compression logic.
packets.go Refactored readPacket to use readNextFunc and readFunc; set readFunc after TLS upgrade in handshake.
packets_test.go Updated tests to create/reuse buffer; explicitly set readNextFunc and readFunc on mysqlConn in helpers/tests.

Sequence Diagram(s)

sequenceDiagram
    participant Test/Benchmark
    participant mysqlConn
    participant Buffer/CompIO
    participant NetConn

    Test/Benchmark->>mysqlConn: Initialize (set readFunc/readNextFunc)
    mysqlConn->>Buffer/CompIO: Assign readNextFunc (compression aware)
    mysqlConn->>NetConn: Assign readFunc (with/without timeout)
    Test/Benchmark->>mysqlConn: Call readPacket()
    mysqlConn->>Buffer/CompIO: Call readNextFunc(readFunc)
    Buffer/CompIO->>NetConn: Call readFunc
    Buffer/CompIO-->>mysqlConn: Return data
    mysqlConn-->>Test/Benchmark: Return packet data
Loading

Poem

In tunnels of code where the data flows fast,
A rabbit refactored the reading at last.
With functions assigned for each way to read,
Compression or timeout, it’s all guaranteed!
Benchmarks and tests now hop right along—
The code is more clear, the logic more strong.
🐇✨

Tip

⚡💬 Agentic Chat (Pro Plan, General Availability)
  • We're introducing multi-step agentic chat in review comments and issue comments, within and outside of PR's. This feature enhances review and issue discussions with the CodeRabbit agentic chat by enabling advanced interactions, including the ability to create pull requests directly from comments and add commits to existing pull requests.

📜 Recent review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between f0eee81 and 5c7f33a.

📒 Files selected for processing (3)
  • benchmark_test.go (1 hunks)
  • connection_test.go (1 hunks)
  • packets.go (3 hunks)
🚧 Files skipped from review as they are similar to previous changes (3)
  • connection_test.go
  • packets.go
  • benchmark_test.go
⏰ Context from checks skipped due to timeout of 90000ms (20)
  • GitHub Check: test (ubuntu-latest, 1.22, 9.0)
  • GitHub Check: test (windows-latest, 1.24, mariadb-10.5)
  • GitHub Check: test (ubuntu-latest, 1.23, 9.0)
  • GitHub Check: test (windows-latest, 1.24, mariadb-10.11)
  • GitHub Check: test (windows-latest, 1.24, mariadb-10.6)
  • GitHub Check: test (windows-latest, 1.24, mariadb-11.1)
  • GitHub Check: test (windows-latest, 1.24, 5.7)
  • GitHub Check: test (windows-latest, 1.24, mariadb-11.2)
  • GitHub Check: test (windows-latest, 1.24, 9.0)
  • GitHub Check: test (windows-latest, 1.24, mariadb-11.4)
  • GitHub Check: test (windows-latest, 1.24, 8.4)
  • GitHub Check: test (macos-latest, 1.24, mariadb-10.6)
  • GitHub Check: test (macos-latest, 1.24, 8.4)
  • GitHub Check: test (macos-latest, 1.24, 8.0)
  • GitHub Check: test (macos-latest, 1.24, 9.0)
  • GitHub Check: test (ubuntu-latest, 1.24, mariadb-11.4)
  • GitHub Check: test (ubuntu-latest, 1.24, 8.0)
  • GitHub Check: test (ubuntu-latest, 1.24, mariadb-10.6)
  • GitHub Check: test (ubuntu-latest, 1.24, mariadb-10.11)
  • GitHub Check: test (ubuntu-latest, 1.24, 5.7)

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share
🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>, please review it.
    • Generate unit testing code for this file.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
    • @coderabbitai generate unit testing code for this file.
    • @coderabbitai modularize this function.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
    • @coderabbitai read src/utils.ts and generate unit testing code.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
    • @coderabbitai help me debug CodeRabbit configuration file.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (Invoked using PR comments)

  • @coderabbitai pause to pause the reviews on a PR.
  • @coderabbitai resume to resume the paused reviews.
  • @coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
  • @coderabbitai full review to do a full review from scratch and review all the files again.
  • @coderabbitai summary to regenerate the summary of the PR.
  • @coderabbitai generate docstrings to generate docstrings for this PR.
  • @coderabbitai resolve resolve all the CodeRabbit review comments.
  • @coderabbitai configuration to show the current CodeRabbit configuration for the repository.
  • @coderabbitai help to get help.

Other keywords and placeholders

  • Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbitai anywhere in the PR title to generate the title automatically.

CodeRabbit Configuration File (.coderabbit.yaml)

  • You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
  • Please see the configuration documentation for more information.
  • If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Documentation and Community

  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (2)
benchmark_test.go (1)

116-155: Well-designed benchmark for measuring read performance improvements

This benchmark effectively demonstrates the performance optimization from the PR by selecting 10,000 rows from a MariaDB sequence table, which triggers many calls to the optimized readPacket function. The benchmark is well-structured with proper error handling and resource cleanup.

Consider starting the timer after statement preparation to isolate just the row-reading performance:

-    b.StartTimer()
     stmt, err := db.Prepare("SELECT * FROM seq_1_to_10000")
     if err != nil {
         b.Fatalf("Failed to prepare statement: %v", err)
     }
     defer stmt.Close()
+    b.StartTimer()
     for n := 0; n < b.N; n++ {

This would provide more precise measurement focusing exclusively on the row reading performance, which is where your optimization has the most impact.

packets.go (1)

29-112: Excellent read optimization in packet handling

The changes to readPacket effectively eliminate repeated conditional checks on compression and timeout settings by using function pointers that are initialized once during connection setup. This is a significant performance optimization for a hot path in the codebase.

The approach replaces direct buffer reads with calls to the appropriate read functions via readNextFunc and readFunc function fields, which are set only once when the connection is established.

Consider adding a comment before the readPacket function that briefly explains the optimization approach, which would help future maintainers understand the design pattern:

// Read packet to buffer 'data'
// Uses readNextFunc and readFunc function fields for optimized reading
// that avoids repeated checks of unchanging configuration options
func (mc *mysqlConn) readPacket() ([]byte, error) {
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 879eb11 and f0eee81.

📒 Files selected for processing (9)
  • AUTHORS (1 hunks)
  • benchmark_test.go (1 hunks)
  • compress_test.go (1 hunks)
  • connection.go (2 hunks)
  • connection_test.go (1 hunks)
  • connector.go (3 hunks)
  • driver_test.go (4 hunks)
  • packets.go (3 hunks)
  • packets_test.go (4 hunks)
🧰 Additional context used
🧬 Code Graph Analysis (1)
connection_test.go (1)
dsn.go (1)
  • Config (37-84)
🔇 Additional comments (22)
AUTHORS (1)

40-40: Author added for contribution

The addition of "Diego Dupin" to the AUTHORS file properly acknowledges the contributor responsible for the read optimization work per the project's guidelines.

compress_test.go (1)

43-44: Explicit assignment of read functions for compression support

These additions are required to support the refactored read mechanism. The test now explicitly sets the read function pointers that were previously handled implicitly, ensuring proper packet reading during compression tests.

connection.go (2)

42-43: Added function fields to abstract reading operations

The addition of readFunc and readNextFunc function fields to the mysqlConn struct is the core of this optimization. These fields allow the packet reading logic to be established once at connection time rather than checking configuration options on every read operation.

This design improves performance by avoiding repeated checks of unchanging configuration values like compression settings and read timeout during the frequently called packet reading operations.


242-242: Fixed typo in comment

Corrected "its" to "it's" in the comment for better grammar.

connection_test.go (4)

21-22: Refactored test setup to use predefined buffer and connection

Extracted buffer and connection creation into variables for reuse and improved clarity in the test setup.


24-25: Using extracted variables in mysqlConn initialization

The test now uses the predefined buffer and connection objects, making the code more consistent and maintainable.


30-31: Initialized read function fields for test

These additions explicitly set the read function fields to match the new connection.go implementation. This ensures the test works correctly with the refactored read mechanism and validates the correct behavior of the interpolation functionality with the new design.


47-52:

❌ Incorrect review comment

Verify other tests are consistent with the new design

The other test functions in this file (TestInterpolateParamsJSONRawMessage and subsequent tests) don't initialize the new read function fields. While these tests may be working without errors (since they don't actually perform network reads), it would be more consistent to update them similarly to the first test.


🏁 Script executed:

#!/bin/bash
# Check if other tests in this file are actually reading packets
rg -A 5 "readPacket|read.*Next" connection_test.go

Length of output: 211


No action needed: other tests already initialize the new read functions
The search confirms that subsequent tests in connection_test.go already set both readNextFunc: buf.readNext and readFunc: nc.Read. No consistency updates are required.

Likely an incorrect or invalid review comment.

packets.go (1)

367-368: Correctly handling TLS connection upgrade

This change ensures that after upgrading to TLS, subsequent reads use the TLS connection's Read method directly, maintaining the optimization pattern established in this PR.

connector.go (2)

134-148: Efficient read function initialization

This code efficiently initializes the read function pointers based on connection configuration. By setting up the appropriate read functions once during connection initialization, the code eliminates the need to check these conditions on every read operation, which is the core of this optimization.

The approach is particularly effective for the ReadTimeout configuration, where it creates a closure that sets the deadline before each read, but only creates this closure once.


190-191: Proper handling of compressed connections

This change correctly updates the read function pointer when compression is enabled, maintaining the optimization pattern established in this PR. This ensures that the compressed I/O's readNext method is used directly without conditional checks on each packet read.

packets_test.go (5)

97-113: Updated test helpers for new read function fields

This change correctly initializes the new read function fields in the test helper, which is necessary to make tests work with the updated code. It also improves efficiency by reusing the same buffer instance for both the buf field and the readNextFunc function.


115-138: Updated test to properly initialize read functions

This test update correctly initializes the read function fields required by the new packet reading implementation. The test still effectively verifies the same behavior while accommodating the structural changes from the optimization.


172-280: Properly adapted packet split test

This test update correctly initializes the read function fields required by the new packet reading implementation. The core logic of the test remains unchanged, continuing to verify that large packets are correctly split and reassembled.


282-326: Updated packet fail tests

This test update correctly initializes the read function fields required by the new packet reading implementation. The test scenarios for error handling remain unchanged, ensuring that error handling continues to work properly with the optimized code.


328-364: Updated regression test

This test update correctly initializes the read function fields required by the new packet reading implementation. The regression test still effectively verifies the same behavior while accommodating the structural changes from the optimization.

driver_test.go (6)

1633-1658: Good addition for MariaDB compatibility!

This code properly handles the case where MariaDB might override the default collation for a character set via the character_set_collations system variable. It parses the charset=collation pairs and uses them to determine the expected collation value, making the test more robust.


1665-1672: Proper error handling for the collation check

This conditional statement nicely complements the MariaDB compatibility code by verifying that the collation is either the original expected value or the forced (overridden) value determined by the MariaDB configuration.


1721-1721: Good update to standard IANA timezone names

Switching from US/Central and US/Pacific to America/New_York and Asia/Hong_Kong uses more standard IANA timezone identifiers, which is recommended practice.


1729-1730: Timezone reference update consistent with the list change

This correctly updates the reference time location to use America/New_York instead of US/Central, consistent with the timezone list change above.


1749-1749: Updated error message matches new timezone

The error message correctly references the new timezone name for consistency.


3577-3585: Robust check for performance schema availability

This new check prevents the test from failing on MySQL instances where the performance schema is disabled. Good practice to check for prerequisites before running tests that depend on specific configurations.

@methane
Copy link
Member

methane commented Apr 18, 2025

please remove unrelated changes.

@coveralls
Copy link

coveralls commented Apr 19, 2025

Coverage Status

coverage: 82.989% (-0.06%) from 83.048%
when pulling 5c7f33a on rusher:read_optimization
into c786d41 on go-sql-driver:master.

@methane
Copy link
Member

methane commented Apr 21, 2025

nice catch about performance overhead. But I don't think "repeatedly checks configuration options" cause it. And I don't like solution too.
Would you create a PR that only add benchmark?

@rusher
Copy link
Contributor Author

rusher commented Apr 22, 2025

added the PR #1703 as requested that correspond only to the benchmark part of this PR
(working only with MariaDB server, since for MySQL it would requires way more lines, and server is not what we are testing here).

@rusher
Copy link
Contributor Author

rusher commented Apr 22, 2025

But I don't think "repeatedly checks configuration options" cause it. And I don't like solution too.

You'll see the only related change in this PR that is used in benchmark are those checks (most of the code is just adapting the test to this changes)

I've specifically use a big resultset with small value. When reading rows, most of the code use is ReadPacket. This show the cost of those repeted checks.

@methane
Copy link
Member

methane commented Apr 23, 2025

You'll see the only related change in this PR that is used in benchmark are those checks (most of the code is just adapting the test to this changes)

See #1705.

@methane methane closed this Apr 24, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants