packets: implemented compression protocol #649
Conversation
Could you show benchmark results for compressed, uncompressed (new), and uncompressed (current)?
For reference, here is the documentation I consulted. Benchmarks:
Here are the new benchmark stats:
Thank you for your comments on my pull request!
connection.go
Outdated
	parseTime bool
	strict    bool
	reader    packetReader
	writer    io.Writer
Do we really need to carry around both `reader` and `writer` as well as `buf` and `netConn`? Can't those be unified into one interface? This interface might also have some sort of sequence reset function. Then `sequence` and `compressionSequence` could be internal states of the respective struct implementing the interface.
My understanding of your comment is that you suggest merging `reader` with `buf`, and `writer` with `netConn`. Because both `buf` and `netConn` are used in various capacities beyond simple reading and writing (e.g. setting deadlines or getting the complete buffer), it isn't straightforward to subsume them into `reader` and `writer` respectively.

I wrote it this way to avoid having the addition of compression significantly affect buffer.go. While this refactoring sounds like a great idea, I think it is out of scope for this ticket and would be better undertaken separately, with some design/brainstorming done beforehand. That would also keep the scope of this pull request clear: it only adds the compression protocol.

Regarding sequence numbers: because `sequence` and `compressionSequence` need to be reset in response to higher-level actions (e.g. issuing a new MySQL command), a writer-level abstraction should not know about MySQL connection logic.
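For illustration, a minimal sketch of the kind of unified abstraction being discussed; the interface name and methods here are hypothetical, not part of the driver:

```go
// Hypothetical packet I/O abstraction: one interface owns both directions
// and its own sequence counters, so the connection only asks for packets
// and resets state at command boundaries.
type packetIO interface {
	readNext(need int) ([]byte, error) // next n bytes of the current packet
	write(packet []byte) (int, error)  // frame and send one packet
	resetSequence()                    // called when a new command begins
}
```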
compress.go
Outdated
}

func (cw *compressedWriter) writeComprPacketToNetwork(data []byte, uncomprLength int) error {
	data = append(cw.header, data...)
This won't avoid allocation.
To avoid the allocation, you can require the caller to reserve 7 bytes of space at the start of the payload. The caller can reserve it like this:
cHeader := make([]byte, 7)
var cBuff = new(bytes.Buffer)
zw := zlib.NewWriter(cBuff)
...
cBuff.Reset()
cBuff.Write(cHeader)
zw.Reset(cBuff)
...
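To make the idea concrete, here is a self-contained sketch of that pattern, assuming the standard 7-byte compressed-packet header (3 bytes compressed length, 1 byte sequence number, 3 bytes uncompressed length); the names are illustrative, not the PR's actual code, and the package-level state is not concurrency-safe (a real implementation would keep it on the writer):

```go
package mysql // illustrative; mirrors the driver's package name

import (
	"bytes"
	"compress/zlib"
	"io"
)

// Reuse one buffer and one zlib writer; reserve the 7-byte header up
// front and fill it in place, so no append-copy of the payload is needed.
var (
	hdr  [7]byte
	cBuf bytes.Buffer
	zw   = zlib.NewWriter(&cBuf)
)

func writeCompressed(w io.Writer, payload []byte, seq uint8) error {
	cBuf.Reset()
	cBuf.Write(hdr[:]) // placeholder bytes for the compressed-packet header
	zw.Reset(&cBuf)
	if _, err := zw.Write(payload); err != nil {
		return err
	}
	if err := zw.Close(); err != nil {
		return err
	}

	data := cBuf.Bytes()
	compLen := len(data) - 7

	// 3 bytes compressed length, 1 byte sequence, 3 bytes uncompressed length.
	data[0] = byte(compLen)
	data[1] = byte(compLen >> 8)
	data[2] = byte(compLen >> 16)
	data[3] = seq
	data[4] = byte(len(payload))
	data[5] = byte(len(payload) >> 8)
	data[6] = byte(len(payload) >> 16)

	_, err := w.Write(data)
	return err
}
```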
	maxPayloadLength := maxPacketSize - 4

	for length >= maxPayloadLength {
Why are both `length := len(data) - 4` and `maxPayloadLength := maxPacketSize - 4` required? How about this?
maxPayload := maxPacketSize - 4
for len(data) > maxPayload {
...
data = data[maxPayload:]
}
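Spelled out a little further as a hedged sketch only; `writeChunk` here is a stand-in for whatever actually sends one packet:

```go
// Slice the payload into chunks of at most maxPayload bytes and send each
// one; the remaining final chunk (at most maxPayload bytes) is sent after
// the loop, without tracking a separate length variable.
func writeInChunks(data []byte, maxPayload int, writeChunk func([]byte) error) error {
	for len(data) > maxPayload {
		if err := writeChunk(data[:maxPayload]); err != nil {
			return err
		}
		data = data[maxPayload:]
	}
	return writeChunk(data)
}
```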
compress.go
Outdated
	lenSmall := len(dataSmall)

	var b bytes.Buffer
	writer := zlib.NewWriter(&b)
`bytes.Buffer` and zlib's writer are resettable. At the very least, move them before the loop.
I can cache those on the `compressedWriter` and reset both. This is a more significant gain than simply declaring them before the loop, since in the majority of use cases the loop is not actually hit.
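In struct form that might look roughly like the following; the field names are illustrative (the PR's actual struct may differ), and the Reset pattern is the same as in the earlier sketch:

```go
type compressedWriter struct {
	connWriter io.Writer
	mc         *mysqlConn
	buf        bytes.Buffer // reused for every compressed packet
	zw         *zlib.Writer // created once, Reset(&buf) before each use
}
```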
compress.go
Outdated
	return totalBytes, nil
}

func (cw *compressedWriter) writeComprPacketToNetwork(data []byte, uncomprLength int) error {
This name seems a bit long; `writeCompressedPacket()` may be a better name. `payload` is better than `data` too.
I'll change `data` to `payload`. As for `compressedWriter`'s `writeComprPacketToNetwork`, I was thinking of renaming it to `writeToNetwork` for readability and brevity. Having `compressedWriter` expose both a `Write` and a `WriteCompressedPacket` method would not make the difference between the two clear -- namely, that `writeComprPacketToNetwork` performs the actual network write, whereas `Write` is the top-level function that does all the needed work.
FYI, the current benchmark uses data that is too small to reveal zlib's performance characteristics.
ping. This pull request needs some changes (and a rebase to resolve the conflicts).
I've made all the changes suggested above except for updating the benchmark tests, and I wanted to ask how you'd recommend I do that. The utility of compression generally becomes apparent only when several rows are received at once. However, I'm not sure how to arrange that with go-sql-driver, since rows appear to be fetched one by one with a cursor. Would you suggest a test that reads only one row, but where that row contains a very large string?
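For what it's worth, a benchmark along those lines might look like the sketch below. The table name, data size, and DSN are made up for illustration, and it assumes the usual `database/sql`, `strings`, and `testing` imports plus the driver registered under "mysql":

```go
// Hypothetical benchmark: select one row containing a large, repetitive
// TEXT value so that zlib has something worth compressing.
func BenchmarkQueryLargeRow(b *testing.B) {
	db, err := sql.Open("mysql", "user:password@tcp(localhost:3306)/test?compress=1")
	if err != nil {
		b.Fatal(err)
	}
	defer db.Close()

	payload := strings.Repeat("compressible data ", 4096) // roughly 70 KiB
	if _, err := db.Exec("CREATE TABLE IF NOT EXISTS big (v LONGTEXT)"); err != nil {
		b.Fatal(err)
	}
	if _, err := db.Exec("INSERT INTO big (v) VALUES (?)", payload); err != nil {
		b.Fatal(err)
	}

	b.ResetTimer()
	for i := 0; i < b.N; i++ {
		var v string
		if err := db.QueryRow("SELECT v FROM big LIMIT 1").Scan(&v); err != nil {
			b.Fatal(err)
		}
	}
}
```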
The general concept seems fine to me, and with some minor changes it could be merged.
But I still think that, with a smarter abstraction, the implementation could integrate a bit better with the existing code.
Anyway, thank you for working on this long-missing feature.
connection.go
Outdated
	sequence            uint8
	compressionSequence uint8
	parseTime           bool
	reader              packetReader
Please move all interfaces (`reader` and `writer`) to the top (after `netConn`).
dsn.go
Outdated
@@ -57,6 +57,7 @@ type Config struct {
	MultiStatements bool // Allow multiple statements in one query
	ParseTime       bool // Parse time values to time.Time
	RejectReadOnly  bool // Reject read-only connections
	Compression     bool // Compress packets
The struct fields should have the same name as the DSN param
done.
compress.go
Outdated
)

type packetReader interface {
This is also used when no compression is enabled. It is probably better to move this interface to connection.go.
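For reference, the interface in question is essentially a read-side abstraction along these lines; the method signature here is an assumption based on the existing buffer API, not necessarily exactly what the PR defines:

```go
// Satisfied by both the plain buffer and the compressed reader, so the
// connection code does not care whether compression is enabled.
type packetReader interface {
	readNext(need int) ([]byte, error)
}
```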
	return &compressedWriter{
		connWriter: connWriter,
		mc:         mc,
		zw:         zlib.NewWriter(new(bytes.Buffer)),
Does it really need to introduce two extra buffers? Would it make sense to combine the reader and writer (e.g. as a virtual buffer) and share some data, like the `mc` and the buffer?
It has been a while since I've looked at my own code, so I may be wrong about this, but between `compressedReader` and `compressedWriter`, it appears the only thing they share is a reference to `mc`? The reader uses a slice, but that slice needs to act as memory between reads, so I don't think it would make sense to share it. Please let me know what you meant by this comment!
> The reader uses a slice, but that slice needs to act as memory between reads so I don't think it would make sense to share it

That answers my question. Currently we only have one buffer for each connection, which is shared between reads and writes.
compress.go
Outdated
	defer cr.zr.Close()

	//use existing capacity in bytesBuf if possible
space missing
Please also run
@methane:
Please rebase your own master branch on top of
No need to rebase, since we use "Squash and merge".
I ran the benchmark again on a more stable Linux machine. I can't see a significant difference:
Cr4 -- removing buffer from mysqlConn
How is it going now?
Any updates on this? What's left to do, and is there anything I can help with?
Any updates on this PR?
Thanks for all your great efforts.
Hi! Any updates on this PR?
Am also interested in this PR 😄
Hello everyone -- I will likely not be picking this up again. Things have gotten hectic in my life and I can't give this the proper attention. However, anyone is welcome to pick it up where I left off!
continued in #1487
Description
Implemented the MySQL compression protocol. This new feature is enabled by adding "compress=1" to the DSN.
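A minimal usage sketch, assuming the DSN parameter keeps the name shown above:

```go
// Hypothetical example: enable the compression protocol via the DSN.
db, err := sql.Open("mysql", "user:password@tcp(127.0.0.1:3306)/dbname?compress=1")
if err != nil {
	log.Fatal(err)
}
defer db.Close()
```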