packets: implemented compression protocol #649

Closed
wants to merge 37 commits

Conversation


@bLamarche413 bLamarche413 commented Aug 11, 2017

Description

Implemented the MySQL compression protocol. The new feature is enabled by adding compress=1 to the DSN.
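For example, enabling it from client code looks like this (a minimal sketch; the credentials, host, and database name in the DSN are placeholders):

package main

import (
	"database/sql"
	"log"

	_ "github.com/go-sql-driver/mysql"
)

func main() {
	// compress=1 in the DSN turns on the compression protocol for this connection.
	db, err := sql.Open("mysql", "user:password@tcp(127.0.0.1:3306)/dbname?compress=1")
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()

	if err := db.Ping(); err != nil {
		log.Fatal(err)
	}
}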

Checklist

  • Code compiles correctly
  • Created tests which fail without the change (if possible)
  • All tests passing
  • Extended the README / documentation, if necessary
  • Added myself / the copyright holder to the AUTHORS file

@methane
Member

methane commented Aug 12, 2017

Could you show benchmark results for compressed, uncompressed (new), and uncompressed (current)?

@bLamarche413
Author

bLamarche413 commented Aug 16, 2017

For reference, here is the documentation I consulted.

Benchmarks:
Here are the old benchmark stats:

BenchmarkQueryContext/1-4  	   20000	     98673 ns/op	     729 B/op	      22 allocs/op
BenchmarkQueryContext/2-4  	   20000	     89651 ns/op	     728 B/op	      22 allocs/op
BenchmarkQueryContext/3-4  	   20000	     91343 ns/op	     729 B/op	      22 allocs/op
BenchmarkQueryContext/4-4  	   20000	     88788 ns/op	     729 B/op	      22 allocs/op
BenchmarkExecContext/1-4   	   20000	     97542 ns/op	     728 B/op	      22 allocs/op
BenchmarkExecContext/2-4   	   20000	     91997 ns/op	     728 B/op	      22 allocs/op
BenchmarkExecContext/3-4   	   20000	     89143 ns/op	     728 B/op	      22 allocs/op
BenchmarkExecContext/4-4   	   10000	    104725 ns/op	     728 B/op	      22 allocs/op
BenchmarkQuery-4           	   20000	     89338 ns/op	     748 B/op	      23 allocs/op
BenchmarkExec-4            	   20000	     81922 ns/op	      67 B/op	       3 allocs/op
BenchmarkRoundtripTxt-4    	   10000	    198159 ns/op	   15938 B/op	      16 allocs/op
BenchmarkRoundtripBin-4    	   10000	    162701 ns/op	     695 B/op	      18 allocs/op
BenchmarkInterpolation-4   	 2000000	       820 ns/op	     176 B/op	       1 allocs/op
BenchmarkParseDSN-4        	  200000	      9042 ns/op	    6896 B/op	      63 allocs/op
PASS
ok  	github.com/go-sql-driver/mysql	53.699s

Here are the new benchmark stats:
BenchmarkQueryCompression does the same thing as BenchmarkQuery, but with compression turned on.

BenchmarkQueryContext/1-4     	   10000	    100547 ns/op	     732 B/op	      22 allocs/op
BenchmarkQueryContext/2-4     	   20000	     91023 ns/op	     728 B/op	      22 allocs/op
BenchmarkQueryContext/3-4     	   20000	     90626 ns/op	     729 B/op	      22 allocs/op
BenchmarkQueryContext/4-4     	   10000	    100878 ns/op	     729 B/op	      22 allocs/op
BenchmarkExecContext/1-4      	   10000	    157982 ns/op	     728 B/op	      22 allocs/op
BenchmarkExecContext/2-4      	   10000	    104514 ns/op	     728 B/op	      22 allocs/op
BenchmarkExecContext/3-4      	   10000	    101186 ns/op	     728 B/op	      22 allocs/op
BenchmarkExecContext/4-4      	   20000	     99318 ns/op	     728 B/op	      22 allocs/op
BenchmarkQuery-4              	   10000	    105718 ns/op	     752 B/op	      23 allocs/op
BenchmarkQueryCompression-4   	   10000	    104518 ns/op	     972 B/op	      27 allocs/op
BenchmarkExec-4               	   20000	     82831 ns/op	      67 B/op	       3 allocs/op
BenchmarkRoundtripTxt-4       	   10000	    247710 ns/op	   15938 B/op	      16 allocs/op
BenchmarkRoundtripBin-4       	   10000	    163471 ns/op	     695 B/op	      18 allocs/op
BenchmarkInterpolation-4      	 1000000	      1303 ns/op	     176 B/op	       1 allocs/op
BenchmarkParseDSN-4           	  100000	     13798 ns/op	    6896 B/op	      63 allocs/op
PASS
ok  	github.com/go-sql-driver/mysql	46.113s

Thank you for your comments on my pull request!

connection.go Outdated
	parseTime bool
	strict    bool
	reader    packetReader
	writer    io.Writer
Member

Do we really need to carry around both reader and writer as well as buf and netConn?
Can't those be unified into one interface?

This interface might also have some sort of sequence reset function. Then sequence and compressionSequence could be internal states of the respective struct implementing the interface.
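Not from the patch, just to illustrate the shape of what is being suggested; names like packetConn, readPacket, writePacket, and resetSequence are hypothetical:

// packetConn is a hypothetical interface unifying packet reads and writes.
// A concrete implementation would wrap netConn/buf and own the sequence counters.
type packetConn interface {
	readPacket() ([]byte, error) // read the next protocol packet
	writePacket(p []byte) error  // write one protocol packet
	resetSequence()              // reset sequence (and compressionSequence) at a command boundary
}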

Author

@bLamarche413 bLamarche413 Aug 17, 2017

My understanding of your comment is that you suggest merging reader with buf, and writer with netConn. Because both buf and netConn are used in various capacities outside of simple reading and writing (e.g. setting deadlines or getting the complete buffer), it's not easy to subsume buf and netConn into reader and writer respectively.

I wrote things this way to avoid having the addition of compression significantly affect buffer.go. While this refactoring sounds like a great idea, I think it may be out of scope for this ticket and would be better undertaken separately, with some design/brainstorming done beforehand. That would also avoid obscuring the scope of this pull request, which is only the addition of the compression protocol.

Regarding sequence numbers: because sequence and compressionSequence need to be reset on the basis of higher-level actions (e.g. issuing a new MySQL command), a writer-level abstraction should not know about connection-level MySQL logic.

compress.go Outdated
}

func (cw *compressedWriter) writeComprPacketToNetwork(data []byte, uncomprLength int) error {
data = append(cw.header, data...)
Member

This won't avoid allocation.

Member

To avoid the allocation, you can require that 7 bytes of space are reserved in the payload by the caller.
The caller can reserve it like this:

cHeader := make([]byte, 7) // reserved space for the 7-byte compressed packet header
var cBuff = new(bytes.Buffer)
zw := zlib.NewWriter(cBuff)
...
cBuff.Reset()
cBuff.Write(cHeader) // placeholder header goes in first; compressed payload follows in the same buffer
zw.Reset(cBuff)
...


maxPayloadLength := maxPacketSize - 4

for length >= maxPayloadLength {
Member

Why are both length := len(data) - 4 and maxPayloadLength := maxPacketSize - 4 required?
How about this?

maxPayload := maxPacketSize - 4

for len(data) > maxPayload {
...
    data = data[maxPayload:]
}

compress.go Outdated
lenSmall := len(dataSmall)

var b bytes.Buffer
writer := zlib.NewWriter(&b)
Member

bytes.Buffer and zlib's writer are resettable.
At the very least, move them outside the loop.

Author

I can cache those on the compressedWriter and reset both. This is a more significant gain than simply declaring them before the loop, since in the majority of use cases the loop is never actually entered.
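Roughly what I have in mind (a sketch only; the field and helper names here are mine, not the final patch):

package mysql

import (
	"bytes"
	"compress/zlib"
	"io"
)

type compressedWriter struct {
	connWriter io.Writer
	buf        bytes.Buffer // cached compression buffer, reused across writes
	zw         *zlib.Writer // cached zlib writer, reset onto buf before each use
}

func newCompressedWriter(connWriter io.Writer) *compressedWriter {
	cw := &compressedWriter{connWriter: connWriter}
	cw.zw = zlib.NewWriter(&cw.buf)
	return cw
}

// compress deflates payload into the cached buffer and returns the result.
// The returned slice is only valid until the next call.
func (cw *compressedWriter) compress(payload []byte) ([]byte, error) {
	cw.buf.Reset()
	cw.zw.Reset(&cw.buf)
	if _, err := cw.zw.Write(payload); err != nil {
		return nil, err
	}
	if err := cw.zw.Close(); err != nil {
		return nil, err
	}
	return cw.buf.Bytes(), nil
}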

compress.go Outdated
return totalBytes, nil
}

func (cw *compressedWriter) writeComprPacketToNetwork(data []byte, uncomprLength int) error {
Member

This name seems a bit long. writeCompressedPacket() may be a better name.
payload is also better than data.

Author

I'll change data to payload. As for compressedWriter's writeComprPacketToNetwork, I was thinking of renaming it to writeToNetwork for readability and brevity. Giving compressedWriter both a Write and a WriteCompressedPacket method would not clarify the difference between the two functions, namely that writeComprPacketToNetwork does the actual network write, whereas Write is the top-level function that performs all the needed steps.

@methane
Member

methane commented Aug 22, 2017

FYI, the current benchmarks use data that is too small to show zlib's performance.

@julienschmidt
Member

ping

This pull request needs some changes (and a rebase to resolve the conflicts)

@bLamarche413
Author

I've made all the changes suggested above except for changing the benchmark tests, and I wanted to ask how you would recommend I do this. Generally, the benefit of compression only becomes apparent when several rows of data are received at once. However, I'm not sure how to arrange that with go-sql-driver, as rows appear to be fetched one at a time through a cursor. Would you suggest a test where only one row is read, but that row contains a very large string?
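For reference, this is the kind of benchmark I was picturing: a single row carrying a large, repetitive string so zlib has something to compress. The DSN, the table-free SELECT, and the payload size are placeholders, not a final proposal:

package mysql

import (
	"database/sql"
	"strings"
	"testing"
)

// benchmarkLargeRow selects a single row containing a large repetitive
// string, which compresses well, over a connection described by dsn.
func benchmarkLargeRow(b *testing.B, dsn string) {
	db, err := sql.Open("mysql", dsn)
	if err != nil {
		b.Fatal(err)
	}
	defer db.Close()

	payload := strings.Repeat("compressme", 10000) // ~100 KB of repetitive data
	b.ResetTimer()
	for i := 0; i < b.N; i++ {
		var got string
		if err := db.QueryRow("SELECT ?", payload).Scan(&got); err != nil {
			b.Fatal(err)
		}
	}
}

func BenchmarkLargeRowCompressed(b *testing.B) {
	benchmarkLargeRow(b, "root:pw@tcp(127.0.0.1:3306)/test?compress=1") // placeholder DSN
}

func BenchmarkLargeRowUncompressed(b *testing.B) {
	benchmarkLargeRow(b, "root:pw@tcp(127.0.0.1:3306)/test") // placeholder DSN
}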

Member

@julienschmidt julienschmidt left a comment

The general concept seems fine to me and with some minor changes it could be merged.

But I still think that with smarter abstraction the implementation could integrate a bit better with the existing code.

Anyway, thank you for working on this long-missing feature.

connection.go Outdated
	sequence            uint8
	compressionSequence uint8
	parseTime           bool
	reader              packetReader
Member

Please move all interfaces (reader and writer) to the top (after netConn).

dsn.go Outdated
@@ -57,6 +57,7 @@ type Config struct {
MultiStatements bool // Allow multiple statements in one query
ParseTime bool // Parse time values to time.Time
RejectReadOnly bool // Reject read-only connections
Compression bool // Compress packets
Member

The struct fields should have the same name as the DSN param
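For illustration, assuming the DSN parameter keeps the name compress, the renamed field might look something like this (a sketch, not the actual diff):

type Config struct {
	// ...
	Compress bool // Compress packets; corresponds to the "compress" DSN parameter
}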

Author

done.

compress.go Outdated
)


type packetReader interface {
Member

This is also used when no compression is in use. It is probably better to move this interface to connection.go.

	return &compressedWriter{
		connWriter: connWriter,
		mc:         mc,
		zw:         zlib.NewWriter(new(bytes.Buffer)),
Member

@julienschmidt julienschmidt Oct 12, 2017

Does it really need to introduce two extra buffers?
Would it make sense to combine the reader and writer (e.g. as a virtual buffer) and share some data, like the mc and the buffer?

Author

It has been a while since I've looked at my own code, so I may be wrong about this, but between compressedReader and compressedWriter, it appears the only thing they share is a reference to mc. The reader uses a slice, but that slice needs to act as memory between reads, so I don't think it would make sense to share it. Please let me know what you meant by this comment!

Member

The reader uses a slice, but that slice needs to act as memory between reads so I don't think it would make sense to share it

That answers my question. Currently we only have one buffer for each connection, which is shared between reads and writes.

compress.go Outdated

defer cr.zr.Close()

//use existing capacity in bytesBuf if possible
Member

space missing (// use instead of //use)

@julienschmidt
Member

Please also run go fmt before pushing code. The CI builds are currently failing because it finds some improperly formatted code.

@julienschmidt
Member

@methane:
BenchmarkSmall-4: 86824 ns/op vs 99420 ns/op
That's roughly +15% ((99420 - 86824) / 86824 ≈ 14.5%). I wouldn't call that negligible.

@methane methane mentioned this pull request Apr 10, 2018
@dolmen
Contributor

dolmen commented Jun 15, 2018

Please rebase your own master branch on top of origin/master instead of merging origin/master into yours:

git remote update
git rebase origin/master
git push -f

@methane
Member

methane commented Jun 15, 2018

No need to rebase, since we use "Squash and merge".
Merging master helps with reviewing the changes made since the last review.

@methane
Member

methane commented Jul 30, 2018

I ran the benchmark again on a more stable Linux machine. I can't see a significant difference:

benchmark            old ns/op     new ns/op     delta
BenchmarkSmall-6     56154         54070         -3.71%
BenchmarkPlain-6     2502622       2464086       -1.54%

@julienschmidt julienschmidt modified the milestones: v1.5.0, v1.6.0 Apr 24, 2019
@iambudi

iambudi commented Apr 13, 2020

How is it going now?

@mhemmings

Any updates on this? What's left to do, and is there anything I can help with?

@julienschmidt julienschmidt modified the milestones: v1.6.0, v1.7.0 Jun 1, 2020
@ghost

ghost commented Aug 3, 2020

Any updates on this PR?

@deltacat

Thanks for all your great efforts.
I just want to know: is anyone still working on this important feature?

@kolkov

kolkov commented Jan 2, 2021

Hi! Any updates on this PR?

@cassya-remitly

I'm also interested in this PR 😄

@bLamarche413
Author

Hello everyone -- I will likely not be picking this up again. Things have gotten hectic in my life and I can't give this the proper attention. However, anyone is welcome to pick it up where I left off!

@methane
Member

methane commented Mar 11, 2024

continued in #1487
