Skip to content

"LOAD DATA LOCAL INFILE" is much slower than mysql command line #364

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
h12w opened this issue Aug 18, 2015 · 29 comments
Closed

"LOAD DATA LOCAL INFILE" is much slower than mysql command line #364

h12w opened this issue Aug 18, 2015 · 29 comments

Comments

@h12w
Copy link

h12w commented Aug 18, 2015

In my test, I am trying to transfer 149MB files (each small file is about 3MB).
With go-sql-driver/mysql, it takes more than 3 minutes, but with mysql command line, it only takes 16 seconds.

Here is my guess:

  1. It handles my concurrent request sequentially.
  2. It does not compress the traffic.
@methane
Copy link
Member

methane commented Aug 18, 2015

  1. It handles my concurrent request sequentially.

How do you configure?

  1. It does not compress the traffic.

Have you tried mysql commandline with and without compress?

@h12w
Copy link
Author

h12w commented Aug 18, 2015

I just open one DB connection and use it to do db.Exec("LOAD DATA ....") in 8 goroutines, also I have set:

    runtime.GOMAXPROCS(runtime.NumCPU())

I use the same strategy to call mysql command line (with compress option) like below:

cmd := exec.Command("mysql", "--login-path="+l.LoginPath, "--local-infile=TRUE", "--compress=TRUE", `--execute=LOAD DATA .....`)

@methane
Copy link
Member

methane commented Aug 18, 2015

How about mysql client without compress?
Is it same to go-sql-driver/mysql?

@h12w
Copy link
Author

h12w commented Aug 18, 2015

I have not tested that for this case, but I have compared mysql with/without compression with other data, the compression should not possibly reduce the time to 1/8.

@ryan0x44
Copy link

@h12w are each of your 8 goroutines writing to the same table?

@h12w
Copy link
Author

h12w commented Oct 26, 2015

Yes, I run the test based on one table. But the service I built periodically imports data into multiple tables of MySQL. Currently we use the mysql command line tool and the database engine is switched from InnoDB to MyISAM, which improves the performance significantly.

@ryan0x44
Copy link

Interesting. I did a test on ~570MB of data importing into a single table and mysqlimport took ~56s while my Go lang implementation (not using go routines) took around ~57s on average. Both tests used a socket connection, and the table uses MyISAM (truncated before each test run). Appears to be working ok for me for the single table use case.

@methane
Copy link
Member

methane commented Oct 26, 2015

I have not tested that for this case, but I have compared mysql with/without compression with other data, the compression should not possibly reduce the time to 1/8.

It is important to compare mysql client and go-sql-driver/mysql.
Without this, we can't know how much performance difference is in your environment.

@h12w
Copy link
Author

h12w commented Oct 26, 2015

@ryandjurovich, I would have the same result with your configuration. The difference happens when you reuse the same DB connection and load multiple data files simultaneously in multiple goroutines. Please note that a DB connection is implemented as a connection pool that should allows multiple actual DB connections.

If I have an 8-core CPU, I would expect about 7-8 seconds loading speed for 570MB data in a 1G Ethernet environment.

@methane
Copy link
Member

methane commented Oct 26, 2015

@h12w Have you configured DB.SetMaxIdleConns ?

@h12w
Copy link
Author

h12w commented Oct 26, 2015

SetMaxIdleConns is not set.

@methane
Copy link
Member

methane commented Oct 26, 2015

I'm sorry. SetMaxIdleConns shoudn't affect since mysql cli doesn't pool connections.

Could you get some stackdump for two programs?
To get stackdump:

import  (
    "net"
    _ "net/http/pprof"
)

func init() {
    go http.ListenAndServe(":3000", nil)
}

and

for i in $(seq 1 100)
do
curl http://127.0.0.1:3000/debug/pprof/goroutines?debug=1 > dump.$i
sleep 0.3
done

@h12w
Copy link
Author

h12w commented Oct 27, 2015

Multiple process instances of mysql cli are called concurrently, even if mysql cli doesn't pool connections.

@methane
Copy link
Member

methane commented Oct 27, 2015

How many concurrency CLI version use? 8?

@methane
Copy link
Member

methane commented Oct 27, 2015

Anyway, I think stackdump is best information you can provide us except reproducible example.

@julienschmidt
Copy link
Member

First of all a working test (as in a small, complete program) is missing. We have no idea what you are really doing there.

The INFILE code is quite simple. The server sends a request, we check if the file is allowed to be transmitted and then just send the bytes in multiple packets to the server. I don't see what we could have done terrible wrong there.
The only thing we could reevaluate is the following part:

mysql/infile.go

Line 157 in d4c1329

if ioErr := mc.writePacket(data[:4+n]); ioErr != nil {

We just send whatever Read returns. In the worst case it's just a slice of the length 1, which could produce overhead. But I think that's also very unlikely. Usually Read fills the whole buffer unless there is a good reason not to.

@h12w
Copy link
Author

h12w commented Jan 14, 2016

Thanks. I will try to write a test program for comparison next week.

@methane
Copy link
Member

methane commented Feb 22, 2016

I found important difference between libmysqlclient and go-sql-driver/mysql.

libmysqlclient uses net_buffer_length buffer size. (default: 16KB).

In case of go-sql-driver/mysql, it uses max_packet_allowed system variable (default: 1MB), or maxPacketSize defined in const.go (1 << 24 = 16MB).

Linux's typical readahed size is 128KB. Larger buffer size may makes inefficient.

Additionally,

mysql/infile.go

Line 142 in d4c1329

err = fmt.Errorf("Local File '%s' too large: Size: %d, Max: %d", name, fileSize, mc.maxPacketAllowed)
limits file size by maxPacketsAllowed. But LOAD LOCAL INFILE sends file by multiple packets. I think this limitation should be removed.

@methane
Copy link
Member

methane commented Feb 24, 2016

@h12w Could you test #424?

@h12w
Copy link
Author

h12w commented Feb 25, 2016

Thank you for the efforts. I am planing to use this driver instead of mysql command line tool recently. I will try to test it in the real environment then.

@methane
Copy link
Member

methane commented Feb 26, 2016

@h12w Could you take dstat 3 > dstat.log or vmstat 3 > vmstat.log while executing LOAD LOCAL INFILE?
dstat is preferred since it includes network trafic.

@h12w
Copy link
Author

h12w commented Jul 3, 2016

Our production code has switched from mysql command line tool to go-sql-driver now, and here is the dstat.log. I am not sure if it is useful because there are other workloads running on the same server besides the MySQL writer:

dstat.log:

----total-cpu-usage---- -dsk/total- -net/total- ---paging-- ---system--
usr sys idl wai hiq siq| read  writ| recv  send|  in   out | int   csw 
  2   1  97   0   0   0|  24k  148k|   0     0 |   0     0 | 616  3688 
  2   1  97   0   0   0|   0   167k| 239k  214k|   0     0 |2259  3591 
  2   1  95   2   0   0|   0   115k| 228k  157k|   0     0 |2530  3901 
  3   1  88   8   0   0|   0   144k| 259k  208k|   0     0 |2735  4150 
  2   1  96   1   0   0|   0   160k| 214k  176k|   0     0 |2231  3547 
  2   1  97   0   0   0|   0   275k| 190k  177k|   0     0 |2512  3915 
  2   0  97   0   0   0|   0   447k| 215k  161k|   0     0 |2282  3519 
  2   1  96   0   0   0|   0   284k| 250k  212k|   0     0 |2635  4131 
  2   1  97   0   0   0|   0   128k| 196k  195k|   0     0 |2409  3726 
  1   1  98   0   0   0|   0   185k| 177k  125k|   0     0 |2099  3303 
  2   1  82  15   0   1|   0   123k| 222k  184k|   0     0 |2449  3808 
  2   1  97   0   0   0|   0   152k| 205k  187k|   0     0 |2468  3900 
  2   1  97   1   0   0|   0   172k| 199k  149k|   0     0 |2127  3306 
  1   1  95   2   0   0|   0   128k| 181k  133k|   0     0 |2135  3324 
  2   1  97   1   0   0|   0   179k| 202k  169k|   0     0 |2087  3315 
  2   1  97   0   0   0|   0   513k| 174k  166k|   0     0 |2062  3046 
  1   1  97   1   0   0|   0   171k| 197k  147k|   0     0 |2120  3134 
  2   1  98   0   0   0|   0   217k| 192k  186k|   0     0 |2199  3492 
  2   0  97   0   0   0|   0   157k| 249k  200k|   0     0 |2126  3358 
  2   0  97   0   0   0|   0   151k| 195k  139k|   0     0 |2023  3135 
  3   1  95   1   0   0|   0   175k| 246k  189k|   0     0 |2676  3812 
  2   1  97   0   0   0|   0   173k| 187k  155k|   0     0 |2127  3302 
  1   1  97   1   0   0|   0   220k| 194k  180k|   0     0 |2201  3477 
  3   1  96   0   0   0|   0   221k| 258k  193k|   0     0 |2931  4602 
  2   1  97   1   0   0|   0   193k| 238k  198k|   0     0 |2321  3588 
  2   0  96   2   0   0|   0   248k| 224k  189k|   0     0 |2280  3599 
  2   1  94   3   0   0|1365B  469k| 238k  176k|   0     0 |2693  4077 
  2   1  95   1   0   0|   0   276k| 199k  185k|   0     0 |2764  4249 
  2   1  96   1   0   0|   0   163k| 214k  200k|   0     0 |2475  3900 
  2   0  97   1   0   0|   0   164k| 248k  174k|   0     0 |2211  3543 
  2   1  95   2   0   0|   0    81k| 202k  157k|   0     0 |2334  3677 
  2   1  97   0   0   0|   0   197k| 213k  170k|   0     0 |2316  3841 
  2   1  96   2   0   0|   0   116k| 189k  181k|   0     0 |2080  3248 
  2   1  95   1   0   0|   0   155k| 230k  169k|   0     0 |2423  3914 
  2   1  97   0   0   0|   0   209k| 232k  186k|   0     0 |2227  3520 
  2   1  97   1   0   0|   0   225k| 184k  161k|   0     0 |1961  3138 
  2   1  97   0   0   0|   0   283k| 202k  157k|   0     0 |2184  3306 
  2   1  97   1   0   0|   0   221k| 221k  170k|   0     0 |2397  3635 
  2   1  96   1   0   0|   0   299k| 176k  158k|   0     0 |1907  3055 
  1   1  94   4   0   0|   0   187k| 195k  146k|   0     0 |2017  3187 
  3   1  94   2   0   0|   0   193k| 232k  191k|   0     0 |2517  3776 
  2   0  92   5   0   0|   0    84k| 253k  199k|   0     0 |2224  3523 
  1   1  97   1   0   0|   0   165k| 238k  234k|   0     0 |2304  3587 
  2   1  95   2   0   0|   0    80k| 252k  164k|   0     0 |2598  4131 
  2   1  97   1   0   0|   0   185k| 218k  178k|   0     0 |2155  3402 
  2   1  96   1   0   0|   0   157k| 241k  202k|   0     0 |2270  3467 
  3   1  96   0   0   0|   0   389k| 211k  185k|   0     0 |2719  4042 
  2   0  92   5   0   0|   0   165k| 234k  177k|   0     0 |2320  3524 
  2   1  96   0   0   1|   0   253k| 222k  195k|   0     0 |2712  4133 
  1   0  98   0   0   0|   0   181k| 180k  166k|   0     0 |2096  3326 
  2   0  98   0   0   0|   0   312k| 205k  137k|   0     0 |2327  3640 
  2   1  97   0   0   0|   0   235k| 212k  183k|   0     0 |2454  3716 
  2   1  96   1   0   0|   0   140k| 217k  200k|   0     0 |2802  4247 
  2   1  96   1   0   0|   0   204k| 216k  154k|   0     0 |2615  4125 
  2   0  97   1   0   0|   0   137k| 185k  161k|   0     0 |2163  3330 
  2   0  98   0   0   0|   0   191k| 211k  185k|   0     0 |2121  3365 

@methane
Copy link
Member

methane commented Jul 3, 2016

@h12w Thank you. Is this log on MySQL server? or client?

@h12w
Copy link
Author

h12w commented Jul 3, 2016

It is a log from the client side, and the MySQL server is on another machine.

@methane
Copy link
Member

methane commented Jul 4, 2016

How was the speed?
Is it still slower than mysql cli?

@h12w
Copy link
Author

h12w commented Jul 4, 2016

The performance is good enough so we have made the switch.
Sorry we did not do any further benchmark for comparison.

@methane
Copy link
Member

methane commented Jul 4, 2016

Relieved to hear that. Thank you for testing.

@methane
Copy link
Member

methane commented Jul 4, 2016

Can this issue be closed?

@h12w
Copy link
Author

h12w commented Jul 4, 2016

Let me just close it. I will report if any further performance issue found. Thank you very much!

@h12w h12w closed this as completed Jul 4, 2016
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

4 participants