
Fixed LOAD DATA LOCAL INFILE commands #30


Closed
srstrickland wants to merge 2 commits

Conversation

@srstrickland

  • Fixes for both "buffered" & "unbuffered" cursor types
  • Registered test_load_local to run with tests
  • Refactored test_load_local tests to work with the tornado framework

More information here:
#29

@methane (Member) commented May 5, 2016

In LoadLocal.send_data:

                     chunk = open_file.read(chunk_size)

This line will block your entire ioloop.
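A minimal sketch of one way to avoid the blocking read methane points out: run the `read()` call on a worker thread so the event loop stays free. This uses the stdlib asyncio analogue rather than Tornado's API, and the names (`send_file`, `CHUNK_SIZE`) are illustrative, not from the PR.

```python
import asyncio
from concurrent.futures import ThreadPoolExecutor

CHUNK_SIZE = 16 * 1024  # hypothetical chunk size for illustration


def read_chunk(open_file, chunk_size):
    # Plain blocking read; safe only because it runs on a worker thread.
    return open_file.read(chunk_size)


async def send_file(path, chunk_size=CHUNK_SIZE):
    """Read a file in chunks without blocking the event loop."""
    loop = asyncio.get_running_loop()
    received = []
    with open(path, "rb") as f, ThreadPoolExecutor(max_workers=1) as pool:
        while True:
            # run_in_executor keeps the loop responsive while the read blocks.
            chunk = await loop.run_in_executor(pool, read_chunk, f, chunk_size)
            if not chunk:
                break
            received.append(chunk)  # a real driver would write this to the socket
    return b"".join(received)
```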

@methane (Member) commented May 5, 2016

In the same function:

                     self.connection._write_bytes(packet)

This only appends the packet to the write buffer.
You should wait here.
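The point about waiting on the write can be sketched with stdlib asyncio streams, where `write()` only buffers and `drain()` is the explicit wait (the loopback echo server here exists only to make the example self-contained; none of these names come from the PR):

```python
import asyncio


async def _echo(reader, writer):
    # Toy server: echo one read back, then close.
    data = await reader.read(1024)
    writer.write(data)
    await writer.drain()
    writer.close()


async def send_packet(packet):
    server = await asyncio.start_server(_echo, "127.0.0.1", 0)
    port = server.sockets[0].getsockname()[1]
    reader, writer = await asyncio.open_connection("127.0.0.1", port)

    writer.write(packet)   # like _write_bytes: only appends to the write buffer
    await writer.drain()   # the missing step: actually wait for the flush

    echoed = await reader.read()  # read until the server closes the connection
    writer.close()
    server.close()
    await server.wait_closed()
    return echoed
```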

@methane (Member) commented May 5, 2016

As you can see, writing proper async code is a hard job.
Wrong code works for simple cases, including most unit tests.
This is why I can't recommend this project easily, and why I want to minimize this library.

@srstrickland force-pushed the fix/load_data_local branch from af71cc7 to 4dfd091 on May 6, 2016 00:33
@srstrickland (Author)

Fair enough. My initial goal was just to get it functional, not necessarily to vet all the "async" portions of it. But I've taken your suggestions and revised. You can obviously take it or leave it. Thanks.

@methane (Member) commented May 6, 2016

FYI, the current chunk_size is too large: 2**24 may be larger than the OS readahead size.
I think 16KB is optimal.
While waiting for data to be sent to the socket, the OS fills the readahead buffer in the background. If the next read is smaller than the filled readahead buffer, it can be non-blocking.
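The suggested chunking could look roughly like the following generator; the 16 KiB constant follows methane's readahead reasoning, and the helper name is illustrative:

```python
CHUNK_SIZE = 16 * 1024  # 16 KiB, small enough to stay within typical OS readahead


def iter_chunks(open_file, chunk_size=CHUNK_SIZE):
    # Yield fixed-size chunks from a binary file object;
    # the final chunk may be shorter than chunk_size.
    while True:
        chunk = open_file.read(chunk_size)
        if not chunk:
            return
        yield chunk
```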

@methane closed this May 6, 2016
@srstrickland (Author)

True... but in reality the stream is going to be doing non-blocking reads against the file descriptor (PipeIOStream sets O_NONBLOCK)... and no OS is going to give back that much data in a single non-blocking read, which governs the chunk size that actually gets handled. That said, it probably would be better to control that more tightly.

@methane (Member) commented May 6, 2016

I can't tell which stream you mean.
Your code reads from a file (not an IOStream) and writes it to a stream (a TCP or Unix socket stream).
Where did you use PipeIOStream?

@methane (Member) commented May 6, 2016

See also: go-sql-driver/mysql#424

@srstrickland (Author)

After you made the comment about the blocking file read, I changed to use tornado.iostream.PipeIOStream:

https://github.com/Tubular/Tornado-MySQL/blob/fix/load_data_local/tornado_mysql/connections.py#L1253-1254

@methane (Member) commented May 6, 2016

If you want to use PipeIOStream, you should use a pipe. Maybe "/usr/bin/cat /path/to/file" (this does not work on Windows, of course).

File I/O can't be non-blocking. That's why libuv uses a thread pool for file I/O.
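The pipe approach methane hints at can be sketched with the stdlib: spawn `cat` so the data arrives over a pipe file descriptor, which (unlike a regular file) genuinely supports non-blocking I/O. POSIX-only, as he notes, and the function name is illustrative:

```python
import asyncio


async def read_via_pipe(path):
    # `cat` writes the file into a pipe; reading the pipe end is real
    # non-blocking I/O, unlike read() on a regular file descriptor.
    proc = await asyncio.create_subprocess_exec(
        "cat", path, stdout=asyncio.subprocess.PIPE)
    data = await proc.stdout.read()  # reads until the pipe reaches EOF
    await proc.wait()
    return data
```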

@srstrickland (Author)

Yes, my mistake. The truth is that for my application it doesn't matter much, since the portions that use LOAD DATA INFILE are not time-critical and are not simultaneously serving, say, user requests. I had initially written a version that used a ThreadPoolExecutor behind the scenes to do the file reads, but for Python 2.7 that would've required the futures pip package (native in py3), so I punted on that for now. I see that you removed infile support, so I'm assuming you won't want to incorporate my changes even if I do get non-blocking reads in. Ping me if you ever change your mind.

@methane (Member) commented May 6, 2016

FYI, aio-libs/aiomysql is actively maintained, and Tornado can be used with asyncio now.
I recommend using Python 3.5 and aiomysql, not Python 2.7 and this library.

@srstrickland (Author)

That's the eventual plan, actually... but for now we have to support py2.7 and py3.4.
