Skip to content

LOAD DATA LOCAL Broken? #29

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
srstrickland opened this issue May 4, 2016 · 5 comments
Closed

LOAD DATA LOCAL Broken? #29

srstrickland opened this issue May 4, 2016 · 5 comments

Comments

@srstrickland
Copy link

I'm having issues getting LOAD DATA LOCAL INFILE... commands to run, getting the following error:

  File "<snip>/tornado_mysql/connections.py", line 1188, in _get_descriptions
    for i in range_type(self.field_count):
TypeError: 'NoneType' object cannot be interpreted as an integer

I checked and there is a test file for this functionality (https://github.com/PyMySQL/Tornado-MySQL/blob/master/tornado_mysql/tests/test_load_local.py), but the test doesn't run. Looks like they are old and I had to sprinkle yield statements to even get them to work w/ the tornado async interfaces. I moved those tests into a file that was being picked up (didn't investigate too much why the tests weren't running from that file), and indeed got the same errors as when I ran the queries manually.

Going against MySQL Server 5.5 (specifically Server version: 5.5.47-0ubuntu0.14.04.1).

Happy to provide more details and/or help fix. But this is a major roadblock for me and would appreciate any feedback / collaboration.

Full stacktrace:

2016-05-03 22:08:24,317 (unknown.audience) managepy.manage[40128]: CRITICAL App failed with status
Traceback (most recent call last):
  File "/Users/scott/.virtualenvs/audience/lib/python3.4/site-packages/managepy/manage.py", line 230, in run
    return_code = yield result
  File "/Users/scott/.virtualenvs/audience/lib/python3.4/site-packages/tornado/gen.py", line 807, in run
    value = future.result()
  File "/Users/scott/.virtualenvs/audience/lib/python3.4/site-packages/tornado/concurrent.py", line 209, in result
    raise_exc_info(self._exc_info)
  File "<string>", line 3, in raise_exc_info
  File "/Users/scott/.virtualenvs/audience/lib/python3.4/site-packages/tornado/gen.py", line 810, in run
    yielded = self.gen.throw(*sys.exc_info())
  File "src/audience/shared/commands/test.py", line 51, in run
    yield cxn.execute(query)
  File "/Users/scott/.virtualenvs/audience/lib/python3.4/site-packages/tornado/gen.py", line 807, in run
    value = future.result()
  File "/Users/scott/.virtualenvs/audience/lib/python3.4/site-packages/tornado/concurrent.py", line 209, in result
    raise_exc_info(self._exc_info)
  File "<string>", line 3, in raise_exc_info
  File "/Users/scott/.virtualenvs/audience/lib/python3.4/site-packages/tornado/gen.py", line 810, in run
    yielded = self.gen.throw(*sys.exc_info())
  File "/Users/scott/dev/tbcode/svc-audience/src/audience/shared/db_async.py", line 126, in execute
    yield cursor.execute(query)
  File "/Users/scott/.virtualenvs/audience/lib/python3.4/site-packages/tornado/gen.py", line 807, in run
    value = future.result()
  File "/Users/scott/.virtualenvs/audience/lib/python3.4/site-packages/tornado/concurrent.py", line 209, in result
    raise_exc_info(self._exc_info)
  File "<string>", line 3, in raise_exc_info
  File "/Users/scott/.virtualenvs/audience/lib/python3.4/site-packages/tornado/gen.py", line 810, in run
    yielded = self.gen.throw(*sys.exc_info())
  File "/Users/scott/.virtualenvs/audience/lib/python3.4/site-packages/tornado_mysql/cursors.py", line 132, in execute
    yield self._query(query)
  File "/Users/scott/.virtualenvs/audience/lib/python3.4/site-packages/tornado/gen.py", line 807, in run
    value = future.result()
  File "/Users/scott/.virtualenvs/audience/lib/python3.4/site-packages/tornado/concurrent.py", line 209, in result
    raise_exc_info(self._exc_info)
  File "<string>", line 3, in raise_exc_info
  File "/Users/scott/.virtualenvs/audience/lib/python3.4/site-packages/tornado/gen.py", line 810, in run
    yielded = self.gen.throw(*sys.exc_info())
  File "/Users/scott/.virtualenvs/audience/lib/python3.4/site-packages/tornado_mysql/cursors.py", line 389, in _query
    yield conn.query(q, unbuffered=True)
  File "/Users/scott/.virtualenvs/audience/lib/python3.4/site-packages/tornado/gen.py", line 807, in run
    value = future.result()
  File "/Users/scott/.virtualenvs/audience/lib/python3.4/site-packages/tornado/concurrent.py", line 209, in result
    raise_exc_info(self._exc_info)
  File "<string>", line 3, in raise_exc_info
  File "/Users/scott/.virtualenvs/audience/lib/python3.4/site-packages/tornado/gen.py", line 810, in run
    yielded = self.gen.throw(*sys.exc_info())
  File "/Users/scott/.virtualenvs/audience/lib/python3.4/site-packages/tornado_mysql/connections.py", line 731, in query
    yield self._read_query_result(unbuffered=unbuffered)
  File "/Users/scott/.virtualenvs/audience/lib/python3.4/site-packages/tornado/gen.py", line 807, in run
    value = future.result()
  File "/Users/scott/.virtualenvs/audience/lib/python3.4/site-packages/tornado/concurrent.py", line 209, in result
    raise_exc_info(self._exc_info)
  File "<string>", line 3, in raise_exc_info
  File "/Users/scott/.virtualenvs/audience/lib/python3.4/site-packages/tornado/gen.py", line 810, in run
    yielded = self.gen.throw(*sys.exc_info())
  File "/Users/scott/.virtualenvs/audience/lib/python3.4/site-packages/tornado_mysql/connections.py", line 858, in _read_query_result
    yield result.init_unbuffered_query()
  File "/Users/scott/.virtualenvs/audience/lib/python3.4/site-packages/tornado/gen.py", line 807, in run
    value = future.result()
  File "/Users/scott/.virtualenvs/audience/lib/python3.4/site-packages/tornado/concurrent.py", line 209, in result
    raise_exc_info(self._exc_info)
  File "<string>", line 3, in raise_exc_info
  File "/Users/scott/.virtualenvs/audience/lib/python3.4/site-packages/tornado/gen.py", line 810, in run
    yielded = self.gen.throw(*sys.exc_info())
  File "/Users/scott/.virtualenvs/audience/lib/python3.4/site-packages/tornado_mysql/connections.py", line 1084, in init_unbuffered_query
    yield self._get_descriptions()
  File "/Users/scott/.virtualenvs/audience/lib/python3.4/site-packages/tornado/gen.py", line 807, in run
    value = future.result()
  File "/Users/scott/.virtualenvs/audience/lib/python3.4/site-packages/tornado/concurrent.py", line 209, in result
    raise_exc_info(self._exc_info)
  File "<string>", line 3, in raise_exc_info
  File "/Users/scott/.virtualenvs/audience/lib/python3.4/site-packages/tornado/gen.py", line 212, in wrapper
    yielded = next(result)
  File "/Users/scott/.virtualenvs/audience/lib/python3.4/site-packages/tornado_mysql/connections.py", line 1188, in _get_descriptions
    for i in range_type(self.field_count):
TypeError: 'NoneType' object cannot be interpreted as an integer
@srstrickland
Copy link
Author

sample program:

#!/usr/bin/env python
from __future__ import print_function
import os

from tornado import ioloop, gen
import tornado_mysql

@gen.coroutine
def execute(cxn, query):
    cur = cxn.cursor()
    try:
        print('RUNNING QUERY: {}'.format(query))
        yield cur.execute(query)
        print('DESC: {}'.format(cur.description))
        for row in cur.fetchall():
            print(' ROW: {}'.format(row))
    finally:
        cur.close()

@gen.coroutine
def execute2(cxn, query):
    cur = cxn.cursor(tornado_mysql.cursors.SSCursor)
    try:
        print('RUNNING QUERY: {}'.format(query))
        yield cur.execute(query)
        print('DESC: {}'.format(cur.description))
        for row in (yield cur.fetchall()):
            print(' ROW: {}'.format(row))
    finally:
        cur.close()

@gen.coroutine
def main():
    cxn = yield tornado_mysql.connect(host='192.168.200.10', port=3306, user='root', passwd='root',
                                      db='scott', local_infile=True)
    table='tmp_infile'
    file='./file.dat'
    tornado_mysql.connections.DEBUG = True
    try:
        yield execute(cxn, 'CREATE TABLE IF NOT EXISTS {table} (id INT NOT NULL, value NUMERIC)'.format(table=table))
        yield execute(cxn, 'TRUNCATE TABLE {table}'.format(table=table))
        with open(file, 'w') as f:
            f.write('1,1.1\n')
            f.write('2,2.2\n')
            f.write('3,3.3\n')
            f.write('4,4.4\n')
            f.write('5,5.5\n')
        query = "LOAD DATA LOCAL INFILE '{file}' INTO TABLE {table} FIELDS TERMINATED BY ',' (id, value)".format(
            file=file,
            table=table
        )
        yield execute2(cxn, query)  # CHANGE TO execute(...) FOR BUFFERED
    finally:
        cxn.close()


ioloop.IOLoop.current().run_sync(main)

In the above form, I get errors of the form:

  File "/Users/scott/.virtualenvs/audience/lib/python3.4/site-packages/tornado_mysql/connections.py", line 1187, in _get_descriptions
    for i in range_type(self.field_count):
TypeError: 'NoneType' object cannot be interpreted as an integer

When I change execute2 to execute (which uses the default cursor / buffers all the results), I get:

  File "/Users/scott/.virtualenvs/audience/lib/python3.4/site-packages/tornado_mysql/connections.py", line 1222, in send_data
    if not self.connection.socket:
AttributeError: 'Connection' object has no attribute 'socket'

srstrickland added a commit to tubular/Tornado-MySQL that referenced this issue May 4, 2016
* Fixes for both "buffered" & "unbuffered" cursor types
* Registered `test_load_local` to run with tests
* Refactored `test_load_local` tests to work with the tornado framework

More information here:
PyMySQL#29
@methane
Copy link
Member

methane commented May 5, 2016

BTW, Why do you use Tornado-MySQL?
Why don't you use threadpool and PyMySQL or mysqlclient-python?

In general, threadpool is better than async for MySQL. Especially, file
I/O.

I want to remove LOAD LOCAL IN FILE support instead of fix.
This library is for special purpose, not for general purpose.
And I want to minimize maintenance cost of this project.
2016/05/05 午前3:02 "Scott Strickland" [email protected]:

sample program:

#!/usr/bin/env python
from future import print_function
import os

from tornado import ioloop, gen
import tornado_mysql

@gen.coroutine
def execute(cxn, query):
cur = cxn.cursor()
try:
print('RUNNING QUERY: {}'.format(query))
yield cur.execute(query)
print('DESC: {}'.format(cur.description))
for row in cur.fetchall():
print(' ROW: {}'.format(row))
finally:
cur.close()

@gen.coroutine
def execute2(cxn, query):
cur = cxn.cursor(tornado_mysql.cursors.SSCursor)
try:
print('RUNNING QUERY: {}'.format(query))
yield cur.execute(query)
print('DESC: {}'.format(cur.description))
for row in (yield cur.fetchall()):
print(' ROW: {}'.format(row))
finally:
cur.close()

@gen.coroutine
def main():
cxn = yield tornado_mysql.connect(host='192.168.200.10', port=3306, user='root', passwd='root',
db='scott', local_infile=True)
table='tmp_infile'
file='./file.dat'
tornado_mysql.connections.DEBUG = True
try:
yield execute(cxn, 'CREATE TABLE IF NOT EXISTS {table} (id INT NOT NULL, value NUMERIC)'.format(table=table))
yield execute(cxn, 'TRUNCATE TABLE {table}'.format(table=table))
with open(file, 'w') as f:
f.write('1,1.1\n')
f.write('2,2.2\n')
f.write('3,3.3\n')
f.write('4,4.4\n')
f.write('5,5.5\n')
query = "LOAD DATA LOCAL INFILE '{file}' INTO TABLE {table} FIELDS TERMINATED BY ',' (id, value)".format(
file=file,
table=table
)
yield execute2(cxn, query) # CHANGE TO execute(...) FOR BUFFERED
finally:
cxn.close()

ioloop.IOLoop.current().run_sync(main)

In the above form, I get errors of the form:

File "/Users/scott/.virtualenvs/audience/lib/python3.4/site-packages/tornado_mysql/connections.py", line 1187, in _get_descriptions
for i in range_type(self.field_count):
TypeError: 'NoneType' object cannot be interpreted as an integer

When I change execute2 to execute (which uses the default cursor /
buffers all the results), I get:

File "/Users/scott/.virtualenvs/audience/lib/python3.4/site-packages/tornado_mysql/connections.py", line 1222, in send_data
if not self.connection.socket:
AttributeError: 'Connection' object has no attribute 'socket'


You are receiving this because you are subscribed to this thread.
Reply to this email directly or view it on GitHub
#29 (comment)

@srstrickland
Copy link
Author

  1. To answer your question, I have a project w/ several async applications that use Tornado's IOLoop. Use of LOAD LOCAL INFILE... commands is not widespread, but it is paramount to the project and I don't want to use multiple MySQL client libraries in the same project and mix their uses.
  2. I think you'll agree that there are circumstances that warrant an async solution rather than using threadpools. Specifically for LOAD LOCAL INFILE commands, maybe not, but see my first point that I need one library that can handle everything.
  3. I've done all the work; see Fixed LOAD DATA LOCAL INFILE commands #30. As part of the PR, I even fixed, enabled, and verified the tests that were originally intended for the vanilla PyMySQL.
  4. What you have in master is broken. It suggests to even a casual reader that LOAD LOCAL INFILE is supported (there is even a flag to enable/disable support for it), but what you get are exceptions. This is worse than not supporting it. Whether you choose to remove it at a future time is up to you (but I don't see why you would, if it were functional)
  5. Can you describe exactly why & how this library is "special purpose" (as it should be outlined in the project's readme)? If designed for a special use case, that should be clearly documented... along with why certain aspects of basic MySQL functionality (i.e. LOAD LOCAL INFILE) are not supported (other than "it's too hard"). Why are you opposed to accepting work from the community to make it better / more "general purpose"?

I apologize if any of this comes off as confrontational... that was not my intention and I am willing to collaborate with you now & in the future. I promise to have the best intentions 😄

@methane
Copy link
Member

methane commented May 5, 2016

I think you'll agree that there are circumstances that warrant an async solution rather than using threadpools.

Why I start this project is just my personal interest.
Async solution is better than thread in case of handling massive idle connections.
When writing websocket chat, async is better than thread.
But massive idle connections is not good for MySQL.
And async solution is not suitable to file I/O too. For example, libuv uses threadpool for file I/O.

I need one library that can handle everything.

If so, I recommend thread.

I've done all the work; see #30. As part of the PR, I even fixed, enabled, and verified the tests that were originally intended for the vanilla PyMySQL.

Good job about it. But I want to minimize what I maintain.

Can you describe exactly why & how this library is "special purpose" (as it should be outlined in the project's readme)?

Because

  • Ineffective than other solutions.
  • We use this only for very limited use cases (Simple CRUD).
  • There are many important bugs fixed in upstream PyMySQL which I haven't port.

@methane
Copy link
Member

methane commented May 5, 2016

Plus

  • Keeping API backward compatibility is hard. I will be required to switch normal function to coroutine for fix bugs, then all users should add yield from where calling the API.

@methane methane closed this as completed May 6, 2016
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants