Skip to content

read_sql: TypeError: Argument 'rows' has incorrect type #11522

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
PetitLepton opened this issue Nov 5, 2015 · 5 comments
Closed

read_sql: TypeError: Argument 'rows' has incorrect type #11522

PetitLepton opened this issue Nov 5, 2015 · 5 comments
Labels
IO SQL to_sql, read_sql, read_sql_query
Milestone

Comments

@PetitLepton
Copy link
Contributor

Hi,
I encountered a problem when using read_sql with a remote mySQL database.

After loading the modules,

import pymysql
import pandas as pd
import sqlalchemy

I create both a pymysql connector and a sqlalchemy engine with pymysql for the same database.

connector = pymysql.connect(...)
engine = sqlalchemy.create_engine('mysql+pymysql://...')

Everything is fine with the engine

df = pd.DataFrame()
query = " SELECT id, created_at FROM themes"
for chunk in pd.read_sql(query, engine, chunksize=100):
    df = df.append(chunk)

but fails with the connector

df = pd.DataFrame()
query = " SELECT id, created_at FROM themes"
for chunk in pd.read_sql(query, connector, chunksize=100):
    df = df.append(chunk)

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-7-9a0ee2188a78> in <module>()
      1 df = pd.DataFrame()
      2 query = " SELECT id, created_at FROM themes"
----> 3 for chunk in pd.read_sql(query, connector_mySQL, chunksize=100):
      4     df = df.append(chunk)

/usr/local/lib/python3.5/site-packages/pandas/io/sql.py in _query_iterator(cursor, chunksize, columns, index_col, coerce_float, parse_dates)
   1563                 yield _wrap_result(data, columns, index_col=index_col,
   1564                                    coerce_float=coerce_float,
-> 1565                                    parse_dates=parse_dates)
   1566 
   1567     def read_query(self, sql, index_col=None, coerce_float=True, params=None,

/usr/local/lib/python3.5/site-packages/pandas/io/sql.py in _wrap_result(data, columns, index_col, coerce_float, parse_dates)
    135 
    136     frame = DataFrame.from_records(data, columns=columns,
--> 137                                    coerce_float=coerce_float)
    138 
    139     _parse_date_columns(frame, parse_dates)

/usr/local/lib/python3.5/site-packages/pandas/core/frame.py in from_records(cls, data, index, exclude, columns, coerce_float, nrows)
    928         else:
    929             arrays, arr_columns = _to_arrays(data, columns,
--> 930                                              coerce_float=coerce_float)
    931 
    932             arr_columns = _ensure_index(arr_columns)

/usr/local/lib/python3.5/site-packages/pandas/core/frame.py in _to_arrays(data, columns, coerce_float, dtype)
   5211     if isinstance(data[0], (list, tuple)):
   5212         return _list_to_arrays(data, columns, coerce_float=coerce_float,
-> 5213                                dtype=dtype)
   5214     elif isinstance(data[0], collections.Mapping):
   5215         return _list_of_dict_to_arrays(data, columns,

/usr/local/lib/python3.5/site-packages/pandas/core/frame.py in _list_to_arrays(data, columns, coerce_float, dtype)
   5289 def _list_to_arrays(data, columns, coerce_float=False, dtype=None):
   5290     if len(data) > 0 and isinstance(data[0], tuple):
-> 5291         content = list(lib.to_object_array_tuples(data).T)
   5292     else:
   5293         # list of lists

TypeError: Argument 'rows' has incorrect type (expected list, got tuple)

Here are the versions of the modules

print(pymysql.__version__)
print(sqlalchemy.__version__)
print(pd.__version__)

0.6.7.None
1.0.9
0.17.0

Best regards,
Flavien.

@jorisvandenbossche
Copy link
Member

@PetitLepton Thanks for the report.

Can you show the output of?

cur = connector.cursor()
cur.execute("SELECT id, created_at FROM themes").fetchall()

@jorisvandenbossche jorisvandenbossche added the IO SQL to_sql, read_sql, read_sql_query label Nov 5, 2015
@PetitLepton
Copy link
Contributor Author

@jorisvandenbossche Hi, this one works fine, I was using it so far

cur = connector.cursor()
cur.execute("SELECT id, created_at FROM themes")
cur.fetchall()
((1, datetime.datetime(2012, 12, 2, 4, 6, 17)),
 (2, datetime.datetime(2012, 12, 2, 4, 20, 56)),
 (4, datetime.datetime(2012, 12, 2, 4, 57, 39)),
 (5, datetime.datetime(2012, 12, 2, 4, 59, 31)),
 (6, datetime.datetime(2012, 12, 2, 5, 3, 7)),
 ...

@jorisvandenbossche
Copy link
Member

Thanks! As I suspected, the problem is that it returns a tuple of tuples instead of a list of tuples, and DataFrame.from_records fails on this:

In [1]: pd.DataFrame.from_records([(1,2), (3,4)])
Out[1]:
   0  1
0  1  2
1  3  4

In [2]: pd.DataFrame.from_records(((1,2), (3,4)))
...
c:\users\vdbosscj\scipy\pandas-joris\pandas\core\frame.pyc in _list_to_arrays(da
ta, columns, coerce_float, dtype)
   5300 def _list_to_arrays(data, columns, coerce_float=False, dtype=None):
   5301     if len(data) > 0 and isinstance(data[0], tuple):
-> 5302         content = list(lib.to_object_array_tuples(data).T)
   5303     else:
   5304         # list of lists

TypeError: Argument 'rows' has incorrect type (expected list, got tuple)

This has always been the case in pandas, so I am wondering if this is a change in pymysql

@jreback This is easily solved in the sql code by ensuring it is a list before passing to DataFrame.from_records, or do you think this is something from_records should handle itself?

@jorisvandenbossche jorisvandenbossche added this to the 0.17.1 milestone Nov 6, 2015
@jreback
Copy link
Contributor

jreback commented Nov 7, 2015

@jorisvandenbossche no, list-of-tuples is the specified type, tuple-of-tuple is not allowed as I think it can signify nested types that would require more parsing (its not allowed in the DataFrame constructor either).

I suppose these might be able to be relaxed but would be a separate issue

In [3]: DataFrame([('a','b')])
Out[3]: 
   0  1
0  a  b

In [4]: DataFrame((('a','b')))
PandasError: DataFrame constructor not properly called!

In [5]: DataFrame(np.array([('a','b')]))
Out[5]: 
   0  1
0  a  b

@jorisvandenbossche
Copy link
Member

OK, no problem. It is easy to ensure in the SQL code that it is a list of tuples and not a tuple of tuples

@jreback jreback modified the milestones: Next Major Release, 0.17.1 Nov 15, 2015
jorisvandenbossche added a commit that referenced this issue Dec 19, 2015
BUG: force list type for tuples from chunked sql table reads #11522
@jorisvandenbossche jorisvandenbossche modified the milestones: 0.18.0, Next Major Release Dec 19, 2015
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
IO SQL to_sql, read_sql, read_sql_query
Projects
None yet
Development

No branches or pull requests

3 participants