sql values returned incorrectly. Worked in 0.10.0 doesn't in 0.11.0 #3763

JasonCrowe · 2013-06-05T18:31:32Z

####### Code

name_q = "SELECT cust_num, name from custaddr"
result = CONNECTION.execute(name_q)
rows = result.fetchall()

COLUMNS = ['cust_num', 'name']
name_df = DataFrame(rows, columns=COLUMNS)

####### Code end

This code worked without fail on 0.10, but started failing as soon as I updated to 0.11.0. It gives me the following error.

ValueError Traceback (most recent call last)
in ()
5
6 COLUMNS = ['cust_num', 'name']
----> 7 name_df = DataFrame(rows, columns=COLUMNS)
8
9 #name_df[:5]

c:\Python27\lib\site-packages\pandas\core\frame.pyc in init(self, data, index, columns, dtype, copy)
426 else:
427 mgr = self._init_ndarray(data, index, columns, dtype=dtype,
--> 428 copy=copy)
429 else:
430 mgr = self._init_ndarray(data, index, columns, dtype=dtype,

c:\Python27\lib\site-packages\pandas\core\frame.pyc in _init_ndarray(self, values, index, columns, dtype, copy)
556 columns = _ensure_index(columns)
557
--> 558 return create_block_manager_from_blocks([ values.T ], [ columns, index ])
559
560 def _wrap_array(self, arr, axes, copy=False):

c:\Python27\lib\site-packages\pandas\core\internals.pyc in create_block_manager_from_blocks(blocks, axes)
1814 blocks = [ getattr(b,'values',b) for b in blocks ]
1815 tot_items = sum(b.shape[0] for b in blocks)
-> 1816 construction_error(tot_items,blocks[0].shape[1:],axes)
1817
1818 def create_block_manager_from_arrays(arrays, names, axes):

c:\Python27\lib\site-packages\pandas\core\internals.pyc in construction_error(tot_items, block_shape, axes)
1797 raise ValueError("Shape of passed values is %s, indices imply %s" % (
1798 tuple(map(int, [tot_items] + list(block_shape))),
-> 1799 tuple(map(int, [len(ax) for ax in axes]))))
1800
1801

ValueError: Shape of passed values is (1, 50178), indices imply (2, 50178)

If I remove the named columns with the following code:

####### Code

name_q = "SELECT cust_num, name from custaddr"
result = CONNECTION.execute(name_q)
rows = result.fetchall()

COLUMNS = ['cust_num', 'name']

name_df = DataFrame(rows)#, columns=COLUMNS)

name_df[:5]

####### Code end

I get:

0
0 [C001505, @@stokes GENERAL STORE]
1 [C002815, 3R DISTRIBUTING LLC]
2 [C000032, 4 C'S EXPERT SHARPENING CO]
3 [C000033, 422 SHOES]
4 [C000034, 5 L ENTERPRISES]

in 0.10 this would return two columns of data rather than one column of list.

cpcloud · 2013-06-06T03:43:19Z

to make sure that you don't have one row that is longer or shorter than the rest try the following (git master (v0.11.1 soon-to-be-released) assigns None or NaN depending on the type when this happens, I can't remember if that was different in 0.11.0)

import numpy as np
np.all(np.array(map(len, rows)) == len(rows[0]))

what is the output?

cpcloud · 2013-06-06T03:47:46Z

if you get True, then this might be a bug that has been fixed: I can't replicate with what I think your data look like. I'm assuming a list of tuples of strings or a list of lists of strings. Is that correct?

JasonCrowe · 2013-06-06T12:13:44Z

The output was true. I am returning a record set from SQL Server that is made up of two string fields.

I have downgraded to 0.10.1 and the problem goes away.

cpcloud · 2013-06-06T19:08:50Z

hmm ok i will take a look

cpcloud · 2013-06-06T19:33:44Z

@JasonCrowe Okay, here's what i'm doing to try and reproduce this, being that you probably cannot give me the data. Let me know if the sql that creates your tables is different, e.g., did I use the correct types?

import sqlite3 as sql
from pandas.util.testing import rands # random strings
nameq = 'SELECT cust_num, name from custaddr'
name_create = 'CREATE TABLE custaddr (cust_num text, name text)'
conn = sql.connect(':memory:')
cursor = conn.cursor()
cursor.execute(name_create)
conn.commit()
strlen = 10
numrecs = 1000
data = [(rands(strlen), rands(strlen)) for _ in xrange(numrecs)]
cursor.executemany('INSERT INTO custaddr VALUES (?,?)', data)
conn.commit()
result = conn.execute(nameq)
rows = result.fetchall()
cols = 'cust_num', 'name'
df = DataFrame(rows, columns=cols)

This works as expected on v0.10.1.

jreback · 2013-06-06T19:35:36Z

@cpcloud of course may be sql server vs other flavors issue

cpcloud · 2013-06-06T19:36:44Z

oh ok. i have never used sql (beyond examples) before...so i'm basically a n00b when it comes to pitfalls...should i keep going?

JasonCrowe · 2013-06-06T19:37:02Z

That looks right as far as types.

Here is a clip of the actual data:
C001505, @@stokes GENERAL STORE
C002815, 3R DISTRIBUTING LLC
C000032, 4 C'S EXPERT SHARPENING CO
C000033, 422 SHOES
C000034, 5 L ENTERPRISES

jreback · 2013-06-06T19:38:10Z

@cpcloud fyi....try pd.read_sql(....) as well, this does some type conversions

JasonCrowe · 2013-06-06T19:38:41Z

Using the following code it works fine in 0.11.0

name_q = "SELECT cust_num, name from custaddr "
name_df = sql.read_frame(name_q, CONN)

jreback · 2013-06-06T19:40:35Z

@JasonCrowe is rows (when it comes out of fetchall) list-like? an actual list?
what if you do

DataFrame(list(rows),columns=COLUMNS)?

cpcloud · 2013-06-06T19:40:56Z

was just about to ask that...

JasonCrowe · 2013-06-06T19:44:41Z

If I limit the query to the top 10 and print rows I get:

[(u'C000001', u'WAL-MART CORPORATE'), (u'C000002', u'CANADIAN TIRE CORPORATE'), (u'C000003', u'WAL-MART NON ASN'), (u'C000004', u'CANADIAN TIRE'), (u'C000005', u'HOME DEPOT'), (u'C000006', u'MEIJER'), (u'C000007', u'SAMS'), (u'C000008', u'TARGET'), (u'C000009', u'WAL-MART'), (u'C000010', u'WAL-MART CANADA')]

I will try DataFrame(list(rows),columns=COLUMNS)

JasonCrowe · 2013-06-06T19:46:56Z

Result:

ValueError: Shape of passed values is (1, 10), indices imply (2, 10)

cpcloud · 2013-06-06T19:48:38Z

this is with exactly the same data u show above?

jreback · 2013-06-06T19:49:46Z

This works, must be something else

In [2]: l
Out[2]: 
[(u'C000001', u'WAL-MART CORPORATE'),
 (u'C000002', u'CANADIAN TIRE CORPORATE'),
 (u'C000003', u'WAL-MART NON ASN'),
 (u'C000004', u'CANADIAN TIRE'),
 (u'C000005', u'HOME DEPOT'),
 (u'C000006', u'MEIJER'),
 (u'C000007', u'SAMS'),
 (u'C000008', u'TARGET'),
 (u'C000009', u'WAL-MART'),
 (u'C000010', u'WAL-MART CANADA')]

In [3]: DataFrame(l,columns=list('AB'))
Out[3]: 
         A                        B
0  C000001       WAL-MART CORPORATE
1  C000002  CANADIAN TIRE CORPORATE
2  C000003         WAL-MART NON ASN
3  C000004            CANADIAN TIRE
4  C000005               HOME DEPOT
5  C000006                   MEIJER
6  C000007                     SAMS
7  C000008                   TARGET
8  C000009                 WAL-MART
9  C000010          WAL-MART CANADA

cpcloud · 2013-06-06T19:50:15Z

yep that's what i get 2

JasonCrowe · 2013-06-06T19:50:28Z

Yes, they both use the same query:

SELECT top 10 cust_num, name
from custaddr
WHERE name is not null
and cust_seq = 0
order by cust_num

cpcloud · 2013-06-06T19:53:21Z

what is cust_seq? column of ints?

JasonCrowe · 2013-06-06T19:55:02Z

Yes, but the result is the same with or without the where sql.

cpcloud · 2013-06-06T19:57:50Z

you should programmatically check the type of each tuple element of the list and then each element of that tuple (even though looks like they are all unicode objects) to make sure that there's no weirdness in there. fyi using print can hide things sometimes, e.g., print [] and print str([]) look the same in the console but they are different objects...

cpcloud · 2013-06-06T20:00:10Z

@JasonCrowe u can do what @jreback just showed correct? let's get a baseline here...

JasonCrowe · 2013-06-06T20:03:33Z

DataFrame(l,columns=list('AB')) shows the same results as he posted.

cpcloud · 2013-06-06T20:05:24Z

ok that's good. rows[:10] == l should return True...does it?

cpcloud · 2013-06-06T20:06:03Z

maybe wrap it in sorted as needed...

jreback · 2013-06-06T20:09:51Z

try this:

while(True):
   r = rows[0:100]
   try:
       DataFrame(r,columns=list('AB'))
   except:
       print r
    rows = rows[100:]

see where it breaks

JasonCrowe · 2013-06-06T20:11:45Z

Returns False

Will try while loop next.

cpcloud · 2013-06-06T20:12:53Z

at least one element of the list, then, is not a tuple...

jreback · 2013-06-06T20:16:08Z

yep....weird

cpcloud · 2013-06-06T20:29:24Z

try type(rows[list(map(lambda x: isinstance(x, tuple), rows)).index(False)]) to get the type of the first non tuple in rows.

JasonCrowe · 2013-06-06T20:32:20Z

while:

[(u'C000001', u'WAL-MART CORPORATE'), (u'C000002', u'CANADIAN TIRE CORPORATE'), (u'C000003', u'WAL-MART NON ASN'), (u'C000004', u'CANADIAN TIRE'), (u'C000005', u'HOME DEPOT'), (u'C000006', u'MEIJER'), (u'C000007', u'SAMS'), (u'C000008', u'TARGET'), (u'C000009', u'WAL-MART'), (u'C000010', u'WAL-MART CANADA')]
[]
[]
[]
[]
[]
[]

JasonCrowe · 2013-06-06T20:32:55Z

In [14]:type(rows[list(map(lambda x: isinstance(x, tuple), rows)).index(False)])

Out[14]:pyodbc.Row

cpcloud · 2013-06-06T20:33:00Z

there you go

cpcloud · 2013-06-06T20:33:31Z

i knew someone had put a sneaky repr in somewhere...

JasonCrowe · 2013-06-06T20:35:34Z

So.... Is this a bug in pyodbc?

jreback · 2013-06-06T20:35:50Z

read_sql handles this correctly

@cpcloud maybe we should make a little function in utils somewhere to 'check' input and report where it is bad
(not part of the construction code) but can be called manually

cpcloud · 2013-06-06T20:36:53Z

don't think so, but i might argue that they shouldn't repr to a Python builtin so u need to convert those objects to tuples or lists. a quick glance of the Row docs for pyodbc suggests that Row implements the collections.Sequence protocol (__getitem__, __setitem__, __len__, and __contains__, IIRC) which means you can probably do map(tuple, rows) until we do what @jreback suggests

jreback · 2013-06-06T20:42:01Z

@JasonCrowe
fundamentally the issue is that pandas doesn't coerce list like objects
eg you could actually store a pyodbc row object in a frame (of course the utility is limited in that case)

@cpcloud maybe we could deal with coercing collections.Sequence though

cpcloud · 2013-06-06T20:45:30Z

@jreback hm why didn't list(rows) work? that's essentially what _safe_fetch is doing...

jreback · 2013-06-06T20:46:56Z

it's not recursively deep just a top level conversipn

cpcloud · 2013-06-06T20:47:25Z

oh i c, it's from_records that makes it work

cpcloud · 2013-06-06T20:48:52Z

no that's not it...nvm

cpcloud · 2013-06-06T20:51:14Z

wait! it looks like it is: _list_to_arrays is doing it i believe...

jreback · 2013-06-06T21:02:46Z

@cpcloud a bit deeper...I know where it is....see the refernces issue

cpcloud · 2013-06-06T21:05:30Z

ok. cool. i'm having a helluva time trying to connect to a database to test this...is there no way to do this with pyodbc?

jreback · 2013-06-13T18:15:26Z

closing this....enhancement issue in #3783

cpcloud mentioned this issue Jun 6, 2013

DOC: document the pitfalls of different byte orders #3780

Merged

jreback mentioned this issue Jun 6, 2013

ENH: allow list of collections.Sequence to be coerced in DataFrame constructor #3783

Closed

jreback closed this as completed Jun 13, 2013

sql values returned incorrectly. Worked in 0.10.0 doesn't in 0.11.0 #3763

sql values returned incorrectly. Worked in 0.10.0 doesn't in 0.11.0 #3763

Comments

JasonCrowe commented Jun 5, 2013

####### Code

####### Code end

####### Code

COLUMNS = ['cust_num', 'name']

####### Code end

cpcloud commented Jun 6, 2013

cpcloud commented Jun 6, 2013

JasonCrowe commented Jun 6, 2013

cpcloud commented Jun 6, 2013

cpcloud commented Jun 6, 2013

jreback commented Jun 6, 2013

cpcloud commented Jun 6, 2013

JasonCrowe commented Jun 6, 2013

jreback commented Jun 6, 2013

JasonCrowe commented Jun 6, 2013

jreback commented Jun 6, 2013

cpcloud commented Jun 6, 2013

JasonCrowe commented Jun 6, 2013

JasonCrowe commented Jun 6, 2013

cpcloud commented Jun 6, 2013

jreback commented Jun 6, 2013

cpcloud commented Jun 6, 2013

JasonCrowe commented Jun 6, 2013

cpcloud commented Jun 6, 2013

JasonCrowe commented Jun 6, 2013

cpcloud commented Jun 6, 2013

cpcloud commented Jun 6, 2013

JasonCrowe commented Jun 6, 2013

cpcloud commented Jun 6, 2013

cpcloud commented Jun 6, 2013

jreback commented Jun 6, 2013

JasonCrowe commented Jun 6, 2013

cpcloud commented Jun 6, 2013

jreback commented Jun 6, 2013

cpcloud commented Jun 6, 2013

JasonCrowe commented Jun 6, 2013

JasonCrowe commented Jun 6, 2013

cpcloud commented Jun 6, 2013

cpcloud commented Jun 6, 2013

JasonCrowe commented Jun 6, 2013

jreback commented Jun 6, 2013

cpcloud commented Jun 6, 2013

jreback commented Jun 6, 2013

cpcloud commented Jun 6, 2013

jreback commented Jun 6, 2013

cpcloud commented Jun 6, 2013

cpcloud commented Jun 6, 2013

cpcloud commented Jun 6, 2013

jreback commented Jun 6, 2013

cpcloud commented Jun 6, 2013

jreback commented Jun 13, 2013