Skip to content

sql values returned incorrectly. Worked in 0.10.0 doesn't in 0.11.0 #3763

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
JasonCrowe opened this issue Jun 5, 2013 · 46 comments
Closed

Comments

@JasonCrowe
Copy link

####### Code

name_q = "SELECT cust_num, name from custaddr"
result = CONNECTION.execute(name_q)
rows = result.fetchall()

COLUMNS = ['cust_num', 'name']
name_df = DataFrame(rows, columns=COLUMNS)

####### Code end

This code worked without fail on 0.10, but started failing as soon as I updated to 0.11.0. It gives me the following error.


ValueError Traceback (most recent call last)
in ()
5
6 COLUMNS = ['cust_num', 'name']
----> 7 name_df = DataFrame(rows, columns=COLUMNS)
8
9 #name_df[:5]

c:\Python27\lib\site-packages\pandas\core\frame.pyc in init(self, data, index, columns, dtype, copy)
426 else:
427 mgr = self._init_ndarray(data, index, columns, dtype=dtype,
--> 428 copy=copy)
429 else:
430 mgr = self._init_ndarray(data, index, columns, dtype=dtype,

c:\Python27\lib\site-packages\pandas\core\frame.pyc in _init_ndarray(self, values, index, columns, dtype, copy)
556 columns = _ensure_index(columns)
557
--> 558 return create_block_manager_from_blocks([ values.T ], [ columns, index ])
559
560 def _wrap_array(self, arr, axes, copy=False):

c:\Python27\lib\site-packages\pandas\core\internals.pyc in create_block_manager_from_blocks(blocks, axes)
1814 blocks = [ getattr(b,'values',b) for b in blocks ]
1815 tot_items = sum(b.shape[0] for b in blocks)
-> 1816 construction_error(tot_items,blocks[0].shape[1:],axes)
1817
1818 def create_block_manager_from_arrays(arrays, names, axes):

c:\Python27\lib\site-packages\pandas\core\internals.pyc in construction_error(tot_items, block_shape, axes)
1797 raise ValueError("Shape of passed values is %s, indices imply %s" % (
1798 tuple(map(int, [tot_items] + list(block_shape))),
-> 1799 tuple(map(int, [len(ax) for ax in axes]))))
1800
1801

ValueError: Shape of passed values is (1, 50178), indices imply (2, 50178)

If I remove the named columns with the following code:

####### Code

name_q = "SELECT cust_num, name from custaddr"
result = CONNECTION.execute(name_q)
rows = result.fetchall()

COLUMNS = ['cust_num', 'name']

name_df = DataFrame(rows)#, columns=COLUMNS)

name_df[:5]

####### Code end

I get:

0
0 [C001505, @@stokes GENERAL STORE]
1 [C002815, 3R DISTRIBUTING LLC]
2 [C000032, 4 C'S EXPERT SHARPENING CO]
3 [C000033, 422 SHOES]
4 [C000034, 5 L ENTERPRISES]

in 0.10 this would return two columns of data rather than one column of list.

@cpcloud
Copy link
Member

cpcloud commented Jun 6, 2013

to make sure that you don't have one row that is longer or shorter than the rest try the following (git master (v0.11.1 soon-to-be-released) assigns None or NaN depending on the type when this happens, I can't remember if that was different in 0.11.0)

import numpy as np
np.all(np.array(map(len, rows)) == len(rows[0]))

what is the output?

@cpcloud
Copy link
Member

cpcloud commented Jun 6, 2013

if you get True, then this might be a bug that has been fixed: I can't replicate with what I think your data look like. I'm assuming a list of tuples of strings or a list of lists of strings. Is that correct?

@JasonCrowe
Copy link
Author

The output was true. I am returning a record set from SQL Server that is made up of two string fields.

I have downgraded to 0.10.1 and the problem goes away.

@cpcloud
Copy link
Member

cpcloud commented Jun 6, 2013

hmm ok i will take a look

@cpcloud
Copy link
Member

cpcloud commented Jun 6, 2013

@JasonCrowe Okay, here's what i'm doing to try and reproduce this, being that you probably cannot give me the data. Let me know if the sql that creates your tables is different, e.g., did I use the correct types?

import sqlite3 as sql
from pandas.util.testing import rands # random strings
nameq = 'SELECT cust_num, name from custaddr'
name_create = 'CREATE TABLE custaddr (cust_num text, name text)'
conn = sql.connect(':memory:')
cursor = conn.cursor()
cursor.execute(name_create)
conn.commit()
strlen = 10
numrecs = 1000
data = [(rands(strlen), rands(strlen)) for _ in xrange(numrecs)]
cursor.executemany('INSERT INTO custaddr VALUES (?,?)', data)
conn.commit()
result = conn.execute(nameq)
rows = result.fetchall()
cols = 'cust_num', 'name'
df = DataFrame(rows, columns=cols)

This works as expected on v0.10.1.

@jreback
Copy link
Contributor

jreback commented Jun 6, 2013

@cpcloud of course may be sql server vs other flavors issue

@cpcloud
Copy link
Member

cpcloud commented Jun 6, 2013

oh ok. i have never used sql (beyond examples) before...so i'm basically a n00b when it comes to pitfalls...should i keep going?

@JasonCrowe
Copy link
Author

That looks right as far as types.

Here is a clip of the actual data:
C001505, @@stokes GENERAL STORE
C002815, 3R DISTRIBUTING LLC
C000032, 4 C'S EXPERT SHARPENING CO
C000033, 422 SHOES
C000034, 5 L ENTERPRISES

@jreback
Copy link
Contributor

jreback commented Jun 6, 2013

@cpcloud fyi....try pd.read_sql(....) as well, this does some type conversions

@JasonCrowe
Copy link
Author

Using the following code it works fine in 0.11.0

name_q = "SELECT cust_num, name from custaddr "
name_df = sql.read_frame(name_q, CONN)

@jreback
Copy link
Contributor

jreback commented Jun 6, 2013

@JasonCrowe is rows (when it comes out of fetchall) list-like? an actual list?
what if you do

DataFrame(list(rows),columns=COLUMNS)?

@cpcloud
Copy link
Member

cpcloud commented Jun 6, 2013

was just about to ask that...

@JasonCrowe
Copy link
Author

If I limit the query to the top 10 and print rows I get:

[(u'C000001', u'WAL-MART CORPORATE'), (u'C000002', u'CANADIAN TIRE CORPORATE'), (u'C000003', u'WAL-MART NON ASN'), (u'C000004', u'CANADIAN TIRE'), (u'C000005', u'HOME DEPOT'), (u'C000006', u'MEIJER'), (u'C000007', u'SAMS'), (u'C000008', u'TARGET'), (u'C000009', u'WAL-MART'), (u'C000010', u'WAL-MART CANADA')]

I will try DataFrame(list(rows),columns=COLUMNS)

@JasonCrowe
Copy link
Author

Result:

ValueError: Shape of passed values is (1, 10), indices imply (2, 10)

@cpcloud
Copy link
Member

cpcloud commented Jun 6, 2013

this is with exactly the same data u show above?

@jreback
Copy link
Contributor

jreback commented Jun 6, 2013

This works, must be something else

In [2]: l
Out[2]: 
[(u'C000001', u'WAL-MART CORPORATE'),
 (u'C000002', u'CANADIAN TIRE CORPORATE'),
 (u'C000003', u'WAL-MART NON ASN'),
 (u'C000004', u'CANADIAN TIRE'),
 (u'C000005', u'HOME DEPOT'),
 (u'C000006', u'MEIJER'),
 (u'C000007', u'SAMS'),
 (u'C000008', u'TARGET'),
 (u'C000009', u'WAL-MART'),
 (u'C000010', u'WAL-MART CANADA')]

In [3]: DataFrame(l,columns=list('AB'))
Out[3]: 
         A                        B
0  C000001       WAL-MART CORPORATE
1  C000002  CANADIAN TIRE CORPORATE
2  C000003         WAL-MART NON ASN
3  C000004            CANADIAN TIRE
4  C000005               HOME DEPOT
5  C000006                   MEIJER
6  C000007                     SAMS
7  C000008                   TARGET
8  C000009                 WAL-MART
9  C000010          WAL-MART CANADA

@cpcloud
Copy link
Member

cpcloud commented Jun 6, 2013

yep that's what i get 2

@JasonCrowe
Copy link
Author

Yes, they both use the same query:

SELECT top 10 cust_num, name
from custaddr
WHERE name is not null
and cust_seq = 0
order by cust_num

@cpcloud
Copy link
Member

cpcloud commented Jun 6, 2013

what is cust_seq? column of ints?

@JasonCrowe
Copy link
Author

Yes, but the result is the same with or without the where sql.

@cpcloud
Copy link
Member

cpcloud commented Jun 6, 2013

you should programmatically check the type of each tuple element of the list and then each element of that tuple (even though looks like they are all unicode objects) to make sure that there's no weirdness in there. fyi using print can hide things sometimes, e.g., print [] and print str([]) look the same in the console but they are different objects...

@cpcloud
Copy link
Member

cpcloud commented Jun 6, 2013

@JasonCrowe u can do what @jreback just showed correct? let's get a baseline here...

@JasonCrowe
Copy link
Author

DataFrame(l,columns=list('AB')) shows the same results as he posted.

@cpcloud
Copy link
Member

cpcloud commented Jun 6, 2013

ok that's good. rows[:10] == l should return True...does it?

@cpcloud
Copy link
Member

cpcloud commented Jun 6, 2013

maybe wrap it in sorted as needed...

@jreback
Copy link
Contributor

jreback commented Jun 6, 2013

try this:

while(True):
   r = rows[0:100]
   try:
       DataFrame(r,columns=list('AB'))
   except:
       print r
    rows = rows[100:]

see where it breaks

@JasonCrowe
Copy link
Author

Returns False

Will try while loop next.

@cpcloud
Copy link
Member

cpcloud commented Jun 6, 2013

at least one element of the list, then, is not a tuple...

@jreback
Copy link
Contributor

jreback commented Jun 6, 2013

yep....weird

@cpcloud
Copy link
Member

cpcloud commented Jun 6, 2013

try type(rows[list(map(lambda x: isinstance(x, tuple), rows)).index(False)]) to get the type of the first non tuple in rows.

@JasonCrowe
Copy link
Author

while:

[(u'C000001', u'WAL-MART CORPORATE'), (u'C000002', u'CANADIAN TIRE CORPORATE'), (u'C000003', u'WAL-MART NON ASN'), (u'C000004', u'CANADIAN TIRE'), (u'C000005', u'HOME DEPOT'), (u'C000006', u'MEIJER'), (u'C000007', u'SAMS'), (u'C000008', u'TARGET'), (u'C000009', u'WAL-MART'), (u'C000010', u'WAL-MART CANADA')]
[]
[]
[]
[]
[]
[]

@JasonCrowe
Copy link
Author

In [14]:type(rows[list(map(lambda x: isinstance(x, tuple), rows)).index(False)])

Out[14]:pyodbc.Row

@cpcloud
Copy link
Member

cpcloud commented Jun 6, 2013

there you go

@cpcloud
Copy link
Member

cpcloud commented Jun 6, 2013

i knew someone had put a sneaky repr in somewhere...

@JasonCrowe
Copy link
Author

So.... Is this a bug in pyodbc?

@jreback
Copy link
Contributor

jreback commented Jun 6, 2013

read_sql handles this correctly

@cpcloud maybe we should make a little function in utils somewhere to 'check' input and report where it is bad
(not part of the construction code) but can be called manually

@cpcloud
Copy link
Member

cpcloud commented Jun 6, 2013

don't think so, but i might argue that they shouldn't repr to a Python builtin so u need to convert those objects to tuples or lists. a quick glance of the Row docs for pyodbc suggests that Row implements the collections.Sequence protocol (__getitem__, __setitem__, __len__, and __contains__, IIRC) which means you can probably do map(tuple, rows) until we do what @jreback suggests

@jreback
Copy link
Contributor

jreback commented Jun 6, 2013

@JasonCrowe
fundamentally the issue is that pandas doesn't coerce list like objects
eg you could actually store a pyodbc row object in a frame (of course the utility is limited in that case)

@cpcloud maybe we could deal with coercing collections.Sequence though

@cpcloud
Copy link
Member

cpcloud commented Jun 6, 2013

@jreback hm why didn't list(rows) work? that's essentially what _safe_fetch is doing...

@jreback
Copy link
Contributor

jreback commented Jun 6, 2013

it's not recursively deep just a top level conversipn

@cpcloud
Copy link
Member

cpcloud commented Jun 6, 2013

oh i c, it's from_records that makes it work

@cpcloud
Copy link
Member

cpcloud commented Jun 6, 2013

no that's not it...nvm

@cpcloud
Copy link
Member

cpcloud commented Jun 6, 2013

wait! it looks like it is: _list_to_arrays is doing it i believe...

@jreback
Copy link
Contributor

jreback commented Jun 6, 2013

@cpcloud a bit deeper...I know where it is....see the refernces issue

@cpcloud
Copy link
Member

cpcloud commented Jun 6, 2013

ok. cool. i'm having a helluva time trying to connect to a database to test this...is there no way to do this with pyodbc?

@jreback
Copy link
Contributor

jreback commented Jun 13, 2013

closing this....enhancement issue in #3783

@jreback jreback closed this as completed Jun 13, 2013
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants