-
-
Notifications
You must be signed in to change notification settings - Fork 18.5k
sql values returned incorrectly. Worked in 0.10.0 doesn't in 0.11.0 #3763
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Comments
to make sure that you don't have one row that is longer or shorter than the rest try the following (git master (v0.11.1 soon-to-be-released) assigns import numpy as np
np.all(np.array(map(len, rows)) == len(rows[0])) what is the output? |
if you get |
The output was true. I am returning a record set from SQL Server that is made up of two string fields. I have downgraded to 0.10.1 and the problem goes away. |
hmm ok i will take a look |
@JasonCrowe Okay, here's what i'm doing to try and reproduce this, being that you probably cannot give me the data. Let me know if the sql that creates your tables is different, e.g., did I use the correct types? import sqlite3 as sql
from pandas.util.testing import rands # random strings
nameq = 'SELECT cust_num, name from custaddr'
name_create = 'CREATE TABLE custaddr (cust_num text, name text)'
conn = sql.connect(':memory:')
cursor = conn.cursor()
cursor.execute(name_create)
conn.commit()
strlen = 10
numrecs = 1000
data = [(rands(strlen), rands(strlen)) for _ in xrange(numrecs)]
cursor.executemany('INSERT INTO custaddr VALUES (?,?)', data)
conn.commit()
result = conn.execute(nameq)
rows = result.fetchall()
cols = 'cust_num', 'name'
df = DataFrame(rows, columns=cols) This works as expected on |
@cpcloud of course may be sql server vs other flavors issue |
oh ok. i have never used sql (beyond examples) before...so i'm basically a n00b when it comes to pitfalls...should i keep going? |
That looks right as far as types. Here is a clip of the actual data: |
@cpcloud fyi....try |
Using the following code it works fine in 0.11.0 name_q = "SELECT cust_num, name from custaddr " |
@JasonCrowe is rows (when it comes out of fetchall) list-like? an actual list?
|
was just about to ask that... |
If I limit the query to the top 10 and print rows I get: [(u'C000001', u'WAL-MART CORPORATE'), (u'C000002', u'CANADIAN TIRE CORPORATE'), (u'C000003', u'WAL-MART NON ASN'), (u'C000004', u'CANADIAN TIRE'), (u'C000005', u'HOME DEPOT'), (u'C000006', u'MEIJER'), (u'C000007', u'SAMS'), (u'C000008', u'TARGET'), (u'C000009', u'WAL-MART'), (u'C000010', u'WAL-MART CANADA')] I will try DataFrame(list(rows),columns=COLUMNS) |
Result: ValueError: Shape of passed values is (1, 10), indices imply (2, 10) |
this is with exactly the same data u show above? |
This works, must be something else
|
yep that's what i get 2 |
Yes, they both use the same query: SELECT top 10 cust_num, name |
what is |
Yes, but the result is the same with or without the where sql. |
you should programmatically check the type of each tuple element of the list and then each element of that tuple (even though looks like they are all |
@JasonCrowe u can do what @jreback just showed correct? let's get a baseline here... |
DataFrame(l,columns=list('AB')) shows the same results as he posted. |
ok that's good. |
maybe wrap it in |
try this:
see where it breaks |
Returns False Will try while loop next. |
at least one element of the list, then, is not a |
yep....weird |
try |
while: [(u'C000001', u'WAL-MART CORPORATE'), (u'C000002', u'CANADIAN TIRE CORPORATE'), (u'C000003', u'WAL-MART NON ASN'), (u'C000004', u'CANADIAN TIRE'), (u'C000005', u'HOME DEPOT'), (u'C000006', u'MEIJER'), (u'C000007', u'SAMS'), (u'C000008', u'TARGET'), (u'C000009', u'WAL-MART'), (u'C000010', u'WAL-MART CANADA')] |
In [14]:type(rows[list(map(lambda x: isinstance(x, tuple), rows)).index(False)]) Out[14]:pyodbc.Row |
there you go |
i knew someone had put a sneaky |
So.... Is this a bug in pyodbc? |
read_sql handles this correctly @cpcloud maybe we should make a little function in utils somewhere to 'check' input and report where it is bad |
don't think so, but i might argue that they shouldn't |
@JasonCrowe @cpcloud maybe we could deal with coercing collections.Sequence though |
@jreback hm why didn't |
it's not recursively deep just a top level conversipn |
oh i c, it's |
no that's not it...nvm |
wait! it looks like it is: |
@cpcloud a bit deeper...I know where it is....see the refernces issue |
ok. cool. i'm having a helluva time trying to connect to a database to test this...is there no way to do this with pyodbc? |
closing this....enhancement issue in #3783 |
####### Code
name_q = "SELECT cust_num, name from custaddr"
result = CONNECTION.execute(name_q)
rows = result.fetchall()
COLUMNS = ['cust_num', 'name']
name_df = DataFrame(rows, columns=COLUMNS)
####### Code end
This code worked without fail on 0.10, but started failing as soon as I updated to 0.11.0. It gives me the following error.
ValueError Traceback (most recent call last)
in ()
5
6 COLUMNS = ['cust_num', 'name']
----> 7 name_df = DataFrame(rows, columns=COLUMNS)
8
9 #name_df[:5]
c:\Python27\lib\site-packages\pandas\core\frame.pyc in init(self, data, index, columns, dtype, copy)
426 else:
427 mgr = self._init_ndarray(data, index, columns, dtype=dtype,
--> 428 copy=copy)
429 else:
430 mgr = self._init_ndarray(data, index, columns, dtype=dtype,
c:\Python27\lib\site-packages\pandas\core\frame.pyc in _init_ndarray(self, values, index, columns, dtype, copy)
556 columns = _ensure_index(columns)
557
--> 558 return create_block_manager_from_blocks([ values.T ], [ columns, index ])
559
560 def _wrap_array(self, arr, axes, copy=False):
c:\Python27\lib\site-packages\pandas\core\internals.pyc in create_block_manager_from_blocks(blocks, axes)
1814 blocks = [ getattr(b,'values',b) for b in blocks ]
1815 tot_items = sum(b.shape[0] for b in blocks)
-> 1816 construction_error(tot_items,blocks[0].shape[1:],axes)
1817
1818 def create_block_manager_from_arrays(arrays, names, axes):
c:\Python27\lib\site-packages\pandas\core\internals.pyc in construction_error(tot_items, block_shape, axes)
1797 raise ValueError("Shape of passed values is %s, indices imply %s" % (
1798 tuple(map(int, [tot_items] + list(block_shape))),
-> 1799 tuple(map(int, [len(ax) for ax in axes]))))
1800
1801
ValueError: Shape of passed values is (1, 50178), indices imply (2, 50178)
If I remove the named columns with the following code:
####### Code
name_q = "SELECT cust_num, name from custaddr"
result = CONNECTION.execute(name_q)
rows = result.fetchall()
COLUMNS = ['cust_num', 'name']
name_df = DataFrame(rows)#, columns=COLUMNS)
name_df[:5]
####### Code end
I get:
0
0 [C001505, @@stokes GENERAL STORE]
1 [C002815, 3R DISTRIBUTING LLC]
2 [C000032, 4 C'S EXPERT SHARPENING CO]
3 [C000033, 422 SHOES]
4 [C000034, 5 L ENTERPRISES]
in 0.10 this would return two columns of data rather than one column of list.
The text was updated successfully, but these errors were encountered: