Skip to content

TypeError: object of type 'NoneType' has no len() with MySQL empty columns #3523

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
randyzwitch opened this issue May 3, 2013 · 13 comments
Closed

Comments

@randyzwitch
Copy link

pandas '0.11.0'
mysql.connector '1.0.9'

Import statements and options

import mysql.connector
from mysql.connector import errorcode
import pandas.io.sql as sql
import pandas as pd
from pandas import DataFrame

Error

Running this code against MySQL (view includes several completely blank columns):

#Queries for CCI since Nov 2012
queries = sql.read_frame("select * from vw_queries_lj where query_account_id='34' and from_unixtime(query_created, '%Y-%m-%d') >= '2012-11-01' limit 100;", con)

Causes this error:

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-17-78e88f1ac26a> in <module>()
      1 #Queries for CCI since Nov 2012
----> 2 queries = sql.read_frame("select * from vw_queries_lj where query_account_id='34' and from_unixtime(query_created, '%Y-%m-%d') >= '2012-11-01' limit 100;", con)
      3 

/Users/randyzwitch/anaconda/lib/python2.7/site-packages/pandas/io/sql.pyc in read_frame(sql, con, index_col, coerce_float, params)
    160 
    161     result = DataFrame.from_records(rows, columns=columns,
--> 162                                     coerce_float=coerce_float)
    163 
    164     if index_col is not None:

/Users/randyzwitch/anaconda/lib/python2.7/site-packages/pandas/core/frame.pyc in from_records(cls, data, index, exclude, columns, coerce_float, nrows)
   1070         else:
   1071             arrays, arr_columns = _to_arrays(data, columns,
-> 1072                                              coerce_float=coerce_float)
   1073 
   1074             arr_columns = _ensure_index(arr_columns)

/Users/randyzwitch/anaconda/lib/python2.7/site-packages/pandas/core/frame.pyc in _to_arrays(data, columns, coerce_float, dtype)
   5449         return arrays, columns
   5450 
-> 5451     if len(data) == 0:
   5452         return [], []  # columns if columns is not None else []
   5453     if isinstance(data[0], (list, tuple)):

TypeError: object of type 'NoneType' has no len()

No error

Running query on primary key (i.e. no blanks) returns a proper data frrame

#Queries for CCI since Nov 2012
queries = sql.read_frame("select token from vw_queries_lj where query_account_id='34' and from_unixtime(query_created, '%Y-%m-%d') >= '2012-11-01' limit 100;", con)

Expectation

My expected behavior would be that pandas would be able to handle blank columns as character or float, not as NoneType

@jreback
Copy link
Contributor

jreback commented May 10, 2013

this looks the same as #3562, closed by #3564, @y-p ?

@ghost
Copy link

ghost commented May 10, 2013

Not sure. and there's no easily reproducible example , I'd hold off until @randyzwitch reports the
issue solved in master.

@jreback
Copy link
Contributor

jreback commented May 30, 2013

@randyzwitch can you tests this off of master, see if its resolved?

@randyzwitch
Copy link
Author

I'll take a look

@randyzwitch
Copy link
Author

Unfortunately, it's still showing an error for this query/pandas code:

queries = sql.read_frame("select * from vw_queries_lj where query_account_id='34' and from_unixtime(query_created, '%Y-%m-%d') >= '2012-11-01' limit 100;", con)

As a sanity check, I ran the same query within a MySQL client, as well as R, and both return a proper answer. So it's valid MySQL syntax returning an answer.

However, if you remove part of the where clause, all of the sudden there's no pandas error

queries = sql.read_frame("select * from vw_queries_lj where from_unixtime(query_created, '%Y-%m-%d') >= '2012-11-01' limit 100;", con)

At this point, I don't know if this is too specific of a bug to try and chase down, since I can't guarantee it's not a data specific error or a MySQL driver problem.

@jreback
Copy link
Contributor

jreback commented May 30, 2013

@randyzwitch if you want to be adventurous

what need to see is what is passed to DataFrame.from_records, all of the arguments

run your script under the debugger, e.g.

python -m pdb script.py

when you get to the error, go u till you can see the objects that you want and post them here

@randyzwitch
Copy link
Author

I'll give that a shot when I get back to work on Monday

@randyzwitch
Copy link
Author

@jreback Not sure if this is what you're looking for, I've never worked with the debugger before:

new-host-3:Desktop randyzwitch$ python -m pdb pandas-error.py
> /Users/randyzwitch/Desktop/pandas-error.py(2)<module>()
-> import mysql.connector
(Pdb) continue
Traceback (most recent call last):
  File "/Users/randyzwitch/anaconda/lib/python2.7/pdb.py", line 1314, in main
    pdb._runscript(mainpyfile)
  File "/Users/randyzwitch/anaconda/lib/python2.7/pdb.py", line 1233, in _runscript
    self.run(statement)
  File "/Users/randyzwitch/anaconda/lib/python2.7/bdb.py", line 400, in run
    exec cmd in globals, locals
  File "<string>", line 1, in <module>
  File "pandas-error.py", line 2, in <module>
    import mysql.connector
  File "/Users/randyzwitch/anaconda/lib/python2.7/site-packages/pandas-1.11.0.dev_-py2.7-macosx-10.5-x86_64.egg/pandas/io/sql.py", line 162, in read_frame
    coerce_float=coerce_float)
  File "/Users/randyzwitch/anaconda/lib/python2.7/site-packages/pandas-1.11.0.dev_-py2.7-macosx-10.5-x86_64.egg/pandas/core/frame.py", line 1114, in from_records
    coerce_float=coerce_float)
  File "/Users/randyzwitch/anaconda/lib/python2.7/site-packages/pandas-1.11.0.dev_-py2.7-macosx-10.5-x86_64.egg/pandas/core/frame.py", line 5764, in _to_arrays
    if not len(data):
TypeError: object of type 'NoneType' has no len()
Uncaught exception. Entering post mortem debugging
Running 'cont' or 'step' will restart the program
> /Users/randyzwitch/anaconda/lib/python2.7/site-packages/pandas-1.11.0.dev_-py2.7-macosx-10.5-x86_64.egg/pandas/core/frame.py(5764)_to_arrays()
-> if not len(data):
(Pdb) u
> /Users/randyzwitch/anaconda/lib/python2.7/site-packages/pandas-1.11.0.dev_-py2.7-macosx-10.5-x86_64.egg/pandas/core/frame.py(1114)from_records()
-> coerce_float=coerce_float)
(Pdb) u
> /Users/randyzwitch/anaconda/lib/python2.7/site-packages/pandas-1.11.0.dev_-py2.7-macosx-10.5-x86_64.egg/pandas/io/sql.py(162)read_frame()
-> coerce_float=coerce_float)
(Pdb) u
> /Users/randyzwitch/Desktop/pandas-error.py(2)<module>()
-> import mysql.connector
(Pdb) u
> <string>(1)<module>()
(Pdb) u
> /Users/randyzwitch/anaconda/lib/python2.7/bdb.py(400)run()
-> exec cmd in globals, locals
(Pdb) u
> /Users/randyzwitch/anaconda/lib/python2.7/pdb.py(1233)_runscript()
-> self.run(statement)
(Pdb) u
> /Users/randyzwitch/anaconda/lib/python2.7/pdb.py(1314)main()
-> pdb._runscript(mainpyfile)
(Pdb) u
*** Oldest frame
(Pdb) u

@jreback
Copy link
Contributor

jreback commented Jun 3, 2013

go u till you get to read_from
then

print rows
print columns

@randyzwitch
Copy link
Author

In the error message, it's claiming that there are zero results returned; in my MySQL client, 100 rows (due to the LIMIT statement) are returned.

Note that in the MySQL returned answer, there are approximately 20 columns that are NULL for this query, listed below:

data_integrity_address2
java_version
quicktime_version
data_integrity_blocked
phone2
submit -- ebureau_etarget-result

Deleting all of the null columns and re-running the query still returns None for print rows.

> /Users/randyzwitch/anaconda/lib/python2.7/site-packages/pandas-1.11.0.dev_-py2.7-macosx-10.5-x86_64.egg/pandas/core/frame.py(1114)from_records()
-> coerce_float=coerce_float)
(Pdb) print rows
*** NameError: name 'rows' is not defined
(Pdb) print columns
Index([u'provider', u'audit', u'token', u'type', u'result', u'query_created', u'lead_created', u'authentic', u'reg_rule', u'age', u'age_rule', u'data_integrity', u'data_integrity_rule', u'data_integrity_email', u'data_integrity_f_name', u'data_integrity_l_name', u'data_integrity_address1', u'data_integrity_address2', u'data_integrity_city', u'data_integrity_state', u'data_integrity_zip', u'data_integrity_prog', u'url_value', u'url_rule', u'entity_value', u'entity_rule', u'risk', u'risk_rule', u'call_center', u'call_center_rule', u'device_five_minutes', u'device_hour', u'device_twelve_hours', u'device_day', u'device_week', u'device_five_minutes_rule', u'device_hour_rule', u'device_twelve_hours_rule', u'device_day_rule', u'device_week_rule', u'ip_five_minutes', u'ip_hour', u'ip_twelve_hours', u'ip_day', u'ip_week', u'ip_five_minutes_rule', u'ip_hour_rule', u'ip_twelve_hours_rule', u'ip_day_rule', u'ip_week_rule', u'lead_five_minutes', u'lead_hour', u'lead_twelve_hours', u'lead_day', u'lead_week', u'lead_five_minutes_rule', u'lead_hour_rule', u'lead_twelve_hours_rule', u'lead_day_rule', u'lead_week_rule', u'consumer_five_minutes', u'consumer_hour', u'consumer_twelve_hours', u'consumer_day', u'consumer_week', u'consumer_five_minutes_rule', u'consumer_hour_rule', u'consumer_twelve_hours_rule', u'consumer_day_rule', u'consumer_week_rule', u'total_entities', u'total_entities_rule', u'total_hops', u'total_hops_rule', u'lead_dupe_check', u'lead_dupe_check_rule', u'consumer_dupe_check', u'consumer_dupe_check_rule', u'query_account_id', u'campaign_key', u'campaign', u'campaign_call_center', u'lead_account_code', u'lead_account', u'ip', u'risk_rating', u'fuzzy_id', u'os_id', u'browser_id', u'x_forwarded_for', u'user_agent', u'resolution', u'time_zone_offset', u'http_accept_headers', u'fonts_installed', u'java_version', u'quicktime_version', u'flash_version', u'screen_dpi', u'language', u'isp', u'city', u'region', u'browser', u'last_ts', u'lead_age_seconds', u'lead_age_minutes', u'lead_duration_seconds', u'lead_duration_minutes', u'field_changes', u'unique_id', u'threatmetrix', u'risk_blocked', u'data_integrity_blocked', u'data_integrity_passed', u'data_integrity_failed', u'data_integrity_default', u'phone1', u'phone2', u'file_date', u'submit', u'absent_pages', u'absent_start', u'phone1_areacode', u'phone1_exchange', u'phone1_last4', u'phone2_areacode', u'phone2_exchange', u'phone2_last4', u'email_user', u'email_domain', u'cpaFraudScore', u'num_accidents', u'occupation', u'primary_use', u'residence_status', u'state_licensed', u'time_residence', u'age_1st_licensed', u'credit_rating', u'current_exp', u'dob', u'gender', u'highest_ed', u'license_num', u'military_affiliation', u'ebureau_escore-qualityscore_value', u'ebureau_everify-phone', u'ebureau_everify-address', u'ebureau_everify-email', u'ebureau_everify-result', u'ebureau_escore-result', u'ebureau_etarget-result'], dtype=object)
(Pdb) u
> /Users/randyzwitch/anaconda/lib/python2.7/site-packages/pandas-1.11.0.dev_-py2.7-macosx-10.5-x86_64.egg/pandas/io/sql.py(162)read_frame()
-> coerce_float=coerce_float)
(Pdb) print rows
None
(Pdb) print columns
['provider', 'audit', 'token', 'type', 'result', 'query_created', 'lead_created', 'authentic', 'reg_rule', 'age', 'age_rule', 'data_integrity', 'data_integrity_rule', 'data_integrity_email', 'data_integrity_f_name', 'data_integrity_l_name', 'data_integrity_address1', 'data_integrity_address2', 'data_integrity_city', 'data_integrity_state', 'data_integrity_zip', 'data_integrity_prog', 'url_value', 'url_rule', 'entity_value', 'entity_rule', 'risk', 'risk_rule', 'call_center', 'call_center_rule', 'device_five_minutes', 'device_hour', 'device_twelve_hours', 'device_day', 'device_week', 'device_five_minutes_rule', 'device_hour_rule', 'device_twelve_hours_rule', 'device_day_rule', 'device_week_rule', 'ip_five_minutes', 'ip_hour', 'ip_twelve_hours', 'ip_day', 'ip_week', 'ip_five_minutes_rule', 'ip_hour_rule', 'ip_twelve_hours_rule', 'ip_day_rule', 'ip_week_rule', 'lead_five_minutes', 'lead_hour', 'lead_twelve_hours', 'lead_day', 'lead_week', 'lead_five_minutes_rule', 'lead_hour_rule', 'lead_twelve_hours_rule', 'lead_day_rule', 'lead_week_rule', 'consumer_five_minutes', 'consumer_hour', 'consumer_twelve_hours', 'consumer_day', 'consumer_week', 'consumer_five_minutes_rule', 'consumer_hour_rule', 'consumer_twelve_hours_rule', 'consumer_day_rule', 'consumer_week_rule', 'total_entities', 'total_entities_rule', 'total_hops', 'total_hops_rule', 'lead_dupe_check', 'lead_dupe_check_rule', 'consumer_dupe_check', 'consumer_dupe_check_rule', 'query_account_id', 'campaign_key', 'campaign', 'campaign_call_center', 'lead_account_code', 'lead_account', 'ip', 'risk_rating', 'fuzzy_id', 'os_id', 'browser_id', 'x_forwarded_for', 'user_agent', 'resolution', 'time_zone_offset', 'http_accept_headers', 'fonts_installed', 'java_version', 'quicktime_version', 'flash_version', 'screen_dpi', 'language', 'isp', 'city', 'region', 'browser', 'last_ts', 'lead_age_seconds', 'lead_age_minutes', 'lead_duration_seconds', 'lead_duration_minutes', 'field_changes', 'unique_id', 'threatmetrix', 'risk_blocked', 'data_integrity_blocked', 'data_integrity_passed', 'data_integrity_failed', 'data_integrity_default', 'phone1', 'phone2', 'file_date', 'submit', 'absent_pages', 'absent_start', 'phone1_areacode', 'phone1_exchange', 'phone1_last4', 'phone2_areacode', 'phone2_exchange', 'phone2_last4', 'email_user', 'email_domain', 'cpaFraudScore', 'num_accidents', 'occupation', 'primary_use', 'residence_status', 'state_licensed', 'time_residence', 'age_1st_licensed', 'credit_rating', 'current_exp', 'dob', 'gender', 'highest_ed', 'license_num', 'military_affiliation', 'ebureau_escore-qualityscore_value', 'ebureau_everify-phone', 'ebureau_everify-address', 'ebureau_everify-email', 'ebureau_everify-result', 'ebureau_escore-result', 'ebureau_etarget-result']
(Pdb) u
> /Users/randyzwitch/Desktop/pandas-error.py(2)<module>()
-> import mysql.connector

@jreback
Copy link
Contributor

jreback commented Jun 3, 2013

I don't think this is a pandas issue; you query is not returning anything, so you need to test using just the lower level connections and cursors to see if what is being return from the raw query

its possible (just a guess), that your setup is something to do with this

@randyzwitch
Copy link
Author

It's definitely a weird issue...I don't think it's my setup, as different values of query_account_id in the WHERE clause returns an answer with pandas, and the query causing the error in pandas returns an answer using R and MySQL specific tools.

That said, I don't want to waste your time anymore, since it seems to be one of those tiny, non-reproducible errors. Hopefully it just fixes itself over time.

Thanks for your help.

@jreback
Copy link
Contributor

jreback commented Jun 3, 2013

np

What I mean is that your R/MySQL specific tools are not going thru python so they don't really test via python

its problably somethign to do with the MySQL connector in python itself; I don't know enough about this to help you....

@jreback jreback closed this as completed Jun 4, 2013
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants