-
-
Notifications
You must be signed in to change notification settings - Fork 18.5k
TypeError: object of type 'NoneType' has no len() with MySQL empty columns #3523
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Comments
Not sure. and there's no easily reproducible example , I'd hold off until @randyzwitch reports the |
@randyzwitch can you tests this off of master, see if its resolved? |
I'll take a look |
Unfortunately, it's still showing an error for this query/pandas code: queries = sql.read_frame("select * from vw_queries_lj where query_account_id='34' and from_unixtime(query_created, '%Y-%m-%d') >= '2012-11-01' limit 100;", con) As a sanity check, I ran the same query within a MySQL client, as well as R, and both return a proper answer. So it's valid MySQL syntax returning an answer. However, if you remove part of the where clause, all of the sudden there's no pandas error queries = sql.read_frame("select * from vw_queries_lj where from_unixtime(query_created, '%Y-%m-%d') >= '2012-11-01' limit 100;", con) At this point, I don't know if this is too specific of a bug to try and chase down, since I can't guarantee it's not a data specific error or a MySQL driver problem. |
@randyzwitch if you want to be adventurous what need to see is what is passed to run your script under the debugger, e.g.
when you get to the error, go |
I'll give that a shot when I get back to work on Monday |
@jreback Not sure if this is what you're looking for, I've never worked with the debugger before: new-host-3:Desktop randyzwitch$ python -m pdb pandas-error.py
> /Users/randyzwitch/Desktop/pandas-error.py(2)<module>()
-> import mysql.connector
(Pdb) continue
Traceback (most recent call last):
File "/Users/randyzwitch/anaconda/lib/python2.7/pdb.py", line 1314, in main
pdb._runscript(mainpyfile)
File "/Users/randyzwitch/anaconda/lib/python2.7/pdb.py", line 1233, in _runscript
self.run(statement)
File "/Users/randyzwitch/anaconda/lib/python2.7/bdb.py", line 400, in run
exec cmd in globals, locals
File "<string>", line 1, in <module>
File "pandas-error.py", line 2, in <module>
import mysql.connector
File "/Users/randyzwitch/anaconda/lib/python2.7/site-packages/pandas-1.11.0.dev_-py2.7-macosx-10.5-x86_64.egg/pandas/io/sql.py", line 162, in read_frame
coerce_float=coerce_float)
File "/Users/randyzwitch/anaconda/lib/python2.7/site-packages/pandas-1.11.0.dev_-py2.7-macosx-10.5-x86_64.egg/pandas/core/frame.py", line 1114, in from_records
coerce_float=coerce_float)
File "/Users/randyzwitch/anaconda/lib/python2.7/site-packages/pandas-1.11.0.dev_-py2.7-macosx-10.5-x86_64.egg/pandas/core/frame.py", line 5764, in _to_arrays
if not len(data):
TypeError: object of type 'NoneType' has no len()
Uncaught exception. Entering post mortem debugging
Running 'cont' or 'step' will restart the program
> /Users/randyzwitch/anaconda/lib/python2.7/site-packages/pandas-1.11.0.dev_-py2.7-macosx-10.5-x86_64.egg/pandas/core/frame.py(5764)_to_arrays()
-> if not len(data):
(Pdb) u
> /Users/randyzwitch/anaconda/lib/python2.7/site-packages/pandas-1.11.0.dev_-py2.7-macosx-10.5-x86_64.egg/pandas/core/frame.py(1114)from_records()
-> coerce_float=coerce_float)
(Pdb) u
> /Users/randyzwitch/anaconda/lib/python2.7/site-packages/pandas-1.11.0.dev_-py2.7-macosx-10.5-x86_64.egg/pandas/io/sql.py(162)read_frame()
-> coerce_float=coerce_float)
(Pdb) u
> /Users/randyzwitch/Desktop/pandas-error.py(2)<module>()
-> import mysql.connector
(Pdb) u
> <string>(1)<module>()
(Pdb) u
> /Users/randyzwitch/anaconda/lib/python2.7/bdb.py(400)run()
-> exec cmd in globals, locals
(Pdb) u
> /Users/randyzwitch/anaconda/lib/python2.7/pdb.py(1233)_runscript()
-> self.run(statement)
(Pdb) u
> /Users/randyzwitch/anaconda/lib/python2.7/pdb.py(1314)main()
-> pdb._runscript(mainpyfile)
(Pdb) u
*** Oldest frame
(Pdb) u |
go
|
In the error message, it's claiming that there are zero results returned; in my MySQL client, 100 rows (due to the LIMIT statement) are returned. Note that in the MySQL returned answer, there are approximately 20 columns that are NULL for this query, listed below: data_integrity_address2 Deleting all of the null columns and re-running the query still returns None for print rows. > /Users/randyzwitch/anaconda/lib/python2.7/site-packages/pandas-1.11.0.dev_-py2.7-macosx-10.5-x86_64.egg/pandas/core/frame.py(1114)from_records()
-> coerce_float=coerce_float)
(Pdb) print rows
*** NameError: name 'rows' is not defined
(Pdb) print columns
Index([u'provider', u'audit', u'token', u'type', u'result', u'query_created', u'lead_created', u'authentic', u'reg_rule', u'age', u'age_rule', u'data_integrity', u'data_integrity_rule', u'data_integrity_email', u'data_integrity_f_name', u'data_integrity_l_name', u'data_integrity_address1', u'data_integrity_address2', u'data_integrity_city', u'data_integrity_state', u'data_integrity_zip', u'data_integrity_prog', u'url_value', u'url_rule', u'entity_value', u'entity_rule', u'risk', u'risk_rule', u'call_center', u'call_center_rule', u'device_five_minutes', u'device_hour', u'device_twelve_hours', u'device_day', u'device_week', u'device_five_minutes_rule', u'device_hour_rule', u'device_twelve_hours_rule', u'device_day_rule', u'device_week_rule', u'ip_five_minutes', u'ip_hour', u'ip_twelve_hours', u'ip_day', u'ip_week', u'ip_five_minutes_rule', u'ip_hour_rule', u'ip_twelve_hours_rule', u'ip_day_rule', u'ip_week_rule', u'lead_five_minutes', u'lead_hour', u'lead_twelve_hours', u'lead_day', u'lead_week', u'lead_five_minutes_rule', u'lead_hour_rule', u'lead_twelve_hours_rule', u'lead_day_rule', u'lead_week_rule', u'consumer_five_minutes', u'consumer_hour', u'consumer_twelve_hours', u'consumer_day', u'consumer_week', u'consumer_five_minutes_rule', u'consumer_hour_rule', u'consumer_twelve_hours_rule', u'consumer_day_rule', u'consumer_week_rule', u'total_entities', u'total_entities_rule', u'total_hops', u'total_hops_rule', u'lead_dupe_check', u'lead_dupe_check_rule', u'consumer_dupe_check', u'consumer_dupe_check_rule', u'query_account_id', u'campaign_key', u'campaign', u'campaign_call_center', u'lead_account_code', u'lead_account', u'ip', u'risk_rating', u'fuzzy_id', u'os_id', u'browser_id', u'x_forwarded_for', u'user_agent', u'resolution', u'time_zone_offset', u'http_accept_headers', u'fonts_installed', u'java_version', u'quicktime_version', u'flash_version', u'screen_dpi', u'language', u'isp', u'city', u'region', u'browser', u'last_ts', u'lead_age_seconds', u'lead_age_minutes', u'lead_duration_seconds', u'lead_duration_minutes', u'field_changes', u'unique_id', u'threatmetrix', u'risk_blocked', u'data_integrity_blocked', u'data_integrity_passed', u'data_integrity_failed', u'data_integrity_default', u'phone1', u'phone2', u'file_date', u'submit', u'absent_pages', u'absent_start', u'phone1_areacode', u'phone1_exchange', u'phone1_last4', u'phone2_areacode', u'phone2_exchange', u'phone2_last4', u'email_user', u'email_domain', u'cpaFraudScore', u'num_accidents', u'occupation', u'primary_use', u'residence_status', u'state_licensed', u'time_residence', u'age_1st_licensed', u'credit_rating', u'current_exp', u'dob', u'gender', u'highest_ed', u'license_num', u'military_affiliation', u'ebureau_escore-qualityscore_value', u'ebureau_everify-phone', u'ebureau_everify-address', u'ebureau_everify-email', u'ebureau_everify-result', u'ebureau_escore-result', u'ebureau_etarget-result'], dtype=object)
(Pdb) u
> /Users/randyzwitch/anaconda/lib/python2.7/site-packages/pandas-1.11.0.dev_-py2.7-macosx-10.5-x86_64.egg/pandas/io/sql.py(162)read_frame()
-> coerce_float=coerce_float)
(Pdb) print rows
None
(Pdb) print columns
['provider', 'audit', 'token', 'type', 'result', 'query_created', 'lead_created', 'authentic', 'reg_rule', 'age', 'age_rule', 'data_integrity', 'data_integrity_rule', 'data_integrity_email', 'data_integrity_f_name', 'data_integrity_l_name', 'data_integrity_address1', 'data_integrity_address2', 'data_integrity_city', 'data_integrity_state', 'data_integrity_zip', 'data_integrity_prog', 'url_value', 'url_rule', 'entity_value', 'entity_rule', 'risk', 'risk_rule', 'call_center', 'call_center_rule', 'device_five_minutes', 'device_hour', 'device_twelve_hours', 'device_day', 'device_week', 'device_five_minutes_rule', 'device_hour_rule', 'device_twelve_hours_rule', 'device_day_rule', 'device_week_rule', 'ip_five_minutes', 'ip_hour', 'ip_twelve_hours', 'ip_day', 'ip_week', 'ip_five_minutes_rule', 'ip_hour_rule', 'ip_twelve_hours_rule', 'ip_day_rule', 'ip_week_rule', 'lead_five_minutes', 'lead_hour', 'lead_twelve_hours', 'lead_day', 'lead_week', 'lead_five_minutes_rule', 'lead_hour_rule', 'lead_twelve_hours_rule', 'lead_day_rule', 'lead_week_rule', 'consumer_five_minutes', 'consumer_hour', 'consumer_twelve_hours', 'consumer_day', 'consumer_week', 'consumer_five_minutes_rule', 'consumer_hour_rule', 'consumer_twelve_hours_rule', 'consumer_day_rule', 'consumer_week_rule', 'total_entities', 'total_entities_rule', 'total_hops', 'total_hops_rule', 'lead_dupe_check', 'lead_dupe_check_rule', 'consumer_dupe_check', 'consumer_dupe_check_rule', 'query_account_id', 'campaign_key', 'campaign', 'campaign_call_center', 'lead_account_code', 'lead_account', 'ip', 'risk_rating', 'fuzzy_id', 'os_id', 'browser_id', 'x_forwarded_for', 'user_agent', 'resolution', 'time_zone_offset', 'http_accept_headers', 'fonts_installed', 'java_version', 'quicktime_version', 'flash_version', 'screen_dpi', 'language', 'isp', 'city', 'region', 'browser', 'last_ts', 'lead_age_seconds', 'lead_age_minutes', 'lead_duration_seconds', 'lead_duration_minutes', 'field_changes', 'unique_id', 'threatmetrix', 'risk_blocked', 'data_integrity_blocked', 'data_integrity_passed', 'data_integrity_failed', 'data_integrity_default', 'phone1', 'phone2', 'file_date', 'submit', 'absent_pages', 'absent_start', 'phone1_areacode', 'phone1_exchange', 'phone1_last4', 'phone2_areacode', 'phone2_exchange', 'phone2_last4', 'email_user', 'email_domain', 'cpaFraudScore', 'num_accidents', 'occupation', 'primary_use', 'residence_status', 'state_licensed', 'time_residence', 'age_1st_licensed', 'credit_rating', 'current_exp', 'dob', 'gender', 'highest_ed', 'license_num', 'military_affiliation', 'ebureau_escore-qualityscore_value', 'ebureau_everify-phone', 'ebureau_everify-address', 'ebureau_everify-email', 'ebureau_everify-result', 'ebureau_escore-result', 'ebureau_etarget-result']
(Pdb) u
> /Users/randyzwitch/Desktop/pandas-error.py(2)<module>()
-> import mysql.connector |
I don't think this is a pandas issue; you query is not returning anything, so you need to test using just the lower level connections and cursors to see if what is being return from the raw query its possible (just a guess), that your setup is something to do with this |
It's definitely a weird issue...I don't think it's my setup, as different values of query_account_id in the WHERE clause returns an answer with pandas, and the query causing the error in pandas returns an answer using R and MySQL specific tools. That said, I don't want to waste your time anymore, since it seems to be one of those tiny, non-reproducible errors. Hopefully it just fixes itself over time. Thanks for your help. |
np What I mean is that your R/MySQL specific tools are not going thru python so they don't really test via python its problably somethign to do with the MySQL connector in python itself; I don't know enough about this to help you.... |
pandas '0.11.0'
mysql.connector '1.0.9'
Import statements and options
import mysql.connector
from mysql.connector import errorcode
import pandas.io.sql as sql
import pandas as pd
from pandas import DataFrame
Error
Running this code against MySQL (view includes several completely blank columns):
Causes this error:
No error
Running query on primary key (i.e. no blanks) returns a proper data frrame
Expectation
My expected behavior would be that pandas would be able to handle blank columns as character or float, not as NoneType
The text was updated successfully, but these errors were encountered: