pd.read_sql_query() does not convert NULLs to NaN #14319
Comments
Seems to be the same as #14314 (though your example is better, thanks; I'll copy it over there). Does your actual use-case use `read_sql_table`?
Let's actually reopen this: is it worth adding a `dtype` argument?
My actual example was slightly more complex. I used an SQL query and had a way to do chunk-wise reading, which made this issue more likely.
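To illustrate the chunking problem: the same column can come back with a different dtype per chunk (a minimal sketch, assuming SQLite; the table and data are made up):

```python
import sqlite3

import pandas as pd

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE t (x REAL)")
con.executemany("INSERT INTO t VALUES (?)", [(None,), (None,), (1.5,), (2.5,)])

# NULLs sort first, so the first chunk is all NULL and the second all float.
chunks = pd.read_sql_query("SELECT x FROM t ORDER BY x IS NOT NULL", con, chunksize=2)
for chunk in chunks:
    print(chunk["x"].dtype)  # object for the all-NULL chunk, float64 for the rest
```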
I am not sure it is worth adding such a parameter. The issue is also that it should not necessarily result in a float column, as in SQL the NULLs can be in any type of column. To solve the dtype issues with multiple chunks, we could also think about a `dtype` argument.

The problem with queries is that we do not have any information about the database table layout; we just get back result sets. But maybe we could also have a new kwarg to specify the table name from which to get the type information, or a helper function to get this information and automatically construct a dict that could be passed to a possible `dtype` argument.
Agreed about the `dtype` parameter. A helper method for getting dtypes from the table might be useful. I've written similar methods in the past that get the column info from a SQLAlchemy MetaData object and transform it into a dictionary of column-name-to-dtype mappings.
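A sketch of what such a helper could look like, assuming SQLAlchemy reflection (the function name and the use of `python_type` are illustrative, not an existing pandas API):

```python
from sqlalchemy import MetaData, Table

def table_dtypes(table_name, engine):
    """Reflect a table and build a {column name: python type} dict.

    Sketch only: `python_type` can raise NotImplementedError for exotic
    SQL types, and a real implementation would map SQL types to
    numpy/pandas dtypes explicitly (e.g. nullable integers).
    """
    table = Table(table_name, MetaData(), autoload_with=engine)
    return {col.name: col.type.python_type for col in table.columns}
```

The resulting dict could then be fed to a hypothetical `dtype` keyword, or applied after the fact with `DataFrame.astype`.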
Is there any workaround for this while the issue is being addressed? I tried a couple of approaches, but neither works.
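One workaround that does work in many cases is to coerce the affected columns after reading (a sketch: the connection, query, and column names are placeholders, and these are not the two approaches tried above):

```python
import sqlite3

import pandas as pd

con = sqlite3.connect("mydb.sqlite")  # placeholder connection
df = pd.read_sql_query("SELECT a, b FROM some_table", con)

# All-NULL columns arrive as object dtype full of None;
# pd.to_numeric turns them into float64 columns full of NaN.
for col in ["a", "b"]:
    df[col] = pd.to_numeric(df[col], errors="coerce")
```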
Does …
Same issue here: pandas forces DECIMAL to float dtype when using read_sql_query() :(
Was this ever resolved? I am having the same issue. See also: https://stackoverflow.com/questions/53315035/pandas-read-sql-query-returning-none-for-all-values-in-some-columns

I am using pandas read_sql_query to read data from a MySQL database table into a pandas DataFrame. Some columns in this table have all NULL values. For those columns the pandas DataFrame contains None in every row; for all other columns the DataFrame contains NaN where there was a NULL value.

Can anyone explain why None is returned for the all-NULL columns? And how do I make sure I have all NaNs, hopefully without doing manual conversions?

I should add that two of the columns causing this problem are of type float and the third is of type double. Here is an example: the columns pef and fer contain all NULLs in the database, and in the MySQL database they are defined as FLOAT/DOUBLE columns. I would expect the columns pef and fer to contain NaN in each row, not None.
Still open. Is your actual query a plain `SELECT * FROM <table>`? Pandas can't use the dtypes from the table for arbitrary SQL expressions passed to `read_sql_query`; if you are reading a whole table, `pd.read_sql_table` can use the table's type information.
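For comparison, a whole-table read can pick up the schema's types (a sketch, assuming a SQLAlchemy engine; the connection string and table name are placeholders):

```python
import pandas as pd
from sqlalchemy import create_engine

engine = create_engine("mysql+pymysql://user:password@localhost/mydb")  # placeholder

# Unlike read_sql_query, read_sql_table reflects the table schema, so a
# nullable FLOAT column comes back as float64 with NaN rather than object/None.
df = pd.read_sql_table("patient", engine)
```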
My actual query is more complicated than that and involves multiple tables. So I can't just use pd.read_sql_table.
Closing in favor of #10285
A small, complete example of the issue
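A minimal sketch consistent with the report, assuming an in-memory SQLite database (this is a reconstruction, not the reporter's original code):

```python
import sqlite3

import pandas as pd

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE test (a REAL)")
con.execute("INSERT INTO test VALUES (NULL)")

df = pd.read_sql_query("SELECT a FROM test", con)
print(df["a"])  # object dtype holding None, rather than float64 holding NaN
```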
Expected Output
In pandas 0.18.1 this will result in a column of type `object` with `None` values, whereas I needed `float("nan")`. The `coerce_float=True` option made no difference. This is most needed when reading in a float column chunk-wise, since there may be sequences of NULLs. (See also http://stackoverflow.com/questions/30652457/adjust-pandas-read-sql-query-null-value-treatment/)