
No way to specify higher precision (e.g. Double) when saving DataFrame with floating number to MySQL #9009


Closed
zhopan77 opened this issue Dec 5, 2014 · 2 comments · Fixed by #9041
Labels
IO SQL to_sql, read_sql, read_sql_query

Comments


zhopan77 commented Dec 5, 2014

When saving a DataFrame to MySQL, pandas maps Python float (double precision by default) to MySQL FLOAT (single precision by default). I couldn't find a way to specify the use of either MySQL DOUBLE or MySQL DECIMAL instead.

When saving financial data this causes a loss of precision. For example, 2-year treasury future price quotes are multiples of a quarter of 1/32, i.e. 1/128 = 0.0078125, and there are usually 3 digits before the decimal point. A real-world example is 109.8828125 (the TUZ4 settlement price on 12/04/2014). Saving a DataFrame with this number to MySQL results in the rounded value 109.883. The error itself is not huge (2.4% of a tick), but it is still undesirable, especially since using MySQL DOUBLE or DECIMAL would easily avoid it.
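(Not from the original report.) The general effect can be reproduced without MySQL at all: a minimal stdlib sketch that round-trips a typical Python float through IEEE 754 single precision, the storage format behind MySQL FLOAT:

```python
import struct

# A typical double-precision value (17 significant digits)
x = -0.50432201678449218

# Round-trip through IEEE 754 single precision ('f' = 4-byte float),
# which is what a MySQL FLOAT column stores
single = struct.unpack('f', struct.pack('f', x))[0]

print(x == single)       # False: the low-order digits are gone
print(abs(x - single))   # small but nonzero rounding error
```

Note this only illustrates single-precision rounding in general; the exact digits MySQL returns also depend on how the server formats FLOAT values on retrieval.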

A current workaround is to modify pandas/io/sql.py to map Python float to MySQL FLOAT(53), which forces the use of DOUBLE. There are two places to change: one inside the method definition SQLTable._sqlalchemy_type(self, col), the other inside the dictionary definition _SQL_TYPES. This workaround wastes storage when only single precision is needed. It would be better if an option were provided to use MySQL DOUBLE or DECIMAL only when needed.

@jorisvandenbossche jorisvandenbossche added the IO SQL to_sql, read_sql, read_sql_query label Dec 5, 2014
@jorisvandenbossche
Member

@zhopan77 Thanks for the report!

A new feature was recently introduced that lets you override the default chosen type using the dtype kwarg of to_sql (it will be released in 0.15.2 in one or two weeks); see #8926.
So that gives you an immediate way to change the default (without modifying _sqlalchemy_type or _SQL_TYPES).
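(Illustration added in editing, not part of the original comment.) A minimal, MySQL-free sketch of the dtype kwarg: with a plain DBAPI connection pandas falls back to its sqlite flavor, where dtype values are SQL type strings; with a SQLAlchemy engine you would pass type objects instead. The table and column names here are made up.

```python
import sqlite3
import pandas as pd

df = pd.DataFrame({'price': [109.8828125]})

con = sqlite3.connect(':memory:')
# Override the default column type for 'price'; in sqlite fallback mode
# the dtype values are plain SQL type strings
df.to_sql('quotes', con, index=False, dtype={'price': 'DOUBLE'})

res = pd.read_sql_query('SELECT price FROM quotes', con)
print(res.iloc[0, 0])  # 109.8828125 -- sqlite REAL is 8-byte, no loss
```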

But regarding that default: for the legacy mysql _SQL_TYPES I am not going to change it anymore, as that mode is deprecated anyway. For the sqlalchemy mode, though, I think we should use a default that does not result in data loss.
Can you give a small example? Because e.g. with postgresql, FLOAT does mean double precision (and if you write a table, it also uses the postgresql double precision type).

@jorisvandenbossche
Member

Example with mysql:

In [57]: df = pd.DataFrame(np.random.randn(5, 5), columns=list('abcde'))

In [58]: df.to_sql('test_float', engine_pymysql, index=False, if_exists='replace')

In [59]: res = pd.read_sql_table('test_float', engine_pymysql)

In [60]: df.iloc[0,0]
Out[60]: -0.50432201678449218

In [61]: res.iloc[0,0]
Out[61]: -0.50432200000000005

In [62]: from sqlalchemy.dialects.mysql import DOUBLE

In [63]: df.to_sql('test_float', engine_pymysql, index=False, if_exists='replace', dtype={'a':DOUBLE})

In [64]: res = pd.read_sql_table('test_float', engine_pymysql)

In [65]: res.iloc[0,0]
Out[65]: -0.50432201679999999

It is a bit annoying that the sqlalchemy Float type means something different for different flavors.
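(Editor's illustration, assuming SQLAlchemy is installed.) The divergence can be seen by compiling the same generic type against the two dialects: both render "FLOAT", but the servers interpret it differently (single precision on MySQL, double precision on PostgreSQL). Asking for 53 bits of precision, or using the MySQL-specific DOUBLE type, forces double-precision storage on MySQL as well:

```python
from sqlalchemy import Float
from sqlalchemy.dialects import mysql, postgresql

# Same generic type, same rendered DDL, different server-side semantics
print(Float().compile(dialect=mysql.dialect()))
print(Float().compile(dialect=postgresql.dialect()))

# Requesting 53 bits of precision makes MySQL promote the column to
# double precision; the dialect-specific DOUBLE type is more explicit
print(Float(precision=53).compile(dialect=mysql.dialect()))
print(mysql.DOUBLE().compile(dialect=mysql.dialect()))
```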
