
No way to specify higher precision (e.g. Double) when saving DataFrame with floating number to MySQL #9009


Closed
zhopan77 opened this issue Dec 5, 2014 · 2 comments · Fixed by #9041
Labels
IO SQL to_sql, read_sql, read_sql_query

Comments


zhopan77 commented Dec 5, 2014

When saving a DataFrame to MySQL, pandas maps Python float (double precision by default) to MySQL FLOAT (single precision by default). I couldn't find a way to specify the use of either MySQL DOUBLE or MySQL DECIMAL instead.

When saving financial data this causes a loss of precision. For example, 2-year treasury future price quotes are multiples of a quarter of 1/32, i.e. 1/128 = 0.0078125, and there are usually 3 digits before the decimal point. A real-world example is 109.8828125 (the TUZ4 settlement price on 12/04/2014). Saving a DataFrame with this number to MySQL results in the rounded value 109.883. The error itself is not huge (2.4% of a tick), but it is still undesirable, especially since using MySQL DOUBLE or DECIMAL would easily avoid it.
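(Not from the original report.) The general effect can be reproduced without MySQL at all: a minimal stdlib sketch that round-trips a typical Python float through IEEE 754 single precision, the storage format behind MySQL FLOAT:

```python
import struct

# A typical double-precision value (17 significant digits)
x = -0.50432201678449218

# Round-trip through IEEE 754 single precision ('f' = 4-byte float),
# which is what a MySQL FLOAT column stores
single = struct.unpack('f', struct.pack('f', x))[0]

print(x == single)       # False: the low-order digits are gone
print(abs(x - single))   # small but nonzero rounding error
```

Note this only illustrates single-precision rounding in general; the exact digits MySQL returns also depend on how the server formats FLOAT values on retrieval.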

A current workaround is to modify pandas/io/sql.py to map Python float to MySQL FLOAT(53), which forces the use of DOUBLE. There are two places to change: one inside the method definition SQLTable._sqlalchemy_type(self, col), the other inside the dictionary definition _SQL_TYPES. This workaround wastes storage when only single precision is needed. It would be better if an option were provided to use MySQL DOUBLE or DECIMAL only when needed.

@jorisvandenbossche jorisvandenbossche added the IO SQL to_sql, read_sql, read_sql_query label Dec 5, 2014
@jorisvandenbossche
Member

@zhopan77 Thanks for the report!

A new feature was recently introduced that lets you override the default chosen type using the dtype kwarg of to_sql (it will be released in 0.15.2 in one or two weeks); see #8926.
So that gives you an immediate way to change the default (without modifying _sqlalchemy_type or _SQL_TYPES).
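(Illustration added in editing, not part of the original comment.) A minimal, MySQL-free sketch of the dtype kwarg: with a plain DBAPI connection pandas falls back to its sqlite flavor, where dtype values are SQL type strings; with a SQLAlchemy engine you would pass type objects instead. The table and column names here are made up.

```python
import sqlite3
import pandas as pd

df = pd.DataFrame({'price': [109.8828125]})

con = sqlite3.connect(':memory:')
# Override the default column type for 'price'; in sqlite fallback mode
# the dtype values are plain SQL type strings
df.to_sql('quotes', con, index=False, dtype={'price': 'DOUBLE'})

res = pd.read_sql_query('SELECT price FROM quotes', con)
print(res.iloc[0, 0])  # 109.8828125 -- sqlite REAL is 8-byte, no loss
```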

But regarding that default: for the legacy mysql _SQL_TYPES I am not going to change it anymore, as that mode is deprecated anyway. For the sqlalchemy mode, though, I think we should use a default that does not result in data loss.
Can you give a small example? Because e.g. with postgresql, FLOAT does mean double precision (and if you write a table, it also uses the postgresql double precision type).

@jorisvandenbossche
Member

Example with mysql:

In [57]: df = pd.DataFrame(np.random.randn(5, 5), columns=list('abcde'))

In [58]: df.to_sql('test_float', engine_pymysql, index=False, if_exists='replace')

In [59]: res = pd.read_sql_table('test_float', engine_pymysql)

In [60]: df.iloc[0,0]
Out[60]: -0.50432201678449218

In [61]: res.iloc[0,0]
Out[61]: -0.50432200000000005

In [62]: from sqlalchemy.dialects.mysql import DOUBLE

In [63]: df.to_sql('test_float', engine_pymysql, index=False, if_exists='replace', dtype={'a':DOUBLE})

In [64]: res = pd.read_sql_table('test_float', engine_pymysql)

In [65]: res.iloc[0,0]
Out[65]: -0.50432201679999999

It is a bit annoying that the sqlalchemy Float type means something different for different flavors.
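(Editor's illustration, assuming SQLAlchemy is installed.) The divergence can be seen by compiling the same generic type against the two dialects: both render "FLOAT", but the servers interpret it differently (single precision on MySQL, double precision on PostgreSQL). Asking for 53 bits of precision, or using the MySQL-specific DOUBLE type, forces double-precision storage on MySQL as well:

```python
from sqlalchemy import Float
from sqlalchemy.dialects import mysql, postgresql

# Same generic type, same rendered DDL, different server-side semantics
print(Float().compile(dialect=mysql.dialect()))
print(Float().compile(dialect=postgresql.dialect()))

# Requesting 53 bits of precision makes MySQL promote the column to
# double precision; the dialect-specific DOUBLE type is more explicit
print(Float(precision=53).compile(dialect=mysql.dialect()))
print(mysql.DOUBLE().compile(dialect=mysql.dialect()))
```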
