When saving a DataFrame to MySQL, pandas maps the Python float type (double precision by default) to MySQL FLOAT (single precision by default). I couldn't find a way to specify the use of MySQL DOUBLE or MySQL DECIMAL instead.
When saving financial data this causes a loss of precision. For example, 2-year Treasury futures prices are quoted in multiples of a quarter of 1/32, i.e. 1/128 = 0.0078125, and there are usually 3 digits before the decimal point. A real-world example is 109.8828125 (the TUZ4 settlement price on 12/04/2014). Saving a DataFrame containing this number to MySQL yields the rounded value 109.883. The error itself is not huge (2.4% of a tick), but it is still undesirable, especially since using MySQL DOUBLE or DECIMAL would easily avoid it.
Currently a workaround is to modify pandas/io/sql.py to map Python float to MySQL FLOAT(53), which forces the use of DOUBLE. There are two places to change: one inside the method definition SQLTable._sqlalchemy_type(self, col), the other inside the dictionary definition _SQL_TYPES. This workaround wastes storage when only single precision is needed. It would be better if an option were provided to use MySQL DOUBLE or DECIMAL only when needed.
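The single-precision loss can be reproduced without MySQL at all, by round-tripping a value through IEEE 754 single precision with the standard library's struct module (a minimal sketch; the sample value and helper name are illustrative, not from pandas):

```python
import struct

def as_single_precision(x):
    """Round-trip a Python float (double precision) through IEEE 754
    single precision, mimicking storage in a MySQL FLOAT column."""
    return struct.unpack('f', struct.pack('f', x))[0]

x = -0.50432201678449218        # a typical double-precision value
y = as_single_precision(x)
print(x == y)                   # False: the low-order bits are lost
print(abs(x - y))               # small but nonzero rounding error
```

A typical double-precision value needs more than the 24 mantissa bits a FLOAT column keeps, so the round-trip almost never returns the original number.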
A new feature was recently introduced that allows you to override the default chosen type using the dtype kwarg in to_sql (it will be released in 0.15.2 in one or two weeks). See #8926.
So that should cover the immediate need to change the default (without modifying _sqlalchemy_type or _SQL_TYPES).
But regarding that default: for the legacy mysql mode's _SQL_TYPES, I am not going to change it anymore, as it is deprecated anyway. But for the sqlalchemy mode, I think we should use a default that does not result in data loss.
Can you give a small example? Because, e.g., with postgresql, FLOAT does mean double precision (and if you write a table, it also uses the postgresql double precision type).
In [57]: df = pd.DataFrame(np.random.randn(5, 5), columns=list('abcde'))
In [58]: df.to_sql('test_float', engine_pymysql, index=False, if_exists='replace')
In [59]: res = pd.read_sql_table('test_float', engine_pymysql)
In [60]: df.iloc[0,0]
Out[60]: -0.50432201678449218
In [61]: res.iloc[0,0]
Out[61]: -0.50432200000000005
In [62]: from sqlalchemy.dialects.mysql import DOUBLE
In [63]: df.to_sql('test_float', engine_pymysql, index=False, if_exists='replace', dtype={'a':DOUBLE})
In [64]: res = pd.read_sql_table('test_float', engine_pymysql)
In [65]: res.iloc[0,0]
Out[65]: -0.50432201679999999
It is a bit annoying that the Float sqlalchemy type means something different for different flavors.
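For tick-sized prices like the one in the issue, exact values can also be checked on the Python side with the standard-library decimal module, which is why a MySQL DECIMAL column (selectable via the same dtype kwarg, e.g. with sqlalchemy.types.Numeric) would preserve the quote exactly. A small sketch; the 113-tick decomposition is derived from the 109.8828125 price above:

```python
from decimal import Decimal

# 2-year note futures tick: a quarter of 1/32, i.e. 1/128
tick = Decimal(1) / Decimal(128)      # exactly 0.0078125
price = Decimal(109) + 113 * tick     # the 12/04/2014 TUZ4 settle
print(price)                          # 109.8828125, with no rounding
```

Decimal arithmetic keeps every digit of the quote, exactly as a DECIMAL column with enough scale would, whereas the FLOAT round-trip above rounds it away.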