ENH: Store in SQL using double precision #9041


Merged 1 commit into pandas-dev:master on Jan 25, 2015

Conversation

artemyk
Contributor

@artemyk artemyk commented Dec 8, 2014

Closes #9009. @jorisvandenbossche -- do you think this should test/use single precision? We could test for float32 perhaps? My own feeling is that double precision is the safer default.

@jorisvandenbossche jorisvandenbossche added the IO SQL to_sql, read_sql, read_sql_query label Dec 12, 2014
@jorisvandenbossche jorisvandenbossche added this to the 0.16.0 milestone Dec 12, 2014
@jorisvandenbossche
Member

We should indeed be safe by default, so I'd opt for what you did now.
But maybe we can add a refinement right away? For example, distinguish the float32/float64 dtypes and give each an appropriate sqlalchemy type. The same could be done for int32/int64.

@artemyk
Contributor Author

artemyk commented Dec 12, 2014

Unless I'm missing something, it's actually surprisingly tricky to differentiate between float32 and float64 in a robust manner. We could check the dtype, but that wouldn't catch object-typed series with NAs. inference.pyx, which we use, doesn't return this information (it just returns 'floating'), nor does util.pxd (which it calls), nor does numpy_helper.h (which that calls). It's surely doable to add those functions, but that almost seems like a separate PR.

@jorisvandenbossche
Member

I wouldn't worry about the object dtype. We are already going the extra mile by inferring floating-point data in object columns (which I regard as a nice extra, but not an essential feature, as users should ensure their data have the correct dtype, and there are functions to do this, like DataFrame.convert_objects).

So I would just check the dtype when there are 'floating' data, and use the general case for the object dtype.

@artemyk
Contributor Author

artemyk commented Dec 22, 2014

@jorisvandenbossche OK, added checks for float32 and int32
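The checks described could look roughly like the following. This is a hypothetical helper in the spirit of the discussion, not the PR's actual code; the function name and the precision values (23 bits for single, 53 for double) are assumptions for illustration.

```python
import numpy as np
import pandas as pd
from sqlalchemy.types import BigInteger, Float, Integer

def sqlalchemy_type(col):
    """Hypothetical dtype -> SQLAlchemy type mapping, as discussed above."""
    if col.dtype == np.float32:
        return Float(precision=23)    # single precision
    if np.issubdtype(col.dtype, np.floating):
        return Float(precision=53)    # double precision, the safe default
    if col.dtype == np.int32:
        return Integer()
    if np.issubdtype(col.dtype, np.integer):
        return BigInteger()
    return Float(precision=53)        # general case, e.g. inferred object floats

print(sqlalchemy_type(pd.Series([1.0], dtype=np.float32)))
print(sqlalchemy_type(pd.Series([1])))
```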

@artemyk artemyk force-pushed the doubleprecisionsql branch 2 times, most recently from 0503295 to a8ba2f2 Compare December 22, 2014 20:46
@artemyk
Contributor Author

artemyk commented Dec 22, 2014

Also made a couple of small changes that I thought were logical while playing with this --- passing the dtype argument through in to_sql and get_schema, and changing the default floating-point MySQL type to DOUBLE (not single precision).
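As a rough usage sketch of that dtype pass-through (modern pandas signature assumed; with a plain sqlite3 connection, type names are given as strings):

```python
import sqlite3
import pandas as pd

df = pd.DataFrame({"x": [0.1, 0.2, 0.3]})
con = sqlite3.connect(":memory:")

# dtype overrides the inferred column type in the created table
df.to_sql("tbl", con, index=False, dtype={"x": "REAL"})

schema = con.execute(
    "SELECT sql FROM sqlite_master WHERE name = 'tbl'"
).fetchone()[0]
print(schema)
```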

is_type_class = isinstance(my_type, type) and \
    issubclass(my_type, types.TypeEngine)
is_type_obj = isinstance(my_type, types.TypeEngine)
if not is_type_class and not is_type_obj:
Member


Can you maybe leave this out?
In any case, this is to fix #9083, so it should also have a test associated with it. I also already have a patch for this (a bit cleaner, I think; it's what I mentioned in #9083), but I haven't pushed it as a PR yet. Sorry for not doing that directly -- I will do that now.

@artemyk
Contributor Author

artemyk commented Dec 23, 2014

I made your suggested changes.

I copied and pasted your fix from #9138 for now --- I can remove it and rebase once you have merged.

Also, I can get the precision to go up to 14 digits. However, to get this I needed to insert some special code into get_table. Specifically, by default SQLAlchemy reads double-precision numbers into a 'Decimal' type with 10 digits of precision. See http://stackoverflow.com/questions/19387882/how-to-make-sqlalchemy-return-float-instead-of-decimal-when-reflecting-tables .
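The behavior in question can be demonstrated in a self-contained way (a sketch assuming SQLAlchemy 1.4+ and SQLite; the thread itself concerns reflected DOUBLE PRECISION columns, but the `asdecimal` mechanics are the same): a `Numeric` column with `asdecimal=True` comes back as `decimal.Decimal`, while `Float` (which defaults to `asdecimal=False`) comes back as a plain `float`.

```python
from decimal import Decimal
import sqlalchemy as sa

engine = sa.create_engine("sqlite://")
meta = sa.MetaData()
tbl = sa.Table(
    "t", meta,
    sa.Column("as_dec", sa.Numeric(asdecimal=True)),   # the Numeric default
    sa.Column("as_float", sa.Float()),                 # asdecimal=False
)
meta.create_all(engine)

with engine.begin() as conn:
    conn.execute(tbl.insert(), [{"as_dec": 1.5, "as_float": 1.5}])
    row = conn.execute(sa.select(tbl)).one()

print(type(row.as_dec), type(row.as_float))
```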

@artemyk artemyk force-pushed the doubleprecisionsql branch 3 times, most recently from 1727541 to 98cb55d Compare December 24, 2014 01:22
dtype = sqlalchemy.Float if self.mode == 'sqlalchemy' else 'REAL'
create_sql = sql.get_schema(self.test_frame1, 'test', 'sqlite',
                            con=self.conn, dtype={'D': dtype})
self.assertTrue('CREATE' in create_sql)
Member


maybe just add a check here that the FLOAT/REAL is in the schema

type = 'FLOAT' if self.mode == 'sqlalchemy' else 'REAL'
self.assertTrue(type in create_sql)

but maybe make a custom df with only ints, so we are sure the FLOAT is there because of the dtype argument and not because it was already there (as it stands, the schema will be the same with or without the dtype argument)
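A sketch of that suggestion, assuming pandas' public `pd.io.sql.get_schema` with the `dtype` argument this PR adds: with an all-integer frame, any REAL in the schema can only come from the override.

```python
import pandas as pd

int_frame = pd.DataFrame({"D": [1, 2, 3]})   # integers only

without_override = pd.io.sql.get_schema(int_frame, "test")
with_override = pd.io.sql.get_schema(int_frame, "test", dtype={"D": "REAL"})

print("REAL" in without_override, "REAL" in with_override)
```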

@jorisvandenbossche
Member

@artemyk regarding the precision issue (because of the decimal use). I think that looks OK to do (as we should ensure the full precision is available), but I was wondering if this has performance implications?

@jorisvandenbossche
Member

@artemyk I merged the other PR, so you can rebase this one

@artemyk
Contributor Author

artemyk commented Dec 24, 2014

@jorisvandenbossche OK, rebased / upgraded test.
Re performance implications --- I don't think so; it just makes sqlalchemy skip a conversion to Decimal. It's more in line with what happens if we do sqlalchemyengine.execute('SELECT * FROM tbl') directly (in that case, FLOATs don't get converted to Decimal objects).

@jorisvandenbossche
Member

OK, do you know the reason sqlalchemy does this by default? (just wondering)


# Avoid casting double-precision floats into decimals
from sqlalchemy import Numeric
for column in tbl.columns.values():
    if isinstance(column.type, Numeric):
        column.type.asdecimal = False
Member


small 'style' comment: the .values() is not needed here; you can just iterate over tbl.columns

@artemyk
Contributor Author

artemyk commented Dec 24, 2014

Good point about .values().
Re asdecimal, here's what I found at http://docs.sqlalchemy.org/en/rel_0_9/core/types.html :

When using the Numeric type, care should be taken to ensure that the asdecimal setting is appropriate for the DBAPI in use - when Numeric applies a conversion from Decimal->float or float->Decimal, this conversion incurs an additional performance overhead for all result columns received.
DBAPIs that return Decimal natively (e.g. psycopg2) will have better accuracy and higher performance with a setting of True, as the native translation to Decimal reduces the amount of floating-point issues at play, and the Numeric type itself doesn't need to apply any further conversions. However, another DBAPI which returns floats natively will incur an additional conversion overhead, and is still subject to floating point data loss - in which case asdecimal=False will at least remove the extra conversion overhead.

@jorisvandenbossche
Member

@artemyk I just tested it with postgresql/psycopg2, and it seems that there asdecimal is already False for me when using DOUBLE PRECISION(53) (which contrasts a bit with your quote). So of course there is no performance difference in that case.
But in any case, I suppose the check can do no harm?

@artemyk
Contributor Author

artemyk commented Dec 25, 2014

@jorisvandenbossche I'm now wondering --- do you think it's possible there is a DECIMAL-type datatype in, say, postgres that can store arbitrary-precision decimals? And that the code we have here would force conversion to Float, thereby losing precision?
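The concern can be illustrated with the stdlib alone: forcing an arbitrary-precision decimal through a double discards everything beyond roughly 17 significant digits.

```python
from decimal import Decimal

# A NUMERIC-style value with more significant digits than a double can hold
d = Decimal("0.12345678901234567890123")

f = float(d)                    # forced through double precision
round_trip = Decimal(repr(f))   # what survives the conversion

print(d)
print(round_trip)
print(round_trip == d)          # False: the trailing digits are gone
```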

@artemyk
Contributor Author

artemyk commented Jan 20, 2015

@jorisvandenbossche Can we merge?

@jorisvandenbossche
Member

@artemyk Yes! (sorry for dropping the ball here)

Can you just rebase?

@artemyk artemyk force-pushed the doubleprecisionsql branch 2 times, most recently from 65f0253 to e5b7cb7 Compare January 24, 2015 18:50
Better test

Style fix

Whats new fix
@artemyk artemyk force-pushed the doubleprecisionsql branch from e5b7cb7 to bb0c2e8 Compare January 24, 2015 18:51
@artemyk
Contributor Author

artemyk commented Jan 24, 2015

@jorisvandenbossche Rebased...

jorisvandenbossche added a commit that referenced this pull request Jan 25, 2015
ENH: Store in SQL using double precision
@jorisvandenbossche jorisvandenbossche merged commit 3fbc1ee into pandas-dev:master Jan 25, 2015
@jorisvandenbossche
Member

@artemyk Thanks for the work on this, and sorry for letting this take so long!

Successfully merging this pull request may close these issues.

No way to specify higher precision (e.g. Double) when saving DataFrame with floating number to MySQL