
ENH: dtype costumization on to_sql (GH8778) #8926


Merged: 1 commit, merged on Dec 2, 2014
Conversation

tiagoantao
Contributor

This is the proposed general gist of the changes. My ad-hoc testing suggests that this might work. If this is an acceptable design, I will proceed to write the formal tests and update the docs (including docstrings).

Closes #8778
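
For readers skimming the thread, here is a rough usage sketch of the feature being proposed (the argument was initially named dtypes and later renamed to dtype; the table and column names below are made up for illustration):

import pandas as pd
import sqlalchemy
from sqlalchemy import create_engine

# A column whose missing values force an object/float dtype in pandas,
# but which we want stored with an explicit SQL type.
df = pd.DataFrame({'partly_missing': [1, None, 3]})

engine = create_engine('sqlite://')  # in-memory SQLite, illustration only

# Override the SQL type that would otherwise be inferred for this column.
df.to_sql('example_table', engine, dtype={'partly_missing': sqlalchemy.types.Integer})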

@jorisvandenbossche
Member

@tiagoantao I think in general this is a good approach. Further needs:

  • tests
  • docstring + documentation
  • release note (look at /doc/source/whatsnew/v0.15.2.txt)

@jorisvandenbossche added the IO SQL (to_sql, read_sql, read_sql_query) label on Nov 29, 2014
@jorisvandenbossche added this to the 0.15.2 milestone on Nov 29, 2014
@@ -922,7 +922,7 @@ def to_msgpack(self, path_or_buf=None, **kwargs):
         return packers.to_msgpack(path_or_buf, self, **kwargs)

     def to_sql(self, name, con, flavor='sqlite', schema=None, if_exists='fail',
-               index=True, index_label=None, chunksize=None):
+               index=True, index_label=None, chunksize=None, dtypes={}):
Member


I think a default of None would be better
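
(As an aside for readers: the concern with dtypes={} is Python's mutable-default-argument pitfall. A minimal sketch of the usual None idiom, with a made-up function name:)

def to_sql_sketch(frame, name, con, dtype=None):
    # A default of None avoids sharing one dict object across all calls,
    # which is what a mutable default such as dtype={} would do.
    if dtype is None:
        dtype = {}
    return dtype  # illustration only; the real method goes on to write the table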

@jorisvandenbossche
Member

For now, I think we should also check that the user provides a SQLAlchemy type and not something else.

@jorisvandenbossche changed the title from "dtype costumization on sql read_table" to "ENH: dtype costumization on to_sql (GH8778)" on Nov 29, 2014
@jorisvandenbossche
Member

@mangecoeur @hayd @artemyk OK with this addition?

@@ -954,12 +954,15 @@ def to_sql(self, name, con, flavor='sqlite', schema=None, if_exists='fail',
         chunksize : int, default None
             If not None, then rows will be written in batches of this size at a
             time. If None, all rows will be written at once.
+        dtypes: optional datatypes for SQL columns (dictionary with
Member


There should be a space before the colon (Sphinx formatting is rather strict).

Member


and the same at all other places

@artemyk
Contributor

artemyk commented Nov 29, 2014

@tiagoantao @jorisvandenbossche I realize this is late in the game, but just a short thought: this whole situation (to_sql on a DataFrame with missing data) seems common enough that perhaps we should handle it without the user having to pass in their own datatypes. What if, instead of having to specify the types as in this PR, we iterated over the columns with dtype == object and ran pandas.core.common._infer_dtype_from_scalar on their first non-null entry?
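
(For illustration only, one way such inference might look. This is a hand-rolled sketch that avoids private pandas internals such as _infer_dtype_from_scalar; the function name and the Python-to-SQLAlchemy mapping are assumptions, not what pandas actually does:)

import numpy as np
from sqlalchemy import types as sqltypes

def infer_sql_types_for_object_columns(frame):
    # Guess a SQL type for each object-dtype column from its first
    # non-null value. Real inference would need to cover more cases
    # (dates, decimals, mixed columns, ...).
    python_to_sql = {bool: sqltypes.Boolean, int: sqltypes.Integer,
                     float: sqltypes.Float, str: sqltypes.Text}
    guessed = {}
    for col in frame.columns:
        if frame[col].dtype != np.object_:
            continue
        non_null = frame[col].dropna()
        if non_null.empty:
            continue
        first = non_null.iloc[0]
        for py_type, sql_type in python_to_sql.items():
            if isinstance(first, py_type):
                guessed[col] = sql_type
                break
    return guessed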

@jreback added the Dtype Conversions (unexpected or buggy dtype conversions) and Enhancement labels on Nov 29, 2014
@jorisvandenbossche
Member

@artemyk That is maybe a worthwhile thing to look at for that specific issue, but apart from that I think the functionality in this PR is also useful for other things (so maybe just don't say it closes that specific NaN issue).
E.g. #7957 could also use this PR, so that the user at least has a way to solve the problem for now.

@jorisvandenbossche
Member

continuing the discussion on the approach proposed by @artemyk in the issue: #8778 (comment)

@artemyk
Contributor

artemyk commented Nov 30, 2014

@jorisvandenbossche That makes sense; I can see how dtypes would be useful for various edge cases.
@tiagoantao @jorisvandenbossche One minor comment: unless I'm missing something, it seems like dtypes is not passed along in SQLiteDatabase.to_sql. If dtypes is not implemented for the SQLite (i.e. legacy) class, then I think the documentation should reflect this, and an exception should probably be thrown if dtypes are passed in.

@tiagoantao
Contributor Author

I am taking care of implementing this on SQLiteDatabase.

@tiagoantao
Contributor Author

I hope this sorts out the issue with SQLite

        for col, my_type in dtypes.items():
            if not issubclass(my_type, type_api.TypeEngine):
                raise ValueError('The type of %s is not a SQLAlchemy '
                                 'type' % col)
Member


I would put this check in SQLTable

@jorisvandenbossche
Member

On the name of the argument, I was thinking it should be dtype instead of dtypes (for consistency with e.g. read_csv).

But is that name OK? They aren't exactly dtypes you specify, but SQL types? (@jreback)

@@ -954,12 +954,15 @@ def to_sql(self, name, con, flavor='sqlite', schema=None, if_exists='fail',
         chunksize : int, default None
             If not None, then rows will be written in batches of this size at a
             time. If None, all rows will be written at once.
+        dtypes : optional datatypes for SQL columns (dictionary with
Member


forgot to adapt the docstring here I think

        for col, my_type in dtypes.items():
            if my_type not in _SQL_TYPES.keys():
                raise ValueError('%s (%s) not a SQLite Type' %
                                 (col, my_type))
Member


I don't think we should do this check here, as SQLite is actually very liberal in what it accepts as a type name (essentially anything). So maybe we should just check that it is a string?
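
(If the check were relaxed that way, a minimal sketch might look like the following; the helper name is made up and the fragment simply mirrors the loop quoted above:)

def _validate_sqlite_dtype(dtype):
    # SQLite accepts essentially any type name, so only require that the
    # user passed a string rather than validating against a fixed list.
    for col, my_type in dtype.items():
        if not isinstance(my_type, str):
            raise ValueError('%s (%s) is not a string (SQLite type name)'
                             % (col, my_type))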

@jreback
Contributor

jreback commented Nov 30, 2014

@jorisvandenbossche I think dtype is OK; since this is on to_sql, it is not confusing here.

@jorisvandenbossche
Member

OK, Tiago, can you:

and then I think it should be ready to merge!

@tiagoantao
Contributor Author

I have done my first rebase ever (quite ridiculous, I know), so I am not sure if it was done correctly; could you please have a look and advise? All the rest is, hopefully, done.

@jorisvandenbossche
Member

@tiagoantao Not fully, I think. You have a squashed commit here, but also still the old ones. Normally, simply doing the following steps should resolve it:

git fetch upstream
git rebase -i upstream/master  # and then change the 'pick' into 'squash' or 'fixup' for all your commits apart from the first one
git push -f origin

If you have problems with it, just ask!

@tiagoantao
Contributor Author

I should have forced the push. I hope this will work now.

Thanks for all your help and patience with this.

@jorisvandenbossche
Member

Yep, that is better!
I will do a final review later, but it is looking good.

@@ -857,7 +860,7 @@ def _harmonize_columns(self, parse_dates=None):
                 col_type = self._numpy_type(sql_col.type)

                 if col_type is datetime or col_type is date:
-                    if not issubclass(df_col.dtype.type, np.datetime64):
+                    if not isinstance(df_col.dtype.type, np.datetime64):
Member


Is there a reason for this change? (If so, it would be better to do it in a separate commit, plus a test for the case where this change is needed.)

Contributor Author


I was in auto-mode looking for this in my patch. This was completely unintended, sorry. Will correct.
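
(For anyone puzzled by the distinction: df_col.dtype.type is itself a class, e.g. numpy.datetime64, so issubclass is the right check and isinstance is not. A tiny illustration:)

import numpy as np
import pandas as pd

s = pd.Series(pd.to_datetime(['2014-12-01', '2014-12-02']))

print(s.dtype.type)                             # <class 'numpy.datetime64'>
print(issubclass(s.dtype.type, np.datetime64))  # True
print(isinstance(s.dtype.type, np.datetime64))  # False: the class is not an instance of itself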

Commit messages in this branch (later squashed into a single commit):

  • testing dtypes parameter
  • dtypes defaults to None
  • dtype type checking and docstrings
  • dtype exception checking
  • sphinx dtypes corrections
  • if/else to or simplification
  • informative exception of erroneous SQLAlchemy subclassing
  • type checking
  • basic documentation of the dtypes feature
  • issue number
  • correct test position
  • issue correction
  • SQLite dtype configuration
  • Testing Legacy SQLite with dtype configuration
  • changed the position of a dtype check
  • assert_raise
  • assert_raise
  • return user specified dtype, not SQL_TYPE
  • test cleanup
  • better docstrings
  • better docstrings
  • docs and test refactoring
  • Do not test on MySQL legacy
  • dtypes->dtype
  • dtypes->dtype
  • assert->assertTrue
  • Type test in mysql
  • correct mysql test type
  • reverting unintended change
jorisvandenbossche added a commit that referenced this pull request on Dec 2, 2014:
ENH: dtype costumization on to_sql (GH8778)
@jorisvandenbossche merged commit dd670e1 into pandas-dev:master on Dec 2, 2014
@jorisvandenbossche
Member

@tiagoantao Thanks a lot for this nice PR!

Labels: Dtype Conversions (unexpected or buggy dtype conversions), Enhancement, IO SQL (to_sql, read_sql, read_sql_query)

Merging this pull request may close the following issue: problem with to_sql with NA