
ENH: dtype costumization on to_sql (GH8778) #8926


Merged: 1 commit, merged on Dec 2, 2014
Conversation

tiagoantao
Contributor

This is the proposed general gist of the changes. My ad-hoc testing suggests that this might work. If this is an acceptable design, I will proceed to write the formal tests and update the docs (including docstrings).

Closes #8778
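
For readers skimming the thread, here is a rough usage sketch of the feature being proposed (the argument was initially named dtypes and later renamed to dtype; the table and column names below are made up for illustration):

import pandas as pd
import sqlalchemy
from sqlalchemy import create_engine

# A column whose missing values force an object/float dtype in pandas,
# but which we want stored with an explicit SQL type.
df = pd.DataFrame({'partly_missing': [1, None, 3]})

engine = create_engine('sqlite://')  # in-memory SQLite, illustration only

# Override the SQL type that would otherwise be inferred for this column.
df.to_sql('example_table', engine, dtype={'partly_missing': sqlalchemy.types.Integer})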

@jorisvandenbossche
Member

@tiagoantao I think in general this is a good approach. Further needs:

  • tests
  • docstring + documentation
  • release note (look at /doc/source/whatsnew/v0.15.2.txt)

@jorisvandenbossche added the IO SQL (to_sql, read_sql, read_sql_query) label on Nov 29, 2014
@jorisvandenbossche added this to the 0.15.2 milestone on Nov 29, 2014
@@ -922,7 +922,7 @@ def to_msgpack(self, path_or_buf=None, **kwargs):
         return packers.to_msgpack(path_or_buf, self, **kwargs)

     def to_sql(self, name, con, flavor='sqlite', schema=None, if_exists='fail',
-               index=True, index_label=None, chunksize=None):
+               index=True, index_label=None, chunksize=None, dtypes={}):
Member


I think a default of None would be better
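
(As an aside for readers: the concern with dtypes={} is Python's mutable-default-argument pitfall. A minimal sketch of the usual None idiom, with a made-up function name:)

def to_sql_sketch(frame, name, con, dtype=None):
    # A default of None avoids sharing one dict object across all calls,
    # which is what a mutable default such as dtype={} would do.
    if dtype is None:
        dtype = {}
    return dtype  # illustration only; the real method goes on to write the table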

@jorisvandenbossche
Member

For now, I think we should also check that the user provides a SQLAlchemy type and not something else.

@jorisvandenbossche changed the title from "dtype costumization on sql read_table" to "ENH: dtype costumization on to_sql (GH8778)" on Nov 29, 2014
@jorisvandenbossche
Member

@mangecoeur @hayd @artemyk OK with this addition?

@@ -954,12 +954,15 @@ def to_sql(self, name, con, flavor='sqlite', schema=None, if_exists='fail',
         chunksize : int, default None
             If not None, then rows will be written in batches of this size at a
             time. If None, all rows will be written at once.
+        dtypes: optional datatypes for SQL columns (dictionary with
Member


There should be a space before the colon (Sphinx formatting is rather strict).

Member


and the same at all other places

@artemyk
Contributor

artemyk commented Nov 29, 2014

@tiagoantao @jorisvandenbossche I realize this is late in the game, but just a short thought: this whole situation (to_sql on a DataFrame with missing data) seems common enough that perhaps we should handle it without the user having to pass in their own datatypes. What if, instead of having to specify the types as in this PR, we iterated over the columns with dtype == object and ran pandas.core.common._infer_dtype_from_scalar on their first non-null entry?
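
(For illustration only, one way such inference might look. This is a hand-rolled sketch that avoids private pandas internals such as _infer_dtype_from_scalar; the function name and the Python-to-SQLAlchemy mapping are assumptions, not what pandas actually does:)

import numpy as np
from sqlalchemy import types as sqltypes

def infer_sql_types_for_object_columns(frame):
    # Guess a SQL type for each object-dtype column from its first
    # non-null value. Real inference would need to cover more cases
    # (dates, decimals, mixed columns, ...).
    python_to_sql = {bool: sqltypes.Boolean, int: sqltypes.Integer,
                     float: sqltypes.Float, str: sqltypes.Text}
    guessed = {}
    for col in frame.columns:
        if frame[col].dtype != np.object_:
            continue
        non_null = frame[col].dropna()
        if non_null.empty:
            continue
        first = non_null.iloc[0]
        for py_type, sql_type in python_to_sql.items():
            if isinstance(first, py_type):
                guessed[col] = sql_type
                break
    return guessed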

@jreback added the Dtype Conversions (unexpected or buggy dtype conversions) and Enhancement labels on Nov 29, 2014
@jorisvandenbossche
Member

@artemyk That is maybe a worthwhile thing to look at for that specific issue, but apart from that I think the functionality in this PR is also useful for other things (so maybe just don't say it closes that specific NaN issue).
E.g. #7957 could also use this PR, so that the user at least has a way to solve the problem for now.

@jorisvandenbossche
Member

continuing the discussion on the approach proposed by @artemyk in the issue: #8778 (comment)

@artemyk
Contributor

artemyk commented Nov 30, 2014

@jorisvandenbossche That makes sense; I can see how dtypes would be useful for various edge cases.
@tiagoantao @jorisvandenbossche One minor comment: unless I'm missing something, it seems like dtypes is not passed along in SQLiteDatabase.to_sql. If dtypes is not implemented for the SQLite (i.e. legacy) class, then I think the documentation should reflect this, and an exception should probably be thrown if dtypes are passed in.

@tiagoantao
Contributor Author

I am taking care of implementing this on SQLiteDatabase.

@tiagoantao
Contributor Author

I hope this sorts out the issue with SQLite

        for col, my_type in dtypes.items():
            if not issubclass(my_type, type_api.TypeEngine):
                raise ValueError('The type of %s is not a SQLAlchemy '
                                 'type' % col)
Member


I would put this check in SQLTable

@jorisvandenbossche
Member

On the name of the argument, I was thinking it should be dtype instead of dtypes (for consistency with e.g. read_csv).

But is that name OK? They aren't exactly dtypes you specify, but SQL types? (@jreback)

@@ -954,12 +954,15 @@ def to_sql(self, name, con, flavor='sqlite', schema=None, if_exists='fail',
         chunksize : int, default None
             If not None, then rows will be written in batches of this size at a
             time. If None, all rows will be written at once.
+        dtypes : optional datatypes for SQL columns (dictionary with
Member


forgot to adapt the docstring here I think

        for col, my_type in dtypes.items():
            if my_type not in _SQL_TYPES.keys():
                raise ValueError('%s (%s) not a SQLite Type' %
                                 (col, my_type))
Member


I don't think we should do this check here, as SQLite is actually very liberal in what it accepts as a type name (essentially anything). So maybe we should just check that it is a string?
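
(If the check were relaxed that way, a minimal sketch might look like the following; the helper name is made up and the fragment simply mirrors the loop quoted above:)

def _validate_sqlite_dtype(dtype):
    # SQLite accepts essentially any type name, so only require that the
    # user passed a string rather than validating against a fixed list.
    for col, my_type in dtype.items():
        if not isinstance(my_type, str):
            raise ValueError('%s (%s) is not a string (SQLite type name)'
                             % (col, my_type))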

@jreback
Contributor

jreback commented Nov 30, 2014

@jorisvandenbossche I think dtype is OK; since this is on to_sql, it is not confusing here.

@jorisvandenbossche
Member

OK, Tiago, can you:

and then I think it should be ready to merge!

@tiagoantao
Contributor Author

I have done my first rebase ever (quite ridiculous, I know), so I am not sure if it was done correctly; could you please have a look and advise? All the rest is, hopefully, done.

@jorisvandenbossche
Member

@tiagoantao Not fully, I think. You have a squashed commit here, but also still the old ones. Normally, simply doing the following steps should resolve it:

git fetch upstream
git rebase -i upstream/master  # and then change the 'pick' into 'squash' or 'fixup' for all your commits apart from the first one
git push -f origin

If you have problems with it, just ask!

@tiagoantao
Contributor Author

I should have forced the push. I hope this will work now.

Thanks for all your help and patience with this.

@jorisvandenbossche
Member

Yep, that is better!
I will do a final review later, but it is looking good.

@@ -857,7 +860,7 @@ def _harmonize_columns(self, parse_dates=None):
                 col_type = self._numpy_type(sql_col.type)

                 if col_type is datetime or col_type is date:
-                    if not issubclass(df_col.dtype.type, np.datetime64):
+                    if not isinstance(df_col.dtype.type, np.datetime64):
Member


Is there a reason for this change? (If so, it would be better to do it in a separate commit, plus a test for the case where this change is needed.)

Contributor Author


I was in auto-mode looking for this in my patch. This was completely unintended, sorry. Will correct.
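
(For anyone puzzled by the distinction: df_col.dtype.type is itself a class, e.g. numpy.datetime64, so issubclass is the right check and isinstance is not. A tiny illustration:)

import numpy as np
import pandas as pd

s = pd.Series(pd.to_datetime(['2014-12-01', '2014-12-02']))

print(s.dtype.type)                             # <class 'numpy.datetime64'>
print(issubclass(s.dtype.type, np.datetime64))  # True
print(isinstance(s.dtype.type, np.datetime64))  # False: the class is not an instance of itself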

Commit messages in this branch (later squashed into a single commit):

  • testing dtypes parameter
  • dtypes defaults to None
  • dtype type checking and docstrings
  • dtype exception checking
  • sphinx dtypes corrections
  • if/else to or simplification
  • informative exception of erroneous SQLAlchemy subclassing
  • type checking
  • basic documentation of the dtypes feature
  • issue number
  • correct test position
  • issue correction
  • SQLite dtype configuration
  • Testing Legacy SQLite with dtype configuration
  • changed the position of a dtype check
  • assert_raise
  • assert_raise
  • return user specified dtype, not SQL_TYPE
  • test cleanup
  • better docstrings
  • better docstrings
  • docs and test refactoring
  • Do not test on MySQL legacy
  • dtypes->dtype
  • dtypes->dtype
  • assert->assertTrue
  • Type test in mysql
  • correct mysql test type
  • reverting unintended change
jorisvandenbossche added a commit that referenced this pull request on Dec 2, 2014:
ENH: dtype costumization on to_sql (GH8778)
@jorisvandenbossche merged commit dd670e1 into pandas-dev:master on Dec 2, 2014
@jorisvandenbossche
Member

@tiagoantao Thanks a lot for this nice PR!

Labels: Dtype Conversions (unexpected or buggy dtype conversions), Enhancement, IO SQL (to_sql, read_sql, read_sql_query)

Merging this pull request may close the following issue: problem with to_sql with NA