
Commit 56b84a0

DOC: expand docs on sql type conversion

1 parent 7d13fdd

File tree

4 files changed: +48 -22

doc/source/io.rst (+25 -12)
@@ -3393,12 +3393,34 @@ the database using :func:`~pandas.DataFrame.to_sql`.
 
     data.to_sql('data', engine)
 
-With some databases, writing large DataFrames can result in errors due to packet size limitations being exceeded. This can be avoided by setting the ``chunksize`` parameter when calling ``to_sql``. For example, the following writes ``data`` to the database in batches of 1000 rows at a time:
+With some databases, writing large DataFrames can result in errors due to
+packet size limitations being exceeded. This can be avoided by setting the
+``chunksize`` parameter when calling ``to_sql``. For example, the following
+writes ``data`` to the database in batches of 1000 rows at a time:
 
 .. ipython:: python
 
     data.to_sql('data_chunked', engine, chunksize=1000)
 
+SQL data types
+""""""""""""""
+
+:func:`~pandas.DataFrame.to_sql` will try to map your data to an appropriate
+SQL data type based on the dtype of the data. When you have columns of dtype
+``object``, pandas will try to infer the data type.
+
+You can always override the default type by specifying the desired SQL type of
+any of the columns by using the ``dtype`` argument. This argument needs a
+dictionary mapping column names to SQLAlchemy types (or strings for the sqlite3
+fallback mode).
+For example, specifying to use the sqlalchemy ``String`` type instead of the
+default ``Text`` type for string columns:
+
+.. ipython:: python
+
+    from sqlalchemy.types import String
+    data.to_sql('data_dtype', engine, dtype={'Col_1': String})
+
 .. note::
 
     Due to the limited support for timedelta's in the different database
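The ``chunksize`` and ``dtype`` options documented in the hunk above can be sketched end-to-end. This is a minimal illustration, not part of the commit: the engine URL, table name (``data_dtype``) and column name (``Col_1``) are assumptions, and the snippet presumes pandas and SQLAlchemy are installed.

```python
# Minimal sketch of the two options documented above: batched writes via
# chunksize, and an explicit SQLAlchemy column type via dtype.
# Engine URL, table and column names are illustrative assumptions.
import pandas as pd
from sqlalchemy import create_engine
from sqlalchemy.types import String

engine = create_engine('sqlite://')  # in-memory SQLite database

data = pd.DataFrame({'Col_1': ['a', 'b', 'c'], 'Col_2': [1, 2, 3]})

# Write in batches of 2 rows, forcing Col_1 to the String (VARCHAR) type
# instead of the default Text type.
data.to_sql('data_dtype', engine, index=False, chunksize=2,
            dtype={'Col_1': String})

# Read the table back to confirm the write succeeded.
roundtrip = pd.read_sql_table('data_dtype', engine)
print(roundtrip['Col_1'].tolist())  # ['a', 'b', 'c']
```

For a real MySQL or PostgreSQL engine the call is identical; only the ``create_engine`` URL changes.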
@@ -3413,15 +3435,6 @@ With some databases, writing large DataFrames can result in errors due to packet
     Because of this, reading the database table back in does **not** generate
     a categorical.
 
-.. note::
-
-    You can specify the SQL type of any of the columns by using the dtypes
-    parameter (a dictionary mapping column names to SQLAlchemy types). This
-    can be useful in cases where columns with NULL values are inferred by
-    Pandas to an excessively general datatype (e.g. a boolean column is is
-    inferred to be object because it has NULLs).
-
-
 Reading Tables
 ~~~~~~~~~~~~~~
 

@@ -3782,11 +3795,11 @@ is lost when exporting.
 
 *Stata* only supports string value labels, and so ``str`` is called on the
 categories when exporting data. Exporting ``Categorical`` variables with
-non-string categories produces a warning, and can result a loss of
+non-string categories produces a warning, and can result in a loss of
 information if the ``str`` representations of the categories are not unique.
 
 Labeled data can similarly be imported from *Stata* data files as ``Categorical``
-variables using the keyword argument ``convert_categoricals`` (``True`` by default).
+variables using the keyword argument ``convert_categoricals`` (``True`` by default).
 The keyword argument ``order_categoricals`` (``True`` by default) determines
 whether imported ``Categorical`` variables are ordered.
 
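The ``Categorical``/Stata roundtrip described in this hunk can be sketched as follows. The in-memory buffer is an assumption made only to keep the example self-contained; in practice a filename is passed.

```python
# Sketch of the Stata Categorical roundtrip described above: string value
# labels are written on export, and convert_categoricals=True (the default)
# rebuilds a Categorical on import. The BytesIO buffer and column name
# ('grade') are illustrative assumptions.
import io
import pandas as pd

df = pd.DataFrame({'grade': pd.Categorical(['low', 'high', 'low'])})

buf = io.BytesIO()
df.to_stata(buf, write_index=False)
buf.seek(0)

back = pd.read_stata(buf, convert_categoricals=True)
print(back['grade'].tolist())  # ['low', 'high', 'low']
```

Because the categories here are already strings, no warning is raised and no information is lost on the ``str`` conversion.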

doc/source/whatsnew/v0.15.2.txt (+10 -1)
@@ -96,7 +96,16 @@ API changes
 Enhancements
 ~~~~~~~~~~~~
 
-- Added the ability to specify the SQL type of columns when writing a DataFrame to a database (:issue:`8778`).
+- Added the ability to specify the SQL type of columns when writing a DataFrame
+  to a database (:issue:`8778`).
+  For example, specifying to use the sqlalchemy ``String`` type instead of the
+  default ``Text`` type for string columns:
+
+  .. code-block:: python
+
+     from sqlalchemy.types import String
+     data.to_sql('data_dtype', engine, dtype={'Col_1': String})
+
 - Added ability to export Categorical data to Stata (:issue:`8633`). See :ref:`here <io.stata-categorical>` for limitations of categorical variables exported to Stata data files.
 - Added ability to export Categorical data to/from HDF5 (:issue:`7621`). Queries work the same as if it was an object array. However, the ``category`` dtyped data is stored in a more efficient manner. See :ref:`here <io.hdf5-categorical>` for an example and caveats w.r.t. prior versions of pandas.
 - Added support for ``searchsorted()`` on ``Categorical`` class (:issue:`8420`).

pandas/core/generic.py (+4 -3)
@@ -954,8 +954,9 @@ def to_sql(self, name, con, flavor='sqlite', schema=None, if_exists='fail',
         chunksize : int, default None
             If not None, then rows will be written in batches of this size at a
             time. If None, all rows will be written at once.
-        dtype : Dictionary of column name to SQLAlchemy type, default None
-            Optional datatypes for SQL columns.
+        dtype : dict of column name to SQL type, default None
+            Optionally specify the datatype for columns. The SQL type should
+            be a SQLAlchemy type, or a string for the sqlite3 fallback connection.
 
         """
         from pandas.io import sql
@@ -4128,7 +4129,7 @@ def func(self, axis=None, dtype=None, out=None, skipna=True,
 
             y = _values_from_object(self).copy()
 
-            if skipna and issubclass(y.dtype.type,
+            if skipna and issubclass(y.dtype.type,
                                      (np.datetime64, np.timedelta64)):
                 result = accum_func(y, axis)
                 mask = isnull(self)

pandas/io/sql.py (+9 -6)
@@ -518,8 +518,9 @@ def to_sql(frame, name, con, flavor='sqlite', schema=None, if_exists='fail',
     chunksize : int, default None
         If not None, then rows will be written in batches of this size at a
         time. If None, all rows will be written at once.
-    dtype : dictionary of column name to SQLAchemy type, default None
-        optional datatypes for SQL columns.
+    dtype : dict of column name to SQL type, default None
+        Optionally specify the datatype for columns. The SQL type should
+        be a SQLAlchemy type, or a string for the sqlite3 fallback connection.
 
     """
     if if_exists not in ('fail', 'replace', 'append'):
@@ -1133,8 +1134,9 @@ def to_sql(self, frame, name, if_exists='fail', index=True,
         chunksize : int, default None
             If not None, then rows will be written in batches of this size at a
             time. If None, all rows will be written at once.
-        dtype : dictionary of column name to SQLAlchemy type, default None
-            Optional datatypes for SQL columns.
+        dtype : dict of column name to SQL type, default None
+            Optionally specify the datatype for columns. The SQL type should
+            be a SQLAlchemy type.
 
         """
         if dtype is not None:
@@ -1468,8 +1470,9 @@ def to_sql(self, frame, name, if_exists='fail', index=True,
         chunksize : int, default None
             If not None, then rows will be written in batches of this
            size at a time. If None, all rows will be written at once.
-        dtype : dictionary of column_name to SQLite string type, default None
-            optional datatypes for SQL columns.
+        dtype : dict of column name to SQL type, default None
+            Optionally specify the datatype for columns. The SQL type should
+            be a string.
 
         """
         if dtype is not None:
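The "string for the sqlite3 fallback connection" wording in these docstrings can be illustrated with a plain DBAPI connection. This is a hedged sketch: the table name (``data_dtype``), column names, and the ``VARCHAR(10)`` type string are illustrative assumptions.

```python
# Sketch of the sqlite3 fallback mode referenced in the docstring change:
# with a plain sqlite3 connection (no SQLAlchemy), dtype values are raw
# SQL type strings. Table/column names here are illustrative.
import sqlite3
import pandas as pd

con = sqlite3.connect(':memory:')
data = pd.DataFrame({'Col_1': ['a', 'b'], 'Col_2': [1.5, 2.5]})

# Pass the SQL type as a plain string rather than a SQLAlchemy type object.
data.to_sql('data_dtype', con, index=False, dtype={'Col_1': 'VARCHAR(10)'})

# The declared type shows up in the generated CREATE TABLE statement.
schema = con.execute(
    "SELECT sql FROM sqlite_master WHERE name = 'data_dtype'"
).fetchone()[0]
print(schema)
```

With a SQLAlchemy engine the same call would instead take a type object such as ``sqlalchemy.types.String``.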

0 commit comments