DOC: expand docs on sql type conversion #9038

Merged

Changes from all commits
37 changes: 25 additions & 12 deletions doc/source/io.rst
@@ -3393,12 +3393,34 @@ the database using :func:`~pandas.DataFrame.to_sql`.

data.to_sql('data', engine)

With some databases, writing large DataFrames can result in errors due to
packet size limitations being exceeded. This can be avoided by setting the
``chunksize`` parameter when calling ``to_sql``. For example, the following
writes ``data`` to the database in batches of 1000 rows at a time:

.. ipython:: python

data.to_sql('data_chunked', engine, chunksize=1000)

SQL data types
""""""""""""""

:func:`~pandas.DataFrame.to_sql` will try to map your data to an appropriate
SQL data type based on the dtype of the data. When you have columns of dtype
``object``, pandas will try to infer the data type.
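
The inferred mapping can be inspected before anything is written: a minimal
sketch, using an illustrative in-memory SQLite engine and frame (the names
``engine_mem`` and ``df_types`` are not part of the docs above), where
``pandas.io.sql.get_schema`` returns the ``CREATE TABLE`` statement pandas
would emit:

.. ipython:: python

   import pandas as pd
   from sqlalchemy import create_engine

   # illustrative engine and frame
   engine_mem = create_engine('sqlite:///:memory:')
   df_types = pd.DataFrame({'ints': [1, 2], 'floats': [1.5, 2.5],
                            'strings': ['a', 'b']})

   # the emitted DDL shows the SQL type inferred for each dtype
   print(pd.io.sql.get_schema(df_types, 'inferred', con=engine_mem))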

You can always override the default type by specifying the desired SQL type for
any of the columns via the ``dtype`` argument. This argument takes a dictionary
mapping column names to SQLAlchemy types (or strings for the sqlite3 fallback
mode).
For example, to use the SQLAlchemy ``String`` type instead of the default
``Text`` type for string columns:

.. ipython:: python

from sqlalchemy.types import String
data.to_sql('data_dtype', engine, dtype={'Col_1': String})
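
In the sqlite3 fallback mode (writing through a DBAPI connection rather than
an SQLAlchemy engine), the ``dtype`` values are plain SQL type strings. A
minimal sketch, reusing the ``data`` frame from above with an illustrative
in-memory connection and table name:

.. ipython:: python

   import sqlite3

   con = sqlite3.connect(':memory:')
   # in fallback mode, dtype maps column names to SQL type strings
   data.to_sql('data_dtype_fallback', con, dtype={'Col_1': 'VARCHAR(50)'})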

.. note::

Due to the limited support for timedeltas in the different database
@@ -3413,15 +3435,6 @@ With some databases, writing large DataFrames can result in errors due to packet
Because of this, reading the database table back in does **not** generate
a categorical.
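
A minimal round-trip sketch of this behavior (the frame and table name here
are illustrative): the categories are written out as strings, and the column
is read back as ``object`` rather than ``category``:

.. ipython:: python

   df_cat = pd.DataFrame({'col': pd.Categorical(['a', 'b', 'a'])})
   df_cat.to_sql('cat_roundtrip', engine)
   # the dtype of the returned column is object, not category
   pd.read_sql_table('cat_roundtrip', engine)['col'].dtype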


Reading Tables
~~~~~~~~~~~~~~

@@ -3782,11 +3795,11 @@ is lost when exporting.

*Stata* only supports string value labels, and so ``str`` is called on the
categories when exporting data. Exporting ``Categorical`` variables with
non-string categories produces a warning, and can result in a loss of
information if the ``str`` representations of the categories are not unique.

Labeled data can similarly be imported from *Stata* data files as ``Categorical``
variables using the keyword argument ``convert_categoricals`` (``True`` by default).
The keyword argument ``order_categoricals`` (``True`` by default) determines
whether imported ``Categorical`` variables are ordered.

11 changes: 10 additions & 1 deletion doc/source/whatsnew/v0.15.2.txt
@@ -96,7 +96,16 @@ API changes
Enhancements
~~~~~~~~~~~~

- Added the ability to specify the SQL type of columns when writing a DataFrame
to a database (:issue:`8778`).
For example, to use the SQLAlchemy ``String`` type instead of the default
``Text`` type for string columns:

.. code-block:: python

from sqlalchemy.types import String
data.to_sql('data_dtype', engine, dtype={'Col_1': String})

- Added ability to export Categorical data to Stata (:issue:`8633`). See :ref:`here <io.stata-categorical>` for limitations of categorical variables exported to Stata data files.
- Added ability to export Categorical data to/from HDF5 (:issue:`7621`). Queries work the same as if it were an object array. However, the ``category`` dtyped data is stored in a more efficient manner. See :ref:`here <io.hdf5-categorical>` for an example and caveats w.r.t. prior versions of pandas.
- Added support for ``searchsorted()`` on ``Categorical`` class (:issue:`8420`).
7 changes: 4 additions & 3 deletions pandas/core/generic.py
@@ -954,8 +954,9 @@ def to_sql(self, name, con, flavor='sqlite', schema=None, if_exists='fail',
chunksize : int, default None
If not None, then rows will be written in batches of this size at a
time. If None, all rows will be written at once.
dtype : dict of column name to SQL type, default None
Optionally specify the datatype for columns. The SQL type should
be a SQLAlchemy type, or a string for a sqlite3 fallback connection.

"""
from pandas.io import sql
@@ -4128,7 +4129,7 @@ def func(self, axis=None, dtype=None, out=None, skipna=True,

y = _values_from_object(self).copy()

if skipna and issubclass(y.dtype.type,
(np.datetime64, np.timedelta64)):
result = accum_func(y, axis)
mask = isnull(self)
15 changes: 9 additions & 6 deletions pandas/io/sql.py
@@ -518,8 +518,9 @@ def to_sql(frame, name, con, flavor='sqlite', schema=None, if_exists='fail',
chunksize : int, default None
If not None, then rows will be written in batches of this size at a
time. If None, all rows will be written at once.
dtype : dict of column name to SQL type, default None
Optionally specify the datatype for columns. The SQL type should
be a SQLAlchemy type, or a string for a sqlite3 fallback connection.

"""
if if_exists not in ('fail', 'replace', 'append'):
@@ -1133,8 +1134,9 @@ def to_sql(self, frame, name, if_exists='fail', index=True,
chunksize : int, default None
If not None, then rows will be written in batches of this size at a
time. If None, all rows will be written at once.
dtype : dict of column name to SQL type, default None
Optionally specify the datatype for columns. The SQL type should
be a SQLAlchemy type.

"""
if dtype is not None:
@@ -1468,8 +1470,9 @@ def to_sql(self, frame, name, if_exists='fail', index=True,
chunksize : int, default None
If not None, then rows will be written in batches of this
size at a time. If None, all rows will be written at once.
dtype : dict of column name to SQL type, default None
Optionally specify the datatype for columns. The SQL type should
be a string.

"""
if dtype is not None: