Commit 68cde94

Merge remote-tracking branch 'upstream/master' into disown-tz-only-rebased
2 parents 4d3b55e + c1af4f5 commit 68cde94

21 files changed, with 380 additions and 109 deletions.

doc/source/io.rst

+48
@@ -4989,6 +4989,54 @@ with respect to the timezone.
 timezone aware or naive. When reading ``TIMESTAMP WITH TIME ZONE`` types, pandas
 will convert the data to UTC.

+.. _io.sql.method:
+
+Insertion Method
+++++++++++++++++
+
+.. versionadded:: 0.24.0
+
+The parameter ``method`` controls the SQL insertion clause used.
+Possible values are:
+
+- ``None``: Uses standard SQL ``INSERT`` clause (one per row).
+- ``'multi'``: Pass multiple values in a single ``INSERT`` clause.
+  It uses a *special* SQL syntax not supported by all backends.
+  This usually provides better performance for analytic databases
+  like *Presto* and *Redshift*, but has worse performance for
+  traditional SQL backends if the table contains many columns.
+  For more information check the SQLAlchemy `documentation
+  <http://docs.sqlalchemy.org/en/latest/core/dml.html#sqlalchemy.sql.expression.Insert.values.params.*args>`__.
+- callable with signature ``(pd_table, conn, keys, data_iter)``:
+  This can be used to implement a more performant insertion method based on
+  specific backend dialect features.
+
+Example of a callable using PostgreSQL `COPY clause
+<https://www.postgresql.org/docs/current/static/sql-copy.html>`__::
+
+  # Alternative to_sql() *method* for DBs that support COPY FROM
+  import csv
+  from io import StringIO
+
+  def psql_insert_copy(table, conn, keys, data_iter):
+      # gets a DBAPI connection that can provide a cursor
+      dbapi_conn = conn.connection
+      with dbapi_conn.cursor() as cur:
+          s_buf = StringIO()
+          writer = csv.writer(s_buf)
+          writer.writerows(data_iter)
+          s_buf.seek(0)
+
+          columns = ', '.join('"{}"'.format(k) for k in keys)
+          if table.schema:
+              table_name = '{}.{}'.format(table.schema, table.name)
+          else:
+              table_name = table.name
+
+          sql = 'COPY {} ({}) FROM STDIN WITH CSV'.format(
+              table_name, columns)
+          cur.copy_expert(sql=sql, file=s_buf)
+
 Reading Tables
 ''''''''''''''
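Reviewer note: the difference between the ``None`` and ``'multi'`` insertion styles described in this hunk can be seen without pandas. The sketch below uses only the standard-library ``sqlite3`` module; the table names, column names, and sample rows are made up for illustration, and it shows the shape of the SQL rather than what pandas or SQLAlchemy emit verbatim.

```python
import sqlite3

# Illustrative data; not taken from the commit.
rows = [(1, "a"), (2, "b"), (3, "c")]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t_single (id INTEGER, label TEXT)")
conn.execute("CREATE TABLE t_multi (id INTEGER, label TEXT)")

# Analogue of method=None: one parameterised INSERT, executed once per row.
conn.executemany("INSERT INTO t_single (id, label) VALUES (?, ?)", rows)

# Analogue of method='multi': a single INSERT statement that carries the
# values of every row, so the parameter list is flattened.
placeholders = ", ".join(["(?, ?)"] * len(rows))
flat_params = [value for row in rows for value in row]
multi_sql = "INSERT INTO t_multi (id, label) VALUES " + placeholders
conn.execute(multi_sql, flat_params)

single_result = conn.execute(
    "SELECT id, label FROM t_single ORDER BY id").fetchall()
multi_result = conn.execute(
    "SELECT id, label FROM t_multi ORDER BY id").fetchall()
```

Both tables end up with the same rows. The multi-row form sends one statement whose parameter count grows with rows times columns, which is consistent with the caveat above about tables with many columns.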

doc/source/whatsnew/v0.24.0.rst

+2
@@ -377,6 +377,7 @@ Other Enhancements
 - :meth:`DataFrame.between_time` and :meth:`DataFrame.at_time` have gained the ``axis`` parameter (:issue:`8839`)
 - The ``scatter_matrix``, ``andrews_curves``, ``parallel_coordinates``, ``lag_plot``, ``autocorrelation_plot``, ``bootstrap_plot``, and ``radviz`` plots from the ``pandas.plotting`` module are now accessible from calling :meth:`DataFrame.plot` (:issue:`11978`)
 - :class:`IntervalIndex` has gained the :attr:`~IntervalIndex.is_overlapping` attribute to indicate if the ``IntervalIndex`` contains any overlapping intervals (:issue:`23309`)
+- :func:`pandas.DataFrame.to_sql` has gained the ``method`` argument to control SQL insertion clause. See the :ref:`insertion method <io.sql.method>` section in the documentation. (:issue:`8953`)

 .. _whatsnew_0240.api_breaking:

@@ -1356,6 +1357,7 @@ Datetimelike
 - Bug in :func:`to_datetime` where ``box`` and ``utc`` arguments were ignored when passing a :class:`DataFrame` or ``dict`` of unit mappings (:issue:`23760`)
 - Bug in :attr:`Series.dt` where the cache would not update properly after an in-place operation (:issue:`24408`)
 - Bug in :class:`PeriodIndex` where comparisons against an array-like object with length 1 failed to raise ``ValueError`` (:issue:`23078`)
+- Bug in :meth:`DatetimeIndex.astype`, :meth:`PeriodIndex.astype` and :meth:`TimedeltaIndex.astype` ignoring the sign of the ``dtype`` for unsigned integer dtypes (:issue:`24405`).

 Timedelta
 ^^^^^^^^^

pandas/core/arrays/datetimelike.py

+30-17
@@ -23,7 +23,8 @@
     is_datetime64_dtype, is_datetime64tz_dtype, is_datetime_or_timedelta_dtype,
     is_dtype_equal, is_extension_array_dtype, is_float_dtype, is_integer_dtype,
     is_list_like, is_object_dtype, is_offsetlike, is_period_dtype,
-    is_string_dtype, is_timedelta64_dtype, needs_i8_conversion, pandas_dtype)
+    is_string_dtype, is_timedelta64_dtype, is_unsigned_integer_dtype,
+    needs_i8_conversion, pandas_dtype)
 from pandas.core.dtypes.dtypes import DatetimeTZDtype
 from pandas.core.dtypes.generic import ABCDataFrame, ABCIndexClass, ABCSeries
 from pandas.core.dtypes.inference import is_array_like
@@ -397,7 +398,7 @@ def _ndarray_values(self):
     # ----------------------------------------------------------------
     # Rendering Methods

-    def _format_native_types(self, na_rep=u'NaT', date_format=None):
+    def _format_native_types(self, na_rep='NaT', date_format=None):
         """
         Helper method for astype when converting to strings.

@@ -598,6 +599,11 @@ def astype(self, dtype, copy=True):
             # we deliberately ignore int32 vs. int64 here.
             # See https://github.com/pandas-dev/pandas/issues/24381 for more.
             values = self.asi8
+
+            if is_unsigned_integer_dtype(dtype):
+                # Again, we ignore int32 vs. int64
+                values = values.view("uint64")
+
             if copy:
                 values = values.copy()
             return values
@@ -612,6 +618,28 @@ def astype(self, dtype, copy=True):
         else:
             return np.asarray(self, dtype=dtype)

+    def view(self, dtype=None):
+        """
+        New view on this array with the same data.
+
+        Parameters
+        ----------
+        dtype : numpy dtype, optional
+
+        Returns
+        -------
+        ndarray
+            With the specified `dtype`.
+        """
+        return self._data.view(dtype=dtype)
+
+    # ------------------------------------------------------------------
+    # ExtensionArray Interface
+    # TODO:
+    # * _from_sequence
+    # * argsort / _values_for_argsort
+    # * _reduce
+
     def unique(self):
         result = unique1d(self.asi8)
         return type(self)(result, dtype=self.dtype)
@@ -674,21 +702,6 @@ def _values_for_argsort(self):
     # These are not part of the EA API, but we implement them because
     # pandas currently assumes they're there.

-    def view(self, dtype=None):
-        """
-        New view on this array with the same data.
-
-        Parameters
-        ----------
-        dtype : numpy dtype, optional
-
-        Returns
-        -------
-        ndarray
-            With the specified `dtype`.
-        """
-        return self._data.view(dtype=dtype)
-
     def value_counts(self, dropna=False):
         """
         Return a Series containing counts of unique values.
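Reviewer note: the new ``is_unsigned_integer_dtype`` branch in ``astype`` reinterprets the existing int64 buffer as ``uint64`` via ``values.view("uint64")`` instead of converting values. The sketch below mimics that reinterpretation with the standard-library ``array`` and ``memoryview`` types as a stand-in for ``ndarray.view``; it is not pandas code. It assumes the pandas convention that ``NaT`` is stored as the smallest int64 value, and the sample values are made up.

```python
from array import array

# Assumption: pandas stores NaT as the smallest representable int64.
NAT_SENTINEL = -2**63

# Signed 64-bit storage, analogous to the `asi8` values in the diff.
signed = array("q", [-1, 0, 1, NAT_SENTINEL])

# Reinterpret the same bytes as unsigned 64-bit integers. memoryview.cast
# requires going through a byte format, hence the intermediate "B".
unsigned_view = memoryview(signed).cast("B").cast("Q")
as_unsigned = list(unsigned_view)

# The view shares memory with the original buffer: a write through the
# signed array is visible through the unsigned view.
signed[1] = 7
shares_memory = unsigned_view[1] == 7
```

Because the bits are reinterpreted rather than converted, negative values (including the ``NaT`` sentinel) come out as large positive integers. Since ``view`` alone does not copy, the ``if copy:`` step that follows it in the hunk still matters.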

pandas/core/arrays/datetimes.py

+30-30
@@ -563,6 +563,35 @@ def __iter__(self):
             for v in converted:
                 yield v

+    def astype(self, dtype, copy=True):
+        # We handle
+        # --> datetime
+        # --> period
+        # DatetimeLikeArrayMixin Super handles the rest.
+        dtype = pandas_dtype(dtype)
+
+        if (is_datetime64_ns_dtype(dtype) and
+                not is_dtype_equal(dtype, self.dtype)):
+            # GH#18951: datetime64_ns dtype but not equal means different tz
+            new_tz = getattr(dtype, 'tz', None)
+            if getattr(self.dtype, 'tz', None) is None:
+                return self.tz_localize(new_tz)
+            result = self.tz_convert(new_tz)
+            if new_tz is None:
+                # Do we want .astype('datetime64[ns]') to be an ndarray.
+                # The astype in Block._astype expects this to return an
+                # ndarray, but we could maybe work around it there.
+                result = result._data
+            return result
+        elif is_datetime64tz_dtype(self.dtype) and is_dtype_equal(self.dtype,
+                                                                  dtype):
+            if copy:
+                return self.copy()
+            return self
+        elif is_period_dtype(dtype):
+            return self.to_period(freq=dtype.freq)
+        return dtl.DatetimeLikeArrayMixin.astype(self, dtype, copy)
+
     # ----------------------------------------------------------------
     # ExtensionArray Interface

@@ -581,7 +610,7 @@ def _validate_fill_value(self, fill_value):
     # -----------------------------------------------------------------
     # Rendering Methods

-    def _format_native_types(self, na_rep=u'NaT', date_format=None, **kwargs):
+    def _format_native_types(self, na_rep='NaT', date_format=None, **kwargs):
         from pandas.io.formats.format import _get_format_datetime64_from_values
         fmt = _get_format_datetime64_from_values(self, date_format)

@@ -1095,35 +1124,6 @@ def to_perioddelta(self, freq):
         m8delta = i8delta.view('m8[ns]')
         return TimedeltaArrayMixin(m8delta)

-    def astype(self, dtype, copy=True):
-        # We handle
-        # --> datetime
-        # --> period
-        # Super handles the rest.
-        dtype = pandas_dtype(dtype)
-
-        if (is_datetime64_ns_dtype(dtype) and
-                not is_dtype_equal(dtype, self.dtype)):
-            # GH 18951: datetime64_ns dtype but not equal means different tz
-            new_tz = getattr(dtype, 'tz', None)
-            if getattr(self.dtype, 'tz', None) is None:
-                return self.tz_localize(new_tz)
-            result = self.tz_convert(new_tz)
-            if new_tz is None:
-                # Do we want .astype('datetime64[ns]') to be an ndarray.
-                # The astype in Block._astype expects this to return an
-                # ndarray, but we could maybe work around it there.
-                result = result._data
-            return result
-        elif is_datetime64tz_dtype(self.dtype) and is_dtype_equal(self.dtype,
-                                                                  dtype):
-            if copy:
-                return self.copy()
-            return self
-        elif is_period_dtype(dtype):
-            return self.to_period(freq=dtype.freq)
-        return super(DatetimeArrayMixin, self).astype(dtype, copy)
-
     # -----------------------------------------------------------------
     # Properties - Vectorized Timestamp Properties/Methods

pandas/core/arrays/timedeltas.py

+32-24
@@ -297,16 +297,45 @@ def _validate_fill_value(self, fill_value):
                              "Got '{got}'.".format(got=fill_value))
         return fill_value

+    def astype(self, dtype, copy=True):
+        # We handle
+        # --> timedelta64[ns]
+        # --> timedelta64
+        # DatetimeLikeArrayMixin super call handles other cases
+        dtype = pandas_dtype(dtype)
+
+        if is_timedelta64_dtype(dtype) and not is_timedelta64_ns_dtype(dtype):
+            # by pandas convention, converting to non-nano timedelta64
+            # returns an int64-dtyped array with ints representing multiples
+            # of the desired timedelta unit. This is essentially division
+            if self._hasnans:
+                # avoid double-copying
+                result = self._data.astype(dtype, copy=False)
+                values = self._maybe_mask_results(result,
+                                                  fill_value=None,
+                                                  convert='float64')
+                return values
+            result = self._data.astype(dtype, copy=copy)
+            return result.astype('i8')
+        elif is_timedelta64_ns_dtype(dtype):
+            if copy:
+                return self.copy()
+            return self
+        return dtl.DatetimeLikeArrayMixin.astype(self, dtype, copy=copy)
+
     # ----------------------------------------------------------------
     # Rendering Methods

-    def _format_native_types(self):
-        return self.astype(object)
-
     def _formatter(self, boxed=False):
         from pandas.io.formats.format import _get_format_timedelta64
         return _get_format_timedelta64(self, box=True)

+    def _format_native_types(self, na_rep='NaT', date_format=None):
+        from pandas.io.formats.format import _get_format_timedelta64
+
+        formatter = _get_format_timedelta64(self._data, na_rep)
+        return np.array([formatter(x) for x in self._data])
+
     # ----------------------------------------------------------------
     # Arithmetic Methods

@@ -755,27 +784,6 @@ def to_pytimedelta(self):
         """
         return tslibs.ints_to_pytimedelta(self.asi8)

-    def astype(self, dtype, copy=True):
-        # We handle
-        # --> timedelta64[ns]
-        # --> timedelta64
-        dtype = pandas_dtype(dtype)
-
-        if is_timedelta64_dtype(dtype) and not is_timedelta64_ns_dtype(dtype):
-            # essentially this is division
-            result = self._data.astype(dtype, copy=copy)
-            if self._hasnans:
-                values = self._maybe_mask_results(result,
-                                                  fill_value=None,
-                                                  convert='float64')
-                return values
-            return result.astype('i8')
-        elif is_timedelta64_ns_dtype(dtype):
-            if copy:
-                return self.copy()
-            return self
-        return super(TimedeltaArrayMixin, self).astype(dtype, copy=copy)
-
     days = _field_accessor("days", "days",
                            "Number of days for each element.")
     seconds = _field_accessor("seconds", "seconds",
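Reviewer note: the comment in the relocated ``astype`` says that converting to a non-nanosecond ``timedelta64`` is "essentially division" and that missing values force a ``float64`` result. The pure-Python sketch below illustrates that convention only; ``to_unit``, ``NS_PER_UNIT`` and the sample values are hypothetical names and data, it does not call pandas or numpy, and it uses non-negative durations so that rounding of negative values is deliberately not modelled.

```python
import math

# Assumption: NaT is stored as the smallest int64, as elsewhere in pandas.
NAT = -2**63

# Nanoseconds per target unit (hypothetical helper table).
NS_PER_UNIT = {"us": 1_000, "ms": 1_000_000, "s": 1_000_000_000}


def to_unit(ns_values, unit):
    """Express nanosecond counts as whole multiples of `unit`.

    Without missing values the result stays integral; with any NaT the
    whole result becomes float so that NaN can represent the gap.
    """
    factor = NS_PER_UNIT[unit]
    if any(v == NAT for v in ns_values):
        return [math.nan if v == NAT else float(v // factor)
                for v in ns_values]
    return [v // factor for v in ns_values]


whole = to_unit([1_500_000_000, 3_000_000_000], "s")
with_missing = to_unit([2_000_000_000, NAT], "s")
```

The integer path corresponds to the ``result.astype('i8')`` return in the hunk, and the float path to the ``_maybe_mask_results(..., convert='float64')`` branch taken when ``self._hasnans`` is true.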

pandas/core/dtypes/missing.py

+4-2
@@ -14,7 +14,8 @@
     is_period_dtype, is_scalar, is_string_dtype, is_string_like_dtype,
     is_timedelta64_dtype, needs_i8_conversion, pandas_dtype)
 from .generic import (
-    ABCExtensionArray, ABCGeneric, ABCIndexClass, ABCMultiIndex, ABCSeries)
+    ABCDatetimeArray, ABCExtensionArray, ABCGeneric, ABCIndexClass,
+    ABCMultiIndex, ABCSeries, ABCTimedeltaArray)
 from .inference import is_list_like

 isposinf_scalar = libmissing.isposinf_scalar
@@ -108,7 +109,8 @@ def _isna_new(obj):
     elif isinstance(obj, ABCMultiIndex):
         raise NotImplementedError("isna is not defined for MultiIndex")
     elif isinstance(obj, (ABCSeries, np.ndarray, ABCIndexClass,
-                          ABCExtensionArray)):
+                          ABCExtensionArray,
+                          ABCDatetimeArray, ABCTimedeltaArray)):
         return _isna_ndarraylike(obj)
     elif isinstance(obj, ABCGeneric):
         return obj._constructor(obj._data.isna(func=isna))

pandas/core/generic.py

+13-2
@@ -2386,7 +2386,7 @@ def to_msgpack(self, path_or_buf=None, encoding='utf-8', **kwargs):
                                   **kwargs)

     def to_sql(self, name, con, schema=None, if_exists='fail', index=True,
-               index_label=None, chunksize=None, dtype=None):
+               index_label=None, chunksize=None, dtype=None, method=None):
         """
         Write records stored in a DataFrame to a SQL database.

@@ -2424,6 +2424,17 @@ def to_sql(self, name, con, schema=None, if_exists='fail', index=True,
             Specifying the datatype for columns. The keys should be the column
             names and the values should be the SQLAlchemy types or strings for
             the sqlite3 legacy mode.
+        method : {None, 'multi', callable}, default None
+            Controls the SQL insertion clause used:
+
+            * None : Uses standard SQL ``INSERT`` clause (one per row).
+            * 'multi': Pass multiple values in a single ``INSERT`` clause.
+            * callable with signature ``(pd_table, conn, keys, data_iter)``.
+
+            Details and a sample callable implementation can be found in the
+            section :ref:`insert method <io.sql.method>`.
+
+            .. versionadded:: 0.24.0

         Raises
         ------
@@ -2505,7 +2516,7 @@ def to_sql(self, name, con, schema=None, if_exists='fail', index=True,
         from pandas.io import sql
         sql.to_sql(self, name, con, schema=schema, if_exists=if_exists,
                    index=index, index_label=index_label, chunksize=chunksize,
-                   dtype=dtype)
+                   dtype=dtype, method=method)

     def to_pickle(self, path, compression='infer',
                   protocol=pkl.HIGHEST_PROTOCOL):
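Reviewer note: the docstring above only names the callable signature ``(pd_table, conn, keys, data_iter)``. The sketch below shows one possible callable of that shape and exercises it directly against an in-memory SQLite database so that it runs without pandas. ``insert_executemany``, the ``demo`` table and the ``SimpleNamespace`` stand-in for ``pd_table`` are hypothetical; the sketch assumes only that ``pd_table`` exposes ``name`` and that ``conn`` offers a DBAPI-style ``executemany``.

```python
import sqlite3
from types import SimpleNamespace


def insert_executemany(table, conn, keys, data_iter):
    """Insert rows with one parameterised statement via executemany.

    table     : object with a `name` attribute (stand-in for pd_table)
    conn      : anything with a DBAPI-style executemany(sql, rows)
    keys      : list of column names
    data_iter : iterable of row tuples, ordered like `keys`
    """
    columns = ", ".join('"{}"'.format(k) for k in keys)
    placeholders = ", ".join("?" for _ in keys)
    sql = 'INSERT INTO "{}" ({}) VALUES ({})'.format(
        table.name, columns, placeholders)
    conn.executemany(sql, data_iter)


# Direct call with hypothetical stand-ins instead of going through to_sql().
db = sqlite3.connect(":memory:")
db.execute('CREATE TABLE "demo" ("id" INTEGER, "label" TEXT)')
fake_table = SimpleNamespace(name="demo", schema=None)
insert_executemany(fake_table, db.cursor(), ["id", "label"],
                   iter([(1, "x"), (2, "y")]))
stored = db.execute(
    'SELECT "id", "label" FROM "demo" ORDER BY "id"').fetchall()
```

In real use such a function would be passed as ``df.to_sql(..., method=insert_executemany)``. What ``conn`` actually is depends on the backend, so a production callable should check that against the pandas source for the connection type in use.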

pandas/core/indexes/base.py

+3-2
@@ -712,8 +712,9 @@ def view(self, cls=None):
         Parameters
         ----------
         dtype : numpy dtype or pandas type
-            Note that any integer `dtype` is treated as ``'int64'``,
-            regardless of the sign and size.
+            Note that any signed integer `dtype` is treated as ``'int64'``,
+            and any unsigned integer `dtype` is treated as ``'uint64'``,
+            regardless of the size.
         copy : bool, default True
             By default, astype always returns a newly allocated object.
             If copy is set to False and internal requirements on dtype are
