Commit 6beba0d

API: add DatetimeBlockTZ pandas-dev#8260
fix scalar comparisons vs None generally
fix NaT formatting in Series
TST: skip postgresql test with tz's
update for msgpack

Conflicts:
pandas/core/base.py
pandas/core/categorical.py
pandas/core/format.py
pandas/tests/test_base.py
pandas/util/testing.py

full interop for tz-aware Series & timedeltas pandas-dev#10763
1 parent 63c587d commit 6beba0d

47 files changed: +2217 -884 lines changed

doc/source/basics.rst

+11 -4
@@ -1549,9 +1549,10 @@ dtypes
 ------

 The main types stored in pandas objects are ``float``, ``int``, ``bool``,
-``datetime64[ns]``, ``timedelta[ns]`` and ``object``. In addition these dtypes
-have item sizes, e.g. ``int64`` and ``int32``. A convenient :attr:`~DataFrame.dtypes``
-attribute for DataFrames returns a Series with the data type of each column.
+``datetime64[ns]`` and ``datetime64[ns, tz]`` (in >= 0.17.0), ``timedelta64[ns]``, ``category`` (in >= 0.15.0), and ``object``. In addition these dtypes
+have item sizes, e.g. ``int64`` and ``int32``. See :ref:`Series with TZ <timeseries.timezone_series>` for more detail on ``datetime64[ns, tz]`` dtypes.
+
+A convenient :attr:`~DataFrame.dtypes` attribute for DataFrames returns a Series with the data type of each column.

 .. ipython:: python

@@ -1773,8 +1774,14 @@ dtypes:
    df['tdeltas'] = df.dates.diff()
    df['uint64'] = np.arange(3, 6).astype('u8')
    df['other_dates'] = pd.date_range('20130101', periods=3).values
+   df['tz_aware_dates'] = pd.date_range('20130101', periods=3, tz='US/Eastern')
    df

+And the dtypes
+
+.. ipython:: python
+
+   df.dtypes

 :meth:`~DataFrame.select_dtypes` has two parameters ``include`` and ``exclude`` that allow you to
 say "give me the columns WITH these dtypes" (``include``) and/or "give the
@@ -1827,7 +1834,7 @@ All numpy dtypes are subclasses of ``numpy.generic``:

 .. note::

-    Pandas also defines an additional ``category`` dtype, which is not integrated into the normal
+    Pandas also defines the types ``category`` and ``datetime64[ns, tz]``, which are not integrated into the normal
     numpy hierarchy and wont show up with the above function.

 .. note::
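
The note in this hunk says that ``category`` and ``datetime64[ns, tz]`` sit outside the numpy dtype hierarchy. A minimal sketch of what that means in practice, assuming a recent pandas release rather than the 0.17.0 development tree this commit targets. The ``'datetimetz'`` selector for ``select_dtypes`` comes from later pandas versions and is not part of this change; the column names are made up for illustration.

```python
import numpy as np
import pandas as pd

# Illustrative frame: one tz-naive, one tz-aware, one categorical column.
df = pd.DataFrame({
    "naive": pd.date_range("20130101", periods=3),
    "aware": pd.date_range("20130101", periods=3, tz="US/Eastern"),
    "cat": pd.Categorical(["a", "b", "a"]),
})

# A tz-naive column carries a plain numpy dtype; the other two carry
# pandas-defined dtypes that are not instances of np.dtype.
is_numpy = {col: isinstance(dtype, np.dtype) for col, dtype in df.dtypes.items()}
print(is_numpy)

# Later pandas versions accept 'datetimetz' to pick out tz-aware columns.
aware_cols = df.select_dtypes(include=["datetimetz"]).columns.tolist()
print(aware_cols)
```

Checking ``isinstance(dtype, np.dtype)`` is only one way to tell the two kinds of dtype apart; it is used here because it does not depend on the time unit of the column.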

doc/source/timeseries.rst

+27
@@ -1734,3 +1734,30 @@ constructor as well as ``tz_localize``.

    # tz_convert(None) is identical with tz_convert('UTC').tz_localize(None)
    didx.tz_convert('UCT').tz_localize(None)
+
+.. _timeseries.timezone_series:
+
+TZ aware Dtypes
+~~~~~~~~~~~~~~~
+
+.. versionadded:: 0.17.0
+
+A ``Series`` or ``DatetimeIndex`` with timezone-naive values is represented with a dtype of ``datetime64[ns]``.
+
+.. ipython:: python
+
+   dr = pd.date_range('20130101',periods=3)
+   dr
+   s = Series(dr)
+   s
+
+A ``Series`` or ``DatetimeIndex`` with timezone-aware values is represented with a dtype of ``datetime64[ns, tz]``.
+
+.. ipython:: python
+
+   dr = pd.date_range('20130101',periods=3,tz='US/Eastern')
+   dr
+   s = Series(dr)
+   s
+
+Both of these ``Series`` can be manipulated via the ``.dt`` accessor, see the :ref:`docs <basics.dt_accessors>` as well.
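
The new documentation section above contrasts tz-naive and tz-aware ``Series`` and points at the ``.dt`` accessor. A short sketch of that contrast, assuming a recent pandas release (``pd.DatetimeTZDtype`` is the public name in current pandas; this commit does not show where the class is exposed). The variable names are illustrative.

```python
import pandas as pd

naive = pd.Series(pd.date_range("20130101", periods=3))
aware = pd.Series(pd.date_range("20130101", periods=3, tz="US/Eastern"))

# The tz-aware Series carries a pandas-defined dtype that records the zone.
print(naive.dtype)
print(aware.dtype)

# Both Series expose the .dt accessor. On the tz-aware one it can
# drop the zone (keeping local wall times) or convert to another zone.
wall_times = aware.dt.tz_localize(None)
in_utc = aware.dt.tz_convert("UTC")
print(wall_times.iloc[0], in_utc.iloc[0])
```

``tz_localize(None)`` keeps the local wall-clock values, while ``tz_convert`` keeps the instant in time and changes only how it is displayed.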

doc/source/whatsnew/v0.17.0.txt

+82
@@ -14,6 +14,7 @@ users upgrade to this version.
 Highlights include:

 - Release the Global Interpreter Lock (GIL) on some cython operations, see :ref:`here <whatsnew_0170.gil>`
+- Support for a ``datetime64[ns]`` with timezones as a first-class dtype, see :ref:`here <whatsnew_0170.tz>`
 - The default for ``to_datetime`` will now be to ``raise`` when presented with unparseable formats,
   previously this would return the original input, see :ref:`here <whatsnew_0170.api_breaking.to_datetime>`
 - The default for ``dropna`` in ``HDFStore`` has changed to ``False``, to store by default all rows even
@@ -564,6 +565,84 @@ Removal of prior version deprecations/changes

 - Remove use of some deprecated numpy comparison operations, mainly in tests. (:issue:`10569`)

+.. _dask: https://dask.readthedocs.org/en/latest/
+
+.. _whatsnew_0170.tz:
+
+Datetime with TZ
+~~~~~~~~~~~~~~~~
+
+We are adding an implementation that natively supports datetime with timezones. A ``Series`` or a ``DataFrame`` column previously
+*could* be assigned a datetime with timezones, and would work as an ``object`` dtype. This had performance issues with a large
+number of rows. (:issue:`8260`, :issue:`10763`)
+
+The new implementation allows for having a single timezone across all rows, and operating on it in a performant manner.
+
+.. ipython:: python
+
+   df = DataFrame({'A' : date_range('20130101',periods=3),
+                   'B' : date_range('20130101',periods=3,tz='US/Eastern'),
+                   'C' : date_range('20130101',periods=3,tz='CET')})
+   df
+   df.dtypes
+
+.. ipython:: python
+
+   df.B
+   df.B.dt.tz_localize(None)
+
+This uses a new-dtype representation as well, that is very similar in look-and-feel to its numpy cousin ``datetime64[ns]``
+
+.. ipython:: python
+
+   df['B'].dtype
+   type(df['B'].dtype)
+
+.. note::
+
+   There is a slightly different string repr for the underlying ``DatetimeIndex`` as a result of the dtype changes, but
+   functionally these are the same.
+
+   Previously
+
+   .. code-block:: python
+
+      In [1]: pd.date_range('20130101',periods=3,tz='US/Eastern')
+      Out[1]: DatetimeIndex(['2013-01-01 00:00:00-05:00', '2013-01-02 00:00:00-05:00',
+                             '2013-01-03 00:00:00-05:00'],
+                            dtype='datetime64[ns]', freq='D', tz='US/Eastern')
+
+      In [2]: pd.date_range('20130101',periods=3,tz='US/Eastern').dtype
+      Out[2]: dtype('<M8[ns]')
+
+   .. ipython:: python
+
+      pd.date_range('20130101',periods=3,tz='US/Eastern')
+      pd.date_range('20130101',periods=3,tz='US/Eastern').dtype
+
+explain ``DatetimeIndex`` repr change & dtype display
+
+
+.. _whatsnew_0170.gil:
+
+Releasing the GIL
+~~~~~~~~~~~~~~~~~
+
+We are releasing the global-interpreter-lock (GIL) on some cython operations.
+This will allow other threads to run simultaneously during computation, potentially allowing performance improvements
+from multi-threading. Notably ``groupby`` and some indexing operations benefit from this. (:issue:`8882`)
+
+For example the groupby expression in the following code will have the GIL released during the factorization step, e.g. ``df.groupby('key')``
+as well as the ``.sum()`` operation.
+
+.. code-block:: python
+
+   N, ngroups = 1000000, 10  # sizes are illustrative
+   df = DataFrame({'key' : np.random.randint(0,ngroups,size=N),
+                   'data' : np.random.randn(N) })
+   df.groupby('key')['data'].sum()
+
+Releasing the GIL could benefit an application that uses threads for user interactions (e.g. ``QT``), or performs multi-threaded computations. A nice example of a library that can handle these types of computation-in-parallel is the dask_ library.

 .. _whatsnew_0170.performance:

@@ -587,6 +666,9 @@ Bug Fixes


 - Bug in ``DataFrame.to_html(index=False)`` renders unnecessary ``name`` row (:issue:`10344`)
+- Bug in ``DatetimeIndex`` when localizing with ``NaT`` (:issue:`10477`)
+- Bug in ``Series.dt`` ops in preserving meta-data (:issue:`10477`)
+- Bug in preserving ``NaT`` when passed in an otherwise invalid ``to_datetime`` construction (:issue:`10477`)
 - Bug in ``DataFrame.apply`` when function returns categorical series. (:issue:`9573`)
 - Bug in ``to_datetime`` with invalid dates and formats supplied (:issue:`10154`)
 - Bug in ``Index.drop_duplicates`` dropping name(s) (:issue:`10115`)
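
The "Releasing the GIL" entry in the whatsnew file above describes threads running alongside a ``groupby`` computation. A minimal sketch of that usage pattern, using the standard-library ``ThreadPoolExecutor``. The helper name ``group_sum``, the frame sizes, and the worker count are all illustrative assumptions, and the sketch makes no claim about how much speedup, if any, a given pandas version delivers.

```python
from concurrent.futures import ThreadPoolExecutor

import numpy as np
import pandas as pd


def group_sum(seed, n_rows=100_000, n_groups=10):
    """Build a random frame and sum 'data' per 'key' (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    df = pd.DataFrame({
        "key": rng.integers(0, n_groups, size=n_rows),
        "data": rng.standard_normal(n_rows),
    })
    return df.groupby("key")["data"].sum()


# Run several independent groupby reductions on worker threads. Sections of
# the computation that release the GIL can overlap across threads.
with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(group_sum, range(4)))

print(len(results))
```

Each task builds its own frame, so the threads share no mutable pandas objects.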

pandas/core/algorithms.py

+17 -5
@@ -206,7 +206,7 @@ def value_counts(values, sort=True, ascending=False, normalize=False,
     """
     from pandas.core.series import Series
     from pandas.tools.tile import cut
-    from pandas.tseries.period import PeriodIndex
+    from pandas import Index, PeriodIndex, DatetimeIndex

     name = getattr(values, 'name', None)
     values = Series(values).values
@@ -225,11 +225,15 @@ def value_counts(values, sort=True, ascending=False, normalize=False,

         dtype = values.dtype
         is_period = com.is_period_arraylike(values)
+        is_datetimetz = com.is_datetimetz(values)

-        if com.is_datetime_or_timedelta_dtype(dtype) or is_period:
+        if com.is_datetime_or_timedelta_dtype(dtype) or is_period or is_datetimetz:

             if is_period:
-                values = PeriodIndex(values, name=name)
+                values = PeriodIndex(values)
+            elif is_datetimetz:
+                tz = getattr(values, 'tz', None)
+                values = DatetimeIndex(values).tz_localize(None)

             values = values.view(np.int64)
             keys, counts = htable.value_count_int64(values)
@@ -239,8 +243,14 @@ def value_counts(values, sort=True, ascending=False, normalize=False,
                 msk = keys != iNaT
                 keys, counts = keys[msk], counts[msk]

+            # localize to the original tz if necessary
+            if is_datetimetz:
+                keys = DatetimeIndex(keys).tz_localize(tz)
+
             # convert the keys back to the dtype we came in
-            keys = keys.astype(dtype)
+            else:
+                keys = keys.astype(dtype)
+

         elif com.is_integer_dtype(dtype):
             values = com._ensure_int64(values)
@@ -254,7 +264,9 @@ def value_counts(values, sort=True, ascending=False, normalize=False,
                 keys = np.insert(keys, 0, np.NaN)
                 counts = np.insert(counts, 0, mask.sum())

-        result = Series(counts, index=com._values_from_object(keys), name=name)
+        if not isinstance(keys, Index):
+            keys = Index(keys)
+        result = Series(counts, index=keys, name=name)

         if bins is not None:
             # TODO: This next line should be more efficient
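
The ``value_counts`` hunks above strip the timezone, count on the int64 representation, and then localize the keys back to the original zone. A sketch of the user-visible effect plus a hand-rolled version of the same round trip, assuming a recent pandas release. It swaps ``np.unique`` in for the internal ``htable.value_count_int64`` call, so it illustrates the idea and is not the code path pandas runs.

```python
import numpy as np
import pandas as pd

stamps = pd.to_datetime(["2013-01-01", "2013-01-01", "2013-01-02"])
s = pd.Series(stamps.tz_localize("US/Eastern"))

# User-visible behaviour: the index of the result keeps the timezone.
counts = s.value_counts()
print(counts.index.tz)

# Hand-rolled round trip (np.unique stands in for the internal hashtable).
idx = pd.DatetimeIndex(s)
tz = idx.tz
wall_ns = idx.tz_localize(None).values.astype("datetime64[ns]").view("int64")
keys, n = np.unique(wall_ns, return_counts=True)
restored = pd.DatetimeIndex(keys).tz_localize(tz)  # back to the original zone
manual = pd.Series(n, index=restored)
print(manual)
```

The sample dates avoid daylight-saving transitions; re-localizing wall times that fall in a transition would need the ``ambiguous`` or ``nonexistent`` options of ``tz_localize``.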

pandas/core/base.py

+9 -1
@@ -396,6 +396,14 @@ def hasnans(self):
         """ return if I have any nans; enables various perf speedups """
         return com.isnull(self).any()

+    def _reduce(self, op, name, axis=0, skipna=True, numeric_only=None,
+                filter_type=None, **kwds):
+        """ perform the reduction type operation if we can """
+        func = getattr(self,name,None)
+        if func is None:
+            raise TypeError("{klass} cannot perform the operation {op}".format(klass=self.__class__.__name__,op=name))
+        return func(**kwds)
+
     def value_counts(self, normalize=False, sort=True, ascending=False,
                      bins=None, dropna=True):
         """
@@ -585,7 +593,7 @@ def drop_duplicates(self, keep='first', inplace=False):
     @deprecate_kwarg('take_last', 'keep', mapping={True: 'last', False: 'first'})
     @Appender(_shared_docs['duplicated'] % _indexops_doc_kwargs)
     def duplicated(self, keep='first'):
-        keys = com._ensure_object(self.values)
+        keys = com._values_from_object(com._ensure_object(self.values))
         duplicated = lib.duplicated(keys, keep=keep)
         try:
             return self._constructor(duplicated,
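
The ``_reduce`` fallback added to ``base.py`` above looks a reduction up by name on the object and raises ``TypeError`` when no such method exists. A stand-alone sketch of that dispatch pattern, using a hypothetical ``ReducibleBox`` class that is not part of pandas and a simplified signature.

```python
class ReducibleBox:
    """Hypothetical container, used only to illustrate name-based dispatch."""

    def __init__(self, values):
        self.values = list(values)

    def max(self):
        return max(self.values)

    def _reduce(self, name, **kwds):
        # Look the reduction up by name; fail loudly if it is not defined.
        func = getattr(self, name, None)
        if func is None:
            raise TypeError("{klass} cannot perform the operation {op}".format(
                klass=type(self).__name__, op=name))
        return func(**kwds)


box = ReducibleBox([3, 1, 2])
print(box._reduce("max"))

try:
    box._reduce("median")
except TypeError as err:
    print(err)
```

Dispatching through ``getattr`` lets each subclass opt in to a reduction simply by defining a method with the matching name.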

pandas/core/categorical.py

+2 -1
@@ -12,13 +12,14 @@
 import pandas.core.common as com
 from pandas.util.decorators import cache_readonly, deprecate_kwarg

-from pandas.core.common import (CategoricalDtype, ABCSeries, ABCIndexClass, ABCCategoricalIndex,
+from pandas.core.common import (ABCSeries, ABCIndexClass, ABCPeriodIndex, ABCCategoricalIndex,
                                 isnull, notnull, is_dtype_equal,
                                 is_categorical_dtype, is_integer_dtype, is_object_dtype,
                                 _possibly_infer_to_datetimelike, get_dtype_kinds,
                                 is_list_like, is_sequence, is_null_slice, is_bool,
                                 _ensure_platform_int, _ensure_object, _ensure_int64,
                                 _coerce_indexer_dtype, take_1d)
+from pandas.core.dtypes import CategoricalDtype
 from pandas.util.terminal import get_terminal_size
 from pandas.core.config import get_option

