Commit 4eb7126

Merge remote-tracking branch 'upstream/master'
2 parents 5626cdd + 4c3d4d4 commit 4eb7126

13 files changed (+274, -70 lines)

asv_bench/benchmarks/period.py (+25)

@@ -49,3 +49,28 @@ def time_value_counts_pindex(self):
         self.i.value_counts()


+class period_standard_indexing(object):
+    goal_time = 0.2
+
+    def setup(self):
+        self.index = PeriodIndex(start='1985', periods=1000, freq='D')
+        self.series = Series(range(1000), index=self.index)
+        self.period = self.index[500]
+
+    def time_get_loc(self):
+        self.index.get_loc(self.period)
+
+    def time_shape(self):
+        self.index.shape
+
+    def time_shallow_copy(self):
+        self.index._shallow_copy()
+
+    def time_series_loc(self):
+        self.series.loc[self.period]
+
+    def time_align(self):
+        pd.DataFrame({'a': self.series, 'b': self.series[:500]})
+
+    def time_intersection(self):
+        self.index[:750].intersection(self.index[250:])
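The new benchmark class follows the asv convention of a `setup` method plus methods whose names start with `time_`. The sketch below is a hypothetical, stdlib-only approximation of that convention. It is not asv's runner, and `run_time_benchmarks` and `ToyIndexBenchmark` are made-up names. It calls `setup` and then times each `time_*` method with `timeit`. A plain list stands in for the `PeriodIndex` so the example runs without pandas.

```python
import timeit


def run_time_benchmarks(cls, number=100):
    """Hypothetical mini-runner mimicking the asv naming convention.

    For each ``time_*`` method: create a fresh instance, call ``setup()``,
    then time the method. Returns {method name: average seconds per call}.
    """
    results = {}
    for name in sorted(dir(cls)):
        if not name.startswith("time_"):
            continue
        bench = cls()
        if hasattr(bench, "setup"):
            bench.setup()
        method = getattr(bench, name)
        total = timeit.timeit(method, number=number)
        results[name] = total / number
    return results


class ToyIndexBenchmark(object):
    """Stand-in for period_standard_indexing that uses a list, not pandas."""

    def setup(self):
        self.index = list(range(1000))
        self.key = self.index[500]

    def time_get_loc(self):
        self.index.index(self.key)  # analogous to index.get_loc(period)

    def time_shape(self):
        (len(self.index),)  # analogous to index.shape


if __name__ == "__main__":
    for name, secs in run_time_benchmarks(ToyIndexBenchmark).items():
        print(f"{name}: {secs * 1e6:.2f} us per call")
```

asv itself also handles repetition, warm-up and attributes such as `goal_time`. None of that is modeled here.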

doc/source/timeseries.rst (+100, -37)

@@ -358,8 +358,8 @@ See :ref:`here <timeseries.oob>` for ways to represent data outside these bound.

 .. _timeseries.datetimeindex:

-DatetimeIndex
--------------
+Indexing
+--------

 One of the main uses for ``DatetimeIndex`` is as an index for pandas objects.
 The ``DatetimeIndex`` class contains many timeseries related optimizations:
@@ -399,8 +399,8 @@ intelligent functionality like selection, slicing, etc.

 .. _timeseries.partialindexing:

-DatetimeIndex Partial String Indexing
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+Partial String Indexing
+~~~~~~~~~~~~~~~~~~~~~~~

 You can pass in dates and strings that parse to dates as indexing parameters:

@@ -457,22 +457,6 @@ We are stopping on the included end-point as it is part of the index

    dft['2013-1-15':'2013-1-15 12:30:00']

-.. warning::
-
-   The following selection will raise a ``KeyError``; otherwise this selection methodology
-   would be inconsistent with other selection methods in pandas (as this is not a *slice*, nor does it
-   resolve to one)
-
-   .. code-block:: python
-
-      dft['2013-1-15 12:30:00']
-
-   To select a single row, use ``.loc``
-
-   .. ipython:: python
-
-      dft.loc['2013-1-15 12:30:00']
-
 .. versionadded:: 0.18.0

 DatetimeIndex Partial String Indexing also works on DataFrames with a ``MultiIndex``. For example:
@@ -491,12 +475,86 @@ DatetimeIndex Partial String Indexing also works on DataFrames with a ``MultiInd
    dft2 = dft2.swaplevel(0, 1).sort_index()
    dft2.loc[idx[:, '2013-01-05'], :]

-Datetime Indexing
-~~~~~~~~~~~~~~~~~
+.. _timeseries.slice_vs_exact_match:
+
+Slice vs. exact match
+~~~~~~~~~~~~~~~~~~~~~
+
+.. versionchanged:: 0.20.0
+
+The same string used as an indexing parameter can be treated either as a slice or as an exact match depending on the resolution of the index. If the string is less accurate than the index, it will be treated as a slice, otherwise as an exact match.
+
+For example, let us consider a ``Series`` object whose index has minute resolution.
+
+.. ipython:: python
+
+   series_minute = pd.Series([1, 2, 3],
+                             pd.DatetimeIndex(['2011-12-31 23:59:00',
+                                               '2012-01-01 00:00:00',
+                                               '2012-01-01 00:02:00']))
+   series_minute.index.resolution
+
+A Timestamp string less accurate than a minute gives a ``Series`` object.
+
+.. ipython:: python
+
+   series_minute['2011-12-31 23']
+
+A Timestamp string with minute resolution (or more accurate) gives a scalar instead, i.e. it is not cast to a slice.
+
+.. ipython:: python
+
+   series_minute['2011-12-31 23:59']
+   series_minute['2011-12-31 23:59:00']
+
+If the index resolution is second, then a minute-accurate timestamp gives a ``Series``.

-Indexing a ``DateTimeIndex`` with a partial string depends on the "accuracy" of the period, in other words how specific the interval is in relation to the frequency of the index. In contrast, indexing with datetime objects is exact, because the objects have exact meaning. These also follow the semantics of *including both endpoints*.
+.. ipython:: python
+
+   series_second = pd.Series([1, 2, 3],
+                             pd.DatetimeIndex(['2011-12-31 23:59:59',
+                                               '2012-01-01 00:00:00',
+                                               '2012-01-01 00:00:01']))
+   series_second.index.resolution
+   series_second['2011-12-31 23:59']
+
+If the timestamp string is treated as a slice, it can be used to index a ``DataFrame`` with ``[]`` as well.
+
+.. ipython:: python
+
+   dft_minute = pd.DataFrame({'a': [1, 2, 3], 'b': [4, 5, 6]},
+                             index=series_minute.index)
+   dft_minute['2011-12-31 23']
+
+
+.. warning::
+
+   However, if the string is treated as an exact match, the selection in ``DataFrame``'s ``[]`` will be column-wise and not row-wise, see :ref:`Indexing Basics <indexing.basics>`. For example, ``dft_minute['2011-12-31 23:59']`` will raise ``KeyError`` as ``'2011-12-31 23:59'`` has the same resolution as the index and there is no column with such a name.
+
+   To select a single row, use ``.loc``.
+
+   .. ipython:: python
+
+      dft_minute.loc['2011-12-31 23:59']
+
+Note also that ``DatetimeIndex`` resolution cannot be less precise than day.

-These ``datetime`` objects are specific ``hours, minutes,`` and ``seconds`` even though they were not explicitly specified (they are ``0``).
+.. ipython:: python
+
+   series_monthly = pd.Series([1, 2, 3],
+                              pd.DatetimeIndex(['2011-12',
+                                                '2012-01',
+                                                '2012-02']))
+   series_monthly.index.resolution
+   series_monthly['2011-12'] # returns Series
+
+
+Exact Indexing
+~~~~~~~~~~~~~~
+
+As discussed in the previous section, indexing a ``DatetimeIndex`` with a partial string depends on the "accuracy" of the period, in other words how specific the interval is in relation to the resolution of the index. In contrast, indexing with ``Timestamp`` or ``datetime`` objects is exact, because the objects have exact meaning. These also follow the semantics of *including both endpoints*.
+
+These ``Timestamp`` and ``datetime`` objects have exact ``hours, minutes,`` and ``seconds``, even though they were not explicitly specified (they are ``0``).

 .. ipython:: python

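The rule in the "Slice vs. exact match" section can be sketched in plain Python. This is an illustration under stated assumptions, not pandas' implementation. `parse_with_resolution`, `interval_end` and `lookup` are hypothetical helpers. Only the six resolutions from year to second are handled. A slice is modeled as the half-open interval `[start, end)` that the string covers.

```python
from datetime import datetime, timedelta

# Resolutions from coarsest to finest (names mirror the docs above).
RESOLUTIONS = ["year", "month", "day", "hour", "minute", "second"]

# strptime format that fully parses a timestamp string of each resolution.
FORMATS = {
    "year": "%Y",
    "month": "%Y-%m",
    "day": "%Y-%m-%d",
    "hour": "%Y-%m-%d %H",
    "minute": "%Y-%m-%d %H:%M",
    "second": "%Y-%m-%d %H:%M:%S",
}


def parse_with_resolution(key):
    """Return (start, resolution) for the first format that parses all of key."""
    for reso in RESOLUTIONS:
        try:
            return datetime.strptime(key, FORMATS[reso]), reso
        except ValueError:  # wrong shape or leftover characters
            continue
    raise ValueError(f"unsupported timestamp string: {key!r}")


def interval_end(start, reso):
    """Exclusive end of the period covered by a string of resolution reso."""
    if reso == "year":
        return start.replace(year=start.year + 1)
    if reso == "month":
        if start.month == 12:
            return start.replace(year=start.year + 1, month=1)
        return start.replace(month=start.month + 1)
    steps = {"day": timedelta(days=1), "hour": timedelta(hours=1),
             "minute": timedelta(minutes=1), "second": timedelta(seconds=1)}
    return start + steps[reso]


def lookup(index, key, index_reso):
    """List of positions (slice) if key is coarser than the index, else one position."""
    start, key_reso = parse_with_resolution(key)
    if RESOLUTIONS.index(key_reso) < RESOLUTIONS.index(index_reso):
        # Less accurate than the index: every timestamp inside the period matches.
        end = interval_end(start, key_reso)
        return [i for i, ts in enumerate(index) if start <= ts < end]
    # As accurate as the index or more: exact match, KeyError if absent.
    try:
        return index.index(start)
    except ValueError:
        raise KeyError(key) from None
```

Take the minute-resolution index from the docs. `'2011-12-31 23'` is coarser than the index, so `lookup` returns a list of positions (the `Series` case). `'2011-12-31 23:59'` returns a single position (the scalar case), or raises `KeyError` if that minute is absent.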
@@ -525,10 +583,10 @@ regularity will result in a ``DatetimeIndex`` (but frequency is lost):

    ts[[0, 2, 6]].index

-.. _timeseries.offsets:
+.. _timeseries.components:

 Time/Date Components
-~~~~~~~~~~~~~~~~~~~~~~~~~~~
+--------------------

 There are several time/date properties that one can access from ``Timestamp`` or a collection of timestamps like a ``DateTimeIndex``.

@@ -564,6 +622,8 @@ There are several time/date properties that one can access from ``Timestamp`` or

 Furthermore, if you have a ``Series`` with datetimelike values, then you can access these properties via the ``.dt`` accessor, see the :ref:`docs <basics.dt_accessors>`

+.. _timeseries.offsets:
+
 DateOffset objects
 ------------------

@@ -628,12 +688,12 @@ We could have done the same thing with ``DateOffset``:

 The key features of a ``DateOffset`` object are:

-- it can be added / subtracted to/from a datetime object to obtain a
-  shifted date
-- it can be multiplied by an integer (positive or negative) so that the
-  increment will be applied multiple times
-- it has ``rollforward`` and ``rollback`` methods for moving a date forward
-  or backward to the next or previous "offset date"
+- it can be added / subtracted to/from a datetime object to obtain a
+  shifted date
+- it can be multiplied by an integer (positive or negative) so that the
+  increment will be applied multiple times
+- it has ``rollforward`` and ``rollback`` methods for moving a date forward
+  or backward to the next or previous "offset date"

 Subclasses of ``DateOffset`` define the ``apply`` function which dictates
 custom date increment logic, such as adding business days:
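The three key features listed above can be illustrated with a toy Monday-to-Friday offset built on the standard library. `ToyBusinessDay` is hypothetical and is not pandas' `BDay`. Its `__radd__` plays roughly the role that the docs assign to a subclass's `apply` method.

```python
from datetime import date, timedelta


class ToyBusinessDay(object):
    """Hypothetical Monday-Friday offset showing the three DateOffset features."""

    def __init__(self, n=1):
        self.n = n  # number of business days to move (may be negative)

    @staticmethod
    def on_offset(d):
        return d.weekday() < 5  # Monday=0 ... Friday=4

    # Feature 2: multiplying by an integer repeats the increment.
    def __mul__(self, k):
        return ToyBusinessDay(self.n * k)

    __rmul__ = __mul__

    def __neg__(self):
        return ToyBusinessDay(-self.n)

    # Feature 1: date + offset and date - offset give a shifted date.
    def __radd__(self, d):
        # Step one calendar day at a time, counting only weekdays.
        step = timedelta(days=1 if self.n >= 0 else -1)
        remaining = abs(self.n)
        while remaining:
            d += step
            if self.on_offset(d):
                remaining -= 1
        return d

    def __rsub__(self, d):
        return d + (-self)

    # Feature 3: roll a date onto the next / previous "offset date".
    def rollforward(self, d):
        while not self.on_offset(d):
            d += timedelta(days=1)
        return d

    def rollback(self, d):
        while not self.on_offset(d):
            d -= timedelta(days=1)
        return d
```

For example, Friday 5 January 2024 plus one `ToyBusinessDay()` lands on Monday 8 January. Rolling Saturday 6 January forward also gives Monday, and rolling it back gives Friday.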
@@ -745,7 +805,7 @@ used exactly like a ``Timedelta`` - see the

 Note that some offsets (such as ``BQuarterEnd``) do not have a
 vectorized implementation. They can still be used but may
-calculate significantly slower and will raise a ``PerformanceWarning``
+calculate significantly slower and will show a ``PerformanceWarning``

 .. ipython:: python
    :okwarning:
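The wording change from "raise" to "show" matters. A warning is emitted and the computation still completes, unless the user turns warnings into errors. Below is a minimal sketch with the `warnings` module. `PerformanceWarning` is a local stand-in class, not pandas' own, and `add_offset_elementwise` is a hypothetical non-vectorized path.

```python
import warnings
from datetime import date, timedelta


class PerformanceWarning(Warning):
    """Local stand-in for pandas' PerformanceWarning (hypothetical here)."""


def add_offset_elementwise(dates, days):
    """Hypothetical slow path: warn once, then still return the result."""
    warnings.warn("Non-vectorized offset being applied elementwise",
                  PerformanceWarning, stacklevel=2)
    return [d + timedelta(days=days) for d in dates]


if __name__ == "__main__":
    # The warning is printed to stderr; the result is still returned.
    print(add_offset_elementwise([date(2024, 1, 5)], 3))
```

Calling `warnings.simplefilter("error", PerformanceWarning)` beforehand would turn the same call into an exception, which is the behavior "raise" would have implied.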
@@ -755,8 +815,8 @@ calculate significantly slower and will raise a ``PerformanceWarning``

 .. _timeseries.custombusinessdays:

-Custom Business Days (Experimental)
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+Custom Business Days
+~~~~~~~~~~~~~~~~~~~~

 The ``CDay`` or ``CustomBusinessDay`` class provides a parametric
 ``BusinessDay`` class which can be used to create customized business day
@@ -785,7 +845,7 @@ Let's map to the weekday names

    pd.Series(dts.weekday, dts).map(pd.Series('Mon Tue Wed Thu Fri Sat Sun'.split()))

-As of v0.14 holiday calendars can be used to provide the list of holidays. See the
+Holiday calendars can be used to provide the list of holidays. See the
 :ref:`holiday calendar<timeseries.holiday>` section for more information.

 .. ipython:: python
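A custom business day is parameterized by which weekdays count and which dates are holidays. A holiday calendar is one way to supply the latter. The function below is a hypothetical stdlib sketch of that idea, not pandas code. `custom_bday_add` and its space-separated `weekmask` string are assumptions, modeled loosely on the `weekmask` and `holidays` parameters of `CustomBusinessDay`.

```python
from datetime import date, timedelta

DAY_NAMES = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]  # date.weekday() order


def custom_bday_add(d, n, weekmask="Mon Tue Wed Thu Fri", holidays=()):
    """Hypothetical custom business-day shift.

    A day is a business day if its name is in ``weekmask`` and it is not in
    ``holidays``. ``d`` is moved by ``n`` such days (``n`` may be negative).
    """
    valid_days = set(weekmask.split())
    holiday_set = set(holidays)
    step = timedelta(days=1 if n >= 0 else -1)
    remaining = abs(n)
    while remaining:
        d += step
        if DAY_NAMES[d.weekday()] in valid_days and d not in holiday_set:
            remaining -= 1
    return d


if __name__ == "__main__":
    # Sunday-Thursday working week with 1 May 2013 as a holiday.
    print(custom_bday_add(date(2013, 4, 30), 2, "Sun Mon Tue Wed Thu",
                          [date(2013, 5, 1)]))
```

Starting from Tuesday 30 April 2013, the holiday (Wednesday 1 May) and the Friday/Saturday weekend are skipped. The second business day is Sunday 5 May.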
@@ -1289,12 +1349,15 @@ limited to, financial applications.
 See some :ref:`cookbook examples <cookbook.resample>` for some advanced strategies

 Starting in version 0.18.1, the ``resample()`` function can be used directly from
-DataFrameGroupBy objects, see the :ref:`groupby docs <groupby.transform.window_resample>`.
+``DataFrameGroupBy`` objects, see the :ref:`groupby docs <groupby.transform.window_resample>`.

 .. note::

    ``.resample()`` is similar to using a ``.rolling()`` operation with a time-based offset, see a discussion :ref:`here <stats.moments.ts-versus-resampling>`

+Basics
+~~~~~~
+
 .. ipython:: python

    rng = pd.date_range('1/1/2012', periods=100, freq='S')
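The "Basics" subsection starts from one-second data (`rng` above). The hypothetical `resample_sum` below is a rough stdlib sketch of what downsampling does. It groups timestamps into fixed-width bins, labels each bin by its left edge, and sums it. Two choices here are assumptions of the sketch: the bin origin (midnight of the first day) and the left-closed, left-labeled bins. Unlike pandas, empty bins are simply absent from the result.

```python
from datetime import datetime, timedelta


def resample_sum(timestamps, values, width):
    """Hypothetical downsampling: sum ``values`` into bins of length ``width``.

    Bins start at midnight of the first timestamp's day, are closed on the
    left ([edge, edge + width)) and are labeled by their left edge.
    """
    origin = timestamps[0].replace(hour=0, minute=0, second=0, microsecond=0)
    bins = {}
    for ts, value in zip(timestamps, values):
        k = (ts - origin) // width          # integer bin number
        edge = origin + k * width           # left edge used as the label
        bins[edge] = bins.get(edge, 0) + value
    return dict(sorted(bins.items()))       # labels in chronological order


if __name__ == "__main__":
    start = datetime(2012, 1, 1)
    stamps = [start + timedelta(seconds=i) for i in range(100)]
    print(resample_sum(stamps, [1] * 100, timedelta(minutes=1)))
```

Take 100 one-second observations of value 1 starting at midnight. One-minute bins hold 60 and 40, and a five-minute bin holds all 100.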

doc/source/whatsnew/v0.19.2.txt (+1)

@@ -22,6 +22,7 @@ Performance Improvements
 ~~~~~~~~~~~~~~~~~~~~~~~~

 - Improved performance of ``.replace()`` (:issue:`12745`)
+- Improved performance of ``PeriodIndex`` (:issue:`14822`)
 - Improved performance ``Series`` creation with a datetime index and dictionary data (:issue:`14894`)

 .. _whatsnew_0192.enhancements.other:

doc/source/whatsnew/v0.20.0.txt (+31, -3)

@@ -193,14 +193,42 @@ in prior versions of pandas) (:issue:`11915`).

 .. _whatsnew_0200.api:

+Other API Changes
+^^^^^^^^^^^^^^^^^
+
 - ``CParserError`` has been renamed to ``ParserError`` in ``pd.read_csv`` and will be removed in the future (:issue:`12665`)
 - ``SparseArray.cumsum()`` and ``SparseSeries.cumsum()`` will now always return ``SparseArray`` and ``SparseSeries`` respectively (:issue:`12855`)
+- :ref:`DatetimeIndex Partial String Indexing <timeseries.partialindexing>` now works as an exact match provided that the string resolution coincides with the index resolution, including the case when both are seconds (:issue:`14826`). See :ref:`Slice vs. Exact Match <timeseries.slice_vs_exact_match>` for details.

+.. ipython:: python

+   df = DataFrame({'a': [1, 2, 3]}, DatetimeIndex(['2011-12-31 23:59:59',
+                                                   '2012-01-01 00:00:00',
+                                                   '2012-01-01 00:00:01']))
+Previous Behavior:

+.. code-block:: ipython

-Other API Changes
-^^^^^^^^^^^^^^^^^
+   In [4]: df['2011-12-31 23:59:59']
+   Out[4]:
+                          a
+   2011-12-31 23:59:59    1
+
+   In [5]: df['a']['2011-12-31 23:59:59']
+   Out[5]:
+   2011-12-31 23:59:59    1
+   Name: a, dtype: int64
+
+
+New Behavior:
+
+.. code-block:: ipython
+
+   In [4]: df['2011-12-31 23:59:59']
+   KeyError: '2011-12-31 23:59:59'
+
+   In [5]: df['a']['2011-12-31 23:59:59']
+   Out[5]: 1

 .. _whatsnew_0200.deprecations:

@@ -253,7 +281,7 @@ Bug Fixes




-
+- Bug in ``Series`` construction with a datetimetz (:issue:`14928`)




pandas/core/base.py (+5, -5)

@@ -814,7 +814,7 @@ def transpose(self, *args, **kwargs):
     @property
     def shape(self):
         """ return a tuple of the shape of the underlying data """
-        return self.values.shape
+        return self._values.shape

     @property
     def ndim(self):
@@ -842,22 +842,22 @@ def data(self):
     @property
     def itemsize(self):
         """ return the size of the dtype of the item of the underlying data """
-        return self.values.itemsize
+        return self._values.itemsize

     @property
     def nbytes(self):
         """ return the number of bytes in the underlying data """
-        return self.values.nbytes
+        return self._values.nbytes

     @property
     def strides(self):
         """ return the strides of the underlying data """
-        return self.values.strides
+        return self._values.strides

     @property
     def size(self):
         """ return the number of elements in the underlying data """
-        return self.values.size
+        return self._values.size

     @property
     def flags(self):
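The `base.py` change reads `shape`, `itemsize`, `nbytes`, `strides` and `size` from `_values` instead of `values`. The new `PeriodIndex` benchmarks suggest that `values` can be the expensive, element-boxing path for some index types. Under that assumption, a toy class makes the difference visible. `ToyPeriodIndex`, `boxed_calls` and `shape_via_values` are hypothetical names.

```python
class ToyPeriodIndex(object):
    """Hypothetical index holding integer ordinals.

    ``_values`` returns the raw ordinals cheaply, while ``values`` builds one
    object per element, standing in for boxing each ordinal as a Period.
    """

    def __init__(self, ordinals):
        self._ordinals = list(ordinals)
        self.boxed_calls = 0  # how many times the expensive path ran

    @property
    def _values(self):
        return self._ordinals

    @property
    def values(self):
        self.boxed_calls += 1
        return [("Period", o) for o in self._ordinals]

    @property
    def shape_via_values(self):
        # Old pattern: metadata read after boxing every element.
        return (len(self.values),)

    @property
    def shape(self):
        # Pattern adopted in base.py: metadata read from the raw data.
        return (len(self._values),)
```

Both properties report the same shape, but only `shape_via_values` pays for building one object per element.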

pandas/core/ops.py (+2, -2)

@@ -545,9 +545,9 @@ def _offset(lvalues, rvalues):

                 # with tz, convert to UTC
                 if self.is_datetime64tz_lhs:
-                    lvalues = lvalues.tz_localize(None)
+                    lvalues = lvalues.tz_convert('UTC').tz_localize(None)
                 if self.is_datetime64tz_rhs:
-                    rvalues = rvalues.tz_localize(None)
+                    rvalues = rvalues.tz_convert('UTC').tz_localize(None)

                 lvalues = lvalues.view(np.int64)
                 rvalues = rvalues.view(np.int64)
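In pandas, `tz_localize(None)` on tz-aware data drops the time zone but keeps the local wall-clock time. The integer view taken afterwards would therefore encode local time rather than UTC, and the added `tz_convert('UTC')` call addresses this. The sketch below shows the same distinction with standard-library `datetime` objects. Here `astimezone(timezone.utc)` stands in for `tz_convert('UTC')` and `replace(tzinfo=None)` for `tz_localize(None)`. A fixed UTC-5 offset is used so no time zone database is needed. The helper names are hypothetical.

```python
from datetime import datetime, timedelta, timezone

UTC_MINUS_5 = timezone(timedelta(hours=-5))  # fixed offset, no tz database
EPOCH = datetime(1970, 1, 1)                 # naive epoch for integer encodings


def drop_tz_keep_wall_time(ts):
    """Analogue of tz_localize(None): discard tzinfo, keep local clock time."""
    return ts.replace(tzinfo=None)


def drop_tz_as_utc(ts):
    """Analogue of tz_convert('UTC').tz_localize(None): convert, then discard."""
    return ts.astimezone(timezone.utc).replace(tzinfo=None)


def epoch_seconds(naive):
    """Integer seconds since the epoch (pandas' int64 view uses nanoseconds)."""
    return (naive - EPOCH) // timedelta(seconds=1)


aware = datetime(2016, 12, 1, 9, 0, tzinfo=UTC_MINUS_5)  # 14:00 UTC
old = epoch_seconds(drop_tz_keep_wall_time(aware))
new = epoch_seconds(drop_tz_as_utc(aware))
```

Only the UTC-based encoding (`new`) agrees with the instant's POSIX timestamp. The wall-time encoding (`old`) is off by the five-hour UTC offset.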

pandas/tests/groupby/test_filters.py (+13)

@@ -596,6 +596,19 @@ def test_filter_non_bool_raises(self):
         with tm.assertRaisesRegexp(TypeError, 'filter function returned a.*'):
             df.groupby('a').filter(lambda g: g.c.mean())

+    def test_filter_dropna_with_empty_groups(self):
+        # GH 10780
+        data = pd.Series(np.random.rand(9), index=np.repeat([1, 2, 3], 3))
+        grouped = data.groupby(level=0)
+        result_false = grouped.filter(lambda x: x.mean() > 1, dropna=False)
+        expected_false = pd.Series([np.nan] * 9,
+                                   index=np.repeat([1, 2, 3], 3))
+        tm.assert_series_equal(result_false, expected_false)
+
+        result_true = grouped.filter(lambda x: x.mean() > 1, dropna=True)
+        expected_true = pd.Series(index=pd.Index([], dtype=int))
+        tm.assert_series_equal(result_true, expected_true)
+

 def assert_fp_equal(a, b):
     assert (np.abs(a - b) < 1e-12).all()
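The new test (GH 10780) pins down what `filter` returns when no group passes. Below is a hypothetical pure-Python analogue of the `dropna` semantics, `filter_groups`, which is not pandas' implementation. With `dropna=True`, failing groups are removed. With `dropna=False`, they stay in place with their values replaced by NaN.

```python
import math
from collections import defaultdict


def filter_groups(values, keys, predicate, dropna=True):
    """Hypothetical analogue of groupby(...).filter(predicate, dropna=...).

    Returns (key, value) pairs in the original order.
    """
    groups = defaultdict(list)
    for k, v in zip(keys, values):
        groups[k].append(v)
    # Evaluate the predicate once per group.
    keep = {k: bool(predicate(vs)) for k, vs in groups.items()}
    if dropna:
        # Failing groups disappear entirely (possibly leaving nothing).
        return [(k, v) for k, v in zip(keys, values) if keep[k]]
    # Failing groups keep their positions but every value becomes NaN.
    return [(k, v if keep[k] else math.nan) for k, v in zip(keys, values)]
```

This mirrors the test: with values in `[0, 1)` and the predicate `mean > 1`, no group passes. `dropna=True` then yields an empty result, and `dropna=False` yields nine NaNs.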
