
Commit a44916a (merge)
2 parents: af37225 + 93c86aa


59 files changed (+1521 -931 lines)

asv_bench/benchmarks/index_object.py (+17)

@@ -147,6 +147,11 @@ def setup(self, dtype):
         self.idx = getattr(tm, 'make{}Index'.format(dtype))(N)
         self.array_mask = (np.arange(N) % 3) == 0
         self.series_mask = Series(self.array_mask)
+        self.sorted = self.idx.sort_values()
+        half = N // 2
+        self.non_unique = self.idx[:half].append(self.idx[:half])
+        self.non_unique_sorted = self.sorted[:half].append(self.sorted[:half])
+        self.key = self.sorted[N // 4]

     def time_boolean_array(self, dtype):
         self.idx[self.array_mask]

@@ -163,6 +168,18 @@ def time_slice(self, dtype):
     def time_slice_step(self, dtype):
         self.idx[::2]

+    def time_get_loc(self, dtype):
+        self.idx.get_loc(self.key)
+
+    def time_get_loc_sorted(self, dtype):
+        self.sorted.get_loc(self.key)
+
+    def time_get_loc_non_unique(self, dtype):
+        self.non_unique.get_loc(self.key)
+
+    def time_get_loc_non_unique_sorted(self, dtype):
+        self.non_unique_sorted.get_loc(self.key)
+

 class Float64IndexMethod(object):
     # GH 13166
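The four cases benchmarked above (unique vs. non-unique, sorted vs. unsorted) exercise different lookup paths in ``Index.get_loc``, which is why its return type varies. A minimal sketch of that behavior (not part of the commit):

```python
import pandas as pd

# Unique index: get_loc returns a plain integer position.
idx = pd.Index([10, 20, 30, 40])
pos = idx.get_loc(20)  # -> 1

# Non-unique but monotonic index: get_loc returns a slice covering all matches.
dup_sorted = pd.Index([10, 20, 20, 30])
loc = dup_sorted.get_loc(20)  # -> slice(1, 3, None)

# Non-unique and unsorted: get_loc falls back to a boolean mask.
dup_unsorted = pd.Index([20, 10, 20, 30])
mask = dup_unsorted.get_loc(20)
```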

doc/source/advanced.rst (+1 -1)

@@ -672,7 +672,7 @@ The ``CategoricalIndex`` is **preserved** after indexing:
     df2.loc['a'].index

 Sorting the index will sort by the order of the categories (Recall that we
-created the index with with ``CategoricalDtype(list('cab'))``, so the sorted
+created the index with ``CategoricalDtype(list('cab'))``, so the sorted
 order is ``cab``.).

 .. ipython:: python

doc/source/comparison_with_sas.rst (+2 -2)

@@ -279,7 +279,7 @@ date/datetime columns.

 The equivalent pandas operations are shown below. In addition to these
 functions pandas supports other Time Series features
-not available in Base SAS (such as resampling and and custom offsets) -
+not available in Base SAS (such as resampling and custom offsets) -
 see the :ref:`timeseries documentation<timeseries>` for more details.

 .. ipython:: python

@@ -584,7 +584,7 @@ For example, in SAS you could do this to filter missing values.
     if value_x ^= .;
 run;

-Which doesn't work in in pandas. Instead, the ``pd.isna`` or ``pd.notna`` functions
+Which doesn't work in pandas. Instead, the ``pd.isna`` or ``pd.notna`` functions
 should be used for comparisons.

 .. ipython:: python

doc/source/computation.rst (+1 -1)

@@ -512,7 +512,7 @@ a same sized result as the input.

 When using ``.resample()`` with an offset. Construct a new index that is the frequency of the offset. For each frequency
 bin, aggregate points from the input within a backwards-in-time looking window that fall in that bin. The result of this
-aggregation is the output for that frequency point. The windows are fixed size size in the frequency space. Your result
+aggregation is the output for that frequency point. The windows are fixed size in the frequency space. Your result
 will have the shape of a regular frequency between the min and the max of the original input object.

 To summarize, ``.rolling()`` is a time-based window operation, while ``.resample()`` is a frequency-based window operation.

doc/source/groupby.rst (+4 -3)

@@ -1219,8 +1219,8 @@ see :ref:`here <basics.pipe>`.
 Combining ``.groupby`` and ``.pipe`` is often useful when you need to reuse
 GroupBy objects.

-For an example, imagine having a DataFrame with columns for stores, products,
-revenue and sold quantity. We'd like to do a groupwise calculation of *prices*
+As an example, imagine having a DataFrame with columns for stores, products,
+revenue and quantity sold. We'd like to do a groupwise calculation of *prices*
 (i.e. revenue/quantity) per store and per product. We could do this in a
 multi-step operation, but expressing it in terms of piping can make the
 code more readable. First we set the data:

@@ -1230,7 +1230,8 @@ code more readable. First we set the data:
     import numpy as np
     n = 1000
     df = pd.DataFrame({'Store': np.random.choice(['Store_1', 'Store_2'], n),
-                       'Product': np.random.choice(['Product_1', 'Product_2', 'Product_3'], n),
+                       'Product': np.random.choice(['Product_1',
+                                                    'Product_2'], n),
                        'Revenue': (np.random.random(n)*50+10).round(2),
                        'Quantity': np.random.randint(1, 10, size=n)})
     df.head(2)
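The groupwise price calculation the doc section describes (revenue divided by quantity, per store and product, expressed via ``GroupBy.pipe``) can be sketched as follows; the seed and ``n`` are arbitrary choices for reproducibility, not from the commit:

```python
import numpy as np
import pandas as pd

np.random.seed(0)
n = 100
df = pd.DataFrame({'Store': np.random.choice(['Store_1', 'Store_2'], n),
                   'Product': np.random.choice(['Product_1', 'Product_2'], n),
                   'Revenue': (np.random.random(n) * 50 + 10).round(2),
                   'Quantity': np.random.randint(1, 10, size=n)})

# pipe receives the GroupBy object once and reuses it for both aggregations
prices = (df.groupby(['Store', 'Product'])
            .pipe(lambda grp: grp.Revenue.sum() / grp.Quantity.sum())
            .unstack().round(2))
```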

doc/source/io.rst (+1 -1)

@@ -4529,7 +4529,7 @@ Several caveats.
   on an attempt at serialization.

 You can specify an ``engine`` to direct the serialization. This can be one of ``pyarrow``, or ``fastparquet``, or ``auto``.
-If the engine is NOT specified, then the ``pd.options.io.parquet.engine`` option is checked; if this is also ``auto``, then
+If the engine is NOT specified, then the ``pd.options.io.parquet.engine`` option is checked; if this is also ``auto``,
 then ``pyarrow`` is tried, and falling back to ``fastparquet``.

 See the documentation for `pyarrow <http://arrow.apache.org/docs/python/>`__ and `fastparquet <https://fastparquet.readthedocs.io/en/latest/>`__
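The engine-selection order described above (explicit ``engine`` argument, then the ``io.parquet.engine`` option, then ``auto`` trying pyarrow before fastparquet) can be sketched like this; the ``to_parquet`` calls are commented out because they need one of the engines installed:

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3]})

# Engine left unspecified: pandas consults pd.options.io.parquet.engine,
# and if that is 'auto', tries pyarrow first, then fastparquet.
# df.to_parquet('out.parquet')

# Or pin the engine explicitly per call:
# df.to_parquet('out.parquet', engine='fastparquet')

# Or set the default once via the option:
pd.set_option('io.parquet.engine', 'pyarrow')
```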

doc/source/release.rst (+5 -5)

@@ -406,7 +406,7 @@ of all enhancements and bugs that have been fixed in 0.20.1.

 .. note::

-   This is a combined release for 0.20.0 and and 0.20.1.
+   This is a combined release for 0.20.0 and 0.20.1.
    Version 0.20.1 contains one additional change for backwards-compatibility with downstream projects using pandas' ``utils`` routines. (:issue:`16250`)

 Thanks

@@ -2918,7 +2918,7 @@ Improvements to existing features
 - clipboard functions use pyperclip (no dependencies on Windows, alternative
   dependencies offered for Linux) (:issue:`3837`).
 - Plotting functions now raise a ``TypeError`` before trying to plot anything
-  if the associated objects have have a dtype of ``object`` (:issue:`1818`,
+  if the associated objects have a dtype of ``object`` (:issue:`1818`,
   :issue:`3572`, :issue:`3911`, :issue:`3912`), but they will try to convert object
   arrays to numeric arrays if possible so that you can still plot, for example, an
   object array with floats. This happens before any drawing takes place which

@@ -4082,7 +4082,7 @@ Bug Fixes
   columns (:issue:`1943`)
 - Fix time zone localization bug causing improper fields (e.g. hours) in time
   zones that have not had a UTC transition in a long time (:issue:`1946`)
-- Fix errors when parsing and working with with fixed offset timezones
+- Fix errors when parsing and working with fixed offset timezones
   (:issue:`1922`, :issue:`1928`)
 - Fix text parser bug when handling UTC datetime objects generated by
   dateutil (:issue:`1693`)

@@ -4383,7 +4383,7 @@ Bug Fixes
   error (:issue:`1090`)
 - Consistently set name on groupby pieces (:issue:`184`)
 - Treat dict return values as Series in GroupBy.apply (:issue:`823`)
-- Respect column selection for DataFrame in in GroupBy.transform (:issue:`1365`)
+- Respect column selection for DataFrame in GroupBy.transform (:issue:`1365`)
 - Fix MultiIndex partial indexing bug (:issue:`1352`)
 - Enable assignment of rows in mixed-type DataFrame via .ix (:issue:`1432`)
 - Reset index mapping when grouping Series in Cython (:issue:`1423`)

@@ -5040,7 +5040,7 @@ New Features
 - Add `melt` function to `pandas.core.reshape`
 - Add `level` parameter to group by level in Series and DataFrame
   descriptive statistics (:issue:`313`)
-- Add `head` and `tail` methods to Series, analogous to to DataFrame (PR
+- Add `head` and `tail` methods to Series, analogous to DataFrame (PR
   :issue:`296`)
 - Add `Series.isin` function which checks if each value is contained in a
   passed sequence (:issue:`289`)

doc/source/text.rst (+2 -1)

@@ -218,7 +218,8 @@ Extract first match in each subject (extract)
    ``DataFrame``, depending on the subject and regular expression
    pattern (same behavior as pre-0.18.0). When ``expand=True`` it
    always returns a ``DataFrame``, which is more consistent and less
-   confusing from the perspective of a user.
+   confusing from the perspective of a user. ``expand=True`` is the
+   default since version 0.23.0.

 The ``extract`` method accepts a `regular expression
 <https://docs.python.org/3/library/re.html>`__ with at least one
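The ``expand`` behavior the updated doc text describes can be sketched briefly: with ``expand=True`` (the 0.23.0 default) a single capture group still yields a ``DataFrame``, while ``expand=False`` yields a ``Series``:

```python
import pandas as pd

s = pd.Series(['a1', 'b2'])

# expand=True: always a DataFrame, one column per capture group
as_frame = s.str.extract(r'(\d)', expand=True)

# expand=False with a single group: a Series
as_series = s.str.extract(r'(\d)', expand=False)
```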

doc/source/tutorials.rst (+1 -1)

@@ -19,7 +19,7 @@ pandas Cookbook
 The goal of this cookbook (by `Julia Evans <http://jvns.ca>`_) is to
 give you some concrete examples for getting started with pandas. These
 are examples with real-world data, and all the bugs and weirdness that
-that entails.
+entails.

 Here are links to the v0.1 release. For an up-to-date table of contents, see the `pandas-cookbook GitHub
 repository <http://github.com/jvns/pandas-cookbook>`_. To run the examples in this tutorial, you'll need to

doc/source/whatsnew/v0.23.0.txt (+101 -1)

@@ -204,6 +204,50 @@ Please note that the string `index` is not supported with the round trip format,
     new_df
     print(new_df.index.name)

+.. _whatsnew_0230.enhancements.index_division_by_zero:
+
+Index Division By Zero Fills Correctly
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+Division operations on ``Index`` and subclasses will now fill division of positive numbers by zero with ``np.inf``, division of negative numbers by zero with ``-np.inf`` and `0 / 0` with ``np.nan``. This matches existing ``Series`` behavior. (:issue:`19322`, :issue:`19347`)
+
+Previous Behavior:
+
+.. code-block:: ipython
+
+    In [6]: index = pd.Int64Index([-1, 0, 1])
+
+    In [7]: index / 0
+    Out[7]: Int64Index([0, 0, 0], dtype='int64')
+
+    # Previous behavior yielded different results depending on the type of zero in the divisor
+    In [8]: index / 0.0
+    Out[8]: Float64Index([-inf, nan, inf], dtype='float64')
+
+    In [9]: index = pd.UInt64Index([0, 1])
+
+    In [10]: index / np.array([0, 0], dtype=np.uint64)
+    Out[10]: UInt64Index([0, 0], dtype='uint64')
+
+    In [11]: pd.RangeIndex(1, 5) / 0
+    ZeroDivisionError: integer division or modulo by zero
+
+Current Behavior:
+
+.. ipython:: python
+
+    index = pd.Int64Index([-1, 0, 1])
+    # division by zero gives -infinity where negative, +infinity where positive, and NaN for 0 / 0
+    index / 0
+
+    # The result of division by zero should not depend on whether the zero is int or float
+    index / 0.0
+
+    index = pd.UInt64Index([0, 1])
+    index / np.array([0, 0], dtype=np.uint64)
+
+    pd.RangeIndex(1, 5) / 0
+
 .. _whatsnew_0230.enhancements.other:

 Other Enhancements

@@ -289,13 +333,64 @@ Convert to an xarray DataArray
     p.to_xarray()


+.. _whatsnew_0230.api_breaking.build_changes:
+
 Build Changes
 ^^^^^^^^^^^^^

 - Building pandas for development now requires ``cython >= 0.24`` (:issue:`18613`)
 - Building from source now explicitly requires ``setuptools`` in ``setup.py`` (:issue:`18113`)
 - Updated conda recipe to be in compliance with conda-build 3.0+ (:issue:`18002`)

+.. _whatsnew_0230.api_breaking.extract:
+
+Extraction of matching patterns from strings
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+By default, extracting matching patterns from strings with :func:`str.extract` used to return a
+``Series`` if a single group was being extracted (a ``DataFrame`` if more than one group was
+extracted). As of Pandas 0.23.0 :func:`str.extract` always returns a ``DataFrame``, unless
+``expand`` is set to ``False`` (:issue:`11386`).
+
+Also, ``None`` was an accepted value for the ``expand`` parameter (which was equivalent to
+``False``), but now raises a ``ValueError``.
+
+Previous Behavior:
+
+.. code-block:: ipython
+
+    In [1]: s = pd.Series(['number 10', '12 eggs'])
+
+    In [2]: extracted = s.str.extract('.*(\d\d).*')
+
+    In [3]: extracted
+    Out [3]:
+    0    10
+    1    12
+    dtype: object
+
+    In [4]: type(extracted)
+    Out [4]:
+    pandas.core.series.Series
+
+
+New Behavior:
+
+.. ipython:: python
+
+    s = pd.Series(['number 10', '12 eggs'])
+    extracted = s.str.extract('.*(\d\d).*')
+    extracted
+    type(extracted)
+
+To restore previous behavior, simply set ``expand`` to ``False``:
+
+.. ipython:: python
+
+    s = pd.Series(['number 10', '12 eggs'])
+    extracted = s.str.extract('.*(\d\d).*', expand=False)
+    extracted
+    type(extracted)
+
 .. _whatsnew_0230.api:

 Other API Changes

@@ -455,6 +550,7 @@ Datetimelike
 - Bug in :func:`Series.truncate` which raises ``TypeError`` with a monotonic ``PeriodIndex`` (:issue:`17717`)
 - Bug in :func:`~DataFrame.pct_change` using ``periods`` and ``freq`` returned different length outputs (:issue:`7292`)
 - Bug in comparison of :class:`DatetimeIndex` against ``None`` or ``datetime.date`` objects raising ``TypeError`` for ``==`` and ``!=`` comparisons instead of all-``False`` and all-``True``, respectively (:issue:`19301`)
+- Bug in :class:`Timestamp` and :func:`to_datetime` where a string representing a barely out-of-bounds timestamp would be incorrectly rounded down instead of raising ``OutOfBoundsDatetime`` (:issue:`19382`)
 -

 Timezones

@@ -531,6 +627,7 @@ I/O
 - Bug in :func:`DataFrame.to_parquet` where an exception was raised if the write destination is S3 (:issue:`19134`)
 - :class:`Interval` now supported in :func:`DataFrame.to_excel` for all Excel file types (:issue:`19242`)
 - :class:`Timedelta` now supported in :func:`DataFrame.to_excel` for xls file type (:issue:`19242`, :issue:`9155`)
+- Bug in :meth:`pandas.io.stata.StataReader.value_labels` raising an ``AttributeError`` when called on very old files. Now returns an empty dict (:issue:`19417`)

 Plotting
 ^^^^^^^^

@@ -549,13 +646,15 @@ Groupby/Resample/Rolling
 - Bug in :func:`DataFrame.groupby` where tuples were interpreted as lists of keys rather than as keys (:issue:`17979`, :issue:`18249`)
 - Bug in ``transform`` where particular aggregation functions were being incorrectly cast to match the dtype(s) of the grouped data (:issue:`19200`)
 - Bug in :func:`DataFrame.groupby` where aggregation by ``first``/``last``/``min``/``max`` was causing timestamps to lose precision (:issue:`19526`)
+- Bug in :func:`DataFrame.transform` where particular aggregation functions were being incorrectly cast to match the dtype(s) of the grouped data (:issue:`19200`)
+- Bug in :func:`DataFrame.groupby` passing the `on=` kwarg, and subsequently using ``.apply()`` (:issue:`17813`)

 Sparse
 ^^^^^^

 - Bug in which creating a ``SparseDataFrame`` from a dense ``Series`` or an unsupported type raised an uncontrolled exception (:issue:`19374`)
 - Bug in :class:`SparseDataFrame.to_csv` causing exception (:issue:`19384`)
--
+- Bug in :class:`SparseSeries.memory_usage` which caused segfault by accessing non sparse elements (:issue:`19368`)

 Reshaping
 ^^^^^^^^^

@@ -571,6 +670,7 @@ Reshaping
 - Bug in :func:`DataFrame.stack`, :func:`DataFrame.unstack`, :func:`Series.unstack` which were not returning subclasses (:issue:`15563`)
 - Bug in timezone comparisons, manifesting as a conversion of the index to UTC in ``.concat()`` (:issue:`18523`)
 - Bug in :func:`concat` when concatting sparse and dense series it returns only a ``SparseDataFrame``. Should be a ``DataFrame``. (:issue:`18914`, :issue:`18686`, and :issue:`16874`)
+- Improved error message for :func:`DataFrame.merge` when there is no common merge key (:issue:`19427`)
 -
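The fill rule the new whatsnew entry describes for ``Index`` matches longstanding ``Series`` behavior, which can be checked directly; a small sketch (not part of the commit):

```python
import numpy as np
import pandas as pd

# Elementwise division by zero on a Series:
# negative / 0 -> -inf, 0 / 0 -> NaN, positive / 0 -> +inf
s = pd.Series([-1, 0, 1])
out = s / 0
```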
