
Commit 24941de

Merge remote-tracking branch 'upstream/master' into groupby_tuples
2 parents: ec4a3e7 + c65a0f5

94 files changed: +3165, -1864 lines

Makefile (+3)

@@ -12,6 +12,9 @@ clean_pyc:
 build: clean_pyc
 	python setup.py build_ext --inplace

+lint-diff:
+	git diff master --name-only -- "*.py" | grep "pandas" | xargs flake8
+
 develop: build
 	-python setup.py develop

asv_bench/benchmarks/categoricals.py (+9)

@@ -26,6 +26,9 @@ def setup(self):
         self.datetimes = pd.Series(pd.date_range(
             '1995-01-01 00:00:00', periods=10000, freq='s'))

+        self.values_some_nan = list(np.tile(self.categories + [np.nan], N))
+        self.values_all_nan = [np.nan] * len(self.values)
+
     def time_concat(self):
         concat([self.s, self.s])

@@ -46,6 +49,12 @@ def time_constructor_datetimes_with_nat(self):
         t.iloc[-1] = pd.NaT
         Categorical(t)

+    def time_constructor_with_nan(self):
+        Categorical(self.values_some_nan)
+
+    def time_constructor_all_nan(self):
+        Categorical(self.values_all_nan)
+

 class Categoricals2(object):
     goal_time = 0.2

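For context (not part of the commit), the pattern these new benchmarks time is Categorical construction from values that contain missing entries; the data below is made up:

    import numpy as np
    import pandas as pd

    # Values mixing real categories with missing entries, plus an all-missing list
    values_some_nan = ['a', 'b', 'c', np.nan] * 1000
    values_all_nan = [np.nan] * 1000

    # NaN never becomes a category; it is stored as the missing code -1 instead
    cat_some = pd.Categorical(values_some_nan)
    cat_all = pd.Categorical(values_all_nan)
    print(list(cat_some.categories))     # ['a', 'b', 'c']
    print((cat_some.codes == -1).sum())  # 1000
    print(len(cat_all.categories))       # 0
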
asv_bench/benchmarks/index_object.py (+19)

@@ -219,3 +219,22 @@ def time_min(self):

     def time_min_trivial(self):
         self.idx_inc.min()
+
+
+class IndexOps(object):
+    goal_time = 0.2
+
+    def setup(self):
+        N = 10000
+        self.ridx = [RangeIndex(i * 100, (i + 1) * 100) for i in range(N)]
+        self.iidx = [idx.astype(int) for idx in self.ridx]
+        self.oidx = [idx.astype(str) for idx in self.iidx]
+
+    def time_concat_range(self):
+        self.ridx[0].append(self.ridx[1:])
+
+    def time_concat_int(self):
+        self.iidx[0].append(self.iidx[1:])
+
+    def time_concat_obj(self):
+        self.oidx[0].append(self.oidx[1:])

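As a rough illustration of what ``IndexOps`` measures (illustrative sizes, not the benchmark's own): ``Index.append`` accepts a list of indexes and concatenates them in one call, and the benchmark compares this for range, integer and object (string) indexes:

    import pandas as pd

    parts = [pd.RangeIndex(i * 100, (i + 1) * 100) for i in range(5)]

    # One append call concatenates all the pieces
    combined = parts[0].append(parts[1:])
    print(len(combined))   # 500
    print(combined[:3])    # may or may not stay a RangeIndex, depending on version
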
asv_bench/benchmarks/indexing.py (+16)

@@ -287,3 +287,19 @@ def setup(self):

     def time_subset(self):
         self.p.ix[(self.inds, self.inds, self.inds)]
+
+
+class IndexerLookup(object):
+    goal_time = 0.2
+
+    def setup(self):
+        self.s = Series(range(10))
+
+    def time_lookup_iloc(self):
+        self.s.iloc
+
+    def time_lookup_ix(self):
+        self.s.ix
+
+    def time_lookup_loc(self):
+        self.s.loc

ci/lint.sh (+7)

@@ -16,6 +16,13 @@ if [ "$LINT" ]; then
     fi
     echo "Linting *.py DONE"

+    echo "Linting setup.py"
+    flake8 setup.py
+    if [ $? -ne "0" ]; then
+        RET=1
+    fi
+    echo "Linting setup.py DONE"
+
     echo "Linting *.pyx"
     flake8 pandas --filename=*.pyx --select=E501,E302,E203,E111,E114,E221,E303,E128,E231,E126
     if [ $? -ne "0" ]; then

doc/source/advanced.rst (+3, -3)

@@ -833,15 +833,15 @@ Of course if you need integer based selection, then use ``iloc``
 IntervalIndex
 ~~~~~~~~~~~~~

+.. versionadded:: 0.20.0
+
 :class:`IntervalIndex` together with its own dtype, ``interval`` as well as the
 :class:`Interval` scalar type, allow first-class support in pandas for interval
 notation.

 The ``IntervalIndex`` allows some unique indexing and is also used as a
 return type for the categories in :func:`cut` and :func:`qcut`.

-.. versionadded:: 0.20.0
-
 .. warning::

    These indexing behaviors are provisional and may change in a future version of pandas.
@@ -862,7 +862,7 @@ selecting that particular interval.
    df.loc[2]
    df.loc[[2, 3]]

-If you select a lable *contained* within an interval, this will also select the interval.
+If you select a label *contained* within an interval, this will also select the interval.

 .. ipython:: python

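To illustrate the behavior described in this hunk (a sketch, not text from the commit): a label that falls inside an interval selects that interval's row:

    import pandas as pd

    df = pd.DataFrame({'A': [1, 2, 3, 4]},
                      index=pd.IntervalIndex.from_breaks([0, 1, 2, 3, 4]))

    # 2.5 is contained in the interval (2, 3], so that row is returned
    print(df.loc[2.5])
    print(df.loc[[2.5, 3.5]])
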
doc/source/api.rst (+1)

@@ -1794,6 +1794,7 @@ Methods
    Timestamp.strftime
    Timestamp.strptime
    Timestamp.time
+   Timestamp.timestamp
    Timestamp.timetuple
    Timestamp.timetz
    Timestamp.to_datetime64

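The newly listed method mirrors ``datetime.datetime.timestamp``; a quick illustration (not part of the diff):

    import pandas as pd

    ts = pd.Timestamp('2017-01-01', tz='UTC')
    # Seconds since the Unix epoch, as a float
    print(ts.timestamp())  # 1483228800.0
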
doc/source/basics.rst (+1, -8)

@@ -1738,11 +1738,6 @@ description.
 Sorting
 -------

-.. warning::
-
-   The sorting API is substantially changed in 0.17.0, see :ref:`here <whatsnew_0170.api_breaking.sorting>` for these changes.
-   In particular, all sorting methods now return a new object by default, and **DO NOT** operate in-place (except by passing ``inplace=True``).
-
 There are two obvious kinds of sorting that you may be interested in: sorting
 by label and sorting by actual values.

@@ -1829,8 +1824,6 @@ faster than sorting the entire Series and calling ``head(n)`` on the result.
    s.nsmallest(3)
    s.nlargest(3)

-.. versionadded:: 0.17.0
-
 ``DataFrame`` also has the ``nlargest`` and ``nsmallest`` methods.

 .. ipython:: python
@@ -1881,7 +1874,7 @@ dtypes
 ------

 The main types stored in pandas objects are ``float``, ``int``, ``bool``,
-``datetime64[ns]`` and ``datetime64[ns, tz]`` (in >= 0.17.0), ``timedelta[ns]``,
+``datetime64[ns]`` and ``datetime64[ns, tz]``, ``timedelta[ns]``,
 ``category`` and ``object``. In addition these dtypes have item sizes, e.g.
 ``int64`` and ``int32``. See :ref:`Series with TZ <timeseries.timezone_series>`
 for more detail on ``datetime64[ns, tz]`` dtypes.

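A short sketch of the ``nlargest``/``nsmallest`` methods touched by this hunk (illustrative data):

    import pandas as pd

    s = pd.Series([5, 1, 4, 2, 3])

    # Cheaper than sorting the whole Series and taking head(n)
    print(s.nlargest(3).tolist())   # [5, 4, 3]
    print(s.nsmallest(3).tolist())  # [1, 2, 3]

    # DataFrame has the same methods, keyed by column
    df = pd.DataFrame({'a': [5, 1, 4], 'b': [1, 2, 3]})
    print(df.nlargest(2, 'a'))
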
doc/source/categorical.rst (-2)

@@ -632,8 +632,6 @@ To get a single value `Series` of type ``category`` pass in a list with a single
 String and datetime accessors
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

-.. versionadded:: 0.17.1
-
 The accessors ``.dt`` and ``.str`` will work if the ``s.cat.categories`` are of an appropriate
 type:

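A minimal sketch of the accessor behavior this hunk documents (example data is made up):

    import pandas as pd

    # .str works when the categories are strings
    s = pd.Series(['a', 'b', 'c'], dtype='category')
    print(s.str.upper().tolist())   # ['A', 'B', 'C']

    # .dt works when the categories are datetimes
    d = pd.Series(pd.date_range('2017-01-01', periods=3)).astype('category')
    print(d.dt.day.tolist())        # [1, 2, 3]
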
doc/source/computation.rst (-2)

@@ -206,8 +206,6 @@ Window Functions
   functions and are now deprecated. These are replaced by using the :class:`~pandas.core.window.Rolling`, :class:`~pandas.core.window.Expanding` and :class:`~pandas.core.window.EWM`. objects and a corresponding method call.

   The deprecation warning will show the new syntax, see an example :ref:`here <whatsnew_0180.window_deprecations>`
-  You can view the previous documentation
-  `here <http://pandas.pydata.org/pandas-docs/version/0.17.1/computation.html#moving-rolling-statistics-moments>`__

 For working with data, a number of windows functions are provided for
 computing common *window* or *rolling* statistics. Among these are count, sum,

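For reference, the method-based window API that replaced the deprecated ``rolling_*``/``expanding_*`` functions looks like this (illustrative only):

    import pandas as pd

    s = pd.Series(range(6))

    # A Rolling/Expanding/EWM object is built first, then an aggregation is called on it
    print(s.rolling(window=3).mean().tolist())
    print(s.expanding().sum().tolist())
    print(s.ewm(span=3).mean().tolist())
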
doc/source/contributing.rst (+4, -4)

@@ -877,12 +877,12 @@ directive is used. The sphinx syntax for that is:

 .. code-block:: rst

-   .. versionadded:: 0.17.0
+   .. versionadded:: 0.21.0

-This will put the text *New in version 0.17.0* wherever you put the sphinx
+This will put the text *New in version 0.21.0* wherever you put the sphinx
 directive. This should also be put in the docstring when adding a new function
-or method (`example <https://github.com/pandas-dev/pandas/blob/v0.16.2/pandas/core/generic.py#L1959>`__)
-or a new keyword argument (`example <https://github.com/pandas-dev/pandas/blob/v0.16.2/pandas/core/frame.py#L1171>`__).
+or method (`example <https://github.com/pandas-dev/pandas/blob/v0.20.2/pandas/core/frame.py#L1495>`__)
+or a new keyword argument (`example <https://github.com/pandas-dev/pandas/blob/v0.20.2/pandas/core/generic.py#L568>`__).

 Contributing your changes to *pandas*
 =====================================

doc/source/ecosystem.rst (+4, -1)

@@ -146,7 +146,10 @@ API

 `pandas-datareader <https://github.com/pydata/pandas-datareader>`__
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-``pandas-datareader`` is a remote data access library for pandas. ``pandas.io`` from pandas < 0.17.0 is now refactored/split-off to and importable from ``pandas_datareader`` (PyPI:``pandas-datareader``). Many/most of the supported APIs have at least a documentation paragraph in the `pandas-datareader docs <https://pandas-datareader.readthedocs.io/en/latest/>`_:
+``pandas-datareader`` is a remote data access library for pandas (PyPI:``pandas-datareader``).
+It is based on functionality that was located in ``pandas.io.data`` and ``pandas.io.wb`` but was
+split off in v0.19.
+See more in the `pandas-datareader docs <https://pandas-datareader.readthedocs.io/en/latest/>`_:

 The following data feeds are available:

doc/source/gotchas.rst (-2)

@@ -47,8 +47,6 @@ The ``+`` symbol indicates that the true memory usage could be higher, because
 pandas does not count the memory used by values in columns with
 ``dtype=object``.

-.. versionadded:: 0.17.1
-
 Passing ``memory_usage='deep'`` will enable a more accurate memory usage report,
 that accounts for the full usage of the contained objects. This is optional
 as it can be expensive to do this deeper introspection.

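A small sketch of the ``memory_usage='deep'`` option described above (made-up data):

    import pandas as pd

    df = pd.DataFrame({'word': ['a', 'bb', 'ccc'] * 1000})

    # The default report only counts the object pointers in dtype=object columns
    print(df.memory_usage())

    # deep=True also measures the strings those pointers reference: larger but slower
    print(df.memory_usage(deep=True))
    df.info(memory_usage='deep')
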
doc/source/indexing.rst (-2)

@@ -1632,8 +1632,6 @@ Missing values

 .. _indexing.missing:

-.. versionadded:: 0.17.1
-
 .. important::

    Even though ``Index`` can hold missing values (``NaN``), it should be avoided

doc/source/io.rst (+3, -47)

@@ -2689,11 +2689,6 @@ of sheet names can simply be passed to ``read_excel`` with no loss in performanc
    # equivalent using the read_excel function
    data = read_excel('path_to_file.xls', ['Sheet1', 'Sheet2'], index_col=None, na_values=['NA'])

-.. versionadded:: 0.17
-
-``read_excel`` can take an ``ExcelFile`` object as input
-
-
 .. _io.excel.specifying_sheets:

 Specifying Sheets
@@ -2754,8 +2749,6 @@ respectively.
 Reading a ``MultiIndex``
 ++++++++++++++++++++++++

-.. versionadded:: 0.17
-
 ``read_excel`` can read a ``MultiIndex`` index, by passing a list of columns to ``index_col``
 and a ``MultiIndex`` column by passing a list of rows to ``header``. If either the ``index``
 or ``columns`` have serialized level names those will be read in as well by specifying
@@ -2928,14 +2921,8 @@ one can pass an :class:`~pandas.io.excel.ExcelWriter`.
 Writing Excel Files to Memory
 +++++++++++++++++++++++++++++

-.. versionadded:: 0.17
-
 Pandas supports writing Excel files to buffer-like objects such as ``StringIO`` or
-``BytesIO`` using :class:`~pandas.io.excel.ExcelWriter`.
-
-.. versionadded:: 0.17
-
-Added support for Openpyxl >= 2.2
+``BytesIO`` using :class:`~pandas.io.excel.ExcelWriter`. Pandas also supports Openpyxl >= 2.2.

 .. code-block:: python

@@ -3191,25 +3178,6 @@ both on the writing (serialization), and reading (deserialization).
 optimizations in the io of the ``msgpack`` data. Since this is marked
 as an EXPERIMENTAL LIBRARY, the storage format may not be stable until a future release.

-As a result of writing format changes and other issues:
-
-+----------------------+------------------------+
-| Packed with          | Can be unpacked with   |
-+======================+========================+
-| pre-0.17 / Python 2  | any                    |
-+----------------------+------------------------+
-| pre-0.17 / Python 3  | any                    |
-+----------------------+------------------------+
-| 0.17 / Python 2      | - 0.17 / Python 2      |
-|                      | - >=0.18 / any Python  |
-+----------------------+------------------------+
-| 0.17 / Python 3      | >=0.18 / any Python    |
-+----------------------+------------------------+
-| 0.18                 | >= 0.18                |
-+----------------------+------------------------+
-
-Reading (files packed by older versions) is backward-compatibile, except for files packed with 0.17 in Python 2, in which case only they can only be unpacked in Python 2.
-
 .. ipython:: python

    df = pd.DataFrame(np.random.rand(5,2),columns=list('AB'))
@@ -3287,10 +3255,6 @@ for some advanced strategies
 If you see a subset of results being returned, upgrade to ``PyTables`` >= 3.2.
 Stores created previously will need to be rewritten using the updated version.

-.. warning::
-
-   As of version 0.17.0, ``HDFStore`` will not drop rows that have all missing values by default. Previously, if all values (except the index) were missing, ``HDFStore`` would not write those rows to disk.
-
 .. ipython:: python
    :suppress:
    :okexcept:
@@ -3388,7 +3352,7 @@ similar to how ``read_csv`` and ``to_csv`` work.
    os.remove('store_tl.h5')

-As of version 0.17.0, HDFStore will no longer drop rows that are all missing by default. This behavior can be enabled by setting ``dropna=True``.
+HDFStore will by default not drop rows that are all missing. This behavior can be changed by setting ``dropna=True``.

 .. ipython:: python
    :suppress:
@@ -3632,12 +3596,6 @@ Querying
 Querying a Table
 ++++++++++++++++

-.. warning::
-
-   This query capabilities have changed substantially starting in ``0.13.0``.
-   Queries from prior version are accepted (with a ``DeprecationWarning``) printed
-   if its not string-like.
-
 ``select`` and ``delete`` operations have an optional criterion that can
 be specified to select/delete only a subset of the data. This allows one
 to have a very large on-disk table and retrieve only a portion of the
@@ -5098,10 +5056,8 @@ whether imported ``Categorical`` variables are ordered.
 SAS Formats
 -----------

-.. versionadded:: 0.17.0
-
 The top-level function :func:`read_sas` can read (but not write) SAS
-`xport` (.XPT) and `SAS7BDAT` (.sas7bdat) format files were added in *v0.18.0*.
+`xport` (.XPT) and (since *v0.18.0*) `SAS7BDAT` (.sas7bdat) format files.

 SAS files only contain two value types: ASCII text and floating point
 values (usually 8 bytes but sometimes truncated). For xport files,

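To illustrate the ``dropna`` behavior referenced in the HDFStore hunks (a sketch only; assumes PyTables is installed and writes a scratch file):

    import numpy as np
    import pandas as pd

    df = pd.DataFrame({'A': [1.0, np.nan], 'B': [2.0, np.nan]})

    # By default, rows that are entirely missing are still written
    df.to_hdf('store.h5', 'default', format='table', mode='w')

    # dropna=True drops rows where every value is missing
    df.to_hdf('store.h5', 'dropped', format='table', dropna=True)

    print(len(pd.read_hdf('store.h5', 'default')))  # 2
    print(len(pd.read_hdf('store.h5', 'dropped')))  # 1
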
doc/source/merging.rst (+1, -5)

@@ -550,8 +550,6 @@ standard database join operations between DataFrame objects:
   merge key only appears in ``'right'`` DataFrame, and ``both`` if the
   observation's merge key is found in both.

-.. versionadded:: 0.17.0
-
 - ``validate`` : string, default None.
   If specified, checks if merge is of specified type.

@@ -766,9 +764,7 @@ If the user is aware of the duplicates in the right `DataFrame` but wants to ens
 The merge indicator
 ~~~~~~~~~~~~~~~~~~~

-.. versionadded:: 0.17.0
-
-``merge`` now accepts the argument ``indicator``. If ``True``, a Categorical-type column called ``_merge`` will be added to the output object that takes on values:
+``merge`` accepts the argument ``indicator``. If ``True``, a Categorical-type column called ``_merge`` will be added to the output object that takes on values:

 =================================== ================
 Observation Origin                  ``_merge`` value

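A brief sketch of the ``indicator`` argument described above (example frames are made up):

    import pandas as pd

    left = pd.DataFrame({'key': [1, 2], 'lval': ['a', 'b']})
    right = pd.DataFrame({'key': [2, 3], 'rval': ['x', 'y']})

    # indicator=True adds a Categorical '_merge' column recording each row's origin
    merged = pd.merge(left, right, on='key', how='outer', indicator=True)
    print(merged['_merge'].tolist())  # ['left_only', 'both', 'right_only']
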
doc/source/missing_data.rst (-4)

@@ -352,10 +352,6 @@ examined :ref:`in the API <api.dataframe.missing>`.
 Interpolation
 ~~~~~~~~~~~~~

-.. versionadded:: 0.17.0
-
-The ``limit_direction`` keyword argument was added.
-
 Both Series and DataFrame objects have an ``interpolate`` method that, by default,
 performs linear interpolation at missing datapoints.

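A short sketch of the ``interpolate`` behavior, including the ``limit_direction`` keyword whose versionadded note is dropped here (data is illustrative):

    import numpy as np
    import pandas as pd

    s = pd.Series([1.0, np.nan, np.nan, 4.0])

    # Default: linear interpolation between known points
    print(s.interpolate().tolist())  # [1.0, 2.0, 3.0, 4.0]

    # limit and limit_direction control how many NaNs are filled and from which side
    print(s.interpolate(limit=1, limit_direction='backward').tolist())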