
Commit 2730d1c

Merge remote-tracking branch 'upstream/master' into windows_crlf

2 parents: 17440b6 + 415012f

232 files changed (+9738, -2929 lines)


Makefile (+1)

@@ -23,3 +23,4 @@ doc:
 	cd doc; \
 	python make.py clean; \
 	python make.py html
+	python make.py spellcheck

README.md (+1 -1)

@@ -233,7 +233,7 @@ All contributions, bug reports, bug fixes, documentation improvements, enhanceme

 A detailed overview on how to contribute can be found in the **[contributing guide.](https://pandas.pydata.org/pandas-docs/stable/contributing.html)**

-If you are simply looking to start working with the pandas codebase, navigate to the [GitHub “issues” tab](https://github.com/pandas-dev/pandas/issues) and start looking through interesting issues. There are a number of issues listed under [Docs](https://github.com/pandas-dev/pandas/issues?labels=Docs&sort=updated&state=open) and [Difficulty Novice](https://github.com/pandas-dev/pandas/issues?q=is%3Aopen+is%3Aissue+label%3A%22Difficulty+Novice%22) where you could start out.
+If you are simply looking to start working with the pandas codebase, navigate to the [GitHub “issues” tab](https://github.com/pandas-dev/pandas/issues) and start looking through interesting issues. There are a number of issues listed under [Docs](https://github.com/pandas-dev/pandas/issues?labels=Docs&sort=updated&state=open) and [good first issue](https://github.com/pandas-dev/pandas/issues?labels=good+first+issue&sort=updated&state=open) where you could start out.

 You can also triage issues which may include reproducing bug reports, or asking for vital information such as version numbers or reproduction instructions. If you would like to start triaging issues, one easy way to get started is to [subscribe to pandas on CodeTriage](https://www.codetriage.com/pandas-dev/pandas).

asv_bench/benchmarks/categoricals.py (+24)

@@ -51,6 +51,7 @@ def setup(self):

         self.values_some_nan = list(np.tile(self.categories + [np.nan], N))
         self.values_all_nan = [np.nan] * len(self.values)
+        self.values_all_int8 = np.ones(N, 'int8')

     def time_regular(self):
         pd.Categorical(self.values, self.categories)

@@ -70,6 +71,9 @@ def time_with_nan(self):
     def time_all_nan(self):
         pd.Categorical(self.values_all_nan)

+    def time_from_codes_all_int8(self):
+        pd.Categorical.from_codes(self.values_all_int8, self.categories)
+

 class ValueCounts(object):

@@ -169,3 +173,23 @@ def setup(self, dtype):

     def time_isin_categorical(self, dtype):
         self.series.isin(self.sample)
+
+
+class IsMonotonic(object):
+
+    def setup(self):
+        N = 1000
+        self.c = pd.CategoricalIndex(list('a' * N + 'b' * N + 'c' * N))
+        self.s = pd.Series(self.c)
+
+    def time_categorical_index_is_monotonic_increasing(self):
+        self.c.is_monotonic_increasing
+
+    def time_categorical_index_is_monotonic_decreasing(self):
+        self.c.is_monotonic_decreasing
+
+    def time_categorical_series_is_monotonic_increasing(self):
+        self.s.is_monotonic_increasing
+
+    def time_categorical_series_is_monotonic_decreasing(self):
+        self.s.is_monotonic_decreasing
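For readers unfamiliar with the APIs these new benchmarks exercise, here is a minimal illustrative sketch (the category names and array sizes are made up for the example, not taken from the benchmark)::

    import numpy as np
    import pandas as pd

    # What time_from_codes_all_int8 measures: building a Categorical directly
    # from an array of integer codes plus a list of categories.
    categories = ['apple', 'banana', 'cherry']
    codes = np.zeros(10, dtype='int8')          # every code points at 'apple'
    cat = pd.Categorical.from_codes(codes, categories)

    # What the IsMonotonic benchmarks measure: the is_monotonic_* properties
    # on a CategoricalIndex and on an equivalent Series.
    idx = pd.CategoricalIndex(list('aaabbbccc'))
    ser = pd.Series(idx)
    idx.is_monotonic_increasing    # True for this ordered sequence
    ser.is_monotonic_decreasing    # False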

asv_bench/benchmarks/frame_methods.py (+18)

@@ -512,3 +512,21 @@ def time_nlargest(self, keep):

     def time_nsmallest(self, keep):
         self.df.nsmallest(100, 'A', keep=keep)
+
+
+class Describe(object):
+
+    goal_time = 0.2
+
+    def setup(self):
+        self.df = DataFrame({
+            'a': np.random.randint(0, 100, int(1e6)),
+            'b': np.random.randint(0, 100, int(1e6)),
+            'c': np.random.randint(0, 100, int(1e6))
+        })
+
+    def time_series_describe(self):
+        self.df['a'].describe()
+
+    def time_dataframe_describe(self):
+        self.df.describe()
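For context, these files are airspeed velocity (asv) benchmarks: asv collects methods whose names start with ``time_`` and times each one against the data prepared in ``setup``, and ``goal_time`` is an asv benchmark attribute that sets the target duration for each timing run.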

asv_bench/benchmarks/pandas_vb_common.py (+1 -4)

@@ -2,10 +2,7 @@
 from importlib import import_module

 import numpy as np
-try:
-    from pandas import Panel
-except ImportError:
-    from pandas import WidePanel as Panel # noqa
+from pandas import Panel

 # Compatibility import for lib
 for imp in ['pandas._libs.lib', 'pandas.lib']:

ci/appveyor-27.yaml (+2 -2)

@@ -11,9 +11,9 @@ dependencies:
   - lxml
   - matplotlib
   - numexpr
-  - numpy=1.10*
+  - numpy=1.12*
   - openpyxl
-  - pytables==3.2.2
+  - pytables
   - python=2.7.*
   - pytz
   - s3fs

ci/appveyor-36.yaml (+1 -1)

@@ -9,7 +9,7 @@ dependencies:
   - feather-format
   - matplotlib
   - numexpr
-  - numpy=1.13*
+  - numpy=1.14*
   - openpyxl
   - pyarrow
   - pytables

ci/travis-36.yaml (+2 -2)

@@ -18,12 +18,10 @@ dependencies:
   - numexpr
   - numpy
   - openpyxl
-  - pandas-datareader
   - psycopg2
   - pyarrow
   - pymysql
   - pytables
-  - python-dateutil
   - python-snappy
   - python=3.6*
   - pytz

@@ -45,3 +43,5 @@ dependencies:
   - pip:
     - brotlipy
     - coverage
+    - pandas-datareader
+    - python-dateutil

doc/README.rst (+1 -1)

@@ -42,7 +42,7 @@ Some other important things to know about the docs:
 - The docstrings follow the **Numpy Docstring Standard** which is used widely
   in the Scientific Python community. This standard specifies the format of
   the different sections of the docstring. See `this document
-  <https://github.com/numpy/numpy/blob/master/doc/HOWTO_DOCUMENT.rst.txt>`_
+  <https://numpydoc.readthedocs.io/en/latest/>`_
   for a detailed explanation, or look at some of the existing functions to
   extend it in a similar manner.

doc/make.py (+15 -2)

@@ -224,8 +224,9 @@ def _sphinx_build(self, kind):
         --------
         >>> DocBuilder(num_jobs=4)._sphinx_build('html')
         """
-        if kind not in ('html', 'latex'):
-            raise ValueError('kind must be html or latex, not {}'.format(kind))
+        if kind not in ('html', 'latex', 'spelling'):
+            raise ValueError('kind must be html, latex or '
+                             'spelling, not {}'.format(kind))

         self._run_os('sphinx-build',
                      '-j{}'.format(self.num_jobs),

@@ -304,6 +305,18 @@ def zip_html(self):
                      '-q',
                      *fnames)

+    def spellcheck(self):
+        """Spell check the documentation."""
+        self._sphinx_build('spelling')
+        output_location = os.path.join('build', 'spelling', 'output.txt')
+        with open(output_location) as output:
+            lines = output.readlines()
+            if lines:
+                raise SyntaxError(
+                    'Found misspelled words.'
+                    ' Check pandas/doc/build/spelling/output.txt'
+                    ' for more details.')
+

 def main():
     cmds = [method for method in dir(DocBuilder) if not method.startswith('_')]
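Taken together with the Makefile and ``contributing.rst`` changes in this commit, the new target is invoked from the local ``pandas/doc/`` directory as ``python make.py spellcheck``; it runs the Sphinx ``spelling`` builder and raises a ``SyntaxError`` if ``build/spelling/output.txt`` lists any misspelled words.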

doc/source/_static/reshaping_melt.png (binary)

[Binary image changes; sizes shown on the commit page: 51.7 KB → 50.9 KB, 53.2 KB → 52.6 KB, 57.2 KB → 56.6 KB.]

doc/source/advanced.rst (+51 -2)

@@ -342,7 +342,7 @@ As usual, **both sides** of the slicers are included as this is label indexing.
                        columns=micolumns).sort_index().sort_index(axis=1)
    dfmi

-Basic multi-index slicing using slices, lists, and labels.
+Basic MultiIndex slicing using slices, lists, and labels.

 .. ipython:: python

@@ -924,6 +924,55 @@ bins, with ``NaN`` representing a missing value similar to other dtypes.

    pd.cut([0, 3, 5, 1], bins=c.categories)

+
+Generating Ranges of Intervals
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+If we need intervals on a regular frequency, we can use the :func:`interval_range` function
+to create an ``IntervalIndex`` using various combinations of ``start``, ``end``, and ``periods``.
+The default frequency for ``interval_range`` is a 1 for numeric intervals, and calendar day for
+datetime-like intervals:
+
+.. ipython:: python
+
+   pd.interval_range(start=0, end=5)
+
+   pd.interval_range(start=pd.Timestamp('2017-01-01'), periods=4)
+
+   pd.interval_range(end=pd.Timedelta('3 days'), periods=3)
+
+The ``freq`` parameter can used to specify non-default frequencies, and can utilize a variety
+of :ref:`frequency aliases <timeseries.offset_aliases>` with datetime-like intervals:
+
+.. ipython:: python
+
+   pd.interval_range(start=0, periods=5, freq=1.5)
+
+   pd.interval_range(start=pd.Timestamp('2017-01-01'), periods=4, freq='W')
+
+   pd.interval_range(start=pd.Timedelta('0 days'), periods=3, freq='9H')
+
+Additionally, the ``closed`` parameter can be used to specify which side(s) the intervals
+are closed on. Intervals are closed on the right side by default.
+
+.. ipython:: python
+
+   pd.interval_range(start=0, end=4, closed='both')
+
+   pd.interval_range(start=0, end=4, closed='neither')
+
+.. versionadded:: 0.23.0
+
+Specifying ``start``, ``end``, and ``periods`` will generate a range of evenly spaced
+intervals from ``start`` to ``end`` inclusively, with ``periods`` number of elements
+in the resulting ``IntervalIndex``:
+
+.. ipython:: python
+
+   pd.interval_range(start=0, end=6, periods=4)
+
+   pd.interval_range(pd.Timestamp('2018-01-01'), pd.Timestamp('2018-02-28'), periods=3)
+
 Miscellaneous indexing FAQ
 --------------------------

@@ -990,7 +1039,7 @@ On the other hand, if the index is not monotonic, then both slice bounds must be
    KeyError: 'Cannot get right slice bound for non-unique label: 3'

 :meth:`Index.is_monotonic_increasing` and :meth:`Index.is_monotonic_decreasing` only check that
-an index is weakly monotonic. To check for strict montonicity, you can combine one of those with
+an index is weakly monotonic. To check for strict monotonicity, you can combine one of those with
 :meth:`Index.is_unique`

 .. ipython:: python

doc/source/api.rst (+11 -2)

@@ -1459,7 +1459,6 @@ Modifying and Computations
    Index.is_floating
    Index.is_integer
    Index.is_interval
-   Index.is_lexsorted_for_tuple
    Index.is_mixed
    Index.is_numeric
    Index.is_object

@@ -1471,11 +1470,19 @@ Modifying and Computations
    Index.where
    Index.take
    Index.putmask
-   Index.set_names
    Index.unique
    Index.nunique
    Index.value_counts

+Compatibility with MultiIndex
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+.. autosummary::
+   :toctree: generated/
+
+   Index.set_names
+   Index.is_lexsorted_for_tuple
+   Index.droplevel
+
 Missing Values
 ~~~~~~~~~~~~~~
 .. autosummary::

@@ -1632,6 +1639,8 @@ IntervalIndex Components
    IntervalIndex.length
    IntervalIndex.values
    IntervalIndex.is_non_overlapping_monotonic
+   IntervalIndex.get_loc
+   IntervalIndex.get_indexer


 .. _api.multiindex:
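The two ``IntervalIndex`` methods added to the listing above are existing indexing methods that are now part of the documented API; a minimal sketch of what they return (values chosen only for illustration)::

    import pandas as pd

    ii = pd.interval_range(start=0, end=3)   # intervals (0, 1], (1, 2], (2, 3]
    ii.get_loc(1.5)                          # 1: position of the interval containing 1.5
    ii.get_indexer([0.5, 2.5])               # array([0, 2]): positions for several values at once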

doc/source/basics.rst (+8 -8)

@@ -168,7 +168,7 @@ either match on the *index* or *columns* via the **axis** keyword:

    df_orig = df

-Furthermore you can align a level of a multi-indexed DataFrame with a Series.
+Furthermore you can align a level of a MultiIndexed DataFrame with a Series.

 .. ipython:: python

@@ -593,7 +593,7 @@ categorical columns:
    frame = pd.DataFrame({'a': ['Yes', 'Yes', 'No', 'No'], 'b': range(4)})
    frame.describe()

-This behaviour can be controlled by providing a list of types as ``include``/``exclude``
+This behavior can be controlled by providing a list of types as ``include``/``exclude``
 arguments. The special value ``all`` can also be used:

 .. ipython:: python

@@ -1034,7 +1034,7 @@ Passing a single function to ``.transform()`` with a ``Series`` will yield a sin
 Transform with multiple functions
 +++++++++++++++++++++++++++++++++

-Passing multiple functions will yield a column multi-indexed DataFrame.
+Passing multiple functions will yield a column MultiIndexed DataFrame.
 The first level will be the original frame column names; the second level
 will be the names of the transforming functions.

@@ -1060,7 +1060,7 @@ Passing a dict of functions will allow selective transforming per column.

    tsdf.transform({'A': np.abs, 'B': lambda x: x+1})

-Passing a dict of lists will generate a multi-indexed DataFrame with these
+Passing a dict of lists will generate a MultiIndexed DataFrame with these
 selective transforms.

 .. ipython:: python

@@ -1889,12 +1889,12 @@ faster than sorting the entire Series and calling ``head(n)`` on the result.
    df.nsmallest(5, ['a', 'c'])


-.. _basics.multi-index_sorting:
+.. _basics.multiindex_sorting:

-Sorting by a multi-index column
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+Sorting by a MultiIndex column
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

-You must be explicit about sorting when the column is a multi-index, and fully specify
+You must be explicit about sorting when the column is a MultiIndex, and fully specify
 all levels to ``by``.

 .. ipython:: python

doc/source/categorical.rst (+2 -2)

@@ -358,10 +358,10 @@ Renaming categories is done by assigning new values to the
    s
    s.cat.categories = ["Group %s" % g for g in s.cat.categories]
    s
-   s.cat.rename_categories([1,2,3])
+   s = s.cat.rename_categories([1,2,3])
    s
    # You can also pass a dict-like object to map the renaming
-   s.cat.rename_categories({1: 'x', 2: 'y', 3: 'z'})
+   s = s.cat.rename_categories({1: 'x', 2: 'y', 3: 'z'})
    s

 .. note::
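The two changed lines reflect that ``Series.cat.rename_categories`` returns a new object by default rather than renaming in place, so the example now assigns the result back to ``s`` before displaying it.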

doc/source/conf.py (+4)

@@ -73,10 +73,14 @@
     'sphinx.ext.ifconfig',
     'sphinx.ext.linkcode',
     'nbsphinx',
+    'sphinxcontrib.spelling'
 ]

 exclude_patterns = ['**.ipynb_checkpoints']

+spelling_word_list_filename = ['spelling_wordlist.txt', 'names_wordlist.txt']
+spelling_ignore_pypi_package_names = True
+
 with open("index.rst") as f:
     index_rst_lines = f.readlines()
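Both new settings belong to the ``sphinxcontrib.spelling`` extension enabled just above: ``spelling_word_list_filename`` points the checker at one or more project word lists to treat as correctly spelled, and ``spelling_ignore_pypi_package_names`` makes it skip names of packages published on PyPI.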

doc/source/contributing.rst (+21 -2)

@@ -17,8 +17,8 @@ If you are brand new to pandas or open-source development, we recommend going
 through the `GitHub "issues" tab <https://github.com/pandas-dev/pandas/issues>`_
 to find issues that interest you. There are a number of issues listed under `Docs
 <https://github.com/pandas-dev/pandas/issues?labels=Docs&sort=updated&state=open>`_
-and `Difficulty Novice
-<https://github.com/pandas-dev/pandas/issues?q=is%3Aopen+is%3Aissue+label%3A%22Difficulty+Novice%22>`_
+and `good first issue
+<https://github.com/pandas-dev/pandas/issues?labels=good+first+issue&sort=updated&state=open>`_
 where you could start out. Once you've found an interesting issue, you can
 return here to get your development environment setup.

@@ -436,6 +436,25 @@ the documentation are also built by Travis-CI. These docs are then hosted `here
 <http://pandas-docs.github.io/pandas-docs-travis>`__, see also
 the :ref:`Continuous Integration <contributing.ci>` section.

+Spell checking documentation
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+When contributing to documentation to **pandas** it's good to check if your work
+contains any spelling errors. Sphinx provides an easy way to spell check documentation
+and docstrings.
+
+Running the spell check is easy. Just navigate to your local ``pandas/doc/`` directory and run::
+
+    python make.py spellcheck
+
+The spellcheck will take a few minutes to run (between 1 to 6 minutes). Sphinx will alert you
+with warnings and misspelt words - these misspelt words will be added to a file called
+``output.txt`` and you can find it on your local directory ``pandas/doc/build/spelling/``.
+
+The Sphinx spelling extension uses an EN-US dictionary to correct words, what means that in
+some cases you might need to add a word to this dictionary. You can do so by adding the word to
+the bag-of-words file named ``spelling_wordlist.txt`` located in the folder ``pandas/doc/``.
+
 .. _contributing.code:

 Contributing to the code base
