DOC: Add sphinx spelling extension #21109

Merged · 9 commits · Jun 7, 2018
17 changes: 15 additions & 2 deletions doc/make.py
@@ -224,8 +224,9 @@ def _sphinx_build(self, kind):
--------
>>> DocBuilder(num_jobs=4)._sphinx_build('html')
"""
if kind not in ('html', 'latex'):
raise ValueError('kind must be html or latex, not {}'.format(kind))
if kind not in ('html', 'latex', 'spelling'):
raise ValueError('kind must be html, latex or '
'spelling, not {}'.format(kind))

self._run_os('sphinx-build',
'-j{}'.format(self.num_jobs),
@@ -304,6 +305,18 @@ def zip_html(self):
'-q',
*fnames)

def spellcheck(self):
"""Spell check the documentation."""
self._sphinx_build('spelling')
output_location = os.path.join('build', 'spelling', 'output.txt')
with open(output_location) as output:
lines = output.readlines()
if lines:
raise SyntaxError(
'Found misspelled words.'
' Check pandas/doc/build/spelling/output.txt'
' for more details.')


def main():
cmds = [method for method in dir(DocBuilder) if not method.startswith('_')]
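The check-and-raise logic of the new ``spellcheck`` method can be exercised on its own. A minimal sketch (``check_spelling_output`` is an illustrative name, not part of make.py; the ``SyntaxError`` choice mirrors the diff above):

```python
import os


def check_spelling_output(build_dir):
    """Raise if the Sphinx spelling builder reported any misspelled words.

    Mirrors DocBuilder.spellcheck above: the spelling builder writes one
    line per misspelled word to output.txt, so a non-empty file means
    the check failed.
    """
    output_location = os.path.join(build_dir, 'spelling', 'output.txt')
    with open(output_location) as output:
        lines = output.readlines()
    if lines:
        raise SyntaxError(
            'Found misspelled words.'
            ' Check {} for more details.'.format(output_location))
```

Raising an exception rather than returning a status code makes the failure loud when the method is driven from ``main()``, which is presumably why the PR takes this route.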
8 changes: 4 additions & 4 deletions doc/source/advanced.rst
@@ -342,7 +342,7 @@ As usual, **both sides** of the slicers are included as this is label indexing.
columns=micolumns).sort_index().sort_index(axis=1)
dfmi

Basic multi-index slicing using slices, lists, and labels.
Basic MultiIndex slicing using slices, lists, and labels.

.. ipython:: python

@@ -611,7 +611,7 @@ index positions. ``take`` will also accept negative integers as relative positio
ser.iloc[positions]
ser.take(positions)

For DataFrames, the given indices should be a 1d list or ndarray that specifies
For DataFrames, the given indexes should be a 1d list or ndarray that specifies
row or column positions.

.. ipython:: python
@@ -623,7 +623,7 @@ row or column positions.
frm.take([0, 2], axis=1)

It is important to note that the ``take`` method on pandas objects are not
intended to work on boolean indices and may return unexpected results.
intended to work on boolean indexes and may return unexpected results.

.. ipython:: python

@@ -990,7 +990,7 @@ On the other hand, if the index is not monotonic, then both slice bounds must be
KeyError: 'Cannot get right slice bound for non-unique label: 3'

:meth:`Index.is_monotonic_increasing` and :meth:`Index.is_monotonic_decreasing` only check that
an index is weakly monotonic. To check for strict montonicity, you can combine one of those with
an index is weakly monotonic. To check for strict monotonicity, you can combine one of those with
:meth:`Index.is_unique`

.. ipython:: python
2 changes: 1 addition & 1 deletion doc/source/basics.rst
@@ -593,7 +593,7 @@ categorical columns:
frame = pd.DataFrame({'a': ['Yes', 'Yes', 'No', 'No'], 'b': range(4)})
frame.describe()

This behaviour can be controlled by providing a list of types as ``include``/``exclude``
This behavior can be controlled by providing a list of types as ``include``/``exclude``
arguments. The special value ``all`` can also be used:

.. ipython:: python
2 changes: 1 addition & 1 deletion doc/source/categorical.rst
@@ -370,7 +370,7 @@ Renaming categories is done by assigning new values to the

.. note::

Be aware that assigning new categories is an inplace operation, while most other operations
Be aware that assigning new categories is an in place operation, while most other operations
Member: Still think inplace as a term is fine, especially since it aligns with the keyword used for the concept throughout pandas.

under ``Series.cat`` per default return a new ``Series`` of dtype `category`.

Categories must be unique or a `ValueError` is raised:
4 changes: 2 additions & 2 deletions doc/source/comparison_with_sql.rst
@@ -228,9 +228,9 @@ Grouping by more than one column is done by passing a list of columns to the
JOIN
----
JOINs can be performed with :meth:`~pandas.DataFrame.join` or :meth:`~pandas.merge`. By default,
:meth:`~pandas.DataFrame.join` will join the DataFrames on their indices. Each method has
:meth:`~pandas.DataFrame.join` will join the DataFrames on their indexes. Each method has
Member: Sorry if I missed it before, but any reason we changed these? Indices seemed preferable to me.

Contributor Author: I saw both terms in the documentation, and since the spell check marked "indices" as wrong I changed a lot of them; that's the reason why. I can revert these changes 😃 👍

parameters allowing you to specify the type of join to perform (LEFT, RIGHT, INNER, FULL) or the
columns to join on (column names or indices).
columns to join on (column names or indexes).

.. ipython:: python

4 changes: 4 additions & 0 deletions doc/source/conf.py
@@ -73,10 +73,14 @@
'sphinx.ext.ifconfig',
'sphinx.ext.linkcode',
'nbsphinx',
'sphinxcontrib.spelling'
]

exclude_patterns = ['**.ipynb_checkpoints']

spelling_word_list_filename = 'spelling_wordlist.txt'
spelling_ignore_pypi_package_names = True

with open("index.rst") as f:
index_rst_lines = f.readlines()

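The contents of ``output.txt`` can be parsed back into structured form. A sketch, assuming sphinxcontrib-spelling's default per-line format of ``file:line: (word) suggestions`` — verify against your version's actual output before relying on this:

```python
import re

# Assumed sphinxcontrib-spelling output line, e.g.
# "source/basics.rst:596: (behaviour) behavior"
LINE_RE = re.compile(
    r'^(?P<file>[^:]+):(?P<line>\d+): '
    r'\((?P<word>[^)]+)\)\s*(?P<suggestions>.*)$')


def parse_spelling_line(line):
    """Split one output.txt line into (file, line, word, suggestions)."""
    match = LINE_RE.match(line.strip())
    if match is None:
        return None
    suggestions = [s.strip()
                   for s in match.group('suggestions').split(',')
                   if s.strip()]
    return (match.group('file'), int(match.group('line')),
            match.group('word'), suggestions)
```

Structured tuples like these would make it easy to report misspellings per file, or to print them to STDOUT as one reviewer suggests below in the contributing.rst discussion thread — though as noted there, the extension itself always writes the text file.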
19 changes: 19 additions & 0 deletions doc/source/contributing.rst
@@ -436,6 +436,25 @@ the documentation are also built by Travis-CI. These docs are then hosted `here
<http://pandas-docs.github.io/pandas-docs-travis>`__, see also
the :ref:`Continuous Integration <contributing.ci>` section.

Spell checking documentation
~~~~~~~~~~~~~~~~~~~~~~~~~~~~

When contributing documentation to **pandas**, it's good to check whether your work
contains any spelling errors. Sphinx provides an easy way to spell check documentation
and docstrings.

Running the spell check is easy. Just navigate to your local ``pandas/doc/`` directory and run::

python make.py spellcheck
Member: Could we alternately add a rule to the Makefile so that ``make spellcheck`` works?

Contributor Author: I didn't think about this, my apologies. I'll add the spellcheck to the Makefile 👍


The spellcheck will take a few minutes to run (between 1 and 6 minutes). Sphinx will alert you
with warnings and misspelt words; these words will be added to a file called
``output.txt``, which you can find in your local ``pandas/doc/build/spelling/`` directory.
Member: Not saying this is a bad thing, but any reason you chose to output to a text file instead of to STDOUT? All of the other checks I can think of off the top of my head write to the latter.

Contributor Author (@FabioRosado, May 27, 2018): Unfortunately, I didn't choose this approach. The spelling library is coded this way (I checked the source code); it will always output a text file with all the misspelt words. To be honest, I would much rather work with STDOUT and avoid calling open on a file to check whether it was empty.


The Sphinx spelling extension uses an EN-US dictionary to check words, which means that in
some cases you might need to add a word to this dictionary. You can do so by adding the word to
the bag-of-words file named ``spelling_wordlist.txt``, located in the ``pandas/doc/`` folder.

.. _contributing.code:

Contributing to the code base
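Adding words to the ``spelling_wordlist.txt`` bag-of-words file can also be scripted. A minimal sketch (keeping the file deduplicated and sorted is a convention assumed here; the extension itself only needs one word per line):

```python
def add_words_to_wordlist(path, new_words):
    """Merge new_words into the word-list file, one word per line.

    Reads any existing words, unions in the new ones, and rewrites
    the file sorted and deduplicated.
    """
    try:
        with open(path) as f:
            words = {line.strip() for line in f if line.strip()}
    except FileNotFoundError:
        words = set()
    words.update(new_words)
    with open(path, 'w') as f:
        f.write('\n'.join(sorted(words)) + '\n')
```

A sorted file keeps diffs small when contributors add project-specific vocabulary like ``dtype`` or ``groupby``.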
10 changes: 5 additions & 5 deletions doc/source/contributing_docstring.rst
@@ -103,7 +103,7 @@ left before or after the docstring. The text starts in the next line after the
opening quotes. The closing quotes have their own line
(meaning that they are not at the end of the last sentence).

In rare occasions reST styles like bold text or itallics will be used in
In rare occasions reST styles like bold text or italics will be used in
docstrings, but is it common to have inline code, which is presented between
backticks. It is considered inline code:

@@ -513,7 +513,7 @@ instead of at the beginning, it is good to let the users know about it.
To give an intuition on what can be considered related, here there are some
examples:

* ``loc`` and ``iloc``, as they do the same, but in one case providing indices
* ``loc`` and ``iloc``, as they do the same, but in one case providing indexes
and in the other positions
* ``max`` and ``min``, as they do the opposite
* ``iterrows``, ``itertuples`` and ``iteritems``, as it is easy that a user
@@ -692,7 +692,7 @@ the standard library go first, followed by third-party libraries (like
matplotlib).

When illustrating examples with a single ``Series`` use the name ``s``, and if
illustrating with a single ``DataFrame`` use the name ``df``. For indices,
illustrating with a single ``DataFrame`` use the name ``df``. For indexes,
``idx`` is the preferred name. If a set of homogeneous ``Series`` or
``DataFrame`` is used, name them ``s1``, ``s2``, ``s3``... or ``df1``,
``df2``, ``df3``... If the data is not homogeneous, and more than one structure
@@ -706,7 +706,7 @@ than 5, to show the example with the default values. If doing the ``mean``, we
could use something like ``[1, 2, 3]``, so it is easy to see that the value
returned is the mean.

For more complex examples (groupping for example), avoid using data without
For more complex examples (grouping for example), avoid using data without
interpretation, like a matrix of random numbers with columns A, B, C, D...
And instead use a meaningful example, which makes it easier to understand the
concept. Unless required by the example, use names of animals, to keep examples
@@ -877,7 +877,7 @@ be tricky. Here are some attention points:
the actual error only the error name is sufficient.

* If there is a small part of the result that can vary (e.g. a hash in an object
represenation), you can use ``...`` to represent this part.
representation), you can use ``...`` to represent this part.

If you want to show that ``s.plot()`` returns a matplotlib AxesSubplot object,
this will fail the doctest ::
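The ``...`` placeholder for varying output described above is doctest's ELLIPSIS option. A small sketch of how it absorbs a changing object address (the example string is illustrative, not taken from a pandas docstring):

```python
import doctest

# repr(object()) embeds a memory address that changes run to run;
# the inline ELLIPSIS directive lets "..." absorb it.
docstring = """
>>> repr(object())  # doctest: +ELLIPSIS
'<object object at 0x...>'
"""

parser = doctest.DocTestParser()
test = parser.get_doctest(docstring, globs={}, name='ellipsis_example',
                          filename=None, lineno=0)
runner = doctest.DocTestRunner()
runner.run(test)
```

``runner.failures`` is 0 here because the ellipsis matches whatever address ``repr`` produced.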
22 changes: 11 additions & 11 deletions doc/source/cookbook.rst
@@ -286,7 +286,7 @@ New Columns
df = pd.DataFrame(
{'AAA' : [1,1,1,2,2,2,3,3], 'BBB' : [2,1,3,4,5,1,2,3]}); df

Method 1 : idxmin() to get the index of the mins
Method 1 : idxmin() to get the index of the minimums

.. ipython:: python

@@ -307,7 +307,7 @@ MultiIndexing

The :ref:`multindexing <advanced.hierarchical>` docs.

`Creating a multi-index from a labeled frame
`Creating a MultiIndex from a labeled frame
<http://stackoverflow.com/questions/14916358/reshaping-dataframes-in-pandas-based-on-column-labels>`__

.. ipython:: python
@@ -330,7 +330,7 @@ The :ref:`multindexing <advanced.hierarchical>` docs.
Arithmetic
**********

`Performing arithmetic with a multi-index that needs broadcasting
`Performing arithmetic with a MultiIndex that needs broadcasting
<http://stackoverflow.com/questions/19501510/divide-entire-pandas-multiindex-dataframe-by-dataframe-variable/19502176#19502176>`__

.. ipython:: python
@@ -342,7 +342,7 @@ Arithmetic
Slicing
*******

`Slicing a multi-index with xs
`Slicing a MultiIndex with xs
<http://stackoverflow.com/questions/12590131/how-to-slice-multindex-columns-in-pandas-dataframes>`__

.. ipython:: python
@@ -363,7 +363,7 @@ To take the cross section of the 1st level and 1st axis the index:

df.xs('six',level=1,axis=0)

`Slicing a multi-index with xs, method #2
`Slicing a MultiIndex with xs, method #2
<http://stackoverflow.com/questions/14964493/multiindex-based-indexing-in-pandas>`__

.. ipython:: python
@@ -386,13 +386,13 @@ To take the cross section of the 1st level and 1st axis the index:
df.loc[(All,'Math'),('Exams')]
df.loc[(All,'Math'),(All,'II')]

`Setting portions of a multi-index with xs
`Setting portions of a MultiIndex with xs
<http://stackoverflow.com/questions/19319432/pandas-selecting-a-lower-level-in-a-dataframe-to-do-a-ffill>`__

Sorting
*******

`Sort by specific column or an ordered list of columns, with a multi-index
`Sort by specific column or an ordered list of columns, with a MultiIndex
<http://stackoverflow.com/questions/14733871/mutli-index-sorting-in-pandas>`__

.. ipython:: python
@@ -664,7 +664,7 @@ The :ref:`Pivot <reshaping.pivot>` docs.
`Plot pandas DataFrame with year over year data
<http://stackoverflow.com/questions/30379789/plot-pandas-data-frame-with-year-over-year-data>`__

To create year and month crosstabulation:
To create year and month cross tabulation:

.. ipython:: python

@@ -677,7 +677,7 @@ To create year and month crosstabulation:
Apply
*****

`Rolling Apply to Organize - Turning embedded lists into a multi-index frame
`Rolling Apply to Organize - Turning embedded lists into a MultiIndex frame
<http://stackoverflow.com/questions/17349981/converting-pandas-dataframe-with-categorical-values-into-binary-values>`__

.. ipython:: python
@@ -1029,8 +1029,8 @@ Skip row between header and data
01.01.1990 05:00;21;11;12;13
"""

Option 1: pass rows explicitly to skiprows
""""""""""""""""""""""""""""""""""""""""""
Option 1: pass rows explicitly to skip rows
Member: Have to be careful when changing headers; I think you now need another double quote on the line below for proper rendering.

"""""""""""""""""""""""""""""""""""""""""""

.. ipython:: python

6 changes: 3 additions & 3 deletions doc/source/dsintro.rst
@@ -882,7 +882,7 @@ dictionary of DataFrames as above, and the following named parameters:
:header: "Parameter", "Default", "Description"
:widths: 10, 10, 40

intersect, ``False``, drops elements whose indices do not align
intersect, ``False``, drops elements whose indexes do not align
orient, ``items``, use ``minor`` to use DataFrames' columns as panel items

For example, compare to the construction above:
@@ -1014,7 +1014,7 @@ Deprecate Panel
Over the last few years, pandas has increased in both breadth and depth, with new features,
datatype support, and manipulation routines. As a result, supporting efficient indexing and functional
routines for ``Series``, ``DataFrame`` and ``Panel`` has contributed to an increasingly fragmented and
difficult-to-understand codebase.
difficult-to-understand code base.

The 3-D structure of a ``Panel`` is much less common for many types of data analysis,
than the 1-D of the ``Series`` or the 2-D of the ``DataFrame``. Going forward it makes sense for
@@ -1023,7 +1023,7 @@ pandas to focus on these areas exclusively.
Oftentimes, one can simply use a MultiIndex ``DataFrame`` for easily working with higher dimensional data.

In addition, the ``xarray`` package was built from the ground up, specifically in order to
support the multi-dimensional analysis that is one of ``Panel`` s main usecases.
support the multi-dimensional analysis that is one of ``Panel`` s main use cases.
`Here is a link to the xarray panel-transition documentation <http://xarray.pydata.org/en/stable/pandas.html#panel-transition>`__.

.. ipython:: python
6 changes: 3 additions & 3 deletions doc/source/ecosystem.rst
@@ -184,8 +184,8 @@ and metadata disseminated in
`SDMX <http://www.sdmx.org>`_ 2.1, an ISO-standard
widely used by institutions such as statistics offices, central banks,
and international organisations. pandaSDMX can expose datasets and related
structural metadata including dataflows, code-lists,
and datastructure definitions as pandas Series
structural metadata including data flows, code-lists,
and data structure definitions as pandas Series
or multi-indexed DataFrames.

`fredapi <https://github.com/mortada/fredapi>`__
@@ -260,7 +260,7 @@ Data validation
`Engarde <http://engarde.readthedocs.io/en/latest/>`__
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Engarde is a lightweight library used to explicitly state your assumptions abour your datasets
Engarde is a lightweight library used to explicitly state your assumptions about your datasets
and check that they're *actually* true.

.. _ecosystem.extensions:
6 changes: 3 additions & 3 deletions doc/source/enhancingperf.rst
@@ -32,7 +32,7 @@ Cython (Writing C extensions for pandas)
----------------------------------------

For many use cases writing pandas in pure Python and NumPy is sufficient. In some
computationally heavy applications however, it can be possible to achieve sizeable
computationally heavy applications however, it can be possible to achieve sizable
speed-ups by offloading work to `cython <http://cython.org/>`__.

This tutorial assumes you have refactored as much as possible in Python, for example
@@ -601,7 +601,7 @@ on the original ``DataFrame`` or return a copy with the new column.

For backwards compatibility, ``inplace`` defaults to ``True`` if not
specified. This will change in a future version of pandas - if your
code depends on an inplace assignment you should update to explicitly
code depends on an in place assignment you should update to explicitly
set ``inplace=True``.

.. ipython:: python
@@ -806,7 +806,7 @@ truncate any strings that are more than 60 characters in length. Second, we
can't pass ``object`` arrays to ``numexpr`` thus string comparisons must be
evaluated in Python space.

The upshot is that this *only* applies to object-dtype'd expressions. So, if
The upshot is that this *only* applies to object-dtype expressions. So, if
you have an expression--for example

.. ipython:: python
2 changes: 1 addition & 1 deletion doc/source/extending.rst
@@ -167,7 +167,7 @@ you can retain subclasses through ``pandas`` data manipulations.

There are 3 constructor properties to be defined:

- ``_constructor``: Used when a manipulation result has the same dimesions as the original.
- ``_constructor``: Used when a manipulation result has the same dimensions as the original.
- ``_constructor_sliced``: Used when a manipulation result has one lower dimension(s) as the original, such as ``DataFrame`` single columns slicing.
- ``_constructor_expanddim``: Used when a manipulation result has one higher dimension as the original, such as ``Series.to_frame()`` and ``DataFrame.to_panel()``.

8 changes: 4 additions & 4 deletions doc/source/groupby.rst
@@ -942,7 +942,7 @@ that is itself a series, and possibly upcast the result to a DataFrame:

``apply`` can act as a reducer, transformer, *or* filter function, depending on exactly what is passed to it.
So depending on the path taken, and exactly what you are grouping. Thus the grouped columns(s) may be included in
the output as well as set the indices.
the output as well as set the indexes.

.. warning::

@@ -994,7 +994,7 @@ is only interesting over one column (here ``colname``), it may be filtered
Handling of (un)observed Categorical values
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

When using a ``Categorical`` grouper (as a single grouper, or as part of multipler groupers), the ``observed`` keyword
When using a ``Categorical`` grouper (as a single grouper, or as part of multiple groupers), the ``observed`` keyword
controls whether to return a cartesian product of all possible groupers values (``observed=False``) or only those
that are observed groupers (``observed=True``).

@@ -1010,7 +1010,7 @@ Show only the observed values:

pd.Series([1, 1, 1]).groupby(pd.Categorical(['a', 'a', 'a'], categories=['a', 'b']), observed=True).count()

The returned dtype of the grouped will *always* include *all* of the catergories that were grouped.
The returned dtype of the grouped will *always* include *all* of the categories that were grouped.

.. ipython:: python

@@ -1328,7 +1328,7 @@ Groupby by Indexer to 'resample' data

Resampling produces new hypothetical samples (resamples) from already existing observed data or from a model that generates data. These new samples are similar to the pre-existing samples.

In order to resample to work on indices that are non-datetimelike, the following procedure can be utilized.
In order to resample to work on indexes that are non-datetimelike, the following procedure can be utilized.

In the following examples, **df.index // 5** returns a binary array which is used to determine what gets selected for the groupby operation.
