DOC: Add sphinx spelling extension #21109

Merged (9 commits, Jun 7, 2018)
9 changes: 7 additions & 2 deletions doc/make.py
@@ -224,8 +224,9 @@ def _sphinx_build(self, kind):
--------
>>> DocBuilder(num_jobs=4)._sphinx_build('html')
"""
-        if kind not in ('html', 'latex'):
-            raise ValueError('kind must be html or latex, not {}'.format(kind))
+        if kind not in ('html', 'latex', 'spelling'):
+            raise ValueError('kind must be html, latex or '
+                             'spelling, not {}'.format(kind))

self._run_os('sphinx-build',
'-j{}'.format(self.num_jobs),
@@ -304,6 +305,10 @@ def zip_html(self):
'-q',
*fnames)

+    def spellcheck(self):
+        """Spell check the documentation."""
+        self._sphinx_build('spelling')


def main():
cmds = [method for method in dir(DocBuilder) if not method.startswith('_')]
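
Since main() exposes every public DocBuilder method as a command (per the cmds line above), the new check would presumably be run as "python make.py spellcheck" from the doc directory. As a rough standalone sketch of what the 'spelling' kind amounts to (hypothetical code, not part of this diff; assumes sphinxcontrib-spelling is installed):

    # Hypothetical standalone equivalent of DocBuilder._sphinx_build('spelling').
    import subprocess

    def sphinx_build(kind, source='source', out='build'):
        if kind not in ('html', 'latex', 'spelling'):
            raise ValueError('kind must be html, latex or '
                             'spelling, not {}'.format(kind))
        # The 'spelling' builder writes *.spelling files listing unknown
        # words instead of rendered output.
        subprocess.check_call(['sphinx-build', '-b', kind, source,
                               '{}/{}'.format(out, kind)])

    sphinx_build('spelling')
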
4 changes: 2 additions & 2 deletions doc/source/10min.rst
@@ -645,7 +645,7 @@ the quarter end:
ts.index = (prng.asfreq('M', 'e') + 1).asfreq('H', 's') + 9
ts.head()

-Categoricals
+Categorical
------------

Contributor: Let's keep this, to not break links.


pandas can include categorical data in a ``DataFrame``. For full docs, see the
@@ -663,7 +663,7 @@ Convert the raw grades to a categorical data type.
df["grade"]

Rename the categories to more meaningful names (assigning to
-``Series.cat.categories`` is inplace!).
+``Series.cat.categories`` is in place!).

Member: I think we should be making an exception for inplace.


.. ipython:: python

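
For readers unfamiliar with the pattern under discussion, a minimal sketch of the in-place rename as it behaved in the pandas of this era (hypothetical example; direct assignment to Series.cat.categories has since been deprecated and removed from pandas):

    import pandas as pd

    s = pd.Series(['a', 'b', 'b'], dtype='category')
    # Assignment renames the categories on s itself; no copy is returned.
    s.cat.categories = ['good', 'bad']
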
16 changes: 8 additions & 8 deletions doc/source/advanced.rst
@@ -182,7 +182,7 @@ For example:
df[['foo','qux']].columns # sliced

This is done to avoid a recomputation of the levels in order to make slicing
-highly performant. If you want to see only the used levels, you can use the
+highly efficient. If you want to see only the used levels, you can use the

Member: I'd still stick with performant - there's some overlap but still not quite perfect synonyms.

:func:`MultiIndex.get_level_values` method.

.. ipython:: python
@@ -387,7 +387,7 @@ Furthermore you can *set* the values using the following methods.
df2.loc(axis=0)[:, :, ['C1', 'C3']] = -10
df2

-You can use a right-hand-side of an alignable object as well.
+You can use a right-hand-side of an align object as well.

Contributor: I think alignable is OK.


.. ipython:: python

@@ -559,7 +559,7 @@ return a copy of the data rather than a view:

.. _advanced.unsorted:

-Furthermore if you try to index something that is not fully lexsorted, this can raise:
+Furthermore if you try to index something that is not fully lex-sorted, this can raise:

Member: I've never seen "lex-sorted" before, so I'd stick with the original spelling here.


.. code-block:: ipython

@@ -593,7 +593,7 @@ Take Methods

Similar to NumPy ndarrays, pandas Index, Series, and DataFrame also provides
the ``take`` method that retrieves elements along a given axis at the given
-indices. The given indices must be either a list or an ndarray of integer
+indexes. The given indexes must be either a list or an ndarray of integer

Contributor: Prefer indices, since that's the argument name to take.

index positions. ``take`` will also accept negative integers as relative positions to the end of the object.

.. ipython:: python
@@ -611,7 +611,7 @@ index positions. ``take`` will also accept negative integers as relative positio
ser.iloc[positions]
ser.take(positions)

-For DataFrames, the given indices should be a 1d list or ndarray that specifies
+For DataFrames, the given indexes should be a 1d list or ndarray that specifies
row or column positions.

.. ipython:: python
@@ -623,7 +623,7 @@ row or column positions.
frm.take([0, 2], axis=1)

It is important to note that the ``take`` method on pandas objects are not
-intended to work on boolean indices and may return unexpected results.
+intended to work on boolean indexes and may return unexpected results.

.. ipython:: python

@@ -711,7 +711,7 @@ order is ``cab``).

df2.sort_index()

-Groupby operations on the index will preserve the index nature as well.
+Group by operations on the index will preserve the index nature as well.

Contributor: Can probably add groupby to the wordlist.


.. ipython:: python

@@ -990,7 +990,7 @@ On the other hand, if the index is not monotonic, then both slice bounds must be
KeyError: 'Cannot get right slice bound for non-unique label: 3'

:meth:`Index.is_monotonic_increasing` and :meth:`Index.is_monotonic_decreasing` only check that
-an index is weakly monotonic. To check for strict montonicity, you can combine one of those with
+an index is weakly monotonic. To check for strict monotonicity, you can combine one of those with
:meth:`Index.is_unique`

.. ipython:: python
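
The combined check described in that last hunk is worth a minimal sketch (hypothetical values, not part of the diff):

    import pandas as pd

    idx = pd.Index([1, 2, 2, 3])
    idx.is_monotonic_increasing                    # True: weakly increasing, ties allowed
    idx.is_monotonic_increasing and idx.is_unique  # False: the repeated 2 breaks strictness
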
2 changes: 1 addition & 1 deletion doc/source/basics.rst
@@ -593,7 +593,7 @@ categorical columns:
frame = pd.DataFrame({'a': ['Yes', 'Yes', 'No', 'No'], 'b': range(4)})
frame.describe()

-This behaviour can be controlled by providing a list of types as ``include``/``exclude``
+This behavior can be controlled by providing a list of types as ``include``/``exclude``
arguments. The special value ``all`` can also be used:

.. ipython:: python
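
A quick sketch of the include/exclude control described in that hunk, reusing the frame from the surrounding context (illustrative only):

    import pandas as pd

    frame = pd.DataFrame({'a': ['Yes', 'Yes', 'No', 'No'], 'b': range(4)})
    frame.describe(include=['object'])  # summarize only the object column 'a'
    frame.describe(include='all')       # numeric and object columns together
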
4 changes: 2 additions & 2 deletions doc/source/categorical.rst
@@ -370,7 +370,7 @@ Renaming categories is done by assigning new values to the

.. note::

-   Be aware that assigning new categories is an inplace operation, while most other operations
+   Be aware that assigning new categories is an in place operation, while most other operations

Member: Still think inplace as a term is fine, especially since it aligns with the keyword used for the concept throughout pandas.

under ``Series.cat`` per default return a new ``Series`` of dtype `category`.

Categories must be unique or a `ValueError` is raised:
@@ -847,7 +847,7 @@ the categories being combined.

By default, the resulting categories will be ordered as
they appear in the data. If you want the categories to
-be lexsorted, use ``sort_categories=True`` argument.
+be lex-sorted, use ``sort_categories=True`` argument.

.. ipython:: python

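
The sort_categories=True argument mentioned in that hunk belongs to union_categoricals; a small sketch with hypothetical data:

    import pandas as pd
    from pandas.api.types import union_categoricals

    a = pd.Categorical(['b', 'c'])
    b = pd.Categorical(['a', 'b'])
    # Without sort_categories, categories keep order of appearance: ['b', 'c', 'a'].
    union_categoricals([a, b], sort_categories=True).categories  # ['a', 'b', 'c']
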
4 changes: 2 additions & 2 deletions doc/source/comparison_with_sql.rst
@@ -228,9 +228,9 @@ Grouping by more than one column is done by passing a list of columns to the
JOIN
----
JOINs can be performed with :meth:`~pandas.DataFrame.join` or :meth:`~pandas.merge`. By default,
-:meth:`~pandas.DataFrame.join` will join the DataFrames on their indices. Each method has
+:meth:`~pandas.DataFrame.join` will join the DataFrames on their indexes. Each method has

Member: Sorry if I missed it before, but any reason we changed these? indices seemed preferable to me.

Contributor (author): I saw both terms in the documentation, and since the spell check marked "indices" as wrong I changed a lot of them; that's the reason why. I can revert these changes 😃 👍

parameters allowing you to specify the type of join to perform (LEFT, RIGHT, INNER, FULL) or the
-columns to join on (column names or indices).
+columns to join on (column names or indexes).

.. ipython:: python

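
A minimal sketch of the default index join described in that hunk (hypothetical frames):

    import pandas as pd

    left = pd.DataFrame({'a': [1, 2]}, index=['x', 'y'])
    right = pd.DataFrame({'b': [3, 4]}, index=['x', 'z'])
    # join() aligns on index labels by default (a LEFT join): 'y' gets NaN for 'b'.
    left.join(right)
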
5 changes: 5 additions & 0 deletions doc/source/conf.py
@@ -73,10 +73,15 @@
'sphinx.ext.ifconfig',
'sphinx.ext.linkcode',
'nbsphinx',
+    'sphinxcontrib.spelling'
]

exclude_patterns = ['**.ipynb_checkpoints']

+spelling_word_list_filename = 'spelling_wordlist.txt'
+spelling_show_suggestions = True
+spelling_ignore_pypi_package_names = True

with open("index.rst") as f:
index_rst_lines = f.readlines()

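
For context on the three new settings: spelling_word_list_filename points sphinxcontrib-spelling at a plain-text file of accepted words, one per line, so the exceptions requested in the review comments (groupby, inplace, timeseries, and so on) would presumably be collected there. A hypothetical excerpt of doc/source/spelling_wordlist.txt:

    groupby
    inplace
    timeseries
    downsample
    lexsorted

spelling_show_suggestions includes suggested replacements in each warning, and spelling_ignore_pypi_package_names keeps PyPI package names from being flagged.
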
10 changes: 5 additions & 5 deletions doc/source/contributing_docstring.rst
@@ -103,7 +103,7 @@ left before or after the docstring. The text starts in the next line after the
opening quotes. The closing quotes have their own line
(meaning that they are not at the end of the last sentence).

-In rare occasions reST styles like bold text or itallics will be used in
+In rare occasions reST styles like bold text or italics will be used in
docstrings, but is it common to have inline code, which is presented between
backticks. It is considered inline code:

@@ -513,7 +513,7 @@ instead of at the beginning, it is good to let the users know about it.
To give an intuition on what can be considered related, here there are some
examples:

-* ``loc`` and ``iloc``, as they do the same, but in one case providing indices
+* ``loc`` and ``iloc``, as they do the same, but in one case providing indexes
and in the other positions
* ``max`` and ``min``, as they do the opposite
* ``iterrows``, ``itertuples`` and ``iteritems``, as it is easy that a user
@@ -692,7 +692,7 @@ the standard library go first, followed by third-party libraries (like
matplotlib).

When illustrating examples with a single ``Series`` use the name ``s``, and if
-illustrating with a single ``DataFrame`` use the name ``df``. For indices,
+illustrating with a single ``DataFrame`` use the name ``df``. For indexes,
``idx`` is the preferred name. If a set of homogeneous ``Series`` or
``DataFrame`` is used, name them ``s1``, ``s2``, ``s3``... or ``df1``,
``df2``, ``df3``... If the data is not homogeneous, and more than one structure
@@ -706,7 +706,7 @@ than 5, to show the example with the default values. If doing the ``mean``, we
could use something like ``[1, 2, 3]``, so it is easy to see that the value
returned is the mean.

-For more complex examples (groupping for example), avoid using data without
+For more complex examples (grouping for example), avoid using data without
interpretation, like a matrix of random numbers with columns A, B, C, D...
And instead use a meaningful example, which makes it easier to understand the
concept. Unless required by the example, use names of animals, to keep examples
@@ -877,7 +877,7 @@ be tricky. Here are some attention points:
the actual error only the error name is sufficient.

* If there is a small part of the result that can vary (e.g. a hash in an object
-  represenation), you can use ``...`` to represent this part.
+  representation), you can use ``...`` to represent this part.

If you want to show that ``s.plot()`` returns a matplotlib AxesSubplot object,
this will fail the doctest ::
8 changes: 4 additions & 4 deletions doc/source/cookbook.rst
@@ -286,7 +286,7 @@ New Columns
df = pd.DataFrame(
{'AAA' : [1,1,1,2,2,2,3,3], 'BBB' : [2,1,3,4,5,1,2,3]}); df

-Method 1 : idxmin() to get the index of the mins
+Method 1 : idxmin() to get the index of the minimums

.. ipython:: python

@@ -664,7 +664,7 @@ The :ref:`Pivot <reshaping.pivot>` docs.
`Plot pandas DataFrame with year over year data
<http://stackoverflow.com/questions/30379789/plot-pandas-data-frame-with-year-over-year-data>`__

-To create year and month crosstabulation:
+To create year and month cross tabulation:

.. ipython:: python

@@ -723,7 +723,7 @@ Rolling Apply to multiple columns where function returns a Scalar (Volume Weight
s = pd.concat([ (pd.Series(vwap(df.iloc[i:i+window]), index=[df.index[i+window]])) for i in range(len(df)-window) ]);
s.round(2)

-Timeseries
+Time series

Contributor: Add timeseries to the wordlist?

----------

`Between times
@@ -1029,7 +1029,7 @@ Skip row between header and data
01.01.1990 05:00;21;11;12;13
"""

-Option 1: pass rows explicitly to skiprows
+Option 1: pass rows explicitly to skip rows

Member: Have to be careful when changing headers - I think you now need another double quote on the line below for proper rendering.

""""""""""""""""""""""""""""""""""""""""""

.. ipython:: python
6 changes: 3 additions & 3 deletions doc/source/dsintro.rst
@@ -882,7 +882,7 @@ dictionary of DataFrames as above, and the following named parameters:
:header: "Parameter", "Default", "Description"
:widths: 10, 10, 40

-  intersect, ``False``, drops elements whose indices do not align
+  intersect, ``False``, drops elements whose indexes do not align
orient, ``items``, use ``minor`` to use DataFrames' columns as panel items

For example, compare to the construction above:
@@ -1014,7 +1014,7 @@ Deprecate Panel
Over the last few years, pandas has increased in both breadth and depth, with new features,
datatype support, and manipulation routines. As a result, supporting efficient indexing and functional
routines for ``Series``, ``DataFrame`` and ``Panel`` has contributed to an increasingly fragmented and
-difficult-to-understand codebase.
+difficult-to-understand code base.

The 3-D structure of a ``Panel`` is much less common for many types of data analysis,
than the 1-D of the ``Series`` or the 2-D of the ``DataFrame``. Going forward it makes sense for
@@ -1023,7 +1023,7 @@ pandas to focus on these areas exclusively.
Oftentimes, one can simply use a MultiIndex ``DataFrame`` for easily working with higher dimensional data.

In addition, the ``xarray`` package was built from the ground up, specifically in order to
-support the multi-dimensional analysis that is one of ``Panel`` s main usecases.
+support the multi-dimensional analysis that is one of ``Panel`` s main use cases.
`Here is a link to the xarray panel-transition documentation <http://xarray.pydata.org/en/stable/pandas.html#panel-transition>`__.

.. ipython:: python
6 changes: 3 additions & 3 deletions doc/source/ecosystem.rst
@@ -184,8 +184,8 @@ and metadata disseminated in
`SDMX <http://www.sdmx.org>`_ 2.1, an ISO-standard
widely used by institutions such as statistics offices, central banks,
and international organisations. pandaSDMX can expose datasets and related
-structural metadata including dataflows, code-lists,
-and datastructure definitions as pandas Series
+structural metadata including data flows, code-lists,
+and data structure definitions as pandas Series
or multi-indexed DataFrames.

`fredapi <https://github.com/mortada/fredapi>`__
@@ -260,7 +260,7 @@ Data validation
`Engarde <http://engarde.readthedocs.io/en/latest/>`__
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

-Engarde is a lightweight library used to explicitly state your assumptions abour your datasets
+Engarde is a lightweight library used to explicitly state your assumptions about your datasets
and check that they're *actually* true.

.. _ecosystem.extensions:
6 changes: 3 additions & 3 deletions doc/source/enhancingperf.rst
@@ -32,7 +32,7 @@ Cython (Writing C extensions for pandas)
----------------------------------------

For many use cases writing pandas in pure Python and NumPy is sufficient. In some
-computationally heavy applications however, it can be possible to achieve sizeable
+computationally heavy applications however, it can be possible to achieve sizable
speed-ups by offloading work to `cython <http://cython.org/>`__.

This tutorial assumes you have refactored as much as possible in Python, for example
@@ -601,7 +601,7 @@ on the original ``DataFrame`` or return a copy with the new column.

For backwards compatibility, ``inplace`` defaults to ``True`` if not
specified. This will change in a future version of pandas - if your
-   code depends on an inplace assignment you should update to explicitly
+   code depends on an in place assignment you should update to explicitly
set ``inplace=True``.

.. ipython:: python
@@ -806,7 +806,7 @@ truncate any strings that are more than 60 characters in length. Second, we
can't pass ``object`` arrays to ``numexpr`` thus string comparisons must be
evaluated in Python space.

-The upshot is that this *only* applies to object-dtype'd expressions. So, if
+The upshot is that this *only* applies to object-dtype expressions. So, if
you have an expression--for example

.. ipython:: python
2 changes: 1 addition & 1 deletion doc/source/extending.rst
@@ -167,7 +167,7 @@ you can retain subclasses through ``pandas`` data manipulations.

There are 3 constructor properties to be defined:

-- ``_constructor``: Used when a manipulation result has the same dimesions as the original.
+- ``_constructor``: Used when a manipulation result has the same dimensions as the original.
- ``_constructor_sliced``: Used when a manipulation result has one lower dimension(s) as the original, such as ``DataFrame`` single columns slicing.
- ``_constructor_expanddim``: Used when a manipulation result has one higher dimension as the original, such as ``Series.to_frame()`` and ``DataFrame.to_panel()``.

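
A brief sketch of the three constructor hooks in that list (hypothetical subclass names, following the pattern this page documents):

    import pandas as pd

    class MySeries(pd.Series):
        @property
        def _constructor(self):
            return MySeries          # same-dimension results stay MySeries

    class MyDataFrame(pd.DataFrame):
        @property
        def _constructor(self):
            return MyDataFrame       # same-dimension results stay MyDataFrame

        @property
        def _constructor_sliced(self):
            return MySeries          # one dimension lower, e.g. a column slice

    MyDataFrame({'a': [1, 2]})['a']  # returns a MySeries, not a plain Series
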
10 changes: 5 additions & 5 deletions doc/source/groupby.rst
@@ -942,7 +942,7 @@ that is itself a series, and possibly upcast the result to a DataFrame:

``apply`` can act as a reducer, transformer, *or* filter function, depending on exactly what is passed to it.
So depending on the path taken, and exactly what you are grouping. Thus the grouped columns(s) may be included in
-the output as well as set the indices.
+the output as well as set the indexes.

.. warning::

@@ -994,7 +994,7 @@ is only interesting over one column (here ``colname``), it may be filtered
Handling of (un)observed Categorical values
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

-When using a ``Categorical`` grouper (as a single grouper, or as part of multipler groupers), the ``observed`` keyword
+When using a ``Categorical`` grouper (as a single grouper, or as part of multiple groupers), the ``observed`` keyword
controls whether to return a cartesian product of all possible groupers values (``observed=False``) or only those
that are observed groupers (``observed=True``).

@@ -1010,7 +1010,7 @@ Show only the observed values:

pd.Series([1, 1, 1]).groupby(pd.Categorical(['a', 'a', 'a'], categories=['a', 'b']), observed=True).count()

-The returned dtype of the grouped will *always* include *all* of the catergories that were grouped.
+The returned dtype of the grouped will *always* include *all* of the categories that were grouped.

.. ipython:: python

@@ -1328,11 +1328,11 @@ Groupby by Indexer to 'resample' data

Resampling produces new hypothetical samples (resamples) from already existing observed data or from a model that generates data. These new samples are similar to the pre-existing samples.

-In order to resample to work on indices that are non-datetimelike, the following procedure can be utilized.
+In order to resample to work on indexes that are non-datetimelike, the following procedure can be utilized.

In the following examples, **df.index // 5** returns a binary array which is used to determine what gets selected for the groupby operation.

-.. note:: The below example shows how we can downsample by consolidation of samples into fewer samples. Here by using **df.index // 5**, we are aggregating the samples in bins. By applying **std()** function, we aggregate the information contained in many samples into a small subset of values which is their standard deviation thereby reducing the number of samples.
+.. note:: The below example shows how we can down-sample by consolidation of samples into fewer samples. Here by using **df.index // 5**, we are aggregating the samples in bins. By applying **std()** function, we aggregate the information contained in many samples into a small subset of values which is their standard deviation thereby reducing the number of samples.

Member: Think downsample should be an exception.


.. ipython:: python

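
The df.index // 5 trick in that note is easy to sketch (hypothetical data):

    import numpy as np
    import pandas as pd

    # Integer-dividing a RangeIndex by 5 yields bin labels 0,0,0,0,0,1,1,1,1,1;
    # grouping on them aggregates each block of five samples down to one row.
    df = pd.DataFrame({'value': np.arange(10)})
    df.groupby(df.index // 5).std()
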
2 changes: 1 addition & 1 deletion doc/source/indexing.rst
@@ -700,7 +700,7 @@ Current Behavior
Reindexing
~~~~~~~~~~

-The idiomatic way to achieve selecting potentially not-found elmenents is via ``.reindex()``. See also the section on :ref:`reindexing <basics.reindexing>`.
+The idiomatic way to achieve selecting potentially not-found elements is via ``.reindex()``. See also the section on :ref:`reindexing <basics.reindexing>`.

.. ipython:: python

4 changes: 2 additions & 2 deletions doc/source/install.rst
@@ -31,7 +31,7 @@ PyPI and through conda.
Starting **January 1, 2019**, all releases will be Python 3 only.

If there are people interested in continued support for Python 2.7 past December
-31, 2018 (either backporting bugfixes or funding) please reach out to the
+31, 2018 (either backporting bug fixes or funding) please reach out to the
maintainers on the issue tracker.

For more information, see the `Python 3 statement`_ and the `Porting to Python 3 guide`_.
@@ -199,7 +199,7 @@ Running the test suite
----------------------

pandas is equipped with an exhaustive set of unit tests, covering about 97% of
-the codebase as of this writing. To run it on your machine to verify that
+the code base as of this writing. To run it on your machine to verify that
everything is working (and that you have all of the dependencies, soft and hard,
installed), make sure you have `pytest
<http://doc.pytest.org/en/latest/>`__ and run: