
Commit 615d007 (parent d623ffd)

Add sphinx spelling extension, fix typos and add wordlist

19 files changed: +294 -61 lines

doc/make.py (+7 -2)

@@ -224,8 +224,9 @@ def _sphinx_build(self, kind):
         --------
         >>> DocBuilder(num_jobs=4)._sphinx_build('html')
         """
-        if kind not in ('html', 'latex'):
-            raise ValueError('kind must be html or latex, not {}'.format(kind))
+        if kind not in ('html', 'latex', 'spelling'):
+            raise ValueError('kind must be html, latex or '
+                             'spelling, not {}'.format(kind))
 
         self._run_os('sphinx-build',
                      '-j{}'.format(self.num_jobs),
@@ -304,6 +305,10 @@ def zip_html(self):
                      '-q',
                      *fnames)
 
+    def spellcheck(self):
+        """Spell check the documentation."""
+        self._sphinx_build('spelling')
+
 
 def main():
     cmds = [method for method in dir(DocBuilder) if not method.startswith('_')]

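Because ``main()`` exposes every public ``DocBuilder`` method as a command, the new ``spellcheck`` method becomes callable from the command line and simply asks Sphinx for the ``spelling`` builder that sphinxcontrib-spelling registers. A rough standalone sketch of the equivalent invocation (the helper name and paths here are illustrative, not part of this commit):

    # Approximate equivalent of the new spellcheck command: run the 'spelling'
    # builder registered by sphinxcontrib-spelling. Paths are illustrative.
    import subprocess

    def run_spellcheck(source_dir='doc/source', build_dir='doc/build'):
        subprocess.check_call([
            'sphinx-build',
            '-b', 'spelling',                        # builder added by sphinxcontrib-spelling
            '-d', '{}/doctrees'.format(build_dir),   # cached doctrees
            source_dir,                              # .rst sources
            '{}/spelling'.format(build_dir),         # per-document *.spelling reports land here
        ])

    if __name__ == '__main__':
        run_spellcheck()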

doc/source/10min.rst (+2 -2)

@@ -645,7 +645,7 @@ the quarter end:
    ts.index = (prng.asfreq('M', 'e') + 1).asfreq('H', 's') + 9
    ts.head()
 
-Categoricals
+Categorical
 ------------
 
 pandas can include categorical data in a ``DataFrame``. For full docs, see the
@@ -663,7 +663,7 @@ Convert the raw grades to a categorical data type.
    df["grade"]
 
 Rename the categories to more meaningful names (assigning to
-``Series.cat.categories`` is inplace!).
+``Series.cat.categories`` is in place!).
 
 .. ipython:: python

doc/source/advanced.rst (+8 -8)

@@ -182,7 +182,7 @@ For example:
    df[['foo','qux']].columns # sliced
 
 This is done to avoid a recomputation of the levels in order to make slicing
-highly performant. If you want to see only the used levels, you can use the
+highly efficient. If you want to see only the used levels, you can use the
 :func:`MultiIndex.get_level_values` method.
 
 .. ipython:: python
@@ -387,7 +387,7 @@ Furthermore you can *set* the values using the following methods.
    df2.loc(axis=0)[:, :, ['C1', 'C3']] = -10
    df2
 
-You can use a right-hand-side of an alignable object as well.
+You can use a right-hand-side of an align object as well.
 
 .. ipython:: python
 
@@ -559,7 +559,7 @@ return a copy of the data rather than a view:
 
 .. _advanced.unsorted:
 
-Furthermore if you try to index something that is not fully lexsorted, this can raise:
+Furthermore if you try to index something that is not fully lex-sorted, this can raise:
 
 .. code-block:: ipython
 
@@ -593,7 +593,7 @@ Take Methods
 
 Similar to NumPy ndarrays, pandas Index, Series, and DataFrame also provides
 the ``take`` method that retrieves elements along a given axis at the given
-indices. The given indices must be either a list or an ndarray of integer
+indexes. The given indexes must be either a list or an ndarray of integer
 index positions. ``take`` will also accept negative integers as relative positions to the end of the object.
 
 .. ipython:: python
@@ -611,7 +611,7 @@ index positions. ``take`` will also accept negative integers as relative positio
    ser.iloc[positions]
    ser.take(positions)
 
-For DataFrames, the given indices should be a 1d list or ndarray that specifies
+For DataFrames, the given indexes should be a 1d list or ndarray that specifies
 row or column positions.
 
 .. ipython:: python
@@ -623,7 +623,7 @@ row or column positions.
    frm.take([0, 2], axis=1)
 
 It is important to note that the ``take`` method on pandas objects are not
-intended to work on boolean indices and may return unexpected results.
+intended to work on boolean indexes and may return unexpected results.
 
 .. ipython:: python
 
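As a quick illustration of the positional semantics of ``take`` discussed in these hunks, here is a small sketch with made-up data (not part of the diff):

    import pandas as pd

    ser = pd.Series([10, 20, 30, 40])
    positions = [0, 3]

    ser.take(positions)   # positions 0 and 3 -> values 10 and 40
    ser.iloc[positions]   # same selection via iloc
    ser.take([-1])        # negative positions count from the end -> 40
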
@@ -711,7 +711,7 @@ order is ``cab``).
 
    df2.sort_index()
 
-Groupby operations on the index will preserve the index nature as well.
+Group by operations on the index will preserve the index nature as well.
 
 .. ipython:: python
 
@@ -990,7 +990,7 @@ On the other hand, if the index is not monotonic, then both slice bounds must be
    KeyError: 'Cannot get right slice bound for non-unique label: 3'
 
 :meth:`Index.is_monotonic_increasing` and :meth:`Index.is_monotonic_decreasing` only check that
-an index is weakly monotonic. To check for strict montonicity, you can combine one of those with
+an index is weakly monotonic. To check for strict monotonicity, you can combine one of those with
 :meth:`Index.is_unique`
 
 .. ipython:: python

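The weak-versus-strict distinction in the last hunk is easy to see on a small index (illustrative values only):

    import pandas as pd

    idx = pd.Index([1, 2, 2, 3])
    idx.is_monotonic_increasing                    # True: repeated values are allowed
    idx.is_monotonic_increasing and idx.is_unique  # False: not strictly increasing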

doc/source/basics.rst (+1 -1)

@@ -593,7 +593,7 @@ categorical columns:
    frame = pd.DataFrame({'a': ['Yes', 'Yes', 'No', 'No'], 'b': range(4)})
    frame.describe()
 
-This behaviour can be controlled by providing a list of types as ``include``/``exclude``
+This behavior can be controlled by providing a list of types as ``include``/``exclude``
 arguments. The special value ``all`` can also be used:
 
 .. ipython:: python

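A short illustration of the ``include``/``exclude`` behavior, reusing the frame from the surrounding example:

    import pandas as pd

    frame = pd.DataFrame({'a': ['Yes', 'Yes', 'No', 'No'], 'b': range(4)})
    frame.describe()                    # numeric column 'b' only
    frame.describe(include=['object'])  # object column 'a' only
    frame.describe(include='all')       # both, with NaN where a statistic does not apply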

doc/source/categorical.rst (+2 -2)

@@ -370,7 +370,7 @@ Renaming categories is done by assigning new values to the
 
 .. note::
 
-    Be aware that assigning new categories is an inplace operation, while most other operations
+    Be aware that assigning new categories is an in place operation, while most other operations
     under ``Series.cat`` per default return a new ``Series`` of dtype `category`.
 
 Categories must be unique or a `ValueError` is raised:
@@ -847,7 +847,7 @@ the categories being combined.
 
 By default, the resulting categories will be ordered as
 they appear in the data. If you want the categories to
-be lexsorted, use ``sort_categories=True`` argument.
+be lex-sorted, use ``sort_categories=True`` argument.
 
 .. ipython:: python

doc/source/comparison_with_sql.rst (+2 -2)

@@ -228,9 +228,9 @@ Grouping by more than one column is done by passing a list of columns to the
 JOIN
 ----
 JOINs can be performed with :meth:`~pandas.DataFrame.join` or :meth:`~pandas.merge`. By default,
-:meth:`~pandas.DataFrame.join` will join the DataFrames on their indices. Each method has
+:meth:`~pandas.DataFrame.join` will join the DataFrames on their indexes. Each method has
 parameters allowing you to specify the type of join to perform (LEFT, RIGHT, INNER, FULL) or the
-columns to join on (column names or indices).
+columns to join on (column names or indexes).
 
 .. ipython:: python

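A small sketch of the two join styles mentioned in this hunk, with made-up frames:

    import pandas as pd

    left = pd.DataFrame({'key': ['A', 'B', 'C'], 'lval': [1, 2, 3]})
    right = pd.DataFrame({'key': ['A', 'B', 'D'], 'rval': [4, 5, 6]})

    pd.merge(left, right, on='key', how='inner')        # join on a column
    left.set_index('key').join(right.set_index('key'))  # join on the indexes (left join by default)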

doc/source/conf.py (+5)

@@ -73,10 +73,15 @@
     'sphinx.ext.ifconfig',
     'sphinx.ext.linkcode',
     'nbsphinx',
+    'sphinxcontrib.spelling'
 ]
 
 exclude_patterns = ['**.ipynb_checkpoints']
 
+spelling_word_list_filename = 'spelling_wordlist.txt'
+spelling_show_suggestions = True
+spelling_ignore_pypi_package_names = True
+
 with open("index.rst") as f:
     index_rst_lines = f.readlines()

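For orientation, the three ``spelling_*`` values are configuration options defined by sphinxcontrib-spelling: a project word list of accepted terms, printing of suggestions next to each unknown word, and ignoring names of packages published on PyPI. A minimal, hypothetical ``conf.py`` wiring them up for a toy project (illustrative settings, not pandas' full configuration):

    # Minimal Sphinx conf.py sketch for a hypothetical project using the
    # sphinxcontrib-spelling checker; values are illustrative.
    project = 'example-docs'

    extensions = [
        'sphinxcontrib.spelling',  # provides the 'spelling' builder
    ]

    # Plain-text file next to conf.py, one accepted word per line.
    spelling_word_list_filename = 'spelling_wordlist.txt'

    # Show "did you mean" suggestions alongside each unknown word.
    spelling_show_suggestions = True

    # Do not flag names of packages published on PyPI.
    spelling_ignore_pypi_package_names = True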

doc/source/contributing_docstring.rst (+5 -5)

@@ -103,7 +103,7 @@ left before or after the docstring. The text starts in the next line after the
 opening quotes. The closing quotes have their own line
 (meaning that they are not at the end of the last sentence).
 
-In rare occasions reST styles like bold text or itallics will be used in
+In rare occasions reST styles like bold text or italics will be used in
 docstrings, but is it common to have inline code, which is presented between
 backticks. It is considered inline code:
 
@@ -513,7 +513,7 @@ instead of at the beginning, it is good to let the users know about it.
 To give an intuition on what can be considered related, here there are some
 examples:
 
-* ``loc`` and ``iloc``, as they do the same, but in one case providing indices
+* ``loc`` and ``iloc``, as they do the same, but in one case providing indexes
   and in the other positions
 * ``max`` and ``min``, as they do the opposite
 * ``iterrows``, ``itertuples`` and ``iteritems``, as it is easy that a user
@@ -692,7 +692,7 @@ the standard library go first, followed by third-party libraries (like
 matplotlib).
 
 When illustrating examples with a single ``Series`` use the name ``s``, and if
-illustrating with a single ``DataFrame`` use the name ``df``. For indices,
+illustrating with a single ``DataFrame`` use the name ``df``. For indexes,
 ``idx`` is the preferred name. If a set of homogeneous ``Series`` or
 ``DataFrame`` is used, name them ``s1``, ``s2``, ``s3``... or ``df1``,
 ``df2``, ``df3``... If the data is not homogeneous, and more than one structure
@@ -706,7 +706,7 @@ than 5, to show the example with the default values. If doing the ``mean``, we
 could use something like ``[1, 2, 3]``, so it is easy to see that the value
 returned is the mean.
 
-For more complex examples (groupping for example), avoid using data without
+For more complex examples (grouping for example), avoid using data without
 interpretation, like a matrix of random numbers with columns A, B, C, D...
 And instead use a meaningful example, which makes it easier to understand the
 concept. Unless required by the example, use names of animals, to keep examples
@@ -877,7 +877,7 @@ be tricky. Here are some attention points:
   the actual error only the error name is sufficient.
 
 * If there is a small part of the result that can vary (e.g. a hash in an object
-  represenation), you can use ``...`` to represent this part.
+  representation), you can use ``...`` to represent this part.
 
 If you want to show that ``s.plot()`` returns a matplotlib AxesSubplot object,
 this will fail the doctest ::

doc/source/cookbook.rst (+4 -4)

@@ -286,7 +286,7 @@ New Columns
    df = pd.DataFrame(
         {'AAA' : [1,1,1,2,2,2,3,3], 'BBB' : [2,1,3,4,5,1,2,3]}); df
 
-Method 1 : idxmin() to get the index of the mins
+Method 1 : idxmin() to get the index of the minimums
 
 .. ipython:: python
 
@@ -664,7 +664,7 @@ The :ref:`Pivot <reshaping.pivot>` docs.
 `Plot pandas DataFrame with year over year data
 <http://stackoverflow.com/questions/30379789/plot-pandas-data-frame-with-year-over-year-data>`__
 
-To create year and month crosstabulation:
+To create year and month cross tabulation:
 
 .. ipython:: python
 
@@ -723,7 +723,7 @@ Rolling Apply to multiple columns where function returns a Scalar (Volume Weight
    s = pd.concat([ (pd.Series(vwap(df.iloc[i:i+window]), index=[df.index[i+window]])) for i in range(len(df)-window) ]);
    s.round(2)
 
-Timeseries
+Time series
 ----------
 
 `Between times
@@ -1029,7 +1029,7 @@ Skip row between header and data
    01.01.1990 05:00;21;11;12;13
    """
 
-Option 1: pass rows explicitly to skiprows
+Option 1: pass rows explicitly to skip rows
 """"""""""""""""""""""""""""""""""""""""""
 
 .. ipython:: python

doc/source/dsintro.rst (+3 -3)

@@ -882,7 +882,7 @@ dictionary of DataFrames as above, and the following named parameters:
    :header: "Parameter", "Default", "Description"
    :widths: 10, 10, 40
 
-   intersect, ``False``, drops elements whose indices do not align
+   intersect, ``False``, drops elements whose indexes do not align
    orient, ``items``, use ``minor`` to use DataFrames' columns as panel items
 
 For example, compare to the construction above:
@@ -1014,7 +1014,7 @@ Deprecate Panel
 Over the last few years, pandas has increased in both breadth and depth, with new features,
 datatype support, and manipulation routines. As a result, supporting efficient indexing and functional
 routines for ``Series``, ``DataFrame`` and ``Panel`` has contributed to an increasingly fragmented and
-difficult-to-understand codebase.
+difficult-to-understand code base.
 
 The 3-D structure of a ``Panel`` is much less common for many types of data analysis,
 than the 1-D of the ``Series`` or the 2-D of the ``DataFrame``. Going forward it makes sense for
@@ -1023,7 +1023,7 @@ pandas to focus on these areas exclusively.
 Oftentimes, one can simply use a MultiIndex ``DataFrame`` for easily working with higher dimensional data.
 
 In addition, the ``xarray`` package was built from the ground up, specifically in order to
-support the multi-dimensional analysis that is one of ``Panel`` s main usecases.
+support the multi-dimensional analysis that is one of ``Panel`` s main use cases.
 `Here is a link to the xarray panel-transition documentation <http://xarray.pydata.org/en/stable/pandas.html#panel-transition>`__.
 
 .. ipython:: python

doc/source/ecosystem.rst (+3 -3)

@@ -184,8 +184,8 @@ and metadata disseminated in
 `SDMX <http://www.sdmx.org>`_ 2.1, an ISO-standard
 widely used by institutions such as statistics offices, central banks,
 and international organisations. pandaSDMX can expose datasets and related
-structural metadata including dataflows, code-lists,
-and datastructure definitions as pandas Series
+structural metadata including data flows, code-lists,
+and data structure definitions as pandas Series
 or multi-indexed DataFrames.
 
 `fredapi <https://github.com/mortada/fredapi>`__
@@ -260,7 +260,7 @@ Data validation
 `Engarde <http://engarde.readthedocs.io/en/latest/>`__
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 
-Engarde is a lightweight library used to explicitly state your assumptions abour your datasets
+Engarde is a lightweight library used to explicitly state your assumptions about your datasets
 and check that they're *actually* true.
 
 .. _ecosystem.extensions:

doc/source/enhancingperf.rst (+3 -3)

@@ -32,7 +32,7 @@ Cython (Writing C extensions for pandas)
 ----------------------------------------
 
 For many use cases writing pandas in pure Python and NumPy is sufficient. In some
-computationally heavy applications however, it can be possible to achieve sizeable
+computationally heavy applications however, it can be possible to achieve sizable
 speed-ups by offloading work to `cython <http://cython.org/>`__.
 
 This tutorial assumes you have refactored as much as possible in Python, for example
@@ -601,7 +601,7 @@ on the original ``DataFrame`` or return a copy with the new column.
 
 For backwards compatibility, ``inplace`` defaults to ``True`` if not
 specified. This will change in a future version of pandas - if your
-code depends on an inplace assignment you should update to explicitly
+code depends on an in place assignment you should update to explicitly
 set ``inplace=True``.
 
 .. ipython:: python
@@ -806,7 +806,7 @@ truncate any strings that are more than 60 characters in length. Second, we
 can't pass ``object`` arrays to ``numexpr`` thus string comparisons must be
 evaluated in Python space.
 
-The upshot is that this *only* applies to object-dtype'd expressions. So, if
+The upshot is that this *only* applies to object-dtype expressions. So, if
 you have an expression--for example
 
 .. ipython:: python

doc/source/extending.rst (+1 -1)

@@ -167,7 +167,7 @@ you can retain subclasses through ``pandas`` data manipulations.
 
 There are 3 constructor properties to be defined:
 
-- ``_constructor``: Used when a manipulation result has the same dimesions as the original.
+- ``_constructor``: Used when a manipulation result has the same dimensions as the original.
 - ``_constructor_sliced``: Used when a manipulation result has one lower dimension(s) as the original, such as ``DataFrame`` single columns slicing.
 - ``_constructor_expanddim``: Used when a manipulation result has one higher dimension as the original, such as ``Series.to_frame()`` and ``DataFrame.to_panel()``.

doc/source/groupby.rst (+5 -5)

@@ -942,7 +942,7 @@ that is itself a series, and possibly upcast the result to a DataFrame:
 
 ``apply`` can act as a reducer, transformer, *or* filter function, depending on exactly what is passed to it.
 So depending on the path taken, and exactly what you are grouping. Thus the grouped columns(s) may be included in
-the output as well as set the indices.
+the output as well as set the indexes.
 
 .. warning::
 
@@ -994,7 +994,7 @@ is only interesting over one column (here ``colname``), it may be filtered
 Handling of (un)observed Categorical values
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 
-When using a ``Categorical`` grouper (as a single grouper, or as part of multipler groupers), the ``observed`` keyword
+When using a ``Categorical`` grouper (as a single grouper, or as part of multiple groupers), the ``observed`` keyword
 controls whether to return a cartesian product of all possible groupers values (``observed=False``) or only those
 that are observed groupers (``observed=True``).
 
@@ -1010,7 +1010,7 @@ Show only the observed values:
 
    pd.Series([1, 1, 1]).groupby(pd.Categorical(['a', 'a', 'a'], categories=['a', 'b']), observed=True).count()
 
-The returned dtype of the grouped will *always* include *all* of the catergories that were grouped.
+The returned dtype of the grouped will *always* include *all* of the categories that were grouped.
 
 .. ipython:: python
 
@@ -1328,11 +1328,11 @@ Groupby by Indexer to 'resample' data
 
 Resampling produces new hypothetical samples (resamples) from already existing observed data or from a model that generates data. These new samples are similar to the pre-existing samples.
 
-In order to resample to work on indices that are non-datetimelike, the following procedure can be utilized.
+In order to resample to work on indexes that are non-datetimelike, the following procedure can be utilized.
 
 In the following examples, **df.index // 5** returns a binary array which is used to determine what gets selected for the groupby operation.
 
-.. note:: The below example shows how we can downsample by consolidation of samples into fewer samples. Here by using **df.index // 5**, we are aggregating the samples in bins. By applying **std()** function, we aggregate the information contained in many samples into a small subset of values which is their standard deviation thereby reducing the number of samples.
+.. note:: The below example shows how we can down-sample by consolidation of samples into fewer samples. Here by using **df.index // 5**, we are aggregating the samples in bins. By applying **std()** function, we aggregate the information contained in many samples into a small subset of values which is their standard deviation thereby reducing the number of samples.
 
 .. ipython:: python

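To make the ``df.index // 5`` bucketing in the last hunk concrete, here is a tiny self-contained sketch with made-up data (not part of the diff):

    import numpy as np
    import pandas as pd

    df = pd.DataFrame(np.random.randn(10, 2), columns=['A', 'B'])

    # df.index // 5 maps rows 0-4 to bin 0 and rows 5-9 to bin 1, so std()
    # condenses each run of five samples into a single row per column.
    df.groupby(df.index // 5).std()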

doc/source/indexing.rst (+1 -1)

@@ -700,7 +700,7 @@ Current Behavior
 Reindexing
 ~~~~~~~~~~
 
-The idiomatic way to achieve selecting potentially not-found elmenents is via ``.reindex()``. See also the section on :ref:`reindexing <basics.reindexing>`.
+The idiomatic way to achieve selecting potentially not-found elements is via ``.reindex()``. See also the section on :ref:`reindexing <basics.reindexing>`.
 
 .. ipython:: python

doc/source/install.rst (+2 -2)

@@ -31,7 +31,7 @@ PyPI and through conda.
 Starting **January 1, 2019**, all releases will be Python 3 only.
 
 If there are people interested in continued support for Python 2.7 past December
-31, 2018 (either backporting bugfixes or funding) please reach out to the
+31, 2018 (either backporting bug fixes or funding) please reach out to the
 maintainers on the issue tracker.
 
 For more information, see the `Python 3 statement`_ and the `Porting to Python 3 guide`_.
@@ -199,7 +199,7 @@ Running the test suite
 ----------------------
 
 pandas is equipped with an exhaustive set of unit tests, covering about 97% of
-the codebase as of this writing. To run it on your machine to verify that
+the code base as of this writing. To run it on your machine to verify that
 everything is working (and that you have all of the dependencies, soft and hard,
 installed), make sure you have `pytest
 <http://doc.pytest.org/en/latest/>`__ and run:
