DOC: Add sphinx spelling extension #21109


Merged

merged 9 commits into pandas-dev:master on Jun 7, 2018

Conversation

FabioRosado
Contributor

@FabioRosado FabioRosado commented May 17, 2018

This is my first time contributing to a project this big, so I hope everything is okay. I am still working on fixing bugs and polishing a few things, but here is what I have done so far:

  • fixed typos
  • updated a wordlist txt file with words to be ignored by the spellchecker
  • added the spellcheck method to the DocBuilder class (should this be named spelling?)

I've re-used the _sphinx_build method to run the spelling command; to do this I had to update _sphinx_build to include 'spelling' as a kind. I'd like to know if this is the best approach or if I should just replicate the code.

I have added a few configuration options to get better results when using the spellcheck: the first was to use the wordlist text file, the second was to ignore known PyPI package names, and the last was to show suggestions for misspelt words.

At the moment the spellcheck runs when you type the command python make.py spellcheck, which calls the spellcheck method. I haven't yet figured out how to fail the build when there are misspellings, as issue 21079 suggested.
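The flow described above could be sketched like this (a minimal, hypothetical sketch: the names DocBuilder and _sphinx_build echo pandas' doc/make.py, but the body here is illustrative, not the actual pandas code):

```python
import subprocess


class DocBuilder:
    """Hypothetical sketch of a doc-build helper with a 'spelling' kind."""

    def __init__(self, source_dir="source", build_dir="build"):
        self.source_dir = source_dir
        self.build_dir = build_dir

    def _sphinx_build(self, kind):
        # 'spelling' is accepted alongside the usual builder kinds
        if kind not in ("html", "latex", "spelling"):
            raise ValueError(f"kind must be html, latex or spelling, not {kind}")
        cmd = ["sphinx-build", "-b", kind,
               self.source_dir, f"{self.build_dir}/{kind}"]
        return subprocess.call(cmd)

    def spellcheck(self):
        # re-use the generic builder rather than replicating the code
        return self._sphinx_build("spelling")
```

The point of routing through `_sphinx_build` is that 'spelling' behaves like any other Sphinx builder kind, so no code needs to be duplicated.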


Contributor

@TomAugspurger TomAugspurger left a comment

Thanks! Just gave a quick skim.

@@ -645,7 +645,7 @@ the quarter end:
ts.index = (prng.asfreq('M', 'e') + 1).asfreq('H', 's') + 9
ts.head()

-Categoricals
+Categorical
 ------------
Contributor

Let's keep this, to not break links.

@@ -387,7 +387,7 @@ Furthermore you can *set* the values using the following methods.
df2.loc(axis=0)[:, :, ['C1', 'C3']] = -10
df2

-You can use a right-hand-side of an alignable object as well.
+You can use a right-hand-side of an align object as well.
Contributor

I think alignable is OK.

@@ -593,7 +593,7 @@ Take Methods

Similar to NumPy ndarrays, pandas Index, Series, and DataFrame also provides
the ``take`` method that retrieves elements along a given axis at the given
-indices. The given indices must be either a list or an ndarray of integer
+indexes. The given indexes must be either a list or an ndarray of integer
Contributor

Prefer indices, since that's the argument name to take.

@@ -711,7 +711,7 @@ order is ``cab``).

df2.sort_index()

-Groupby operations on the index will preserve the index nature as well.
+Group by operations on the index will preserve the index nature as well.
Contributor

Can probably add groupby to the wordlist.

@@ -723,7 +723,7 @@ Rolling Apply to multiple columns where function returns a Scalar (Volume Weight
s = pd.concat([ (pd.Series(vwap(df.iloc[i:i+window]), index=[df.index[i+window]])) for i in range(len(df)-window) ]);
s.round(2)

Timeseries
Contributor

Add timeseries to the wordlist?

@TomAugspurger
Contributor

Do you have a rough measure of how long this takes to run on the docs?

@codecov

codecov bot commented May 18, 2018

Codecov Report

Merging #21109 into master will not change coverage.
The diff coverage is 100%.

Impacted file tree graph

@@           Coverage Diff           @@
##           master   #21109   +/-   ##
=======================================
  Coverage   91.85%   91.85%           
=======================================
  Files         153      153           
  Lines       49546    49546           
=======================================
  Hits        45509    45509           
  Misses       4037     4037
Flag Coverage Δ
#multiple 90.25% <100%> (ø) ⬆️
#single 41.87% <33.33%> (ø) ⬆️
Impacted Files Coverage Δ
pandas/core/arrays/categorical.py 95.69% <ø> (ø) ⬆️
pandas/core/groupby/groupby.py 92.66% <ø> (ø) ⬆️
pandas/core/generic.py 96.12% <ø> (ø) ⬆️
pandas/core/reshape/merge.py 94.25% <ø> (ø) ⬆️
pandas/core/frame.py 97.22% <ø> (ø) ⬆️
pandas/io/json/table_schema.py 98.29% <ø> (ø) ⬆️
pandas/core/strings.py 98.62% <ø> (ø) ⬆️
pandas/core/tools/datetimes.py 84.98% <ø> (ø) ⬆️
pandas/core/arrays/base.py 83.95% <ø> (ø) ⬆️
pandas/core/panel.py 97.87% <ø> (ø) ⬆️
... and 15 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 4274b84...d929098. Read the comment docs.

@FabioRosado
Contributor Author

I apologise for taking so long to reply, but this took me longer than I expected. I've added all the words to the wordlist (this file is now huge) and corrected some things. I will need your help regarding some changes that I have made:

  • changed indicies to indices, since indices is already the plural form (I only changed comments and docstrings; some variables use indicies, but I didn't want to touch them unless you agree)
  • changed performant to efficient; this term was used plenty of times and I'm not sure if changing it is desirable (if not, I'll revert the changes)

The spelling extension is funny in that it divides words on special characters (anything like - or '), so I had to add some words that don't exist in the dictionary but make sense when attached to others (hope this makes sense?).

I've also given it some thought and I think we could fail the build by raising an exception if there are any words in this output.txt file. The idea is to:

  • open the file
  • read its lines
  • pass if empty

Not sure if this is a smart way to approach the problem though.
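A sketch of that check, assuming the spelling builder writes its findings to an output.txt under the build directory (the path and the choice of exception here are illustrative):

```python
import os


def fail_on_misspellings(output_path="build/spelling/output.txt"):
    """Raise if the spelling builder reported any misspelt words."""
    if not os.path.exists(output_path):
        return  # no report written; assume a clean run
    with open(output_path) as f:
        # keep only non-blank lines, i.e. actual findings
        findings = [line for line in f if line.strip()]
    if findings:
        # any exception type that fails the build would do here
        raise RuntimeError(
            "Found misspelt words, see {}:\n{}".format(output_path, "".join(findings))
        )
```

Called at the end of the spellcheck step, a non-empty report fails the build while an empty or absent report lets it pass.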

Finally, the time it takes to spellcheck: if every single file needs to be generated, then it takes about 2-3 minutes to generate and read the files and about 1 minute to spellcheck and output misspelt words.

If the files are already generated then it runs in 1 minute. This is with the spelling_show_suggestions option set to False, because that option seems to make the spellcheck about 3x slower.

Member

@WillAyd WillAyd left a comment

I haven't gotten all the way through yet, but I see some consistent themes that could use updating and have labeled them as such.

To answer your other question, stick with just DOC updates for the time being. If you want to go back and change variable names, that should be a separate change, if done at all.

@@ -663,7 +663,7 @@ Convert the raw grades to a categorical data type.
df["grade"]

Rename the categories to more meaningful names (assigning to
-``Series.cat.categories`` is inplace!).
+``Series.cat.categories`` is in place!).
Member

I think we should be making an exception for inplace

@@ -182,7 +182,7 @@ For example:
df[['foo','qux']].columns # sliced

This is done to avoid a recomputation of the levels in order to make slicing
-highly performant. If you want to see only the used levels, you can use the
+highly efficient. If you want to see only the used levels, you can use the
Member

I'd still stick with performant - there's some overlap but still not quite perfect synonyms

@@ -559,7 +559,7 @@ return a copy of the data rather than a view:

.. _advanced.unsorted:

-Furthermore if you try to index something that is not fully lexsorted, this can raise:
+Furthermore if you try to index something that is not fully lex-sorted, this can raise:
Member

I've never seen "lex-sorted" before, so I'd stick with the original spelling here

@@ -593,7 +593,7 @@ Take Methods

Similar to NumPy ndarrays, pandas Index, Series, and DataFrame also provides
the ``take`` method that retrieves elements along a given axis at the given
-indices. The given indices must be either a list or an ndarray of integer
+indexes. The given indices must be either a list or an ndarray of integer
Member

Why did you change from indices to indexes?

@@ -1029,7 +1029,7 @@ Skip row between header and data
01.01.1990 05:00;21;11;12;13
"""

-Option 1: pass rows explicitly to skiprows
+Option 1: pass rows explicitly to skip rows
Member

Have to be careful when changing headers - I think you now need another double quote on the line below for proper rendering

@@ -1812,7 +1812,7 @@ Writing to a file, with a date index and a date column:
dfj2.to_json('test.json')
open('test.json').read()

-Fallback Behavior
+Fall-back Behavior
Member

Same header comment - if you update these you need to change the line below to reflect new length

@@ -2237,16 +2237,16 @@ A few notes on the generated table schema:
name is ``values``
+ For ``DataFrames``, the stringified version of the column name is used
+ For ``Index`` (not ``MultiIndex``), ``index.name`` is used, with a
-fallback to ``index`` if that is None.
+fall-back to ``index`` if that is None.
Member

I believe fallback is a word

-the preserveration of metadata such as dtypes and index names in a
-round-trippable manner.
+the preservation of metadata such as dtypes and index names in a
+round-trip manner.
Member

Hmm would still stick with trippable here

@@ -3467,7 +3467,7 @@ Fixed Format
''''''''''''

The examples above show storing using ``put``, which write the HDF5 to ``PyTables`` in a fixed array format, called
-the ``fixed`` format. These types of stores are **not** appendable once written (though you can simply
+the ``fixed`` format. These types of stores are **not** able to be appended once written (though you can simply
Member

Appendable would read easier

@@ -3820,7 +3820,7 @@ indexed dimension as the ``where``.

.. note::

-Indexes are automagically created on the indexables
+Indexes are auto-magically created on the indexables
Member

Stick with automagically

@FabioRosado
Contributor Author

Thanks for the feedback @WillAyd. Some of these changes were made yesterday, and after working on it today I got a better understanding of a few terms. I also forgot to double-check the previous commit; my apologies for that.

I will revert the changes from efficient back to performant. I'll try to do all the changes before submitting another commit so I'll wait a bit 😃👍

@gfyoung gfyoung added the Docs label May 21, 2018
@pep8speaks

pep8speaks commented May 26, 2018

Hello @FabioRosado! Thanks for updating the PR.

Cheers ! There are no PEP8 issues in this Pull Request. 🍻

Comment last updated on June 02, 2018 at 13:56 Hours UTC

@FabioRosado
Contributor Author

I got some time to work on this PR and here is the updated version:

  • Reverted changes from efficient back to performant
  • Revert lex-sorted back to lexsorted
  • Fixed lint issues
  • Added new section on documentation to inform how to run the spell check and how to add words to the file

As for the MultiIndex, I changed two terms, but when I searched for multi-index it seems that this term appears a fair number of times, so I'm not sure if I should change ALL of them from multi-index to MultiIndex or just leave it as it is.

In the documentation about the spell check, should I explain more about the output.txt file and how to find the words that get marked as misspelt?

Finally, should I raise an exception if any errors were found after running the spellcheck command? I think the best way to do this would be to open the output.txt file, check if it's empty, and if not raise an exception. Not sure if this is the best way to approach this problem though 😃

@TomAugspurger
Contributor

Finally, should I raise an exception if any errors were found after running the spellcheck command?

Yes, ideally. Do you know how long the spellchecker takes to run over our docs? We're close to our time limit on Travis for the doc build.

@FabioRosado
Contributor Author

Okay, I'll add that bit of code when I get home later on. The spell checker takes 1 minute to run, but if the docs weren't generated yet they are generated first, so it takes up to 6 minutes to run.

@FabioRosado
Contributor Author

FabioRosado commented May 26, 2018

The code will now raise an exception if there are misspelt words. I was unsure which exception to raise, so I went with SyntaxError with a message that will perhaps be enough to let people know why the exception was raised 😃

--EDIT--
Sorry for the number of commits, but this will be my last one until you want me to change something. I've added the entry to whatsnew; hopefully everything looks good. Thanks for the help and patience with me 😄 👍

@@ -370,7 +370,7 @@ Renaming categories is done by assigning new values to the

.. note::

-Be aware that assigning new categories is an inplace operation, while most other operations
+Be aware that assigning new categories is an in place operation, while most other operations
Member

Still think inplace as a term is fine, especially since it aligns with the keyword used for the concept throughout pandas

@@ -228,9 +228,9 @@ Grouping by more than one column is done by passing a list of columns to the
JOIN
----
JOINs can be performed with :meth:`~pandas.DataFrame.join` or :meth:`~pandas.merge`. By default,
-:meth:`~pandas.DataFrame.join` will join the DataFrames on their indices. Each method has
+:meth:`~pandas.DataFrame.join` will join the DataFrames on their indexes. Each method has
Member

Sorry if I missed it before but any reason we changed these? indices seemed preferable to me

Contributor Author

I saw both terms in the documentation, and since the spellchecker marked indices as wrong I changed a lot of them; that's the reason why. I can revert these changes 😃 👍


The spellcheck will take a few minutes to run (between 1 and 6 minutes). Sphinx will alert you
with warnings and misspelt words - these misspelt words will be added to a file called
``output.txt`` and you can find it in your local directory ``pandas/doc/build/spelling/``.
Member

Not saying this is a bad thing but any reason you chose to output to a text file instead of to STDOUT? All of the other checks I can think of off the top of my head would write to the latter

Contributor Author

@FabioRosado FabioRosado May 27, 2018

Unfortunately, I didn't choose that approach. The spelling library is coded this way (I checked the source code); it will always output a text file with all the misspelt words. To be honest I would much rather work with STDOUT and avoid calling open on a file to check if it is empty.


Running the spell check is easy. Just navigate to your local ``pandas/doc/`` directory and run::

python make.py spellcheck
Member

Could we alternatively add a rule to the Makefile so that make spellcheck works?

Contributor Author

I didn't think about this, my apologies. I'll add the spellcheck to the makefile 👍

Hoese
Stansby
Kamau
Niederhut
Member

Not saying we need to do this here, but I wonder if it's cleaner to have a separate contributors file containing their names that could be pulled in here as well as into the docs.

Contributor Author

I had to dig into the source code of the Sphinx spelling extension to see if this was possible. Apparently, it is possible to read from additional files, but the documentation didn't explain how to do this.
I will try a few things and see if it works (which it should) 😄

- Regression in ``Timestamp`` does not parse 'Z' zone designator for UTC (:issue:`8771`)
- Bug in `StataWriter` the produces writes strings with 244 characters irrespective of actual size (:issue:`8969`)
- Fixed ValueError raised by cummin/cummax when datetime64 Series contains NaT. (:issue:`8965`)
-- Bug in Datareader returns object dtype if there are missing values (:issue:`8980`)
+- Bug in Data reader returns object dtype if there are missing values (:issue:`8980`)
Member

This was a class that could just be placed in backticks like StataWriter above. Also capitalize the R, so DataReader.

@@ -414,7 +414,7 @@ New Behavior:
df.loc[ix, 'b'] = df.loc[ix, 'b']
df.dtypes

-When a DataFrame's integer slice is partially updated with a new slice of floats that could potentially be downcasted to integer without losing precision, the dtype of the slice will be set to float instead of integer.
+When a DataFrame's integer slice is partially updated with a new slice of floats that could potentially be down casted to integer without losing precision, the dtype of the slice will be set to float instead of integer.
Member

down-casted?

@FabioRosado
Contributor Author

I am going on holidays for the next 4-5 days so I won't be able to work on these issues. But when I'm back home on Saturday I will update the PR with all the requested changes 👍

Member

@WillAyd WillAyd left a comment

Minor comments, but otherwise LGTM. If you want to create a make rule for this, that could be a logical follow-up (separate PR).

@@ -2582,7 +2582,7 @@ Improvements to existing features
docstrings for Panel flex methods. (:issue:`5336`)
- ``NDFrame.drop()``, ``NDFrame.dropna()``, and ``.drop_duplicates()`` all
accept ``inplace`` as a keyword argument; however, this only means that the
-wrapper is updated inplace, a copy is still made internally.
+wrapper is updated in place, a copy is still made internally.
Member

Same here

@@ -2625,7 +2625,7 @@ API Changes
:issue:`5744`, :issue:`5756`)
- Default export for ``to_clipboard`` is now csv with a sep of `\t` for
compat (:issue:`3368`)
-- ``at`` now will enlarge the object inplace (and return the same)
+- ``at`` now will enlarge the object in place (and return the same)
Member

Here too

@FabioRosado
Contributor Author

Oops, I thought I had reverted all the terms but I missed those two; sorry for that.
They are now changed, and I added the makefile rule. Hope all is okay now; thanks for the help 😄 👍

Member

@WillAyd WillAyd left a comment

lgtm thanks for tackling these updates!

@@ -23,3 +23,4 @@ doc:
cd doc; \
python make.py clean; \
 python make.py html
+python make.py spellcheck
Member

Hmm, well, I was thinking this would be a separate rule, but the way you've done it in doc is fine and may even be preferable. Just calling it out here in case another reviewer has a differing opinion.

Contributor

@jreback jreback left a comment

is this added to the build?

Seabold
Carrucciu
Hoyer
Pascoe
Contributor

These look like the names of our contributors? Can we simply generate this?

Something like: git log --format='%an' | sort | uniq (I guess we would have to split on whitespace as well)
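That suggestion could be sketched in Python roughly as follows (illustrative only; contributor_words and split_names are made-up helper names, not part of any pandas script):

```python
import subprocess


def split_names(log_output):
    # the spelling wordlist expects one word per line, so split each
    # author name on whitespace and de-duplicate the parts
    words = set()
    for name in log_output.splitlines():
        words.update(name.split())
    return sorted(words)


def contributor_words(repo="."):
    """Rough equivalent of ``git log --format='%an' | sort | uniq``,
    plus the whitespace splitting mentioned above."""
    out = subprocess.run(
        ["git", "log", "--format=%an"],
        cwd=repo, capture_output=True, text=True, check=True,
    ).stdout
    return split_names(out)
```

For example, a log containing "Jane Doe" twice and "Bob" once would yield the three entries Bob, Doe, and Jane.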

Contributor Author

@FabioRosado FabioRosado Jun 4, 2018

The spelling extension checks every single file for text/docstrings, so the names of the contributors needed to be whitelisted in order to be ignored by the spellchecker; the name of every contributor is therefore added to the build.

How would you advise generating this? The reason I'm asking is that the spelling extension uses path.join on anything you pass as a wordlist (one or two files), so we get an exception when trying to use, say, a list of names.

I suppose we could run the command and write it to file on every run? Perhaps using something like a tempfile to get all those names and use that file on every run?

Also, I have encountered a few issues when decoding and splitting every name. The spelling extension makes some weird word divisions at times; when I try to decode and split the list, Yan Facai's name will be split like this: ['颜发才(Yan', 'Facai)']. Since the wordlist expects one line per word, this name will always be marked as wrong due to how the split divided the name. I suppose using regex could help with this, but I'm not sure.
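The quirk is easy to reproduce: a plain whitespace split leaves punctuation glued to the fragments, while a regex that also breaks on the parentheses gives one clean entry per word (a hypothetical workaround, not what the extension does internally):

```python
import re

name = "颜发才(Yan Facai)"

# whitespace-only split keeps the parentheses attached
parts = name.split()
print(parts)  # ['颜发才(Yan', 'Facai)']

# also splitting on parentheses yields one clean word per entry
words = [w for w in re.split(r"[\s()]+", name) if w]
print(words)  # ['颜发才', 'Yan', 'Facai']
```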

@TomAugspurger
Contributor

TomAugspurger commented Jun 4, 2018 via email

@FabioRosado
Contributor Author

I see. But the names need to be added to a wordlist like the file named spelling_wordlist.txt. Since the extension captures all text within the project, every term has to be whitelisted (unless it's a genuine error), which means that the name of every contributor needs to be included in a file somewhere.

It makes sense that perhaps the contributor names should be in a separate file, like WillAyd suggested. Also, the extension needs a very specific way to whitelist terms: one term per line. As I pointed out, the problem with scraping contributor names is that the names will need to be split on both newlines and spaces so that every name part is a whitelisted word.

Because the extension divides words on special characters, it could potentially cause trouble with names such as Yan Facai's (example: 颜发才(Yan Facai)), so the name needs to be added with both the special characters and the normal characters; you can see this in the wordlist.

So I'm unsure what to do now in regards to jreback's suggestions. If anyone else could give a hand so that I can solve this issue and get the PR merged, I'll be happy to fix/modify anything 😃 👍

@TomAugspurger
Contributor

It makes sense that perhaps the contributor names should be in a separate file

Agreed. I would

  1. Modify scripts/announce.py to take an optional flag that disables the + at the end of the names of new contributors
  2. Run scripts/announce.py v0.4.0..HEAD to get a list of all the contributors
  3. Update scripts/announce.py to update the list of contributors (will have to do some de-duplication)
  4. Update the doc build to run scripts/announce.py on each build.
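The de-duplication in step 3 could look something like this (a generic sketch; the actual scripts/announce.py plumbing is not shown, and merge_contributors is a hypothetical helper):

```python
def merge_contributors(existing, new):
    """Merge contributor lists, dropping duplicates and the trailing
    '+' marker placed after the names of first-time contributors."""
    names = {name.rstrip("+ ").strip() for name in existing}
    names.update(name.rstrip("+ ").strip() for name in new)
    return sorted(n for n in names if n)
```

So merging ["Jane Doe"] with ["Jane Doe +", "Bob"] collapses the duplicate and strips the marker, leaving just Bob and Jane Doe.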

I've been meaning to change the docs around so that 4 is done anyway. I'll hopefully get to it by the end of the week.

@WillAyd
Member

WillAyd commented Jun 4, 2018

@TomAugspurger that would be nice, but I'm of the opinion that it should be a separate PR, if only because it will introduce some code changes that exceed what we would normally allow in a DOC change. OK to merge this and have that as a follow-up?

@TomAugspurger
Contributor

TomAugspurger commented Jun 4, 2018 via email

@jreback
Contributor

jreback commented Jun 4, 2018

i am ok with only merging the actual spelling changes

the machinery needs to be done in a maintenance aware manner (no hard coding)

so if this were split up then ok

@TomAugspurger
Contributor

TomAugspurger commented Jun 4, 2018 via email

@jreback
Contributor

jreback commented Jun 4, 2018

yes my point is that it needs to be auto-generated,
otherwise this will become a maintenance burden

if you want to make this a manual action that is ok too

but it should not be part of the build (is it? my question was never answered)

@TomAugspurger
Contributor

It's not currently part of the doc build on Travis, so for now it's the ability to run spellcheck + fixing a bunch of typos.

Opened #21354 for the followup.

@TomAugspurger TomAugspurger merged commit 9d61ab5 into pandas-dev:master Jun 7, 2018
@TomAugspurger
Contributor

Thanks @FabioRosado! If you're interested in the followup changes to the build processes feel free to claim it on #21354.

Contributor

@h-vetinari h-vetinari left a comment

Had a quick look through the diff, and some adverbs seem to have fallen through the cracks. Also "closedness".

Furthermore, there's an inconsistency with "MultiIndex" vs "multi-index" (and potentially vs. "multi-value").

@@ -781,7 +781,7 @@ Enhancements
noon, January 1, 4713 BC. Because nanoseconds are used to define the time
in pandas the actual range of dates that you can use is 1678 AD to 2262 AD. (:issue:`4041`)
- ``DataFrame.to_stata`` will now check data for compatibility with Stata data types
-and will upcast when needed. When it is not possible to losslessly upcast, a warning
+and will upcast when needed. When it is not possible to lossless upcast, a warning
Contributor

This change is wrong - needs to be an adverb.

@@ -370,7 +370,7 @@ Updated PyTables Support
df1.get_dtype_counts()

- performance improvements on table writing
-- support for arbitrarily indexed dimensions
+- support for arbitrary indexed dimensions
Contributor

This should stay an adverb.

@@ -2356,7 +2356,7 @@ Read a URL and match a table that contains specific text:

Specify a header row (by default ``<th>`` or ``<td>`` elements located within a
``<thead>`` are used to form the column index, if multiple rows are contained within
-``<thead>`` then a multiindex is created); if specified, the header row is taken
+``<thead>`` then a multi-index is created); if specified, the header row is taken
Contributor

All above changes were to MultiIndex - here multi-index?

@@ -133,7 +133,7 @@ groupby operations on the index will preserve the index nature as well
reindexing operations, will return a resulting index based on the type of the passed
indexer, meaning that passing a list will return a plain-old-``Index``; indexing with
a ``Categorical`` will return a ``CategoricalIndex``, indexed according to the categories
-of the PASSED ``Categorical`` dtype. This allows one to arbitrarly index these even with
+of the PASSED ``Categorical`` dtype. This allows one to arbitrary index these even with
Contributor

adverb

@@ -462,7 +462,7 @@ Selecting via a scalar value that is contained *in* the intervals.
Other Enhancements
^^^^^^^^^^^^^^^^^^

-- ``DataFrame.rolling()`` now accepts the parameter ``closed='right'|'left'|'both'|'neither'`` to choose the rolling window-endpoint closedness. See the :ref:`documentation <stats.rolling_window.endpoints>` (:issue:`13965`)
+- ``DataFrame.rolling()`` now accepts the parameter ``closed='right'|'left'|'both'|'neither'`` to choose the rolling window-endpoint closed. See the :ref:`documentation <stats.rolling_window.endpoints>` (:issue:`13965`)
Contributor

"Closedness" might not be the prettiest word, but "to choose the [...] closed" is definitely wrong

@@ -4115,7 +4115,7 @@ Bug Fixes
datetime64 when calling DataFrame.apply. (:issue:`2374`)
- Raise exception when calling to_panel on non uniquely-indexed frame (:issue:`2441`)
- Improved detection of console encoding on IPython zmq frontends (:issue:`2458`)
-- Preserve time zone when .append-ing two time series (:issue:`2260`)
+- Preserve time zone when .appending two time series (:issue:`2260`)
Contributor

The hyphen was surely intentional...

@TomAugspurger
Contributor

TomAugspurger commented Jun 7, 2018 via email

@FabioRosado
Contributor Author

Even though this PR was already merged I'm happy to fix these issues.
In regards to MultiIndex vs multi-index, it's a bit confusing because both terms seem to be in use (for example, in io.rst the term Multi-index is used in a title, but in the advanced docs MultiIndex is used). Which one do you think is right, @TomAugspurger?

Also apologies for replying here :)

@WillAyd
Member

WillAyd commented Jun 7, 2018

MultiIndex is what we want

@jorisvandenbossche jorisvandenbossche added this to the 0.24.0 milestone Jun 12, 2018
david-liu-brattle-1 pushed a commit to david-liu-brattle-1/pandas that referenced this pull request Jun 18, 2018
* Add sphinx spelling extension, fix typos and add wordlist

Successfully merging this pull request may close these issues.

DOC: Enable sphinx.contrib.spelling
8 participants