DOC: Add sphinx spelling extension #21109


Merged

merged 9 commits into pandas-dev:master on Jun 7, 2018

Conversation

FabioRosado
Contributor

@FabioRosado FabioRosado commented May 17, 2018

This is my first time contributing to a project this big, so I hope everything is okay. I am still working on fixing bugs and polishing a few things, but here is what I have done so far:

  • fixed typos
  • updated a wordlist txt file with words to be ignored by the spellchecker
  • added the spellcheck method to the DocBuilder class (should this be named spelling?)

I've re-used the _sphinx_build method to run the spelling command; to do this I had to update _sphinx_build to include 'spelling' as a kind. I'd like to know if this is the best approach or if I should just replicate the code.

I have added a few configuration options to get better results when using the spellcheck: the first was to use the wordlist text file, the second was to ignore known PyPI package names, and the last was to show suggestions for misspelt words.

At the moment the spellcheck runs when you type the command python make.py spellcheck, which calls the spellcheck method. I haven't yet figured out how to fail the build when there are misspellings, as issue 21079 suggested.
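The flow described above could be sketched like this (a minimal, hypothetical sketch: the names DocBuilder and _sphinx_build echo pandas' doc/make.py, but the body here is illustrative, not the actual pandas code):

```python
import subprocess


class DocBuilder:
    """Hypothetical sketch of a doc-build helper with a 'spelling' kind."""

    def __init__(self, source_dir="source", build_dir="build"):
        self.source_dir = source_dir
        self.build_dir = build_dir

    def _sphinx_build(self, kind):
        # 'spelling' is accepted alongside the usual builder kinds
        if kind not in ("html", "latex", "spelling"):
            raise ValueError(f"kind must be html, latex or spelling, not {kind}")
        cmd = ["sphinx-build", "-b", kind,
               self.source_dir, f"{self.build_dir}/{kind}"]
        return subprocess.call(cmd)

    def spellcheck(self):
        # re-use the generic builder rather than replicating the code
        return self._sphinx_build("spelling")
```

The point of routing through `_sphinx_build` is that 'spelling' behaves like any other Sphinx builder kind, so no code needs to be duplicated.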


Contributor

@TomAugspurger TomAugspurger left a comment

Thanks! Just gave a quick skim.

@@ -645,7 +645,7 @@ the quarter end:
ts.index = (prng.asfreq('M', 'e') + 1).asfreq('H', 's') + 9
ts.head()

-Categoricals
+Categorical
 ------------
Contributor

Let's keep this, to not break links.

@@ -387,7 +387,7 @@ Furthermore you can *set* the values using the following methods.
df2.loc(axis=0)[:, :, ['C1', 'C3']] = -10
df2

-You can use a right-hand-side of an alignable object as well.
+You can use a right-hand-side of an align object as well.
Contributor

I think alignable is OK.

@@ -593,7 +593,7 @@ Take Methods

Similar to NumPy ndarrays, pandas Index, Series, and DataFrame also provides
the ``take`` method that retrieves elements along a given axis at the given
-indices. The given indices must be either a list or an ndarray of integer
+indexes. The given indexes must be either a list or an ndarray of integer
Contributor

Prefer indices, since that's the argument name to take.

@@ -711,7 +711,7 @@ order is ``cab``).

df2.sort_index()

-Groupby operations on the index will preserve the index nature as well.
+Group by operations on the index will preserve the index nature as well.
Contributor

Can probably add groupby to the wordlist.

@@ -723,7 +723,7 @@ Rolling Apply to multiple columns where function returns a Scalar (Volume Weight
s = pd.concat([ (pd.Series(vwap(df.iloc[i:i+window]), index=[df.index[i+window]])) for i in range(len(df)-window) ]);
s.round(2)

Timeseries
Contributor

Add timeseries to the wordlist?

@TomAugspurger
Contributor

Do you have a rough measure of how long this takes to run on the docs?

@codecov

codecov bot commented May 18, 2018

Codecov Report

Merging #21109 into master will not change coverage.
The diff coverage is 100%.

Impacted file tree graph

@@           Coverage Diff           @@
##           master   #21109   +/-   ##
=======================================
  Coverage   91.85%   91.85%           
=======================================
  Files         153      153           
  Lines       49546    49546           
=======================================
  Hits        45509    45509           
  Misses       4037     4037
Flag Coverage Δ
#multiple 90.25% <100%> (ø) ⬆️
#single 41.87% <33.33%> (ø) ⬆️
Impacted Files Coverage Δ
pandas/core/arrays/categorical.py 95.69% <ø> (ø) ⬆️
pandas/core/groupby/groupby.py 92.66% <ø> (ø) ⬆️
pandas/core/generic.py 96.12% <ø> (ø) ⬆️
pandas/core/reshape/merge.py 94.25% <ø> (ø) ⬆️
pandas/core/frame.py 97.22% <ø> (ø) ⬆️
pandas/io/json/table_schema.py 98.29% <ø> (ø) ⬆️
pandas/core/strings.py 98.62% <ø> (ø) ⬆️
pandas/core/tools/datetimes.py 84.98% <ø> (ø) ⬆️
pandas/core/arrays/base.py 83.95% <ø> (ø) ⬆️
pandas/core/panel.py 97.87% <ø> (ø) ⬆️
... and 15 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 4274b84...d929098. Read the comment docs.

@FabioRosado
Contributor Author

I apologise for taking so long to reply, but this took me longer than I expected. I've added all the words to the wordlist (this file is now huge) and corrected some things. I will need your help regarding some changes that I have made:

  • changed indicies to indices, since indices is already the plural form (I only changed comments and docstrings; some variables use indicies, but I didn't want to touch them unless you agree)
  • changed performant to efficient; this term was used plenty of times and I'm not sure if changing it is desirable (if not, I'll revert the changes)

The spelling extension is funny in that it divides words on special characters (anything like - or '), so I had to add some words that don't exist in the dictionary but make sense when attached to others (hope this makes sense?).

I've also given it some thought and I think we could fail the build by raising an exception if there are any words in this output.txt file. The idea is to:

  • open the file
  • read its lines
  • pass if empty

Not sure if this is a smart way to approach the problem though.
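A sketch of that check, assuming the spelling builder writes its findings to an output.txt under the build directory (the path and the choice of exception here are illustrative):

```python
import os


def fail_on_misspellings(output_path="build/spelling/output.txt"):
    """Raise if the spelling builder reported any misspelt words."""
    if not os.path.exists(output_path):
        return  # no report written; assume a clean run
    with open(output_path) as f:
        # keep only non-blank lines, i.e. actual findings
        findings = [line for line in f if line.strip()]
    if findings:
        # any exception type that fails the build would do here
        raise RuntimeError(
            "Found misspelt words, see {}:\n{}".format(output_path, "".join(findings))
        )
```

Called at the end of the spellcheck step, a non-empty report fails the build while an empty or absent report lets it pass.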

Finally, the time it takes to spellcheck: if every single file needs to be generated, then it takes about 2-3 minutes to generate and read the files and about 1 minute to spellcheck and output misspelt words.

If the files are already generated then it runs in 1 minute. This is with the spelling_show_suggestions option set to False, because that option seems to make the spellcheck about 3x slower.

Member

@WillAyd WillAyd left a comment

I haven't gotten all the way through yet, but I see some consistent themes that could use updating and have labeled them as such.

To answer your other question, stick with just DOC updates for the time being. If you want to go back and change variable names, that should be a separate change, if done at all.

@@ -663,7 +663,7 @@ Convert the raw grades to a categorical data type.
df["grade"]

Rename the categories to more meaningful names (assigning to
-``Series.cat.categories`` is inplace!).
+``Series.cat.categories`` is in place!).
Member

I think we should be making an exception for inplace

@@ -182,7 +182,7 @@ For example:
df[['foo','qux']].columns # sliced

This is done to avoid a recomputation of the levels in order to make slicing
-highly performant. If you want to see only the used levels, you can use the
+highly efficient. If you want to see only the used levels, you can use the
Member

I'd still stick with performant - there's some overlap but still not quite perfect synonyms

@@ -559,7 +559,7 @@ return a copy of the data rather than a view:

.. _advanced.unsorted:

-Furthermore if you try to index something that is not fully lexsorted, this can raise:
+Furthermore if you try to index something that is not fully lex-sorted, this can raise:
Member

I've never seen "lex-sorted" before, so I'd stick with the original spelling here

@@ -593,7 +593,7 @@ Take Methods

Similar to NumPy ndarrays, pandas Index, Series, and DataFrame also provides
the ``take`` method that retrieves elements along a given axis at the given
-indices. The given indices must be either a list or an ndarray of integer
+indexes. The given indices must be either a list or an ndarray of integer
Member

Why did you change from indices to indexes?

@@ -1029,7 +1029,7 @@ Skip row between header and data
01.01.1990 05:00;21;11;12;13
"""

-Option 1: pass rows explicitly to skiprows
+Option 1: pass rows explicitly to skip rows
Member

Have to be careful when changing headers - I think you now need another double quote on the line below for proper rendering

@@ -1812,7 +1812,7 @@ Writing to a file, with a date index and a date column:
dfj2.to_json('test.json')
open('test.json').read()

-Fallback Behavior
+Fall-back Behavior
Member

Same header comment - if you update these you need to change the line below to reflect new length

@@ -2237,16 +2237,16 @@ A few notes on the generated table schema:
name is ``values``
+ For ``DataFrames``, the stringified version of the column name is used
+ For ``Index`` (not ``MultiIndex``), ``index.name`` is used, with a
-fallback to ``index`` if that is None.
+fall-back to ``index`` if that is None.
Member

I believe fallback is a word

-the preserveration of metadata such as dtypes and index names in a
-round-trippable manner.
+the preservation of metadata such as dtypes and index names in a
+round-trip manner.
Member

Hmm would still stick with trippable here

@@ -3467,7 +3467,7 @@ Fixed Format
''''''''''''

The examples above show storing using ``put``, which write the HDF5 to ``PyTables`` in a fixed array format, called
-the ``fixed`` format. These types of stores are **not** appendable once written (though you can simply
+the ``fixed`` format. These types of stores are **not** able to be appended once written (though you can simply
Member

Appendable would read easier

@@ -3820,7 +3820,7 @@ indexed dimension as the ``where``.

.. note::

-Indexes are automagically created on the indexables
+Indexes are auto-magically created on the indexables
Member

Stick with automagically

@FabioRosado
Contributor Author

Thanks for the feedback @WillAyd. Some of these changes were made yesterday, and after working on it today I got a better understanding of a few terms. I also forgot to double-check the previous commit; my apologies for that.

I will revert the changes from efficient back to performant. I'll try to do all the changes before submitting another commit so I'll wait a bit 😃👍

@gfyoung gfyoung added the Docs label May 21, 2018
@pep8speaks

pep8speaks commented May 26, 2018

Hello @FabioRosado! Thanks for updating the PR.

Cheers ! There are no PEP8 issues in this Pull Request. 🍻

Comment last updated on June 02, 2018 at 13:56 Hours UTC

@FabioRosado
Contributor Author

I got some time to work on this PR and here is the updated version:

  • Reverted changes from efficient back to performant
  • Revert lex-sorted back to lexsorted
  • Fixed lint issues
  • Added new section on documentation to inform how to run the spell check and how to add words to the file

As for the MultiIndex, I changed two terms, but when I searched for multi-index it seems that this term appears a fair number of times, so I'm not sure if I should change ALL of them from multi-index to MultiIndex or just leave it as it is.

In the documentation about the spell check, should I explain more about the output.txt file and how to find the words that get marked as misspelt?

Finally, should I raise an exception if any errors were found after running the spellcheck command? I think the best way to do this would be to open the output.txt file, check if it's empty, and if not raise an exception. Not sure if this is the best way to approach this problem though 😃

@TomAugspurger
Contributor

Finally, should I raise an exception if any errors were found after running the spellcheck command?

Yes, ideally. Do you know how long the spellchecker takes to run over our docs? We're close to our time limit on Travis for the doc build.

@FabioRosado
Contributor Author

Okay, I'll add that bit of code when I get home later on. The spell checker takes 1 minute to run, but if the docs weren't generated yet they are generated first, so it takes up to 6 minutes to run.

@FabioRosado
Contributor Author

FabioRosado commented May 26, 2018

The code will now raise an exception if there are misspelt words. I was unsure which exception to raise, so I went with SyntaxError with a message that will perhaps be enough to let people know why the exception was raised 😃

--EDIT--
Sorry for the number of commits, but this will be my last one until you want me to change something. I've added the entry to whatsnew; hopefully everything looks good. Thanks for the help and patience with me 😄 👍

@@ -370,7 +370,7 @@ Renaming categories is done by assigning new values to the

.. note::

-Be aware that assigning new categories is an inplace operation, while most other operations
+Be aware that assigning new categories is an in place operation, while most other operations
Member

Still think inplace as a term is fine, especially since it aligns with the keyword used for the concept throughout pandas

@@ -228,9 +228,9 @@ Grouping by more than one column is done by passing a list of columns to the
JOIN
----
JOINs can be performed with :meth:`~pandas.DataFrame.join` or :meth:`~pandas.merge`. By default,
-:meth:`~pandas.DataFrame.join` will join the DataFrames on their indices. Each method has
+:meth:`~pandas.DataFrame.join` will join the DataFrames on their indexes. Each method has
Member

Sorry if I missed it before but any reason we changed these? indices seemed preferable to me

Contributor Author

I saw both terms in the documentation, and since the spellchecker marked indices as wrong I changed a lot of them; that's the reason why. I can revert these changes 😃 👍


The spellcheck will take a few minutes to run (between 1 and 6 minutes). Sphinx will alert you
with warnings and misspelt words - these misspelt words will be added to a file called
``output.txt`` and you can find it in your local directory ``pandas/doc/build/spelling/``.
Member

Not saying this is a bad thing but any reason you chose to output to a text file instead of to STDOUT? All of the other checks I can think of off the top of my head would write to the latter

Contributor Author

@FabioRosado FabioRosado May 27, 2018

Unfortunately, I didn't choose that approach. The spelling library is coded this way (I checked the source code); it will always output a text file with all the misspelt words. To be honest I would much rather work with STDOUT and avoid calling open on a file to check if it is empty.


Running the spell check is easy. Just navigate to your local ``pandas/doc/`` directory and run::

python make.py spellcheck
Member

Could we alternatively add a rule to the Makefile so that make spellcheck works?

Contributor Author

I didn't think about this, my apologies. I'll add the spellcheck to the makefile 👍

Hoese
Stansby
Kamau
Niederhut
Member

Not saying we need to do this here, but I wonder if it's cleaner to have a separate contributors file containing their names that could be pulled in here as well as into the docs.

Contributor Author

I had to dig into the source code of the Sphinx spelling extension to see if this was possible. Apparently, it is possible to read from additional files, but the documentation didn't explain how to do this.
I will try a few things and see if it works (which it should) 😄

- Regression in ``Timestamp`` does not parse 'Z' zone designator for UTC (:issue:`8771`)
- Bug in `StataWriter` the produces writes strings with 244 characters irrespective of actual size (:issue:`8969`)
- Fixed ValueError raised by cummin/cummax when datetime64 Series contains NaT. (:issue:`8965`)
-- Bug in Datareader returns object dtype if there are missing values (:issue:`8980`)
+- Bug in Data reader returns object dtype if there are missing values (:issue:`8980`)
Member

This was a class that could just be placed in backticks like StataWriter above. Also capitalize the R, so DataReader.

@@ -414,7 +414,7 @@ New Behavior:
df.loc[ix, 'b'] = df.loc[ix, 'b']
df.dtypes

-When a DataFrame's integer slice is partially updated with a new slice of floats that could potentially be downcasted to integer without losing precision, the dtype of the slice will be set to float instead of integer.
+When a DataFrame's integer slice is partially updated with a new slice of floats that could potentially be down casted to integer without losing precision, the dtype of the slice will be set to float instead of integer.
Member

down-casted?

@FabioRosado
Contributor Author

I am going on holidays for the next 4-5 days so I won't be able to work on these issues. But when I'm back home on Saturday I will update the PR with all the requested changes 👍

Member

@WillAyd WillAyd left a comment

Minor comments, but otherwise LGTM. If you want to create a make rule for this, that could be a logical follow-up (separate PR).

@@ -2582,7 +2582,7 @@ Improvements to existing features
docstrings for Panel flex methods. (:issue:`5336`)
- ``NDFrame.drop()``, ``NDFrame.dropna()``, and ``.drop_duplicates()`` all
accept ``inplace`` as a keyword argument; however, this only means that the
-wrapper is updated inplace, a copy is still made internally.
+wrapper is updated in place, a copy is still made internally.
Member

Same here

@@ -2625,7 +2625,7 @@ API Changes
:issue:`5744`, :issue:`5756`)
- Default export for ``to_clipboard`` is now csv with a sep of `\t` for
compat (:issue:`3368`)
-- ``at`` now will enlarge the object inplace (and return the same)
+- ``at`` now will enlarge the object in place (and return the same)
Member

Here too

@FabioRosado
Contributor Author

Oops, I thought I had reverted all the terms but I missed those two; sorry for that.
They are now changed, and I added the makefile rule. Hope all is okay now; thanks for the help 😄 👍

Member

@WillAyd WillAyd left a comment

lgtm thanks for tackling these updates!

@@ -23,3 +23,4 @@ doc:
cd doc; \
python make.py clean; \
 python make.py html
+python make.py spellcheck
Member

Hmm, well, I was thinking this would be a separate rule, but the way you've done it in doc is fine and may even be preferable. Just calling it out here in case another reviewer has a differing opinion.

Contributor

@jreback jreback left a comment

is this added to the build?

Seabold
Carrucciu
Hoyer
Pascoe
Contributor

These look like the names of our contributors? Can we simply generate this?

Something like: git log --format='%an' | sort | uniq (I guess we would have to split on whitespace as well)
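That suggestion could be sketched in Python roughly as follows (illustrative only; contributor_words and split_names are made-up helper names, not part of any pandas script):

```python
import subprocess


def split_names(log_output):
    # the spelling wordlist expects one word per line, so split each
    # author name on whitespace and de-duplicate the parts
    words = set()
    for name in log_output.splitlines():
        words.update(name.split())
    return sorted(words)


def contributor_words(repo="."):
    """Rough equivalent of ``git log --format='%an' | sort | uniq``,
    plus the whitespace splitting mentioned above."""
    out = subprocess.run(
        ["git", "log", "--format=%an"],
        cwd=repo, capture_output=True, text=True, check=True,
    ).stdout
    return split_names(out)
```

For example, a log containing "Jane Doe" twice and "Bob" once would yield the three entries Bob, Doe, and Jane.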

Contributor Author

@FabioRosado FabioRosado Jun 4, 2018

The spelling extension checks every single file for text/docstrings, so the names of the contributors needed to be whitelisted in order to be ignored by the spellchecker; the name of every contributor is therefore added to the build.

How would you advise generating this? The reason I'm asking is that the spelling extension uses path.join on anything you pass as a wordlist (one or two files), so we get an exception when trying to use, say, a list of names.

I suppose we could run the command and write it to file on every run? Perhaps using something like a tempfile to get all those names and use that file on every run?

Also, I have encountered a few issues when decoding and splitting every name. The spelling extension makes some weird word divisions at times; when I try to decode and split the list, Yan Facai's name will be split like this: ['颜发才(Yan', 'Facai)']. Since the wordlist expects one line per word, this name will always be marked as wrong due to how the split divided the name. I suppose using regex could help with this, but I'm not sure.
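The quirk is easy to reproduce: a plain whitespace split leaves punctuation glued to the fragments, while a regex that also breaks on the parentheses gives one clean entry per word (a hypothetical workaround, not what the extension does internally):

```python
import re

name = "颜发才(Yan Facai)"

# whitespace-only split keeps the parentheses attached
parts = name.split()
print(parts)  # ['颜发才(Yan', 'Facai)']

# also splitting on parentheses yields one clean word per entry
words = [w for w in re.split(r"[\s()]+", name) if w]
print(words)  # ['颜发才', 'Yan', 'Facai']
```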

@TomAugspurger
Contributor

TomAugspurger commented Jun 4, 2018 via email

@FabioRosado
Contributor Author

I see. But the names need to be added to a wordlist like the file named spelling_wordlist.txt. Since the extension captures all text within the project, every term has to be whitelisted (unless it's a genuine error), which means that the name of every contributor needs to be included in a file somewhere.

It makes sense that perhaps the contributor names should be in a separate file, like WillAyd suggested. Also, the extension needs a very specific way to whitelist terms: one term per line. As I pointed out, the problem with scraping contributor names is that the names will need to be split on both newlines and spaces so that every name part is a whitelisted word.

Because the extension divides words on special characters, it could potentially cause trouble with names such as Yan Facai's (example: 颜发才(Yan Facai)), so the name needs to be added with both the special characters and the normal characters; you can see this in the wordlist.

So I'm unsure what to do now in regards to jreback's suggestions. If anyone else could give a hand so that I can solve this issue and get the PR merged, I'll be happy to fix/modify anything 😃 👍

@TomAugspurger
Contributor

It makes sense that perhaps the contributor names should be in a separate file

Agreed. I would

  1. Modify scripts/announce.py to take an optional flag that disables the + at the end of the names of new contributors
  2. Run scripts/announce.py v0.4.0..HEAD to get a list of all the contributors
  3. Update scripts/announce.py to update the list of contributors (will have to do some de-duplication)
  4. Update the doc build to run scripts/announce.py on each build.
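The de-duplication in step 3 could look something like this (a generic sketch; the actual scripts/announce.py plumbing is not shown, and merge_contributors is a hypothetical helper):

```python
def merge_contributors(existing, new):
    """Merge contributor lists, dropping duplicates and the trailing
    '+' marker placed after the names of first-time contributors."""
    names = {name.rstrip("+ ").strip() for name in existing}
    names.update(name.rstrip("+ ").strip() for name in new)
    return sorted(n for n in names if n)
```

So merging ["Jane Doe"] with ["Jane Doe +", "Bob"] collapses the duplicate and strips the marker, leaving just Bob and Jane Doe.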

I've been meaning to change the docs around so that 4 is done anyway. I'll hopefully get to it by the end of the week.

@WillAyd
Member

WillAyd commented Jun 4, 2018

@TomAugspurger that would be nice, but I'm of the opinion that it should be a separate PR, if only because it will introduce some code changes that exceed what we would normally allow in a DOC change. OK to merge this and have that as a follow-up?

@TomAugspurger
Contributor

TomAugspurger commented Jun 4, 2018 via email

@jreback
Contributor

jreback commented Jun 4, 2018

i am ok with only merging the actual spelling changes

the machinery needs to be done in a maintenance aware manner (no hard coding)

so if this were split up then ok

@TomAugspurger
Contributor

TomAugspurger commented Jun 4, 2018 via email

@jreback
Contributor

jreback commented Jun 4, 2018

yes my point is that it needs to be auto-generated,
otherwise this will become a maintenance burden

if you want to make this a manual action that is ok too

but it should not be part of the build (is it? my question was never answered)

@TomAugspurger
Contributor

It's not currently part of the doc build on Travis, so for now it's the ability to run spellcheck + fixing a bunch of typos.

Opened #21354 for the followup.

@TomAugspurger TomAugspurger merged commit 9d61ab5 into pandas-dev:master Jun 7, 2018
@TomAugspurger
Contributor

Thanks @FabioRosado! If you're interested in the followup changes to the build processes feel free to claim it on #21354.

Contributor

@h-vetinari h-vetinari left a comment

Had a quick look through the diff, and some adverbs seem to have fallen through the cracks. Also "closedness".

Furthermore, there's an inconsistency with "MultiIndex" vs "multi-index" (and potentially vs. "multi-value").

@@ -781,7 +781,7 @@ Enhancements
noon, January 1, 4713 BC. Because nanoseconds are used to define the time
in pandas the actual range of dates that you can use is 1678 AD to 2262 AD. (:issue:`4041`)
- ``DataFrame.to_stata`` will now check data for compatibility with Stata data types
-and will upcast when needed. When it is not possible to losslessly upcast, a warning
+and will upcast when needed. When it is not possible to lossless upcast, a warning
Contributor

This change is wrong - needs to be an adverb.

@@ -370,7 +370,7 @@ Updated PyTables Support
df1.get_dtype_counts()

- performance improvements on table writing
-- support for arbitrarily indexed dimensions
+- support for arbitrary indexed dimensions
Contributor

This should stay an adverb.

@@ -2356,7 +2356,7 @@ Read a URL and match a table that contains specific text:

Specify a header row (by default ``<th>`` or ``<td>`` elements located within a
``<thead>`` are used to form the column index, if multiple rows are contained within
-``<thead>`` then a multiindex is created); if specified, the header row is taken
+``<thead>`` then a multi-index is created); if specified, the header row is taken
Contributor

All above changes were to MultiIndex - here multi-index?

@@ -133,7 +133,7 @@ groupby operations on the index will preserve the index nature as well
reindexing operations, will return a resulting index based on the type of the passed
indexer, meaning that passing a list will return a plain-old-``Index``; indexing with
a ``Categorical`` will return a ``CategoricalIndex``, indexed according to the categories
-of the PASSED ``Categorical`` dtype. This allows one to arbitrarly index these even with
+of the PASSED ``Categorical`` dtype. This allows one to arbitrary index these even with
Contributor

adverb

@@ -462,7 +462,7 @@ Selecting via a scalar value that is contained *in* the intervals.
Other Enhancements
^^^^^^^^^^^^^^^^^^

-- ``DataFrame.rolling()`` now accepts the parameter ``closed='right'|'left'|'both'|'neither'`` to choose the rolling window-endpoint closedness. See the :ref:`documentation <stats.rolling_window.endpoints>` (:issue:`13965`)
+- ``DataFrame.rolling()`` now accepts the parameter ``closed='right'|'left'|'both'|'neither'`` to choose the rolling window-endpoint closed. See the :ref:`documentation <stats.rolling_window.endpoints>` (:issue:`13965`)
Contributor

"Closedness" might not be the prettiest word, but "to choose the [...] closed" is definitely wrong

@@ -4115,7 +4115,7 @@ Bug Fixes
datetime64 when calling DataFrame.apply. (:issue:`2374`)
- Raise exception when calling to_panel on non uniquely-indexed frame (:issue:`2441`)
- Improved detection of console encoding on IPython zmq frontends (:issue:`2458`)
-- Preserve time zone when .append-ing two time series (:issue:`2260`)
+- Preserve time zone when .appending two time series (:issue:`2260`)
Contributor

The hyphen was surely intentional...

@TomAugspurger
Contributor

TomAugspurger commented Jun 7, 2018 via email

@FabioRosado
Contributor Author

Even though this PR was already merged I'm happy to fix these issues.
In regards to MultiIndex vs multi-index, it's a bit confusing because both terms seem to be in use (for example, in io.rst the term Multi-index is used in a title, but in the advanced docs MultiIndex is used). Which one do you think is right, @TomAugspurger?

Also apologies for replying here :)

@WillAyd
Member

WillAyd commented Jun 7, 2018

MultiIndex is what we want

@jorisvandenbossche jorisvandenbossche added this to the 0.24.0 milestone Jun 12, 2018
david-liu-brattle-1 pushed a commit to david-liu-brattle-1/pandas that referenced this pull request Jun 18, 2018
* Add sphinx spelling extension, fix typos and add wordlist

Successfully merging this pull request may close these issues.

DOC: Enable sphinx.contrib.spelling
8 participants