From 4db41200be02cb7246fe9fd03f9fcc88aad2e71e Mon Sep 17 00:00:00 2001 From: tommyod Date: Tue, 26 Dec 2017 17:07:11 +0100 Subject: [PATCH 1/2] DOC: greater consistency and spell-check for intro docs --- doc/source/10min.rst | 137 ++++++++++++++++++++-------------------- doc/source/basics.rst | 131 +++++++++++++++++++------------------- doc/source/dsintro.rst | 47 +++++++------- doc/source/overview.rst | 19 +++--- 4 files changed, 171 insertions(+), 163 deletions(-) diff --git a/doc/source/10min.rst b/doc/source/10min.rst index 49142311ff057..0507cd0d2a97a 100644 --- a/doc/source/10min.rst +++ b/doc/source/10min.rst @@ -25,7 +25,7 @@ ******************** This is a short introduction to pandas, geared mainly for new users. -You can see more complex recipes in the :ref:`Cookbook` +You can see more complex recipes in the :ref:`Cookbook`. Customarily, we import as follows: @@ -38,7 +38,7 @@ Customarily, we import as follows: Object Creation --------------- -See the :ref:`Data Structure Intro section ` +See the :ref:`Data Structure Intro section `. Creating a :class:`Series` by passing a list of values, letting pandas create a default integer index: @@ -70,7 +70,8 @@ Creating a ``DataFrame`` by passing a dict of objects that can be converted to s 'F' : 'foo' }) df2 -Having specific :ref:`dtypes ` +The columns of the resulting ``DataFrame`` have different +:ref:`dtypes `. .. ipython:: python @@ -104,16 +105,16 @@ truncated for brevity. Viewing Data ------------ -See the :ref:`Basics section ` +See the :ref:`Basics section `. -See the top & bottom rows of the frame +Here is how to view the top and bottom rows of the frame: .. ipython:: python df.head() df.tail(3) -Display the index, columns, and the underlying numpy data +Display the index, columns, and the underlying numpy data: .. 
ipython:: python @@ -121,25 +122,25 @@ Display the index, columns, and the underlying numpy data df.columns df.values -Describe shows a quick statistic summary of your data +:func:`~DataFrame.describe` shows a quick statistic summary of your data: .. ipython:: python df.describe() -Transposing your data +Transposing your data: .. ipython:: python df.T -Sorting by an axis +Sorting by an axis: .. ipython:: python df.sort_index(axis=1, ascending=False) -Sorting by values +Sorting by values: .. ipython:: python @@ -155,13 +156,13 @@ Selection recommend the optimized pandas data access methods, ``.at``, ``.iat``, ``.loc``, ``.iloc`` and ``.ix``. -See the indexing documentation :ref:`Indexing and Selecting Data ` and :ref:`MultiIndex / Advanced Indexing ` +See the indexing documentation :ref:`Indexing and Selecting Data ` and :ref:`MultiIndex / Advanced Indexing `. Getting ~~~~~~~ Selecting a single column, which yields a ``Series``, -equivalent to ``df.A`` +equivalent to ``df.A``: .. ipython:: python @@ -177,39 +178,39 @@ Selecting via ``[]``, which slices the rows. Selection by Label ~~~~~~~~~~~~~~~~~~ -See more in :ref:`Selection by Label ` +See more in :ref:`Selection by Label `. -For getting a cross section using a label +For getting a cross section using a label: .. ipython:: python df.loc[dates[0]] -Selecting on a multi-axis by label +Selecting on a multi-axis by label: .. ipython:: python df.loc[:,['A','B']] -Showing label slicing, both endpoints are *included* +Showing label slicing, both endpoints are *included*: .. ipython:: python df.loc['20130102':'20130104',['A','B']] -Reduction in the dimensions of the returned object +Reduction in the dimensions of the returned object: .. ipython:: python df.loc['20130102',['A','B']] -For getting a scalar value +For getting a scalar value: .. ipython:: python df.loc[dates[0],'A'] -For getting fast access to a scalar (equiv to the prior method) +For getting fast access to a scalar (equiv to the prior method): .. 
ipython:: python @@ -218,45 +219,45 @@ For getting fast access to a scalar (equiv to the prior method) Selection by Position ~~~~~~~~~~~~~~~~~~~~~ -See more in :ref:`Selection by Position ` +See more in :ref:`Selection by Position `. -Select via the position of the passed integers +Select via the position of the passed integers: .. ipython:: python df.iloc[3] -By integer slices, acting similar to numpy/python +By integer slices, acting similar to numpy/python: .. ipython:: python df.iloc[3:5,0:2] -By lists of integer position locations, similar to the numpy/python style +By lists of integer position locations, similar to the numpy/python style: .. ipython:: python df.iloc[[1,2,4],[0,2]] -For slicing rows explicitly +For slicing rows explicitly: .. ipython:: python df.iloc[1:3,:] -For slicing columns explicitly +For slicing columns explicitly: .. ipython:: python df.iloc[:,1:3] -For getting a value explicitly +For getting a value explicitly: .. ipython:: python df.iloc[1,1] -For getting fast access to a scalar (equiv to the prior method) +For getting fast access to a scalar (equivalent to the prior method): .. ipython:: python @@ -290,7 +291,7 @@ Setting ~~~~~~~ Setting a new column automatically aligns the data -by the indexes +by the indexes. .. ipython:: python @@ -298,25 +299,25 @@ by the indexes s1 df['F'] = s1 -Setting values by label +Setting values by label: .. ipython:: python df.at[dates[0],'A'] = 0 -Setting values by position +Setting values by position: .. ipython:: python df.iat[0,1] = 0 -Setting by assigning with a numpy array +Setting by assigning with a numpy array: .. ipython:: python df.loc[:,'D'] = np.array([5] * len(df)) -The result of the prior setting operations +The result of the prior setting operations. .. ipython:: python @@ -336,7 +337,7 @@ Missing Data pandas primarily uses the value ``np.nan`` to represent missing data. It is by default not included in computations. See the :ref:`Missing Data section -` +`. 
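Editor's sketch: the setting idioms covered above (``.at`` by label, ``.iat`` by position, and whole-column assignment from a numpy array) can be condensed into a few lines. The frame below is made up for illustration and is not part of the patch itself:

```python
import numpy as np
import pandas as pd

# a small throwaway frame (hypothetical data, for illustration only)
df = pd.DataFrame(np.zeros((3, 2)), columns=['A', 'B'])

df.at[0, 'A'] = 10                         # set a single value by label
df.iat[0, 1] = 20                          # set a single value by position
df.loc[:, 'B'] = np.array([5] * len(df))   # assign a whole column from an array
```

Note that the final column-wide assignment overwrites the earlier ``.iat`` edit to column ``B``.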
Reindexing allows you to change/add/delete the index on a specified axis. This returns a copy of the data. @@ -353,13 +354,13 @@ To drop any rows that have missing data. df1.dropna(how='any') -Filling missing data +Filling missing data. .. ipython:: python df1.fillna(value=5) -To get the boolean mask where values are ``nan`` +To get the boolean mask where values are ``nan``. .. ipython:: python @@ -369,20 +370,20 @@ To get the boolean mask where values are ``nan`` Operations ---------- -See the :ref:`Basic section on Binary Ops ` +See the :ref:`Basic section on Binary Ops `. Stats ~~~~~ Operations in general *exclude* missing data. -Performing a descriptive statistic +Performing a descriptive statistic: .. ipython:: python df.mean() -Same operation on the other axis +Same operation on the other axis: .. ipython:: python @@ -401,7 +402,7 @@ In addition, pandas automatically broadcasts along the specified dimension. Apply ~~~~~ -Applying functions to the data +Applying functions to the data: .. ipython:: python @@ -411,7 +412,7 @@ Applying functions to the data Histogramming ~~~~~~~~~~~~~ -See more at :ref:`Histogramming and Discretization ` +See more at :ref:`Histogramming and Discretization `. .. ipython:: python @@ -425,7 +426,7 @@ String Methods Series is equipped with a set of string processing methods in the `str` attribute that make it easy to operate on each element of the array, as in the code snippet below. Note that pattern-matching in `str` generally uses `regular -expressions `__ by default (and in +expressions `__ by default (and in some cases always uses them). See more at :ref:`Vectorized String Methods `. @@ -445,7 +446,7 @@ DataFrame, and Panel objects with various kinds of set logic for the indexes and relational algebra functionality in the case of join / merge-type operations. -See the :ref:`Merging section ` +See the :ref:`Merging section `. 
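Editor's sketch of the histogramming (``value_counts``) and vectorized string-method idioms described above, using made-up data:

```python
import numpy as np
import pandas as pd

s = pd.Series([1, 1, 2, 3, 3, 3])
counts = s.value_counts()        # frequency of each distinct value

names = pd.Series(['Alice', 'BOB', np.nan])
lowered = names.str.lower()      # vectorized; missing values pass through as NaN
```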
Concatenating pandas objects together with :func:`concat`: @@ -462,7 +463,7 @@ Concatenating pandas objects together with :func:`concat`: Join ~~~~ -SQL style merges. See the :ref:`Database style joining ` +SQL style merges. See the :ref:`Database style joining ` section. .. ipython:: python @@ -486,7 +487,8 @@ Another example that can be given is: Append ~~~~~~ -Append rows to a dataframe. See the :ref:`Appending ` +Append rows to a dataframe. See the :ref:`Appending ` +section. .. ipython:: python @@ -500,13 +502,13 @@ Grouping -------- By "group by" we are referring to a process involving one or more of the -following steps +following steps: - **Splitting** the data into groups based on some criteria - **Applying** a function to each group independently - **Combining** the results into a data structure -See the :ref:`Grouping section ` +See the :ref:`Grouping section `. .. ipython:: python @@ -518,14 +520,14 @@ See the :ref:`Grouping section ` 'D' : np.random.randn(8)}) df -Grouping and then applying a function ``sum`` to the resulting groups. +Grouping and then applying the ``sum`` function to the resulting groups. .. ipython:: python df.groupby('A').sum() -Grouping by multiple columns forms a hierarchical index, which we then apply -the function. +Grouping by multiple columns forms a hierarchical index, and again we can +apply the ``sum`` function. .. ipython:: python @@ -595,7 +597,7 @@ Time Series pandas has simple, powerful, and efficient functionality for performing resampling operations during frequency conversion (e.g., converting secondly data into 5-minutely data). This is extremely common in, but not limited to, -financial applications. See the :ref:`Time Series section ` +financial applications. See the :ref:`Time Series section `. .. ipython:: python @@ -603,7 +605,7 @@ financial applications. 
See the :ref:`Time Series section ` ts = pd.Series(np.random.randint(0, 500, len(rng)), index=rng) ts.resample('5Min').sum() -Time zone representation +Time zone representation: .. ipython:: python @@ -613,13 +615,13 @@ Time zone representation ts_utc = ts.tz_localize('UTC') ts_utc -Convert to another time zone +Converting to another time zone: .. ipython:: python ts_utc.tz_convert('US/Eastern') -Converting between time span representations +Converting between time span representations: .. ipython:: python @@ -659,7 +661,8 @@ Convert the raw grades to a categorical data type. df["grade"] = df["raw_grade"].astype("category") df["grade"] -Rename the categories to more meaningful names (assigning to ``Series.cat.categories`` is inplace!) +Rename the categories to more meaningful names (assigning to +``Series.cat.categories`` is inplace!). .. ipython:: python @@ -679,7 +682,7 @@ Sorting is per order in the categories, not lexical order. df.sort_values(by="grade") -Grouping by a categorical column shows also empty categories. +Grouping by a categorical column also shows empty categories. .. ipython:: python @@ -689,7 +692,7 @@ Grouping by a categorical column shows also empty categories. Plotting -------- -:ref:`Plotting ` docs. +See the :ref:`Plotting ` docs. .. ipython:: python :suppress: @@ -705,8 +708,8 @@ Plotting @savefig series_plot_basic.png ts.plot() -On DataFrame, :meth:`~DataFrame.plot` is a convenience to plot all of the -columns with labels: +On a DataFrame, the :meth:`~DataFrame.plot` method is a convenience to plot all +of the columns with labels: .. ipython:: python @@ -723,13 +726,13 @@ Getting Data In/Out CSV ~~~ -:ref:`Writing to a csv file ` +:ref:`Writing to a csv file. ` .. ipython:: python df.to_csv('foo.csv') -:ref:`Reading from a csv file ` +:ref:`Reading from a csv file. ` .. ipython:: python @@ -743,15 +746,15 @@ CSV HDF5 ~~~~ -Reading and writing to :ref:`HDFStores ` +Reading and writing to :ref:`HDFStores `. 
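A minimal CSV round-trip mirroring the I/O examples above; as a sketch it writes to a temporary directory rather than the working directory (the frame contents are made up):

```python
import os
import tempfile

import pandas as pd

df = pd.DataFrame({'A': [1, 2], 'B': [3.5, 4.5]})

path = os.path.join(tempfile.mkdtemp(), 'foo.csv')
df.to_csv(path)                        # the index is written as the first column
df2 = pd.read_csv(path, index_col=0)   # read it back in as the index again
```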
-Writing to a HDF5 Store
+Writing to an HDF5 Store.

.. ipython:: python

    df.to_hdf('foo.h5','df')

-Reading from a HDF5 Store
+Reading from an HDF5 Store.

.. ipython:: python

    pd.read_hdf('foo.h5','df')

@@ -765,15 +768,15 @@ Reading from a HDF5 Store
 Excel
 ~~~~~

-Reading and writing to :ref:`MS Excel `
+Reading and writing to :ref:`MS Excel `.

-Writing to an excel file
+Writing to an Excel file.

.. ipython:: python

    df.to_excel('foo.xlsx', sheet_name='Sheet1')

-Reading from an excel file
+Reading from an Excel file.

.. ipython:: python

    pd.read_excel('foo.xlsx', 'Sheet1', index_col=None, na_values=['NA'])

@@ -787,7 +790,7 @@ Reading from an excel file
 Gotchas
 -------

-If you are trying an operation and you see an exception like:
+If you are attempting to perform an operation, you might see an exception like:

.. code-block:: python

diff --git a/doc/source/basics.rst b/doc/source/basics.rst
index 9318df2b76564..e08f115a962c4 100644
--- a/doc/source/basics.rst
+++ b/doc/source/basics.rst
@@ -133,7 +133,7 @@ of interest:

   * Broadcasting behavior between higher- (e.g. DataFrame) and
     lower-dimensional (e.g. Series) objects.
-  * Missing data in computations
+  * Missing data in computations.

 We will demonstrate how to manage these issues independently, though they can
 be handled simultaneously.

@@ -226,12 +226,12 @@ We can also do elementwise :func:`divmod`:
 Missing data / operations with fill values
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

-In Series and DataFrame (though not yet in Panel), the arithmetic functions
-have the option of inputting a *fill_value*, namely a value to substitute when
-at most one of the values at a location are missing. For example, when adding
-two DataFrame objects, you may wish to treat NaN as 0 unless both DataFrames
-are missing that value, in which case the result will be NaN (you can later
-replace NaN with some other value using ``fillna`` if you wish).
+In Series and DataFrame, the arithmetic functions have the option of inputting
+a *fill_value*, namely a value to substitute when at most one of the values at
+a location is missing.
For example, when adding two DataFrame objects, you may +wish to treat NaN as 0 unless both DataFrames are missing that value, in which +case the result will be NaN (you can later replace NaN with some other value +using ``fillna`` if you wish). .. ipython:: python :suppress: @@ -260,9 +260,9 @@ arithmetic operations described above: df.gt(df2) df2.ne(df) -These operations produce a pandas object the same type as the left-hand-side input -that if of dtype ``bool``. These ``boolean`` objects can be used in indexing operations, -see :ref:`here` +These operations produce a pandas object of the same type as the left-hand-side +input that is of dtype ``bool``. These ``boolean`` objects can be used in +indexing operations, see the section on :ref:`Boolean indexing`. .. _basics.reductions: @@ -316,7 +316,7 @@ To evaluate single-element pandas objects in a boolean context, use the method >>> df and df2 - These both will raise as you are trying to compare multiple values. + These will both raise errors, as you are trying to compare multiple values. .. code-block:: python @@ -329,7 +329,7 @@ See :ref:`gotchas` for a more detailed discussion. Comparing if objects are equivalent ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ -Often you may find there is more than one way to compute the same +Often you may find that there is more than one way to compute the same result. As a simple example, consider ``df+df`` and ``df*2``. To test that these two computations produce the same result, given the tools shown above, you might imagine using ``(df+df == df*2).all()``. But in @@ -341,7 +341,7 @@ fact, this expression is False: (df+df == df*2).all() Notice that the boolean DataFrame ``df+df == df*2`` contains some False values! -That is because NaNs do not compare as equals: +This is because NaNs do not compare as equals: .. 
ipython:: python @@ -368,7 +368,7 @@ equality to be True: Comparing array-like objects ~~~~~~~~~~~~~~~~~~~~~~~~~~~~ -You can conveniently do element-wise comparisons when comparing a pandas +You can conveniently perform element-wise comparisons when comparing a pandas data structure with a scalar value: .. ipython:: python @@ -452,8 +452,8 @@ So, for instance, to reproduce :meth:`~DataFrame.combine_first` as above: Descriptive statistics ---------------------- -A large number of methods for computing descriptive statistics and other related -operations on :ref:`Series `, :ref:`DataFrame +There exists a large number of methods for computing descriptive statistics and +other related operations on :ref:`Series `, :ref:`DataFrame `, and :ref:`Panel `. Most of these are aggregations (hence producing a lower-dimensional result) like :meth:`~DataFrame.sum`, :meth:`~DataFrame.mean`, and :meth:`~DataFrame.quantile`, @@ -764,7 +764,7 @@ For example, we can fit a regression using statsmodels. Their API expects a form The pipe method is inspired by unix pipes and more recently dplyr_ and magrittr_, which have introduced the popular ``(%>%)`` (read pipe) operator for R_. The implementation of ``pipe`` here is quite clean and feels right at home in python. -We encourage you to view the source code (``pd.DataFrame.pipe??`` in IPython). +We encourage you to view the source code of ``pd.DataFrame.pipe``. .. _dplyr: https://github.com/hadley/dplyr .. _magrittr: https://github.com/smbache/magrittr @@ -786,7 +786,7 @@ statistics methods, take an optional ``axis`` argument: df.apply(np.cumsum) df.apply(np.exp) -``.apply()`` will also dispatch on a string method name. +The ``.apply()`` method will also dispatch on a string method name. .. ipython:: python @@ -863,8 +863,9 @@ We will use a similar starting frame from above: tsdf.iloc[3:7] = np.nan tsdf -Using a single function is equivalent to :meth:`~DataFrame.apply`; You can also pass named methods as strings. 
-These will return a ``Series`` of the aggregated output:
+Using a single function is equivalent to :meth:`~DataFrame.apply`. You can also
+pass named methods as strings. These will return a ``Series`` of the aggregated
+output:

 .. ipython:: python

@@ -875,7 +876,7 @@ These will return a ``Series`` of the aggregated output:
    # these are equivalent to a ``.sum()`` because we are aggregating on a single function
    tsdf.sum()

-Single aggregations on a ``Series`` this will result in a scalar value:
+Single aggregations on a ``Series`` will return a scalar value:

 .. ipython:: python

@@ -885,8 +886,8 @@ Single aggregations on a ``Series`` this will result in a scalar value:
 Aggregating with multiple functions
 +++++++++++++++++++++++++++++++++++

-You can pass multiple aggregation arguments as a list.
-The results of each of the passed functions will be a row in the resultant ``DataFrame``.
+You can pass multiple aggregation arguments as a list.
+The results of each of the passed functions will be a row in the resulting ``DataFrame``.
 These are naturally named from the aggregation function.

 .. ipython:: python

@@ -989,7 +990,7 @@ The :meth:`~DataFrame.transform` method returns an object that is indexed the sa
 as the original. This API allows you to provide *multiple* operations at the same
 time rather than one-by-one. Its API is quite similar to the ``.agg`` API.

-Use a similar frame to the above sections.
+We create a frame similar to the one used in the above sections.

 .. ipython:: python

@@ -1008,7 +1009,7 @@ function name or a user defined function.
    tsdf.transform('abs')
    tsdf.transform(lambda x: x.abs())

-Here ``.transform()`` received a single function; this is equivalent to a ufunc application
+Here ``.transform()`` received a single function; this is equivalent to a ufunc application.

 .. ipython:: python

@@ -1044,7 +1045,7 @@ Transforming with a dict
 ++++++++++++++++++++++++

-Passing a dict of functions will will allow selective transforming per column.
+Passing a dict of functions will allow selective transforming per column. .. ipython:: python @@ -1080,7 +1081,7 @@ a single value and returning a single value. For example: df4['one'].map(f) df4.applymap(f) -:meth:`Series.map` has an additional feature which is that it can be used to easily +:meth:`Series.map` has an additional feature; it can be used to easily "link" or "map" values defined by a secondary series. This is closely related to :ref:`merging/joining functionality `: @@ -1123,13 +1124,13 @@ A reduction operation. panel.apply(lambda x: x.dtype, axis='items') -A similar reduction type operation +A similar reduction type operation. .. ipython:: python panel.apply(lambda x: x.sum(), axis='major_axis') -This last reduction is equivalent to +This last reduction is equivalent to: .. ipython:: python @@ -1157,7 +1158,7 @@ Apply can also accept multiple axes in the ``axis`` argument. This will pass a result result.loc[:,:,'ItemA'] -This is equivalent to the following +This is equivalent to the following: .. ipython:: python @@ -1358,9 +1359,9 @@ Note that the same result could have been achieved using ts2.reindex(ts.index).fillna(method='ffill') -:meth:`~Series.reindex` will raise a ValueError if the index is not monotonic +:meth:`~Series.reindex` will raise a ValueError if the index is not monotonically increasing or decreasing. :meth:`~Series.fillna` and :meth:`~Series.interpolate` -will not make any checks on the order of the index. +will not perform any checks on the order of the index. .. _basics.limits_on_reindex_fill: @@ -1428,7 +1429,7 @@ Series can also be used: df.rename(columns={'one': 'foo', 'two': 'bar'}, index={'a': 'apple', 'b': 'banana', 'd': 'durian'}) -If the mapping doesn't include a column/index label, it isn't renamed. Also +If the mapping doesn't include a column/index label, it isn't renamed. Note that extra labels in the mapping don't throw an error. .. 
versionadded:: 0.21.0 @@ -1438,8 +1439,8 @@ you specify a single ``mapper`` and the ``axis`` to apply that mapping to. .. ipython:: python - df.rename({'one': 'foo', 'two': 'bar'}, axis='columns'}) - df.rename({'a': 'apple', 'b': 'banana', 'd': 'durian'}, axis='columns'}) + df.rename({'one': 'foo', 'two': 'bar'}, axis='columns') + df.rename({'a': 'apple', 'b': 'banana', 'd': 'durian'}, axis='index') The :meth:`~DataFrame.rename` method also provides an ``inplace`` named @@ -1515,7 +1516,7 @@ To iterate over the rows of a DataFrame, you can use the following methods: over the values. See the docs on :ref:`function application `. * If you need to do iterative manipulations on the values but performance is - important, consider writing the inner loop using e.g. cython or numba. + important, consider writing the inner loop using for instance cython or numba. See the :ref:`enhancing performance ` section for some examples of this approach. @@ -1594,7 +1595,7 @@ index value along with a Series containing the data in each row: To preserve dtypes while iterating over the rows, it is better to use :meth:`~DataFrame.itertuples` which returns namedtuples of the values - and which is generally much faster as ``iterrows``. + and which is generally much faster than ``iterrows``. For instance, a contrived way to transpose the DataFrame would be: @@ -1615,14 +1616,14 @@ yielding a namedtuple for each row in the DataFrame. The first element of the tuple will be the row's corresponding index value, while the remaining values are the row values. -For instance, +For instance: .. ipython:: python for row in df.itertuples(): print(row) -This method does not convert the row to a Series object but just +This method does not convert the row to a Series object; it merely returns the values inside a namedtuple. Therefore, :meth:`~DataFrame.itertuples` preserves the data type of the values and is generally faster as :meth:`~DataFrame.iterrows`. 
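The ``itertuples`` behaviour discussed above can be sketched with a tiny frame (made-up data): each row arrives as a namedtuple whose first element is the index value, and the column dtypes are preserved.

```python
import pandas as pd

df = pd.DataFrame({'x': [1, 2, 3], 'y': [4.0, 5.0, 6.0]})

# each row is a namedtuple of the form (Index, x, y)
rows = list(df.itertuples())
first = rows[0]
```

Because the values are never squeezed into a single Series, ``first.x`` stays integral while ``first.y`` stays floating-point.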
@@ -1709,7 +1710,7 @@ The ``.dt`` accessor works for period and timedelta dtypes.

 .. note::

-   ``Series.dt`` will raise a ``TypeError`` if you access with a non-datetimelike values
+   ``Series.dt`` will raise a ``TypeError`` if you access it with non-datetime-like values.

 Vectorized string methods
 -------------------------

@@ -1763,7 +1764,7 @@ labels (indexes) are the ``Series.sort_index()`` and the ``DataFrame.sort_index(
 By Values
 ~~~~~~~~~

-The :meth:`Series.sort_values` and :meth:`DataFrame.sort_values` are the entry points for **value** sorting (that is the values in a column or row).
+The :meth:`Series.sort_values` and :meth:`DataFrame.sort_values` are the entry points for **value** sorting (i.e. the values in a column or row).
 :meth:`DataFrame.sort_values` can accept an optional ``by`` argument for ``axis=0``
 which will use an arbitrary vector or a column name of the DataFrame to
 determine the sort order:

@@ -1794,7 +1795,7 @@ argument:
 searchsorted
 ~~~~~~~~~~~~

-Series has the :meth:`~Series.searchsorted` method, which works similar to
+Series has the :meth:`~Series.searchsorted` method, which works similarly to
 :meth:`numpy.ndarray.searchsorted`.

 .. ipython:: python

@@ -1859,14 +1860,14 @@ the axis indexes, since they are immutable) and returns a new object. Note that
 **it is seldom necessary to copy objects**. For example, there are only a
 handful of ways to alter a DataFrame *in-place*:

-  * Inserting, deleting, or modifying a column
-  * Assigning to the ``index`` or ``columns`` attributes
+  * Inserting, deleting, or modifying a column.
+  * Assigning to the ``index`` or ``columns`` attributes.
   * For homogeneous data, directly modifying the values via the ``values``
-    attribute or advanced indexing
+    attribute or advanced indexing.

-To be clear, no pandas methods have the side effect of modifying your data;
-almost all methods return new objects, leaving the original object
-untouched. If data is modified, it is because you did so explicitly.
+To be clear, no pandas method has the side effect of modifying your data; +almost every method returns a new object, leaving the original object +untouched. If the data is modified, it is because you did so explicitly. .. _basics.dtypes: @@ -1879,7 +1880,8 @@ The main types stored in pandas objects are ``float``, ``int``, ``bool``, ``int64`` and ``int32``. See :ref:`Series with TZ ` for more detail on ``datetime64[ns, tz]`` dtypes. -A convenient :attr:`~DataFrame.dtypes` attribute for DataFrames returns a Series with the data type of each column. +A convenient :attr:`~DataFrame.dtypes` attribute for DataFrame returns a Series +with the data type of each column. .. ipython:: python @@ -1893,15 +1895,15 @@ A convenient :attr:`~DataFrame.dtypes` attribute for DataFrames returns a Series dft dft.dtypes -On a ``Series`` use the :attr:`~Series.dtype` attribute. +On a ``Series`` object, use the :attr:`~Series.dtype` attribute. .. ipython:: python dft['A'].dtype -If a pandas object contains data multiple dtypes *IN A SINGLE COLUMN*, the dtype of the -column will be chosen to accommodate all of the data types (``object`` is the most -general). +If a pandas object contains data with multiple dtypes *in a single column*, the +dtype of the column will be chosen to accommodate all of the data types +(``object`` is the most general). .. ipython:: python @@ -1938,7 +1940,8 @@ defaults ~~~~~~~~ By default integer types are ``int64`` and float types are ``float64``, -*REGARDLESS* of platform (32-bit or 64-bit). The following will all result in ``int64`` dtypes. +*regardless* of platform (32-bit or 64-bit). +The following will all result in ``int64`` dtypes. .. ipython:: python @@ -1946,7 +1949,7 @@ By default integer types are ``int64`` and float types are ``float64``, pd.DataFrame({'a': [1, 2]}).dtypes pd.DataFrame({'a': 1 }, index=list(range(2))).dtypes -Numpy, however will choose *platform-dependent* types when creating arrays. 
+Note that Numpy will choose *platform-dependent* types when creating arrays. The following **WILL** result in ``int32`` on 32-bit platform. .. ipython:: python @@ -1958,7 +1961,7 @@ upcasting ~~~~~~~~~ Types can potentially be *upcasted* when combined with other types, meaning they are promoted -from the current type (say ``int`` to ``float``) +from the current type (e.g. ``int`` to ``float``). .. ipython:: python @@ -1995,7 +1998,7 @@ then the more *general* one will be used as the result of the operation. df3.astype('float32').dtypes -Convert a subset of columns to a specified type using :meth:`~DataFrame.astype` +Convert a subset of columns to a specified type using :meth:`~DataFrame.astype`. .. ipython:: python @@ -2006,7 +2009,7 @@ Convert a subset of columns to a specified type using :meth:`~DataFrame.astype` .. versionadded:: 0.19.0 -Convert certain columns to a specific dtype by passing a dict to :meth:`~DataFrame.astype` +Convert certain columns to a specific dtype by passing a dict to :meth:`~DataFrame.astype`. .. ipython:: python @@ -2148,7 +2151,7 @@ gotchas Performing selection operations on ``integer`` type data can easily upcast the data to ``floating``. The dtype of the input data will be preserved in cases where ``nans`` are not introduced. -See also :ref:`Support for integer NA ` +See also :ref:`Support for integer NA `. .. ipython:: python @@ -2200,17 +2203,17 @@ dtypes: df['tz_aware_dates'] = pd.date_range('20130101', periods=3, tz='US/Eastern') df -And the dtypes +And the dtypes: .. ipython:: python df.dtypes :meth:`~DataFrame.select_dtypes` has two parameters ``include`` and ``exclude`` that allow you to -say "give me the columns WITH these dtypes" (``include``) and/or "give the -columns WITHOUT these dtypes" (``exclude``). +say "give me the columns *with* these dtypes" (``include``) and/or "give the +columns *without* these dtypes" (``exclude``). -For example, to select ``bool`` columns +For example, to select ``bool`` columns: .. 
ipython:: python @@ -2226,7 +2229,7 @@ You can also pass the name of a dtype in the `numpy dtype hierarchy :meth:`~pandas.DataFrame.select_dtypes` also works with generic dtypes as well. For example, to select all numeric and boolean columns while excluding unsigned -integers +integers: .. ipython:: python diff --git a/doc/source/dsintro.rst b/doc/source/dsintro.rst index e5c7637ddb499..c8018c8e66f72 100644 --- a/doc/source/dsintro.rst +++ b/doc/source/dsintro.rst @@ -93,10 +93,12 @@ constructed from the sorted keys of the dict, if possible. .. note:: - NaN (not a number) is the standard missing data marker used in pandas + NaN (not a number) is the standard missing data marker used in pandas. -**From scalar value** If ``data`` is a scalar value, an index must be -provided. The value will be repeated to match the length of **index** +**From scalar value** + +If ``data`` is a scalar value, an index must be +provided. The value will be repeated to match the length of **index**. .. ipython:: python @@ -106,7 +108,7 @@ Series is ndarray-like ~~~~~~~~~~~~~~~~~~~~~~ ``Series`` acts very similarly to a ``ndarray``, and is a valid argument to most NumPy functions. -However, things like slicing also slice the index. +However, operations such as slicing will also slice the index. .. ipython :: python @@ -152,10 +154,9 @@ See also the :ref:`section on attribute access`. Vectorized operations and label alignment with Series ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ -When doing data analysis, as with raw NumPy arrays looping through Series -value-by-value is usually not necessary. Series can also be passed into most -NumPy methods expecting an ndarray. - +When working with raw NumPy arrays, looping through value-by-value is usually +not necessary. The same is true when working with Series in pandas. +Series can also be passed into most NumPy methods expecting an ndarray. .. ipython:: python @@ -245,8 +246,8 @@ based on common sense rules. 
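The label alignment described in this section can be sketched as follows; it mirrors the classic ``s[1:] + s[:-1]`` example, where the operands share only part of their index:

```python
import numpy as np
import pandas as pd

s = pd.Series([1.0, 2.0, 3.0], index=['a', 'b', 'c'])

# the operands share only the label 'b'; unmatched labels yield NaN
result = s[1:] + s[:-1]
```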
From dict of Series or dicts ~~~~~~~~~~~~~~~~~~~~~~~~~~~~ -The result **index** will be the **union** of the indexes of the various -Series. If there are any nested dicts, these will be first converted to +The resulting **index** will be the **union** of the indexes of the various +Series. If there are any nested dicts, these will first be converted to Series. If no columns are passed, the columns will be the sorted list of dict keys. @@ -323,7 +324,8 @@ From a list of dicts From a dict of tuples ~~~~~~~~~~~~~~~~~~~~~ -You can automatically create a multi-indexed frame by passing a tuples dictionary +You can automatically create a multi-indexed frame by passing a tuples +dictionary. .. ipython:: python @@ -345,8 +347,8 @@ column name provided). **Missing Data** Much more will be said on this topic in the :ref:`Missing data ` -section. To construct a DataFrame with missing data, use ``np.nan`` for those -values which are missing. Alternatively, you may pass a ``numpy.MaskedArray`` +section. To construct a DataFrame with missing data, we use ``np.nan`` to +represent missing values. Alternatively, you may pass a ``numpy.MaskedArray`` as the data argument to the DataFrame constructor, and its masked entries will be considered missing. @@ -367,9 +369,9 @@ set to ``'index'`` in order to use the dict keys as row labels. **DataFrame.from_records** ``DataFrame.from_records`` takes a list of tuples or an ndarray with structured -dtype. Works analogously to the normal ``DataFrame`` constructor, except that -index maybe be a specific field of the structured dtype to use as the index. -For example: +dtype. It works analogously to the normal ``DataFrame`` constructor, except that +the resulting DataFrame index may be a specific field of the structured +dtype. For example: .. ipython:: python @@ -467,7 +469,7 @@ derived from existing columns. (iris.assign(sepal_ratio = iris['SepalWidth'] / iris['SepalLength']) .head()) -Above was an example of inserting a precomputed value. 
We can also pass in
-a function of one argument to be evalutated on the DataFrame being assigned to.
+In the example above, we inserted a precomputed value. We can also pass in
+a function of one argument to be evaluated on the DataFrame being assigned to.

.. ipython:: python

@@ -480,7 +482,7 @@ DataFrame untouched.

Passing a callable, as opposed to an actual value to be
inserted, is useful when you don't have a reference to the DataFrame at hand. This is
-common when using ``assign`` in chains of operations. For example,
+common when using ``assign`` in a chain of operations. For example,
 we can limit the DataFrame to just those observations with a Sepal
 Length greater than 5, calculate the ratio, and plot:

@@ -546,7 +548,7 @@ DataFrame:

   df.loc['b']
   df.iloc[2]

-For a more exhaustive treatment of more sophisticated label-based indexing and
+For a more exhaustive treatment of sophisticated label-based indexing and
 slicing, see the :ref:`section on indexing `. We will address the
 fundamentals of reindexing / conforming to new sets of labels in the
 :ref:`section on reindexing `.

@@ -739,7 +741,7 @@ DataFrame column attribute access and IPython completion
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

If a DataFrame column label is a valid Python variable name, the column can be
-accessed like attributes:
+accessed like an attribute:

.. ipython:: python

@@ -912,7 +914,8 @@ For example, using the earlier example data, we could do:

Squeezing
~~~~~~~~~

-Another way to change the dimensionality of an object is to ``squeeze`` a 1-len object, similar to ``wp['Item1']``
+Another way to change the dimensionality of an object is to ``squeeze`` a 1-len
+object, similar to ``wp['Item1']``.

.. ipython:: python
   :okwarning:

@@ -964,7 +967,7 @@ support the multi-dimensional analysis that is one of ``Panel`` s main usecases.

   p = tm.makePanel()
   p

-Convert to a MultiIndex DataFrame
+Convert to a MultiIndex DataFrame.

..
ipython:: python :okwarning: diff --git a/doc/source/overview.rst b/doc/source/overview.rst index 0354f6e7f06f7..73e7704b43be6 100644 --- a/doc/source/overview.rst +++ b/doc/source/overview.rst @@ -10,21 +10,21 @@ Package overview easy-to-use data structures and data analysis tools for the `Python `__ programming language. -:mod:`pandas` consists of the following elements +:mod:`pandas` consists of the following elements: * A set of labeled array data structures, the primary of which are - Series and DataFrame + Series and DataFrame. * Index objects enabling both simple axis indexing and multi-level / - hierarchical axis indexing - * An integrated group by engine for aggregating and transforming data sets + hierarchical axis indexing. + * An integrated group by engine for aggregating and transforming data sets. * Date range generation (date_range) and custom date offsets enabling the - implementation of customized frequencies + implementation of customized frequencies. * Input/Output tools: loading tabular data from flat files (CSV, delimited, Excel 2003), and saving and loading pandas objects from the fast and efficient PyTables/HDF5 format. * Memory-efficient "sparse" versions of the standard data structures for storing - data that is mostly missing or mostly constant (some fixed value) - * Moving window statistics (rolling mean, rolling standard deviation, etc.) + data that is mostly missing or mostly constant (some fixed value). + * Moving window statistics (rolling mean, rolling standard deviation, etc.). Data Structures --------------- @@ -58,7 +58,7 @@ transformations in downstream functions. For example, with tabular data (DataFrame) it is more semantically helpful to think of the **index** (the rows) and the **columns** rather than axis 0 and -axis 1. And iterating through the columns of the DataFrame thus results in more +axis 1. 
Iterating through the columns of the DataFrame thus results in more readable code: :: @@ -74,8 +74,7 @@ All pandas data structures are value-mutable (the values they contain can be altered) but not always size-mutable. The length of a Series cannot be changed, but, for example, columns can be inserted into a DataFrame. However, the vast majority of methods produce new objects and leave the input data -untouched. In general, though, we like to **favor immutability** where -sensible. +untouched. In general we like to **favor immutability** where sensible. Getting Support --------------- From e8cb0f366134d1474be11e68768f0b3a127b037b Mon Sep 17 00:00:00 2001 From: tommyod Date: Wed, 27 Dec 2017 15:05:38 +0100 Subject: [PATCH 2/2] Func references, link to python 3, spelling --- doc/source/10min.rst | 9 +++++---- doc/source/basics.rst | 10 +++++----- pandas/__init__.py | 20 ++++++++++---------- 3 files changed, 20 insertions(+), 19 deletions(-) diff --git a/doc/source/10min.rst b/doc/source/10min.rst index 0507cd0d2a97a..46c3ffef58228 100644 --- a/doc/source/10min.rst +++ b/doc/source/10min.rst @@ -210,7 +210,7 @@ For getting a scalar value: df.loc[dates[0],'A'] -For getting fast access to a scalar (equiv to the prior method): +For getting fast access to a scalar (equivalent to the prior method): .. ipython:: python @@ -426,7 +426,7 @@ String Methods Series is equipped with a set of string processing methods in the `str` attribute that make it easy to operate on each element of the array, as in the code snippet below. Note that pattern-matching in `str` generally uses `regular -expressions `__ by default (and in +expressions `__ by default (and in some cases always uses them). See more at :ref:`Vectorized String Methods `. @@ -520,7 +520,8 @@ See the :ref:`Grouping section `. 'D' : np.random.randn(8)}) df -Grouping and then applying the ``sum`` function to the resulting groups. +Grouping and then applying the :meth:`~DataFrame.sum` function to the resulting +groups. .. 
ipython:: python @@ -669,7 +670,7 @@ Rename the categories to more meaningful names (assigning to df["grade"].cat.categories = ["very good", "good", "very bad"] Reorder the categories and simultaneously add the missing categories (methods under ``Series -.cat`` return a new ``Series`` per default). +.cat`` return a new ``Series`` by default). .. ipython:: python diff --git a/doc/source/basics.rst b/doc/source/basics.rst index e08f115a962c4..ecb9a8f2d79db 100644 --- a/doc/source/basics.rst +++ b/doc/source/basics.rst @@ -764,7 +764,7 @@ For example, we can fit a regression using statsmodels. Their API expects a form The pipe method is inspired by unix pipes and more recently dplyr_ and magrittr_, which have introduced the popular ``(%>%)`` (read pipe) operator for R_. The implementation of ``pipe`` here is quite clean and feels right at home in python. -We encourage you to view the source code of ``pd.DataFrame.pipe``. +We encourage you to view the source code of :meth:`~DataFrame.pipe`. .. _dplyr: https://github.com/hadley/dplyr .. _magrittr: https://github.com/smbache/magrittr @@ -786,7 +786,7 @@ statistics methods, take an optional ``axis`` argument: df.apply(np.cumsum) df.apply(np.exp) -The ``.apply()`` method will also dispatch on a string method name. +The :meth:`~DataFrame.apply` method will also dispatch on a string method name. .. ipython:: python @@ -1009,7 +1009,7 @@ function name or a user defined function. tsdf.transform('abs') tsdf.transform(lambda x: x.abs()) -Here ``.transform()`` received a single function; this is equivalent to a ufunc application. +Here :meth:`~DataFrame.transform` received a single function; this is equivalent to a ufunc application. .. ipython:: python @@ -1516,7 +1516,7 @@ To iterate over the rows of a DataFrame, you can use the following methods: over the values. See the docs on :ref:`function application `. 
* If you need to do iterative manipulations on the values but performance is - important, consider writing the inner loop using for instance cython or numba. + important, consider writing the inner loop with cython or numba. See the :ref:`enhancing performance ` section for some examples of this approach. @@ -1595,7 +1595,7 @@ index value along with a Series containing the data in each row: To preserve dtypes while iterating over the rows, it is better to use :meth:`~DataFrame.itertuples` which returns namedtuples of the values - and which is generally much faster than ``iterrows``. + and which is generally much faster than :meth:`~DataFrame.iterrows`. For instance, a contrived way to transpose the DataFrame would be: diff --git a/pandas/__init__.py b/pandas/__init__.py index 8d9b75ccd6c2c..861c8e7d622fc 100644 --- a/pandas/__init__.py +++ b/pandas/__init__.py @@ -104,25 +104,25 @@ Here are just a few of the things that pandas does well: - Easy handling of missing data in floating point as well as non-floating - point data + point data. - Size mutability: columns can be inserted and deleted from DataFrame and - higher dimensional objects + higher dimensional objects. - Automatic and explicit data alignment: objects can be explicitly aligned to a set of labels, or the user can simply ignore the labels and let `Series`, `DataFrame`, etc. automatically align the data for you in - computations + computations. - Powerful, flexible group by functionality to perform split-apply-combine - operations on data sets, for both aggregating and transforming data + operations on data sets, for both aggregating and transforming data. - Make it easy to convert ragged, differently-indexed data in other Python - and NumPy data structures into DataFrame objects + and NumPy data structures into DataFrame objects. 
- Intelligent label-based slicing, fancy indexing, and subsetting of large - data sets - - Intuitive merging and joining data sets - - Flexible reshaping and pivoting of data sets - - Hierarchical labeling of axes (possible to have multiple labels per tick) + data sets. + - Intuitive merging and joining data sets. + - Flexible reshaping and pivoting of data sets. + - Hierarchical labeling of axes (possible to have multiple labels per tick). - Robust IO tools for loading data from flat files (CSV and delimited), Excel files, databases, and saving/loading data from the ultrafast HDF5 - format + format. - Time series-specific functionality: date range generation and frequency conversion, moving window statistics, moving window linear regressions, date shifting and lagging, etc.
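
As an editorial aside (not part of the patch itself), a few of the behaviors these documentation hunks describe — Series label alignment, grouping followed by ``sum``, and dtype-based column selection with ``select_dtypes`` — can be sketched in a short self-contained snippet. The sample data and variable names below are hypothetical, chosen only for illustration:

```python
import numpy as np
import pandas as pd

# Label alignment (dsintro.rst hunk): arithmetic aligns on index labels,
# and labels present in only one operand produce NaN.
s = pd.Series([1.0, 2.0, 3.0], index=["a", "b", "c"])
t = pd.Series([10.0, 20.0], index=["b", "c"])
aligned = s + t  # 'a' -> NaN, 'b' -> 12.0, 'c' -> 23.0

# Grouping and then applying sum (10min.rst hunk): one row per group key.
df = pd.DataFrame({"A": ["foo", "bar", "foo", "bar"],
                   "D": [1, 2, 3, 4]})
grouped = df.groupby("A").sum()  # bar -> 6, foo -> 4

# Generic dtype selection (basics.rst hunk): all numeric and boolean
# columns, excluding unsigned integers.
df2 = pd.DataFrame({"i": np.array([1, 2], dtype="int64"),
                    "u": np.array([1, 2], dtype="uint8"),
                    "b": [True, False],
                    "s": ["x", "y"]})
subset = df2.select_dtypes(include=["number", "bool"],
                           exclude=["unsignedinteger"])
# subset keeps only columns 'i' and 'b'.
```

The generic dtype names ``"number"``, ``"bool"``, and ``"unsignedinteger"`` follow the NumPy dtype hierarchy that the basics.rst hunk refers to.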