diff --git a/doc/source/comparison_with_sql.rst b/doc/source/comparison_with_sql.rst index 3d8b85e9460c4..4d0a2b80c9949 100644 --- a/doc/source/comparison_with_sql.rst +++ b/doc/source/comparison_with_sql.rst @@ -3,11 +3,11 @@ Comparison with SQL ******************** -Since many potential pandas users have some familiarity with -`SQL `_, this page is meant to provide some examples of how +Since many potential pandas users have some familiarity with +`SQL `_, this page is meant to provide some examples of how various SQL operations would be performed using pandas. -If you're new to pandas, you might want to first read through :ref:`10 Minutes to Pandas<10min>` +If you're new to pandas, you might want to first read through :ref:`10 Minutes to Pandas<10min>` to familiarize yourself with the library. As is customary, we import pandas and numpy as follows: @@ -17,8 +17,8 @@ As is customary, we import pandas and numpy as follows: import pandas as pd import numpy as np -Most of the examples will utilize the ``tips`` dataset found within pandas tests. We'll read -the data into a DataFrame called `tips` and assume we have a database table of the same name and +Most of the examples will utilize the ``tips`` dataset found within pandas tests. We'll read +the data into a DataFrame called `tips` and assume we have a database table of the same name and structure. .. ipython:: python @@ -44,7 +44,7 @@ With pandas, column selection is done by passing a list of column names to your tips[['total_bill', 'tip', 'smoker', 'time']].head(5) -Calling the DataFrame without the list of column names would display all columns (akin to SQL's +Calling the DataFrame without the list of column names would display all columns (akin to SQL's ``*``). WHERE @@ -58,14 +58,14 @@ Filtering in SQL is done via a WHERE clause. 
WHERE time = 'Dinner' LIMIT 5; -DataFrames can be filtered in multiple ways; the most intuitive of which is using +DataFrames can be filtered in multiple ways, the most intuitive of which is using `boolean indexing `_. .. ipython:: python tips[tips['time'] == 'Dinner'].head(5) -The above statement is simply passing a ``Series`` of True/False objects to the DataFrame, +The above statement is simply passing a ``Series`` of True/False objects to the DataFrame, returning all rows with True. .. ipython:: python @@ -74,7 +74,7 @@ returning all rows with True. is_dinner.value_counts() tips[is_dinner].head(5) -Just like SQL's OR and AND, multiple conditions can be passed to a DataFrame using | (OR) and & +Just like SQL's OR and AND, multiple conditions can be passed to a DataFrame using | (OR) and & (AND). .. code-block:: sql @@ -101,16 +101,16 @@ Just like SQL's OR and AND, multiple conditions can be passed to a DataFrame usi # tips by parties of at least 5 diners OR bill total was more than $45 tips[(tips['size'] >= 5) | (tips['total_bill'] > 45)] -NULL checking is done using the :meth:`~pandas.Series.notnull` and :meth:`~pandas.Series.isnull` +NULL checking is done using the :meth:`~pandas.Series.notnull` and :meth:`~pandas.Series.isnull` methods. .. ipython:: python - + frame = pd.DataFrame({'col1': ['A', 'B', np.NaN, 'C', 'D'], 'col2': ['F', np.NaN, 'G', 'H', 'I']}) frame -Assume we have a table of the same structure as our DataFrame above. We can see only the records +Assume we have a table of the same structure as our DataFrame above. We can see only the records where ``col2`` IS NULL with the following query: .. code-block:: sql @@ -138,12 +138,12 @@ Getting items where ``col1`` IS NOT NULL can be done with :meth:`~pandas.Series. GROUP BY -------- -In pandas, SQL's GROUP BY operations performed using the similarly named -:meth:`~pandas.DataFrame.groupby` method. 
:meth:`~pandas.DataFrame.groupby` typically refers to a +In pandas, SQL's GROUP BY operations are performed using the similarly named +:meth:`~pandas.DataFrame.groupby` method. :meth:`~pandas.DataFrame.groupby` typically refers to a process where we'd like to split a dataset into groups, apply some function (typically aggregation) , and then combine the groups together. -A common SQL operation would be getting the count of records in each group throughout a dataset. +A common SQL operation would be getting the count of records in each group throughout a dataset. For instance, a query getting us the number of tips left by sex: .. code-block:: sql @@ -163,23 +163,23 @@ The pandas equivalent would be: tips.groupby('sex').size() -Notice that in the pandas code we used :meth:`~pandas.DataFrameGroupBy.size` and not -:meth:`~pandas.DataFrameGroupBy.count`. This is because :meth:`~pandas.DataFrameGroupBy.count` +Notice that in the pandas code we used :meth:`~pandas.DataFrameGroupBy.size` and not +:meth:`~pandas.DataFrameGroupBy.count`. This is because :meth:`~pandas.DataFrameGroupBy.count` applies the function to each column, returning the number of ``not null`` records within each. .. ipython:: python tips.groupby('sex').count() -Alternatively, we could have applied the :meth:`~pandas.DataFrameGroupBy.count` method to an +Alternatively, we could have applied the :meth:`~pandas.DataFrameGroupBy.count` method to an individual column: .. ipython:: python tips.groupby('sex')['total_bill'].count() -Multiple functions can also be applied at once. For instance, say we'd like to see how tip amount -differs by day of the week - :meth:`~pandas.DataFrameGroupBy.agg` allows you to pass a dictionary +Multiple functions can also be applied at once. For instance, say we'd like to see how tip amount +differs by day of the week - :meth:`~pandas.DataFrameGroupBy.agg` allows you to pass a dictionary to your grouped DataFrame, indicating which functions to apply to specific columns. .. 
code-block:: sql @@ -198,7 +198,7 @@ to your grouped DataFrame, indicating which functions to apply to specific colum tips.groupby('day').agg({'tip': np.mean, 'day': np.size}) -Grouping by more than one column is done by passing a list of columns to the +Grouping by more than one column is done by passing a list of columns to the :meth:`~pandas.DataFrame.groupby` method. .. code-block:: sql @@ -207,7 +207,7 @@ Grouping by more than one column is done by passing a list of columns to the FROM tip GROUP BY smoker, day; /* - smoker day + smoker day No Fri 4 2.812500 Sat 45 3.102889 Sun 57 3.167895 @@ -226,16 +226,16 @@ Grouping by more than one column is done by passing a list of columns to the JOIN ---- -JOINs can be performed with :meth:`~pandas.DataFrame.join` or :meth:`~pandas.merge`. By default, -:meth:`~pandas.DataFrame.join` will join the DataFrames on their indices. Each method has -parameters allowing you to specify the type of join to perform (LEFT, RIGHT, INNER, FULL) or the +JOINs can be performed with :meth:`~pandas.DataFrame.join` or :meth:`~pandas.merge`. By default, +:meth:`~pandas.DataFrame.join` will join the DataFrames on their indices. Each method has +parameters allowing you to specify the type of join to perform (LEFT, RIGHT, INNER, FULL) or the columns to join on (column names or indices). .. ipython:: python df1 = pd.DataFrame({'key': ['A', 'B', 'C', 'D'], 'value': np.random.randn(4)}) - df2 = pd.DataFrame({'key': ['B', 'D', 'D', 'E'], + df2 = pd.DataFrame({'key': ['B', 'D', 'D', 'E'], 'value': np.random.randn(4)}) Assume we have two database tables of the same name and structure as our DataFrames. @@ -256,7 +256,7 @@ INNER JOIN # merge performs an INNER JOIN by default pd.merge(df1, df2, on='key') -:meth:`~pandas.merge` also offers parameters for cases when you'd like to join one DataFrame's +:meth:`~pandas.merge` also offers parameters for cases when you'd like to join one DataFrame's column with another DataFrame's index. .. 
ipython:: python @@ -296,7 +296,7 @@ RIGHT JOIN FULL JOIN ~~~~~~~~~ -pandas also allows for FULL JOINs, which display both sides of the dataset, whether or not the +pandas also allows for FULL JOINs, which display both sides of the dataset, whether or not the joined columns find a match. As of writing, FULL JOINs are not supported in all RDBMS (MySQL). .. code-block:: sql @@ -364,7 +364,7 @@ SQL's UNION is similar to UNION ALL, however UNION will remove duplicate rows. Los Angeles 5 */ -In pandas, you can use :meth:`~pandas.concat` in conjunction with +In pandas, you can use :meth:`~pandas.concat` in conjunction with :meth:`~pandas.DataFrame.drop_duplicates`. .. ipython:: python @@ -377,4 +377,4 @@ UPDATE DELETE ------- \ No newline at end of file +------ diff --git a/doc/source/computation.rst b/doc/source/computation.rst index 7b064c69c721c..d5dcacf53ec23 100644 --- a/doc/source/computation.rst +++ b/doc/source/computation.rst @@ -244,7 +244,7 @@ accept the following arguments: is min for ``rolling_min``, max for ``rolling_max``, median for ``rolling_median``, and mean for all other rolling functions. See :meth:`DataFrame.resample`'s how argument for more information. - + These functions can be applied to ndarrays or Series objects: .. ipython:: python diff --git a/doc/source/gotchas.rst b/doc/source/gotchas.rst index e76f58f023619..a927bcec683f5 100644 --- a/doc/source/gotchas.rst +++ b/doc/source/gotchas.rst @@ -100,7 +100,7 @@ index, not membership among the values. 2 in s 'b' in s -If this behavior is surprising, keep in mind that using ``in`` on a Python +If this behavior is surprising, keep in mind that using ``in`` on a Python dictionary tests keys, not values, and Series are dict-like. 
To test for membership in the values, use the method :func:`~pandas.Series.isin`: diff --git a/doc/source/indexing.rst b/doc/source/indexing.rst index 1d25a395f74a9..b90ae05c62895 100644 --- a/doc/source/indexing.rst +++ b/doc/source/indexing.rst @@ -216,9 +216,9 @@ new column. sa dfa.A = list(range(len(dfa.index))) # ok if A already exists dfa - dfa['A'] = list(range(len(dfa.index))) # use this form to create a new column + dfa['A'] = list(range(len(dfa.index))) # use this form to create a new column dfa - + .. warning:: - You can use this access only if the index element is a valid python identifier, e.g. ``s.1`` is not allowed. diff --git a/doc/source/missing_data.rst b/doc/source/missing_data.rst index ac5c8a4463b39..6dac071a5b2bb 100644 --- a/doc/source/missing_data.rst +++ b/doc/source/missing_data.rst @@ -598,7 +598,7 @@ You can also operate on the DataFrame in place .. warning:: - When replacing multiple ``bool`` or ``datetime64`` objects, the first + When replacing multiple ``bool`` or ``datetime64`` objects, the first argument to ``replace`` (``to_replace``) must match the type of the value being replaced type. For example, @@ -669,4 +669,3 @@ However, these can be filled in using **fillna** and it will work fine: reindexed[crit.fillna(False)] reindexed[crit.fillna(True)] - diff --git a/doc/source/overview.rst b/doc/source/overview.rst index 4d891d38f77a1..8e47466385e77 100644 --- a/doc/source/overview.rst +++ b/doc/source/overview.rst @@ -99,7 +99,7 @@ resources for development through the end of 2011, and continues to contribute bug reports today. Since January 2012, `Lambda Foundry `__, has -been providing development resources, as well as commercial support, +been providing development resources, as well as commercial support, training, and consulting for pandas. pandas is only made possible by a group of people around the world like you @@ -114,8 +114,8 @@ collection of developers focused on the improvement of Python's data libraries. 
The core team that coordinates development can be found on `Github `__. If you're interested in contributing, please visit the `project website `__. - + License ------- -.. literalinclude:: ../../LICENSE \ No newline at end of file +.. literalinclude:: ../../LICENSE diff --git a/doc/source/r_interface.rst index 5af5685ed1f56..98fc4edfd5816 100644 --- a/doc/source/r_interface.rst +++ b/doc/source/r_interface.rst @@ -22,9 +22,9 @@ rpy2 / R interface If your computer has R and rpy2 (> 2.2) installed (which will be left to the reader), you will be able to leverage the below functionality. On Windows, doing this is quite an ordeal at the moment, but users on Unix-like systems -should find it quite easy. rpy2 evolves in time, and is currently reaching +should find it quite easy. rpy2 evolves in time, and is currently reaching its release 2.3, while the current interface is -designed for the 2.2.x series. We recommend to use 2.2.x over other series +designed for the 2.2.x series. We recommend using 2.2.x over other series unless you are prepared to fix parts of the code, yet the rpy2-2.3.0 introduces improvements such as a better R-Python bridge memory management layer so it might be a good idea to bite the bullet and submit patches for diff --git a/doc/source/reshaping.rst index 436055ffe37d1..db68c0eb224e2 100644 --- a/doc/source/reshaping.rst +++ b/doc/source/reshaping.rst @@ -266,7 +266,7 @@ It takes a number of arguments - ``values``: a column or a list of columns to aggregate - ``index``: a column, Grouper, array which has the same length as data, or list of them. Keys to group by on the pivot table index. If an array is passed, it is being used as the same manner as column values. -- ``columns``: a column, Grouper, array which has the same length as data, or list of them. +- ``columns``: a column, Grouper, array which has the same length as data, or list of them. Keys to group by on the pivot table column. 
If an array is passed, it is being used as the same manner as column values. - ``aggfunc``: function to use for aggregation, defaulting to ``numpy.mean`` @@ -456,4 +456,3 @@ handling of NaN: pd.factorize(x, sort=True) np.unique(x, return_inverse=True)[::-1] - diff --git a/doc/source/rplot.rst b/doc/source/rplot.rst index 12ade83261fb7..cdecee39d8d1e 100644 --- a/doc/source/rplot.rst +++ b/doc/source/rplot.rst @@ -45,7 +45,7 @@ We import the rplot API: Examples -------- -RPlot is a flexible API for producing Trellis plots. These plots allow you to arrange data in a rectangular grid by values of certain attributes. +RPlot is a flexible API for producing Trellis plots. These plots allow you to arrange data in a rectangular grid by values of certain attributes. .. ipython:: python diff --git a/doc/source/tutorials.rst b/doc/source/tutorials.rst index 65ff95a905c14..dafb9200cab1c 100644 --- a/doc/source/tutorials.rst +++ b/doc/source/tutorials.rst @@ -22,8 +22,8 @@ are examples with real-world data, and all the bugs and weirdness that that entails. Here are links to the v0.1 release. For an up-to-date table of contents, see the `pandas-cookbook GitHub -repository `_. To run the examples in this tutorial, you'll need to -clone the GitHub repository and get IPython Notebook running. +repository `_. To run the examples in this tutorial, you'll need to +clone the GitHub repository and get IPython Notebook running. See `How to use this cookbook `_. - `A quick tour of the IPython Notebook: `_
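The WHERE, IS NULL, and GROUP BY comparisons that comparison_with_sql.rst walks through in this patch can be exercised end-to-end; the sketch below uses a small invented stand-in for the ``tips`` dataset (same column names, made-up values), and uses the string aggregation names ``'mean'``/``'size'`` rather than ``np.mean``/``np.size`` as a minimal assumption-free variant:

```python
import pandas as pd

# Invented stand-in for the tips dataset read in the doc.
tips = pd.DataFrame({
    'total_bill': [16.99, 10.34, 21.01, 23.68, 24.59, 25.29],
    'tip':        [1.01, 1.66, 3.50, 3.31, 3.61, 4.71],
    'sex':        ['Female', 'Male', 'Male', 'Male', 'Female', 'Male'],
    'time':       ['Dinner', 'Dinner', 'Dinner', 'Lunch', 'Dinner', 'Dinner'],
})

# WHERE time = 'Dinner' AND tip > 3.00 -> boolean masks combined with &
dinner_big_tips = tips[(tips['time'] == 'Dinner') & (tips['tip'] > 3.00)]

# SELECT sex, count(*) FROM tips GROUP BY sex -> size(), not count()
by_sex = tips.groupby('sex').size()

# Per-column aggregation via a dict passed to agg
summary = tips.groupby('time').agg({'tip': 'mean', 'total_bill': 'size'})

# WHERE col2 IS NULL -> isnull() mask (None becomes NaN on construction)
frame = pd.DataFrame({'col1': ['A', 'B', None, 'C'],
                      'col2': ['F', None, 'G', 'H']})
col2_null = frame[frame['col2'].isnull()]
```

As in the doc's ``size`` vs ``count`` note, ``by_sex`` counts rows per group, while ``tips.groupby('sex').count()`` would count non-null values per column.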
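Similarly, the JOIN and UNION sections of comparison_with_sql.rst condense into one runnable sketch; ``df1``/``df2`` mirror the frames built in that page, except that the random values are replaced with fixed ones (an assumption made here so the duplicate-removal step has a duplicate to remove):

```python
import pandas as pd

df1 = pd.DataFrame({'key': ['A', 'B', 'C', 'D'], 'value': [1.0, 2.0, 3.0, 4.0]})
df2 = pd.DataFrame({'key': ['B', 'D', 'D', 'E'], 'value': [2.0, 6.0, 7.0, 8.0]})

inner = pd.merge(df1, df2, on='key')               # INNER JOIN (merge's default)
left = pd.merge(df1, df2, on='key', how='left')    # LEFT OUTER JOIN
full = pd.merge(df1, df2, on='key', how='outer')   # FULL JOIN, even where the RDBMS lacks one

# UNION ALL -> concat; UNION -> concat followed by drop_duplicates
union_all = pd.concat([df1, df2])
union = pd.concat([df1, df2]).drop_duplicates()
```

Key ``D`` matches twice, so ``inner`` has 3 rows, ``left`` 5, and ``full`` 6 (keys A through E); ``union`` drops the one row shared by both frames, ``('B', 2.0)``.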