@@ -226,11 +226,11 @@ We can also do elementwise :func:`divmod`:
226
226
Missing data / operations with fill values
227
227
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
228
228
229
- In Series and DataFrame, the arithmetic functions have the option of inputting
230
- a *fill_value*, namely a value to substitute when at most one of the values at
231
- a location are missing. For example, when adding two DataFrame objects, you may
232
- wish to treat NaN as 0 unless both DataFrames are missing that value, in which
233
- case the result will be NaN (you can later replace NaN with some other value
229
+ In Series and DataFrame, the arithmetic functions have the option of inputting
230
+ a *fill_value*, namely a value to substitute when at most one of the values at
231
+ a location are missing. For example, when adding two DataFrame objects, you may
232
+ wish to treat NaN as 0 unless both DataFrames are missing that value, in which
233
+ case the result will be NaN (you can later replace NaN with some other value
234
234
using ``fillna`` if you wish).
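
As a minimal sketch of the behaviour described above (the frames ``df_a`` and ``df_b`` are hypothetical and not part of the original text), compare plain addition with :meth:`DataFrame.add` and ``fill_value=0``:

```python
import numpy as np
import pandas as pd

# Hypothetical frames: row 1 is missing in df_a only, row 2 is missing in both.
df_a = pd.DataFrame({"x": [1.0, np.nan, np.nan]})
df_b = pd.DataFrame({"x": [10.0, 20.0, np.nan]})

# Plain addition propagates NaN whenever either operand is missing.
plain = df_a + df_b

# With fill_value=0, a value missing on one side only is treated as 0;
# a location missing on both sides still yields NaN.
filled = df_a.add(df_b, fill_value=0)

print(plain)
print(filled)
```

Only the location missing from both frames stays NaN in ``filled``.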
235
235
236
236
.. ipython:: python
@@ -260,8 +260,8 @@ arithmetic operations described above:
260
260
df.gt(df2)
261
261
df2.ne(df)
262
262
263
- These operations produce a pandas object of the same type as the left-hand-side
264
- input that is of dtype ``bool``. These ``boolean`` objects can be used in
263
+ These operations produce a pandas object of the same type as the left-hand-side
264
+ input that is of dtype ``bool``. These ``boolean`` objects can be used in
265
265
indexing operations, see the section on :ref:`Boolean indexing<indexing.boolean>`.
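
A small sketch of how such a comparison result might be used for indexing, assuming two hypothetical frames ``left`` and ``right`` with matching labels:

```python
import pandas as pd

# Hypothetical frames with identical row and column labels.
left = pd.DataFrame({"a": [1, 5, 3], "b": [4, 2, 6]})
right = pd.DataFrame({"a": [2, 2, 2], "b": [3, 3, 3]})

# Elementwise "greater than"; the result is a DataFrame of bool columns.
mask = left.gt(right)

# Use one boolean column of the mask to select rows of the left frame.
selected = left[mask["a"]]

print(mask.dtypes)
print(selected)
```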
266
266
267
267
.. _basics.reductions:
@@ -452,7 +452,7 @@ So, for instance, to reproduce :meth:`~DataFrame.combine_first` as above:
452
452
Descriptive statistics
453
453
----------------------
454
454
455
- There exists a large number of methods for computing descriptive statistics and
455
+ There exists a large number of methods for computing descriptive statistics and
456
456
other related operations on :ref:`Series <api.series.stats>`, :ref:`DataFrame
457
457
<api.dataframe.stats>`, and :ref:`Panel <api.panel.stats>`. Most of these
458
458
are aggregations (hence producing a lower-dimensional result) like
@@ -540,7 +540,7 @@ will exclude NAs on Series input by default:
540
540
np.mean(df['one'])
541
541
np.mean(df['one'].values)
542
542
543
- :meth:`Series.nunique` will return the number of unique non-NA values in a
543
+ :meth:`Series.nunique` will return the number of unique non-NA values in a
544
544
Series:
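
For illustration only, a sketch with a hypothetical Series ``s`` that contains repeated values and NaN:

```python
import numpy as np
import pandas as pd

# Hypothetical data: two distinct non-NA values plus two missing entries.
s = pd.Series([1.0, 2.0, 2.0, np.nan, np.nan])

# By default missing values are not counted.
n_unique = s.nunique()

# Passing dropna=False counts NaN as one additional distinct value.
n_unique_with_na = s.nunique(dropna=False)

print(n_unique, n_unique_with_na)
```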
545
545
546
546
.. ipython:: python
@@ -852,7 +852,7 @@ Aggregation API
852
852
The aggregation API allows one to express possibly multiple aggregation operations in a single concise way.
853
853
This API is similar across pandas objects, see :ref:`groupby API <groupby.aggregate>`, the
854
854
:ref:`window functions API <stats.aggregate>`, and the :ref:`resample API <timeseries.aggregate>`.
855
- The entry point for aggregation is :meth:`DataFrame.aggregate`, or the alias
855
+ The entry point for aggregation is :meth:`DataFrame.aggregate`, or the alias
856
856
:meth:`DataFrame.agg`.
857
857
858
858
We will use a similar starting frame from above:
@@ -864,8 +864,8 @@ We will use a similar starting frame from above:
864
864
tsdf.iloc[3:7] = np.nan
865
865
tsdf
866
866
867
- Using a single function is equivalent to :meth:`~DataFrame.apply`. You can also
868
- pass named methods as strings. These will return a ``Series`` of the aggregated
867
+ Using a single function is equivalent to :meth:`~DataFrame.apply`. You can also
868
+ pass named methods as strings. These will return a ``Series`` of the aggregated
869
869
output:
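
A rough sketch of that equivalence, using a small hypothetical frame named ``frame`` rather than the ``tsdf`` frame built above:

```python
import numpy as np
import pandas as pd

# Hypothetical frame with one missing value.
frame = pd.DataFrame({"A": [1.0, 2.0, np.nan], "B": [4.0, 5.0, 6.0]})

# A single callable passed to agg behaves like apply over the columns.
by_agg = frame.agg(lambda col: col.sum())
by_apply = frame.apply(lambda col: col.sum())

# A method name given as a string is looked up on each column.
by_name = frame.agg("sum")

print(by_name)
```

All three results are a ``Series`` indexed by the column labels.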
870
870
871
871
.. ipython:: python
@@ -887,7 +887,7 @@ Single aggregations on a ``Series`` this will return a scalar value:
887
887
Aggregating with multiple functions
888
888
+++++++++++++++++++++++++++++++++++
889
889
890
- You can pass multiple aggregation arguments as a list.
890
+ You can pass multiple aggregation arguments as a list.
891
891
The results of each of the passed functions will be a row in the resulting ``DataFrame``.
892
892
These are naturally named from the aggregation function.
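
A short sketch with a hypothetical frame ``frame``, showing that each function name in the list becomes a row label of the result:

```python
import pandas as pd

# Hypothetical frame used only for illustration.
frame = pd.DataFrame({"A": [1.0, 2.0, 3.0], "B": [4.0, 5.0, 6.0]})

# Each entry of the list produces one row, named after the function.
summary = frame.agg(["sum", "mean"])

print(summary)
```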
893
893
@@ -1430,7 +1430,7 @@ Series can also be used:
1430
1430
df.rename(columns={'one': 'foo', 'two': 'bar'},
1431
1431
index={'a': 'apple', 'b': 'banana', 'd': 'durian'})
1432
1432
1433
- If the mapping doesn't include a column/index label, it isn't renamed. Note that
1433
+ If the mapping doesn't include a column/index label, it isn't renamed. Note that
1434
1434
extra labels in the mapping don't throw an error.
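
To make both points concrete, here is a sketch with a hypothetical two-column frame; the key ``'missing'`` is deliberately absent from the frame:

```python
import pandas as pd

# Hypothetical frame for illustration.
df_small = pd.DataFrame({"one": [1, 2], "two": [3, 4]}, index=["a", "b"])

# 'two' and 'b' are not in the mappings, so they keep their names.
# 'missing' is not a column of the frame and is silently ignored.
renamed = df_small.rename(columns={"one": "foo", "missing": "ignored"},
                          index={"a": "apple"})

print(renamed)
```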
1435
1435
1436
1436
.. versionadded:: 0.21.0
@@ -1740,19 +1740,26 @@ description.
1740
1740
Sorting
1741
1741
-------
1742
1742
1743
- There are two obvious kinds of sorting that you may be interested in: sorting
1744
- by label and sorting by actual values.
1743
+ Pandas supports three kinds of sorting: sorting by index labels,
1744
+ sorting by column values, and sorting by a combination of both.
1745
+
1746
+ .. _basics.sort_index:
1745
1747
1746
1748
By Index
1747
1749
~~~~~~~~
1748
1750
1749
- The primary method for sorting axis
1750
- labels (indexes) are the ``Series.sort_index()`` and the ``DataFrame.sort_index()`` methods.
1751
+ The :meth:`Series.sort_index` and :meth:`DataFrame.sort_index` methods are
1752
+ used to sort a pandas object by its index levels.
1751
1753
1752
1754
.. ipython:: python
1753
1755
1756
+ df = pd.DataFrame({'one': pd.Series(np.random.randn(3), index=['a', 'b', 'c']),
1757
+ 'two': pd.Series(np.random.randn(4), index=['a', 'b', 'c', 'd']),
1758
+ 'three': pd.Series(np.random.randn(3), index=['b', 'c', 'd'])})
1759
+
1754
1760
unsorted_df = df.reindex(index=['a', 'd', 'c', 'b'],
1755
1761
columns=['three', 'two', 'one'])
1762
+ unsorted_df
1756
1763
1757
1764
# DataFrame
1758
1765
unsorted_df.sort_index()
@@ -1762,20 +1769,22 @@ labels (indexes) are the ``Series.sort_index()`` and the ``DataFrame.sort_index(
1762
1769
# Series
1763
1770
unsorted_df['three'].sort_index()
1764
1771
1772
+ .. _basics.sort_values:
1773
+
1765
1774
By Values
1766
1775
~~~~~~~~~
1767
1776
1768
- The :meth:`Series.sort_values` and :meth:`DataFrame.sort_values` are the entry points for **value** sorting (i.e. the values in a column or row).
1769
- :meth:`DataFrame.sort_values` can accept an optional ``by`` argument for ``axis=0``
1770
- which will use an arbitrary vector or a column name of the DataFrame to
1771
- determine the sort order:
1777
+ The :meth:`Series.sort_values` method is used to sort a `Series` by its values. The
1778
+ :meth:`DataFrame.sort_values` method is used to sort a `DataFrame` by its column or row values.
1779
+ The optional ``by`` parameter to :meth:`DataFrame.sort_values` may be used to specify one or more columns
1780
+ to use to determine the sorted order.
1772
1781
1773
1782
.. ipython:: python
1774
1783
1775
1784
df1 = pd.DataFrame({'one': [2, 1, 1, 1], 'two': [1, 3, 2, 4], 'three': [5, 4, 3, 2]})
1776
1785
df1.sort_values(by='two')
1777
1786
1778
- The ``by`` argument can take a list of column names, e.g.:
1787
+ The ``by`` parameter can take a list of column names, e.g.:
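
As a sketch of how a list is interpreted (the frame ``scores`` is a hypothetical copy of the data shown earlier), rows are ordered by the first column and ties are broken by the second:

```python
import pandas as pd

# Hypothetical frame with ties in column 'one'.
scores = pd.DataFrame({"one": [2, 1, 1, 1],
                       "two": [1, 3, 2, 4],
                       "three": [5, 4, 3, 2]})

# Sort by 'one' first; rows with equal 'one' are then ordered by 'two'.
ordered = scores.sort_values(by=["one", "two"])

print(ordered)
```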
1779
1788
1780
1789
.. ipython:: python
1781
1790
@@ -1790,6 +1799,39 @@ argument:
1790
1799
s.sort_values()
1791
1800
s.sort_values(na_position='first')
1792
1801
1802
+ .. _basics.sort_indexes_and_values:
1803
+
1804
+ By Indexes and Values
1805
+ ~~~~~~~~~~~~~~~~~~~~~
1806
+
1807
+ .. versionadded:: 0.23.0
1808
+
1809
+ Strings passed as the ``by`` parameter to :meth:`DataFrame.sort_values` may
1810
+ refer to either columns or index level names.
1811
+
1812
+ .. ipython:: python
1813
+
1814
+ # Build MultiIndex
1815
+ idx = pd.MultiIndex.from_tuples([('a', 1), ('a', 2), ('a', 2),
1816
+ ('b', 2), ('b', 1), ('b', 1)])
1817
+ idx.names = ['first', 'second']
1818
+
1819
+ # Build DataFrame
1820
+ df_multi = pd.DataFrame({'A': np.arange(6, 0, -1)},
1821
+ index=idx)
1822
+ df_multi
1823
+
1824
+ Sort by 'second' (index) and 'A' (column)
1825
+
1826
+ .. ipython:: python
1827
+
1828
+ df_multi.sort_values(by=['second', 'A'])
1829
+
1830
+ .. note::
1831
+
1832
+ If a string matches both a column name and an index level name then a
1833
+ warning is issued and the column takes precedence. This will result in an
1834
+ ambiguity error in a future version.
1793
1835
1794
1836
.. _basics.searchsorted:
1795
1837
@@ -1881,7 +1923,7 @@ The main types stored in pandas objects are ``float``, ``int``, ``bool``,
1881
1923
``int64`` and ``int32``. See :ref:`Series with TZ <timeseries.timezone_series>`
1882
1924
for more detail on ``datetime64[ns, tz]`` dtypes.
1883
1925
1884
- A convenient :attr:`~DataFrame.dtypes` attribute for DataFrame returns a Series
1926
+ A convenient :attr:`~DataFrame.dtypes` attribute for DataFrame returns a Series
1885
1927
with the data type of each column.
1886
1928
1887
1929
.. ipython:: python
@@ -1902,8 +1944,8 @@ On a ``Series`` object, use the :attr:`~Series.dtype` attribute.
1902
1944
1903
1945
dft['A'].dtype
1904
1946
1905
- If a pandas object contains data with multiple dtypes *in a single column*, the
1906
- dtype of the column will be chosen to accommodate all of the data types
1947
+ If a pandas object contains data with multiple dtypes *in a single column*, the
1948
+ dtype of the column will be chosen to accommodate all of the data types
1907
1949
(``object`` is the most general).
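
A brief sketch of this upcasting with two hypothetical Series; the variable names are illustrative only:

```python
import pandas as pd

# Integers mixed with a float are stored as float64.
upcast = pd.Series([1, 2, 3.5])

# Numbers mixed with a string fall back to the general object dtype.
mixed = pd.Series([1, 2.5, "text"])

print(upcast.dtype)
print(mixed.dtype)
```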
1908
1950
1909
1951
.. ipython:: python
@@ -1941,7 +1983,7 @@ defaults
1941
1983
~~~~~~~~
1942
1984
1943
1985
By default integer types are ``int64`` and float types are ``float64``,
1944
- *regardless* of platform (32-bit or 64-bit).
1986
+ *regardless* of platform (32-bit or 64-bit).
1945
1987
The following will all result in ``int64`` dtypes.
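
One way to check these defaults, sketched with hypothetical objects built from plain Python numbers:

```python
import pandas as pd

# Plain Python ints and floats, with no dtype requested explicitly.
ints = pd.Series([1, 2, 3])
floats = pd.Series([1.0, 2.0])
frame_ints = pd.DataFrame({"a": [1, 2], "b": [3, 4]})

print(ints.dtype)
print(floats.dtype)
print(frame_ints.dtypes)
```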
1946
1988
1947
1989
.. ipython:: python