From ae4ba929341a9bfbfa8be631c55d2c4439590df2 Mon Sep 17 00:00:00 2001
From: Adam Klein
Date: Wed, 8 Feb 2012 18:43:06 -0500
Subject: [PATCH 1/2] DOC: added docs for 7 release

---
 RELEASE.rst                    |   44 ++++++++++++++++++++++++++++++++--
 doc/source/basics.rst          |    1 +
 doc/source/indexing.rst        |   19 +++++++++++++++
 doc/source/io.rst              |    1 +
 doc/source/reshaping.rst       |   33 +++++++++++++++++++++++++
 doc/source/whatsnew/v0.7.0.txt |    9 +++++++
 6 files changed, 105 insertions(+), 2 deletions(-)

diff --git a/RELEASE.rst b/RELEASE.rst
index 22e59ae73222a..58798c9ae4869 100644
--- a/RELEASE.rst
+++ b/RELEASE.rst
@@ -25,7 +25,7 @@ Where to get it
 pandas 0.7.0
 ============
 
-**Release date:** NOT YET RELEASED
+**Release date:** 2/8/2012
 
 **New features / modules**
 
@@ -91,11 +91,13 @@ pandas 0.7.0
   - Add ``sort`` option to ``DataFrame.join`` (GH #731)
   - Improved handling of NAs (propagation) in binary operations with
     dtype=object arrays (GH #737)
+  - Add ``abs`` method to Pandas objects
+  - Added ``algorithms`` module to start collecting central algos
 
 **API Changes**
 
   - Label-indexing with integer indexes now raises KeyError if a label is not
-    found instead of falling back on location-based indexing
+    found instead of falling back on location-based indexing (GH #700)
   - Label-based slicing via ``ix`` or ``[]`` on Series will now only work if
     exact matches for the labels are found or if the index is monotonic (for
     range selections)
@@ -119,6 +121,8 @@ pandas 0.7.0
     #395)
   - The default delimiter for ``read_csv`` is comma rather than letting
     ``csv.Sniffer`` infer it
+  - Rename ``col_or_columns`` argument in ``DataFrame.drop_duplicates`` (GH
+    #734)
 
 **Improvements to existing features**
 
@@ -178,6 +182,14 @@ pandas 0.7.0
   - Substantially improved performance of groupby on DataFrames with many
     columns by aggregating blocks of columns all at once (GH #745)
   - Can pass a file handle or StringIO to Series/DataFrame.to_csv (GH #765)
+  - Can pass sequence of integers to DataFrame.irow(icol) and Series.iget, (GH
+    #654)
+  - Prototypes for some vectorized string functions
+  - Add float64 hash table to solve the Series.unique problem with NAs (GH #714)
+  - Memoize objects when reading from file to reduce memory footprint
+  - Improved handling of NAs in binary ops with object Series (GH #737)
+  - Added optimized Cython groupby routines for aggregating 2D blocks, added
+    vbenchmark, (GH #745)
 
 **Bug fixes**
 
@@ -263,6 +275,34 @@ pandas 0.7.0
   - Fix Index.format bug causing incorrectly string-formatted Series with
     datetime indexes (# 758)
   - Fix errors caused by object dtype arrays passed to ols (GH #759)
+  - Fix error where column names lost when passing list of labels to
+    DataFrame.__getitem__, (GH #662)
+  - Fix error whereby top-level week iterator overwrote week instance
+  - Fix circular reference causing memory leak in sparse array / series /
+    frame, (GH #663)
+  - Fix integer-slicing from integers-as-floats (GH #670)
+  - Fix zero division errors in nanops from object dtype arrays in all NA case
+    (GH #676)
+  - Fix csv encoding when using unicode (GH #705, #717, #738)
+  - Fix assumption that each object contains every unique block type in concat,
+    (GH #708)
+  - Fix handling of NAs in Series.unique (GH #714)
+  - Fix sortedness check of multiindex in to_panel (GH #719, 720)
+  - Fix that None was not treated as NA in PyObjectHashtable
+  - Fix hashing dtype because of endianness confusion (GH #747, #748)
+  - Fix SparseSeries.dropna to return dense Series in case of NA fill value (GH
+    #730)
+  - Use map_infer instead of np.vectorize. handle NA sentinels if converter
+    yields numeric array, (GH #753)
+  - Fixes and improvements to DataFrame.rank (GH #742)
+  - Fix catching AttributeError instead of NameError for bottleneck
+  - Fix to float formatting issues (GH #726, 758)
+  - Try to cast non-MultiIndex to better dtype when calling reset_index (GH #726
+    #440)
+  - Fix object dtype arrays in OLS class (GH #759)
+  - Fix #1.QNAN0' float bug on 2.6/win64
+  - Allow subclasses of dicts in DataFrame constructor, with tests
+  - Fix problem whereby set_index destroys column multiindex (GH #764)
 
 Thanks
 ------
diff --git a/doc/source/basics.rst b/doc/source/basics.rst
index 658711b9877a6..38b498ef646fe 100644
--- a/doc/source/basics.rst
+++ b/doc/source/basics.rst
@@ -277,6 +277,7 @@ optional ``level`` parameter which applies only if the object has a
     ``median``, Arithmetic median of values
     ``min``, Minimum
     ``max``, Maximum
+    ``abs``, Absolute Value
     ``prod``, Product of values
     ``std``, Unbiased standard deviation
     ``var``, Unbiased variance
diff --git a/doc/source/indexing.rst b/doc/source/indexing.rst
index a23af20e4e80a..7483e6d6d7240 100644
--- a/doc/source/indexing.rst
+++ b/doc/source/indexing.rst
@@ -140,6 +140,7 @@ label, respectively.
    panel.major_xs(date)
    panel.minor_xs('A')
 
+
 Slicing ranges
 ~~~~~~~~~~~~~~
 
@@ -473,6 +474,12 @@ operators:
    a & b
    a - b
 
+``isin`` method of Index objects
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+One additional operation is the ``isin`` method that works analogously to the
+``Series.isin`` method found :ref:`here `.
+
 .. _indexing.hierarchical:
 
 Hierarchical indexing (MultiIndex)
@@ -643,6 +650,18 @@ but as you use it you may uncover corner cases or unintuitive behavior.
 If you do find something like this, do not hesitate to report the issue or ask
 on the mailing list.
 
+.. _indexing.xs:
+
+Cross-section with hierarchical index
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+The ``xs`` method of ``DataFrame`` additionally takes a level argument to make
+selecting data at a particular level of a MultiIndex easier.
+
+.. ipython:: python
+
+   df.xs('one', level='second')
+
 .. _indexing.advanced_reindex:
 
 Advanced reindexing and alignment with hierarchical index
diff --git a/doc/source/io.rst b/doc/source/io.rst
index 3bb84dfa36c43..f3bbf73509e3b 100644
--- a/doc/source/io.rst
+++ b/doc/source/io.rst
@@ -95,6 +95,7 @@ data into a DataFrame object. They can take a number of arguments:
     columns, where keys are either integers or column labels
   - ``encoding``: a string representing the encoding to use if the contents are
     non-ascii, for python versions prior to 3
+  - ``verbose`` : show number of NA values inserted in non-numeric columns
 
 .. ipython:: python
    :suppress:
diff --git a/doc/source/reshaping.rst b/doc/source/reshaping.rst
index 8c877f05ad77d..4ac2892ded849 100644
--- a/doc/source/reshaping.rst
+++ b/doc/source/reshaping.rst
@@ -284,6 +284,39 @@ calling ``to_string`` if you wish:
 
 Note that ``pivot_table`` is also available as an instance method on DataFrame.
 
+Cross tabulations
+~~~~~~~~~~~~~~~~~
+
+Use the ``crosstab`` function to compute a cross-tabulation of two (or more)
+factors. By default ``crosstab`` computes a frequency table of the factors
+unless an array of values and an aggregation function are passed.
+
+It takes a number of arguments
+
+- ``rows``: array-like, values to group by in the rows
+- ``cols``: array-like, values to group by in the columns
+- ``values``: array-like, optional, array of values to aggregate according to
+  the factors
+- ``aggfunc``: function, optional, If no values array is passed, computes a
+  frequency table
+- ``rownames``: sequence, default None, must match number of row arrays passed
+- ``colnames``: sequence, default None, if passed, must match number of column
+  arrays passed
+- ``margins``: boolean, default False, Add row/column margins (subtotals)
+
+Any Series passed will have their name attributes used unless row or column
+names for the cross-tabulation are specified
+
+For example:
+
+.. ipython:: python
+
+   foo, bar, dull, shiny, one, two = 'foo', 'bar', 'dull', 'shiny', 'one', 'two'
+   a = np.array([foo, foo, bar, bar, foo, foo], dtype=object)
+   b = np.array([one, one, two, one, two, one], dtype=object)
+   c = np.array([dull, dull, shiny, dull, dull, shiny], dtype=object)
+   crosstab(a, [b, c], rownames=['a'], colnames=['b', 'c'])
+
 .. _reshaping.pivot.margins:
 
 Adding margins (partial aggregates)
diff --git a/doc/source/whatsnew/v0.7.0.txt b/doc/source/whatsnew/v0.7.0.txt
index 8f2d6b89bf12f..bc2bfe040f0f9 100644
--- a/doc/source/whatsnew/v0.7.0.txt
+++ b/doc/source/whatsnew/v0.7.0.txt
@@ -97,6 +97,15 @@ New features
 - Added ``encoding`` argument to ``read_csv``, ``read_table``, ``to_csv`` and
   ``from_csv`` for non-ascii text (GH717_)
 
+- :ref:`Added ` ``abs`` method to pandas objects
+
+- :ref:`Added ` ``crosstab`` function for easily computing frequency tables
+
+- :ref:`Added ` ``isin`` method to index objects
+
+- :ref:`Added ` ``level`` argument to ``xs`` method of DataFrame.
+
+
 API Changes to integer indexing
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 

From db2ffc5b1d8045ceac88920cf04ef0807c91a1df Mon Sep 17 00:00:00 2001
From: Adam Klein
Date: Wed, 8 Feb 2012 18:57:53 -0500
Subject: [PATCH 2/2] DOC: fix duplicates in release

---
 RELEASE.rst |    8 +-------
 1 file changed, 1 insertion(+), 7 deletions(-)

diff --git a/RELEASE.rst b/RELEASE.rst
index 58798c9ae4869..b0efe9b17d06b 100644
--- a/RELEASE.rst
+++ b/RELEASE.rst
@@ -187,9 +187,6 @@ pandas 0.7.0
   - Prototypes for some vectorized string functions
   - Add float64 hash table to solve the Series.unique problem with NAs (GH #714)
   - Memoize objects when reading from file to reduce memory footprint
-  - Improved handling of NAs in binary ops with object Series (GH #737)
-  - Added optimized Cython groupby routines for aggregating 2D blocks, added
-    vbenchmark, (GH #745)
 
 **Bug fixes**
 
@@ -273,7 +270,7 @@ pandas 0.7.0
   - Use right dropna function for SparseSeries. Return dense Series for NA fill
     value (GH #730)
   - Fix Index.format bug causing incorrectly string-formatted Series with
-    datetime indexes (# 758)
+    datetime indexes (# 726, 758)
   - Fix errors caused by object dtype arrays passed to ols (GH #759)
   - Fix error where column names lost when passing list of labels to
     DataFrame.__getitem__, (GH #662)
@@ -286,7 +283,6 @@ pandas 0.7.0
   - Fix csv encoding when using unicode (GH #705, #717, #738)
   - Fix assumption that each object contains every unique block type in concat,
     (GH #708)
-  - Fix handling of NAs in Series.unique (GH #714)
   - Fix sortedness check of multiindex in to_panel (GH #719, 720)
   - Fix that None was not treated as NA in PyObjectHashtable
   - Fix hashing dtype because of endianness confusion (GH #747, #748)
@@ -296,10 +292,8 @@ pandas 0.7.0
     yields numeric array, (GH #753)
   - Fixes and improvements to DataFrame.rank (GH #742)
   - Fix catching AttributeError instead of NameError for bottleneck
-  - Fix to float formatting issues (GH #726, 758)
   - Try to cast non-MultiIndex to better dtype when calling reset_index (GH #726
     #440)
-  - Fix object dtype arrays in OLS class (GH #759)
   - Fix #1.QNAN0' float bug on 2.6/win64
   - Allow subclasses of dicts in DataFrame constructor, with tests
   - Fix problem whereby set_index destroys column multiindex (GH #764)
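Review note on the reshaping.rst hunk: when no ``values``/``aggfunc`` are passed, the new docs say ``crosstab`` produces a frequency table. That is a count of each (row factor, column factors) combination. The following is a minimal plain-Python sketch of that counting, independent of pandas and assuming nothing about its internals. ``crosstab_counts``, its nested-dict output and the ``"All"`` margin label are illustrative assumptions, not the pandas API. The input data is the same ``a``, ``[b, c]`` example used in the docs.

```python
from collections import Counter


def crosstab_counts(rows, cols, margins=False):
    """Hypothetical helper (not pandas): count co-occurrences of a row
    factor with one or more column factors.

    Returns nested dicts of the form {row_label: {col_key: count}}.
    """
    # Pair up the column factors so each observation gets a tuple key,
    # e.g. ('one', 'dull') when cols=[b, c].
    col_keys = list(zip(*cols))
    counts = Counter(zip(rows, col_keys))
    row_levels = sorted(set(rows))
    col_levels = sorted(set(col_keys))
    # Counter returns 0 for combinations that never occur.
    table = {r: {c: counts[(r, c)] for c in col_levels} for r in row_levels}
    if margins:
        # Row subtotals.
        for r in row_levels:
            table[r]["All"] = sum(table[r][c] for c in col_levels)
        # Column subtotals, plus the grand total in the corner.
        table["All"] = {c: sum(table[r][c] for r in row_levels) for c in col_levels}
        table["All"]["All"] = len(rows)
    return table


a = ["foo", "foo", "bar", "bar", "foo", "foo"]
b = ["one", "one", "two", "one", "two", "one"]
c = ["dull", "dull", "shiny", "dull", "dull", "shiny"]
table = crosstab_counts(a, [b, c], margins=True)
for row, counts in table.items():
    print(row, counts)
```

With this data, ``foo`` occurs twice with ``('one', 'dull')`` and never with ``('two', 'shiny')``. The ``All`` row and column hold the subtotals that ``margins`` is documented to add.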
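Review note on the indexing.rst ``xs`` hunk: selecting by ``level`` keeps the rows whose index tuple has the given key at that level and drops that level from the result. The sketch below models a two-level MultiIndex as tuple keys of a plain dict. ``xs_level`` and its ``names`` argument are hypothetical, not pandas functions. The level names ``first``/``second`` only mirror the ``df.xs('one', level='second')`` docs example, and the values are made up for illustration.

```python
def xs_level(rows, key, level, names):
    """Hypothetical helper (not pandas): keep entries whose index tuple has
    `key` at the level named `level`, dropping that level from the keys.

    `rows` maps index tuples to values, standing in for a MultiIndex;
    `names` lists the level names in order.
    """
    pos = names.index(level)
    out = {}
    for idx, value in rows.items():
        if idx[pos] == key:
            remaining = idx[:pos] + idx[pos + 1:]
            # With two levels only one label is left, so unwrap the 1-tuple.
            new_key = remaining[0] if len(remaining) == 1 else remaining
            out[new_key] = value
    return out


rows = {
    ("bar", "one"): 1.0,
    ("bar", "two"): 2.0,
    ("foo", "one"): 3.0,
    ("foo", "two"): 4.0,
}
print(xs_level(rows, "one", level="second", names=["first", "second"]))
# prints {'bar': 1.0, 'foo': 3.0}
```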