diff --git a/RELEASE.rst b/RELEASE.rst
index 22e59ae73222a..b0efe9b17d06b 100644
--- a/RELEASE.rst
+++ b/RELEASE.rst
@@ -25,7 +25,7 @@ Where to get it
 pandas 0.7.0
 ============
 
-**Release date:** NOT YET RELEASED
+**Release date:** 2/8/2012
 
 **New features / modules**
 
@@ -91,11 +91,13 @@ pandas 0.7.0
   - Add ``sort`` option to ``DataFrame.join`` (GH #731)
   - Improved handling of NAs (propagation) in binary operations with
     dtype=object arrays (GH #737)
+  - Add ``abs`` method to Pandas objects
+  - Added ``algorithms`` module to start collecting central algos
 
 **API Changes**
 
   - Label-indexing with integer indexes now raises KeyError if a label is not
-    found instead of falling back on location-based indexing
+    found instead of falling back on location-based indexing (GH #700)
   - Label-based slicing via ``ix`` or ``[]`` on Series will now only work if
     exact matches for the labels are found or if the index is monotonic (for
     range selections)
@@ -119,6 +121,8 @@ pandas 0.7.0
     #395)
   - The default delimiter for ``read_csv`` is comma rather than letting
     ``csv.Sniffer`` infer it
+  - Rename ``col_or_columns`` argument in ``DataFrame.drop_duplicates`` (GH
+    #734)
 
 **Improvements to existing features**
 
@@ -178,6 +182,11 @@ pandas 0.7.0
   - Substantially improved performance of groupby on DataFrames with many
     columns by aggregating blocks of columns all at once (GH #745)
   - Can pass a file handle or StringIO to Series/DataFrame.to_csv (GH #765)
+  - Can pass sequence of integers to DataFrame.irow(icol) and Series.iget, (GH
+    #654)
+  - Prototypes for some vectorized string functions
+  - Add float64 hash table to solve the Series.unique problem with NAs (GH #714)
+  - Memoize objects when reading from file to reduce memory footprint
 
 **Bug fixes**
 
@@ -261,8 +270,33 @@ pandas 0.7.0
   - Use right dropna function for SparseSeries. Return dense Series for NA fill
     value (GH #730)
   - Fix Index.format bug causing incorrectly string-formatted Series with
-    datetime indexes (# 758)
+    datetime indexes (# 726, 758)
   - Fix errors caused by object dtype arrays passed to ols (GH #759)
+  - Fix error where column names lost when passing list of labels to
+    DataFrame.__getitem__, (GH #662)
+  - Fix error whereby top-level week iterator overwrote week instance
+  - Fix circular reference causing memory leak in sparse array / series /
+    frame, (GH #663)
+  - Fix integer-slicing from integers-as-floats (GH #670)
+  - Fix zero division errors in nanops from object dtype arrays in all NA case
+    (GH #676)
+  - Fix csv encoding when using unicode (GH #705, #717, #738)
+  - Fix assumption that each object contains every unique block type in concat,
+    (GH #708)
+  - Fix sortedness check of multiindex in to_panel (GH #719, 720)
+  - Fix that None was not treated as NA in PyObjectHashtable
+  - Fix hashing dtype because of endianness confusion (GH #747, #748)
+  - Fix SparseSeries.dropna to return dense Series in case of NA fill value (GH
+    #730)
+  - Use map_infer instead of np.vectorize. handle NA sentinels if converter
+    yields numeric array, (GH #753)
+  - Fixes and improvements to DataFrame.rank (GH #742)
+  - Fix catching AttributeError instead of NameError for bottleneck
+  - Try to cast non-MultiIndex to better dtype when calling reset_index (GH #726
+    #440)
+  - Fix #1.QNAN0' float bug on 2.6/win64
+  - Allow subclasses of dicts in DataFrame constructor, with tests
+  - Fix problem whereby set_index destroys column multiindex (GH #764)
 
 Thanks
 ------
diff --git a/doc/source/basics.rst b/doc/source/basics.rst
index 658711b9877a6..38b498ef646fe 100644
--- a/doc/source/basics.rst
+++ b/doc/source/basics.rst
@@ -277,6 +277,7 @@ optional ``level`` parameter which applies only if the object has a
     ``median``, Arithmetic median of values
     ``min``, Minimum
     ``max``, Maximum
+    ``abs``, Absolute Value
     ``prod``, Product of values
     ``std``, Unbiased standard deviation
     ``var``, Unbiased variance
diff --git a/doc/source/indexing.rst b/doc/source/indexing.rst
index a23af20e4e80a..7483e6d6d7240 100644
--- a/doc/source/indexing.rst
+++ b/doc/source/indexing.rst
@@ -140,6 +140,7 @@ label, respectively.
     panel.major_xs(date)
     panel.minor_xs('A')
 
+
 Slicing ranges
 ~~~~~~~~~~~~~~
 
@@ -473,6 +474,12 @@ operators:
    a & b
    a - b
 
+``isin`` method of Index objects
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+One additional operation is the ``isin`` method that works analogously to the
+``Series.isin`` method found :ref:`here `.
+
 .. _indexing.hierarchical:
 
 Hierarchical indexing (MultiIndex)
@@ -643,6 +650,18 @@ but as you use it you may uncover corner cases or unintuitive behavior. If you
 do find something like this, do not hesitate to report the issue or ask on the
 mailing list.
 
+.. _indexing.xs:
+
+Cross-section with hierarchical index
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+The ``xs`` method of ``DataFrame`` additionally takes a level argument to make
+selecting data at a particular level of a MultiIndex easier.
+
+.. ipython:: python
+
+   df.xs('one', level='second')
+
 .. _indexing.advanced_reindex:
 
 Advanced reindexing and alignment with hierarchical index
diff --git a/doc/source/io.rst b/doc/source/io.rst
index 3bb84dfa36c43..f3bbf73509e3b 100644
--- a/doc/source/io.rst
+++ b/doc/source/io.rst
@@ -95,6 +95,7 @@ data into a DataFrame object. They can take a number of arguments:
     columns, where keys are either integers or column labels
   - ``encoding``: a string representing the encoding to use if the contents are
     non-ascii, for python versions prior to 3
+  - ``verbose`` : show number of NA values inserted in non-numeric columns
 
 .. ipython:: python
    :suppress:
diff --git a/doc/source/reshaping.rst b/doc/source/reshaping.rst
index 8c877f05ad77d..4ac2892ded849 100644
--- a/doc/source/reshaping.rst
+++ b/doc/source/reshaping.rst
@@ -284,6 +284,39 @@ calling ``to_string`` if you wish:
 
 Note that ``pivot_table`` is also available as an instance method on DataFrame.
 
+Cross tabulations
+~~~~~~~~~~~~~~~~~
+
+Use the ``crosstab`` function to compute a cross-tabulation of two (or more)
+factors. By default ``crosstab`` computes a frequency table of the factors
+unless an array of values and an aggregation function are passed.
+
+It takes a number of arguments
+
+- ``rows``: array-like, values to group by in the rows
+- ``cols``: array-like, values to group by in the columns
+- ``values``: array-like, optional, array of values to aggregate according to
+  the factors
+- ``aggfunc``: function, optional, If no values array is passed, computes a
+  frequency table
+- ``rownames``: sequence, default None, must match number of row arrays passed
+- ``colnames``: sequence, default None, if passed, must match number of column
+  arrays passed
+- ``margins``: boolean, default False, Add row/column margins (subtotals)
+
+Any Series passed will have their name attributes used unless row or column
+names for the cross-tabulation are specified
+
+For example:
+
+.. ipython:: python
+
+   foo, bar, dull, shiny, one, two = 'foo', 'bar', 'dull', 'shiny', 'one', 'two'
+   a = np.array([foo, foo, bar, bar, foo, foo], dtype=object)
+   b = np.array([one, one, two, one, two, one], dtype=object)
+   c = np.array([dull, dull, shiny, dull, dull, shiny], dtype=object)
+   crosstab(a, [b, c], rownames=['a'], colnames=['b', 'c'])
+
 .. _reshaping.pivot.margins:
 
 Adding margins (partial aggregates)
diff --git a/doc/source/whatsnew/v0.7.0.txt b/doc/source/whatsnew/v0.7.0.txt
index 8f2d6b89bf12f..bc2bfe040f0f9 100644
--- a/doc/source/whatsnew/v0.7.0.txt
+++ b/doc/source/whatsnew/v0.7.0.txt
@@ -97,6 +97,15 @@ New features
 - Added ``encoding`` argument to ``read_csv``, ``read_table``, ``to_csv`` and
   ``from_csv`` for non-ascii text (GH717_)
 
+- :ref:`Added ` ``abs`` method to pandas objects
+
+- :ref:`Added ` ``crosstab`` function for easily computing frequency tables
+
+- :ref:`Added ` ``isin`` method to index objects
+
+- :ref:`Added ` ``level`` argument to ``xs`` method of DataFrame.
+
+
 API Changes to integer indexing
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 
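
Note on the ``crosstab`` documentation added in ``doc/source/reshaping.rst``: the default behaviour described there (a frequency table of the factors when no ``values`` or ``aggfunc`` is passed) can be sketched in plain Python as below. This is only an illustration of what the table counts, using the same ``a``, ``b``, ``c`` data as the docs example. The helper name ``frequency_crosstab`` is hypothetical and this is not how pandas implements ``crosstab``.

```python
from collections import Counter


def frequency_crosstab(rows, cols):
    """Count co-occurrences of row labels and column keys.

    Hypothetical helper for illustration only; it is not the pandas
    implementation of ``crosstab``.
    """
    if len(rows) != len(cols):
        raise ValueError("rows and cols must have the same length")
    counts = Counter(zip(rows, cols))
    row_labels = sorted(set(rows))
    col_labels = sorted(set(cols))
    # Dense table: combinations that never occur get a count of 0.
    return {r: {c: counts[(r, c)] for c in col_labels} for r in row_labels}


a = ['foo', 'foo', 'bar', 'bar', 'foo', 'foo']
b = ['one', 'one', 'two', 'one', 'two', 'one']
c = ['dull', 'dull', 'shiny', 'dull', 'dull', 'shiny']

# Two column factors are combined into one tuple key per observation,
# mirroring the two-level column labels in the docs example.
table = frequency_crosstab(a, list(zip(b, c)))
for row_label, row in table.items():
    print(row_label, row)
```

Each cell is the number of observations sharing that row label and column key, so the cells sum to the number of observations.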