Commit 57e10a2

Merge tag 'v0.8.1' into debian
Version 0.8.1

* tag 'v0.8.1': (126 commits)
  RLS: Version 0.8.1
  DOC: tweak
  DOC: set_index/reset_index examples
  DOC: doc fixes and what's new in 0.8.1, vectorized string methods
  ENH: better string element access/slicing notation close pandas-dev#1656
  DOC: minor additions to release notes for 0.8.1
  BUG: handle Yahoo! finance returning duplicate dates for prev bus day, doc fixes
  BUG: fix windows/32-bit builds
  BUG: get pandas-dev#1620 fix working on python 3
  ENH: handling of UTF-8 strings in DataFrame columns, close pandas-dev#1620
  TST: span unit test pandas-dev#1635
  TST: skip another @network test if no internet connection
  ENH/BUG: handle tz-aware datetime.datetime in to_datetime, add utc=True option to allow conversion to utc, close pandas-dev#1581
  ENH: hack to not compress single group keys, accelerate single-key and Categorical groupby operations
  BUG: fix merge bug with left joins on length-0 DataFrame, close pandas-dev#1628
  BUG: Series.interpolate bug with method='values' and datetime64[ns], close pandas-dev#1646
  BUG: properly handle None values in dict input to concat, close pandas-dev#1649
  BUG: len-0 Series min/max/describe pandas-dev#1650
  Fix describe() failure for None and empty Series.
  BUG: string date aliases now work with tz-aware time series close pandas-dev#1647
  ...
2 parents 1c24700 + 4e95b31 commit 57e10a2

82 files changed

+4725
-722
lines changed

RELEASE.rst

+128-73
Original file line numberDiff line numberDiff line change
@@ -22,6 +22,94 @@ Where to get it
2222
* Binary installers on PyPI: http://pypi.python.org/pypi/pandas
2323
* Documentation: http://pandas.pydata.org
2424

25+
pandas 0.8.1
26+
============
27+
28+
**Release date:** July 22, 2012
29+
30+
**New features**
31+
32+
- Add vectorized, NA-friendly string methods to Series (#1621, #620)
33+
- Can pass dict of per-column line styles to DataFrame.plot (#1559)
34+
- Selective plotting to secondary y-axis on same subplot (PR #1640)
35+
- Add new ``bootstrap_plot`` plot function
36+
- Add new ``parallel_coordinates`` plot function (#1488)
37+
- Add ``radviz`` plot function (#1566)
38+
- Add ``multi_sparse`` option to ``set_printoptions`` to modify display of
39+
hierarchical indexes (#1538)
40+
- Add ``dropna`` method to Panel (#171)
41+
42+
**Improvements to existing features**
43+
44+
- Use moving min/max algorithms from Bottleneck in rolling_min/rolling_max
45+
for > 100x speedup. (#1504, #50)
46+
- Add Cython group median method for >15x speedup (#1358)
47+
- Drastically improve ``to_datetime`` performance on ISO8601 datetime strings
48+
(with no time zones) (#1571)
49+
- Improve single-key groupby performance on large data sets, accelerate use of
50+
groupby with a Categorical variable
51+
- Add ability to append hierarchical index levels with ``set_index`` and to
52+
drop single levels with ``reset_index`` (#1569, #1577)
53+
- Always apply passed functions in ``resample``, even if upsampling (#1596)
54+
- Avoid unnecessary copies in DataFrame constructor with explicit dtype (#1572)
55+
- Cleaner DatetimeIndex string representation with 1 or 2 elements (#1611)
56+
- Improve performance of array-of-Period to PeriodIndex, convert such arrays
57+
to PeriodIndex inside Index (#1215)
58+
- More informative string representation for weekly Period objects (#1503)
59+
- Accelerate 3-axis multi data selection from homogeneous Panel (#979)
60+
- Add ``adjust`` option to ewma to disable adjustment factor (#1584)
61+
- Add new matplotlib converters for high frequency time series plotting (#1599)
62+
- Handling of tz-aware datetime.datetime objects in to_datetime; raise
63+
Exception unless utc=True given (#1581)
64+
65+
**Bug fixes**
66+
67+
- Fix NA handling in DataFrame.to_panel (#1582)
68+
- Handle TypeError issues inside PyObject_RichCompareBool calls in khash
69+
(#1318)
70+
- Fix resampling bug to lower case daily frequency (#1588)
71+
- Fix kendall/spearman DataFrame.corr bug with no overlap (#1595)
72+
- Fix bug in DataFrame.set_index (#1592)
73+
- Don't ignore axes in boxplot if by specified (#1565)
74+
- Fix Panel .ix indexing with integers bug (#1603)
75+
- Fix Partial indexing bugs (years, months, ...) with PeriodIndex (#1601)
76+
- Fix MultiIndex console formatting issue (#1606)
77+
- Unordered index with duplicates doesn't yield scalar location for single
78+
entry (#1586)
79+
- Fix resampling of tz-aware time series with "anchored" freq (#1591)
80+
- Fix DataFrame.rank error on integer data (#1589)
81+
- Selection of multiple SparseDataFrame columns by list in __getitem__ (#1585)
82+
- Override Index.tolist for compatibility with MultiIndex (#1576)
83+
- Fix hierarchical summing bug with MultiIndex of length 1 (#1568)
84+
- Work around numpy.concatenate use/bug in Series.set_value (#1561)
85+
- Ensure Series/DataFrame are sorted before resampling (#1580)
86+
- Fix unhandled IndexError when indexing very large time series (#1562)
87+
- Fix DatetimeIndex intersection logic error with irregular indexes (#1551)
88+
- Fix unit test errors on Python 3 (#1550)
89+
- Fix .ix indexing bugs in duplicate DataFrame index (#1201)
90+
- Better handle errors with non-existing objects in HDFStore (#1254)
91+
- Don't copy int64 array data in DatetimeIndex when copy=False (#1624)
92+
- Fix resampling of conforming periods quarterly to annual (#1622)
93+
- Don't lose index name on resampling (#1631)
94+
- Support python-dateutil version 2.1 (#1637)
95+
- Fix broken scatter_matrix axis labeling, esp. with time series (#1625)
96+
- Fix cases where extra keywords weren't being passed on to matplotlib from
97+
Series.plot (#1636)
98+
- Fix BusinessMonthBegin logic for dates before 1st bday of month (#1645)
99+
- Ensure string alias converted (valid in DatetimeIndex.get_loc) in
100+
DataFrame.xs / __getitem__ (#1644)
101+
- Fix use of string alias timestamps with tz-aware time series (#1647)
102+
- Fix Series.max/min and Series.describe on len-0 series (#1650)
103+
- Handle None values in dict passed to concat (#1649)
104+
- Fix Series.interpolate with method='values' and DatetimeIndex (#1646)
105+
- Fix IndexError in left merges on a DataFrame with 0-length (#1628)
106+
- Fix DataFrame column width display with UTF-8 encoded characters (#1620)
107+
- Handle case in pandas.io.data.get_data_yahoo where Yahoo! returns duplicate
108+
dates for most recent business day
109+
- Avoid downsampling when plotting mixed frequencies on the same subplot (#1619)
110+
- Fix read_csv bug when reading a single line (#1553)
111+
- Fix bug in C code causing monthly periods prior to December 1969 to be off (#1570)
112+
25113
pandas 0.8.0
26114
============
27115

@@ -140,6 +228,7 @@ pandas 0.8.0
140228

141229
**API Changes**
142230

231+
- Rename `pandas._tseries` to `pandas.lib`
143232
- Rename Factor to Categorical and add improvements. Numerous Categorical bug
144233
fixes
145234
- Frequency name overhaul, WEEKDAY/EOM and rules with @
@@ -1661,92 +1750,58 @@ Thanks
16611750
pandas 0.3.0
16621751
============
16631752

1664-
This major release of pandas represents approximately 1 year of continuous
1665-
development work and brings with it many new features, bug fixes, speed
1666-
enhancements, and general quality-of-life improvements. The most significant
1667-
change from the 0.2 release has been the completion of a rigorous unit test
1668-
suite covering all of the core functionality.
1669-
16701753
Release notes
16711754
-------------
16721755

16731756
**Release date:** February 20, 2011
16741757

16751758
**New features / modules**
16761759

1677-
* DataFrame / DataMatrix classes
1678-
1679-
* `corrwith` function to compute column- or row-wise correlations between two
1680-
objects
1681-
* Can boolean-index DataFrame objects, e.g. df[df > 2] = 2, px[px > last_px] = 0
1682-
* Added comparison magic methods (__lt__, __gt__, etc.)
1683-
* Flexible explicit arithmetic methods (add, mul, sub, div, etc.)
1684-
* Added `reindex_like` method
1685-
1686-
* WidePanel
1687-
1688-
* Added `reindex_like` method
1689-
1690-
* `pandas.io`: IO utilities
1691-
1692-
* `pandas.io.sql` module
1693-
1694-
* Convenience functions for accessing SQL-like databases
1695-
1696-
* `pandas.io.pytables` module
1697-
1698-
* Added (still experimental) HDFStore class for storing pandas data
1699-
structures using HDF5 / PyTables
1700-
1701-
* `pandas.core.datetools`
1702-
1703-
* Added WeekOfMonth date offset
1704-
1705-
* `pandas.rpy` (experimental) module created, provide some interfacing /
1706-
conversion between rpy2 and pandas
1760+
- `corrwith` function to compute column- or row-wise correlations between two
1761+
DataFrame objects
1762+
- Can boolean-index DataFrame objects, e.g. df[df > 2] = 2, px[px > last_px] = 0
1763+
- Added comparison magic methods (__lt__, __gt__, etc.)
1764+
- Flexible explicit arithmetic methods (add, mul, sub, div, etc.)
1765+
- Added `reindex_like` method
1766+
- Added `reindex_like` method to WidePanel
1767+
- Convenience functions for accessing SQL-like databases in `pandas.io.sql`
1768+
module
1769+
- Added (still experimental) HDFStore class for storing pandas data
1770+
structures using HDF5 / PyTables in `pandas.io.pytables` module
1771+
- Added WeekOfMonth date offset
1772+
- `pandas.rpy` (experimental) module created, provide some interfacing /
1773+
conversion between rpy2 and pandas
17071774

17081775
**Improvements**
17091776

1710-
* Unit test coverage: 100% line coverage of core data structures
1711-
1712-
* Speed enhancement to rolling_{median, max, min}
1713-
1714-
* Column ordering between DataFrame and DataMatrix is now consistent: before
1715-
DataFrame would not respect column order
1716-
1717-
* Improved {Series, DataFrame}.plot methods to be more flexible (can pass
1718-
matplotlib Axis arguments, plot DataFrame columns in multiple subplots, etc.)
1777+
- Unit test coverage: 100% line coverage of core data structures
1778+
- Speed enhancement to rolling_{median, max, min}
1779+
- Column ordering between DataFrame and DataMatrix is now consistent: before
1780+
DataFrame would not respect column order
1781+
- Improved {Series, DataFrame}.plot methods to be more flexible (can pass
1782+
matplotlib Axis arguments, plot DataFrame columns in multiple subplots,
1783+
etc.)
17191784

17201785
**API Changes**
17211786

1722-
* Exponentially-weighted moment functions in `pandas.stats.moments`
1723-
have a more consistent API and accept a min_periods argument like
1724-
their regular moving counterparts.
1725-
1726-
* **fillMethod** argument in Series, DataFrame changed to **method**,
1727-
`FutureWarning` added.
1728-
1729-
* **fill** method in Series, DataFrame/DataMatrix, WidePanel renamed to
1730-
**fillna**, `FutureWarning` added to **fill**
1731-
1732-
* Renamed **DataFrame.getXS** to **xs**, `FutureWarning` added
1733-
1734-
* Removed **cap** and **floor** functions from DataFrame, renamed to
1735-
**clip_upper** and **clip_lower** for consistency with NumPy
1787+
- Exponentially-weighted moment functions in `pandas.stats.moments` have a
1788+
more consistent API and accept a min_periods argument like their regular
1789+
moving counterparts.
1790+
- **fillMethod** argument in Series, DataFrame changed to **method**,
1791+
`FutureWarning` added.
1792+
- **fill** method in Series, DataFrame/DataMatrix, WidePanel renamed to
1793+
**fillna**, `FutureWarning` added to **fill**
1794+
- Renamed **DataFrame.getXS** to **xs**, `FutureWarning` added
1795+
- Removed **cap** and **floor** functions from DataFrame, renamed to
1796+
**clip_upper** and **clip_lower** for consistency with NumPy
17361797

17371798
**Bug fixes**
17381799

1739-
* Fixed bug in IndexableSkiplist Cython code that was breaking
1740-
rolling_max function
1741-
1742-
* Numerous numpy.int64-related indexing fixes
1743-
1744-
* Several NumPy 1.4.0 NaN-handling fixes
1745-
1746-
* Bug fixes to pandas.io.parsers.parseCSV
1747-
1748-
* Fixed `DateRange` caching issue with unusual date offsets
1749-
1750-
* Fixed bug in `DateRange.union`
1751-
1752-
* Fixed corner case in `IndexableSkiplist` implementation
1800+
- Fixed bug in IndexableSkiplist Cython code that was breaking
1801+
rolling_max function
1802+
- Numerous numpy.int64-related indexing fixes
1803+
- Several NumPy 1.4.0 NaN-handling fixes
1804+
- Bug fixes to pandas.io.parsers.parseCSV
1805+
- Fixed `DateRange` caching issue with unusual date offsets
1806+
- Fixed bug in `DateRange.union`
1807+
- Fixed corner case in `IndexableSkiplist` implementation

TODO.rst

+4-1
Original file line numberDiff line numberDiff line change
@@ -57,4 +57,7 @@ Performance blog
5757
- Take
5858

5959
git log v0.6.1..master --pretty=format:%aN | sort | uniq -c | sort -rn
60-
git log a8c2f88..master --pretty=format:%aN | sort | uniq -c | sort -rn
60+
61+
git log 7ddfbd4..master --pretty=format:%aN | sort | uniq -c | sort -rn
62+
git log a0257f5..master --pretty=format:%aN | sort | uniq -c | sort -rn
63+

doc/data/fx_prices

15.8 KB
Binary file not shown.

doc/data/iris.data

+1-2
Original file line numberDiff line numberDiff line change
@@ -148,5 +148,4 @@ SepalLength,SepalWidth,PetalLength,PetalWidth,Name
148148
6.3,2.5,5.0,1.9,Iris-virginica
149149
6.5,3.0,5.2,2.0,Iris-virginica
150150
6.2,3.4,5.4,2.3,Iris-virginica
151-
5.9,3.0,5.1,1.8,Iris-virginica
152-
151+
5.9,3.0,5.1,1.8,Iris-virginica

doc/source/basics.rst

+78-1
Original file line numberDiff line numberDiff line change
@@ -141,7 +141,7 @@ an axis and broadcasting over the same axis:
141141
major_mean
142142
wp.sub(major_mean, axis='major')
143143
144-
And similarly for axis="items" and axis="minor".
144+
And similarly for ``axis="items"`` and ``axis="minor"``.
145145

146146
.. note::
147147

@@ -369,6 +369,15 @@ index labels with the minimum and maximum corresponding values:
369369
df1.idxmin(axis=0)
370370
df1.idxmax(axis=1)
371371
372+
When there are multiple rows (or columns) matching the minimum or maximum
373+
value, ``idxmin`` and ``idxmax`` return the first matching index:
374+
375+
.. ipython:: python
376+
377+
df3 = DataFrame([2, 1, 1, 3, np.nan], columns=['A'], index=list('edcba'))
378+
df3
379+
df3['A'].idxmin()
380+
372381
Value counts (histogramming)
373382
~~~~~~~~~~~~~~~~~~~~~~~~~~~~
374383

@@ -826,6 +835,74 @@ For instance,
826835
827836
for r in df2.itertuples(): print r
828837
838+
.. _basics.string_methods:
839+
840+
Vectorized string methods
841+
-------------------------
842+
843+
Series is equipped (as of pandas 0.8.1) with a set of string processing methods
844+
that make it easy to operate on each element of the array. Perhaps most
845+
importantly, these methods exclude missing/NA values automatically. These are
846+
accessed via the Series's ``str`` attribute and generally have names matching
847+
the equivalent (scalar) built-in string methods:
848+
849+
.. ipython:: python
850+
851+
s = Series(['A', 'B', 'C', 'Aaba', 'Baca', np.nan, 'CABA', 'dog', 'cat'])
852+
s.str.lower()
853+
s.str.upper()
854+
s.str.len()
855+
856+
Methods like ``split`` return a Series of lists:
857+
858+
.. ipython:: python
859+
860+
s2 = Series(['a_b_c', 'c_d_e', np.nan, 'f_g_h'])
861+
s2.str.split('_')
862+
863+
Elements in the split lists can be accessed using ``get`` or ``[]`` notation:
864+
865+
.. ipython:: python
866+
867+
s2.str.split('_').str.get(1)
868+
s2.str.split('_').str[1]
869+
870+
Methods like ``replace`` and ``findall`` take regular expressions, too:
871+
872+
.. ipython:: python
873+
874+
s3 = Series(['A', 'B', 'C', 'Aaba', 'Baca',
875+
'', np.nan, 'CABA', 'dog', 'cat'])
876+
s3
877+
s3.str.replace('^.a|dog', 'XX-XX ', case=False)
878+
879+
.. csv-table::
880+
:header: "Method", "Description"
881+
:widths: 20, 80
882+
883+
``cat``,Concatenate strings
884+
``split``,Split strings on delimiter
885+
``get``,Index into each element (retrieve i-th element)
886+
``join``,Join strings in each element of the Series with passed separator
887+
``contains``,Return boolean array indicating whether each string contains pattern/regex
888+
``replace``,Replace occurrences of pattern/regex with some other string
889+
``repeat``,Duplicate values (``s.str.repeat(3)`` equivalent to ``x * 3``)
890+
``pad``,"Add whitespace to left, right, or both sides of strings"
891+
``center``,Equivalent to ``pad(side='both')``
892+
``slice``,Slice each string in the Series
893+
``slice_replace``,Replace slice in each string with passed value
894+
``count``,Count occurrences of pattern
895+
``startswith``,Equivalent to ``str.startswith(pat)`` for each element
896+
``endswith``,Equivalent to ``str.endswith(pat)`` for each element
897+
``findall``,Compute list of all occurrences of pattern/regex for each string
898+
``match``,"Call ``re.match`` on each element, returning matched groups as list"
899+
``len``,Compute string lengths
900+
``strip``,Equivalent to ``str.strip``
901+
``rstrip``,Equivalent to ``str.rstrip``
902+
``lstrip``,Equivalent to ``str.lstrip``
903+
``lower``,Equivalent to ``str.lower``
904+
``upper``,Equivalent to ``str.upper``
905+
829906
.. _basics.sorting:
830907

831908
Sorting by index and value

doc/source/computation.rst

+1-1
Original file line numberDiff line numberDiff line change
@@ -299,7 +299,7 @@ average as
299299

300300
.. math::
301301
302-
y_t = (1-\alpha) y_{t-1} + \alpha x_t
302+
y_t = \alpha y_{t-1} + (1 - \alpha) x_t
303303
304304
One must have :math:`0 < \alpha \leq 1`, but rather than pass :math:`\alpha`
305305
directly, it's easier to think about either the **span** or **center of mass
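
To make the recursion in the updated formula concrete, the following is a minimal pure-Python sketch. It is not the pandas implementation: the function name ``ewma_recursive`` is hypothetical, and seeding the recursion with ``y_0 = x_0`` is an assumption made only for illustration.

```python
def ewma_recursive(xs, alpha):
    """Apply y_t = alpha * y_{t-1} + (1 - alpha) * x_t to a list of floats.

    Assumption: the first output equals the first input (y_0 = x_0).
    """
    if not xs:
        return []
    ys = [xs[0]]
    for x in xs[1:]:
        # alpha weights the previous smoothed value, (1 - alpha) the new point
        ys.append(alpha * ys[-1] + (1 - alpha) * x)
    return ys


smoothed = ewma_recursive([1.0, 2.0, 3.0], 0.5)
print(smoothed)  # [1.0, 1.5, 2.25]
```

With ``alpha = 0.5`` each output is the midpoint between the previous smoothed value and the new observation.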

doc/source/dsintro.rst

+7
Original file line numberDiff line numberDiff line change
@@ -32,6 +32,13 @@ between labels and data will not be broken unless done so explicitly by you.
3232
We'll give a brief intro to the data structures, then consider all of the broad
3333
categories of functionality and methods in separate sections.
3434

35+
When using pandas, we recommend the following import convention:
36+
37+
.. code-block:: python
38+
39+
import pandas as pd
40+
41+
3542
.. _basics.series:
3643

3744
Series
