Commit d09ac64

2 parents dc7976c + d6a99af commit d09ac64

28 files changed

+473
-104
lines changed

RELEASE.rst

+23-1
Original file line numberDiff line numberDiff line change
@@ -22,15 +22,25 @@ Where to get it
2222
* Binary installers on PyPI: http://pypi.python.org/pypi/pandas
2323
* Documentation: http://pandas.pydata.org
2424

25-
pandas 0.8.1
25+
pandas 0.8.2
2626
============
2727

2828
**Release date:** NOT YET RELEASED
2929

30+
**Improvements to existing features**
31+
32+
- Add ``flags`` option for ``re.compile`` in some Series.str methods (#1659)
33+
34+
pandas 0.8.1
35+
============
36+
37+
**Release date:** July 22, 2012
38+
3039
**New features**
3140

3241
- Add vectorized, NA-friendly string methods to Series (#1621, #620)
3342
- Can pass dict of per-column line styles to DataFrame.plot (#1559)
43+
- Selective plotting to secondary y-axis on same subplot (PR #1640)
3444
- Add new ``bootstrap_plot`` plot function
3545
- Add new ``parallel_coordinates`` plot function (#1488)
3646
- Add ``radviz`` plot function (#1566)
@@ -45,6 +55,8 @@ pandas 0.8.1
4555
- Add Cython group median method for >15x speedup (#1358)
4656
- Drastically improve ``to_datetime`` performance on ISO8601 datetime strings
4757
(with no time zones) (#1571)
58+
- Improve single-key groupby performance on large data sets, accelerate use of
59+
groupby with a Categorical variable
4860
- Add ability to append hierarchical index levels with ``set_index`` and to
4961
drop single levels with ``reset_index`` (#1569, #1577)
5062
- Always apply passed functions in ``resample``, even if upsampling (#1596)
@@ -56,6 +68,8 @@ pandas 0.8.1
5668
- Accelerate 3-axis multi data selection from homogeneous Panel (#979)
5769
- Add ``adjust`` option to ewma to disable adjustment factor (#1584)
5870
- Add new matplotlib converters for high frequency time series plotting (#1599)
71+
- Handling of tz-aware datetime.datetime objects in to_datetime; raise
72+
Exception unless utc=True given (#1581)
5973

6074
**Bug fixes**
6175

@@ -96,6 +110,14 @@ pandas 0.8.1
96110
- Fix use of string alias timestamps with tz-aware time series (#1647)
97111
- Fix Series.max/min and Series.describe on len-0 series (#1650)
98112
- Handle None values in dict passed to concat (#1649)
113+
- Fix Series.interpolate with method='values' and DatetimeIndex (#1646)
114+
- Fix IndexError in left merges on a 0-length DataFrame (#1628)
115+
- Fix DataFrame column width display with UTF-8 encoded characters (#1620)
116+
- Handle case in pandas.io.data.get_data_yahoo where Yahoo! returns duplicate
117+
dates for most recent business day
118+
- Avoid downsampling when plotting mixed frequencies on the same subplot (#1619)
119+
- Fix read_csv bug when reading a single line (#1553)
120+
- Fix bug in C code causing monthly periods prior to December 1969 to be off (#1570)
99121

100122
pandas 0.8.0
101123
============
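
A note on the 0.8.2 ``flags`` entry above: the sketch below shows what passing ``re`` flags to a ``Series.str`` method looks like. It assumes a recent pandas release, where ``Series.str.contains`` accepts a ``flags`` argument; the sample data and variable names are invented for illustration and are not taken from the commit.

```python
import re

import pandas as pd

# Hypothetical sample data; None stands in for a missing value.
names = pd.Series(["Apple", "banana", None, "avocado"])

# re.IGNORECASE is forwarded to the regex engine, so "^a" matches
# both "Apple" and "avocado".
starts_with_a = names.str.contains(r"^a", flags=re.IGNORECASE)

print(starts_with_a.tolist())
```

How the missing entry is reported depends on the pandas version and dtype, so the sketch does not rely on it.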

doc/source/basics.rst

+73-5
Original file line numberDiff line numberDiff line change
@@ -141,7 +141,7 @@ an axis and broadcasting over the same axis:
141141
major_mean
142142
wp.sub(major_mean, axis='major')
143143
144-
And similarly for axis="items" and axis="minor".
144+
And similarly for ``axis="items"`` and ``axis="minor"``.
145145

146146
.. note::
147147

@@ -369,14 +369,14 @@ index labels with the minimum and maximum corresponding values:
369369
df1.idxmin(axis=0)
370370
df1.idxmax(axis=1)
371371
372-
When there are multiple rows (or columns) matching the minimum or maximum
372+
When there are multiple rows (or columns) matching the minimum or maximum
373373
value, ``idxmin`` and ``idxmax`` return the first matching index:
374374

375375
.. ipython:: python
376376
377-
df = DataFrame([2, 1, 1, 3, np.nan], columns=['A'], index=list('edcba'))
378-
df
379-
df['A'].idxmin()
377+
df3 = DataFrame([2, 1, 1, 3, np.nan], columns=['A'], index=list('edcba'))
378+
df3
379+
df3['A'].idxmin()
380380
381381
Value counts (histogramming)
382382
~~~~~~~~~~~~~~~~~~~~~~~~~~~~
@@ -835,6 +835,74 @@ For instance,
835835
836836
for r in df2.itertuples(): print r
837837
838+
.. _basics.string_methods:
839+
840+
Vectorized string methods
841+
-------------------------
842+
843+
Series is equipped (as of pandas 0.8.1) with a set of string processing methods
844+
that make it easy to operate on each element of the array. Perhaps most
845+
importantly, these methods exclude missing/NA values automatically. These are
846+
accessed via the Series's ``str`` attribute and generally have names matching
847+
the equivalent (scalar) built-in string methods:
848+
849+
.. ipython:: python
850+
851+
s = Series(['A', 'B', 'C', 'Aaba', 'Baca', np.nan, 'CABA', 'dog', 'cat'])
852+
s.str.lower()
853+
s.str.upper()
854+
s.str.len()
855+
856+
Methods like ``split`` return a Series of lists:
857+
858+
.. ipython:: python
859+
860+
s2 = Series(['a_b_c', 'c_d_e', np.nan, 'f_g_h'])
861+
s2.str.split('_')
862+
863+
Elements in the split lists can be accessed using ``get`` or ``[]`` notation:
864+
865+
.. ipython:: python
866+
867+
s2.str.split('_').str.get(1)
868+
s2.str.split('_').str[1]
869+
870+
Methods like ``replace`` and ``findall`` take regular expressions, too:
871+
872+
.. ipython:: python
873+
874+
s3 = Series(['A', 'B', 'C', 'Aaba', 'Baca',
875+
'', np.nan, 'CABA', 'dog', 'cat'])
876+
s3
877+
s3.str.replace('^.a|dog', 'XX-XX ', case=False)
878+
879+
.. csv-table::
880+
:header: "Method", "Description"
881+
:widths: 20, 80
882+
883+
``cat``,Concatenate strings
884+
``split``,Split strings on delimiter
885+
``get``,Index into each element (retrieve i-th element)
886+
``join``,Join strings in each element of the Series with passed separator
887+
``contains``,Return boolean array indicating whether each string contains pattern/regex
888+
``replace``,Replace occurrences of pattern/regex with some other string
889+
``repeat``,Duplicate values (``s.str.repeat(3)`` equivalent to ``x * 3``)
890+
``pad``,"Add whitespace to left, right, or both sides of strings"
891+
``center``,Equivalent to ``pad(side='both')``
892+
``slice``,Slice each string in the Series
893+
``slice_replace``,Replace slice in each string with passed value
894+
``count``,Count occurrences of pattern
895+
``startswith``,Equivalent to ``str.startswith(pat)`` for each element
896+
``endswith``,Equivalent to ``str.endswith(pat)`` for each element
897+
``findall``,Compute list of all occurrences of pattern/regex for each string
898+
``match``,"Call ``re.match`` on each element, returning matched groups as list"
899+
``len``,Compute string lengths
900+
``strip``,Equivalent to ``str.strip``
901+
``rstrip``,Equivalent to ``str.rstrip``
902+
``lstrip``,Equivalent to ``str.lstrip``
903+
``lower``,Equivalent to ``str.lower``
904+
``upper``,Equivalent to ``str.upper``
905+
838906
.. _basics.sorting:
839907

840908
Sorting by index and value
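
The string-method section added in this hunk can be tried outside the doc build with a sketch like the following. It assumes pandas 2.x and the ``import pandas as pd`` convention rather than the ``from pandas import *`` namespace used by the docs; the variable names are illustrative. In the pandas versions assumed here, ``str.replace`` treats the pattern literally unless ``regex=True`` is passed, so the flag is spelled out.

```python
import numpy as np
import pandas as pd

s = pd.Series(["A", "Aaba", np.nan, "dog"])

# Missing values are skipped: they stay missing in the result.
lowered = s.str.lower()
lengths = s.str.len()

# split returns one list per element; .str.get(i) indexes into each list.
s2 = pd.Series(["a_b_c", "c_d_e", np.nan, "f_g_h"])
second = s2.str.split("_").str.get(1)

# Regex replacement, case-insensitive. regex=True is an assumption about
# the installed pandas version (needed on 2.x, absent in 0.8.1).
replaced = s.str.replace(r"^.a|dog", "XX-XX ", case=False, regex=True)

print(lowered, lengths, second, replaced, sep="\n")
```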

doc/source/dsintro.rst

+7
Original file line numberDiff line numberDiff line change
@@ -32,6 +32,13 @@ between labels and data will not be broken unless done so explicitly by you.
3232
We'll give a brief intro to the data structures, then consider all of the broad
3333
categories of functionality and methods in separate sections.
3434

35+
When using pandas, we recommend the following import convention:
36+
37+
.. code-block:: python
38+
39+
import pandas as pd
40+
41+
3542
.. _basics.series:
3643

3744
Series

doc/source/indexing.rst

+26
Original file line numberDiff line numberDiff line change
@@ -9,6 +9,7 @@
99
import random
1010
np.random.seed(123456)
1111
from pandas import *
12+
import pandas as pd
1213
randn = np.random.randn
1314
randint = np.random.randint
1415
np.set_printoptions(precision=4, suppress=True)
@@ -665,6 +666,14 @@ can find yourself working with hierarchically-indexed data without creating a
665666
``MultiIndex`` explicitly yourself. However, when loading data from a file, you
666667
may wish to generate your own ``MultiIndex`` when preparing the data set.
667668

669+
Note that how the index is displayed can be controlled using the
670+
``multi_sparse`` option in ``pandas.set_printoptions``:
671+
672+
.. ipython:: python
673+
674+
pd.set_printoptions(multi_sparse=False)
675+
df
676+
pd.set_printoptions(multi_sparse=True)
668677
669678
Reconstructing the level labels
670679
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
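
The hunk above toggles sparse display through ``pandas.set_printoptions``. As a sketch for readers on a recent pandas release, the same behaviour is assumed to be reachable through the ``display.multi_sparse`` option; the frame below is invented for illustration.

```python
import pandas as pd

index = pd.MultiIndex.from_product([["bar", "foo"], ["one", "two"]])
df = pd.DataFrame({"x": [1, 2, 3, 4]}, index=index)

# Default (sparse): a repeated outer label is printed only once.
sparse_repr = repr(df)

# Dense: every row prints its full label tuple. option_context restores
# the previous setting on exit.
with pd.option_context("display.multi_sparse", False):
    dense_repr = repr(df)

print(sparse_repr)
print(dense_repr)
```

Using ``option_context`` avoids the set-then-reset pair shown in the hunk, since the option reverts automatically.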
@@ -935,6 +944,15 @@ indexed DataFrame:
935944
indexed2 = data.set_index(['a', 'b'])
936945
indexed2
937946
947+
The ``append`` keyword option allows you to keep the existing index and append the given
948+
columns to a MultiIndex:
949+
950+
.. ipython:: python
951+
952+
frame = data.set_index('c', drop=False)
953+
frame = frame.set_index(['a', 'b'], append=True)
954+
frame
955+
938956
Other options in ``set_index`` allow you not to drop the index columns or to add
939957
the index in-place (without creating a new object):
940958

@@ -959,6 +977,14 @@ integer index. This is the inverse operation to ``set_index``
959977
The output is more similar to a SQL table or a record array. The names for the
960978
columns derived from the index are the ones stored in the ``names`` attribute.
961979

980+
You can use the ``level`` keyword to remove only a portion of the index:
981+
982+
.. ipython:: python
983+
984+
frame
985+
frame.reset_index(level=1)
986+
987+
962988
``reset_index`` takes an optional parameter ``drop`` which if true simply
963989
discards the index, instead of putting index values in the DataFrame's columns.
964990
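
The ``append`` and ``level`` additions in this file can be checked together with a small sketch. The frame and its column names are invented for illustration, and a recent pandas release is assumed.

```python
import pandas as pd

data = pd.DataFrame({
    "a": ["bar", "bar", "foo", "foo"],
    "b": ["one", "two", "one", "two"],
    "c": ["z", "y", "x", "w"],
    "d": [1.0, 2.0, 3.0, 4.0],
})

# Start with a single-level index on "c", then append "a" and "b" as
# further levels instead of replacing the existing index.
frame = data.set_index("c").set_index(["a", "b"], append=True)

# Move only level 1 ("a") back into the columns; "c" and "b" stay in the index.
partial = frame.reset_index(level=1)

print(frame.index.names)
print(partial.index.names, list(partial.columns))
```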

doc/source/v0.8.1.txt

+30-11
Original file line numberDiff line numberDiff line change
@@ -1,27 +1,46 @@
11
.. _whatsnew_0801:
22

3-
v.0.8.1 (July 23, 2012)
4-
---------------------------
3+
v0.8.1 (July 22, 2012)
4+
----------------------
55

6-
This release includes a few new features and addresses over a dozen bugs in
7-
0.8.0, most notably NA friendly string processing functionality and a series of
8-
new plot types and options.
6+
This release includes a few new features, performance enhancements, and over 30
7+
bug fixes from 0.8.0. Notable new features include NA-friendly string
8+
processing functionality and a series of new plot types and options.
99

1010
New features
1111
~~~~~~~~~~~~
1212

13-
- Add string processing methods accesible via Series.str (GH620_)
13+
- Add :ref:`vectorized string processing methods <basics.string_methods>`
14+
accessible via Series.str (GH620_)
1415
- Add option to disable adjustment in EWMA (GH1584_)
15-
- Radviz plot (GH1566_)
16+
- :ref:`Radviz plot <visualization.radviz>` (GH1566_)
17+
- :ref:`Parallel coordinates plot <visualization.parallel_coordinates>`
18+
- :ref:`Bootstrap plot <visualization.bootstrap>`
1619
- Per column styles and secondary y-axis plotting (GH1559_)
1720
- New datetime converters for millisecond plotting (GH1599_)
21+
- Add option to disable "sparse" display of hierarchical indexes (GH1538_)
22+
- Series/DataFrame's ``set_index`` method can :ref:`append levels
23+
<indexing.set_index>` to an existing Index/MultiIndex (GH1569_, GH1577_)
1824

1925
Performance improvements
2026
~~~~~~~~~~~~~~~~~~~~~~~~
2127

22-
- Improved implementation of rolling min and max
23-
- Set logic performance for primitives
28+
- Improved implementation of rolling min and max (thanks to `Bottleneck
29+
<http://berkeleyanalytics.com/bottleneck/>`__ !)
30+
- Add accelerated ``'median'`` GroupBy option (GH1358_)
31+
- Significantly improve the performance of parsing ISO8601-format date
32+
strings with ``DatetimeIndex`` or ``to_datetime`` (GH1571_)
33+
- Improve the performance of GroupBy on single-key aggregations and use with
34+
Categorical types
2435
- Significant datetime parsing performance improvements
2536

26-
.. _GH561: https://github.com/pydata/pandas/issues/561
27-
.. _GH50: https://github.com/pydata/pandas/issues/50
37+
.. _GH620: https://github.com/pydata/pandas/issues/620
38+
.. _GH1358: https://github.com/pydata/pandas/issues/1358
39+
.. _GH1538: https://github.com/pydata/pandas/issues/1538
40+
.. _GH1559: https://github.com/pydata/pandas/issues/1559
41+
.. _GH1584: https://github.com/pydata/pandas/issues/1584
42+
.. _GH1566: https://github.com/pydata/pandas/issues/1566
43+
.. _GH1569: https://github.com/pydata/pandas/issues/1569
44+
.. _GH1571: https://github.com/pydata/pandas/issues/1571
45+
.. _GH1577: https://github.com/pydata/pandas/issues/1577
46+
.. _GH1599: https://github.com/pydata/pandas/issues/1599
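
The two accelerated code paths named in the performance list (GroupBy ``median`` and ISO8601 parsing through ``to_datetime``) are reached through ordinary calls. The sketch below only shows those calls on invented data, assuming a recent pandas release; it makes no claim about timings.

```python
import pandas as pd

df = pd.DataFrame({
    "key": ["a", "a", "a", "b", "b"],
    "val": [1.0, 2.0, 3.0, 4.0, 6.0],
})

# Single-key groupby followed by the median aggregation.
medians = df.groupby("key")["val"].median()

# ISO8601 strings without time zones.
stamps = pd.to_datetime(["2012-07-22T00:00:00", "2012-07-23T12:30:00"])

print(medians)
print(stamps)
```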

doc/source/visualization.rst

+9-3
Original file line numberDiff line numberDiff line change
@@ -98,11 +98,11 @@ You can plot one column versus another using the `x` and `y` keywords in
9898
9999
plt.figure()
100100
101-
df = DataFrame(np.random.randn(1000, 2), columns=['B', 'C']).cumsum()
102-
df['A'] = Series(range(len(df)))
101+
df3 = DataFrame(np.random.randn(1000, 2), columns=['B', 'C']).cumsum()
102+
df3['A'] = Series(range(len(df3)))
103103
104104
@savefig df_plot_xy.png width=4.5in
105-
df.plot(x='A', y='B')
105+
df3.plot(x='A', y='B')
106106
107107
108108
Plotting on a Secondary Y-axis
@@ -339,6 +339,8 @@ of the same class will usually be closer together and form larger structures.
339339
@savefig andrews_curves.png width=6in
340340
andrews_curves(data, 'Name')
341341
342+
.. _visualization.parallel_coordinates:
343+
342344
Parallel Coordinates
343345
~~~~~~~~~~~~~~~~~~~~
344346

@@ -402,6 +404,8 @@ confidence band.
402404
@savefig autocorrelation_plot.png width=6in
403405
autocorrelation_plot(data)
404406
407+
.. _visualization.bootstrap:
408+
405409
Bootstrap Plot
406410
~~~~~~~~~~~~~~
407411

@@ -420,6 +424,8 @@ are what constitutes the bootstrap plot.
420424
@savefig bootstrap_plot.png width=8in
421425
bootstrap_plot(data, size=50, samples=500, color='grey')
422426
427+
.. _visualization.radviz:
428+
423429
RadViz
424430
~~~~~~
425431

doc/source/whatsnew.rst

+2
Original file line numberDiff line numberDiff line change
@@ -16,6 +16,8 @@ What's New
1616

1717
These are new features and improvements of note in each release.
1818

19+
.. include:: v0.8.1.txt
20+
1921
.. include:: v0.8.0.txt
2022

2123
.. include:: v0.7.3.txt

pandas/core/format.py

+16-3
Original file line numberDiff line numberDiff line change
@@ -133,10 +133,18 @@ def to_string(self):
133133

134134
if py3compat.PY3: # pragma: no cover
135135
_encode_diff = lambda x: 0
136+
137+
_strlen = len
136138
else:
137139
def _encode_diff(x):
138140
return len(x) - len(x.decode('utf-8'))
139141

142+
def _strlen(x):
143+
try:
144+
return len(x.decode('utf-8'))
145+
except UnicodeError:
146+
return len(x)
147+
140148
class DataFrameFormatter(object):
141149
"""
142150
Render a DataFrame
@@ -205,7 +213,7 @@ def to_string(self, force_unicode=False):
205213
if self.header:
206214
fmt_values = self._format_col(i)
207215
cheader = str_columns[i]
208-
max_len = max(max(len(x) for x in fmt_values),
216+
max_len = max(max(_strlen(x) for x in fmt_values),
209217
max(len(x) for x in cheader))
210218
if self.justify == 'left':
211219
cheader = [x.ljust(max_len) for x in cheader]
@@ -624,7 +632,7 @@ def _make_fixed_width(strings, justify='right'):
624632
if len(strings) == 0:
625633
return strings
626634

627-
max_len = max(len(x) for x in strings)
635+
max_len = max(_strlen(x) for x in strings)
628636
conf_max = print_config.max_colwidth
629637
if conf_max is not None and max_len > conf_max:
630638
max_len = conf_max
@@ -635,7 +643,12 @@ def _make_fixed_width(strings, justify='right'):
635643
justfunc = lambda self, x: self.rjust(x)
636644

637645
def just(x):
638-
return justfunc(x[:max_len], max_len)
646+
try:
647+
eff_len = max_len + _encode_diff(x)
648+
except UnicodeError:
649+
eff_len = max_len
650+
651+
return justfunc(x[:eff_len], eff_len)
639652

640653
return [just(x) for x in strings]
641654
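
The ``_strlen`` helper in this file measures column width in characters rather than UTF-8 bytes. The following is a minimal Python 3 sketch of the same idea, not the commit's code: ``display_len`` is a hypothetical name, and the ``bytes`` check stands in for the Python 2 ``str`` handling in the diff.

```python
def display_len(x):
    """Return the number of characters in x, decoding UTF-8 bytes when possible."""
    if isinstance(x, bytes):
        try:
            return len(x.decode("utf-8"))
        except UnicodeError:
            # Not valid UTF-8: fall back to the raw byte count.
            return len(x)
    return len(x)


# "caf\u00e9" is 4 characters but 5 bytes once encoded as UTF-8.
encoded = "caf\u00e9".encode("utf-8")

# Column width computed from characters rather than bytes.
width = max(display_len(v) for v in [encoded, b"abc", "xy"])

print(len(encoded), display_len(encoded), width)
```

Measuring in bytes would make the column one character too wide for every multi-byte character it contains, which is the display problem the RELEASE.rst entry for #1620 describes.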
