Commit 22183fa

Merge commit 'v0.7.2-111-gf7b9139' into debian
* commit 'v0.7.2-111-gf7b9139': (178 commits) DOC: refer to [email protected] instead of [email protected] DOC: fix url to vbench moved over to pydata DOC: some docs re: Panel.from_dict with orient='minor', close pandas-dev#1009 ENH: use filtered exog index for the y_predict result, per pandas-dev#1027 close pandas-dev#1008 ENH: make method signature more consistent with new statsmodels behavior. Uses dot product directly so pandas users aren't affected by statsmodels API change ENH: added OLS.predict method; pass through call to statsmodels ols predict method DOC: encoding affects python 2 and 3 DOC: read_fwf doc tweaks for 0.7.3 Sphinx documentation for read_fwf DOC: add what's new for 0.7.3, some scatter_matrix improvements TST: python 3 fixes BUG: return ax.get_figure() in scatter_plot if ax argument is not None add ax kwd to several functions and push ax into subplots so new subplot axes is generated on the ax's figure ENH: label sizes and rotations for histogram TST: test cases for both Series and DataFrame histogram DOC: all 0.7.3 issues should now be in either new features or bug fixes ENH: partial multiple setting on first level via .ix on DataFrame, close pandas-dev#409 RLS: python 2.5 compatibility stuff with boolean arrays BUG: treat None as NA in DataFrame arithmetic operations, pandas-dev#992 BUG: fix indexing error when selecting section of a hierarchically-indexed DataFrame row, close pandas-dev#1013 ENH: attach name to Series on axis=1 in DataFrame.apply, pandas-dev#983 ...
2 parents 14e2ebe + f7b9139 commit 22183fa

80 files changed

+5142
-1334
lines changed

RELEASE.rst

+122
Original file line numberDiff line numberDiff line change
@@ -22,6 +22,128 @@ Where to get it
2222
* Binary installers on PyPI: http://pypi.python.org/pypi/pandas
2323
* Documentation: http://pandas.pydata.org
2424

25+
pandas 0.7.3
26+
============
27+
28+
**Release date:** NOT YET RELEASED
29+
30+
**New features / modules**
31+
32+
- Added fixed-width file reader, read_fwf (PR #952)
33+
- Add group_keys argument to groupby to not add group names to MultiIndex in
34+
result of apply (GH #938)
35+
- DataFrame can now accept non-integer label slicing (GH #946). Previously
36+
only DataFrame.ix was able to do so.
37+
- DataFrame.apply now retains name attributes on Series objects (GH #983)
38+
- Numeric DataFrame comparisons with non-numeric values now raise a proper
39+
TypeError (GH #943). Previously they raised "PandasError: DataFrame constructor
40+
not properly called!"
41+
- Add ``kurt`` methods to Series and DataFrame (PR #964)
42+
- Can pass dict of column -> list/set NA values for text parsers (GH #754)
43+
- Allow user-specified NA values in text parsers (GH #754)
44+
- Parsers check for openpyxl dependency and raise ImportError if not found
45+
(PR #1007)
46+
- New factory function to create HDFStore objects that can be used in a with
47+
statement so users do not have to explicitly call HDFStore.close (PR #1005)
48+
- pivot_table is now more flexible with same parameters as groupby (GH #941)
49+
- Added stacked bar plots (GH #987)
50+
- scatter_matrix method in pandas/tools/plotting.py (PR #935)
51+
- DataFrame.boxplot returns plot results for ex-post styling (GH #985)
52+
- Short version number accessible as pandas.version.short_version (GH #930)
53+
- Additional documentation in panel.to_frame (GH #942)
54+
- More informative Series.apply docstring regarding element-wise apply
55+
(GH #977)
56+
- Notes on rpy2 installation (GH #1006)
57+
- Add rotation and font size options to hist method (#1012)
58+
- Use exogenous / X variable index in result of OLS.y_predict. Add
59+
OLS.predict method (PR #1027, #1008)
60+
61+
**API Changes**
62+
63+
- Calling apply on grouped Series, e.g. describe(), will no longer yield
64+
DataFrame by default. Will have to call unstack() to get prior behavior
65+
- NA handling in non-numeric comparisons has been tightened up (#933, #953)
66+
67+
**Bug fixes**
68+
69+
- Fix logic error when selecting part of a row in a DataFrame with a
70+
MultiIndex index (GH #1013)
71+
- Series comparison with Series of differing length causes crash (GH #1016).
72+
- Fix bug in indexing when selecting section of hierarchically-indexed row
73+
(GH #1013)
74+
- DataFrame.plot(logy=True) has no effect (GH #1011).
75+
- Broken arithmetic operations between SparsePanel-Panel (GH #1015)
76+
- Unicode repr issues in MultiIndex with non-ascii characters (GH #1010)
77+
- DataFrame.lookup() returns inconsistent results if exact match not present
78+
(GH #1001)
79+
- DataFrame arithmetic operations not treating None as NA (GH #992)
80+
- DataFrameGroupBy.apply returns incorrect result (GH #991)
81+
- Series.reshape returns incorrect result for multiple dimensions (GH #989)
82+
- Series.std and Series.var ignore ddof parameter (GH #934)
83+
- DataFrame.append loses index names (GH #980)
84+
- DataFrame.plot(kind='bar') ignores color argument (GH #958)
85+
- Inconsistent Index comparison results (GH #948)
86+
- Improper int dtype DataFrame construction from data with NaN (GH #846)
87+
- Removes default 'result' name in groupby results (GH #995)
88+
- DataFrame.from_records no longer mutates input columns (PR #975)
89+
90+
pandas 0.7.2
91+
============
92+
93+
**Release date:** March 16, 2012
94+
95+
**New features / modules**
96+
97+
- Add additional tie-breaking methods in DataFrame.rank (#874)
98+
- Add ascending parameter to rank in Series, DataFrame (#875)
99+
- Add coerce_float option to DataFrame.from_records (#893)
100+
- Add sort_columns parameter to allow unsorted plots (#918)
101+
- IPython tab completion on GroupBy objects
102+
103+
**API Changes**
104+
105+
- Series.sum returns 0 instead of NA when called on an empty
106+
series. Analogously for a DataFrame whose rows or columns are length 0
107+
(#844)
108+
109+
**Improvements to existing features**
110+
111+
- Don't use groups dict in Grouper.size (#860)
112+
- Use khash for Series.value_counts, add raw function to algorithms.py (#861)
113+
- Enable column access via attributes on GroupBy (#882)
114+
- Enable setting existing columns (only) via attributes on DataFrame, Panel
115+
(#883)
116+
- Intercept __builtin__.sum in groupby (#885)
117+
- Can pass dict to DataFrame.fillna to use different values per column (#661)
118+
- Can select multiple hierarchical groups by passing list of values in .ix
119+
(#134)
120+
- Add level keyword to ``drop`` for dropping values from a level (GH #159)
121+
- Add ``coerce_float`` option on DataFrame.from_records (#893)
122+
- Raise exception if passed date_parser fails in ``read_csv``
123+
- Add ``axis`` option to DataFrame.fillna (#174)
124+
- Fixes to Panel to make it easier to subclass (PR #888)
125+
126+
**Bug fixes**
127+
128+
- Fix overflow-related bugs in groupby (#850, #851)
129+
- Fix unhelpful error message in parsers (#856)
130+
- Better err msg for failed boolean slicing of dataframe (#859)
131+
- Series.count cannot accept a string (level name) in the level argument (#869)
132+
- Group index platform int check (#870)
133+
- concat on axis=1 and ignore_index=True raises TypeError (#871)
134+
- Further unicode handling issues resolved (#795)
135+
- Fix failure in multiindex-based access in Panel (#880)
136+
- Fix DataFrame boolean slice assignment failure (#881)
137+
- Fix combineAdd NotImplementedError for SparseDataFrame (#887)
138+
- Fix DataFrame.to_html encoding and columns (#890, #891, #909)
139+
- Fix na-filling handling in mixed-type DataFrame (#910)
140+
- Fix to DataFrame.set_value with non-existent row/col (#911)
141+
- Fix malformed block in groupby when excluding nuisance columns (#916)
142+
- Fix inconsistent NA handling in dtype=object arrays (#925)
143+
- Fix missing center-of-mass computation in ewmcov (#862)
144+
- Don't raise exception when opening read-only HDF5 file (#847)
145+
- Fix possible out-of-bounds memory access in 0-length Series (#917)
146+
25147
pandas 0.7.1
26148
============
27149

doc/source/computation.rst

+12-1
@@ -95,7 +95,7 @@ Data ranking
9595
~~~~~~~~~~~~
9696

9797
The ``rank`` method produces a data ranking with ties being assigned the mean
98-
of the ranks for the group:
98+
of the ranks (by default) for the group:
9999

100100
.. ipython:: python
101101
@@ -113,6 +113,17 @@ or the columns (``axis=1``). ``NaN`` values are excluded from the ranking.
113113
df
114114
df.rank(1)
115115
116+
``rank`` optionally takes a parameter ``ascending`` which by default is true;
117+
when false, data is reverse-ranked, with larger values assigned a smaller rank.
118+
119+
``rank`` supports different tie-breaking methods, specified with the ``method``
120+
parameter:
121+
122+
- ``average`` : average rank of tied group
123+
- ``min`` : lowest rank in the group
124+
- ``max`` : highest rank in the group
125+
- ``first`` : ranks assigned in the order they appear in the array
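
To make the difference between the tie-breaking methods concrete, here is a minimal pure-Python sketch. It is not the pandas implementation: the helper name ``rank_values`` is hypothetical, it works on a plain list, and unlike ``rank`` it does not handle ``NaN`` values.

```python
def rank_values(values, method="average", ascending=True):
    """Rank a list 1..n, resolving ties according to `method`.

    Hypothetical helper for illustration only; NaN handling is omitted.
    """
    # Positions of the values in sorted order (largest first if descending).
    order = sorted(range(len(values)), key=lambda i: values[i],
                   reverse=not ascending)
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        # Extend `end` to cover the run of values tied with order[pos].
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        lowest, highest = pos + 1, end + 1  # 1-based ranks spanned by the tie
        # Visit tied entries in order of appearance (needed for 'first').
        for offset, idx in enumerate(sorted(order[pos:end + 1])):
            if method == "average":
                ranks[idx] = (lowest + highest) / 2.0
            elif method == "min":
                ranks[idx] = float(lowest)
            elif method == "max":
                ranks[idx] = float(highest)
            elif method == "first":
                ranks[idx] = float(lowest + offset)
            else:
                raise ValueError("unknown method: %r" % (method,))
        pos = end + 1
    return ranks


values = [3, 1, 3, 2]
for method in ("average", "min", "max", "first"):
    print(method, rank_values(values, method=method))
print("descending", rank_values(values, ascending=False))
```

The two tied values occupy ranks 3 and 4 in ascending order, so ``average`` gives both 3.5, ``min`` gives both 3, ``max`` gives both 4, and ``first`` gives 3 to the earlier entry and 4 to the later one.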
126+
116127
.. note::
117128

118129
These methods are significantly faster (around 10-20x) than

doc/source/dsintro.rst

+29-1
@@ -687,7 +687,20 @@ For example, compare to the construction above:
687687
688688
Panel.from_dict(data, orient='minor')
689689
690-
Orient is especially useful for mixed-type DataFrames.
690+
Orient is especially useful for mixed-type DataFrames. If you pass a dict of
691+
DataFrame objects with mixed-type columns, all of the data will be upcast to
692+
``dtype=object`` unless you pass ``orient='minor'``:
693+
694+
.. ipython:: python
695+
696+
df = DataFrame({'a': ['foo', 'bar', 'baz'],
697+
'b': np.random.randn(3)})
698+
df
699+
data = {'item1': df, 'item2': df}
700+
panel = Panel.from_dict(data, orient='minor')
701+
panel['a']
702+
panel['b']
703+
panel['b'].dtypes
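
The reason ``orient='minor'`` helps can be sketched without pandas. With the default orientation each item holds one whole mixed-type frame, whereas the minor orientation regroups the data by column, so each resulting slice holds values of a single type. The helper ``reorient_minor`` below is hypothetical and operates on plain dicts; it only illustrates the regrouping, not how ``Panel.from_dict`` is implemented.

```python
def reorient_minor(data):
    """Swap the two outer levels of a dict of dicts.

    Input maps item -> column -> values; output maps column -> item -> values.
    Hypothetical helper, for illustration only.
    """
    result = {}
    for item, frame in data.items():
        for column, values in frame.items():
            result.setdefault(column, {})[item] = values
    return result


# A "frame" with one string column and one float column.
frame = {"a": ["foo", "bar", "baz"], "b": [0.1, 0.2, 0.3]}
data = {"item1": frame, "item2": frame}

by_column = reorient_minor(data)
print(sorted(by_column))   # the column names are now the outer keys
print(by_column["b"])      # every value under 'b' is a float
```

After regrouping, everything stored under ``'b'`` is numeric and everything under ``'a'`` is a string, which is why the individual slices no longer need to be stored as ``dtype=object``.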
691704
692705
.. note::
693706

@@ -747,3 +760,18 @@ For example, using the earlier example data, we could do:
747760
wp.major_xs(wp.major_axis[2])
748761
wp.minor_axis
749762
wp.minor_xs('C')
763+
764+
Conversion to DataFrame
765+
~~~~~~~~~~~~~~~~~~~~~~~
766+
767+
A Panel can be represented in 2D form as a hierarchically indexed
768+
DataFrame. See the section :ref:`hierarchical indexing <indexing.hierarchical>`
769+
for more on this. To convert a Panel to a DataFrame, use the ``to_frame``
770+
method:
771+
772+
.. ipython:: python
773+
774+
panel = Panel(np.random.randn(3, 5, 4), items=['one', 'two', 'three'],
775+
major_axis=DateRange('1/1/2000', periods=5),
776+
minor_axis=['a', 'b', 'c', 'd'])
777+
panel.to_frame()

doc/source/io.rst

+63-2
@@ -94,7 +94,7 @@ data into a DataFrame object. They can take a number of arguments:
9494
- ``converters``: a dictionary of functions for converting values in certain
9595
columns, where keys are either integers or column labels
9696
- ``encoding``: a string representing the encoding to use if the contents are
97-
non-ascii, for python versions prior to 3
97+
non-ascii
9898
- ``verbose`` : show number of NA values inserted in non-numeric columns
9999

100100
.. ipython:: python
@@ -139,6 +139,67 @@ fragile. Type inference is a pretty big deal. So if a column can be coerced to
139139
integer dtype without altering the contents, it will do so. Any non-numeric
140140
columns will come through as object dtype as with the rest of pandas objects.
141141

142+
.. _io.fwf:
143+
144+
Files with Fixed Width Columns
145+
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
146+
While `read_csv` reads delimited data, the :func:`~pandas.io.parsers.read_fwf`
147+
function works with data files that have known and fixed column widths.
148+
The function parameters to `read_fwf` are largely the same as `read_csv` with
149+
two extra parameters:
150+
151+
- ``colspecs``: a list of pairs (tuples), giving the extents of the
152+
fixed-width fields of each line as half-open intervals [from, to[
153+
- ``widths``: a list of field widths, which can be used instead of
154+
``colspecs`` if the intervals are contiguous
155+
156+
.. ipython:: python
157+
:suppress:
158+
159+
f = open('bar.csv', 'w')
160+
data1 = ("id8141    360.242940   149.910199   11950.7\n"
161+
"id1594    444.953632   166.985655   11788.4\n"
162+
"id1849    364.136849   183.628767   11806.2\n"
163+
"id1230    413.836124   184.375703   11916.8\n"
164+
"id1948    502.953953   173.237159   12468.3")
165+
f.write(data1)
166+
f.close()
167+
168+
Consider a typical fixed-width data file:
169+
170+
.. ipython:: python
171+
172+
print(open('bar.csv').read())
173+
174+
In order to parse this file into a DataFrame, we simply need to supply the
175+
column specifications to the `read_fwf` function along with the file name:
176+
177+
.. ipython:: python
178+
179+
# Column specifications are a list of half-open intervals
180+
colspecs = [(0, 6), (8, 20), (21, 33), (34, 43)]
181+
df = read_fwf('bar.csv', colspecs=colspecs, header=None, index_col=0)
182+
df
183+
184+
Note how the parser automatically picks column names X.<column number> when
185+
the ``header=None`` argument is specified. Alternatively, you can supply just the
186+
column widths for contiguous columns:
187+
188+
.. ipython:: python
189+
190+
#Widths are a list of integers
191+
widths = [6, 14, 13, 10]
192+
df = read_fwf('bar.csv', widths=widths, header=None)
193+
df
194+
195+
The parser will take care of extra white spaces around the columns
196+
so it's ok to have extra separation between the columns in the file.
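
The relationship between ``widths`` and ``colspecs`` can be illustrated with a few lines of plain Python. This is only a sketch of the idea, not the ``read_fwf`` implementation: the helper names ``widths_to_colspecs`` and ``split_fixed_width`` are hypothetical, and the sample line is built with explicit space counts so that it is 43 characters wide, matching the column specifications used above.

```python
def widths_to_colspecs(widths):
    """Turn contiguous field widths into half-open (start, end) intervals."""
    colspecs = []
    start = 0
    for width in widths:
        colspecs.append((start, start + width))
        start += width
    return colspecs


def split_fixed_width(line, colspecs):
    """Slice one line by the given intervals and strip the padding."""
    return [line[start:end].strip() for start, end in colspecs]


# One sample record: a 6-character id followed by three padded numbers.
line = ("id8141" + " " * 4 + "360.242940" + " " * 3 +
        "149.910199" + " " * 3 + "11950.7")

explicit = [(0, 6), (8, 20), (21, 33), (34, 43)]
from_widths = widths_to_colspecs([6, 14, 13, 10])

print(from_widths)
print(split_fixed_width(line, explicit))
print(split_fixed_width(line, from_widths))
```

Both sets of intervals yield the same four fields because the surrounding whitespace is stripped from each slice, so the intervals only need to avoid cutting into a neighbouring value.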
197+
198+
.. ipython:: python
199+
:suppress:
200+
201+
os.remove('bar.csv')
202+
142203
Files with an "implicit" index column
143204
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
144205

@@ -281,7 +342,7 @@ function takes a number of arguments. Only the first is required.
281342
- ``mode`` : Python write mode, default 'w'
282343
- ``sep`` : Field delimiter for the output file (default ",")
283344
- ``encoding``: a string representing the encoding to use if the contents are
284-
non-ascii, for python versions prior to 3
345+
non-ascii, for python versions prior to 3
285346

286347
Writing a formatted string
287348
~~~~~~~~~~~~~~~~~~~~~~~~~~

doc/source/merging.rst

+2-2
@@ -318,11 +318,11 @@ Here's a description of what each argument is for:
318318
- ``right``: Another DataFrame object
319319
- ``on``: Columns (names) to join on. Must be found in both the left and
320320
right DataFrame objects. If not passed and ``left_index`` and
321-
``right_index`` are ``False``, the intersectino of the columns in the
321+
``right_index`` are ``False``, the intersection of the columns in the
322322
DataFrames will be inferred to be the join keys
323323
- ``left_on``: Columns from the left DataFrame to use as keys. Can either be
324324
column names or arrays with length equal to the length of the DataFrame
325-
- ``right_on``: Columns from the left DataFrame to use as keys. Can either be
325+
- ``right_on``: Columns from the right DataFrame to use as keys. Can either be
326326
column names or arrays with length equal to the length of the DataFrame
327327
- ``left_index``: If ``True``, use the index (row labels) from the left
328328
DataFrame as its join key(s). In the case of a DataFrame with a MultiIndex

doc/source/r_interface.rst

+5-3
@@ -15,8 +15,10 @@ rpy2 / R interface
1515
If your computer has R and rpy2 (> 2.2) installed (which will be left to the
1616
reader), you will be able to leverage the below functionality. On Windows,
1717
doing this is quite an ordeal at the moment, but users on Unix-like systems
18-
should find it quite easy. As a general rule, I would recommend using the
19-
latest revision of rpy2 from bitbucket:
18+
should find it quite easy. rpy2 evolves over time, and the current interface is
19+
designed for the 2.2.x series. We recommend using that series over others
20+
unless you are prepared to fix parts of the code. Released packages are available
21+
on PyPI; if you want the latest code in the 2.2.x series, it can be obtained with:
2022

2123
::
2224

@@ -25,7 +27,7 @@ latest revision of rpy2 from bitbucket:
2527

2628
cd rpy2
2729
hg pull
28-
hg update
30+
hg update version_2.2.x
2931
sudo python setup.py install
3032

3133
.. note::

doc/source/timeseries.rst

+1
@@ -77,6 +77,7 @@ We could have done the same thing with ``DateOffset``:
7777

7878
.. ipython:: python
7979
80+
from pandas.core.datetools import *
8081
d + DateOffset(months=4, days=5)
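
As a rough illustration of what adding a months-and-days offset means, the sketch below uses only the standard library. It assumes that the month shift is applied first, that the day is clipped to the last valid day of the target month, and that the day shift is added afterwards; the helper ``add_offset`` and the sample dates are hypothetical and do not come from the pandas source.

```python
import calendar
from datetime import datetime, timedelta


def add_offset(d, months=0, days=0):
    """Shift `d` by whole months, clip the day, then add `days`.

    Hypothetical stand-in for a months/days offset, for illustration only.
    """
    month_index = d.month - 1 + months          # zero-based month count
    year = d.year + month_index // 12
    month = month_index % 12 + 1
    last_day = calendar.monthrange(year, month)[1]
    shifted = d.replace(year=year, month=month, day=min(d.day, last_day))
    return shifted + timedelta(days=days)


print(add_offset(datetime(2008, 8, 18), months=4, days=5))
print(add_offset(datetime(2012, 1, 31), months=1))
```

The second call shows the clipping assumption: January 31 shifted by one month lands on the last day of February rather than raising an error.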
8182
8283
The key features of a ``DateOffset`` object are:
