Commit 9094e83

Merge commit 'v0.9.0-166-g7156920' into debian
* commit 'v0.9.0-166-g7156920': (159 commits)
  DOC: spacing fix
  BUG: catch any/all case that fails in NumPy > 1.6 with stricter casting rules
  DOC: fixed up release notes a little more
  TST: misc coverage and cleanup
  Additional test coverage for v0.9.1
  DOC: add note about converters now blocking type coersion in file parsers
  BUG: partial slicing bugs for PeriodIndex
  BUG: account for different fp exponent formatting in some pythons
  BUG: parse floats outside of PyFloat_FromString for python 2.5
  BUG: do not coerce types in parser if converters specified pandas-dev#2184
  TST: more misc test coverage
  TST: take out IPython check for timeseries plotting tests
  TST: fix panel/test_panel file modes. don't skip plotting tests if ipython imported
  TST: rogue foo
  TST: disable test_console_encode post unicode refactor
  TST: misc test coverage
  DOC: remove note on not overriding mpl registry
  BUG: put mpl unit registry back in for set_xlim
  BUG: ensure axes are Index objects in _arrays_to_mgr
  DOC: examples for release notes
  ...
2 parents e5fe126 + 7156920 commit 9094e83

114 files changed, +4906 -1925 lines changed

CONTRIBUTING.md

+1
@@ -0,0 +1 @@
+Please see [Developers](http://pandas.pydata.org/developers.html) page on the project website.

RELEASE.rst

+72
@@ -22,6 +22,78 @@ Where to get it
 * Binary installers on PyPI: http://pypi.python.org/pypi/pandas
 * Documentation: http://pandas.pydata.org

+pandas 0.9.1
+============
+
+**Release date:** NOT YET RELEASED
+
+**New features**
+
+- Can specify multiple sort orders in DataFrame/Series.sort/sort_index (#928)
+- New `top` and `bottom` options for handling NAs in rank (#1508, #2159)
+- Add `where` and `mask` functions to DataFrame (#2109, #2151)
+- Add `at_time` and `between_time` functions to DataFrame (#2149)
+
+**API Changes**
+
+- Upsampling period index "spans" intervals. Example: annual periods
+  upsampled to monthly will span all months in each year
+- Period.end_time will yield timestamp at last nanosecond in the interval
+  (#2124, #2125, #1764)
+- File parsers no longer coerce to float or bool for columns that have custom
+  converters specified (#2184)
+
+**Improvements to existing features**
+
+- Time rule inference for week-of-month (e.g. WOM-2FRI) rules (#2140)
+- Improve performance of datetime + business day offset with large number of
+  offset periods
+- Improve HTML display of DataFrame objects with hierarchical columns
+- Enable referencing of Excel columns by their column names (#1936)
+- DataFrame.dot can accept ndarrays (#2042)
+- Support negative periods in Panel.shift (#2164)
+- Make .drop(...) work with non-unique indexes (#2101)
+- Improve performance of Series/DataFrame.diff (re: #2087)
+- Support unary ~ (__invert__) in DataFrame (#2110)
+
+**Bug fixes**
+
+- Fix some duplicate-column DataFrame constructor issues (#2079)
+- Fix bar plot color cycle issues (#2082)
+- Fix off-center grid for stacked bar plots (#2157)
+- Fix plotting bug if inferred frequency is offset with N > 1 (#2126)
+- Implement comparisons on date offsets with fixed delta (#2078)
+- Handle inf/-inf correctly in read_* parser functions (#2041)
+- Fix matplotlib unicode interaction bug
+- Make WLS r-squared match statsmodels 0.5.0 fixed value
+- Fix zero-trimming DataFrame formatting bug
+- Correctly compute/box datetime64 min/max values from Series.min/max (#2083)
+- Fix unstacking edge case with unrepresented groups (#2100)
+- Fix Series.str failures when using pipe pattern '|' (#2119)
+- Fix pretty-printing of dict entries in Series, DataFrame (#2144)
+- Cast other datetime64 values to nanoseconds in DataFrame ctor (#2095)
+- Alias Timestamp.astimezone to tz_convert, so will yield Timestamp (#2060)
+- Fix timedelta64 formatting from Series (#2165, #2146)
+- Handle None values gracefully in dict passed to Panel constructor (#2075)
+- Box datetime64 values as Timestamp objects in Series/DataFrame.iget (#2148)
+- Fix Timestamp indexing bug in DatetimeIndex.insert (#2155)
+- Use index name(s) (if any) in DataFrame.to_records (#2161)
+- Don't lose index names in Panel.to_frame/DataFrame.to_panel (#2163)
+- Work around length-0 boolean indexing NumPy bug (#2096)
+- Fix partial integer indexing bug in DataFrame.xs (#2107)
+- Fix variety of cut/qcut string-bin formatting bugs (#1978, #1979)
+- Raise Exception when xs view not possible of MultiIndex'd DataFrame (#2117)
+- Fix groupby(...).first() issue with datetime64 (#2133)
+- Better floating point error robustness in some rolling_* functions (#2114)
+- Fix ewma NA handling in the middle of Series (#2128)
+- Fix numerical precision issues in diff with integer data (#2087)
+- Fix bug in MultiIndex.__getitem__ with NA values (#2008)
+- Fix DataFrame.from_records dict-arg bug when passing columns (#2179)
+- Fix Series and DataFrame.diff for integer dtypes (#2087, #2174)
+- Fix bug when taking intersection of DatetimeIndex with empty index (#2129)
+- Pass through timezone information when calling DataFrame.align (#2127)
+
+
 pandas 0.9.0
 ============
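
A few of the 0.9.1 entries in the RELEASE.rst hunk can be tried out with a short sketch. It assumes a current pandas imported as `pd` (the docs of that era used `from pandas import *`), and all sample data and variable names are made up for illustration. It covers `where`/`mask`, unary `~`, the `top`/`bottom` NA options of `rank`, `at_time`/`between_time`, and the converter behaviour from #2184.

```python
from io import StringIO

import numpy as np
import pandas as pd

# Made-up sample data, for illustration only.
df = pd.DataFrame({"a": [1.0, -2.0, 3.0], "b": [-4.0, 5.0, -6.0]})

# where keeps entries that satisfy the condition and fills the rest with NaN;
# mask does the opposite.
kept = df.where(df > 0)
hidden = df.mask(df > 0)

# Unary ~ inverts a boolean DataFrame element-wise.
not_positive = ~(df > 0)

# rank: place missing values first ("top") or last ("bottom").
s = pd.Series([3.0, np.nan, 1.0])
na_first = s.rank(na_option="top")
na_last = s.rank(na_option="bottom")

# at_time / between_time select rows by time of day on a DatetimeIndex.
idx = pd.to_datetime(["2012-11-01 09:00", "2012-11-01 10:00",
                      "2012-11-01 11:00", "2012-11-01 12:00"])
ts = pd.DataFrame({"x": [0, 1, 2, 3]}, index=idx)
at_ten = ts.at_time("10:00")
mid_morning = ts.between_time("10:00", "11:00")

# A custom converter's output is kept as-is: the zip codes stay strings
# instead of being coerced to integers.
csv_text = "zip,n\n01234,1\n00501,2\n"
parsed = pd.read_csv(StringIO(csv_text), converters={"zip": str})
```

Method names such as `DataFrame.sort` and the `Panel` class mentioned in the notes are left out because they may not exist in a current install.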

bench/bench_merge.py

+10-9
@@ -47,9 +47,9 @@ def get_test_data(ngroups=100, n=N):


 join_methods = ['inner', 'outer', 'left', 'right']
-results = DataFrame(index=join_methods, columns=[False])
+results = DataFrame(index=join_methods, columns=[False, True])
 niter = 10
-for sort in [False]:
+for sort in [False, True]:
     for join_method in join_methods:
         f = lambda: merge(left, right, how=join_method, sort=sort)
         gc.disable()
@@ -59,8 +59,8 @@ def get_test_data(ngroups=100, n=N):
         elapsed = (time.time() - start) / niter
         gc.enable()
         results[sort][join_method] = elapsed
-results.columns = ['pandas']
-# results.columns = ['dont_sort', 'sort']
+# results.columns = ['pandas']
+results.columns = ['dont_sort', 'sort']


 # R results
@@ -73,20 +73,21 @@ def get_test_data(ngroups=100, n=N):
 right 0.3102 0.0536 0.0376
 """), sep='\s+')

-all_results = results.join(r_results)
+presults = results[['dont_sort']].rename(columns={'dont_sort': 'pandas'})
+all_results = presults.join(r_results)

 all_results = all_results.div(all_results['pandas'], axis=0)

 all_results = all_results.ix[:, ['pandas', 'data.table', 'plyr', 'base::merge']]

 sort_results = DataFrame.from_items([('pandas', results['sort']),
-                                     ('R', r_results['sort'])])
+                                     ('R', r_results['base::merge'])])
 sort_results['Ratio'] = sort_results['R'] / sort_results['pandas']


 nosort_results = DataFrame.from_items([('pandas', results['dont_sort']),
-                                       ('R', r_results['dont_sort'])])
-nosort_results['Ratio'] = sort_results['R'] / sort_results['pandas']
+                                       ('R', r_results['base::merge'])])
+nosort_results['Ratio'] = nosort_results['R'] / nosort_results['pandas']

 # many to many

@@ -99,6 +100,6 @@ def get_test_data(ngroups=100, n=N):
 right 0.6425 0.0522 0.0428
 """), sep='\s+')

-all_results = results.join(r_results)
+all_results = presults.join(r_results)
 all_results = all_results.div(all_results['pandas'], axis=0)
 all_results = all_results.ix[:, ['pandas', 'data.table', 'plyr', 'base::merge']]
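
The timing loop this diff extends to `sort in [False, True]` can be approximated against a current pandas as below. This is only a sketch: the `left` and `right` frames are small synthetic stand-ins (the real ones come from `get_test_data`, which is not shown in this diff), `niter` is reduced, and the `ratio` column is an addition for illustration, not something the benchmark script computes.

```python
import gc
import time

import numpy as np
import pandas as pd

# Synthetic stand-ins for the benchmark's left/right frames (assumption).
rng = np.random.default_rng(0)
left = pd.DataFrame({"key": rng.integers(0, 100, 2000),
                     "lval": rng.standard_normal(2000)})
right = pd.DataFrame({"key": rng.integers(0, 100, 200),
                      "rval": rng.standard_normal(200)})

join_methods = ["inner", "outer", "left", "right"]
niter = 3
timings = {}
for sort in (False, True):
    column = {}
    for how in join_methods:
        gc.disable()  # keep garbage collection out of the measurement
        try:
            start = time.perf_counter()
            for _ in range(niter):
                pd.merge(left, right, on="key", how=how, sort=sort)
            column[how] = (time.perf_counter() - start) / niter
        finally:
            gc.enable()
    timings["sort" if sort else "dont_sort"] = column

# One row per join method, one column per sort setting.
results = pd.DataFrame(timings).reindex(join_methods)
results["ratio"] = results["sort"] / results["dont_sort"]
```

Naming the columns `dont_sort` and `sort` up front sidesteps the kind of mix-up the diff fixes, where `nosort_results['Ratio']` was computed from `sort_results`.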

bench/bench_merge_sqlite.py

+2-2
@@ -74,8 +74,8 @@
 conn.commit()

 sql_results[sort][join_method] = elapsed
-sql_results.columns = ['sqlite3'] # ['dont_sort', 'sort']
-sql_results.index = ['inner', 'outer', 'left']
+sql_results.columns = ['sqlite3'] # ['dont_sort', 'sort']
+sql_results.index = ['inner', 'outer', 'left']

 sql = """select *
 from left

doc/data/test.xls

30 KB
Binary file not shown.

doc/source/basics.rst

-4
@@ -110,15 +110,11 @@ Series input is of primary interest. Using these functions, you can use to
 either match on the *index* or *columns* via the **axis** keyword:

 .. ipython:: python
-   :suppress:

    d = {'one' : Series(randn(3), index=['a', 'b', 'c']),
         'two' : Series(randn(4), index=['a', 'b', 'c', 'd']),
         'three' : Series(randn(3), index=['b', 'c', 'd'])}
    df = DataFrame(d)
-
-.. ipython:: python
-
    df
    row = df.ix[1]
    column = df['two']

doc/source/dsintro.rst

+1-1
@@ -26,7 +26,7 @@ objects. To get started, import numpy and load pandas into your namespace:
    randn = np.random.randn
    from pandas import *

-Here is a basic tenet to keep in mind: **data alignment is intrinsic**. Link
+Here is a basic tenet to keep in mind: **data alignment is intrinsic**. The link
 between labels and data will not be broken unless done so explicitly by you.

 We'll give a brief intro to the data structures, then consider all of the broad

doc/source/indexing.rst

+1-1
@@ -91,7 +91,7 @@ the data structures:

 There is an analogous ``set_value`` method which has the additional capability
 of enlarging an object. This method *always* returns a reference to the object
-it modified, which in the fast of enlargement, will be a **new object**:
+it modified, which in the case of enlargement, will be a **new object**:

 .. ipython:: python

doc/source/io.rst

+9-2
@@ -164,7 +164,7 @@ You can also use a list of columns to create a hierarchical index:

 The ``dialect`` keyword gives greater flexibility in specifying the file format.
 By default it uses the Excel dialect but you can specify either the dialect name
-or a :class:``python:csv.Dialect`` instance.
+or a :class:`python:csv.Dialect` instance.

 .. ipython:: python
    :suppress:
@@ -286,6 +286,13 @@ data columns:
    index_col=0) #index is the nominal column
    df

+**Note**: When passing a dict as the `parse_dates` argument, the order of
+the columns prepended is not guaranteed, because `dict` objects do not impose
+an ordering on their keys. On Python 2.7+ you may use `collections.OrderedDict`
+instead of a regular `dict` if this matters to you. Because of this, when using a
+dict for 'parse_dates' in conjunction with the `index_col` argument, it's best to
+specify `index_col` as a column label rather then as an index on the resulting frame.
+
 Date Parsing Functions
 ~~~~~~~~~~~~~~~~~~~~~~
 Finally, the parser allows you can specify a custom ``date_parser`` function to
@@ -647,7 +654,7 @@ function takes a number of arguments. Only the first is required.
     (default), and `header` and `index` are True, then the index names are
     used. (A sequence should be given if the DataFrame uses MultiIndex).
   - ``mode`` : Python write mode, default 'w'
-  - ``sep`` : Field delimiter for the output file (default "'")
+  - ``sep`` : Field delimiter for the output file (default ",")
   - ``encoding``: a string representing the encoding to use if the contents are
     non-ascii, for python versions prior to 3
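
Two of the io.rst fixes concern behaviour that is easy to check by hand: the `dialect` keyword accepting either a dialect name or a `csv.Dialect` instance, and `","` being the default `sep` when writing CSV. The sketch below assumes a current pandas imported as `pd`. `PipeDialect` and the registered name `"pipes"` are hypothetical names invented for the example.

```python
import csv
from io import StringIO

import pandas as pd


class PipeDialect(csv.excel):
    """Hypothetical dialect: Excel rules, but with '|' as the delimiter."""
    delimiter = "|"


data = "a|b\n1|2\n3|4\n"

# Pass a csv.Dialect instance directly...
by_instance = pd.read_csv(StringIO(data), dialect=PipeDialect())

# ...or register the dialect and refer to it by name.
csv.register_dialect("pipes", PipeDialect)
by_name = pd.read_csv(StringIO(data), dialect="pipes")

# With no path and no sep given, the frame is written comma-separated.
written = by_instance.to_csv(index=False)
```

Both reads should yield the same two-column frame, and `written` comes back comma-delimited regardless of the delimiter used on input.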

doc/source/merging.rst

-5
@@ -414,11 +414,6 @@ either the left or right tables, the values in the joined table will be
     ``outer``, ``FULL OUTER JOIN``, Use union of keys from both frames
     ``inner``, ``INNER JOIN``, Use intersection of keys from both frames

-Note that if using the index from either the left or right DataFrame (or both)
-using the ``left_index`` / ``right_index`` options, the join operation is no
-longer a many-to-many join by construction, as the index values are necessarily
-unique. There will be some examples of this below.
-
 .. _merging.join.index:

 Joining on index
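
The note removed from merging.rst claimed that index values are necessarily unique. The commit does not state why it was dropped, but a pandas index may hold duplicate labels, so an index-on-index merge can still be many-to-many. A minimal sketch with made-up frames, assuming a current pandas imported as `pd`:

```python
import pandas as pd

# Both frames repeat the label "a" in their index (made-up data).
left = pd.DataFrame({"lval": [1, 2]}, index=["a", "a"])
right = pd.DataFrame({"rval": [10, 20]}, index=["a", "a"])

# Joining index-on-index pairs every left "a" row with every right "a" row.
joined = pd.merge(left, right, left_index=True, right_index=True, how="inner")
pairs = sorted(zip(joined["lval"], joined["rval"]))
```

Two rows on each side produce four rows in `joined`, one per pairing.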

doc/source/v0.9.0.txt

+1-1
@@ -45,7 +45,7 @@ API changes

 - Creating a Series from another Series, passing an index, will cause reindexing
   to happen inside rather than treating the Series like an ndarray. Technically
-  improper usages like ``Series(df[col1], index=df[col2])11 that worked before
+  improper usages like ``Series(df[col1], index=df[col2])`` that worked before
   "by accident" (this was never intended) will lead to all NA Series in some
   cases. To be perfectly clear:
5151
