Commit 4d8cf56

Merge commit 'v0.4.3' into debian
* commit 'v0.4.3': (97 commits)
  Version 0.4.3
  RLS: release notes
  BUG: to_csv test failure on windows python3
  TST: test failure on windows with Series.to_csv
  TST: skip scikits.statsmodels-depending tests if not installed
  BUG: version_info is a tuple in python <= 2.5
  TST: use to_string instead to avoid futurewarning
  BUG: explicit CSV lineterminator='\n' to work around 3.1/3.2 csv module bug on Win32
  TST: is_lexsorted test failure on py3.1/win32
  BUG: work around unicode bug in py3.1 on win32
  TST: skip test on sparse. release notes
  BLD: only import setuptools in py3k as cython command doesn't currently work
  ENH: print 25%-75% quartiles instead of 10%-90% deciles in describe. Address discussion in GH pandas-dev#196
  RLS: update release notes
  Fix bug writing Series to CSV in Python 3.
  Add trove classifiers for Python 3 compatibility.
  Add isnull and notnull methods to Series.
  RLS: update release notes
  BUG: change python access to buffer access
  ENH: SparseSeries binary op speed enhancement in the block case, address GH pandas-dev#205
  ...
2 parents ba1ff88 + a8b1479 commit 4d8cf56

54 files changed, +5585 -1508 lines changed

README.rst (+2)
@@ -69,6 +69,8 @@ Dependencies
 Optional dependencies
 ~~~~~~~~~~~~~~~~~~~~~

+* `Cython <http://www.cython.org>`__: Only necessary to build development
+  version
 * `SciPy <http://www.scipy.org>`__: miscellaneous statistical functions
 * `PyTables <http://www.pytables.org>`__: necessary for HDF5-based storage
 * `matplotlib <http://matplotlib.sourceforge.net/>`__: for plotting

RELEASE.rst (+140 -1)
@@ -5,11 +5,146 @@ Release Notes
 This is the list of changes to pandas between each release. For full details,
 see the commit logs at http://github.com/wesm/pandas

+pandas 0.4.3
+============
+
+**Release date:** not yet released
+
+This is largely a bugfix release from 0.4.2 but also includes a handful of new
+and enhanced features. Also, pandas can now be installed and used on Python 3
+(thanks Thomas Kluyver!).
+
+**New features / modules**
+
+- Python 3 support using 2to3 (PR #200, Thomas Kluyver)
+- Add `name` attribute to `Series` and added relevant logic and tests. Name
+  now prints as part of `Series.__repr__`
+- Add `name` attribute to standard Index so that stacking / unstacking does
+  not discard names and so that indexed DataFrame objects can be reliably
+  round-tripped to flat files, pickle, HDF5, etc.
+- Add `isnull` and `notnull` as instance methods on Series (PR #209, GH #203)
+
+**Improvements to existing features**
+
+- Skip xlrd-related unit tests if not installed
+- `Index.append` and `MultiIndex.append` can accept a list of Index objects to
+  concatenate together
+- Altered binary operations on differently-indexed SparseSeries objects to use
+  the integer-based (dense) alignment logic which is faster with a larger
+  number of blocks (GH #205)
+- Refactored `Series.__repr__` to be a bit more clean and consistent
+
+**API Changes**
+
+- `Series.describe` and `DataFrame.describe` now bring the 25% and 75%
+  quartiles instead of the 10% and 90% deciles. The other outputs have not
+  changed
+- `Series.toString` will print deprecation warning, has been de-camelCased to
+  `to_string`
+
+**Bug fixes**
+
+- Fix broken interaction between `Index` and `Int64Index` when calling
+  intersection. Implement `Int64Index.intersection`
+- `MultiIndex.sortlevel` discarded the level names (GH #202)
+- Fix bugs in groupby, join, and append due to improper concatenation of
+  `MultiIndex` objects (GH #201)
+- Fix regression from 0.4.1, `isnull` and `notnull` ceased to work on other
+  kinds of Python scalar objects like `datetime.datetime`
+- Raise more helpful exception when attempting to write empty DataFrame or
+  LongPanel to `HDFStore` (GH #204)
+- Use stdlib csv module to properly escape strings with commas in
+  `DataFrame.to_csv` (PR #206, Thomas Kluyver)
+- Fix Python ndarray access in Cython code for sparse blocked index integrity
+  check
+- Fix bug writing Series to CSV in Python 3 (PR #209)
+- Miscellaneous Python 3 bugfixes
+
+Thanks
+------
+
+- Thomas Kluyver
+- rsamson
+
+pandas 0.4.2
+============
+
+**Release date:** 10/3/2011
+
+This is a performance optimization release with several bug fixes. The new
+Int64Index and new merging / joining Cython code and related Python
+infrastructure are the main new additions
+
+**New features / modules**
+
+- Added fast `Int64Index` type with specialized join, union,
+  intersection. Will result in significant performance enhancements for
+  int64-based time series (e.g. using NumPy's datetime64 one day) and also
+  faster operations on DataFrame objects storing record array-like data.
+- Refactored `Index` classes to have a `join` method and associated data
+  alignment routines throughout the codebase to be able to leverage optimized
+  joining / merging routines.
+- Added `Series.align` method for aligning two series with choice of join
+  method
+- Wrote faster Cython data alignment / merging routines resulting in
+  substantial speed increases
+- Added `is_monotonic` property to `Index` classes with associated Cython
+  code to evaluate the monotonicity of the `Index` values
+- Add method `get_level_values` to `MultiIndex`
+- Implemented shallow copy of `BlockManager` object in `DataFrame` internals
+
+**Improvements to existing features**
+
+- Improved performance of `isnull` and `notnull`, a regression from v0.3.0
+  (GH #187)
+- Wrote templating / code generation script to auto-generate Cython code for
+  various functions which need to be available for the 4 major data types
+  used in pandas (float64, bool, object, int64)
+- Refactored code related to `DataFrame.join` so that intermediate aligned
+  copies of the data in each `DataFrame` argument do not need to be
+  created. Substantial performance increases result (GH #176)
+- Substantially improved performance of generic `Index.intersection` and
+  `Index.union`
+- Improved performance of `DateRange.union` with overlapping ranges and
+  non-cacheable offsets (like Minute). Implemented analogous fast
+  `DateRange.intersection` for overlapping ranges.
+- Implemented `BlockManager.take` resulting in significantly faster `take`
+  performance on mixed-type `DataFrame` objects (GH #104)
+- Improved performance of `Series.sort_index`
+- Significant groupby performance enhancement: removed unnecessary integrity
+  checks in DataFrame internals that were slowing down slicing operations to
+  retrieve groups
+- Added informative Exception when passing dict to DataFrame groupby
+  aggregation with axis != 0
+
+**API Changes**
+
+None
+
+**Bug fixes**
+
+- Fixed minor unhandled exception in Cython code implementing fast groupby
+  aggregation operations
+- Fixed bug in unstacking code manifesting with more than 3 hierarchical
+  levels
+- Throw exception when step specified in label-based slice (GH #185)
+- Fix isnull to correctly work with np.float32. Fix upstream bug described in
+  GH #182
+- Finish implementation of as_index=False in groupby for DataFrame
+  aggregation (GH #181)
+- Raise SkipTest for pre-epoch HDFStore failure. Real fix will be sorted out
+  via datetime64 dtype
+
+Thanks
+------
+
+- Uri Laserson
+- Scott Sinclair

 pandas 0.4.1
 ============

-**Release date:** Not yet released
+**Release date:** 9/25/2011

 This is primarily a bug fix release but includes some new features and
 improvements
@@ -42,6 +177,10 @@ improvements
 - Optimized `_ensure_index` function resulting in performance savings in
   type-checking Index objects

+**API Changes**
+
+None
+
 **Bug fixes**

 - Fixed DataFrame constructor bug causing downstream problems (e.g. .copy()
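
Note on the CSV items in RELEASE.rst: the 0.4.3 notes mention using the stdlib csv module to escape strings with commas, and the merged commit list mentions an explicit `lineterminator='\n'`. The snippet below is a minimal standalone sketch of that idea using only the standard library. `rows_to_csv` is a hypothetical helper written for illustration; it is not the pandas `to_csv` implementation.

```python
import csv
import io


def rows_to_csv(header, rows):
    """Serialize rows to CSV text (hypothetical helper, for illustration only)."""
    buf = io.StringIO()
    # An explicit lineterminator keeps the output identical across platforms;
    # the csv module default is '\r\n'.
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)
    return buf.getvalue()


# Fields containing commas or quotes are quoted by the csv module itself.
text = rows_to_csv(["name", "value"], [["a,b", 1.5], ['say "hi"', 2]])
print(text)

# Reading the text back recovers the original string fields unchanged.
parsed = list(csv.reader(io.StringIO(text)))
print(parsed)
```

Delegating quoting to the csv module avoids hand-rolled joins with `','`, which break as soon as a value contains the delimiter.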

TODO.rst (+7)
@@ -0,0 +1,7 @@
+- SparseSeries name integration + tests
+- Refactor Series.repr
+- .name pickling / unpicking / HDFStore handling
+- Is there a way to write hierarchical columns to csv?
+- Possible to blow away existing name when creating MultiIndex?
+- prettytable output with index names
+- Add load/save functions to top level pandas namespace

bench/zoo_bench.R (+25)
@@ -0,0 +1,25 @@
+library(zoo)
+library(xts)
+
+indices = rep(NA, 100000)
+for (i in 1:100000)
+  indices[i] <- paste(sample(letters, 10), collapse="")
+
+timings <- numeric()
+
+## x <- zoo(rnorm(100000), indices)
+## y <- zoo(rnorm(90000), indices[sample(1:100000, 90000)])
+
+## indices <- as.POSIXct(1:100000)
+
+indices <- as.POSIXct(Sys.Date()) + 1:1000000
+
+x <- xts(rnorm(1000000), indices)
+y <- xts(rnorm(900000), indices[sample(1:1000000, 900000)])
+
+for (i in 1:10) {
+  gc()
+  timings[i] = system.time(x + y)[3]
+}
+
+mean(timings)

bench/zoo_bench.py (+40)
@@ -0,0 +1,40 @@
+from pandas import *
+from pandas.util.testing import rands
+
+from la import larry
+
+n = 100000
+indices = Index([rands(10) for _ in xrange(n)])
+
+def sample(values, k):
+    from random import shuffle
+    sampler = np.arange(len(values))
+    shuffle(sampler)
+    return values.take(sampler[:k])
+
+subsample_size = 90000
+
+# x = Series(np.random.randn(100000), indices)
+# y = Series(np.random.randn(subsample_size),
+# index=sample(indices, subsample_size))
+
+
+# lx = larry(np.random.randn(100000), [list(indices)])
+# ly = larry(np.random.randn(subsample_size), [list(y.index)])
+
+stamps = np.random.randint(1000000000, 1000000000000, 2000000)
+
+idx1 = np.sort(sample(stamps, 1000000))
+idx2 = np.sort(sample(stamps, 1000000))
+
+ts1 = Series(np.random.randn(1000000), idx1)
+ts2 = Series(np.random.randn(1000000), idx2)
+
+# Benchmark 1: Two 1-million length time series (int64-based index) with
+# randomly chosen timestamps
+
+# Benchmark 2: Join two 5-variate time series DataFrames (outer and inner join)
+
+df1 = DataFrame(np.random.randn(1000000, 5), idx1, columns=range(5))
+df2 = DataFrame(np.random.randn(1000000, 5), idx2, columns=range(5, 10))
+
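
Note on timing: bench/zoo_bench.R times `x + y` ten times, calling `gc()` before each run and averaging the elapsed time, while bench/zoo_bench.py as committed only builds the data. The snippet below is a rough sketch of how the same loop could look in Python. `mean_timing` is a hypothetical helper, and the list-based workload is a stand-in so the sketch runs without pandas; with the objects from zoo_bench.py one would presumably pass something like `lambda: ts1 + ts2`.

```python
import gc
import time


def mean_timing(func, repeats=10):
    """Average wall-clock seconds per call of func (hypothetical helper)."""
    timings = []
    for _ in range(repeats):
        gc.collect()  # mirrors the gc() call before each run in the R script
        start = time.perf_counter()
        func()
        timings.append(time.perf_counter() - start)
    return sum(timings) / len(timings)


# Stand-in workload (an assumption, not part of the commit): elementwise
# addition of two equal-length lists.
a = list(range(100000))
b = list(range(100000))
elapsed = mean_timing(lambda: [x + y for x, y in zip(a, b)])
print("mean seconds per run: %.6f" % elapsed)
```

Collecting garbage outside the timed region keeps collector pauses from earlier iterations out of the measurement, which is the same reason the R loop calls `gc()` first.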

pandas/core/api.py (+1 -1)
@@ -6,7 +6,7 @@
 import pandas.core.datetools as datetools

 from pandas.core.common import isnull, notnull, set_printoptions
-from pandas.core.index import Index, Factor, MultiIndex
+from pandas.core.index import Index, Int64Index, Factor, MultiIndex
 from pandas.core.daterange import DateRange
 from pandas.core.series import Series, TimeSeries
 from pandas.core.frame import DataFrame
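
Note on `Int64Index`: the release notes in this commit describe it as having specialized join, union, and intersection routines, and they add an `is_monotonic` property to the `Index` classes. One general way to exploit sorted integer labels is a two-pointer merge, sketched below in pure Python. This illustrates that technique under the assumption of sorted, unique labels; it is not the Cython code from this commit, and both function names are hypothetical.

```python
def is_monotonic(values):
    """True if values are in non-decreasing order."""
    return all(a <= b for a, b in zip(values, values[1:]))


def sorted_intersection(left, right):
    """Intersect two sorted, unique integer sequences in one linear pass."""
    i = j = 0
    out = []
    while i < len(left) and j < len(right):
        if left[i] < right[j]:
            i += 1  # left label is too small to match anything remaining
        elif left[i] > right[j]:
            j += 1  # right label is too small to match anything remaining
        else:
            out.append(left[i])  # label present on both sides
            i += 1
            j += 1
    return out


idx1 = [1, 3, 5, 7, 9]
idx2 = [3, 4, 5, 6, 9, 10]
assert is_monotonic(idx1) and is_monotonic(idx2)
common = sorted_intersection(idx1, idx2)
print(common)
```

The pass visits each label at most once, so the cost grows with `len(left) + len(right)` rather than requiring a hash lookup per label.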
