Commit 06e140a

Merge commit 'v0.8.1-203-g67121af' into debian
* commit 'v0.8.1-203-g67121af': (193 commits)
  BUG: DataFrame column formatting issue in length-truncated column close pandas-dev#1906
  BUG: override min/max in DatetimeIndex to function as expected close pandas-dev#1895
  BUG: DataFrame mixed-type arithmetic column-wise, fix DataFrame.diff upcasting->object bug close pandas-dev#1896
  BUG: treat nobs=1 >= min_periods case in rolling_std/variance as 0 trivially. close pandas-dev#1884
  TST: skip to_file test if URLError occurs on some systems
  VB: resolve test name conflict and update make script
  DOC: minor change to build script to help auto build process
  DOC: fixed extlinks in sphinx conf
  TST: oops import in wrong place
  TST: skip test_console_encode if sys.stdin.encoding is None
  TST: unit test for pandas-dev#1902 and default to csv.QUOTE_MINIMAL
  Make it possible to set quoting for to_csv
  ENH: clean up pandas-dev#1691 changes, rls note
  ENH: add more possible bool values to read_csv pandas-dev#1295
  BUG: fix rolling_max/min for small inputs and large windows. Add a check that the min_period <= window size. Fixes pandas-dev#1897.
  Mention Ubuntu for NeuroDebian repository
  BUG: don't clobber color keyword in Series.plot, close pandas-dev#1890
  DOC: add intersphinx mapping for python library, close pandas-dev#1556
  BUG: fix mixed-integer .ix indexing bugs. close#1799
  BUG: unicode sheet name in to_excel pandas-dev#1828
  ...
2 parents 23fa6f8 + 67121af commit 06e140a

88 files changed

+4250
-886
lines changed

RELEASE.rst

+140
Original file line numberDiff line numberDiff line change
@@ -22,6 +22,146 @@ Where to get it
2222
* Binary installers on PyPI: http://pypi.python.org/pypi/pandas
2323
* Documentation: http://pandas.pydata.org
2424

25+
pandas 0.9.0
26+
============
27+
28+
**Release date:** NOT YET RELEASED
29+
30+
**New features**
31+
32+
- Add ``str.encode`` and ``str.decode`` to Series (#1706)
33+
- Add `to_latex` method to DataFrame (#1735)
34+
- Add convenient expanding window equivalents of all rolling_* ops (#1785)
35+
- Add Options class to pandas.io.data for fetching options data from Yahoo!
36+
Finance (#1748, #1739)
37+
- Recognize and convert more boolean values in file parsing (Yes, No, TRUE,
38+
FALSE, variants thereof) (#1691)
39+
40+
**Improvements to existing features**
41+
42+
- Add ``flags`` option for ``re.compile`` in some Series.str methods (#1659)
43+
- Parsing of UTC date strings in read_* functions (#1693)
44+
- Handle generator input to Series (#1679)
45+
- Add `na_action='ignore'` to Series.map to quietly propagate NAs (#1661)
46+
- Add args/kwds options to Series.apply (#1829)
47+
- Add inplace option to Series/DataFrame.reset_index (#1797)
48+
- Add quoting option for DataFrame.to_csv (#1902)
49+
50+
**API Changes**
51+
52+
- Deprecated ``day_of_year`` API removed from PeriodIndex, use ``dayofyear``
53+
(#1723)
54+
- Don't modify NumPy suppress printoption at import time
55+
- The internal HDF5 data arrangement for DataFrames has been
56+
transposed. Legacy files will still be readable by HDFStore (#1834, #1824)
57+
- Legacy cruft removed: pandas.stats.misc.quantileTS
58+
- Use ISO8601 format for Period repr: monthly, daily, and on down (#1776)
59+
60+
**Bug fixes**
61+
62+
- Perform arithmetic column-by-column in mixed-type DataFrame to avoid type
63+
upcasting issues. Caused downstream DataFrame.diff bug (#1896)
64+
- Fix matplotlib auto-color assignment when no custom spectrum passed. Also
65+
respect passed color keyword argument (#1711)
66+
- Fix resampling logical error with closed='left' (#1726)
67+
- Fix critical DatetimeIndex.union bugs (#1730, #1719, #1745, #1702)
68+
- Fix critical DatetimeIndex.intersection bug with unanchored offsets (#1708)
69+
- Fix MM-YYYY time series indexing case (#1672)
70+
- Fix case where Categorical group key was not being passed into index in
71+
GroupBy result (#1701)
72+
- Handle Ellipsis in Series.__getitem__/__setitem__ (#1721)
73+
- Fix some bugs with handling datetime64 scalars of other units in NumPy 1.6
74+
and 1.7 (#1717)
75+
- Fix performance issue in MultiIndex.format (#1746)
76+
- Fixed GroupBy bugs interacting with DatetimeIndex asof / map methods (#1677)
77+
- Handle factors with NAs in pandas.rpy (#1615)
78+
- Fix statsmodels import in pandas.stats.var (#1734)
79+
- Fix DataFrame repr/info summary with non-unique columns (#1700)
80+
- Fix Series.iget_value for non-unique indexes (#1694)
81+
- Don't lose tzinfo when passing DatetimeIndex as DataFrame column (#1682)
82+
- Fix tz conversion with time zones that haven't had any DST transitions since
83+
first date in the array (#1673)
84+
- Fix field access with UTC->local conversion on unsorted arrays (#1756)
85+
- Fix isnull handling of array-like (list) inputs (#1755)
86+
- Fix regression in handling of Series in Series constructor (#1671)
87+
- Fix comparison of Int64Index with DatetimeIndex (#1681)
88+
- Fix min_periods handling in new rolling_max/min at array start (#1695)
89+
- Fix errors with how='median' and generic NumPy resampling in some cases
90+
caused by SeriesBinGrouper (#1648, #1688)
91+
- When grouping by level, exclude unobserved levels (#1697)
92+
- Don't lose tzinfo in DatetimeIndex when shifting by different offset (#1683)
93+
- Hack to support storing data with a zero-length axis in HDFStore (#1707)
94+
- Fix DatetimeIndex tz-aware range generation issue (#1674)
95+
- Fix method='time' interpolation with intraday data (#1698)
96+
- Don't plot all-NA DataFrame columns as zeros (#1696)
97+
- Fix bug in scatter_plot with by option (#1716)
98+
- Fix performance problem in infer_freq with lots of non-unique stamps (#1686)
99+
- Fix handling of PeriodIndex as argument to create MultiIndex (#1705)
100+
- Fix re: unicode MultiIndex level names in Series/DataFrame repr (#1736)
101+
- Handle PeriodIndex in to_datetime instance method (#1703)
102+
- Support StaticTzInfo in DatetimeIndex infrastructure (#1692)
103+
- Allow MultiIndex setops with length-0 other type indexes (#1727)
104+
- Fix handling of DatetimeIndex in DataFrame.to_records (#1720)
105+
- Fix handling of general objects in isnull on which bool(...) fails (#1749)
106+
- Fix .ix indexing with MultiIndex ambiguity (#1678)
107+
- Fix .ix setting logic error with non-unique MultiIndex (#1750)
108+
- Basic indexing now works on MultiIndex with > 1000000 elements, regression
109+
from earlier version of pandas (#1757)
110+
- Handle non-float64 dtypes in fast DataFrame.corr/cov code paths (#1761)
111+
- Fix DatetimeIndex.isin to function properly (#1763)
112+
- Fix conversion of array of tz-aware datetime.datetime to DatetimeIndex with
113+
right time zone (#1777)
114+
- Fix DST issues with generating anchored date ranges (#1778)
115+
- Fix issue calling sort on result of Series.unique (#1807)
116+
- Fix numerical issue leading to square root of negative number in
117+
rolling_std (#1840)
118+
- Let Series.str.split accept no arguments (like str.split) (#1859)
119+
- Allow user to have dateutil 2.1 installed on a Python 2 system (#1851)
120+
- Catch ImportError less aggressively in pandas/__init__.py (#1845)
121+
- Fix pip source installation bug when installing from GitHub (#1805)
122+
- Fix error when window size > array size in rolling_apply (#1850)
123+
- Fix pip source installation issues via SSH from GitHub
124+
- Fix OLS.summary when column is a tuple (#1837)
125+
- Fix bug in __doc__ patching when -OO passed to interpreter (#1792, #1741)
126+
- Fix unicode console encoding issue in IPython notebook (#1782, #1768)
127+
- Fix unicode formatting issue with Series.name (#1782)
128+
- Fix bug in DataFrame.duplicated with datetime64 columns (#1833)
129+
- Fix bug in Panel internals resulting in error when doing fillna after
130+
truncate not changing size of panel (#1823)
131+
- Prevent segfault due to MultiIndex not being supported in HDFStore table
132+
format (#1848)
133+
- Fix UnboundLocalError in Panel.__setitem__ and add better error (#1826)
134+
- Fix to_csv issues with list of string entries. Isnull works on list of
135+
strings now too (#1791)
136+
- Fix Timestamp comparisons with datetime values outside the nanosecond range
137+
(1677-2262)
138+
- Revert to prior behavior of normalize_date with datetime.date objects
139+
(return datetime)
140+
- Fix broken interaction between np.nansum and Series.any/all
141+
- Fix bug with multiple column date parsers (#1866)
142+
- DatetimeIndex.union(Int64Index) was broken
143+
- Make plot x vs y interface consistent with integer indexing (#1842)
144+
- set_index inplace modified data even if unique check fails (#1831)
145+
- Only use Q-OCT/NOV/DEC in quarterly frequency inference (#1789)
146+
- Upcast to dtype=object when unstacking boolean DataFrame (#1820)
147+
- Fix float64/float32 merging bug (#1849)
148+
- Fixes to Period.start_time for non-daily frequencies (#1857)
149+
- Fix failure when converter used on index_col in read_csv (#1835)
150+
- Implement PeriodIndex.append so that pandas.concat works correctly (#1815)
151+
- Avoid Cython out-of-bounds access causing segfault sometimes in pad_2d,
152+
backfill_2d
153+
- Fix resampling error with intraday times and anchored target time (like
154+
AS-DEC) (#1772)
155+
- Fix .ix indexing bugs with mixed-integer indexes (#1799)
156+
- Respect passed color keyword argument in Series.plot (#1890)
157+
- Fix rolling_min/max when the window is larger than the size of the input
158+
array. Check other malformed inputs (#1899, #1897)
159+
- Rolling variance / standard deviation with only a single observation in
160+
window (#1884)
161+
- Fix unicode sheet name failure in to_excel (#1828)
162+
- Override DatetimeIndex.min/max to return Timestamp objects (#1895)
163+
- Fix column name formatting issue in length-truncated column (#1906)
164+
25165
pandas 0.8.1
26166
============
27167

doc/make.py

+6-4
Original file line numberDiff line numberDiff line change
@@ -150,16 +150,18 @@ def sendmail(step=None, err_msg=None):
150150
finally:
151151
server.close()
152152

153-
def _get_dir():
153+
def _get_dir(subdir=None):
154154
import getpass
155155
USERNAME = getpass.getuser()
156156
if sys.platform == 'darwin':
157157
HOME = '/Users/%s' % USERNAME
158158
else:
159159
HOME = '/home/%s' % USERNAME
160160

161-
tmp_dir = '%s/tmp' % HOME
162-
return tmp_dir
161+
if subdir is None:
162+
subdir = '/code/scripts/config'
163+
conf_dir = '%s/%s' % (HOME, subdir)
164+
return conf_dir
163165

164166
def _get_credentials():
165167
tmp_dir = _get_dir()
@@ -177,7 +179,7 @@ def _get_credentials():
177179

178180
def _get_config():
179181
tmp_dir = _get_dir()
180-
with open('%s/config' % tmp_dir, 'r') as fh:
182+
with open('%s/addresses' % tmp_dir, 'r') as fh:
181183
from_name, to_name = fh.read().split(',')
182184
return from_name, to_name
183185
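
Review note on the new `_get_dir` in doc/make.py: because the default `subdir` begins with a slash and is then formatted with `'%s/%s'`, the resulting path contains a doubled slash. That is normally harmless on POSIX systems. The following is only a minimal sketch of an alternative, not part of the commit; the function name `get_dir_sketch` and the sample home directory are hypothetical.

```python
import posixpath


def get_dir_sketch(home, subdir=None):
    """Hypothetical variant of _get_dir that joins path parts explicitly."""
    if subdir is None:
        # Same default location as in the diff, without the leading slash.
        subdir = 'code/scripts/config'
    # Strip any leading slash so the join does not discard `home`.
    return posixpath.join(home, subdir.lstrip('/'))


# What the committed string formatting produces for the default subdir.
formatted = '%s/%s' % ('/home/alice', '/code/scripts/config')
print(formatted)                      # /home/alice//code/scripts/config

# What the sketch produces for the same inputs.
print(get_dir_sketch('/home/alice'))  # /home/alice/code/scripts/config
```

`posixpath` is used here instead of `os.path` only so that the printed result is the same on every platform.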

doc/source/api.rst

+21
Original file line numberDiff line numberDiff line change
@@ -81,6 +81,27 @@ Standard moving window functions
8181
rolling_apply
8282
rolling_quantile
8383

84+
Standard expanding window functions
85+
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
86+
87+
.. currentmodule:: pandas.stats.moments
88+
89+
.. autosummary::
90+
:toctree: generated/
91+
92+
expanding_count
93+
expanding_sum
94+
expanding_mean
95+
expanding_median
96+
expanding_var
97+
expanding_std
98+
expanding_corr
99+
expanding_cov
100+
expanding_skew
101+
expanding_kurt
102+
expanding_apply
103+
expanding_quantile
104+
84105
Exponentially-weighted moving window functions
85106
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
86107

doc/source/basics.rst

+8
Original file line numberDiff line numberDiff line change
@@ -876,6 +876,14 @@ Methods like ``replace`` and ``findall`` take regular expressions, too:
876876
s3
877877
s3.str.replace('^.a|dog', 'XX-XX ', case=False)
878878
879+
Methods like ``contains``, ``startswith``, and ``endswith`` take an extra
880+
``na`` argument so missing values can be considered True or False:
881+
882+
.. ipython:: python
883+
884+
s4 = Series(['A', 'B', 'C', 'Aaba', 'Baca', np.nan, 'CABA', 'dog', 'cat'])
885+
s4.str.contains('A', na=False)
886+
879887
.. csv-table::
880888
:header: "Method", "Description"
881889
:widths: 20, 80

doc/source/computation.rst

+74-1
Original file line numberDiff line numberDiff line change
@@ -192,7 +192,7 @@ accept the following arguments:
192192
- ``window``: size of moving window
193193
- ``min_periods``: threshold of non-null data points to require (otherwise
194194
result is NA)
195-
- ``freq``: optionally specify a :ref: `frequency string <timeseries.alias>`
195+
- ``freq``: optionally specify a :ref:`frequency string <timeseries.alias>`
196196
or :ref:`DateOffset <timeseries.offsets>` to pre-conform the data to.
197197
Note that prior to pandas v0.8.0, a keyword argument ``time_rule`` was used
198198
instead of ``freq`` that referred to the legacy time rule constants
@@ -288,6 +288,79 @@ columns using ``ix`` indexing:
288288
@savefig rolling_corr_pairwise_ex.png width=4.5in
289289
correls.ix[:, 'A', 'C'].plot()
290290
291+
Expanding window moment functions
292+
---------------------------------
293+
A common alternative to rolling statistics is to use an *expanding* window,
294+
which yields the value of the statistic with all the data available up to that
295+
point in time. As these calculations are a special case of rolling statistics,
296+
they are implemented in pandas such that the following two calls are equivalent:
297+
298+
.. ipython:: python
299+
300+
rolling_mean(df, window=len(df), min_periods=1)[:5]
301+
302+
expanding_mean(df)[:5]
303+
304+
Like the ``rolling_`` functions, the following methods are included in the
305+
``pandas`` namespace or can be located in ``pandas.stats.moments``.
306+
307+
.. csv-table::
308+
:header: "Function", "Description"
309+
:widths: 20, 80
310+
311+
``expanding_count``, Number of non-null observations
312+
``expanding_sum``, Sum of values
313+
``expanding_mean``, Mean of values
314+
``expanding_median``, Arithmetic median of values
315+
``expanding_min``, Minimum
316+
``expanding_max``, Maximum
317+
``expanding_std``, Unbiased standard deviation
318+
``expanding_var``, Unbiased variance
319+
``expanding_skew``, Unbiased skewness (3rd moment)
320+
``expanding_kurt``, Unbiased kurtosis (4th moment)
321+
``expanding_quantile``, Sample quantile (value at %)
322+
``expanding_apply``, Generic apply
323+
``expanding_cov``, Unbiased covariance (binary)
324+
``expanding_corr``, Correlation (binary)
325+
``expanding_corr_pairwise``, Pairwise correlation of DataFrame columns
326+
327+
Aside from not having a ``window`` parameter, these functions have the same
328+
interfaces as their ``rolling_`` counterparts. Like above, the parameters they
329+
all accept are:
330+
331+
- ``min_periods``: threshold of non-null data points to require. Defaults to
332+
minimum needed to compute statistic. No ``NaNs`` will be output once
333+
``min_periods`` non-null data points have been seen.
334+
- ``freq``: optionally specify a :ref:`frequency string <timeseries.alias>`
335+
or :ref:`DateOffset <timeseries.offsets>` to pre-conform the data to.
336+
Note that prior to pandas v0.8.0, a keyword argument ``time_rule`` was used
337+
instead of ``freq`` that referred to the legacy time rule constants
338+
339+
.. note::
340+
341+
The ``rolling_`` and ``expanding_`` functions do not return a
342+
``NaN`` if there are at least ``min_periods`` non-null values in the current
343+
window. This differs from ``cumsum``, ``cumprod``, ``cummax``, and
344+
``cummin``, which return ``NaN`` in the output wherever a ``NaN`` is
345+
encountered in the input.
346+
347+
An expanding window statistic will be more stable (and less responsive) than
348+
its rolling window counterpart as the increasing window size decreases the
349+
relative impact of an individual data point. As an example, here is the
350+
``expanding_mean`` output for the previous time series dataset:
351+
352+
.. ipython:: python
353+
:suppress:
354+
355+
plt.close('all')
356+
357+
.. ipython:: python
358+
359+
ts.plot(style='k--')
360+
361+
@savefig expanding_mean_frame.png width=4.5in
362+
expanding_mean(ts).plot(style='k')
363+
291364
Exponentially weighted moment functions
292365
---------------------------------------
293366
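
Review note on the expanding-window section added to doc/source/computation.rst: the equivalence it states (an expanding statistic is a rolling statistic with `window=len(data)` and `min_periods=1`) and the NaN behaviour from its note can be illustrated without pandas. The following is a plain-Python sketch under stated assumptions. The helpers `rolling_mean_py`, `expanding_mean_py`, and `cumsum_py` are hypothetical stand-ins written for this illustration and are not the pandas implementations.

```python
import math


def rolling_mean_py(values, window, min_periods):
    """Mean of the non-NaN items among the trailing `window` values."""
    out = []
    for i in range(len(values)):
        start = max(0, i - window + 1)
        valid = [v for v in values[start:i + 1] if not math.isnan(v)]
        if valid and len(valid) >= min_periods:
            out.append(sum(valid) / len(valid))
        else:
            out.append(math.nan)
    return out


def expanding_mean_py(values, min_periods=1):
    """Expanding mean expressed as a rolling mean over the whole input."""
    return rolling_mean_py(values, window=len(values), min_periods=min_periods)


def cumsum_py(values):
    """Cumulative sum that emits NaN wherever the input is NaN."""
    total, out = 0.0, []
    for v in values:
        if math.isnan(v):
            out.append(math.nan)
        else:
            total += v
            out.append(total)
    return out


data = [1.0, 2.0, math.nan, 4.0]
expanding = expanding_mean_py(data)
rolling = rolling_mean_py(data, window=len(data), min_periods=1)
cumulative = cumsum_py(data)

print(expanding == rolling)  # True
print(expanding[2])          # 1.5 (no NaN: two non-null values seen so far)
print(cumulative[2])         # nan (NaN in the input at this position)
```

The third position is the one to look at: the expanding mean still reports a value there because `min_periods` has been satisfied, while the cumulative sum reports NaN because the input is NaN at that position.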

doc/source/conf.py

+8-7
Original file line numberDiff line numberDiff line change
@@ -233,16 +233,17 @@
233233

234234

235235
# Example configuration for intersphinx: refer to the Python standard library.
236-
# intersphinx_mapping = {'http://docs.scipy.org/': None}
236+
intersphinx_mapping = {
237+
'statsmodels' : ('http://statsmodels.sourceforge.net/devel/', None),
238+
'python': ('http://docs.python.org/', None)
239+
}
237240
import glob
238241
autosummary_generate = glob.glob("*.rst")
239242

240243
# extlinks alias
241244
extlinks = {'issue': ('https://github.com/pydata/pandas/issues/%s',
242-
'issue ')}
243-
244-
extlinks = {'pull request': ('https://github.com/pydata/pandas/pulls/%s',
245-
'pull request ')}
246-
247-
extlinks = {'wiki': ('https://github.com/pydata/pandas/pulls/%s',
245+
'issue '),
246+
'pull request': ('https://github.com/pydata/pandas/pulls/%s',
247+
'pull request '),
248+
'wiki': ('https://github.com/pydata/pandas/pulls/%s',
248249
'wiki ')}
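
Review note on the `extlinks` change in doc/source/conf.py: the removed lines assigned to `extlinks` three times, so each assignment replaced the previous dictionary and only the last alias survived. The following sketch reproduces both forms with the values copied from the diff as-is; the variable names `old_keys` and `new_keys` are introduced only for this illustration.

```python
# Old form: three separate assignments rebinding the same name.
extlinks = {'issue': ('https://github.com/pydata/pandas/issues/%s',
                      'issue ')}
extlinks = {'pull request': ('https://github.com/pydata/pandas/pulls/%s',
                             'pull request ')}
extlinks = {'wiki': ('https://github.com/pydata/pandas/pulls/%s',
                     'wiki ')}
old_keys = sorted(extlinks)

# New form: one dictionary holding all three aliases.
extlinks = {'issue': ('https://github.com/pydata/pandas/issues/%s',
                      'issue '),
            'pull request': ('https://github.com/pydata/pandas/pulls/%s',
                             'pull request '),
            'wiki': ('https://github.com/pydata/pandas/pulls/%s',
                     'wiki ')}
new_keys = sorted(extlinks)

print(old_keys)  # ['wiki']
print(new_keys)  # ['issue', 'pull request', 'wiki']
```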

doc/source/dsintro.rst

+8-3
Original file line numberDiff line numberDiff line change
@@ -140,15 +140,20 @@ label:
140140
'e' in s
141141
'f' in s
142142
143-
If a label is not contained, an exception
143+
If a label is not contained, an exception is raised:
144144

145145
.. code-block:: python
146146
147147
>>> s['f']
148148
KeyError: 'f'
149149
150-
>>> s.get('f')
151-
nan
150+
Using the ``get`` method, a missing label will return None or the specified default:
151+
152+
.. ipython:: python
153+
154+
s.get('f')
155+
156+
s.get('f', np.nan)
152157
153158
Vectorized operations and label alignment with Series
154159
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

doc/source/index.rst

+1-1
Original file line numberDiff line numberDiff line change
@@ -89,7 +89,7 @@ Some other notes
8989
on one feature for your application you may be able to create a faster
9090
specialized tool.
9191

92-
- pandas will soon become a dependency of `statsmodels
92+
- pandas is a dependency of `statsmodels
9393
<http://statsmodels.sourceforge.net>`__, making it an important part of the
9494
statistical computing ecosystem in Python.
9595

doc/source/install.rst

+2-2
Original file line numberDiff line numberDiff line change
@@ -48,7 +48,7 @@ ___________
4848
Windows, all, stable, :ref:`all-platforms`, ``pip install pandas``
4949
Mac, all, stable, :ref:`all-platforms`, ``pip install pandas``
5050
Linux, Debian, stable, `official Debian repository <http://packages.debian.org/search?keywords=pandas&searchon=names&suite=all&section=all>`_ , ``sudo apt-get install python-pandas``
51-
Linux, Debian, unstable (latest packages), `NeuroDebian <http://neuro.debian.net/index.html#how-to-use-this-repository>`_ , ``sudo apt-get install python-pandas``
51+
Linux, Debian & Ubuntu, unstable (latest packages), `NeuroDebian <http://neuro.debian.net/index.html#how-to-use-this-repository>`_ , ``sudo apt-get install python-pandas``
5252
Linux, Ubuntu, stable, `official Ubuntu repository <http://packages.ubuntu.com/search?keywords=pandas&searchon=names&suite=all&section=all>`_ , ``sudo apt-get install python-pandas``
5353
Linux, Ubuntu, unstable (daily builds), `PythonXY PPA <https://code.launchpad.net/~pythonxy/+archive/pythonxy-devel>`_; activate by: ``sudo add-apt-repository ppa:pythonxy/pythonxy-devel && sudo apt-get update``, ``sudo apt-get install python-pandas``
5454
Linux, OpenSuse & Fedora, stable, `OpenSuse Repository <http://software.opensuse.org/package/python-pandas?search_term=pandas>`_ , ``zypper in python-pandas``
@@ -74,7 +74,7 @@ Optional dependencies
7474
* `SciPy <http://www.scipy.org>`__: miscellaneous statistical functions
7575
* `PyTables <http://www.pytables.org>`__: necessary for HDF5-based storage
7676
* `matplotlib <http://matplotlib.sourceforge.net/>`__: for plotting
77-
* `scikits.statsmodels <http://statsmodels.sourceforge.net/>`__
77+
* `statsmodels <http://statsmodels.sourceforge.net/>`__: 0.4.0 or higher
7878
* Needed for parts of :mod:`pandas.stats`
7979
* `pytz <http://pytz.sourceforge.net/>`__
8080
* Needed for time zone support with ``date_range``
