Skip to content

Commit 20c6c91

Browse files
committed
Merge tag 'v0.9.0rc2' into debian
Version 0.9.0 Release Candidate 2 * tag 'v0.9.0rc2': DOC: release notes, bump to RC2 DOC: missed a few for release notes 0.9 DOC: add a few more notes on bug fixes in release.rst BUG: repr fix for all-NA index level. close pandas-dev#1971 BLD: don't link against math library on windows TST: kludge around test failure on win64 python 3.2.2 BLD: link against math library explicitly. close pandas-dev#1955 DOC: Add line about resetting to default index DOC: Adding details on normalization for variance functions. DOC: Specify default merge behavior for on = None BUG: PeriodIndex slicing by datetime fails when either end out-of-bounds pandas-dev#1977 BUG: read_table unicode bug pandas-dev#1975 BUG: BlockManager.iget fails with non-unique MultiIndex pandas-dev#1970 Better error message for DataFrame.apply if axis is not 0 or 1 TST: fix up tzlocal test cases DOC: add level option in Series.reset_index to release notes ENH: level parameter for Series.reset_index
2 parents 9237075 + 6d9bd5a commit 20c6c91

17 files changed

+187
-33
lines changed

RELEASE.rst

+8-1
Original file line numberDiff line numberDiff line change
@@ -45,6 +45,7 @@ pandas 0.9.0
4545
- Add `na_action='ignore'` to Series.map to quietly propagate NAs (#1661)
4646
- Add args/kwds options to Series.apply (#1829)
4747
- Add inplace option to Series/DataFrame.reset_index (#1797)
48+
- Add ``level`` parameter to ``Series.reset_index``
4849
- Add quoting option for DataFrame.to_csv (#1902)
4950
- Indicate long column value truncation in DataFrame output with ... (#1854)
5051
- DataFrame.dot will not do data alignment, and also work with Series (#1915)
@@ -58,7 +59,6 @@ pandas 0.9.0
5859
repeat levels) (#1929)
5960
- TimeSeries.between_time can now select times across midnight (#1871)
6061
- Enable `skip_footer` parameter in `ExcelFile.parse` (#1843)
61-
- Enable `skipfooter` parameter in text parsers as an alias for `skip_footer`
6262

6363
**API Changes**
6464

@@ -80,6 +80,7 @@ pandas 0.9.0
8080
- Resolved inconsistencies in specifying custom NA values in text parser.
8181
`na_values` of type dict no longer override default NAs unless
8282
`keep_default_na` is set to false explicitly (#1657)
83+
- Enable `skipfooter` parameter in text parsers as an alias for `skip_footer`
8384

8485
**Bug fixes**
8586

@@ -233,6 +234,12 @@ pandas 0.9.0
233234
- Fix bug in DataFrame.duplicated to enable iterables other than list-types
234235
as input argument (#1773)
235236
- Fix resample bug when passed list of lambdas as `how` argument (#1808)
237+
- Repr fix for MultiIndex level with all NAs (#1971)
238+
- Fix PeriodIndex slicing bug when slice start/end are out-of-bounds (#1977)
239+
- Fix read_table bug when parsing unicode (#1975)
240+
- Fix BlockManager.iget bug when dealing with non-unique MultiIndex as columns
241+
(#1970)
242+
- Fix reset_index bug if both drop and level are specified (#1957)
236243

237244

238245
pandas 0.8.1

doc/source/gotchas.rst

+10
Original file line numberDiff line numberDiff line change
@@ -302,3 +302,13 @@ of the new set of columns rather than the original ones:
302302
:suppress:
303303
304304
os.remove('tmp.csv')
305+
306+
307+
Differences with NumPy
308+
----------------------
309+
For Series and DataFrame objects, ``var`` normalizes by ``N-1`` to produce
310+
unbiased estimates of the sample variance, while NumPy's ``var`` normalizes
311+
by N, which measures the variance of the sample. Note that ``cov``
312+
normalizes by ``N-1`` in both pandas and NumPy.
313+
314+

doc/source/v0.9.0.txt

+19-1
Original file line numberDiff line numberDiff line change
@@ -20,13 +20,24 @@ New features
2020
Finance (GH1748_, GH1739_)
2121
- More flexible parsing of boolean values (Yes, No, TRUE, FALSE, etc)
2222
(GH1691_, GH1295_)
23+
- Add ``level`` parameter to ``Series.reset_index``
24+
- ``TimeSeries.between_time`` can now select times across midnight (GH1871_)
25+
- Series constructor can now handle generator as input (GH1679_)
26+
- ``DataFrame.dropna`` can now take multiple axes (tuple/list) as input
27+
(GH924_)
28+
- Enable ``skip_footer`` parameter in ``ExcelFile.parse`` (GH1843_)
2329

2430
API changes
2531
~~~~~~~~~~~
2632

33+
- Creating a Series from another Series, passing an index, will cause
34+
reindexing to happen inside rather than treating the Series like an
35+
ndarray. Technically improper usages like Series(df[col1], index=df[col2])
36+
that worked before "by accident" (this was never intended) will lead to all
37+
NA Series in some cases.
2738
- Deprecated ``day_of_year`` API removed from PeriodIndex, use ``dayofyear``
2839
(GH1723_)
29-
- Don't modify NumPy suppress printoption at import time
40+
- Don't modify NumPy suppress printoption to True at import time
3041
- The internal HDF5 data arrangement for DataFrames has been transposed.
3142
Legacy files will still be readable by HDFStore (GH1834_, GH1824_)
3243
- Legacy cruft removed: pandas.stats.misc.quantileTS
@@ -42,6 +53,8 @@ API changes
4253
- Resolved inconsistencies in specifying custom NA values in text parser.
4354
`na_values` of type dict no longer override default NAs unless
4455
`keep_default_na` is set to false explicitly (GH1657_)
56+
- DataFrame.dot will not do data alignment, and also work with Series
57+
(GH1915_)
4558

4659

4760
See the `full release notes
@@ -63,3 +76,8 @@ on GitHub for a complete list.
6376
.. _GH1630: https://github.com/pydata/pandas/issues/1630
6477
.. _GH1809: https://github.com/pydata/pandas/issues/1809
6578
.. _GH1657: https://github.com/pydata/pandas/issues/1657
79+
.. _GH1871: https://github.com/pydata/pandas/issues/1871
80+
.. _GH1679: https://github.com/pydata/pandas/issues/1679
81+
.. _GH1915: https://github.com/pydata/pandas/issues/1915
82+
.. _GH924: https://github.com/pydata/pandas/issues/924
83+
.. _GH1843: https://github.com/pydata/pandas/issues/1843

pandas/core/frame.py

+20-4
Original file line numberDiff line numberDiff line change
@@ -119,7 +119,9 @@
119119
* outer: use union of keys from both frames (SQL: full outer join)
120120
* inner: use intersection of keys from both frames (SQL: inner join)
121121
on : label or list
122-
Field names to join on. Must be found in both DataFrames.
122+
Field names to join on. Must be found in both DataFrames. If on is
123+
None and not merging on indexes, then it merges on the intersection of
124+
the columns by default.
123125
left_on : label or list, or array-like
124126
Field names to join on in left DataFrame. Can be a vector or list of
125127
vectors of the length of the DataFrame to use a particular vector as
@@ -2470,7 +2472,8 @@ def reset_index(self, level=None, drop=False, inplace=False):
24702472
Only remove the given levels from the index. Removes all levels by
24712473
default
24722474
drop : boolean, default False
2473-
Do not try to insert index into dataframe columns
2475+
Do not try to insert index into dataframe columns. This resets
2476+
the index to the default integer index.
24742477
inplace : boolean, default False
24752478
Modify the DataFrame in place (do not create a new object)
24762479
@@ -3760,6 +3763,8 @@ def _apply_standard(self, func, axis, ignore_failures=False):
37603763
series_gen = (Series.from_array(arr, index=res_columns, name=name)
37613764
for i, (arr, name) in
37623765
enumerate(izip(values, res_index)))
3766+
else:
3767+
raise ValueError('Axis must be 0 or 1, got %s' % str(axis))
37633768

37643769
keys = []
37653770
results = {}
@@ -3815,6 +3820,8 @@ def _apply_broadcast(self, func, axis):
38153820
target = self
38163821
elif axis == 1:
38173822
target = self.T
3823+
else:
3824+
raise ValueError('Axis must be 0 or 1, got %s' % str(axis))
38183825

38193826
result_values = np.empty_like(target.values)
38203827
columns = target.columns
@@ -4046,6 +4053,9 @@ def cov(self):
40464053
Returns
40474054
-------
40484055
y : DataFrame
4056+
4057+
y contains the covariance matrix of the DataFrame's time series.
4058+
The covariance is normalized by N-1 (unbiased estimator).
40494059
"""
40504060
numeric_df = self._get_numeric_data()
40514061
cols = numeric_df.columns
@@ -4362,7 +4372,10 @@ def mad(self, axis=0, skipna=True, level=None):
43624372

43634373
@Substitution(name='variance', shortname='var',
43644374
na_action=_doc_exclude_na, extras='')
4365-
@Appender(_stat_doc)
4375+
@Appender(_stat_doc +
4376+
"""
4377+
Normalized by N-1 (unbiased estimator).
4378+
""")
43664379
def var(self, axis=0, skipna=True, level=None, ddof=1):
43674380
if level is not None:
43684381
return self._agg_by_level('var', axis=axis, level=level,
@@ -4372,7 +4385,10 @@ def var(self, axis=0, skipna=True, level=None, ddof=1):
43724385

43734386
@Substitution(name='standard deviation', shortname='std',
43744387
na_action=_doc_exclude_na, extras='')
4375-
@Appender(_stat_doc)
4388+
@Appender(_stat_doc +
4389+
"""
4390+
Normalized by N-1 (unbiased estimator).
4391+
""")
43764392
def std(self, axis=0, skipna=True, level=None, ddof=1):
43774393
if level is not None:
43784394
return self._agg_by_level('std', axis=axis, level=level,

pandas/core/index.py

+11-3
Original file line numberDiff line numberDiff line change
@@ -1471,7 +1471,8 @@ def get_level_values(self, level):
14711471
labels = self.labels[num]
14721472
return unique_vals.take(labels)
14731473

1474-
def format(self, space=2, sparsify=None, adjoin=True, names=False):
1474+
def format(self, space=2, sparsify=None, adjoin=True, names=False,
1475+
na_rep='NaN'):
14751476
from pandas.core.common import _stringify
14761477
from pandas.core.format import print_config
14771478
def _strify(x):
@@ -1480,8 +1481,15 @@ def _strify(x):
14801481
if len(self) == 0:
14811482
return []
14821483

1483-
stringified_levels = [lev.take(lab).format() for lev, lab in
1484-
zip(self.levels, self.labels)]
1484+
1485+
stringified_levels = []
1486+
for lev, lab in zip(self.levels, self.labels):
1487+
if len(lev) > 0:
1488+
formatted = lev.take(lab).format()
1489+
else:
1490+
# weird all NA case
1491+
formatted = [str(x) for x in com.take_1d(lev.values, lab)]
1492+
stringified_levels.append(formatted)
14851493

14861494
result_levels = []
14871495
for lev, name in zip(stringified_levels, self.names):

pandas/core/internals.py

+8-2
Original file line numberDiff line numberDiff line change
@@ -833,11 +833,17 @@ def iget(self, i):
833833
return self.get(item)
834834
else:
835835
# ugh
836-
inds, = (self.items == item).nonzero()
836+
try:
837+
inds, = (self.items == item).nonzero()
838+
except AttributeError: #MultiIndex
839+
inds, = self.items.map(lambda x: x == item).nonzero()
837840

838841
_, block = self._find_block(item)
839842

840-
binds, = (block.items == item).nonzero()
843+
try:
844+
binds, = (block.items == item).nonzero()
845+
except AttributeError: #MultiIndex
846+
binds, = block.items.map(lambda x: x == item).nonzero()
841847

842848
for j, (k, b) in enumerate(zip(inds, binds)):
843849
if i == k:

pandas/core/series.py

+24-5
Original file line numberDiff line numberDiff line change
@@ -794,6 +794,9 @@ def reset_index(self, level=None, drop=False, name=None, inplace=False):
794794
795795
Parameters
796796
----------
797+
level : int, str, tuple, or list, default None
798+
Only remove the given levels from the index. Removes all levels by
799+
default
797800
drop : boolean, default False
798801
Do not try to insert index into dataframe columns
799802
name : object, default None
@@ -806,13 +809,21 @@ def reset_index(self, level=None, drop=False, name=None, inplace=False):
806809
resetted : DataFrame, or Series if drop == True
807810
"""
808811
if drop:
812+
new_index = np.arange(len(self))
813+
if level is not None and isinstance(self.index, MultiIndex):
814+
if not isinstance(level, (tuple, list)):
815+
level = [level]
816+
level = [self.index._get_level_number(lev) for lev in level]
817+
if len(level) < len(self.index.levels):
818+
new_index = self.index.droplevel(level)
819+
809820
if inplace:
810-
self.index = np.arange(len(self))
821+
self.index = new_index
811822
# set name if it was passed, otherwise, keep the previous name
812823
self.name = name or self.name
813824
return self
814825
else:
815-
return Series(self.values.copy(), index=np.arange(len(self)),
826+
return Series(self.values.copy(), index=new_index,
816827
name=self.name)
817828
else:
818829
from pandas.core.frame import DataFrame
@@ -821,7 +832,7 @@ def reset_index(self, level=None, drop=False, name=None, inplace=False):
821832
else:
822833
df = DataFrame({name : self})
823834

824-
return df.reset_index(drop=drop)
835+
return df.reset_index(level=level, drop=drop)
825836

826837
def __repr__(self):
827838
"""Clean string representation of a Series"""
@@ -1140,7 +1151,10 @@ def max(self, axis=None, out=None, skipna=True, level=None):
11401151

11411152
@Substitution(name='standard deviation', shortname='stdev',
11421153
na_action=_doc_exclude_na, extras='')
1143-
@Appender(_stat_doc)
1154+
@Appender(_stat_doc +
1155+
"""
1156+
Normalized by N-1 (unbiased estimator).
1157+
""")
11441158
def std(self, axis=None, dtype=None, out=None, ddof=1, skipna=True,
11451159
level=None):
11461160
if level is not None:
@@ -1150,7 +1164,10 @@ def std(self, axis=None, dtype=None, out=None, ddof=1, skipna=True,
11501164

11511165
@Substitution(name='variance', shortname='var',
11521166
na_action=_doc_exclude_na, extras='')
1153-
@Appender(_stat_doc)
1167+
@Appender(_stat_doc +
1168+
"""
1169+
Normalized by N-1 (unbiased estimator).
1170+
""")
11541171
def var(self, axis=None, dtype=None, out=None, ddof=1, skipna=True,
11551172
level=None):
11561173
if level is not None:
@@ -1463,6 +1480,8 @@ def cov(self, other):
14631480
Returns
14641481
-------
14651482
covariance : float
1483+
1484+
Normalized by N-1 (unbiased estimator).
14661485
"""
14671486
this, other = self.align(other, join='inner')
14681487
if len(this) == 0:

pandas/io/parsers.py

-3
Original file line numberDiff line numberDiff line change
@@ -295,9 +295,6 @@ def read_table(filepath_or_buffer,
295295
if kdict.get('delimiter', None) is None:
296296
kdict['delimiter'] = sep
297297

298-
# Override as default encoding.
299-
kdict['encoding'] = None
300-
301298
return _read(TextParser, filepath_or_buffer, kdict)
302299

303300
@Appender(_read_fwf_doc)

pandas/io/tests/test_parsers.py

+18-2
Original file line numberDiff line numberDiff line change
@@ -21,6 +21,7 @@
2121
from pandas.util import py3compat
2222
from pandas.lib import Timestamp
2323
from pandas.tseries.index import date_range
24+
import pandas.tseries.tools as tools
2425

2526
from numpy.testing.decorators import slow
2627
from pandas.io.date_converters import (
@@ -839,6 +840,11 @@ def test_parse_cols_list(self):
839840
assert_frame_equal(df, df2)
840841
assert_frame_equal(df3, df2)
841842

843+
def test_read_table_unicode(self):
844+
fin = StringIO('\u0141aski, Jan;1')
845+
df1 = read_table(fin, sep=";", encoding="utf-8", header=None)
846+
self.assert_(isinstance(df1['X.1'].values[0], unicode))
847+
842848
def test_read_table_wrong_num_columns(self):
843849
data = """A,B,C,D,E,F
844850
1,2,3,4,5
@@ -1306,8 +1312,11 @@ def test_parse_dates_custom_euroformat(self):
13061312
na_values=['NA'])
13071313

13081314
def test_converters_corner_with_nas(self):
1315+
# skip aberration observed on Win64 Python 3.2.2
1316+
if hash(np.int64(-1)) != -2:
1317+
raise nose.SkipTest
1318+
13091319
import StringIO
1310-
import numpy as np
13111320
import pandas
13121321
csv = """id,score,days
13131322
1,2,12
@@ -1490,7 +1499,14 @@ def test_parse_tz_aware(self):
14901499
result = read_csv(data, index_col=0, parse_dates=True)
14911500
stamp = result.index[0]
14921501
self.assert_(stamp.minute == 39)
1493-
self.assert_(result.index.tz is pytz.utc)
1502+
try:
1503+
self.assert_(result.index.tz is pytz.utc)
1504+
except AssertionError: # hello Yaroslav
1505+
arr = result.index.to_pydatetime()
1506+
result = tools.to_datetime(arr, utc=True)[0]
1507+
self.assert_(stamp.minute == result.minute)
1508+
self.assert_(stamp.hour == result.hour)
1509+
self.assert_(stamp.day == result.day)
14941510

14951511
class TestParseSQL(unittest.TestCase):
14961512

pandas/src/generate_code.py

-4
Original file line numberDiff line numberDiff line change
@@ -35,10 +35,6 @@
3535
cimport util
3636
from util cimport is_array, _checknull, _checknan
3737
38-
cdef extern from "math.h":
39-
double sqrt(double x)
40-
double fabs(double)
41-
4238
# import datetime C API
4339
PyDateTime_IMPORT
4440

pandas/src/generated.pyx

-4
Original file line numberDiff line numberDiff line change
@@ -32,10 +32,6 @@ ctypedef unsigned char UChar
3232
cimport util
3333
from util cimport is_array, _checknull, _checknan
3434

35-
cdef extern from "math.h":
36-
double sqrt(double x)
37-
double fabs(double)
38-
3935
# import datetime C API
4036
PyDateTime_IMPORT
4137

0 commit comments

Comments
 (0)