
Commit a79b2b2

Merge commit 'v0.7.0rc1-29-g4d29e47' into debian
* commit 'v0.7.0rc1-29-g4d29e47': (26 commits)
  BUG: fix indexing corner case with MultiIndex re: pandas-dev#671
  BUG: fix integer-slicing from integers-as-floats, GH pandas-dev#670
  DOC: added release notes, docs for pandas-dev#288, pandas-dev#647
  ENH: can pass sequence of integers to DataFrame.{irow/icol} and Series.iget, GH pandas-dev#654"
  BUG: pass on sort kind from Series.sort to order and in argsort GH pandas-dev#668
  BUG: make groupby play nice with sparse objects, modify SparseSeries.take to return SparseSeries, dictification tests, GH pandas-dev#666
  TST: added tests for cummin, cummax. closes pandas-dev#647
  Update pandas/core/generic.py
  * Added cummax and cummin methods for Series And DataFrame.
  ENH: closes pandas-dev#288
  DOC: gotcha docs re: pandas-dev#656
  BUG: week as top-level iterator overwrites week instance
  TST: test behavior of passing None to Series constructor
  ENH: string format None and None and not NaN
  ENH: break circular reference causing memory leak in sparse array / series / frame, GH pandas-dev#663
  DOC: release notes
  ENH: handle list of values intelligently as grouping array when possible, GH pandas-dev#659
  BUG: don't lose columns name when passing list of labels to DataFrame.__getitem__, GH pandas-dev#662
  BUG: handling of tuples in MultiIndex level
  ENH: try to convert dtypes in Index.map
  ...
2 parents 6eb2e56 + 4d29e47 commit a79b2b2

31 files changed: +835 -160 lines changed

RELEASE.rst

Lines changed: 13 additions & 2 deletions
Original file line number | Diff line number | Diff line change
@@ -78,8 +78,11 @@ pandas 0.7.0
7878
yielding an aggregated result with hierarchical columns (GH #166)
7979
- Add integer-indexing functions ``iget`` in Series and ``irow`` / ``iget``
8080
in DataFrame (GH #628)
81-
- Add automatic realignment functionality (when possible) to comparisons and
82-
logical operators for Series
81+
- Add new ``Series.unique`` function, significantly faster than
82+
``numpy.unique`` (GH #658)
83+
- Add new ``cummin`` and ``cummax`` instance methods to ``Series`` and
84+
``DataFrame`` (GH #647)
85+
- Add new ``value_range`` function to return min/max of a dataframe (GH #288)
8386

8487
**API Changes**
8588

@@ -155,6 +158,10 @@ pandas 0.7.0
155158
row/column the function application failed on (GH #614)
156159
- Improved ability of read_table and read_clipboard to parse
157160
console-formatted DataFrames (can read the row of index names, etc.)
161+
- Can pass list of group labels (without having to convert to an ndarray
162+
yourself) to ``groupby`` in some cases (GH #659)
163+
- Use ``kind`` argument to Series.order for selecting different sort kinds
164+
(GH #668)
158165

159166
**Bug fixes**
160167

@@ -220,6 +227,10 @@ pandas 0.7.0
220227
- Add support for legacy WidePanel objects to be read from HDFStore
221228
- Fix out-of-bounds segfault in pad_object and backfill_object methods when
222229
either source or target array are empty
230+
- Could not create a new column in a DataFrame from a list of tuples
231+
- Fix bugs preventing SparseDataFrame and SparseSeries working with groupby
232+
(GH #666)
233+
- Use sort kind in Series.sort / argsort (GH #668)
223234

224235
Thanks
225236
------

TODO.rst

Lines changed: 3 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -1,7 +1,8 @@
11
DOCS 0.7.0
22
----------
3-
- no sort in groupby
4-
- concat with dict
3+
- ??? no sort in groupby
4+
- DONE concat with dict
5+
- Gotchas re: integer indexing
56

67
DONE
78
----

doc/source/basics.rst

Lines changed: 5 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -285,6 +285,8 @@ optional ``level`` parameter which applies only if the object has a
285285
``quantile``, Sample quantile (value at %)
286286
``cumsum``, Cumulative sum
287287
``cumprod``, Cumulative product
288+
``cummax``, Cumulative maximum
289+
``cummin``, Cumulative minimum
288290

289291
Note that by chance some NumPy methods, like ``mean``, ``std``, and ``sum``,
290292
will exclude NAs on Series input by default:
@@ -332,6 +334,9 @@ number of unique values and most frequently occurring values:
332334
s = Series(['a', 'a', 'b', 'b', 'a', 'a', np.nan, 'c', 'd', 'a'])
333335
s.describe()
334336
337+
There also is a utility function, ``value_range`` which takes a DataFrame and
338+
returns a series with the minimum/maximum values in the DataFrame.
339+
335340
.. _basics.idxmin:
336341

337342
Index of Min/Max Values
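
Note on the ``cummax`` / ``cummin`` rows added above: a cumulative maximum (or minimum) is the running extreme of the values seen so far. The following is a minimal pure-Python sketch of that definition only; it is not the pandas implementation, it ignores NA handling, and the sample ``values`` list is made up for illustration.

```python
from itertools import accumulate

# Illustrative data, not taken from the commit.
values = [3, 1, 4, 1, 5, 9, 2, 6]

# Running maximum: each entry is the largest value seen up to that position.
running_max = list(accumulate(values, max))

# Running minimum: each entry is the smallest value seen up to that position.
running_min = list(accumulate(values, min))

print(running_max)
print(running_min)
```

The output has the same length as the input, which is what distinguishes these methods from the plain ``max`` / ``min`` reductions listed in the same table.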

doc/source/conf.py

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -61,7 +61,7 @@
6161

6262
# General information about the project.
6363
project = u'pandas'
64-
copyright = u'2008-2011, AQR and Wes McKinney'
64+
copyright = u'2008-2011, the pandas development team'
6565

6666
# The version info for the project you're documenting, acts as replacement for
6767
# |version| and |release|, also used in various other places throughout the

doc/source/gotchas.rst

Lines changed: 55 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -80,3 +80,58 @@ specific dates. To enable this, we made the design design to make label-based sl
8080
This is most definitely a "practicality beats purity" sort of thing, but it is
8181
something to watch out for if you expect label-based slicing to behave exactly
8282
in the way that standard Python integer slicing works.
83+
84+
Miscellaneous indexing gotchas
85+
------------------------------
86+
87+
Reindex versus ix gotchas
88+
~~~~~~~~~~~~~~~~~~~~~~~~~
89+
90+
Many users will find themselves using the ``ix`` indexing capabilities as a
91+
concise means of selecting data from a pandas object:
92+
93+
.. ipython:: python
94+
95+
df = DataFrame(randn(6, 4), columns=['one', 'two', 'three', 'four'],
96+
index=list('abcdef'))
97+
df
98+
df.ix[['b', 'c', 'e']]
99+
100+
This is, of course, completely equivalent *in this case* to using the
101+
``reindex`` method:
102+
103+
.. ipython:: python
104+
105+
df.reindex(['b', 'c', 'e'])
106+
107+
Some might conclude that ``ix`` and ``reindex`` are 100% equivalent based on
108+
this. This is indeed true **except in the case of integer indexing**. For
109+
example, the above operation could alternately have been expressed as:
110+
111+
.. ipython:: python
112+
113+
df.ix[[1, 2, 4]]
114+
115+
If you pass ``[1, 2, 4]`` to ``reindex`` you will get another thing entirely:
116+
117+
.. ipython:: python
118+
119+
df.reindex([1, 2, 4])
120+
121+
So it's important to remember that ``reindex`` is **strict label indexing
122+
only**. This can lead to some potentially surprising results in pathological
123+
cases where an index contains, say, both integers and strings:
124+
125+
.. ipython:: python
126+
127+
s = Series([1, 2, 3], index=['a', 0, 1])
128+
s
129+
s.ix[[0, 1]]
130+
s.reindex([0, 1])
131+
132+
Because the index in this case does not contain solely integers, ``ix`` falls
133+
back on integer indexing. By contrast, ``reindex`` only looks for the values
134+
passed in the index, thus finding the integers ``0`` and ``1``. While it would
135+
be possible to insert some logic to check whether a passed sequence is all
136+
contained in the index, that logic would exact a very high cost in large data
137+
sets.
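
The mixed-index case described in the added gotchas text can be modelled without pandas. The sketch below is an assumption-laden illustration, not pandas code: ``take_by_position`` stands in for the positional fallback attributed to ``ix``, ``reindex_by_label`` stands in for strict label lookup, and ``None`` stands in for the missing-value marker a real reindex would produce. Both function names are hypothetical.

```python
# Same labels and values as the Series in the example above.
labels = ['a', 0, 1]
values = [1, 2, 3]


def take_by_position(positions):
    """Select values by integer position, ignoring the labels entirely."""
    return [values[p] for p in positions]


def reindex_by_label(wanted):
    """Select values strictly by label; labels not present map to None."""
    lookup = dict(zip(labels, values))
    return [lookup.get(label) for label in wanted]


# Positions 0 and 1 hold the values stored under labels 'a' and 0.
print(take_by_position([0, 1]))

# Labels 0 and 1 are the second and third entries of the index.
print(reindex_by_label([0, 1]))
```

The two calls receive the same list ``[0, 1]`` and return different values, which is the surprise the documentation warns about.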

doc/source/themes/agogo/static/agogo.css_t

Lines changed: 4 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -453,3 +453,7 @@ div.viewcode-block:target {
453453
border-top: 1px solid #ac9;
454454
border-bottom: 1px solid #ac9;
455455
}
456+
457+
th.field-name {
458+
white-space: nowrap;
459+
}

doc/source/whatsnew/v0.7.0.txt

Lines changed: 8 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -88,6 +88,12 @@ New features
8888
aggregate with groupby on a DataFrame, yielding an aggregated result with
8989
hierarchical columns (GH166_)
9090

91+
- Can call ``cummin`` and ``cummax`` on Series and DataFrame to get cumulative
92+
minimum and maximum, respectively (GH647_)
93+
94+
- ``value_range`` added as utility function to get min and max of a dataframe
95+
(GH288_)
96+
9197
API Changes to integer indexing
9298
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
9399

@@ -266,6 +272,7 @@ similar operation to the above but using a Python function:
266272
.. _GH115: https://github.com/wesm/pandas/issues/115
267273
.. _GH166: https://github.com/wesm/pandas/issues/166
268274
.. _GH220: https://github.com/wesm/pandas/issues/220
275+
.. _GH288: https://github.com/wesm/pandas/issues/288
269276
.. _GH249: https://github.com/wesm/pandas/issues/249
270277
.. _GH267: https://github.com/wesm/pandas/issues/267
271278
.. _GH273: https://github.com/wesm/pandas/issues/273
@@ -288,6 +295,7 @@ similar operation to the above but using a Python function:
288295
.. _GH545: https://github.com/wesm/pandas/issues/545
289296
.. _GH554: https://github.com/wesm/pandas/issues/554
290297
.. _GH595: https://github.com/wesm/pandas/issues/595
298+
.. _GH647: https://github.com/wesm/pandas/issues/647
291299
.. _GH93: https://github.com/wesm/pandas/issues/93
292300
.. _GH93: https://github.com/wesm/pandas/issues/93
293301
.. _PR521: https://github.com/wesm/pandas/pull/521

pandas/__init__.py

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -31,3 +31,4 @@
3131

3232
from pandas.tools.merge import merge, concat
3333
from pandas.tools.pivot import pivot_table, crosstab
34+
from pandas.tools.describe import value_range

pandas/core/common.py

Lines changed: 3 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -595,6 +595,8 @@ def _make_int_format(x):
595595
return _make_int_format(s)
596596
else:
597597
if na_rep is not None and lib.checknull(s):
598+
if s is None:
599+
return 'None'
598600
return na_rep
599601
else:
600602
# object dtype
@@ -798,7 +800,7 @@ def load(path):
798800
799801
Parameters
800802
----------
801-
p path : string
803+
path : string
802804
File path
803805
804806
Returns
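
Note on the first hunk of this file: the added branch makes ``None`` render as the string ``'None'`` instead of the NA placeholder, while other null values still use ``na_rep``. A rough standalone sketch of that branching follows. ``format_object_value`` is a hypothetical name, and the null check here (``None`` or a float NaN) is an assumption that only approximates ``lib.checknull``.

```python
import math


def format_object_value(value, na_rep="NaN"):
    """Format a scalar, keeping None distinct from other missing values."""
    # Assumed null test: None or a float NaN (approximation of lib.checknull).
    is_null = value is None or (isinstance(value, float) and math.isnan(value))
    if na_rep is not None and is_null:
        if value is None:
            return "None"   # None is shown literally
        return na_rep       # NaN still uses the placeholder
    return str(value)


print(format_object_value(None))
print(format_object_value(float("nan")))
print(format_object_value("abc"))
```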

pandas/core/datetools.py

Lines changed: 3 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -714,9 +714,9 @@ class Second(Tick):
714714

715715

716716
for i, weekday in enumerate(['MON', 'TUE', 'WED', 'THU', 'FRI']):
717-
for week in xrange(4):
718-
_offsetMap['WOM@%d%s' % (week + 1, weekday)] = \
719-
WeekOfMonth(week=week, weekday=i)
717+
for iweek in xrange(4):
718+
_offsetMap['WOM@%d%s' % (iweek + 1, weekday)] = \
719+
WeekOfMonth(week=iweek, weekday=i)
720720

721721
_offsetNames = dict([(v, k) for k, v in _offsetMap.iteritems()])
722722
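
Note on the ``week`` to ``iweek`` rename above: a ``for`` loop at module level binds its loop variable in the module namespace, so it silently replaces any existing object with the same name. The sketch below reproduces that effect in isolation, assuming a module-level name ``week`` bound to an instance as the commit message describes. The ``Week`` class here is a hypothetical stand-in, not the pandas offset class.

```python
class Week:
    """Hypothetical stand-in for an offset object bound at module level."""


week = Week()

# Buggy pattern: the loop variable reuses the name and rebinds it.
for week in range(4):
    pass

clobbered = week          # now the last loop value, no longer a Week instance

# Fixed pattern: a distinct loop variable leaves the instance alone.
week = Week()
for iweek in range(4):
    pass

preserved = week          # still the Week instance

print(type(clobbered).__name__, type(preserved).__name__)
```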

pandas/core/frame.py

Lines changed: 41 additions & 20 deletions
Original file line numberDiff line numberDiff line change
@@ -1220,38 +1220,48 @@ def set_value(self, index, col, value):
12201220

12211221
def irow(self, i):
12221222
"""
1223-
Retrieve the i-th row of the DataFrame by location as a Series. Can
1224-
also pass a slice object
1223+
Retrieve the i-th row or rows of the DataFrame by location
12251224
12261225
Parameters
12271226
----------
1228-
i : int or slice
1227+
i : int, slice, or sequence of integers
1228+
1229+
Notes
1230+
-----
1231+
If slice passed, the resulting data will be a view
12291232
12301233
Returns
12311234
-------
1232-
row : Series
1235+
row : Series (int) or DataFrame (slice, sequence)
12331236
"""
12341237
if isinstance(i, slice):
12351238
return self[i]
12361239
else:
12371240
label = self.index[i]
1238-
return self.xs(label)
1241+
if isinstance(label, Index):
1242+
return self.reindex(label)
1243+
else:
1244+
return self.xs(label)
12391245

12401246
def icol(self, i):
12411247
"""
1242-
Retrieve the i-th column of the DataFrame by location as a Series. Can
1243-
also pass a slice object
1248+
Retrieve the i-th column or columns of the DataFrame by location
12441249
12451250
Parameters
12461251
----------
1247-
i : int or slice
1252+
i : int, slice, or sequence of integers
1253+
1254+
Notes
1255+
-----
1256+
If slice passed, the resulting data will be a view
12481257
12491258
Returns
12501259
-------
1251-
column : Series
1260+
column : Series (int) or DataFrame (slice, sequence)
12521261
"""
12531262
label = self.columns[i]
12541263
if isinstance(i, slice):
1264+
# need to return view
12551265
lab_slice = slice(label[0], label[-1])
12561266
return self.ix[:, lab_slice]
12571267
else:
@@ -1283,7 +1293,7 @@ def __getitem__(self, key):
12831293
# either boolean or fancy integer index
12841294
elif isinstance(key, (np.ndarray, list)):
12851295
if isinstance(key, list):
1286-
key = np.array(key, dtype=object)
1296+
key = lib.list_to_object_array(key)
12871297

12881298
# also raises Exception if object array with NA values
12891299
if com._is_bool_indexer(key):
@@ -1307,7 +1317,10 @@ def _getitem_array(self, key):
13071317
mask = indexer == -1
13081318
if mask.any():
13091319
raise KeyError("No column(s) named: %s" % str(key[mask]))
1310-
return self.reindex(columns=key)
1320+
result = self.reindex(columns=key)
1321+
if result.columns.name is None:
1322+
result.columns.name = self.columns.name
1323+
return result
13111324

13121325
def _slice(self, slobj, axis=0):
13131326
if axis == 0:
@@ -1427,9 +1440,7 @@ def _sanitize_column(self, value):
14271440
assert(len(value) == len(self.index))
14281441

14291442
if not isinstance(value, np.ndarray):
1430-
value = np.array(value)
1431-
if value.dtype.type == np.str_:
1432-
value = np.array(value, dtype=object)
1443+
value = com._asarray_tuplesafe(value)
14331444
else:
14341445
value = value.copy()
14351446
else:
@@ -1472,7 +1483,7 @@ def xs(self, key, axis=0, level=None, copy=True):
14721483
"""
14731484
labels = self._get_axis(axis)
14741485
if level is not None:
1475-
loc = labels.get_loc_level(key, level=level)
1486+
loc, new_ax = labels.get_loc_level(key, level=level)
14761487

14771488
# level = 0
14781489
if not isinstance(loc, slice):
@@ -1484,7 +1495,8 @@ def xs(self, key, axis=0, level=None, copy=True):
14841495

14851496
result = self.ix[indexer]
14861497

1487-
new_ax = result._get_axis(axis).droplevel(level)
1498+
# new_ax = result._get_axis(axis).droplevel(level)
1499+
14881500
setattr(result, result._get_axis_name(axis), new_ax)
14891501
return result
14901502

@@ -1495,14 +1507,23 @@ def xs(self, key, axis=0, level=None, copy=True):
14951507
return data
14961508

14971509
self._consolidate_inplace()
1498-
loc = self.index.get_loc(key)
1510+
1511+
index = self.index
1512+
if isinstance(index, MultiIndex):
1513+
loc, new_index = self.index.get_loc_level(key)
1514+
else:
1515+
loc = self.index.get_loc(key)
1516+
14991517
if np.isscalar(loc):
15001518
new_values = self._data.fast_2d_xs(loc, copy=copy)
15011519
return Series(new_values, index=self.columns, name=key)
15021520
else:
1503-
new_data = self._data.xs(key, axis=1, copy=copy)
1504-
result = DataFrame(new_data)
1505-
result.index = _maybe_droplevels(result.index, key)
1521+
result = self[loc]
1522+
result.index = new_index
1523+
1524+
# new_data = self._data.xs(key, axis=1, copy=copy)
1525+
# result = DataFrame(new_data)
1526+
# result.index = _maybe_droplevels(result.index, key)
15061527
return result
15071528

15081529
def lookup(self, row_labels, col_labels):
