Commit 0fa4d06

committed
Merge pull request #4850 from jreback/loc_float
BUG/ENH: provide better .loc based semantics for float based indicies, continuing not to fallback (related GH236)
2 parents d2f95f2 + 60efe85 commit 0fa4d06

19 files changed

+1080
-317
lines changed

doc/source/indexing.rst

+98-11
Original file line number | Diff line number | Diff line change
@@ -1199,22 +1199,109 @@ numpy array. For instance,
11991199
dflookup = DataFrame(np.random.rand(20,4), columns = ['A','B','C','D'])
12001200
dflookup.lookup(list(range(0,10,2)), ['B','C','A','B','D'])
12011201
1202-
Setting values in mixed-type DataFrame
1203-
--------------------------------------
1202+
.. _indexing.float64index:
12041203

1205-
.. _indexing.mixed_type_setting:
1204+
Float64Index
1205+
------------
1206+
1207+
.. versionadded:: 0.13.0
12061208

1207-
Setting values on a mixed-type DataFrame or Panel is supported when using
1208-
scalar values, though setting arbitrary vectors is not yet supported:
1209+
By default a ``Float64Index`` will be automatically created when passing floating, or mixed-integer-floating values in index creation.
1210+
This enables a pure label-based slicing paradigm that makes ``[],ix,loc`` for scalar indexing and slicing work exactly the
1211+
same.
12091212

12101213
.. ipython:: python
12111214
1212-
df2 = df[:4]
1213-
df2['foo'] = 'bar'
1214-
print(df2)
1215-
df2.ix[2] = np.nan
1216-
print(df2)
1217-
print(df2.dtypes)
1215+
indexf = Index([1.5, 2, 3, 4.5, 5])
1216+
indexf
1217+
sf = Series(range(5),index=indexf)
1218+
sf
1219+
1220+
Scalar selection for ``[],.ix,.loc`` will always be label based. An integer will match an equal float index (e.g. ``3`` is equivalent to ``3.0``)
1221+
1222+
.. ipython:: python
1223+
1224+
sf[3]
1225+
sf[3.0]
1226+
sf.ix[3]
1227+
sf.ix[3.0]
1228+
sf.loc[3]
1229+
sf.loc[3.0]
1230+
1231+
The only positional indexing is via ``iloc``
1232+
1233+
.. ipython:: python
1234+
1235+
sf.iloc[3]
1236+
1237+
A scalar index that is not found will raise ``KeyError``
1238+
1239+
Slicing is ALWAYS on the values of the index, for ``[],ix,loc`` and ALWAYS positional with ``iloc``
1240+
1241+
.. ipython:: python
1242+
1243+
sf[2:4]
1244+
sf.ix[2:4]
1245+
sf.loc[2:4]
1246+
sf.iloc[2:4]
1247+
1248+
In float indexes, slicing using floats is allowed
1249+
1250+
.. ipython:: python
1251+
1252+
sf[2.1:4.6]
1253+
sf.loc[2.1:4.6]
1254+
1255+
In non-float indexes, slicing using floats will raise a ``TypeError``
1256+
1257+
.. code-block:: python
1258+
1259+
In [1]: Series(range(5))[3.5]
1260+
TypeError: the label [3.5] is not a proper indexer for this index type (Int64Index)
1261+
1262+
In [1]: Series(range(5))[3.5:4.5]
1263+
TypeError: the slice start [3.5] is not a proper indexer for this index type (Int64Index)
1264+
1265+
Using a scalar float indexer will be deprecated in a future version, but is allowed for now.
1266+
1267+
.. code-block:: python
1268+
1269+
In [3]: Series(range(5))[3.0]
1270+
Out[3]: 3
1271+
1272+
Here is a typical use-case for using this type of indexing. Imagine that you have a somewhat
1273+
irregular timedelta-like indexing scheme, but the data is recorded as floats. This could for
1274+
example be millisecond offsets.
1275+
1276+
.. ipython:: python
1277+
1278+
dfir = concat([DataFrame(randn(5,2),
1279+
index=np.arange(5) * 250.0,
1280+
columns=list('AB')),
1281+
DataFrame(randn(6,2),
1282+
index=np.arange(4,10) * 250.1,
1283+
columns=list('AB'))])
1284+
dfir
1285+
1286+
Selection operations then will always work on a value basis, for all selection operators.
1287+
1288+
.. ipython:: python
1289+
1290+
dfir[0:1000.4]
1291+
dfir.loc[0:1001,'A']
1292+
dfir.loc[1000.4]
1293+
1294+
You could then easily pick out the first 1 second (1000 ms) of data then.
1295+
1296+
.. ipython:: python
1297+
1298+
dfir[0:1000]
1299+
1300+
Of course if you need integer based selection, then use ``iloc``
1301+
1302+
.. ipython:: python
1303+
1304+
dfir.iloc[0:5]
12181305
12191306
.. _indexing.view_versus_copy:
12201307

doc/source/release.rst

+4
@@ -229,6 +229,10 @@ API Changes
229229
add top-level ``to_timedelta`` function
230230
- ``NDFrame`` now is compatible with Python's toplevel ``abs()`` function (:issue:`4821`).
231231
- raise a ``TypeError`` on invalid comparison ops on Series/DataFrame (e.g. integer/datetime) (:issue:`4968`)
232+
- Added a new index type, ``Float64Index``. This will be automatically created when passing floating values in index creation.
233+
This enables a pure label-based slicing paradigm that makes ``[],ix,loc`` for scalar indexing and slicing work exactly the same.
234+
Indexing on other index types are preserved (and positional fallback for ``[],ix``), with the exception, that floating point slicing
235+
on indexes on non ``Float64Index`` will raise a ``TypeError``, e.g. ``Series(range(5))[3.5:4.5]`` (:issue:`263`)
232236

233237
Internal Refactoring
234238
~~~~~~~~~~~~~~~~~~~~

doc/source/v0.13.0.txt

+66
@@ -116,6 +116,72 @@ Indexing API Changes
116116
p
117117
p.loc[:,:,'C']
118118

119+
Float64Index API Change
120+
~~~~~~~~~~~~~~~~~~~~~~~
121+
122+
- Added a new index type, ``Float64Index``. This will be automatically created when passing floating values in index creation.
123+
This enables a pure label-based slicing paradigm that makes ``[],ix,loc`` for scalar indexing and slicing work exactly the
124+
same. See :ref:`the docs<indexing.float64index>`, (:issue:`263`)
125+
126+
Construction is by default for floating type values.
127+
128+
.. ipython:: python
129+
130+
index = Index([1.5, 2, 3, 4.5, 5])
131+
index
132+
s = Series(range(5),index=index)
133+
s
134+
135+
Scalar selection for ``[],.ix,.loc`` will always be label based. An integer will match an equal float index (e.g. ``3`` is equivalent to ``3.0``)
136+
137+
.. ipython:: python
138+
139+
s[3]
140+
s.ix[3]
141+
s.loc[3]
142+
143+
The only positional indexing is via ``iloc``
144+
145+
.. ipython:: python
146+
147+
s.iloc[3]
148+
149+
A scalar index that is not found will raise ``KeyError``
150+
151+
Slicing is ALWAYS on the values of the index, for ``[],ix,loc`` and ALWAYS positional with ``iloc``
152+
153+
.. ipython:: python
154+
155+
s[2:4]
156+
s.ix[2:4]
157+
s.loc[2:4]
158+
s.iloc[2:4]
159+
160+
In float indexes, slicing using floats are allowed
161+
162+
.. ipython:: python
163+
164+
s[2.1:4.6]
165+
s.loc[2.1:4.6]
166+
167+
- Indexing on other index types are preserved (and positional fallback for ``[],ix``), with the exception, that floating point slicing
168+
on indexes on non ``Float64Index`` will now raise a ``TypeError``.
169+
170+
.. code-block:: python
171+
172+
In [1]: Series(range(5))[3.5]
173+
TypeError: the label [3.5] is not a proper indexer for this index type (Int64Index)
174+
175+
In [1]: Series(range(5))[3.5:4.5]
176+
TypeError: the slice start [3.5] is not a proper indexer for this index type (Int64Index)
177+
178+
Using a scalar float indexer will be deprecated in a future version, but is allowed for now.
179+
180+
.. code-block:: python
181+
182+
In [3]: Series(range(5))[3.0]
183+
Out[3]: 3
184+
119185
HDFStore API Changes
120186
~~~~~~~~~~~~~~~~~~~~
121187

doc/source/v0.4.x.txt

+1-2
@@ -15,8 +15,7 @@ New Features
1515
with choice of join method (ENH56_)
1616
- :ref:`Added <indexing.get_level_values>` method ``get_level_values`` to
1717
``MultiIndex`` (:issue:`188`)
18-
- :ref:`Set <indexing.mixed_type_setting>` values in mixed-type
19-
``DataFrame`` objects via ``.ix`` indexing attribute (:issue:`135`)
18+
- Set values in mixed-type ``DataFrame`` objects via ``.ix`` indexing attribute (:issue:`135`)
2019
- Added new ``DataFrame`` :ref:`methods <basics.dtypes>`
2120
``get_dtype_counts`` and property ``dtypes`` (ENHdc_)
2221
- Added :ref:`ignore_index <merging.ignore_index>` option to

pandas/computation/tests/test_eval.py

+1-1
@@ -778,7 +778,7 @@ def check_chained_cmp_op(self, lhs, cmp1, mid, cmp2, rhs):
778778
class TestAlignment(object):
779779

780780
index_types = 'i', 'u', 'dt'
781-
lhs_index_types = index_types + ('f', 's') # 'p'
781+
lhs_index_types = index_types + ('s',) # 'p'
782782

783783
def check_align_nested_unary_op(self, engine, parser):
784784
skip_if_no_ne(engine)

pandas/core/api.py

+1-1
@@ -8,7 +8,7 @@
88
from pandas.core.categorical import Categorical, Factor
99
from pandas.core.format import (set_printoptions, reset_printoptions,
1010
set_eng_float_format)
11-
from pandas.core.index import Index, Int64Index, MultiIndex
11+
from pandas.core.index import Index, Int64Index, Float64Index, MultiIndex
1212

1313
from pandas.core.series import Series, TimeSeries
1414
from pandas.core.frame import DataFrame

pandas/core/frame.py

+1-1
@@ -2050,7 +2050,7 @@ def eval(self, expr, **kwargs):
20502050
kwargs['local_dict'] = _ensure_scope(resolvers=resolvers, **kwargs)
20512051
return _eval(expr, **kwargs)
20522052

2053-
def _slice(self, slobj, axis=0, raise_on_error=False):
2053+
def _slice(self, slobj, axis=0, raise_on_error=False, typ=None):
20542054
axis = self._get_block_manager_axis(axis)
20552055
new_data = self._data.get_slice(
20562056
slobj, axis=axis, raise_on_error=raise_on_error)
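
The label-based versus positional semantics documented in the ``doc/source/indexing.rst`` changes can be sketched as follows. This is a minimal illustration and not part of the commit: it assumes an explicit ``import pandas as pd`` instead of the star-imports used in the docs, and it restricts itself to ``.loc`` and ``.iloc`` because ``.ix`` and the ``Float64Index`` class name may not exist in later pandas releases. The variable names on the left-hand side are hypothetical.

```python
import pandas as pd

# Same labels as the indexing.rst example; mixing ints and floats
# in the constructor produces an index with float64 dtype.
indexf = pd.Index([1.5, 2, 3, 4.5, 5])
sf = pd.Series(range(5), index=indexf)

# Scalar selection with .loc is label based: the integer 3 matches label 3.0.
label_scalar = sf.loc[3]

# Scalar selection with .iloc is positional: position 3 holds label 4.5.
position_scalar = sf.iloc[3]

# Slicing with .loc works on index values and includes both endpoints.
label_slice = sf.loc[2:4]

# Slicing with .iloc works on positions and excludes the stop position.
position_slice = sf.iloc[2:4]

# Float slice bounds need not be present in a sorted float index.
float_slice = sf.loc[2.1:4.6]
```

The same slice ``2:4`` selects labels 2.0 and 3.0 through ``.loc`` but labels 3.0 and 4.5 through ``.iloc``, which is the distinction the documentation draws.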
