    import pandas as pd
    np.random.seed(123456)
    np.set_printoptions(precision=4, suppress=True)
-   pd.options.display.max_rows=15
+   pd.options.display.max_rows = 15
 
 ******************************
 MultiIndex / Advanced Indexing
@@ -188,10 +188,10 @@ highly performant. If you want to see only the used levels, you can use the
 
 .. ipython:: python
 
-   df[['foo','qux']].columns.values
+   df[['foo', 'qux']].columns.values
 
    # for a specific level
-   df[['foo','qux']].columns.get_level_values(0)
+   df[['foo', 'qux']].columns.get_level_values(0)
 
 To reconstruct the ``MultiIndex`` with only the used levels, the
 :meth:`~MultiIndex.remove_unused_levels` method may be used.
@@ -200,7 +200,7 @@ To reconstruct the ``MultiIndex`` with only the used levels, the
 
 .. ipython:: python
 
-   df[['foo','qux']].columns.remove_unused_levels()
+   df[['foo', 'qux']].columns.remove_unused_levels()
 
 Data alignment and using ``reindex``
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
@@ -235,7 +235,7 @@ keys take the form of tuples. For example, the following works as you would expe
 
    df = df.T
    df
-   df.loc[('bar', 'two'), ]
+   df.loc[('bar', 'two')]
 
 Note that ``df.loc['bar', 'two']`` would also work in this example, but this shorthand
 notation can lead to ambiguity in general.
@@ -328,17 +328,18 @@ As usual, **both sides** of the slicers are included as this is label indexing.
 
 .. ipython:: python
 
-   def mklbl(prefix,n):
-       return ["%s%s" % (prefix,i) for i in range(n)]
+   def mklbl(prefix, n):
+       return ["%s%s" % (prefix, i) for i in range(n)]
 
-   miindex = pd.MultiIndex.from_product([mklbl('A',4),
-                                         mklbl('B',2),
-                                         mklbl('C',4),
-                                         mklbl('D',2)])
-   micolumns = pd.MultiIndex.from_tuples([('a','foo'),('a','bar'),
-                                          ('b','foo'),('b','bah')],
+   miindex = pd.MultiIndex.from_product([mklbl('A', 4),
+                                         mklbl('B', 2),
+                                         mklbl('C', 4),
+                                         mklbl('D', 2)])
+   micolumns = pd.MultiIndex.from_tuples([('a', 'foo'), ('a', 'bar'),
+                                          ('b', 'foo'), ('b', 'bah')],
                                          names=['lvl0', 'lvl1'])
-   dfmi = pd.DataFrame(np.arange(len(miindex)*len(micolumns)).reshape((len(miindex),len(micolumns))),
+   dfmi = pd.DataFrame(np.arange(len(miindex) * len(micolumns))
+                         .reshape((len(miindex), len(micolumns))),
                        index=miindex,
                        columns=micolumns).sort_index().sort_index(axis=1)
    dfmi
@@ -347,7 +348,7 @@ Basic MultiIndex slicing using slices, lists, and labels.
 
 .. ipython:: python
 
-   dfmi.loc[(slice('A1','A3'), slice(None), ['C1', 'C3']), :]
+   dfmi.loc[(slice('A1', 'A3'), slice(None), ['C1', 'C3']), :]
 
 
 You can use :class:`pandas.IndexSlice` to facilitate a more natural syntax
@@ -412,7 +413,7 @@ selecting data at a particular level of a ``MultiIndex`` easier.
 .. ipython:: python
 
    # using the slicers
-   df.loc[(slice(None),'one'),:]
+   df.loc[(slice(None), 'one'), :]
 
 You can also select on the columns with ``xs``, by
 providing the axis argument.
@@ -425,7 +426,7 @@ providing the axis argument.
 .. ipython:: python
 
    # using the slicers
-   df.loc[:,(slice(None),'one')]
+   df.loc[:, (slice(None), 'one')]
 
 ``xs`` also allows selection with multiple keys.
 
@@ -436,7 +437,7 @@ providing the axis argument.
 .. ipython:: python
 
    # using the slicers
-   df.loc[:,('bar','one')]
+   df.loc[:, ('bar', 'one')]
 
 You can pass ``drop_level=False`` to ``xs`` to retain
 the level that was selected.
@@ -467,9 +468,9 @@ values across a level. For instance:
 
 .. ipython:: python
 
-   midx = pd.MultiIndex(levels=[['zero', 'one'], ['x','y']],
-                        labels=[[1,1, 0, 0],[1,0, 1, 0]])
-   df = pd.DataFrame(np.random.randn(4,2), index=midx)
+   midx = pd.MultiIndex(levels=[['zero', 'one'], ['x', 'y']],
+                        labels=[[1, 1, 0, 0], [1, 0, 1, 0]])
+   df = pd.DataFrame(np.random.randn(4, 2), index=midx)
    df
    df2 = df.mean(level=0)
    df2
@@ -501,7 +502,7 @@ method, allowing you to permute the hierarchical index levels in one step:
 
 .. ipython:: python
 
-   df[:5].reorder_levels([1,0], axis=0)
+   df[:5].reorder_levels([1, 0], axis=0)
 
 .. _advanced.index_names:
 
@@ -522,7 +523,7 @@ of the ``DataFrame``.
 
 .. ipython:: python
 
-   df.rename(index={"one" : "two", "y" : "z"})
+   df.rename(index={"one": "two", "y": "z"})
 
 The :meth:`~DataFrame.rename_axis` method is used to rename the name of a
 ``Index`` or ``MultiIndex``. In particular, the names of the levels of a
@@ -605,7 +606,7 @@ Furthermore, if you try to index something that is not fully lexsorted, this can
 
 .. code-block:: ipython
 
-    In [5]: dfm.loc[(0,'y'):(1, 'z')]
+    In [5]: dfm.loc[(0, 'y'):(1, 'z')]
     UnsortedIndexError: 'Key length (2) was greater than MultiIndex lexsort depth (1)'
 
 The :meth:`~MultiIndex.is_lexsorted` method on a ``MultiIndex`` shows if the
@@ -627,7 +628,7 @@ And now selection works as expected.
 
 .. ipython:: python
 
-   dfm.loc[(0,'y'):(1, 'z')]
+   dfm.loc[(0, 'y'):(1, 'z')]
 
 Take Methods
 ------------
@@ -688,12 +689,12 @@ faster than fancy indexing.
    indexer = np.arange(10000)
    random.shuffle(indexer)
 
-   timeit arr[indexer]
-   timeit arr.take(indexer, axis=0)
+   %timeit arr[indexer]
+   %timeit arr.take(indexer, axis=0)
 
    ser = pd.Series(arr[:, 0])
-   timeit ser.iloc[indexer]
-   timeit ser.take(indexer)
+   %timeit ser.iloc[indexer]
+   %timeit ser.take(indexer)
 
 .. _indexing.index_types:
 
@@ -718,7 +719,6 @@ and allows efficient indexing and storage of an index with a large number of dup
 .. ipython:: python
 
    from pandas.api.types import CategoricalDtype
-
    df = pd.DataFrame({'A': np.arange(6),
                       'B': list('aabbca')})
    df['B'] = df['B'].astype(CategoricalDtype(list('cab')))
@@ -781,16 +781,15 @@ values **not** in the categories, similarly to how you can reindex **any** panda
 
 .. code-block:: ipython
 
-    In [9]: df3 = pd.DataFrame({'A' : np.arange(6),
-                                'B' : pd.Series(list('aabbca')).astype('category')})
+    In [9]: df3 = pd.DataFrame({'A': np.arange(6), 'B': pd.Series(list('aabbca')).astype('category')})
 
-    In [11]: df3 = df3.set_index('B')
+    In [11]: df3 = df3.set_index('B')
 
-    In [11]: df3.index
-    Out[11]: CategoricalIndex([u'a', u'a', u'b', u'b', u'c', u'a'], categories=[u'a', u'b', u'c'], ordered=False, name=u'B', dtype='category')
+    In [11]: df3.index
+    Out[11]: CategoricalIndex([u'a', u'a', u'b', u'b', u'c', u'a'], categories=[u'a', u'b', u'c'], ordered=False, name=u'B', dtype='category')
 
-    In [12]: pd.concat([df2, df3]
-    TypeError: categories must match existing categories when appending
+    In [12]: pd.concat([df2, df3])
+    TypeError: categories must match existing categories when appending
 
 .. _indexing.rangeindex:
 
@@ -883,11 +882,11 @@ example, be millisecond offsets.
 
 .. ipython:: python
 
-   dfir = pd.concat([pd.DataFrame(np.random.randn(5,2),
+   dfir = pd.concat([pd.DataFrame(np.random.randn(5, 2),
                                   index=np.arange(5) * 250.0,
                                   columns=list('AB')),
-                     pd.DataFrame(np.random.randn(6,2),
-                                  index=np.arange(4,10) * 250.1,
+                     pd.DataFrame(np.random.randn(6, 2),
+                                  index=np.arange(4, 10) * 250.1,
                                   columns=list('AB'))])
    dfir
 
@@ -896,7 +895,7 @@ Selection operations then will always work on a value basis, for all selection o
 .. ipython:: python
 
    dfir[0:1000.4]
-   dfir.loc[0:1001,'A']
+   dfir.loc[0:1001, 'A']
    dfir.loc[1000.4]
 
 You could retrieve the first 1 second (1000 ms) of data as such:
@@ -934,7 +933,7 @@ An ``IntervalIndex`` can be used in ``Series`` and in ``DataFrame`` as the index
 .. ipython:: python
 
    df = pd.DataFrame({'A': [1, 2, 3, 4]},
-                     index=pd.IntervalIndex.from_breaks([0, 1, 2, 3, 4]))
+                     index=pd.IntervalIndex.from_breaks([0, 1, 2, 3, 4]))
    df
 
 Label based indexing via ``.loc`` along the edges of an interval works as you would expect,
@@ -1014,7 +1013,8 @@ in the resulting ``IntervalIndex``:
 
    pd.interval_range(start=0, end=6, periods=4)
 
-   pd.interval_range(pd.Timestamp('2018-01-01'), pd.Timestamp('2018-02-28'), periods=3)
+   pd.interval_range(pd.Timestamp('2018-01-01'),
+                     pd.Timestamp('2018-02-28'), periods=3)
 
 Miscellaneous indexing FAQ
 --------------------------
@@ -1051,7 +1051,7 @@ normal Python ``list``. Monotonicity of an index can be tested with the :meth:`~
 
 .. ipython:: python
 
-   df = pd.DataFrame(index=[2,3, 3, 4, 5], columns=['data'], data=list(range(5)))
+   df = pd.DataFrame(index=[2, 3, 3, 4, 5], columns=['data'], data=list(range(5)))
    df.index.is_monotonic_increasing
 
    # no rows 0 or 1, but still returns rows 2, 3 (both of them), and 4:
@@ -1065,7 +1065,8 @@ On the other hand, if the index is not monotonic, then both slice bounds must be
 
 .. ipython:: python
 
-   df = pd.DataFrame(index=[2,3,1,4,3,5], columns=['data'], data=list(range(6)))
+   df = pd.DataFrame(index=[2, 3, 1, 4, 3, 5],
+                     columns=['data'], data=list(range(6)))
    df.index.is_monotonic_increasing
 
    # OK because 2 and 4 are in the index