:suppress:

import numpy as np
- import random
- np.random.seed(123456)
- from pandas import *
- options.display.max_rows=15
import pandas as pd
- randn = np.random.randn
- randint = np.random.randint
+ np.random.seed(123456)
np.set_printoptions(precision=4, suppress=True)
- from pandas.compat import range, zip
+ pd.options.display.max_rows = 15

******************************
MultiIndex / Advanced Indexing
@@ -80,10 +75,10 @@ demo different ways to initialize MultiIndexes.
tuples = list(zip(*arrays))
tuples

- index = MultiIndex.from_tuples(tuples, names=['first', 'second'])
+ index = pd.MultiIndex.from_tuples(tuples, names=['first', 'second'])
index

- s = Series(randn(8), index=index)
+ s = pd.Series(np.random.randn(8), index=index)
s

When you want every pairing of the elements in two iterables, it can be easier
@@ -92,7 +87,7 @@ to use the ``MultiIndex.from_product`` function:
.. ipython:: python

iterables = [['bar', 'baz', 'foo', 'qux'], ['one', 'two']]
- MultiIndex.from_product(iterables, names=['first', 'second'])
+ pd.MultiIndex.from_product(iterables, names=['first', 'second'])

As a convenience, you can pass a list of arrays directly into Series or
DataFrame to construct a MultiIndex automatically:
@@ -101,9 +96,9 @@ DataFrame to construct a MultiIndex automatically:

arrays = [np.array(['bar', 'bar', 'baz', 'baz', 'foo', 'foo', 'qux', 'qux']),
          np.array(['one', 'two', 'one', 'two', 'one', 'two', 'one', 'two'])]
- s = Series(randn(8), index=arrays)
+ s = pd.Series(np.random.randn(8), index=arrays)
s
- df = DataFrame(randn(8, 4), index=arrays)
+ df = pd.DataFrame(np.random.randn(8, 4), index=arrays)
df

All of the ``MultiIndex`` constructors accept a ``names`` argument which stores
@@ -119,9 +114,9 @@ of the index is up to you:

.. ipython:: python

- df = DataFrame(randn(3, 8), index=['A', 'B', 'C'], columns=index)
+ df = pd.DataFrame(np.random.randn(3, 8), index=['A', 'B', 'C'], columns=index)
df
- DataFrame(randn(6, 6), index=index[:6], columns=index[:6])
+ pd.DataFrame(np.random.randn(6, 6), index=index[:6], columns=index[:6])

We've "sparsified" the higher levels of the indexes to make the console output a
bit easier on the eyes.
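
How much sparsification is shown is controlled by the ``display.multi_sparse`` option; a quick illustrative sketch (not part of the change above), reusing the ``df`` just built:

.. code-block:: python

   # Temporarily repeat every label at every level instead of blanking
   # out the repeated ("sparsified") entries.
   with pd.option_context('display.multi_sparse', False):
       print(df)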
@@ -131,7 +126,7 @@ tuples as atomic labels on an axis:

.. ipython:: python

- Series(randn(8), index=tuples)
+ pd.Series(np.random.randn(8), index=tuples)

The reason that the ``MultiIndex`` matters is that it can allow you to do
grouping, selection, and reshaping operations as we will describe below and in
@@ -282,16 +277,16 @@ As usual, **both sides** of the slicers are included as this is label indexing.
def mklbl(prefix, n):
    return ["%s%s" % (prefix, i) for i in range(n)]

- miindex = MultiIndex.from_product([mklbl('A',4),
-                                    mklbl('B',2),
-                                    mklbl('C',4),
-                                    mklbl('D',2)])
- micolumns = MultiIndex.from_tuples([('a','foo'),('a','bar'),
-                                     ('b','foo'),('b','bah')],
-                                    names=['lvl0', 'lvl1'])
- dfmi = DataFrame(np.arange(len(miindex)*len(micolumns)).reshape((len(miindex),len(micolumns))),
-                  index=miindex,
-                  columns=micolumns).sortlevel().sortlevel(axis=1)
+ miindex = pd.MultiIndex.from_product([mklbl('A',4),
+                                       mklbl('B',2),
+                                       mklbl('C',4),
+                                       mklbl('D',2)])
+ micolumns = pd.MultiIndex.from_tuples([('a','foo'),('a','bar'),
+                                        ('b','foo'),('b','bah')],
+                                       names=['lvl0', 'lvl1'])
+ dfmi = pd.DataFrame(np.arange(len(miindex)*len(micolumns)).reshape((len(miindex),len(micolumns))),
+                     index=miindex,
+                     columns=micolumns).sortlevel().sortlevel(axis=1)
dfmi

Basic multi-index slicing using slices, lists, and labels.
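
The slicing examples themselves fall outside this hunk; as a rough sketch of what that sentence refers to, reusing the ``dfmi`` frame built above (with ``pd.IndexSlice`` for the more natural syntax):

.. code-block:: python

   # Rows: first level 'A1'..'A3' (both endpoints included), any second and
   # fourth level, third level in ['C1', 'C3']; all columns.
   dfmi.loc[(slice('A1', 'A3'), slice(None), ['C1', 'C3']), :]

   # The same style of selection written with an IndexSlice object.
   idx = pd.IndexSlice
   dfmi.loc[idx[:, :, ['C1', 'C3']], idx[:, 'foo']]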
@@ -418,9 +413,9 @@ instance:

.. ipython:: python

- midx = MultiIndex(levels=[['zero', 'one'], ['x','y']],
-                   labels=[[1,1,0,0],[1,0,1,0]])
- df = DataFrame(randn(4,2), index=midx)
+ midx = pd.MultiIndex(levels=[['zero', 'one'], ['x','y']],
+                      labels=[[1,1,0,0],[1,0,1,0]])
+ df = pd.DataFrame(np.random.randn(4,2), index=midx)
df
df2 = df.mean(level=0)
df2
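
A level-wise reduction like the ``mean`` above is commonly broadcast back against the original index with ``reindex`` and a ``level`` argument; a small sketch (not part of the patch), reusing ``df`` and ``df2``:

.. code-block:: python

   # Align the level-0 means back onto the full two-level index.
   df2.reindex(df.index, level=0)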
@@ -471,7 +466,7 @@ labels will be sorted lexicographically!
.. ipython:: python

import random; random.shuffle(tuples)
- s = Series(randn(8), index=MultiIndex.from_tuples(tuples))
+ s = pd.Series(np.random.randn(8), index=pd.MultiIndex.from_tuples(tuples))
s
s.sortlevel(0)
s.sortlevel(1)
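
To check how far the shuffled labels are actually sorted, the ``MultiIndex`` exposes ``is_lexsorted`` and ``lexsort_depth``; an illustrative sketch reusing the ``s`` above:

.. code-block:: python

   # False for the shuffled index; lexsort_depth reports how many leading
   # levels are lexicographically sorted.
   s.index.is_lexsorted()
   s.sortlevel(0).index.lexsort_depth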
@@ -509,13 +504,13 @@ an exception. Here is a concrete example to illustrate this:
.. ipython:: python

tuples = [('a', 'a'), ('a', 'b'), ('b', 'a'), ('b', 'b')]
- idx = MultiIndex.from_tuples(tuples)
+ idx = pd.MultiIndex.from_tuples(tuples)
idx.lexsort_depth

reordered = idx[[1, 0, 3, 2]]
reordered.lexsort_depth

- s = Series(randn(4), index=reordered)
+ s = pd.Series(np.random.randn(4), index=reordered)
s.ix['a':'a']

However:
@@ -540,15 +535,15 @@ index positions. ``take`` will also accept negative integers as relative positions

.. ipython:: python

- index = Index(randint(0, 1000, 10))
+ index = pd.Index(np.random.randint(0, 1000, 10))
index

positions = [0, 9, 3]

index[positions]
index.take(positions)

- ser = Series(randn(10))
+ ser = pd.Series(np.random.randn(10))

ser.iloc[positions]
ser.take(positions)
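
The negative, end-relative positions mentioned in the hunk header are not exercised here; a brief sketch reusing ``index`` and ``ser``:

.. code-block:: python

   # Negative positions count back from the end of the object.
   index.take([-1, -2])
   ser.take([-1, -2])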
@@ -558,7 +553,7 @@ row or column positions.

.. ipython:: python

- frm = DataFrame(randn(5, 3))
+ frm = pd.DataFrame(np.random.randn(5, 3))

frm.take([1, 4, 3])

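The column version of the same call is outside this hunk; a sketch reusing ``frm``:

.. code-block:: python

   # Take columns (rather than rows) by position.
   frm.take([0, 2], axis=1)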
@@ -569,11 +564,11 @@ intended to work on boolean indices and may return unexpected results.

.. ipython:: python

- arr = randn(10)
+ arr = np.random.randn(10)
arr.take([False, False, True, True])
arr[[0, 1]]

- ser = Series(randn(10))
+ ser = pd.Series(np.random.randn(10))
ser.take([False, False, True, True])
ser.ix[[0, 1]]

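To spell out the "unexpected results": ``take`` casts the booleans to integer positions rather than treating them as a mask, so the calls above behave like position lists of 0s and 1s (a sketch, reusing ``arr``):

.. code-block:: python

   # Equivalent to arr.take([0, 0, 1, 1]); it is NOT a boolean filter.
   arr.take([False, False, True, True])
   arr.take([0, 0, 1, 1])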
@@ -583,14 +578,14 @@ faster than fancy indexing.

.. ipython::

- arr = randn(10000, 5)
+ arr = np.random.randn(10000, 5)
indexer = np.arange(10000)
random.shuffle(indexer)

timeit arr[indexer]
timeit arr.take(indexer, axis=0)

- ser = Series(arr[:, 0])
+ ser = pd.Series(arr[:, 0])
timeit ser.ix[indexer]
timeit ser.take(indexer)

@@ -608,10 +603,9 @@ setting the index of a ``DataFrame/Series`` with a ``category`` dtype would convert

.. ipython:: python

- df = DataFrame({'A': np.arange(6),
-                 'B': Series(list('aabbca')).astype('category',
-                                                    categories=list('cab'))
-                 })
+ df = pd.DataFrame({'A': np.arange(6),
+                    'B': list('aabbca')})
+ df['B'] = df['B'].astype('category', categories=list('cab'))
df
df.dtypes
df.B.cat.categories
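
The ``set_index`` step that actually produces the ``CategoricalIndex`` falls outside this hunk; a sketch reusing the ``df`` just built (the name ``df2`` matches the frame referenced later in this section):

.. code-block:: python

   # Setting the category-dtype column as the index yields a CategoricalIndex.
   df2 = df.set_index('B')
   df2.index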
@@ -669,15 +663,15 @@ values NOT in the categories, similarly to how you can reindex ANY pandas index.

.. code-block:: python

- In [10]: df3 = DataFrame({'A': np.arange(6),
-                           'B': Series(list('aabbca')).astype('category',
-                                                              categories=list('abc'))
-                           }).set_index('B')
+ In [9]: df3 = pd.DataFrame({'A': np.arange(6),
+                             'B': pd.Series(list('aabbca')).astype('category')})
+
+ In [11]: df3 = df3.set_index('B')

In [11]: df3.index
Out[11]: CategoricalIndex([u'a', u'a', u'b', u'b', u'c', u'a'], categories=[u'a', u'b', u'c'], ordered=False, name=u'B', dtype='category')

- In [12]: pd.concat([df2,df3]
+ In [12]: pd.concat([df2, df3]
TypeError: categories must match existing categories when appending
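
The reindexing behaviour named in the hunk header is not shown in this excerpt; a sketch, assuming ``df2`` is the ``CategoricalIndex``-indexed frame from the preceding example:

.. code-block:: python

   # Reindexing with labels outside the categories is allowed; missing rows
   # come back as NaN, just as with any other index.
   df2.reindex(['a', 'e'])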

.. _indexing.float64index:
@@ -702,9 +696,9 @@ same.

.. ipython:: python

- indexf = Index([1.5, 2, 3, 4.5, 5])
+ indexf = pd.Index([1.5, 2, 3, 4.5, 5])
indexf
- sf = Series(range(5),index=indexf)
+ sf = pd.Series(range(5), index=indexf)
sf

Scalar selection for ``[],.ix,.loc`` will always be label based. An integer will match an equal float index (e.g. ``3`` is equivalent to ``3.0``)
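
A short sketch of that equivalence, reusing ``sf``:

.. code-block:: python

   # All of these are label lookups; the integer 3 matches the float
   # label 3.0 in the Float64Index.
   sf[3]
   sf[3.0]
   sf.loc[3]
   sf.loc[3.0]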
@@ -746,17 +740,17 @@ In non-float indexes, slicing using floats will raise a ``TypeError``

.. code-block:: python

- In [1]: Series(range(5))[3.5]
+ In [1]: pd.Series(range(5))[3.5]
TypeError: the label [3.5] is not a proper indexer for this index type (Int64Index)

- In [1]: Series(range(5))[3.5:4.5]
+ In [1]: pd.Series(range(5))[3.5:4.5]
TypeError: the slice start [3.5] is not a proper indexer for this index type (Int64Index)

Using a scalar float indexer will be deprecated in a future version, but is allowed for now.

.. code-block:: python

- In [3]: Series(range(5))[3.0]
+ In [3]: pd.Series(range(5))[3.0]
Out[3]: 3

Here is a typical use-case for using this type of indexing. Imagine that you have a somewhat
@@ -765,12 +759,12 @@ example be millisecond offsets.

.. ipython:: python

- dfir = concat([DataFrame(randn(5,2),
-                          index=np.arange(5) * 250.0,
-                          columns=list('AB')),
-                DataFrame(randn(6,2),
-                          index=np.arange(4,10) * 250.1,
-                          columns=list('AB'))])
+ dfir = pd.concat([pd.DataFrame(np.random.randn(5,2),
+                                index=np.arange(5) * 250.0,
+                                columns=list('AB')),
+                   pd.DataFrame(np.random.randn(6,2),
+                                index=np.arange(4,10) * 250.1,
+                                columns=list('AB'))])
dfir

Selection operations then will always work on a value basis, for all selection operators.
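
A sketch of the value-based selection that last sentence describes, reusing ``dfir``:

.. code-block:: python

   # Slices operate on the float index values, not on integer positions.
   dfir[0:1000.4]
   dfir.loc[0:1001, 'A']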