@@ -34,9 +34,9 @@ and labeled columns:

 .. ipython:: python

-   dates = pd.date_range('20130101', periods=6)
+   dates = pd.date_range("20130101", periods=6)
    dates
-   df = pd.DataFrame(np.random.randn(6, 4), index=dates, columns=list('ABCD'))
+   df = pd.DataFrame(np.random.randn(6, 4), index=dates, columns=list("ABCD"))
    df

 Creating a :class:`DataFrame` by passing a dict of objects that can be converted to series-like.
@@ -156,7 +156,7 @@ Sorting by values:

 .. ipython:: python

-   df.sort_values(by='B')
+   df.sort_values(by="B")

 Selection
 ---------
@@ -178,14 +178,14 @@ equivalent to ``df.A``:

 .. ipython:: python

-   df['A']
+   df["A"]

 Selecting via ``[]``, which slices the rows.

 .. ipython:: python

    df[0:3]
-   df['20130102':'20130104']
+   df["20130102":"20130104"]

 Selection by label
 ~~~~~~~~~~~~~~~~~~
@@ -202,31 +202,31 @@ Selecting on a multi-axis by label:

 .. ipython:: python

-   df.loc[:, ['A', 'B']]
+   df.loc[:, ["A", "B"]]

 Showing label slicing, both endpoints are *included*:

 .. ipython:: python

-   df.loc['20130102':'20130104', ['A', 'B']]
+   df.loc["20130102":"20130104", ["A", "B"]]

 Reduction in the dimensions of the returned object:

 .. ipython:: python

-   df.loc['20130102', ['A', 'B']]
+   df.loc["20130102", ["A", "B"]]

 For getting a scalar value:

 .. ipython:: python

-   df.loc[dates[0], 'A']
+   df.loc[dates[0], "A"]

 For getting fast access to a scalar (equivalent to the prior method):

 .. ipython:: python

-   df.at[dates[0], 'A']
+   df.at[dates[0], "A"]

 Selection by position
 ~~~~~~~~~~~~~~~~~~~~~
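The label-based selections in the hunk above can be contrasted with positional slicing in one short sketch. The frame below is a toy stand-in (an assumption for illustration, filled with the integers 0 to 23 rather than the guide's random data) so that the shapes and values are easy to verify.

```python
import numpy as np
import pandas as pd

# Toy frame (assumption): deterministic values instead of np.random.randn.
dates = pd.date_range("20130101", periods=6)
df = pd.DataFrame(np.arange(24).reshape(6, 4), index=dates, columns=list("ABCD"))

# Label slicing with .loc includes both endpoints: three rows.
by_label = df.loc["20130102":"20130104", ["A", "B"]]

# Positional slicing with .iloc excludes the end position: two rows.
by_position = df.iloc[1:3, 0:2]

# Selecting a single row label reduces the result to a Series.
row = df.loc[dates[1], ["A", "B"]]

# .loc and .at return the same scalar; .at is the scalar-only accessor.
scalar_loc = df.loc[dates[0], "A"]
scalar_at = df.at[dates[0], "A"]

print(by_label.shape, by_position.shape)
```

The difference in row counts comes only from the endpoint rule: the label slice covers three dates, while the positional slice `1:3` covers two rows.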
@@ -282,7 +282,7 @@ Using a single column's values to select data.

 .. ipython:: python

-   df[df['A'] > 0]
+   df[df["A"] > 0]

 Selecting values from a DataFrame where a boolean condition is met.

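The two kinds of boolean selection mentioned in this hunk behave differently, which a small sketch may make clearer. The data here is a hypothetical three-row frame, not the guide's `df`: a boolean Series filters rows, while a boolean DataFrame keeps the shape and masks the failing cells with NaN.

```python
import pandas as pd

# Hypothetical data chosen so the outcome is easy to read.
df = pd.DataFrame({"A": [1.0, -2.0, 3.0], "B": [-1.0, 5.0, -6.0]})

# A boolean Series selects whole rows where the condition holds.
rows = df[df["A"] > 0]

# A boolean DataFrame keeps the original shape; cells that fail become NaN.
masked = df[df > 0]

print(rows)
print(masked)
```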
@@ -295,9 +295,9 @@ Using the :func:`~Series.isin` method for filtering:
 .. ipython:: python

    df2 = df.copy()
-   df2['E'] = ['one', 'one', 'two', 'three', 'four', 'three']
+   df2["E"] = ["one", "one", "two", "three", "four", "three"]
    df2
-   df2[df2['E'].isin(['two', 'four'])]
+   df2[df2["E"].isin(["two", "four"])]

 Setting
 ~~~~~~~
@@ -307,15 +307,15 @@ by the indexes.

 .. ipython:: python

-   s1 = pd.Series([1, 2, 3, 4, 5, 6], index=pd.date_range('20130102', periods=6))
+   s1 = pd.Series([1, 2, 3, 4, 5, 6], index=pd.date_range("20130102", periods=6))
    s1
-   df['F'] = s1
+   df["F"] = s1

 Setting values by label:

 .. ipython:: python

-   df.at[dates[0], 'A'] = 0
+   df.at[dates[0], "A"] = 0

 Setting values by position:

@@ -327,7 +327,7 @@ Setting by assigning with a NumPy array:

 .. ipython:: python

-   df.loc[:, 'D'] = np.array([5] * len(df))
+   df.loc[:, "D"] = np.array([5] * len(df))

 The result of the prior setting operations.

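The setting operations touched by the last two hunks can be sketched end to end as follows. This is a minimal illustration on a zero-filled toy frame (an assumption; the guide uses random data), and the values 7.0 and 8.0 are arbitrary. The point to notice is that assigning a Series as a new column aligns on the index, so a date missing from the Series ends up as NaN.

```python
import numpy as np
import pandas as pd

# Toy frame (assumption): zeros instead of random values.
dates = pd.date_range("20130101", periods=6)
df = pd.DataFrame(np.zeros((6, 4)), index=dates, columns=list("ABCD"))

# s1 starts one day later than df, so 2013-01-01 has no matching label.
s1 = pd.Series([1, 2, 3, 4, 5, 6], index=pd.date_range("20130102", periods=6))
df["F"] = s1  # aligned by index; the unmatched first row becomes NaN

df.at[dates[0], "A"] = 7.0  # set by label
df.iat[0, 1] = 8.0  # set by position (row 0, column "B")
df.loc[:, "D"] = np.array([5] * len(df))  # set a whole column from a NumPy array

print(df)
```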
@@ -356,15 +356,15 @@ returns a copy of the data.

 .. ipython:: python

-   df1 = df.reindex(index=dates[0:4], columns=list(df.columns) + ['E'])
-   df1.loc[dates[0]:dates[1], 'E'] = 1
+   df1 = df.reindex(index=dates[0:4], columns=list(df.columns) + ["E"])
+   df1.loc[dates[0] : dates[1], "E"] = 1
    df1

 To drop any rows that have missing data.

 .. ipython:: python

-   df1.dropna(how='any')
+   df1.dropna(how="any")

 Filling missing data.

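The missing-data steps in this hunk (reindex, drop, fill) can be followed on a small example. The single-column frame and the fill value of 5 below are assumptions for illustration. Both `dropna` and `fillna` return new objects, so `df1` itself keeps its NaN.

```python
import pandas as pd

# Hypothetical one-column frame; the guide's df has more columns.
dates = pd.date_range("20130101", periods=4)
df = pd.DataFrame({"A": [1.0, 2.0, 3.0, 4.0]}, index=dates)

# Reindexing to a new column "E" introduces NaN in every row of that column.
df1 = df.reindex(index=dates[0:3], columns=list(df.columns) + ["E"])
df1.loc[dates[0] : dates[1], "E"] = 1  # label slice, both endpoints included

dropped = df1.dropna(how="any")  # drops the one row that still has NaN
filled = df1.fillna(value=5)  # replaces the remaining NaN with 5

print(dropped)
print(filled)
```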
@@ -408,7 +408,7 @@ In addition, pandas automatically broadcasts along the specified dimension.

    s = pd.Series([1, 3, 5, np.nan, 6, 8], index=dates).shift(2)
    s
-   df.sub(s, axis='index')
+   df.sub(s, axis="index")


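The subtraction in this hunk broadcasts the Series down the rows: each row of the frame has the Series value for the same date subtracted from every column. A minimal sketch, assuming a frame of ones (not the guide's data) so the arithmetic is visible:

```python
import numpy as np
import pandas as pd

dates = pd.date_range("20130101", periods=6)
# Assumption: a frame of ones, so each result cell is simply 1 - s[date].
df = pd.DataFrame(np.ones((6, 2)), index=dates, columns=["A", "B"])

# shift(2) moves values down two rows, leaving NaN in the first two slots.
s = pd.Series([1, 3, 5, np.nan, 6, 8], index=dates).shift(2)

# axis="index" matches s against the row labels and broadcasts across columns.
result = df.sub(s, axis="index")

print(result)
```

Rows where the shifted Series is NaN (the first two, and the one carrying the original NaN) come out as NaN in every column.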
 Apply
@@ -444,7 +444,7 @@ some cases always uses them). See more at :ref:`Vectorized String Methods

 .. ipython:: python

-   s = pd.Series(['A', 'B', 'C', 'Aaba', 'Baca', np.nan, 'CABA', 'dog', 'cat'])
+   s = pd.Series(["A", "B", "C", "Aaba", "Baca", np.nan, "CABA", "dog", "cat"])
    s.str.lower()

 Merge
@@ -486,21 +486,21 @@ SQL style merges. See the :ref:`Database style joining <merging.join>` section.

 .. ipython:: python

-   left = pd.DataFrame({'key': ['foo', 'foo'], 'lval': [1, 2]})
-   right = pd.DataFrame({'key': ['foo', 'foo'], 'rval': [4, 5]})
+   left = pd.DataFrame({"key": ["foo", "foo"], "lval": [1, 2]})
+   right = pd.DataFrame({"key": ["foo", "foo"], "rval": [4, 5]})
    left
    right
-   pd.merge(left, right, on='key')
+   pd.merge(left, right, on="key")

 Another example that can be given is:

 .. ipython:: python

-   left = pd.DataFrame({'key': ['foo', 'bar'], 'lval': [1, 2]})
-   right = pd.DataFrame({'key': ['foo', 'bar'], 'rval': [4, 5]})
+   left = pd.DataFrame({"key": ["foo", "bar"], "lval": [1, 2]})
+   right = pd.DataFrame({"key": ["foo", "bar"], "rval": [4, 5]})
    left
    right
-   pd.merge(left, right, on='key')
+   pd.merge(left, right, on="key")

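The two merge examples in this hunk differ only in whether the join key repeats, and that changes the row count of the result. The sketch below reuses the same frames under new variable names (`left_dup`, `left_uniq`, and so on are names introduced here for illustration) to show the effect side by side.

```python
import pandas as pd

# Duplicate keys: every "foo" on the left pairs with every "foo" on the right.
left_dup = pd.DataFrame({"key": ["foo", "foo"], "lval": [1, 2]})
right_dup = pd.DataFrame({"key": ["foo", "foo"], "rval": [4, 5]})
dup = pd.merge(left_dup, right_dup, on="key")  # 2 x 2 = 4 rows

# Unique keys: each key matches exactly once.
left_uniq = pd.DataFrame({"key": ["foo", "bar"], "lval": [1, 2]})
right_uniq = pd.DataFrame({"key": ["foo", "bar"], "rval": [4, 5]})
uniq = pd.merge(left_uniq, right_uniq, on="key")  # 2 rows

print(len(dup), len(uniq))
```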
 Grouping
 --------
@@ -531,14 +531,14 @@ groups.

 .. ipython:: python

-   df.groupby('A').sum()
+   df.groupby("A").sum()

 Grouping by multiple columns forms a hierarchical index, and again we can
 apply the :meth:`~pandas.core.groupby.GroupBy.sum` function.

 .. ipython:: python

-   df.groupby(['A', 'B']).sum()
+   df.groupby(["A", "B"]).sum()

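As a rough illustration of the two groupby calls in this hunk, the sketch below uses a hypothetical four-row frame with one numeric column `C` (the guide's frame is larger and random). Grouping by one column yields a flat index; grouping by two yields a `MultiIndex`.

```python
import pandas as pd

# Hypothetical data: two grouping columns and one numeric column.
df = pd.DataFrame(
    {
        "A": ["foo", "bar", "foo", "bar"],
        "B": ["one", "one", "two", "one"],
        "C": [1, 2, 3, 4],
    }
)

# Single key: one row per distinct value of "A".
# Selecting [["C"]] limits the sum to the numeric column.
by_a = df.groupby("A")[["C"]].sum()

# Two keys: the result is indexed by (A, B) pairs, a hierarchical index.
by_ab = df.groupby(["A", "B"]).sum()

print(by_a)
print(by_ab)
```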
 Reshaping
 ---------
@@ -559,8 +559,8 @@ Stack
            ]
        )
    )
-   index = pd.MultiIndex.from_tuples(tuples, names=['first', 'second'])
-   df = pd.DataFrame(np.random.randn(8, 2), index=index, columns=['A', 'B'])
+   index = pd.MultiIndex.from_tuples(tuples, names=["first", "second"])
+   df = pd.DataFrame(np.random.randn(8, 2), index=index, columns=["A", "B"])
    df2 = df[:4]
    df2

@@ -603,7 +603,7 @@ We can produce pivot tables from this data very easily:

 .. ipython:: python

-   pd.pivot_table(df, values='D', index=['A', 'B'], columns=['C'])
+   pd.pivot_table(df, values="D", index=["A", "B"], columns=["C"])


 Time series
@@ -616,31 +616,31 @@ financial applications. See the :ref:`Time Series section <timeseries>`.

 .. ipython:: python

-   rng = pd.date_range('1/1/2012', periods=100, freq='S')
+   rng = pd.date_range("1/1/2012", periods=100, freq="S")
    ts = pd.Series(np.random.randint(0, 500, len(rng)), index=rng)
-   ts.resample('5Min').sum()
+   ts.resample("5Min").sum()

 Time zone representation:

 .. ipython:: python

-   rng = pd.date_range('3/6/2012 00:00', periods=5, freq='D')
+   rng = pd.date_range("3/6/2012 00:00", periods=5, freq="D")
    ts = pd.Series(np.random.randn(len(rng)), rng)
    ts
-   ts_utc = ts.tz_localize('UTC')
+   ts_utc = ts.tz_localize("UTC")
    ts_utc

 Converting to another time zone:

 .. ipython:: python

-   ts_utc.tz_convert('US/Eastern')
+   ts_utc.tz_convert("US/Eastern")

 Converting between time span representations:

 .. ipython:: python

-   rng = pd.date_range('1/1/2012', periods=5, freq='M')
+   rng = pd.date_range("1/1/2012", periods=5, freq="M")
    ts = pd.Series(np.random.randn(len(rng)), index=rng)
    ts
    ps = ts.to_period()
@@ -654,9 +654,9 @@ the quarter end:

 .. ipython:: python

-   prng = pd.period_range('1990Q1', '2000Q4', freq='Q-NOV')
+   prng = pd.period_range("1990Q1", "2000Q4", freq="Q-NOV")
    ts = pd.Series(np.random.randn(len(prng)), prng)
-   ts.index = (prng.asfreq('M', 'e') + 1).asfreq('H', 's') + 9
+   ts.index = (prng.asfreq("M", "e") + 1).asfreq("H", "s") + 9
    ts.head()

 Categoricals
@@ -754,19 +754,20 @@ CSV

 .. ipython:: python

-   df.to_csv('foo.csv')
+   df.to_csv("foo.csv")

 :ref:`Reading from a csv file. <io.read_csv_table>`

 .. ipython:: python

-   pd.read_csv('foo.csv')
+   pd.read_csv("foo.csv")

 .. ipython:: python
    :suppress:

    import os
-   os.remove('foo.csv')
+
+   os.remove("foo.csv")

 HDF5
 ~~~~
@@ -777,18 +778,18 @@ Writing to a HDF5 Store.

 .. ipython:: python

-   df.to_hdf('foo.h5', 'df')
+   df.to_hdf("foo.h5", "df")

 Reading from a HDF5 Store.

 .. ipython:: python

-   pd.read_hdf('foo.h5', 'df')
+   pd.read_hdf("foo.h5", "df")

 .. ipython:: python
    :suppress:

-   os.remove('foo.h5')
+   os.remove("foo.h5")

 Excel
 ~~~~~
@@ -799,18 +800,18 @@ Writing to an excel file.

 .. ipython:: python

-   df.to_excel('foo.xlsx', sheet_name='Sheet1')
+   df.to_excel("foo.xlsx", sheet_name="Sheet1")

 Reading from an excel file.

 .. ipython:: python

-   pd.read_excel('foo.xlsx', 'Sheet1', index_col=None, na_values=['NA'])
+   pd.read_excel("foo.xlsx", "Sheet1", index_col=None, na_values=["NA"])

 .. ipython:: python
    :suppress:

-   os.remove('foo.xlsx')
+   os.remove("foo.xlsx")

 Gotchas
 -------