@@ -2584,3 +2584,123 @@ Tthe dataset names are listed at `Fama/French Data Library
      import pandas.io.data as web
      ip = web.DataReader("5_Industry_Portfolios", "famafrench")
      ip[4].ix[192607]
+
+
+ World Bank panel data in Pandas
+ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+ ``Pandas`` users can easily access thousands of panel data series from the
+ `World Bank's World Development Indicators <http://data.worldbank.org>`_
+ by using the ``wb`` I/O functions.
+
+ For example, if you wanted to compare the Gross Domestic Product per capita in
+ constant dollars across North America, you would first use the ``search``
+ function to find the matching data series:
+
+ .. code:: python
+
+     In [1]: from pandas.io.wb import search, download
+
+     In [2]: search('gdp.*capita.*const').iloc[:,:2]
+     Out[2]:
+                          id                                               name
+     3242            GDPPCKD             GDP per Capita, constant US$, millions
+     5143     NY.GDP.PCAP.KD                 GDP per capita (constant 2005 US$)
+     5145     NY.GDP.PCAP.KN                      GDP per capita (constant LCU)
+     5147  NY.GDP.PCAP.PP.KD  GDP per capita, PPP (constant 2005 internation...
+
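The string passed to ``search`` is a regular expression. As a rough, offline sketch of that kind of filter, the snippet below applies the same pattern to a small hand-built table. The ``catalog`` frame is hypothetical (its ids and names are copied from outputs printed in this section), and the case-insensitive ``str.contains`` call is an assumption for illustration, not necessarily how ``wb`` implements ``search``.

```python
import pandas as pd

# Hypothetical stand-in for the indicator list; ids and names are copied
# from the search outputs printed in this section, not fetched from the
# World Bank.
catalog = pd.DataFrame(
    {
        "id": ["GDPPCKD", "NY.GDP.PCAP.KD", "NY.GDP.PCAP.KN", "IT.MOB.COV.ZS"],
        "name": [
            "GDP per Capita, constant US$, millions",
            "GDP per capita (constant 2005 US$)",
            "GDP per capita (constant LCU)",
            "Population coverage of mobile cellular telepho...",
        ],
    }
)

# Keep the rows whose name matches the regular expression, ignoring case.
mask = catalog["name"].str.contains("gdp.*capita.*const", case=False, regex=True)
matches = catalog[mask]
print(matches)
```

Only the three GDP rows survive the filter; the mobile coverage row does not match the pattern.
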
+ Then you would use the ``download`` function to acquire the data from the World
+ Bank's servers:
+
+ .. code:: python
+
+     In [3]: dat = download(indicator='NY.GDP.PCAP.KD', country=['US', 'CA', 'MX'], start=2005, end=2008)
+
+     In [4]: print(dat)
+                           NY.GDP.PCAP.KD
+     country       year
+     Canada        2008  36005.5004978584
+                   2007  36182.9138439757
+                   2006  35785.9698172849
+                   2005  35087.8925933298
+     Mexico        2008  8113.10219480083
+                   2007  8119.21298908649
+                   2006  7961.96818458178
+                   2005  7666.69796097264
+     United States 2008  43069.5819857208
+                   2007  43635.5852068142
+                   2006   43228.111147107
+                   2005  42516.3934699993
+
+ The resulting dataset is a properly formatted ``DataFrame`` with a hierarchical
+ index, so it is easy to apply ``.groupby`` transformations to it:
+
+ .. code:: python
+
+     In [6]: dat['NY.GDP.PCAP.KD'].groupby(level=0).mean()
+     Out[6]:
+     country
+     Canada           35765.569188
+     Mexico            7965.245332
+     United States    43112.417952
+     dtype: float64
+
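To try the ``groupby(level=0)`` step without contacting the World Bank's servers, the frame can be rebuilt by hand. This is a minimal sketch that assumes nothing beyond the values printed by ``download`` in this section; the integer year labels are a choice made here for illustration and may differ from what ``download`` returns.

```python
import pandas as pd

# Two-level (country, year) index in the same order as the printed output.
index = pd.MultiIndex.from_product(
    [["Canada", "Mexico", "United States"], [2008, 2007, 2006, 2005]],
    names=["country", "year"],
)

# GDP per capita values copied from the download output in this section.
dat = pd.DataFrame(
    {
        "NY.GDP.PCAP.KD": [
            36005.5004978584, 36182.9138439757, 35785.9698172849, 35087.8925933298,
            8113.10219480083, 8119.21298908649, 7961.96818458178, 7666.69796097264,
            43069.5819857208, 43635.5852068142, 43228.111147107, 42516.3934699993,
        ]
    },
    index=index,
)

# level=0 groups on the outer index level (country) and averages over years.
means = dat["NY.GDP.PCAP.KD"].groupby(level=0).mean()
print(means)
```

The per-country means agree with the ``Out[6]`` values to the precision shown there.
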
+ Now imagine you want to compare GDP to the share of people with cellphone
+ contracts around the world.
+
+ .. code:: python
+
+     In [7]: search('cell.*%').iloc[:,:2]
+     Out[7]:
+                          id                                               name
+     3990  IT.CEL.SETS.FE.ZS  Mobile cellular telephone users, female (% of ...
+     3991  IT.CEL.SETS.MA.ZS  Mobile cellular telephone users, male (% of po...
+     4027      IT.MOB.COV.ZS  Population coverage of mobile cellular telepho...
+
+ Notice that this second search was much faster than the first one because
+ ``Pandas`` now has a cached list of available data series.
+
+ .. code:: python
+
+     In [13]: ind = ['NY.GDP.PCAP.KD', 'IT.MOB.COV.ZS']
+     In [14]: dat = download(indicator=ind, country='all', start=2011, end=2011).dropna()
+     In [15]: dat.columns = ['gdp', 'cellphone']
+     In [16]: print(dat.tail())
+                             gdp  cellphone
+     country   year
+     Swaziland 2011  2413.952853       94.9
+     Tunisia   2011  3687.340170      100.0
+     Uganda    2011   405.332501      100.0
+     Zambia    2011   767.911290       62.0
+     Zimbabwe  2011   419.236086       72.4
+
+ Finally, we use the ``statsmodels`` package to assess the relationship between
+ our two variables using ordinary least squares regression. Unsurprisingly,
+ populations in rich countries tend to use cellphones at a higher rate:
+
+ .. code:: python
+
+     In [17]: import numpy as np
+     In [18]: import statsmodels.formula.api as smf
+     In [19]: mod = smf.ols("cellphone ~ np.log(gdp)", dat).fit()
+     In [20]: print(mod.summary())
+                                 OLS Regression Results
+     ==============================================================================
+     Dep. Variable:              cellphone   R-squared:                       0.297
+     Model:                            OLS   Adj. R-squared:                  0.274
+     Method:                 Least Squares   F-statistic:                     13.08
+     Date:                Thu, 25 Jul 2013   Prob (F-statistic):            0.00105
+     Time:                        15:24:42   Log-Likelihood:                -139.16
+     No. Observations:                  33   AIC:                             282.3
+     Df Residuals:                      31   BIC:                             285.3
+     Df Model:                           1
+     ===============================================================================
+                       coef    std err          t      P>|t|      [95.0% Conf. Int.]
+     -------------------------------------------------------------------------------
+     Intercept      16.5110     19.071      0.866      0.393       -22.384    55.406
+     np.log(gdp)     9.9333      2.747      3.616      0.001         4.331    15.535
+     ==============================================================================
+     Omnibus:                       36.054   Durbin-Watson:                   2.071
+     Prob(Omnibus):                  0.000   Jarque-Bera (JB):              119.133
+     Skew:                          -2.314   Prob(JB):                     1.35e-26
+     Kurtosis:                      11.077   Cond. No.                         45.8
+     ==============================================================================
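
The model form ``cellphone ~ np.log(gdp)`` can also be fitted with plain NumPy, which may help show what the formula expands to. This is only a sketch under a loud assumption: it uses the five rows printed by ``dat.tail()`` in this section rather than the 33 observations behind the summary table, so its coefficients are not expected to match the table.

```python
import numpy as np

# The five (gdp, cellphone) rows printed by dat.tail() in this section.
# Assumption: a 5-row subset, NOT the 33 observations used in the summary.
gdp = np.array([2413.952853, 3687.340170, 405.332501, 767.911290, 419.236086])
cellphone = np.array([94.9, 100.0, 100.0, 62.0, 72.4])

# Design matrix: a column of ones (the intercept) and log(gdp).
X = np.column_stack([np.ones_like(gdp), np.log(gdp)])

# Ordinary least squares: choose beta to minimise ||X @ beta - cellphone||^2.
beta, _, _, _ = np.linalg.lstsq(X, cellphone, rcond=None)
intercept, slope = beta

# With an intercept in the model, OLS residuals sum to (numerically) zero.
residuals = cellphone - X @ beta
print("intercept:", intercept, "slope:", slope)
```

The residual check is a quick sanity test that the fit is a genuine least-squares solution; it says nothing about statistical significance, which is what the ``statsmodels`` summary reports.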