
Commit 6a7ff40

ENH: Add ISO3 ctry codes and error arg. Fix tests, warn/exception logic #8482
1 parent 2737f5a commit 6a7ff40

File tree

4 files changed

+306
-75
lines changed

doc/source/remote_data.rst

+59
Original file line number | Diff line number | Diff line change
@@ -143,6 +143,12 @@ World Bank
143143
`World Bank's World Development Indicators <http://data.worldbank.org>`__
144144
by using the ``wb`` I/O functions.
145145

146+
Indicators
147+
~~~~~~~~~~
148+
149+
Every World Bank indicator is accessible, either by exploring the World Bank site
150+
or by using the included search function.
151+
146152
For example, if you wanted to compare the Gross Domestic Products per capita in
147153
constant dollars in North America, you would use the ``search`` function:
148154

@@ -254,3 +260,56 @@ populations in rich countries tend to use cellphones at a higher rate:
254260
Skew: -2.314 Prob(JB): 1.35e-26
255261
Kurtosis: 11.077 Cond. No. 45.8
256262
==============================================================================
263+
264+
Country Codes
265+
~~~~~~~~~~~~~
266+
267+
.. versionadded:: 0.15.1
268+
269+
The ``country`` argument accepts a string or list of mixed
270+
`two <http://en.wikipedia.org/wiki/ISO_3166-1_alpha-2>`__ or `three <http://en.wikipedia.org/wiki/ISO_3166-1_alpha-3>`__ character
271+
ISO country codes, as well as dynamic `World Bank exceptions <http://data.worldbank.org/node/18>`__ to the ISO standards.
272+
273+
For a list of the hard-coded country codes (used solely for error handling logic) see ``pandas.io.wb.country_codes``.
274+
275+
Problematic Country Codes & Indicators
276+
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
277+
278+
.. note::
279+
280+
The World Bank's country list and indicators are dynamic. As of 0.15.1,
281+
:func:`wb.download()` is more flexible. To achieve this, the warning
282+
and exception logic changed.
283+
284+
The World Bank converts some country codes
285+
in its response, which makes error checking by pandas difficult.
286+
Retired indicators still persist in the search.
287+
288+
Given the new flexibility of 0.15.1, improved error handling by the user
289+
may be necessary for fringe cases.
290+
291+
To help identify issues:
292+
293+
There are at least 4 kinds of country codes:
294+
295+
1. Standard (2/3 character ISO) - returns data, will warn and error properly.
296+
2. Non-standard (WB Exceptions) - returns data, but will falsely warn.
297+
3. Blank - silently missing from the response.
298+
4. Bad - causes the entire response from WB to fail, always raises an exception.
299+
300+
There are at least 3 kinds of indicators:
301+
302+
1. Current - Returns data.
303+
2. Retired - Appears in search results, yet won't return data.
304+
3. Bad - Will not return data.
305+
306+
Use the ``errors`` argument to control warnings and exceptions. Setting
307+
``errors`` to ``'ignore'`` or ``'warn'`` won't stop failed responses (i.e., 100% bad
308+
indicators, or a single "bad" (#4 above) country code).
309+
310+
See docstrings for more info.
311+
312+
313+
314+
315+
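
The country-code and `errors` behaviour documented above can be sketched offline. This is a minimal illustration under stated assumptions, not the code in `pandas.io.wb`: `check_country_codes` and `known_codes` are hypothetical names, and the value `'raise'` is assumed alongside the `'warn'` and `'ignore'` values mentioned in the documentation text.

```python
import warnings


def check_country_codes(country, known_codes, errors="warn"):
    """Hypothetical helper: normalise `country` and report unknown codes.

    `known_codes` stands in for a hard-coded set of ISO codes. Unknown
    codes are still returned, since the World Bank may accept
    non-standard codes that a hard-coded list does not contain.
    """
    if errors not in ("raise", "warn", "ignore"):
        raise ValueError("errors must be 'raise', 'warn' or 'ignore'")

    # Accept either a single code or a list of codes.
    codes = [country] if isinstance(country, str) else list(country)

    # Drop duplicates while keeping first-seen order.
    unique = []
    for code in codes:
        if code not in unique:
            unique.append(code)

    unknown = [code for code in unique if code not in known_codes]
    if unknown:
        message = "Unrecognised country codes: %s" % ", ".join(unknown)
        if errors == "raise":
            raise ValueError(message)
        if errors == "warn":
            warnings.warn(message, UserWarning)
        # errors == "ignore": say nothing and carry on.

    return unique
```

Note that a helper like this can only flag codes before the request is sent. A country code that makes the whole World Bank response fail would still surface as an exception later, whatever `errors` is set to.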

doc/source/whatsnew/v0.15.1.txt

+5-7
Original file line number | Diff line number | Diff line change
@@ -19,18 +19,17 @@ users upgrade to this version.
1919

2020
API changes
2121
~~~~~~~~~~~
22-
23-
22+
2423
.. _whatsnew_0151.enhancements:
2524

2625
Enhancements
2726
~~~~~~~~~~~~
2827

2928
- Added option to select columns when importing Stata files (:issue:`7935`)
30-
3129
- Qualify memory usage in ``DataFrame.info()`` by adding ``+`` if it is a lower bound (:issue:`8578`)
32-
33-
30+
- Added support for 3-character ISO and non-standard country codes in :func:`io.wb.download()` (:issue:`8482`)
31+
- :ref:`World Bank data requests <remote_data.wb>` now warn or raise ``ValueError`` based on an ``errors`` argument, a list of hard-coded country codes, and the World Bank's JSON response. In prior versions, the error messages did not consider the World Bank's JSON response; problem-inducing input was simply dropped before the request, and the hard-coded approach dropped many valid countries. All countries now work, but some bad country codes raise exceptions because certain edge cases break the entire response.
32+
3433
.. _whatsnew_0151.performance:
3534

3635
Performance
@@ -41,8 +40,7 @@ Performance
4140

4241
Experimental
4342
~~~~~~~~~~~~
44-
45-
43+
4644
.. _whatsnew_0151.bug_fixes:
4745

4846
Bug Fixes

pandas/io/tests/test_wb.py

+85-18
Original file line number | Diff line number | Diff line change
@@ -14,42 +14,109 @@ class TestWB(tm.TestCase):
1414
@slow
1515
@network
1616
def test_wdi_search(self):
17-
raise nose.SkipTest
18-
19-
expected = {u('id'): {2634: u('GDPPCKD'),
20-
4649: u('NY.GDP.PCAP.KD'),
21-
4651: u('NY.GDP.PCAP.KN'),
22-
4653: u('NY.GDP.PCAP.PP.KD')},
23-
u('name'): {2634: u('GDP per Capita, constant US$, '
24-
'millions'),
25-
4649: u('GDP per capita (constant 2000 US$)'),
26-
4651: u('GDP per capita (constant LCU)'),
27-
4653: u('GDP per capita, PPP (constant 2005 '
17+
18+
expected = {u('id'): {6716: u('NY.GDP.PCAP.KD'),
19+
6718: u('NY.GDP.PCAP.KN'),
20+
6720: u('NY.GDP.PCAP.PP.KD')},
21+
u('name'): {6716: u('GDP per capita (constant 2005 US$)'),
22+
6718: u('GDP per capita (constant LCU)'),
23+
6720: u('GDP per capita, PPP (constant 2011 '
2824
'international $)')}}
29-
result = search('gdp.*capita.*constant').ix[:, :2]
25+
result = search('gdp.*capita.*constant').loc[6716:,['id','name']]
3026
expected = pandas.DataFrame(expected)
3127
expected.index = result.index
3228
assert_frame_equal(result, expected)
3329

3430
@slow
3531
@network
3632
def test_wdi_download(self):
37-
raise nose.SkipTest
3833

39-
expected = {'GDPPCKN': {(u('United States'), u('2003')): u('40800.0735367688'), (u('Canada'), u('2004')): u('37857.1261134552'), (u('United States'), u('2005')): u('42714.8594790102'), (u('Canada'), u('2003')): u('37081.4575704003'), (u('United States'), u('2004')): u('41826.1728310667'), (u('Mexico'), u('2003')): u('72720.0691255285'), (u('Mexico'), u('2004')): u('74751.6003347038'), (u('Mexico'), u('2005')): u('76200.2154469437'), (u('Canada'), u('2005')): u('38617.4563629611')}, 'GDPPCKD': {(u('United States'), u('2003')): u('40800.0735367688'), (u('Canada'), u('2004')): u('34397.055116118'), (u('United States'), u('2005')): u('42714.8594790102'), (u('Canada'), u('2003')): u('33692.2812368928'), (u('United States'), u('2004')): u('41826.1728310667'), (u('Mexico'), u('2003')): u('7608.43848670658'), (u('Mexico'), u('2004')): u('7820.99026814334'), (u('Mexico'), u('2005')): u('7972.55364129367'), (u('Canada'), u('2005')): u('35087.8925933298')}}
34+
# Test a bad indicator with double (US), triple (USA),
35+
# standard (CA, MX), non standard (KSV),
36+
# duplicated (US, US, USA), and unknown (BLA) country codes
37+
38+
# ...but NOT a crash inducing country code (World bank strips pandas
39+
# users of the luxury of laziness, because they create their
40+
# own exceptions, and don't clean up legacy country codes.)
41+
# ...but NOT a retired indicator (User should want it to error.)
42+
43+
cntry_codes = ['CA', 'MX', 'USA', 'US', 'US', 'KSV', 'BLA']
44+
inds = ['NY.GDP.PCAP.CD','BAD.INDICATOR']
45+
46+
expected = {'NY.GDP.PCAP.CD': {('Canada', '2003'): 28026.006013044702, ('Mexico', '2003'): 6601.0420648056606, ('Canada', '2004'): 31829.522562759001, ('Kosovo', '2003'): 1969.56271307405, ('Mexico', '2004'): 7042.0247834044303, ('United States', '2004'): 41928.886136479705, ('United States', '2003'): 39682.472247320402, ('Kosovo', '2004'): 2135.3328465238301}}
4047
expected = pandas.DataFrame(expected)
41-
result = download(country=['CA', 'MX', 'US', 'junk'], indicator=['GDPPCKD',
42-
'GDPPCKN', 'junk'], start=2003, end=2005)
48+
expected.sort(inplace=True)
49+
result = download(country=cntry_codes, indicator=inds,
50+
start=2003, end=2004, errors='ignore')
51+
result.sort(inplace=True)
4352
expected.index = result.index
4453
assert_frame_equal(result, pandas.DataFrame(expected))
4554

55+
@slow
56+
@network
57+
def test_wdi_download_w_retired_indicator(self):
58+
59+
cntry_codes = ['CA', 'MX', 'US']
60+
# Despite showing up in the search feature, and being listed online,
61+
# the api calls to GDPPCKD don't work in their own query builder, nor
62+
# pandas module. GDPPCKD used to be a common symbol.
63+
# This test is written to ensure that error messages to pandas users
64+
# continue to make sense, rather than a user getting some missing
65+
# key error, cause their JSON message format changed. If
66+
# World bank ever finishes the deprecation of this symbol,
67+
# this nose test should still pass.
68+
69+
inds = ['GDPPCKD']
70+
71+
try:
72+
result = download(country=cntry_codes, indicator=inds,
73+
start=2003, end=2004, errors='ignore')
74+
# If for some reason result actually ever has data, it's cause WB
75+
# fixed the issue with this ticker. Find another bad one.
76+
except ValueError as e:
77+
error_raised = True
78+
error_msg = e.args[0]
79+
80+
self.assertTrue(error_raised)
81+
self.assertTrue("No indicators returned data." in error_msg)
82+
83+
# if it ever gets here, it means WB unretired the indicator.
84+
# even if they dropped it completely, it would still get caught above
85+
# or the WB API changed somehow in a really unexpected way.
86+
if len(result) > 0:
87+
raise nose.SkipTest
88+
89+
90+
91+
@slow
92+
@network
93+
def test_wdi_download_w_crash_inducing_countrycode(self):
94+
95+
cntry_codes = ['CA', 'MX', 'US', 'XXX']
96+
inds = ['NY.GDP.PCAP.CD']
97+
98+
try:
99+
result = download(country=cntry_codes, indicator=inds,
100+
start=2003, end=2004, errors='ignore')
101+
except ValueError as e:
102+
error_raised = True
103+
error_msg = e.args[0]
104+
105+
self.assertTrue(error_raised)
106+
self.assertTrue("No indicators returned data." in error_msg)
107+
108+
# if it ever gets here, it means the country code XXX got used by WB
109+
# or the WB API changed somehow in a really unexpected way.
110+
if len(result) > 0:
111+
raise nose.SkipTest
112+
46113
@slow
47114
@network
48115
def test_wdi_get_countries(self):
49116
result = get_countries()
50117
self.assertTrue('Zimbabwe' in list(result['name']))
51-
118+
self.assertTrue(len(result) > 100)
52119

53120
if __name__ == '__main__':
54121
nose.runmodule(argv=[__file__, '-vvs', '-x', '--pdb', '--pdb-failure'],
55-
exit=False)
122+
exit=False)
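
The two failure tests above check that a failed request raises `ValueError` with a specific message. As a hedged sketch of the same check, `assertRaises` used as a context manager keeps the exception in one place, so no name is left unbound whether or not the call raises. The `download` function below is a hypothetical stand-in that always fails (the real one needs network access), and `TestFailedResponse` is an illustrative name.

```python
import unittest


def download(country, indicator, start, end, errors="warn"):
    """Hypothetical stand-in for the real downloader: always fails."""
    raise ValueError("No indicators returned data.")


class TestFailedResponse(unittest.TestCase):

    def test_crash_inducing_country_code(self):
        cntry_codes = ['CA', 'MX', 'US', 'XXX']
        inds = ['NY.GDP.PCAP.CD']

        # The block fails the test if no ValueError is raised, and
        # exposes the exception afterwards as ctx.exception.
        with self.assertRaises(ValueError) as ctx:
            download(country=cntry_codes, indicator=inds,
                     start=2003, end=2004, errors='ignore')

        self.assertIn("No indicators returned data.", str(ctx.exception))


if __name__ == '__main__':
    unittest.main()
```

One design difference: this version fails if the call ever succeeds, whereas the tests in the diff intend to skip when data unexpectedly comes back. Which behaviour is preferable depends on how much the upstream API is expected to change.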
