-
-
Notifications
You must be signed in to change notification settings - Fork 18.5k
ENH: Add ISO3 ctry codes and error arg. Fix tests, warn/exception logic #8482 #8551
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
country_codes = {} | ||
|
||
country_codes['ISO 3166-1 alpha-3'] = \ | ||
{'ABW' : 'Aruba', |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why is this map actually necessary, aren't you just validating against the codes?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The map, is basically just there for reference. Improve maintainability.
The issue is that World Bank says one thing, then does another. They say "ISO", but then add/adapt that, as they see fit sporadically. Then, even the ISO standards, don't do a great job of creating a static standard. Because countries squabble and struggle to find a common ground, they create exceptions, borders change, it's dynamic.
This way, we hardcode the "latest" World Bank Standard, but we give the option to the users to work around World Bank's flakey logic if changes.
See "KSV", for example. Not in the ISO standard, but its the only way to get data for Kosovo. The user would set safe_cc_only to False, and be able to get data, without the available warnings.
Also, users could very easily adjust the list, and easily look up country codes in wb.io.country_codes.
I think this should be much simpler. Just take the input codes and validate against the 2 or 3 code map. Prob best to simply raise if their are codes that are not valid for the specified map (and make sure its only against the 2 OR 3 map). |
Question 1
Did you mean, that we should remove the 2-digit support? Wouldn't that break backwards compatibility? Surely, this isn't what you meant. Question 2Should I create a 0.15.1.txt? Confusion? Duplicated effort? Separate PR? Comment 1b0ed056 removes the maps, and safe_cc_only, as requested. I'll leave it as a separate commit, until you or somebody else acknowledges the dynamism that World Bank introduces to the ISO "standard". If you acknowledge the example of KSV, or other unknowable exceptions, and want to still stick to the more hard coded list, that's okay by me. |
if len(bad_countries) > 0: | ||
print('Invalid ISO-2 codes: %s' % ' '.join(bad_countries)) | ||
bad_countries = '; '.join(bad_countries) | ||
print('Invalid ISO country codes: %s' % bad_countries) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
raise as a ValueError
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a mistake, should be a warning at most. If I raise an exception, the user will have no way of accessing world-bank custom non-ISO codes. Eg. KSV.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fine, then really no reason to even validate the codes AT ALL.
If the returned values are not there, just make them NaN
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Except, you want to validate, because if you request something like, say, UK, instead of GB (No idea what actually happens, or which one is right), how does one check that they got "the right" country? World bank does tricky things. For instance, request a stale country code, and they'll they send a response back with the updated new one. A user could manually look at the response from World Bank, but automating that check is very tricky, if you don't start with a known list of country codes.
The static, hard coded, map, works/worked really really well. Exceptionally, explicitly, well.
For the sake of a few hundred, really, really, easy to understand lines of code (the map I already deleted), you're pushing to make pandas, and any user's code, less maintainable, as well as the response from World Bank, and the notifications Pandas provides with it, less explicit than I managed to create.
I'm a rookie at git, and the etiquette on github, but not logic. I've reviewed the problem of adding 3 digit country codes, very thoroughly. Both reading World Bank, looking at their API, and referencing Wikipedia.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
oh, not concerned about the length of the code :)
I get that you want to validate the user input, and it is fine. However what do you want to do with an error is the question? I really don't like warnings, they don't really make someone pay attention, a nan
is the generally accepted soln. However, since validation is a good thing, I would then raise ValueError
if the code does not pass muster.
Another possibility is to add a keywork argument, e.g. errors='raise'
(and allow 'warn','nan') for the other cases) if you are ambituous, this makes it pretty general (lots of pandas top-level functions do this, e.g. pd.to_datetime
)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like the errors
argument. I can work with that. (But 'nan' can't work. We can't know now, what World Bank might return tomorrow.)
I didn't like the messages as they were either. They are amateurish.
If I have permission to change the warning/message logic, then
this..
Old Default Behaviour (v0.14)
- Validate [against list]
- Drop Invalid From Request
- Dumb Print Warnings
...could become (maintain 0.14 logic):
New Default Behaviour (errors='warn'
) ['KSV' dropped w/Warning]
- Validate Input [against map]
- Drop Invalid From Request
- Warning('Dropped Country Code(s)')
Force Query (errors='ignore'
) ['KSV' works]
- Blind Include Every Input Code in Request
Real Validity Check (errors='raise'
) ['KSV' throws exception]
- Validate [against map]
- Exception('Invalid Country Code(s)')
...but this would be even better:
New Default Behaviour (errors='warn'
) ['KSV' works, but throws Warning]
- Blind Include Every Input Code in Request
- Warning('Non-Standard Country Code(s)')
Force Query (errors='ignore'
) ['KSV' works]
- Blind Include Every Input Code in Request
Real Validity Check (errors='raise'
) ['KSV' throws exception]
- Validate [against map]
- Exception('Invalid Country Code(s)')
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
your last case looks fine (see what happens with the existing tests) - eg plug it in and see what if anything fails
@jnmclarty I meant you can support both 2 and 3 digit country codes (though was sort of -1 on allowing the mixing, but on 2nd thought that is unambiguous) will be for 0.15.1 (the file doesn't exist yet), check back this weekend |
that sounds fine then |
Wait, what? What "sounds fine"? a) leave it the old way. using json.loads + Dataframe + hacks or b) using json.loads + json_normalize + hacks? Edit: I'm indifferent. Technically, B, is probably more robust, cause the hacks are better, but harder to read. |
oh the old way is fine - least hacks as possible if u would like to clean that part up ok! can do in another pr if u would like |
So, there was one more surprise I stumbled on during testing. In all of my previous posts, and up until and including 80dd8fd, I thought there were three kinds of country codes:
Turns out, there are 4.
My code, and logic, now handles all of it, IMO, extremely well and passes things appropriately to the user based on ignore/warn/raise at both the country code and indicator level. |
bad_ind_msgs = "\n\n".join(bad_ind_msgs) | ||
bad_ind_msgs = "\n\nInvalid Indicators:\n\n%s" % bad_ind_msgs | ||
if errors == 'raise': | ||
raise Exception(bad_ind_msgs) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
raise ValueError
rather than an Exception
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
raise ValueError
Done. On this one, and the other spots where relevant. cfd1baa
you'll need to squash the commits to 1-2 https://github.com/pydata/pandas/wiki/Using-Git add a doc note in v0.15.1 (enhancements section), referencing the issue |
I suspect some tests were failing and the prior author didn't figure it out - or they were do an odd data comparison that then was failing whatever u can fix would be gr8 u can delete them (rather than comment out) as u fix |
I already had fixed the issues with the tests. Obviously, I rolled in things that test this ENH. |
Done. 7d11d12 |
Doc note and squash is last thing I need to do...stay tuned. |
try: | ||
result = download(country=cntry_codes, indicator=inds, | ||
start=2003, end=2004, errors='ignore') | ||
# If for some reason result actually ever has data, it's cause WB |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
so let's maybe drop this test (or does it work 'sometimes'?)
if it usually works but fails ocassionaly, the raise nose.SkipTest
on the failure (rather than asserting anything)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I understand your logic. But because I think I do, and cause of my note below, I did 37f90b7 instead.
It will fail consistently, it's just a question of which exception catches it. Since the messages to both contain the same string, and are both ValueErrors, this test should always "work".
The only thing that would cause the test to stop working would be if WB unretired the indicator or their API changed.
More Info:
The indicator is a retired one. If World Bank removed it, from their indicator list, it would get caught at line 161 (all indicators failed), rather than where it gets caught now at 201 (single indicator query failure).
@jnmclarty looking good. just some minor changes, doc note, and squash |
@jreback I commented on two now "outdated" diffs. Do you see those? Or are they easy to miss? |
ok, pls rebase / squash and let me have a final look |
You can merge now (?) |
There are at least 4 kinds of country codes: | ||
|
||
1. Standard (2/3 digit ISO) - returns data, will warn and error properly. | ||
2. Non-standard (WB Exceptions) - returns data, but will falsely warn. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jorisvandenbossche how is this doc formatting?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I made a rendering, after some rst syntax fixes. Posted the PNG below.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is fine
@jnmclarty you are doing gr8 actually! I pretend that I am a user reading your comments and would be 1) no nothing, but this looks cool, 2) use this feature and care about changes. So have to appeal to both, with the 2nd being more important. You basically want to tell them hey if you are doing this, it changed, so now do this. |
a3fb999
to
2811141
Compare
Squashed everything. Only changed things identified by @jreback , and some syntax stuff related to rst. Nothing beyond that changed since the last time I squashed. I'm having issues with my stack/sphinx, I can't render the entire docs. If somebody ( @jorisvandenbossche ?) could render the entire docs, and take a look at whatsnew and remote_data, as well as the docstring to |
.. note:: | ||
|
||
The World Bank's country list and indicators are dynamic. As of 0.15.1, | ||
:func:``wb.download()`` is more flexible. To achieve this, the warning |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This has to be single backtick around the function: :func:
io.wb.download()``
Formatting of the docs is looking OK, apart from the one comment I gave. But another thing: you added a note in the API changes section, but normally we shouldn't do behaviour changing API changes for 0.15.1. |
this is one of those so it's an enhancement (but I usually put these in api change section) |
I tried to improve clarity in whatsnew, and fixed the single/double quote thing, then re-squashed. |
|
||
|
||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Two minor comments:
- try not to add trailing whitespace as in this line (normally you should be able to configure your editor to remove that automatically on saving, or should there be a menu action to remove all trailing whitespace in the file)
- try to hard wrap the lines (in principle to 80 chars, but 80-100 is also OK here I think), otherwise this is very difficult to review on github (as the line above)
OK on the API change, it sounds indeed more like a bug fix/enhancement. |
@jorisvandenbossche yep, should be in the Enhancements section (we put adding args stuff their right?) @jnmclarty can you move to Enhancements then |
Confused, I had split it, and put a note in each. (ie, I had a note for the errors in API, and functional change in Enhancements) ...but, I did what you asked. |
...and resquashed. |
@jnmclarty thanks for this! pls take a look at the built docs: http://pandas-docs.github.io/pandas-docs-travis/ (could be a little while before they are actually built); if necessary pls do a follow-up PR if docs not clear. |
This PR adds ISO 3-digit country codes, support for World Bank Country Code exceptions, by changing the error handling and warning logic and introducing an
error
argument which toggles between ignore/warn/raise . Validation no longer filters bad country codes, prior to submitting the API request. Instead, it attempts to catch errors in the WB response - which is a very hairy thing. See notes added in remote_data.tst for issues.During this PR, we attempted to clean up the JSON response, and make the module as simple as possible, without restricting the user. In the end, it proved somewhat awkward, but the solution is a good one given the constraints of the WB API.
closes #8482 .
Updated right before merging to remove noise, and provide a summary.