ENH: read_excel dtypes and converts #8212
Comments
you could add a |
Yes I think this is a good idea to handle coercion like that. Related #8272 |
Hi, I finally had some time to look into this. Our test suite (pandas/io/tests/test_excel.py, function 'test_reader_special_dtypes') shows that this is already implemented, via the 'converters' keyword argument to pandas.read_excel. The argument is passed all the way down to the parser: read_excel --> ExcelFile._parse_excel --> TextParser --> TextFileReader._clean_options/_make_engine. So we just need to document this. If that's OK with you folks, I'll make a minimal commit adding the parameter to the docstring(s) and open a pull request. |
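As a rough illustration of the 'converters' approach described above, a minimal sketch could look like the following. The column names are taken from the table in the original report; the helper `load_family_table` and the `CONVERTERS` dict are hypothetical names introduced here, and pandas (plus an Excel engine) is assumed to be installed.

```python
# Per-column converters: every cell of the named column is passed through the callable.
# Column names come from the example table in the original report.
CONVERTERS = {
    "People": int,
    "Mean size [cm]": float,
}


def load_family_table(path):
    """Read an Excel sheet, applying CONVERTERS to the matching columns.

    Hypothetical helper for illustration only; `path` is whatever
    spreadsheet the user wants to parse.
    """
    import pandas as pd  # imported lazily so the dict above is usable without pandas

    return pd.read_excel(path, converters=CONVERTERS)
```

Columns not listed in the dict are left to pandas' usual type inference, so only the named columns are affected.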
I think the I think for consistency would like to add In the meantime if you are wanting to document |
Actually, I tested ValueError: invalid literal for int() with base 10: '' This happens in So my suggestion: I close this bug by reversing the order of the operations: check for missing <--> convert dtype and then I add a docstring to read_excel for |
if a converter raises, that's on the user. pls submit a PR and I'll take a look |
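For the case where a converter raises on an empty cell (the `int('')` error quoted above), a user-side workaround might be to wrap the converter so that missing values never reach it. This is only a sketch: `missing_safe` is a hypothetical helper, not part of pandas, and treating `None`, NaN and blank strings as missing is an assumption.

```python
import math


def missing_safe(convert, default=math.nan):
    """Wrap `convert` so that missing cells return `default` instead of raising.

    Assumption: a cell counts as missing if it is None, a float NaN,
    or a string that is empty after stripping whitespace.
    """

    def wrapper(value):
        if value is None:
            return default
        if isinstance(value, float) and math.isnan(value):
            return default
        if isinstance(value, str) and value.strip() == "":
            return default
        return convert(value)

    return wrapper


# Example: an int converter that tolerates empty cells.
to_int = missing_safe(int)
```

Such a wrapper could then be passed as, for example, `converters={"People": missing_safe(int)}`. A column containing both integers and NaN will end up with mixed values, so the caller still has to decide how to treat the gaps.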
Hi,
with alice's age missing, pandas does this:
You will agree that this behaviour is not very friendly... but let me code it and then you can have a look. |
Thanks for the clear explanation @iosonofabio :-) |
Dear all,
Here from the mailing list: https://groups.google.com/forum/#!topic/pydata/jKiPOvYUQ1c
I have an excel table about family ages like this
and I would like to use read_excel to parse it into Python. I would like "People" to be read as an integer, "Mean size [cm]" as a float. (And "Family" as a string, but that might be a different issue.) Now:
Neither one is correct, for a stupid reason: there happen to be those .0 in all sizes! So I would like to specify something like:
so only that column gets converted. An even better solution would be to be explicit about types of some columns, letting pandas perform the automagic for the others, such as:
but this changes the signature of the function more significantly.
Are you folks in favour of any of this? If yes, I can take a look and try to code it.