-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
[FIX] Handle decimal and thousand separator in 'round_trip' converer #35377
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
5020dfd
to
aab2f52
Compare
aab2f52
to
cc0c228
Compare
Hello @ales-erjavec! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found: There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻 Comment last updated at 2020-08-10 10:12:58 UTC |
cc0c228
to
e4bca08
Compare
e4bca08
to
9cf0755
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nice test cases. can you merge master & add a whatsnew note in 1.2, I/O section. ping on green.
@ales-erjavec can you also run the current csv asv's to see if perf is changed; we might need additional coverage for thousands sep as well. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks really good - great job progressing this along.
A few more questions / comments I think are minor in nature
pandas/_libs/src/parser/tokenizer.c
Outdated
p++; | ||
} | ||
// Copy the remainder of the string as is. | ||
memcpy(dst, p, length + 1 - (p - s)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you use strncpy here instead? Would make this slightly more readable and then you won't need to manually add a null byte
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
// Copy integer part dropping `tsep` | ||
while (isdigit_ascii(*p)) { | ||
*dst++ = *p++; | ||
p += (tsep != '\0' && *p == tsep); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does the null byte check here serve a purpose?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. tsep
is null when no thousands separator is defined. But the loop must not skip the terminating null byte in the input string (the second condition) which would happen on e.g. '123\0'
} else { | ||
*error = -1; | ||
if (q != NULL) { | ||
// p and pc are different len due to tsep removal. Can't report |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a test that hits this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. The "1,2e1_0" for instance. I.e something that is prefixed with a number but has extra trailing characters.
In case of non c-locale decimal and tsep, copy and fixup the source string before passing it to PyOS_string_to_double
Always make a copy of input string.
74be91c
to
8703ad6
Compare
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looks good - very nicely done. @jreback
thanks @ales-erjavec |
black pandas
git diff upstream/master -u -- "*.py" | flake8 --diff
In case of non c-locale decimal and tsep, copy and fixup the source string before passing it to PyOS_string_to_double