Skip to content

Remove unnecessary casts in tokenizer.c #37792

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 2 commits into from
Nov 13, 2020

Conversation

WillAyd
Copy link
Member

@WillAyd WillAyd commented Nov 12, 2020

Some of these are visual clutter, some are actual bugs though in unlikely cases (see the format specifiers)

https://stackoverflow.com/questions/605845/do-i-cast-the-result-of-malloc

}
} else {
*error = -1;
if (q != NULL) {
// p and pc are different len due to tsep removal. Can't report
// how much it has consumed of p. Just rewind to beginning.
*q = (char *)p;
*q = (char *)p; // TODO: this could be undefined behavior
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We probably just need better types here. I don't think q is ever actually mutated, but if it is this would be undefined behavior as the contents of p are qualified as const

@WillAyd WillAyd added the Clean label Nov 12, 2020
Copy link
Contributor

@jreback jreback left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

seems fine, do we have lesser warnings now?

@@ -600,7 +599,7 @@ static int parser_buffer_bytes(parser_t *self, size_t nbytes) {
TRACE(("PUSH_CHAR: ERROR!!! slen(%d) >= stream_cap(%d)\n", slen, \
self->stream_cap)) \
int64_t bufsize = 100; \
self->error_msg = (char *)malloc(bufsize); \
self->error_msg = malloc(bufsize); \
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

extra whitespace

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is by design to keep all of the slashes right aligned

@jreback jreback added this to the 1.2 milestone Nov 13, 2020
@jreback jreback added the IO CSV read_csv, to_csv label Nov 13, 2020
@WillAyd
Copy link
Member Author

WillAyd commented Nov 13, 2020

There are no warnings now because the casts would suppress them, but this fixes a bug when the number of rows in a CSV file exceeds the size of a long long

@jreback jreback merged commit fe6d55c into pandas-dev:master Nov 13, 2020
@jreback
Copy link
Contributor

jreback commented Nov 13, 2020

thanks

@rhshadrach rhshadrach mentioned this pull request Jun 1, 2021
1 task
@WillAyd WillAyd deleted the tokenizer-casting-cleanup branch April 12, 2023 20:17
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Clean IO CSV read_csv, to_csv
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants