BUG: Missing value code not recognised for Stata format version 105 a… #59325

Merged
merged 2 commits into from
Jul 26, 2024

Conversation

cmjcharlton
Contributor

…nd earlier

This is an initial attempt at fixing this bug, although a more efficient implementation probably exists.

As pandas does not write any of these format versions, I have simply replaced instances of the old missing value code with the current one when reading these versions, and then carried on as before.

I believe that the range of valid values for float/double also changed when the missing value code changed. However, as the range was widened, any existing files written in these formats should still be within the current range and will therefore be read correctly. If a file in an older format version is created with values outside the documented range, the current version of Stata (18) reads them as valid values rather than converting them to missing, so I think the behaviour here is consistent with that.
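To illustrate why the recoding is needed, here is a minimal stdlib-only sketch comparing the two missing value codes. The old code `0x1.0p333` is taken from the diff. The current code is assumed here to be `0x1.0p1023`; in pandas it would come from the reader's `MISSING_VALUES["d"]` rather than a literal. The names `OLD_MISSING_DOUBLE`, `CURRENT_MISSING_DOUBLE` and `double_bytes_le` are hypothetical and used only for illustration.

```python
import math
import struct

# Missing value code for doubles in format 105 and earlier (from the PR diff).
OLD_MISSING_DOUBLE = float.fromhex("0x1.0p333")

# Assumption: the current missing value code for doubles is 2**1023.
# In pandas this would be read from MISSING_VALUES["d"], not hard-coded.
CURRENT_MISSING_DOUBLE = float.fromhex("0x1.0p1023")


def double_bytes_le(value: float) -> str:
    """Return the little-endian IEEE 754 byte pattern of a double as hex."""
    return struct.pack("<d", value).hex()


old_bits = double_bytes_le(OLD_MISSING_DOUBLE)
new_bits = double_bytes_le(CURRENT_MISSING_DOUBLE)

# The old code is an ordinary finite double below the current code, so a
# reader that only knows the current code would treat it as valid data.
old_looks_like_data = (
    math.isfinite(OLD_MISSING_DOUBLE)
    and OLD_MISSING_DOUBLE < CURRENT_MISSING_DOUBLE
)
```

Under that assumption, `old_looks_like_data` is true, which is the situation the recoding step addresses.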

@cmjcharlton cmjcharlton marked this pull request as ready for review July 26, 2024 13:29
            # recode instances of this to the currently used value
            if self._format_version <= 105 and fmt == "d":
                data.iloc[:, i] = data.iloc[:, i].replace(
                    float.fromhex("0x1.0p333"), self.MISSING_VALUES["d"]
Member

Nit: Could you define float.fromhex("0x1.0p333") outside the loop?

Contributor Author

Yes, that should be straightforward. I'll make that change.
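A rough sketch of what hoisting the constant out of the loop could look like, written without pandas so that it is self-contained. This is not the merged implementation: `recode_old_missing` is a hypothetical helper, columns are plain lists rather than DataFrame columns, and `CURRENT_MISSING_DOUBLE` stands in for `self.MISSING_VALUES["d"]` under the assumption that it equals 2**1023.

```python
# Computed once, outside the per-column loop, as suggested in the review.
OLD_MISSING_DOUBLE = float.fromhex("0x1.0p333")

# Assumption: stand-in for self.MISSING_VALUES["d"] in the Stata reader.
CURRENT_MISSING_DOUBLE = float.fromhex("0x1.0p1023")


def recode_old_missing(columns, formats, format_version):
    """Hypothetical helper: recode the old double missing code per column.

    columns: list of lists of values, one inner list per column.
    formats: list of type codes, "d" marking a double column.
    format_version: Stata format version of the file being read.
    """
    # Newer format versions already use the current code; nothing to do.
    if format_version > 105:
        return columns

    recoded = []
    for values, fmt in zip(columns, formats):
        if fmt == "d":
            # Only double columns can contain the old code.
            values = [
                CURRENT_MISSING_DOUBLE if v == OLD_MISSING_DOUBLE else v
                for v in values
            ]
        recoded.append(values)
    return recoded


old_file = recode_old_missing([[1.0, OLD_MISSING_DOUBLE], [3, 4]], ["d", "l"], 105)
new_file = recode_old_missing([[1.0, OLD_MISSING_DOUBLE]], ["d"], 113)
```

Here `old_file` has the old code replaced in the double column only, while `new_file` is returned unchanged because the format version is above 105.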

@mroeschke mroeschke added the IO Stata read_stata, to_stata label Jul 26, 2024
@mroeschke mroeschke added this to the 3.0 milestone Jul 26, 2024
@mroeschke mroeschke merged commit 5af55e0 into pandas-dev:main Jul 26, 2024
41 of 46 checks passed
@mroeschke
Member

Thanks @cmjcharlton
