Add keep_whitespace and whitespace_chars to read_fwf #51577

Closed
RonaldBarnes wants to merge 10 commits

@RonaldBarnes RonaldBarnes commented Feb 23, 2023

RonaldBarnes and others added 6 commits February 23, 2023 00:55
…lowing

more control over handling of whitespace in fields and removing the
requirement to specify a `delimiter` in order to preserve whitespace. (pandas-dev#51569)

Signed-off-by: Ronald Barnes <[email protected]>
@RonaldBarnes
Author

This PR adds 2 options to read_fwf: keep_whitespace and whitespace_chars.

In this version, keep_whitespace (bool | tuple[bool, bool]) defaults to False, which matches current behaviour as closely as possible and reduces the changes required in the integration test file.

However, that default still modifies data while it is being read in. At least the expected behaviour is now clearly documented.

Personally, I'd prefer a default of (True,False), which preserves leading space and strips trailing space - which seems a reasonable compromise.

The tuple represents (leading, trailing) whitespace. This is an enhancement that gives finer-grained control over which side of a field is stripped.

The whitespace_chars option lets the user define which character(s) count as whitespace to be stripped from fields.
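
To make the intended semantics concrete, here is a minimal pure-Python sketch of how the two options could behave on a single field. It is not the code in this PR and it does not call pandas: `strip_field` and `split_fixed_width` are hypothetical helper names used only for illustration, and the `" \t"` default for whitespace_chars is an assumption.

```python
from __future__ import annotations


def strip_field(
    value: str,
    keep_whitespace: bool | tuple[bool, bool] = False,
    whitespace_chars: str = " \t",  # assumed default, for illustration only
) -> str:
    """Strip one fixed-width field (hypothetical helper, not pandas API)."""
    # A single bool applies to both sides; a tuple is (leading, trailing).
    if isinstance(keep_whitespace, bool):
        keep_leading = keep_trailing = keep_whitespace
    else:
        keep_leading, keep_trailing = keep_whitespace

    if not keep_leading:
        value = value.lstrip(whitespace_chars)
    if not keep_trailing:
        value = value.rstrip(whitespace_chars)
    return value


def split_fixed_width(
    line: str,
    colspecs: list[tuple[int, int]],
    **kwargs,
) -> list[str]:
    """Slice a line by half-open (start, stop) column specs, then strip each field."""
    return [strip_field(line[start:stop], **kwargs) for start, stop in colspecs]


if __name__ == "__main__":
    line = "  ab  __cd__"
    colspecs = [(0, 6), (6, 12)]
    # Strip both sides (the proposed default of False).
    print(split_fixed_width(line, colspecs))
    # Keep leading, strip trailing: the (True, False) compromise.
    print(split_fixed_width(line, colspecs, keep_whitespace=(True, False)))
    # Treat underscores as the characters to strip.
    print(split_fixed_width(line, colspecs, whitespace_chars="_"))
```

Under these assumptions, `strip_field("  ab  ", keep_whitespace=(True, False))` returns `"  ab"`, which is the leading-preserved, trailing-stripped result argued for above.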

@RonaldBarnes
Author

Hello @phofl,

I'm curious what you think of this approach to resolving the issue. You chimed in on a previous PR that documented the existing behaviour, saying you would prefer a fix instead.

Thanks!

@github-actions
Contributor

github-actions bot commented Apr 1, 2023

This pull request is stale because it has been open for thirty days with no activity. Please update and respond to this comment if you're still interested in working on this.

@github-actions github-actions bot added the Stale label Apr 1, 2023
@mroeschke
Member

Thanks for the pull request, but it appears to have gone stale. If interested in continuing, please merge in the main branch, address any review comments and/or failing tests, and we can reopen.

@mroeschke mroeschke closed this May 1, 2023
@RonaldBarnes
Author

> Thanks for the pull request, but it appears to have gone stale. If interested in continuing, please merge in the main branch, address any review comments and/or failing tests, and we can reopen.

Hi @mroeschke,

I've merged in with main... Should I open a new PR or can we re-open this one?

Thank you...

@mroeschke mroeschke reopened this May 11, 2023
@mroeschke
Member

Thanks, I reopened this pull request.

@RonaldBarnes
Author

 Finished codemodding 167 files!
 - Transformed 167 files successfully.
 - Skipped 0 files.
 - Failed to codemod 0 files.
 - 0 warnings were generated.

Error: Process completed with exit code 1.

I don't understand the issue here: no warnings and 0 files failed, yet the exit code indicates an error?

About 450 lines into the report on the failed test:

pyright reportGeneralTypeIssues............................................Failed
- hook id: pyright_reportGeneralTypeIssues
- duration: 24.53s
- exit code: 1

Loading configuration file at /home/runner/work/pandas/pandas/pyright_reportGeneralTypeIssues.json
Assuming Python version 3.10
Assuming Python platform Linux
Searching for source files
Found 239 source files
pyright 1.1.292
/home/runner/work/pandas/pandas/pandas/_libs/__init__.py
  /home/runner/work/pandas/pandas/pandas/_libs/__init__.py:18:6 - warning: Import "pandas._libs.interval" could not be resolved from source (reportMissingModuleSource)

Is this a "me" thing, or an issue with the CI/CD tests?

Thank you for any pointers that I can use to resolve this.

@mroeschke
Member

Thanks for the pull request, but it appears to have gone stale. Additionally, based on the original discussion in #49832, this probably needs more discussion on the original issue (from a core dev familiar with read_fwf), so closing for now.

@mroeschke mroeschke closed this Jul 7, 2023
Development

Successfully merging this pull request may close these issues.

BUG: read_fwf modifies / corrupts object (string) whitespace data
3 participants