Skip to content

Store raw path bytes in Diff instances #474

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Jun 20, 2016
Merged

Conversation

nvie
Copy link
Contributor

@nvie nvie commented Jun 14, 2016

Previously, the following fields on Diff instances were assumed to be passed in as unicode strings:

  • a_path
  • b_path
  • rename_from
  • rename_to

However, since Git natively records paths as bytes, these may potentially not have a valid unicode representation.

This patch changes the Diff instance to instead take the following equivalent fields that should be raw bytes instead:

  • a_rawpath
  • b_rawpath
  • raw_rename_from
  • raw_rename_to

NOTE ON BACKWARD COMPATIBILITY:
The original a_path, b_path, etc. fields are still available as properties (rather than slots). These properties now dynamically decode the raw bytes into a unicode string (performing the potentially
destructive operation of replacing invalid unicode chars by "�"'s).

This means that all code using Diffs should remain backward compatible. The only exception is when people would manually construct Diff instances by calling the constructor directly, in which case they should now pass in bytes rather than unicode strings.

See also the discussion on #467

Previously, the following fields on Diff instances were assumed to be
passed in as unicode strings:

  - `a_path`
  - `b_path`
  - `rename_from`
  - `rename_to`

However, since Git natively records paths as bytes, these may
potentially not have a valid unicode representation.

This patch changes the Diff instance to instead take the following
equivalent fields that should be raw bytes instead:

  - `a_rawpath`
  - `b_rawpath`
  - `raw_rename_from`
  - `raw_rename_to`

NOTE ON BACKWARD COMPATIBILITY:
The original `a_path`, `b_path`, etc. fields are still available as
properties (rather than slots).  These properties now dynamically decode
the raw bytes into a unicode string (performing the potentially
destructive operation of replacing invalid unicode chars by "�"'s).
This means that all code using Diffs should remain backward compatible.
The only exception is when people would manually construct Diff
instances by calling the constructor directly, in which case they should
now pass in bytes rather than unicode strings.

See also the discussion on
#467
@nvie
Copy link
Contributor Author

nvie commented Jun 14, 2016

@Byron What do you think of this?

@Byron Byron added this to the v2.0.6 - Bugfixes milestone Jun 20, 2016
@Byron
Copy link
Member

Byron commented Jun 20, 2016

I love it!

Even though it would have the potential to break people who did manually create Diff instances, I believe in this case, the benefits of the many should outweigh the ones of the few.

Thanks for your contribution, again :) !

@Byron Byron merged commit e9405ac into master Jun 20, 2016
@nvie
Copy link
Contributor Author

nvie commented Jun 20, 2016

Cheers, I've just released 2.0.6 to PyPI — thanks!

@nvie nvie deleted the keep-raw-bytes-on-diffs branch June 20, 2016 07:12
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Development

Successfully merging this pull request may close these issues.

2 participants