Encoding failures with unicode in paths in PY2 #35
Comments
If I understand you correctly, GitDB needs the same fix that is already present in GitPython. To keep things minimal, what do you think about moving this implementation to GitDB and re-using it in GitPython to avoid unnecessary duplication? As an anecdote from the past: back in the day, when I ported GitPython from py2 to py2|3, I started out using libraries which were supposed to make it easy, and pulled them in as dependencies. Even though I ended up with a version that worked, performance was crippled. |
No, I'm not suggesting future just for the surrogateescape. Besides these 2 types, it has a lot of other backports to offer (eg I also hate proliferation of dependencies, but given that this project needs a great effort for PY2/3 compatibility, I believe it will pay off in the end. But of course we can keep things as they are, and I'm really running out of time for this. So if you do not want to support this, no problem at all. |
Back in the day, when I tried one of these compatibility projects (I am vague, as I don't know if it was future or not), these things were relatively new. It could very well be that the negative performance implications are negligible. You mentioned that it solves issues with For all I am saying: If you think it will solve major problems and make everything easier, please feel free to bring it in. I would prefer this to happen in a PR though, just to keep an eye on the performance tests. |
I wish it were that simple. Now the typical way to fix PY2/3 encoding problems is this:
But if you miss an entry point, mixing will still happen in PY2 (PY3 screams). And you won't know it, unless you feed This is where future steps in, and forbids mixing in the first place. Frankly, I cannot get involved in this now; my attention is on the "leaks" of these libraries. It is nevertheless good to have discussed it, in case you or anyone else finds the time to implement such a strict separation. |
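To make the mixing problem concrete, here is a minimal Python 3 sketch (it is not the snippet from the comment above). The `to_text` and `join_mixed` helpers and the default encoding are hypothetical, chosen only for illustration: the first decodes bytes at a single entry point, and the last lines show Python 3 rejecting a bytes/text mix, which Python 2 would instead try to coerce implicitly.

```python
def to_text(value, encoding="utf-8"):
    """Hypothetical entry-point helper: return text, decoding bytes if needed."""
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value


def join_mixed(prefix, name):
    """Concatenate without converting first; raises in Python 3 if types differ."""
    return prefix + name


# Converting at the entry point keeps everything as text afterwards.
joined = to_text(b"repo/") + to_text("caf\u00e9")

# Skipping the conversion is caught immediately by Python 3.
try:
    join_mixed(b"repo/", "caf\u00e9")
    mixing_rejected = False
except TypeError:
    mixing_rejected = True
```

The point of the strict separation discussed here is that a missed entry point fails loudly, as in the `try` block, rather than only when non-ASCII data happens to flow through.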
@Byron can you explain why these lines compare to the same value twice?: |
@ankostis I believe this is a bug. The first line, which takes the last 20 bytes of the file, actually reads the index file checksum. However, the 20 bytes before those are indeed the packfile checksum.
The text above was copied from the git pack file docs. |
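The trailer layout described above can be sketched as follows. This is an illustration only, not GitDB's implementation: the function names are hypothetical, and it assumes SHA-1 repositories, where each checksum is 20 bytes and the index checksum covers everything that precedes it in the index file.

```python
import hashlib

HASH_LEN = 20  # size of a SHA-1 digest in bytes (assumes a SHA-1 repository)


def read_index_trailer(index_bytes):
    """Return (packfile_checksum, index_checksum) from raw index file content."""
    if len(index_bytes) < 2 * HASH_LEN:
        raise ValueError("index data too short to contain both checksums")
    # The 20 bytes before the final 20 are the packfile checksum ...
    packfile_checksum = index_bytes[-2 * HASH_LEN:-HASH_LEN]
    # ... and the final 20 bytes are the index file's own checksum.
    index_checksum = index_bytes[-HASH_LEN:]
    return packfile_checksum, index_checksum


def index_checksum_is_valid(index_bytes):
    """Check that the trailing digest equals SHA-1 of everything before it."""
    _, index_checksum = read_index_trailer(index_bytes)
    return hashlib.sha1(index_bytes[:-HASH_LEN]).digest() == index_checksum
```

Reading both values through one helper makes it harder to compare the same 20 bytes twice, which is the kind of slip suspected above.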
Can this be closed now that support for EOL Python 2 has been dropped? df73d7f |
Thanks for reviving this one to close it forever! |
Having python-2 working correctly (both on Linux and Windows) with unicode in filepaths (and process-streams, but I don't think there are processes used in this project) is increasingly difficult, e.g. when running with unicode in the TEMPDIR env-var, as seen in this travis job, and discovered initially in gitpython issue #543.
The way to deal with such issues comes from PY3, using pep383 and the errors='surrogateescape' error-handler when decoding filepaths and process-streams. Unfortunately, in python-2 there is no such codec error-handler. Happily, the future project provides a backported implementation.
I suggest we start depending on this future project for all git-python projects; in the end, a lot of compatibility code can be substituted. But note that the POV of this project is the opposite of 2to3: you write PY3 code and you make sure it runs on PY2.
In any case, PY3 is the future...
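A minimal Python 3 sketch of the pep383 behaviour described above, using only the built-in codecs. The file name is made up for illustration, and the sketch does not cover the future backport for python-2.

```python
raw_name = b"caf\xe9.txt"  # Latin-1 encoded bytes, not valid UTF-8

# Undecodable bytes are mapped to lone surrogates instead of raising.
text_name = raw_name.decode("utf-8", errors="surrogateescape")

# Encoding with the same handler restores the original bytes exactly.
restored = text_name.encode("utf-8", errors="surrogateescape")

# A strict decode of the same bytes fails, which is the failure being avoided.
try:
    raw_name.decode("utf-8")
    strict_ok = True
except UnicodeDecodeError:
    strict_ok = False
```

The round trip is what matters for filepaths: the text form can be passed around and later turned back into exactly the bytes the filesystem handed out.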