-
Notifications
You must be signed in to change notification settings - Fork 34
Line endings #285
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Line endings #285
Conversation
I'm nervous about merging this unless/until I can get some indication of what caused the problem to suddenly emerge -- it's been 3 years since any of the |
Every couple years I spend a day or so trying to figure out Git line endings. It seems that every time I run into problems there is some magical incantation that pushes away the problem for another year or two. I'm pretty confident that the reason the problem just appeared for me is that this was my first clone of I believe that this change will prevent future text files from being committed with CRLF line endings. As a separate PR (#286) I've converted the 266 files I found with CRLF so that they have LF line endings in the repository and check out clean even if you don't have any Git configurations that should make the change. |
Well, it isn't the new machine or some global configuration! I went back to another machine that had
This showed that there are 266 files with CRLF in them, but none show as modified. My current suspicion is that the change isn't in the files (whether or not they have CRLF line endings), but in whether Git thinks that they are modified. This is my next line of inquiry. |
I found |
As @prestoncarman, noted elsewhere, this was probably introduced by #215. We tried an experiment where we did a clone of his fork (which predated that merge), and there were no changes. Then we did a clone of this repository, and there were changes. So, I think the reason it "suddenly" appeared is that we did a new clone and the |
You should read this: (if not already done) https://docs.github.com/en/github/using-git/configuring-git-to-handle-line-endings And also this page from the "Further reading" section: http://adaptivepatchwork.com/2012/03/01/mind-the-end-of-your-line/ You could also consider adding a .editorconfig: |
TL;DR:
My immediate thought is to favor Windows' preferred line endings, because in my head that works out to "the least risk of receiving contributions with lots of accidental whitespace changes" |
My understanding is that the line ending modification happens both at checkout and at commit. I believe the best approach is to convert to native on checkout (so unsophisticated Windows editors aren't confused) and convert to LF on commit so we have a uniform format that matches Linux. |
Git crlf handlingI have now spent several hours digging into the git source and documentation and have gotten a decent understanding of how git handles crlf translation. The TL;DR version: After a .gitattributes file is added to a git repository, from then on git will (eventually) always do eol normalization (e.g. to LF) of any text file. You can configure what the eol encoding should be in the work-dir, but that is independent of what is stored in the index/git repository. From https://git-scm.com/docs/gitattributes:
However, just the action of adding a And this adding of files might not be enough. The The git documentation also says
but that is not true. I testes with all combinations of the #!/bin/sh
set -e
for attr in text.no_eol text.eol_crlf text_auto.no_eol text_auto.eol_crlf
do
figlet -f big $attr 2>/dev/null || banner $attr 2>/dev/null || echo "##### $attr #####"
mkdir test.$attr
cd test.$attr
git init -b main
echo "== Initial commit =="
printf "echo line 1\r\n" > file1.bat
printf "echo line 1\n" > file2.bat
git add file1.bat file2.bat
git commit --quiet -m "Initial commit"
file file*.bat && git check-attr -a file*.bat && git ls-files --eol file*.bat
echo "== Add .gitattributes file =="
case $attr
in
text.no_eol)
echo "*.bat text" > .gitattributes
;;
text.eol_crlf)
echo "*.bat text eol=crlf" > .gitattributes
;;
text_auto.no_eol)
echo "*.bat text=auto" > .gitattributes
;;
text_auto.eol_crlf)
echo "*.bat text=auto eol=crlf" > .gitattributes
;;
esac
git add .gitattributes
git commit --quiet -m "Add .gitattributes file"
file file*.bat && git check-attr -a file*.bat && git ls-files --eol file*.bat
echo "== Renormalize =="
git add --renormalize .
git commit --quiet -m "Renormalize"
file file*.bat && git check-attr -a file*.bat && git ls-files --eol file*.bat
echo "== Modifications & diff =="
printf "echo line 2\r\n" >> file1.bat
printf "echo line 2\n" >> file2.bat
git diff
file file*.bat && git check-attr -a file*.bat && git ls-files --eol file*.bat
echo "== Second commit =="
git add file1.bat file2.bat
git commit --quiet -m "Second line"
file file*.bat && git check-attr -a file*.bat && git ls-files --eol file*.bat
echo "== Clear and reset work-dir =="
git rm --cached -r .
git reset --hard
file file*.bat && git check-attr -a file*.bat && git ls-files --eol file*.bat
cd ..
done This comment is already quite long so I am not including output from this, but this should be quick to run and does not have any dependency other than git (never than 2.16 from 2018, which added CRLF files in arduino_ciFrom the above I conclude that having files stored in the index/repository with crlf (i.e. state When starting to look into this my first attempt was to add entries like Currently anyone that clones the repository will start with an initial conflict right out of the box: $ git clone https://github.com/Arduino-CI/arduino_ci
Cloning into 'arduino_ci'...
remote: Enumerating objects: 4416, done.
remote: Total 4416 (delta 0), reused 0 (delta 0), pack-reused 4416
Receiving objects: 100% (4416/4416), 1.67 MiB | 2.71 MiB/s, done.
Resolving deltas: 100% (2812/2812), done.
$ cd arduino_ci/
$ cat .gitattributes
# Set the default behavior, in case people don't have core.autocrlf set.
* text eol=lf
$ git ls-files --eol | grep crlf | wc -l
266
$ git ls-files --eol | grep crlf | head -3
i/crlf w/crlf attr/text eol=lf cpp/arduino/avr/io1200.h
i/crlf w/crlf attr/text eol=lf cpp/arduino/avr/io2313.h
i/crlf w/crlf attr/text eol=lf cpp/arduino/avr/io2323.h
$ git ls-files --eol | grep crlf | tail -3
i/crlf w/crlf attr/text eol=lf cpp/arduino/avr/iox64d3.h
i/crlf w/crlf attr/text eol=lf cpp/arduino/avr/iox64d4.h
i/crlf w/crlf attr/text eol=lf cpp/arduino/avr/iox8e5.h
$ git ls-files --eol | grep crlf | grep -v cpp/arduino/avr/
$ ls cpp/arduino/avr/ | wc -l
275
$ git status | head
On branch master
Changes not staged for commit:
(use "git add <file>..." to update what will be committed)
(use "git restore <file>..." to discard changes in working directory)
modified: cpp/arduino/avr/iom64rfr2.h
modified: cpp/arduino/avr/iom8.h
modified: cpp/arduino/avr/iom8515.h
modified: cpp/arduino/avr/iom8535.h
modified: cpp/arduino/avr/iom88.h
modified: cpp/arduino/avr/iom88a.h
$ git status | tail
modified: cpp/arduino/avr/iox64a3u.h
modified: cpp/arduino/avr/iox64a4u.h
modified: cpp/arduino/avr/iox64b1.h
modified: cpp/arduino/avr/iox64b3.h
modified: cpp/arduino/avr/iox64c3.h
modified: cpp/arduino/avr/iox64d3.h
modified: cpp/arduino/avr/iox64d4.h
modified: cpp/arduino/avr/iox8e5.h
no changes added to commit (use "git add" and/or "git commit -a")
$ While the changes to The core of the problem is the combination of dos2unix cpp/arduino/avr/*.h
git add cpp/arduino/avr/*.h
git commit -m "Change eol encoding to enforce git repository text normalization" Then all files will be consistently This problem has been present since the @jgfoster Can you update this pull request to contain a commit that changes eol encoding as described above? |
Yes. |
Yes. They are separated into a PR that has substantive changes (this one) and one where there are hundreds of files modified, but the changes are all whitespace (changing CRLF to LF). I find that having one or two substantive changes mixed in with hundreds of non-substantive changes creates a poor signal-to-noise ratio. I like having a single PR that can be effectively ignored. I would suggest #286 be merged into main/master independently of anything else (and as soon as possible so I stop having hundreds of modified files in my checkout!). There will never be a time when we need to study that PR to figure out when a bug/feature was introduced and combining it into another PR (even a big one) will only create excess noise. (But that's just an explanation of my rationale; I'm delighted to have you merge it in any way you see fit!) |
Pull request #286 is sufficient to solve issue #283, and can preferably be completed right away. This (#285) is related, but not dependent. This PR could by the way be improved further by the way specifying a # https://git-scm.com/docs/gitattributes
# https://gitattributes.io/
* text=auto
*.ino text diff=cpp
*.c text diff=c
*.cc text diff=cpp
*.cxx text diff=cpp
*.cpp text diff=cpp
*.c++ text diff=cpp
*.hpp text diff=cpp
*.h text diff=c
*.h++ text diff=cpp
*.hh text diff=cpp |
Merging all this together, I get the following: # Set the default behavior, in case people don't have core.autocrlf set.
* text=auto eol=lf
# https://git-scm.com/docs/gitattributes
# https://gitattributes.io/
# Explicitly declare text files you want to always be normalized and converted
# to native line endings on checkout. Git would likely get these right, but
# we can be sure by adding them here.
*.ino text diff=cpp
*.c text diff=c
*.cc text diff=cpp
*.cxx text diff=cpp
*.cpp text diff=cpp
*.c++ text diff=cpp
*.hpp text diff=cpp
*.h text diff=c
*.h++ text diff=cpp
*.hh text diff=cpp
*.md text
*.yaml text
*.yml text
# Denote all files that are truly binary and should not be modified.
# Even if we don't have any of these, they make a good example.
*.png binary
*.jpg binary Does this look sane? If so I'll add it locally. |
This is different from what I originally proposed in that it explicitly sets |
|
OK, will do. I was paying attention to this comment (from @prestoncarman), which appeared to suggest that I was wrong to remove -* text eol=lf
+* text=off Just trying to cover all the comments that have been made. Are we all in favor of |
I interpreted Preston to be giving us some background. We were wondering why this problem "suddenly" occurred for several of us in a short period of time, and he pointed out that it was tied to an attempt to fix a Windows problem. I don't think he was claiming that the previous fix was correct, just that it explains why we hadn't seen the issue before. |
I think that the best practice is to store text files in LF format in the Git repository, check them out in native format (CRLF on Windows) so that simplistic native editors aren't confused (I'm looking at you, Notepad), and then return them to LF format during checkin so we don't have unnecessary diffs. But, now that you bring us back to #215, as I review it I'm not so sure of how this will play out. According to that PR, So, much as I want this whack-a-mole experience to be over, I'm not sure that we have a solution yet (though I think that fixing the existing CRLF files in #286 was still correct). The remaining issue is whether a Windows checkout should give you LF or CRLF (and how to handle rubocop if we pick CRLF). |
I have no opinion on # Set the default behavior, in case people don't have core.autocrlf set.
* text=auto
# https://git-scm.com/docs/gitattributes
# https://gitattributes.io/
# Explicitly declare text files you want to always be normalized and converted
# to native line endings on checkout. Git would likely get these right, but
# we can be sure by adding them here.
*.ino text diff=cpp
*.c text diff=c
*.cc text diff=cpp
*.cxx text diff=cpp
*.cpp text diff=cpp
*.c++ text diff=cpp
*.hpp text diff=cpp
*.h text diff=c
*.h++ text diff=cpp
*.hh text diff=cpp
*.md text
*.yaml text
*.yml text
# Denote all files that are truly binary and should not be modified.
# Even if we don't have any of these, they make a good example.
*.png binary
*.jpg binary |
Actually, the fact that the checks have passed suggests that rubocop will not complain about the change, so let's move ahead! |
I just tried to make some edits to your branch, and for some reason those took effect immediately instead of showing up as a PR against your branch... is that some magic of how forks work in GitHub, or am I a contributor on your repo? Apologies if these were unwanted. |
@hlovdal, I'm returning to a few of the open issues and in my GitHub view you have been invited to review this PR. I'm satisfied with the current files; how about you? If you agree then I'm merge the PR. Thanks! |
Possible fix for #283.