List: git
Subject: Re: [BUG?] iconv used as textconv, and spurious ^M on added lines on Windows
From: Jakub_Narębski <jnareb () gmail ! com>
Date: 2017-03-31 19:44:15
Message-ID: bbd60ab1-1309-6b1e-9b7f-09764bab5ccd () gmail ! com
On 31.03.2017 at 14:38, Torsten Bögershausen wrote:
> On 30.03.17 21:35, Jakub Narębski wrote:
>> Hello,
>>
>> Recently I had to work on a project which uses legacy 8-bit encoding
>> (namely cp1250 encoding) instead of utf-8 for text files (LaTeX
>> documents). My terminal, that is Git Bash from Git for Windows is set
>> up for utf-8.
>>
>> I wanted "git diff" and friends to produce something sane on said
>> utf-8 terminal, instead of mojibake. There is the 'encoding'
>> gitattribute... but it works only for the GUI ('git gui', that is).
>>
>> Therefore I have (ab)used the textconv facility to convert from the
>> cp1250 encoding of the files to the utf-8 encoding of the console.
>>
>> I have set the following in .gitattributes file:
>>
>> ## LaTeX documents in cp1250 encoding
>> *.tex text diff=mylatex
>>
>> The 'mylatex' driver is defined as:
>>
>> [diff "mylatex"]
>> xfuncname = "^(\\\\((sub)*section|chapter|part)\\*{0,1}\\{.*)$"
>> wordRegex = "\\\\[a-zA-Z]+|[{}]|\\\\.|[^\\{}[:space:]]+"
>> textconv = \"C:/Program Files/Git/usr/bin/iconv.exe\" -f cp1250 -t utf-8
>> cachetextconv = true
>>
>> And everything would be all right... if not for the fact that Git appends
>> a spurious ^M to added lines in the `git diff` output. The files use the
>> CRLF end-of-line convention (the native MS Windows one).
>>
>> $ git diff test.tex
>> diff --git a/test.tex b/test.tex
>> index 029646e..250ab16 100644
>> --- a/test.tex
>> +++ b/test.tex
>> @@ -1,4 +1,4 @@
>> -\documentclass{article}
>> +\documentclass{mwart}^M
>>
>> \usepackage[cp1250]{inputenc}
>> \usepackage{polski}
>>
>> What gives? Why is this ^M tacked onto the end of added lines,
>> while it is not present in deleted lines, nor in context lines?
>>
>> Puzzled.
>>
>> P.S. Git has `i18n.commitEncoding` and `i18n.logOutputEncoding`; it is
>> a pity that it doesn't support the `encoding` attribute in core, together
>> with an `i18n.outputEncoding` setting.
>
> Is there a chance you could give us a recipe to reproduce it?
> A complete test script, perhaps?
> (I don't want to speculate whether the invocation of iconv is the problem,
> where stdout is not in "binary mode", or whatever this is called under Windows)
I'm sorry, I thought I had posted the whole recipe, but I left some details
out of the description above.

First, the files are stored on the filesystem with CRLF line endings (the
DOS end-of-line convention). Because of `core.autocrlf`, they are converted
to LF in blobs, that is, in the index and in the repository.

Second, a textconv filter that preserves line endings needs to be
configured. I used `iconv`, but I suspect that the problem would also
happen with `cat`.
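
Whether a candidate filter really passes the carriage return through can be checked by dumping its output byte by byte. This is only a sketch: the helper name `dump_filtered` and the sample text are made up for illustration, and it assumes a POSIX shell with `od` and `tr` available.

```shell
# Feed one CRLF-terminated line through a filter and dump the resulting bytes.
# dump_filtered is a hypothetical helper; `cat` stands in for the textconv command.
dump_filtered() {
    printf 'bar\r\n' | "$@" | od -An -c | tr -s ' '
}

dump_filtered cat
# To test the filter from the recipe instead (assumes iconv is on PATH):
# dump_filtered iconv -f cp1250 -t utf-8
```

If `\r` shows up right before `\n` in the dump, the filter leaves the CR in place.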
In the .gitattributes file, or in .git/info/attributes, add for example:

  *.tex text diff=myconv

In .git/config, configure the textconv filter, for example:

  [diff "myconv"]
      textconv = iconv.exe -f cp1250 -t utf-8

Create a file whose name matches the attribute line and which uses the
CRLF end-of-line convention, and add it to Git (that is, to the index):

  $ printf "foo\r\n" >foo.tex
  $ git add foo.tex

Modify the file (again with CRLF):

  $ printf "bar\r\n" >foo.tex

Check the difference:

  $ git diff foo.tex
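
The steps above could be collected into one script along these lines. It is a sketch under assumptions that are not part of the recipe itself: a scratch repository in a temporary directory, `core.autocrlf` set explicitly to `true`, the attribute written to `.gitattributes`, plain `iconv` on PATH instead of `iconv.exe`, and `cat -v` (not guaranteed by POSIX, but present in Git Bash) to make a stray CR visible as `^M`.

```shell
#!/bin/sh
# Reproduction sketch; repository location and config values are assumptions.
repo=$(mktemp -d)
cd "$repo" || exit 1
git init -q .
git config core.autocrlf true
git config diff.myconv.textconv "iconv -f cp1250 -t utf-8"
printf '*.tex text diff=myconv\n' >.gitattributes

printf 'foo\r\n' >foo.tex      # CRLF in the working tree
git add foo.tex                # converted to LF in the index
printf 'bar\r\n' >foo.tex      # modified, still CRLF

diff_output=$(git diff foo.tex)
# cat -v prints a carriage return as ^M, so a stray CR on the '+' line stands out
printf '%s\n' "$diff_output" | cat -v
```

Setting the driver with `git config` keeps the script self-contained; editing .git/config by hand as described above should be equivalent.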
HTH
--
Jakub Narębski