List:       sbcl-devel
Subject:    Re: [Sbcl-devel] [Sbcl-bugs] Unicode Bug on Windows
From:       Kieran Grant <kieran.thehacker.grant () gmail ! com>
Date:       2019-04-08 4:47:23
Message-ID: CAAsgrApdGb0h_Ksrgs4unSBoPEgwcxwTNTjtQLfXXAvjJAu2Qw () mail ! gmail ! com

So, it turns out that on Windows sb-impl::*default-external-format* was :CP1252.
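
To illustrate why a :CP1252 default would produce exactly the symptoms from my
original report, here is a minimal sketch. It assumes the source file was saved
as UTF-8 and that SBCL's octet converters (sb-ext:string-to-octets and
sb-ext:octets-to-string) accept :cp1252; the helper name and the variables are
made up for the example.

```lisp
;; Sketch only: reproduce the mis-decoding by hand.
(defun utf-8-read-as-cp1252 (string)
  "Encode STRING as UTF-8, then decode those octets as if they were CP1252."
  (sb-ext:octets-to-string
   (sb-ext:string-to-octets string :external-format :utf-8)
   :external-format :cp1252))

;; U+2019 RIGHT SINGLE QUOTATION MARK, i.e. code-point 8217.
(defparameter *sample* (string (code-char 8217)))

;; What the reader sees when the UTF-8 source is decoded as CP1252.
(defparameter *mangled* (utf-8-read-as-cp1252 *sample*))

;; Code points of the mangled string; compare with the Windows output
;; in the original report (226, 8364, 8482).
(print (map 'list #'char-code *mangled*))

;; Octets produced when the mangled string is then written as UTF-8;
;; compare with the Windows xxd dump (c3 a2 e2 82 ac e2 84 a2).
(print (sb-ext:string-to-octets *mangled* :external-format :utf-8))
```

If that reading is right, writing with :external-format :utf-8 was working as
intended; the string had already been damaged when the source was read.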

Setting it to :UTF-8 fixes the problem... (I could use
:external-format :utf-8 when loading, but this was easier)
It also solved my problem with cl-cffi-gtk. It looks like cl-cffi-gtk
assumed SBCL was using UTF-8, so its string conversion just hoped for
the best.
(I'm not sure whether it is CFFI or cl-cffi-gtk that needs to have
this awareness. I know CFFI has *default-foreign-encoding*, but that
was :utf-8 the whole time.)
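
For reference, a rough sketch of the two options mentioned above. The file name
and *demo-string* are placeholders, and sb-impl::*default-external-format* is
an internal, unexported variable, so treat the second option as a workaround
that may change between SBCL versions rather than a supported interface.

```lisp
;; Hypothetical file name, used only for this demonstration.
(defparameter *demo-file* "external-format-demo.lisp")

;; Write a tiny source file as UTF-8 with U+2019 inside a string literal.
(with-open-file (out *demo-file* :direction :output
                                 :if-exists :supersede
                                 :external-format :utf-8)
  (format out "(defparameter *demo-string* \"~C\")~%" (code-char 8217)))

;; Option 1: name the encoding explicitly on each LOAD call.
(load *demo-file* :external-format :utf-8)
(print (length *demo-string*))

;; Option 2: change the process-wide default once, then load as usual.
(setf sb-impl::*default-external-format* :utf-8)
(load *demo-file*)
(print (char-code (char *demo-string* 0)))
```

Option 1 is the more conservative choice since it only uses the standard LOAD
keyword; option 2 was simply less typing for me.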

I think for my Windows build of the software I am making I'll
explicitly override the default external format both in build scripts
and my lisp entry-point.
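
Roughly what I have in mind for the entry-point, as a sketch only. The names
ensure-utf-8-default and main are hypothetical, build.lisp is a placeholder,
and the approach again relies on the internal sb-impl variable.

```lisp
(defun ensure-utf-8-default ()
  "Force SBCL's default external format to UTF-8 (uses an internal variable)."
  (setf sb-impl::*default-external-format* :utf-8))

(defun main ()
  ;; Hypothetical entry point: fix the encoding before any file I/O
  ;; or foreign string conversion takes place.
  (ensure-utf-8-default)
  ;; ... application start-up would go here ...
  (values))

;; In a build script the same override could be applied before loading
;; any sources, for example:
;;   sbcl --non-interactive \
;;        --eval "(setf sb-impl::*default-external-format* :utf-8)" \
;;        --load build.lisp
```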

I think it would be nice if there were a standard, supported interface
for changing the default external format.

Regards,
Kieran

On Wed, Apr 3, 2019 at 8:31 PM Stas Boukarev <stassats@gmail.com> wrote:
>
> Your default external format is not matching your input file encoding.
> But I have no idea how to change it on Windows.
>
> On Wed, Apr 3, 2019 at 1:05 PM Kieran Grant <kieran.thehacker.grant@gmail.com> wrote:
>>
>> Hi,
>>
>> I either found a bug in SBCL or a bug in my building of SBCL on Windows (:D).
>>
>> I built SBCL 1.5.1 on Windows in msys2 with:
>>  ./make.sh --dynamic-space-size=8Gb --arch=x86_64 --with-sb-core-compression --prefix=/usr
>>
>> Features are:
>> (:X86-64 :64-BIT :64-BIT-REGISTERS :ALIEN-CALLBACKS :ANSI-CL :AVX2
>>  :C-STACK-IS-CONTROL-STACK :CALL-SYMBOL :COMMON-LISP :COMPARE-AND-SWAP-VOPS
>>  :CYCLE-COUNTER :FP-AND-PC-STANDARD-SAVE :GENCGC :IEEE-FLOATING-POINT
>>  :INTEGER-EQL-VOP :LINKAGE-TABLE :LITTLE-ENDIAN :OS-PROVIDES-DLOPEN
>>  :OS-PROVIDES-PUTWC :PACKAGE-LOCAL-NICKNAMES :SB-CORE-COMPRESSION :SB-DOC
>>  :SB-DYNAMIC-CORE :SB-EVAL :SB-FUTEX :SB-LDB :SB-PACKAGE-LOCKS :SB-QSHOW
>>  :SB-SAFEPOINT :SB-SAFEPOINT-STRICTLY :SB-SIMD-PACK :SB-SIMD-PACK-256
>>  :SB-SOURCE-LOCATIONS :SB-THREAD :SB-THRUPTION :SB-UNICODE :SB-WTIMER :SBCL
>>  :STACK-ALLOCATABLE-CLOSURES :STACK-ALLOCATABLE-FIXED-OBJECTS
>>  :STACK-ALLOCATABLE-LISTS :STACK-ALLOCATABLE-VECTORS
>>  :STACK-GROWS-DOWNWARD-NOT-UPWARD :UNDEFINED-FUN-RESTARTS
>>  :UNWIND-TO-FRAME-AND-CALL-VOP :WIN32)
>>
>> When I ran:
>> (defvar *data* "’") ;;;; Single character string, with code-point 8217
>> (progn
>>   (format t "Length is: ~d~%" (length *data*))
>>   (format t "Type is: ~A~%" (type-of *data*))
>>   (loop for i from 0 below (length *data*) do
>>        (format t "Character at index ~d is code-point ~d~%"
>>                i (char-code (elt *data* i)))))
>>
>> It outputted:
>> Length is: 3
>> Type is: (SIMPLE-ARRAY CHARACTER (3))
>> Character at index 0 is code-point 226
>> Character at index 1 is code-point 8364
>> Character at index 2 is code-point 8482
>>
>> On Linux it outputs:
>> Length is: 1
>> Type is: (SIMPLE-ARRAY CHARACTER (1))
>> Character at index 0 is code-point 8217
>>
>> I think this is what causes cl-cffi-gtk to scramble Unicode
>> characters, and only on Windows.
>> (I confirmed with a C program that GTK3 on Windows itself did not
>> scramble my Unicode text)
>>
>> Worse, it also affects saving with :external-format :utf-8
>>
>> Consider this code:
>> (defvar *data* "’") ;;;; Single character string, with code-point 8217
>> (with-open-file (out "test-data.txt" :direction :output :external-format :utf-8)
>>   (with-standard-io-syntax
>>     (let ((*read-eval* nil))
>>       (print (list *data*) out))))
>>
>> (with-open-file (in "test-data.txt" :direction :input :external-format :utf-8)
>>   (with-standard-io-syntax
>>     (let ((*read-eval* nil))
>>       (print (read in)))))
>>
>> Both Windows and Linux correctly output the expected List containing the string:
>> ("’")
>>
>> But the file test-data.txt is different, when running "xxd test-data.txt"
>> Windows:
>> 00000000: 0a28 22c3 a2e2 82ac e284 a222 2920
>> Linux:
>> 00000000: 0a28 22e2 8099 2229 20
>>
>> Regards,
>> Kieran
>>
>>
>> _______________________________________________
>> Sbcl-bugs mailing list
>> Sbcl-bugs@lists.sourceforge.net
>> https://lists.sourceforge.net/lists/listinfo/sbcl-bugs


_______________________________________________
Sbcl-devel mailing list
Sbcl-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/sbcl-devel
