List: sbcl-devel
Subject: Re: [Sbcl-devel] [Sbcl-bugs] Unicode Bug on Windows
From: Kieran Grant <kieran.thehacker.grant () gmail ! com>
Date: 2019-04-08 4:47:23
Message-ID: CAAsgrApdGb0h_Ksrgs4unSBoPEgwcxwTNTjtQLfXXAvjJAu2Qw () mail ! gmail ! com
So...
On Windows sb-impl::*default-external-format* was :CP1252
Setting it to :UTF-8 fixes the problem... (I could use
:external-format :utf-8 when loading, but this was easier)
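The code points in my original report (226, 8364, 8482) are what comes out when the UTF-8 bytes of U+2019 are decoded as CP1252. A minimal sketch of that check, assuming SBCL's sb-ext:string-to-octets and sb-ext:octets-to-string; the function name is made up for illustration:

```lisp
;; SBCL-specific sketch. UTF-8-MISREAD-AS-CP1252 is a hypothetical helper,
;; not part of SBCL or any library.
(defun utf-8-misread-as-cp1252 (string)
  "Encode STRING as UTF-8, then decode those octets as if they were CP1252."
  (let ((octets (sb-ext:string-to-octets string :external-format :utf-8)))
    (sb-ext:octets-to-string octets :external-format :cp1252)))

(defun utf-8-octets (string)
  "Return the UTF-8 encoding of STRING as a list of integers."
  (coerce (sb-ext:string-to-octets string :external-format :utf-8) 'list))

(let* ((original (string (code-char 8217)))          ; U+2019
       (misread (utf-8-misread-as-cp1252 original)))
  ;; Bytes that are in the source file on disk.
  (format t "UTF-8 bytes:         ~{~2,'0x~^ ~}~%" (utf-8-octets original))
  ;; What the reader sees when the default external format is :CP1252.
  (format t "Misread code points: ~{~d~^ ~}~%" (map 'list #'char-code misread))
  ;; What gets written when the misread string is saved as UTF-8.
  (format t "Re-encoded as UTF-8: ~{~2,'0x~^ ~}~%" (utf-8-octets misread)))
```

The last line of output corresponds to the bytes between the double quotes in the Windows xxd dump, so the file was written correctly; the string was already wrong by the time it was saved.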
It also solved my problem with cl-cffi-gtk. It looks like cl-cffi-gtk
assumes SBCL is using UTF-8, so when doing string conversion it hopes
for the best.
(I'm not sure whether it is CFFI or cl-cffi-gtk that needs to be aware
of this. I know CFFI has *default-foreign-encoding*, but that was
:utf-8 the whole time.)
I think for my Windows build of the software I am making I'll
explicitly override the default external format both in build scripts
and my lisp entry-point.
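A minimal sketch of that override, assuming the internal variable keeps working the way it did for me on 1.5.1. The names force-utf-8-default and main are placeholders, and since sb-impl::*default-external-format* is not an exported interface it may change between releases:

```lisp
;; Sketch only: relies on an internal SBCL variable, not a supported API.
(defun force-utf-8-default ()          ; hypothetical helper name
  "Make :UTF-8 the default external format for files opened or loaded afterwards."
  (setf sb-impl::*default-external-format* :utf-8))

(defun main ()                         ; hypothetical application entry point
  (force-utf-8-default)
  ;; ... application start-up would go here ...
  (values))

;; In a build script, call the helper before any source is loaded:
;;   (force-utf-8-default)
;; or be explicit per file, which is standard Common Lisp:
;;   (load "some-file.lisp" :external-format :utf-8)
```

Calling it in both places covers the source files read at build time and the files opened at run time. Streams that already exist when the variable is set may keep whatever format they were created with.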
I think it would be nice if there were a standard interface for changing
the default external format.
Regards,
Kieran
On Wed, Apr 3, 2019 at 8:31 PM Stas Boukarev <stassats@gmail.com> wrote:
>
> Your default external format is not matching your input file encoding.
> But I have no idea how to change it on Windows.
>
> On Wed, Apr 3, 2019 at 1:05 PM Kieran Grant <kieran.thehacker.grant@gmail.com> wrote:
>>
>> Hi,
>>
>> I either found a bug in SBCL or a bug in my building of SBCL on Windows (:D).
>>
>> I built SBCL 1.5.1 on Windows in msys2 with:
>> ./make.sh --dynamic-space-size=8Gb --arch=x86_64
>> --with-sb-core-compression --prefix=/usr
>>
>> Features are:
>> (:X86-64 :64-BIT :64-BIT-REGISTERS :ALIEN-CALLBACKS :ANSI-CL :AVX2
>> :C-STACK-IS-CONTROL-STACK :CALL-SYMBOL :COMMON-LISP :COMPARE-AND-SWAP-VOPS
>> :CYCLE-COUNTER :FP-AND-PC-STANDARD-SAVE :GENCGC :IEEE-FLOATING-POINT
>> :INTEGER-EQL-VOP :LINKAGE-TABLE :LITTLE-ENDIAN :OS-PROVIDES-DLOPEN
>> :OS-PROVIDES-PUTWC :PACKAGE-LOCAL-NICKNAMES :SB-CORE-COMPRESSION :SB-DOC
>> :SB-DYNAMIC-CORE :SB-EVAL :SB-FUTEX :SB-LDB :SB-PACKAGE-LOCKS :SB-QSHOW
>> :SB-SAFEPOINT :SB-SAFEPOINT-STRICTLY :SB-SIMD-PACK :SB-SIMD-PACK-256
>> :SB-SOURCE-LOCATIONS :SB-THREAD :SB-THRUPTION :SB-UNICODE :SB-WTIMER :SBCL
>> :STACK-ALLOCATABLE-CLOSURES :STACK-ALLOCATABLE-FIXED-OBJECTS
>> :STACK-ALLOCATABLE-LISTS :STACK-ALLOCATABLE-VECTORS
>> :STACK-GROWS-DOWNWARD-NOT-UPWARD :UNDEFINED-FUN-RESTARTS
>> :UNWIND-TO-FRAME-AND-CALL-VOP :WIN32)
>>
>> When I ran:
>> (defvar *data* "’") ;;;; Single character string, with code-point 8217
>> (progn
>>   (format t "Length is: ~d~%" (length *data*))
>>   (format t "Type is: ~A~%" (type-of *data*))
>>   (loop for i from 0 below (length *data*) do
>>     (format t "Character at index ~d is code-point ~d~%" i
>>             (char-code (elt *data* i)))))
>>
>> It outputted:
>> Length is: 3
>> Type is: (SIMPLE-ARRAY CHARACTER (3))
>> Character at index 0 is code-point 226
>> Character at index 1 is code-point 8364
>> Character at index 2 is code-point 8482
>>
>> On Linux it outputs:
>> Length is: 1
>> Type is: (SIMPLE-ARRAY CHARACTER (1))
>> Character at index 0 is code-point 8217
>>
>> I think this is what causes cl-cffi-gtk to scramble Unicode
>> characters, and only on Windows.
>> (I confirmed with a C program that GTK3 on Windows itself does not
>> scramble my Unicode text.)
>>
>> Worse, it also affects saving with :external-format :utf-8
>>
>> Consider this code:
>> (defvar *data* "’") ;;;; Single character string, with code-point 8217
>> (with-open-file (out "test-data.txt" :direction :output :external-format :utf-8)
>>   (with-standard-io-syntax
>>     (let ((*read-eval* nil))
>>       (print (list *data*) out))))
>>
>> (with-open-file (in "test-data.txt" :direction :input :external-format :utf-8)
>>   (with-standard-io-syntax
>>     (let ((*read-eval* nil))
>>       (print (read in)))))
>>
>> Both Windows and Linux correctly output the expected list containing the string:
>> ("’")
>>
>> But the file test-data.txt is different. Running "xxd test-data.txt" gives:
>> Windows:
>> 00000000: 0a28 22c3 a2e2 82ac e284 a222 2920
>> Linux:
>> 00000000: 0a28 22e2 8099 2229 20
>>
>> Regards,
>> Kieran
>>
>>
>> _______________________________________________
>> Sbcl-bugs mailing list
>> Sbcl-bugs@lists.sourceforge.net
>> https://lists.sourceforge.net/lists/listinfo/sbcl-bugs
_______________________________________________
Sbcl-devel mailing list
Sbcl-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/sbcl-devel