
List:       r-devel
Subject:    [Rd] Reply: R-4.3 version list.files function could not work co
From:        <yeyueguang () goldwind ! com>
Date:       2023-08-13 5:39:39
Message-ID: 72b017336ae143e1b0755b312b95c8f2 () goldwind ! com
[Attachment #2 (text/plain)]

     The list.files function is not correct.


 

-----Original Message-----
From: Ivan Krylov [mailto:krylov.r00t@gmail.com]
Sent: August 12, 2023 23:33
To: Yihui Xie <xie@yihui.name>
Cc: 叶月光 <yeyueguang@goldwind.com>; r-devel@r-project.org
Subject: Re: [Rd] R-4.3 version list.files function could not work correctly in chinese

Dear Yihui,

Thanks a lot for your help!

Unfortunately, I was not able to reproduce this. I've tried creating files with
Chinese characters in their names and populating them with valid UTF-8 and valid
non-UTF-8 text, but R seems to be able to list them all in my case.
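
A minimal sketch of such a reproduction attempt is shown here. The directory
and file names are illustrative assumptions (they are not the exact files used
in this thread), and everything is created under tempdir(). The result depends
on the platform and the session locale, so this only shows the shape of the
check.

```r
# Hypothetical reproduction sketch: names and contents are illustrative.
d <- file.path(tempdir(), "\u6d4b\u8bd5\u6587\u4ef6")
dir.create(d, showWarnings = FALSE)

utf8_file  <- file.path(d, "\u6d4b\u8bd5\u4e2d\u6587-utf-8.txt")
other_file <- file.path(d, "\u6d4b\u8bd5\u4e2d\u6587-non-utf8.txt")

# File whose content is valid UTF-8.
writeBin(charToRaw(enc2utf8("\u4e2d\u6587")), utf8_file)
# File whose content is not valid UTF-8 (the same two characters in GBK).
writeBin(as.raw(c(0xd6, 0xd0, 0xce, 0xc4)), other_file)

# List the directory and report any expected name that did not come back.
listed   <- list.files(d)
expected <- basename(c(utf8_file, other_file))
not_listed <- setdiff(enc2utf8(expected), enc2utf8(listed))

print(listed)
print(not_listed)
```

If not_listed is empty, every file that was created also appears in the
list.files() output for that session.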

I'm running a US English evaluation ISO image of a slightly newer build of Windows
10, and I also compiled R-4.3.1 from source, anticipating having to single-step
through the list.files() implementation:

sessionInfo()
# R version 4.3.1 (2023-06-16 ucrt)
# Platform: x86_64-w64-mingw32/x64 (64-bit)
# Running under: Windows 10 x64 (build 19045)
#
# Matrix products: default
#
#
# locale:
# [1] LC_COLLATE=English_United States.utf8  LC_CTYPE=English_United States.utf8
# [3] LC_MONETARY=English_United States.utf8 LC_NUMERIC=C
# [5] LC_TIME=English_United States.utf8
#
# time zone: America/Los_Angeles
# tzcode source: internal
#
# attached base packages:
# [1] stats     graphics  grDevices utils     datasets  methods   base
#
# loaded via a namespace (and not attached):
# [1] compiler_4.3.1
dir("测试文件")
# [1] "测试中文-non-utf8-ЪЪЪЪЪ.txt" "测试中文-utf-8.txt"
system('cmd /c dir /s *.txt')
#  Volume in drive C has no label.
#  Volume Serial Number is A85A-AA74
#
#  Directory of C:\R\R-4.3.1\bin\x64\????
#
# 08/12/2023  07:57 AM                22 ????-non-utf8-?????.txt
# 08/12/2023  07:56 AM                18 ????-utf-8.txt
#                2 File(s)             40 bytes
#
#      Total Files Listed:
#                2 File(s)             40 bytes
#                0 Dir(s)  29,538,418,688 bytes free
# [1] 0

(The OEM codepage cannot represent the characters I used in the file names, but all
the files are present in both lists.)

To find out what's wrong, someone would need to download the R source code
and compile it [*], install gdb using pacman (part of Rtools), then set a breakpoint
on the list_files function from src/main/platform.c and step through it [**], paying
attention to the R_readdir calls. Do the missing file names not even come out of
FindNextFile()? Are they somehow skipped around the time of the regex match?
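
Before reaching for gdb, the regex question can be narrowed down a little from
the R prompt. The sketch below is only an R-level approximation and cannot
replace stepping through list_files: check_listing is a hypothetical helper
(not part of R), and the demo directory is created just for illustration.

```r
# Hypothetical helper: compare an unfiltered listing with a pattern-filtered one.
check_listing <- function(d, pattern = "\\.txt$") {
  all_names <- list.files(d)                      # no pattern: no regex involved
  filtered  <- list.files(d, pattern = pattern)   # pattern applied by list.files
  manual    <- all_names[grepl(pattern, all_names)]  # pattern applied afterwards
  list(
    # Names that match when filtered by hand but are absent from the
    # pattern-filtered listing would point at the matching step.
    dropped_by_pattern = setdiff(manual, filtered),
    info = data.frame(
      name       = all_names,
      encoding   = Encoding(all_names),
      valid_utf8 = validUTF8(all_names),
      stringsAsFactors = FALSE
    )
  )
}

# Illustrative demo directory; point check_listing() at the affected one instead.
demo <- tempfile("listing-demo-")
dir.create(demo)
file.create(file.path(demo, c("a.txt", "b.csv")))
res <- check_listing(demo)
print(res)
```

If a name is missing even from the unfiltered list.files(d) output, the regex
step is not involved and the directory-reading side is the more likely place
to look.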

(I could help with the details of this, maybe off-list, if there's
interest.)

Unless Tomas Kalibera is able to deduce the root cause from the observed symptoms,
someone who can reproduce the problem will have to investigate further.

--
Best regards,
Ivan

[*] https://cran.r-project.org/bin/windows/base/howto-R-devel.html

[**] https://beej.us/guide/bggdb/

["r-sessioninfo.png" (image/png)]
["list.files_test.png" (image/png)]
["path-files.png" (image/png)]

______________________________________________
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

