From gentoo-portage-dev Mon Sep 24 10:25:35 2012 From: "Gregory M. Turner" Date: Mon, 24 Sep 2012 10:25:35 +0000 To: gentoo-portage-dev Subject: [gentoo-portage-dev] blech... (multijob/multiprocessing work-around for cygwin) Message-Id: <5060351F.2040002 () malth ! us> X-MARC-Message: https://marc.info/?l=gentoo-portage-dev&m=141144402418718 On cygwin, there is a problem with bi-directional pipe support in bash. I used to solve this with an ugly reversion in portage and an ultra-simple stubbification patch for multiprocessing.eclass (both serialized everything). However, this really sucked for numerous reasons, including the obvious one: it makes stuff slow as hell. So I've reworked everything to use uni-directional named pipes. In portage I have: diff --git a/bin/helper-functions.sh b/bin/helper-functions.sh index c7400fa..87f3120 100644 --- a/bin/helper-functions.sh +++ b/bin/helper-functions.sh @@ -7,6 +7,36 @@ source "${PORTAGE_BIN_PATH:-/usr/lib/portage/bin}"/isolated-functions.sh +# try real hard to figure out if this is a cygwin host; cache results. +this_host_is_cygwin() { + if [[ -n ${_this_host_is_cygwin} ]] ; then + return $_this_host_is_cygwin + fi + [[ -x ${EPREFIX}/usr/bin/uname ]] && \ + [[ $( ${EPREFIX}/usr/bin/uname -o 2>/dev/null ) == Cygwin* ]] && \ + export _this_host_is_cygwin=0 && return 0 + [[ -x /usr/bin/uname ]] && \ + [[ $( /usr/bin/uname -o 2>/dev/null ) == Cygwin* ]] && \ + export _this_host_is_cygwin=0 && return 0 + [[ -x /bin/uname ]] && \ + [[ $( /bin/uname -o 2>/dev/null ) == Cygwin* ]] && \ + export _this_host_is_cygwin=0 && return 0 + # hail-mary before we resort to envvars + [[ $( uname -o 2>/dev/null ) == Cygwin* ]] && \ + export _this_host_is_cygwin=0 && return 0 + + [[ -n ${CHOST} ]] && \ + [[ ${CHOST} == *-cygwin* ]] && \ + export _this_host_is_cygwin=0 && return 0 + [[ -n ${CTARGET} ]] && \ + [[ ${CTARGET} == *-cygwin* ]] && \ + export _this_host_is_cygwin=0 && return 0 + + # either it ain't cygwin or something is very broken. + export _this_host_is_cygwin=1 + return 1 +} + # # API functions for doing parallel processing # @@ -19,25 +49,51 @@ numjobs() { multijob_init() { # Setup a pipe for children to write their pids to when they finish. - mj_control_pipe=$(mktemp -t multijob.XXXXXX) - rm "${mj_control_pipe}" - mkfifo "${mj_control_pipe}" - redirect_alloc_fd mj_control_fd "${mj_control_pipe}" + export mj_control_pipe=$(mktemp -t multijob.XXXXXX) rm -f "${mj_control_pipe}" + mkfifo "${mj_control_pipe}" + + if ! this_host_is_cygwin ; then + redirect_alloc_fd mj_control_fd "${mj_control_pipe}" + rm -f "${mj_control_pipe}" + fi # See how many children we can fork based on the user's settings. mj_max_jobs=$(numjobs) mj_num_jobs=0 } +# make sure someone called multijob_init +multijob_assert() { + if this_host_is_cygwin ; then + [[ -z ${mj_control_pipe} ]] && \ + die "multijob initialization required" + [[ $( file -b "${mj_control_pipe}" ) != fifo* ]] && \ + die "multijob fifo gone" + else + [[ -z ${mj_control_fd} ]] && \ + die "multijob initialization required" + fi +} + multijob_child_init() { - trap 'echo ${BASHPID} $? >&'${mj_control_fd} EXIT + multijob_assert + if this_host_is_cygwin ; then + trap 'echo ${BASHPID} $? >'${mj_control_pipe} EXIT + else + trap 'echo ${BASHPID} $? >&'${mj_control_fd} EXIT + fi trap 'exit 1' INT TERM } multijob_finish_one() { local pid ret - read -r -u ${mj_control_fd} pid ret + multijob_assert + if this_host_is_cygwin ; then + read -r pid ret < ${mj_control_pipe} + else + read -r -u ${mj_control_fd} pid ret + fi : $(( --mj_num_jobs )) return ${ret} } @@ -70,6 +126,9 @@ multijob_post_fork() { redirect_alloc_fd() { local var=$1 file=$2 redir=${3:-"<>"} + [[ "${redir}" == "<>" ]] && this_host_is_cygwin && \ + die "Cygwin bash has broken <> bidirectional redirection support." + if [[ $(( (BASH_VERSINFO[0] << 8) + BASH_VERSINFO[1] )) -ge $(( (4 << 8) + 1 )) ]] ; then # Newer bash provides this functionality. eval "exec {${var}}${redir}'${file}'" -- (Here's hoping Thunderbird didn't reflow the above; apologies in advance if it did) In multiprocessing.eclass, I do something fairly similar, also relying on named pipes rather than file-descriptor cloning. I was wondering if anyone in the know could comment on how correct or incorrect the above actually is? For example, is either portage multijob or mutliprocessing.eclass expected to work if someone nests multijob_init's (because I'm pretty sure mine won't -- FTR, this seems pretty naughty, given that both get their parallelism defaults from MAKEOPTS)? Finally, to be clear, I'm not wanting this to go in to git or anything... at least, certainly not as of now. Just seeking constructive criticism so my code isn't horribly broken, since there aren't a ton of multi{jobs,processing} clients, ATM, for me to test against. -gmt Postscript: status of this bug in cygwin, for anyone who may be interested/googling: I've tried to look into this, just a little bit, because if I run an ultra-simple bidirectional pipe test in C, it almost kinda-sorta works (although some weird pump-priming seems to be required ... blah). bash's code for this doesn't exactly make for light reading, nor does cygwin's named pipe code, although the latter is at least not mixed up with a bunch of parser/generator code. Given that I'm not even sure the problem isn't already fixed upstream, it didn't seem worth the effort. The matter has been mentioned on the cygwin mailing list, and cgf, cygwin lead dev and implementation SME, has stated he doesn't want the named-pipe code churning for now. So, unless the fix is already in-repo, but unreleased (my over-under on this would be maybe 3:1 against; I'm too lazy to check), it's likely to stay this way for a while. My personal game-plan is: just wait until the next cygwin core release and see if it's fixed. Until then, I have what seem to be tolerable work-arounds for my overlay, as described above. If the next release fails to resolve the problem, then I might revisit and see if I can't find a solution, or at least a work-around in bash.