List:       gentoo-portage-dev
Subject:    [gentoo-portage-dev] blech... (multijob/multiprocessing work-around for cygwin)
From:       "Gregory M. Turner" <gmt () malth ! us>
Date:       2012-09-24 10:25:35
Message-ID: 5060351F.2040002 () malth ! us

On cygwin, there is a problem with bi-directional pipe support in bash.

I used to solve this with an ugly reversion in portage and an 
ultra-simple stubbification patch for multiprocessing.eclass (both 
serialized everything).

However, this really sucked for numerous reasons, including the obvious 
one: it makes stuff slow as hell.

So I've reworked everything to use uni-directional named pipes.
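
A minimal sketch of the uni-directional idea, for readers who want to see it in isolation. Each side opens the fifo in one direction only: the child write-only to report, the parent read-only to collect. All names here (demo_dir, demo_fifo, the "job1" label) are hypothetical and are not part of portage; the actual patch reports the child's PID rather than a label.

```shell
# Sketch only: hypothetical names, not the portage implementation.
demo_dir=$(mktemp -d)
demo_fifo="${demo_dir}/control"
mkfifo "${demo_fifo}"

# Child: run a job, then open the fifo write-only to report "<label> <status>".
(
	status=0
	sh -c 'exit 3' || status=$?	# stand-in for real work that exits with 3
	echo "job1 ${status}" > "${demo_fifo}"
) &
child_pid=$!

# Parent: open the fifo read-only and block until one report arrives.
read -r label ret < "${demo_fifo}"
wait "${child_pid}"
rm -rf "${demo_dir}"
echo "${label} finished with status ${ret}"
```

Because each open of the fifo blocks until the other end is opened, the parent and child rendezvous without either side needing a read-write descriptor.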

In portage I have:

diff --git a/bin/helper-functions.sh b/bin/helper-functions.sh
index c7400fa..87f3120 100644
--- a/bin/helper-functions.sh
+++ b/bin/helper-functions.sh
@@ -7,6 +7,36 @@

  source "${PORTAGE_BIN_PATH:-/usr/lib/portage/bin}"/isolated-functions.sh

+# try real hard to figure out if this is a cygwin host; cache results.
+this_host_is_cygwin() {
+	if [[ -n ${_this_host_is_cygwin} ]] ; then
+		return $_this_host_is_cygwin
+	fi
+	[[ -x ${EPREFIX}/usr/bin/uname ]] && \
+		[[ $( ${EPREFIX}/usr/bin/uname -o 2>/dev/null ) == Cygwin* ]] && \
+			export _this_host_is_cygwin=0 && return 0
+	[[ -x /usr/bin/uname ]] && \
+		[[ $( /usr/bin/uname -o 2>/dev/null ) == Cygwin* ]] && \
+			export _this_host_is_cygwin=0 && return 0
+	[[ -x /bin/uname ]] && \
+		[[ $( /bin/uname -o 2>/dev/null ) == Cygwin* ]] && \
+			export _this_host_is_cygwin=0 && return 0
+	# hail-mary before we resort to envvars
+	[[ $( uname -o 2>/dev/null ) == Cygwin* ]] && \
+		export _this_host_is_cygwin=0 && return 0
+
+	[[ -n ${CHOST} ]] && \
+		[[ ${CHOST} == *-cygwin* ]] && \
+			export _this_host_is_cygwin=0 && return 0
+	[[ -n ${CTARGET} ]] && \
+		[[ ${CTARGET} == *-cygwin* ]] && \
+			export _this_host_is_cygwin=0 && return 0
+
+	# either it ain't cygwin or something is very broken.
+	export _this_host_is_cygwin=1
+	return 1
+}
+
  #
  # API functions for doing parallel processing
  #
@@ -19,25 +49,51 @@ numjobs() {

  multijob_init() {
  	# Setup a pipe for children to write their pids to when they finish.
-	mj_control_pipe=$(mktemp -t multijob.XXXXXX)
-	rm "${mj_control_pipe}"
-	mkfifo "${mj_control_pipe}"
-	redirect_alloc_fd mj_control_fd "${mj_control_pipe}"
+	export mj_control_pipe=$(mktemp -t multijob.XXXXXX)
  	rm -f "${mj_control_pipe}"
+	mkfifo "${mj_control_pipe}"
+
+	if ! this_host_is_cygwin ; then
+		redirect_alloc_fd mj_control_fd "${mj_control_pipe}"
+		rm -f "${mj_control_pipe}"
+	fi

  	# See how many children we can fork based on the user's settings.
  	mj_max_jobs=$(numjobs)
  	mj_num_jobs=0
  }

+# make sure someone called multijob_init
+multijob_assert() {
+	if this_host_is_cygwin ; then
+		[[ -z ${mj_control_pipe} ]] && \
+			die "multijob initialization required"
+		[[ $( file -b "${mj_control_pipe}" ) != fifo* ]] && \
+			die "multijob fifo gone"
+	else
+		[[ -z ${mj_control_fd} ]] && \
+			die "multijob initialization required"
+	fi
+}
+
  multijob_child_init() {
-	trap 'echo ${BASHPID} $? >&'${mj_control_fd} EXIT
+	multijob_assert
+	if this_host_is_cygwin ; then
+		trap 'echo ${BASHPID} $? > "${mj_control_pipe}"' EXIT
+	else
+		trap 'echo ${BASHPID} $? >&'${mj_control_fd} EXIT
+	fi
  	trap 'exit 1' INT TERM
  }

  multijob_finish_one() {
  	local pid ret
-	read -r -u ${mj_control_fd} pid ret
+	multijob_assert
+	if this_host_is_cygwin ; then
+		read -r pid ret < "${mj_control_pipe}"
+	else
+		read -r -u ${mj_control_fd} pid ret
+	fi
  	: $(( --mj_num_jobs ))
  	return ${ret}
  }
@@ -70,6 +126,9 @@ multijob_post_fork() {
  redirect_alloc_fd() {
  	local var=$1 file=$2 redir=${3:-"<>"}

+	[[ "${redir}" == "<>" ]] && this_host_is_cygwin && \
+		die "Cygwin bash has broken <> bidirectional redirection support."
+
  	if [[ $(( (BASH_VERSINFO[0] << 8) + BASH_VERSINFO[1] )) -ge $(( (4 << 8) + 1 )) ]] ; then
  			# Newer bash provides this functionality.
  			eval "exec {${var}}${redir}'${file}'"
--
(Here's hoping Thunderbird didn't reflow the above; apologies in advance 
if it did)

In multiprocessing.eclass, I do something fairly similar, also relying 
on named pipes rather than file-descriptor cloning.

I was wondering if anyone in the know could comment on how correct or 
incorrect the above actually is?  For example, is either portage 
multijob or multiprocessing.eclass expected to work if someone nests 
multijob_init's (because I'm pretty sure mine won't -- FTR, this seems 
pretty naughty, given that both get their parallelism defaults from 
MAKEOPTS)?
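
To make the nesting concern concrete: judging from the patch alone, multijob_init keeps its state in single global variables (mj_control_pipe, mj_num_jobs), so a second call would overwrite what the first one set up. The sketch below imitates that with a hypothetical demo_init; the demo_* names are illustrative assumptions and not portage API.

```shell
# Hypothetical stand-in for multijob_init; demo_* names are not portage API.
demo_init() {
	demo_dir=$(mktemp -d)
	demo_pipe="${demo_dir}/control"
	mkfifo "${demo_pipe}"
	demo_num_jobs=0
}

demo_init                      # outer caller
outer_dir=${demo_dir}
outer_pipe=${demo_pipe}
demo_num_jobs=2                # pretend the outer caller has two jobs running

demo_init                      # nested caller clobbers the same globals
inner_dir=${demo_dir}

# The outer fifo still exists on disk, but the global no longer points at it.
if [ -p "${outer_pipe}" ] && [ "${demo_pipe}" != "${outer_pipe}" ]; then
	outer_orphaned=yes
else
	outer_orphaned=no
fi
jobs_after_nested_init=${demo_num_jobs}

rm -rf "${outer_dir}" "${inner_dir}"
echo "outer fifo orphaned: ${outer_orphaned}; job count now ${jobs_after_nested_init}"
```

After the nested call, the outer caller's job count has been reset and any of its children still report to a fifo that nobody reads.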

Finally, to be clear, I'm not asking for this to go into git or 
anything... at least, certainly not as of now.  Just seeking 
constructive criticism so my code isn't horribly broken, since there 
aren't a ton of multi{jobs,processing} clients, ATM, for me to test against.

-gmt

Postscript: status of this bug in cygwin, for anyone who may be 
interested/googling:

I've tried to look into this, just a little bit, because if I run an 
ultra-simple bidirectional pipe test in C, it almost kinda-sorta works 
(although some weird pump-priming seems to be required ... blah).
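
For reference, a shell-level round-trip test of the same feature might look like the sketch below. This is not the C test mentioned above; it exercises the "<>" redirection that redirect_alloc_fd uses by default. The fd number 9 and the demo_* names are arbitrary assumptions. Opening a fifo read-write is not specified by POSIX; Linux permits it, and there the write and read below are expected to round-trip through one descriptor.

```shell
# Sketch only: hypothetical names; fd 9 is an arbitrary choice.
demo_dir=$(mktemp -d)
demo_fifo="${demo_dir}/bidi"
mkfifo "${demo_fifo}"

# Open the fifo read-write on a single descriptor; on Linux this does not block.
exec 9<>"${demo_fifo}"

# The path can be removed right away; the open descriptor keeps the pipe alive.
rm -rf "${demo_dir}"

# Write a "<pid> <status>" record, then read it back through the same fd.
echo "1234 0" >&9
read -r pid ret <&9

exec 9>&-
echo "round trip: pid=${pid} ret=${ret}"
```

On a host where "<>" on a fifo misbehaves, this is the kind of test that would be expected to hang or return garbage instead of the record written.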

bash's code for this doesn't exactly make for light reading, nor does 
cygwin's named pipe code, although the latter is at least not mixed up 
with a bunch of parser/generator code.  Given that I'm not even sure the 
problem isn't already fixed upstream, it didn't seem worth the effort.

The matter has been mentioned on the cygwin mailing list, and cgf, 
cygwin lead dev and implementation SME, has stated he doesn't want the 
named-pipe code churning for now.  So, unless the fix is already 
in-repo, but unreleased (my over-under on this would be maybe 3:1 
against; I'm too lazy to check), it's likely to stay this way for a while.

My personal game-plan is: just wait until the next cygwin core release 
and see if it's fixed.  Until then, I have what seem to be tolerable 
work-arounds for my overlay, as described above.  If the next release 
fails to resolve the problem, then I might revisit and see if I can't 
find a solution, or at least a work-around in bash.

