[prev in list] [next in list] [prev in thread] [next in thread] 

List:       gentoo-dev
Subject:    Re: [gentoo-dev] Bazel Build eclass
From:       Jason Zaman <perfinion () gentoo ! org>
Date:       2018-12-19 15:31:42
Message-ID: 20181219153142.GA56005 () baraddur ! perfinion ! com
[Download RAW message or body]

On Sun, Nov 18, 2018 at 09:01:00AM +0100, Michał Górny wrote:
> On Sun, 2018-11-18 at 15:37 +0800, Jason Zaman wrote:
> > On Sat, Nov 17, 2018 at 11:54:24PM +0100, Michał Górny wrote:
> > > On Sun, 2018-11-18 at 03:37 +0800, Jason Zaman wrote:
> > > > Hey all,
> > > > 
> > > > I've been using Bazel (https://bazel.build/) to build TensorFlow for a
> > > > while now. Here is a bazel.eclass I'd like to commit to make it easier
> > > > for packages that use it to build. It's basically bits that I've
> > > > refactored out of the TensorFlow ebuild that would be useful to other
> > > > packages as well. I have a bump to sci-libs/tensorflow-1.12.0 prepared
> > > > that uses this eclass and have tested a full install.
> > > > 
> > > > -- Jason

Here is a v2 of the eclass for review:

Major changes:
- removed MULTIBUILD_VARIANT in favour of BUILD_DIR
- changed bazel_load_distfiles() to tokenize and skip items instead of a
  giant sed command.


# Copyright 1999-2018 Jason Zaman
# Distributed under the terms of the GNU General Public License v2

# @ECLASS: bazel.eclass
# @MAINTAINER:
# Jason Zaman <perfinion@gentoo.org>
# @AUTHOR:
# Jason Zaman <perfinion@gentoo.org>
# @BLURB: Utility functions for packages using Bazel Build
# @DESCRIPTION:
# A utility eclass providing functions to run the Bazel Build system.
#
# This eclass does not export any phase functions.

case "${EAPI:-0}" in
	0|1|2|3|4|5|6)
		die "Unsupported EAPI=${EAPI:-0} (too old) for ${ECLASS}"
		;;
	7)
		;;
	*)
		die "Unsupported EAPI=${EAPI} (unknown) for ${ECLASS}"
		;;
esac

if [[ ! ${_BAZEL_ECLASS} ]]; then

inherit multiprocessing toolchain-funcs

BDEPEND=">=dev-util/bazel-0.19"

# @FUNCTION: bazel_get_flags
# @DESCRIPTION:
# Obtain and print the bazel flags for target and host *FLAGS.
#
# To add more flags to this, append the flags to the
# appropriate variable before calling this function
bazel_get_flags() {
	local i fs=()
	for i in ${CFLAGS}; do
		fs+=( "--conlyopt=${i}" )
	done
	for i in ${BUILD_CFLAGS}; do
		fs+=( "--host_conlyopt=${i}" )
	done
	for i in ${CXXFLAGS}; do
		fs+=( "--cxxopt=${i}" )
	done
	for i in ${BUILD_CXXFLAGS}; do
		fs+=( "--host_cxxopt=${i}" )
	done
	for i in ${CPPFLAGS}; do
		fs+=( "--conlyopt=${i}" "--cxxopt=${i}" )
	done
	for i in ${BUILD_CPPFLAGS}; do
		fs+=( "--host_conlyopt=${i}" "--host_cxxopt=${i}" )
	done
	for i in ${LDFLAGS}; do
		fs+=( "--linkopt=${i}" )
	done
	for i in ${BUILD_LDFLAGS}; do
		fs+=( "--host_linkopt=${i}" )
	done
	echo "${fs[*]}"
}

# @FUNCTION: bazel_setup_bazelrc
# @DESCRIPTION:
# Creates the bazelrc with common options that will be passed
# to bazel. This will be called by ebazel automatically so
# does not need to be called from the ebuild.
bazel_setup_bazelrc() {
	if [[ -f "${T}/bazelrc" ]]; then
		return
	fi

	# F: fopen_wr
	# P: /proc/self/setgroups
	# Even with standalone enabled, the Bazel sandbox binary is run for feature test:
	# https://github.com/bazelbuild/bazel/blob/7b091c1397a82258e26ab5336df6c8dae1d97384/s \
rc/main/java/com/google/devtools/build/lib/sandbox/LinuxSandboxedSpawnRunner.java#L61 \
# https://github.com/bazelbuild/bazel/blob/76555482873ffcf1d32fb40106f89231b37f850a/src/main/tools/linux-sandbox-pid1.cc#L113
  addpredict /proc

	mkdir -p "${T}/bazel-cache" || die
	mkdir -p "${T}/bazel-distdir" || die

	cat > "${T}/bazelrc" <<-EOF || die
		startup --batch

		# dont strip HOME, portage sets a temp per-package dir
		build --action_env HOME

		# make bazel respect MAKEOPTS
		build --jobs=$(makeopts_jobs)
		build --compilation_mode=opt --host_compilation_mode=opt

		# FLAGS
		build $(bazel_get_flags)

		# Use standalone strategy to deactivate the bazel sandbox, since it
		# conflicts with FEATURES=sandbox.
		build --spawn_strategy=standalone --genrule_strategy=standalone
		test --spawn_strategy=standalone --genrule_strategy=standalone

		build --strip=never
		build --verbose_failures --noshow_loading_progress
		test --verbose_test_summary --verbose_failures --noshow_loading_progress

		# make bazel only fetch distfiles from the cache
		fetch --repository_cache="${T}/bazel-cache/" --distdir="${T}/bazel-distdir/"
		build --repository_cache="${T}/bazel-cache/" --distdir="${T}/bazel-distdir/"

		build --define=PREFIX=${EPREFIX%/}/usr
		build --define=LIBDIR=\$(PREFIX)/$(get_libdir)
		EOF

	if tc-is-cross-compiler; then
		echo "build --nodistinct_host_configuration" >> "${T}/bazelrc" || die
	fi
}

# @FUNCTION: ebazel
# @USAGE: [<args>...]
# @DESCRIPTION:
# Run bazel with the bazelrc and output_base.
#
# output_base will be specific to $BUILD_DIR (if unset, $S).
# bazel_setup_bazelrc will be called and the created bazelrc
# will be passed to bazel.
#
# Will automatically die if bazel does not exit cleanly.
ebazel() {
	bazel_setup_bazelrc

	# Use different build folders for each multibuild variant.
	local output_base="${BUILD_DIR:-${S}}"
	output_base="${output_base%/}-bazel-base"
	mkdir -p "${output_base}" || die

	set -- bazel --bazelrc="${T}/bazelrc" --output_base="${output_base}" ${@}
	echo "${*}" >&2
	"${@}" || die "ebazel failed"
}

# @FUNCTION: bazel_load_distfiles
# @USAGE: <distfiles>...
# @DESCRIPTION:
# Populate the bazel distdir to fetch from since it cannot use
# the network. Bazel looks in distdir but will only look for the
# original filename, not the possibly renamed one that portage
# downloaded. If the line has -> we to rename it back. This also
# handles use-conditionals that SRC_URI does.
#
# Example:
# @CODE
# bazel_external_uris="http://a/file-2.0.tgz
#     python? ( http://b/1.0.tgz -> foo-1.0.tgz )"
# SRC_URI="http://c/${PV}.tgz
#     ${bazel_external_uris}"
#
# src_unpack() {
#     unpack ${PV}.tgz
#     bazel_load_distfiles "${bazel_external_uris}"
# }
# @CODE
bazel_load_distfiles() {
	local file=""
	local rename=0

	[[ "${@}" ]] || die "Missing args"
	mkdir -p "${T}/bazel-distdir" || die

	for word in ${@}
	do
		if [[ "${word}" == "->" ]]; then
			# next word is a dest filename
			rename=1
		elif [[ "${word}" == ")" ]]; then
			# close conditional block
			continue
		elif [[ "${word}" == "(" ]]; then
			# open conditional block
			continue
		elif [[ "${word}" == ?(\!)[A-Za-z0-9]*([A-Za-z0-9+_@-])\? ]]; then
			# use-conditional block
			# USE-flags can contain [A-Za-z0-9+_@-], and start with alphanum
			# https://dev.gentoo.org/~ulm/pms/head/pms.html#x1-200003.1.4
			# ?(\!) matches zero-or-one !'s
			# *(...) zero-or-more characters
			# ends with a ?
			continue
		elif [[ ${rename} -eq 1 ]]; then
			# Make sure the distfile is used
			if [[ "${A}" == *"${word}"* ]]; then
				echo "Copying ${file} to bazel distdir as ${word}"
				ln -s "${DISTDIR}/${word}" "${T}/bazel-distdir/${file}" || die
			fi
			rename=0
			file=""
		else
			# another URL, current one may or may not be a rename
			# if there was a previous one, its not renamed so copy it now
			if [[ -n "${file}" && "${A}" == *"${file}"* ]]; then
				echo "Copying ${file} to bazel distdir"
				ln -s "${DISTDIR}/${file}" "${T}/bazel-distdir/${file}" || die
			fi
			# save the current URL, later we will find out if its a rename or not.
			file="${word##*/}"
		fi
	done

	# handle last file
	if [[ -n "${file}" ]]; then
		echo "Copying ${file} to bazel distdir"
		ln -s "${DISTDIR}/${file}" "${T}/bazel-distdir/${file}" || die
	fi
}

_BAZEL_ECLASS=1
fi







> > > > 
> > > > # Copyright 1999-2018 Jason Zaman
> > > > # Distributed under the terms of the GNU General Public License v2
> > > > 
> > > > # @ECLASS: bazel.eclass
> > > > # @MAINTAINER:
> > > > # Jason Zaman <perfinion@gentoo.org>
> > > > # @AUTHOR:
> > > > # Jason Zaman <perfinion@gentoo.org>
> > > > # @BLURB: Utility functions for packages using Bazel Build
> > > > # @DESCRIPTION:
> > > > # A utility eclass providing functions to run the Bazel Build system.
> > > > #
> > > > # This eclass does not export any phase functions.
> > > > 
> > > > case "${EAPI:-0}" in
> > > > 	0|1|2|3|4|5|6)
> > > > 		die "Unsupported EAPI=${EAPI:-0} (too old) for ${ECLASS}"
> > > > 		;;
> > > > 	7)
> > > > 		;;
> > > > 	*)
> > > > 		die "Unsupported EAPI=${EAPI} (unknown) for ${ECLASS}"
> > > > 		;;
> > > > esac
> > > > 
> > > > if [[ ! ${_BAZEL_ECLASS} ]]; then
> > > > 
> > > > inherit multiprocessing toolchain-funcs
> > > > 
> > > > BDEPEND=">=dev-util/bazel-0.19"
> > > > 
> > > > # @FUNCTION: bazel_get_flags
> > > > # @DESCRIPTION:
> > > > # Obtain and print the bazel flags for target and host *FLAGS.
> > > > #
> > > > # To add more flags to this, append the flags to the
> > > > # appropriate variable before calling this function
> > > > bazel_get_flags() {
> > > > 	local i fs=()
> > > > 	for i in ${CFLAGS}; do
> > > > 		fs+=( "--conlyopt=${i}" )
> > > > 	done
> > > > 	for i in ${BUILD_CFLAGS}; do
> > > > 		fs+=( "--host_conlyopt=${i}" )
> > > > 	done
> > > > 	for i in ${CXXFLAGS}; do
> > > > 		fs+=( "--cxxopt=${i}" )
> > > > 	done
> > > > 	for i in ${BUILD_CXXFLAGS}; do
> > > > 		fs+=( "--host_cxxopt=${i}" )
> > > > 	done
> > > > 	for i in ${CPPFLAGS}; do
> > > > 		fs+=( "--conlyopt=${i}" "--cxxopt=${i}" )
> > > > 	done
> > > > 	for i in ${BUILD_CPPFLAGS}; do
> > > > 		fs+=( "--host_conlyopt=${i}" "--host_cxxopt=${i}" )
> > > > 	done
> > > > 	for i in ${LDFLAGS}; do
> > > > 		fs+=( "--linkopt=${i}" )
> > > > 	done
> > > > 	for i in ${BUILD_LDFLAGS}; do
> > > > 		fs+=( "--host_linkopt=${i}" )
> > > > 	done
> > > > 	echo "${fs[*]}"
> > > > }
> > > > 
> > > > # @FUNCTION: bazel_setup_bazelrc
> > > > # @DESCRIPTION:
> > > > # Creates the bazelrc with common options that will be passed
> > > > # to bazel. This will be called by ebazel automatically so
> > > > # does not need to be called from the ebuild.
> > > > bazel_setup_bazelrc() {
> > > > 	if [[ -f "${T}/bazelrc" ]]; then
> > > > 		return
> > > > 	fi
> > > > 
> > > > 	# F: fopen_wr
> > > > 	# P: /proc/self/setgroups
> > > > 	# Even with standalone enabled, the Bazel sandbox binary is run for feature \
> > > > test:  # https://github.com/bazelbuild/bazel/blob/7b091c1397a82258e26ab5336df6 \
> > > > c8dae1d97384/src/main/java/com/google/devtools/build/lib/sandbox/LinuxSandboxedSpawnRunner.java#L61
> > > >   # https://github.com/bazelbuild/bazel/blob/76555482873ffcf1d32fb40106f89231b37f850a/src/main/tools/linux-sandbox-pid1.cc#L113
> > > >   addpredict /proc
> > > > 
> > > > 	mkdir -p "${T}/bazel-cache" || die
> > > > 	mkdir -p "${T}/bazel-distdir" || die
> > > > 
> > > > 	cat > "${T}/bazelrc" <<-EOF || die
> > > > 	startup --batch
> > > 
> > > Maybe indent this stuff to make it stand out from ebuild code.
> > > 
> > > > 
> > > > 	# dont strip HOME, portage sets a temp per-package dir
> > > > 	build --action_env HOME
> > > > 
> > > > 	# make bazel respect MAKEOPTS
> > > > 	build --jobs=$(makeopts_jobs)
> > > > 	build --compilation_mode=opt --host_compilation_mode=opt
> > > > 
> > > > 	# FLAGS
> > > > 	build $(bazel_get_flags)
> > > > 
> > > > 	# Use standalone strategy to deactivate the bazel sandbox, since it
> > > > 	# conflicts with FEATURES=sandbox.
> > > > 	build --spawn_strategy=standalone --genrule_strategy=standalone
> > > > 	test --spawn_strategy=standalone --genrule_strategy=standalone
> > > > 
> > > > 	build --strip=never
> > > > 	build --verbose_failures --noshow_loading_progress
> > > > 	test --verbose_test_summary --verbose_failures --noshow_loading_progress
> > > > 
> > > > 	# make bazel only fetch distfiles from the cache
> > > > 	fetch --repository_cache="${T}/bazel-cache/" --distdir="${T}/bazel-distdir/"
> > > > 	build --repository_cache="${T}/bazel-cache/" --distdir="${T}/bazel-distdir/"
> > > > 
> > > > 	build --define=PREFIX=${EPREFIX%/}/usr
> > > > 	build --define=LIBDIR=\$(PREFIX)/$(get_libdir)
> > > > 
> > > > 	EOF
> > > > 
> > > > 	tc-is-cross-compiler || \
> > > > 		echo "build --nodistinct_host_configuration" >> "${T}/bazelrc" || die
> > > 
> > > Don't do || chains, they are unreadable.
> > 
> > ok
> > 
> > > > }
> > > > 
> > > > # @FUNCTION: ebazel
> > > > # @USAGE: [<args>...]
> > > > # @DESCRIPTION:
> > > > # Run bazel with the bazelrc and output_base.
> > > > #
> > > > # If $MULTIBUILD_VARIANT is set, this will make an output_base
> > > > # specific to that variant.
> > > > # bazel_setup_bazelrc will be called and the created bazelrc
> > > > # will be passed to bazel.
> > > > #
> > > > # Will automatically die if bazel does not exit cleanly.
> > > > ebazel() {
> > > > 	bazel_setup_bazelrc
> > > > 
> > > > 	# Use different build folders for each multibuild variant.
> > > > 	local base_suffix="${MULTIBUILD_VARIANT+-}${MULTIBUILD_VARIANT}"
> > > 
> > > Any reason not to use BUILD_DIR instead of reinventing it?
> > 
> > Isnt $BUILD_DIR $S by default? output_base is a bazel thing with a weird
> > structure and has nothing to do with the source tree. Doing it this way
> > meant python_foreach_impl was easy. but I might be confused and will
> > look into it again and see if it can be better. this way means with and
> > without multibuild just works tho.
> 
> There's no default.  It's defined by the few eclasses such as multibuild
> and cmake-utils.  You just need to default if it other eclasses don't
> set it.
> 
> > 
> > > > 	local output_base="${WORKDIR}/bazel-base${base_suffix}"
> > > > 	mkdir -p "${output_base}" || die
> > > > 
> > > > 	einfo Running: bazel --output_base="${output_base}" "$@"
> > > > 	bazel --bazelrc="${T}/bazelrc" --output_base="${output_base}" $@ || die \
> > > > "ebazel $@"
> > > 
> > > The common practice is to echo >&2 it.  Also, you output different
> > > arguments than you execute which is going to confuse the hell out of
> > > users who'll end up having to debug this.  You can use a trick like
> > > the following to avoid typing args twice:
> > > 
> > > set -- bazel --bazelrc...
> > > echo "${*}" >&2
> > > "${@}" || die ...
> > 
> > Oh, forgot that trick. yeah I'll do that.
> > 
> > > > }
> > > > 
> > > > # @FUNCTION: bazel_load_distfiles
> > > > # @USAGE: <distfiles>...
> > > > # @DESCRIPTION:
> > > > # Populate the bazel distdir to fetch from since it cannot use
> > > > # the network. Bazel looks in distdir but will only look for the
> > > > # original filename, not the possibly renamed one that portage
> > > > # downloaded. If the line has -> we to rename it back. This also
> > > > # handles use-conditionals that SRC_URI does.
> > > 
> > > Why oh why do you have to implement custom parser for the ebuild syntax?
> > > That's just asking for horrible failures.
> > 
> > Yeah ... so thats the problem with bazel and the main reason for the
> > eclass. Bazel has this idea that downloading and unpacking random
> > tarballs during build is okay. Now bazel has an option when it needs to
> > fetch something it will look in the dir --distdir= points to instead and
> > read the file instead of trying to fetch from the (sandboxed thus
> > non-existent) internet. But bazel only knows the original filenames not
> > the names that portage renamed things to so I need to rename them back
> > to the original names.
> > 
> > The other option would be to have ebuilds maintain a second list of urls
> > on top of the SRC_URI one which seems way more of a maintenance burden,
> > especially when use-flags are involved. So all I really do here is strip
> > the use? and ()'s and take the list of urls and put them in the
> > bazel-distdir with the original filenames.
> > 
> > If you can think of a better way to accomplish that I'd love to hear it
> > tho. I'll add more comments about what it does but bazel is weird and
> > just utterly fails if it cant read the tarballs. And unpacking things
> > myself doesnt work because of the weird file structure bazel uses so
> > knowing where to put them is non-trivial.
> 
> What if the original URL goes down and somebody replaces 'foo -> bar'
> with 'mirror://gentoo/bar'?
> 
> > 
> > > > #
> > > > # Example:
> > > > # @CODE
> > > > # bazel_external_uris="http://a/file-2.0.tgz
> > > > #     python? ( http://b/1.0.tgz -> foo-1.0.tgz )"
> > > > # SRC_URI="http://c/${PV}.tgz
> > > > #     ${bazel_external_uris}"
> > > > #
> > > > # src_unpack() {
> > > > #     unpack ${PV}.tgz
> > > > #     bazel_load_distfiles "${bazel_external_uris}"
> > > > # }
> > > > # @CODE
> > > > bazel_load_distfiles() {
> > > > 	local src dst uri rename
> > > > 
> > > > 	[[ "$@" ]] || die "Missing args"
> > > > 	mkdir -p "${T}/bazel-distdir" || die
> > > > 
> > > > 	while read uri rename dst; do
> > > > 		src="${uri##*/}"
> > > > 		[[ -z $src ]] && continue
> > > 
> > > Please use ${foo} syntax in ebuilds, consistently.
> > 
> > ok
> > 
> > > > 		if [[ "$rename" != "->" ]]; then
> > > > 			dst="${src}"
> > > > 		fi
> > > > 
> > > > 		[[ ${A} =~ ${dst} ]] || continue
> > > 
> > > Why are you doing regex match here?  Last I checked, we didn't use
> > > regular expressions in SRC_URI.
> > 
> > It was to skip this one if its not in $A (eg disabled with a use-flag)
> > I'll change it to == *${dst}* instead then.
> > 
> > > > 
> > > > 		if [[ "$dst" == "$src" ]]; then
> > > > 			einfo "Copying $dst to bazel distdir ..."
> > > > 		else
> > > > 			einfo "Copying $dst to bazel distdir $src ..."
> > > 
> > > Are you using src and dst to mean the opposite?
> > 
> > No, this is right. its SRC_URI="src -> dst" so I need to rename dst back
> > so its named src.
> > 
> > > > 		fi
> > > > 		dst="$(readlink -f "${DISTDIR}/${dst}")"
> > > 
> > > Looks like you are hardcoding hacks for implementation details which
> > > indicates whatever you're doing is a very bad idea, and is going to fail
> > > whenever the implementation is subtly different than what you've worked
> > > around so far.
> > 
> > I can skip this line then no problem. It will just end up a link to a
> > link but thats fine.
> > 
> > > > 		ln -s "${dst}" "${T}/bazel-distdir/${src}" || die
> > > > 	done <<< "$(sed -re 's/!?[A-Za-z]+\?\s+\(\s*//g; s/\s+\)//g' <<< "$@")"
> > > 
> > > Please don't use horribly unreadable sed expressions.  This just means
> > > that whoever will have to touch this eclass in the future will wish you
> > > were never recruited.
> > 
> > I will split it up and add more comments. It just removes "use? (",
> > "!use? (", and " )". If I don't remove the useflag bits first then figuring out
> > if its a rename or not becomes a lot more complicated. If you have
> > other ideas im all ears.
> 
> The syntax is token-oriented, so just tokenize it and use a state
> machine to parse it if you must.
> 
> > 
> > 
> > > 
> > > > }
> > > > 
> > > > _BAZEL_ECLASS=1
> > > > fi
> > > > 
> > > > 
> > > > 
> > > 
> > > -- 
> > > Best regards,
> > > Michał Górny
> > 
> > 
> > 
> 
> -- 
> Best regards,
> Michał Górny


[prev in list] [next in list] [prev in thread] [next in thread] 

Configure | About | News | Add a list | Sponsored by KoreLogic