
List:       gentoo-dev
Subject:    Re: [gentoo-dev] Re: Questions about SystemD and OpenRC
From:       "Gregory M. Turner" <gmt () malth ! us>
Date:       2012-08-18 20:50:31
Message-ID: 50300017.20803 () malth ! us

On 8/16/2012 6:26 PM, Rich Freeman wrote:
> On Thu, Aug 16, 2012 at 4:05 PM, Michael Mol <mikemol@gmail.com> wrote:
>> The limited-visibility build feature discussed a week or so ago would
>> go a long way in detecting unexpressed build dependencies.

[snip]

> If portage has the
> dependency tree in RAM then you just need to dump all the edb listings
> for those packages plus @system and feed those into sandbox.

> That just requires reading a bunch of text files and no searching, so it
> should be pretty quick.

Portage could hypothetically compile such a list while it crawls the
package dependency tree, but I suspect the cost will not be as small as
you predict.
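For reference, here is a rough sketch of the list-building step Rich describes. It assumes the installed-package database lives under /var/db/pkg and that each package's CONTENTS file uses the usual "dir PATH", "obj PATH MD5 MTIME" and "sym PATH -> TARGET MTIME" line forms. build_allowlist is a hypothetical helper, not part of portage or sandbox. It does no dependency resolution at all; it only accepts category/package-version names that have already been resolved.

```shell
# Hypothetical helper: print every path recorded in the CONTENTS files
# of the given installed packages, de-duplicated.
#   $1     vdb root (normally /var/db/pkg)
#   $2...  category/package-version directory names under that root
build_allowlist() {
	local vdb_root=$1 pkg
	shift
	local -a contents_files=( )
	for pkg in "$@" ; do
		if [[ ! -r ${vdb_root}/${pkg}/CONTENTS ]] ; then
			echo "build_allowlist: no CONTENTS for ${pkg}" >&2
			return 1
		fi
		contents_files+=( "${vdb_root}/${pkg}/CONTENTS" )
	done
	(( ${#contents_files[@]} > 0 )) || return 0

	# obj lines: drop the "obj " prefix and the trailing md5 and mtime.
	# sym lines: drop the "sym " prefix and everything from " -> " on.
	# anything else (dir, ...): drop only the leading type word.
	sed -e 's/^obj \(.*\) [^ ]* [^ ]*$/\1/' \
	    -e 's/^sym \(.*\) -> .*$/\1/' \
	    -e 's/^[a-z]* //' \
	    "${contents_files[@]}" | sort -u
}
```

Running `build_allowlist /var/db/pkg sys-libs/zlib-1.2.7` (the version is only an example) would print each path that package installed, one per line. The cost gmt is worried about is what happens when this runs over every package in a deep dependency tree.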

> As far as I can tell, the relevant calls to
> check for read access are already being made in sandbox, and
> obviously they aren't taking forever.  We just have to see if the
> search gets slow if the access list has tens of thousands of entries
> (if it does, that is just a simple matter of optimization, but being
> in-RAM I can't see how tens of thousands of entries is going to slow
> down a modern CPU even if it is just an unsorted list).

I appreciate your optimism, but I think you're underestimating the cost.
I can't speak for others, but my portage dbs churn too much for comfort
as is.  Once we start multiplying the per-package-dependency iteration by
the files-per-package iteration, that's going to be O(a-shit-load).
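On the lookup side, the access check does not have to scan an unsorted list. Below is a minimal sketch that assumes bash 4 or later for associative arrays; load_allowlist and is_allowed are hypothetical names. Each check becomes a single hash lookup, so the real question becomes the cost of producing the list in the first place.

```shell
# Requires bash >= 4 (associative arrays).  Hypothetical helpers.
declare -A ALLOWED=( )

# Read one path per line from stdin into the ALLOWED set.
load_allowlist() {
	local path
	while IFS= read -r path ; do
		if [[ -n ${path} ]] ; then
			ALLOWED["${path}"]=1
		fi
	done
}

# Succeed if $1 is in the set: one hash lookup, no scan of the list.
is_allowed() {
	[[ -n $1 ]] || return 1
	[[ -n ${ALLOWED["$1"]+set} ]]
}
```

For example, `load_allowlist < paths.txt` (paths.txt is a hypothetical file) followed by `is_allowed /usr/bin/gcc` behaves the same whether the set holds ten entries or tens of thousands.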

Of course, where there's a will there's a way.  I'd be surprised if some
kind of delayed-evaluation + caching scheme didn't suffice; barring
that, perhaps it's time to create an indexed-database-based drop-in
replacement for the current portage db code.
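Here is one possible shape for the delayed-evaluation + caching idea. It is only a sketch: a package's file count is computed the first time someone asks for it and then reused until that package's CONTENTS file changes. cached_file_count, the on-disk cache layout and the freshness rule (the cache must not be older than CONTENTS) are all assumptions made for illustration.

```shell
# Hypothetical helper: print the number of "obj" entries in a package's
# CONTENTS, computing it lazily and caching the answer on disk.
#   $1  vdb root (normally /var/db/pkg)
#   $2  cache directory (assumed location; anything writable works)
#   $3  category/package-version
# A cached value is reused for as long as it is not older than CONTENTS.
cached_file_count() {
	local vdb_root=$1 cache_dir=$2 pkg=$3
	local contents="${vdb_root}/${pkg}/CONTENTS"
	local cache="${cache_dir}/${pkg}.count"

	[[ -r ${contents} ]] || return 1

	# Cache hit: CONTENTS has not been rewritten since the count was stored.
	if [[ -f ${cache} && ! ${contents} -nt ${cache} ]] ; then
		cat "${cache}"
		return 0
	fi

	# Cache miss: count regular-file entries and remember the result.
	mkdir -p "${cache%/*}" || return 1
	awk '/^obj / { n++ } END { print n + 0 }' "${contents}" > "${cache}" || return 1
	cat "${cache}"
}
```

Because the cache is keyed on each package's CONTENTS mtime, a sync or rebuild only invalidates the entries for packages that were actually reinstalled.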

I've enclosed some scripts you may find helpful in looking at the
numbers.  They are kind of kludgey (originally intended for
in-house-only use and modified for present purposes) but may help shed
some light, if they aren't too buggy, that is...

"dumpworld" slices and dices "emerge -ep" output to provide a list of
atoms in the complete dependency tree of a given list of atoms (add
'@system' to get the complete tree; dumpworld won't add it for you).
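The filtering step inside dumpworld is shown below as a standalone function so it can be tried on canned input. atoms_from_emerge is a hypothetical name, but the grep and sed expressions are the ones dumpworld itself uses. The sample lines in the note afterward are only illustrative, because real emerge output varies with options and portage version.

```shell
# The filter dumpworld applies to "emerge -ep" output, as a function.
# Reads emerge's pretend output on stdin and prints one "=cat/pkg-ver"
# atom per line.  First drop "[uninstall ...]" and "[blocks ...]"
# lines, then rewrite "[ebuild ...] cat/pkg-ver [...]" as "=cat/pkg-ver".
atoms_from_emerge() {
	grep -v '^.uninstall' | grep -v '^.blocks' |
		sed 's/^.[^]]*] /=/;s/ \[[^]]*].*$//'
}
```

Given the line `[ebuild   R    ] sys-libs/zlib-1.2.7`, it prints `=sys-libs/zlib-1.2.7`. A trailing bracketed field such as ` [1]` is stripped, and blocker lines are dropped entirely.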

"dumpfiles" operates only on packages installed in the local system 
(non-installed atoms are silently dropped), and requires/assumes that 
'emerge -ep world' would not change anything if it is to give accurate 
information.  It takes a list of atoms, transforms them into the 
complete lists of atoms in their dependency tree via dumpworld, merges 
the lists together, and finds the number of files associated with each 
atom in portage.  Any collisions will be counted twice, since it doesn't 
keep track.  It also doesn't add '@system' unless you do.  By default it 
emits:

  o Each package atom with the number of files it owns (stderr; with
    -r, the file lists themselves are printed to stdout instead)
  o total atoms and files
  o average path length
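The per-file bash read loop in dumpfiles is one of its slower parts. The sketch below computes the same two summary numbers, the file count and the integer average path length, in a single awk pass. summarize_paths is a hypothetical helper that reads one path per line on stdin and ignores blank lines. At best this trims a constant factor; it does not change the overall big-oh behaviour.

```shell
# Hypothetical helper: one awk pass instead of a bash loop per file.
# Prints the file count and the integer average path length, using the
# same integer division dumpfiles uses.
summarize_paths() {
	awk '
		length($0) > 0 { n++ ; chars += length($0) }
		END {
			if (n == 0) { print "TOTAL: 0 files" ; exit }
			printf "TOTAL: %d files (average path length: %d)\n", n, int(chars / n)
		}'
}
```

For instance, `equery -Cq files sys-libs/zlib | summarize_paths` would summarize a single package, and concatenating several packages' lists before the pipe summarizes them together.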

What is, perhaps, more discouraging than the numbers it reports is how 
long it takes to run (note: although I suspect an optimized python 
implementation could be made to do this faster by a moderate constant 
factor, I'm not sure if the big-oh performance characteristics can be 
significantly improved without database structure changes like the ones 
mentioned above).

My disturbingly bloated and slow workstation gives these answers (note: 
here it's even slower because it's running in an emulator):

greg@fedora64vmw ~ $ time bash -c 'dumpfiles @system 2>/dev/null'
TOTAL: 402967 files (in 816 ebuilds, average path length: 66)


real    15m33.719s
user    13m18.909s
sys     2m8.436s
greg@fedora64vmw ~ $ time bash -c 'dumpfiles chromium 2>/dev/null'
TOTAL: 401300 files (in 807 ebuilds, average path length: 66)


real    15m28.900s
user    13m15.126s
sys     2m8.088s

My workstation is surely an "outlier", as I have a lot of dependencies
and files due to multilib, split-debug, and USE+=$( a lot ).  It also
has slow hardware RAID6, and the emulator only gives it 2 GB of RAM to
work with.  But I'm a real portage user; I'm sure there are others out
there, if not many, with similar constraints.

-gmt

["dumpfiles" (text/plain)]

#!/bin/bash

# qlist comes from portage-utils and equery from gentoolkit.
if ! command -v qlist >/dev/null || ! command -v equery >/dev/null ; then
	echo "This utility requires both app-portage/portage-utils" >&2
	echo "and app-portage/gentoolkit.  Emerge them both and try again." >&2
	exit 1
fi

declare -a arguments atoms

arguments=( )
atoms=( )

verbose=yes
redic=no

for arg in "$@" ; do
	case $arg in
		-q|--quiet) verbose=no ;;
		-r|--redic) redic=yes ;;
		*) arguments+=( "$arg" ) ;;
	esac
done

[[ ${#arguments[*]} == 0 ]] && arguments=( '@world' )

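for arg in "${arguments[@]}" ; do
	if [[ ${arg} == @* ]] ; then
		newatoms=( "${arg}" )
	else
		# One "=category/package-version" atom per installed match;
		# atoms that are not installed are silently dropped (passing
		# dumpworld nothing would make it fall back to @world).
		mapfile -t newatoms < <( qlist -eICv "${arg}" | sed 's/^/=/' )
		(( ${#newatoms[@]} > 0 )) || continue
	fi
	newatoms=( $( dumpworld "${newatoms[@]}" ) )
	result=$?
	if [[ ${result} != 0 ]] ; then
		echo "dumpworld failed, giving up." >&2
		exit ${result}
	fi
	atoms+=( "${newatoms[@]}" )
done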

# OK, we have all the packages -- remove dups, there could be a bunch.
atoms=( $( for atom in "${atoms[@]}" ; do echo "${atom}" ; done | sort -u ) )
		
[[ ${verbose} == yes ]] && \
	echo "Checking for files depended upon by the specified atom(s):" >&2 && \
	echo >&2

total=0
totalfilechars=0
for atom in "${atoms[@]}" ; do
	# It turns out "equery files" includes certain files (/usr/lib/debug...,
	# but why?) that qlist excludes, so we might as well get all the bad
	# news possible.
	files=$( equery -Cq files "${atom}" )
	result=$?
	if [[ ${result} != 0 ]] ; then
		echo "equery -Cq files ${atom} failed." >&2
		exit ${result}
	fi
	count=$( echo "${files}" | wc -l )
	(( total += count ))
	# IFS= and -r keep leading whitespace and backslashes in filenames intact.
	while IFS= read -r filename ; do
		(( totalfilechars += ${#filename} ))
	done <<< "${files}"
	if [[ ${verbose} == yes ]] ; then
		if [[ ${redic} == yes ]] ; then
			echo "${files}"
		else
			echo "${atom}: ${count}" >&2
		fi
	fi
done
[[ ${verbose} == yes ]] && echo >&2 && echo >&2

[[ ${verbose} == yes ]] && echo -n "TOTAL: "
echo -n "${total}"

# Guard against dividing by zero when no files were found.
averagepathlen=0
(( total > 0 )) && averagepathlen=$(( totalfilechars / total ))
[[ ${verbose} == yes ]] && \
	echo -n " files (in ${#atoms[*]} ebuilds, average path length: ${averagepathlen})"
echo

exit 0


["dumpworld" (text/plain)]

#!/bin/bash

declare -a atoms
if [[ x$1 == x ]] ; then
	atoms=( '@world' )
else
	atoms=( "$@" )
fi

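# -e: emptytree, -p: pretend, -q: quiet, -D: deep.  Portage's stderr is discarded.
emerge_result=$( emerge --ignore-default-opts -epqD --backtrack=999 --with-bdeps=y \
	"${atoms[@]}" 2>/dev/null )

trouble=$?

# Drop uninstall and blocker lines, then turn "[ebuild ...] cat/pkg-ver [...]"
# into "=cat/pkg-ver".
echo "${emerge_result}" | grep -v '^.uninstall' | grep -v '^.blocks' | \
	sed 's/^.[^]]*] /=/;s/ \[[^]]*].*$//'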

if [[ ${trouble} != 0 ]] ; then
	echo "WARNING: results not reliable due to portage failure." >&2
	echo "since portage stderr is ignored by this script, this" >&2
	echo "could mean anything, perhaps depsolving trouble?" >&2
	exit ${trouble}
fi

exit 0



