List: gentoo-dev
Subject: Re: [gentoo-dev] Re: Questions about SystemD and OpenRC
From: "Gregory M. Turner" <gmt () malth ! us>
Date: 2012-08-18 20:50:31
Message-ID: 50300017.20803 () malth ! us
On 8/16/2012 6:26 PM, Rich Freeman wrote:
> On Thu, Aug 16, 2012 at 4:05 PM, Michael Mol <mikemol@gmail.com> wrote:
>> The limited-visibility build feature discussed a week or so ago would
>> go a long way in detecting unexpressed build dependencies.
[snip]
> If portage has the
> dependency tree in RAM then you just need to dump all the edb listings
> for those packages plus @system and feed those into sandbox.
> That just requires reading a bunch of text files and no searching, so it
> should be pretty quick.
Portage could hypothetically compile such a list while it crawls the
package dependency tree, but I suspect the cost will not be as small as
you predict.
> As far as I can tell the relevant calls to
> check for read access are already being made in sandbox already, and
> obviously they aren't taking forever. We just have to see if the
> search gets slow if the access list has tens of thousands of entries
> (if it does, that is just a simple matter of optimization, but being
> in-RAM I can't see how tens of thousands of entries is going to slow
> down a modern CPU even if it is just an unsorted list).
I appreciate your optimism but I think you're underestimating the cost.
Can't speak for others, but my portage db's churn too much for comfort
as is. Once we start multiplying per-package-dependency iteration by
the files-per-package iteration, that's going to be O(a-shit-load).
Of course, where there's a will there's a way. I'd be surprised if some
kind of delayed-evaluation + caching scheme wouldn't suffice, or,
barring that, perhaps it's time to create an indexed-database-based
drop-in replacement for the current portage db code.
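
To make the "delayed-evaluation + caching" idea concrete, here is a minimal sketch, not a proposal for how portage should actually do it. It stores each atom's file list the first time it is asked for and reuses it afterwards. The names CACHE_DIR, LISTER and cached_files are hypothetical, and the sketch deliberately ignores cache invalidation (a real scheme would have to notice when a package is re-merged).

```bash
#!/bin/bash
# Hypothetical sketch: cache each atom's file list on first use so that
# later lookups skip the expensive per-package query.
# CACHE_DIR and LISTER are illustrative names, not part of portage.
CACHE_DIR=${CACHE_DIR:-${TMPDIR:-/tmp}/dumpfiles-cache}
LISTER=${LISTER:-equery -Cq files}

cached_files() {
    local atom=$1
    # Flatten "=cat/pkg-1.0" into a usable file name.
    local key=${atom//\//_}
    local cache_file=${CACHE_DIR}/${key}
    mkdir -p "${CACHE_DIR}" || return 1
    if [[ ! -f ${cache_file} ]] ; then
        # Delayed evaluation: query only when the list is first needed.
        # LISTER is left unquoted on purpose so it can carry arguments.
        ${LISTER} "${atom}" > "${cache_file}.tmp" \
            || { rm -f "${cache_file}.tmp" ; return 1 ; }
        mv "${cache_file}.tmp" "${cache_file}"
    fi
    cat "${cache_file}"
}
```

Calling cached_files twice for the same atom should run the lister only once; the second call just reads the cached file.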
I've enclosed some scripts you may find helpful in looking at the
numbers. They are kind-of kludgey (originally intended for
in-house-only use and modified for present purposes) but may help shed
some light, if they aren't too buggy, that is...
"dumpworld" slices and dices "emerge -ep" output to provide a list of
atoms in the complete dependency tree of a given list of atoms (add
'@system' to get the complete tree, dumpworld won't do so).
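
For the curious, the slicing and dicing boils down to a grep/sed filter. The sketch below runs the same filter over a made-up sample; the sample lines and the function name extract_atoms are illustrative assumptions, and real emerge output may be formatted differently.

```bash
#!/bin/bash
# Made-up sample of quiet pretend output; real emerge output may differ.
sample_output='[ebuild   R   ] app-misc/foo-1.0
[ebuild     U ] app-misc/bar-2.1 [2.0]
[uninstall     ] app-misc/baz-0.9
[blocks B     ] <app-misc/baz-1.0'

# Same filter chain as dumpworld: drop uninstall/blocks lines, replace the
# leading "[...] " tag with "=", and strip a trailing " [old-version]...".
extract_atoms() {
    grep -v '^.uninstall' | grep -v '^.blocks' \
        | sed 's/^.[^]]*] /=/;s/ \[[^]]*].*$//'
}

echo "${sample_output}" | extract_atoms
```

On this sample only the two ebuild lines survive, each reduced to an "=category/package-version" atom.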
"dumpfiles" operates only on packages installed in the local system
(non-installed atoms are silently dropped), and requires/assumes that
'emerge -ep world' would not change anything if it is to give accurate
information. It takes a list of atoms, transforms them into the
complete lists of atoms in their dependency tree via dumpworld, merges
the lists together, and finds the number of files associated with each
atom in portage. Any collisions will be counted twice, since it doesn't
keep track. It also doesn't add '@system' unless you do. By default it
emits:
o A list of package atoms and the files owned by each atom (stderr)
o total atoms and files
o average filename length
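
If the double counting of collisions matters, one way around it would be to merge the per-atom lists and count unique paths instead. A minimal sketch, assuming one path per line on stdin; count_unique_paths is a hypothetical helper, not part of the enclosed scripts:

```bash
#!/bin/bash
# Hypothetical helper: reads one path per line on stdin and prints
# "<unique paths> <total characters in those paths>".
count_unique_paths() {
    LC_ALL=C sort -u \
        | awk '{ n++ ; chars += length($0) } END { printf "%d %d\n", n, chars }'
}

# A path owned by two packages is listed twice but counted once.
printf '%s\n' /usr/bin/foo /usr/lib/libfoo.so /usr/bin/foo | count_unique_paths
```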
What is, perhaps, more discouraging than the numbers it reports is how
long it takes to run (note: although I suspect an optimized python
implementation could be made to do this faster by a moderate constant
factor, I'm not sure if the big-oh performance characteristics can be
significantly improved without database structure changes like the ones
mentioned above).
My disturbingly bloated and slow workstation gives these answers (note:
here it's even slower because it's running in an emulator):
greg@fedora64vmw ~ $ time bash -c 'dumpfiles @system 2>/dev/null'
TOTAL: 402967 files (in 816 ebuilds, average path length: 66)
real 15m33.719s
user 13m18.909s
sys 2m8.436s
greg@fedora64vmw ~ $ time bash -c 'dumpfiles chromium 2>/dev/null'
TOTAL: 401300 files (in 807 ebuilds, average path length: 66)
real 15m28.900s
user 13m15.126s
sys 2m8.088s
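
Some back-of-the-envelope arithmetic on the '@system' run, using only the figures printed above (the average path length is already truncated to an integer by dumpfiles, so the byte count is approximate):

```bash
#!/bin/bash
# Figures copied from the '@system' run above.
files=402967
ebuilds=816
avg_path_len=66      # already truncated to an integer by dumpfiles
elapsed_ms=933719    # 15m33.719s of wall-clock time

path_bytes=$(( files * avg_path_len ))
echo "path text:        ${path_bytes} bytes (~$(( path_bytes / 1048576 )) MiB)"
echo "files per ebuild: $(( files / ebuilds ))"
echo "time per ebuild:  $(( elapsed_ms / ebuilds )) ms"
```

That works out to roughly 25 MiB of raw path text, a bit under 500 files per ebuild, and a little over a second of wall-clock time per ebuild just to gather the lists.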
My workstation is surely an "outlier" as I have a lot of dependencies
and files due to multilib, split-debug, and USE+=$( a lot ). It's also
got slow hardware RAID6 and the emulator only gives it 2G of RAM to work
with. But I'm a real portage user; I'm sure there are others out
there, if not many, with similar constraints.
-gmt
["dumpfiles" (text/plain)]
#!/bin/bash
if [[ x$(qlist -IC app-portage/portage-utils)x == xx || \
x$(qlist -IC app-portage/gentoolkit)x == xx ]] ; then
echo "This utility requires both app-portage/portage-utils" >&2
echo "and app-portage/gentoolkit. Emerge them both and try again." >&2
exit 1
fi
declare -a arguments atoms
arguments=( )
atoms=( )
verbose=yes
redic=no
for arg in "$@" ; do
case $arg in
-q|--quiet) verbose=no ;;
-r|--redic) redic=yes ;;
*) arguments=( "${arguments[@]}" "$arg" ) ;;
esac
done
[[ ${#arguments[*]} == 0 ]] && arguments=( '@world' )
for arg in "${arguments[@]}" ; do
if [[ ${arg} == @* ]] ; then
newatoms=( "${arg}" )
else
newatoms=( "$( qlist -eICv "${arg}" | sed 's/^/=/' )" )
fi
newatoms=( $( dumpworld "${newatoms[@]}" ) )
result=$?
[[ ${result} != 0 ]] && { echo "dumpworld failed, giving up." >&2 ; exit ${result} ; }
atoms=( "${atoms[@]}" "${newatoms[@]}" )
done
# OK, we have all the packages -- remove dups, there could be a bunch.
atoms=( $( for atom in "${atoms[@]}" ; do echo "${atom}" ; done | sort -u ) )
[[ ${verbose} == yes ]] && \
echo "Checking for files depended upon by the specified atom(s):" >&2 && \
echo >&2
total=0
totalfilechars=0
for atom in "${atoms[@]}" ; do
# turns out equery files includes certain files (/usr/lib/debug... but why?)
# that qlist excludes so ... we might as well get all the bad news possible
files=$( equery -Cq files "${atom}" )
result=$?
[[ $result == 0 ]] || { echo "equery -Cq files ${atom} failed." >&2 ; exit $result ; }
count=$( echo "${files}" | wc -l )
(( total += count ))
while read filename ; do
(( totalfilechars += ${#filename} ))
done < <( echo "${files}" )
if [[ ${verbose} == yes ]] ; then
if [[ ${redic} == yes ]] ; then
echo "${files}"
else
echo "${atom}: ${count}" >&2
fi
fi
done
[[ ${verbose} == yes ]] && echo >&2 && echo >&2
[[ ${verbose} == yes ]] && echo -n "TOTAL: "
echo -n "${total}"
# guard against division by zero when no files were found
averagepathlen=$(( total > 0 ? totalfilechars / total : 0 ))
[[ ${verbose} == yes ]] && echo -n " files (in ${#atoms[*]} ebuilds, average path length: ${averagepathlen})"
echo
echo
exit 0
["dumpworld" (text/plain)]
#!/bin/bash
declare -a atoms
if [[ x$1 == x ]] ; then
atoms=( '@world' )
else
atoms=( "$@" )
fi
emerge_result=$( emerge --ignore-default-opts -epqD --backtrack=999 --with-bdeps=y \
"${atoms[@]}" 2>/dev/null )
trouble=$?
echo "${emerge_result}" | grep -v ^.uninstall | grep -v ^.blocks | sed 's/^.[^]]*] /=/;s/ \[[^]]*].*$//'
if [[ ${trouble} != 0 ]] ; then
echo "WARNING: results not reliable due to portage failure." >&2
echo "since portage stderr is ignored by this script, this" >&2
echo "could mean anything, perhaps depsolving trouble?" >&2
exit ${trouble}
fi
exit 0