List: busybox
Subject: Patch fuzz factor design ruminations.
From: Rob Landley <rob () landley ! net>
Date: 2010-09-26 23:38:43
Message-ID: 201009261838.43659.rob () landley ! net
So once again, a patch didn't apply because of fuzz factor, and I'm getting
tired of it. Implementing fuzz support isn't technically hard, but getting
the design right _is_, so I thought I'd publicly mull it over while working
out what to do.
Fuzz factor is different from a patch offset. Applying patches at offsets is
normal, and in fact the patch algorithm I implemented ignores the suggested
location entirely. It operates in a streaming manner, like a really weird
form of sed: it just finds the first place the hunk applies, and if a hunk
doesn't apply it hits the end of the file and bails out there. This is a
reasonably simple, low-memory way of doing it, and it works because the
pattern to apply has (generally three) leading context lines which have to
match, (generally three) trailing context lines which have to match, and the
lines removed by the patch also have to match. That's a fairly reliable
identifier of what needs to change: there's enough information to identify
the correct location even without the offset.
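A minimal sketch of that forward scan, in C to match busybox. Unlike the real streaming code, it assumes the file has already been split into an in-memory array of lines, and the helper names hunk_matches_at() and find_hunk() are hypothetical. The hunk's "old" lines are its leading context, removed lines, and trailing context in order; the suggested line number from the @@ header is never consulted.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: nonzero if the nold "old" lines of a hunk
 * (context plus removed lines, in order) match file[pos..pos+nold-1]. */
static int hunk_matches_at(const char *const *file, size_t nfile,
                           const char *const *old, size_t nold, size_t pos)
{
    if (pos + nold > nfile)
        return 0;
    for (size_t i = 0; i < nold; i++)
        if (strcmp(file[pos + i], old[i]) != 0)
            return 0;
    return 1;
}

/* Scan forward from 'start', ignoring the hunk's suggested location, and
 * return the first index where the hunk applies, or -1 if we run off the
 * end of the file (the point where patch would bail out). */
static long find_hunk(const char *const *file, size_t nfile,
                      const char *const *old, size_t nold, size_t start)
{
    /* A hunk with no old lines can't be located by content at all. */
    assert(nold > 0);
    for (size_t pos = start; pos + nold <= nfile; pos++)
        if (hunk_matches_at(file, nfile, old, nold, pos))
            return (long)pos;
    return -1;
}
```

Because the scan starts at 'start' and only moves forward, each later hunk is searched for after the previous one, which is what lets the real implementation stream instead of holding the whole file.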
Fuzz factor, on the other hand, says "not all of those lines are going to
match". My understanding is that a fuzz factor of 1 says strip two context
lines (one from the beginning, one from the end). A fuzz factor of 2 says
strip four context lines (the first two and the last two).
This leads to the problem of hunks like this:
@@ blah blah blah
context
context
{
+ insert
+ insert
+ insert
context
context
With a fuzz factor of 2, and no deleted lines to match, that can insert
anywhere that has a curly bracket at the right indentation level followed by a
blank line. This causes SUBTLE BUGS which the linux-kernel guys complain
about from time to time. Guys like Andrew Morton, Al Viro, and Dave Jones
have all complained about GNU patch's default fuzz factor fallbacks on a
conceptual level: it makes a heroic attempt to apply stale patches and winds
up misapplying them rather than breaking and forcing people to fix up a
version-skewed patch.
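To make the arithmetic concrete, here is a small C model of fuzz as described above: fuzz N drops up to N leading and up to N trailing context lines, while deleted lines always have to match. The struct hunk_shape and fuzz_trim() names are hypothetical, and this is a simplified model of the idea, not a description of GNU patch's internals. For the example hunk (three leading context lines ending in the {, two trailing context lines, nothing deleted), fuzz 2 leaves only the { line to match.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical summary of a hunk's "old" side: 'lead' leading context
 * lines, then 'removed' deleted lines, then 'trail' trailing context. */
struct hunk_shape {
    size_t lead, removed, trail;
};

/* Model fuzz N as dropping up to N leading and up to N trailing context
 * lines; deleted lines always have to match. Stores the index of the
 * first old line still to be matched in *first and returns how many
 * old lines remain to be matched. */
static size_t fuzz_trim(struct hunk_shape h, size_t fuzz, size_t *first)
{
    size_t drop_lead = fuzz < h.lead ? fuzz : h.lead;
    size_t drop_trail = fuzz < h.trail ? fuzz : h.trail;

    assert(first != NULL);
    *first = drop_lead;
    return (h.lead - drop_lead) + h.removed + (h.trail - drop_trail);
}
```

With struct hunk_shape h = {3, 0, 2}, fuzz_trim(h, 2, &first) returns 1 and sets first to 2: a single { line is all that's left to pin the insertion down.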
However, I'm playing around with automating the Linux From Scratch 6.6 build
on top of my Aboriginal Linux project, and the patches they're applying to the
packages in the lfs-6.6-source tarball need a fuzz factor of 2, à la:
Applying /home/landley/aboriginal/aboriginal/build/host-temp/lfs-bootstrap/lfs/coreutils-8.4-uname-1.patch
patching file src/uname.c
Hunk #2 succeeded at 314 with fuzz 2.
Hunk #3 succeeded at 441 with fuzz 1.
Hunk #4 succeeded at 449 with fuzz 2.
(Nope, not the only example: diffutils-2.8.1-i18n-1.patch patches src/diff.h
with fuzz 2, and so on. Apparently, as long as patch doesn't refuse to apply
a patch, they see no need to update it.)
Fuzz factor is just trimming context lines. I can do that. But under what
circumstances should I? (Especially since I'm _ignoring_ the offset
information, which can only make the misapplied-fuzz problem worse.)
Right now, the pathological case for applying a patch is 6 lines of context: 3
leading, 3 trailing, and all insertions with no deletions. That's fairly
reliable. If I count lines deleted in the body of the patch as additional
information, then I can auto-set a fuzz factor based on still needing to match
at least 6 lines to be happy that I'm applying the hunk at a good place.
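One way that auto-fuzz rule could look, as a hedged C sketch: pick the largest fuzz factor that still leaves at least six old-side lines (context plus deleted lines) to match. MIN_MATCH, lines_left(), auto_fuzz(), and the max_fuzz cap are hypothetical names for illustration, not existing busybox code. Under this rule a hunk with three context lines on each side and no deletions gets no fuzz at all, while one that also deletes four lines could be fuzzed by 2.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical threshold from the discussion: at least six old-side
 * lines must still match before a hunk location is trusted. */
#define MIN_MATCH 6

/* Old-side lines left to match after dropping up to 'fuzz' leading and
 * up to 'fuzz' trailing context lines. Deleted lines are never dropped. */
static size_t lines_left(size_t lead, size_t removed, size_t trail,
                         size_t fuzz)
{
    size_t drop_lead = fuzz < lead ? fuzz : lead;
    size_t drop_trail = fuzz < trail ? fuzz : trail;

    return (lead - drop_lead) + removed + (trail - drop_trail);
}

/* Largest fuzz factor, at most max_fuzz, that still leaves MIN_MATCH
 * lines to match. Returns 0 when no fuzz can be applied safely. */
static size_t auto_fuzz(size_t lead, size_t removed, size_t trail,
                        size_t max_fuzz)
{
    size_t fuzz = 0;

    while (fuzz < max_fuzz) {
        size_t left = lines_left(lead, removed, trail, fuzz + 1);

        /* More fuzz can only leave fewer (or equally many) lines. */
        assert(left <= lines_left(lead, removed, trail, fuzz));
        if (left < MIN_MATCH)
            break;
        fuzz++;
    }
    return fuzz;
}
```

The pure-insertion case is exactly where this rule refuses to help: auto_fuzz(3, 0, 3, 2) is 0, because stripping even one context line from each end drops below six lines to match.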
However, that won't help this hunk, which _is_ the "Hunk #2 succeeded at 314
with fuzz 2" example above, and in fact the first one busybox complained about
not being able to apply:
@@ -308,6 +314,96 @@
if (0 <= sysinfo (SI_ARCHITECTURE, processor, sizeof processor))
element = processor;
}
+#else
+ {
BLAH BLAH BLAH nothing but insertion for many lines
+#endif
+ }
#endif
#ifdef UNAME_PROCESSOR
if (element == unknown)
The entire hunk is one big insertion, with no deleted lines to add extra
context. With fuzz 2 (stripping 2 context lines from the beginning and 2
from the end), the remaining context is a closing curly bracket and an #endif.
Yeah, not likely to find _those_ together at some random place in a C file.
As far as I can tell, this patch only still applies correctly by sheer
coincidence. And it's EXACTLY the kind of thing that may produce code that
still compiles but doesn't do what the author intended. No human has looked
at it since the code changed, so nobody will notice until they hit that bug,
and then they'll be stumped because _they_ didn't move it, so they won't
_see_ that something else did...
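A quick way to see how ambiguous a fuzzed hunk has become is to count every place its remaining pattern matches. This C sketch does that over an in-memory array of lines; count_matches() is a hypothetical helper, and a count above one means the content alone can't say where the hunk belongs. For the uname hunk at fuzz 2, the pattern is just the closing-brace line followed by the #endif line.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Count every position in file[] where the npat pattern lines match
 * consecutively. A count above 1 means a fuzzed hunk has more than one
 * plausible home, and content alone cannot pick the right one. */
static size_t count_matches(const char *const *file, size_t nfile,
                            const char *const *pat, size_t npat)
{
    size_t count = 0;

    assert(npat > 0);
    for (size_t pos = 0; pos + npat <= nfile; pos++) {
        size_t i = 0;

        while (i < npat && strcmp(file[pos + i], pat[i]) == 0)
            i++;
        if (i == npat)
            count++;
    }
    return count;
}
```

In a file full of #ifdef blocks, a brace-then-#endif pair is anything but rare, so the count for that two-line pattern can easily exceed one.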
So once again, I know how to implement fuzz factor, and I can probably even
come up with sane rules for when fuzz factor can be applied safely... and it
won't fix the problem in front of me.
Anybody have an opinion? Because I'm stumped. The code is _aware_ of the
current offset (it's calculating the line count in case it needs to display
it), so maybe I can work the current offset into the fuzz factor
calculations. But it's still not going to be _reliable_...
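One possible shape for working the offset in, as a hypothetical C sketch rather than anything busybox currently does: an exact match may land anywhere, as it does now, but a fuzzed match is only accepted within a window of the line the hunk header names, shifted by the offset earlier hunks have accumulated. accept_match() and its window parameter are made up for illustration, and this still isn't _reliable_; it only narrows how far a fuzzed hunk can wander.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical policy: an exact match (fuzz 0) is accepted wherever it
 * is found, but a fuzzy match is only trusted within 'window' lines of
 * where the hunk header says it belongs, after adding the offset that
 * earlier hunks have drifted by. */
static bool accept_match(long found, long header_line, long offset,
                         int fuzz, long window)
{
    long expected = header_line + offset;
    long drift = found > expected ? found - expected : expected - found;

    assert(window >= 0);
    if (fuzz == 0)
        return true;
    return drift <= window;
}
```

The design choice here is to keep the current offset-blind behavior for exact matches, since full context is already a strong identifier, and spend the offset information only on the fuzzy case where it is actually needed.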
Sigh.
Rob
--
GPLv3: as worthy a successor as The Phantom Menace, as timely as Duke Nukem
Forever, and as welcome as New Coke.
_______________________________________________
busybox mailing list
busybox@busybox.net
http://lists.busybox.net/mailman/listinfo/busybox