[prev in list] [next in list] [prev in thread] [next in thread] 

List:       busybox
Subject:    Re: [PATCH] awk: fix use after free (CVE-2022-30065)
From:       Natanael Copa <ncopa () alpinelinux ! org>
Date:       2022-06-16 21:04:47
Message-ID: 20220616230447.2473dff2 () ncopa-desktop ! lan
[Download RAW message or body]

On Thu, 16 Jun 2022 12:54:56 +0200
Natanael Copa <ncopa@alpinelinux.org> wrote:

> On Tue, 14 Jun 2022 18:24:54 +0200
> Denys Vlasenko <vda.linux@googlemail.com> wrote:
> 
> > On Tue, Jun 14, 2022 at 8:55 AM Natanael Copa <ncopa@alpinelinux.org> wrote:  
> > > Hi!
> > > 
> > > Is there anything else I can do to help fix CVE-2022-30065? I have
> > > created a testcase for the testsuite and proposed a fix, but I'm not
> > > that familiar with awk code so I would appreciate some help with this
> > > before pushing it to thousands (millions?) of users.    
> > 
> > cd testsuite && ./runtest awk
> > 
> > fails a lot with this change.  
> 
> Indeed, sorry! I thought I ran it locally but I must have done something wrong when \
> running them here. 
> Need to go back to the drawing board...

Ok, so I have made some progress and have another possible fix. I also
separated the test case into a separate commit, so it becomes easier to
test in different branches etc. while working on it.

I also have another (valid?) testcase which passes on gawk and nawk,
but fails with my fix suggestion. I also attach a WIP patch which adds
some debug info of pointer values to make it visible when tmpvars are
allocated and when evaluate() returns the just free'd pointer.

Summary of the attached patches:

0001-awk-fix-use-after-free-CVE-2022-30065.patch:
	Potential fix, that prevents the use-after-free. But honestly,
	I don't really know what I'm doing and I don't know if this
	breaks any valid cases.

0002-awk-test-for-CVE-2022-30065.patch:
	The test case for the CVE. gawk and mawk returns syntax error
	for this test. This patch can be applied as is, but it will not
	pass until we have a proper fix.

0003-awk-add-test-for-setting-1-while-testing-it.patch:
	A test case that passes with mawk/gawk but fails with the
	proposed fix. I don't know if this is valid syntax or if it is
	ok to error out with syntax error.

0004-WIP-debug-awk.patch:
	Temp patch that adds extra printf debug info to show what
	happens. This patch should *not* be applied upstream, but can
	be used in local development git branches to help debug.


With the attached debug print 0004-WIP-debug-awk.patch applied, gives
the following output when running: echo | ./busybox awk '$1$1=0'

# My comments are prefixed with a #
# TIP: do a in browser/reader search for '0x7f52929debb0' to highlight
# where the problematic address is used.

fsrealloc: xrealloc(0, 512)
fsrealloc: Fields=0x7f52929df030..0x7f52929df22f
getvar_i: 0.000000
getvar_i: 1.000000
entered awk_getline()
returning from awk_getline(): 1
getvar_i: 0.000000
getvar_i: 0.000000
entered evaluate(op=0x7f52929dff30, res=0x7f5292a77328)
	tmpvars=0x7f52929deb60
opinfo:00000300 opn:00000000
switch(0x3)
NEWSOURCE
opinfo:00000d00 opn:00000000
switch(0xd)
TEST
entered evaluate(op=0x7f52929dd530, res=0x7f5292a772a8)
	tmpvars=0x7f52929debb0		# this is where allocation happens
opinfo:4a031f00 opn:00000000
entered evaluate(op=0x7f52929dd470, res=0x7f52929debb0) # the buffer is passed here
	tmpvars=0x7f52929dd820
opinfo:230f1500 opn:00000000
entered evaluate(op=0x7f52929dffc0, res=0x7f52929dd820)
	tmpvars=0x7f52929dd870
opinfo:05021700 opn:00000000
entered evaluate(op=0x7f52929dd410, res=0x7f52929dd890)
	tmpvars=0x7f52929dd8c0
opinfo:00002700 opn:00000000
switch(0x27)
VAR
returning from evaluate(): res=0x7f52929dd440, tmpvars=0x7f52929dd8c0
switch(0x17)
FIELD
getvar_i: 1.000000
returning from evaluate(): res=0x7f52929df030, tmpvars=0x7f52929dd870
opinfo & OF_RES1: L.v:'0x7f52929df030'
L.s:''
entered evaluate(op=0x7f52929dd4a0, res=0x7f52929dd840)
	tmpvars=0x7f52929dd910
opinfo:05021700 opn:00000000
entered evaluate(op=0x7f52929dd4d0, res=0x7f52929dd930)
	tmpvars=0x7f52929dd960
opinfo:00002700 opn:00000000
switch(0x27)
VAR
returning from evaluate(): res=0x7f52929dd500, tmpvars=0x7f52929dd960
switch(0x17)
FIELD
getvar_i: 1.000000
returning from evaluate(): res=0x7f52929df030, tmpvars=0x7f52929dd910
R.s:''
switch(0x15)
CONCAT /
COMMA: L.s='', sep='', R.s=''
returning from evaluate(): res=0x7f52929debb0, tmpvars=0x7f52929dd820   # address is \
return value here opinfo & OF_RES1: L.v:'0x7f52929debb0'		# L.v is from TEST tmpvars \
here entered evaluate(op=0x7f52929dd560, res=0x7f52929debd0)
	tmpvars=0x7f52929dd820
opinfo:00002700 opn:00000000
switch(0x27)
VAR
returning from evaluate(): res=0x7f52929dd590, tmpvars=0x7f52929dd820
switch(0x1f)
MOVE L.v=0x7f52929debb0, res=0x7f5292a772a8	# MOVE passes the reference here
copyvar: number:0.000000 string:'(null)'	# copyvar here will copy the reference to \
res here returning from evaluate(): res=0x7f52929debb0, tmpvars=0x7f52929debb0 # res \
is now set to same value as tmpvars which is free'd. # evaluate() here returns the \
                value it just free'd.
awk: cmd. line:1: Possible syntax error


So, during TEST tmpvars is allocated. MOVE will do res=tmpvars, and
then is tmpvars free'd and evaluate() return the free'd value.

So MOVE's copyvar should may actually copy it instead of just copying
the reference? If we do that, when and how is it free'd to avoid memory leaks?

Should we implement a refcount of some sort?

I have no idea.

Kind regards,
Natanael Copa


[Attachment #3 (text/x-patch)]

From 7c05b4b8e7212969f33fbd3e10be8b00d2380ea8 Mon Sep 17 00:00:00 2001
From: Natanael Copa <ncopa@alpinelinux.org>
Date: Thu, 16 Jun 2022 16:22:44 +0200
Subject: [PATCH 4/4] WIP: debug awk

---
 editors/awk.c | 12 ++++++------
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/editors/awk.c b/editors/awk.c
index 8554871e4..c86180185 100644
--- a/editors/awk.c
+++ b/editors/awk.c
@@ -55,7 +55,7 @@
 /* If you comment out one of these below, it will be #defined later
  * to perform debug printfs to stderr: */
 #define debug_printf_walker(...)  do {} while (0)
-#define debug_printf_eval(...)  do {} while (0)
+//#define debug_printf_eval(...)  do {} while (0)
 #define debug_printf_parse(...)  do {} while (0)
 
 #ifndef debug_printf_walker
@@ -2872,11 +2872,10 @@ static var *evaluate(node *op, var *res)
 	if (!op)
 		return setvar_s(res, NULL);
 
-	debug_printf_eval("entered %s()\n", __func__);
-
 	tmpvars = nvalloc(2);
 #define TMPVAR0 (tmpvars)
 #define TMPVAR1 (tmpvars + 1)
+	debug_printf_eval("entered %s(op=%p, res=%p)\n\ttmpvars=%p\n", __func__, op, res, tmpvars);
 
 	while (op) {
 		struct {
@@ -2903,6 +2902,7 @@ static var *evaluate(node *op, var *res)
 			if ((opinfo & OF_REQUIRED) && !op1)
 				syntax_error(EMSG_TOO_FEW_ARGS);
 			L.v = evaluate(op1, TMPVAR0);
+			debug_printf_eval("opinfo & OF_RES1: L.v:'%p'\n", L.v);
 			if (opinfo & OF_STR1) {
 				L.s = getvar_s(L.v);
 				debug_printf_eval("L.s:'%s'\n", L.s);
@@ -3127,7 +3127,7 @@ static var *evaluate(node *op, var *res)
 			break;
 
 		case XC( OC_MOVE ):
-			debug_printf_eval("MOVE\n");
+			debug_printf_eval("MOVE L.v=%p, res=%p\n", L.v, res);
 			/* if source is a temporary string, jusk relink it to dest */
 			if (R.v == TMPVAR1
 			 && !(R.v->type & VF_NUMBER)
@@ -3438,9 +3438,9 @@ static var *evaluate(node *op, var *res)
 			debug_printf_eval("CONCAT /\n");
 		case XC( OC_COMMA ): {
 			const char *sep = "";
-			debug_printf_eval("COMMA\n");
 			if (opinfo == TI_COMMA)
 				sep = getvar_s(intvar[SUBSEP]);
+			debug_printf_eval("COMMA: L.s='%s', sep='%s', R.s='%s'\n", L.s, sep, R.s);
 			setvar_p(res, xasprintf("%s%s%s", L.s, sep, R.s));
 			break;
 		}
@@ -3541,10 +3541,10 @@ static var *evaluate(node *op, var *res)
 	 * we just nvfree'd. If we do, assume that it is a syntax error.
 	 * This has been seen with `awk '$1$1=0'`
 	 */
+	debug_printf_eval("returning from %s(): res=%p, tmpvars=%p\n", __func__, res, tmpvars);
 	if (tmpvars == res)
 		syntax_error(EMSG_POSSIBLE_ERROR);
 
-	debug_printf_eval("returning from %s(): %p\n", __func__, res);
 	return res;
 #undef fnargs
 #undef seed
-- 
2.36.1


[Attachment #4 (text/x-patch)]

From aa22f20ce9ee0f94725cfca3e294b99267e1cf59 Mon Sep 17 00:00:00 2001
From: Natanael Copa <ncopa@alpinelinux.org>
Date: Thu, 16 Jun 2022 22:25:28 +0200
Subject: [PATCH 3/4] awk: add test for setting $1 while testing it

This test passes on mawk and gawk.
---
 testsuite/awk.tests | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/testsuite/awk.tests b/testsuite/awk.tests
index 60a737dcf..3d2e11166 100755
--- a/testsuite/awk.tests
+++ b/testsuite/awk.tests
@@ -485,4 +485,9 @@ testing 'awk use-after-free (CVE-2022-30065)' \
 	"awk: cmd. line:1: Possible syntax error" \
 	""
 
+testing 'awk assign while test' \
+	"awk '\$1==\$1=\"foo\" {print \$1}'" \
+	"foo" \
+	"" \
+	"foo"
 exit $FAILCOUNT
-- 
2.36.1


[Attachment #5 (text/x-patch)]

From 91ce0365e84fc413f5d584ca08d2646f101cdab0 Mon Sep 17 00:00:00 2001
From: Natanael Copa <ncopa@alpinelinux.org>
Date: Thu, 16 Jun 2022 21:54:48 +0200
Subject: [PATCH 2/4] awk: test for CVE-2022-30065

---
 testsuite/awk.tests | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/testsuite/awk.tests b/testsuite/awk.tests
index 93e25d8c1..60a737dcf 100755
--- a/testsuite/awk.tests
+++ b/testsuite/awk.tests
@@ -479,4 +479,10 @@ testing 'awk backslash+newline eaten with no trace' \
 	"Hello world\n" \
 	'' ''
 
+testing 'awk use-after-free (CVE-2022-30065)' \
+	"awk '\$3i\$3in\$9=\$r||\$9=i6/6-9f'" \
+	"" \
+	"awk: cmd. line:1: Possible syntax error" \
+	""
+
 exit $FAILCOUNT
-- 
2.36.1


[Attachment #6 (text/x-patch)]

From 614b5c056097c12c398e8d98c4c727a1b4a214c7 Mon Sep 17 00:00:00 2001
From: Natanael Copa <ncopa@alpinelinux.org>
Date: Tue, 7 Jun 2022 21:53:04 +0200
Subject: [PATCH 1/4] awk: fix use after free (CVE-2022-30065)

fixes https://bugs.busybox.net/show_bug.cgi?id=14781
---
 editors/awk.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/editors/awk.c b/editors/awk.c
index 079d0bde5..8554871e4 100644
--- a/editors/awk.c
+++ b/editors/awk.c
@@ -3537,6 +3537,12 @@ static var *evaluate(node *op, var *res)
 	nvfree(tmpvars, 2);
 #undef TMPVAR0
 #undef TMPVAR1
+	/* We should never end up in a state where we return the pointer that
+	 * we just nvfree'd. If we do, assume that it is a syntax error.
+	 * This has been seen with `awk '$1$1=0'`
+	 */
+	if (tmpvars == res)
+		syntax_error(EMSG_POSSIBLE_ERROR);
 
 	debug_printf_eval("returning from %s(): %p\n", __func__, res);
 	return res;
-- 
2.36.1



_______________________________________________
busybox mailing list
busybox@busybox.net
http://lists.busybox.net/mailman/listinfo/busybox


[prev in list] [next in list] [prev in thread] [next in thread] 

Configure | About | News | Add a list | Sponsored by KoreLogic