[prev in list] [next in list] [prev in thread] [next in thread] 

List:       busybox
Subject:    [PATCH 1/2] wget: Use HEAD for --spider
From:       Sergey Ponomarev <stokito () gmail ! com>
Date:       2022-05-08 17:13:53
Message-ID: 20220508171354.56073-1-stokito () gmail ! com
[Download RAW message or body]

From: Jake <jake@signedbit.net>

In GNU wget the --spider[1] first issues a HEAD request[2], then if HEAD fails, \
issues a GET request[3]. In BusyBox wget, only a GET request is sent. All webservers \
including BB httpd and uhttpd supports the HEAD. The patch changes GET to HEAD e.g. \
get the file size only without downloading first. This is still not totally \
compatible with GNU wget because it does not retry with GET if HEAD fails. \
Potentially someone may use the --spider to call a GET only API, so they may be \
affected. But this is incorrect usage while others may expect that the spider uses \
HEAD and don't expect a download.

For testing use a CGI script /www/cgi-bin/echo.sh:

#!/bin/sh
CONTENT=$(cat -)
printf "Content-Length: ${#CONTENT}\r\n"
printf "Content-Type: text/html\r\n"
printf "REQUEST_METHOD: $REQUEST_METHOD\r\n"
printf "CONTENT_TYPE: $CONTENT_TYPE\r\n"
printf "CONTENT_LENGTH: $CONTENT_LENGTH\r\n"
printf "\r\n"
printf "$CONTENT"

Then call it:

$ busybox wget -O - -S -q --spider  http://localhost:8080/cgi-bin/echo.sh
HTTP/1.0 200 OK
Content-Length: 0
Content-Type: text/html
REQUEST_METHOD: HEAD
CONTENT_TYPE:
CONTENT_LENGTH:

When both post-data and spider options then gnu wget behaves confusing[4].
It sets Content-Type: application/x-www-form-urlencoded as for post-data but anyway \
sends a HEAD request:

$ wget -O - -S -q --post-data="trololo" --spider  \
http://localhost:8080/cgi-bin/echo.shest.sh HTTP/1.0 200 OK
Content-Length: 7
Content-Type: text/html
REQUEST_METHOD: HEAD
CONTENT_TYPE: application/x-www-form-urlencoded
CONTENT_LENGTH:

Instead, this version will send the request as POST but still skip it's response \
body:

$ busybox wget -O - -S -q --post-data="trololo" --spider  \
http://localhost:8080/cgi-bin/echo.sh HTTP/1.0 200 OK
Content-Length: 7
Content-Type: text/html
REQUEST_METHOD: POST
CONTENT_TYPE: application/x-www-form-urlencoded
CONTENT_LENGTH: 7

This would be useful for heavy API calls but we have to wait what GNU wget author \
will say. We may change this behaviour later.

[1] https://www.gnu.org/software/wget/manual/wget.html#index-spider
[2] https://httpwg.org/specs/rfc7231.html#HEAD
[3] https://git.savannah.gnu.org/cgit/wget.git/tree/src/http.c#n4304
[4] https://savannah.gnu.org/bugs/index.php?56808

function                                             old     new   delta
wget_main                                           2797    2824     +27
.rodata                                            10213   10217      +4
------------------------------------------------------------------------------
(add/remove: 0/0 grow/shrink: 2/0 up/down: 31/0)               Total: 31 bytes
text    data     bss     dec     hex filename
177416    3971    1688  183075   2cb23 busybox_old
177447    3971    1688  183106   2cb42 busybox_unstripped

Signed-off-by: Sergey Ponomarev <stokito@gmail.com>
---
 networking/wget.c | 15 +++++++++++++--
 1 file changed, 13 insertions(+), 2 deletions(-)

diff --git a/networking/wget.c b/networking/wget.c
index 9ec0e67b9..0006f8807 100644
--- a/networking/wget.c
+++ b/networking/wget.c
@@ -242,6 +242,7 @@ static const char wget_user_headers[] ALIGN1 =
 
 /* Globals */
 struct globals {
+	const char *method;
 	off_t content_len;        /* Content-length of the file */
 	off_t beg_range;          /* Range at which continue begins */
 #if ENABLE_FEATURE_WGET_STATUSBAR
@@ -1220,12 +1221,13 @@ static void download_one_url(const char *url)
 #endif
 		/* Send HTTP request */
 		if (use_proxy) {
-			SENDFMT(sfp, "GET %s://%s/%s HTTP/1.1\r\n",
+			SENDFMT(sfp, "%s %s://%s/%s HTTP/1.1\r\n",
+				G.method,
 				target.protocol, target.host,
 				target.path);
 		} else {
 			SENDFMT(sfp, "%s /%s HTTP/1.1\r\n",
-				(option_mask32 & WGET_OPT_POST) ? "POST" : "GET",
+				G.method,
 				target.path);
 		}
 		if (!USR_HEADER_HOST)
@@ -1582,6 +1584,15 @@ IF_DESKTOP(	"no-parent\0"        No_argument       "\xf0")
 #endif
 	argv += optind;
 
+	if (option_mask32 & WGET_OPT_POST) {
+		G.method = "POST";
+	} else if (option_mask32 & WGET_OPT_SPIDER) {
+		/* Note: GNU wget --spider sends a HEAD and if it failed repeats with a GET */
+		G.method = "HEAD";
+	} else {
+		G.method = "GET";
+	}
+
 #if ENABLE_FEATURE_WGET_LONG_OPTIONS
 	if (headers_llist) {
 		int size = 0;
-- 
2.34.1

_______________________________________________
busybox mailing list
busybox@busybox.net
http://lists.busybox.net/mailman/listinfo/busybox


[prev in list] [next in list] [prev in thread] [next in thread] 

Configure | About | News | Add a list | Sponsored by KoreLogic