List: busybox
Subject: [PATCH 1/2] wget: Use HEAD for --spider
From: Sergey Ponomarev <stokito () gmail ! com>
Date: 2022-05-08 17:13:53
Message-ID: 20220508171354.56073-1-stokito () gmail ! com
From: Jake <jake@signedbit.net>
In GNU wget, --spider[1] first issues a HEAD request[2] and, if HEAD
fails, issues a GET request[3]. In BusyBox wget, only a GET request is
sent. All web servers, including BusyBox httpd and uhttpd, support HEAD.
This patch changes GET to HEAD, e.g. to get only the file size without
downloading the file first. This is still not fully compatible with GNU
wget because it does not retry with GET if HEAD fails. Someone may be
using --spider to call a GET-only API, and they would be affected. But
that is incorrect usage, while others may expect --spider to use HEAD
and not to download anything.
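Since this version does not retry with GET, a caller that depends on the
GNU wget fallback could approximate it with a small wrapper. This is only
a minimal sketch, assuming a failed HEAD makes wget exit non-zero;
spider_with_fallback and the WGET variable are hypothetical names and are
not part of this patch:

```shell
#!/bin/sh
# Hypothetical wrapper, not part of this patch.
# WGET may be overridden to point at a different wget binary.
WGET=${WGET:-"busybox wget"}

spider_with_fallback() {
	# Cheap check first; with this patch it is sent as a HEAD request.
	$WGET -q --spider "$1" && return 0
	# HEAD failed: retry with a plain GET and throw the body away.
	$WGET -q -O /dev/null "$1"
}
```

Unlike GNU wget, the fallback here still transfers the body, it just does
not keep it. Usage would be: spider_with_fallback http://localhost:8080/cgi-bin/echo.sh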
For testing, use a CGI script /www/cgi-bin/echo.sh:
#!/bin/sh
# Read the request body, then echo the request metadata back as headers.
CONTENT=$(cat -)
printf 'Content-Length: %s\r\n' "${#CONTENT}"
printf 'Content-Type: text/html\r\n'
printf 'REQUEST_METHOD: %s\r\n' "$REQUEST_METHOD"
printf 'CONTENT_TYPE: %s\r\n' "$CONTENT_TYPE"
printf 'CONTENT_LENGTH: %s\r\n' "$CONTENT_LENGTH"
printf '\r\n'
printf '%s' "$CONTENT"
Then call it:
$ busybox wget -O - -S -q --spider http://localhost:8080/cgi-bin/echo.sh
HTTP/1.0 200 OK
Content-Length: 0
Content-Type: text/html
REQUEST_METHOD: HEAD
CONTENT_TYPE:
CONTENT_LENGTH:
When both the --post-data and --spider options are given, GNU wget
behaves confusingly[4]. It sets Content-Type:
application/x-www-form-urlencoded, as for --post-data, but still sends a
HEAD request:
$ wget -O - -S -q --post-data="trololo" --spider \
    http://localhost:8080/cgi-bin/echo.sh
HTTP/1.0 200 OK
Content-Length: 7
Content-Type: text/html
REQUEST_METHOD: HEAD
CONTENT_TYPE: application/x-www-form-urlencoded
CONTENT_LENGTH:
Instead, this version sends the request as POST but still skips its
response body:
$ busybox wget -O - -S -q --post-data="trololo" --spider \
    http://localhost:8080/cgi-bin/echo.sh
HTTP/1.0 200 OK
Content-Length: 7
Content-Type: text/html
REQUEST_METHOD: POST
CONTENT_TYPE: application/x-www-form-urlencoded
CONTENT_LENGTH: 7
This would be useful for heavy API calls, but we have to wait for what
the GNU wget author says. We may change this behaviour later.
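The resulting choice of request method, in the order the options are
checked, can be summarised as follows. This is a sketch in shell with a
hypothetical select_method helper; the patch itself makes the same
decision in C by testing option_mask32:

```shell
#!/bin/sh
# Hypothetical helper, only to illustrate the precedence.
# $1: 1 if --post-data was given, otherwise 0
# $2: 1 if --spider was given, otherwise 0
select_method() {
	if [ "$1" = 1 ]; then
		echo POST	# a POST body wins even when --spider is set
	elif [ "$2" = 1 ]; then
		echo HEAD	# --spider alone: no body is requested
	else
		echo GET
	fi
}

select_method 1 1	# prints POST
select_method 0 1	# prints HEAD
select_method 0 0	# prints GET
```

Checking POST first is what makes --post-data together with --spider send
a POST rather than a HEAD.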
[1] https://www.gnu.org/software/wget/manual/wget.html#index-spider
[2] https://httpwg.org/specs/rfc7231.html#HEAD
[3] https://git.savannah.gnu.org/cgit/wget.git/tree/src/http.c#n4304
[4] https://savannah.gnu.org/bugs/index.php?56808
function                                             old     new   delta
wget_main                                           2797    2824     +27
.rodata                                            10213   10217      +4
------------------------------------------------------------------------------
(add/remove: 0/0 grow/shrink: 2/0 up/down: 31/0)               Total: 31 bytes
   text    data     bss     dec     hex filename
 177416    3971    1688  183075   2cb23 busybox_old
 177447    3971    1688  183106   2cb42 busybox_unstripped
Signed-off-by: Sergey Ponomarev <stokito@gmail.com>
---
networking/wget.c | 15 +++++++++++++--
1 file changed, 13 insertions(+), 2 deletions(-)
diff --git a/networking/wget.c b/networking/wget.c
index 9ec0e67b9..0006f8807 100644
--- a/networking/wget.c
+++ b/networking/wget.c
@@ -242,6 +242,7 @@ static const char wget_user_headers[] ALIGN1 =
/* Globals */
struct globals {
+ const char *method;
off_t content_len; /* Content-length of the file */
off_t beg_range; /* Range at which continue begins */
#if ENABLE_FEATURE_WGET_STATUSBAR
@@ -1220,12 +1221,13 @@ static void download_one_url(const char *url)
#endif
/* Send HTTP request */
if (use_proxy) {
- SENDFMT(sfp, "GET %s://%s/%s HTTP/1.1\r\n",
+ SENDFMT(sfp, "%s %s://%s/%s HTTP/1.1\r\n",
+ G.method,
target.protocol, target.host,
target.path);
} else {
SENDFMT(sfp, "%s /%s HTTP/1.1\r\n",
- (option_mask32 & WGET_OPT_POST) ? "POST" : "GET",
+ G.method,
target.path);
}
if (!USR_HEADER_HOST)
@@ -1582,6 +1584,15 @@ IF_DESKTOP( "no-parent\0" No_argument "\xf0")
#endif
argv += optind;
+ if (option_mask32 & WGET_OPT_POST) {
+ G.method = "POST";
+ } else if (option_mask32 & WGET_OPT_SPIDER) {
+ /* Note: GNU wget --spider sends a HEAD and if it failed repeats with a GET */
+ G.method = "HEAD";
+ } else {
+ G.method = "GET";
+ }
+
#if ENABLE_FEATURE_WGET_LONG_OPTIONS
if (headers_llist) {
int size = 0;
--
2.34.1
_______________________________________________
busybox mailing list
busybox@busybox.net
http://lists.busybox.net/mailman/listinfo/busybox