List: fedora-list
Subject: Re: downloading a complete web page without using a browser...
From: Samuel Sieb <samuel () sieb ! net>
Date: 2021-07-06 6:22:24
Message-ID: b66dfa0b-4ce0-f347-f5b7-8bcc4c689ed4 () sieb ! net
On 2021-07-05 10:30 p.m., Thomas Stephen Lee wrote:
> On Mon, Jul 5, 2021 at 12:26 PM Samuel Sieb <samuel@sieb.net> wrote:
>>
>> On 2021-07-03 8:02 p.m., dwoody5654@gmail.com wrote:
>>> The URL I am trying to download does not have an extension, i.e. no '.htm', such
>>> as:
>>> https://my.acbl.org/club-results/details/338288
>>>
>>> wget does not download the correct web page.
>>
>> I tried it and it worked, sort of. The problem is that you want to
>> download everything to view it offline, but the site my.acbl.org has a
>> robots.txt that says "no robots allowed". So wget respects that and
>> will not download any required files from that site other than the
>> initial page. curl probably has the same issue.
>
> for wget
> https://gist.github.com/u0d7i/87aa962311f2a7c739aa
Ok, that solves it. I was able to download everything, and opening the
resulting file in Firefox didn't trigger any network access. I was able to
see the entire page and even interact with it somewhat.
wget -e robots=off -EHkp https://my.acbl.org/club-results/details/338288
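
For anyone reading this later, here is the same command with the short
options spelled out. This is only a rough sketch, assuming GNU wget;
fetch_page and DRY_RUN are names I made up for illustration and are not
part of wget.

```shell
#!/bin/sh
# Long-option spelling of: wget -e robots=off -EHkp URL
#   -e robots=off       run the wgetrc command "robots=off" (ignore robots.txt)
#   --adjust-extension  (-E) save HTML/CSS files with a matching suffix
#   --span-hosts        (-H) allow fetching requisites from other hosts
#   --convert-links     (-k) rewrite links to point at the local copies
#   --page-requisites   (-p) also fetch images, CSS, etc. the page needs
# fetch_page and DRY_RUN are hypothetical names, used only for this sketch.
fetch_page() {
    url=$1
    set -- wget -e robots=off --adjust-extension --span-hosts \
        --convert-links --page-requisites "$url"
    if [ -n "${DRY_RUN:-}" ]; then
        echo "$*"    # print the command instead of running it
    else
        "$@"         # run wget for real (needs network access)
    fi
}

# Dry run: only prints the command that would be executed.
DRY_RUN=1 fetch_page "https://my.acbl.org/club-results/details/338288"
```

Calling fetch_page without DRY_RUN set runs wget for real. Turning off
robots.txt handling overrides the site's stated wishes, so it is best kept
to one-off page saves like this one.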