List:       python-list
Subject:    Re: Python text file fetch specific part of line
From:       cs () zip ! com ! au
Date:       2016-07-30 2:46:55
Message-ID: 20160730024655.GA1902 () cskk ! homeip ! net

On 29Jul2016 18:42, Gordon Levi <gordon@address.invalid> wrote:
>cs@zip.com.au wrote:
>
>>On 28Jul2016 19:28, Gordon Levi <gordon@address.invalid> wrote:
>>>Arshpreet Singh <arsh840@gmail.com> wrote:
>>>>I am writing Imdb scrapper, and getting available list of titles from IMDB
>>>>website which provide txt file in very raw format, Here is the one part of
>>>>file(http://pastebin.com/fpMgBAjc) as the file provides tags like
>>>>Distribution  Votes,Rank,Title I want to parse title names, I tried with
>>>>readlines() method but it returns only list which is quite heterogeneous, is
>>>>it possible that I can parse each value comes under title section?
>>>
>>>Beautiful Soup will make your task much easier
>>><https://www.crummy.com/software/BeautifulSoup/>.
>>
>>Did you look at his sample data?
>
>No. I read he was "writing an IMDB scraper, and getting the available
>list of titles from the IMDB web site". It's here
><http://www.imdb.com/>.
>
>> Plain text, not HTML or XML. Beautiful Soup is
>>not what he needs here.
>
>Fortunately the OP told us his application rather than just telling us
>his current problem. His life would be much easier if he ignored the
>plain text he has obtained so far and started again using a Beautiful
>Soup tutorial.

Or bypass IMDB's computer unfriendliness and go straight to http://omdbapi.com/

You can get JSON directly from it and avoid BS entirely. BS is an amazing
library, but it is essentially a workaround for computer-hostile websites:
those that provide no clean machine-readable data, only unstable, mutable
HTML output.
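
A minimal sketch of what that might look like, using only the standard
library. The "t" (title) and "apikey" query parameters and the "Response" and
"Error" reply fields are my assumptions about the service's interface, and the
function names are made up for illustration; check the documentation on the
site before relying on any of it.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Base URL as given above; the query parameters below are assumptions.
OMDB_URL = "http://omdbapi.com/"


def build_query(title, api_key=None):
    """Return a lookup URL for one title (hypothetical helper)."""
    params = {"t": title}              # assumed: "t" selects a title
    if api_key:
        params["apikey"] = api_key     # assumed: the service may want a key
    return OMDB_URL + "?" + urlencode(params)


def parse_reply(body):
    """Decode a JSON reply into a dict, raising if it reports a failure."""
    data = json.loads(body)
    # Assumed: failures are flagged with "Response": "False" plus "Error".
    if data.get("Response") == "False":
        raise LookupError(data.get("Error", "lookup failed"))
    return data


def fetch_title(title, api_key=None):
    """Fetch one title over HTTP and return the decoded dict."""
    with urlopen(build_query(title, api_key)) as resp:
        return parse_reply(resp.read().decode("utf-8"))
```

Something like fetch_title("Blade Runner") should then hand back an ordinary
dict, with no HTML parsing anywhere.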

Cheers,
Cameron Simpson <cs@zip.com.au>