[prev in list] [next in list] [prev in thread] [next in thread]
List: sas-l
Subject: SAS-L: python scrape web page source and save to local file like view source but programmatically
From: Roger Deangelis <roger_deangelis () COMCAST ! NET>
Date: 2024-04-19 20:27:55
Message-ID: 0800671192192480.WA.rogerdeangeliscomcast.net () listserv ! uga ! edu
[Download RAW message or body]
%let pgm=utl-python-scrape-web-page-source-and-save-to-local-file-like-view-source-but-programatically;
%stop_submission; /* in case you accidentally submit all the code below **/
python scrape web page source and save to local file like view source but \
programatically
The purpose of this repository is to minimize hitting a web pages over and over \
durring development. Download page and parse on the desktop.
FYI: There is free simple web page for some testing at 'example.com'
Related repos on end
github
https://tinyurl.com/3z9h962s
https://github.com/rogerjdeangelis/utl-python-scrape-web-page-source-and-save-to-local-file-like-view-source-but-programatically
845 San Franciso Companies in the Chaber of Commerce
https://tinyurl.com/22rp7k5h
https://raw.githubusercontent.com/rogerjdeangelis/utl-python-scrape-web-page-source-an \
d-save-to-local-file-like-view-source-but-programatically/main/giv_sfCompanyList.csv
/* _ _
_ __ _ __ ___ | |__ | | ___ _ __ ___
> `_ \| `__/ _ \| `_ \| |/ _ \ `_ ` _ \
> > _) | | | (_) | |_) | | __/ | | | | |
> .__/|_| \___/|_.__/|_|\___|_| |_| |_|
> _|
*/
/**************************************************************************************************************************/
/* \
*/ /* Problem List Companies in San Franciso Chamber of Commerce Site \
*/ /* \
*/ /* BUSINESS COMPANY STREET CITY \
STATE ZIP PHONE */ /* \
*/ /* 1 QTV RENTAL 222 ALAMEDA DE LAS PULGAS \
REDWOOD CITY CA 94062 3104350278 */ /* 2 Q & A CONSULTING \
LLC ONE SANSOME ST FL 35 PMB 6023 SAN FRANCISCO CA 94104 4155217552 \
*/ /* 3 QATAR AIRWAYS 350 5TH AVE., SUITE 7630 NEW \
YORK NY 10118 3109200722 */ /* 4 QUEEN ANNE HOTEL \
1590 SUTTER STREET SAN FRANCISCO CA 94109 4154412828 */ /* \
*/ /**************************************************************************************************************************/
You may need to install these for problematic dites (does not fix 400 type errors)
https://slproweb.com/products/Win32OpenSSL.html
I installed the light version
OpenSSL 1.1.1q 5 Jul 2022
You should also use the safer python context option
ssl._create_default_https_context = ssl._create_unverified_context
SOAPBOX ON
There are hundreds of sites with company data, some require parsing pdfs.
For problem sites you should be able to view source, select all cntl-c and
then hit a sas function key and have sas read the clipbord and
save
Maybe you do not need a commercial package at $1,000 a month to get
some company data?
Wiki
Chambers of Commerce
Secretay of States
Dept of Trans
Better Business Beaurea
Dept od Commerce
...
google 'list of US companies'
These requre authorization but you can cut and pase
https://www.sec.gov/edgar/searchedgar/companysearch
https://www.sec.gov/Archives/edgar/data/1048477/000195917324002921/xsl144X01/primary_doc.xml
Yelp
ZoomInfo
Yellow Pages
SOAPBOX OFF
/* _
(_)_ __ _ __ _ _| |_
> > `_ \| `_ \| | | | __|
> > > > > > _) | |_| | |_
> _|_| |_| .__/ \__,_|\__|
|_|
*/
/**************************************************************************************************************************/
/* \
*/ /* This creates the list \
*/ /* \
*/ /* %let lettersq= \
*/ /* "a","b","c","d","e","f","g","h","i","j","k","l","m","n","o","p","q","r","s","t","u","v","w","x","y","z"; \
*/ /* \
*/ /* data _null_; \
*/ /* do ltr=&lettersq; \
*/ /* url= cats("url=","'https://business.sfchamber.com/list/searchalpha/",ltr,"'"); \
*/ /* put url; \
*/ /* file_path = cats('file_path ="d:/giv/htm/giv_sfchamber',ltr,'.html"'); \
*/ /* put file_path; \
*/ /* put 'save_html_to_file(url, file_path)' /; \
*/ /* end; \
*/ /* run;quit; \
*/ /* \
*/ /* It also looks like there are many pages out ther that use the \
*/ /* ticker symol to separate pages or sections of pages. \
*/ /* \
*/ /* San Framcisco Chamber of Commerce Pages with lists of Companes by first letter \
of company name */ /* \
*/ /* https://business.sfchamber.com/list/searchalpha/a \
*/ /* https://business.sfchamber.com/list/searchalpha/b \
*/ /* https://business.sfchamber.com/list/searchalpha/c \
*/ /* https://business.sfchamber.com/list/searchalpha/d \
*/ /* https://business.sfchamber.com/list/searchalpha/e \
*/ /* https://business.sfchamber.com/list/searchalpha/f \
*/ /* https://business.sfchamber.com/list/searchalpha/g \
*/ /* https://business.sfchamber.com/list/searchalpha/h \
*/ /* https://business.sfchamber.com/list/searchalpha/i \
*/ /* https://business.sfchamber.com/list/searchalpha/j \
*/ /* https://business.sfchamber.com/list/searchalpha/k \
*/ /* https://business.sfchamber.com/list/searchalpha/l \
*/ /* https://business.sfchamber.com/list/searchalpha/m \
*/ /* https://business.sfchamber.com/list/searchalpha/n \
*/ /* https://business.sfchamber.com/list/searchalpha/o \
*/ /* https://business.sfchamber.com/list/searchalpha/p \
*/ /* https://business.sfchamber.com/list/searchalpha/q \
*/ /* https://business.sfchamber.com/list/searchalpha/r \
*/ /* https://business.sfchamber.com/list/searchalpha/s \
*/ /* https://business.sfchamber.com/list/searchalpha/t \
*/ /* https://business.sfchamber.com/list/searchalpha/u \
*/ /* https://business.sfchamber.com/list/searchalpha/v \
*/ /* https://business.sfchamber.com/list/searchalpha/w \
*/ /* https://business.sfchamber.com/list/searchalpha/x \
*/ /* https://business.sfchamber.com/list/searchalpha/y \
*/ /* https://business.sfchamber.com/list/searchalpha/z \
*/ /* \
*/ /**************************************************************************************************************************/
/*
_ __ _ __ ___ ___ ___ ___ ___
> `_ \| `__/ _ \ / __/ _ \/ __/ __|
> > _) | | | (_) | (_| __/\__ \__ \
> .__/|_| \___/ \___\___||___/___/
> _|
*/
* copy each page hrml source to a local file on your desktop;
* example for companies that begin with q;
%utl_pybegin;
parmcards4;
import urllib.request
import ssl
ssl._create_default_https_context = ssl._create_unverified_context
def save_html_to_file(url, file_path):
try:
with urllib.request.urlopen(url) as response:
html_content = response.read()
with open(file_path, 'wb') as file:
file.write(html_content)
print(f"HTML source saved to: {file_path}")
except urllib.error.URLError as e:
print(f"Error occurred: {e}")
url='https://business.sfchamber.com/list/searchalpha/a'
file_path ="d:/giv/htm/giv_sfchambera.html"
save_html_to_file(url, file_path)
url='https://business.sfchamber.com/list/searchalpha/b'
file_path ="d:/giv/htm/giv_sfchamberb.html"
save_html_to_file(url, file_path)
url='https://business.sfchamber.com/list/searchalpha/c'
file_path ="d:/giv/htm/giv_sfchamberc.html"
save_html_to_file(url, file_path)
url='https://business.sfchamber.com/list/searchalpha/d'
file_path ="d:/giv/htm/giv_sfchamberd.html"
save_html_to_file(url, file_path)
url='https://business.sfchamber.com/list/searchalpha/e'
file_path ="d:/giv/htm/giv_sfchambere.html"
save_html_to_file(url, file_path)
url='https://business.sfchamber.com/list/searchalpha/f'
file_path ="d:/giv/htm/giv_sfchamberf.html"
save_html_to_file(url, file_path)
url='https://business.sfchamber.com/list/searchalpha/g'
file_path ="d:/giv/htm/giv_sfchamberg.html"
save_html_to_file(url, file_path)
url='https://business.sfchamber.com/list/searchalpha/h'
file_path ="d:/giv/htm/giv_sfchamberh.html"
save_html_to_file(url, file_path)
url='https://business.sfchamber.com/list/searchalpha/i'
file_path ="d:/giv/htm/giv_sfchamberi.html"
save_html_to_file(url, file_path)
url='https://business.sfchamber.com/list/searchalpha/j'
file_path ="d:/giv/htm/giv_sfchamberj.html"
save_html_to_file(url, file_path)
url='https://business.sfchamber.com/list/searchalpha/k'
file_path ="d:/giv/htm/giv_sfchamberk.html"
save_html_to_file(url, file_path)
url='https://business.sfchamber.com/list/searchalpha/l'
file_path ="d:/giv/htm/giv_sfchamberl.html"
save_html_to_file(url, file_path)
url='https://business.sfchamber.com/list/searchalpha/m'
file_path ="d:/giv/htm/giv_sfchamberm.html"
save_html_to_file(url, file_path)
url='https://business.sfchamber.com/list/searchalpha/n'
file_path ="d:/giv/htm/giv_sfchambern.html"
save_html_to_file(url, file_path)
url='https://business.sfchamber.com/list/searchalpha/o'
file_path ="d:/giv/htm/giv_sfchambero.html"
save_html_to_file(url, file_path)
url='https://business.sfchamber.com/list/searchalpha/p'
file_path ="d:/giv/htm/giv_sfchamberp.html"
save_html_to_file(url, file_path)
url='https://business.sfchamber.com/list/searchalpha/q'
file_path ="d:/giv/htm/giv_sfchamberq.html"
save_html_to_file(url, file_path)
url='https://business.sfchamber.com/list/searchalpha/r'
file_path ="d:/giv/htm/giv_sfchamberr.html"
save_html_to_file(url, file_path)
url='https://business.sfchamber.com/list/searchalpha/s'
file_path ="d:/giv/htm/giv_sfchambers.html"
save_html_to_file(url, file_path)
url='https://business.sfchamber.com/list/searchalpha/t'
file_path ="d:/giv/htm/giv_sfchambert.html"
save_html_to_file(url, file_path)
url='https://business.sfchamber.com/list/searchalpha/u'
file_path ="d:/giv/htm/giv_sfchamberu.html"
save_html_to_file(url, file_path)
url='https://business.sfchamber.com/list/searchalpha/v'
file_path ="d:/giv/htm/giv_sfchamberv.html"
save_html_to_file(url, file_path)
url='https://business.sfchamber.com/list/searchalpha/w'
file_path ="d:/giv/htm/giv_sfchamberw.html"
save_html_to_file(url, file_path)
url='https://business.sfchamber.com/list/searchalpha/x'
file_path ="d:/giv/htm/giv_sfchamberx.html"
save_html_to_file(url, file_path)
url='https://business.sfchamber.com/list/searchalpha/y'
file_path ="d:/giv/htm/giv_sfchambery.html"
save_html_to_file(url, file_path)
url='https://business.sfchamber.com/list/searchalpha/z'
file_path ="d:/giv/htm/giv_sfchamberz.html"
save_html_to_file(url, file_path)
;;;;
%utl_pyend;
/* _ _ _ _
> > _____ _ _ | |__ | |_ _ __ ___ | | | |_ __ _ __ _ ___
> > / / _ \ | | | | `_ \| __| `_ ` _ \| | | __/ _` |/ _` / __|
> < __/ |_| | | | | | |_| | | | | | | | || (_| | (_| \__ \
> _|\_\___|\__, | |_| |_|\__|_| |_| |_|_| \__\__,_|\__, |___/
|___/ |___/
*/
/**************************************************************************************************************************/
/* \
*/ /* d:/giv/htm/giv_sfchamberq.html (There will be 26 of these) \
*/ /* \
*/ /* <A HREF="HTTPS://BUSINESS.SFCHAMBER.COM/LIST/MEMBER/QTV-RENTAL-115985" ALT="QTV \
RENTAL">QTV RENTAL</A> */ /* <SPAN CLASS="GZ-STREET-ADDRESS" \
ITEMPROP="STREETADDRESS">222 ALAMEDA DE LAS PULGAS</SPAN> \
*/ /* <SPAN CLASS="GZ-ADDRESS-CITY">REDWOOD CITY</SPAN> \
*/ /* <SPAN>94062</SPAN> \
*/ /* <A HREF="TEL:3104350278" CLASS="CARD-LINK"><I CLASS="GZ-FAL \
GZ-FA-PHONE"></I><SPAN>(310) 435-0278</SPAN></A> */ /* \
*/ /* <A HREF="HTTPS://BUSINESS.SFCHAMBER.COM/LIST/MEMBER/Q-A-CONSULTING-LLC-SAN-FRANCISCO-115710" \
ALT="QCONSUL">QCONSUL</A */ /* <SPAN CLASS="GZ-STREET-ADDRESS" \
ITEMPROP="STREETADDRESS">ONE SANSOME ST FL 35 PMB 6023</SPAN> \
*/ /* <SPAN CLASS="GZ-ADDRESS-CITY">SAN FRANCISCO</SPAN> \
*/ /* <SPAN>94104</SPAN> \
*/ /* <A HREF="TEL:4155217552" CLASS="CARD-LINK"><I CLASS="GZ-FAL \
GZ-FA-PHONE"></I><SPAN>(415) 521-7552</SPAN></A> */ /* \
*/ /* <A HREF="HTTPS://BUSINESS.SFCHAMBER.COM/LIST/MEMBER/QATAR-AIRWAYS-104422" \
ALT="QATAR AIRWAYS">QATAR AIRWAYS</A> */ /* <SPAN CLASS="GZ-STREET-ADDRESS" \
ITEMPROP="STREETADDRESS">350 5TH AVE., SUITE 7630</SPAN> \
*/ /* <SPAN CLASS="GZ-ADDRESS-CITY">NEW YORK</SPAN> \
*/ /* <A HREF="TEL:3109200722" CLASS="CARD-LINK"><I CLASS="GZ-FAL \
GZ-FA-PHONE"></I><SPAN>(310) 920-0722</SPAN></A> */ /* \
*/ /* <A HREF="HTTPS://BUSINESS.SFCHAMBER.COM/LIST/MEMBER/QUEEN-ANNE-HOTEL-414" \
ALT="QUEEN ANNE HOTEL">QUEEN ANNE HOTEL</A> */ /* <SPAN CLASS="GZ-STREET-ADDRESS" \
ITEMPROP="STREETADDRESS">1590 SUTTER STREET</SPAN> \
*/ /* <SPAN CLASS="GZ-ADDRESS-CITY">SAN FRANCISCO</SPAN> \
*/ /* <SPAN>94109</SPAN> \
*/ /* <A HREF="TEL:4154412828" CLASS="CARD-LINK"><I CLASS="GZ-FAL \
GZ-FA-PHONE"></I><SPAN>(415) 441-2828</SPAN></A> */ /* \
*/ /**************************************************************************************************************************/
/*
_ __ __ _ _ __ ___ ___
> `_ \ / _` | `__/ __|/ _ \
> > _) | (_| | | \__ \ __/
> .__/ \__,_|_| |___/\___|
> _|
*/
* convert html source to sas table;
%macro sfalpha(ltr);
data giv_alpha_<r;
retain business 0 col;
infile "d:/giv/htm/giv_sfchamber<r..html";
input;
lyn=strip(upcase(left(_infile_)));
select;
when ( lyn =: cats('<A \
HREF="HTTPS://BUSINESS.SFCHAMBER.COM/LIST/MEMBER/',"<r") and \
substr(lyn,length(lyn)-3)='</A>') do; business=business + 1;
col='COMPANY';
output;
end;
when ( index(lyn,'GZ-STREET-ADDRESS') > 0 and index(lyn,'ITEMPROP') > 0 ) \
do;col='STREET';output;end; /* street */ when ( index(lyn,'CITYSTATEZIP') > 0 ) \
do;
input;lyn=upcase(left(_infile_)); do;col='CITY'; output;end; /* city \
*/
input;lyn=upcase(left(_infile_)); do;col='STATE'; output;end; /* state \
*/
input;lyn=upcase(left(_infile_)); do;col='ZIP'; output;end; /* zip \
*/ end;
when ( index(lyn,'GZ-FA-PHONE')>0) do;col='PHONE'; output;end; /* phone \
*/ otherwise delete;
end;
run;quit;
%mend sfalpha;
options nonotes;
%sfalpha(A);
%sfalpha(B);
%sfalpha(C);
%sfalpha(D);
%sfalpha(E);
%sfalpha(F);
%sfalpha(G);
%sfalpha(H);
%sfalpha(I);
%sfalpha(J);
%sfalpha(K);
%sfalpha(L);
%sfalpha(M);
%sfalpha(N);
%sfalpha(O);
%sfalpha(P);
%sfalpha(Q);
%sfalpha(R);
%sfalpha(S);
%sfalpha(T);
%sfalpha(U);
%sfalpha(V);
%sfalpha(W);
%sfalpha(X);
%sfalpha(Y);
%sfalpha(Z);
options notes;
*/ /**************************************************************************************************************************/
*/ /* \
*/
*/ /* GIV_PASS1 total obs=24 \
*/
*/ /* \
*/
*/ /* BUSINESS COL LYN \
*/
*/ /* \
*/
*/ /* 1 COMPANY <A \
HREF="HTTPS://BUSINESS.SFCHAMBER.COM/LIST/MEMBER/QTV-RENTAL-115985" ALT="QTV \
RENTAL">... */
*/ /* 1 STREET <SPAN CLASS="GZ-STREET-ADDRESS" \
ITEMPROP="STREETADDRESS">222 ALAMEDA DE LAS PULGAS</SPAN... */
*/ /* 1 CITY <SPAN CLASS="GZ-ADDRESS-CITY">REDWOOD CITY</SPAN> \
*/
*/ /* 1 STATE <SPAN>CA</SPAN>... \
*/
*/ /* 1 ZIP <SPAN>94062</SPAN>... \
*/
*/ /* 1 PHONE <A HREF="TEL:3104350278" CLASS="CARD-LINK"><I \
CLASS="GZ-FAL GZ-FA-PHONE"></I><SPAN>(310)... */
*/ /* 2 COMPANY <A \
HREF="HTTPS://BUSINESS.SFCHAMBER.COM/LIST/MEMBER/Q-A-CONSULTING-LLC-SAN-FRANCISCO-115... \
*/
*/ /* 2 STREET <SPAN CLASS="GZ-STREET-ADDRESS" \
ITEMPROP="STREETADDRESS">ONE SANSOME ST FL 35 PMB 6023</... */
*/ /* 2 CITY <SPAN CLASS="GZ-ADDRESS-CITY">SAN FRANCISCO</SPAN> \
*/
*/ /* 2 STATE <SPAN>CA</SPAN> \
*/
*/ /* 2 ZIP <SPAN>94104</SPAN> \
*/
*/ /* 2 PHONE <A HREF="TEL:4155217552" CLASS="CARD-LINK"><I \
CLASS="GZ-FAL GZ-FA-PHONE"></I><SPAN>(415)... */
*/ /* 3 COMPANY <A \
HREF="HTTPS://BUSINESS.SFCHAMBER.COM/LIST/MEMBER/QATAR-AIRWAYS-104422" ALT="QATAR \
AIR... */
*/ /* 3 STREET <SPAN CLASS="GZ-STREET-ADDRESS" \
ITEMPROP="STREETADDRESS">350 5TH AVE., SUITE 7630</SPAN>... */
*/ /* 3 CITY <SPAN CLASS="GZ-ADDRESS-CITY">NEW YORK</SPAN> \
*/
*/ /* 3 STATE <SPAN>NY</SPAN> \
*/
*/ /* 3 ZIP <SPAN>10118</SPAN> \
*/
*/ /* 3 PHONE <A HREF="TEL:3109200722" CLASS="CARD-LINK"><I \
CLASS="GZ-FAL GZ-FA-PHONE"></I><SPAN>(310)... */
*/ /* 4 COMPANY <A \
HREF="HTTPS://BUSINESS.SFCHAMBER.COM/LIST/MEMBER/QUEEN-ANNE-HOTEL-414" ALT="QUEEN \
ANN... */
*/ /* 4 STREET <SPAN CLASS="GZ-STREET-ADDRESS" \
ITEMPROP="STREETADDRESS">1590 SUTTER STREET</SPAN> */
*/ /* 4 CITY <SPAN CLASS="GZ-ADDRESS-CITY">SAN FRANCISCO</SPAN> \
*/
*/ /* 4 STATE <SPAN>CA</SPAN> \
*/
*/ /* 4 ZIP <SPAN>94109</SPAN> \
*/
*/ /* 4 PHONE <A HREF="TEL:4154412828" CLASS="CARD-LINK"><I \
CLASS="GZ-FAL GZ-FA-PHONE"></I><SPAN>(415)... */
*/ /* \
*/
*/ /**************************************************************************************************************************/
/* _ _ _
__| | ___ _ __ ___ _ __ _ __ ___ __ _| (_)_______
/ _` |/ _ \ `_ \ / _ \| `__| `_ ` _ \ / _` | | |_ / _ \
> (_| | __/ | | | (_) | | | | | | | | (_| | | |/ / __/
\__,_|\___|_| |_|\___/|_| |_| |_| |_|\__,_|_|_/___\___|
*/
data giv_prexpo;
length
var $32
val $200
company $200;
retain company "";
set giv_alpha_:;
select (col);
when ("COMPANY") do; var='COMPANY'; val= \
scan(substr(lyn,index(lyn,"ALT=")),2,'"');company=val; end; when ("STREET" ) do; \
var='STREET '; val= scan(lyn,2,'<>'); end; when ("CITY" ) do; var='CITY '; val= \
scan(lyn,2,'<>'); end; when ("STATE" ) do; var='STATE '; val= scan(lyn,2,'<>'); \
end; when ("ZIP" ) do; var='ZIP '; val= scan(lyn,2,'<>'); end;
when ("PHONE" ) do; var='PHONE '; val= scan(lyn,3,':"'); end;
otherwise;
end;
drop lyn;
run;quit;
proc sort data=giv_prexpo out=giv_prexposrt equals nodupkey;
by company col;
run;quit;
/*---- ----*/
/*---- pivot long to fat ----*/
/*---- ----*/
proc transpose data=giv_prexposrt out=giv_sfCompanyList(drop=_name_ business);
by business company notsorted;
id var;
var val;
run;
/* _ _
___ _ _| |_ _ __ _ _| |_
/ _ \| | | | __| `_ \| | | | __|
> (_) | |_| | |_| |_) | |_| | |_
\___/ \__,_|\__| .__/ \__,_|\__|
|_|
*/
/**************************************************************************************************************************/
/* \
*/ /* WORK.giv_prexpofin (all fields are not filled out in the San Francisco Chamber \
of Commerce */ /* \
*/ /* COMPANY STREET CITY \
PHONE STATE ZIP */ /* \
*/ /* QTV RENTAL 222 ALAMEDA DE LAS PULGAS REDWOOD CITY \
3104350278 CA 94062 */ /* Q & A CONSULTING LLC ONE SANSOME \
ST FL 35 PMB 6023 SAN FRANCISCO 4155217552 CA 94104 */ /* \
QATAR AIRWAYS 350 5TH AVE., SUITE 7630 NEW YORK \
3109200722 NY 10118 */ /* QUEEN ANNE HOTEL 1590 SUTTER \
STREET SAN FRANCISCO 4154412828 CA 94109 */ /* \
*/ /* \
*/ /* A BIGGER ROOM LLC 612 MASONIC AVENUE SAN FRANCISCO \
4154843548 CA 94117 */ /* A. MACIEL PRINTING 50 MENDELL \
ST., UNIT 5 SAN FRANCISCO 4156483553 CA 94124 */ /* \
A.M. JANITORIAL DALY CITY \
6503605116 CA 94014 */ /* A.S.F. ELECTRIC INC. 76 HILL ST. \
DALY CITY 6507559032 CA 94014 */ /* ABC7 \
900 FRONT ST. SAN FRANCISCO 4159547777 CA 94111 \
*/ /* ACADIA REALTY TRUST 411 THEODORE FREMD AVE., STE. 300 RYE \
9142883379 NY 10580 */ /* ACCENTURE 415 MISSION \
ST., STE. 3300 SAN FRANCISCO 4155375000 CA 94105 */ /* \
ACE MAILING CORPORATION 2736 16TH ST. SAN FRANCISCO \
4158634223 CA 94103 */ /* ADVANCED ETIQUETTE \
SAN ANSELMO 4153463665 CA */ /* AECOM HUNT \
300 CALIFORNIA ST., 6TH FLR. SAN FRANCISCO 4153913930 CA 94104 \
*/ /* \
*/ /* \
*/ /**************************************************************************************************************************/
/*---- ----*/
/*---- create comma delimited file ----*/
/*---- ----*/
dm 'dexport work.giv_sfCompanyList "d:/giv/csv/giv_sfCompanyList.csv"';
dm 'dimport "d:/giv/csv/giv_sfCompanyList.csv" work.check';
/*
_ __ ___ _ __ ___ ___
> `__/ _ \ `_ \ / _ \/ __|
> > > __/ |_) | (_) \__ \
> _| \___| .__/ \___/|___/
|_|
*/
REPO
----------------------------------------------------------------------------------------------------------------------------
https://github.com/rogerjdeangelis/utl-locating-the-html-node-and-web-scrape-the-html-table
https://github.com/rogerjdeangelis/utl-web-scaping-Loop-over-many-URLs-scrape-parse-extract-nodes-and-put-into-a-sas-table
https://github.com/rogerjdeangelis/utl_scrape_javascript_converting_table_to_sas_dataset_no_browser
https://github.com/rogerjdeangelis/utl_web_scrape_23_separate_pages_one_per_state_for_all_whole_food_store_addresses
https://github.com/rogerjdeangelis/utl_digitize_data_from_image
https://github.com/rogerjdeangelis/utl-scraping-server-screens-when-Copy-Print-PageSource-are-disabled-python-tesseract
With just one sinple command line python 'scr1' read the screen convert to text and \
save in d:/txt/scr1.txt Opent next scree and type scr2 and next screen saveds
/* _
___ _ __ __| |
/ _ \ `_ \ / _` |
> __/ | | | (_| |
\___|_| |_|\__,_|
*/
[prev in list] [next in list] [prev in thread] [next in thread]
Configure |
About |
News |
Add a list |
Sponsored by KoreLogic