
List:       quanta
Subject:    Re: [Quanta] crashes when VPL is selected (3.4.0)
From:       Paulo Moura Guedes <moura () kdewebdev ! org>
Date:       2005-03-21 18:44:22
Message-ID: 200503211844.22223.moura () kdewebdev ! org

Well, I can't reproduce the crash with Quanta and kdelibs from CVS and the 
rest of KDE from the 3.3 series, so it's a little difficult for me to address 
this problem...
Can someone test the attached file in VPL?

On Monday 21 March 2005 18:34, Gour wrote:
> Paulo Moura Guedes (moura@kdewebdev.org) wrote:
> > On every file?
>
> On practically every one on the site I'm working at, except the most
> simple page.
>
> > If not, send me the one please.
>
> In the attachment here is one - not from the site.
>
> Sincerely,
> Gour

-- 
Paulo Moura Guedes

Linux Caixa Mágica  - http://caixamagica.org
KDE Web Development - http://kdewebdev.org

["htdig.html" (text/html)]

<html><head><title>Installing and configuring the ht://Dig search engine</title>

<link rel=stylesheet type="text/css" href="../scrounge.css">

</head>

<body bgcolor="#ffffff">
<center>

<table border=0 width=60% cellspacing=0 cellpadding=0>
<tr><td valign="center" bgcolor="#cc3300">
<img src="../scrounge3r.gif" alt="scrounge.org"><br>
</td></tr></table>

<h2>Installing and Configuring the ht://Dig Search Engine</h2></center>

<p>ht://Dig is an excellent search engine to install on your web server.  <a \
href="../search.html">Try it out!</a>  See the <a \
href="http://www.htdig.org/require.html">Features and Requirements</a> page for more \
information.  Check the <a href="http://www.htdig.org/">ht://Dig home page</a> for \
the latest news and updates.  I'm going to cover some additional installation and \
configuration hints.

<h3>Getting it going</h3>
<ul>
<li><b><a href="#quickstart">Quick Start (for the intrepid)</a></b>

<p><li><b><a href="#longform">Installation (Long form)</a></b>
<ul>
<li><a href="#installrpm">Installing the RPM</a>
<li><a href="#installtarball">Installing the tarball</a>
<ul>
<li><a href="#configapache">Configuring Apache (tarball only)</a>
</ul>
<li><a href="#whereis">Where everything is</a>
<li><a href="#configconf">Configuring the htdig.conf file</a>
<li><a href="#digging">Generating the search index</a>
<li><a href="#searching">Doing a search. Finally.</a>
<li><a href="#troubleshooting">Troubleshooting</a>
</ul>
</ul>

<h3>Tips and Techniques</h3>
<ul>
<li><a href="#customizesearch">Customizing the search results</a>
<li><a href="#date">Making the date display all four digits of the year in search \
results</a> <li><a href="#rundig2">An alternate rundig script</a>
<li><a href="#pdf">Indexing PDF files</a>
<li><a href="#doc">Indexing Microsoft Word files</a>
<li><a href="#logging">Logging search requests</a>

</ul>

<p>Please report any <a href="mailto:wayne@scrounge.org">errors or omissions to \
me</a>.  Suggestions are welcome too.  Thank you.

<p><hr>

<h1>Getting it going</h1>

<a name="quickstart"></a>
<h3>Quick Start (for the intrepid)</h3>

<p>If you are using Red Hat or Mandrake Linux and you are reasonably familiar with \
Apache, you might get by with just these Quick Start instructions.  \
Otherwise, use the <a href="#longform">complete instructions</a>.

<ul>
<li>(As root) get and install the RPM.  (Full information <a \
href="#installrpm">here</a>.  Note the vixie-cron issue for Red Hat 5.0-5.1.)

<p><li>Edit <tt>/etc/htdig/htdig.conf</tt> and check to see that <tt><a \
href="http://www.htdig.org/attrs.html#start_url">start_url:</a></tt> correctly points \
to what you want to index on your server.  Watch out because the RPM installer adds a \
<em>second</em> <tt>start_url:</tt> definition at the end of the file.

<p><li>Type <tt>rundig -v</tt> to create the search index database.  You \
<em>should</em> see indications that it is indexing each file.  If not and it appears \
to be "hanging," abort with Ctrl-C and check your configuration.

<p><li>You should now be able to search by accessing <tt>search.html</tt>, which is \
installed in <tt>/home/httpd/html</tt>.  <a \
href="http://www.htdig.org/hts_method.html">How searching works</a>.

<p><li>It worked?  Good.  Now look through the rest of this document to learn more \
about configuring ht://Dig.  If it <em>didn't</em> work, then well, the same advice \
applies:  look through the rest of this document.

</ul>

<p>Note that the RPM installer created a cron job in <tt>/etc/cron.daily</tt> that \
will run <tt>/usr/sbin/rundig</tt> once a day, so the search index will be updated \
automatically.

<p>But you still should look over the rest of this documentation.

<p>&nbsp;


<a name="longform"></a>
<h3>Installation (Long form)</h3>

<p>Before you start, you should look over the <a \
href="http://www.htdig.org/require.html">Features and Requirements</a> page. Ht://Dig \
is available in source "tarball" and Red Hat style RPM distributions.  The RPM \
distribution is much easier to install, but the tarball gives you more flexibility in \
specifying the locations where everything will be installed.  Your choice. This \
document is going to cover installing both the htdig 3.1.5.tar.gz "tarball" and the \
RPM file. The <a href="http://www.htdig.org/where.html">Where to get it</a> page is \
the best place to get the most recent version of ht://Dig.  

<!-- <p>The ht://Dig installation instructions are excellent.  Follow them after \
reading my comments.  -->

<a name="installrpm"></a>
<h3>Installing the RPM</h3>

<p>Mandrake 7.2 has ht://Dig on the install CD, so it might already be installed on \
your system.  Red Hat 7.0 has it on the "Power Tools" CD.  You can get other RPM \
distributions <a href="http://www.htdig.org/files/binaries/">from here</a>.  (Or <a \
href="http://www.scrc.umanitoba.ca/htdig/rpms/">from here</a>.) Download <em>one</em> \
of these:  <ul>
<p>htdig-3.1.5-0.i386.rpm  (Red Hat 4.2)<br>
htdig-3.1.5-0glibc.i386.rpm  (Red Hat 5.x) *<br>
htdig-3.1.5-0glibc21.i386.rpm (for glibc-2.1, Red Hat 6.0, 7.0**) 
</ul>
<p> Put it somewhere on your Linux machine and (as root) type \
<tt>rpm&nbsp;-Uvh&nbsp;htdig*.rpm</tt>.  Bang, it's installed.  Now skip to <a \
href="#whereis">Where everything is</a>.

<span class="smalltext">
<blockquote><small>
<p class="smalltext">* There is a bug with vixie-cron for Red Hat 5.0 and 5.1.  The \
ht://Dig team recommends upgrading to a newer version of vixie-cron.  Look for \
vixie-cron-3.0.1-37.5.2.i386.rpm.  This affects you, because the RPM installer \
installs <b>rundig</b> as an <b>/etc/cron.daily</b> job.  Get the updated vixie-cron \
<a href="http://www.scrc.umanitoba.ca/htdig/rpms/">from here</a>.

<p class="smalltext">** If you are using Red Hat 7.0 and don't have the Power Tools \
CD, then you can use htdig-3.1.5-0glibc21.i386.rpm, but it needs some additional work \
to get it going.  You must first install compat-libstdc++-6.2-2.9.0.9.i386.rpm from \
the first Red Hat 7.0 install CD.  The default HTML directory in previous versions of \
Red Hat was /home/httpd/html.  It is now /var/www/html.  \
htdig-3.1.5-0glibc21.i386.rpm installs several things in /home/httpd/html.  These \
need to be moved to /var/www/html.

<p class="smalltext">Move search.html and the htdig directory to /var/www/html.  You \
must also move /home/httpd/cgi-bin/htsearch to /var/www/cgi-bin/htsearch.  The \
'local_urls' variable in /etc/htdig/htdig.conf needs to be modified because it refers \
to /home/httpd/html.

</small></blockquote></span>

<a name="installtarball"></a>
<h3>Installing the tarball</h3>

<p>For the tarball, you should decide where you want ht://Dig to install its \
programs.  <!-- I went with the default.  You might want to change this.  --> You \
must decide this before you install it, because you can't move it after you have it \
installed.  (Except by deleting the entire installation and re-installing from \
scratch.)  The default is to install in the <tt>/opt/www</tt> directory.  The \
assorted ht://Dig binaries and configuration files will be located in this directory \
tree.  You must configure your Web server to execute the ht://Dig CGI programs from \
here.  If this is not acceptable, then change these locations during the installation \
procedure.

<p>OK, now follow the <a href="http://www.htdig.org/install.html">ht://Dig \
installation instructions</a>.  (You probably should open them in a new window so \
that you can refer to this page.)  When you get to the <b>Configure</b> step, you \
have the opportunity to edit the <TT>CONFIGURE</TT> script that defines where \
everything will get installed.  If you want to go with the default location, then \
just continue on through the procedure.

<a name="configapache"></a>
<h3>Configuring Apache (tarball only)</h3>

<blockquote>
<p class="smalltext">The RPM installation should need no Apache configuration \
changes, because everything goes in "standard" locations.  Assuming that your \
installation uses the standard locations.... </blockquote>

<p>Assuming that you installed ht://Dig in the default <tt>/opt/www</tt> directory, \
here are the configuration changes that you should add to your Apache configuration \
file(s).

<p><table border="1" cellspacing=0 cellpadding=8>
<tr valign="top"><td>
<tt>Alias /htdig/ /opt/www/htdocs/htdig/</tt></td><td>
So that you can "point" to assorted graphic files.  e.g.,<br>
<tt>&lt;img src="/htdig/htdig.gif"&gt;</tt>  Also, the default <tt>search.html</tt> \
file is located here.

<p>It is a really good idea to keep the <tt>/htdig/</tt> definition, because the \
template files that are used to display the search results all refer to \
<tt>htdig/</tt> to locate files. </td></tr>

<tr valign="top"><td>
<tt>ScriptAlias /htdig-cgi/ /opt/www/cgi-bin/</tt></td><td>
Is how you access the htsearch program for searching.  e.g.,<br>
<tt>&lt;form method="post" action="/htdig-cgi/htsearch"&gt;</tt>
</td></tr>

<tr valign="top"><td>
<pre>
&lt;Directory /opt/www/cgi-bin/&gt;
AllowOverride None
Options ExecCGI
&lt;/Directory&gt;
</pre></td><td>
So that Apache will allow access to the ht://Dig cgi-bin directory.
</td></tr>
</table>

<p>After editing your Apache configuration files, type \
<tt>/etc/rc.d/init.d/httpd&nbsp;restart</tt> to restart Apache.

<a name="whereis"></a>
<h3>Where everything is</h3>

<table border="1" cellspacing=0 cellpadding=7>
<tr valign="top"><th>Name</th><th>RPM locations</th><th>Tarball (Default \
locations)</th><th>Used for</th></tr>

<a name="configdir"></a>
<tr valign="top"><td><a  \
href="http://www.htdig.org/config.html#htdig.conf">${CONFIG_DIR}</a></td><td>/etc/htdig</td><td>/opt/www/htdig/conf</td><td>htdig.conf \
configuration file</td></tr>

<a name="commondir"></a>
<tr valign="top"><td>${COMMON_DIR}</td><td>/var/lib/htdig/common</td><td>/opt/www/htdig/common</td><td>Template \
files used for search results</td></tr>

<a name="bindir"></a>
<tr valign="top"><td>${BIN_DIR}</td><td>/usr/sbin</td><td>/opt/www/htdig/bin</td><td>rundig \
and other "digging" binaries</td></tr>

<a name="databasedir"></a>
<tr valign="top"><td>${DATABASE_DIR}</td><td>/var/lib/htdig/db</td><td>/opt/www/htdig/db</td><td>The \
search index database files.</td></tr>

<a name="cgibindir"></a>
<tr valign="top"><td>${CGIBIN_DIR}</td><td>/home/httpd/cgi-bin</td><td>/opt/www/cgi-bin</td><td>htsearch</td></tr>


<a name="imagdir"></a>
<tr valign="top"><td>${IMAGE_DIR}</td><td>/home/httpd/html/htdig</td><td>/opt/www/htdocs/htdig</td><td>htdig.gif, \
and other graphic files</td></tr>

<a name="searchdir"></a>
<tr valign="top"><td><a \
href="http://www.htdig.org/config.html#search.html">${SEARCH_DIR}</a></td><td>/home/httpd/html</td><td>/opt/www/htdocs/htdig</td><td>search.html \
sample search form</td></tr>


<!--
<tr valign="top"><td></td><td></td><td></td><td></td></tr>

<tr valign="top"><td></td><td></td><td></td><td></td></tr>

<tr valign="top"><td></td><td></td><td></td><td></td></tr>

<tr valign="top"><td></td><td></td><td></td><td></td></tr>
-->
<tr valign="bottom"><th>Name</th><th>RPM locations</th><th>Tarball (Default \
locations)</th><th>Used for</th></tr> </table>

<a name="configconf"></a>
<h3>Configuring the htdig.conf file</h3>

<blockquote>
<p class="smalltext"><b>Important note for RPM users:</b>  The RPM installation \
program attempts to configure ht://Dig so that it will work "out of the box."  They \
installed the various files in "standard" Red Hat locations.  One thing that is never \
standard, however, is the name of your machine.  The ht://Dig RPM installer attempts \
to glean this information from your existing configuration files and <em>appends new \
definitions at the <b>end</b> of the htdig.conf file</em>, in addition to the "stock" \
definitions that are scattered throughout the htdig.conf file.  This includes the all \
important <tt>start_url:</tt> variable.  Variable definitions at the end of the file \
override earlier definitions.  Bear this in mind as you are scrolling through \
htdig.conf. </blockquote>
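
<p>As a quick check for the duplicate definitions described above, a small shell \
function along these lines lists every <tt>start_url:</tt> line and then prints the \
last one, which is the one that takes effect.  This is only a sketch: the name \
<tt>show_start_url</tt> is made up for illustration, and it assumes that each \
definition starts at the beginning of a line and fits on a single line.

```shell
# Hypothetical helper (not part of ht://Dig): list all start_url definitions
# in an htdig.conf file and report the last one, which overrides the others.
# Assumes each definition starts in column one and is not continued with "\".
show_start_url() {
    conf="$1"
    # Every definition, prefixed with its line number.
    grep -n '^start_url:' "$conf"
    # The last definition in the file is the effective one.
    effective=$(grep '^start_url:' "$conf" | tail -n 1 |
        sed 's/^start_url:[[:space:]]*//')
    printf 'effective: %s\n' "$effective"
}
```

<p>For an RPM installation you would call it as \
<tt>show_start_url&nbsp;/etc/htdig/htdig.conf</tt>.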

<p>Edit <tt><a href="#configdir">${CONFIG_DIR}</a>/htdig.conf</tt>.  Scroll down and \
find the <tt><a href="http://www.htdig.org/attrs.html#start_url">start_url:</a></tt> \
line.  This line defines what ht://Dig will index for searching.  The default is to \
index the http://www.htdig.org/ site.  This is not a good site to test with, because \
it takes a <em>long</em> time to index.  Change this to point to a "site" on your own \
machine.  For speed, change the URL to use your machine's IP address, rather than the \
full domain name.  For example, if your machine is addressed as 192.168.1.1, then set \
<tt>start_url:</tt> to be <tt>http://192.168.1.1/</tt>

<p><b>start_url: must specify the site the same way that a web browser accesses \
it. </b>

<p>This is because ht://Dig works like a web crawler and accesses your HTML pages the \
same way as a web browser does.  So use a browser to access the site on your own \
machine, then use the same URL that your browser uses in <tt>start_url:</tt>.

<blockquote><small>
<p class="smalltext">Using the IP address to refer to the site is a shortcut for \
testing.  This IP address will be returned in the search results, so 192.168.1.1, for \
example, isn't what you would use when you release the search form to the public.  In \
this case, you either have to set start_url: to the actual domain that the site uses, \
or (preferably) use <em>two</em> configuration files (one for digging and another for \
searching) and use the <b><a \
href="http://www.htdig.org/attrs.html#url_part_aliases">url_part_aliases</a></b> \
directive to translate from a local IP address to the real domain.  This is more \
complicated than what you should be doing until you have it working and are familiar \
with the basic operations.

<p class="smalltext">For an additional speed boost, check out the <b><a \
href="http://www.htdig.org/attrs.html#local_urls">local_urls:</a></b> directive that \
lets ht://Dig access the files through the local filesystem, rather than having to go \
through the web server.  But, again, wait until you have ht://Dig working and are \
reasonably familiar with how everything works before you try using this. \
</small></blockquote>

<p>You should create a <tt>robots.txt</tt> file in the server's root directory to \
specify what you do <em>not</em> want ht://Dig (or any other search engine!) to \
index.  Here is a sample <tt>robots.txt</tt> file 

<blockquote><pre>
# robots.txt for http://www.example.com/

User-agent: *
Disallow: /cyberworld/map/ # This is an infinite virtual URL space
Disallow: /tmp/ # these will soon disappear
Disallow: /foo.html
</pre>

<p>From <a href="http://info.webcrawler.com/mak/projects/robots/norobots.html">A \
Standard for Robot Exclusion</a>. </blockquote>

<p><a href="http://www.htdig.org/confindex.html">Reference for all configuration file \
directives</a>

<a name="digging"></a>
<h3>Generating the search index</h3>

<p>Before you can search you must generate the search index database.  Change to \
<tt><a href="#bindir">${BIN_DIR}</a></tt>.  Use the <tt>rundig</tt> script to run the \
ht://Dig programs to index your site.  Type <tt>./rundig -v</tt>.  Rundig will run the \
<tt>htdig</tt> "digging" (indexing) and <tt>htmerge</tt> (second step of creating the \
search index) programs.  The <tt>-v</tt> option tells them to be verbose, meaning \
that you should see each file as it is indexed, followed by indications of the \
merging activity.

<p>This <em>should</em> complete in a reasonable length of time (depending on the \
size of your site.)  If you see prolonged periods of inactivity, then press Ctrl-C to \
abort the programs and check <tt>start_url:</tt> in the <tt><a \
href="#configdir">${CONFIG_DIR}</a>/htdig.conf</tt> configuration file.  If indexing \
is taking too long for testing, consider changing <tt>start_url:</tt> to only index a \
subset of your site until you are done wrestling with the configuration file.

<p>Note that you must update the index whenever the site is updated.  If your site is \
large and indexing is time consuming, then you might want to do the indexing in a <a \
href="cron.html">cron</a> job that is run in the middle of the night.
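
<p>For a tarball installation, which does not get a cron job set up for it, the nightly \
run could be wrapped in something like the following.  Treat it as a sketch under \
stated assumptions: the function name <tt>reindex</tt> is invented for this example, \
the default directory is the tarball <tt>${BIN_DIR}</tt> from the table above, and \
how you hook it into cron is up to you.

```shell
# Hypothetical wrapper (not part of ht://Dig): rebuild the search index with
# the rundig script found in the given directory.
# The default path is the tarball ${BIN_DIR}; RPM users would pass /usr/sbin.
reindex() {
    bin_dir="${1:-/opt/www/htdig/bin}"
    if [ -x "$bin_dir/rundig" ]; then
        # -s gives the short display, which suits unattended runs.
        "$bin_dir/rundig" -s
    else
        echo "rundig not found in $bin_dir" >&2
        return 1
    fi
}
```

<p>Checking that <tt>rundig</tt> exists before calling it means a wrong path shows up \
as a clear error message instead of a silent failure.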

<blockquote><small>
<p class="smalltext">RPM users should know that the RPM installer creates an \
<b>/etc/cron.daily</b> job that will automatically run <b>rundig</b> once a day.  \
This may be all that you need.

<p class="smalltext">When you get the configuration file squared away, then use \
<b>./rundig&nbsp;-s</b> for a considerably shorter display.  Alternatively, if \
something is giving you problems then try using <b>./rundig&nbsp;-vvv</b> for an \
<em>extremely</em> detailed and verbose display.  In this case, you would probably \
want to redirect the output to a file: &nbsp;&nbsp; \
<b>./rundig&nbsp;-vvv&nbsp;&gt;&nbsp;debug.txt</b>.  Then load <b>debug.txt</b> in an \
editor.

<p class="smalltext">Right now the only way you have to generate the index is by \
running the <b>rundig</b>  (or <b><a href="#rundig2">rundig2</a></b>) script, which \
can be limiting because it generates the whole index from scratch each time that it \
is run.  This has two undesirable side effects: 1., it takes time and machine \
resources, and, 2., searching returns no results while the <b>rundig</b> script is \
running.

<p class="smalltext">There are other ways to do the search index database updating to \
sidestep these issues.  You should examine the command line options for the indexing \
programs so that you can develop an indexing procedure that best suits your site's \
needs.

</small></blockquote>

<p>More information on the <a href="http://www.htdig.org/htdig.html">htdig</a>, 
<a href="http://www.htdig.org/htmerge.html">htmerge</a>,  
<a href="http://www.htdig.org/htnotify.html">htnotify</a>, and 
<a href="http://www.htdig.org/htfuzzy.html">htfuzzy</a> programs that are used to \
generate the search index database.

<a name="searching"></a>
<h3>Doing a search.  Finally.</h3>

<p>Look at <tt><a href="#searchdir">${SEARCH_DIR}</a>/search.html</tt>  This is your \
sample <a href="http://www.htdig.org/hts_form.html">search form</a>.  

<blockquote><small>
<p class="smalltext">For the tarball installation, you probably have to change one \
line, because we defined the CGI directory to be <b>htdig-cgi</b> in the Apache \
configuration file.  So change </small>
<pre>&lt;form method="post" action="/cgi-bin/htsearch"&gt;
</pre>
<small>
<p class="smalltext">to
</small>
<pre>&lt;form method="post" action="/htdig-cgi/htsearch"&gt;
</pre>
<small>
<p class="smalltext">and save the file.  
</small></blockquote>

<p>Now use a browser to access this search form.  If the IP address of your server is \
192.168.1.1, then enter either <tt>192.168.1.1/htdig/search.html</tt> (tarball) or \
<tt>192.168.1.1/search.html</tt> (RPM) as the URL for your browser.  You should see \
the search form.  Enter a word that you know is somewhere on your site.  Click the \
search button.

<p>(Fingers are crossed.)

<p>You <em>should</em> see the search results displayed, almost instantly.

<p>More information on the <a href="http://www.htdig.org/htsearch.html">htsearch</a> \
CGI program that does the actual searching.

<a name="troubleshooting"></a>
<h3>Troubleshooting</h3>

<p>If something isn't working right, the first thing to do is to go back and check \
your configuration and try repeating the above procedures.  If this doesn't help, \
then the <a href="http://www.htdig.org/">ht://Dig site</a> has a lot of valuable \
reference material.  Check the <a \
href="http://www.htdig.org/config.html">configuration page</a>, check the <a \
href="http://www.htdig.org/FAQ.html">FAQ</a>.  Check the on-line reference section.  \
Most important, make sure to visit the <a \
href="http://www.htdig.org/mailarchive.html">ht://Dig Mailing List Archive</a>.  The \
ht://Dig community provides <em>excellent</em> support.  Most (if not all) common \
"why doesn't this work" type questions have already been asked and answered on the \
mailing list, or in the FAQ.  

<p><b>Use the search box at the bottom of <a href="http://www.htdig.org/">the main \
ht://Dig page</a> to search the archives (and the rest of the ht://Dig site.)</b>

<p>&nbsp;


<h1>Tips and Techniques</h1>

<a name="customizesearch"></a>
<h3>Customizing the search results</h3>

<p>Examine <tt><a href="#searchdir">${SEARCH_DIR}</a>/search.html</tt>.  You use this \
as a basis for how you want the search forms to look.  The search results are defined \
by the template files that are located in <tt><a \
href="#commondir">${COMMON_DIR}</a></tt>.  You edit these to change how the search \
results are displayed.

<ul>
<li><a href="http://www.htdig.org/hts_form.html">Search form</a>
<li><a href="http://www.htdig.org/hts_templates.html">Template files</a>
<li><a href="http://www.htdig.org/config.html">Configuration documentation</a> has \
more information on these files.  <li><a \
href="http://www.htdig.org/htsearch.html">Htsearch</a> <li><a \
href="http://www.htdig.org/hts_method.html">How searching works</a> </ul>

<p>One tricky part is that ht://Dig <em>totally ignores</em> the template files \
unless you add a <tt><a \
href="http://www.htdig.org/attrs.html#template_map">template_map directive</a></tt> \
to <tt>htdig.conf</tt>.  Like this:

<pre>
this_base:  myweb

search_results_header: ${common_dir}/${this_base}/header.html
search_results_footer: ${common_dir}/${this_base}/footer.html
nothing_found_file: ${common_dir}/${this_base}/nomatch.html
syntax_error_file: ${common_dir}/${this_base}/syntax.html

template_map:   Long builtin-long ${common_dir}/${this_base}/long.html \
                Short builtin-short ${common_dir}/${this_base}/short.html \
                Default default ${common_dir}/${this_base}/long.html
template_name: Default
</pre>

<p>In this case I defined a new variable, <tt>this_base:</tt> with a value of \
<tt>myweb</tt>.  The way I use this is to first create a <tt>myweb</tt> directory \
under <tt><a href="#commondir">${COMMON_DIR}</a></tt> and copy all the template \
files into it <em>before</em> I start editing them.  This leaves an untouched set \
of the template files.

<p>Once this was done I went through and edited all the template files so that \
they displayed the way I wanted.  e.g., editing <tt>${COMMON_DIR}/myweb/header.html, \
${COMMON_DIR}/myweb/footer.html</tt>, etc.  This method is also valuable if you are \
indexing (and searching) multiple sites and are using multiple configuration files.  \
You keep each different set of template files in a different directory (defined by \
the value that is assigned to <tt>this_base</tt>).  
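
<p>The copy step could look roughly like this.  It is a sketch only: \
<tt>copy_templates</tt> is a name invented for this example, and it assumes that the \
templates are the <tt>.html</tt> files sitting directly in <tt>${COMMON_DIR}</tt>.

```shell
# Hypothetical helper (not part of ht://Dig): copy the stock templates into a
# per-site directory so that the originals stay untouched.
copy_templates() {
    common_dir="$1"   # e.g. /opt/www/htdig/common for a default tarball install
    base="$2"         # the value you assign to this_base, e.g. myweb
    mkdir -p "$common_dir/$base" || return 1
    # Assumption: the templates are the top-level .html files in common_dir.
    cp "$common_dir"/*.html "$common_dir/$base/"
}
```

<p>For example, <tt>copy_templates&nbsp;/opt/www/htdig/common&nbsp;myweb</tt> would \
create <tt>/opt/www/htdig/common/myweb</tt> and fill it with copies to edit.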

<p>Optional.  You could also separate the database files by defining them like

<pre>
database_base:    ${database_dir}/${this_base}
</pre>

<p>The database files default to be named like <tt>db.docdb, db.word.db</tt>, etc.  \
Making the above change would result in the database files being named like \
<tt>myweb.docdb, myweb.word.db</tt>, etc.  Again, this is important if you are using \
multiple configuration files to manage multiple search databases on the same machine. \
If you are only using one search database, then you can ignore defining \
<tt>database_base:</tt>.

<!--
<p>You would have to make a new directory for each different \
<tt>${database_dir}/${this_base}</tt>.  For example, if <tt>${database_dir}</tt> is \
defined as <tt>/opt/www/htdig/db</tt> and <tt>${this_base}</tt> is defined as \
                <tt>myweb</tt>, then you would create the \
                <tt>/opt/www/htdig/db/myweb</tt> directory.
-->

<a name="date"></a>
<h3>Making the date display all four digits of the year in search results</h3>

<p>Add a <tt><a href="http://www.htdig.org/attrs.html#date_format">date_format:</a></tt> \
command to <tt>htdig.conf</tt>.  

<p>Example: &nbsp;<tt>date_format: %m/%d/%Y</tt> &nbsp;&nbsp;will display like \
<b>01/23/2000</b>.

<p>See <tt>man strftime</tt> for full reference. 
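
<p>Because the format string follows <tt>strftime</tt>, you can preview a candidate \
format from the shell with the <tt>date</tt> command before putting it in \
<tt>htdig.conf</tt>.  The helper name <tt>preview_date_format</tt> below is made up \
for illustration, and the preview uses today's date, not a document date.

```shell
# Hypothetical helper: preview a strftime-style format using today's date.
# date(1) takes the same % conversions when the format follows a "+".
preview_date_format() {
    date "+$1"
}

# Month/day/four-digit year, as in the date_format: example above.
preview_date_format '%m/%d/%Y'
```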


<a name="rundig2"></a>
<h3>An alternate rundig script</h3>

<p>ht://Dig supplies the <tt>rundig</tt> script that is sufficient to manage some \
ht://Dig indexing operations.  But <tt>rundig</tt> doesn't support all the possible \
<tt>htdig, htmerge, and htfuzzy</tt> command line options.  It is also difficult to \
use when you are specifying a different configuration file, because you have to type \
in the complete path to the configuration file.

<p>I have modified <tt>rundig</tt> to address this.  The modified script is named \
<tt>rundig2</tt>.  It now supports all the command line options.  It also supplies \
the path and file extension when you use the <tt>-c&nbsp;config file</tt> option.

<p>Download either 

<ul>
<a href="rundig2tar.txt">rundig2tar.txt</a>  (For a default tarball installation)<br>
<a href="rundig2rpm.txt">rundig2rpm.txt</a>  (For a RPM installation)
</ul>

<p>Download whichever of these is most appropriate.  Rename it to be \
<tt>rundig2</tt>, check to see that the variables that define locations \
(<tt>DBDIR</tt>, etc.) are correct, move it to <tt><a \
href="#bindir">${BIN_DIR}</a></tt>, and chmod it to be executable. \
(<tt>chmod&nbsp;755&nbsp;rundig2</tt>)

<p>Now you can use <tt>rundig2</tt> instead of <tt>rundig</tt> when you are creating \
the database files.  If <tt>rundig2</tt> doesn't work for you, for some reason, then \
go back to using <tt>rundig</tt> and <a href="mailto:wayne@scrounge.org">please let \
me know about it</a>.


<a name="pdf"></a>
<h3>Indexing PDF files</h3>

<p>Ht://Dig will index Adobe Acrobat PDF files quite nicely, but it needs some \
additional setup.  You must download and install a PDF-to-text converter and \
then tell ht://Dig how to use it.  Here's how.

<p>Download the <a href="http://www.foolabs.com/xpdf/">Xpdf package</a> from the <a \
href="http://www.foolabs.com/xpdf/download.html">Xpdf Download page</a>.  Linux Intel \
users can download the pre-compiled binaries (x86, Linux 2.0, libc6).  Once you have \
the binaries, then copy <tt>pdftotext</tt> and <tt>pdfinfo</tt> to a suitable \
location (<tt><a href="#bindir">${BIN_DIR}</a></tt> or <tt>/usr/bin</tt>, for \
example).

<p>Alternatively, you can also use one of these <a \
href="http://www.scrc.umanitoba.ca/htdig/rpms/">Xpdf RPM files</a>.   Download \
<em>one</em> of these files:

<p><ul>
xpdf-0.90.0.i386.rpm (Red Hat 4.2)<br>
xpdf-0.90.0glibc.i386.rpm (Red Hat 5.x)<br>
xpdf-0.90.0glibc21.i386.rpm (Red Hat 6.x)<br>
</ul>

<p>Install the RPM (<tt>rpm -Uvh xpdf*.rpm</tt>) and <tt>pdftotext</tt> and \
<tt>pdfinfo</tt> will be installed in <tt>/usr/bin</tt>.  (Double check the location \
with <tt>rpm&nbsp;-ql&nbsp;xpdf</tt>.)

<!--
<p>Download <tt>conv_doc.pl.gz</tt> from <a \
href="http://www.htdig.org/files/contrib/parsers/">http://www.htdig.org/files/contrib/parsers/</a>. \
Then use gunzip to "unzip" it.

<blockquote><small>
<p class="smalltext">You <em>might</em> have trouble unzipping conv_doc.pl.gz after \
downloading  it with your browser.  Try right clicking on it.  Or try using a real \
FTP program.  You can also get (ungzipped) conv_doc.pl <a \
href="http://www.scrc.umanitoba.ca/htdig/rpms">from here</a>. </small></blockquote>
-->

<p>Download <tt>conv_doc.pl</tt> <a \
href="http://www.scrc.umanitoba.ca/htdig/rpms">from here</a> and copy it to your \
<tt><a href="#bindir">${BIN_DIR}</a></tt> directory.  Chmod it to be executable.  \
(<tt>chmod&nbsp;755&nbsp;conv_doc.pl</tt>)  Then load it in your editor and change \
the <tt>$CATPDF</tt> variable to point to where <tt>pdftotext</tt> is and change \
<tt>$PDFINFO</tt> to where <tt>pdfinfo</tt> is.

<p>Finally, edit <tt><a href="#configdir">${CONFIG_DIR}</a>/htdig.conf</tt> and add

<pre>
external_parsers:  application/pdf-&gt;text/html <i>/usr/local/bin/</i>conv_doc.pl
</pre>

<p>Replace <tt><i>/usr/local/bin/</i></tt> with the location where you copied \
<tt>conv_doc.pl</tt>.  <a href="http://www.htdig.org/attrs.html#external_parsers">More \
about the external_parsers: directive</a>.

<p><b>Important note.</b>  ht://Dig must read each PDF file in its entirety in order \
to index it.  This is affected by the <tt>max_doc_size:</tt> directive in \
<tt>htdig.conf</tt>.  Make sure that <tt>max_doc_size:</tt> is set to be larger than \
your largest PDF file. 
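
<p>To find out how large that is, a sketch along these lines walks a document tree \
and prints the size in bytes of the biggest PDF it finds.  The function name \
<tt>largest_pdf_bytes</tt> is invented for this example, it only matches files whose \
names end in lower-case <tt>.pdf</tt>, and it prints 0 when there are none.

```shell
# Hypothetical helper (not part of ht://Dig): print the size in bytes of the
# largest *.pdf file under a directory, or 0 if there are no PDF files.
largest_pdf_bytes() {
    doc_root="$1"
    # wc -c prints "<bytes> <name>" per file; awk keeps the largest byte count.
    find "$doc_root" -type f -name '*.pdf' -exec wc -c {} \; |
        awk '$1 > max { max = $1 } END { print max + 0 }'
}
```

<p>Run it against your web server's document directory, for example \
<tt>largest_pdf_bytes&nbsp;/home/httpd/html</tt>, and compare the result with your \
<tt>max_doc_size:</tt> setting.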

<blockquote><small>
<p class="smalltext">pdftotext is pretty nifty.  It can also be interfaced to lynx.  \
Check /etc/lynx.cfg and ~/.mailcap. </small></blockquote>

<a name="doc"></a>
<h3>Indexing Microsoft Word files</h3>

<p> Installing a Microsoft Word to text converter is similar to <a \
href="#pdf">Indexing PDF Files</a>. Follow the procedures there to install and \
configure <tt>conv_doc.pl</tt>.  The only difference is that you install a \
Word-to-Text converter, such as  <a \
href="http://www.fe.msk.ru/~vitus/catdoc/">catdoc</a>.  These go together, so it is \
almost as easy to install both the Word and PDF converters at the same time.  \
<tt>conv_doc.pl</tt> is already partially configured to use <tt>catdoc</tt>.  Add  
<pre>
external_parsers:  application/msword-&gt;text/html <i>/usr/local/bin/</i>conv_doc.pl
</pre>

<p>to <tt><a href="#configdir">${CONFIG_DIR}</a>/htdig.conf</tt>.  If you were \
installing both the PDF and Word converters, then you'd add

<pre>
external_parsers:  application/msword-&gt;text/html <i>/usr/local/bin/</i>conv_doc.pl \
                \
                   application/pdf-&gt;text/html <i>/usr/local/bin/</i>conv_doc.pl
</pre>

<p>Again, replace <i>/usr/local/bin/</i> with the location where you have actually \
installed the <tt>conv_doc.pl</tt> script.


<a name="logging"></a>
<h3>Logging search requests</h3>

<p>It is valuable to have a record of what people are searching for so that you know \
what they are interested in.  This can give you hints on additional content that you \
need to add to your site.  

<p>To log search requests, add <tt><a \
href="http://www.htdig.org/attrs.html#logging">logging:</a> true</tt> to your \
configuration file.  This will direct the system logging facility to log search \
requests.   

<p>However, you might want to change the logfile that syslog sends these \
messages to. (By default they go to <tt>/var/log/messages</tt>.)   To do this, edit \
your <tt>/etc/syslog.conf</tt> file and add this to it:

<pre>
# Log ht://Dig search requests
local5.*                            /var/log/htdig
</pre>

<p><b>Remember to use tabs and <i>NOT</i> spaces in your <tt>syslog.conf</tt> file.  \
Otherwise it won't work.</b>  
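
<p>Since a missing tab is easy to overlook in an editor, a check like the following \
can flag it.  This is a sketch with a made-up function name, \
<tt>check_selector_tabs</tt>, and it only looks at lines that begin with the \
facility name you give it.

```shell
# Hypothetical helper: report syslog.conf lines for one facility that contain
# no tab character between the selector and the log file.
# Returns non-zero if any such line is found.
check_selector_tabs() {
    conf="$1"
    facility="$2"   # e.g. local5
    awk -v fac="$facility" '
        # Only lines that start with "<facility>." are checked.
        index($0, fac ".") == 1 && index($0, "\t") == 0 {
            printf "line %d has no tab: %s\n", NR, $0
            bad = 1
        }
        END { exit bad }
    ' "$conf"
}
```

<p>For the entry above you would run \
<tt>check_selector_tabs&nbsp;/etc/syslog.conf&nbsp;local5</tt>; no output means a tab \
was found on every matching line.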

<p>The system will now log search requests to both <tt>/var/log/messages</tt> and \
<tt>/var/log/htdig</tt>, so now you have to tell it not to log search requests \
to <tt>/var/log/messages</tt>.  To do this, add <tt>;local5.none</tt> to your \
<tt>/var/log/messages</tt> line. It should look something like this:

<pre>
# Log anything (except mail) of level info or higher.
# Don't log private authentication messages!
*.info;mail.none;authpriv.none<font color="#FF0000"><b>;local5.none</b></font>        \
/var/log/messages </pre>

<p>For the changes to take effect, you'll need to restart your <tt>syslog</tt> \
daemon.  To do so, just do a

<pre>
killall -HUP syslogd
</pre>

<p>That will force <tt>syslogd</tt> to re-read its config file for the changes to \
take  effect.

<p>See <tt>man syslog.conf -S 5</tt> for more information.

<p><i>Syslog information courtesy of Bruce A. Buhler</i>


<p><hr>

<p>Back to the <a href="../index.html#linux">scrounge.org home page.</a>

</body></html>




