List:       slide-dev
Subject:    Re: Incorrect wiki page, DaslConfiguration. And casesensitivity
From:       Eirikur Hrafnsson <eiki () idega ! is>
Date:       2006-06-21 17:59:00
Message-ID: 9A5B8B97-3C22-4CDB-ADAD-A08915602A56 () idega ! is

hehe am I going insane or am I just getting cached versions of that
Wiki page??
Have you looked at the page itself, or are you just changing it in CVS?

I will copy and paste the whole thing here (as plain text), and then  
tell me there is no example code with property-contains in it:
Jakarta-slide Wiki

DASL Configuration
The default implementation scans the complete resource tree provided  
in the scope of DASL query and tests for each resource whether it  
matches the condition or not.

This works, but is quite slow.

To avoid this, you currently have the following options:

If you are using JDBCStore/J2EEStore you can enable metadata  
searching using the database.
You can enable metadata searching using integrated lucene search engine.
You can enable content searching using integrated lucene search engine.


Searching meta-data using RDBMS
If you are using a JDBCStore/J2EEStore you can use the database to
search the metadata. To enable this, add the parameter
use-rdbms-expression-factory to your store definition.

<store name="myStore">
   <parameter name="cache-mode">full</parameter>
   <nodestore classname="org.apache.slide.store.impl.rdbms.JDBCStore">
      ... your JDBCStore configuration ..
      <parameter name="use-rdbms-expression-factory">true</parameter>
   </nodestore>
   <securitystore><reference store="nodestore"/></securitystore>
   <lockstore><reference store="nodestore"/></lockstore>
   <revisiondescriptorsstore><reference store="nodestore"/></revisiondescriptorsstore>
   <revisiondescriptorstore><reference store="nodestore"/></revisiondescriptorstore>
   <contentstore><reference store="nodestore"/></contentstore>
</store>



Searching meta-data with the Lucene based properties indexer
Note: this is under development and will be part of Slide 2.2. To
check this out you can use CVS HEAD.

Searching the meta data.

Enabling
To use this indexer add the following to your store definition.

   <propertiesindexer classname="org.apache.slide.index.lucene.LucenePropertiesIndexer">
       <parameter name="indexpath">store/index/metadata</parameter>
   </propertiesindexer>

Parameters

parameter               required/default         description
indexpath               yes/none                 Directory where the index data is stored.
asynchron               no/false                 If set to false, the index is updated inside
                                                 the transaction. If set to true, the index is
                                                 updated on a separate thread, so the
                                                 transaction can finish before the index is
                                                 updated.
priority                no/Thread.NORM_PRIORITY  Priority of the indexing thread if asynchron
                                                 is true. Must be a value between
                                                 Thread.MIN_PRIORITY and Thread.MAX_PRIORITY.
includes                no                       A comma-separated list of paths for which
                                                 indexing should happen. If empty, everything
                                                 in the store is indexed.
optimization-threshold  no/100                   The number of write accesses to the index
                                                 after which the index is optimized.

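As a rough illustration of how the optional parameters above might be combined, here is a sketch of an indexer definition that updates the index on a separate thread. The values are assumptions chosen for the example, not recommendations: the paths under includes are hypothetical, and the numeric priority assumes the parameter accepts the integer value of a java.lang.Thread priority (3 lies between Thread.MIN_PRIORITY = 1 and Thread.MAX_PRIORITY = 10).

```xml
<propertiesindexer classname="org.apache.slide.index.lucene.LucenePropertiesIndexer">
    <!-- required: directory where the index data is stored -->
    <parameter name="indexpath">store/index/metadata</parameter>
    <!-- update the index on a separate thread, outside the transaction -->
    <parameter name="asynchron">true</parameter>
    <!-- assumed numeric thread priority, below Thread.NORM_PRIORITY (5) -->
    <parameter name="priority">3</parameter>
    <!-- hypothetical paths; leave empty to index everything in the store -->
    <parameter name="includes">/files/docs,/files/projects</parameter>
    <!-- optimize the index after this many write accesses (100 is the default) -->
    <parameter name="optimization-threshold">100</parameter>
</propertiesindexer>
```

With asynchron set to true, a search issued right after a write may not yet see the change, since the transaction can finish before the index is updated.
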
Supported DASL operators and data types
The indexer currently supports the following data types:

string: indexed without any modification

date: indexed as a normalized date string (without seconds)

integer: indexed as a normalized integer string (between
Long.MIN_VALUE and Long.MAX_VALUE)

text: indexed in a tokenized and normalized form (normalized using
Lucene analyzers)


operator            string  date  integer  text
eq                  *       *     *        -
lt                  +       *     *        -
gt                  +       *     *        -
lte                 +       *     *        -
gte                 +       *     *        -
like                *       ~     ~        -
is-defined          *       *     *        *
between             +       *     *        -
property-contains   -       -     -        *

* supported (if indexing for the property is enabled)

+ ditto but the order of strings is limited to char code ordering

~ supported but not executed with the index (so will be slow)

- unsupported (will return an error)

Also supported are the boolean operators and, or, not and the special  
operators is-collection and is-principal.
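
To illustrate how the table reads in practice, the following sketch combines two operators that the index supports for the default properties: eq on DAV:getcontenttype (string) and gt on DAV:getcontentlength (integer), joined with and. The request wrapper follows the DASL basicsearch grammar; the scope /files and the literal values are assumptions made up for this example.

```xml
<D:searchrequest xmlns:D="DAV:">
  <D:basicsearch>
    <!-- properties to return for each match -->
    <D:select>
      <D:prop><D:displayname/><D:getcontentlength/></D:prop>
    </D:select>
    <!-- hypothetical scope: search everything below /files -->
    <D:from>
      <D:scope>
        <D:href>/files</D:href>
        <D:depth>infinity</D:depth>
      </D:scope>
    </D:from>
    <D:where>
      <D:and>
        <!-- string property, eq is marked * in the table -->
        <D:eq>
          <D:prop><D:getcontenttype/></D:prop>
          <D:literal>text/html</D:literal>
        </D:eq>
        <!-- integer property, gt is marked * in the table -->
        <D:gt>
          <D:prop><D:getcontentlength/></D:prop>
          <D:literal>1024</D:literal>
        </D:gt>
      </D:and>
    </D:where>
  </D:basicsearch>
</D:searchrequest>
```

Had the gt comparison been applied to a string property such as DAV:displayname, the + entry in the table would apply, and the result would follow char code ordering.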

Configuring what properties are indexed
TODO

To reduce the indexing overhead, not all properties are indexed by
default. For properties that are not indexed, the default search
implementation will be called.

By default the following properties are indexed:

namespace  property            type
DAV:       displayname         string
DAV:       getcontenttype      string
DAV:       getcontentlanguage  string
DAV:       getcontentlength    integer
DAV:       getlastmodified     date
DAV:       creationdate        date

User defined text properties

You can add additional properties to the indexing, including user  
defined properties.

The following sample defines two user-defined properties in the
namespace http://any.domain/test/. Both are text properties analyzed
with different analyzers.

   <propertiesindexer classname="org.apache.slide.index.lucene.LucenePropertiesIndexer">
     <parameter name="indexpath">${datapath}/store1/index/metadata</parameter>
     <configuration name="indexed-properties">
       <property name="abstract" namespace="http://any.domain/test/">
         <text analyzer="org.apache.lucene.analysis.de.GermanAnalyzer"/>
         <is-defined/>
       </property>
       <property name="keywords" namespace="http://any.domain/test/">
         <text analyzer="org.apache.lucene.analysis.WhitespaceAnalyzer"/>
         <is-defined/>
       </property>
     </configuration>
   </propertiesindexer>

Operators (extensions)
Operator property-contains

This is an extension to the RFC. It works like the contains operator,
but for properties. It is intended for use with properties that
contain abstracts, keyword lists, etc.

Usage

<D:searchrequest xmlns:D="DAV:"
   xmlns:S="http://jakarta.apache.org/slide/"
   xmlns:u="http://any.domain/test/">
   ...
     <D:where>
         <S:property-contains>
           <D:prop><u:abstract/></D:prop>
           <D:literal>Server</D:literal>
         </S:property-contains>
     </D:where>
   ...
</D:searchrequest>
1. Search for a single word

<S:property-contains>
   <D:prop><u:abstract/></D:prop>
   <D:literal>Word</D:literal>
</S:property-contains>
2. Search for words with wildcards

<S:property-contains>
   <D:prop><u:abstract/></D:prop>
   <D:literal>prefix*</D:literal>
</S:property-contains>
<S:property-contains>
   <D:prop><u:abstract/></D:prop>
   <D:literal>wild?ard</D:literal>
</S:property-contains>
3. Search for phrases

<S:property-contains>
   <D:prop><u:abstract/></D:prop>
   <D:literal>a longer phrase of text</D:literal>
</S:property-contains>
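
For context, a complete request around one of the fragments above could look like the following sketch. The select and from parts are assumptions (they follow the DASL basicsearch grammar; the scope /files is hypothetical), and u:abstract is the user-defined text property from the indexer sample. The not and is-collection operators are used here only to show property-contains combined with the boolean operators listed earlier.

```xml
<D:searchrequest xmlns:D="DAV:"
    xmlns:S="http://jakarta.apache.org/slide/"
    xmlns:u="http://any.domain/test/">
  <D:basicsearch>
    <!-- return the display name and the matched property -->
    <D:select>
      <D:prop><D:displayname/><u:abstract/></D:prop>
    </D:select>
    <!-- hypothetical scope: search everything below /files -->
    <D:from>
      <D:scope>
        <D:href>/files</D:href>
        <D:depth>infinity</D:depth>
      </D:scope>
    </D:from>
    <D:where>
      <D:and>
        <!-- skip collections, keep only plain resources -->
        <D:not><D:is-collection/></D:not>
        <!-- wildcard search inside the text property -->
        <S:property-contains>
          <D:prop><u:abstract/></D:prop>
          <D:literal>prefix*</D:literal>
        </S:property-contains>
      </D:and>
    </D:where>
  </D:basicsearch>
</D:searchrequest>
```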


Searching content with the Lucene based content indexer
Enabling
To use this indexer add the following to your store definition.

   <contentindexer classname="org.apache.slide.index.lucene.LuceneContentIndexer">
       <parameter name="indexpath">store/index/content</parameter>
   </contentindexer>

Parameters

parameter               required/default         description
indexpath               yes/none                 Directory where the index data is stored.
asynchron               no/false                 If set to false, the index is updated inside
                                                 the transaction. If set to true, the index is
                                                 updated on a separate thread, so the
                                                 transaction can finish before the index is
                                                 updated.
priority                no/Thread.NORM_PRIORITY  Priority of the indexing thread if asynchron
                                                 is true. Must be a value between
                                                 Thread.MIN_PRIORITY and Thread.MAX_PRIORITY.
includes                no                       A comma-separated list of paths for which
                                                 indexing should happen. If empty, everything
                                                 in the store is indexed.
optimization-threshold  no/100                   The number of write accesses to the index
                                                 after which the index is optimized.
analyzer


Search for a single word

<D:contains>Word</D:contains>
Extractors
The content indexer will only process resources that match a
content extractor, so don't forget to configure the content
extractors according to your needs. If you want to include text, PDF
and Word documents in your search, your extractor configuration
could look like this:

<!-- Extractor configuration -->
<extractors>
    <extractor classname="org.apache.slide.extractor.PDFExtractor"/>
    <extractor classname="org.apache.slide.extractor.MSWordExtractor"/>
    <extractor classname="org.apache.slide.extractor.TextContentExtractor"/>
</extractors>
last edited 2005-12-07 09:09:57 by DanielFlorey






