List: slide-dev
Subject: Re: Incorrect wiki page, DaslConfiguration. And casesensitivity
From: Eirikur Hrafnsson <eiki () idega ! is>
Date: 2006-06-21 17:59:00
Message-ID: 9A5B8B97-3C22-4CDB-ADAD-A08915602A56 () idega ! is
Hehe, am I going insane, or am I just getting cached versions of that
wiki page?
Have you looked at the page, or are you just changing it in CVS?
I will copy and paste the whole thing here (as plain text), and then
tell me there is no example code with property-contains in it:
Jakarta-slide Wiki
DaslConfiguration
DASL Configuration
The default implementation scans the complete resource tree provided
in the scope of the DASL query and tests for each resource whether it
matches the condition or not.
This works, but is quite slow.
To avoid this, you currently have the following options:
If you are using JDBCStore/J2EEStore, you can enable metadata
searching using the database.
You can enable metadata searching using the integrated Lucene search engine.
You can enable content searching using the integrated Lucene search engine.
Searching meta-data using RDBMS
If you are using a JDBCStore/J2EEStore, you can use the database to
search the metadata. To enable this, add the parameter
use-rdbms-expression-factory to your store definition.
<store name="myStore">
  <parameter name="cache-mode">full</parameter>
  <nodestore classname="org.apache.slide.store.impl.rdbms.JDBCStore">
    ... your JDBCStore configuration ..
    <parameter name="use-rdbms-expression-factory">true</parameter>
  </nodestore>
  <securitystore><reference store="nodestore"/></securitystore>
  <lockstore><reference store="nodestore"/></lockstore>
  <revisiondescriptorsstore><reference store="nodestore"/></revisiondescriptorsstore>
  <revisiondescriptorstore><reference store="nodestore"/></revisiondescriptorstore>
  <contentstore><reference store="nodestore"/></contentstore>
</store>
Searching meta-data with the Lucene based properties indexer
Note that this is under development and will be part of Slide 2.2. To
check this out, you can use CVS HEAD.
Searching the meta data.
Enabling
To use this indexer add the following to your store definition.
<propertiesindexer
classname="org.apache.slide.index.lucene.LucenePropertiesIndexer">
<parameter name="indexpath">store/index/metadata</parameter>
</propertiesindexer>
Parameter
parameter              | description | required/default
indexpath              | Directory where the index data is stored. | true/none
asynchron              | If set to false, the index is updated inside the transaction. If set to true, the index is updated on a separate thread, so the transaction can finish before the index is updated. | no/false
priority               | Priority of the indexing thread if asynchron is true. Must be a value between Thread.MIN_PRIORITY and Thread.MAX_PRIORITY. | no/Thread.NORM_PRIORITY
includes               | A comma-separated list of paths for which indexing should happen. If empty, everything in the store is indexed. | no
optimization-threshold | The number of write accesses to the index after which the index is optimized. | no/100
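As a rough sketch of how the optional parameters might be combined, the fragment below sets all of them on the properties indexer. The parameter names come from the table; the values are assumptions for illustration only. In particular, the two paths under includes are hypothetical, and writing priority as a plain integer (5 being the value of Thread.NORM_PRIORITY in Java) is a guess at the accepted format.

```xml
<propertiesindexer
    classname="org.apache.slide.index.lucene.LucenePropertiesIndexer">
  <parameter name="indexpath">store/index/metadata</parameter>
  <!-- update the index on a separate thread, outside the transaction -->
  <parameter name="asynchron">true</parameter>
  <!-- assumption: given as a plain integer; 5 equals Thread.NORM_PRIORITY -->
  <parameter name="priority">5</parameter>
  <!-- hypothetical paths; leave empty to index everything in the store -->
  <parameter name="includes">/files/docs,/files/projects</parameter>
  <!-- optimize the index after this many write accesses (100 is the default) -->
  <parameter name="optimization-threshold">100</parameter>
</propertiesindexer>
```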
Supported DASL operators and data types
The indexer currently supports the datatypes:
string   indexed without any modification
date     indexed as a normalized date string (without seconds)
integer  indexed as a normalized integer string (between
         Long.MIN_VALUE and Long.MAX_VALUE)
text     indexed in a tokenized and normalized form (normalized using
         Lucene analyzers)
operator          | string | date | integer | text
eq                | *      | *    | *       | -
lt                | +      | *    | *       | -
gt                | +      | *    | *       | -
lte               | +      | *    | *       | -
gte               | +      | *    | *       | -
like              | *      | ~    | ~       | -
is-defined        | *      | *    | *       | *
between           | +      | *    | *       | -
property-contains | -      | -    | -       | *
* supported (if indexing for the property is enabled)
+ ditto, but the order of strings is limited to char code ordering
~ supported, but not executed with the index (so will be slow)
- unsupported (will return an error)
Also supported are the boolean operators and, or, not and the special
operators is-collection and is-principal.
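To illustrate how the boolean and comparison operators can be combined, here is a minimal sketch of a complete DASL basicsearch request. It selects non-collection resources whose content type is text/plain and whose content length exceeds 1024 bytes. The scope path /files and the literal values are assumptions chosen for the example, not something taken from the page.

```xml
<D:searchrequest xmlns:D="DAV:">
  <D:basicsearch>
    <D:select>
      <D:prop><D:displayname/><D:getcontentlength/></D:prop>
    </D:select>
    <D:from>
      <D:scope>
        <!-- hypothetical collection to search below -->
        <D:href>/files</D:href>
        <D:depth>infinity</D:depth>
      </D:scope>
    </D:from>
    <D:where>
      <D:and>
        <!-- eq on a string property -->
        <D:eq>
          <D:prop><D:getcontenttype/></D:prop>
          <D:literal>text/plain</D:literal>
        </D:eq>
        <!-- gt on an integer property -->
        <D:gt>
          <D:prop><D:getcontentlength/></D:prop>
          <D:literal>1024</D:literal>
        </D:gt>
        <!-- exclude collections -->
        <D:not>
          <D:is-collection/>
        </D:not>
      </D:and>
    </D:where>
  </D:basicsearch>
</D:searchrequest>
```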
Configuring what properties are indexed
TODO
To reduce the indexing overhead, not all properties are indexed by
default. For properties that are not indexed, the default search
implementation will be called.
By default the following properties are indexed:
namespace | property           | type
DAV:      | displayname        | string
DAV:      | getcontenttype     | string
DAV:      | getcontentlanguage | string
DAV:      | getcontentlength   | integer
DAV:      | getlastmodified    | date
DAV:      | creationdate       | date
User defined text properties
You can add additional properties to the indexing, including user
defined properties.
The following sample defines two user defined properties in the
namespace http://any.domain/test/. Both are text properties analyzed
with different analyzers.
<propertiesindexer
    classname="org.apache.slide.index.lucene.LucenePropertiesIndexer">
  <parameter name="indexpath">${datapath}/store1/index/metadata</parameter>
  <configuration name="indexed-properties">
    <property name="abstract" namespace="http://any.domain/test/">
      <text analyzer="org.apache.lucene.analysis.de.GermanAnalyzer"/>
      <is-defined/>
    </property>
    <property name="keywords" namespace="http://any.domain/test/">
      <text analyzer="org.apache.lucene.analysis.WhitespaceAnalyzer"/>
      <is-defined/>
    </property>
  </configuration>
</propertiesindexer>
Operators (extensions)
Operator property-contains
This operator is an extension to the RFC. It works like the contains
operator, but for properties. It is intended for use with properties
that contain abstracts, keyword lists, etc.
Usage
<searchrequest xmlns:D="DAV:"
xmlns:S="http://jakarta.apache.org/slide/"
xmlns:u="http://any.domain/test/">
...
<D:where>
<S:property-contains>
<D:prop><u:abstract/></D:prop>
<D:literal>Server</D:literal>
</S:property-contains>
</D:where>
...
</searchrequest>
1. Search for a single word
<S:property-contains>
<D:prop><u:abstract/></D:prop>
<D:literal>Word</D:literal>
</S:property-contains>
2. Search for words with wildcards
<S:property-contains>
<D:prop><u:abstract/></D:prop>
<D:literal>prefix*</D:literal>
</S:property-contains>
<S:property-contains>
<D:prop><u:abstract/></D:prop>
<D:literal>wild?ard</D:literal>
</S:property-contains>
3. Search for phrases
<S:property-contains>
<D:prop><u:abstract/></D:prop>
<D:literal>a longer phrase of text</D:literal>
</S:property-contains>
Searching content with the Lucene based content indexer
Enabling
To use this indexer add the following to your store definition.
<contentindexer
classname="org.apache.slide.index.lucene.LuceneContentIndexer">
<parameter name="indexpath">store/index/content</parameter>
</contentindexer>
Parameter
parameter              | description | required/default
indexpath              | Directory where the index data is stored. | true/none
asynchron              | If set to false, the index is updated inside the transaction. If set to true, the index is updated on a separate thread, so the transaction can finish before the index is updated. | no/false
priority               | Priority of the indexing thread if asynchron is true. Must be a value between Thread.MIN_PRIORITY and Thread.MAX_PRIORITY. | no/Thread.NORM_PRIORITY
includes               | A comma-separated list of paths for which indexing should happen. If empty, everything in the store is indexed. | no
optimization-threshold | The number of write accesses to the index after which the index is optimized. | no/100
analyzer
Search for a single word
<D:contains>Word</D:contains>
</S:property-contains>
Extractors
The content indexer will only process resources that match a
content extractor, so don't forget to configure the content
extractors according to your needs. If you want to include text, PDF
and Word documents in your search, your extractor configuration
could look like this:
<!-- Extractor configuration -->
<extractors>
<extractor classname="org.apache.slide.extractor.PDFExtractor"/>
<extractor classname="org.apache.slide.extractor.MSWordExtractor"/>
<extractor
classname="org.apache.slide.extractor.TextContentExtractor"/>
</extractors>
last edited 2005-12-07 09:09:57 by DanielFlorey
---------------------------------------------------------------------
To unsubscribe, e-mail: slide-dev-unsubscribe@jakarta.apache.org
For additional commands, e-mail: slide-dev-help@jakarta.apache.org