
List:       nutch-cvs
Subject:    [Nutch-cvs] nutch/src/plugin/parse-rtf README.txt,NONE,1.1 build.xml,NONE,1.1 plugin.xml,NONE,1.1
From:       John <johnnx () users ! sourceforge ! net>
Date:       2004-09-29 5:22:18
Message-ID: E1CCWuw-0002pM-IC () sc8-pr-cvs1 ! sourceforge ! net

Update of /cvsroot/nutch/nutch/src/plugin/parse-rtf
In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv10432/src/plugin/parse-rtf

Added Files:
	README.txt build.xml plugin.xml 
Log Message:
Added plugin parse-rtf, contributed by Andy Hedges.


--- NEW FILE: build.xml ---
<?xml version="1.0"?>

<project name="parse-rtf" default="jar">

  <import file="../build-plugin.xml"/>

  <!-- for junit test -->
  <mkdir dir="${build.test}/data"/>
  <copy file="sample/test.rtf" todir="${build.test}/data"/>
</project>

--- NEW FILE: README.txt ---
Prereqs: JDK 1.4+ and javacc version 3.2+

This document describes how to create the rtf-parser.jar file used by Nutch.

Source files are contained in:

http://www.cobase.cs.ucla.edu/pub/javacc/rtf_parser_src.jar

Create a new directory containing the following files:

	LICENCE
	RTFParser.jj
	RTFParserDelegate.java

cd into this new directory and create a src directory
	
	$mkdir src
	
copy RTFParser.jj and RTFParserDelegate.java into this src directory

	$cp RTFParser.jj RTFParserDelegate.java src/
	
now cd into this src directory and generate the javacc classes for the parser
and then cd out again

	$cd src
	$javacc RTFParser.jj
	$cd ..
	
now compile all the source and generated files

	$javac -d . src/*.java
	
(optional) remove the generated source

	$rm -rf src # (optional)
	
finally create the jar archive of all the salient files

	$jar -cvf rtf-parser.jar com/ LICENCE RTFParser*
	
--Andy Hedges

Credits:

Thanks to Eric Friedman for writing this javacc grammar file.
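
The build steps above could be collected into one script. This is only a sketch, not part of the contributed plugin: it assumes javacc, javac and jar are on the PATH and that LICENCE, RTFParser.jj and RTFParserDelegate.java are already in the target directory. The names build_rtf_parser_jar, run and DRY_RUN are hypothetical helpers introduced for illustration.

```shell
#!/bin/sh
# Sketch of the README steps as one function (illustrative names only).

# Run a command, or only print it when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Build rtf-parser.jar in the directory given as $1 (default: current dir).
# The optional "rm -rf src" step from the README is left out here.
build_rtf_parser_jar() {
    (
        cd "${1:-.}" || exit 1
        run mkdir -p src &&
        run cp RTFParser.jj RTFParserDelegate.java src/ &&
        # Generate the parser classes next to the grammar file.
        run sh -c 'cd src && javacc RTFParser.jj' &&
        # Compile hand-written and generated sources into the current dir.
        run javac -d . src/*.java &&
        run jar -cvf rtf-parser.jar com/ LICENCE RTFParser*
    )
}
```

Calling `DRY_RUN=1 build_rtf_parser_jar /path/to/dir` prints the commands without running them; calling it without DRY_RUN runs the build and stops at the first failing step.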


--- NEW FILE: plugin.xml ---
<?xml version = '1.0' encoding = 'UTF-8'?>
<plugin version="1.0.0" provider-name="nutch.org" id="parse-rtf" name="RTF Parse Plug-in" >
  <extension-point id="net.nutch.parse.Parser" name="Nutch Content Parser" />
  <runtime>
    <library name="parse-rtf.jar" >
      <export name="*" />
    </library>
    <library name="rtf-parser.jar"/>
  </runtime>
  <extension point="net.nutch.parse.Parser" id="net.nutch.parse.rtf" name="RTFParse" >
    <implementation class="net.nutch.parse.rtf.RTFParseFactory" pathSuffix="rtf"
                    id="net.nutch.parse.rtf.RTFParseFactory" contentType="application/rtf" />
  </extension>
</plugin>


