List:       nutch-general
Subject:    Re: [Nutch-general] error merger index
From:       "Enzo Michelangeli" <enzomich () gmail ! com>
Date:       2007-07-30 0:05:40
Message-ID: 008901c7d23d$780b1a20$0800a8c0 () EMLT

----- Original Message ----- 
From: "Le Quoc Anh" <quocanh263@gmail.com>
Sent: Sunday, July 29, 2007 5:14 PM

> Hi everyone,
>
> When I recrawl, I have to delete the indexes and index files and re-create
> the index. If I index only the segments that I have just fetched and merge
> them with the existing index, I get the error "index/merge-output exists".
> Can anyone help me?
>
> Thanks a lot,
>
> Quoc Anh

Which command lines are you using? This works for me (in a non-distributed
environment), where the old index is in crawl/indexes and the newly created
one is in crawl/indexes_new:

errexit() {
 # save the status first: the $(date) substitution below resets $? to 0
 status=$?
 echo "## $(date): *** LAST COMMAND RETURNED NONZERO STATUS: $status ***"
 exit 1
}

[...]
echo "## $(date): Starting to merge new indexes into old..."
MERGE_DATE=$(date +%Y%m%d%H%M)
bin/nutch merge crawl/indexes_merged/${MERGE_DATE} \
                crawl/indexes crawl/indexes_new || errexit

# in nutch-site, hadoop.tmp.dir points to crawl/tmp
rm -rf crawl/tmp/*

# create the index.done flag
touch crawl/indexes_merged/${MERGE_DATE}/index.done

# delete indexes_new and replace indexes with indexes_merged
rm -rf crawl/indexes_new
mv crawl/indexes crawl/indexes_old
mv crawl/indexes_merged crawl/indexes
rm -rf crawl/indexes_old

echo "## $(date): Re-indexing completed, making webapp aware of that..."
touch -c /usr/share/tomcat5/webapps/nutch*/WEB-INF/web.xml
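
The directory swap near the end is the fragile part: if one of the mv
commands fails halfway, there may be no crawl/indexes left at all. A minimal
sketch of a more defensive swap follows. The function name swap_index and its
two arguments are made up for illustration and are not part of Nutch; the
sketch assumes the same layout as above, with the merged index in a dated
subdirectory of the directory being moved into place.

```shell
# Hypothetical helper, not a Nutch command: replace the live index
# directory with the merged one, keeping a backup until the swap succeeds.
#   $1 = live index directory    (e.g. crawl/indexes)
#   $2 = merged index directory  (e.g. crawl/indexes_merged)
swap_index() {
    live=$1
    merged=$2
    old="${live}_old"

    # Refuse to start unless the merged index really exists.
    [ -d "$merged" ] || { echo "## missing $merged" >&2; return 1; }
    # A leftover backup would make mv nest directories instead of renaming.
    [ ! -e "$old" ] || { echo "## $old already exists" >&2; return 1; }

    # Move the live index aside (if there is one), then move the merged
    # index into its place.
    if [ -d "$live" ]; then
        mv "$live" "$old" || return 1
    fi
    if mv "$merged" "$live"; then
        rm -rf "$old"
    else
        # Roll back so that the webapp still finds an index.
        if [ -d "$old" ]; then
            mv "$old" "$live"
        fi
        return 1
    fi
}
```

With that in place, the two mv lines and the final rm -rf above could be
replaced by: swap_index crawl/indexes crawl/indexes_merged || errexit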


Enzo

