[prev in list] [next in list] [prev in thread] [next in thread] 

List:       hadoop-commits
Subject:    =?utf-8?q?=5BHadoop_Wiki=5D_Trivial_Update_of_=22Hbase/PoweredBy=22_by_Ot?= =?utf-8?q?isGospodnetic?
From:       Apache Wiki <wikidiffs () apache ! org>
Date:       2012-05-25 21:54:04
Message-ID: 20120525215404.23902.24068 () eos ! apache ! org
[Download RAW message or body]

Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Hadoop Wiki" for change \
notification.

The "Hbase/PoweredBy" page has been changed by OtisGospodnetic:
http://wiki.apache.org/hadoop/Hbase/PoweredBy?action=diff&rev1=74&rev2=75

Comment:
Removed SubRecord project - it's dead

  
  [[http://www.stumbleupon.com/|Stumbleupon]] and [[http://su.pr|Su.pr]] use HBase as \
a real time data storage and analytics platform. Serving directly out of HBase, \
various site features and statistics are kept up to date in a real time fashion. We \
also use HBase a map-reduce data source to overcome traditional query speed limits in \
MySQL.  
- [[http://www.subrecord.org|SubRecord Project]] is an Open Source project that is \
using HBase as a repository of records (persisted map-like data) for the aspects it \
provides like logging, tracing or metrics. HBase and Lucene index both constitute a \
                repo/storage for this platform.
- 
  [[http://www.tokenizer.org|Shopping Engine at Tokenizer]] is a web crawler; it uses \
HBase to store URLs and Outlinks (!AnchorText + LinkedURL): more than a billion. It \
was initially designed as Nutch-Hadoop extension, then (due to very specific \
'shopping' scenario) moved to SOLR + MySQL(InnoDB) (ten thousands queries per \
second), and now - to HBase. HBase is significantly faster due to: no need for huge \
transaction logs, column-oriented design exactly matches 'lazy' business logic, data \
compression, !MapReduce support. Number of mutable 'indexes' (term from RDBMS) \
significantly reduced due to the fact that each 'row::column' structure is physically \
sorted by 'row'. MySQL InnoDB engine is best DB choice for highly-concurrent updates. \
However, necessity to flash a block of data to harddrive even if we changed only few \
bytes is obvious bottleneck. HBase greatly helps: not-so-popular in modern DBMS \
'delete-insert', 'mutable primary key', and 'natural primary key' patterns become a \
big advantage with HBase.  
  [[http://traackr.com/|Traackr]] uses HBase to store and serve online influencer \
data in real-time. We use MapReduce to frequently re-score our entire data set as we \
keep updating influencer metrics on a daily basis.


[prev in list] [next in list] [prev in thread] [next in thread] 

Configure | About | News | Add a list | Sponsored by KoreLogic