
List:       wikitech-l
Subject:    Re: [Wikitech-l] Recent changes, notifications & pageprops
From:       Stas Malyshev <smalyshev () wikimedia ! org>
Date:       2016-09-23 20:09:37
Message-ID: 4253d936-7b1d-1e43-1a11-e56a2227a2a3 () wikimedia ! org

Hi!

> You can seek back on EventBus events, but not permanently (by default, only
> up to 1 week).  If you want to respond to changes in an event stream, you

1 week is not enough for this use case, but if it could be extended to,
say, 1 month, that could be workable.

The reason is that the starting point for a WDQS server install is the
Wikidata dump, which is made weekly. The server then catches up on the
data that changed between the dump point and the current moment.
However, dump failures or other conditions may make the most recent
dump unusable, and loading the dump itself also takes time. So the
delta between the current moment and the data in a freshly deployed
WDQS server could be 2 weeks or even more. We need to be able to catch
up on the changes since then. We will probably never need the full
month, but it's the conservative limit we're using now for how far back
we can ask for data. 2 weeks would probably work too, even if it could
make some scenarios more complicated to handle.
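
To make the arithmetic concrete, here is a rough sketch of how the
required lookback adds up. Only the weekly dump interval comes from the
description above; the function names, the number of failed dumps and
the load time are hypothetical inputs for illustration, not
measurements.

```python
from datetime import timedelta

# Dumps are made weekly (stated above).
DUMP_INTERVAL = timedelta(weeks=1)


def required_lookback(failed_dumps: int, load_time: timedelta) -> timedelta:
    """Worst-case age of the data on a freshly deployed server.

    failed_dumps: how many of the most recent dumps were unusable
                  (hypothetical input).
    load_time:    how long loading the dump takes (hypothetical input).
    """
    # The newest usable dump may itself be almost one interval old, and
    # every unusable dump pushes the starting point back one more interval.
    dump_age = DUMP_INTERVAL * (failed_dumps + 1)
    return dump_age + load_time


def retention_is_enough(retention: timedelta, failed_dumps: int,
                        load_time: timedelta) -> bool:
    """True if the event store keeps events long enough to catch up."""
    return required_lookback(failed_dumps, load_time) <= retention
```

With an assumed load time of 2 days and no failed dumps, the lookback
is already 9 days, so a 1-week retention falls short; one failed dump
brings it to 16 days, which a 1-month retention still covers.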

> should consume the full event stream realtime and react to the events as
> they come in.  A proper Stream Processing system (like Flink or Spark

This is not possible for the WDQS Updater. Since a WDQS server is
completely independent of Wikidata, it can be started and stopped at
any time. There's no way to ensure that every time something changes
in Wikidata, all WDQS instances interested in that change are up and
running. There needs to be an intermediary system that keeps the data.
So far the recent changes API has served as this system, but since it
does not know about secondary data, it's no longer enough.

> this stream will be relatively small, and you don't need fancy features
> like time based windowing.  You just need to update something based on an
> event, right?

Well, I need something event-based that I can ask a question like:
"give me all events that happened since time point T", with T being
anywhere from, say, a second ago to 2 weeks ago.
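
As a minimal sketch of the kind of interface meant here, assuming an
in-memory store with a fixed retention window: the class and method
names below (RetainedEventLog, append, since) are hypothetical and are
not part of EventBus, change-propagation or the recent changes API.

```python
import bisect
from dataclasses import dataclass, field
from datetime import datetime, timedelta


@dataclass
class RetainedEventLog:
    """Keeps events for `retention` and answers "everything since T"."""

    retention: timedelta
    _times: list = field(default_factory=list, init=False)
    _events: list = field(default_factory=list, init=False)

    def append(self, when: datetime, event) -> None:
        # Assumption: events arrive in timestamp order.
        self._times.append(when)
        self._events.append(event)
        self._expire(when)

    def _expire(self, now: datetime) -> None:
        # Drop everything older than the retention window.
        cutoff = now - self.retention
        drop = bisect.bisect_left(self._times, cutoff)
        del self._times[:drop]
        del self._events[:drop]

    def since(self, t: datetime, now: datetime) -> list:
        # Refuse to answer for a T we can no longer cover completely,
        # so that a consumer never silently misses changes.
        if t < now - self.retention:
            raise LookupError("T is older than the retention window")
        start = bisect.bisect_left(self._times, t)
        return self._events[start:]
```

A consumer that was down for a while would call since() with the
timestamp of the last change it processed. The explicit error for a T
outside the window is a design choice in this sketch: it tells the
consumer that it has to start over from a fresh dump.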

> The change-propagation service that the Services team is building can help
> you with this.  It allows you to consume events, and specify matching rules
> and actions to take based on those rules.
> 
> https://www.mediawiki.org/wiki/Change_propagation

I see no mention of the ability to consume past events. Is it possible?

-- 
Stas Malyshev
smalyshev@wikimedia.org

_______________________________________________
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l