
List:       openldap-devel
Subject:    Re: (ITS#6710) Mods already refreshed on a forwarding server is lost
From:       Rein Tollevik <rein () openldap ! org>
Date:       2010-11-22 22:27:05
Message-ID: 4CEAEE39.2070406 () OpenLDAP ! org

[I have switched this thread from -its to -devel]

On 11/22/10 1:31 , Howard Chu wrote:
> rein@OpenLDAP.org wrote:
>> On 11/15/10 16:39 , rein@OpenLDAP.org wrote:
>>> New syncprov consumers connecting to a forwarding server and presenting
>>> an apparently up-to-date cookie will lose any mods that have already
>>> taken place on the forwarding server if it itself is refreshing from
>>> its provider. This should not be a problem if the forwarding server
>>> has a sufficiently large syncprov log, but a fix for servers without
>>> one is coming.

>> The currently committed fix for this ITS leaves one problem open. If a
>> forwarding server restarts in the middle of the refresh phase, after
>> making some changes but before updating the csn, new consumers
>> connecting with an apparently up-to-date csn set after the server comes
>> up again will not know that the context is dirty and will lose these
>> changes. The same problem arises if the server restarts between the time
>> a locally initiated delete operation is performed in the database and
>> the accompanying csn set is saved.
>>
>> A fix could be to always assume the context is dirty after startup,
>> thereby forcing all clients to undergo the refresh phase, even if they
>> are in sync, until some operation that updates the csn set is performed.
>> An unnecessary refresh is probably better than losing changes.
>
> I haven't had the opportunity to review these patches yet, but the bug
> description sounds a little flaky to me.

This ITS, as well as the previous one, should probably have mentioned
that they relate to refreshAndPersist replication.

> The original design is this: when a consumer requests a refresh from the
> provider, the provider uses a snapshot of its current contextCSN. All
> changes from the consumer's cookie to this snapshot (inclusive) are sent
> to the consumer. Any changes currently in progress that have not yet
> been committed will be skipped until the next refresh. Nothing is lost,
> it's simply delayed, and that's in accordance with syncrepl's "eventual
> convergence" model.

This is true for the refresh-only configuration. I guess these problems
don't show up there, as they will be fixed during the next refresh.  But
if the persist phase is entered without a proper refresh first, then the
consumer has lost the missing changes forever.  Or rather, until the
admin forces a full refresh.

> Likewise, the provider only updates its contextCSN when a change is
> fully committed. syncprov should NOT need to defer any consumer while it
> has outstanding mods. There is no reason that would be needed by the
> sync protocol.

This ITS does not delay the consumer; it forces a refresh even if the
consumer appears to be in sync with the provider (judging by the CSNs
of the provider and the consumer).

ITS#6709 introduces a delay; this is dictated not by the protocol but by
the implementation.  syncprov_matchops() decides which consumers the
change should be sent to before the change is actually made to the db.
This leaves a window where a consumer may connect and apparently be in
sync, enter its persistent phase, but not receive the changes that were
active when it connected.  Waiting for this active change to complete
before deciding whether the consumer is in sync closes this window.
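
The window can be illustrated with a toy model.  This is only a sketch,
not the syncprov code: ToyProvider, begin_change, connect and
finish_change are hypothetical names, and the recipient list taken in
begin_change stands in for what syncprov_matchops() decides before the
change reaches the db.

```python
class ToyProvider:
    """Hypothetical single-threaded model of a provider; not OpenLDAP code."""

    def __init__(self):
        self.committed = []    # changes already written to the database
        self.persisting = []   # consumers in the persist phase (lists of received changes)
        self.in_flight = None  # (change, recipients) for a change not yet written

    def begin_change(self, change):
        # Recipients are picked before the change is written,
        # mirroring the ordering described for syncprov_matchops().
        self.in_flight = (change, list(self.persisting))

    def finish_change(self):
        if self.in_flight is None:
            return
        change, recipients = self.in_flight
        self.committed.append(change)
        for consumer in recipients:
            consumer.append(change)
        self.in_flight = None

    def connect(self, wait_for_in_flight):
        # With the wait, the active change completes before the
        # "is this consumer in sync?" decision is made.
        if wait_for_in_flight:
            self.finish_change()
        # The consumer holds whatever is committed at decision time
        # (a refresh, if one were needed, is folded into this step).
        consumer = list(self.committed)
        self.persisting.append(consumer)
        return consumer


def run(wait_for_in_flight):
    provider = ToyProvider()
    provider.begin_change("mod-1")
    consumer = provider.connect(wait_for_in_flight)
    provider.finish_change()
    return consumer
```

In this model run(False) returns an empty list, i.e. the consumer never
sees mod-1, while run(True) returns a list containing it.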

> Servers only update their contextCSN after an entire refresh has
> completed. If a downstream consumer connects while a forwarder is still
> refreshing, the consumer should receive nothing. (Or, it should only
> receive the changes between its cookie and the server's committed
> contextCSN, if any.)

The consumer connects and appears to be in sync with the provider, so
the provider initiates the persistent phase immediately, without any
refresh first.  Without the fix in this ITS, syncprov had no idea that
there were uncommitted changes (i.e. changes not accompanied by a CSN
update) within its context, changes that require the refresh phase to be
performed even when a consumer that appears to be in sync connects.  And
a restart of the provider at an inopportune time can make it lose this
knowledge again, as I outlined in my first reply.
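
A small sketch of why the restart matters, assuming the dirty marker
lives only in memory while the csn set is what gets saved.  Forwarder,
dirty and assume_dirty_on_start are hypothetical names for illustration,
and CSNs are compared as plain strings for brevity.

```python
class Forwarder:
    """Hypothetical model of a forwarding server's sync state; not OpenLDAP code."""

    def __init__(self, saved_csn, assume_dirty_on_start):
        self.csn = saved_csn                 # survives a restart (saved csn set)
        self.dirty = assume_dirty_on_start   # in-memory only, rebuilt at startup

    def apply_change(self):
        # A change lands in the database before the csn set is updated.
        self.dirty = True

    def save_csn(self, csn):
        # The csn update covers the pending changes.
        self.csn = csn
        self.dirty = False

    def must_refresh(self, consumer_csn):
        return self.dirty or consumer_csn < self.csn


def refresh_after_restart(assume_dirty_on_start):
    before = Forwarder("csn-1", assume_dirty_on_start)
    before.apply_change()                    # change made, csn not yet saved
    # Restart: only the saved csn carries over, the in-memory flag does not.
    after = Forwarder(before.csn, assume_dirty_on_start)
    return after.must_refresh("csn-1")       # consumer presents a matching csn
```

With assume_dirty_on_start set to False the restarted server sees
matching CSNs and skips the refresh; with True the refresh is forced
until the next save_csn() clears the flag.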

Yes, in refresh-only mode the server sends the changes between the
cookie the consumer presented and the server's committed csn.  But in
the refresh phase of refreshAndPersist mode it sends all changes newer
than the consumer's csn.  That opens the window in which the ITS#6709
and ITS#6710 cases can lose modifications.
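
The two selection rules described above can be written side by side.
This is a sketch of the description in this message only, not of the
syncprov search filter; the function names are hypothetical and CSNs are
modeled as strings that sort in commit order.

```python
def refresh_only_changes(change_csns, cookie_csn, committed_csn):
    """Changes after the consumer's cookie, up to and including the committed csn."""
    return [c for c in change_csns if cookie_csn < c <= committed_csn]


def refresh_and_persist_changes(change_csns, cookie_csn):
    """Every change newer than the consumer's csn, with no upper bound."""
    return [c for c in change_csns if c > cookie_csn]
```

The two differ only for entries stamped with a CSN newer than the
server's committed one, which is the in-progress case discussed here.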

> You may very well have found bugs in the implementation, but it sounds
> to me like you've changed the overall behavior to something outside of
> the original design.

No, I don't think I have changed the design; if it was changed, then it
was the persistent mode that did it.

Rein