List:       tortoisesvn-dev
Subject:    Re: log cache assertion
From:       Stefan.Fuhrmann () etas ! de
Date:       2007-05-24 21:30:13
Message-ID: OFCDCAFC45.CE1E3F4E-ONC12572E5.00728A38-C12572E5.00762392 () etas ! de

Hi Stefan,

> While doing some testing with the cache, I came to an assertion which I 
> don't know how to solve. You are more familiar with the log cache code, 
> so I put together a reproduction recipe for the assertion

There was actually an edge case that I did not handle properly.
My wrong assumption, which the assertion caught, is fixed in r9548.

For the sake of documentation, the following describes the problem:

(1) The log cache stores only data that was (requested and)
    received. Revisions that have not shown up in any log
    response are reported as "data not available". Technically,
    their revision number is mapped to the NO_INDEX revision index.
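
To make the NO_INDEX convention in (1) concrete, here is a minimal Python sketch. It is not the log cache code (which is C++); the class name RevisionIndices, the dict-based storage and the value -1 for NO_INDEX are assumptions made for illustration only.

```python
NO_INDEX = -1  # assumed sentinel value, chosen for illustration


class RevisionIndices:
    """Hypothetical revision -> cache index map (not the real log cache API).

    Only revisions that actually arrived in some log response get an index;
    every other revision maps to NO_INDEX ("data not available").
    """

    def __init__(self):
        self._index_of = {}  # repository revision number -> internal index

    def insert(self, revision):
        # Hand out the next free index the first time a revision is received.
        if revision not in self._index_of:
            self._index_of[revision] = len(self._index_of)
        return self._index_of[revision]

    def lookup(self, revision):
        # Revisions never received are reported as NO_INDEX.
        return self._index_of.get(revision, NO_INDEX)
```

After receiving r10 and r12, lookup(11) yields NO_INDEX: the cache simply cannot tell whether r11 touched anything.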

(2) Within the log cache, paths are represented as indices
    to an entry in a/the CPathDictionary instance. To reduce
    the amount of data to store, only those paths that are
    mentioned in a change list of a cached revision are put
    into the path dictionary.

    "Mentioned" means appearing as a changed path, as a
    copy-from path, or as a parent of either. This info is
    sufficient to create all possible logs from a complete
    log of the repository root.
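
A sketch of how "mentioned" paths and all of their parents could end up in such a dictionary. Storing each path as a (parent index, name) pair, the names PathDictionary, insert and find, and NO_INDEX = -1 are assumptions for illustration, not CPathDictionary's actual interface.

```python
NO_INDEX = -1  # assumed "not in dictionary" marker


class PathDictionary:
    """Hypothetical path dictionary: each path is stored once as (parent index, name)."""

    def __init__(self):
        self._entries = [(NO_INDEX, "")]    # index 0 is the repository root "/"
        self._lookup = {(NO_INDEX, ""): 0}  # (parent index, name) -> index

    def insert(self, path):
        # Walk down from the root, adding every missing parent on the way,
        # so inserting a path implicitly inserts all of its parents.
        index = 0
        for name in filter(None, path.split("/")):
            key = (index, name)
            if key not in self._lookup:
                self._lookup[key] = len(self._entries)
                self._entries.append(key)
            index = self._lookup[key]
        return index

    def find(self, path):
        # Same walk, but read-only: any unknown element yields NO_INDEX.
        index = 0
        for name in filter(None, path.split("/")):
            index = self._lookup.get((index, name), NO_INDEX)
            if index == NO_INDEX:
                return NO_INDEX
        return index

    def __len__(self):
        return len(self._entries)
```

Calling insert("/trunk/src/a.cpp") creates four entries (/, /trunk, /trunk/src and the file). Afterwards find("/trunk") succeeds while find("/trunk/src/b.cpp") still returns NO_INDEX, because that path was never mentioned.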

(3) When receiving a log for e.g. /branches/1.4.x, there will
    be large "gaps" of uncovered revisions. Unless those gaps
    get filled, there would be no way to reproduce the log
    for /branches/1.4.x from the cache some time later, because
    the gaps might contain "anything", including changes
    affecting /branches/1.4.x.

    Problem: naively, we would not "remember" that they can't
    contain such changes.

    In fact, we *do* remember (i.e. store) that information.
    It's in the CSkipRevisionInfo instance: For every log
    we received, it stores the path along with the list of 
    "gaps" in the revision list received.

    An entry means that for this path and all its children
    there will be no log information in the respective
    revision range. Log iterators use that information to
    skip revisions that are not in cache.

    Optimization methods reduce the CSkipRevisionInfo by
    removing redundant info (sub-path entries that a parent
    path entry already covers; ranges that are covered by
    cached log info).
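
The following Python sketch shows one way to record the gaps of a log response, to answer "may the iterator skip this revision?" for a path and its children, and to apply the two optimizations. The names (find_gaps, parent_of, SkipRevisionInfo, add_log, can_skip, optimize), keying entries by path string instead of path index, and my reading of the two optimization rules are assumptions; the real CSkipRevisionInfo may well differ in detail.

```python
def find_gaps(first, last, revisions):
    """Inclusive (start, end) ranges inside [first, last] holding none of `revisions`."""
    gaps = []
    start = first
    for rev in sorted(r for r in set(revisions) if first <= r <= last):
        if rev > start:
            gaps.append((start, rev - 1))
        start = rev + 1
    if start <= last:
        gaps.append((start, last))
    return gaps


def parent_of(path):
    """'/a/b' -> '/a', '/a' -> '/', '/' -> '/' (absolute paths assumed)."""
    return path.rsplit("/", 1)[0] or "/"


class SkipRevisionInfo:
    """Hypothetical skip store: path -> inclusive revision ranges.

    An entry promises that neither the path nor any of its children
    changed in those revisions, so a log iterator may step over them.
    """

    def __init__(self):
        self.ranges = {}

    def add_log(self, path, first, last, received):
        # A log for `path` over [first, last] returned `received`;
        # every revision in between is a gap for this path.
        gaps = find_gaps(first, last, received)
        if gaps:
            self.ranges.setdefault(path, []).extend(gaps)

    def can_skip(self, path, rev):
        # Ranges stored for the path itself or for any parent apply.
        while True:
            if any(a <= rev <= b for a, b in self.ranges.get(path, ())):
                return True
            parent = parent_of(path)
            if parent == path:
                return False
            path = parent

    def optimize(self, cached):
        """Drop redundant ranges; `cached` = revisions present in the cache."""
        for path in list(self.ranges):
            kept = []
            for a, b in self.ranges[path]:
                # Cached revisions carry their own change lists: cut them out.
                for lo, hi in find_gaps(a, b, cached):
                    # Redundant if a parent path already skips all of it
                    # (brute force over the range; fine for a sketch).
                    if path != "/" and all(
                        self.can_skip(parent_of(path), r) for r in range(lo, hi + 1)
                    ):
                        continue
                    kept.append((lo, hi))
            if kept:
                self.ranges[path] = kept
            else:
                del self.ranges[path]
```

For example, a log for /branches/1.4.x over r1..r10 that returned r3 and r8 yields the skip ranges r1-r2, r4-r7 and r9-r10, and those ranges then also cover /branches/1.4.x/src/main.c.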

(4) Skip revision info is persisted in the log cache file.
    Therefore, only paths available in the path dictionary (2)
    can be represented.

(5) In your example, file1 got moved around quite a bit
    but never showed up in the log on those new places.
    Therefore, no cache-based path is available for it.
    Hence, we cannot add the skip info for it.

    We also can't just store it with its parent path
    because that might hide changes in file2, for instance
    (depending on the completeness in (1)).
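
The edge case from (4) and (5) can be reproduced in a few lines. In this hypothetical sketch the set known_paths stands in for the path dictionary, and /dir/file1-moved is a made-up name for one of file1's new locations. Refusing the entry (returning False) is only an assumed, conservative handling; this message does not say what r9548 actually does.

```python
NO_INDEX = -1  # assumed "path not in dictionary" marker


class PersistentSkipInfo:
    """Hypothetical skip store keyed by path-dictionary index, as in (4).

    `known_paths` stands in for the path dictionary: only paths mentioned in
    some cached change list (plus their parents) have an index.
    """

    def __init__(self, known_paths):
        self.index_of = {p: i for i, p in enumerate(sorted(known_paths))}
        self.ranges = {}  # path index -> list of inclusive (first, last)

    def add(self, path, first, last):
        index = self.index_of.get(path, NO_INDEX)
        if index == NO_INDEX:
            # The edge case from (5): this entry cannot be persisted.
            # Dropping it (instead of asserting) is an assumed, conservative
            # choice: the gap stays "unknown" and is never skipped.
            return False
        self.ranges.setdefault(index, []).append((first, last))
        return True

    def can_skip(self, path, rev):
        # Walk up from `path` to the root; only indexed paths carry ranges.
        while True:
            index = self.index_of.get(path, NO_INDEX)
            if index != NO_INDEX and any(a <= rev <= b for a, b in self.ranges.get(index, ())):
                return True
            parent = path.rsplit("/", 1)[0] or "/"
            if parent == path:
                return False
            path = parent
```

With the entry refused, can_skip("/dir/file2", 7) stays False. Had the range been attached to /dir instead, the same call would return True, and any change to file2 in r7 would be hidden from the log.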

O.k. that's enough for today. But since I can be quite
unresponsive at times, it will help to get others up to
speed soon ...

-- Stefan^2.
