[prev in list] [next in list] [prev in thread] [next in thread] 

List:       kde-kimageshop
Subject:    Re: Tile data management changed in Krita
From:       Fazekas_László <mneko () freemail ! hu>
Date:       2016-03-22 19:15:21
Message-ID: 56F199C9.1030708 () freemail ! hu
[Download RAW message or body]

[Attachment #2 (multipart/alternative)]


I made a testbed to check these and really hardly trying to crash Krita :)
So far I have to say you're right, it not happens. At least I found some 
other bugs to solve.

Fazek

2016-03-22 09:31 keltezéssel, Dmitry Kazakov írta:
> On Tue, Mar 22, 2016 at 10:41 AM, Fazekas László <mneko@freemail.hu> wrote:
>
>> Hello Dmitry,
>>
>> The part I have problem here is the deletion of the tile inside the
>> deref() call. See the following case:
>>
>> 1. There is a tile with a single owner, the owner thread protected it with
>> a ref() call.
>> The reference counter is 1.
>>
>> 2. A second thread gets a pointer to the tile to use it. Right after that,
>> the task scheduler pauses this thread.
>>
> Here we need to consider what "single owned" and "other thread" are.
>
> Single owners that release the tile data can be:
>
> A1) KisTile::lockForWrite()
> A2) ~KisTile
> A3) ~KisMementoItem()
>
> "Other thread" can be:
>
> B1) KisTile::lockForWrite()
> B2) KisTile::lockForRead()
> B3) KisTile::init()
> B4) KisTileDataPooler
> B5) KisTileDataSwapper
> B6) KisMementoItem::tile()
>
> Cases B4 and B5 are not affected, because they hold a KisTileDataStore
> lock. No tile data can be deleted with that.
> Case B6 is also not relevant, because m_tileData is already owned by the
> memento item.
>
> Combination of A1 and (B1 or B2) is guarded by
> KisTile::safeReleaseOldTileData(). No tile data will be released before
> *all* the locks are unlocked.
>
> B3 is usually called either within a source tile lock held or for a default
> tile data, which is stored in KisTiledDataManager
>
> Combining A2 and A3 with anything from the second list --- I cannot imagine
> how it is possible. All tile read/write accesses should be done with a lock
> held, so the destructor of a tile can not be called while something else is
> still accessing its data.
>
>
>
>> I agree this is very unlikely but I think it is theoretically possible.
>>
> As far as I remember the higher level structures guard us from this scenario
>
>
>> If there are thousands of ref() and deref() calls per second it can
>> happen.  The tile data object is on the heap so perhaps it is accessible
>> after its destruction without a segfault. So even if this happens, perhaps
>> it remains hidden. The null data pointer means the tile was swapped out so
>> perhaps the program can't notice if it's using a deleted tile. I don't
>> really know what happens on the heap if you delete an object twice.
>>
> If you delete the object twice, there will be a crash with 99% probability.
> We use a boost::pool for allocating tile data objects. I guess it has the
> same guards from it.
>
>
>> To be sure about it, I think the following solution may helps: let the
>> initial value of the reference counter be 1 (I think it already becomes one
>> right after its creation). Then if the ref() call encounters a zero
>> reference counter it means the tile was deleted so it can drop an error
>> message. It is not a real overhead since it only checks the reply value of
>> the Qt ref() call. Then we can see if this bug happens or not.
>>
> You can check it by adding Q_ASSERT into  KisTileData::deref() like this:
>
> inline bool KisTileData::deref() {
>      bool _ref;
>
>      if (!(_ref = m_refCount.deref())) {
>          m_store->freeTileData(this);
>          return 0;
>      }
>
>      Q_ASSERT(_ref > 0);
>
>      return _ref;
> }
>
> But this code is for testing only, not to be pushed into master.
>
>
>
>> Fazek
>>
>>
>>
>> 2016-03-22 07:54 keltezéssel, Dmitry Kazakov írta:
>>
>> Hi, Fazekas!
>>
>>
>> I know it's dangerous. In my opinion Krita is still very unstable so I can
>>
>> totally imagine this bug happens sometimes. There are just too many crashes.
>>
>>
>> For the ref() and deref() calls, I still think it's not a safe way to
>> allocate and delete things. Because there is an unprotected time gap
>> between accessing and protecting the tile. You can build extra protection
>> around this, but multi threading is very tricky sometimes. I know these
>> calls must be highly regulated by the lockings, but I'm also sure the
>> deref() call is not protected - it cant be - by the lock I tried to avoid
>> in my second commit.
>>
>>
>> Well, this code is here for almost five years now and there was not a
>> single backtrace pointing to this place. Could you please provide *at least
>> theoretical* sequence of multithreaded calls that could cause a problem
>> here? Most of the acquire() and ref() calls happen when at least one object
>> (mostly (*this) of a KisTile) still owns at least one ref+userCount to the
>> KisTileData. Or (*this) of KisMementoItem holds the ref counter. I know
>> that this is more a C-way of coding, but take care, that code is called
>> million times per second when working with 10k images. So calling ref(),
>> deref() on every function call is not a go. Memory barrier is not a cheap
>> operation, so we should avoid it if possible.
>>
>> I made some tests to see how the different threads are calling these
>>
>> functions and it happens very often in an asynchronous way.
>>
>>
>> Yes, this part is called from multiple threads all the time.
>>
>>
>>
>>
>> Perhaps I can make a test to force this bug for you, but it's not easy,
>> since it would happen only if the threads are synchronized exactly right.
>> And my brain hurts if I'm trying to imagine what happening in the different
>> threads simultanously. So for now, I used the fact that a solution without
>> time gaps is surely better.
>>
>>
>> I cannot understand what "gaps" you mean. Please elaborate a bit, then we
>> will be able to find a solution if needed. Doing a test might be a bit
>> difficult, but if you decide to do it, you can use
>> KisTiledDataManagerTest::stressTest() for a base.
>>
>>
>>
>>
>> I noticed the test directory, but I don't know how to include the
>> unittests for building so I left the test part untouched. But I'd like to
>> make the test codes too, even for my csv plugin so how can I build the
>> unittests actually?
>>
>>
>> You should pass -DBUILD_TESTING=ON to a cmake command.
>>
>>
>>
>>
>>
>> _______________________________________________
>> Krita mailing listkimageshop@kde.orghttps://mail.kde.org/mailman/listinfo/kimageshop
>>
>>
>>
>> _______________________________________________
>> Krita mailing list
>> kimageshop@kde.org
>> https://mail.kde.org/mailman/listinfo/kimageshop
>>
>>
>
>
>
> _______________________________________________
> Krita mailing list
> kimageshop@kde.org
> https://mail.kde.org/mailman/listinfo/kimageshop


[Attachment #5 (text/html)]

<html>
  <head>
    <meta content="text/html; charset=utf-8" http-equiv="Content-Type">
  </head>
  <body bgcolor="#FFFFFF" text="#000000">
    <div class="moz-cite-prefix">I made a testbed to check these and
      really hardly trying to crash Krita :)<br>
      So far I have to say you're right, it not happens. At least I
      found some other bugs to solve.<br>
      <br>
      Fazek<br>
      <br>
      2016-03-22 09:31 keltezéssel, Dmitry Kazakov írta:<br>
    </div>
    <blockquote
cite="mid:CAEkBSfU9md3uAjC4Es3HSHdNzOn9jXWQn6JH-BwEsmp9P6Gyiw@mail.gmail.com"
      type="cite">
      <pre wrap="">On Tue, Mar 22, 2016 at 10:41 AM, Fazekas László <a \
class="moz-txt-link-rfc2396E" \
href="mailto:mneko@freemail.hu">&lt;mneko@freemail.hu&gt;</a> wrote:

</pre>
      <blockquote type="cite">
        <pre wrap="">Hello Dmitry,

The part I have problem here is the deletion of the tile inside the
deref() call. See the following case:

1. There is a tile with a single owner, the owner thread protected it with
a ref() call.
The reference counter is 1.

2. A second thread gets a pointer to the tile to use it. Right after that,
the task scheduler pauses this thread.

</pre>
      </blockquote>
      <pre wrap="">
Here we need to consider what "single owned" and "other thread" are.

Single owners that release the tile data can be:

A1) KisTile::lockForWrite()
A2) ~KisTile
A3) ~KisMementoItem()

"Other thread" can be:

B1) KisTile::lockForWrite()
B2) KisTile::lockForRead()
B3) KisTile::init()
B4) KisTileDataPooler
B5) KisTileDataSwapper
B6) KisMementoItem::tile()

Cases B4 and B5 are not affected, because they hold a KisTileDataStore
lock. No tile data can be deleted with that.
Case B6 is also not relevant, because m_tileData is already owned by the
memento item.

Combination of A1 and (B1 or B2) is guarded by
KisTile::safeReleaseOldTileData(). No tile data will be released before
*all* the locks are unlocked.

B3 is usually called either within a source tile lock held or for a default
tile data, which is stored in KisTiledDataManager

Combining A2 and A3 with anything from the second list --- I cannot imagine
how it is possible. All tile read/write accesses should be done with a lock
held, so the destructor of a tile can not be called while something else is
still accessing its data.



</pre>
      <blockquote type="cite">
        <pre wrap="">I agree this is very unlikely but I think it is theoretically \
possible.

</pre>
      </blockquote>
      <pre wrap="">
As far as I remember the higher level structures guard us from this scenario


</pre>
      <blockquote type="cite">
        <pre wrap="">If there are thousands of ref() and deref() calls per second it \
can happen.  The tile data object is on the heap so perhaps it is accessible
after its destruction without a segfault. So even if this happens, perhaps
it remains hidden. The null data pointer means the tile was swapped out so
perhaps the program can't notice if it's using a deleted tile. I don't
really know what happens on the heap if you delete an object twice.

</pre>
      </blockquote>
      <pre wrap="">
If you delete the object twice, there will be a crash with 99% probability.
We use a boost::pool for allocating tile data objects. I guess it has the
same guards from it.


</pre>
      <blockquote type="cite">
        <pre wrap="">To be sure about it, I think the following solution may helps: \
let the initial value of the reference counter be 1 (I think it already becomes one
right after its creation). Then if the ref() call encounters a zero
reference counter it means the tile was deleted so it can drop an error
message. It is not a real overhead since it only checks the reply value of
the Qt ref() call. Then we can see if this bug happens or not.

</pre>
      </blockquote>
      <pre wrap="">
You can check it by adding Q_ASSERT into  KisTileData::deref() like this:

inline bool KisTileData::deref() {
    bool _ref;

    if (!(_ref = m_refCount.deref())) {
        m_store-&gt;freeTileData(this);
        return 0;
    }

    Q_ASSERT(_ref &gt; 0);

    return _ref;
}

But this code is for testing only, not to be pushed into master.



</pre>
      <blockquote type="cite">
        <pre wrap="">
Fazek



2016-03-22 07:54 keltezéssel, Dmitry Kazakov írta:

Hi, Fazekas!


I know it's dangerous. In my opinion Krita is still very unstable so I can

totally imagine this bug happens sometimes. There are just too many crashes.


For the ref() and deref() calls, I still think it's not a safe way to
allocate and delete things. Because there is an unprotected time gap
between accessing and protecting the tile. You can build extra protection
around this, but multi threading is very tricky sometimes. I know these
calls must be highly regulated by the lockings, but I'm also sure the
deref() call is not protected - it cant be - by the lock I tried to avoid
in my second commit.


Well, this code is here for almost five years now and there was not a
single backtrace pointing to this place. Could you please provide *at least
theoretical* sequence of multithreaded calls that could cause a problem
here? Most of the acquire() and ref() calls happen when at least one object
(mostly (*this) of a KisTile) still owns at least one ref+userCount to the
KisTileData. Or (*this) of KisMementoItem holds the ref counter. I know
that this is more a C-way of coding, but take care, that code is called
million times per second when working with 10k images. So calling ref(),
deref() on every function call is not a go. Memory barrier is not a cheap
operation, so we should avoid it if possible.

I made some tests to see how the different threads are calling these

functions and it happens very often in an asynchronous way.


Yes, this part is called from multiple threads all the time.




Perhaps I can make a test to force this bug for you, but it's not easy,
since it would happen only if the threads are synchronized exactly right.
And my brain hurts if I'm trying to imagine what happening in the different
threads simultanously. So for now, I used the fact that a solution without
time gaps is surely better.


I cannot understand what "gaps" you mean. Please elaborate a bit, then we
will be able to find a solution if needed. Doing a test might be a bit
difficult, but if you decide to do it, you can use
KisTiledDataManagerTest::stressTest() for a base.




I noticed the test directory, but I don't know how to include the
unittests for building so I left the test part untouched. But I'd like to
make the test codes too, even for my csv plugin so how can I build the
unittests actually?


You should pass -DBUILD_TESTING=ON to a cmake command.





_______________________________________________
Krita mailing <a class="moz-txt-link-abbreviated" \
href="mailto:listkimageshop@kde.orghttps://mail.kde.org/mailman/listinfo/kimageshop">listkimageshop@kde.orghttps://mail.kde.org/mailman/listinfo/kimageshop</a>




_______________________________________________
Krita mailing list
<a class="moz-txt-link-abbreviated" \
href="mailto:kimageshop@kde.org">kimageshop@kde.org</a> <a \
class="moz-txt-link-freetext" \
href="https://mail.kde.org/mailman/listinfo/kimageshop">https://mail.kde.org/mailman/listinfo/kimageshop</a>



</pre>
      </blockquote>
      <pre wrap="">

</pre>
      <br>
      <fieldset class="mimeAttachmentHeader"></fieldset>
      <br>
      <pre wrap="">_______________________________________________
Krita mailing list
<a class="moz-txt-link-abbreviated" \
href="mailto:kimageshop@kde.org">kimageshop@kde.org</a> <a \
class="moz-txt-link-freetext" \
href="https://mail.kde.org/mailman/listinfo/kimageshop">https://mail.kde.org/mailman/listinfo/kimageshop</a>
 </pre>
    </blockquote>
    <br>
  </body>
</html>


[Attachment #6 (text/plain)]

_______________________________________________
Krita mailing list
kimageshop@kde.org
https://mail.kde.org/mailman/listinfo/kimageshop


[prev in list] [next in list] [prev in thread] [next in thread] 

Configure | About | News | Add a list | Sponsored by KoreLogic