List:       python-list
Subject:    Re: collect data using threads
From:       Jeremy Jones <zanesdad () bellsouth ! net>
Date:       2005-06-14 16:15:57
Message-ID: 42AF02BD.9080809 () bellsouth ! net

Kent Johnson wrote:

> Peter Hansen wrote:
> 
> 
> > Qiangning Hong wrote:
> > 
> > 
> > 
> > > A class Collector, it spawns several threads to read from serial port.
> > > Collector.get_data() will get all the data they have read since last
> > > call.  Who can tell me whether my implementation correct?
> > > 
> > > 
> > [snip sample with a list]
> > 
> > 
> > 
> > > I am not very sure about the get_data() method.  Will it cause data lose
> > > if there is a thread is appending data to self.data at the same time?
> > > 
> > > 
> > That will not work, and you will get data loss, as Jeremy points out.
> > 
> > Normally Python lists are safe, but your key problem (in this code) is 
> > that you are rebinding self.data to a new list!  If another thread calls 
> > on_received() just after the line "x = self.data" executes, then the new 
> > data will never be seen.
> > 
> > 
> 
> Can you explain why not? self.data is still bound to the same list as x. At least
> if the execution sequence is
>
>     x = self.data
>     self.data.append(a_piece_of_data)
>     self.data = []
> 
> ISTM it should work.
> 
> I'm not arguing in favor of the original code, I'm just trying to understand your
> specific failure mode.
> Thanks,
> Kent
> 
> 
Here's the original code:

class Collector(object):
    def __init__(self):
        self.data = []
        spawn_work_bees(callback=self.on_received)

    def on_received(self, a_piece_of_data):
        """This callback is executed in work bee threads!"""
        self.data.append(a_piece_of_data)

    def get_data(self):
        x = self.data
        self.data = []
        return x

The more I look at this, the more I'm not sure whether data loss will 
occur.  For me, that's good enough reason to rewrite this code.  I'd 
rather be clear and certain than clever any day. 

So, let's say you have a thread T1 which starts in ``get_data()`` and 
makes it as far as ``x = self.data``.  Then another thread T2 comes along 
in ``on_received()`` and gets as far as 
``self.data.append(a_piece_of_data)``.  ``x`` in T1's ``get_data()`` (as 
you pointed out) is still pointing to the list that T2 just appended to 
and T1 will return that list.  But what happens if you get multiple guys 
in ``get_data()`` and multiple guys in ``on_received()``?  I can't prove 
it, but it seems like you're going to have an uncertain outcome.  If 
you're just dealing with 2 threads, I can't see how that would be 
unsafe.  Maybe someone could come up with a use case that would disprove 
that.  But if you've got, say, 4 threads, 2 in each method....that's 
gonna get messy. 
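
The two-thread sequence above can be replayed in a single thread, which at least makes the ordering unambiguous.  This is only a sketch: the thread switches are simulated by hand, ``replay_kents_sequence()`` and ``replay_late_append()`` are made-up names, and ``spawn_work_bees()`` is left out because its implementation is not shown in the thread.  The second function is an assumption that goes beyond the post: it supposes a worker gets suspended after looking up ``self.data.append`` but before calling it.  Neither function proves anything about what real threads will do.

```python
class Collector(object):
    """The original class, minus the spawn_work_bees() call."""

    def __init__(self):
        self.data = []

    def on_received(self, a_piece_of_data):
        self.data.append(a_piece_of_data)

    def get_data(self):
        x = self.data
        self.data = []
        return x


def replay_kents_sequence():
    """Hand-simulate: T1 binds x, T2 appends, T1 rebinds self.data."""
    c = Collector()
    c.on_received("early")
    x = c.data                    # T1: x = self.data
    c.on_received("in between")   # T2: self.data.append(...)
    c.data = []                   # T1: self.data = []
    c.on_received("late")         # T2 again, now hitting the new list
    return x, c.get_data()


def replay_late_append():
    """Assumed interleaving: T2 is paused between lookup and call."""
    c = Collector()
    pending_append = c.data.append   # T2: looked up append on the old list
    returned = c.get_data()          # T1: runs get_data() start to finish
    length_seen_by_caller = len(returned)
    pending_append("straggler")      # T2: the call finally happens
    return length_seen_by_caller, returned, c.get_data()
```

In the first replay nothing goes missing: the item appended in between ends up in the list T1 returns.  In the second, the item is not dropped either, but it lands in a list the caller already has in hand, so a caller that has finished processing that list would never look at it.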

And, honestly, I'm trying *really* hard to come up with a scenario that 
would lose data and I can't.  Maybe someone like Peter or Aahz or some 
little 13 year old in Topeka who's smarter than me can come up with 
something.  But I do know this - the more I think about whether this 
is unsafe or not, the more my head hurts.  If you have a 
piece of code that you have to spend that much time on trying to figure 
out if it is threadsafe or not, why would you leave it as is?  Maybe the 
rest of you are more confident in your thinking and programming skills 
than I am, but I would quickly slap a Queue in there.  If for nothing 
else than to rest from simulating in my head 1, 2, 3, 5, 10 threads in 
the ``get_data()`` method while various threads are in the 
``on_received()`` method.  Aaaagghhh.....need....motrin......
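
For what it's worth, here is a minimal sketch of what "slapping a Queue in there" might look like.  It assumes Python 3, where the module is named ``queue`` (it was ``Queue`` when this thread was written).  The ``spawn_work_bees()`` below is a hypothetical stand-in, since the real one is not shown anywhere in the thread, and its ``bees`` and ``pieces_per_bee`` parameters are invented for the demonstration.

```python
import queue
import threading


class Collector(object):
    def __init__(self):
        # queue.Queue does its own locking, so put() and get_nowait()
        # can be called from any thread.
        self._queue = queue.Queue()

    def on_received(self, a_piece_of_data):
        """This callback is executed in work bee threads!"""
        self._queue.put(a_piece_of_data)

    def get_data(self):
        """Return everything received since the last call."""
        items = []
        while True:
            try:
                items.append(self._queue.get_nowait())
            except queue.Empty:
                return items


def spawn_work_bees(callback, bees=4, pieces_per_bee=500):
    """Hypothetical stand-in: each bee hands fake data to the callback."""
    def work(bee_id):
        for n in range(pieces_per_bee):
            callback((bee_id, n))

    threads = [threading.Thread(target=work, args=(i,)) for i in range(bees)]
    for t in threads:
        t.start()
    return threads
```

Each piece of data comes out of the queue exactly once, so polling ``get_data()`` while the bees are still running and once more after joining them should account for every item.  Items that arrive while ``get_data()`` is draining may show up in this call or the next one, but not in both.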


Jeremy Jones





-- 
http://mail.python.org/mailman/listinfo/python-list
