
List:       python-list
Subject:    Re: collect data using threads
From:       Kent Johnson <kent37 () tds ! net>
Date:       2005-06-14 18:11:09
Message-ID: 42af1d4f$1_1 () newspeer2 ! tds ! net

Qiangning Hong wrote:
> I actually had considered Queue and pop() before I wrote the above code.
> However, because there is a lot of data to get every time I call
> get_data(), I want a more CPU friendly way to avoid the while-loop and
> empty checking, and then the above code comes out.  But I am not very
> sure whether it will cause serious problem or not, so I ask here.  If
> anyone can prove it is correct, I'll use it in my program, else I'll go
> back to the Queue solution.

OK, here is a real failure mode. Here are the code and its disassembly:
 >>> class Collector(object):
 ...     def __init__(self):
 ...         self.data = []
 ...     def on_received(self, a_piece_of_data):
 ...         """This callback is executed in work bee threads!"""
 ...         self.data.append(a_piece_of_data)
 ...     def get_data(self):
 ...         x = self.data
 ...         self.data = []
 ...         return x
 ...
 >>> import dis
 >>> dis.dis(Collector.on_received)
  6           0 LOAD_FAST                0 (self)
              3 LOAD_ATTR                1 (data)
              6 LOAD_ATTR                2 (append)
              9 LOAD_FAST                1 (a_piece_of_data)
             12 CALL_FUNCTION            1
             15 POP_TOP
             16 LOAD_CONST               1 (None)
             19 RETURN_VALUE
 >>> dis.dis(Collector.get_data)
  8           0 LOAD_FAST                0 (self)
              3 LOAD_ATTR                1 (data)
              6 STORE_FAST               1 (x)

  9           9 BUILD_LIST               0
             12 LOAD_FAST                0 (self)
             15 STORE_ATTR               1 (data)

 10          18 LOAD_FAST                1 (x)
             21 RETURN_VALUE

Imagine the thread calling on_received() gets as far as LOAD_ATTR (data), LOAD_ATTR (append) or LOAD_FAST (a_piece_of_data), so it already holds a reference to the old self.data list; then it is preempted and the get_data() thread runs. The get_data() thread could call get_data() and *finish processing the returned list* before the on_received() thread runs again and actually appends to that list. The appended value lands in a list nobody will look at again, so it is never processed.
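To make that interleaving concrete, here is a minimal sketch that forces it deterministically instead of waiting for the interpreter to switch threads at the wrong moment. It splits the body of on_received() by hand into its lookup half (the two LOAD_ATTRs) and its call half (CALL_FUNCTION), and uses two threading.Event objects to pause the producer in between. The events, the producer() function and the "late item" value are assumptions purely for scheduling the demo; the real code has no such pause.

```python
import threading

class Collector(object):
    def __init__(self):
        self.data = []

    def get_data(self):
        x = self.data
        self.data = []
        return x

c = Collector()
holds_ref = threading.Event()  # producer has done LOAD_ATTR (data), LOAD_ATTR (append)
resume = threading.Event()     # lets the producer go on to CALL_FUNCTION

def producer():
    # Hypothetical stand-in for on_received(), split at the dangerous point.
    append = c.data.append     # bound method on the *old* list
    holds_ref.set()
    resume.wait()              # simulated preemption happens here
    append("late item")        # the call still targets the old list

t = threading.Thread(target=producer)
t.start()
holds_ref.wait()

batch = c.get_data()           # consumer swaps the lists...
processed = list(batch)        # ...and finishes processing the batch

resume.set()
t.join()

print(processed)  # []
print(c.data)     # []
print(batch)      # ['late item']
```

The value ends up only in batch, which the consumer has already finished with, and never in c.data, so no later get_data() call will return it.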

If you want to avoid the overhead of a Queue.get() call for each data element, you could just put your own mutex (a threading.Lock) into on_received() and get_data().
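A minimal sketch of that suggestion, assuming a plain threading.Lock shared by both methods. The `_lock` attribute, the worker() function and the polling loop are illustrative names and choices, not part of the original code. Because the lookup of self.data and the append now happen under the same lock as the swap, a producer can no longer hold a reference to a list that get_data() has already handed out.

```python
import threading
import time

class Collector(object):
    def __init__(self):
        self._lock = threading.Lock()  # hypothetical name; guards self.data
        self.data = []

    def on_received(self, a_piece_of_data):
        """This callback is executed in work bee threads!"""
        # Looking up self.data and appending both happen under the lock,
        # so they cannot interleave with the swap in get_data().
        with self._lock:
            self.data.append(a_piece_of_data)

    def get_data(self):
        # Swap in a fresh list under the same lock. After it is released,
        # any later on_received() sees the new list, so nothing can be
        # appended to the list returned here.
        with self._lock:
            x = self.data
            self.data = []
        return x

# Hypothetical driver: four producer threads, one consumer draining batches.
def worker(collector, worker_id, count):
    for n in range(count):
        collector.on_received((worker_id, n))

c = Collector()
threads = [threading.Thread(target=worker, args=(c, i, 1000)) for i in range(4)]
for t in threads:
    t.start()

collected = []
while any(t.is_alive() for t in threads):
    collected.extend(c.get_data())
    time.sleep(0.001)           # polling interval, arbitrary for the demo
for t in threads:
    t.join()
collected.extend(c.get_data())  # final drain after all producers finished

print(len(collected))  # 4000
```

Each on_received() call still pays for one lock acquire/release, but get_data() takes the whole batch with a single acquisition rather than one Queue.get() per element.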

Kent

