
List:       relax-devel
Subject:    Re: [Fwd: Re: multi processing]
From:       "Gary S. Thompson" <garyt () domain ! hid>
Date:       2006-06-20 9:57:51
Message-ID: 4497C69F.6040701 () domain ! hid

Edward d'Auvergne wrote:

>For an email which was accidentally not sent to the mailing lists it
>may be better to resend the email rather than forwarding it as your
>forwarded post started a new thread in
>(https://mail.gna.org/public/relax-devel/2006-04/threads.html).  It
>may be possible to just remove the forwarding junk in the email, I'm
>not sure how the 'Message ID' tag in the headers works.
>>>Whoa, that's a big supercomputer.  You are most welcome to give it a
>>>go, it should speed up your model-free runs using relax.  The changes
>>>will necessarily be extensive and will cause breakages while
>>>development occurs, so Gary if you decide to go forwards with it, I
>>>will probably fork relax and create an unstable development branch
>>>called 1.3 where all new developments will go.  It might even be a
>>>good idea to create a private branch for your changes from 1.3.  I
>>>will then reserve 1.2 for bug fixes only.
>>Yep that seems like a good idea, however, read on;-)
>>>I've always planned on adding support for clusters and I have a basic
>>>framework in place which might be a good platform to start from.  The
>>>other idea I've had in the back of my mind is the conversion of
>>>all the model-free function code in the directory 'maths_fns' to C
>>>(while still retaining the Python code as an option),
>>This seems reasonable; when I do a wc | sort -nr on maths_fns I get
>>
>> 12149  48347 493475 total
>>  3857  20572 174665 jw_mf.py
>>  2966  10359 153396 mf.py
>>  1314   3520  39824 ri_comps.py
>>   924   2434  22280 correlation_time.py
>>   836   2937  23114 weights.py
>>   732   2476  24964 jw_mf_comps.py
>>   599   2748  24435 direction_cosine.py
>>   470   1269  12150 ri_prime.py
>>   175    700   6129 ri.py
>>   109    519   4614 chi2.py
>>   106    448   4185 jw_mapping.py
>>    33    177   1922 __init__.py
>>    28    188   1797 test.c_chi.py
>>
>>and I guess mf.py would be the one to hit first... The  questions are
>The translation to C was just a suggestion as, computation-wise,
>the change would be a significant improvement.  It would decrease the
>computation time on each node of the cluster; however, it is a lot of
>work and is inessential for clustering.  Please don't feel obliged to
>even start this mammoth task.

yep I think I will skip this at the moment...

>>1. do we need to do all of it or could we just wrap the maths-intensive
>>parts and leave the object creation and management in Python
>If the 'profile_flag' at the end of the 'relax' file in the base
>directory is changed to 1, you can see the relative computational
>requirements of the various bits of code.  To obtain the full benefits
>of C, it would all need to be translated.
>>2. Is there a low-level test suite so the conformity of the Python and C
>>code can be verified
>The test suite is very primitive and basic at the moment.  A large
>number of tests would need to be added to cover all parameter
>combinations.  These would need to cover all four types of model-free
>minimisation:  the model-free parameters for one residue, the
>model-free parameters together with a local tm parameter, the
>diffusion parameters for all residues, and all parameters
>simultaneously.
>>3. would it be better to do it in pytrex rather than straight C? I guess
>>the thing to do would be to test it out and see what the quality of the
>>C code is like
>I would prefer to stay with proper C using the standard Python/C API.
>I've played with
>Pyrex (pytrex is XML I think), Swig, and a few other interfaces but I
>don't believe that these will give the full speed-ups of the raw
>interface.  The number crunching is very low level and using these
>high-level interfaces is overkill.

Sorry, my typo, I meant Pyrex!  Pyrex is really rather different from
the others as it is not an interface but a reimplementation of a large
subset of Python that produces C source code rather than byte code, with
some extensions which allow direct access to C structures.  To quote from
the author: "Pyrex is Python with C data types" (his emphasis).
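
For illustration, a minimal hypothetical sketch of the flavour (this is
Pyrex source rather than plain Python, and the function is invented for
the example):

    cdef double chi2_term(double data, double back_calc, double error):
        # C-typed arguments and locals compile down to plain C arithmetic.
        cdef double d
        d = (data - back_calc) / error
        return d * d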

>>>which may give
>>>potential gains of 10 to 20 times increased performance.  This code is
>>>by far the most CPU intensive, the minimisation code isn't anywhere
>>>near as expensive.
>>yep seems logical, the only question is have you profiled? Chris was
>>trying to do some before the break and there didn't seem to be any
>>really hot spots... but I may be misreading the rumour mill (he is of
>>course a gargantuan 5 feet away much of the time ;-) Chris, any comments?
>The profile flag at the bottom of the file 'relax' will do it.
>Although a line-by-line translation will almost produce functional
>code (when mixed with the concepts in the relaxation curve-fitting C
>code together with the creation of a large struct called 'data'), it
>is still a huge effort, so only play with it if you really want to.
>>>The framework currently in place is the threading code.  The way the
>>>threading code works is through SSH tunnels.  It starts a new instance
>>>of relax on the remote machine (or local if there are a number of CPUs
>>>or CPU cores), that instance gets data sent to it, does the
>>>calculation, and returns the result.  It does work, although it's not
>>>very good at catching failures.  I haven't used it lately so I don't
>>>know if it's broken.
>>That's generally the idea I had, i.e. a fairly coarse-grained approach.
>>My thought was to add constructs to the top level commands (if needed)
>>to allow subsets of a set of calculations to be run from a script,
>>i.e. part of a grid search or a few Monte Carlo runs or a subset of
>>minimisations for a set of residues. Then the real script would generate
>>the required subscripts plus embedded data on the fly. I think this
>>provides a considerable degree of flexibility. Thus, for instance, our
>>cluster, which runs Grid Engine, needs a master script to start all the
>>sub-processes rather than a set of separate password-less ssh logons
>>which a cluster of workstations would require. In general I thought that
>>catching failures other than a failure to start is not required...
>Is your idea similar to having the runs themselves threaded so instead
>of looping over them you run them simultaneously?  I don't know too
>much about clustering.  What is the interface by which data and
>instructions are sent and returned from the nodes?  And do you know if
>there are python wrappings?

So the idea is to take the low-hanging fruit for the moment and only
parallelise the things that will naturally run for the same amount of time,

e.g. divide up sets of Monte Carlo simulations into parts, run
minimisations on subsets of residues that share the same model and
tensor frame, etc.
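
As a rough sketch of the sort of division I mean (plain Python; the
helper is invented for the example):

    def divide_simulations(n_sims, n_processors):
        # Return a list of (start, end) index ranges, one per processor,
        # as near-equal in size as possible.
        base, extra = divmod(n_sims, n_processors)
        ranges = []
        start = 0
        for i in range(n_processors):
            size = base + (i < extra)    # booleans count as 0 or 1
            ranges.append((start, start + size))
            start += size
        return ranges

    # e.g. divide_simulations(500, 30) gives 30 near-equal index ranges.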

As to how to send data, scripts and results: I would write an interface
class and then allow different instances of the class to deal with
communication differently, to support different transport methods, e.g.
ssh logins vs. mpi sessions (or something which hasn't been invented yet).

Transfer of data will use cPickle in my case, with an mpi backend to
keep compute nodes available and prevent queueing problems (you don't want
to resubmit to the batch queue each time you calculate a subpart of the
problem...).
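
A minimal sketch of the pickling step (the mpi send itself is omitted,
and the helper names are invented):

    import cPickle

    def pack_for_transport(data):
        # Pickle the data to an in-memory string, ready to be wrapped in
        # a byte array for an mpi send (the send is transport-specific).
        return cPickle.dumps(data, cPickle.HIGHEST_PROTOCOL)

    def unpack_from_transport(raw_bytes):
        # Reverse the packing on the receiving node.
        return cPickle.loads(raw_bytes)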


>>>SSH tunnels is probably not the best option for your system.  Do you
>>>know anything about MPI?
>>I have read about MPI but have not implemented anything __YET__ ;-). Also
>>I have compiled some MPI-based programs. It seems to be a bit of a pig and
>>I don't think the low-hanging fruit necessarily requires that degree of
>>fine-grained distribution...
>I haven't used MPI either.  There may be much better protocols
>implemented for Python.

Actually, after looking at the problem in our local implementation, we
will need mpi, and I have the mpi from Scientific working on my
computer.  However, as alluded to above, mpi will only be a dependency
for a particular transport method, not the overall scheme.

>>>There are a number of options available for
>>>distributed calculations, but it will need to have a clean and stable
>>>Python interface.
>>obviously a stable interface with as little change to the current top
>>level functions and as little surprise as possible is to be desired. I
>>thought it might be a good idea to have some form of facade, so that
>>the various forms of coarse-grained multi-processing look the same,
>>whichever one you are using. The idea would be only to have the setup
>>and dispatch code differ.
>It would probably be best to use some existing standard protocol
>rather than inventing a relax specific system.

I think the interface of scripts plus data provides all you need; the
actual methodology in the transport method can be private...

so for example:

1. create a cluster with a transport layer

top level script:

# override relax commands as needed
init_parallel()

# the cluster to use; you can have more than one...
cluster = create_cluster(name='test')

# a transport layer; all extra keyword arguments are for configuration
mpi_transport = create_transport(name='name', method='mpi-local', ...)

# a particular set of processors using a particular transport method,
# with a particular weight
processor_set = create_processor(transport=mpi_transport, nprocessors=30, ...)

# add it to the pool of available processors
cluster_add_processor(processor_set, weight=1.0)

# ... normal relax setup ...

minimise('newton', run=name, cluster=cluster)    # one extra argument

2. internally

class transport(object):
    # just knows how to set up a connection to a bunch of processors
    # and communicate with them

    def __init__(self):
        pass

    def start(self, nprocessors, **kw):
        # set up for calculation; returns the processor_set for this
        # particular connection (kw arguments come from create_processor)
        pass

    def shutdown(self, processor_set):
        # end all calculations and shut down
        pass

    def setupData(self, processor_set, data, nodes=None):
        # send setup data; in my case I would pickle it to an in-memory
        # file and then put it in a numpy byte array for transport over
        # the numerics mpi layer; if nodes is None send it to everyone
        pass

    def calculate(self, processor_set, node, script, callback, tag):
        # run the script on the node and call the completion callback
        # with tag when complete
        pass

    def getData(self, processor_set, node=None):
        pass

    def status(self, processor_set, node=None):
        # test for the status of a particular calculation
        pass

    def cancel(self, processor_set, node=None):
        # give up the calculation on a particular node
        pass


class cluster(object):

    def __init__(self):
        pass

    def start(self):
        pass

    def getDivisions(self, nproblems):
        # get a list of sizes for 'divisions' of the problems to send to
        # each element of each processor set, based on the weights and
        # number of processors
        pass

    def shutdown(self):
        pass

    def setupData(self, data):
        # send setup data
        pass

    def calculate(self, division, scripts):
        # run the script on all nodes
        pass

    def getData(self, division):
        # get results
        pass


.... anyway, I think the idea is fairly clear.

>>>Whichever system is decided upon, threading inside
>>>the program will probably be necessary so that each thread can be sent
>>>to a different machine.  This requires calculations which can be
>>>parallelised.  As minimisation is an iterative process with each
>>>iteration requiring the results of the previous, and as it's not the
>>>most CPU intensive part anyway, I can't see too many gains in
>>>modifying that code.
>>Agreed
>>>I've already parallelised the Monte Carlo
>>>simulations for the threading code as those calculations are the most
>>>obvious target.
>>They are a time hog
>Grid searching model m8 {S2, tf, S2f, ts, Rex} probably beats the
>total of the MC sims (unless the data is dodgy).
>>>But all residue specific calculations could be
>>>parallelised as well.  This is probably where you can get the best
>>>speed ups.
>>Yes that and grid searches seem obvious candidates
>I was thinking more along the lines of splitting the residues rather
>than the grid search increments.  These increments could be threaded
>however the approach would need to be conservative.  I'm planning on
>eventually splitting out the minimisation code as a separate project
>on Gna! as a Python optimisation library.  The optimisers in Scipy are
>useless!

I think whichever divisions are equal and fit best are what is
required, though residues would be the obvious first candidate, followed
by grid steps.
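
As a sketch of the sort of weight-based split getDivisions might do
(plain Python; the helper and its inputs are invented for the example):

    def get_divisions(nproblems, processor_sets):
        # processor_sets is a list of (weight, nprocessors) tuples;
        # return one chunk size per set, proportional to
        # weight * nprocessors.
        capacities = [w * n for (w, n) in processor_sets]
        total = float(sum(capacities))
        sizes = [int(nproblems * c / total) for c in capacities]
        # hand any remainder from integer truncation to the first set
        sizes[0] += nproblems - sum(sizes)
        return sizes

    # e.g. get_divisions(148, [(1.0, 30), (0.5, 20)]) -> [111, 37]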

>>>I have a few more comments below.
>>>
>>>On 4/13/06, Gary S. Thompson <garyt@domain.hid> wrote:
>>>>Dear Ed
>>>>   We have a 148-processor Beowulf cluster ;-) I was thinking of
>>>>having a go at developing a distributed version of relax... are you ok
>>>>with that or do you have plans of your own?
>>>>
>>>>The general idea was to have scripts look almost as they are but
>>>>
>>>>1. have a command to register multi-processor handlers
>>>The user function class 'threading' is probably close to what you want.
>>I shall have a look at it
>Actually it's called 'thread'.
>>>>2. have a command to add machines and parameters to the multi-processor pool
>>>threading.add() is probably a good template.
>>again I shall have a read
>I got the wrong name again.  It's 'threading.read', 'threading.add'
>hasn't been written yet!
>>>>3. add code to the generic functions, or replace the generic functions
>>>>if multiprocessing is set up, to batch up components of calculations
>>>>and pass them out to the compute servers
>>>'generic/minimise.py' is the best bet.  Otherwise there is
>>>'maths_fns/mf.py' which can be hacked.
>>more reading ;-)
>>>>4. add code to multiplex the results back together again
>>>That should be pretty straightforward.
>>>>obviously this would just be a prototype at first but it could be rather
>>>>useful
>The use of published standards and low level protocols would be best
>to keep the calculations bug free and fast.  For debugging, it might
>be worth considering adding threading tests to the test suite.
>
>Edward


Anyway, I intend to branch now to a private branch ;-)

regards
gary

-- 
-------------------------------------------------------------------
Dr Gary Thompson
Astbury Centre for Structural Molecular Biology,
University of Leeds, Astbury Building,
Leeds, LS2 9JT, West-Yorkshire, UK             Tel. +44-113-3433024
email: garyt@domain.hid                   Fax  +44-113-2331407
-------------------------------------------------------------------



