[prev in list] [next in list] [prev in thread] [next in thread] 

List:       pcc-list
Subject:    Re: [Pcc] Things to add for next release
From:       "Steve Johnson" <scj () yaccman ! com>
Date:       2017-02-26 2:49:00
Message-ID: 1e2a99df22fa0a7a4660115f701ed0a050723400 () webmail ! yaccman ! com
[Download RAW message or body]

[Attachment #2 (multipart/alternative)]


You might be interested in something called 'twig'.   Al Aho and a
couple of colleagues coded and extended the PCC2 compilation
algorithm.   The paper is published in the ACM Transactions on
Programming Languages and Systems, Vol 11, #4, October 1989.   Google
"Aho Ganapathi Code Generation" and you'll find it on line...

Also, there is an intermediate language called LLVM that is getting a
lot of play.   I haven't used it, but it looks like a pretty good
piece of work.     There are back-ends for many different machines.  
I have no idea what its performance is (I suspect it is acceptable,
but may not be really fast...).

Multiple register types is a pain.   Register pairs, however, can be
dealt with by a simple hack.     If there are n free registers, but
some operations require an even/odd pair, you can make that operation
appear to require n/2+1 registers.   This guarantees that there is
always a register pair.   It's not optimal, but has worked well in
practice (especially with the dynamic programming algorithm), and the
register pair issue is also a big problem for coloring...

Hope this helps...

Steve

----- Original Message -----
From: "Anders Magnusson" <ragge@ludd.ltu.se>
To:<scj@yaccman.com>
Cc:<pcc@lists.ludd.ltu.se>
Sent:Sat, 25 Feb 2017 17:39:54 +0100
Subject:Re: [Pcc] Things to add for next release

 Good morning,

 I have lately revisited this due to some problem I recently
encountered,
 so it may be reasonable to change some of the current logic.

 As it is right now; instruction selection and register assignment are
done
 in three distinct steps for each expression tree:

 1) Walk the tree top-down to do instruction selection ("maximal
munch").
 This is a simple way to both get reasonable code and handle the 
 multiple-register-class instruction selection.

 2) Do a bottom-up tree walk to calculate the sethi-ullman numbers to 
 find the evaluation order. The registers needed are given temporary 
 numbers.
 It is very simple and will not take into account different register 
 classes etc. Improvement needed.

 3) Do a top-down walk to find liveness information for the needed 
 registers and find moves, hard-regs use etc... in the tree.

 The result is then fed to the graph-coloring code.
 This approach has worked reasonable well, but step 1 and 2 need a
better 
 solution.
 I stumbled into a problem while having fun porting to pdp7, and
realized 
 that it affects
 other targets with few registers as well.

 Den 2014-10-16 kl. 20:11, skrev scj@yaccman.com:
 > The problem with this is the first step -- doing a blind tree walk.
 > Suppose you don't have enough registers, and will have to spill a
 > computation to a temporary and then reload it later to continue. If
you
 > knew what you were going to need to spill, then the best code comes
if you
 > compute and spill it when all the registers are available -- the
code
 > would be the most efficient. You keep doing this, making the tree
smaller
 > and simpler, until you can do the remaining tree with no spills
using the
 > available registers.
 >
 > There's a JACM paper I wrote with Al Aho that discusses this
problem, and
 > we prove that if the tree has no common subexpressions and all the
 > registers are interchangeable, you can generate optimal code in
time that
 > is linear in the size of the tree, with effectively any reasonable
 > instruction set from RISC to the VAX. If there are common
subexpressions,
 > even if the only ones are of the form leaf op leaf, then the
problem is
 > NP-Complete, even if there are an infinite number of registers!
 >
 I agree that dynamic programming as you describe it in your paper is
the 
 correct
 way of going forward. I have seen that you used it in PCC2, 
 unfortunately that code is
 not released in public :-/

 It raises a bunch of implementation questions though, especially with

 multiple register classes, like:

 - Doing bottom-up traversal may require the instruction found for
each 
 register classes to be kept around, since
 we do not know what is needed higher-up.
 - When filling in the array at each node we must calculate and keep
the 
 numbers for all register classes around.
 - Overlapping register classes will be a challenge in calculation,
and 
 overlapping may be of two kinds. Our favourite architecture x86 uses 
 both of them:
 * char registers are two in each integer register.
 * long long uses two integer registers.

 It also open for a bunch of other optimizations in register
allocation. 
 Hm, this will be interesting :-)

 -- Ragge


[Attachment #5 (text/html)]

<html><body style="font-family: Helvetica,Arial,sans-serif; font-size: 12px;">You \
might be interested in something called 'twig'.   Al Aho and a couple of colleagues \
coded and extended the PCC2 compilation algorithm.   The paper is published in the \
ACM Transactions on Programming Languages and Systems, Vol 11, #4, October 1989.   \
Google "Aho Ganapathi Code Generation" and you'll find it on line...<br><br>Also, \
there is an intermediate language called LLVM that is getting a lot of play.   I \
haven't used it, but it looks like a pretty good piece of work.     There are \
back-ends for many different machines.   I have no idea what its performance is (I \
suspect it is acceptable, but may not be really fast...).<br><br>Multiple register \
types is a pain.   Register pairs, however, can be dealt with by a simple hack.     \
If there are n free registers, but some operations require an even/odd pair, you can \
make that operation appear to require n/2+1 registers.   This guarantees that there \
is always a register pair.   It's not optimal, but has worked well in practice \
(especially with the dynamic programming algorithm), and the register pair issue is \
also a big problem for coloring...<br><br>Hope this \
helps...<br><br>Steve<br><br><br><blockquote class="atmailquote"><br>----- Original \
Message -----<br><div id="origionalMessageFromField" \
style="width:100%;display:inline;background:rgb(228,228,228);"><div \
style="display:inline;font-weight:bold;">From:</div> "Anders Magnusson" \
&lt;ragge@ludd.ltu.se&gt;</div><br><div id="origionalMessageToField" \
style="display:inline;font-weight:bold;">To:</div>&lt;scj@yaccman.com&gt;<br><div \
id="origionalMessageSentField" \
style="display:inline;font-weight:bold;">Cc:</div>&lt;pcc@lists.ludd.ltu.se&gt;<br><div \
style="display:inline;font-weight:bold;">Sent:</div>Sat, 25 Feb 2017 17:39:54 \
+0100<br><div id="origionalMessageSubjectField" \
style="display:inline;font-weight:bold;">Subject:</div>Re: [Pcc] Things to add for \
next release<br><br><br> Good morning,<br><br>
I have lately revisited this due to some problem I recently encountered,<br>
so it may be reasonable to change some of the current logic.<br><br>
As it is right now; instruction selection and register assignment are done<br>
in three distinct steps for each expression tree:<br><br>
1) Walk the tree top-down to do instruction selection ("maximal munch").<br>
   This is a simple way to both get reasonable code and handle the <br>
multiple-register-class instruction selection.<br><br>
2) Do a bottom-up tree walk to calculate the sethi-ullman numbers to <br>
find the evaluation order.  The registers needed are given temporary <br>
numbers.<br>
   It is very simple and will not take into account different register <br>
classes etc. Improvement needed.<br><br>
3) Do a top-down walk to find liveness information for the needed <br>
registers and find moves,  hard-regs use etc... in the tree.<br><br>
The result is then fed to the graph-coloring code.<br>
This approach has worked reasonable well, but step 1 and 2 need a better <br>
solution.<br>
I stumbled into a problem while having fun porting to pdp7, and realized <br>
that it affects<br>
other targets with few registers as well.<br><br>
Den 2014-10-16 kl. 20:11, skrev scj@yaccman.com:<br>
&gt; The problem with this is the first step -- doing a blind tree walk.<br>
&gt; Suppose you don't have enough registers, and will have to spill a<br>
&gt; computation to a temporary and then reload it later to continue.  If you<br>
&gt; knew what you were going to need to spill, then the best code comes if you<br>
&gt; compute and spill it when all the registers are available -- the code<br>
&gt; would be the most efficient.  You keep doing this, making the tree smaller<br>
&gt; and simpler, until you can do the remaining tree with no spills using the<br>
&gt; available registers.<br>
&gt;<br>
&gt; There's a JACM paper I wrote with Al Aho that discusses this problem, and<br>
&gt; we prove that if the tree has no common subexpressions and all the<br>
&gt; registers are interchangeable, you can generate optimal code in time that<br>
&gt; is linear in the size of the tree, with effectively any reasonable<br>
&gt; instruction set from RISC to the VAX.  If there are common subexpressions,<br>
&gt; even if the only ones are of the form leaf op leaf, then the problem is<br>
&gt; NP-Complete, even if there are an infinite number of registers!<br>
&gt;<br>
I agree that dynamic programming as you describe it in your paper is the <br>
correct<br>
way of going forward.  I have seen that you used it in PCC2, <br>
unfortunately that code is<br>
not released in public :-/<br><br>
It raises a bunch of implementation questions though, especially with <br>
multiple register classes, like:<br><br>
- Doing bottom-up traversal may require the instruction found for each <br>
register classes to be kept around, since<br>
   we do not know what is needed higher-up.<br>
- When filling in the array at each node we must calculate and keep the <br>
numbers for all register classes around.<br>
- Overlapping register classes will be a challenge in calculation, and <br>
overlapping may be of two kinds.  Our favourite architecture x86 uses <br>
both of them:<br>
     * char registers are two in each integer register.<br>
     * long long uses two integer registers.<br><br>
It also open for a bunch of other optimizations in register allocation.  <br>
Hm, this will be interesting :-)<br><br>
-- Ragge<br></blockquote></body></html>


[Attachment #6 (text/plain)]

_______________________________________________
Pcc mailing list
Pcc@lists.ludd.ltu.se
https://lists.ludd.ltu.se/cgi-bin/mailman/listinfo/pcc

[prev in list] [next in list] [prev in thread] [next in thread] 

Configure | About | News | Add a list | Sponsored by KoreLogic