
List:       cfe-dev
Subject:    [cfe-dev] [RFC] Add Intel TSX HLE Support
From:       Michael Liao <michael.liao () intel ! com>
Date:       2013-02-19 19:52:00
Message-ID: 1361303520.3225.41.camel () snbox

Hi All,

I'd like to add HLE support in LLVM/clang, consistent with GCC's style [1]. HLE,
part of Intel TSX [2], is a legacy-compatible instruction set extension that
marks transactional regions by adding XACQUIRE and XRELEASE prefixes. To support
it, GCC extends the memory order flag of the __atomic_* builtins with a
target-specific memory model in the high bits (bits 31-16 for the
target-specific memory model, bits 15-0 for the general memory model). Following
the same approach, I propose to change LLVM/clang by adding:

+ a metadata 'targetflags' in LLVM atomic IR to pass this
  target-specific memory model hint (see the usage sketch after this list)

+ one extra target flag in AtomicSDNode & MemIntrinsicSDNode to specify the
  XACQUIRE or XRELEASE hint. This extra target flag is embedded into the
  SubclassData field. The rationale for how such target flags can be embedded
  into SubclassData in SDNode follows.

  Here is the current SDNode class hierarchy of memory-related nodes:

  SDNode -> MemSDNode -> LSBaseSDNode -> LoadSDNode
                     |               +-> StoreSDNode
                     +-> AtomicSDNode
                     +-> MemIntrinsicSDNode

  Here are the current SubclassData bit definitions:

  bit 0~1 : extension type used in LoadSDNode
  bit 0   : truncating store in StoreSDNode
  bit 2~4 : addressing mode in LSBaseNode
  bit 5   : volatile bit in MemSDNode
  bit 6   : non-temporal bit in MemSDNode
  bit 7   : invariant bit in MemSDNode
  bit 8~11: memory order in AtomicSDNode
  bit 12  : synch scope in AtomicSDNode

  Considering the class hierarchy, we can safely reuse bits 0~1 as the target
  flags in AtomicSDNode/MemIntrinsicSDNode, since those bits are only used by
  LoadSDNode and StoreSDNode, which sit on the disjoint LSBaseSDNode branch.

+ the X86 backend is modified to generate the additional XACQUIRE/XRELEASE
  prefix based on the specified target flag
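
As a usage sketch (not part of the patches): a developer requests lock elision
by OR'ing an HLE hint into the memory order argument of an __atomic_* builtin,
following GCC's documented interface [1]; the hint then travels as the
'targetflags' metadata on the atomic IR instruction and finally becomes an
XACQUIRE/XRELEASE prefix on the locked instruction.

  /* Minimal elided spin lock, assuming the __ATOMIC_HLE_* macros added by
     the clang patches below; built with -mhle on HLE-capable hardware. */
  static int lock;

  void hle_acquire(void) {
    /* expected to emit "xacquire; lock xchg" */
    while (__atomic_exchange_n(&lock, 1,
                               __ATOMIC_ACQUIRE | __ATOMIC_HLE_ACQUIRE))
      ; /* spin until the elided lock is free */
  }

  void hle_release(void) {
    /* expected to emit an XRELEASE-prefixed store */
    __atomic_store_n(&lock, 0, __ATOMIC_RELEASE | __ATOMIC_HLE_RELEASE);
  }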


The following are details of each patch:

* 0001-Add-targetflags-in-AtomicSDNode-MemIntrinsicSDNode.patch

This patch adds 'targetflags' support in AtomicSDNode and MemIntrinsicSDNode. It
checks the 'targetflags' metadata and embeds its value into SubclassData.
Currently, only two bits are defined.

* 0002-Add-HLE-target-feature.patch

This patch adds the HLE target feature and auto-detection support.

* 0003-Add-XACQ-XREL-prefix-and-encoding-asm-printer-suppor.patch

This patch adds the XACQUIRE/XRELEASE prefixes and their assembler/encoding support.

* 0004-Enable-HLE-code-generation.patch

This patch enables HLE code generation by extending the current logic to handle
'targetflags'.

* 0001-Add-target-flags-support-for-atomic-ops.patch

This patch adds target flags support to the __atomic_* builtins. It splits the
32-bit order word into high and low 16-bit halves: the low 16 bits keep the
original memory order, and the high 16 bits are redefined as target-specific
flags and passed through the 'targetflags' metadata.
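
As an illustrative sketch only (names are for exposition, not taken from the
patch), the frontend decomposes the combined order word like this:

  static void split_order(unsigned order,
                          unsigned *memorder, unsigned *targetflags) {
    *memorder    = order & 0xffffu; /* bits 15-0: general memory order */
    *targetflags = order >> 16;     /* bits 31-16: target-specific flags,
                                       passed on as 'targetflags' metadata */
  }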

* 0002-Add-mhle-option-support-and-populate-pre-defined-mac.patch

This patch adds the '-m[no-]hle' option to turn the HLE feature on or off. Once
the HLE feature is turned on, two more macros (__ATOMIC_HLE_ACQUIRE and
__ATOMIC_HLE_RELEASE) are pre-defined for developers to mark atomic builtins.
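
For reference, GCC's -mhle pre-defines these macros as (1 << 16) and (1 << 17),
which lines up with the high-16-bit encoding above; I'd expect clang to use the
same values (a sketch, not the literal patch contents):

  #define __ATOMIC_HLE_ACQUIRE (1 << 16) /* 65536: request XACQUIRE */
  #define __ATOMIC_HLE_RELEASE (1 << 17) /* 131072: request XRELEASE */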

Thanks for taking the time to review!

Yours
- Michael
---
[1] http://gcc.gnu.org/ml/gcc-patches/2012-04/msg01073.html
[2] http://software.intel.com/sites/default/files/319433-014.pdf


["0001-Add-targetflags-in-AtomicSDNode-MemIntrinsicSDNode.patch" (0001-Add-targetflags-in-AtomicSDNode-MemIntrinsicSDNode.patch)]

From c2ed27488d773a6684e42adac9c61bff7f2badf8 Mon Sep 17 00:00:00 2001
From: Michael Liao <michael.hliao@gmail.com>
Date: Tue, 3 Jul 2012 23:28:17 -0700
Subject: [PATCH 1/4] Add targetflags in AtomicSDNode & MemIntrinsicSDNode

- to pass HLE acquire/release hint to backend
---
 include/llvm/CodeGen/SelectionDAG.h               |   22 ++++++-----
 include/llvm/CodeGen/SelectionDAGNodes.h          |   34 ++++++++++++----
 lib/CodeGen/SelectionDAG/LegalizeDAG.cpp          |    2 +
 lib/CodeGen/SelectionDAG/LegalizeIntegerTypes.cpp |   14 +++++--
 lib/CodeGen/SelectionDAG/SelectionDAG.cpp         |   44 +++++++++++----------
 lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp  |   23 +++++++++++
 lib/Target/AArch64/AArch64ISelLowering.cpp        |    2 +
 lib/Target/X86/X86ISelLowering.cpp                |   22 +++++++----
 8 files changed, 114 insertions(+), 49 deletions(-)

diff --git a/include/llvm/CodeGen/SelectionDAG.h b/include/llvm/CodeGen/SelectionDAG.h
index c25497a..2ccda96 100644
--- a/include/llvm/CodeGen/SelectionDAG.h
+++ b/include/llvm/CodeGen/SelectionDAG.h
@@ -636,23 +636,24 @@ public:
   SDValue getAtomic(unsigned Opcode, DebugLoc dl, EVT MemVT, SDValue Chain,
                     SDValue Ptr, SDValue Cmp, SDValue Swp,
                     MachinePointerInfo PtrInfo, unsigned Alignment,
-                    AtomicOrdering Ordering,
+                    AtomicOrdering Ordering, unsigned TargetFlags,
                     SynchronizationScope SynchScope);
   SDValue getAtomic(unsigned Opcode, DebugLoc dl, EVT MemVT, SDValue Chain,
                     SDValue Ptr, SDValue Cmp, SDValue Swp,
                     MachineMemOperand *MMO,
-                    AtomicOrdering Ordering,
+                    AtomicOrdering Ordering, unsigned TargetFlags,
                     SynchronizationScope SynchScope);
 
   /// getAtomic - Gets a node for an atomic op, produces result (if relevant)
   /// and chain and takes 2 operands.
   SDValue getAtomic(unsigned Opcode, DebugLoc dl, EVT MemVT, SDValue Chain,
                     SDValue Ptr, SDValue Val, const Value* PtrVal,
-                    unsigned Alignment, AtomicOrdering Ordering,
+                    unsigned Alignment,
+                    AtomicOrdering Ordering, unsigned TargetFlags,
                     SynchronizationScope SynchScope);
   SDValue getAtomic(unsigned Opcode, DebugLoc dl, EVT MemVT, SDValue Chain,
                     SDValue Ptr, SDValue Val, MachineMemOperand *MMO,
-                    AtomicOrdering Ordering,
+                    AtomicOrdering Ordering, unsigned TargetFlags,
                     SynchronizationScope SynchScope);
 
   /// getAtomic - Gets a node for an atomic op, produces result and chain and
@@ -660,11 +661,11 @@ public:
   SDValue getAtomic(unsigned Opcode, DebugLoc dl, EVT MemVT, EVT VT,
                     SDValue Chain, SDValue Ptr, const Value* PtrVal,
                     unsigned Alignment,
-                    AtomicOrdering Ordering,
+                    AtomicOrdering Ordering, unsigned TargetFlags,
                     SynchronizationScope SynchScope);
   SDValue getAtomic(unsigned Opcode, DebugLoc dl, EVT MemVT, EVT VT,
                     SDValue Chain, SDValue Ptr, MachineMemOperand *MMO,
-                    AtomicOrdering Ordering,
+                    AtomicOrdering Ordering, unsigned TargetFlags,
                     SynchronizationScope SynchScope);
 
   /// getMemIntrinsicNode - Creates a MemIntrinsicNode that may produce a
@@ -676,17 +677,20 @@ public:
                               const SDValue *Ops, unsigned NumOps,
                               EVT MemVT, MachinePointerInfo PtrInfo,
                               unsigned Align = 0, bool Vol = false,
-                              bool ReadMem = true, bool WriteMem = true);
+                              bool ReadMem = true, bool WriteMem = true,
+                              unsigned TargetFlags = 0);
 
   SDValue getMemIntrinsicNode(unsigned Opcode, DebugLoc dl, SDVTList VTList,
                               const SDValue *Ops, unsigned NumOps,
                               EVT MemVT, MachinePointerInfo PtrInfo,
                               unsigned Align = 0, bool Vol = false,
-                              bool ReadMem = true, bool WriteMem = true);
+                              bool ReadMem = true, bool WriteMem = true,
+                              unsigned TargetFlags = 0);
 
   SDValue getMemIntrinsicNode(unsigned Opcode, DebugLoc dl, SDVTList VTList,
                               const SDValue *Ops, unsigned NumOps,
-                              EVT MemVT, MachineMemOperand *MMO);
+                              EVT MemVT, MachineMemOperand *MMO,
+                              unsigned TargetFlags = 0);
 
   /// getMergeValues - Create a MERGE_VALUES node from the given operands.
   SDValue getMergeValues(const SDValue *Ops, unsigned NumOps, DebugLoc dl);
diff --git a/include/llvm/CodeGen/SelectionDAGNodes.h b/include/llvm/CodeGen/SelectionDAGNodes.h
index 2c34b4f..8e88834 100644
--- a/include/llvm/CodeGen/SelectionDAGNodes.h
+++ b/include/llvm/CodeGen/SelectionDAGNodes.h
@@ -1013,15 +1013,20 @@ public:
 class AtomicSDNode : public MemSDNode {
   SDUse Ops[4];
 
-  void InitAtomic(AtomicOrdering Ordering, SynchronizationScope SynchScope) {
+  void InitAtomic(AtomicOrdering Ordering, unsigned TargetFlags,
+                  SynchronizationScope SynchScope) {
     // This must match encodeMemSDNodeFlags() in SelectionDAG.cpp.
     assert((Ordering & 15) == Ordering &&
            "Ordering may not require more than 4 bits!");
+    assert((TargetFlags & 3) == TargetFlags &&
+           "TargetFlags may not require more than 2 bits!");
     assert((SynchScope & 1) == SynchScope &&
            "SynchScope may not require more than 1 bit!");
     SubclassData |= Ordering << 8;
+    SubclassData |= TargetFlags;
     SubclassData |= SynchScope << 12;
     assert(getOrdering() == Ordering && "Ordering encoding error!");
+    assert(getTargetFlags() == TargetFlags && "TargetFlags encoding error!");
     assert(getSynchScope() == SynchScope && "Synch-scope encoding error!");
   }
 
@@ -1037,28 +1042,34 @@ public:
   AtomicSDNode(unsigned Opc, DebugLoc dl, SDVTList VTL, EVT MemVT,
                SDValue Chain, SDValue Ptr,
                SDValue Cmp, SDValue Swp, MachineMemOperand *MMO,
-               AtomicOrdering Ordering, SynchronizationScope SynchScope)
+               AtomicOrdering Ordering, unsigned TargetFlags,
+               SynchronizationScope SynchScope)
     : MemSDNode(Opc, dl, VTL, MemVT, MMO) {
-    InitAtomic(Ordering, SynchScope);
+    InitAtomic(Ordering, TargetFlags, SynchScope);
     InitOperands(Ops, Chain, Ptr, Cmp, Swp);
   }
   AtomicSDNode(unsigned Opc, DebugLoc dl, SDVTList VTL, EVT MemVT,
                SDValue Chain, SDValue Ptr,
                SDValue Val, MachineMemOperand *MMO,
-               AtomicOrdering Ordering, SynchronizationScope SynchScope)
+               AtomicOrdering Ordering, unsigned TargetFlags,
+               SynchronizationScope SynchScope)
     : MemSDNode(Opc, dl, VTL, MemVT, MMO) {
-    InitAtomic(Ordering, SynchScope);
+    InitAtomic(Ordering, TargetFlags, SynchScope);
     InitOperands(Ops, Chain, Ptr, Val);
   }
   AtomicSDNode(unsigned Opc, DebugLoc dl, SDVTList VTL, EVT MemVT,
                SDValue Chain, SDValue Ptr,
                MachineMemOperand *MMO,
-               AtomicOrdering Ordering, SynchronizationScope SynchScope)
+               AtomicOrdering Ordering, unsigned TargetFlags,
+               SynchronizationScope SynchScope)
     : MemSDNode(Opc, dl, VTL, MemVT, MMO) {
-    InitAtomic(Ordering, SynchScope);
+    InitAtomic(Ordering, TargetFlags, SynchScope);
     InitOperands(Ops, Chain, Ptr);
   }
 
+  /// getTargetFlags - Return target-specific flags.
+  unsigned getTargetFlags() const { return SubclassData & 3; }
+
   const SDValue &getBasePtr() const { return getOperand(1); }
   const SDValue &getVal() const { return getOperand(2); }
 
@@ -1094,10 +1105,17 @@ class MemIntrinsicSDNode : public MemSDNode {
 public:
   MemIntrinsicSDNode(unsigned Opc, DebugLoc dl, SDVTList VTs,
                      const SDValue *Ops, unsigned NumOps,
-                     EVT MemoryVT, MachineMemOperand *MMO)
+                     EVT MemoryVT, MachineMemOperand *MMO,
+                     unsigned TargetFlags = 0)
     : MemSDNode(Opc, dl, VTs, Ops, NumOps, MemoryVT, MMO) {
+    assert((TargetFlags & 3) == TargetFlags &&
+           "TargetFlags may not require more than 2 bits!");
+    SubclassData |= TargetFlags;
   }
 
+  /// getTargetFlags - Return target-specific flags.
+  unsigned getTargetFlags() const { return SubclassData & 3; }
+
   // Methods to support isa and dyn_cast
   static bool classof(const SDNode *N) {
     // We lower some target intrinsics to their target opcode
diff --git a/lib/CodeGen/SelectionDAG/LegalizeDAG.cpp b/lib/CodeGen/SelectionDAG/LegalizeDAG.cpp
index a9d40d0..18c1d16 100644
--- a/lib/CodeGen/SelectionDAG/LegalizeDAG.cpp
+++ b/lib/CodeGen/SelectionDAG/LegalizeDAG.cpp
@@ -2799,6 +2799,7 @@ void SelectionDAGLegalize::ExpandNode(SDNode *Node) {
                                  Node->getOperand(1), Zero, Zero,
                                  cast<AtomicSDNode>(Node)->getMemOperand(),
                                  cast<AtomicSDNode>(Node)->getOrdering(),
+                                 cast<AtomicSDNode>(Node)->getTargetFlags(),
                                  cast<AtomicSDNode>(Node)->getSynchScope());
     Results.push_back(Swap.getValue(0));
     Results.push_back(Swap.getValue(1));
@@ -2812,6 +2813,7 @@ void SelectionDAGLegalize::ExpandNode(SDNode *Node) {
                                  Node->getOperand(1), Node->getOperand(2),
                                  cast<AtomicSDNode>(Node)->getMemOperand(),
                                  cast<AtomicSDNode>(Node)->getOrdering(),
+                                 cast<AtomicSDNode>(Node)->getTargetFlags(),
                                  cast<AtomicSDNode>(Node)->getSynchScope());
     Results.push_back(Swap.getValue(1));
     break;
diff --git a/lib/CodeGen/SelectionDAG/LegalizeIntegerTypes.cpp b/lib/CodeGen/SelectionDAG/LegalizeIntegerTypes.cpp
index 182b7f3..a648940 100644
--- a/lib/CodeGen/SelectionDAG/LegalizeIntegerTypes.cpp
+++ b/lib/CodeGen/SelectionDAG/LegalizeIntegerTypes.cpp
@@ -169,7 +169,8 @@ SDValue DAGTypeLegalizer::PromoteIntRes_Atomic0(AtomicSDNode *N) {
  SDValue Res = DAG.getAtomic(N->getOpcode(), N->getDebugLoc(),
                               N->getMemoryVT(), ResVT,
                               N->getChain(), N->getBasePtr(),
-                              N->getMemOperand(), N->getOrdering(),
+                              N->getMemOperand(),
+                              N->getOrdering(), N->getTargetFlags(),
                               N->getSynchScope());
   // Legalized the chain result - switch anything that used the old chain to
   // use the new one.
@@ -182,7 +183,8 @@ SDValue DAGTypeLegalizer::PromoteIntRes_Atomic1(AtomicSDNode *N) {
  SDValue Res = DAG.getAtomic(N->getOpcode(), N->getDebugLoc(),
                               N->getMemoryVT(),
                               N->getChain(), N->getBasePtr(),
-                              Op2, N->getMemOperand(), N->getOrdering(),
+                              Op2, N->getMemOperand(),
+                              N->getOrdering(), N->getTargetFlags(),
                               N->getSynchScope());
   // Legalized the chain result - switch anything that used the old chain to
   // use the new one.
@@ -195,7 +197,8 @@ SDValue DAGTypeLegalizer::PromoteIntRes_Atomic2(AtomicSDNode *N) {
  SDValue Op3 = GetPromotedInteger(N->getOperand(3));
   SDValue Res = DAG.getAtomic(N->getOpcode(), N->getDebugLoc(),
                               N->getMemoryVT(), N->getChain(), N->getBasePtr(),
-                              Op2, Op3, N->getMemOperand(), N->getOrdering(),
+                              Op2, Op3, N->getMemOperand(),
+                              N->getOrdering(), N->getTargetFlags(),
                               N->getSynchScope());
   // Legalized the chain result - switch anything that used the old chain to
   // use the new one.
@@ -853,7 +856,8 @@ SDValue DAGTypeLegalizer::PromoteIntOp_ATOMIC_STORE(AtomicSDNode *N) {
  SDValue Op2 = GetPromotedInteger(N->getOperand(2));
   return DAG.getAtomic(N->getOpcode(), N->getDebugLoc(), N->getMemoryVT(),
                        N->getChain(), N->getBasePtr(), Op2, N->getMemOperand(),
-                       N->getOrdering(), N->getSynchScope());
+                       N->getOrdering(), N->getTargetFlags(),
+                       N->getSynchScope());
 }
 
 SDValue DAGTypeLegalizer::PromoteIntOp_BITCAST(SDNode *N) {
@@ -2435,6 +2439,7 @@ void DAGTypeLegalizer::ExpandIntRes_ATOMIC_LOAD(SDNode *N,
                                N->getOperand(1), Zero, Zero,
                                cast<AtomicSDNode>(N)->getMemOperand(),
                                cast<AtomicSDNode>(N)->getOrdering(),
+                               cast<AtomicSDNode>(N)->getTargetFlags(),
                                cast<AtomicSDNode>(N)->getSynchScope());
   ReplaceValueWith(SDValue(N, 0), Swap.getValue(0));
   ReplaceValueWith(SDValue(N, 1), Swap.getValue(1));
@@ -2859,6 +2864,7 @@ SDValue DAGTypeLegalizer::ExpandIntOp_ATOMIC_STORE(SDNode *N) {
                                N->getOperand(1), N->getOperand(2),
                                cast<AtomicSDNode>(N)->getMemOperand(),
                                cast<AtomicSDNode>(N)->getOrdering(),
+                               cast<AtomicSDNode>(N)->getTargetFlags(),
                                cast<AtomicSDNode>(N)->getSynchScope());
   return Swap.getValue(1);
 }
diff --git a/lib/CodeGen/SelectionDAG/SelectionDAG.cpp b/lib/CodeGen/SelectionDAG/SelectionDAG.cpp
index 09885d8..68f417b 100644
--- a/lib/CodeGen/SelectionDAG/SelectionDAG.cpp
+++ b/lib/CodeGen/SelectionDAG/SelectionDAG.cpp
@@ -4061,7 +4061,7 @@ SDValue SelectionDAG::getAtomic(unsigned Opcode, DebugLoc dl, EVT MemVT,
                                SDValue Chain, SDValue Ptr, SDValue Cmp,
                                 SDValue Swp, MachinePointerInfo PtrInfo,
                                 unsigned Alignment,
-                                AtomicOrdering Ordering,
+                                AtomicOrdering Ordering, unsigned TargetFlags,
                                 SynchronizationScope SynchScope) {
   if (Alignment == 0)  // Ensure that codegen never sees alignment 0
     Alignment = getEVTAlignment(MemVT);
@@ -4082,14 +4082,14 @@ SDValue SelectionDAG::getAtomic(unsigned Opcode, DebugLoc dl, EVT MemVT,
     MF.getMachineMemOperand(PtrInfo, Flags, MemVT.getStoreSize(), Alignment);
 
   return getAtomic(Opcode, dl, MemVT, Chain, Ptr, Cmp, Swp, MMO,
-                   Ordering, SynchScope);
+                   Ordering, TargetFlags, SynchScope);
 }
 
 SDValue SelectionDAG::getAtomic(unsigned Opcode, DebugLoc dl, EVT MemVT,
                                 SDValue Chain,
                                 SDValue Ptr, SDValue Cmp,
                                 SDValue Swp, MachineMemOperand *MMO,
-                                AtomicOrdering Ordering,
+                                AtomicOrdering Ordering, unsigned TargetFlags,
                                 SynchronizationScope SynchScope) {
   assert(Opcode == ISD::ATOMIC_CMP_SWAP && "Invalid Atomic Op");
   assert(Cmp.getValueType() == Swp.getValueType() && "Invalid Atomic Op Types");
@@ -4109,7 +4109,7 @@ SDValue SelectionDAG::getAtomic(unsigned Opcode, DebugLoc dl, EVT MemVT,
  }
   SDNode *N = new (NodeAllocator) AtomicSDNode(Opcode, dl, VTs, MemVT, Chain,
                                                Ptr, Cmp, Swp, MMO, Ordering,
-                                               SynchScope);
+                                               TargetFlags, SynchScope);
   CSEMap.InsertNode(N, IP);
   AllNodes.push_back(N);
   return SDValue(N, 0);
@@ -4120,7 +4120,7 @@ SDValue SelectionDAG::getAtomic(unsigned Opcode, DebugLoc dl, EVT MemVT,
                                SDValue Ptr, SDValue Val,
                                 const Value* PtrVal,
                                 unsigned Alignment,
-                                AtomicOrdering Ordering,
+                                AtomicOrdering Ordering, unsigned TargetFlags,
                                 SynchronizationScope SynchScope) {
   if (Alignment == 0)  // Ensure that codegen never sees alignment 0
     Alignment = getEVTAlignment(MemVT);
@@ -4143,14 +4143,14 @@ SDValue SelectionDAG::getAtomic(unsigned Opcode, DebugLoc dl, EVT MemVT,
                            MemVT.getStoreSize(), Alignment);
 
   return getAtomic(Opcode, dl, MemVT, Chain, Ptr, Val, MMO,
-                   Ordering, SynchScope);
+                   Ordering, TargetFlags, SynchScope);
 }
 
 SDValue SelectionDAG::getAtomic(unsigned Opcode, DebugLoc dl, EVT MemVT,
                                 SDValue Chain,
                                 SDValue Ptr, SDValue Val,
                                 MachineMemOperand *MMO,
-                                AtomicOrdering Ordering,
+                                AtomicOrdering Ordering, unsigned TargetFlags,
                                 SynchronizationScope SynchScope) {
   assert((Opcode == ISD::ATOMIC_LOAD_ADD ||
           Opcode == ISD::ATOMIC_LOAD_SUB ||
@@ -4181,8 +4181,8 @@ SDValue SelectionDAG::getAtomic(unsigned Opcode, DebugLoc dl, EVT MemVT,
    return SDValue(E, 0);
   }
   SDNode *N = new (NodeAllocator) AtomicSDNode(Opcode, dl, VTs, MemVT, Chain,
-                                               Ptr, Val, MMO,
-                                               Ordering, SynchScope);
+                                               Ptr, Val, MMO, Ordering,
+                                               TargetFlags, SynchScope);
   CSEMap.InsertNode(N, IP);
   AllNodes.push_back(N);
   return SDValue(N, 0);
@@ -4193,7 +4193,7 @@ SDValue SelectionDAG::getAtomic(unsigned Opcode, DebugLoc dl, EVT MemVT,
                                SDValue Ptr,
                                 const Value* PtrVal,
                                 unsigned Alignment,
-                                AtomicOrdering Ordering,
+                                AtomicOrdering Ordering, unsigned TargetFlags,
                                 SynchronizationScope SynchScope) {
   if (Alignment == 0)  // Ensure that codegen never sees alignment 0
     Alignment = getEVTAlignment(MemVT);
@@ -4216,14 +4216,14 @@ SDValue SelectionDAG::getAtomic(unsigned Opcode, DebugLoc dl, EVT MemVT,
                            MemVT.getStoreSize(), Alignment);
 
   return getAtomic(Opcode, dl, MemVT, VT, Chain, Ptr, MMO,
-                   Ordering, SynchScope);
+                   Ordering, TargetFlags, SynchScope);
 }
 
 SDValue SelectionDAG::getAtomic(unsigned Opcode, DebugLoc dl, EVT MemVT,
                                 EVT VT, SDValue Chain,
                                 SDValue Ptr,
                                 MachineMemOperand *MMO,
-                                AtomicOrdering Ordering,
+                                AtomicOrdering Ordering, unsigned TargetFlags,
                                 SynchronizationScope SynchScope) {
   assert(Opcode == ISD::ATOMIC_LOAD && "Invalid Atomic Op");
 
@@ -4239,7 +4239,8 @@ SDValue SelectionDAG::getAtomic(unsigned Opcode, DebugLoc dl, EVT MemVT,
    return SDValue(E, 0);
   }
   SDNode *N = new (NodeAllocator) AtomicSDNode(Opcode, dl, VTs, MemVT, Chain,
-                                               Ptr, MMO, Ordering, SynchScope);
+                                               Ptr, MMO, Ordering, TargetFlags,
+                                               SynchScope);
   CSEMap.InsertNode(N, IP);
   AllNodes.push_back(N);
   return SDValue(N, 0);
@@ -4265,10 +4266,11 @@ SelectionDAG::getMemIntrinsicNode(unsigned Opcode, DebugLoc dl,
                                  const SDValue *Ops, unsigned NumOps,
                                   EVT MemVT, MachinePointerInfo PtrInfo,
                                   unsigned Align, bool Vol,
-                                  bool ReadMem, bool WriteMem) {
+                                  bool ReadMem, bool WriteMem,
+                                  unsigned TargetFlags) {
   return getMemIntrinsicNode(Opcode, dl, makeVTList(VTs, NumVTs), Ops, NumOps,
                              MemVT, PtrInfo, Align, Vol,
-                             ReadMem, WriteMem);
+                             ReadMem, WriteMem, TargetFlags);
 }
 
 SDValue
@@ -4276,7 +4278,8 @@ SelectionDAG::getMemIntrinsicNode(unsigned Opcode, DebugLoc dl, SDVTList VTList,
                                  const SDValue *Ops, unsigned NumOps,
                                   EVT MemVT, MachinePointerInfo PtrInfo,
                                   unsigned Align, bool Vol,
-                                  bool ReadMem, bool WriteMem) {
+                                  bool ReadMem, bool WriteMem,
+                                  unsigned TargetFlags) {
   if (Align == 0)  // Ensure that codegen never sees alignment 0
     Align = getEVTAlignment(MemVT);
 
@@ -4291,13 +4294,14 @@ SelectionDAG::getMemIntrinsicNode(unsigned Opcode, DebugLoc dl, SDVTList VTList,
  MachineMemOperand *MMO =
     MF.getMachineMemOperand(PtrInfo, Flags, MemVT.getStoreSize(), Align);
 
-  return getMemIntrinsicNode(Opcode, dl, VTList, Ops, NumOps, MemVT, MMO);
+  return getMemIntrinsicNode(Opcode, dl, VTList, Ops, NumOps, MemVT, MMO, TargetFlags);
 }
 
 SDValue
 SelectionDAG::getMemIntrinsicNode(unsigned Opcode, DebugLoc dl, SDVTList VTList,
                                   const SDValue *Ops, unsigned NumOps,
-                                  EVT MemVT, MachineMemOperand *MMO) {
+                                  EVT MemVT, MachineMemOperand *MMO,
+                                  unsigned TargetFlags) {
   assert((Opcode == ISD::INTRINSIC_VOID ||
           Opcode == ISD::INTRINSIC_W_CHAIN ||
           Opcode == ISD::PREFETCH ||
@@ -4320,11 +4324,11 @@ SelectionDAG::getMemIntrinsicNode(unsigned Opcode, DebugLoc dl, SDVTList VTList,
    }
 
     N = new (NodeAllocator) MemIntrinsicSDNode(Opcode, dl, VTList, Ops, NumOps,
-                                               MemVT, MMO);
+                                               MemVT, MMO, TargetFlags);
     CSEMap.InsertNode(N, IP);
   } else {
     N = new (NodeAllocator) MemIntrinsicSDNode(Opcode, dl, VTList, Ops, NumOps,
-                                               MemVT, MMO);
+                                               MemVT, MMO, TargetFlags);
   }
   AllNodes.push_back(N);
   return SDValue(N, 0);
diff --git a/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp b/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
index 3a55696..0aa4be5 100644
--- a/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
+++ b/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
@@ -3423,9 +3423,25 @@ static SDValue InsertFenceForAtomic(SDValue Chain, AtomicOrdering Order,
  return DAG.getNode(ISD::ATOMIC_FENCE, dl, MVT::Other, Ops, 3);
 }
 
+static unsigned GetAtomicTargetFlags(const Instruction &I) {
+  const MDNode* TargetFlagsInfo = I.getMetadata("targetflags");
+
+  if (!TargetFlagsInfo)
+    return 0;
+
+  assert((TargetFlagsInfo->getNumOperands() > 0) &&
+         "'targetflags' requires 1 operand!");
+  const ConstantInt *CI =
+    dyn_cast<ConstantInt>(TargetFlagsInfo->getOperand(0));
+  assert(CI && "'targetflags' not a constant integer!");
+
+  return CI->getZExtValue();
+}
+
 void SelectionDAGBuilder::visitAtomicCmpXchg(const AtomicCmpXchgInst &I) {
   DebugLoc dl = getCurDebugLoc();
   AtomicOrdering Order = I.getOrdering();
+  unsigned TargetFlags = GetAtomicTargetFlags(I);
   SynchronizationScope Scope = I.getSynchScope();
 
   SDValue InChain = getRoot();
@@ -3443,6 +3459,7 @@ void SelectionDAGBuilder::visitAtomicCmpXchg(const AtomicCmpXchgInst &I) {
                  getValue(I.getNewValOperand()),
                   MachinePointerInfo(I.getPointerOperand()), 0 /* Alignment */,
                   TLI.getInsertFencesForAtomic() ? Monotonic : Order,
+                  TargetFlags,
                   Scope);
 
   SDValue OutChain = L.getValue(1);
@@ -3473,6 +3490,7 @@ void SelectionDAGBuilder::visitAtomicRMW(const AtomicRMWInst &I) {
  case AtomicRMWInst::UMin: NT = ISD::ATOMIC_LOAD_UMIN; break;
   }
   AtomicOrdering Order = I.getOrdering();
+  unsigned TargetFlags = GetAtomicTargetFlags(I);
   SynchronizationScope Scope = I.getSynchScope();
 
   SDValue InChain = getRoot();
@@ -3489,6 +3507,7 @@ void SelectionDAGBuilder::visitAtomicRMW(const AtomicRMWInst &I) {
                  getValue(I.getValOperand()),
                   I.getPointerOperand(), 0 /* Alignment */,
                   TLI.getInsertFencesForAtomic() ? Monotonic : Order,
+                  TargetFlags,
                   Scope);
 
   SDValue OutChain = L.getValue(1);
@@ -3513,6 +3532,7 @@ void SelectionDAGBuilder::visitFence(const FenceInst &I) {
 void SelectionDAGBuilder::visitAtomicLoad(const LoadInst &I) {
   DebugLoc dl = getCurDebugLoc();
   AtomicOrdering Order = I.getOrdering();
+  unsigned TargetFlags = GetAtomicTargetFlags(I);
   SynchronizationScope Scope = I.getSynchScope();
 
   SDValue InChain = getRoot();
@@ -3527,6 +3547,7 @@ void SelectionDAGBuilder::visitAtomicLoad(const LoadInst &I) {
                   getValue(I.getPointerOperand()),
                   I.getPointerOperand(), I.getAlignment(),
                   TLI.getInsertFencesForAtomic() ? Monotonic : Order,
+                  TargetFlags,
                   Scope);
 
   SDValue OutChain = L.getValue(1);
@@ -3543,6 +3564,7 @@ void SelectionDAGBuilder::visitAtomicStore(const StoreInst &I) {
  DebugLoc dl = getCurDebugLoc();
 
   AtomicOrdering Order = I.getOrdering();
+  unsigned TargetFlags = GetAtomicTargetFlags(I);
   SynchronizationScope Scope = I.getSynchScope();
 
   SDValue InChain = getRoot();
@@ -3563,6 +3585,7 @@ void SelectionDAGBuilder::visitAtomicStore(const StoreInst &I) {
                  getValue(I.getValueOperand()),
                   I.getPointerOperand(), I.getAlignment(),
                   TLI.getInsertFencesForAtomic() ? Monotonic : Order,
+                  TargetFlags,
                   Scope);
 
   if (TLI.getInsertFencesForAtomic())
diff --git a/lib/Target/AArch64/AArch64ISelLowering.cpp b/lib/Target/AArch64/AArch64ISelLowering.cpp
index cea7f91..66f6eec 100644
--- a/lib/Target/AArch64/AArch64ISelLowering.cpp
+++ b/lib/Target/AArch64/AArch64ISelLowering.cpp
@@ -2414,6 +2414,7 @@ static SDValue PerformATOMIC_FENCECombine(SDNode *FenceNode,
                              Chain,                  // Chain
                              AtomicOp.getOperand(1), // Pointer
                              AtomicNode->getMemOperand(), Acquire,
+                             AtomicNode->getTargetFlags(),
                              FenceScope);
 
   if (AtomicNode->getOpcode() == ISD::ATOMIC_LOAD)
@@ -2447,6 +2448,7 @@ static SDValue PerformATOMIC_STORECombine(SDNode *N,
                        AtomicNode->getOperand(1),       // Pointer
                        AtomicNode->getOperand(2),       // Value
                        AtomicNode->getMemOperand(), Release,
+                       AtomicNode->getTargetFlags(),
                        FenceScope);
 }
 
diff --git a/lib/Target/X86/X86ISelLowering.cpp b/lib/Target/X86/X86ISelLowering.cpp
index 9ed03cd..d525e3d 100644
--- a/lib/Target/X86/X86ISelLowering.cpp
+++ b/lib/Target/X86/X86ISelLowering.cpp
@@ -11926,9 +11926,10 @@ static SDValue LowerCMP_SWAP(SDValue Op, const X86Subtarget *Subtarget,
                    DAG.getTargetConstant(size, MVT::i8),
                     cpIn.getValue(1) };
   SDVTList Tys = DAG.getVTList(MVT::Other, MVT::Glue);
-  MachineMemOperand *MMO = cast<AtomicSDNode>(Op)->getMemOperand();
+  const AtomicSDNode *AT = cast<AtomicSDNode>(Op);
   SDValue Result = DAG.getMemIntrinsicNode(X86ISD::LCMPXCHG_DAG, DL, Tys,
-                                           Ops, 5, T, MMO);
+                                           Ops, 5, T, AT->getMemOperand(),
+                                           AT->getTargetFlags());
   SDValue cpOut =
     DAG.getCopyFromReg(Result.getValue(0), DL, Reg, T, Result.getValue(1));
   return cpOut;
@@ -11986,6 +11987,7 @@ static SDValue LowerLOAD_SUB(SDValue Op, SelectionDAG &DAG) {
                        cast<AtomicSDNode>(Node)->getSrcValue(),
                        cast<AtomicSDNode>(Node)->getAlignment(),
                        cast<AtomicSDNode>(Node)->getOrdering(),
+                       cast<AtomicSDNode>(Node)->getTargetFlags(),
                        cast<AtomicSDNode>(Node)->getSynchScope());
 }
 
@@ -12007,6 +12009,7 @@ static SDValue LowerATOMIC_STORE(SDValue Op, SelectionDAG &DAG) {
                                  Node->getOperand(1), Node->getOperand(2),
                                  cast<AtomicSDNode>(Node)->getMemOperand(),
                                  cast<AtomicSDNode>(Node)->getOrdering(),
+                                 cast<AtomicSDNode>(Node)->getTargetFlags(),
                                  cast<AtomicSDNode>(Node)->getSynchScope());
     return Swap.getValue(1);
   }
@@ -12179,6 +12182,7 @@ static void ReplaceATOMIC_LOAD(SDNode *Node,
                                Node->getOperand(1), Zero, Zero,
                                cast<AtomicSDNode>(Node)->getMemOperand(),
                                cast<AtomicSDNode>(Node)->getOrdering(),
+                               cast<AtomicSDNode>(Node)->getTargetFlags(),
                                cast<AtomicSDNode>(Node)->getSynchScope());
   Results.push_back(Swap.getValue(0));
   Results.push_back(Swap.getValue(1));
@@ -12199,9 +12203,10 @@ ReplaceATOMIC_BINARY_64(SDNode *Node, SmallVectorImpl<SDValue>&Results,
                              Node->getOperand(2), DAG.getIntPtrConstant(1));
   SDValue Ops[] = { Chain, In1, In2L, In2H };
   SDVTList Tys = DAG.getVTList(MVT::i32, MVT::i32, MVT::Other);
-  SDValue Result =
-    DAG.getMemIntrinsicNode(NewOp, dl, Tys, Ops, 4, MVT::i64,
-                            cast<MemSDNode>(Node)->getMemOperand());
+  const AtomicSDNode *AT = cast<AtomicSDNode>(Node);
+  SDValue Result = DAG.getMemIntrinsicNode(NewOp, dl, Tys, Ops, 4, MVT::i64,
+                                           AT->getMemOperand(),
+                                           AT->getTargetFlags());
   SDValue OpsF[] = { Result.getValue(0), Result.getValue(1)};
   Results.push_back(DAG.getNode(ISD::BUILD_PAIR, dl, MVT::i64, OpsF, 2));
   Results.push_back(Result.getValue(2));
@@ -12314,11 +12319,12 @@ void X86TargetLowering::ReplaceNodeResults(SDNode *N,
                       N->getOperand(1),
                       swapInH.getValue(1) };
     SDVTList Tys = DAG.getVTList(MVT::Other, MVT::Glue);
-    MachineMemOperand *MMO = cast<AtomicSDNode>(N)->getMemOperand();
+    const AtomicSDNode *AT = cast<AtomicSDNode>(N);
     unsigned Opcode = Regs64bit ? X86ISD::LCMPXCHG16_DAG :
                                   X86ISD::LCMPXCHG8_DAG;
-    SDValue Result = DAG.getMemIntrinsicNode(Opcode, dl, Tys,
-                                             Ops, 3, T, MMO);
+    SDValue Result = DAG.getMemIntrinsicNode(Opcode, dl, Tys, Ops, 3, T,
+                                             AT->getMemOperand(),
+                                             AT->getTargetFlags());
     SDValue cpOutL = DAG.getCopyFromReg(Result.getValue(0), dl,
                                         Regs64bit ? X86::RAX : X86::EAX,
                                         HalfT, Result.getValue(1));
-- 
1.7.9.5


["0002-Add-HLE-target-feature.patch" (0002-Add-HLE-target-feature.patch)]

From 5f18d83c4c633c43becfcb2557f831e3df717815 Mon Sep 17 00:00:00 2001
From: Michael Liao <michael.hliao@gmail.com>
Date: Thu, 5 Jul 2012 23:38:57 -0700
Subject: [PATCH 2/4] Add HLE target feature

---
 lib/Target/X86/X86.td           |    4 +++-
 lib/Target/X86/X86InstrInfo.td  |    1 +
 lib/Target/X86/X86Subtarget.cpp |    5 +++++
 lib/Target/X86/X86Subtarget.h   |    4 ++++
 4 files changed, 13 insertions(+), 1 deletion(-)

diff --git a/lib/Target/X86/X86.td b/lib/Target/X86/X86.td
index 0216252..810acee 100644
--- a/lib/Target/X86/X86.td
+++ b/lib/Target/X86/X86.td
@@ -120,6 +120,8 @@ def FeatureBMI2    : SubtargetFeature<"bmi2", "HasBMI2", "true",
                                       "Support BMI2 instructions">;
 def FeatureRTM     : SubtargetFeature<"rtm", "HasRTM", "true",
                                       "Support RTM instructions">;
+def FeatureHLE     : SubtargetFeature<"hle", "HasHLE", "true",
+                                      "Support HLE">;
 def FeatureADX     : SubtargetFeature<"adx", "HasADX", "true",
                                       "Support ADX instructions">;
 def FeatureLeaForSP : SubtargetFeature<"lea-sp", "UseLeaForSP", "true",
@@ -201,7 +203,7 @@ def : Proc<"core-avx2",       [FeatureAVX2, FeatureCMPXCHG16B, FeatureFastUAMem,
                                FeatureRDRAND, FeatureF16C, FeatureFSGSBase,
                                FeatureMOVBE, FeatureLZCNT, FeatureBMI,
                                FeatureBMI2, FeatureFMA,
-                               FeatureRTM]>;
+                               FeatureRTM, FeatureHLE]>;
 
 def : Proc<"k6",              [FeatureMMX]>;
 def : Proc<"k6-2",            [Feature3DNow]>;
diff --git a/lib/Target/X86/X86InstrInfo.td b/lib/Target/X86/X86InstrInfo.td
index 84c278c..46daaad 100644
--- a/lib/Target/X86/X86InstrInfo.td
+++ b/lib/Target/X86/X86InstrInfo.td
@@ -603,6 +603,7 @@ def HasLZCNT     : Predicate<"Subtarget->hasLZCNT()">;
 def HasBMI       : Predicate<"Subtarget->hasBMI()">;
 def HasBMI2      : Predicate<"Subtarget->hasBMI2()">;
 def HasRTM       : Predicate<"Subtarget->hasRTM()">;
+def HasHLE       : Predicate<"Subtarget->hasHLE()">;
 def HasADX       : Predicate<"Subtarget->hasADX()">;
 def FPStackf32   : Predicate<"!Subtarget->hasSSE1()">;
 def FPStackf64   : Predicate<"!Subtarget->hasSSE2()">;
diff --git a/lib/Target/X86/X86Subtarget.cpp b/lib/Target/X86/X86Subtarget.cpp
index 0f2c008..a9955ce 100644
--- a/lib/Target/X86/X86Subtarget.cpp
+++ b/lib/Target/X86/X86Subtarget.cpp
@@ -310,6 +310,10 @@ void X86Subtarget::AutoDetectSubtargetFeatures() {
         HasBMI = true;
         ToggleFeature(X86::FeatureBMI);
       }
+      if ((EBX >> 4) & 0x1) {
+        HasHLE = true;
+        ToggleFeature(X86::FeatureHLE);
+      }
       if (IsIntel && ((EBX >> 5) & 0x1)) {
         X86SSELevel = AVX2;
         ToggleFeature(X86::FeatureAVX2);
@@ -439,6 +443,7 @@ void X86Subtarget::initializeEnvironment() {
   HasBMI = false;
   HasBMI2 = false;
   HasRTM = false;
+  HasHLE = false;
   HasADX = false;
   IsBTMemSlow = false;
   IsUAMemFast = false;
diff --git a/lib/Target/X86/X86Subtarget.h b/lib/Target/X86/X86Subtarget.h
index e97da4b..411494a 100644
--- a/lib/Target/X86/X86Subtarget.h
+++ b/lib/Target/X86/X86Subtarget.h
@@ -121,6 +121,9 @@ protected:
   /// HasRTM - Processor has RTM instructions.
   bool HasRTM;
 
+  /// HasHLE - Processor has HLE.
+  bool HasHLE;
+
   /// HasADX - Processor has ADX instructions.
   bool HasADX;
 
@@ -253,6 +256,7 @@ public:
   bool hasBMI() const { return HasBMI; }
   bool hasBMI2() const { return HasBMI2; }
   bool hasRTM() const { return HasRTM; }
+  bool hasHLE() const { return HasHLE; }
   bool hasADX() const { return HasADX; }
   bool isBTMemSlow() const { return IsBTMemSlow; }
   bool isUnalignedMemAccessFast() const { return IsUAMemFast; }
-- 
1.7.9.5


["0003-Add-XACQ-XREL-prefix-and-encoding-asm-printer-suppor.patch" (0003-Add-XACQ-XREL-prefix-and-encoding-asm-printer-suppor.patch)]

From 3fc0f1c4b089f16cc064437ba238c5e17b67ea04 Mon Sep 17 00:00:00 2001
From: Michael Liao <michael.hliao@gmail.com>
Date: Thu, 5 Jul 2012 21:32:14 -0700
Subject: [PATCH 3/4] Add XACQ/XREL prefix and encoding/asm-printer support

---
 lib/Target/X86/AsmParser/X86AsmParser.cpp          |    3 +-
 lib/Target/X86/InstPrinter/X86ATTInstPrinter.cpp   |   10 ++++++
 lib/Target/X86/InstPrinter/X86IntelInstPrinter.cpp |   10 ++++++
 lib/Target/X86/MCTargetDesc/X86BaseInfo.h          |   12 +++++++-
 lib/Target/X86/MCTargetDesc/X86MCCodeEmitter.cpp   |   10 ++++++
 lib/Target/X86/X86InstrFormats.td                  |   32 ++++++++++++--------
 lib/Target/X86/X86InstrInfo.td                     |    4 +++
 test/MC/X86/x86_64-hle-encoding.s                  |   25 +++++++++++++++
 8 files changed, 91 insertions(+), 15 deletions(-)
 create mode 100644 test/MC/X86/x86_64-hle-encoding.s

diff --git a/lib/Target/X86/AsmParser/X86AsmParser.cpp b/lib/Target/X86/AsmParser/X86AsmParser.cpp
index 8c4c447..b9dc8bb 100644
--- a/lib/Target/X86/AsmParser/X86AsmParser.cpp
+++ b/lib/Target/X86/AsmParser/X86AsmParser.cpp
@@ -1515,7 +1515,8 @@ ParseInstruction(ParseInstructionInfo &Info, StringRef Name, SMLoc NameLoc,
    Name == "lock" || Name == "rep" ||
     Name == "repe" || Name == "repz" ||
     Name == "repne" || Name == "repnz" ||
-    Name == "rex64" || Name == "data16";
+    Name == "rex64" || Name == "data16" ||
+    Name == "xacquire" || Name == "xrelease";
 
 
   // This does the actual operand parsing.  Don't parse any more if we have a
diff --git a/lib/Target/X86/InstPrinter/X86ATTInstPrinter.cpp b/lib/Target/X86/InstPrinter/X86ATTInstPrinter.cpp
index e357710..7764961 100644
--- a/lib/Target/X86/InstPrinter/X86ATTInstPrinter.cpp
+++ b/lib/Target/X86/InstPrinter/X86ATTInstPrinter.cpp
@@ -47,6 +47,16 @@ void X86ATTInstPrinter::printInst(const MCInst *MI, raw_ostream &OS,
  if (TSFlags & X86II::LOCK)
     OS << "\tlock\n";
 
+  if (TSFlags & X86II::XACQUIRE) {
+    assert(!(TSFlags & X86II::XRELEASE) && "unknown HLE prefix hints!");
+    OS << "\txacquire\n";
+  }
+
+  if (TSFlags & X86II::XRELEASE) {
+    assert(!(TSFlags & X86II::XACQUIRE) && "unknown HLE prefix hints!");
+    OS << "\txrelease\n";
+  }
+
   // Try to print any aliases first.
   if (!printAliasInstr(MI, OS))
     printInstruction(MI, OS);
diff --git a/lib/Target/X86/InstPrinter/X86IntelInstPrinter.cpp b/lib/Target/X86/InstPrinter/X86IntelInstPrinter.cpp
index 141f4a4..734dfe2 100644
--- a/lib/Target/X86/InstPrinter/X86IntelInstPrinter.cpp
+++ b/lib/Target/X86/InstPrinter/X86IntelInstPrinter.cpp
@@ -39,6 +39,16 @@ void X86IntelInstPrinter::printInst(const MCInst *MI, raw_ostream &OS,
  if (TSFlags & X86II::LOCK)
     OS << "\tlock\n";
 
+  if (TSFlags & X86II::XACQUIRE) {
+    assert(!(TSFlags & X86II::XRELEASE) && "unknown HLE prefix hints!");
+    OS << "\txacquire\n";
+  }
+
+  if (TSFlags & X86II::XRELEASE) {
+    assert(!(TSFlags & X86II::XACQUIRE) && "unknown HLE prefix hints!");
+    OS << "\txrelease\n";
+  }
+
   printInstruction(MI, OS);
 
   // Next always print the annotation.
diff --git a/lib/Target/X86/MCTargetDesc/X86BaseInfo.h b/lib/Target/X86/MCTargetDesc/X86BaseInfo.h
index 9e68388..fb21398 100644
--- a/lib/Target/X86/MCTargetDesc/X86BaseInfo.h
+++ b/lib/Target/X86/MCTargetDesc/X86BaseInfo.h
@@ -415,9 +415,19 @@ namespace X86II {
     LOCKShift = FPTypeShift + 3,
     LOCK = 1 << LOCKShift,
 
+    // TSX/HLE prefix
+    TSXShift = LOCKShift + 1,
+    TSXMask  = 3 << TSXShift,
+
+    // XACQUIRE - Specifies that this instruction has XACQUIRE HLE prefix hint
+    XACQUIRE = 1 << TSXShift,
+
+    // XRELEASE - Specifies that this instruction has XRELEASE HLE prefix hint
+    XRELEASE = 2 << TSXShift,
+
     // Segment override prefixes. Currently we just need ability to address
     // stuff in gs and fs segments.
-    SegOvrShift = LOCKShift + 1,
+    SegOvrShift = TSXShift + 2,
     SegOvrMask  = 3 << SegOvrShift,
     FS          = 1 << SegOvrShift,
     GS          = 2 << SegOvrShift,
diff --git a/lib/Target/X86/MCTargetDesc/X86MCCodeEmitter.cpp b/lib/Target/X86/MCTargetDesc/X86MCCodeEmitter.cpp
index 122204a..f227d7c 100644
--- a/lib/Target/X86/MCTargetDesc/X86MCCodeEmitter.cpp
+++ b/lib/Target/X86/MCTargetDesc/X86MCCodeEmitter.cpp
@@ -851,6 +851,16 @@ void X86MCCodeEmitter::EmitOpcodePrefix(uint64_t TSFlags, unsigned &CurByte,
  if (TSFlags & X86II::LOCK)
     EmitByte(0xF0, CurByte, OS);
 
+  if (TSFlags & X86II::XACQUIRE) {
+    assert(!(TSFlags & X86II::XRELEASE) && "unknown HLE prefix hints!");
+    EmitByte(0xF2, CurByte, OS);
+  }
+
+  if (TSFlags & X86II::XRELEASE) {
+    assert(!(TSFlags & X86II::XACQUIRE) && "unknown HLE prefix hints!");
+    EmitByte(0xF3, CurByte, OS);
+  }
+
   // Emit segment override opcode prefix as needed.
   EmitSegmentOverridePrefix(TSFlags, CurByte, MemOperand, MI, OS);
 
diff --git a/lib/Target/X86/X86InstrFormats.td b/lib/Target/X86/X86InstrFormats.td
index 44e574d..d5bd098 100644
--- a/lib/Target/X86/X86InstrFormats.td
+++ b/lib/Target/X86/X86InstrFormats.td
@@ -99,6 +99,8 @@ class OpSize { bit hasOpSizePrefix = 1; }
 class AdSize { bit hasAdSizePrefix = 1; }
 class REX_W  { bit hasREX_WPrefix = 1; }
 class LOCK   { bit hasLockPrefix = 1; }
+class XACQ   { bit hasXAcquire = 1; }
+class XREL   { bit hasXRelease = 1; }
 class SegFS  { bits<2> SegOvrBits = 1; }
 class SegGS  { bits<2> SegOvrBits = 2; }
 class TB     { bits<5> Prefix = 1; }
@@ -163,6 +165,8 @@ class X86Inst<bits<8> opcod, Format f, ImmType i, dag outs, dag ins,
  bit hasREX_WPrefix  = 0;  // Does this inst require the REX.W prefix?
   FPFormat FPForm = NotFP;  // What flavor of FP instruction is this?
   bit hasLockPrefix = 0;    // Does this inst have a 0xF0 prefix?
+  bit hasXAcquire = 0;      // Does this instruction require an XACQUIRE prefix?
+  bit hasXRelease = 0;      // Does this instruction require an XRELEASE prefix?
   bits<2> SegOvrBits = 0;   // Segment override prefix.
   Domain ExeDomain = d;
   bit hasVEXPrefix = 0;     // Does this inst require a VEX prefix?
@@ -187,19 +191,21 @@ class X86Inst<bits<8> opcod, Format f, ImmType i, dag outs, dag ins,
  let TSFlags{16-14} = ImmT.Value;
   let TSFlags{19-17} = FPForm.Value;
   let TSFlags{20}    = hasLockPrefix;
-  let TSFlags{22-21} = SegOvrBits;
-  let TSFlags{24-23} = ExeDomain.Value;
-  let TSFlags{32-25} = Opcode;
-  let TSFlags{33}    = hasVEXPrefix;
-  let TSFlags{34}    = hasVEX_WPrefix;
-  let TSFlags{35}    = hasVEX_4VPrefix;
-  let TSFlags{36}    = hasVEX_4VOp3Prefix;
-  let TSFlags{37}    = hasVEX_i8ImmReg;
-  let TSFlags{38}    = hasVEX_L;
-  let TSFlags{39}    = ignoresVEX_L;
-  let TSFlags{40}    = has3DNow0F0FOpcode;
-  let TSFlags{41}    = hasMemOp4Prefix;
-  let TSFlags{42}    = hasXOP_Prefix;
+  let TSFlags{21}    = hasXAcquire;
+  let TSFlags{22}    = hasXRelease;
+  let TSFlags{24-23} = SegOvrBits;
+  let TSFlags{26-25} = ExeDomain.Value;
+  let TSFlags{34-27} = Opcode;
+  let TSFlags{35}    = hasVEXPrefix;
+  let TSFlags{36}    = hasVEX_WPrefix;
+  let TSFlags{37}    = hasVEX_4VPrefix;
+  let TSFlags{38}    = hasVEX_4VOp3Prefix;
+  let TSFlags{39}    = hasVEX_i8ImmReg;
+  let TSFlags{40}    = hasVEX_L;
+  let TSFlags{41}    = ignoresVEX_L;
+  let TSFlags{42}    = has3DNow0F0FOpcode;
+  let TSFlags{43}    = hasMemOp4Prefix;
+  let TSFlags{44}    = hasXOP_Prefix;
 }
 
 class PseudoI<dag oops, dag iops, list<dag> pattern>
diff --git a/lib/Target/X86/X86InstrInfo.td b/lib/Target/X86/X86InstrInfo.td
index 46daaad..04d8f19 100644
--- a/lib/Target/X86/X86InstrInfo.td
+++ b/lib/Target/X86/X86InstrInfo.td
@@ -1460,6 +1460,10 @@ def REP_PREFIX : I<0xF3, RawFrm, (outs),  (ins), "rep", []>;
 def REPNE_PREFIX : I<0xF2, RawFrm, (outs),  (ins), "repne", []>;
 }
 
+// HLE hint prefix
+def : MnemonicAlias<"xacquire", "repne">;
+def : MnemonicAlias<"xrelease", "rep">;
+
 
 // String manipulation instructions
 def LODSB : I<0xAC, RawFrm, (outs), (ins), "lodsb", [], IIC_LODS>;
diff --git a/test/MC/X86/x86_64-hle-encoding.s b/test/MC/X86/x86_64-hle-encoding.s
new file mode 100644
index 0000000..4109fb4
--- /dev/null
+++ b/test/MC/X86/x86_64-hle-encoding.s
@@ -0,0 +1,25 @@
+// RUN: llvm-mc -triple x86_64-unknown-unknown --show-encoding %s | FileCheck %s
+
+// CHECK: lock
+// CHECK: encoding: [0xf0]
+// CHECK: repne
+// CHECK: encoding: [0xf2]
+	lock xacquire xaddq %rax, sym(%rip)
+
+// CHECK: repne
+// CHECK: encoding: [0xf2]
+// CHECK: lock
+// CHECK: encoding: [0xf0]
+	xacquire lock xaddq %rax, sym(%rip)
+
+// CHECK: lock
+// CHECK: encoding: [0xf0]
+// CHECK: rep
+// CHECK: encoding: [0xf3]
+	lock xrelease xaddq %rax, sym(%rip)
+
+// CHECK: rep
+// CHECK: encoding: [0xf3]
+// CHECK: lock
+// CHECK: encoding: [0xf0]
+	xrelease lock xaddq %rax, sym(%rip)
-- 
1.7.9.5


["0004-Enable-HLE-code-generation.patch" (0004-Enable-HLE-code-generation.patch)]

From 5cef473f18c43c646911c4d51f6e6a79293ff3fd Mon Sep 17 00:00:00 2001
From: Michael Liao <michael.hliao@gmail.com>
Date: Thu, 14 Feb 2013 22:05:25 -0800
Subject: [PATCH 4/4] Enable HLE code generation

- Add test cases
---
 lib/Target/X86/X86ISelDAGToDAG.cpp |  208 ++++++++++++++++++++----------------
 lib/Target/X86/X86ISelLowering.cpp |   40 +++++--
 lib/Target/X86/X86InstrCompiler.td |  130 ++++++++++++++++------
 lib/Target/X86/X86InstrInfo.td     |  162 +++++++++++++++++++++++++++-
 test/CodeGen/X86/hle-atomic16.ll   |  188 ++++++++++++++++++++++++++++++++
 test/CodeGen/X86/hle-atomic32.ll   |  188 ++++++++++++++++++++++++++++++++
 test/CodeGen/X86/hle-atomic64.ll   |  188 ++++++++++++++++++++++++++++++++
 test/CodeGen/X86/hle-atomic8.ll    |  188 ++++++++++++++++++++++++++++++++
 8 files changed, 1161 insertions(+), 131 deletions(-)
 create mode 100644 test/CodeGen/X86/hle-atomic16.ll
 create mode 100644 test/CodeGen/X86/hle-atomic32.ll
 create mode 100644 test/CodeGen/X86/hle-atomic64.ll
 create mode 100644 test/CodeGen/X86/hle-atomic8.ll

diff --git a/lib/Target/X86/X86ISelDAGToDAG.cpp b/lib/Target/X86/X86ISelDAGToDAG.cpp
index 6f13186..380df63 100644
--- a/lib/Target/X86/X86ISelDAGToDAG.cpp
+++ b/lib/Target/X86/X86ISelDAGToDAG.cpp
@@ -1494,12 +1494,20 @@ SDNode *X86DAGToDAGISel::SelectAtomic64(SDNode *Node, unsigned Opc) {
   SDValue In2L = Node->getOperand(2);
   SDValue In2H = Node->getOperand(3);
 
+  unsigned TargetFlags
+    = Subtarget->hasHLE() ? cast<MemIntrinsicSDNode>(Node)->getTargetFlags() :
+                            0;
+  assert(((TargetFlags & 3) != 3) && "unknown 'targetflags'");
+
+  SDValue TFlag = CurDAG->getTargetConstant(TargetFlags, MVT::i8);
+
   SDValue Tmp0, Tmp1, Tmp2, Tmp3, Tmp4;
   if (!SelectAddr(Node, In1, Tmp0, Tmp1, Tmp2, Tmp3, Tmp4))
     return NULL;
   MachineSDNode::mmo_iterator MemOp = MF->allocateMemRefsArray(1);
   MemOp[0] = cast<MemSDNode>(Node)->getMemOperand();
-  const SDValue Ops[] = { Tmp0, Tmp1, Tmp2, Tmp3, Tmp4, In2L, In2H, Chain};
+  const SDValue Ops[] = { Tmp0, Tmp1, Tmp2, Tmp3, Tmp4, In2L, In2H, TFlag,
+                          Chain};
   SDNode *ResNode = CurDAG->getMachineNode(Opc, Node->getDebugLoc(),
                                            MVT::i32, MVT::i32, MVT::Other, Ops,
                                            array_lengthof(Ops));
@@ -1535,97 +1543,104 @@ enum AtomicSz {
   AtomicSzEnd
 };
 
-static const uint16_t AtomicOpcTbl[AtomicOpcEnd][AtomicSzEnd] = {
+enum AtomicTargetFlags {
+  TargetFlagNone,
+  TargetFlagXAcquire,
+  TargetFlagXRelease,
+  AtomicTfEnd
+};
+
+static const uint16_t AtomicOpcTbl[AtomicOpcEnd][AtomicSzEnd][AtomicTfEnd] = {
   {
-    X86::LOCK_ADD8mi,
-    X86::LOCK_ADD8mr,
-    X86::LOCK_ADD16mi8,
-    X86::LOCK_ADD16mi,
-    X86::LOCK_ADD16mr,
-    X86::LOCK_ADD32mi8,
-    X86::LOCK_ADD32mi,
-    X86::LOCK_ADD32mr,
-    X86::LOCK_ADD64mi8,
-    X86::LOCK_ADD64mi32,
-    X86::LOCK_ADD64mr,
+    { X86::LOCK_ADD8mi, X86::LOCK_ADDACQ8mi, X86::LOCK_ADDREL8mi },
+    { X86::LOCK_ADD8mr, X86::LOCK_ADDACQ8mr, X86::LOCK_ADDREL8mr },
+    { X86::LOCK_ADD16mi8, X86::LOCK_ADDACQ16mi8, X86::LOCK_ADDREL16mi8 },
+    { X86::LOCK_ADD16mi, X86::LOCK_ADDACQ16mi, X86::LOCK_ADDREL16mi },
+    { X86::LOCK_ADD16mr, X86::LOCK_ADDACQ16mr, X86::LOCK_ADDREL16mr },
+    { X86::LOCK_ADD32mi8, X86::LOCK_ADDACQ32mi8, X86::LOCK_ADDREL32mi8 },
+    { X86::LOCK_ADD32mi, X86::LOCK_ADDACQ32mi, X86::LOCK_ADDREL32mi },
+    { X86::LOCK_ADD32mr, X86::LOCK_ADDACQ32mr, X86::LOCK_ADDREL32mr },
+    { X86::LOCK_ADD64mi8, X86::LOCK_ADDACQ64mi8, X86::LOCK_ADDREL64mi8 },
+    { X86::LOCK_ADD64mi32, X86::LOCK_ADDACQ64mi32, X86::LOCK_ADDREL64mi32 },
+    { X86::LOCK_ADD64mr, X86::LOCK_ADDACQ64mr, X86::LOCK_ADDREL64mr }
   },
   {
-    X86::LOCK_SUB8mi,
-    X86::LOCK_SUB8mr,
-    X86::LOCK_SUB16mi8,
-    X86::LOCK_SUB16mi,
-    X86::LOCK_SUB16mr,
-    X86::LOCK_SUB32mi8,
-    X86::LOCK_SUB32mi,
-    X86::LOCK_SUB32mr,
-    X86::LOCK_SUB64mi8,
-    X86::LOCK_SUB64mi32,
-    X86::LOCK_SUB64mr,
+    { X86::LOCK_SUB8mi, X86::LOCK_SUBACQ8mi, X86::LOCK_SUBREL8mi },
+    { X86::LOCK_SUB8mr, X86::LOCK_SUBACQ8mr, X86::LOCK_SUBREL8mr },
+    { X86::LOCK_SUB16mi8, X86::LOCK_SUBACQ16mi8, X86::LOCK_SUBREL16mi8 },
+    { X86::LOCK_SUB16mi, X86::LOCK_SUBACQ16mi, X86::LOCK_SUBREL16mi },
+    { X86::LOCK_SUB16mr, X86::LOCK_SUBACQ16mr, X86::LOCK_SUBREL16mr },
+    { X86::LOCK_SUB32mi8, X86::LOCK_SUBACQ32mi8, X86::LOCK_SUBREL32mi8 },
+    { X86::LOCK_SUB32mi, X86::LOCK_SUBACQ32mi, X86::LOCK_SUBREL32mi },
+    { X86::LOCK_SUB32mr, X86::LOCK_SUBACQ32mr, X86::LOCK_SUBREL32mr },
+    { X86::LOCK_SUB64mi8, X86::LOCK_SUBACQ64mi8, X86::LOCK_SUBREL64mi8 },
+    { X86::LOCK_SUB64mi32, X86::LOCK_SUBACQ64mi32, X86::LOCK_SUBREL64mi32 },
+    { X86::LOCK_SUB64mr, X86::LOCK_SUBACQ64mr, X86::LOCK_SUBREL64mr }
   },
   {
-    0,
-    X86::LOCK_INC8m,
-    0,
-    0,
-    X86::LOCK_INC16m,
-    0,
-    0,
-    X86::LOCK_INC32m,
-    0,
-    0,
-    X86::LOCK_INC64m,
+    { 0, 0, 0 },
+    { X86::LOCK_INC8m, X86::LOCK_INCACQ8m, X86::LOCK_INCREL8m },
+    { 0, 0, 0 },
+    { 0, 0, 0 },
+    { X86::LOCK_INC16m, X86::LOCK_INCACQ16m, X86::LOCK_INCREL16m },
+    { 0, 0, 0 },
+    { 0, 0, 0 },
+    { X86::LOCK_INC32m, X86::LOCK_INCACQ32m, X86::LOCK_INCREL32m },
+    { 0, 0, 0 },
+    { 0, 0, 0 },
+    { X86::LOCK_INC64m, X86::LOCK_INCACQ64m, X86::LOCK_INCREL64m }
   },
   {
-    0,
-    X86::LOCK_DEC8m,
-    0,
-    0,
-    X86::LOCK_DEC16m,
-    0,
-    0,
-    X86::LOCK_DEC32m,
-    0,
-    0,
-    X86::LOCK_DEC64m,
+    { 0, 0, 0 },
+    { X86::LOCK_DEC8m, X86::LOCK_DECACQ8m, X86::LOCK_DECREL8m },
+    { 0, 0, 0 },
+    { 0, 0, 0 },
+    { X86::LOCK_DEC16m, X86::LOCK_DECACQ16m, X86::LOCK_DECREL16m },
+    { 0, 0, 0 },
+    { 0, 0, 0 },
+    { X86::LOCK_DEC32m, X86::LOCK_DECACQ32m, X86::LOCK_DECREL32m },
+    { 0, 0, 0 },
+    { 0, 0, 0 },
+    { X86::LOCK_DEC64m, X86::LOCK_DECACQ64m, X86::LOCK_DECREL64m }
   },
   {
-    X86::LOCK_OR8mi,
-    X86::LOCK_OR8mr,
-    X86::LOCK_OR16mi8,
-    X86::LOCK_OR16mi,
-    X86::LOCK_OR16mr,
-    X86::LOCK_OR32mi8,
-    X86::LOCK_OR32mi,
-    X86::LOCK_OR32mr,
-    X86::LOCK_OR64mi8,
-    X86::LOCK_OR64mi32,
-    X86::LOCK_OR64mr,
+    { X86::LOCK_OR8mi, X86::LOCK_ORACQ8mi, X86::LOCK_ORREL8mi },
+    { X86::LOCK_OR8mr, X86::LOCK_ORACQ8mr, X86::LOCK_ORREL8mr },
+    { X86::LOCK_OR16mi8, X86::LOCK_ORACQ16mi8, X86::LOCK_ORREL16mi8 },
+    { X86::LOCK_OR16mi, X86::LOCK_ORACQ16mi, X86::LOCK_ORREL16mi },
+    { X86::LOCK_OR16mr, X86::LOCK_ORACQ16mr, X86::LOCK_ORREL16mr },
+    { X86::LOCK_OR32mi8, X86::LOCK_ORACQ32mi8, X86::LOCK_ORREL32mi8 },
+    { X86::LOCK_OR32mi, X86::LOCK_ORACQ32mi, X86::LOCK_ORREL32mi },
+    { X86::LOCK_OR32mr, X86::LOCK_ORACQ32mr, X86::LOCK_ORREL32mr },
+    { X86::LOCK_OR64mi8, X86::LOCK_ORACQ64mi8, X86::LOCK_ORREL64mi8 },
+    { X86::LOCK_OR64mi32, X86::LOCK_ORACQ64mi32, X86::LOCK_ORREL64mi32 },
+    { X86::LOCK_OR64mr, X86::LOCK_ORACQ64mr, X86::LOCK_ORREL64mr }
   },
   {
-    X86::LOCK_AND8mi,
-    X86::LOCK_AND8mr,
-    X86::LOCK_AND16mi8,
-    X86::LOCK_AND16mi,
-    X86::LOCK_AND16mr,
-    X86::LOCK_AND32mi8,
-    X86::LOCK_AND32mi,
-    X86::LOCK_AND32mr,
-    X86::LOCK_AND64mi8,
-    X86::LOCK_AND64mi32,
-    X86::LOCK_AND64mr,
+    { X86::LOCK_AND8mi, X86::LOCK_ANDACQ8mi, X86::LOCK_ANDREL8mi },
+    { X86::LOCK_AND8mr, X86::LOCK_ANDACQ8mr, X86::LOCK_ANDREL8mr },
+    { X86::LOCK_AND16mi8, X86::LOCK_ANDACQ16mi8, X86::LOCK_ANDREL16mi8 },
+    { X86::LOCK_AND16mi, X86::LOCK_ANDACQ16mi, X86::LOCK_ANDREL16mi },
+    { X86::LOCK_AND16mr, X86::LOCK_ANDACQ16mr, X86::LOCK_ANDREL16mr },
+    { X86::LOCK_AND32mi8, X86::LOCK_ANDACQ32mi8, X86::LOCK_ANDREL32mi8 },
+    { X86::LOCK_AND32mi, X86::LOCK_ANDACQ32mi, X86::LOCK_ANDREL32mi },
+    { X86::LOCK_AND32mr, X86::LOCK_ANDACQ32mr, X86::LOCK_ANDREL32mr },
+    { X86::LOCK_AND64mi8, X86::LOCK_ANDACQ64mi8, X86::LOCK_ANDREL64mi8 },
+    { X86::LOCK_AND64mi32, X86::LOCK_ANDACQ64mi32, X86::LOCK_ANDREL64mi32 },
+    { X86::LOCK_AND64mr, X86::LOCK_ANDACQ64mr, X86::LOCK_ANDREL64mr }
   },
   {
-    X86::LOCK_XOR8mi,
-    X86::LOCK_XOR8mr,
-    X86::LOCK_XOR16mi8,
-    X86::LOCK_XOR16mi,
-    X86::LOCK_XOR16mr,
-    X86::LOCK_XOR32mi8,
-    X86::LOCK_XOR32mi,
-    X86::LOCK_XOR32mr,
-    X86::LOCK_XOR64mi8,
-    X86::LOCK_XOR64mi32,
-    X86::LOCK_XOR64mr,
+    { X86::LOCK_XOR8mi, X86::LOCK_XORACQ8mi, X86::LOCK_XORREL8mi },
+    { X86::LOCK_XOR8mr, X86::LOCK_XORACQ8mr, X86::LOCK_XORREL8mr },
+    { X86::LOCK_XOR16mi8, X86::LOCK_XORACQ16mi8, X86::LOCK_XORREL16mi8 },
+    { X86::LOCK_XOR16mi, X86::LOCK_XORACQ16mi, X86::LOCK_XORREL16mi },
+    { X86::LOCK_XOR16mr, X86::LOCK_XORACQ16mr, X86::LOCK_XORREL16mr },
+    { X86::LOCK_XOR32mi8, X86::LOCK_XORACQ32mi8, X86::LOCK_XORREL32mi8 },
+    { X86::LOCK_XOR32mi, X86::LOCK_XORACQ32mi, X86::LOCK_XORREL32mi },
+    { X86::LOCK_XOR32mr, X86::LOCK_XORACQ32mr, X86::LOCK_XORREL32mr },
+    { X86::LOCK_XOR64mi8, X86::LOCK_XORACQ64mi8, X86::LOCK_XORREL64mi8 },
+    { X86::LOCK_XOR64mi32, X86::LOCK_XORACQ64mi32, X86::LOCK_XORREL64mi32 },
+    { X86::LOCK_XOR64mr, X86::LOCK_XORACQ64mr, X86::LOCK_XORREL64mr }
   }
 };
 
@@ -1690,6 +1705,17 @@ SDNode *X86DAGToDAGISel::SelectAtomicLoadArith(SDNode *Node, EVT NVT) {
 
   DebugLoc dl = Node->getDebugLoc();
 
+  unsigned TargetFlags
+    = Subtarget->hasHLE() ? cast<AtomicSDNode>(Node)->getTargetFlags() : 0;
+
+  assert(((TargetFlags & 3) != 3) && "unknown 'targetflags'");
+
+  enum AtomicTargetFlags TFlag = TargetFlagNone;
+  if (TargetFlags & 1)
+    TFlag = TargetFlagXAcquire;
+  if (TargetFlags & 2)
+    TFlag = TargetFlagXRelease;
+
   // Optimize common patterns for __sync_or_and_fetch and similar arith
   // operations where the result is not used. This allows us to use the "lock"
   // version of the arithmetic instruction.
@@ -1718,7 +1744,7 @@ SDNode *X86DAGToDAGISel::SelectAtomicLoadArith(SDNode *Node, EVT NVT) {
       Op = ADD;
       break;
   }
-  
+
   Val = getAtomicLoadArithTargetConstant(CurDAG, dl, Op, NVT, Val);
   bool isUnOp = !Val.getNode();
   bool isCN = Val.getNode() && (Val.getOpcode() == ISD::TargetConstant);
@@ -1728,35 +1754,35 @@ SDNode *X86DAGToDAGISel::SelectAtomicLoadArith(SDNode *Node, EVT NVT) {
     default: return 0;
     case MVT::i8:
       if (isCN)
-        Opc = AtomicOpcTbl[Op][ConstantI8];
+        Opc = AtomicOpcTbl[Op][ConstantI8][TFlag];
       else
-        Opc = AtomicOpcTbl[Op][I8];
+        Opc = AtomicOpcTbl[Op][I8][TFlag];
       break;
     case MVT::i16:
       if (isCN) {
         if (immSext8(Val.getNode()))
-          Opc = AtomicOpcTbl[Op][SextConstantI16];
+          Opc = AtomicOpcTbl[Op][SextConstantI16][TFlag];
         else
-          Opc = AtomicOpcTbl[Op][ConstantI16];
+          Opc = AtomicOpcTbl[Op][ConstantI16][TFlag];
       } else
-        Opc = AtomicOpcTbl[Op][I16];
+        Opc = AtomicOpcTbl[Op][I16][TFlag];
       break;
     case MVT::i32:
       if (isCN) {
         if (immSext8(Val.getNode()))
-          Opc = AtomicOpcTbl[Op][SextConstantI32];
+          Opc = AtomicOpcTbl[Op][SextConstantI32][TFlag];
         else
-          Opc = AtomicOpcTbl[Op][ConstantI32];
+          Opc = AtomicOpcTbl[Op][ConstantI32][TFlag];
       } else
-        Opc = AtomicOpcTbl[Op][I32];
+        Opc = AtomicOpcTbl[Op][I32][TFlag];
       break;
     case MVT::i64:
-      Opc = AtomicOpcTbl[Op][I64];
+      Opc = AtomicOpcTbl[Op][I64][TFlag];
       if (isCN) {
         if (immSext8(Val.getNode()))
-          Opc = AtomicOpcTbl[Op][SextConstantI64];
+          Opc = AtomicOpcTbl[Op][SextConstantI64][TFlag];
         else if (i64immSExt32(Val.getNode()))
-          Opc = AtomicOpcTbl[Op][ConstantI64];
+          Opc = AtomicOpcTbl[Op][ConstantI64][TFlag];
       }
       break;
   }
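
As a reading aid (not part of the patch), the decoding convention the table
lookups above rely on can be restated as a small sketch: bit 0 of
'targetflags' selects the XACQUIRE column, bit 1 the XRELEASE column, and
both bits set is rejected. The helper name is illustrative.

  #include <cassert>

  // Sketch only: map a 'targetflags' value to the third index of
  // AtomicOpcTbl. Column 0 is the plain LOCK-prefixed opcode, column 1
  // the XACQUIRE-prefixed one, column 2 the XRELEASE-prefixed one.
  static unsigned selectTFlagColumn(unsigned TargetFlags, bool HasHLE) {
    if (!HasHLE)
      return 0;                      // without HLE, always the plain form
    assert((TargetFlags & 3) != 3 && "XACQUIRE and XRELEASE are exclusive");
    if (TargetFlags & 1)
      return 1;                      // TargetFlagXAcquire
    if (TargetFlags & 2)
      return 2;                      // TargetFlagXRelease
    return 0;                        // TargetFlagNone
  }
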
diff --git a/lib/Target/X86/X86ISelLowering.cpp b/lib/Target/X86/X86ISelLowering.cpp
index d525e3d..423329a 100644
--- a/lib/Target/X86/X86ISelLowering.cpp
+++ b/lib/Target/X86/X86ISelLowering.cpp
@@ -12768,12 +12768,21 @@ static MachineBasicBlock *EmitXBegin(MachineInstr *MI, MachineBasicBlock *MBB,
 }
 
 // Get CMPXCHG opcode for the specified data type.
-static unsigned getCmpXChgOpcode(EVT VT) {
+static unsigned getCmpXChgOpcode(EVT VT, unsigned TargetFlags) {
+  assert(((TargetFlags & 3) != 3) && "unknown 'targetflags'");
+
+  static unsigned CmpXChgOps[4][3] = {
+    { X86::LCMPXCHG8,  X86::LCMPXCHGACQ8,  X86::LCMPXCHGREL8 },
+    { X86::LCMPXCHG16, X86::LCMPXCHGACQ16, X86::LCMPXCHGREL16 },
+    { X86::LCMPXCHG32, X86::LCMPXCHGACQ32, X86::LCMPXCHGREL32 },
+    { X86::LCMPXCHG64, X86::LCMPXCHGACQ64, X86::LCMPXCHGREL64 },
+  };
+
   switch (VT.getSimpleVT().SimpleTy) {
-  case MVT::i8:  return X86::LCMPXCHG8;
-  case MVT::i16: return X86::LCMPXCHG16;
-  case MVT::i32: return X86::LCMPXCHG32;
-  case MVT::i64: return X86::LCMPXCHG64;
+  case MVT::i8:  return CmpXChgOps[0][TargetFlags];
+  case MVT::i16: return CmpXChgOps[1][TargetFlags];
+  case MVT::i32: return CmpXChgOps[2][TargetFlags];
+  case MVT::i64: return CmpXChgOps[3][TargetFlags];
   default:
     break;
   }
@@ -12916,7 +12925,7 @@ X86TargetLowering::EmitAtomicLoadArith(MachineInstr *MI,
   MachineFunction::iterator I = MBB;
   ++I;
 
-  assert(MI->getNumOperands() <= X86::AddrNumOperands + 4 &&
+  assert(MI->getNumOperands() <= X86::AddrNumOperands + 5 &&
          "Unexpected number of operands");
 
   assert(MI->hasOneMemOperand() &&
@@ -12928,6 +12937,7 @@ X86TargetLowering::EmitAtomicLoadArith(MachineInstr *MI,
 
   unsigned DstReg, SrcReg;
   unsigned MemOpndSlot;
+  unsigned TargetFlags;
 
   unsigned CurOp = 0;
 
@@ -12935,12 +12945,13 @@ X86TargetLowering::EmitAtomicLoadArith(MachineInstr *MI,
   MemOpndSlot = CurOp;
   CurOp += X86::AddrNumOperands;
   SrcReg = MI->getOperand(CurOp++).getReg();
+  TargetFlags = MI->getOperand(CurOp++).getImm();
 
   const TargetRegisterClass *RC = MRI.getRegClass(DstReg);
   MVT::SimpleValueType VT = *RC->vt_begin();
   unsigned AccPhyReg = getX86SubSuperRegister(X86::EAX, VT);
 
-  unsigned LCMPXCHGOpc = getCmpXChgOpcode(VT);
+  unsigned LCMPXCHGOpc = getCmpXChgOpcode(VT, TargetFlags);
   unsigned LOADOpc = getLoadOpcode(VT);
 
   // For the atomic load-arith operator, we generate
@@ -13148,7 +13159,7 @@ X86TargetLowering::EmitAtomicLoadArith6432(MachineInstr *MI,
   MachineFunction::iterator I = MBB;
   ++I;
 
-  assert(MI->getNumOperands() <= X86::AddrNumOperands + 7 &&
+  assert(MI->getNumOperands() <= X86::AddrNumOperands + 8 &&
          "Unexpected number of operands");
 
   assert(MI->hasOneMemOperand() &&
@@ -13161,6 +13172,7 @@ X86TargetLowering::EmitAtomicLoadArith6432(MachineInstr *MI,
   unsigned DstLoReg, DstHiReg;
   unsigned SrcLoReg, SrcHiReg;
   unsigned MemOpndSlot;
+  unsigned TargetFlags;
 
   unsigned CurOp = 0;
 
@@ -13170,11 +13182,21 @@ X86TargetLowering::EmitAtomicLoadArith6432(MachineInstr *MI,
   CurOp += X86::AddrNumOperands;
   SrcLoReg = MI->getOperand(CurOp++).getReg();
   SrcHiReg = MI->getOperand(CurOp++).getReg();
+  TargetFlags = MI->getOperand(CurOp++).getImm();
+
+  assert(!(TargetFlags && !Subtarget->hasHLE()) &&
+         "'targetflags' is specified while HLE is disabled.");
+
+  assert(((TargetFlags & 3) != 3) && "unknown 'targetflags'");
+
+  bool IsXAcq = TargetFlags & 1;
+  bool IsXRel = TargetFlags & 2;
 
   const TargetRegisterClass *RC = &X86::GR32RegClass;
   const TargetRegisterClass *RC8 = &X86::GR8RegClass;
 
-  unsigned LCMPXCHGOpc = X86::LCMPXCHG8B;
+  unsigned LCMPXCHGOpc = IsXAcq ? X86::LCMPXCHG8BACQ :
+                         IsXRel ? X86::LCMPXCHG8BREL : X86::LCMPXCHG8B;
   unsigned LOADOpc = X86::MOV32rm;
 
   // For the atomic load-arith operator, we generate
diff --git a/lib/Target/X86/X86InstrCompiler.td b/lib/Target/X86/X86InstrCompiler.td
index f387962..353a4b4 100644
--- a/lib/Target/X86/X86InstrCompiler.td
+++ b/lib/Target/X86/X86InstrCompiler.td
@@ -515,32 +515,38 @@ multiclass PSEUDO_ATOMIC_LOAD_BINOP<string mnemonic> {
   let usesCustomInserter = 1, mayLoad = 1, mayStore = 1 in {
     let Defs = [EFLAGS, AL] in
     def NAME#8  : I<0, Pseudo, (outs GR8:$dst),
-                    (ins i8mem:$ptr, GR8:$val),
+                    (ins i8mem:$ptr, GR8:$val, i8imm:$flags),
                     !strconcat(mnemonic, "8 PSEUDO!"), []>;
     let Defs = [EFLAGS, AX] in
     def NAME#16 : I<0, Pseudo,(outs GR16:$dst),
-                    (ins i16mem:$ptr, GR16:$val),
+                    (ins i16mem:$ptr, GR16:$val, i8imm:$flags),
                     !strconcat(mnemonic, "16 PSEUDO!"), []>;
     let Defs = [EFLAGS, EAX] in
     def NAME#32 : I<0, Pseudo, (outs GR32:$dst),
-                    (ins i32mem:$ptr, GR32:$val),
+                    (ins i32mem:$ptr, GR32:$val, i8imm:$flags),
                     !strconcat(mnemonic, "32 PSEUDO!"), []>;
     let Defs = [EFLAGS, RAX] in
     def NAME#64 : I<0, Pseudo, (outs GR64:$dst),
-                    (ins i64mem:$ptr, GR64:$val),
+                    (ins i64mem:$ptr, GR64:$val, i8imm:$flags),
                     !strconcat(mnemonic, "64 PSEUDO!"), []>;
   }
 }
 
-multiclass PSEUDO_ATOMIC_LOAD_BINOP_PATS<string name, string frag> {
+multiclass PSEUDO_ATOMIC_LOAD_BINOP_PATS_WITH_FLAG<string name, string frag, int flag> {
   def : Pat<(!cast<PatFrag>(frag # "_8") addr:$ptr, GR8:$val),
-            (!cast<Instruction>(name # "8") addr:$ptr, GR8:$val)>;
+            (!cast<Instruction>(name # "8") addr:$ptr, GR8:$val, flag)>;
   def : Pat<(!cast<PatFrag>(frag # "_16") addr:$ptr, GR16:$val),
-            (!cast<Instruction>(name # "16") addr:$ptr, GR16:$val)>;
+            (!cast<Instruction>(name # "16") addr:$ptr, GR16:$val, flag)>;
   def : Pat<(!cast<PatFrag>(frag # "_32") addr:$ptr, GR32:$val),
-            (!cast<Instruction>(name # "32") addr:$ptr, GR32:$val)>;
+            (!cast<Instruction>(name # "32") addr:$ptr, GR32:$val, flag)>;
   def : Pat<(!cast<PatFrag>(frag # "_64") addr:$ptr, GR64:$val),
-            (!cast<Instruction>(name # "64") addr:$ptr, GR64:$val)>;
+            (!cast<Instruction>(name # "64") addr:$ptr, GR64:$val, flag)>;
+}
+
+multiclass PSEUDO_ATOMIC_LOAD_BINOP_PATS<string name, string frag> {
+  defm : PSEUDO_ATOMIC_LOAD_BINOP_PATS_WITH_FLAG<name, !strconcat(frag, "_none"), 0>;
+  defm : PSEUDO_ATOMIC_LOAD_BINOP_PATS_WITH_FLAG<name, !strconcat(frag, "_xacq"), 1>;
+  defm : PSEUDO_ATOMIC_LOAD_BINOP_PATS_WITH_FLAG<name, !strconcat(frag, "_xrel"), 2>;
 }
 
 // Atomic exchange, and, or, xor
@@ -566,7 +572,7 @@ multiclass PSEUDO_ATOMIC_LOAD_BINOP6432<string mnemonic> {
   let usesCustomInserter = 1, Defs = [EFLAGS, EAX, EDX],
       mayLoad = 1, mayStore = 1, hasSideEffects = 0 in
     def NAME#6432 : I<0, Pseudo, (outs GR32:$dst1, GR32:$dst2),
-                      (ins i64mem:$ptr, GR32:$val1, GR32:$val2),
+                      (ins i64mem:$ptr, GR32:$val1, GR32:$val2, i8imm:$flags),
                       !strconcat(mnemonic, "6432 PSEUDO!"), []>;
 }
 
@@ -685,11 +691,25 @@ def NAME#64mi8 : RIi8<{ImmOpc8{7}, ImmOpc8{6}, ImmOpc8{5}, ImmOpc8{4},
 
 }
 
-defm LOCK_ADD : LOCK_ArithBinOp<0x00, 0x80, 0x83, MRM0m, "add">;
-defm LOCK_SUB : LOCK_ArithBinOp<0x28, 0x80, 0x83, MRM5m, "sub">;
-defm LOCK_OR  : LOCK_ArithBinOp<0x08, 0x80, 0x83, MRM1m, "or">;
-defm LOCK_AND : LOCK_ArithBinOp<0x20, 0x80, 0x83, MRM4m, "and">;
-defm LOCK_XOR : LOCK_ArithBinOp<0x30, 0x80, 0x83, MRM6m, "xor">;
+defm LOCK_ADD    : LOCK_ArithBinOp<0x00, 0x80, 0x83, MRM0m, "add">;
+defm LOCK_ADDACQ : LOCK_ArithBinOp<0x00, 0x80, 0x83, MRM0m, "add">, XACQ;
+defm LOCK_ADDREL : LOCK_ArithBinOp<0x00, 0x80, 0x83, MRM0m, "add">, XREL;
+
+defm LOCK_SUB    : LOCK_ArithBinOp<0x28, 0x80, 0x83, MRM5m, "sub">;
+defm LOCK_SUBACQ : LOCK_ArithBinOp<0x28, 0x80, 0x83, MRM5m, "sub">, XACQ;
+defm LOCK_SUBREL : LOCK_ArithBinOp<0x28, 0x80, 0x83, MRM5m, "sub">, XREL;
+
+defm LOCK_OR     : LOCK_ArithBinOp<0x08, 0x80, 0x83, MRM1m, "or">;
+defm LOCK_ORACQ  : LOCK_ArithBinOp<0x08, 0x80, 0x83, MRM1m, "or">, XACQ;
+defm LOCK_ORREL  : LOCK_ArithBinOp<0x08, 0x80, 0x83, MRM1m, "or">, XREL;
+
+defm LOCK_AND    : LOCK_ArithBinOp<0x20, 0x80, 0x83, MRM4m, "and">;
+defm LOCK_ANDACQ : LOCK_ArithBinOp<0x20, 0x80, 0x83, MRM4m, "and">, XACQ;
+defm LOCK_ANDREL : LOCK_ArithBinOp<0x20, 0x80, 0x83, MRM4m, "and">, XREL;
+
+defm LOCK_XOR    : LOCK_ArithBinOp<0x30, 0x80, 0x83, MRM6m, "xor">;
+defm LOCK_XORACQ : LOCK_ArithBinOp<0x30, 0x80, 0x83, MRM6m, "xor">, XACQ;
+defm LOCK_XORREL : LOCK_ArithBinOp<0x30, 0x80, 0x83, MRM6m, "xor">, XREL;
 
 // Optimized codegen when the non-memory output is not used.
 multiclass LOCK_ArithUnOp<bits<8> Opc8, bits<8> Opc, Format Form,
@@ -712,7 +732,12 @@ def NAME#64m : RI<Opc, Form, (outs), (ins i64mem:$dst),
 }
 
 defm LOCK_INC    : LOCK_ArithUnOp<0xFE, 0xFF, MRM0m, "inc">;
+defm LOCK_INCACQ : LOCK_ArithUnOp<0xFE, 0xFF, MRM0m, "inc">, XACQ;
+defm LOCK_INCREL : LOCK_ArithUnOp<0xFE, 0xFF, MRM0m, "inc">, XREL;
+
 defm LOCK_DEC    : LOCK_ArithUnOp<0xFE, 0xFF, MRM1m, "dec">;
+defm LOCK_DECACQ : LOCK_ArithUnOp<0xFE, 0xFF, MRM1m, "dec">, XACQ;
+defm LOCK_DECREL : LOCK_ArithUnOp<0xFE, 0xFF, MRM1m, "dec">, XREL;
 
 // Atomic compare and swap.
 multiclass LCMPXCHG_UnOp<bits<8> Opc, Format Form, string mnemonic,
@@ -749,20 +774,39 @@ let isCodeGenOnly = 1 in {
 }
 
 let Defs = [EAX, EDX, EFLAGS], Uses = [EAX, EBX, ECX, EDX] in {
-defm LCMPXCHG8B : LCMPXCHG_UnOp<0xC7, MRM1m, "cmpxchg8b",
-                                X86cas8, i64mem,
-                                IIC_CMPX_LOCK_8B>;
+defm LCMPXCHG8B    : LCMPXCHG_UnOp<0xC7, MRM1m, "cmpxchg8b",
+                                   X86cas8_none, i64mem,
+                                   IIC_CMPX_LOCK_8B>;
+defm LCMPXCHG8BACQ : LCMPXCHG_UnOp<0xC7, MRM1m, "cmpxchg8b",
+                                   X86cas8_xacq, i64mem,
+                                   IIC_CMPX_LOCK_8B>, XACQ;
+defm LCMPXCHG8BREL : LCMPXCHG_UnOp<0xC7, MRM1m, "cmpxchg8b",
+                                   X86cas8_xrel, i64mem,
+                                   IIC_CMPX_LOCK_8B>, XREL;
 }
 
 let Defs = [RAX, RDX, EFLAGS], Uses = [RAX, RBX, RCX, RDX],
     Predicates = [HasCmpxchg16b] in {
-defm LCMPXCHG16B : LCMPXCHG_UnOp<0xC7, MRM1m, "cmpxchg16b",
-                                 X86cas16, i128mem,
-                                 IIC_CMPX_LOCK_16B>, REX_W;
+defm LCMPXCHG16B    : LCMPXCHG_UnOp<0xC7, MRM1m, "cmpxchg16b",
+                                    X86cas16_none, i128mem,
+                                    IIC_CMPX_LOCK_16B>, REX_W;
+defm LCMPXCHG16BACQ : LCMPXCHG_UnOp<0xC7, MRM1m, "cmpxchg16b",
+                                   X86cas16_xacq, i128mem,
+                                    IIC_CMPX_LOCK_16B>, REX_W, XACQ;
+defm LCMPXCHG16BREL : LCMPXCHG_UnOp<0xC7, MRM1m, "cmpxchg16b",
+                                    X86cas16_xrel, i128mem,
+                                    IIC_CMPX_LOCK_16B>, REX_W, XREL;
 }
 
-defm LCMPXCHG : LCMPXCHG_BinOp<0xB0, 0xB1, MRMDestMem, "cmpxchg",
-                               X86cas, IIC_CMPX_LOCK_8, IIC_CMPX_LOCK>;
+defm LCMPXCHG    : LCMPXCHG_BinOp<0xB0, 0xB1, MRMDestMem, "cmpxchg",
+                                  X86cas_none, IIC_CMPX_LOCK_8,
+                                  IIC_CMPX_LOCK>;
+defm LCMPXCHGACQ : LCMPXCHG_BinOp<0xB0, 0xB1, MRMDestMem, "cmpxchg",
+                                  X86cas_xacq, IIC_CMPX_LOCK_8,
+                                  IIC_CMPX_LOCK>, XACQ;
+defm LCMPXCHGREL : LCMPXCHG_BinOp<0xB0, 0xB1, MRMDestMem, "cmpxchg",
+                                  X86cas_xrel, IIC_CMPX_LOCK_8,
+                                  IIC_CMPX_LOCK>, XREL;
 
 // Atomic exchange and add
 multiclass ATOMIC_LOAD_BINOP<bits<8> opc8, bits<8> opc, string mnemonic,
@@ -799,9 +843,15 @@ multiclass ATOMIC_LOAD_BINOP<bits<8> opc8, bits<8> opc, string mnemonic,
   }
 }
 
-defm LXADD : ATOMIC_LOAD_BINOP<0xc0, 0xc1, "xadd", "atomic_load_add",
-                               IIC_XADD_LOCK_MEM8, IIC_XADD_LOCK_MEM>,
-             TB, LOCK;
+defm LXADD    : ATOMIC_LOAD_BINOP<0xc0, 0xc1, "xadd", "atomic_load_add_none",
+                                  IIC_XADD_LOCK_MEM8, IIC_XADD_LOCK_MEM>,
+                TB, LOCK;
+defm LXADDACQ : ATOMIC_LOAD_BINOP<0xc0, 0xc1, "xadd", "atomic_load_add_xacq",
+                                  IIC_XADD_LOCK_MEM8, IIC_XADD_LOCK_MEM>,
+                TB, LOCK, XACQ;
+defm LXADDREL : ATOMIC_LOAD_BINOP<0xc0, 0xc1, "xadd", "atomic_load_add_xrel",
+                                  IIC_XADD_LOCK_MEM8, IIC_XADD_LOCK_MEM>,
+                TB, LOCK, XREL;
 
 def ACQUIRE_MOV8rm  : I<0, Pseudo, (outs GR8 :$dst), (ins i8mem :$src),
                       "#ACQUIRE_MOV PSEUDO!",
@@ -818,16 +868,36 @@ def ACQUIRE_MOV64rm : I<0, Pseudo, (outs GR64:$dst), (ins i64mem:$src),
 
 def RELEASE_MOV8mr  : I<0, Pseudo, (outs), (ins i8mem :$dst, GR8 :$src),
                         "#RELEASE_MOV PSEUDO!",
-                        [(atomic_store_8  addr:$dst, GR8 :$src)]>;
+                        [(atomic_store_none_8  addr:$dst, GR8 :$src)]>;
 def RELEASE_MOV16mr : I<0, Pseudo, (outs), (ins i16mem:$dst, GR16:$src),
                         "#RELEASE_MOV PSEUDO!",
-                        [(atomic_store_16 addr:$dst, GR16:$src)]>;
+                        [(atomic_store_none_16 addr:$dst, GR16:$src)]>;
 def RELEASE_MOV32mr : I<0, Pseudo, (outs), (ins i32mem:$dst, GR32:$src),
                         "#RELEASE_MOV PSEUDO!",
-                        [(atomic_store_32 addr:$dst, GR32:$src)]>;
+                        [(atomic_store_none_32 addr:$dst, GR32:$src)]>;
 def RELEASE_MOV64mr : I<0, Pseudo, (outs), (ins i64mem:$dst, GR64:$src),
                         "#RELEASE_MOV PSEUDO!",
-                        [(atomic_store_64 addr:$dst, GR64:$src)]>;
+                        [(atomic_store_none_64 addr:$dst, GR64:$src)]>;
+
+multiclass ATOMIC_STORE<bits<8> opc8, bits<8> opc, string mnemonic, string frag> {
+  let isCodeGenOnly = 1 in {
+    def NAME#8mr  : I<opc8, MRMDestMem, (outs), (ins i8mem:$dst, GR8:$src),
+                      !strconcat(mnemonic, "{b}\t{$src, $dst|$dst, $src}"),
+                      [(!cast<PatFrag>(frag # "_8") addr:$dst, GR8:$src)]>;
+    def NAME#16mr : I<opc, MRMDestMem, (outs), (ins i16mem:$dst, GR16:$src),
+                      !strconcat(mnemonic, "{w}\t{$src, $dst|$dst, $src}"),
+                      [(!cast<PatFrag>(frag # "_16") addr:$dst, GR16:$src)]>,
+                    OpSize;
+    def NAME#32mr : I<opc, MRMDestMem, (outs), (ins i32mem:$dst, GR32:$src),
+                      !strconcat(mnemonic, "{l}\t{$src, $dst|$dst, $src}"),
+                      [(!cast<PatFrag>(frag # "_32") addr:$dst, GR32:$src)]>;
+    def NAME#64mr : RI<opc, MRMDestMem, (outs), (ins i64mem:$dst, GR64:$src),
+                       !strconcat(mnemonic, "{q}\t{$src, $dst|$dst, $src}"),
+                       [(!cast<PatFrag>(frag # "_64") addr:$dst, GR64:$src)]>;
+  }
+}
+
+defm XRELEASE_MOV : ATOMIC_STORE<0x88, 0x89, "mov", "atomic_store_xrel">, XREL;
 
 //===----------------------------------------------------------------------===//
 // Conditional Move Pseudo Instructions.
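
Note that XRELEASE_MOV above deliberately has no XACQUIRE counterpart: the
TSX spec only allows the XRELEASE prefix on a plain (non-LOCK) MOV store,
which is what lets an atomic release-store end an elided critical section,
while XACQUIRE is meaningful only on XCHG and LOCK-prefixed operations.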
diff --git a/lib/Target/X86/X86InstrInfo.td b/lib/Target/X86/X86InstrInfo.td
index 04d8f19..1d272c1 100644
--- a/lib/Target/X86/X86InstrInfo.td
+++ b/lib/Target/X86/X86InstrInfo.td
@@ -754,6 +754,162 @@ def trunc_su : PatFrag<(ops node:$src), (trunc node:$src), [{
   return N->hasOneUse();
 }]>;
 
+// Helper frag for atomic with target flags.
+class ATOMIC_NONE<dag ops, dag frag> : PatFrag<ops, frag, [{
+  unsigned TargetFlags = cast<AtomicSDNode>(N)->getTargetFlags();
+  assert(((TargetFlags & 3) != 3) && "unknown 'targetflags'");
+  return !Subtarget->hasHLE() || (TargetFlags == 0);
+}]>;
+
+class ATOMIC_XACQ<dag ops, dag frag> : PatFrag<ops, frag, [{
+  unsigned TargetFlags = cast<AtomicSDNode>(N)->getTargetFlags();
+  assert(((TargetFlags & 3) != 3) && "unknown 'targetflags'");
+  return Subtarget->hasHLE() && (TargetFlags & 1);
+}]>;
+
+class ATOMIC_XREL<dag ops, dag frag> : PatFrag<ops, frag, [{
+  unsigned TargetFlags = cast<AtomicSDNode>(N)->getTargetFlags();
+  assert(((TargetFlags & 3) != 3) && "unknown 'targetflags'");
+  return Subtarget->hasHLE() && (TargetFlags & 2);
+}]>;
+
+class MEMINTRINSIC_NONE<dag ops, dag frag> : PatFrag<ops, frag, [{
+  unsigned TargetFlags = cast<MemIntrinsicSDNode>(N)->getTargetFlags();
+  assert(((TargetFlags & 3) != 3) && "unknown 'targetflags'");
+  return !Subtarget->hasHLE() || (TargetFlags == 0);
+}]>;
+
+class MEMINTRINSIC_XACQ<dag ops, dag frag> : PatFrag<ops, frag, [{
+  unsigned TargetFlags = cast<MemIntrinsicSDNode>(N)->getTargetFlags();
+  assert(((TargetFlags & 3) != 3) && "unknown 'targetflags'");
+  return Subtarget->hasHLE() && (TargetFlags & 1);
+}]>;
+
+class MEMINTRINSIC_XREL<dag ops, dag frag> : PatFrag<ops, frag, [{
+  unsigned TargetFlags = cast<MemIntrinsicSDNode>(N)->getTargetFlags();
+  assert(((TargetFlags & 3) != 3) && "unknown 'targetflags'");
+  return Subtarget->hasHLE() && (TargetFlags & 2);
+}]>;
+
+multiclass atomic_unop<string frag> {
+  def  _none_8 : ATOMIC_NONE<(ops node:$ptr),
+                             (!cast<PatFrag>(frag #  "_8") node:$ptr)>;
+  def _none_16 : ATOMIC_NONE<(ops node:$ptr),
+                             (!cast<PatFrag>(frag # "_16") node:$ptr)>;
+  def _none_32 : ATOMIC_NONE<(ops node:$ptr),
+                             (!cast<PatFrag>(frag # "_32") node:$ptr)>;
+  def _none_64 : ATOMIC_NONE<(ops node:$ptr),
+                             (!cast<PatFrag>(frag # "_64") node:$ptr)>;
+  def  _xacq_8 : ATOMIC_XACQ<(ops node:$ptr),
+                             (!cast<PatFrag>(frag #  "_8") node:$ptr)>;
+  def _xacq_16 : ATOMIC_XACQ<(ops node:$ptr),
+                             (!cast<PatFrag>(frag # "_16") node:$ptr)>;
+  def _xacq_32 : ATOMIC_XACQ<(ops node:$ptr),
+                             (!cast<PatFrag>(frag # "_32") node:$ptr)>;
+  def _xacq_64 : ATOMIC_XACQ<(ops node:$ptr),
+                             (!cast<PatFrag>(frag # "_64") node:$ptr)>;
+  def  _xrel_8 : ATOMIC_XREL<(ops node:$ptr),
+                             (!cast<PatFrag>(frag #  "_8") node:$ptr)>;
+  def _xrel_16 : ATOMIC_XREL<(ops node:$ptr),
+                             (!cast<PatFrag>(frag # "_16") node:$ptr)>;
+  def _xrel_32 : ATOMIC_XREL<(ops node:$ptr),
+                             (!cast<PatFrag>(frag # "_32") node:$ptr)>;
+  def _xrel_64 : ATOMIC_XREL<(ops node:$ptr),
+                             (!cast<PatFrag>(frag # "_64") node:$ptr)>;
+}
+
+multiclass atomic_binop<string frag> {
+  def  _none_8 : ATOMIC_NONE<(ops node:$ptr, node:$val),
+                             (!cast<PatFrag>(frag #  "_8") node:$ptr, node:$val)>;
+  def _none_16 : ATOMIC_NONE<(ops node:$ptr, node:$val),
+                             (!cast<PatFrag>(frag # "_16") node:$ptr, node:$val)>;
+  def _none_32 : ATOMIC_NONE<(ops node:$ptr, node:$val),
+                             (!cast<PatFrag>(frag # "_32") node:$ptr, node:$val)>;
+  def _none_64 : ATOMIC_NONE<(ops node:$ptr, node:$val),
+                             (!cast<PatFrag>(frag # "_64") node:$ptr, node:$val)>;
+  def  _xacq_8 : ATOMIC_XACQ<(ops node:$ptr, node:$val),
+                             (!cast<PatFrag>(frag #  "_8") node:$ptr, node:$val)>;
+  def _xacq_16 : ATOMIC_XACQ<(ops node:$ptr, node:$val),
+                             (!cast<PatFrag>(frag # "_16") node:$ptr, node:$val)>;
+  def _xacq_32 : ATOMIC_XACQ<(ops node:$ptr, node:$val),
+                             (!cast<PatFrag>(frag # "_32") node:$ptr, node:$val)>;
+  def _xacq_64 : ATOMIC_XACQ<(ops node:$ptr, node:$val),
+                             (!cast<PatFrag>(frag # "_64") node:$ptr, node:$val)>;
+  def  _xrel_8 : ATOMIC_XREL<(ops node:$ptr, node:$val),
+                             (!cast<PatFrag>(frag #  "_8") node:$ptr, node:$val)>;
+  def _xrel_16 : ATOMIC_XREL<(ops node:$ptr, node:$val),
+                             (!cast<PatFrag>(frag # "_16") node:$ptr, node:$val)>;
+  def _xrel_32 : ATOMIC_XREL<(ops node:$ptr, node:$val),
+                             (!cast<PatFrag>(frag # "_32") node:$ptr, node:$val)>;
+  def _xrel_64 : ATOMIC_XREL<(ops node:$ptr, node:$val),
+                             (!cast<PatFrag>(frag # "_64") node:$ptr, node:$val)>;
+}
+
+multiclass atomic_ternop<string frag> {
+  def  _none_8 : ATOMIC_NONE<(ops node:$ptr, node:$cmp, node:$swap),
+                             (!cast<PatFrag>(frag #  "_8") node:$ptr, node:$cmp, node:$swap)>;
+  def _none_16 : ATOMIC_NONE<(ops node:$ptr, node:$cmp, node:$swap),
+                             (!cast<PatFrag>(frag # "_16") node:$ptr, node:$cmp, node:$swap)>;
+  def _none_32 : ATOMIC_NONE<(ops node:$ptr, node:$cmp, node:$swap),
+                             (!cast<PatFrag>(frag # "_32") node:$ptr, node:$cmp, node:$swap)>;
+  def _none_64 : ATOMIC_NONE<(ops node:$ptr, node:$cmp, node:$swap),
+                             (!cast<PatFrag>(frag # "_64") node:$ptr, node:$cmp, node:$swap)>;
+  def  _xacq_8 : ATOMIC_XACQ<(ops node:$ptr, node:$cmp, node:$swap),
+                             (!cast<PatFrag>(frag #  "_8") node:$ptr, node:$cmp, node:$swap)>;
+  def _xacq_16 : ATOMIC_XACQ<(ops node:$ptr, node:$cmp, node:$swap),
+                             (!cast<PatFrag>(frag # "_16") node:$ptr, node:$cmp, node:$swap)>;
+  def _xacq_32 : ATOMIC_XACQ<(ops node:$ptr, node:$cmp, node:$swap),
+                             (!cast<PatFrag>(frag # "_32") node:$ptr, node:$cmp, node:$swap)>;
+  def _xacq_64 : ATOMIC_XACQ<(ops node:$ptr, node:$cmp, node:$swap),
+                             (!cast<PatFrag>(frag # "_64") node:$ptr, node:$cmp, node:$swap)>;
+  def  _xrel_8 : ATOMIC_XREL<(ops node:$ptr, node:$cmp, node:$swap),
+                             (!cast<PatFrag>(frag #  "_8") node:$ptr, node:$cmp, node:$swap)>;
+  def _xrel_16 : ATOMIC_XREL<(ops node:$ptr, node:$cmp, node:$swap),
+                             (!cast<PatFrag>(frag # "_16") node:$ptr, node:$cmp, node:$swap)>;
+  def _xrel_32 : ATOMIC_XREL<(ops node:$ptr, node:$cmp, node:$swap),
+                             (!cast<PatFrag>(frag # "_32") node:$ptr, node:$cmp, node:$swap)>;
+  def _xrel_64 : ATOMIC_XREL<(ops node:$ptr, node:$cmp, node:$swap),
+                             (!cast<PatFrag>(frag # "_64") node:$ptr, node:$cmp, node:$swap)>;
+}
+
+// FIXME: some primitives don't support XACQUIRE or XRELEASE, e.g.
+// 'load' cannot be used with either XACQUIRE or XRELEASE;
+// 'store' can only be used with XRELEASE.
+
+defm atomic_cmp_swap  : atomic_ternop<"atomic_cmp_swap">;
+defm atomic_load_add  : atomic_binop<"atomic_load_add">;
+defm atomic_swap      : atomic_binop<"atomic_swap">;
+defm atomic_load_sub  : atomic_binop<"atomic_load_sub">;
+defm atomic_load_and  : atomic_binop<"atomic_load_and">;
+defm atomic_load_or   : atomic_binop<"atomic_load_or">;
+defm atomic_load_xor  : atomic_binop<"atomic_load_xor">;
+defm atomic_load_nand : atomic_binop<"atomic_load_nand">;
+defm atomic_load_min  : atomic_binop<"atomic_load_min">;
+defm atomic_load_max  : atomic_binop<"atomic_load_max">;
+defm atomic_load_umin : atomic_binop<"atomic_load_umin">;
+defm atomic_load_umax : atomic_binop<"atomic_load_umax">;
+defm atomic_store     : atomic_binop<"atomic_store">;
+defm atomic_load      : atomic_unop<"atomic_load">;
+
+multiclass memintrinsic_unop<SDNode opnode> {
+  def _none : MEMINTRINSIC_NONE<(ops node:$ptr), (opnode node:$ptr)>;
+  def _xacq : MEMINTRINSIC_XACQ<(ops node:$ptr), (opnode node:$ptr)>;
+  def _xrel : MEMINTRINSIC_XREL<(ops node:$ptr), (opnode node:$ptr)>;
+}
+
+multiclass memintrinsic_ternop<SDNode opnode> {
+  def _none : MEMINTRINSIC_NONE<(ops node:$ptr, node:$val, node:$imm),
+                                (opnode node:$ptr, node:$val, node:$imm)>;
+  def _xacq : MEMINTRINSIC_XACQ<(ops node:$ptr, node:$val, node:$imm),
+                                (opnode node:$ptr, node:$val, node:$imm)>;
+  def _xrel : MEMINTRINSIC_XREL<(ops node:$ptr, node:$val, node:$imm),
+                                (opnode node:$ptr, node:$val, node:$imm)>;
+}
+
+defm X86cas : memintrinsic_ternop<X86cas>;
+defm X86cas8 : memintrinsic_unop<X86cas8>;
+defm X86cas16 : memintrinsic_unop<X86cas16>;
+
 //===----------------------------------------------------------------------===//
 // Instruction list.
 //
@@ -1350,7 +1506,11 @@ multiclass ATOMIC_SWAP<bits<8> opc8, bits<8> opc, string mnemonic, string frag,
   }
 }
 
-defm XCHG    : ATOMIC_SWAP<0x86, 0x87, "xchg", "atomic_swap", IIC_XCHG_MEM>;
+defm XCHG    : ATOMIC_SWAP<0x86, 0x87, "xchg", "atomic_swap_none", IIC_XCHG_MEM>;
+let isCodeGenOnly = 1 in {
+defm XCHGACQ : ATOMIC_SWAP<0x86, 0x87, "xchg", "atomic_swap_xacq", IIC_XCHG_MEM>, XACQ;
+defm XCHGREL : ATOMIC_SWAP<0x86, 0x87, "xchg", "atomic_swap_xrel", IIC_XCHG_MEM>, XREL;
+}
 
 // Swap between registers.
 let Constraints = "$val = $dst" in {
diff --git a/test/CodeGen/X86/hle-atomic16.ll b/test/CodeGen/X86/hle-atomic16.ll
new file mode 100644
index 0000000..f6c7374
--- /dev/null
+++ b/test/CodeGen/X86/hle-atomic16.ll
@@ -0,0 +1,188 @@
+; RUN: llc < %s -O0 -march=x86-64 -mcpu=corei7 -mattr=+hle | FileCheck %s --check-prefix X64HLE
+; RUN: llc < %s -O0 -march=x86-64 -mcpu=corei7 -mattr=-hle | FileCheck %s --check-prefix X64NOHLE
+
+@sc16 = external global i16
+
+; 16-bit
+
+define void @atomic_fetch_add16() nounwind {
+; X64HLE:   atomic_fetch_add16
+; X64NOHLE: atomic_fetch_add16
+  %t0 = atomicrmw add  i16* @sc16, i16 1 acquire, !targetflags !0
+; X64HLE:       lock
+; X64HLE-NEXT:  xacquire
+; X64HLE:       incw
+; X64NOHLE:     lock
+; X64NOHLE-NOT: xacquire
+; X64NOHLE:     incw
+  %t1 = atomicrmw add  i16* @sc16, i16 5 acquire, !targetflags !1
+; X64HLE:       lock
+; X64HLE-NEXT:  xrelease
+; X64HLE:       xaddw
+; X64NOHLE:     lock
+; X64NOHLE-NOT: xrelease
+; X64NOHLE:     xaddw
+  %t2 = atomicrmw add  i16* @sc16, i16 %t1 acquire, !targetflags !0
+; X64HLE:       lock
+; X64HLE-NEXT:  xacquire
+; X64HLE:       addw
+; X64NOHLE:     lock
+; X64NOHLE-NOT: xacquire
+; X64NOHLE:     addw
+  ret void
+; X64HLE:       ret
+; X64NOHLE:     ret
+}
+
+define void @atomic_fetch_sub16() nounwind {
+; X64HLE:   atomic_fetch_sub16
+; X64NOHLE: atomic_fetch_sub16
+  %t3 = atomicrmw sub  i16* @sc16, i16 1 acquire, !targetflags !1
+; X64HLE:       lock
+; X64HLE-NEXT:  xrelease
+; X64HLE:       decw
+; X64NOHLE:     lock
+; X64NOHLE-NOT: xrelease
+; X64NOHLE:     decw
+  %t4 = atomicrmw sub  i16* @sc16, i16 5 acquire, !targetflags !0
+; X64HLE:       lock
+; X64HLE-NEXT:  xacquire
+; X64HLE:       xaddw
+; X64NOHLE:     lock
+; X64NOHLE-NOT: xacquire
+; X64NOHLE:     xaddw
+  %t5 = atomicrmw sub  i16* @sc16, i16 %t4 acquire, !targetflags !1
+; X64HLE:       lock
+; X64HLE-NEXT:  xrelease
+; X64HLE:       subw
+; X64NOHLE:     lock
+; X64NOHLE-NOT: xrelease
+; X64NOHLE:     subw
+  ret void
+; X64HLE:       ret
+; X64NOHLE:     ret
+}
+
+define void @atomic_fetch_logic16() nounwind {
+; X64HLE:   atomic_fetch_logic16
+; X64NOHLE: atomic_fetch_logic16
+  %t6 = atomicrmw and  i16* @sc16, i16 5 acquire, !targetflags !0
+; X64HLE:       lock
+; X64HLE-NEXT:  xacquire
+; X64HLE:       andw
+; X64NOHLE:     lock
+; X64NOHLE-NOT: xacquire
+; X64NOHLE:     andw
+  %t7 = atomicrmw or   i16* @sc16, i16 5 acquire, !targetflags !1
+; X64HLE:       lock
+; X64HLE-NEXT:  xrelease
+; X64HLE:       orw
+; X64NOHLE:     lock
+; X64NOHLE-NOT: xrelease
+; X64NOHLE:     orw
+  %t8 = atomicrmw xor  i16* @sc16, i16 5 acquire, !targetflags !0
+; X64HLE:       lock
+; X64HLE-NEXT:  xacquire
+; X64HLE:       xorw
+; X64NOHLE:     lock
+; X64NOHLE-NOT: xacquire
+; X64NOHLE:     xorw
+  %t9 = atomicrmw nand i16* @sc16, i16 5 acquire, !targetflags !1
+; X64HLE:       andw
+; X64HLE:       notw
+; X64HLE:       lock
+; X64HLE-NEXT:  xrelease
+; X64HLE:       cmpxchgw
+; X64NOHLE:     andw
+; X64NOHLE:     notw
+; X64NOHLE:     lock
+; X64NOHLE-NOT: xrelease
+; X64NOHLE:     cmpxchgw
+  ret void
+; X64HLE:       ret
+; X64NOHLE:     ret
+}
+
+define void @atomic_fetch_minmax16() nounwind {
+; X64HLE:   atomic_fetch_minmax16
+; X64NOHLE: atomic_fetch_minmax16
+  %t0 = atomicrmw max  i16* @sc16, i16 5 acquire, !targetflags !0
+; X64HLE:       cmpw
+; X64HLE:       cmov
+; X64HLE:       lock
+; X64HLE-NEXT:  xacquire
+; X64HLE:       cmpxchgw
+; X64NOHLE:     cmpw
+; X64NOHLE:     cmov
+; X64NOHLE:     lock
+; X64NOHLE-NOT: xacquire
+; X64NOHLE:     cmpxchgw
+  %t1 = atomicrmw min  i16* @sc16, i16 5 acquire, !targetflags !1
+; X64HLE:       cmpw
+; X64HLE:       cmov
+; X64HLE:       lock
+; X64HLE-NEXT:  xrelease
+; X64HLE:       cmpxchgw
+; X64NOHLE:     cmpw
+; X64NOHLE:     cmov
+; X64NOHLE:     lock
+; X64NOHLE-NOT: xrelease
+; X64NOHLE:     cmpxchgw
+  %t2 = atomicrmw umax i16* @sc16, i16 5 acquire, !targetflags !0
+; X64HLE:       cmpw
+; X64HLE:       cmov
+; X64HLE:       lock
+; X64HLE-NEXT:  xacquire
+; X64HLE:       cmpxchgw
+; X64NOHLE:     cmpw
+; X64NOHLE:     cmov
+; X64NOHLE:     lock
+; X64NOHLE-NOT: xacquire
+; X64NOHLE:     cmpxchgw
+  %t3 = atomicrmw umin i16* @sc16, i16 5 acquire, !targetflags !1
+; X64HLE:       cmpw
+; X64HLE:       cmov
+; X64HLE:       lock
+; X64HLE-NEXT:  xrelease
+; X64HLE:       cmpxchgw
+; X64NOHLE:     cmpw
+; X64NOHLE:     cmov
+; X64NOHLE:     lock
+; X64NOHLE-NOT: xrelease
+; X64NOHLE:     cmpxchgw
+  ret void
+; X64HLE:       ret
+; X64NOHLE:     ret
+}
+
+define void @atomic_fetch_misc16() nounwind {
+; X64HLE:   atomic_fetch_misc16
+; X64NOHLE: atomic_fetch_misc16
+  %t4 = cmpxchg i16* @sc16, i16 0, i16 1 acquire, !targetflags !0
+; X64HLE:       lock
+; X64HLE-NEXT:  xacquire
+; X64HLE:       cmpxchgw
+; X64NOHLE:     lock
+; X64NOHLE-NOT: xacquire
+; X64NOHLE:     cmpxchgw
+  store atomic i16 0, i16* @sc16 release, align 2, !targetflags !1
+; X64HLE-NOT:   lock
+; X64HLE:       xrelease
+; X64HLE:       movw
+; X64NOHLE-NOT: lock
+; X64NOHLE-NOT: xrelease
+; X64NOHLE:     movw
+  %t5 = atomicrmw xchg i16* @sc16, i16 %t4 acquire, !targetflags !0
+; X64HLE-NOT:   lock
+; X64HLE:       xacquire
+; X64HLE:       xchgw
+; X64NOHLE-NOT: lock
+; X64NOHLE-NOT: xacquire
+; X64NOHLE:     xchgw
+  ret void
+; X64HLE:       ret
+; X64NOHLE:     ret
+}
+
+!0 = metadata !{i32 1}
+!1 = metadata !{i32 2}
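
(In these tests, !targetflags !0, i.e. i32 1, sets bit 0 and requests the
XACQUIRE prefix, while !targetflags !1, i.e. i32 2, sets bit 1 and requests
XRELEASE; the -hle runs check that the same IR falls back to the plain
LOCK-prefixed forms when the feature is disabled. The 8-, 32- and 64-bit
tests below use the same encoding.)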
diff --git a/test/CodeGen/X86/hle-atomic32.ll b/test/CodeGen/X86/hle-atomic32.ll
new file mode 100644
index 0000000..02f4bef
--- /dev/null
+++ b/test/CodeGen/X86/hle-atomic32.ll
@@ -0,0 +1,188 @@
+; RUN: llc < %s -O0 -march=x86-64 -mcpu=corei7 -mattr=+hle | FileCheck %s --check-prefix X64HLE
+; RUN: llc < %s -O0 -march=x86-64 -mcpu=corei7 -mattr=-hle | FileCheck %s --check-prefix X64NOHLE
+
+@sc32 = external global i32
+
+; 32-bit
+
+define void @atomic_fetch_add32() nounwind {
+; X64HLE:   atomic_fetch_add32
+; X64NOHLE: atomic_fetch_add32
+  %t0 = atomicrmw add  i32* @sc32, i32 1 acquire, !targetflags !0
+; X64HLE:       lock
+; X64HLE-NEXT:  xacquire
+; X64HLE:       incl
+; X64NOHLE:     lock
+; X64NOHLE-NOT: xacquire
+; X64NOHLE:     incl
+  %t1 = atomicrmw add  i32* @sc32, i32 5 acquire, !targetflags !1
+; X64HLE:       lock
+; X64HLE-NEXT:  xrelease
+; X64HLE:       xaddl
+; X64NOHLE:     lock
+; X64NOHLE-NOT: xrelease
+; X64NOHLE:     xaddl
+  %t2 = atomicrmw add  i32* @sc32, i32 %t1 acquire, !targetflags !0
+; X64HLE:       lock
+; X64HLE-NEXT:  xacquire
+; X64HLE:       addl
+; X64NOHLE:     lock
+; X64NOHLE-NOT: xacquire
+; X64NOHLE:     addl
+  ret void
+; X64HLE:       ret
+; X64NOHLE:     ret
+}
+
+define void @atomic_fetch_sub32() nounwind {
+; X64HLE:   atomic_fetch_sub32
+; X64NOHLE: atomic_fetch_sub32
+  %t0 = atomicrmw sub  i32* @sc32, i32 1 acquire, !targetflags !1
+; X64HLE:       lock
+; X64HLE-NEXT:  xrelease
+; X64HLE:       decl
+; X64NOHLE:     lock
+; X64NOHLE-NOT: xrelease
+; X64NOHLE:     decl
+  %t1 = atomicrmw sub  i32* @sc32, i32 5 acquire, !targetflags !0
+; X64HLE:       lock
+; X64HLE-NEXT:  xacquire
+; X64HLE:       xaddl
+; X64NOHLE:     lock
+; X64NOHLE-NOT: xacquire
+; X64NOHLE:     xaddl
+  %t2 = atomicrmw sub  i32* @sc32, i32 %t1 acquire, !targetflags !1
+; X64HLE:       lock
+; X64HLE-NEXT:  xrelease
+; X64HLE:       subl
+; X64NOHLE:     lock
+; X64NOHLE-NOT: xrelease
+; X64NOHLE:     subl
+  ret void
+; X64HLE:       ret
+; X64NOHLE:     ret
+}
+
+define void @atomic_fetch_logic32() nounwind {
+; X64HLE:   atomic_fetch_logic32
+; X64NOHLE: atomic_fetch_logic32
+  %t0 = atomicrmw and  i32* @sc32, i32 5 acquire, !targetflags !0
+; X64HLE:       lock
+; X64HLE-NEXT:  xacquire
+; X64HLE:       andl
+; X64NOHLE:     lock
+; X64NOHLE-NOT: xacquire
+; X64NOHLE:     andl
+  %t1 = atomicrmw or   i32* @sc32, i32 5 acquire, !targetflags !1
+; X64HLE:       lock
+; X64HLE-NEXT:  xrelease
+; X64HLE:       orl
+; X64NOHLE:     lock
+; X64NOHLE-NOT: xrelease
+; X64NOHLE:     orl
+  %t2 = atomicrmw xor  i32* @sc32, i32 5 acquire, !targetflags !0
+; X64HLE:       lock
+; X64HLE-NEXT:  xacquire
+; X64HLE:       xorl
+; X64NOHLE:     lock
+; X64NOHLE-NOT: xacquire
+; X64NOHLE:     xorl
+  %t3 = atomicrmw nand i32* @sc32, i32 5 acquire, !targetflags !1
+; X64HLE:       andl
+; X64HLE:       notl
+; X64HLE:       lock
+; X64HLE-NEXT:  xrelease
+; X64HLE:       cmpxchgl
+; X64NOHLE:     andl
+; X64NOHLE:     notl
+; X64NOHLE:     lock
+; X64NOHLE-NOT: xrelease
+; X64NOHLE:     cmpxchgl
+  ret void
+; X64HLE:       ret
+; X64NOHLE:     ret
+}
+
+define void @atomic_fetch_minmax32() nounwind {
+; X64HLE:   atomic_fetch_minmax32
+; X64NOHLE: atomic_fetch_minmax32
+  %t0 = atomicrmw max  i32* @sc32, i32 5 acquire, !targetflags !0
+; X64HLE:       cmpl
+; X64HLE:       cmov
+; X64HLE:       lock
+; X64HLE-NEXT:  xacquire
+; X64HLE:       cmpxchgl
+; X64NOHLE:     cmpl
+; X64NOHLE:     cmov
+; X64NOHLE:     lock
+; X64NOHLE-NOT: xacquire
+; X64NOHLE:     cmpxchgl
+  %t1 = atomicrmw min  i32* @sc32, i32 5 acquire, !targetflags !1
+; X64HLE:       cmpl
+; X64HLE:       cmov
+; X64HLE:       lock
+; X64HLE-NEXT:  xrelease
+; X64HLE:       cmpxchgl
+; X64NOHLE:     cmpl
+; X64NOHLE:     cmov
+; X64NOHLE:     lock
+; X64NOHLE-NOT: xrelease
+; X64NOHLE:     cmpxchgl
+  %t2 = atomicrmw umax i32* @sc32, i32 5 acquire, !targetflags !0
+; X64HLE:       cmpl
+; X64HLE:       cmov
+; X64HLE:       lock
+; X64HLE-NEXT:  xacquire
+; X64HLE:       cmpxchgl
+; X64NOHLE:     cmpl
+; X64NOHLE:     cmov
+; X64NOHLE:     lock
+; X64NOHLE-NOT: xacquire
+; X64NOHLE:     cmpxchgl
+  %t3 = atomicrmw umin i32* @sc32, i32 5 acquire, !targetflags !1
+; X64HLE:       cmpl
+; X64HLE:       cmov
+; X64HLE:       lock
+; X64HLE-NEXT:  xrelease
+; X64HLE:       cmpxchgl
+; X64NOHLE:     cmpl
+; X64NOHLE:     cmov
+; X64NOHLE:     lock
+; X64NOHLE-NOT: xrelease
+; X64NOHLE:     cmpxchgl
+  ret void
+; X64HLE:       ret
+; X64NOHLE:     ret
+}
+
+define void @atomic_fetch_misc32() nounwind {
+; X64HLE:   atomic_fetch_misc32
+; X64NOHLE: atomic_fetch_misc32
+  %t0 = cmpxchg i32* @sc32, i32 0, i32 1 acquire, !targetflags !0
+; X64HLE:       lock
+; X64HLE-NEXT:  xacquire
+; X64HLE:       cmpxchgl
+; X64NOHLE:     lock
+; X64NOHLE-NOT: xacquire
+; X64NOHLE:     cmpxchgl
+  store atomic i32 0, i32* @sc32 release, align 4, !targetflags !1
+; X64HLE-NOT:   lock
+; X64HLE:       xrelease
+; X64HLE:       movl
+; X64NOHLE-NOT: lock
+; X64NOHLE-NOT: xrelease
+; X64NOHLE:     movl
+  %t1 = atomicrmw xchg i32* @sc32, i32 %t0 acquire, !targetflags !0
+; X64HLE-NOT:   lock
+; X64HLE:       xacquire
+; X64HLE:       xchgl
+; X64NOHLE-NOT: lock
+; X64NOHLE-NOT: xacquire
+; X64NOHLE:     xchgl
+  ret void
+; X64HLE:       ret
+; X64NOHLE:     ret
+}
+
+!0 = metadata !{i32 1}
+!1 = metadata !{i32 2}
diff --git a/test/CodeGen/X86/hle-atomic64.ll b/test/CodeGen/X86/hle-atomic64.ll
new file mode 100644
index 0000000..f155aed
--- /dev/null
+++ b/test/CodeGen/X86/hle-atomic64.ll
@@ -0,0 +1,188 @@
+; RUN: llc < %s -O0 -march=x86-64 -mcpu=corei7 -mattr=+hle | FileCheck %s --check-prefix X64HLE
+; RUN: llc < %s -O0 -march=x86-64 -mcpu=corei7 -mattr=-hle | FileCheck %s --check-prefix X64NOHLE
+
+@sc64 = external global i64
+
+; 64-bit
+
+define void @atomic_fetch_add64() nounwind {
+; X64HLE:   atomic_fetch_add64
+; X64NOHLE: atomic_fetch_add64
+  %t0 = atomicrmw add  i64* @sc64, i64 1 acquire, !targetflags !0
+; X64HLE:       lock
+; X64HLE-NEXT:  xacquire
+; X64HLE:       incq
+; X64NOHLE:     lock
+; X64NOHLE-NOT: xacquire
+; X64NOHLE:     incq
+  %t1 = atomicrmw add  i64* @sc64, i64 5 acquire, !targetflags !1
+; X64HLE:       lock
+; X64HLE-NEXT:  xrelease
+; X64HLE:       xaddq
+; X64NOHLE:     lock
+; X64NOHLE-NOT: xrelease
+; X64NOHLE:     xaddq
+  %t2 = atomicrmw add  i64* @sc64, i64 %t1 acquire, !targetflags !0
+; X64HLE:       lock
+; X64HLE-NEXT:  xacquire
+; X64HLE:       addq
+; X64NOHLE:     lock
+; X64NOHLE-NOT: xacquire
+; X64NOHLE:     addq
+  ret void
+; X64HLE:       ret
+; X64NOHLE:     ret
+}
+
+define void @atomic_fetch_sub64() nounwind {
+; X64HLE:   atomic_fetch_sub64
+; X64NOHLE: atomic_fetch_sub64
+  %t3 = atomicrmw sub  i64* @sc64, i64 1 acquire, !targetflags !1
+; X64HLE:       lock
+; X64HLE-NEXT:  xrelease
+; X64HLE:       decq
+; X64NOHLE:     lock
+; X64NOHLE-NOT: xrelease
+; X64NOHLE:     decq
+  %t4 = atomicrmw sub  i64* @sc64, i64 5 acquire, !targetflags !0
+; X64HLE:       lock
+; X64HLE-NEXT:  xacquire
+; X64HLE:       xaddq
+; X64NOHLE:     lock
+; X64NOHLE-NOT: xacquire
+; X64NOHLE:     xaddq
+  %t5 = atomicrmw sub  i64* @sc64, i64 %t4 acquire, !targetflags !1
+; X64HLE:       lock
+; X64HLE-NEXT:  xrelease
+; X64HLE:       subq
+; X64NOHLE:     lock
+; X64NOHLE-NOT: xrelease
+; X64NOHLE:     subq
+  ret void
+; X64HLE:       ret
+; X64NOHLE:     ret
+}
+
+define void @atomic_fetch_logic64() nounwind {
+; X64HLE:   atomic_fetch_logic64
+; X64NOHLE: atomic_fetch_logic64
+  %t6 = atomicrmw and  i64* @sc64, i64 5 acquire, !targetflags !0
+; X64HLE:       lock
+; X64HLE-NEXT:  xacquire
+; X64HLE:       andq
+; X64NOHLE:     lock
+; X64NOHLE-NOT: xacquire
+; X64NOHLE:     andq
+  %t7 = atomicrmw or   i64* @sc64, i64 5 acquire, !targetflags !1
+; X64HLE:       lock
+; X64HLE-NEXT:  xrelease
+; X64HLE:       orq
+; X64NOHLE:     lock
+; X64NOHLE-NOT: xrelease
+; X64NOHLE:     orq
+  %t8 = atomicrmw xor  i64* @sc64, i64 5 acquire, !targetflags !0
+; X64HLE:       lock
+; X64HLE-NEXT:  xacquire
+; X64HLE:       xorq
+; X64NOHLE:     lock
+; X64NOHLE-NOT: xacquire
+; X64NOHLE:     xorq
+  %t9 = atomicrmw nand i64* @sc64, i64 5 acquire, !targetflags !1
+; X64HLE:       andq
+; X64HLE:       notq
+; X64HLE:       lock
+; X64HLE-NEXT:  xrelease
+; X64HLE:       cmpxchgq
+; X64NOHLE:     andq
+; X64NOHLE:     notq
+; X64NOHLE:     lock
+; X64NOHLE-NOT: xrelease
+; X64NOHLE:     cmpxchgq
+  ret void
+; X64HLE:       ret
+; X64NOHLE:     ret
+}
+
+define void @atomic_fetch_minmax64() nounwind {
+; X64HLE:   atomic_fetch_minmax64
+; X64NOHLE: atomic_fetch_minmax64
+  %t0 = atomicrmw max  i64* @sc64, i64 5 acquire, !targetflags !0
+; X64HLE:       cmpq
+; X64HLE:       cmov
+; X64HLE:       lock
+; X64HLE-NEXT:  xacquire
+; X64HLE:       cmpxchgq
+; X64NOHLE:     cmpq
+; X64NOHLE:     cmov
+; X64NOHLE:     lock
+; X64NOHLE-NOT: xacquire
+; X64NOHLE:     cmpxchgq
+  %t1 = atomicrmw min  i64* @sc64, i64 5 acquire, !targetflags !1
+; X64HLE:       cmpq
+; X64HLE:       cmov
+; X64HLE:       lock
+; X64HLE-NEXT:  xrelease
+; X64HLE:       cmpxchgq
+; X64NOHLE:     cmpq
+; X64NOHLE:     cmov
+; X64NOHLE:     lock
+; X64NOHLE-NOT: xrelease
+; X64NOHLE:     cmpxchgq
+  %t2 = atomicrmw umax i64* @sc64, i64 5 acquire, !targetflags !0
+; X64HLE:       cmpq
+; X64HLE:       cmov
+; X64HLE:       lock
+; X64HLE-NEXT:  xacquire
+; X64HLE:       cmpxchgq
+; X64NOHLE:     cmpq
+; X64NOHLE:     cmov
+; X64NOHLE:     lock
+; X64NOHLE-NOT: xacquire
+; X64NOHLE:     cmpxchgq
+  %t3 = atomicrmw umin i64* @sc64, i64 5 acquire, !targetflags !1
+; X64HLE:       cmpq
+; X64HLE:       cmov
+; X64HLE:       lock
+; X64HLE-NEXT:  xrelease
+; X64HLE:       cmpxchgq
+; X64NOHLE:     cmpq
+; X64NOHLE:     cmov
+; X64NOHLE:     lock
+; X64NOHLE-NOT: xrelease
+; X64NOHLE:     cmpxchgq
+  ret void
+; X64HLE:       ret
+; X64NOHLE:     ret
+}
+
+define void @atomic_fetch_misc64() nounwind {
+; X64HLE:   atomic_fetch_misc64
+; X64NOHLE: atomic_fetch_misc64
+  %t4 = cmpxchg i64* @sc64, i64 0, i64 1 acquire, !targetflags !0
+; X64HLE:       lock
+; X64HLE-NEXT:  xacquire
+; X64HLE:       cmpxchgq
+; X64NOHLE:     lock
+; X64NOHLE-NOT: xacquire
+; X64NOHLE:     cmpxchgq
+  store atomic i64 0, i64* @sc64 release, align 8, !targetflags !1
+; X64HLE-NOT:   lock
+; X64HLE:       xrelease
+; X64HLE:       movq
+; X64NOHLE-NOT: lock
+; X64NOHLE-NOT: xrelease
+; X64NOHLE:     movq
+  %t5 = atomicrmw xchg i64* @sc64, i64 %t4 acquire, !targetflags !0
+; X64HLE-NOT:   lock
+; X64HLE:       xacquire
+; X64HLE:       xchgq
+; X64NOHLE-NOT: lock
+; X64NOHLE-NOT: xacquire
+; X64NOHLE:     xchgq
+  ret void
+; X64HLE:       ret
+; X64NOHLE:     ret
+}
+
+!0 = metadata !{i32 1}
+!1 = metadata !{i32 2}
diff --git a/test/CodeGen/X86/hle-atomic8.ll b/test/CodeGen/X86/hle-atomic8.ll
new file mode 100644
index 0000000..6631a8e
--- /dev/null
+++ b/test/CodeGen/X86/hle-atomic8.ll
@@ -0,0 +1,188 @@
+; RUN: llc < %s -O0 -march=x86-64 -mcpu=corei7 -mattr=+hle | FileCheck %s --check-prefix X64HLE
+; RUN: llc < %s -O0 -march=x86-64 -mcpu=corei7 -mattr=-hle | FileCheck %s --check-prefix X64NOHLE
+
+@sc8 = external global i8
+
+; 8-bit
+
+define void @atomic_fetch_add8() nounwind {
+; X64HLE:   atomic_fetch_add8
+; X64NOHLE: atomic_fetch_add8
+  %t0 = atomicrmw add  i8* @sc8, i8 1 acquire, !targetflags !0
+; X64HLE:       lock
+; X64HLE-NEXT:  xacquire
+; X64HLE:       incb
+; X64NOHLE:     lock
+; X64NOHLE-NOT: xacquire
+; X64NOHLE:     incb
+  %t1 = atomicrmw add  i8* @sc8, i8 5 acquire, !targetflags !1
+; X64HLE:       lock
+; X64HLE-NEXT:  xrelease
+; X64HLE:       xaddb
+; X64NOHLE:     lock
+; X64NOHLE-NOT: xrelease
+; X64NOHLE:     xaddb
+  %t2 = atomicrmw add  i8* @sc8, i8 %t1 acquire, !targetflags !0
+; X64HLE:       lock
+; X64HLE-NEXT:  xacquire
+; X64HLE:       addb
+; X64NOHLE:     lock
+; X64NOHLE-NOT: xacquire
+; X64NOHLE:     addb
+  ret void
+; X64HLE:       ret
+; X64NOHLE:     ret
+}
+
+define void @atomic_fetch_sub8() nounwind {
+; X64HLE:   atomic_fetch_sub8
+; X64NOHLE: atomic_fetch_sub8
+  %t3 = atomicrmw sub  i8* @sc8, i8 1 acquire, !targetflags !1
+; X64HLE:       lock
+; X64HLE-NEXT:  xrelease
+; X64HLE:       decb
+; X64NOHLE:     lock
+; X64NOHLE-NOT: xrelease
+; X64NOHLE:     decb
+  %t4 = atomicrmw sub  i8* @sc8, i8 5 acquire, !targetflags !0
+; X64HLE:       lock
+; X64HLE-NEXT:  xacquire
+; X64HLE:       xaddb
+; X64NOHLE:     lock
+; X64NOHLE-NOT: xacquire
+; X64NOHLE:     xaddb
+  %t5 = atomicrmw sub  i8* @sc8, i8 %t4 acquire, !targetflags !1
+; X64HLE:       lock
+; X64HLE-NEXT:  xrelease
+; X64HLE:       subb
+; X64NOHLE:     lock
+; X64NOHLE-NOT: xrelease
+; X64NOHLE:     subb
+  ret void
+; X64HLE:       ret
+; X64NOHLE:     ret
+}
+
+define void @atomic_fetch_logic8() nounwind {
+; X64HLE:   atomic_fetch_logic8
+; X64NOHLE: atomic_fetch_logic8
+  %t6 = atomicrmw and  i8* @sc8, i8 5 acquire, !targetflags !0
+; X64HLE:       lock
+; X64HLE-NEXT:  xacquire
+; X64HLE:       andb
+; X64NOHLE:     lock
+; X64NOHLE-NOT: xacquire
+; X64NOHLE:     andb
+  %t7 = atomicrmw or   i8* @sc8, i8 5 acquire, !targetflags !1
+; X64HLE:       lock
+; X64HLE-NEXT:  xrelease
+; X64HLE:       orb
+; X64NOHLE:     lock
+; X64NOHLE-NOT: xrelease
+; X64NOHLE:     orb
+  %t8 = atomicrmw xor  i8* @sc8, i8 5 acquire, !targetflags !0
+; X64HLE:       lock
+; X64HLE-NEXT:  xacquire
+; X64HLE:       xorb
+; X64NOHLE:     lock
+; X64NOHLE-NOT: xacquire
+; X64NOHLE:     xorb
+  %t9 = atomicrmw nand i8* @sc8, i8 5 acquire, !targetflags !1
+; X64HLE:       andb
+; X64HLE:       notb
+; X64HLE:       lock
+; X64HLE-NEXT:  xrelease
+; X64HLE:       cmpxchgb
+; X64NOHLE:     andb
+; X64NOHLE:     notb
+; X64NOHLE:     lock
+; X64NOHLE-NOT: xrelease
+; X64NOHLE:     cmpxchgb
+  ret void
+; X64HLE:       ret
+; X64NOHLE:     ret
+}
+
+define void @atomic_fetch_minmax8() nounwind {
+; X64HLE:   atomic_fetch_minmax8
+; X64NOHLE: atomic_fetch_minmax8
+  %t0 = atomicrmw max  i8* @sc8, i8 5 acquire, !targetflags !0
+; X64HLE:       cmpb
+; X64HLE:       cmov
+; X64HLE:       lock
+; X64HLE-NEXT:  xacquire
+; X64HLE:       cmpxchgb
+; X64NOHLE:     cmpb
+; X64NOHLE:     cmov
+; X64NOHLE:     lock
+; X64NOHLE-NOT: xacquire
+; X64NOHLE:     cmpxchgb
+  %t1 = atomicrmw min  i8* @sc8, i8 5 acquire, !targetflags !1
+; X64HLE:       cmpb
+; X64HLE:       cmov
+; X64HLE:       lock
+; X64HLE-NEXT:  xrelease
+; X64HLE:       cmpxchgb
+; X64NOHLE:     cmpb
+; X64NOHLE:     cmov
+; X64NOHLE:     lock
+; X64NOHLE-NOT: xrelease
+; X64NOHLE:     cmpxchgb
+  %t2 = atomicrmw umax i8* @sc8, i8 5 acquire, !targetflags !0
+; X64HLE:       cmpb
+; X64HLE:       cmov
+; X64HLE:       lock
+; X64HLE-NEXT:  xacquire
+; X64HLE:       cmpxchgb
+; X64NOHLE:     cmpb
+; X64NOHLE:     cmov
+; X64NOHLE:     lock
+; X64NOHLE-NOT: xacquire
+; X64NOHLE:     cmpxchgb
+  %t3 = atomicrmw umin i8* @sc8, i8 5 acquire, !targetflags !1
+; X64HLE:       cmpb
+; X64HLE:       cmov
+; X64HLE:       lock
+; X64HLE-NEXT:  xrelease
+; X64HLE:       cmpxchgb
+; X64NOHLE:     cmpb
+; X64NOHLE:     cmov
+; X64NOHLE:     lock
+; X64NOHLE-NOT: xrelease
+; X64NOHLE:     cmpxchgb
+  ret void
+; X64HLE:       ret
+; X64NOHLE:     ret
+}
+
+define void @atomic_fetch_misc8() nounwind {
+; X64HLE:   atomic_fetch_misc8
+; X64NOHLE: atomic_fetch_misc8
+  %t4 = cmpxchg i8* @sc8, i8 0, i8 1 acquire, !targetflags !0
+; X64HLE:       lock
+; X64HLE-NEXT:  xacquire
+; X64HLE:       cmpxchgb
+; X64NOHLE:     lock
+; X64NOHLE-NOT: xacquire
+; X64NOHLE:     cmpxchgb
+  store atomic i8 0, i8* @sc8 release, align 1, !targetflags !1
+; X64HLE-NOT:   lock
+; X64HLE:       xrelease
+; X64HLE:       movb
+; X64NOHLE-NOT: lock
+; X64NOHLE-NOT: xrelease
+; X64NOHLE:     movb
+  %t5 = atomicrmw xchg i8* @sc8, i8 %t4 acquire, !targetflags !0
+; X64HLE-NOT:   lock
+; X64HLE:       xacquire
+; X64HLE:       xchgb
+; X64NOHLE-NOT: lock
+; X64NOHLE-NOT: xacquire
+; X64NOHLE:     xchgb
+  ret void
+; X64HLE:       ret
+; X64NOHLE:     ret
+}
+
+!0 = metadata !{i32 1}
+!1 = metadata !{i32 2}
-- 
1.7.9.5


["0001-Add-target-flags-support-for-atomic-ops.patch" (0001-Add-target-flags-support-for-atomic-ops.patch)]

From 1ca5090753a8b82b9a9da33a176b89b1145c904b Mon Sep 17 00:00:00 2001
From: Michael Liao <michael.hliao@gmail.com>
Date: Sun, 1 Jul 2012 00:22:15 -0700
Subject: [PATCH 1/2] Add target flags support for atomic ops

---
 lib/CodeGen/CGBuiltin.cpp             |   40 ++++---
 lib/CodeGen/CGExpr.cpp                |   44 +++++---
 test/CodeGen/atomic-ops-targetflags.c |  193 +++++++++++++++++++++++++++++++++
 3 files changed, 251 insertions(+), 26 deletions(-)
 create mode 100644 test/CodeGen/atomic-ops-targetflags.c
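
For context, the intended source-level usage is sketched below. It assumes,
per the high/low split described earlier, that __ATOMIC_HLE_ACQUIRE and
__ATOMIC_HLE_RELEASE expand to (1 << 16) and (1 << 17), i.e. target flags 1
and 2 shifted into the high half of the order word; this is an illustration,
not a test from the patch.

  static int lock;

  void acquire_lock(void) {
    // XACQUIRE-prefixed exchange: begins an elision region when HLE is
    // available and degrades to a plain atomic exchange otherwise.
    while (__atomic_exchange_n(&lock, 1,
                               __ATOMIC_ACQUIRE | __ATOMIC_HLE_ACQUIRE))
      ; // spin until the lock is observed free
  }

  void release_lock(void) {
    // XRELEASE-prefixed store ends the elision region.
    __atomic_store_n(&lock, 0, __ATOMIC_RELEASE | __ATOMIC_HLE_RELEASE);
  }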

diff --git a/lib/CodeGen/CGBuiltin.cpp b/lib/CodeGen/CGBuiltin.cpp
index 9e09131..2633e28 100644
--- a/lib/CodeGen/CGBuiltin.cpp
+++ b/lib/CodeGen/CGBuiltin.cpp
@@ -1075,7 +1075,9 @@ RValue CodeGenFunction::EmitBuiltinExpr(const FunctionDecl *FD,
     Value *NewVal = Builder.getInt8(1);
     Value *Order = EmitScalarExpr(E->getArg(1));
     if (isa<llvm::ConstantInt>(Order)) {
-      int ord = cast<llvm::ConstantInt>(Order)->getZExtValue();
+      unsigned ord = cast<llvm::ConstantInt>(Order)->getZExtValue();
+      unsigned flags = ord >> 16;
+      ord = ord & 0xFFFF; // Mask off target flags.
       AtomicRMWInst *Result = 0;
       switch (ord) {
       case 0:  // memory_order_relaxed
@@ -1107,6 +1109,11 @@ RValue CodeGenFunction::EmitBuiltinExpr(const FunctionDecl *FD,
         break;
       }
       Result->setVolatile(Volatile);
+      if (flags) {
+        llvm::MDNode *TargetFlags = llvm::MDNode::get(getLLVMContext(),
+                                                      Builder.getInt32(flags));
+        Result->setMetadata("targetflags", TargetFlags);
+      }
       return RValue::get(Builder.CreateIsNotNull(Result, "tobool"));
     }
 
@@ -1124,7 +1131,7 @@ RValue CodeGenFunction::EmitBuiltinExpr(const FunctionDecl *FD,
       llvm::AcquireRelease, llvm::SequentiallyConsistent
     };
 
-    Order = Builder.CreateIntCast(Order, Builder.getInt32Ty(), false);
+    Order = Builder.CreateIntCast(Order, Builder.getInt16Ty(), false);
     llvm::SwitchInst *SI = Builder.CreateSwitch(Order, BBs[0]);
 
     Builder.SetInsertPoint(ContBB);
@@ -1139,12 +1146,12 @@ RValue CodeGenFunction::EmitBuiltinExpr(const FunctionDecl *FD,
       Builder.CreateBr(ContBB);
     }
 
-    SI->addCase(Builder.getInt32(0), BBs[0]);
-    SI->addCase(Builder.getInt32(1), BBs[1]);
-    SI->addCase(Builder.getInt32(2), BBs[1]);
-    SI->addCase(Builder.getInt32(3), BBs[2]);
-    SI->addCase(Builder.getInt32(4), BBs[3]);
-    SI->addCase(Builder.getInt32(5), BBs[4]);
+    SI->addCase(Builder.getInt16(0), BBs[0]);
+    SI->addCase(Builder.getInt16(1), BBs[1]);
+    SI->addCase(Builder.getInt16(2), BBs[1]);
+    SI->addCase(Builder.getInt16(3), BBs[2]);
+    SI->addCase(Builder.getInt16(4), BBs[3]);
+    SI->addCase(Builder.getInt16(5), BBs[4]);
 
     Builder.SetInsertPoint(ContBB);
     return RValue::get(Builder.CreateIsNotNull(Result, "tobool"));
@@ -1161,7 +1168,9 @@ RValue CodeGenFunction::EmitBuiltinExpr(const FunctionDecl *FD,
     Value *NewVal = Builder.getInt8(0);
     Value *Order = EmitScalarExpr(E->getArg(1));
     if (isa<llvm::ConstantInt>(Order)) {
-      int ord = cast<llvm::ConstantInt>(Order)->getZExtValue();
+      unsigned ord = cast<llvm::ConstantInt>(Order)->getZExtValue();
+      unsigned flags = ord >> 16;
+      ord = ord & 0xFFFF; // Mask off target flags.
       StoreInst *Store = Builder.CreateStore(NewVal, Ptr, Volatile);
       Store->setAlignment(1);
       switch (ord) {
@@ -1176,6 +1185,11 @@ RValue CodeGenFunction::EmitBuiltinExpr(const FunctionDecl *FD,
         Store->setOrdering(llvm::SequentiallyConsistent);
         break;
       }
+      if (flags) {
+        llvm::MDNode *TargetFlags = llvm::MDNode::get(getLLVMContext(),
+                                                      Builder.getInt32(flags));
+        Store->setMetadata("targetflags", TargetFlags);
+      }
       return RValue::get(0);
     }
 
@@ -1190,7 +1204,7 @@ RValue CodeGenFunction::EmitBuiltinExpr(const FunctionDecl *FD,
       llvm::Monotonic, llvm::Release, llvm::SequentiallyConsistent
     };
 
-    Order = Builder.CreateIntCast(Order, Builder.getInt32Ty(), false);
+    Order = Builder.CreateIntCast(Order, Builder.getInt16Ty(), false);
     llvm::SwitchInst *SI = Builder.CreateSwitch(Order, BBs[0]);
 
     for (unsigned i = 0; i < 3; ++i) {
@@ -1201,9 +1215,9 @@ RValue CodeGenFunction::EmitBuiltinExpr(const FunctionDecl *FD,
       Builder.CreateBr(ContBB);
     }
 
-    SI->addCase(Builder.getInt32(0), BBs[0]);
-    SI->addCase(Builder.getInt32(3), BBs[1]);
-    SI->addCase(Builder.getInt32(5), BBs[2]);
+    SI->addCase(Builder.getInt16(0), BBs[0]);
+    SI->addCase(Builder.getInt16(3), BBs[1]);
+    SI->addCase(Builder.getInt16(5), BBs[2]);
 
     Builder.SetInsertPoint(ContBB);
     return RValue::get(0);
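
The metadata-attachment pattern added in the builtin paths above is the same
each time; restated outside the patch as a sketch (the helper name and header
paths are illustrative):

  #include "llvm/IR/IRBuilder.h"
  #include "llvm/IR/Instruction.h"
  #include "llvm/IR/Metadata.h"

  // Sketch only: split an extended order word and attach its high half to
  // an atomic instruction as 'targetflags' metadata, as the patch does.
  static void setTargetFlags(llvm::IRBuilder<> &Builder,
                             llvm::Instruction *I, unsigned ExtendedOrder) {
    unsigned Flags = ExtendedOrder >> 16;  // bits 31-16: target flags
    if (!Flags)
      return;                              // plain atomic, no metadata
    llvm::MDNode *MD = llvm::MDNode::get(Builder.getContext(),
                                         Builder.getInt32(Flags));
    I->setMetadata("targetflags", MD);
  }
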
diff --git a/lib/CodeGen/CGExpr.cpp b/lib/CodeGen/CGExpr.cpp
index ba400b8..e9853bc 100644
--- a/lib/CodeGen/CGExpr.cpp
+++ b/lib/CodeGen/CGExpr.cpp
@@ -3045,7 +3045,8 @@ EmitPointerToDataMemberBinaryExpr(const BinaryOperator *E) {
 static void
 EmitAtomicOp(CodeGenFunction &CGF, AtomicExpr *E, llvm::Value *Dest,
              llvm::Value *Ptr, llvm::Value *Val1, llvm::Value *Val2,
-             uint64_t Size, unsigned Align, llvm::AtomicOrdering Order) {
+             uint64_t Size, unsigned Align, llvm::AtomicOrdering Order,
+             llvm::MDNode *TargetFlags = 0) {
   llvm::AtomicRMWInst::BinOp Op = llvm::AtomicRMWInst::Add;
   llvm::Instruction::BinaryOps PostOp = (llvm::Instruction::BinaryOps)0;
 
@@ -3066,6 +3067,8 @@ EmitAtomicOp(CodeGenFunction &CGF, AtomicExpr *E, llvm::Value *Dest,
     llvm::AtomicCmpXchgInst *CXI =
         CGF.Builder.CreateAtomicCmpXchg(Ptr, LoadVal1, LoadVal2, Order);
     CXI->setVolatile(E->isVolatile());
+    if (TargetFlags)
+      CXI->setMetadata("targetflags", TargetFlags);
     llvm::StoreInst *StoreVal1 = CGF.Builder.CreateStore(CXI, Val1);
     StoreVal1->setAlignment(Align);
     llvm::Value *Cmp = CGF.Builder.CreateICmpEQ(CXI, LoadVal1);
@@ -3080,6 +3083,8 @@ EmitAtomicOp(CodeGenFunction &CGF, AtomicExpr *E, llvm::Value *Dest,
     Load->setAtomic(Order);
     Load->setAlignment(Size);
     Load->setVolatile(E->isVolatile());
+    if (TargetFlags)
+      Load->setMetadata("targetflags", TargetFlags);
     llvm::StoreInst *StoreDest = CGF.Builder.CreateStore(Load, Dest);
     StoreDest->setAlignment(Align);
     return;
@@ -3095,6 +3100,8 @@ EmitAtomicOp(CodeGenFunction &CGF, AtomicExpr *E, llvm::Value *Dest,
     Store->setAtomic(Order);
     Store->setAlignment(Size);
     Store->setVolatile(E->isVolatile());
+    if (TargetFlags)
+      Store->setMetadata("targetflags", TargetFlags);
     return;
   }
 
@@ -3157,6 +3164,8 @@ EmitAtomicOp(CodeGenFunction &CGF, AtomicExpr *E, llvm::Value *Dest,
   llvm::AtomicRMWInst *RMWI =
       CGF.Builder.CreateAtomicRMW(Op, Ptr, LoadVal1, Order);
   RMWI->setVolatile(E->isVolatile());
+  if (TargetFlags)
+    RMWI->setMetadata("targetflags", TargetFlags);
 
   // For __atomic_*_fetch operations, perform the operation again to
   // determine the value which was written.
@@ -3412,34 +3421,40 @@ RValue CodeGenFunction::EmitAtomicExpr(AtomicExpr *E, llvm::Value *Dest) {
   if (Dest && !E->isCmpXChg()) Dest = Builder.CreateBitCast(Dest, IPtrTy);
   if (isa<llvm::ConstantInt>(Order)) {
-    int ord = cast<llvm::ConstantInt>(Order)->getZExtValue();
+    unsigned ord = cast<llvm::ConstantInt>(Order)->getZExtValue();
+    unsigned flags = ord >> 16;
+    ord = ord & 0xFFFF; // Mask off target flags.
+    llvm::MDNode *TargetFlags = 0;
+    if (flags)
+      TargetFlags = llvm::MDNode::get(getLLVMContext(),
+                                      Builder.getInt32(flags));
     switch (ord) {
     case 0:  // memory_order_relaxed
       EmitAtomicOp(*this, E, Dest, Ptr, Val1, Val2, Size, Align,
-                   llvm::Monotonic);
+                   llvm::Monotonic, TargetFlags);
       break;
     case 1:  // memory_order_consume
     case 2:  // memory_order_acquire
       if (IsStore)
         break; // Avoid crashing on code with undefined behavior
       EmitAtomicOp(*this, E, Dest, Ptr, Val1, Val2, Size, Align,
-                   llvm::Acquire);
+                   llvm::Acquire, TargetFlags);
       break;
     case 3:  // memory_order_release
       if (IsLoad)
         break; // Avoid crashing on code with undefined behavior
       EmitAtomicOp(*this, E, Dest, Ptr, Val1, Val2, Size, Align,
-                   llvm::Release);
+                   llvm::Release, TargetFlags);
       break;
     case 4:  // memory_order_acq_rel
       if (IsLoad || IsStore)
         break; // Avoid crashing on code with undefined behavior
       EmitAtomicOp(*this, E, Dest, Ptr, Val1, Val2, Size, Align,
-                   llvm::AcquireRelease);
+                   llvm::AcquireRelease, TargetFlags);
       break;
     case 5:  // memory_order_seq_cst
       EmitAtomicOp(*this, E, Dest, Ptr, Val1, Val2, Size, Align,
-                   llvm::SequentiallyConsistent);
+                   llvm::SequentiallyConsistent, TargetFlags);
       break;
     default: // invalid order
       // We should not ever get here normally, but it's hard to
@@ -3470,7 +3485,10 @@ RValue CodeGenFunction::EmitAtomicExpr(AtomicExpr *E, llvm::Value *Dest) {
   // MonotonicBB is arbitrarily chosen as the default case; in practice, this
   // doesn't matter unless someone is crazy enough to use something that
   // doesn't fold to a constant for the ordering.
-  Order = Builder.CreateIntCast(Order, Builder.getInt32Ty(), false);
+  //
+  // Cast to i16 to mask off the target flags. So far, if order cannot be
+  // folded into a constant, target flags are ignored.
+  Order = Builder.CreateIntCast(Order, Builder.getInt16Ty(), false);
   llvm::SwitchInst *SI = Builder.CreateSwitch(Order, MonotonicBB);
 
   // Emit all the different atomics
@@ -3483,28 +3501,28 @@ RValue CodeGenFunction::EmitAtomicExpr(AtomicExpr *E, llvm::Value *Dest) {
     EmitAtomicOp(*this, E, Dest, Ptr, Val1, Val2, Size, Align,
                  llvm::Acquire);
     Builder.CreateBr(ContBB);
-    SI->addCase(Builder.getInt32(1), AcquireBB);
-    SI->addCase(Builder.getInt32(2), AcquireBB);
+    SI->addCase(Builder.getInt16(1), AcquireBB);
+    SI->addCase(Builder.getInt16(2), AcquireBB);
   }
   if (!IsLoad) {
     Builder.SetInsertPoint(ReleaseBB);
     EmitAtomicOp(*this, E, Dest, Ptr, Val1, Val2, Size, Align,
                  llvm::Release);
     Builder.CreateBr(ContBB);
-    SI->addCase(Builder.getInt32(3), ReleaseBB);
+    SI->addCase(Builder.getInt16(3), ReleaseBB);
   }
   if (!IsLoad && !IsStore) {
     Builder.SetInsertPoint(AcqRelBB);
     EmitAtomicOp(*this, E, Dest, Ptr, Val1, Val2, Size, Align,
                  llvm::AcquireRelease);
     Builder.CreateBr(ContBB);
-    SI->addCase(Builder.getInt32(4), AcqRelBB);
+    SI->addCase(Builder.getInt16(4), AcqRelBB);
   }
   Builder.SetInsertPoint(SeqCstBB);
   EmitAtomicOp(*this, E, Dest, Ptr, Val1, Val2, Size, Align,
                llvm::SequentiallyConsistent);
   Builder.CreateBr(ContBB);
-  SI->addCase(Builder.getInt32(5), SeqCstBB);
+  SI->addCase(Builder.getInt16(5), SeqCstBB);
 
   // Cleanup and return
   Builder.SetInsertPoint(ContBB);
diff --git a/test/CodeGen/atomic-ops-targetflags.c b/test/CodeGen/atomic-ops-targetflags.c
new file mode 100644
index 0000000..82f211f
--- /dev/null
+++ b/test/CodeGen/atomic-ops-targetflags.c
@@ -0,0 +1,193 @@
+// RUN: %clang_cc1 %s -emit-llvm -o - -triple=i686-apple-darwin9 | FileCheck %s
+
+// Also test serialization of atomic operations here, to avoid duplicating the
+// test.
+// RUN: %clang_cc1 %s -emit-pch -o %t -triple=i686-apple-darwin9
+// RUN: %clang_cc1 %s -include-pch %t -triple=i686-apple-darwin9 -emit-llvm -o - | FileCheck %s
+#ifndef ALREADY_INCLUDED
+#define ALREADY_INCLUDED
+
+// Basic IRGen tests for __c11_atomic_* and GNU __atomic_*
+
+typedef enum memory_order {
+  memory_order_relaxed, memory_order_consume, memory_order_acquire,
+  memory_order_release, memory_order_acq_rel, memory_order_seq_cst
+} memory_order;
+
+#define TFLAG  (1 << 16)
+
+int fi1(_Atomic(int) *i) {
+  // CHECK: @fi1
+  // CHECK: load atomic i32* {{.*}} seq_cst, {{.*}}, !targetflags !{{[0-9]+}}
+  return __c11_atomic_load(i, memory_order_seq_cst | TFLAG);
+}
+
+int fi1a(int *i) {
+  // CHECK: @fi1a
+  // CHECK: load atomic i32* {{.*}} seq_cst, {{.*}}, !targetflags !{{[0-9]+}}
+  int v;
+  __atomic_load(i, &v, memory_order_seq_cst | TFLAG);
+  return v;
+}
+
+int fi1b(int *i) {
+  // CHECK: @fi1b
+  // CHECK: load atomic i32* {{.*}} seq_cst, {{.*}}, !targetflags !{{[0-9]+}}
+  return __atomic_load_n(i, memory_order_seq_cst | TFLAG);
+}
+
+void fi2(_Atomic(int) *i) {
+  // CHECK: @fi2
+  // CHECK: store atomic i32 {{.*}} seq_cst, {{.*}}, !targetflags !{{[0-9]+}}
+  __c11_atomic_store(i, 1, memory_order_seq_cst | TFLAG);
+}
+
+void fi2a(int *i) {
+  // CHECK: @fi2a
+  // CHECK: store atomic i32 {{.*}} seq_cst, {{.*}}, !targetflags !{{[0-9]+}}
+  int v = 1;
+  __atomic_store(i, &v, memory_order_seq_cst | TFLAG);
+}
+
+void fi2b(int *i) {
+  // CHECK: @fi2b
+  // CHECK: store atomic i32 {{.*}} seq_cst, {{.*}}, !targetflags !{{[0-9]+}}
+  __atomic_store_n(i, 1, memory_order_seq_cst | TFLAG);
+}
+
+int fi3(_Atomic(int) *i) {
+  // CHECK: @fi3
+  // CHECK: atomicrmw and {{.*}}, {{.*}}, !targetflags !{{[0-9]+}}
+  // CHECK-NOT: and
+  return __c11_atomic_fetch_and(i, 1, memory_order_seq_cst | TFLAG);
+}
+
+int fi3a(int *i) {
+  // CHECK: @fi3a
+  // CHECK: atomicrmw xor {{.*}}, {{.*}}, !targetflags !{{[0-9]+}}
+  // CHECK-NOT: xor
+  return __atomic_fetch_xor(i, 1, memory_order_seq_cst | TFLAG);
+}
+
+int fi3b(int *i) {
+  // CHECK: @fi3b
+  // CHECK: atomicrmw add {{.*}}, {{.*}}, !targetflags !{{[0-9]+}}
+  // CHECK: add
+  return __atomic_add_fetch(i, 1, memory_order_seq_cst | TFLAG);
+}
+
+int fi3c(int *i) {
+  // CHECK: @fi3c
+  // CHECK: atomicrmw nand {{.*}}, {{.*}}, !targetflags !{{[0-9]+}}
+  // CHECK-NOT: and
+  return __atomic_fetch_nand(i, 1, memory_order_seq_cst | TFLAG);
+}
+
+int fi3d(int *i) {
+  // CHECK: @fi3d
+  // CHECK: atomicrmw nand {{.*}}, {{.*}}, !targetflags !{{[0-9]+}}
+  // CHECK: and
+  // CHECK: xor
+  return __atomic_nand_fetch(i, 1, memory_order_seq_cst | TFLAG);
+}
+
+_Bool fi4(_Atomic(int) *i) {
+  // CHECK: @fi4
+  // CHECK: cmpxchg i32* %{{.*}}, {{.*}}, {{.*}}, !targetflags !{{[0-9]+}}
+  int cmp = 0;
+  return __c11_atomic_compare_exchange_strong(i, &cmp, 1, memory_order_acquire | TFLAG, memory_order_acquire);
+}
+
+_Bool fi4a(int *i) {
+  // CHECK: @fi4a
+  // CHECK: cmpxchg i32* %{{.*}}, {{.*}}, {{.*}}, !targetflags !{{[0-9]+}}
+  int cmp = 0;
+  int desired = 1;
+  return __atomic_compare_exchange(i, &cmp, &desired, 0, memory_order_acquire | TFLAG, memory_order_acquire);
+}
+
+_Bool fi4b(int *i) {
+  // CHECK: @fi4b
+  // CHECK: cmpxchg i32* %{{.*}}, {{.*}}, {{.*}}, !targetflags !{{[0-9]+}}
+  int cmp = 0;
+  return __atomic_compare_exchange_n(i, &cmp, 1, 1, memory_order_acquire | TFLAG, memory_order_acquire);
+}
+
+float ff1(_Atomic(float) *d) {
+  // CHECK: @ff1
+  // CHECK: load atomic i32* {{.*}} monotonic, {{.*}}, !targetflags !{{[0-9]+}}
+  return __c11_atomic_load(d, memory_order_relaxed | TFLAG);
+}
+
+void ff2(_Atomic(float) *d) {
+  // CHECK: @ff2
+  // CHECK: store atomic i32 {{.*}} release, {{.*}}, !targetflags !{{[0-9]+}}
+  __c11_atomic_store(d, 1, memory_order_release | TFLAG);
+}
+
+float ff3(_Atomic(float) *d) {
+  return __c11_atomic_exchange(d, 2, memory_order_seq_cst | TFLAG);
+}
+
+int* fp1(_Atomic(int*) *p) {
+  // CHECK: @fp1
+  // CHECK: load atomic i32* {{.*}} seq_cst, {{.*}}, !targetflags !{{[0-9]+}}
+  return __c11_atomic_load(p, memory_order_seq_cst | TFLAG);
+}
+
+int* fp2(_Atomic(int*) *p) {
+  // CHECK: @fp2
+  // CHECK: store i32 4
+  // CHECK: atomicrmw add {{.*}} monotonic, !targetflags !{{[0-9]+}}
+  return __c11_atomic_fetch_add(p, 1, memory_order_relaxed | TFLAG);
+}
+
+int *fp2a(int **p) {
+  // CHECK: @fp2a
+  // CHECK: store i32 4
+  // CHECK: atomicrmw sub {{.*}} monotonic, !targetflags !{{[0-9]+}}
+  // Note, the GNU builtins do not multiply by sizeof(T)!
+  return __atomic_fetch_sub(p, 4, memory_order_relaxed | TFLAG);
+}
+
+_Complex float fc(_Atomic(_Complex float) *c) {
+  // CHECK: @fc
+  // CHECK: atomicrmw xchg i64* %{{.*}}, {{.*}}, !targetflags !{{[0-9]+}}
+  return __c11_atomic_exchange(c, 2, memory_order_seq_cst | TFLAG);
+}
+
+typedef struct X { int x; } X;
+X fs(_Atomic(X) *c) {
+  // CHECK: @fs
+  // CHECK: atomicrmw xchg i32* %{{.*}}, {{.*}}, !targetflags !{{[0-9]+}}
+  return __c11_atomic_exchange(c, (X){2}, memory_order_seq_cst | TFLAG);
+}
+
+X fsa(X *c, X *d) {
+  // CHECK: @fsa
+  // CHECK: atomicrmw xchg i32* %{{.*}}, {{.*}}, !targetflags !{{[0-9]+}}
+  X ret;
+  __atomic_exchange(c, d, &ret, memory_order_seq_cst | TFLAG);
+  return ret;
+}
+
+_Bool fsb(_Bool *c) {
+  // CHECK: @fsb
+  // CHECK: atomicrmw xchg i8* %{{.*}}, {{.*}}, !targetflags !{{[0-9]+}}
+  return __atomic_exchange_n(c, 1, memory_order_seq_cst | TFLAG);
+}
+
+char flag1;
+volatile char flag2;
+void test_and_set() {
+  // CHECK: atomicrmw xchg i8* @flag1, i8 1 seq_cst, !targetflags !{{[0-9]+}}
+  __atomic_test_and_set(&flag1, memory_order_seq_cst | TFLAG);
+  // CHECK: atomicrmw volatile xchg i8* @flag2, i8 1 acquire, !targetflags !{{[0-9]+}}
+  __atomic_test_and_set(&flag2, memory_order_acquire | TFLAG);
+  // CHECK: store atomic volatile i8 0, i8* @flag2 release, {{.*}}, !targetflags !{{[0-9]+}}
+  __atomic_clear(&flag2, memory_order_release | TFLAG);
+  // CHECK: store atomic i8 0, i8* @flag1 seq_cst, {{.*}}, !targetflags !{{[0-9]+}}
+  __atomic_clear(&flag1, memory_order_seq_cst | TFLAG);
+}
+
+#endif
-- 
1.7.9.5


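For reference, a small standalone sketch (illustrative only, not part of the
patches) of how the combined 32-bit order word decomposes; the masking mirrors
the EmitAtomicExpr change above, and TFLAG is the same value the test uses:

  #include <stdio.h>

  /* Low 16 bits carry the C11 memory order; the high 16 bits carry
     target-specific flags that clang emits as 'targetflags' metadata. */
  #define TFLAG (1 << 16)

  int main(void) {
    unsigned order = 5 | TFLAG;    /* memory_order_seq_cst | TFLAG */
    unsigned flags = order >> 16;  /* 1 -> becomes !targetflags */
    order &= 0xFFFF;               /* 5 -> plain memory_order_seq_cst */
    printf("flags=%u order=%u\n", flags, order);
    return 0;
  }
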
["0002-Add-mhle-option-support-and-populate-pre-defined-mac.patch" (0002-Add-mhle-option-support-and-populate-pre-defined-mac.patch)]

From df54e5f8e988b4fca5cd06ac9e7ea608086efbc0 Mon Sep 17 00:00:00 2001
From: Michael Liao <michael.hliao@gmail.com>
Date: Sun, 8 Jul 2012 14:07:19 -0700
Subject: [PATCH 2/2] Add '-mhle' option support and populate pre-defined
 macros

- Three pre-defined macros are added when HLE is turned on:
  * __HLE__
  * __ATOMIC_HLE_ACQUIRE
  * __ATOMIC_HLE_RELEASE
---
 include/clang/Driver/Options.td            |    2 ++
 lib/Basic/Targets.cpp                      |   23 +++++++++++++++++++++--
 test/Preprocessor/predefined-arch-macros.c |    6 ++++++
 3 files changed, 29 insertions(+), 2 deletions(-)

diff --git a/include/clang/Driver/Options.td b/include/clang/Driver/Options.td
index cafd7d7..47fd862 100644
--- a/include/clang/Driver/Options.td
+++ b/include/clang/Driver/Options.td
@@ -885,6 +885,7 @@ def mno_fma : Flag<["-"], "mno-fma">, Group<m_x86_Features_Group>;
 def mno_xop : Flag<["-"], "mno-xop">, Group<m_x86_Features_Group>;
 def mno_f16c : Flag<["-"], "mno-f16c">, Group<m_x86_Features_Group>;
 def mno_rtm : Flag<["-"], "mno-rtm">, Group<m_x86_Features_Group>;
+def mno_hle : Flag<["-"], "mno-hle">, Group<m_x86_Features_Group>;
 
 def mno_thumb : Flag<["-"], "mno-thumb">, Group<m_Group>;
 def marm : Flag<["-"], "marm">, Alias<mno_thumb>;
@@ -928,6 +929,7 @@ def mfma : Flag<["-"], "mfma">, Group<m_x86_Features_Group>;
 def mxop : Flag<["-"], "mxop">, Group<m_x86_Features_Group>;
 def mf16c : Flag<["-"], "mf16c">, Group<m_x86_Features_Group>;
 def mrtm : Flag<["-"], "mrtm">, Group<m_x86_Features_Group>;
+def mhle : Flag<["-"], "mhle">, Group<m_x86_Features_Group>;
 def mips16 : Flag<["-"], "mips16">, Group<m_Group>;
 def mno_mips16 : Flag<["-"], "mno-mips16">, Group<m_Group>;
 def mxgot : Flag<["-"], "mxgot">, Group<m_Group>;
diff --git a/lib/Basic/Targets.cpp b/lib/Basic/Targets.cpp
index eaf2e7d..70d10e7 100644
--- a/lib/Basic/Targets.cpp
+++ b/lib/Basic/Targets.cpp
@@ -1608,6 +1608,7 @@ class X86TargetInfo : public TargetInfo {
   bool HasBMI2;
   bool HasPOPCNT;
   bool HasRTM;
+  bool HasHLE;
   bool HasSSE4a;
   bool HasFMA4;
   bool HasFMA;
@@ -1759,8 +1760,8 @@ public:
     : TargetInfo(triple), SSELevel(NoSSE), MMX3DNowLevel(NoMMX3DNow),
       HasAES(false), HasPCLMUL(false), HasLZCNT(false), HasRDRND(false),
       HasBMI(false), HasBMI2(false), HasPOPCNT(false), HasRTM(false),
-      HasSSE4a(false), HasFMA4(false), HasFMA(false), HasXOP(false),
-      HasF16C(false), CPU(CK_Generic) {
+      HasHLE(false), HasSSE4a(false), HasFMA4(false), HasFMA(false),
+      HasXOP(false), HasF16C(false), CPU(CK_Generic) {
     BigEndian = false;
     LongDoubleFormat = &llvm::APFloat::x87DoubleExtended;
   }
@@ -1966,6 +1967,7 @@ void X86TargetInfo::getDefaultFeatures(llvm::StringMap<bool> &Features) const {
   Features["bmi2"] = false;
   Features["popcnt"] = false;
   Features["rtm"] = false;
+  Features["hle"] = false;
   Features["fma4"] = false;
   Features["fma"] = false;
   Features["xop"] = false;
@@ -2039,6 +2041,7 @@ void X86TargetInfo::getDefaultFeatures(llvm::StringMap<bool> &Features) const {
     setFeatureEnabled(Features, "bmi", true);
     setFeatureEnabled(Features, "bmi2", true);
     setFeatureEnabled(Features, "rtm", true);
+    setFeatureEnabled(Features, "hle", true);
     setFeatureEnabled(Features, "fma", true);
     break;
   case CK_K6:
@@ -2188,6 +2191,8 @@ bool X86TargetInfo::setFeatureEnabled(llvm::StringMap<bool> &Features,
       Features["f16c"] = true;
     else if (Name == "rtm")
       Features["rtm"] = true;
+    else if (Name == "hle")
+      Features["hle"] = true;
   } else {
     if (Name == "mmx")
       Features["mmx"] = Features["3dnow"] = Features["3dnowa"] = false;
@@ -2252,6 +2257,8 @@ bool X86TargetInfo::setFeatureEnabled(llvm::StringMap<bool> &Features,
       Features["f16c"] = false;
     else if (Name == "rtm")
       Features["rtm"] = false;
+    else if (Name == "hle")
+      Features["hle"] = false;
   }
 
   return true;
@@ -2308,6 +2315,11 @@ void X86TargetInfo::HandleTargetFeatures(std::vector<std::string> &Features) {
       continue;
     }
 
+    if (Feature == "hle") {
+      HasHLE = true;
+      continue;
+    }
+
     if (Feature == "sse4a") {
       HasSSE4a = true;
       continue;
@@ -2532,6 +2544,12 @@ void X86TargetInfo::getTargetDefines(const LangOptions &Opts,
   if (HasRTM)
     Builder.defineMacro("__RTM__");
 
+  if (HasHLE) {
+    Builder.defineMacro("__HLE__");
+    Builder.defineMacro("__ATOMIC_HLE_ACQUIRE", Twine(1U << 16));
+    Builder.defineMacro("__ATOMIC_HLE_RELEASE", Twine(2U << 16));
+  }
+
   if (HasSSE4a)
     Builder.defineMacro("__SSE4A__");
 
@@ -2620,6 +2638,7 @@ bool X86TargetInfo::hasFeature(StringRef Feature) const {
       .Case("pclmul", HasPCLMUL)
       .Case("popcnt", HasPOPCNT)
       .Case("rtm", HasRTM)
+      .Case("hle", HasHLE)
       .Case("sse", SSELevel >= SSE1)
       .Case("sse2", SSELevel >= SSE2)
       .Case("sse3", SSELevel >= SSE3)
diff --git a/test/Preprocessor/predefined-arch-macros.c b/test/Preprocessor/predefined-arch-macros.c
index 680f39a..4303735 100644
--- a/test/Preprocessor/predefined-arch-macros.c
+++ b/test/Preprocessor/predefined-arch-macros.c
@@ -509,11 +509,14 @@
 // RUN:     -target i386-unknown-linux \
 // RUN:   | FileCheck %s -check-prefix=CHECK_CORE_AVX2_M32
 // CHECK_CORE_AVX2_M32: #define __AES__ 1
+// CHECK_CORE_AVX2_M32: #define __ATOMIC_HLE_ACQUIRE 65536
+// CHECK_CORE_AVX2_M32: #define __ATOMIC_HLE_RELEASE 131072
 // CHECK_CORE_AVX2_M32: #define __AVX__ 1
 // CHECK_CORE_AVX2_M32: #define __BMI2__ 1
 // CHECK_CORE_AVX2_M32: #define __BMI__ 1
 // CHECK_CORE_AVX2_M32: #define __F16C__ 1
 // CHECK_CORE_AVX2_M32: #define __FMA__ 1
+// CHECK_CORE_AVX2_M32: #define __HLE__ 1
 // CHECK_CORE_AVX2_M32: #define __LZCNT__ 1
 // CHECK_CORE_AVX2_M32: #define __MMX__ 1
 // CHECK_CORE_AVX2_M32: #define __PCLMUL__ 1
@@ -536,11 +539,14 @@
 // RUN:     -target i386-unknown-linux \
 // RUN:   | FileCheck %s -check-prefix=CHECK_CORE_AVX2_M64
 // CHECK_CORE_AVX2_M64: #define __AES__ 1
+// CHECK_CORE_AVX2_M64: #define __ATOMIC_HLE_ACQUIRE 65536
+// CHECK_CORE_AVX2_M64: #define __ATOMIC_HLE_RELEASE 131072
 // CHECK_CORE_AVX2_M64: #define __AVX__ 1
 // CHECK_CORE_AVX2_M64: #define __BMI2__ 1
 // CHECK_CORE_AVX2_M64: #define __BMI__ 1
 // CHECK_CORE_AVX2_M64: #define __F16C__ 1
 // CHECK_CORE_AVX2_M64: #define __FMA__ 1
+// CHECK_CORE_AVX2_M64: #define __HLE__ 1
 // CHECK_CORE_AVX2_M64: #define __LZCNT__ 1
 // CHECK_CORE_AVX2_M64: #define __MMX__ 1
 // CHECK_CORE_AVX2_M64: #define __PCLMUL__ 1
-- 
1.7.9.5

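To illustrate the intended end-to-end use of these macros, here is a sketch
of an HLE-elided spinlock in the style of GCC's HLE example; it assumes
compilation with -mhle on an HLE-capable target, and the names are
illustrative:

  /* The HLE hint rides in the high 16 bits of the memory-order argument;
     the X86 backend then emits the exchange with an XACQUIRE prefix and
     the store with an XRELEASE prefix. */
  static int lock;

  static void hle_lock(void) {
    while (__atomic_exchange_n(&lock, 1,
                               __ATOMIC_ACQUIRE | __ATOMIC_HLE_ACQUIRE))
      ; /* spin until the lock is observed free */
  }

  static void hle_unlock(void) {
    __atomic_store_n(&lock, 0, __ATOMIC_RELEASE | __ATOMIC_HLE_RELEASE);
  }
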


_______________________________________________
cfe-dev mailing list
cfe-dev@cs.uiuc.edu
http://lists.cs.uiuc.edu/mailman/listinfo/cfe-dev

