List: cfe-dev
Subject: [cfe-dev] [RFC] Add Intel TSX HLE Support
From: Michael Liao <michael.liao () intel ! com>
Date: 2013-02-19 19:52:00
Message-ID: 1361303520.3225.41.camel () snbox
Hi All,
I'd like to add HLE support to LLVM/clang, consistent with GCC's approach [1]. HLE,
part of Intel TSX [2], is a legacy-compatible instruction set extension that marks a
transactional region by adding XACQUIRE and XRELEASE prefixes. To support it, GCC
extends the memory order argument of the __atomic_* builtins with a target-specific
memory model in the high bits (bits 31-16 carry the target-specific memory model,
bits 15-0 the general memory order). Following the same approach, I propose to
change LLVM/clang by adding:
+ a 'targetflags' metadata attachment on LLVM atomic IR to pass this
target-specific memory model hint
+ an extra target flag in AtomicSDNode & MemIntrinsicSDNode to specify the
XACQUIRE or XRELEASE hint. This flag is embedded in the SubclassData field; the
rationale for how such target flags fit into SubclassData in SDNode follows.
Here is the current SDNode class hierarchy of memory-related nodes:

  SDNode -> MemSDNode -> LSBaseSDNode -> LoadSDNode
                      |              +-> StoreSDNode
                      +-> AtomicSDNode
                      +-> MemIntrinsicSDNode
Here are the current SubclassData bit definitions:
bit 0~1 : extension type used in LoadSDNode
bit 0 : truncating store in StoreSDNode
bit 2~4 : addressing mode in LSBaseSDNode
bit 5 : volatile bit in MemSDNode
bit 6 : non-temporal bit in MemSDNode
bit 7 : invariant bit in MemSDNode
bit 8~11: memory order in AtomicSDNode
bit 12 : synch scope in AtomicSDNode
Considering the class hierarchy, we can safely reuse bits 0~1 as the target flags
in AtomicSDNode/MemIntrinsicSDNode.
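As a sketch, the packing described above looks like the following (names and helper
functions here are illustrative; the real encoding lives in
AtomicSDNode::InitAtomic in SelectionDAGNodes.h):

```c
#include <assert.h>

/* Illustrative sketch of the AtomicSDNode SubclassData layout described
 * above, with the proposed target flags reusing the otherwise-unused
 * bits 0~1. Not the actual LLVM code. */
unsigned encodeAtomicSubclassData(unsigned Ordering, unsigned TargetFlags,
                                  unsigned SynchScope) {
  assert((Ordering & 15) == Ordering && "Ordering needs at most 4 bits");
  assert((TargetFlags & 3) == TargetFlags && "TargetFlags needs at most 2 bits");
  assert((SynchScope & 1) == SynchScope && "SynchScope needs at most 1 bit");
  return TargetFlags          /* bits 0~1: target flags (new)  */
       | (Ordering << 8)      /* bits 8~11: memory order       */
       | (SynchScope << 12);  /* bit 12: synch scope           */
}

unsigned decodeTargetFlags(unsigned SubclassData) { return SubclassData & 3; }
unsigned decodeOrdering(unsigned SubclassData) { return (SubclassData >> 8) & 15; }
```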
+ the X86 backend is modified to emit the additional XACQUIRE/XRELEASE prefix
based on the specified target flag
The following are details of each patch:
* 0001-Add-targetflags-in-AtomicSDNode-MemIntrinsicSDNode.patch
This patch adds 'targetflags' support in AtomicSDNode and MemIntrinsicSDNode. It
checks the 'targetflags' metadata and embeds its value into SubclassData.
Currently, only two bits are defined.
* 0002-Add-HLE-target-feature.patch
This patch adds the HLE feature and its auto-detection support.
* 0003-Add-XACQ-XREL-prefix-and-encoding-asm-printer-suppor.patch
This patch adds the XACQUIRE/XRELEASE prefixes and their assembler/encoding support.
* 0004-Enable-HLE-code-generation.patch
This patch enables HLE code generation by extending the current logic to handle
'targetflags'.
* 0001-Add-target-flags-support-for-atomic-ops.patch
This patch adds target flags support to the __atomic_* builtins. It splits the
32-bit order word into high and low 16-bit halves: the low 16 bits keep the
original memory order, while the high 16 bits are redefined as target-specific
flags and passed through the 'targetflags' metadata.
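A minimal sketch of that split (the mask name is illustrative, not the identifier
used in the patch):

```c
#include <assert.h>

/* Sketch of the proposed 32-bit order word: low 16 bits carry the C11
 * memory order, high 16 bits carry target-specific flags. */
#define MEMORY_ORDER_MASK 0xFFFFu

unsigned memoryOrderOf(unsigned OrderWord) { return OrderWord & MEMORY_ORDER_MASK; }
unsigned targetFlagsOf(unsigned OrderWord) { return OrderWord >> 16; }
```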
* 0002-Add-mhle-option-support-and-populate-pre-defined-mac.patch
It adds the '-mhle'/'-mno-hle' options to turn the HLE feature on or off. Once HLE
is enabled, two more macros (__ATOMIC_HLE_ACQUIRE and __ATOMIC_HLE_RELEASE) are
defined for developers to mark atomic builtins.
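For illustration, this is the kind of elided-lock code a developer could write with
those macros (mirroring GCC's documented HLE idiom; the fallback defines are only
so this sketch compiles without -mhle, which is harmless since HLE prefixes are
ignored as hints on non-TSX hardware):

```c
#include <assert.h>

/* Fall back to plain atomics when the compiler doesn't define the HLE
 * macros (e.g. built without -mhle); the hints are purely additive. */
#ifndef __ATOMIC_HLE_ACQUIRE
#define __ATOMIC_HLE_ACQUIRE 0
#define __ATOMIC_HLE_RELEASE 0
#endif

static int lock;

void hle_lock(void) {
  /* XACQUIRE-prefixed exchange starts the elided critical section. */
  while (__atomic_exchange_n(&lock, 1, __ATOMIC_ACQUIRE | __ATOMIC_HLE_ACQUIRE))
    ; /* spin */
}

void hle_unlock(void) {
  /* XRELEASE-prefixed store ends it. */
  __atomic_store_n(&lock, 0, __ATOMIC_RELEASE | __ATOMIC_HLE_RELEASE);
}
```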
Thanks for taking the time to review!
Yours
- Michael
---
[1] http://gcc.gnu.org/ml/gcc-patches/2012-04/msg01073.html
[2] http://software.intel.com/sites/default/files/319433-014.pdf
["0001-Add-targetflags-in-AtomicSDNode-MemIntrinsicSDNode.patch" (0001-Add-targetflags-in-AtomicSDNode-MemIntrinsicSDNode.patch)]
From c2ed27488d773a6684e42adac9c61bff7f2badf8 Mon Sep 17 00:00:00 2001
From: Michael Liao <michael.hliao@gmail.com>
Date: Tue, 3 Jul 2012 23:28:17 -0700
Subject: [PATCH 1/4] Add targetflags in AtomicSDNode & MemIntrinsicSDNode
- to pass HLE acquire/release hint to backend
---
include/llvm/CodeGen/SelectionDAG.h | 22 ++++++-----
include/llvm/CodeGen/SelectionDAGNodes.h | 34 ++++++++++++----
lib/CodeGen/SelectionDAG/LegalizeDAG.cpp | 2 +
lib/CodeGen/SelectionDAG/LegalizeIntegerTypes.cpp | 14 +++++--
lib/CodeGen/SelectionDAG/SelectionDAG.cpp | 44 +++++++++++----------
lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp | 23 +++++++++++
lib/Target/AArch64/AArch64ISelLowering.cpp | 2 +
lib/Target/X86/X86ISelLowering.cpp | 22 +++++++----
8 files changed, 114 insertions(+), 49 deletions(-)
diff --git a/include/llvm/CodeGen/SelectionDAG.h b/include/llvm/CodeGen/SelectionDAG.h
index c25497a..2ccda96 100644
--- a/include/llvm/CodeGen/SelectionDAG.h
+++ b/include/llvm/CodeGen/SelectionDAG.h
@@ -636,23 +636,24 @@ public:
SDValue getAtomic(unsigned Opcode, DebugLoc dl, EVT MemVT, SDValue Chain,
SDValue Ptr, SDValue Cmp, SDValue Swp,
MachinePointerInfo PtrInfo, unsigned Alignment,
- AtomicOrdering Ordering,
+ AtomicOrdering Ordering, unsigned TargetFlags,
SynchronizationScope SynchScope);
SDValue getAtomic(unsigned Opcode, DebugLoc dl, EVT MemVT, SDValue Chain,
SDValue Ptr, SDValue Cmp, SDValue Swp,
MachineMemOperand *MMO,
- AtomicOrdering Ordering,
+ AtomicOrdering Ordering, unsigned TargetFlags,
SynchronizationScope SynchScope);
/// getAtomic - Gets a node for an atomic op, produces result (if relevant)
/// and chain and takes 2 operands.
SDValue getAtomic(unsigned Opcode, DebugLoc dl, EVT MemVT, SDValue Chain,
SDValue Ptr, SDValue Val, const Value* PtrVal,
- unsigned Alignment, AtomicOrdering Ordering,
+ unsigned Alignment,
+ AtomicOrdering Ordering, unsigned TargetFlags,
SynchronizationScope SynchScope);
SDValue getAtomic(unsigned Opcode, DebugLoc dl, EVT MemVT, SDValue Chain,
SDValue Ptr, SDValue Val, MachineMemOperand *MMO,
- AtomicOrdering Ordering,
+ AtomicOrdering Ordering, unsigned TargetFlags,
SynchronizationScope SynchScope);
/// getAtomic - Gets a node for an atomic op, produces result and chain and
@@ -660,11 +661,11 @@ public:
SDValue getAtomic(unsigned Opcode, DebugLoc dl, EVT MemVT, EVT VT,
SDValue Chain, SDValue Ptr, const Value* PtrVal,
unsigned Alignment,
- AtomicOrdering Ordering,
+ AtomicOrdering Ordering, unsigned TargetFlags,
SynchronizationScope SynchScope);
SDValue getAtomic(unsigned Opcode, DebugLoc dl, EVT MemVT, EVT VT,
SDValue Chain, SDValue Ptr, MachineMemOperand *MMO,
- AtomicOrdering Ordering,
+ AtomicOrdering Ordering, unsigned TargetFlags,
SynchronizationScope SynchScope);
/// getMemIntrinsicNode - Creates a MemIntrinsicNode that may produce a
@@ -676,17 +677,20 @@ public:
const SDValue *Ops, unsigned NumOps,
EVT MemVT, MachinePointerInfo PtrInfo,
unsigned Align = 0, bool Vol = false,
- bool ReadMem = true, bool WriteMem = true);
+ bool ReadMem = true, bool WriteMem = true,
+ unsigned TargetFlags = 0);
SDValue getMemIntrinsicNode(unsigned Opcode, DebugLoc dl, SDVTList VTList,
const SDValue *Ops, unsigned NumOps,
EVT MemVT, MachinePointerInfo PtrInfo,
unsigned Align = 0, bool Vol = false,
- bool ReadMem = true, bool WriteMem = true);
+ bool ReadMem = true, bool WriteMem = true,
+ unsigned TargetFlags = 0);
SDValue getMemIntrinsicNode(unsigned Opcode, DebugLoc dl, SDVTList VTList,
const SDValue *Ops, unsigned NumOps,
- EVT MemVT, MachineMemOperand *MMO);
+ EVT MemVT, MachineMemOperand *MMO,
+ unsigned TargetFlags = 0);
/// getMergeValues - Create a MERGE_VALUES node from the given operands.
SDValue getMergeValues(const SDValue *Ops, unsigned NumOps, DebugLoc dl);
diff --git a/include/llvm/CodeGen/SelectionDAGNodes.h b/include/llvm/CodeGen/SelectionDAGNodes.h
index 2c34b4f..8e88834 100644
--- a/include/llvm/CodeGen/SelectionDAGNodes.h
+++ b/include/llvm/CodeGen/SelectionDAGNodes.h
@@ -1013,15 +1013,20 @@ public:
class AtomicSDNode : public MemSDNode {
SDUse Ops[4];
- void InitAtomic(AtomicOrdering Ordering, SynchronizationScope SynchScope) {
+ void InitAtomic(AtomicOrdering Ordering, unsigned TargetFlags,
+ SynchronizationScope SynchScope) {
// This must match encodeMemSDNodeFlags() in SelectionDAG.cpp.
assert((Ordering & 15) == Ordering &&
"Ordering may not require more than 4 bits!");
+ assert((TargetFlags & 3) == TargetFlags &&
+ "TargetFlags may not require more than 2 bits!");
assert((SynchScope & 1) == SynchScope &&
"SynchScope may not require more than 1 bit!");
SubclassData |= Ordering << 8;
+ SubclassData |= TargetFlags;
SubclassData |= SynchScope << 12;
assert(getOrdering() == Ordering && "Ordering encoding error!");
+ assert(getTargetFlags() == TargetFlags && "TargetFlags encoding error!");
assert(getSynchScope() == SynchScope && "Synch-scope encoding error!");
}
@@ -1037,28 +1042,34 @@ public:
AtomicSDNode(unsigned Opc, DebugLoc dl, SDVTList VTL, EVT MemVT,
SDValue Chain, SDValue Ptr,
SDValue Cmp, SDValue Swp, MachineMemOperand *MMO,
- AtomicOrdering Ordering, SynchronizationScope SynchScope)
+ AtomicOrdering Ordering, unsigned TargetFlags,
+ SynchronizationScope SynchScope)
: MemSDNode(Opc, dl, VTL, MemVT, MMO) {
- InitAtomic(Ordering, SynchScope);
+ InitAtomic(Ordering, TargetFlags, SynchScope);
InitOperands(Ops, Chain, Ptr, Cmp, Swp);
}
AtomicSDNode(unsigned Opc, DebugLoc dl, SDVTList VTL, EVT MemVT,
SDValue Chain, SDValue Ptr,
SDValue Val, MachineMemOperand *MMO,
- AtomicOrdering Ordering, SynchronizationScope SynchScope)
+ AtomicOrdering Ordering, unsigned TargetFlags,
+ SynchronizationScope SynchScope)
: MemSDNode(Opc, dl, VTL, MemVT, MMO) {
- InitAtomic(Ordering, SynchScope);
+ InitAtomic(Ordering, TargetFlags, SynchScope);
InitOperands(Ops, Chain, Ptr, Val);
}
AtomicSDNode(unsigned Opc, DebugLoc dl, SDVTList VTL, EVT MemVT,
SDValue Chain, SDValue Ptr,
MachineMemOperand *MMO,
- AtomicOrdering Ordering, SynchronizationScope SynchScope)
+ AtomicOrdering Ordering, unsigned TargetFlags,
+ SynchronizationScope SynchScope)
: MemSDNode(Opc, dl, VTL, MemVT, MMO) {
- InitAtomic(Ordering, SynchScope);
+ InitAtomic(Ordering, TargetFlags, SynchScope);
InitOperands(Ops, Chain, Ptr);
}
+ /// getTargetFlags - Return target-specific flags.
+ unsigned getTargetFlags() const { return SubclassData & 3; }
+
const SDValue &getBasePtr() const { return getOperand(1); }
const SDValue &getVal() const { return getOperand(2); }
@@ -1094,10 +1105,17 @@ class MemIntrinsicSDNode : public MemSDNode {
public:
MemIntrinsicSDNode(unsigned Opc, DebugLoc dl, SDVTList VTs,
const SDValue *Ops, unsigned NumOps,
- EVT MemoryVT, MachineMemOperand *MMO)
+ EVT MemoryVT, MachineMemOperand *MMO,
+ unsigned TargetFlags = 0)
: MemSDNode(Opc, dl, VTs, Ops, NumOps, MemoryVT, MMO) {
+ assert((TargetFlags & 3) == TargetFlags &&
+ "TargetFlags may not require more than 2 bits!");
+ SubclassData |= TargetFlags;
}
+ /// getTargetFlags - Return target-specific flags.
+ unsigned getTargetFlags() const { return SubclassData & 3; }
+
// Methods to support isa and dyn_cast
static bool classof(const SDNode *N) {
// We lower some target intrinsics to their target opcode
diff --git a/lib/CodeGen/SelectionDAG/LegalizeDAG.cpp b/lib/CodeGen/SelectionDAG/LegalizeDAG.cpp
index a9d40d0..18c1d16 100644
--- a/lib/CodeGen/SelectionDAG/LegalizeDAG.cpp
+++ b/lib/CodeGen/SelectionDAG/LegalizeDAG.cpp
@@ -2799,6 +2799,7 @@ void SelectionDAGLegalize::ExpandNode(SDNode *Node) {
Node->getOperand(1), Zero, Zero,
cast<AtomicSDNode>(Node)->getMemOperand(),
cast<AtomicSDNode>(Node)->getOrdering(),
+ cast<AtomicSDNode>(Node)->getTargetFlags(),
cast<AtomicSDNode>(Node)->getSynchScope());
Results.push_back(Swap.getValue(0));
Results.push_back(Swap.getValue(1));
@@ -2812,6 +2813,7 @@ void SelectionDAGLegalize::ExpandNode(SDNode *Node) {
Node->getOperand(1), Node->getOperand(2),
cast<AtomicSDNode>(Node)->getMemOperand(),
cast<AtomicSDNode>(Node)->getOrdering(),
+ cast<AtomicSDNode>(Node)->getTargetFlags(),
cast<AtomicSDNode>(Node)->getSynchScope());
Results.push_back(Swap.getValue(1));
break;
diff --git a/lib/CodeGen/SelectionDAG/LegalizeIntegerTypes.cpp b/lib/CodeGen/SelectionDAG/LegalizeIntegerTypes.cpp
index 182b7f3..a648940 100644
--- a/lib/CodeGen/SelectionDAG/LegalizeIntegerTypes.cpp
+++ b/lib/CodeGen/SelectionDAG/LegalizeIntegerTypes.cpp
@@ -169,7 +169,8 @@ SDValue DAGTypeLegalizer::PromoteIntRes_Atomic0(AtomicSDNode *N) {
   SDValue Res = DAG.getAtomic(N->getOpcode(), N->getDebugLoc(),
N->getMemoryVT(), ResVT,
N->getChain(), N->getBasePtr(),
- N->getMemOperand(), N->getOrdering(),
+ N->getMemOperand(),
+ N->getOrdering(), N->getTargetFlags(),
N->getSynchScope());
// Legalized the chain result - switch anything that used the old chain to
// use the new one.
@@ -182,7 +183,8 @@ SDValue DAGTypeLegalizer::PromoteIntRes_Atomic1(AtomicSDNode *N) {
   SDValue Res = DAG.getAtomic(N->getOpcode(), N->getDebugLoc(),
N->getMemoryVT(),
N->getChain(), N->getBasePtr(),
- Op2, N->getMemOperand(), N->getOrdering(),
+ Op2, N->getMemOperand(),
+ N->getOrdering(), N->getTargetFlags(),
N->getSynchScope());
// Legalized the chain result - switch anything that used the old chain to
// use the new one.
@@ -195,7 +197,8 @@ SDValue DAGTypeLegalizer::PromoteIntRes_Atomic2(AtomicSDNode *N) {
   SDValue Op3 = GetPromotedInteger(N->getOperand(3));
SDValue Res = DAG.getAtomic(N->getOpcode(), N->getDebugLoc(),
N->getMemoryVT(), N->getChain(), N->getBasePtr(),
- Op2, Op3, N->getMemOperand(), N->getOrdering(),
+ Op2, Op3, N->getMemOperand(),
+ N->getOrdering(), N->getTargetFlags(),
N->getSynchScope());
// Legalized the chain result - switch anything that used the old chain to
// use the new one.
@@ -853,7 +856,8 @@ SDValue DAGTypeLegalizer::PromoteIntOp_ATOMIC_STORE(AtomicSDNode *N) {
   SDValue Op2 = GetPromotedInteger(N->getOperand(2));
return DAG.getAtomic(N->getOpcode(), N->getDebugLoc(), N->getMemoryVT(),
N->getChain(), N->getBasePtr(), Op2, N->getMemOperand(),
- N->getOrdering(), N->getSynchScope());
+ N->getOrdering(), N->getTargetFlags(),
+ N->getSynchScope());
}
SDValue DAGTypeLegalizer::PromoteIntOp_BITCAST(SDNode *N) {
@@ -2435,6 +2439,7 @@ void DAGTypeLegalizer::ExpandIntRes_ATOMIC_LOAD(SDNode *N,
N->getOperand(1), Zero, Zero,
cast<AtomicSDNode>(N)->getMemOperand(),
cast<AtomicSDNode>(N)->getOrdering(),
+ cast<AtomicSDNode>(N)->getTargetFlags(),
cast<AtomicSDNode>(N)->getSynchScope());
ReplaceValueWith(SDValue(N, 0), Swap.getValue(0));
ReplaceValueWith(SDValue(N, 1), Swap.getValue(1));
@@ -2859,6 +2864,7 @@ SDValue DAGTypeLegalizer::ExpandIntOp_ATOMIC_STORE(SDNode *N) {
N->getOperand(1), N->getOperand(2),
cast<AtomicSDNode>(N)->getMemOperand(),
cast<AtomicSDNode>(N)->getOrdering(),
+ cast<AtomicSDNode>(N)->getTargetFlags(),
cast<AtomicSDNode>(N)->getSynchScope());
return Swap.getValue(1);
}
diff --git a/lib/CodeGen/SelectionDAG/SelectionDAG.cpp b/lib/CodeGen/SelectionDAG/SelectionDAG.cpp
index 09885d8..68f417b 100644
--- a/lib/CodeGen/SelectionDAG/SelectionDAG.cpp
+++ b/lib/CodeGen/SelectionDAG/SelectionDAG.cpp
@@ -4061,7 +4061,7 @@ SDValue SelectionDAG::getAtomic(unsigned Opcode, DebugLoc dl, EVT MemVT,
                                 SDValue Chain, SDValue Ptr, SDValue Cmp,
SDValue Swp, MachinePointerInfo PtrInfo,
unsigned Alignment,
- AtomicOrdering Ordering,
+ AtomicOrdering Ordering, unsigned TargetFlags,
SynchronizationScope SynchScope) {
if (Alignment == 0) // Ensure that codegen never sees alignment 0
Alignment = getEVTAlignment(MemVT);
@@ -4082,14 +4082,14 @@ SDValue SelectionDAG::getAtomic(unsigned Opcode, DebugLoc dl, EVT MemVT,
MF.getMachineMemOperand(PtrInfo, Flags, MemVT.getStoreSize(), Alignment);
return getAtomic(Opcode, dl, MemVT, Chain, Ptr, Cmp, Swp, MMO,
- Ordering, SynchScope);
+ Ordering, TargetFlags, SynchScope);
}
SDValue SelectionDAG::getAtomic(unsigned Opcode, DebugLoc dl, EVT MemVT,
SDValue Chain,
SDValue Ptr, SDValue Cmp,
SDValue Swp, MachineMemOperand *MMO,
- AtomicOrdering Ordering,
+ AtomicOrdering Ordering, unsigned TargetFlags,
SynchronizationScope SynchScope) {
assert(Opcode == ISD::ATOMIC_CMP_SWAP && "Invalid Atomic Op");
assert(Cmp.getValueType() == Swp.getValueType() && "Invalid Atomic Op Types");
@@ -4109,7 +4109,7 @@ SDValue SelectionDAG::getAtomic(unsigned Opcode, DebugLoc dl, EVT MemVT,
 }
SDNode *N = new (NodeAllocator) AtomicSDNode(Opcode, dl, VTs, MemVT, Chain,
Ptr, Cmp, Swp, MMO, Ordering,
- SynchScope);
+ TargetFlags, SynchScope);
CSEMap.InsertNode(N, IP);
AllNodes.push_back(N);
return SDValue(N, 0);
@@ -4120,7 +4120,7 @@ SDValue SelectionDAG::getAtomic(unsigned Opcode, DebugLoc dl, EVT MemVT,
                                 SDValue Ptr, SDValue Val,
const Value* PtrVal,
unsigned Alignment,
- AtomicOrdering Ordering,
+ AtomicOrdering Ordering, unsigned TargetFlags,
SynchronizationScope SynchScope) {
if (Alignment == 0) // Ensure that codegen never sees alignment 0
Alignment = getEVTAlignment(MemVT);
@@ -4143,14 +4143,14 @@ SDValue SelectionDAG::getAtomic(unsigned Opcode, DebugLoc dl, EVT MemVT,
                          MemVT.getStoreSize(), Alignment);
return getAtomic(Opcode, dl, MemVT, Chain, Ptr, Val, MMO,
- Ordering, SynchScope);
+ Ordering, TargetFlags, SynchScope);
}
SDValue SelectionDAG::getAtomic(unsigned Opcode, DebugLoc dl, EVT MemVT,
SDValue Chain,
SDValue Ptr, SDValue Val,
MachineMemOperand *MMO,
- AtomicOrdering Ordering,
+ AtomicOrdering Ordering, unsigned TargetFlags,
SynchronizationScope SynchScope) {
assert((Opcode == ISD::ATOMIC_LOAD_ADD ||
Opcode == ISD::ATOMIC_LOAD_SUB ||
@@ -4181,8 +4181,8 @@ SDValue SelectionDAG::getAtomic(unsigned Opcode, DebugLoc dl, EVT MemVT,
     return SDValue(E, 0);
}
SDNode *N = new (NodeAllocator) AtomicSDNode(Opcode, dl, VTs, MemVT, Chain,
- Ptr, Val, MMO,
- Ordering, SynchScope);
+ Ptr, Val, MMO, Ordering,
+ TargetFlags, SynchScope);
CSEMap.InsertNode(N, IP);
AllNodes.push_back(N);
return SDValue(N, 0);
@@ -4193,7 +4193,7 @@ SDValue SelectionDAG::getAtomic(unsigned Opcode, DebugLoc dl, EVT MemVT,
                                 SDValue Ptr,
const Value* PtrVal,
unsigned Alignment,
- AtomicOrdering Ordering,
+ AtomicOrdering Ordering, unsigned TargetFlags,
SynchronizationScope SynchScope) {
if (Alignment == 0) // Ensure that codegen never sees alignment 0
Alignment = getEVTAlignment(MemVT);
@@ -4216,14 +4216,14 @@ SDValue SelectionDAG::getAtomic(unsigned Opcode, DebugLoc dl, EVT MemVT,
                          MemVT.getStoreSize(), Alignment);
return getAtomic(Opcode, dl, MemVT, VT, Chain, Ptr, MMO,
- Ordering, SynchScope);
+ Ordering, TargetFlags, SynchScope);
}
SDValue SelectionDAG::getAtomic(unsigned Opcode, DebugLoc dl, EVT MemVT,
EVT VT, SDValue Chain,
SDValue Ptr,
MachineMemOperand *MMO,
- AtomicOrdering Ordering,
+ AtomicOrdering Ordering, unsigned TargetFlags,
SynchronizationScope SynchScope) {
assert(Opcode == ISD::ATOMIC_LOAD && "Invalid Atomic Op");
@@ -4239,7 +4239,8 @@ SDValue SelectionDAG::getAtomic(unsigned Opcode, DebugLoc dl, EVT MemVT,
     return SDValue(E, 0);
}
SDNode *N = new (NodeAllocator) AtomicSDNode(Opcode, dl, VTs, MemVT, Chain,
- Ptr, MMO, Ordering, SynchScope);
+ Ptr, MMO, Ordering, TargetFlags,
+ SynchScope);
CSEMap.InsertNode(N, IP);
AllNodes.push_back(N);
return SDValue(N, 0);
@@ -4265,10 +4266,11 @@ SelectionDAG::getMemIntrinsicNode(unsigned Opcode, DebugLoc dl,
                                   const SDValue *Ops, unsigned NumOps,
EVT MemVT, MachinePointerInfo PtrInfo,
unsigned Align, bool Vol,
- bool ReadMem, bool WriteMem) {
+ bool ReadMem, bool WriteMem,
+ unsigned TargetFlags) {
return getMemIntrinsicNode(Opcode, dl, makeVTList(VTs, NumVTs), Ops, NumOps,
MemVT, PtrInfo, Align, Vol,
- ReadMem, WriteMem);
+ ReadMem, WriteMem, TargetFlags);
}
SDValue
@@ -4276,7 +4278,8 @@ SelectionDAG::getMemIntrinsicNode(unsigned Opcode, DebugLoc dl, SDVTList VTList,
                                   const SDValue *Ops, unsigned NumOps,
EVT MemVT, MachinePointerInfo PtrInfo,
unsigned Align, bool Vol,
- bool ReadMem, bool WriteMem) {
+ bool ReadMem, bool WriteMem,
+ unsigned TargetFlags) {
if (Align == 0) // Ensure that codegen never sees alignment 0
Align = getEVTAlignment(MemVT);
@@ -4291,13 +4294,14 @@ SelectionDAG::getMemIntrinsicNode(unsigned Opcode, DebugLoc dl, SDVTList VTList,
   MachineMemOperand *MMO =
MF.getMachineMemOperand(PtrInfo, Flags, MemVT.getStoreSize(), Align);
- return getMemIntrinsicNode(Opcode, dl, VTList, Ops, NumOps, MemVT, MMO);
- return getMemIntrinsicNode(Opcode, dl, VTList, Ops, NumOps, MemVT, MMO);
+ return getMemIntrinsicNode(Opcode, dl, VTList, Ops, NumOps, MemVT, MMO, TargetFlags);
 }
SDValue
SelectionDAG::getMemIntrinsicNode(unsigned Opcode, DebugLoc dl, SDVTList VTList,
const SDValue *Ops, unsigned NumOps,
- EVT MemVT, MachineMemOperand *MMO) {
+ EVT MemVT, MachineMemOperand *MMO,
+ unsigned TargetFlags) {
assert((Opcode == ISD::INTRINSIC_VOID ||
Opcode == ISD::INTRINSIC_W_CHAIN ||
Opcode == ISD::PREFETCH ||
@@ -4320,11 +4324,11 @@ SelectionDAG::getMemIntrinsicNode(unsigned Opcode, DebugLoc dl, SDVTList VTList,
   }
N = new (NodeAllocator) MemIntrinsicSDNode(Opcode, dl, VTList, Ops, NumOps,
- MemVT, MMO);
+ MemVT, MMO, TargetFlags);
CSEMap.InsertNode(N, IP);
} else {
N = new (NodeAllocator) MemIntrinsicSDNode(Opcode, dl, VTList, Ops, NumOps,
- MemVT, MMO);
+ MemVT, MMO, TargetFlags);
}
AllNodes.push_back(N);
return SDValue(N, 0);
diff --git a/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp b/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
index 3a55696..0aa4be5 100644
--- a/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
+++ b/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
@@ -3423,9 +3423,25 @@ static SDValue InsertFenceForAtomic(SDValue Chain, AtomicOrdering Order,
   return DAG.getNode(ISD::ATOMIC_FENCE, dl, MVT::Other, Ops, 3);
}
+static unsigned GetAtomicTargetFlags(const Instruction &I) {
+ const MDNode* TargetFlagsInfo = I.getMetadata("targetflags");
+
+ if (!TargetFlagsInfo)
+ return 0;
+
+ assert((TargetFlagsInfo->getNumOperands() > 0) &&
+ "'targetflags' requires 1 operand!");
+ const ConstantInt *CI =
+ dyn_cast<ConstantInt>(TargetFlagsInfo->getOperand(0));
+ assert(CI && "'targetflags' not a constant integer!");
+
+ return CI->getZExtValue();
+}
+
void SelectionDAGBuilder::visitAtomicCmpXchg(const AtomicCmpXchgInst &I) {
DebugLoc dl = getCurDebugLoc();
AtomicOrdering Order = I.getOrdering();
+ unsigned TargetFlags = GetAtomicTargetFlags(I);
SynchronizationScope Scope = I.getSynchScope();
SDValue InChain = getRoot();
@@ -3443,6 +3459,7 @@ void SelectionDAGBuilder::visitAtomicCmpXchg(const AtomicCmpXchgInst &I) {
                   getValue(I.getNewValOperand()),
MachinePointerInfo(I.getPointerOperand()), 0 /* Alignment */,
TLI.getInsertFencesForAtomic() ? Monotonic : Order,
+ TargetFlags,
Scope);
SDValue OutChain = L.getValue(1);
@@ -3473,6 +3490,7 @@ void SelectionDAGBuilder::visitAtomicRMW(const AtomicRMWInst &I) {
   case AtomicRMWInst::UMin: NT = ISD::ATOMIC_LOAD_UMIN; break;
}
AtomicOrdering Order = I.getOrdering();
+ unsigned TargetFlags = GetAtomicTargetFlags(I);
SynchronizationScope Scope = I.getSynchScope();
SDValue InChain = getRoot();
@@ -3489,6 +3507,7 @@ void SelectionDAGBuilder::visitAtomicRMW(const AtomicRMWInst &I) {
                   getValue(I.getValOperand()),
I.getPointerOperand(), 0 /* Alignment */,
TLI.getInsertFencesForAtomic() ? Monotonic : Order,
+ TargetFlags,
Scope);
SDValue OutChain = L.getValue(1);
@@ -3513,6 +3532,7 @@ void SelectionDAGBuilder::visitFence(const FenceInst &I) {
void SelectionDAGBuilder::visitAtomicLoad(const LoadInst &I) {
DebugLoc dl = getCurDebugLoc();
AtomicOrdering Order = I.getOrdering();
+ unsigned TargetFlags = GetAtomicTargetFlags(I);
SynchronizationScope Scope = I.getSynchScope();
SDValue InChain = getRoot();
@@ -3527,6 +3547,7 @@ void SelectionDAGBuilder::visitAtomicLoad(const LoadInst &I) {
getValue(I.getPointerOperand()),
I.getPointerOperand(), I.getAlignment(),
TLI.getInsertFencesForAtomic() ? Monotonic : Order,
+ TargetFlags,
Scope);
SDValue OutChain = L.getValue(1);
@@ -3543,6 +3564,7 @@ void SelectionDAGBuilder::visitAtomicStore(const StoreInst &I) {
   DebugLoc dl = getCurDebugLoc();
AtomicOrdering Order = I.getOrdering();
+ unsigned TargetFlags = GetAtomicTargetFlags(I);
SynchronizationScope Scope = I.getSynchScope();
SDValue InChain = getRoot();
@@ -3563,6 +3585,7 @@ void SelectionDAGBuilder::visitAtomicStore(const StoreInst &I) {
                   getValue(I.getValueOperand()),
I.getPointerOperand(), I.getAlignment(),
TLI.getInsertFencesForAtomic() ? Monotonic : Order,
+ TargetFlags,
Scope);
if (TLI.getInsertFencesForAtomic())
diff --git a/lib/Target/AArch64/AArch64ISelLowering.cpp b/lib/Target/AArch64/AArch64ISelLowering.cpp
index cea7f91..66f6eec 100644
--- a/lib/Target/AArch64/AArch64ISelLowering.cpp
+++ b/lib/Target/AArch64/AArch64ISelLowering.cpp
@@ -2414,6 +2414,7 @@ static SDValue PerformATOMIC_FENCECombine(SDNode *FenceNode,
Chain, // Chain
AtomicOp.getOperand(1), // Pointer
AtomicNode->getMemOperand(), Acquire,
+ AtomicNode->getTargetFlags(),
FenceScope);
if (AtomicNode->getOpcode() == ISD::ATOMIC_LOAD)
@@ -2447,6 +2448,7 @@ static SDValue PerformATOMIC_STORECombine(SDNode *N,
AtomicNode->getOperand(1), // Pointer
AtomicNode->getOperand(2), // Value
AtomicNode->getMemOperand(), Release,
+ AtomicNode->getTargetFlags(),
FenceScope);
}
diff --git a/lib/Target/X86/X86ISelLowering.cpp b/lib/Target/X86/X86ISelLowering.cpp
index 9ed03cd..d525e3d 100644
--- a/lib/Target/X86/X86ISelLowering.cpp
+++ b/lib/Target/X86/X86ISelLowering.cpp
@@ -11926,9 +11926,10 @@ static SDValue LowerCMP_SWAP(SDValue Op, const X86Subtarget *Subtarget,
                      DAG.getTargetConstant(size, MVT::i8),
cpIn.getValue(1) };
SDVTList Tys = DAG.getVTList(MVT::Other, MVT::Glue);
- MachineMemOperand *MMO = cast<AtomicSDNode>(Op)->getMemOperand();
+ const AtomicSDNode *AT = cast<AtomicSDNode>(Op);
SDValue Result = DAG.getMemIntrinsicNode(X86ISD::LCMPXCHG_DAG, DL, Tys,
- Ops, 5, T, MMO);
+ Ops, 5, T, AT->getMemOperand(),
+ AT->getTargetFlags());
SDValue cpOut =
DAG.getCopyFromReg(Result.getValue(0), DL, Reg, T, Result.getValue(1));
return cpOut;
@@ -11986,6 +11987,7 @@ static SDValue LowerLOAD_SUB(SDValue Op, SelectionDAG &DAG) {
cast<AtomicSDNode>(Node)->getSrcValue(),
cast<AtomicSDNode>(Node)->getAlignment(),
cast<AtomicSDNode>(Node)->getOrdering(),
+ cast<AtomicSDNode>(Node)->getTargetFlags(),
cast<AtomicSDNode>(Node)->getSynchScope());
}
@@ -12007,6 +12009,7 @@ static SDValue LowerATOMIC_STORE(SDValue Op, SelectionDAG &DAG) {
Node->getOperand(1), Node->getOperand(2),
cast<AtomicSDNode>(Node)->getMemOperand(),
cast<AtomicSDNode>(Node)->getOrdering(),
+ cast<AtomicSDNode>(Node)->getTargetFlags(),
cast<AtomicSDNode>(Node)->getSynchScope());
return Swap.getValue(1);
}
@@ -12179,6 +12182,7 @@ static void ReplaceATOMIC_LOAD(SDNode *Node,
Node->getOperand(1), Zero, Zero,
cast<AtomicSDNode>(Node)->getMemOperand(),
cast<AtomicSDNode>(Node)->getOrdering(),
+ cast<AtomicSDNode>(Node)->getTargetFlags(),
cast<AtomicSDNode>(Node)->getSynchScope());
Results.push_back(Swap.getValue(0));
Results.push_back(Swap.getValue(1));
@@ -12199,9 +12203,10 @@ ReplaceATOMIC_BINARY_64(SDNode *Node, SmallVectorImpl<SDValue>&Results,
Node->getOperand(2), DAG.getIntPtrConstant(1));
SDValue Ops[] = { Chain, In1, In2L, In2H };
SDVTList Tys = DAG.getVTList(MVT::i32, MVT::i32, MVT::Other);
- SDValue Result =
- DAG.getMemIntrinsicNode(NewOp, dl, Tys, Ops, 4, MVT::i64,
- cast<MemSDNode>(Node)->getMemOperand());
+ const AtomicSDNode *AT = cast<AtomicSDNode>(Node);
+ SDValue Result = DAG.getMemIntrinsicNode(NewOp, dl, Tys, Ops, 4, MVT::i64,
+ AT->getMemOperand(),
+ AT->getTargetFlags());
SDValue OpsF[] = { Result.getValue(0), Result.getValue(1)};
Results.push_back(DAG.getNode(ISD::BUILD_PAIR, dl, MVT::i64, OpsF, 2));
Results.push_back(Result.getValue(2));
@@ -12314,11 +12319,12 @@ void X86TargetLowering::ReplaceNodeResults(SDNode *N,
N->getOperand(1),
swapInH.getValue(1) };
SDVTList Tys = DAG.getVTList(MVT::Other, MVT::Glue);
- MachineMemOperand *MMO = cast<AtomicSDNode>(N)->getMemOperand();
+ const AtomicSDNode *AT = cast<AtomicSDNode>(N);
unsigned Opcode = Regs64bit ? X86ISD::LCMPXCHG16_DAG :
X86ISD::LCMPXCHG8_DAG;
- SDValue Result = DAG.getMemIntrinsicNode(Opcode, dl, Tys,
- Ops, 3, T, MMO);
+ SDValue Result = DAG.getMemIntrinsicNode(Opcode, dl, Tys, Ops, 3, T,
+ AT->getMemOperand(),
+ AT->getTargetFlags());
SDValue cpOutL = DAG.getCopyFromReg(Result.getValue(0), dl,
Regs64bit ? X86::RAX : X86::EAX,
HalfT, Result.getValue(1));
--
1.7.9.5
["0002-Add-HLE-target-feature.patch" (0002-Add-HLE-target-feature.patch)]
From 5f18d83c4c633c43becfcb2557f831e3df717815 Mon Sep 17 00:00:00 2001
From: Michael Liao <michael.hliao@gmail.com>
Date: Thu, 5 Jul 2012 23:38:57 -0700
Subject: [PATCH 2/4] Add HLE target feature
---
lib/Target/X86/X86.td | 4 +++-
lib/Target/X86/X86InstrInfo.td | 1 +
lib/Target/X86/X86Subtarget.cpp | 5 +++++
lib/Target/X86/X86Subtarget.h | 4 ++++
4 files changed, 13 insertions(+), 1 deletion(-)
diff --git a/lib/Target/X86/X86.td b/lib/Target/X86/X86.td
index 0216252..810acee 100644
--- a/lib/Target/X86/X86.td
+++ b/lib/Target/X86/X86.td
@@ -120,6 +120,8 @@ def FeatureBMI2 : SubtargetFeature<"bmi2", "HasBMI2", "true",
"Support BMI2 instructions">;
def FeatureRTM : SubtargetFeature<"rtm", "HasRTM", "true",
"Support RTM instructions">;
+def FeatureHLE : SubtargetFeature<"hle", "HasHLE", "true",
+ "Support HLE">;
def FeatureADX : SubtargetFeature<"adx", "HasADX", "true",
"Support ADX instructions">;
def FeatureLeaForSP : SubtargetFeature<"lea-sp", "UseLeaForSP", "true",
@@ -201,7 +203,7 @@ def : Proc<"core-avx2", [FeatureAVX2, FeatureCMPXCHG16B, FeatureFastUAMem,
FeatureRDRAND, FeatureF16C, FeatureFSGSBase,
FeatureMOVBE, FeatureLZCNT, FeatureBMI,
FeatureBMI2, FeatureFMA,
- FeatureRTM]>;
+ FeatureRTM, FeatureHLE]>;
def : Proc<"k6", [FeatureMMX]>;
def : Proc<"k6-2", [Feature3DNow]>;
diff --git a/lib/Target/X86/X86InstrInfo.td b/lib/Target/X86/X86InstrInfo.td
index 84c278c..46daaad 100644
--- a/lib/Target/X86/X86InstrInfo.td
+++ b/lib/Target/X86/X86InstrInfo.td
@@ -603,6 +603,7 @@ def HasLZCNT : Predicate<"Subtarget->hasLZCNT()">;
def HasBMI : Predicate<"Subtarget->hasBMI()">;
def HasBMI2 : Predicate<"Subtarget->hasBMI2()">;
def HasRTM : Predicate<"Subtarget->hasRTM()">;
+def HasHLE : Predicate<"Subtarget->hasHLE()">;
def HasADX : Predicate<"Subtarget->hasADX()">;
def FPStackf32 : Predicate<"!Subtarget->hasSSE1()">;
def FPStackf64 : Predicate<"!Subtarget->hasSSE2()">;
diff --git a/lib/Target/X86/X86Subtarget.cpp b/lib/Target/X86/X86Subtarget.cpp
index 0f2c008..a9955ce 100644
--- a/lib/Target/X86/X86Subtarget.cpp
+++ b/lib/Target/X86/X86Subtarget.cpp
@@ -310,6 +310,10 @@ void X86Subtarget::AutoDetectSubtargetFeatures() {
HasBMI = true;
ToggleFeature(X86::FeatureBMI);
}
+ if ((EBX >> 4) & 0x1) {
+ HasHLE = true;
+ ToggleFeature(X86::FeatureHLE);
+ }
if (IsIntel && ((EBX >> 5) & 0x1)) {
X86SSELevel = AVX2;
ToggleFeature(X86::FeatureAVX2);
@@ -439,6 +443,7 @@ void X86Subtarget::initializeEnvironment() {
HasBMI = false;
HasBMI2 = false;
HasRTM = false;
+ HasHLE = false;
HasADX = false;
IsBTMemSlow = false;
IsUAMemFast = false;
diff --git a/lib/Target/X86/X86Subtarget.h b/lib/Target/X86/X86Subtarget.h
index e97da4b..411494a 100644
--- a/lib/Target/X86/X86Subtarget.h
+++ b/lib/Target/X86/X86Subtarget.h
@@ -121,6 +121,9 @@ protected:
/// HasRTM - Processor has RTM instructions.
bool HasRTM;
+ /// HasHLE - Processor has HLE.
+ bool HasHLE;
+
/// HasADX - Processor has ADX instructions.
bool HasADX;
@@ -253,6 +256,7 @@ public:
bool hasBMI() const { return HasBMI; }
bool hasBMI2() const { return HasBMI2; }
bool hasRTM() const { return HasRTM; }
+ bool hasHLE() const { return HasHLE; }
bool hasADX() const { return HasADX; }
bool isBTMemSlow() const { return IsBTMemSlow; }
bool isUnalignedMemAccessFast() const { return IsUAMemFast; }
--
1.7.9.5
["0003-Add-XACQ-XREL-prefix-and-encoding-asm-printer-suppor.patch" (0003-Add-XACQ-XREL-prefix-and-encoding-asm-printer-suppor.patch)]
From 3fc0f1c4b089f16cc064437ba238c5e17b67ea04 Mon Sep 17 00:00:00 2001
From: Michael Liao <michael.hliao@gmail.com>
Date: Thu, 5 Jul 2012 21:32:14 -0700
Subject: [PATCH 3/4] Add XACQ/XREL prefix and encoding/asm-printer support
---
lib/Target/X86/AsmParser/X86AsmParser.cpp | 3 +-
lib/Target/X86/InstPrinter/X86ATTInstPrinter.cpp | 10 ++++++
lib/Target/X86/InstPrinter/X86IntelInstPrinter.cpp | 10 ++++++
lib/Target/X86/MCTargetDesc/X86BaseInfo.h | 12 +++++++-
lib/Target/X86/MCTargetDesc/X86MCCodeEmitter.cpp | 10 ++++++
lib/Target/X86/X86InstrFormats.td | 32 ++++++++++++--------
lib/Target/X86/X86InstrInfo.td | 4 +++
test/MC/X86/x86_64-hle-encoding.s | 25 +++++++++++++++
8 files changed, 91 insertions(+), 15 deletions(-)
create mode 100644 test/MC/X86/x86_64-hle-encoding.s
diff --git a/lib/Target/X86/AsmParser/X86AsmParser.cpp b/lib/Target/X86/AsmParser/X86AsmParser.cpp
index 8c4c447..b9dc8bb 100644
--- a/lib/Target/X86/AsmParser/X86AsmParser.cpp
+++ b/lib/Target/X86/AsmParser/X86AsmParser.cpp
@@ -1515,7 +1515,8 @@ ParseInstruction(ParseInstructionInfo &Info, StringRef Name, SMLoc NameLoc,
Name == "lock" || Name == "rep" ||
Name == "repe" || Name == "repz" ||
Name == "repne" || Name == "repnz" ||
- Name == "rex64" || Name == "data16";
+ Name == "rex64" || Name == "data16" ||
+ Name == "xacquire" || Name == "xrelease";
// This does the actual operand parsing. Don't parse any more if we have a
diff --git a/lib/Target/X86/InstPrinter/X86ATTInstPrinter.cpp b/lib/Target/X86/InstPrinter/X86ATTInstPrinter.cpp
index e357710..7764961 100644
--- a/lib/Target/X86/InstPrinter/X86ATTInstPrinter.cpp
+++ b/lib/Target/X86/InstPrinter/X86ATTInstPrinter.cpp
@@ -47,6 +47,16 @@ void X86ATTInstPrinter::printInst(const MCInst *MI, raw_ostream &OS,
if (TSFlags & X86II::LOCK)
OS << "\tlock\n";
+ if (TSFlags & X86II::XACQUIRE) {
+ assert(!(TSFlags & X86II::XRELEASE) && "unknown HLE prefix hints!");
+ OS << "\txacquire\n";
+ }
+
+ if (TSFlags & X86II::XRELEASE) {
+ assert(!(TSFlags & X86II::XACQUIRE) && "unknown HLE prefix hints!");
+ OS << "\txrelease\n";
+ }
+
// Try to print any aliases first.
if (!printAliasInstr(MI, OS))
printInstruction(MI, OS);
diff --git a/lib/Target/X86/InstPrinter/X86IntelInstPrinter.cpp b/lib/Target/X86/InstPrinter/X86IntelInstPrinter.cpp
index 141f4a4..734dfe2 100644
--- a/lib/Target/X86/InstPrinter/X86IntelInstPrinter.cpp
+++ b/lib/Target/X86/InstPrinter/X86IntelInstPrinter.cpp
@@ -39,6 +39,16 @@ void X86IntelInstPrinter::printInst(const MCInst *MI, raw_ostream &OS,
if (TSFlags & X86II::LOCK)
OS << "\tlock\n";
+ if (TSFlags & X86II::XACQUIRE) {
+ assert(!(TSFlags & X86II::XRELEASE) && "unknown HLE prefix hints!");
+ OS << "\txacquire\n";
+ }
+
+ if (TSFlags & X86II::XRELEASE) {
+ assert(!(TSFlags & X86II::XACQUIRE) && "unknown HLE prefix hints!");
+ OS << "\txrelease\n";
+ }
+
printInstruction(MI, OS);
// Next always print the annotation.
diff --git a/lib/Target/X86/MCTargetDesc/X86BaseInfo.h b/lib/Target/X86/MCTargetDesc/X86BaseInfo.h
index 9e68388..fb21398 100644
--- a/lib/Target/X86/MCTargetDesc/X86BaseInfo.h
+++ b/lib/Target/X86/MCTargetDesc/X86BaseInfo.h
@@ -415,9 +415,19 @@ namespace X86II {
LOCKShift = FPTypeShift + 3,
LOCK = 1 << LOCKShift,
+ // TSX/HLE prefix
+ TSXShift = LOCKShift + 1,
+ TSXMask = 3 << TSXShift,
+
+ // XACQUIRE - Specifies that this instruction has XACQUIRE HLE prefix hint
+ XACQUIRE = 1 << TSXShift,
+
+ // XRELEASE - Specifies that this instruction has XRELEASE HLE prefix hint
+ XRELEASE = 2 << TSXShift,
+
// Segment override prefixes. Currently we just need ability to address
// stuff in gs and fs segments.
- SegOvrShift = LOCKShift + 1,
+ SegOvrShift = TSXShift + 2,
SegOvrMask = 3 << SegOvrShift,
FS = 1 << SegOvrShift,
GS = 2 << SegOvrShift,
diff --git a/lib/Target/X86/MCTargetDesc/X86MCCodeEmitter.cpp b/lib/Target/X86/MCTargetDesc/X86MCCodeEmitter.cpp
index 122204a..f227d7c 100644
--- a/lib/Target/X86/MCTargetDesc/X86MCCodeEmitter.cpp
+++ b/lib/Target/X86/MCTargetDesc/X86MCCodeEmitter.cpp
@@ -851,6 +851,16 @@ void X86MCCodeEmitter::EmitOpcodePrefix(uint64_t TSFlags, unsigned &CurByte,
if (TSFlags & X86II::LOCK)
EmitByte(0xF0, CurByte, OS);
+ if (TSFlags & X86II::XACQUIRE) {
+ assert(!(TSFlags & X86II::XRELEASE) && "unknown HLE prefix hints!");
+ EmitByte(0xF2, CurByte, OS);
+ }
+
+ if (TSFlags & X86II::XRELEASE) {
+ assert(!(TSFlags & X86II::XACQUIRE) && "unknown HLE prefix hints!");
+ EmitByte(0xF3, CurByte, OS);
+ }
+
// Emit segment override opcode prefix as needed.
EmitSegmentOverridePrefix(TSFlags, CurByte, MemOperand, MI, OS);
diff --git a/lib/Target/X86/X86InstrFormats.td b/lib/Target/X86/X86InstrFormats.td
index 44e574d..d5bd098 100644
--- a/lib/Target/X86/X86InstrFormats.td
+++ b/lib/Target/X86/X86InstrFormats.td
@@ -99,6 +99,8 @@ class OpSize { bit hasOpSizePrefix = 1; }
class AdSize { bit hasAdSizePrefix = 1; }
class REX_W { bit hasREX_WPrefix = 1; }
class LOCK { bit hasLockPrefix = 1; }
+class XACQ { bit hasXAcquire = 1; }
+class XREL { bit hasXRelease = 1; }
class SegFS { bits<2> SegOvrBits = 1; }
class SegGS { bits<2> SegOvrBits = 2; }
class TB { bits<5> Prefix = 1; }
@@ -163,6 +165,8 @@ class X86Inst<bits<8> opcod, Format f, ImmType i, dag outs, dag ins,
bit hasREX_WPrefix = 0; // Does this inst require the REX.W prefix?
FPFormat FPForm = NotFP; // What flavor of FP instruction is this?
bit hasLockPrefix = 0; // Does this inst have a 0xF0 prefix?
+ bit hasXAcquire = 0; // Does this instruction require an XACQUIRE prefix?
+ bit hasXRelease = 0; // Does this instruction require an XRELEASE prefix?
bits<2> SegOvrBits = 0; // Segment override prefix.
Domain ExeDomain = d;
bit hasVEXPrefix = 0; // Does this inst require a VEX prefix?
@@ -187,19 +191,21 @@ class X86Inst<bits<8> opcod, Format f, ImmType i, dag outs, dag ins,
let TSFlags{16-14} = ImmT.Value;
let TSFlags{19-17} = FPForm.Value;
let TSFlags{20} = hasLockPrefix;
- let TSFlags{22-21} = SegOvrBits;
- let TSFlags{24-23} = ExeDomain.Value;
- let TSFlags{32-25} = Opcode;
- let TSFlags{33} = hasVEXPrefix;
- let TSFlags{34} = hasVEX_WPrefix;
- let TSFlags{35} = hasVEX_4VPrefix;
- let TSFlags{36} = hasVEX_4VOp3Prefix;
- let TSFlags{37} = hasVEX_i8ImmReg;
- let TSFlags{38} = hasVEX_L;
- let TSFlags{39} = ignoresVEX_L;
- let TSFlags{40} = has3DNow0F0FOpcode;
- let TSFlags{41} = hasMemOp4Prefix;
- let TSFlags{42} = hasXOP_Prefix;
+ let TSFlags{21} = hasXAcquire;
+ let TSFlags{22} = hasXRelease;
+ let TSFlags{24-23} = SegOvrBits;
+ let TSFlags{26-25} = ExeDomain.Value;
+ let TSFlags{34-27} = Opcode;
+ let TSFlags{35} = hasVEXPrefix;
+ let TSFlags{36} = hasVEX_WPrefix;
+ let TSFlags{37} = hasVEX_4VPrefix;
+ let TSFlags{38} = hasVEX_4VOp3Prefix;
+ let TSFlags{39} = hasVEX_i8ImmReg;
+ let TSFlags{40} = hasVEX_L;
+ let TSFlags{41} = ignoresVEX_L;
+ let TSFlags{42} = has3DNow0F0FOpcode;
+ let TSFlags{43} = hasMemOp4Prefix;
+ let TSFlags{44} = hasXOP_Prefix;
}
class PseudoI<dag oops, dag iops, list<dag> pattern>
diff --git a/lib/Target/X86/X86InstrInfo.td b/lib/Target/X86/X86InstrInfo.td
index 46daaad..04d8f19 100644
--- a/lib/Target/X86/X86InstrInfo.td
+++ b/lib/Target/X86/X86InstrInfo.td
@@ -1460,6 +1460,10 @@ def REP_PREFIX : I<0xF3, RawFrm, (outs), (ins), "rep", []>;
def REPNE_PREFIX : I<0xF2, RawFrm, (outs), (ins), "repne", []>;
}
+// HLE hint prefix
+def : MnemonicAlias<"xacquire", "repne">;
+def : MnemonicAlias<"xrelease", "rep">;
+
// String manipulation instructions
def LODSB : I<0xAC, RawFrm, (outs), (ins), "lodsb", [], IIC_LODS>;
diff --git a/test/MC/X86/x86_64-hle-encoding.s b/test/MC/X86/x86_64-hle-encoding.s
new file mode 100644
index 0000000..4109fb4
--- /dev/null
+++ b/test/MC/X86/x86_64-hle-encoding.s
@@ -0,0 +1,25 @@
+// RUN: llvm-mc -triple x86_64-unknown-unknown --show-encoding %s | FileCheck %s
+
+// CHECK: lock
+// CHECK: encoding: [0xf0]
+// CHECK: repne
+// CHECK: encoding: [0xf2]
+ lock xacquire xaddq %rax, sym(%rip)
+
+// CHECK: repne
+// CHECK: encoding: [0xf2]
+// CHECK: lock
+// CHECK: encoding: [0xf0]
+ xacquire lock xaddq %rax, sym(%rip)
+
+// CHECK: lock
+// CHECK: encoding: [0xf0]
+// CHECK: rep
+// CHECK: encoding: [0xf3]
+ lock xrelease xaddq %rax, sym(%rip)
+
+// CHECK: rep
+// CHECK: encoding: [0xf3]
+// CHECK: lock
+// CHECK: encoding: [0xf0]
+ xrelease lock xaddq %rax, sym(%rip)
--
1.7.9.5
["0004-Enable-HLE-code-generation.patch" (0004-Enable-HLE-code-generation.patch)]
From 5cef473f18c43c646911c4d51f6e6a79293ff3fd Mon Sep 17 00:00:00 2001
From: Michael Liao <michael.hliao@gmail.com>
Date: Thu, 14 Feb 2013 22:05:25 -0800
Subject: [PATCH 4/4] Enable HLE code generation
- Add test cases
---
lib/Target/X86/X86ISelDAGToDAG.cpp | 208 ++++++++++++++++++++----------------
lib/Target/X86/X86ISelLowering.cpp | 40 +++++--
lib/Target/X86/X86InstrCompiler.td | 130 ++++++++++++++++------
lib/Target/X86/X86InstrInfo.td | 162 +++++++++++++++++++++++++++-
test/CodeGen/X86/hle-atomic16.ll | 188 ++++++++++++++++++++++++++++++++
test/CodeGen/X86/hle-atomic32.ll | 188 ++++++++++++++++++++++++++++++++
test/CodeGen/X86/hle-atomic64.ll | 188 ++++++++++++++++++++++++++++++++
test/CodeGen/X86/hle-atomic8.ll | 188 ++++++++++++++++++++++++++++++++
8 files changed, 1161 insertions(+), 131 deletions(-)
create mode 100644 test/CodeGen/X86/hle-atomic16.ll
create mode 100644 test/CodeGen/X86/hle-atomic32.ll
create mode 100644 test/CodeGen/X86/hle-atomic64.ll
create mode 100644 test/CodeGen/X86/hle-atomic8.ll
diff --git a/lib/Target/X86/X86ISelDAGToDAG.cpp b/lib/Target/X86/X86ISelDAGToDAG.cpp
index 6f13186..380df63 100644
--- a/lib/Target/X86/X86ISelDAGToDAG.cpp
+++ b/lib/Target/X86/X86ISelDAGToDAG.cpp
@@ -1494,12 +1494,20 @@ SDNode *X86DAGToDAGISel::SelectAtomic64(SDNode *Node, unsigned Opc) {
SDValue In2L = Node->getOperand(2);
SDValue In2H = Node->getOperand(3);
+ unsigned TargetFlags
+ = Subtarget->hasHLE() ? cast<MemIntrinsicSDNode>(Node)->getTargetFlags() :
+ 0;
+ assert(((TargetFlags & 3) != 3) && "unknown 'targetflags'");
+
+ SDValue TFlag = CurDAG->getTargetConstant(TargetFlags, MVT::i8);
+
SDValue Tmp0, Tmp1, Tmp2, Tmp3, Tmp4;
if (!SelectAddr(Node, In1, Tmp0, Tmp1, Tmp2, Tmp3, Tmp4))
return NULL;
MachineSDNode::mmo_iterator MemOp = MF->allocateMemRefsArray(1);
MemOp[0] = cast<MemSDNode>(Node)->getMemOperand();
- const SDValue Ops[] = { Tmp0, Tmp1, Tmp2, Tmp3, Tmp4, In2L, In2H, Chain};
+ const SDValue Ops[] = { Tmp0, Tmp1, Tmp2, Tmp3, Tmp4, In2L, In2H, TFlag,
+ Chain};
SDNode *ResNode = CurDAG->getMachineNode(Opc, Node->getDebugLoc(),
MVT::i32, MVT::i32, MVT::Other, Ops,
array_lengthof(Ops));
@@ -1535,97 +1543,104 @@ enum AtomicSz {
AtomicSzEnd
};
-static const uint16_t AtomicOpcTbl[AtomicOpcEnd][AtomicSzEnd] = {
+enum AtomicTargetFlags {
+ TargetFlagNone,
+ TargetFlagXAcquire,
+ TargetFlagXRelease,
+ AtomicTfEnd
+};
+
+static const uint16_t AtomicOpcTbl[AtomicOpcEnd][AtomicSzEnd][AtomicTfEnd] = {
{
- X86::LOCK_ADD8mi,
- X86::LOCK_ADD8mr,
- X86::LOCK_ADD16mi8,
- X86::LOCK_ADD16mi,
- X86::LOCK_ADD16mr,
- X86::LOCK_ADD32mi8,
- X86::LOCK_ADD32mi,
- X86::LOCK_ADD32mr,
- X86::LOCK_ADD64mi8,
- X86::LOCK_ADD64mi32,
- X86::LOCK_ADD64mr,
+ { X86::LOCK_ADD8mi, X86::LOCK_ADDACQ8mi, X86::LOCK_ADDREL8mi },
+ { X86::LOCK_ADD8mr, X86::LOCK_ADDACQ8mr, X86::LOCK_ADDREL8mr },
+ { X86::LOCK_ADD16mi8, X86::LOCK_ADDACQ16mi8, X86::LOCK_ADDREL16mi8 },
+ { X86::LOCK_ADD16mi, X86::LOCK_ADDACQ16mi, X86::LOCK_ADDREL16mi },
+ { X86::LOCK_ADD16mr, X86::LOCK_ADDACQ16mr, X86::LOCK_ADDREL16mr },
+ { X86::LOCK_ADD32mi8, X86::LOCK_ADDACQ32mi8, X86::LOCK_ADDREL32mi8 },
+ { X86::LOCK_ADD32mi, X86::LOCK_ADDACQ32mi, X86::LOCK_ADDREL32mi },
+ { X86::LOCK_ADD32mr, X86::LOCK_ADDACQ32mr, X86::LOCK_ADDREL32mr },
+ { X86::LOCK_ADD64mi8, X86::LOCK_ADDACQ64mi8, X86::LOCK_ADDREL64mi8 },
+ { X86::LOCK_ADD64mi32, X86::LOCK_ADDACQ64mi32, X86::LOCK_ADDREL64mi32 },
+ { X86::LOCK_ADD64mr, X86::LOCK_ADDACQ64mr, X86::LOCK_ADDREL64mr }
},
{
- X86::LOCK_SUB8mi,
- X86::LOCK_SUB8mr,
- X86::LOCK_SUB16mi8,
- X86::LOCK_SUB16mi,
- X86::LOCK_SUB16mr,
- X86::LOCK_SUB32mi8,
- X86::LOCK_SUB32mi,
- X86::LOCK_SUB32mr,
- X86::LOCK_SUB64mi8,
- X86::LOCK_SUB64mi32,
- X86::LOCK_SUB64mr,
+ { X86::LOCK_SUB8mi, X86::LOCK_SUBACQ8mi, X86::LOCK_SUBREL8mi },
+ { X86::LOCK_SUB8mr, X86::LOCK_SUBACQ8mr, X86::LOCK_SUBREL8mr },
+ { X86::LOCK_SUB16mi8, X86::LOCK_SUBACQ16mi8, X86::LOCK_SUBREL16mi8 },
+ { X86::LOCK_SUB16mi, X86::LOCK_SUBACQ16mi, X86::LOCK_SUBREL16mi },
+ { X86::LOCK_SUB16mr, X86::LOCK_SUBACQ16mr, X86::LOCK_SUBREL16mr },
+ { X86::LOCK_SUB32mi8, X86::LOCK_SUBACQ32mi8, X86::LOCK_SUBREL32mi8 },
+ { X86::LOCK_SUB32mi, X86::LOCK_SUBACQ32mi, X86::LOCK_SUBREL32mi },
+ { X86::LOCK_SUB32mr, X86::LOCK_SUBACQ32mr, X86::LOCK_SUBREL32mr },
+ { X86::LOCK_SUB64mi8, X86::LOCK_SUBACQ64mi8, X86::LOCK_SUBREL64mi8 },
+ { X86::LOCK_SUB64mi32, X86::LOCK_SUBACQ64mi32, X86::LOCK_SUBREL64mi32 },
+ { X86::LOCK_SUB64mr, X86::LOCK_SUBACQ64mr, X86::LOCK_SUBREL64mr }
},
{
- 0,
- X86::LOCK_INC8m,
- 0,
- 0,
- X86::LOCK_INC16m,
- 0,
- 0,
- X86::LOCK_INC32m,
- 0,
- 0,
- X86::LOCK_INC64m,
+ { 0, 0, 0 },
+ { X86::LOCK_INC8m, X86::LOCK_INCACQ8m, X86::LOCK_INCREL8m },
+ { 0, 0, 0 },
+ { 0, 0, 0 },
+ { X86::LOCK_INC16m, X86::LOCK_INCACQ16m, X86::LOCK_INCREL16m },
+ { 0, 0, 0 },
+ { 0, 0, 0 },
+ { X86::LOCK_INC32m, X86::LOCK_INCACQ32m, X86::LOCK_INCREL32m },
+ { 0, 0, 0 },
+ { 0, 0, 0 },
+ { X86::LOCK_INC64m, X86::LOCK_INCACQ64m, X86::LOCK_INCREL64m }
},
{
- 0,
- X86::LOCK_DEC8m,
- 0,
- 0,
- X86::LOCK_DEC16m,
- 0,
- 0,
- X86::LOCK_DEC32m,
- 0,
- 0,
- X86::LOCK_DEC64m,
+ { 0, 0, 0 },
+ { X86::LOCK_DEC8m, X86::LOCK_DECACQ8m, X86::LOCK_DECREL8m },
+ { 0, 0, 0 },
+ { 0, 0, 0 },
+ { X86::LOCK_DEC16m, X86::LOCK_DECACQ16m, X86::LOCK_DECREL16m },
+ { 0, 0, 0 },
+ { 0, 0, 0 },
+ { X86::LOCK_DEC32m, X86::LOCK_DECACQ32m, X86::LOCK_DECREL32m },
+ { 0, 0, 0 },
+ { 0, 0, 0 },
+ { X86::LOCK_DEC64m, X86::LOCK_DECACQ64m, X86::LOCK_DECREL64m }
},
{
- X86::LOCK_OR8mi,
- X86::LOCK_OR8mr,
- X86::LOCK_OR16mi8,
- X86::LOCK_OR16mi,
- X86::LOCK_OR16mr,
- X86::LOCK_OR32mi8,
- X86::LOCK_OR32mi,
- X86::LOCK_OR32mr,
- X86::LOCK_OR64mi8,
- X86::LOCK_OR64mi32,
- X86::LOCK_OR64mr,
+ { X86::LOCK_OR8mi, X86::LOCK_ORACQ8mi, X86::LOCK_ORREL8mi },
+ { X86::LOCK_OR8mr, X86::LOCK_ORACQ8mr, X86::LOCK_ORREL8mr },
+ { X86::LOCK_OR16mi8, X86::LOCK_ORACQ16mi8, X86::LOCK_ORREL16mi8 },
+ { X86::LOCK_OR16mi, X86::LOCK_ORACQ16mi, X86::LOCK_ORREL16mi },
+ { X86::LOCK_OR16mr, X86::LOCK_ORACQ16mr, X86::LOCK_ORREL16mr },
+ { X86::LOCK_OR32mi8, X86::LOCK_ORACQ32mi8, X86::LOCK_ORREL32mi8 },
+ { X86::LOCK_OR32mi, X86::LOCK_ORACQ32mi, X86::LOCK_ORREL32mi },
+ { X86::LOCK_OR32mr, X86::LOCK_ORACQ32mr, X86::LOCK_ORREL32mr },
+ { X86::LOCK_OR64mi8, X86::LOCK_ORACQ64mi8, X86::LOCK_ORREL64mi8 },
+ { X86::LOCK_OR64mi32, X86::LOCK_ORACQ64mi32, X86::LOCK_ORREL64mi32 },
+ { X86::LOCK_OR64mr, X86::LOCK_ORACQ64mr, X86::LOCK_ORREL64mr }
},
{
- X86::LOCK_AND8mi,
- X86::LOCK_AND8mr,
- X86::LOCK_AND16mi8,
- X86::LOCK_AND16mi,
- X86::LOCK_AND16mr,
- X86::LOCK_AND32mi8,
- X86::LOCK_AND32mi,
- X86::LOCK_AND32mr,
- X86::LOCK_AND64mi8,
- X86::LOCK_AND64mi32,
- X86::LOCK_AND64mr,
+ { X86::LOCK_AND8mi, X86::LOCK_ANDACQ8mi, X86::LOCK_ANDREL8mi },
+ { X86::LOCK_AND8mr, X86::LOCK_ANDACQ8mr, X86::LOCK_ANDREL8mr },
+ { X86::LOCK_AND16mi8, X86::LOCK_ANDACQ16mi8, X86::LOCK_ANDREL16mi8 },
+ { X86::LOCK_AND16mi, X86::LOCK_ANDACQ16mi, X86::LOCK_ANDREL16mi },
+ { X86::LOCK_AND16mr, X86::LOCK_ANDACQ16mr, X86::LOCK_ANDREL16mr },
+ { X86::LOCK_AND32mi8, X86::LOCK_ANDACQ32mi8, X86::LOCK_ANDREL32mi8 },
+ { X86::LOCK_AND32mi, X86::LOCK_ANDACQ32mi, X86::LOCK_ANDREL32mi },
+ { X86::LOCK_AND32mr, X86::LOCK_ANDACQ32mr, X86::LOCK_ANDREL32mr },
+ { X86::LOCK_AND64mi8, X86::LOCK_ANDACQ64mi8, X86::LOCK_ANDREL64mi8 },
+ { X86::LOCK_AND64mi32, X86::LOCK_ANDACQ64mi32, X86::LOCK_ANDREL64mi32 },
+ { X86::LOCK_AND64mr, X86::LOCK_ANDACQ64mr, X86::LOCK_ANDREL64mr }
},
{
- X86::LOCK_XOR8mi,
- X86::LOCK_XOR8mr,
- X86::LOCK_XOR16mi8,
- X86::LOCK_XOR16mi,
- X86::LOCK_XOR16mr,
- X86::LOCK_XOR32mi8,
- X86::LOCK_XOR32mi,
- X86::LOCK_XOR32mr,
- X86::LOCK_XOR64mi8,
- X86::LOCK_XOR64mi32,
- X86::LOCK_XOR64mr,
+ { X86::LOCK_XOR8mi, X86::LOCK_XORACQ8mi, X86::LOCK_XORREL8mi },
+ { X86::LOCK_XOR8mr, X86::LOCK_XORACQ8mr, X86::LOCK_XORREL8mr },
+ { X86::LOCK_XOR16mi8, X86::LOCK_XORACQ16mi8, X86::LOCK_XORREL16mi8 },
+ { X86::LOCK_XOR16mi, X86::LOCK_XORACQ16mi, X86::LOCK_XORREL16mi },
+ { X86::LOCK_XOR16mr, X86::LOCK_XORACQ16mr, X86::LOCK_XORREL16mr },
+ { X86::LOCK_XOR32mi8, X86::LOCK_XORACQ32mi8, X86::LOCK_XORREL32mi8 },
+ { X86::LOCK_XOR32mi, X86::LOCK_XORACQ32mi, X86::LOCK_XORREL32mi },
+ { X86::LOCK_XOR32mr, X86::LOCK_XORACQ32mr, X86::LOCK_XORREL32mr },
+ { X86::LOCK_XOR64mi8, X86::LOCK_XORACQ64mi8, X86::LOCK_XORREL64mi8 },
+ { X86::LOCK_XOR64mi32, X86::LOCK_XORACQ64mi32, X86::LOCK_XORREL64mi32 },
+ { X86::LOCK_XOR64mr, X86::LOCK_XORACQ64mr, X86::LOCK_XORREL64mr }
}
};
@@ -1690,6 +1705,17 @@ SDNode *X86DAGToDAGISel::SelectAtomicLoadArith(SDNode *Node, EVT NVT) {
DebugLoc dl = Node->getDebugLoc();
+ unsigned TargetFlags
+ = Subtarget->hasHLE() ? cast<AtomicSDNode>(Node)->getTargetFlags() : 0;
+
+ assert(((TargetFlags & 3) != 3) && "unknown 'targetflags'");
+
+ enum AtomicTargetFlags TFlag = TargetFlagNone;
+ if (TargetFlags & 1)
+ TFlag = TargetFlagXAcquire;
+ if (TargetFlags & 2)
+ TFlag = TargetFlagXRelease;
+
// Optimize common patterns for __sync_or_and_fetch and similar arith
// operations where the result is not used. This allows us to use the "lock"
// version of the arithmetic instruction.
@@ -1718,7 +1744,7 @@ SDNode *X86DAGToDAGISel::SelectAtomicLoadArith(SDNode *Node, EVT NVT) {
Op = ADD;
break;
}
-
+
Val = getAtomicLoadArithTargetConstant(CurDAG, dl, Op, NVT, Val);
bool isUnOp = !Val.getNode();
bool isCN = Val.getNode() && (Val.getOpcode() == ISD::TargetConstant);
@@ -1728,35 +1754,35 @@ SDNode *X86DAGToDAGISel::SelectAtomicLoadArith(SDNode *Node, EVT NVT) {
default: return 0;
case MVT::i8:
if (isCN)
- Opc = AtomicOpcTbl[Op][ConstantI8];
+ Opc = AtomicOpcTbl[Op][ConstantI8][TFlag];
else
- Opc = AtomicOpcTbl[Op][I8];
+ Opc = AtomicOpcTbl[Op][I8][TFlag];
break;
case MVT::i16:
if (isCN) {
if (immSext8(Val.getNode()))
- Opc = AtomicOpcTbl[Op][SextConstantI16];
+ Opc = AtomicOpcTbl[Op][SextConstantI16][TFlag];
else
- Opc = AtomicOpcTbl[Op][ConstantI16];
+ Opc = AtomicOpcTbl[Op][ConstantI16][TFlag];
} else
- Opc = AtomicOpcTbl[Op][I16];
+ Opc = AtomicOpcTbl[Op][I16][TFlag];
break;
case MVT::i32:
if (isCN) {
if (immSext8(Val.getNode()))
- Opc = AtomicOpcTbl[Op][SextConstantI32];
+ Opc = AtomicOpcTbl[Op][SextConstantI32][TFlag];
else
- Opc = AtomicOpcTbl[Op][ConstantI32];
+ Opc = AtomicOpcTbl[Op][ConstantI32][TFlag];
} else
- Opc = AtomicOpcTbl[Op][I32];
+ Opc = AtomicOpcTbl[Op][I32][TFlag];
break;
case MVT::i64:
- Opc = AtomicOpcTbl[Op][I64];
+ Opc = AtomicOpcTbl[Op][I64][TFlag];
if (isCN) {
if (immSext8(Val.getNode()))
- Opc = AtomicOpcTbl[Op][SextConstantI64];
+ Opc = AtomicOpcTbl[Op][SextConstantI64][TFlag];
else if (i64immSExt32(Val.getNode()))
- Opc = AtomicOpcTbl[Op][ConstantI64];
+ Opc = AtomicOpcTbl[Op][ConstantI64][TFlag];
}
break;
}
diff --git a/lib/Target/X86/X86ISelLowering.cpp b/lib/Target/X86/X86ISelLowering.cpp
index d525e3d..423329a 100644
--- a/lib/Target/X86/X86ISelLowering.cpp
+++ b/lib/Target/X86/X86ISelLowering.cpp
@@ -12768,12 +12768,21 @@ static MachineBasicBlock *EmitXBegin(MachineInstr *MI, MachineBasicBlock *MBB,
}
// Get CMPXCHG opcode for the specified data type.
-static unsigned getCmpXChgOpcode(EVT VT) {
+static unsigned getCmpXChgOpcode(EVT VT, unsigned TargetFlags) {
+ assert(((TargetFlags & 3) != 3) && "unknown 'targetflags'");
+
+ static unsigned CmpXChgOps[4][3] = {
+ { X86::LCMPXCHG8, X86::LCMPXCHGACQ8, X86::LCMPXCHGREL8 },
+ { X86::LCMPXCHG16, X86::LCMPXCHGACQ16, X86::LCMPXCHGREL16 },
+ { X86::LCMPXCHG32, X86::LCMPXCHGACQ32, X86::LCMPXCHGREL32 },
+ { X86::LCMPXCHG64, X86::LCMPXCHGACQ64, X86::LCMPXCHGREL64 },
+ };
+
switch (VT.getSimpleVT().SimpleTy) {
- case MVT::i8: return X86::LCMPXCHG8;
- case MVT::i16: return X86::LCMPXCHG16;
- case MVT::i32: return X86::LCMPXCHG32;
- case MVT::i64: return X86::LCMPXCHG64;
+ case MVT::i8: return CmpXChgOps[0][TargetFlags];
+ case MVT::i16: return CmpXChgOps[1][TargetFlags];
+ case MVT::i32: return CmpXChgOps[2][TargetFlags];
+ case MVT::i64: return CmpXChgOps[3][TargetFlags];
default:
break;
}
@@ -12916,7 +12925,7 @@ X86TargetLowering::EmitAtomicLoadArith(MachineInstr *MI,
MachineFunction::iterator I = MBB;
++I;
- assert(MI->getNumOperands() <= X86::AddrNumOperands + 4 &&
+ assert(MI->getNumOperands() <= X86::AddrNumOperands + 5 &&
"Unexpected number of operands");
assert(MI->hasOneMemOperand() &&
@@ -12928,6 +12937,7 @@ X86TargetLowering::EmitAtomicLoadArith(MachineInstr *MI,
unsigned DstReg, SrcReg;
unsigned MemOpndSlot;
+ unsigned TargetFlags;
unsigned CurOp = 0;
@@ -12935,12 +12945,13 @@ X86TargetLowering::EmitAtomicLoadArith(MachineInstr *MI,
MemOpndSlot = CurOp;
CurOp += X86::AddrNumOperands;
SrcReg = MI->getOperand(CurOp++).getReg();
+ TargetFlags = MI->getOperand(CurOp++).getImm();
const TargetRegisterClass *RC = MRI.getRegClass(DstReg);
MVT::SimpleValueType VT = *RC->vt_begin();
unsigned AccPhyReg = getX86SubSuperRegister(X86::EAX, VT);
- unsigned LCMPXCHGOpc = getCmpXChgOpcode(VT);
+ unsigned LCMPXCHGOpc = getCmpXChgOpcode(VT, TargetFlags);
unsigned LOADOpc = getLoadOpcode(VT);
// For the atomic load-arith operator, we generate
@@ -13148,7 +13159,7 @@ X86TargetLowering::EmitAtomicLoadArith6432(MachineInstr *MI,
MachineFunction::iterator I = MBB;
++I;
- assert(MI->getNumOperands() <= X86::AddrNumOperands + 7 &&
+ assert(MI->getNumOperands() <= X86::AddrNumOperands + 8 &&
"Unexpected number of operands");
assert(MI->hasOneMemOperand() &&
@@ -13161,6 +13172,7 @@ X86TargetLowering::EmitAtomicLoadArith6432(MachineInstr *MI,
unsigned DstLoReg, DstHiReg;
unsigned SrcLoReg, SrcHiReg;
unsigned MemOpndSlot;
+ unsigned TargetFlags;
unsigned CurOp = 0;
@@ -13170,11 +13182,21 @@ X86TargetLowering::EmitAtomicLoadArith6432(MachineInstr *MI,
CurOp += X86::AddrNumOperands;
SrcLoReg = MI->getOperand(CurOp++).getReg();
SrcHiReg = MI->getOperand(CurOp++).getReg();
+ TargetFlags = MI->getOperand(CurOp++).getImm();
+
+ assert(!(TargetFlags && !Subtarget->hasHLE()) &&
+ "'targetflags' is specified while HLE is disabled.");
+
+ assert(((TargetFlags & 3) != 3) && "unknown 'targetflags'");
+
+ bool IsXAcq = TargetFlags & 1;
+ bool IsXRel = TargetFlags & 2;
const TargetRegisterClass *RC = &X86::GR32RegClass;
const TargetRegisterClass *RC8 = &X86::GR8RegClass;
- unsigned LCMPXCHGOpc = X86::LCMPXCHG8B;
+ unsigned LCMPXCHGOpc = IsXAcq ? X86::LCMPXCHG8BACQ :
+ IsXRel ? X86::LCMPXCHG8BREL : X86::LCMPXCHG8B;
unsigned LOADOpc = X86::MOV32rm;
// For the atomic load-arith operator, we generate
diff --git a/lib/Target/X86/X86InstrCompiler.td b/lib/Target/X86/X86InstrCompiler.td
index f387962..353a4b4 100644
--- a/lib/Target/X86/X86InstrCompiler.td
+++ b/lib/Target/X86/X86InstrCompiler.td
@@ -515,32 +515,38 @@ multiclass PSEUDO_ATOMIC_LOAD_BINOP<string mnemonic> {
let usesCustomInserter = 1, mayLoad = 1, mayStore = 1 in {
let Defs = [EFLAGS, AL] in
def NAME#8 : I<0, Pseudo, (outs GR8:$dst),
- (ins i8mem:$ptr, GR8:$val),
+ (ins i8mem:$ptr, GR8:$val, i8imm:$flags),
!strconcat(mnemonic, "8 PSEUDO!"), []>;
let Defs = [EFLAGS, AX] in
def NAME#16 : I<0, Pseudo,(outs GR16:$dst),
- (ins i16mem:$ptr, GR16:$val),
+ (ins i16mem:$ptr, GR16:$val, i8imm:$flags),
!strconcat(mnemonic, "16 PSEUDO!"), []>;
let Defs = [EFLAGS, EAX] in
def NAME#32 : I<0, Pseudo, (outs GR32:$dst),
- (ins i32mem:$ptr, GR32:$val),
+ (ins i32mem:$ptr, GR32:$val, i8imm:$flags),
!strconcat(mnemonic, "32 PSEUDO!"), []>;
let Defs = [EFLAGS, RAX] in
def NAME#64 : I<0, Pseudo, (outs GR64:$dst),
- (ins i64mem:$ptr, GR64:$val),
+ (ins i64mem:$ptr, GR64:$val, i8imm:$flags),
!strconcat(mnemonic, "64 PSEUDO!"), []>;
}
}
-multiclass PSEUDO_ATOMIC_LOAD_BINOP_PATS<string name, string frag> {
+multiclass PSEUDO_ATOMIC_LOAD_BINOP_PATS_WITH_FLAG<string name, string frag, int flag> {
def : Pat<(!cast<PatFrag>(frag # "_8") addr:$ptr, GR8:$val),
- (!cast<Instruction>(name # "8") addr:$ptr, GR8:$val)>;
+ (!cast<Instruction>(name # "8") addr:$ptr, GR8:$val, flag)>;
def : Pat<(!cast<PatFrag>(frag # "_16") addr:$ptr, GR16:$val),
- (!cast<Instruction>(name # "16") addr:$ptr, GR16:$val)>;
+ (!cast<Instruction>(name # "16") addr:$ptr, GR16:$val, flag)>;
def : Pat<(!cast<PatFrag>(frag # "_32") addr:$ptr, GR32:$val),
- (!cast<Instruction>(name # "32") addr:$ptr, GR32:$val)>;
+ (!cast<Instruction>(name # "32") addr:$ptr, GR32:$val, flag)>;
def : Pat<(!cast<PatFrag>(frag # "_64") addr:$ptr, GR64:$val),
- (!cast<Instruction>(name # "64") addr:$ptr, GR64:$val)>;
+ (!cast<Instruction>(name # "64") addr:$ptr, GR64:$val, flag)>;
+}
+
+multiclass PSEUDO_ATOMIC_LOAD_BINOP_PATS<string name, string frag> {
+ defm : PSEUDO_ATOMIC_LOAD_BINOP_PATS_WITH_FLAG<name, !strconcat(frag, "_none"), 0>;
+ defm : PSEUDO_ATOMIC_LOAD_BINOP_PATS_WITH_FLAG<name, !strconcat(frag, "_xacq"), 1>;
+ defm : PSEUDO_ATOMIC_LOAD_BINOP_PATS_WITH_FLAG<name, !strconcat(frag, "_xrel"), 2>;
}
// Atomic exchange, and, or, xor
@@ -566,7 +572,7 @@ multiclass PSEUDO_ATOMIC_LOAD_BINOP6432<string mnemonic> {
let usesCustomInserter = 1, Defs = [EFLAGS, EAX, EDX],
mayLoad = 1, mayStore = 1, hasSideEffects = 0 in
def NAME#6432 : I<0, Pseudo, (outs GR32:$dst1, GR32:$dst2),
- (ins i64mem:$ptr, GR32:$val1, GR32:$val2),
+ (ins i64mem:$ptr, GR32:$val1, GR32:$val2, i8imm:$flags),
!strconcat(mnemonic, "6432 PSEUDO!"), []>;
}
@@ -685,11 +691,25 @@ def NAME#64mi8 : RIi8<{ImmOpc8{7}, ImmOpc8{6}, ImmOpc8{5}, ImmOpc8{4},
}
-defm LOCK_ADD : LOCK_ArithBinOp<0x00, 0x80, 0x83, MRM0m, "add">;
-defm LOCK_SUB : LOCK_ArithBinOp<0x28, 0x80, 0x83, MRM5m, "sub">;
-defm LOCK_OR : LOCK_ArithBinOp<0x08, 0x80, 0x83, MRM1m, "or">;
-defm LOCK_AND : LOCK_ArithBinOp<0x20, 0x80, 0x83, MRM4m, "and">;
-defm LOCK_XOR : LOCK_ArithBinOp<0x30, 0x80, 0x83, MRM6m, "xor">;
+defm LOCK_ADD : LOCK_ArithBinOp<0x00, 0x80, 0x83, MRM0m, "add">;
+defm LOCK_ADDACQ : LOCK_ArithBinOp<0x00, 0x80, 0x83, MRM0m, "add">, XACQ;
+defm LOCK_ADDREL : LOCK_ArithBinOp<0x00, 0x80, 0x83, MRM0m, "add">, XREL;
+
+defm LOCK_SUB : LOCK_ArithBinOp<0x28, 0x80, 0x83, MRM5m, "sub">;
+defm LOCK_SUBACQ : LOCK_ArithBinOp<0x28, 0x80, 0x83, MRM5m, "sub">, XACQ;
+defm LOCK_SUBREL : LOCK_ArithBinOp<0x28, 0x80, 0x83, MRM5m, "sub">, XREL;
+
+defm LOCK_OR : LOCK_ArithBinOp<0x08, 0x80, 0x83, MRM1m, "or">;
+defm LOCK_ORACQ : LOCK_ArithBinOp<0x08, 0x80, 0x83, MRM1m, "or">, XACQ;
+defm LOCK_ORREL : LOCK_ArithBinOp<0x08, 0x80, 0x83, MRM1m, "or">, XREL;
+
+defm LOCK_AND : LOCK_ArithBinOp<0x20, 0x80, 0x83, MRM4m, "and">;
+defm LOCK_ANDACQ : LOCK_ArithBinOp<0x20, 0x80, 0x83, MRM4m, "and">, XACQ;
+defm LOCK_ANDREL : LOCK_ArithBinOp<0x20, 0x80, 0x83, MRM4m, "and">, XREL;
+
+defm LOCK_XOR : LOCK_ArithBinOp<0x30, 0x80, 0x83, MRM6m, "xor">;
+defm LOCK_XORACQ : LOCK_ArithBinOp<0x30, 0x80, 0x83, MRM6m, "xor">, XACQ;
+defm LOCK_XORREL : LOCK_ArithBinOp<0x30, 0x80, 0x83, MRM6m, "xor">, XREL;
// Optimized codegen when the non-memory output is not used.
multiclass LOCK_ArithUnOp<bits<8> Opc8, bits<8> Opc, Format Form,
@@ -712,7 +732,12 @@ def NAME#64m : RI<Opc, Form, (outs), (ins i64mem:$dst),
}
defm LOCK_INC : LOCK_ArithUnOp<0xFE, 0xFF, MRM0m, "inc">;
+defm LOCK_INCACQ : LOCK_ArithUnOp<0xFE, 0xFF, MRM0m, "inc">, XACQ;
+defm LOCK_INCREL : LOCK_ArithUnOp<0xFE, 0xFF, MRM0m, "inc">, XREL;
+
defm LOCK_DEC : LOCK_ArithUnOp<0xFE, 0xFF, MRM1m, "dec">;
+defm LOCK_DECACQ : LOCK_ArithUnOp<0xFE, 0xFF, MRM1m, "dec">, XACQ;
+defm LOCK_DECREL : LOCK_ArithUnOp<0xFE, 0xFF, MRM1m, "dec">, XREL;
// Atomic compare and swap.
multiclass LCMPXCHG_UnOp<bits<8> Opc, Format Form, string mnemonic,
@@ -749,20 +774,39 @@ let isCodeGenOnly = 1 in {
}
let Defs = [EAX, EDX, EFLAGS], Uses = [EAX, EBX, ECX, EDX] in {
-defm LCMPXCHG8B : LCMPXCHG_UnOp<0xC7, MRM1m, "cmpxchg8b",
- X86cas8, i64mem,
- IIC_CMPX_LOCK_8B>;
+defm LCMPXCHG8B : LCMPXCHG_UnOp<0xC7, MRM1m, "cmpxchg8b",
+ X86cas8_none, i64mem,
+ IIC_CMPX_LOCK_8B>;
+defm LCMPXCHG8BACQ : LCMPXCHG_UnOp<0xC7, MRM1m, "cmpxchg8b",
+ X86cas8_xacq, i64mem,
+ IIC_CMPX_LOCK_8B>, XACQ;
+defm LCMPXCHG8BREL : LCMPXCHG_UnOp<0xC7, MRM1m, "cmpxchg8b",
+ X86cas8_xrel, i64mem,
+ IIC_CMPX_LOCK_8B>, XREL;
}
let Defs = [RAX, RDX, EFLAGS], Uses = [RAX, RBX, RCX, RDX],
Predicates = [HasCmpxchg16b] in {
-defm LCMPXCHG16B : LCMPXCHG_UnOp<0xC7, MRM1m, "cmpxchg16b",
- X86cas16, i128mem,
- IIC_CMPX_LOCK_16B>, REX_W;
+defm LCMPXCHG16B : LCMPXCHG_UnOp<0xC7, MRM1m, "cmpxchg16b",
+ X86cas16_none, i128mem,
+ IIC_CMPX_LOCK_16B>, REX_W;
+defm LCMPXCHG16BACQ : LCMPXCHG_UnOp<0xC7, MRM1m, "cmpxchg16b",
+ X86cas16_xacq, i128mem,
+ IIC_CMPX_LOCK_16B>, REX_W, XACQ;
+defm LCMPXCHG16BREL : LCMPXCHG_UnOp<0xC7, MRM1m, "cmpxchg16b",
+ X86cas16_xrel, i128mem,
+ IIC_CMPX_LOCK_16B>, REX_W, XREL;
}
-defm LCMPXCHG : LCMPXCHG_BinOp<0xB0, 0xB1, MRMDestMem, "cmpxchg",
- X86cas, IIC_CMPX_LOCK_8, IIC_CMPX_LOCK>;
+defm LCMPXCHG : LCMPXCHG_BinOp<0xB0, 0xB1, MRMDestMem, "cmpxchg",
+ X86cas_none, IIC_CMPX_LOCK_8,
+ IIC_CMPX_LOCK>;
+defm LCMPXCHGACQ : LCMPXCHG_BinOp<0xB0, 0xB1, MRMDestMem, "cmpxchg",
+ X86cas_xacq, IIC_CMPX_LOCK_8,
+ IIC_CMPX_LOCK>, XACQ;
+defm LCMPXCHGREL : LCMPXCHG_BinOp<0xB0, 0xB1, MRMDestMem, "cmpxchg",
+ X86cas_xrel, IIC_CMPX_LOCK_8,
+ IIC_CMPX_LOCK>, XREL;
// Atomic exchange and add
multiclass ATOMIC_LOAD_BINOP<bits<8> opc8, bits<8> opc, string mnemonic,
@@ -799,9 +843,15 @@ multiclass ATOMIC_LOAD_BINOP<bits<8> opc8, bits<8> opc, string mnemonic,
}
}
-defm LXADD : ATOMIC_LOAD_BINOP<0xc0, 0xc1, "xadd", "atomic_load_add",
- IIC_XADD_LOCK_MEM8, IIC_XADD_LOCK_MEM>,
- TB, LOCK;
+defm LXADD : ATOMIC_LOAD_BINOP<0xc0, 0xc1, "xadd", "atomic_load_add_none",
+ IIC_XADD_LOCK_MEM8, IIC_XADD_LOCK_MEM>,
+ TB, LOCK;
+defm LXADDACQ : ATOMIC_LOAD_BINOP<0xc0, 0xc1, "xadd", "atomic_load_add_xacq",
+ IIC_XADD_LOCK_MEM8, IIC_XADD_LOCK_MEM>,
+ TB, LOCK, XACQ;
+defm LXADDREL : ATOMIC_LOAD_BINOP<0xc0, 0xc1, "xadd", "atomic_load_add_xrel",
+ IIC_XADD_LOCK_MEM8, IIC_XADD_LOCK_MEM>,
+ TB, LOCK, XREL;
def ACQUIRE_MOV8rm : I<0, Pseudo, (outs GR8 :$dst), (ins i8mem :$src),
"#ACQUIRE_MOV PSEUDO!",
@@ -818,16 +868,36 @@ def ACQUIRE_MOV64rm : I<0, Pseudo, (outs GR64:$dst), (ins i64mem:$src),
def RELEASE_MOV8mr : I<0, Pseudo, (outs), (ins i8mem :$dst, GR8 :$src),
"#RELEASE_MOV PSEUDO!",
- [(atomic_store_8 addr:$dst, GR8 :$src)]>;
+ [(atomic_store_none_8 addr:$dst, GR8 :$src)]>;
def RELEASE_MOV16mr : I<0, Pseudo, (outs), (ins i16mem:$dst, GR16:$src),
"#RELEASE_MOV PSEUDO!",
- [(atomic_store_16 addr:$dst, GR16:$src)]>;
+ [(atomic_store_none_16 addr:$dst, GR16:$src)]>;
def RELEASE_MOV32mr : I<0, Pseudo, (outs), (ins i32mem:$dst, GR32:$src),
"#RELEASE_MOV PSEUDO!",
- [(atomic_store_32 addr:$dst, GR32:$src)]>;
+ [(atomic_store_none_32 addr:$dst, GR32:$src)]>;
def RELEASE_MOV64mr : I<0, Pseudo, (outs), (ins i64mem:$dst, GR64:$src),
"#RELEASE_MOV PSEUDO!",
- [(atomic_store_64 addr:$dst, GR64:$src)]>;
+ [(atomic_store_none_64 addr:$dst, GR64:$src)]>;
+
+multiclass ATOMIC_STORE<bits<8> opc8, bits<8> opc, string mnemonic, string frag> {
+ let isCodeGenOnly = 1 in {
+ def NAME#8mr : I<opc8, MRMDestMem, (outs), (ins i8mem:$dst, GR8:$src),
+ !strconcat(mnemonic, "{b}\t{$src, $dst|$dst, $src}"),
+ [(!cast<PatFrag>(frag # "_8") addr:$dst, GR8:$src)]>;
+ def NAME#16mr : I<opc, MRMDestMem, (outs), (ins i16mem:$dst, GR16:$src),
+ !strconcat(mnemonic, "{w}\t{$src, $dst|$dst, $src}"),
+ [(!cast<PatFrag>(frag # "_16") addr:$dst, GR16:$src)]>,
+ OpSize;
+ def NAME#32mr : I<opc, MRMDestMem, (outs), (ins i32mem:$dst, GR32:$src),
+ !strconcat(mnemonic, "{l}\t{$src, $dst|$dst, $src}"),
+ [(!cast<PatFrag>(frag # "_32") addr:$dst, GR32:$src)]>;
+ def NAME#64mr : RI<opc, MRMDestMem, (outs), (ins i64mem:$dst, GR64:$src),
+ !strconcat(mnemonic, "{q}\t{$src, $dst|$dst, $src}"),
+ [(!cast<PatFrag>(frag # "_64") addr:$dst, GR64:$src)]>;
+ }
+}
+
+defm XRELEASE_MOV : ATOMIC_STORE<0x88, 0x89, "mov", "atomic_store_xrel">, XREL;
//===----------------------------------------------------------------------===//
// Conditional Move Pseudo Instructions.
diff --git a/lib/Target/X86/X86InstrInfo.td b/lib/Target/X86/X86InstrInfo.td
index 04d8f19..1d272c1 100644
--- a/lib/Target/X86/X86InstrInfo.td
+++ b/lib/Target/X86/X86InstrInfo.td
@@ -754,6 +754,162 @@ def trunc_su : PatFrag<(ops node:$src), (trunc node:$src), [{
return N->hasOneUse();
}]>;
+// Helper frag for atomic with target flags.
+class ATOMIC_NONE<dag ops, dag frag> : PatFrag<ops, frag, [{
+ unsigned TargetFlags = cast<AtomicSDNode>(N)->getTargetFlags();
+ assert(((TargetFlags & 3) != 3) && "unknown 'targetflags'");
+ return !Subtarget->hasHLE() || (TargetFlags == 0);
+}]>;
+
+class ATOMIC_XACQ<dag ops, dag frag> : PatFrag<ops, frag, [{
+ unsigned TargetFlags = cast<AtomicSDNode>(N)->getTargetFlags();
+ assert(((TargetFlags & 3) != 3) && "unknown 'targetflags'");
+ return Subtarget->hasHLE() && (TargetFlags & 1);
+}]>;
+
+class ATOMIC_XREL<dag ops, dag frag> : PatFrag<ops, frag, [{
+ unsigned TargetFlags = cast<AtomicSDNode>(N)->getTargetFlags();
+ assert(((TargetFlags & 3) != 3) && "unknown 'targetflags'");
+ return Subtarget->hasHLE() && (TargetFlags & 2);
+}]>;
+
+class MEMINTRINSIC_NONE<dag ops, dag frag> : PatFrag<ops, frag, [{
+ unsigned TargetFlags = cast<MemIntrinsicSDNode>(N)->getTargetFlags();
+ assert(((TargetFlags & 3) != 3) && "unknown 'targetflags'");
+ return !Subtarget->hasHLE() || (TargetFlags == 0);
+}]>;
+
+class MEMINTRINSIC_XACQ<dag ops, dag frag> : PatFrag<ops, frag, [{
+ unsigned TargetFlags = cast<MemIntrinsicSDNode>(N)->getTargetFlags();
+ assert(((TargetFlags & 3) != 3) && "unknown 'targetflags'");
+ return Subtarget->hasHLE() && (TargetFlags & 1);
+}]>;
+
+class MEMINTRINSIC_XREL<dag ops, dag frag> : PatFrag<ops, frag, [{
+ unsigned TargetFlags = cast<MemIntrinsicSDNode>(N)->getTargetFlags();
+ assert(((TargetFlags & 3) != 3) && "unknown 'targetflags'");
+ return Subtarget->hasHLE() && (TargetFlags & 2);
+}]>;
+
+multiclass atomic_unop<string frag> {
+ def _none_8 : ATOMIC_NONE<(ops node:$ptr),
+ (!cast<PatFrag>(frag # "_8") node:$ptr)>;
+ def _none_16 : ATOMIC_NONE<(ops node:$ptr),
+ (!cast<PatFrag>(frag # "_16") node:$ptr)>;
+ def _none_32 : ATOMIC_NONE<(ops node:$ptr),
+ (!cast<PatFrag>(frag # "_32") node:$ptr)>;
+ def _none_64 : ATOMIC_NONE<(ops node:$ptr),
+ (!cast<PatFrag>(frag # "_64") node:$ptr)>;
+ def _xacq_8 : ATOMIC_XACQ<(ops node:$ptr),
+ (!cast<PatFrag>(frag # "_8") node:$ptr)>;
+ def _xacq_16 : ATOMIC_XACQ<(ops node:$ptr),
+ (!cast<PatFrag>(frag # "_16") node:$ptr)>;
+ def _xacq_32 : ATOMIC_XACQ<(ops node:$ptr),
+ (!cast<PatFrag>(frag # "_32") node:$ptr)>;
+ def _xacq_64 : ATOMIC_XACQ<(ops node:$ptr),
+ (!cast<PatFrag>(frag # "_64") node:$ptr)>;
+ def _xrel_8 : ATOMIC_XREL<(ops node:$ptr),
+ (!cast<PatFrag>(frag # "_8") node:$ptr)>;
+ def _xrel_16 : ATOMIC_XREL<(ops node:$ptr),
+ (!cast<PatFrag>(frag # "_16") node:$ptr)>;
+ def _xrel_32 : ATOMIC_XREL<(ops node:$ptr),
+ (!cast<PatFrag>(frag # "_32") node:$ptr)>;
+ def _xrel_64 : ATOMIC_XREL<(ops node:$ptr),
+ (!cast<PatFrag>(frag # "_64") node:$ptr)>;
+}
+
+multiclass atomic_binop<string frag> {
+ def _none_8 : ATOMIC_NONE<(ops node:$ptr, node:$val),
+ (!cast<PatFrag>(frag # "_8") node:$ptr, node:$val)>;
+ def _none_16 : ATOMIC_NONE<(ops node:$ptr, node:$val),
+ (!cast<PatFrag>(frag # "_16") node:$ptr, node:$val)>;
+ def _none_32 : ATOMIC_NONE<(ops node:$ptr, node:$val),
+ (!cast<PatFrag>(frag # "_32") node:$ptr, node:$val)>;
+ def _none_64 : ATOMIC_NONE<(ops node:$ptr, node:$val),
+ (!cast<PatFrag>(frag # "_64") node:$ptr, node:$val)>;
+ def _xacq_8 : ATOMIC_XACQ<(ops node:$ptr, node:$val),
+ (!cast<PatFrag>(frag # "_8") node:$ptr, node:$val)>;
+ def _xacq_16 : ATOMIC_XACQ<(ops node:$ptr, node:$val),
+ (!cast<PatFrag>(frag # "_16") node:$ptr, node:$val)>;
+ def _xacq_32 : ATOMIC_XACQ<(ops node:$ptr, node:$val),
+ (!cast<PatFrag>(frag # "_32") node:$ptr, node:$val)>;
+ def _xacq_64 : ATOMIC_XACQ<(ops node:$ptr, node:$val),
+ (!cast<PatFrag>(frag # "_64") node:$ptr, node:$val)>;
+ def _xrel_8 : ATOMIC_XREL<(ops node:$ptr, node:$val),
+ (!cast<PatFrag>(frag # "_8") node:$ptr, node:$val)>;
+ def _xrel_16 : ATOMIC_XREL<(ops node:$ptr, node:$val),
+ (!cast<PatFrag>(frag # "_16") node:$ptr, node:$val)>;
+ def _xrel_32 : ATOMIC_XREL<(ops node:$ptr, node:$val),
+ (!cast<PatFrag>(frag # "_32") node:$ptr, node:$val)>;
+ def _xrel_64 : ATOMIC_XREL<(ops node:$ptr, node:$val),
+ (!cast<PatFrag>(frag # "_64") node:$ptr, node:$val)>;
+}
+
+multiclass atomic_ternop<string frag> {
+ def _none_8 : ATOMIC_NONE<(ops node:$ptr, node:$cmp, node:$swap),
+ (!cast<PatFrag>(frag # "_8") node:$ptr, node:$cmp, node:$swap)>;
+ def _none_16 : ATOMIC_NONE<(ops node:$ptr, node:$cmp, node:$swap),
+ (!cast<PatFrag>(frag # "_16") node:$ptr, node:$cmp, node:$swap)>;
+ def _none_32 : ATOMIC_NONE<(ops node:$ptr, node:$cmp, node:$swap),
+ (!cast<PatFrag>(frag # "_32") node:$ptr, node:$cmp, node:$swap)>;
+ def _none_64 : ATOMIC_NONE<(ops node:$ptr, node:$cmp, node:$swap),
+ (!cast<PatFrag>(frag # "_64") node:$ptr, node:$cmp, node:$swap)>;
+ def _xacq_8 : ATOMIC_XACQ<(ops node:$ptr, node:$cmp, node:$swap),
+ (!cast<PatFrag>(frag # "_8") node:$ptr, node:$cmp, node:$swap)>;
+ def _xacq_16 : ATOMIC_XACQ<(ops node:$ptr, node:$cmp, node:$swap),
+ (!cast<PatFrag>(frag # "_16") node:$ptr, node:$cmp, node:$swap)>;
+ def _xacq_32 : ATOMIC_XACQ<(ops node:$ptr, node:$cmp, node:$swap),
+ (!cast<PatFrag>(frag # "_32") node:$ptr, node:$cmp, node:$swap)>;
+ def _xacq_64 : ATOMIC_XACQ<(ops node:$ptr, node:$cmp, node:$swap),
+ (!cast<PatFrag>(frag # "_64") node:$ptr, node:$cmp, node:$swap)>;
+ def _xrel_8 : ATOMIC_XREL<(ops node:$ptr, node:$cmp, node:$swap),
+ (!cast<PatFrag>(frag # "_8") node:$ptr, node:$cmp, node:$swap)>;
+ def _xrel_16 : ATOMIC_XREL<(ops node:$ptr, node:$cmp, node:$swap),
+ (!cast<PatFrag>(frag # "_16") node:$ptr, node:$cmp, node:$swap)>;
+ def _xrel_32 : ATOMIC_XREL<(ops node:$ptr, node:$cmp, node:$swap),
+ (!cast<PatFrag>(frag # "_32") node:$ptr, node:$cmp, node:$swap)>;
+ def _xrel_64 : ATOMIC_XREL<(ops node:$ptr, node:$cmp, node:$swap),
+ (!cast<PatFrag>(frag # "_64") node:$ptr, node:$cmp, node:$swap)>;
+}
+
+// FIXME: some primitives don't support XACQUIRE or XRELEASE: e.g.
+// 'load' can be used with neither XACQUIRE nor XRELEASE;
+// 'store' can only be used with XRELEASE.
+
+defm atomic_cmp_swap : atomic_ternop<"atomic_cmp_swap">;
+defm atomic_load_add : atomic_binop<"atomic_load_add">;
+defm atomic_swap : atomic_binop<"atomic_swap">;
+defm atomic_load_sub : atomic_binop<"atomic_load_sub">;
+defm atomic_load_and : atomic_binop<"atomic_load_and">;
+defm atomic_load_or : atomic_binop<"atomic_load_or">;
+defm atomic_load_xor : atomic_binop<"atomic_load_xor">;
+defm atomic_load_nand : atomic_binop<"atomic_load_nand">;
+defm atomic_load_min : atomic_binop<"atomic_load_min">;
+defm atomic_load_max : atomic_binop<"atomic_load_max">;
+defm atomic_load_umin : atomic_binop<"atomic_load_umin">;
+defm atomic_load_umax : atomic_binop<"atomic_load_umax">;
+defm atomic_store : atomic_binop<"atomic_store">;
+defm atomic_load : atomic_unop<"atomic_load">;
+
+multiclass memintrinsic_unop<SDNode opnode> {
+ def _none : MEMINTRINSIC_NONE<(ops node:$ptr), (opnode node:$ptr)>;
+ def _xacq : MEMINTRINSIC_XACQ<(ops node:$ptr), (opnode node:$ptr)>;
+ def _xrel : MEMINTRINSIC_XREL<(ops node:$ptr), (opnode node:$ptr)>;
+}
+
+multiclass memintrinsic_ternop<SDNode opnode> {
+ def _none : MEMINTRINSIC_NONE<(ops node:$ptr, node:$val, node:$imm),
+ (opnode node:$ptr, node:$val, node:$imm)>;
+ def _xacq : MEMINTRINSIC_XACQ<(ops node:$ptr, node:$val, node:$imm),
+ (opnode node:$ptr, node:$val, node:$imm)>;
+ def _xrel : MEMINTRINSIC_XREL<(ops node:$ptr, node:$val, node:$imm),
+ (opnode node:$ptr, node:$val, node:$imm)>;
+}
+
+defm X86cas : memintrinsic_ternop<X86cas>;
+defm X86cas8 : memintrinsic_unop<X86cas8>;
+defm X86cas16 : memintrinsic_unop<X86cas16>;
+
//===----------------------------------------------------------------------===//
// Instruction list.
//
@@ -1350,7 +1506,11 @@ multiclass ATOMIC_SWAP<bits<8> opc8, bits<8> opc, string mnemonic, string frag,
}
}
-defm XCHG : ATOMIC_SWAP<0x86, 0x87, "xchg", "atomic_swap", IIC_XCHG_MEM>;
+defm XCHG : ATOMIC_SWAP<0x86, 0x87, "xchg", "atomic_swap_none", IIC_XCHG_MEM>;
+let isCodeGenOnly = 1 in {
+defm XCHGACQ : ATOMIC_SWAP<0x86, 0x87, "xchg", "atomic_swap_xacq", IIC_XCHG_MEM>, XACQ;
+defm XCHGREL : ATOMIC_SWAP<0x86, 0x87, "xchg", "atomic_swap_xrel", IIC_XCHG_MEM>, XREL;
+}
// Swap between registers.
let Constraints = "$val = $dst" in {
diff --git a/test/CodeGen/X86/hle-atomic16.ll b/test/CodeGen/X86/hle-atomic16.ll
new file mode 100644
index 0000000..f6c7374
--- /dev/null
+++ b/test/CodeGen/X86/hle-atomic16.ll
@@ -0,0 +1,188 @@
+; RUN: llc < %s -O0 -march=x86-64 -mcpu=corei7 -mattr=+hle | FileCheck %s --check-prefix X64HLE
+; RUN: llc < %s -O0 -march=x86-64 -mcpu=corei7 -mattr=-hle | FileCheck %s --check-prefix X64NOHLE
+
+@sc16 = external global i16
+
+; 16-bit
+
+define void @atomic_fetch_add16() nounwind {
+; X64HLE: atomic_fetch_add16
+; X64NOHLE: atomic_fetch_add16
+ %t0 = atomicrmw add i16* @sc16, i16 1 acquire, !targetflags !0
+; X64HLE: lock
+; X64HLE-NEXT: xacquire
+; X64HLE: incw
+; X64NOHLE: lock
+; X64NOHLE-NOT: xacquire
+; X64NOHLE: incw
+ %t1 = atomicrmw add i16* @sc16, i16 5 acquire, !targetflags !1
+; X64HLE: lock
+; X64HLE-NEXT: xrelease
+; X64HLE: xaddw
+; X64NOHLE: lock
+; X64NOHLE-NOT: xrelease
+; X64NOHLE: xaddw
+ %t2 = atomicrmw add i16* @sc16, i16 %t1 acquire, !targetflags !0
+; X64HLE: lock
+; X64HLE-NEXT: xacquire
+; X64HLE: addw
+; X64NOHLE: lock
+; X64NOHLE-NOT: xacquire
+; X64NOHLE: addw
+ ret void
+; X64HLE: ret
+; X64NOHLE: ret
+}
+
+define void @atomic_fetch_sub16() nounwind {
+; X64HLE: atomic_fetch_sub16
+; X64NOHLE: atomic_fetch_sub16
+ %t3 = atomicrmw sub i16* @sc16, i16 1 acquire, !targetflags !1
+; X64HLE: lock
+; X64HLE-NEXT: xrelease
+; X64HLE: decw
+; X64NOHLE: lock
+; X64NOHLE-NOT: xrelease
+; X64NOHLE: decw
+ %t4 = atomicrmw sub i16* @sc16, i16 5 acquire, !targetflags !0
+; X64HLE: lock
+; X64HLE-NEXT: xacquire
+; X64HLE: xaddw
+; X64NOHLE: lock
+; X64NOHLE-NOT: xacquire
+; X64NOHLE: xaddw
+ %t5 = atomicrmw sub i16* @sc16, i16 %t4 acquire, !targetflags !1
+; X64HLE: lock
+; X64HLE-NEXT: xrelease
+; X64HLE: subw
+; X64NOHLE: lock
+; X64NOHLE-NOT: xrelease
+; X64NOHLE: subw
+ ret void
+; X64HLE: ret
+; X64NOHLE: ret
+}
+
+define void @atomic_fetch_logic16() nounwind {
+; X64HLE: atomic_fetch_logic16
+; X64NOHLE: atomic_fetch_logic16
+ %t6 = atomicrmw and i16* @sc16, i16 5 acquire, !targetflags !0
+; X64HLE: lock
+; X64HLE-NEXT: xacquire
+; X64HLE: andw
+; X64NOHLE: lock
+; X64NOHLE-NOT: xacquire
+; X64NOHLE: andw
+ %t7 = atomicrmw or i16* @sc16, i16 5 acquire, !targetflags !1
+; X64HLE: lock
+; X64HLE-NEXT: xrelease
+; X64HLE: orw
+; X64NOHLE: lock
+; X64NOHLE-NOT: xrelease
+; X64NOHLE: orw
+ %t8 = atomicrmw xor i16* @sc16, i16 5 acquire, !targetflags !0
+; X64HLE: lock
+; X64HLE-NEXT: xacquire
+; X64HLE: xorw
+; X64NOHLE: lock
+; X64NOHLE-NOT: xacquire
+; X64NOHLE: xorw
+ %t9 = atomicrmw nand i16* @sc16, i16 5 acquire, !targetflags !1
+; X64HLE: andw
+; X64HLE: notw
+; X64HLE: lock
+; X64HLE-NEXT: xrelease
+; X64HLE: cmpxchgw
+; X64NOHLE: andw
+; X64NOHLE: notw
+; X64NOHLE: lock
+; X64NOHLE-NOT: xrelease
+; X64NOHLE: cmpxchgw
+ ret void
+; X64HLE: ret
+; X64NOHLE: ret
+}
+
+define void @atomic_fetch_minmax16() nounwind {
+; X64HLE: atomic_fetch_minmax16
+; X64NOHLE: atomic_fetch_minmax16
+ %t0 = atomicrmw max i16* @sc16, i16 5 acquire, !targetflags !0
+; X64HLE: cmpw
+; X64HLE: cmov
+; X64HLE: lock
+; X64HLE-NEXT: xacquire
+; X64HLE: cmpxchgw
+; X64NOHLE: cmpw
+; X64NOHLE: cmov
+; X64NOHLE: lock
+; X64NOHLE-NOT: xacquire
+; X64NOHLE: cmpxchgw
+ %t1 = atomicrmw min i16* @sc16, i16 5 acquire, !targetflags !1
+; X64HLE: cmpw
+; X64HLE: cmov
+; X64HLE: lock
+; X64HLE-NEXT: xrelease
+; X64HLE: cmpxchgw
+; X64NOHLE: cmpw
+; X64NOHLE: cmov
+; X64NOHLE: lock
+; X64NOHLE-NOT: xrelease
+; X64NOHLE: cmpxchgw
+ %t2 = atomicrmw umax i16* @sc16, i16 5 acquire, !targetflags !0
+; X64HLE: cmpw
+; X64HLE: cmov
+; X64HLE: lock
+; X64HLE-NEXT: xacquire
+; X64HLE: cmpxchgw
+; X64NOHLE: cmpw
+; X64NOHLE: cmov
+; X64NOHLE: lock
+; X64NOHLE-NOT: xacquire
+; X64NOHLE: cmpxchgw
+ %t3 = atomicrmw umin i16* @sc16, i16 5 acquire, !targetflags !1
+; X64HLE: cmpw
+; X64HLE: cmov
+; X64HLE: lock
+; X64HLE-NEXT: xrelease
+; X64HLE: cmpxchgw
+; X64NOHLE: cmpw
+; X64NOHLE: cmov
+; X64NOHLE: lock
+; X64NOHLE-NOT: xrelease
+; X64NOHLE: cmpxchgw
+ ret void
+; X64HLE: ret
+; X64NOHLE: ret
+}
+
+define void @atomic_fetch_misc16() nounwind {
+; X64HLE: atomic_fetch_misc16
+; X64NOHLE: atomic_fetch_misc16
+ %t4 = cmpxchg i16* @sc16, i16 0, i16 1 acquire, !targetflags !0
+; X64HLE: lock
+; X64HLE-NEXT: xacquire
+; X64HLE: cmpxchgw
+; X64NOHLE: lock
+; X64NOHLE-NOT: xacquire
+; X64NOHLE: cmpxchgw
+ store atomic i16 0, i16* @sc16 release, align 2, !targetflags !1
+; X64HLE-NOT: lock
+; X64HLE: xrelease
+; X64HLE: movw
+; X64NOHLE-NOT: lock
+; X64NOHLE-NOT: xrelease
+; X64NOHLE: movw
+ %t5 = atomicrmw xchg i16* @sc16, i16 %t4 acquire, !targetflags !0
+; X64HLE-NOT: lock
+; X64HLE: xacquire
+; X64HLE: xchgw
+; X64NOHLE-NOT: lock
+; X64NOHLE-NOT: xacquire
+; X64NOHLE: xchgw
+ ret void
+; X64HLE: ret
+; X64NOHLE: ret
+}
+
+!0 = metadata !{i32 1}
+!1 = metadata !{i32 2}
diff --git a/test/CodeGen/X86/hle-atomic32.ll b/test/CodeGen/X86/hle-atomic32.ll
new file mode 100644
index 0000000..02f4bef
--- /dev/null
+++ b/test/CodeGen/X86/hle-atomic32.ll
@@ -0,0 +1,188 @@
+; RUN: llc < %s -O0 -march=x86-64 -mcpu=corei7 -mattr=+hle | FileCheck %s --check-prefix X64HLE
+; RUN: llc < %s -O0 -march=x86-64 -mcpu=corei7 -mattr=-hle | FileCheck %s --check-prefix X64NOHLE
+
+@sc32 = external global i32
+
+; 32-bit
+
+define void @atomic_fetch_add32() nounwind {
+; X64HLE: atomic_fetch_add32
+; X64NOHLE: atomic_fetch_add32
+ %t0 = atomicrmw add i32* @sc32, i32 1 acquire, !targetflags !0
+; X64HLE: lock
+; X64HLE-NEXT: xacquire
+; X64HLE: incl
+; X64NOHLE: lock
+; X64NOHLE-NOT: xacquire
+; X64NOHLE: incl
+ %t1 = atomicrmw add i32* @sc32, i32 5 acquire, !targetflags !1
+; X64HLE: lock
+; X64HLE-NEXT: xrelease
+; X64HLE: xaddl
+; X64NOHLE: lock
+; X64NOHLE-NOT: xrelease
+; X64NOHLE: xaddl
+ %t2 = atomicrmw add i32* @sc32, i32 %t1 acquire, !targetflags !0
+; X64HLE: lock
+; X64HLE-NEXT: xacquire
+; X64HLE: addl
+; X64NOHLE: lock
+; X64NOHLE-NOT: xacquire
+; X64NOHLE: addl
+ ret void
+; X64HLE: ret
+; X64NOHLE: ret
+}
+
+define void @atomic_fetch_sub32() nounwind {
+; X64HLE: atomic_fetch_sub32
+; X64NOHLE: atomic_fetch_sub32
+ %t0 = atomicrmw sub i32* @sc32, i32 1 acquire, !targetflags !1
+; X64HLE: lock
+; X64HLE-NEXT: xrelease
+; X64HLE: decl
+; X64NOHLE: lock
+; X64NOHLE-NOT: xrelease
+; X64NOHLE: decl
+ %t1 = atomicrmw sub i32* @sc32, i32 5 acquire, !targetflags !0
+; X64HLE: lock
+; X64HLE-NEXT: xacquire
+; X64HLE: xaddl
+; X64NOHLE: lock
+; X64NOHLE-NOT: xacquire
+; X64NOHLE: xaddl
+ %t2 = atomicrmw sub i32* @sc32, i32 %t1 acquire, !targetflags !1
+; X64HLE: lock
+; X64HLE-NEXT: xrelease
+; X64HLE: subl
+; X64NOHLE: lock
+; X64NOHLE-NOT: xrelease
+; X64NOHLE: subl
+ ret void
+; X64HLE: ret
+; X64NOHLE: ret
+}
+
+define void @atomic_fetch_logic32() nounwind {
+; X64HLE: atomic_fetch_logic32
+; X64NOHLE: atomic_fetch_logic32
+ %t0 = atomicrmw and i32* @sc32, i32 5 acquire, !targetflags !0
+; X64HLE: lock
+; X64HLE-NEXT: xacquire
+; X64HLE: andl
+; X64NOHLE: lock
+; X64NOHLE-NOT: xacquire
+; X64NOHLE: andl
+ %t1 = atomicrmw or i32* @sc32, i32 5 acquire, !targetflags !1
+; X64HLE: lock
+; X64HLE-NEXT: xrelease
+; X64HLE: orl
+; X64NOHLE: lock
+; X64NOHLE-NOT: xrelease
+; X64NOHLE: orl
+ %t2 = atomicrmw xor i32* @sc32, i32 5 acquire, !targetflags !0
+; X64HLE: lock
+; X64HLE-NEXT: xacquire
+; X64HLE: xorl
+; X64NOHLE: lock
+; X64NOHLE-NOT: xacquire
+; X64NOHLE: xorl
+ %t3 = atomicrmw nand i32* @sc32, i32 5 acquire, !targetflags !1
+; X64HLE: andl
+; X64HLE: notl
+; X64HLE: lock
+; X64HLE-NEXT: xrelease
+; X64HLE: cmpxchgl
+; X64NOHLE: andl
+; X64NOHLE: notl
+; X64NOHLE: lock
+; X64NOHLE-NOT: xrelease
+; X64NOHLE: cmpxchgl
+ ret void
+; X64HLE: ret
+; X64NOHLE: ret
+}
+
+define void @atomic_fetch_minmax32() nounwind {
+; X64HLE: atomic_fetch_minmax32
+; X64NOHLE: atomic_fetch_minmax32
+ %t0 = atomicrmw max i32* @sc32, i32 5 acquire, !targetflags !0
+; X64HLE: cmpl
+; X64HLE: cmov
+; X64HLE: lock
+; X64HLE-NEXT: xacquire
+; X64HLE: cmpxchgl
+; X64NOHLE: cmpl
+; X64NOHLE: cmov
+; X64NOHLE: lock
+; X64NOHLE-NOT: xacquire
+; X64NOHLE: cmpxchgl
+ %t1 = atomicrmw min i32* @sc32, i32 5 acquire, !targetflags !1
+; X64HLE: cmpl
+; X64HLE: cmov
+; X64HLE: lock
+; X64HLE-NEXT: xrelease
+; X64HLE: cmpxchgl
+; X64NOHLE: cmpl
+; X64NOHLE: cmov
+; X64NOHLE: lock
+; X64NOHLE-NOT: xrelease
+; X64NOHLE: cmpxchgl
+ %t2 = atomicrmw umax i32* @sc32, i32 5 acquire, !targetflags !0
+; X64HLE: cmpl
+; X64HLE: cmov
+; X64HLE: lock
+; X64HLE-NEXT: xacquire
+; X64HLE: cmpxchgl
+; X64NOHLE: cmpl
+; X64NOHLE: cmov
+; X64NOHLE: lock
+; X64NOHLE-NOT: xacquire
+; X64NOHLE: cmpxchgl
+ %t3 = atomicrmw umin i32* @sc32, i32 5 acquire, !targetflags !1
+; X64HLE: cmpl
+; X64HLE: cmov
+; X64HLE: lock
+; X64HLE-NEXT: xrelease
+; X64HLE: cmpxchgl
+; X64NOHLE: cmpl
+; X64NOHLE: cmov
+; X64NOHLE: lock
+; X64NOHLE-NOT: xrelease
+; X64NOHLE: cmpxchgl
+ ret void
+; X64HLE: ret
+; X64NOHLE: ret
+}
+
+define void @atomic_fetch_misc32() nounwind {
+; X64HLE: atomic_fetch_misc32
+; X64NOHLE: atomic_fetch_misc32
+ %t0 = cmpxchg i32* @sc32, i32 0, i32 1 acquire, !targetflags !0
+; X64HLE: lock
+; X64HLE-NEXT: xacquire
+; X64HLE: cmpxchgl
+; X64NOHLE: lock
+; X64NOHLE-NOT: xacquire
+; X64NOHLE: cmpxchgl
+ store atomic i32 0, i32* @sc32 release, align 4, !targetflags !1
+; X64HLE-NOT: lock
+; X64HLE: xrelease
+; X64HLE: movl
+; X64NOHLE-NOT: lock
+; X64NOHLE-NOT: xrelease
+; X64NOHLE: movl
+ %t1 = atomicrmw xchg i32* @sc32, i32 %t0 acquire, !targetflags !0
+; X64HLE-NOT: lock
+; X64HLE: xacquire
+; X64HLE: xchgl
+; X64NOHLE-NOT: lock
+; X64NOHLE-NOT: xacquire
+; X64NOHLE: xchgl
+ ret void
+; X64HLE: ret
+; X64NOHLE: ret
+}
+
+!0 = metadata !{i32 1}
+!1 = metadata !{i32 2}
diff --git a/test/CodeGen/X86/hle-atomic64.ll b/test/CodeGen/X86/hle-atomic64.ll
new file mode 100644
index 0000000..f155aed
--- /dev/null
+++ b/test/CodeGen/X86/hle-atomic64.ll
@@ -0,0 +1,188 @@
+; RUN: llc < %s -O0 -march=x86-64 -mcpu=corei7 -mattr=+hle | FileCheck %s --check-prefix X64HLE
+; RUN: llc < %s -O0 -march=x86-64 -mcpu=corei7 -mattr=-hle | FileCheck %s --check-prefix X64NOHLE
+
+@sc64 = external global i64
+
+; 64-bit
+
+define void @atomic_fetch_add64() nounwind {
+; X64HLE: atomic_fetch_add64
+; X64NOHLE: atomic_fetch_add64
+ %t0 = atomicrmw add i64* @sc64, i64 1 acquire, !targetflags !0
+; X64HLE: lock
+; X64HLE-NEXT: xacquire
+; X64HLE: incq
+; X64NOHLE: lock
+; X64NOHLE-NOT: xacquire
+; X64NOHLE: incq
+ %t1 = atomicrmw add i64* @sc64, i64 5 acquire, !targetflags !1
+; X64HLE: lock
+; X64HLE-NEXT: xrelease
+; X64HLE: xaddq
+; X64NOHLE: lock
+; X64NOHLE-NOT: xrelease
+; X64NOHLE: xaddq
+ %t2 = atomicrmw add i64* @sc64, i64 %t1 acquire, !targetflags !0
+; X64HLE: lock
+; X64HLE-NEXT: xacquire
+; X64HLE: addq
+; X64NOHLE: lock
+; X64NOHLE-NOT: xacquire
+; X64NOHLE: addq
+ ret void
+; X64HLE: ret
+; X64NOHLE: ret
+}
+
+define void @atomic_fetch_sub64() nounwind {
+; X64HLE: atomic_fetch_sub64
+; X64NOHLE: atomic_fetch_sub64
+ %t3 = atomicrmw sub i64* @sc64, i64 1 acquire, !targetflags !1
+; X64HLE: lock
+; X64HLE-NEXT: xrelease
+; X64HLE: decq
+; X64NOHLE: lock
+; X64NOHLE-NOT: xrelease
+; X64NOHLE: decq
+ %t4 = atomicrmw sub i64* @sc64, i64 5 acquire, !targetflags !0
+; X64HLE: lock
+; X64HLE-NEXT: xacquire
+; X64HLE: xaddq
+; X64NOHLE: lock
+; X64NOHLE-NOT: xacquire
+; X64NOHLE: xaddq
+ %t5 = atomicrmw sub i64* @sc64, i64 %t4 acquire, !targetflags !1
+; X64HLE: lock
+; X64HLE-NEXT: xrelease
+; X64HLE: subq
+; X64NOHLE: lock
+; X64NOHLE-NOT: xrelease
+; X64NOHLE: subq
+ ret void
+; X64HLE: ret
+; X64NOHLE: ret
+}
+
+define void @atomic_fetch_logic64() nounwind {
+; X64HLE: atomic_fetch_logic64
+; X64NOHLE: atomic_fetch_logic64
+ %t6 = atomicrmw and i64* @sc64, i64 5 acquire, !targetflags !0
+; X64HLE: lock
+; X64HLE-NEXT: xacquire
+; X64HLE: andq
+; X64NOHLE: lock
+; X64NOHLE-NOT: xacquire
+; X64NOHLE: andq
+ %t7 = atomicrmw or i64* @sc64, i64 5 acquire, !targetflags !1
+; X64HLE: lock
+; X64HLE-NEXT: xrelease
+; X64HLE: orq
+; X64NOHLE: lock
+; X64NOHLE-NOT: xrelease
+; X64NOHLE: orq
+ %t8 = atomicrmw xor i64* @sc64, i64 5 acquire, !targetflags !0
+; X64HLE: lock
+; X64HLE-NEXT: xacquire
+; X64HLE: xorq
+; X64NOHLE: lock
+; X64NOHLE-NOT: xacquire
+; X64NOHLE: xorq
+ %t9 = atomicrmw nand i64* @sc64, i64 5 acquire, !targetflags !1
+; X64HLE: andq
+; X64HLE: notq
+; X64HLE: lock
+; X64HLE-NEXT: xrelease
+; X64HLE: cmpxchgq
+; X64NOHLE: andq
+; X64NOHLE: notq
+; X64NOHLE: lock
+; X64NOHLE-NOT: xrelease
+; X64NOHLE: cmpxchgq
+ ret void
+; X64HLE: ret
+; X64NOHLE: ret
+}
+
+define void @atomic_fetch_minmax64() nounwind {
+; X64HLE: atomic_fetch_minmax64
+; X64NOHLE: atomic_fetch_minmax64
+ %t0 = atomicrmw max i64* @sc64, i64 5 acquire, !targetflags !0
+; X64HLE: cmpq
+; X64HLE: cmov
+; X64HLE: lock
+; X64HLE-NEXT: xacquire
+; X64HLE: cmpxchgq
+; X64NOHLE: cmpq
+; X64NOHLE: cmov
+; X64NOHLE: lock
+; X64NOHLE-NOT: xacquire
+; X64NOHLE: cmpxchgq
+ %t1 = atomicrmw min i64* @sc64, i64 5 acquire, !targetflags !1
+; X64HLE: cmpq
+; X64HLE: cmov
+; X64HLE: lock
+; X64HLE-NEXT: xrelease
+; X64HLE: cmpxchgq
+; X64NOHLE: cmpq
+; X64NOHLE: cmov
+; X64NOHLE: lock
+; X64NOHLE-NOT: xrelease
+; X64NOHLE: cmpxchgq
+ %t2 = atomicrmw umax i64* @sc64, i64 5 acquire, !targetflags !0
+; X64HLE: cmpq
+; X64HLE: cmov
+; X64HLE: lock
+; X64HLE-NEXT: xacquire
+; X64HLE: cmpxchgq
+; X64NOHLE: cmpq
+; X64NOHLE: cmov
+; X64NOHLE: lock
+; X64NOHLE-NOT: xacquire
+; X64NOHLE: cmpxchgq
+ %t3 = atomicrmw umin i64* @sc64, i64 5 acquire, !targetflags !1
+; X64HLE: cmpq
+; X64HLE: cmov
+; X64HLE: lock
+; X64HLE-NEXT: xrelease
+; X64HLE: cmpxchgq
+; X64NOHLE: cmpq
+; X64NOHLE: cmov
+; X64NOHLE: lock
+; X64NOHLE-NOT: xrelease
+; X64NOHLE: cmpxchgq
+ ret void
+; X64HLE: ret
+; X64NOHLE: ret
+}
+
+define void @atomic_fetch_misc64() nounwind {
+; X64HLE: atomic_fetch_misc64
+; X64NOHLE: atomic_fetch_misc64
+ %t4 = cmpxchg i64* @sc64, i64 0, i64 1 acquire, !targetflags !0
+; X64HLE: lock
+; X64HLE-NEXT: xacquire
+; X64HLE: cmpxchgq
+; X64NOHLE: lock
+; X64NOHLE-NOT: xacquire
+; X64NOHLE: cmpxchgq
+ store atomic i64 0, i64* @sc64 release, align 8, !targetflags !1
+; X64HLE-NOT: lock
+; X64HLE: xrelease
+; X64HLE: movq
+; X64NOHLE-NOT: lock
+; X64NOHLE-NOT: xrelease
+; X64NOHLE: movq
+ %t5 = atomicrmw xchg i64* @sc64, i64 %t4 acquire, !targetflags !0
+; X64HLE-NOT: lock
+; X64HLE: xacquire
+; X64HLE: xchgq
+; X64NOHLE-NOT: lock
+; X64NOHLE-NOT: xacquire
+; X64NOHLE: xchgq
+ ret void
+; X64HLE: ret
+; X64NOHLE: ret
+}
+
+!0 = metadata !{i32 1}
+!1 = metadata !{i32 2}
diff --git a/test/CodeGen/X86/hle-atomic8.ll b/test/CodeGen/X86/hle-atomic8.ll
new file mode 100644
index 0000000..6631a8e
--- /dev/null
+++ b/test/CodeGen/X86/hle-atomic8.ll
@@ -0,0 +1,188 @@
+; RUN: llc < %s -O0 -march=x86-64 -mcpu=corei7 -mattr=+hle | FileCheck %s --check-prefix X64HLE
+; RUN: llc < %s -O0 -march=x86-64 -mcpu=corei7 -mattr=-hle | FileCheck %s --check-prefix X64NOHLE
+
+@sc8 = external global i8
+
+; 8-bit
+
+define void @atomic_fetch_add8() nounwind {
+; X64HLE: atomic_fetch_add8
+; X64NOHLE: atomic_fetch_add8
+ %t0 = atomicrmw add i8* @sc8, i8 1 acquire, !targetflags !0
+; X64HLE: lock
+; X64HLE-NEXT: xacquire
+; X64HLE: incb
+; X64NOHLE: lock
+; X64NOHLE-NOT: xacquire
+; X64NOHLE: incb
+ %t1 = atomicrmw add i8* @sc8, i8 5 acquire, !targetflags !1
+; X64HLE: lock
+; X64HLE-NEXT: xrelease
+; X64HLE: xaddb
+; X64NOHLE: lock
+; X64NOHLE-NOT: xrelease
+; X64NOHLE: xaddb
+ %t2 = atomicrmw add i8* @sc8, i8 %t1 acquire, !targetflags !0
+; X64HLE: lock
+; X64HLE-NEXT: xacquire
+; X64HLE: addb
+; X64NOHLE: lock
+; X64NOHLE-NOT: xacquire
+; X64NOHLE: addb
+ ret void
+; X64HLE: ret
+; X64NOHLE: ret
+}
+
+define void @atomic_fetch_sub8() nounwind {
+; X64HLE: atomic_fetch_sub8
+; X64NOHLE: atomic_fetch_sub8
+ %t3 = atomicrmw sub i8* @sc8, i8 1 acquire, !targetflags !1
+; X64HLE: lock
+; X64HLE-NEXT: xrelease
+; X64HLE: decb
+; X64NOHLE: lock
+; X64NOHLE-NOT: xrelease
+; X64NOHLE: decb
+ %t4 = atomicrmw sub i8* @sc8, i8 5 acquire, !targetflags !0
+; X64HLE: lock
+; X64HLE-NEXT: xacquire
+; X64HLE: xaddb
+; X64NOHLE: lock
+; X64NOHLE-NOT: xacquire
+; X64NOHLE: xaddb
+ %t5 = atomicrmw sub i8* @sc8, i8 %t4 acquire, !targetflags !1
+; X64HLE: lock
+; X64HLE-NEXT: xrelease
+; X64HLE: subb
+; X64NOHLE: lock
+; X64NOHLE-NOT: xrelease
+; X64NOHLE: subb
+ ret void
+; X64HLE: ret
+; X64NOHLE: ret
+}
+
+define void @atomic_fetch_logic8() nounwind {
+; X64HLE: atomic_fetch_logic8
+; X64NOHLE: atomic_fetch_logic8
+ %t6 = atomicrmw and i8* @sc8, i8 5 acquire, !targetflags !0
+; X64HLE: lock
+; X64HLE-NEXT: xacquire
+; X64HLE: andb
+; X64NOHLE: lock
+; X64NOHLE-NOT: xacquire
+; X64NOHLE: andb
+ %t7 = atomicrmw or i8* @sc8, i8 5 acquire, !targetflags !1
+; X64HLE: lock
+; X64HLE-NEXT: xrelease
+; X64HLE: orb
+; X64NOHLE: lock
+; X64NOHLE-NOT: xrelease
+; X64NOHLE: orb
+ %t8 = atomicrmw xor i8* @sc8, i8 5 acquire, !targetflags !0
+; X64HLE: lock
+; X64HLE-NEXT: xacquire
+; X64HLE: xorb
+; X64NOHLE: lock
+; X64NOHLE-NOT: xacquire
+; X64NOHLE: xorb
+ %t9 = atomicrmw nand i8* @sc8, i8 5 acquire, !targetflags !1
+; X64HLE: andb
+; X64HLE: notb
+; X64HLE: lock
+; X64HLE-NEXT: xrelease
+; X64HLE: cmpxchgb
+; X64NOHLE: andb
+; X64NOHLE: notb
+; X64NOHLE: lock
+; X64NOHLE-NOT: xrelease
+; X64NOHLE: cmpxchgb
+ ret void
+; X64HLE: ret
+; X64NOHLE: ret
+}
+
+define void @atomic_fetch_minmax8() nounwind {
+; X64HLE: atomic_fetch_minmax8
+; X64NOHLE: atomic_fetch_minmax8
+ %t0 = atomicrmw max i8* @sc8, i8 5 acquire, !targetflags !0
+; X64HLE: cmpb
+; X64HLE: cmov
+; X64HLE: lock
+; X64HLE-NEXT: xacquire
+; X64HLE: cmpxchgb
+; X64NOHLE: cmpb
+; X64NOHLE: cmov
+; X64NOHLE: lock
+; X64NOHLE-NOT: xacquire
+; X64NOHLE: cmpxchgb
+ %t1 = atomicrmw min i8* @sc8, i8 5 acquire, !targetflags !1
+; X64HLE: cmpb
+; X64HLE: cmov
+; X64HLE: lock
+; X64HLE-NEXT: xrelease
+; X64HLE: cmpxchgb
+; X64NOHLE: cmpb
+; X64NOHLE: cmov
+; X64NOHLE: lock
+; X64NOHLE-NOT: xrelease
+; X64NOHLE: cmpxchgb
+ %t2 = atomicrmw umax i8* @sc8, i8 5 acquire, !targetflags !0
+; X64HLE: cmpb
+; X64HLE: cmov
+; X64HLE: lock
+; X64HLE-NEXT: xacquire
+; X64HLE: cmpxchgb
+; X64NOHLE: cmpb
+; X64NOHLE: cmov
+; X64NOHLE: lock
+; X64NOHLE-NOT: xacquire
+; X64NOHLE: cmpxchgb
+ %t3 = atomicrmw umin i8* @sc8, i8 5 acquire, !targetflags !1
+; X64HLE: cmpb
+; X64HLE: cmov
+; X64HLE: lock
+; X64HLE-NEXT: xrelease
+; X64HLE: cmpxchgb
+; X64NOHLE: cmpb
+; X64NOHLE: cmov
+; X64NOHLE: lock
+; X64NOHLE-NOT: xrelease
+; X64NOHLE: cmpxchgb
+ ret void
+; X64HLE: ret
+; X64NOHLE: ret
+}
+
+define void @atomic_fetch_misc8() nounwind {
+; X64HLE: atomic_fetch_misc8
+; X64NOHLE: atomic_fetch_misc8
+ %t4 = cmpxchg i8* @sc8, i8 0, i8 1 acquire, !targetflags !0
+; X64HLE: lock
+; X64HLE-NEXT: xacquire
+; X64HLE: cmpxchgb
+; X64NOHLE: lock
+; X64NOHLE-NOT: xacquire
+; X64NOHLE: cmpxchgb
+ store atomic i8 0, i8* @sc8 release, align 1, !targetflags !1
+; X64HLE-NOT: lock
+; X64HLE: xrelease
+; X64HLE: movb
+; X64NOHLE-NOT: lock
+; X64NOHLE-NOT: xrelease
+; X64NOHLE: movb
+ %t5 = atomicrmw xchg i8* @sc8, i8 %t4 acquire, !targetflags !0
+; X64HLE-NOT: lock
+; X64HLE: xacquire
+; X64HLE: xchgb
+; X64NOHLE-NOT: lock
+; X64NOHLE-NOT: xacquire
+; X64NOHLE: xchgb
+ ret void
+; X64HLE: ret
+; X64NOHLE: ret
+}
+
+!0 = metadata !{i32 1}
+!1 = metadata !{i32 2}
--
1.7.9.5
["0001-Add-target-flags-support-for-atomic-ops.patch" (0001-Add-target-flags-support-for-atomic-ops.patch)]
From 1ca5090753a8b82b9a9da33a176b89b1145c904b Mon Sep 17 00:00:00 2001
From: Michael Liao <michael.hliao@gmail.com>
Date: Sun, 1 Jul 2012 00:22:15 -0700
Subject: [PATCH 1/2] Add target flags support for atomic ops
---
lib/CodeGen/CGBuiltin.cpp | 40 ++++---
lib/CodeGen/CGExpr.cpp | 44 +++++---
test/CodeGen/atomic-ops-targetflags.c | 193 +++++++++++++++++++++++++++++++++
3 files changed, 251 insertions(+), 26 deletions(-)
create mode 100644 test/CodeGen/atomic-ops-targetflags.c
diff --git a/lib/CodeGen/CGBuiltin.cpp b/lib/CodeGen/CGBuiltin.cpp
index 9e09131..2633e28 100644
--- a/lib/CodeGen/CGBuiltin.cpp
+++ b/lib/CodeGen/CGBuiltin.cpp
@@ -1075,7 +1075,9 @@ RValue CodeGenFunction::EmitBuiltinExpr(const FunctionDecl *FD,
Value *NewVal = Builder.getInt8(1);
Value *Order = EmitScalarExpr(E->getArg(1));
if (isa<llvm::ConstantInt>(Order)) {
- int ord = cast<llvm::ConstantInt>(Order)->getZExtValue();
+ unsigned ord = cast<llvm::ConstantInt>(Order)->getZExtValue();
+ unsigned flags = ord >> 16;
+ ord = ord & 0xFFFF; // Mask off target flags.
AtomicRMWInst *Result = 0;
switch (ord) {
case 0: // memory_order_relaxed
@@ -1107,6 +1109,11 @@ RValue CodeGenFunction::EmitBuiltinExpr(const FunctionDecl *FD,
       break;
}
Result->setVolatile(Volatile);
+ if (flags) {
+ llvm::MDNode *TargetFlags = llvm::MDNode::get(getLLVMContext(),
+ Builder.getInt32(flags));
+ Result->setMetadata("targetflags", TargetFlags);
+ }
return RValue::get(Builder.CreateIsNotNull(Result, "tobool"));
}
@@ -1124,7 +1131,7 @@ RValue CodeGenFunction::EmitBuiltinExpr(const FunctionDecl *FD,
llvm::AcquireRelease, llvm::SequentiallyConsistent
};
- Order = Builder.CreateIntCast(Order, Builder.getInt32Ty(), false);
+ Order = Builder.CreateIntCast(Order, Builder.getInt16Ty(), false);
llvm::SwitchInst *SI = Builder.CreateSwitch(Order, BBs[0]);
Builder.SetInsertPoint(ContBB);
@@ -1139,12 +1146,12 @@ RValue CodeGenFunction::EmitBuiltinExpr(const FunctionDecl *FD,
     Builder.CreateBr(ContBB);
}
- SI->addCase(Builder.getInt32(0), BBs[0]);
- SI->addCase(Builder.getInt32(1), BBs[1]);
- SI->addCase(Builder.getInt32(2), BBs[1]);
- SI->addCase(Builder.getInt32(3), BBs[2]);
- SI->addCase(Builder.getInt32(4), BBs[3]);
- SI->addCase(Builder.getInt32(5), BBs[4]);
+ SI->addCase(Builder.getInt16(0), BBs[0]);
+ SI->addCase(Builder.getInt16(1), BBs[1]);
+ SI->addCase(Builder.getInt16(2), BBs[1]);
+ SI->addCase(Builder.getInt16(3), BBs[2]);
+ SI->addCase(Builder.getInt16(4), BBs[3]);
+ SI->addCase(Builder.getInt16(5), BBs[4]);
Builder.SetInsertPoint(ContBB);
return RValue::get(Builder.CreateIsNotNull(Result, "tobool"));
@@ -1161,7 +1168,9 @@ RValue CodeGenFunction::EmitBuiltinExpr(const FunctionDecl *FD,
Value *NewVal = Builder.getInt8(0);
Value *Order = EmitScalarExpr(E->getArg(1));
if (isa<llvm::ConstantInt>(Order)) {
- int ord = cast<llvm::ConstantInt>(Order)->getZExtValue();
+ unsigned ord = cast<llvm::ConstantInt>(Order)->getZExtValue();
+ unsigned flags = ord >> 16;
+ ord = ord & 0xFFFF; // Mask off target flags.
StoreInst *Store = Builder.CreateStore(NewVal, Ptr, Volatile);
Store->setAlignment(1);
switch (ord) {
@@ -1176,6 +1185,11 @@ RValue CodeGenFunction::EmitBuiltinExpr(const FunctionDecl *FD,
       Store->setOrdering(llvm::SequentiallyConsistent);
break;
}
+ if (flags) {
+ llvm::MDNode *TargetFlags = llvm::MDNode::get(getLLVMContext(),
+ Builder.getInt32(flags));
+ Store->setMetadata("targetflags", TargetFlags);
+ }
return RValue::get(0);
}
@@ -1190,7 +1204,7 @@ RValue CodeGenFunction::EmitBuiltinExpr(const FunctionDecl *FD,
llvm::Monotonic, llvm::Release, llvm::SequentiallyConsistent
};
- Order = Builder.CreateIntCast(Order, Builder.getInt32Ty(), false);
+ Order = Builder.CreateIntCast(Order, Builder.getInt16Ty(), false);
llvm::SwitchInst *SI = Builder.CreateSwitch(Order, BBs[0]);
for (unsigned i = 0; i < 3; ++i) {
@@ -1201,9 +1215,9 @@ RValue CodeGenFunction::EmitBuiltinExpr(const FunctionDecl *FD,
Builder.CreateBr(ContBB);
}
- SI->addCase(Builder.getInt32(0), BBs[0]);
- SI->addCase(Builder.getInt32(3), BBs[1]);
- SI->addCase(Builder.getInt32(5), BBs[2]);
+ SI->addCase(Builder.getInt16(0), BBs[0]);
+ SI->addCase(Builder.getInt16(3), BBs[1]);
+ SI->addCase(Builder.getInt16(5), BBs[2]);
Builder.SetInsertPoint(ContBB);
return RValue::get(0);
diff --git a/lib/CodeGen/CGExpr.cpp b/lib/CodeGen/CGExpr.cpp
index ba400b8..e9853bc 100644
--- a/lib/CodeGen/CGExpr.cpp
+++ b/lib/CodeGen/CGExpr.cpp
@@ -3045,7 +3045,8 @@ EmitPointerToDataMemberBinaryExpr(const BinaryOperator *E) {
static void
EmitAtomicOp(CodeGenFunction &CGF, AtomicExpr *E, llvm::Value *Dest,
llvm::Value *Ptr, llvm::Value *Val1, llvm::Value *Val2,
- uint64_t Size, unsigned Align, llvm::AtomicOrdering Order) {
+ uint64_t Size, unsigned Align, llvm::AtomicOrdering Order,
+ llvm::MDNode *TargetFlags = 0) {
llvm::AtomicRMWInst::BinOp Op = llvm::AtomicRMWInst::Add;
llvm::Instruction::BinaryOps PostOp = (llvm::Instruction::BinaryOps)0;
@@ -3066,6 +3067,8 @@ EmitAtomicOp(CodeGenFunction &CGF, AtomicExpr *E, llvm::Value *Dest,
     llvm::AtomicCmpXchgInst *CXI =
CGF.Builder.CreateAtomicCmpXchg(Ptr, LoadVal1, LoadVal2, Order);
CXI->setVolatile(E->isVolatile());
+ if (TargetFlags)
+ CXI->setMetadata("targetflags", TargetFlags);
llvm::StoreInst *StoreVal1 = CGF.Builder.CreateStore(CXI, Val1);
StoreVal1->setAlignment(Align);
llvm::Value *Cmp = CGF.Builder.CreateICmpEQ(CXI, LoadVal1);
@@ -3080,6 +3083,8 @@ EmitAtomicOp(CodeGenFunction &CGF, AtomicExpr *E, llvm::Value *Dest,
Load->setAtomic(Order);
Load->setAlignment(Size);
Load->setVolatile(E->isVolatile());
+ if (TargetFlags)
+ Load->setMetadata("targetflags", TargetFlags);
llvm::StoreInst *StoreDest = CGF.Builder.CreateStore(Load, Dest);
StoreDest->setAlignment(Align);
return;
@@ -3095,6 +3100,8 @@ EmitAtomicOp(CodeGenFunction &CGF, AtomicExpr *E, llvm::Value *Dest,
Store->setAtomic(Order);
Store->setAlignment(Size);
Store->setVolatile(E->isVolatile());
+ if (TargetFlags)
+ Store->setMetadata("targetflags", TargetFlags);
return;
}
@@ -3157,6 +3164,8 @@ EmitAtomicOp(CodeGenFunction &CGF, AtomicExpr *E, llvm::Value *Dest,
llvm::AtomicRMWInst *RMWI =
CGF.Builder.CreateAtomicRMW(Op, Ptr, LoadVal1, Order);
RMWI->setVolatile(E->isVolatile());
+ if (TargetFlags)
+ RMWI->setMetadata("targetflags", TargetFlags);
// For __atomic_*_fetch operations, perform the operation again to
// determine the value which was written.
@@ -3412,34 +3421,40 @@ RValue CodeGenFunction::EmitAtomicExpr(AtomicExpr *E, llvm::Value *Dest) {
if (Dest && !E->isCmpXChg()) Dest = Builder.CreateBitCast(Dest, IPtrTy);
if (isa<llvm::ConstantInt>(Order)) {
- int ord = cast<llvm::ConstantInt>(Order)->getZExtValue();
+ unsigned ord = cast<llvm::ConstantInt>(Order)->getZExtValue();
+ unsigned flags = ord >> 16;
+ ord = ord & 0xFFFF; // Mask off target flags.
+ llvm::MDNode *TargetFlags = 0;
+ if (flags)
+ TargetFlags = llvm::MDNode::get(getLLVMContext(),
+ Builder.getInt32(flags));
switch (ord) {
case 0: // memory_order_relaxed
EmitAtomicOp(*this, E, Dest, Ptr, Val1, Val2, Size, Align,
- llvm::Monotonic);
+ llvm::Monotonic, TargetFlags);
break;
case 1: // memory_order_consume
case 2: // memory_order_acquire
if (IsStore)
break; // Avoid crashing on code with undefined behavior
EmitAtomicOp(*this, E, Dest, Ptr, Val1, Val2, Size, Align,
- llvm::Acquire);
+ llvm::Acquire, TargetFlags);
break;
case 3: // memory_order_release
if (IsLoad)
break; // Avoid crashing on code with undefined behavior
EmitAtomicOp(*this, E, Dest, Ptr, Val1, Val2, Size, Align,
- llvm::Release);
+ llvm::Release, TargetFlags);
break;
case 4: // memory_order_acq_rel
if (IsLoad || IsStore)
break; // Avoid crashing on code with undefined behavior
EmitAtomicOp(*this, E, Dest, Ptr, Val1, Val2, Size, Align,
- llvm::AcquireRelease);
+ llvm::AcquireRelease, TargetFlags);
break;
case 5: // memory_order_seq_cst
EmitAtomicOp(*this, E, Dest, Ptr, Val1, Val2, Size, Align,
- llvm::SequentiallyConsistent);
+ llvm::SequentiallyConsistent, TargetFlags);
break;
default: // invalid order
// We should not ever get here normally, but it's hard to
@@ -3470,7 +3485,10 @@ RValue CodeGenFunction::EmitAtomicExpr(AtomicExpr *E, llvm::Value *Dest) {
// MonotonicBB is arbitrarily chosen as the default case; in practice, this
// doesn't matter unless someone is crazy enough to use something that
// doesn't fold to a constant for the ordering.
- Order = Builder.CreateIntCast(Order, Builder.getInt32Ty(), false);
+ //
+ // Cast to i16 to mask off the target flags. So far, if order cannot be
+ // folded into a constant, target flags are ignored.
+ Order = Builder.CreateIntCast(Order, Builder.getInt16Ty(), false);
llvm::SwitchInst *SI = Builder.CreateSwitch(Order, MonotonicBB);
// Emit all the different atomics
@@ -3483,28 +3501,28 @@ RValue CodeGenFunction::EmitAtomicExpr(AtomicExpr *E, llvm::Value *Dest) {
EmitAtomicOp(*this, E, Dest, Ptr, Val1, Val2, Size, Align,
llvm::Acquire);
Builder.CreateBr(ContBB);
- SI->addCase(Builder.getInt32(1), AcquireBB);
- SI->addCase(Builder.getInt32(2), AcquireBB);
+ SI->addCase(Builder.getInt16(1), AcquireBB);
+ SI->addCase(Builder.getInt16(2), AcquireBB);
}
if (!IsLoad) {
Builder.SetInsertPoint(ReleaseBB);
EmitAtomicOp(*this, E, Dest, Ptr, Val1, Val2, Size, Align,
llvm::Release);
Builder.CreateBr(ContBB);
- SI->addCase(Builder.getInt32(3), ReleaseBB);
+ SI->addCase(Builder.getInt16(3), ReleaseBB);
}
if (!IsLoad && !IsStore) {
Builder.SetInsertPoint(AcqRelBB);
EmitAtomicOp(*this, E, Dest, Ptr, Val1, Val2, Size, Align,
llvm::AcquireRelease);
Builder.CreateBr(ContBB);
- SI->addCase(Builder.getInt32(4), AcqRelBB);
+ SI->addCase(Builder.getInt16(4), AcqRelBB);
}
Builder.SetInsertPoint(SeqCstBB);
EmitAtomicOp(*this, E, Dest, Ptr, Val1, Val2, Size, Align,
llvm::SequentiallyConsistent);
Builder.CreateBr(ContBB);
- SI->addCase(Builder.getInt32(5), SeqCstBB);
+ SI->addCase(Builder.getInt16(5), SeqCstBB);
// Cleanup and return
Builder.SetInsertPoint(ContBB);
diff --git a/test/CodeGen/atomic-ops-targetflags.c b/test/CodeGen/atomic-ops-targetflags.c
new file mode 100644
index 0000000..82f211f
--- /dev/null
+++ b/test/CodeGen/atomic-ops-targetflags.c
@@ -0,0 +1,193 @@
+// RUN: %clang_cc1 %s -emit-llvm -o - -triple=i686-apple-darwin9 | FileCheck %s
+
+// Also test serialization of atomic operations here, to avoid duplicating the
+// test.
+// RUN: %clang_cc1 %s -emit-pch -o %t -triple=i686-apple-darwin9
+// RUN: %clang_cc1 %s -include-pch %t -triple=i686-apple-darwin9 -emit-llvm -o - | FileCheck %s
+#ifndef ALREADY_INCLUDED
+#define ALREADY_INCLUDED
+
+// Basic IRGen tests for __c11_atomic_* and GNU __atomic_*
+
+typedef enum memory_order {
+ memory_order_relaxed, memory_order_consume, memory_order_acquire,
+ memory_order_release, memory_order_acq_rel, memory_order_seq_cst
+} memory_order;
+
+#define TFLAG (1 << 16)
+
+int fi1(_Atomic(int) *i) {
+ // CHECK: @fi1
+ // CHECK: load atomic i32* {{.*}} seq_cst, {{.*}}, !targetflags !{{[0-9]+}}
+ return __c11_atomic_load(i, memory_order_seq_cst | TFLAG);
+}
+
+int fi1a(int *i) {
+ // CHECK: @fi1a
+ // CHECK: load atomic i32* {{.*}} seq_cst, {{.*}}, !targetflags !{{[0-9]+}}
+ int v;
+ __atomic_load(i, &v, memory_order_seq_cst | TFLAG);
+ return v;
+}
+
+int fi1b(int *i) {
+ // CHECK: @fi1b
+ // CHECK: load atomic i32* {{.*}} seq_cst, {{.*}}, !targetflags !{{[0-9]+}}
+ return __atomic_load_n(i, memory_order_seq_cst | TFLAG);
+}
+
+void fi2(_Atomic(int) *i) {
+ // CHECK: @fi2
+ // CHECK: store atomic i32 {{.*}} seq_cst, {{.*}}, !targetflags !{{[0-9]+}}
+ __c11_atomic_store(i, 1, memory_order_seq_cst | TFLAG);
+}
+
+void fi2a(int *i) {
+ // CHECK: @fi2a
+ // CHECK: store atomic i32 {{.*}} seq_cst, {{.*}}, !targetflags !{{[0-9]+}}
+ int v = 1;
+ __atomic_store(i, &v, memory_order_seq_cst | TFLAG);
+}
+
+void fi2b(int *i) {
+ // CHECK: @fi2b
+ // CHECK: store atomic i32 {{.*}} seq_cst, {{.*}}, !targetflags !{{[0-9]+}}
+ __atomic_store_n(i, 1, memory_order_seq_cst | TFLAG);
+}
+
+int fi3(_Atomic(int) *i) {
+ // CHECK: @fi3
+ // CHECK: atomicrmw and {{.*}}, {{.*}}, !targetflags !{{[0-9]+}}
+ // CHECK-NOT: and
+ return __c11_atomic_fetch_and(i, 1, memory_order_seq_cst | TFLAG);
+}
+
+int fi3a(int *i) {
+ // CHECK: @fi3a
+ // CHECK: atomicrmw xor {{.*}}, {{.*}}, !targetflags !{{[0-9]+}}
+ // CHECK-NOT: xor
+ return __atomic_fetch_xor(i, 1, memory_order_seq_cst | TFLAG);
+}
+
+int fi3b(int *i) {
+ // CHECK: @fi3b
+ // CHECK: atomicrmw add {{.*}}, {{.*}}, !targetflags !{{[0-9]+}}
+ // CHECK: add
+ return __atomic_add_fetch(i, 1, memory_order_seq_cst | TFLAG);
+}
+
+int fi3c(int *i) {
+ // CHECK: @fi3c
+ // CHECK: atomicrmw nand {{.*}}, {{.*}}, !targetflags !{{[0-9]+}}
+ // CHECK-NOT: and
+ return __atomic_fetch_nand(i, 1, memory_order_seq_cst | TFLAG);
+}
+
+int fi3d(int *i) {
+ // CHECK: @fi3d
+ // CHECK: atomicrmw nand {{.*}}, {{.*}}, !targetflags !{{[0-9]+}}
+ // CHECK: and
+ // CHECK: xor
+ return __atomic_nand_fetch(i, 1, memory_order_seq_cst | TFLAG);
+}
+
+_Bool fi4(_Atomic(int) *i) {
+ // CHECK: @fi4
+ // CHECK: cmpxchg i32* %{{.*}}, {{.*}}, {{.*}}, !targetflags !{{[0-9]+}}
+ int cmp = 0;
+ return __c11_atomic_compare_exchange_strong(i, &cmp, 1, memory_order_acquire | TFLAG, memory_order_acquire);
+}
+
+_Bool fi4a(int *i) {
+ // CHECK: @fi4a
+ // CHECK: cmpxchg i32* %{{.*}}, {{.*}}, {{.*}}, !targetflags !{{[0-9]+}}
+ int cmp = 0;
+ int desired = 1;
+ return __atomic_compare_exchange(i, &cmp, &desired, 0, memory_order_acquire | TFLAG, memory_order_acquire);
+}
+
+_Bool fi4b(int *i) {
+ // CHECK: @fi4b
+ // CHECK: cmpxchg i32* %{{.*}}, {{.*}}, {{.*}}, !targetflags !{{[0-9]+}}
+ int cmp = 0;
+ return __atomic_compare_exchange_n(i, &cmp, 1, 1, memory_order_acquire | TFLAG, memory_order_acquire);
+}
+
+float ff1(_Atomic(float) *d) {
+ // CHECK: @ff1
+ // CHECK: load atomic i32* {{.*}} monotonic, {{.*}}, !targetflags !{{[0-9]+}}
+ return __c11_atomic_load(d, memory_order_relaxed | TFLAG);
+}
+
+void ff2(_Atomic(float) *d) {
+ // CHECK: @ff2
+ // CHECK: store atomic i32 {{.*}} release, {{.*}}, !targetflags !{{[0-9]+}}
+ __c11_atomic_store(d, 1, memory_order_release | TFLAG);
+}
+
+float ff3(_Atomic(float) *d) {
+ return __c11_atomic_exchange(d, 2, memory_order_seq_cst | TFLAG);
+}
+
+int* fp1(_Atomic(int*) *p) {
+ // CHECK: @fp1
+ // CHECK: load atomic i32* {{.*}} seq_cst, {{.*}}, !targetflags !{{[0-9]+}}
+ return __c11_atomic_load(p, memory_order_seq_cst | TFLAG);
+}
+
+int* fp2(_Atomic(int*) *p) {
+ // CHECK: @fp2
+ // CHECK: store i32 4
+ // CHECK: atomicrmw add {{.*}} monotonic, !targetflags !{{[0-9]+}}
+ return __c11_atomic_fetch_add(p, 1, memory_order_relaxed | TFLAG);
+}
+
+int *fp2a(int **p) {
+ // CHECK: @fp2a
+ // CHECK: store i32 4
+ // CHECK: atomicrmw sub {{.*}} monotonic, !targetflags !{{[0-9]+}}
+ // Note, the GNU builtins do not multiply by sizeof(T)!
+ return __atomic_fetch_sub(p, 4, memory_order_relaxed | TFLAG);
+}
+
+_Complex float fc(_Atomic(_Complex float) *c) {
+ // CHECK: @fc
+ // CHECK: atomicrmw xchg i64* %{{.*}}, {{.*}}, !targetflags !{{[0-9]+}}
+ return __c11_atomic_exchange(c, 2, memory_order_seq_cst | TFLAG);
+}
+
+typedef struct X { int x; } X;
+X fs(_Atomic(X) *c) {
+ // CHECK: @fs
+ // CHECK: atomicrmw xchg i32* %{{.*}}, {{.*}}, !targetflags !{{[0-9]+}}
+ return __c11_atomic_exchange(c, (X){2}, memory_order_seq_cst | TFLAG);
+}
+
+X fsa(X *c, X *d) {
+ // CHECK: @fsa
+ // CHECK: atomicrmw xchg i32* %{{.*}}, {{.*}}, !targetflags !{{[0-9]+}}
+ X ret;
+ __atomic_exchange(c, d, &ret, memory_order_seq_cst | TFLAG);
+ return ret;
+}
+
+_Bool fsb(_Bool *c) {
+ // CHECK: @fsb
+ // CHECK: atomicrmw xchg i8* %{{.*}}, {{.*}}, !targetflags !{{[0-9]+}}
+ return __atomic_exchange_n(c, 1, memory_order_seq_cst | TFLAG);
+}
+
+char flag1;
+volatile char flag2;
+void test_and_set() {
+ // CHECK: atomicrmw xchg i8* @flag1, i8 1 seq_cst, !targetflags !{{[0-9]+}}
+ __atomic_test_and_set(&flag1, memory_order_seq_cst | TFLAG);
+ // CHECK: atomicrmw volatile xchg i8* @flag2, i8 1 acquire, !targetflags !{{[0-9]+}}
+ __atomic_test_and_set(&flag2, memory_order_acquire | TFLAG);
+ // CHECK: store atomic volatile i8 0, i8* @flag2 release, {{.*}}, !targetflags !{{[0-9]+}}
+ __atomic_clear(&flag2, memory_order_release | TFLAG);
+ // CHECK: store atomic i8 0, i8* @flag1 seq_cst, {{.*}}, !targetflags !{{[0-9]+}}
+ __atomic_clear(&flag1, memory_order_seq_cst | TFLAG);
+}
+
+#endif
--
1.7.9.5
["0002-Add-mhle-option-support-and-populate-pre-defined-mac.patch" (0002-Add-mhle-option-support-and-populate-pre-defined-mac.patch)]
From df54e5f8e988b4fca5cd06ac9e7ea608086efbc0 Mon Sep 17 00:00:00 2001
From: Michael Liao <michael.hliao@gmail.com>
Date: Sun, 8 Jul 2012 14:07:19 -0700
Subject: [PATCH 2/2] Add '-mhle' option support and populate pre-defined
macros
- 3 pre-defined macros are added if HLE is turned on
* __HLE__
* __ATOMIC_HLE_ACQUIRE__
* __ATOMIC_HLE_RELEASE__
---
include/clang/Driver/Options.td | 2 ++
lib/Basic/Targets.cpp | 23 +++++++++++++++++++++--
test/Preprocessor/predefined-arch-macros.c | 6 ++++++
3 files changed, 29 insertions(+), 2 deletions(-)
diff --git a/include/clang/Driver/Options.td b/include/clang/Driver/Options.td
index cafd7d7..47fd862 100644
--- a/include/clang/Driver/Options.td
+++ b/include/clang/Driver/Options.td
@@ -885,6 +885,7 @@ def mno_fma : Flag<["-"], "mno-fma">, Group<m_x86_Features_Group>;
def mno_xop : Flag<["-"], "mno-xop">, Group<m_x86_Features_Group>;
def mno_f16c : Flag<["-"], "mno-f16c">, Group<m_x86_Features_Group>;
def mno_rtm : Flag<["-"], "mno-rtm">, Group<m_x86_Features_Group>;
+def mno_hle : Flag<["-"], "mno-hle">, Group<m_x86_Features_Group>;
def mno_thumb : Flag<["-"], "mno-thumb">, Group<m_Group>;
def marm : Flag<["-"], "marm">, Alias<mno_thumb>;
@@ -928,6 +929,7 @@ def mfma : Flag<["-"], "mfma">, Group<m_x86_Features_Group>;
def mxop : Flag<["-"], "mxop">, Group<m_x86_Features_Group>;
def mf16c : Flag<["-"], "mf16c">, Group<m_x86_Features_Group>;
def mrtm : Flag<["-"], "mrtm">, Group<m_x86_Features_Group>;
+def mhle : Flag<["-"], "mhle">, Group<m_x86_Features_Group>;
def mips16 : Flag<["-"], "mips16">, Group<m_Group>;
def mno_mips16 : Flag<["-"], "mno-mips16">, Group<m_Group>;
def mxgot : Flag<["-"], "mxgot">, Group<m_Group>;
diff --git a/lib/Basic/Targets.cpp b/lib/Basic/Targets.cpp
index eaf2e7d..70d10e7 100644
--- a/lib/Basic/Targets.cpp
+++ b/lib/Basic/Targets.cpp
@@ -1608,6 +1608,7 @@ class X86TargetInfo : public TargetInfo {
bool HasBMI2;
bool HasPOPCNT;
bool HasRTM;
+ bool HasHLE;
bool HasSSE4a;
bool HasFMA4;
bool HasFMA;
@@ -1759,8 +1760,8 @@ public:
: TargetInfo(triple), SSELevel(NoSSE), MMX3DNowLevel(NoMMX3DNow),
HasAES(false), HasPCLMUL(false), HasLZCNT(false), HasRDRND(false),
HasBMI(false), HasBMI2(false), HasPOPCNT(false), HasRTM(false),
- HasSSE4a(false), HasFMA4(false), HasFMA(false), HasXOP(false),
- HasF16C(false), CPU(CK_Generic) {
+ HasHLE(false), HasSSE4a(false), HasFMA4(false), HasFMA(false),
+ HasXOP(false), HasF16C(false), CPU(CK_Generic) {
BigEndian = false;
LongDoubleFormat = &llvm::APFloat::x87DoubleExtended;
}
@@ -1966,6 +1967,7 @@ void X86TargetInfo::getDefaultFeatures(llvm::StringMap<bool> &Features) const {
Features["bmi2"] = false;
Features["popcnt"] = false;
Features["rtm"] = false;
+ Features["hle"] = false;
Features["fma4"] = false;
Features["fma"] = false;
Features["xop"] = false;
@@ -2039,6 +2041,7 @@ void X86TargetInfo::getDefaultFeatures(llvm::StringMap<bool> &Features) const {
setFeatureEnabled(Features, "bmi", true);
setFeatureEnabled(Features, "bmi2", true);
setFeatureEnabled(Features, "rtm", true);
+ setFeatureEnabled(Features, "hle", true);
setFeatureEnabled(Features, "fma", true);
break;
case CK_K6:
@@ -2188,6 +2191,8 @@ bool X86TargetInfo::setFeatureEnabled(llvm::StringMap<bool> &Features,
Features["f16c"] = true;
else if (Name == "rtm")
Features["rtm"] = true;
+ else if (Name == "hle")
+ Features["hle"] = true;
} else {
if (Name == "mmx")
Features["mmx"] = Features["3dnow"] = Features["3dnowa"] = false;
@@ -2252,6 +2257,8 @@ bool X86TargetInfo::setFeatureEnabled(llvm::StringMap<bool> &Features,
Features["f16c"] = false;
else if (Name == "rtm")
Features["rtm"] = false;
+ else if (Name == "hle")
+ Features["hle"] = false;
}
return true;
@@ -2308,6 +2315,11 @@ void X86TargetInfo::HandleTargetFeatures(std::vector<std::string> &Features) {
continue;
}
+ if (Feature == "hle") {
+ HasHLE = true;
+ continue;
+ }
+
if (Feature == "sse4a") {
HasSSE4a = true;
continue;
@@ -2532,6 +2544,12 @@ void X86TargetInfo::getTargetDefines(const LangOptions &Opts,
if (HasRTM)
Builder.defineMacro("__RTM__");
+ if (HasHLE) {
+ Builder.defineMacro("__HLE__");
+ Builder.defineMacro("__ATOMIC_HLE_ACQUIRE", Twine(1U << 16));
+ Builder.defineMacro("__ATOMIC_HLE_RELEASE", Twine(2U << 16));
+ }
+
if (HasSSE4a)
Builder.defineMacro("__SSE4A__");
@@ -2620,6 +2638,7 @@ bool X86TargetInfo::hasFeature(StringRef Feature) const {
.Case("pclmul", HasPCLMUL)
.Case("popcnt", HasPOPCNT)
.Case("rtm", HasRTM)
+ .Case("hle", HasHLE)
.Case("sse", SSELevel >= SSE1)
.Case("sse2", SSELevel >= SSE2)
.Case("sse3", SSELevel >= SSE3)
diff --git a/test/Preprocessor/predefined-arch-macros.c b/test/Preprocessor/predefined-arch-macros.c
index 680f39a..4303735 100644
--- a/test/Preprocessor/predefined-arch-macros.c
+++ b/test/Preprocessor/predefined-arch-macros.c
@@ -509,11 +509,14 @@
// RUN: -target i386-unknown-linux \
// RUN: | FileCheck %s -check-prefix=CHECK_CORE_AVX2_M32
// CHECK_CORE_AVX2_M32: #define __AES__ 1
+// CHECK_CORE_AVX2_M32: #define __ATOMIC_HLE_ACQUIRE 65536
+// CHECK_CORE_AVX2_M32: #define __ATOMIC_HLE_RELEASE 131072
// CHECK_CORE_AVX2_M32: #define __AVX__ 1
// CHECK_CORE_AVX2_M32: #define __BMI2__ 1
// CHECK_CORE_AVX2_M32: #define __BMI__ 1
// CHECK_CORE_AVX2_M32: #define __F16C__ 1
// CHECK_CORE_AVX2_M32: #define __FMA__ 1
+// CHECK_CORE_AVX2_M32: #define __HLE__ 1
// CHECK_CORE_AVX2_M32: #define __LZCNT__ 1
// CHECK_CORE_AVX2_M32: #define __MMX__ 1
// CHECK_CORE_AVX2_M32: #define __PCLMUL__ 1
@@ -536,11 +539,14 @@
// RUN: -target i386-unknown-linux \
// RUN: | FileCheck %s -check-prefix=CHECK_CORE_AVX2_M64
// CHECK_CORE_AVX2_M64: #define __AES__ 1
+// CHECK_CORE_AVX2_M64: #define __ATOMIC_HLE_ACQUIRE 65536
+// CHECK_CORE_AVX2_M64: #define __ATOMIC_HLE_RELEASE 131072
// CHECK_CORE_AVX2_M64: #define __AVX__ 1
// CHECK_CORE_AVX2_M64: #define __BMI2__ 1
// CHECK_CORE_AVX2_M64: #define __BMI__ 1
// CHECK_CORE_AVX2_M64: #define __F16C__ 1
// CHECK_CORE_AVX2_M64: #define __FMA__ 1
+// CHECK_CORE_AVX2_M64: #define __HLE__ 1
// CHECK_CORE_AVX2_M64: #define __LZCNT__ 1
// CHECK_CORE_AVX2_M64: #define __MMX__ 1
// CHECK_CORE_AVX2_M64: #define __PCLMUL__ 1
--
1.7.9.5
_______________________________________________
cfe-dev mailing list
cfe-dev@cs.uiuc.edu
http://lists.cs.uiuc.edu/mailman/listinfo/cfe-dev