Introduction
The OpenHMPP directive-based programming model offers a syntax to offload computations onto hardware accelerators and to optimize data movement to and from the accelerator memory. The model is based on work initiated by CAPS (Compiler and Architecture for Embedded and Superscalar Processors).
OpenHMPP concept
OpenHMPP is based on the concept of codelets: functions that can be remotely executed on hardware accelerators (HWAs).
The OpenHMPP codelet concept
A codelet has the following properties:
# It is a
Codelet RPCs
HMPP provides synchronous and asynchronous remote procedure calls (RPCs). The implementation of asynchronous operation is hardware dependent.
HMPP Memory Model
HMPP considers two address spaces: that of the host processor and that of the HWA memory.
Directives concept
The OpenHMPP directives may be seen as “meta-information” added to the application source code. They are safe meta-information, i.e. they do not change the original code behavior. They address the remote execution (RPC) of a function as well as the transfers of data to and from the HWA memory. The table below introduces the OpenHMPP directives. OpenHMPP directives address different needs: some are dedicated to declarations and others to the management of the execution.
Concept of set of directives
One of the fundamental points of the HMPP approach is the concept of directives and their associated labels, which makes it possible to expose a coherent structure across a whole set of directives spread throughout an application. There are two kinds of labels:
* One associated with a codelet. In general, the directives carrying this kind of label are limited to the management of only one codelet (called a stand-alone codelet in the remainder of the document, to distinguish it from a group of codelets).
* One associated with a group of codelets. These labels are written between “< >” characters, as in <grp_label>.
OpenHMPP Directives Syntax
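In order to simplify the notations, the directive syntax is described with regular-expression-style operators: square brackets enclose an optional item, * marks zero or more repetitions, + marks one or more repetitions, and ? marks zero or one occurrence.
General syntax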
The general syntax of OpenHMPP directives is:
* For C language: #pragma hmpp <grp_label> [codelet_label]? directive_type [,directive_parameters]* [&]
Where:
* <grp_label>
: is a unique identifier naming a group of codelets. If no group is defined in the application, this label can simply be omitted. A legal label name must follow this grammar: [a-z,A-Z,_][a-z,A-Z,0-9,_]*. Note that the “< >” characters belong to the syntax and are mandatory for this kind of label.
* codelet_label
: is a unique identifier naming a codelet. A legal label name must follow this grammar: [a-z,A-Z,_][a-z,A-Z,0-9,_]*
* directive_type
: is the name of the directive;
* directive_parameters
: designates parameters associated with the directive. These parameters may be of different kinds and specify either arguments given to the directive or a mode of execution (asynchronous versus synchronous, for example);
* [&]
: is a character used to continue the directive on the next line (same for C and FORTRAN).
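The label grammar above (a letter or underscore, followed by any number of letters, digits or underscores) can be checked mechanically. The following is a minimal sketch only: `hmpp_label_is_legal` is a hypothetical helper written for illustration and is not part of OpenHMPP or of any HMPP compiler.

```c
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper (not an OpenHMPP API): returns true when `label`
 * matches the label grammar [a-zA-Z_][a-zA-Z0-9_]*. The "< >" that
 * surround a group label are assumed to have been stripped already. */
static bool hmpp_label_is_legal(const char *label)
{
    if (label == NULL || label[0] == '\0')
        return false;
    /* First character: a letter or an underscore. */
    if (!(isalpha((unsigned char)label[0]) || label[0] == '_'))
        return false;
    /* Remaining characters: letters, digits or underscores. */
    for (size_t i = 1; label[i] != '\0'; i++) {
        if (!(isalnum((unsigned char)label[i]) || label[i] == '_'))
            return false;
    }
    return true;
}
```

Under this reading of the grammar, `simple1` and `_grp` are accepted, while `1simple` and `my-label` are rejected.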
Directive parameters
The parameters associated to a directive may be of different types.
Below are the directive parameters defined in OpenHMPP:
* version = major.minor[.micro]
: specifies the version of the HMPP directives to be considered by the preprocessor.
* args[arg_items].size={dimsize[,dimsize]*}
: specifies the size of a non-scalar parameter (an array).
* args[arg_items].io=[in|out|inout]
: indicates that the specified function arguments are either input, output or both. By default, unqualified arguments are inputs.
* cond = "expr"
: specifies an execution condition as a boolean C or Fortran expression that needs to be true in order to start the execution of the group or codelets.
* target=target_name[:target_name]*
: specifies which targets to try to use, in the given order.
* asynchronous
: specifies that the codelet execution is not blocking (default is synchronous).
* args[arg_items].advancedload=true
: indicates that the specified parameters are preloaded. Only in or inout parameters can be preloaded.
* args[arg_items].noupdate=true
: specifies that the data is already available on the HWA and that no transfer is needed. When this property is set, no transfer is done on the considered argument.
* args[arg_items].addr="<expr>"
: <expr> is an expression that gives the address of the data to upload.
* args[arg_items].const=true
: indicates that the argument is to be uploaded only once.
OpenHMPP directives
Directives for declaring and executing a codelet
A codelet directive declares a computation to be remotely executed on a hardware accelerator.
For the codelet directive:
* The codelet label is mandatory and must be unique in the application.
* The group label is not required if no group is defined.
* The codelet directive is inserted just before the function declaration.
The syntax of the directive is:
 #pragma hmpp <grp_label> codelet_label codelet
                             [, version = major.minor[.micro]?]?
                             [, args[arg_items].io=[in|out|inout]]*
                             [, args[arg_items].size={dimsize[,dimsize]*}]*
                             [, args[arg_items].const=true]*
                             [, cond = "expr"]
                             [, target=target_name[:target_name]*]
More than one codelet directive can be added to a function in order to specify different uses or different execution contexts. However, there can be only one codelet directive for a given call site label.
The callsite directive specifies how to use a codelet at a given point in the program.
The syntax of the directive is:
 #pragma hmpp <grp_label> codelet_label callsite
                      [, asynchronous]?
                      [, args[arg_items].size={dimsize[,dimsize]*}]*
                      [, args[arg_items].advancedload=[true|false]]*
                      [, args[arg_items].addr="expr"]*
                      [, args[arg_items].noupdate=true]*
An example is shown here:
 /* declaration of the codelet */
 #pragma hmpp simple1 codelet, args[outv].io=inout, target=CUDA
 static void matvec(int sn, int sm, float inv[sn], float inm[sm][sn], float *outv){
   int i, j;
   /* outv[i] accumulates the dot product of row i of inm with inv */
   for (i = 0 ; i < sm ; i++) {
     float temp = outv[i];
     for (j = 0 ; j < sn ; j++) {
       temp += inv[j] * inm[i][j];
     }
     outv[i] = temp;
   }
 }

 int main(int argc, char **argv) {
   int n;
   ........
   /* codelet use */
   #pragma hmpp simple1 callsite, args[outv].size={n}
   matvec(n, m, myinc, inm, myoutv);
   ........
 }
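Because the directives are ordinary pragmas that do not change the behavior of the original code, a C compiler without HMPP support can be expected to ignore them and run the codelet on the host. The sketch below illustrates this with a variant of the matrix-vector example that uses flat arrays. The driver `matvec_demo` and its data are hypothetical, and the size clauses that a real HMPP compiler would probably require for `inv` and `inm` are omitted for brevity.

```c
#include <stddef.h>

/* Codelet declaration with the same label and parameters as the example
 * above. A compiler without HMPP support ignores the pragma and compiles
 * matvec as a plain host function. */
#pragma hmpp simple1 codelet, args[outv].io=inout, target=CUDA
static void matvec(int sn, int sm, const float *inv, const float *inm,
                   float *outv)
{
    /* inm is stored row by row: sm rows of sn elements each. */
    for (int i = 0; i < sm; i++) {
        float temp = outv[i];
        for (int j = 0; j < sn; j++)
            temp += inv[j] * inm[(size_t)i * sn + j];
        outv[i] = temp;
    }
}

/* Hypothetical driver (an assumption for illustration): fills small
 * inputs and calls the codelet through a callsite directive. */
static void matvec_demo(float out[2])
{
    const float inv[3] = {1.0f, 2.0f, 3.0f};
    const float inm[2 * 3] = {1.0f, 0.0f, 0.0f,
                              0.0f, 1.0f, 1.0f};
    out[0] = 10.0f;
    out[1] = 20.0f;
#pragma hmpp simple1 callsite, args[outv].size={2}
    matvec(3, 2, inv, inm, out);
}
```

On the host, `matvec_demo` adds each row's dot product to the initial contents of `out`, which is why `outv` is declared `inout` in the codelet directive.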
In some cases, a specific management of the data throughout the application is required (CPU/GPU data movements optimization, shared variables...).
The group directive allows the declaration of a group of codelets. The parameters defined in this directive are applied to all codelets belonging to the group.
The syntax of the directive is:
 #pragma hmpp <grp_label> group
                           [, version = <major>.<minor>[.<micro>]?]?
                           [, target = target_name[:target_name]*]?
                           [, cond = "expr"]?
Data transfers directives to optimize communication overhead
When using a HWA, the main bottleneck is often the data transfers between the HWA and the main processor.
To limit the communication overhead, data transfers can be overlapped with successive executions of the same codelet by using the asynchronous property of the HWA.
* allocate directive
The allocate directive locks the HWA and allocates the needed amount of memory.
 #pragma hmpp <grp_label> allocate [,args[arg_items].size={dimsize[,dimsize]*}]*
* release directive
The release directive specifies when to release the HWA for a group or a stand-alone codelet.
 #pragma hmpp <grp_label> release
* advancedload directive
The advancedload directive prefetches data before the remote execution of the codelet.
 #pragma hmpp <grp_label> [codelet_label]? advancedload
                   ,args[arg_items]
                   [,args[arg_items].size={dimsize[,dimsize]*}]*
                   [,args[arg_items].addr="expr"]*
                   [,args[arg_items].section={[subscript_triplet,]+}]*
                   [,asynchronous]
* delegatedstore directive
The delegatedstore directive is a synchronization barrier that waits for an asynchronous codelet execution to complete and then downloads the results.
 #pragma hmpp <grp_label> [codelet_label]? delegatedstore
                   ,args[arg_items]
                   [,args[arg_items].addr="expr"]*
                   [,args[arg_items].section={[subscript_triplet,]+}]*
* Asynchronous Computations
The synchronize directive specifies to wait until the completion of an asynchronous callsite execution.
For the synchronize directive, the codelet label is always mandatory and the group label is required if the codelet belongs to a group.
 #pragma hmpp <grp_label> codelet_label synchronize
* Example
In the following example, the device initialization, memory allocation and upload of the input data are done only once, outside the loop, rather than in each iteration.
The synchronize directive waits for the asynchronous execution of the codelet to complete before launching another iteration. Finally, the delegatedstore directive outside the loop downloads the sgemm result.
 int main(int argc, char **argv) {
 #pragma hmpp sgemm allocate, args[vin1;vin2;vout].size={size,size}
 #pragma hmpp sgemm advancedload, args[vin1;vin2;vout], args[m,n,k,alpha,beta]
   for ( j = 0 ; j < 2 ; j ++) {
     #pragma hmpp sgemm callsite, asynchronous, args[vin1;vin2;vout].advancedload=true, args[m,n,k,alpha,beta].advancedload=true
     sgemm (size, size, size, alpha, vin1, vin2, beta, vout);
     #pragma hmpp sgemm synchronize
   }
 #pragma hmpp sgemm delegatedstore, args[vout]
 #pragma hmpp sgemm release
 }
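The example above shows only the call side. As a rough sketch of what the matching codelet might look like, the block below declares a naive `sgemm` codelet and wraps the directive sequence in a hypothetical driver, `sgemm_twice`. The codelet body, its row-major storage convention and the driver are assumptions made for illustration. Compiled without HMPP support, the pragmas are ignored and the same code runs synchronously on the host.

```c
/* Assumed codelet body: vout = alpha * vin1 * vin2 + beta * vout, with
 * vin1 (m x k), vin2 (k x n) and vout (m x n) stored row by row. */
#pragma hmpp sgemm codelet, target=CUDA, args[vout].io=inout
static void sgemm(int m, int n, int k, float alpha,
                  const float *vin1, const float *vin2,
                  float beta, float *vout)
{
    for (int i = 0; i < m; i++) {
        for (int j = 0; j < n; j++) {
            float acc = 0.0f;
            for (int l = 0; l < k; l++)
                acc += vin1[i * k + l] * vin2[l * n + j];
            vout[i * n + j] = alpha * acc + beta * vout[i * n + j];
        }
    }
}

/* Hypothetical driver mirroring the directive sequence of the example:
 * allocate and upload once, run the codelet twice, download once. */
static void sgemm_twice(int size, float alpha, const float *vin1,
                        const float *vin2, float beta, float *vout)
{
#pragma hmpp sgemm allocate, args[vin1;vin2;vout].size={size,size}
#pragma hmpp sgemm advancedload, args[vin1;vin2;vout], args[m,n,k,alpha,beta]
    for (int j = 0; j < 2; j++) {
#pragma hmpp sgemm callsite, asynchronous, args[vin1;vin2;vout].advancedload=true, args[m,n,k,alpha,beta].advancedload=true
        sgemm(size, size, size, alpha, vin1, vin2, beta, vout);
#pragma hmpp sgemm synchronize
    }
#pragma hmpp sgemm delegatedstore, args[vout]
#pragma hmpp sgemm release
}
```

Since `vout` is both read and written, the second iteration accumulates onto the result of the first. This holds whether the data stays on the accelerator between iterations or, as in the host fallback, stays in host memory.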
Sharing data between codelets
These directives map together all the arguments sharing the given name across the whole group.
The types and dimensions of all mapped arguments must be identical.
The map directive maps several arguments on the device.
 #pragma hmpp <grp_label> map, args[arg_items]
The mapbyname directive is quite similar to the map directive, except that the arguments to be mapped are specified directly by their name. The mapbyname directive is equivalent to multiple map directives.
 #pragma hmpp <grp_label> mapbyname [,variableName]+
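As an illustration of sharing an argument by name, the sketch below declares a group with two codelets that both take an argument called `v`, and maps it once for the whole group with mapbyname. The group label `vecgrp`, the codelets `fill` and `scale`, and the driver are hypothetical. The way arguments are qualified inside a group is not covered in this article, so the clauses shown are an assumption. Without HMPP support the pragmas are ignored and the functions run on the host.

```c
/* Hypothetical group: both codelets share the argument named v. */
#pragma hmpp <vecgrp> group, target=CUDA
#pragma hmpp <vecgrp> mapbyname, v

/* First codelet: writes v. */
#pragma hmpp <vecgrp> fill codelet, args[v].io=out
static void fill(int n, float *v, float value)
{
    for (int i = 0; i < n; i++)
        v[i] = value;
}

/* Second codelet: reads and updates the same v. */
#pragma hmpp <vecgrp> scale codelet, args[v].io=inout
static void scale(int n, float *v, float factor)
{
    for (int i = 0; i < n; i++)
        v[i] *= factor;
}

/* Hypothetical driver: the two callsites operate on the same buffer. */
static void fill_then_scale(int n, float *v)
{
#pragma hmpp <vecgrp> fill callsite, args[v].size={n}
    fill(n, v, 2.0f);
#pragma hmpp <vecgrp> scale callsite, args[v].size={n}
    scale(n, v, 3.0f);
#pragma hmpp <vecgrp> release
}
```

Both codelets declare `v` with the same type and dimension, which matches the requirement that all mapped arguments be identical in type and dimensions.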
Global variable
The resident directive declares some variables as global within a group. Those variables can then be directly accessed from any codelet belonging to the group.
This directive applies to the declaration statement just following it in the source code.
The syntax of this directive is:
 #pragma hmpp <grp_label> resident
                [, args[::var_name].io=[in|out|inout]]*
                [, args[::var_name].size={dimsize[,dimsize]*}]*
                [, args[::var_name].addr="expr"]*
                [, args[::var_name].const=true]*
The notation ::var_name, with the prefix ::, indicates an application's variable declared as resident.
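A minimal sketch of a resident variable follows, assuming the directive is placed directly before the declaration it applies to, as described above. The group label `resgrp`, the variable `offset` and the codelet `shift` are hypothetical names. How an HMPP compiler transfers the initial value of a resident variable is not specified here, so treat the `io=in` clause as an assumption.

```c
#pragma hmpp <resgrp> group, target=CUDA

/* The resident directive applies to the declaration that follows it. */
#pragma hmpp <resgrp> resident, args[::offset].io=in
static float offset = 1.5f;

/* The codelet reads the resident variable directly; it is not passed
 * as a function argument. */
#pragma hmpp <resgrp> shift codelet, args[v].io=inout
static void shift(int n, float *v)
{
    for (int i = 0; i < n; i++)
        v[i] += offset;
}
```

Without HMPP support the pragmas are ignored and `offset` behaves as an ordinary file-scope variable, so calling `shift` on a vector simply adds 1.5 to each element.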
Acceleration of regions
A region is a merge of the codelet/callsite directives. The goal is to avoid restructuring the code to build the codelet. Therefore, all the attributes available for codelet or callsite directives can be used on region directives.
In C language:
 #pragma hmpp [<MyGroup>] [label] region
                            [, args[arg_items].io=[in|out|inout]]*
                            [, cond = "expr"]
                            [, args[arg_items].const=true]*
                            [, target=target_name[:target_name]*]
                            [, args[arg_items].size={dimsize[,dimsize]*}]*
                            [, args[arg_items].advancedload=[true|false]]*
                            [, args[arg_items].addr="expr"]*
                            [, args[arg_items].noupdate=true]*
                            [, asynchronous]?
                            [, private=[arg_items]]*
 {
 C BLOCK STATEMENTS
 }
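For instance, a loop can be offloaded in place by wrapping it in a region, without first moving it into a separate function. The sketch below is illustrative only: the label `saxpy`, the function `saxpy_region` and the chosen clauses are assumptions. Without HMPP support the pragma is ignored and the block runs as a normal compound statement on the host.

```c
/* Hypothetical example: y = alpha * x + y computed inside a region. */
static void saxpy_region(int n, float alpha, const float *x, float *y)
{
#pragma hmpp saxpy region, args[x].io=in, args[y].io=inout, args[x;y].size={n}, target=CUDA
    {
        for (int i = 0; i < n; i++)
            y[i] = alpha * x[i] + y[i];
    }
}
```

The region plays the role of both the codelet declaration and its callsite, so the io and size clauses that would otherwise be split between the two directives appear together on one line.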
Implementations
The OpenHMPP Open Standard is based on HMPP Version 2.3 (May 2009, CAPS entreprise).
The OpenHMPP directive-based programming model is implemented in:
* CAPS Compilers, CAPS Entreprise compilers for hybrid computing
* PathScale ENZO Compiler Suite (supports NVIDIA GPUs)
OpenHMPP is used by HPC actors in Oil & Gas, Energy, Manufacturing, Finance, Education & Research.
See also
* GPGPU
* Parallel computing
* OpenACC
* OpenCL