SDS Pragmas
Optimizations in SDSoC
This section describes pragmas for the SDSoC™ system compilers, sdscc and sds++, to assist system optimization.
The SDSoC environment system compilers target a base platform and invoke the Vivado® High-Level Synthesis (HLS) tool to compile synthesizable C/C++ functions into programmable logic. Using the SDSoC IDE, or sdscc/sds++ command line options, you select functions from your source program to run in hardware, specify accelerator and system clocks, and set properties on data transfers.
In the SDSoC environment, you control the system generation process by structuring hardware functions and calls to hardware functions to balance communication and computation, and by inserting pragmas into your source code to guide the system compiler. The SDSoC compiler automatically chooses the best possible system port to use for any data transfer, but allows you to override this selection by using pragmas. You can also specify pragmas to select different data movers for your hardware function arguments, and use pragmas to control the number of data elements that are transferred to/from the hardware function.
All pragmas specific to the SDSoC environment are prefixed with #pragma SDS and should be inserted into C/C++ source code, either immediately prior to a function declaration or at a function call site for optimization of a specific function call.
#pragma SDS data access_pattern(in_a:SEQUENTIAL, out_b:SEQUENTIAL)
void f1(int in_a[20], int out_b[20]);
The SDS pragmas include the types specified below:
Type | Pragmas |
---|---|
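Data Access Patterns | pragma SDS data access_pattern |
Data Transfer Size | pragma SDS data copy, pragma SDS data zero_copy |
Memory Attributes | pragma SDS data mem_attribute |
Data Mover Type | pragma SDS data data_mover |
SDSoC Platform Interfaces to External Memory | pragma SDS data sys_port |
Hardware Buffer Depth | pragma SDS data buffer_depth |
Asynchronous Function Execution | pragma SDS async, pragma SDS wait |
Specifying Resource Binding | pragma SDS resource |
Hardware/Software Tracing | pragma SDS trace |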
pragma SDS async
Description
The ASYNC pragma must be paired with the WAIT pragma to support manual control of the hardware function synchronization.
The ASYNC pragma is specified immediately preceding a call to a hardware function, directing the compiler not to automatically generate the wait based on data flow analysis. The WAIT pragma must be inserted at an appropriate point in the program to direct the CPU to wait until the associated ASYNC function call with the same ID has completed.
In the presence of an ASYNC pragma, the SDSoC system compiler does not generate an sds_wait() in the stub function for the associated call. The program must contain the matching sds_wait(ID) or #pragma SDS wait(ID) at an appropriate point to synchronize the controlling thread running on the CPU with the hardware function thread. An advantage of using #pragma SDS wait(ID) over the sds_wait(ID) function call is that the source code can then be compiled by compilers other than the SDSoC compiler, such as gcc, that do not interpret either the ASYNC or WAIT pragmas.
Syntax
Place the pragma in the C source immediately before the function call:
#pragma SDS async(<ID>)
...
#pragma SDS wait(<ID>)
Where:
- <ID>: Is a user-defined ID for the ASYNC/WAIT pair, specified as a compile time unsigned integer constant.
Example 1
The following code snippet shows an example of using these pragmas with different IDs:
{
    #pragma SDS async(1)
    mmult(A, B, C);
    #pragma SDS async(2)
    mmult(D, E, F);
    ...
    #pragma SDS wait(1)
    #pragma SDS wait(2)
}
The program running on the CPU first transfers A and B to the mmult hardware and returns immediately. Then the program transfers D and E to the mmult hardware and returns immediately. When the program later reaches #pragma SDS wait(1), it waits for the output C to be ready. When the program later reaches #pragma SDS wait(2), it waits for the output F to be ready.
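As a rough, self-contained version of Example 1, the sketch below fills in the pieces the snippet leaves out. The 2x2 matrix size, the software body of mmult, and the helper name run_async_demo are assumptions made only for illustration. When built with a compiler such as gcc, the SDS pragmas are not interpreted, so both calls simply run synchronously in software.

```c
#include <assert.h>

#define N 2  /* assumed matrix dimension, kept small for illustration */

/* Software model of the hardware function (assumed body). In an SDSoC
   build, this is the function that would be selected for hardware. */
void mmult(const int A[N * N], const int B[N * N], int C[N * N])
{
    for (int r = 0; r < N; r++) {
        for (int c = 0; c < N; c++) {
            int acc = 0;
            for (int k = 0; k < N; k++)
                acc += A[r * N + k] * B[k * N + c];
            C[r * N + c] = acc;
        }
    }
}

/* Hypothetical helper: issues two calls with different IDs, then waits
   on each ID before reading the outputs. Returns C[0] + F[0]. */
int run_async_demo(void)
{
    int A[N * N] = {1, 2, 3, 4};
    int B[N * N] = {1, 0, 0, 1};   /* identity matrix, so C equals A */
    int D[N * N] = {2, 0, 0, 2};   /* 2 * identity, so F equals 2 * E */
    int E[N * N] = {5, 6, 7, 8};
    int C[N * N], F[N * N];

#pragma SDS async(1)
    mmult(A, B, C);
#pragma SDS async(2)
    mmult(D, E, F);

    /* Independent CPU work could be placed here. */

#pragma SDS wait(1)
#pragma SDS wait(2)
    /* Outputs are only read after the matching waits. */
    assert(C[0] == 1 && F[0] == 10);
    return C[0] + F[0];
}
```

The outputs C and F are read only after the matching WAIT pragmas, which is the ordering the ASYNC/WAIT pairing is meant to guarantee in a hardware build.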
Example 2
The following code snippet shows an example of using these pragmas with the same ID to pipeline the data transfer and accelerator execution:
for (int i = 0; i < pipeline_depth; i++) {
    #pragma SDS async(1)
    mmult_accel(A[i%NUM_MAT], B[i%NUM_MAT], C[i%NUM_MAT]);
}
for (int i = pipeline_depth; i < NUM_TESTS; i++) {
    #pragma SDS wait(1)
    #pragma SDS async(1)
    mmult_accel(A[i%NUM_MAT], B[i%NUM_MAT], C[i%NUM_MAT]);
}
for (int i = 0; i < pipeline_depth; i++) {
    #pragma SDS wait(1)
}
In the above example, the first loop ramps up the pipeline with a depth of pipeline_depth, the second loop executes the pipeline, and the third loop ramps down the pipeline. The hardware buffer depth (pragma SDS data buffer_depth) should be set to the same value as pipeline_depth. The goal of this pipeline is to transfer data to the accelerator for the next execution while the current execution is not finished. Refer to "Increasing System Parallelism and Concurrency" in the SDSoC Profiling and Optimization Guide for more information.
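The bookkeeping behind the three loops can be checked with a small software model. The sketch below does not call any accelerator; it only counts launches and waits and tracks how many calls are outstanding at once. The struct and function names are hypothetical, and the model assumes that each wait retires exactly one outstanding call.

```c
#include <assert.h>

/* Illustrative counters for the ramp-up / steady-state / ramp-down pattern. */
struct pipeline_stats {
    int launches;       /* number of async launches issued */
    int waits;          /* number of waits issued */
    int max_in_flight;  /* peak number of outstanding calls */
};

struct pipeline_stats model_pipeline(int num_tests, int pipeline_depth)
{
    struct pipeline_stats s = {0, 0, 0};
    int in_flight = 0;

    assert(pipeline_depth >= 1 && pipeline_depth <= num_tests);

    /* Ramp up: launch pipeline_depth calls without waiting. */
    for (int i = 0; i < pipeline_depth; i++) {
        in_flight++;
        s.launches++;
        if (in_flight > s.max_in_flight)
            s.max_in_flight = in_flight;
    }

    /* Steady state: retire one outstanding call, then launch the next. */
    for (int i = pipeline_depth; i < num_tests; i++) {
        in_flight--;
        s.waits++;
        in_flight++;
        s.launches++;
        if (in_flight > s.max_in_flight)
            s.max_in_flight = in_flight;
    }

    /* Ramp down: retire the remaining outstanding calls. */
    for (int i = 0; i < pipeline_depth; i++) {
        in_flight--;
        s.waits++;
    }

    assert(in_flight == 0);  /* every launch has a matching wait */
    return s;
}
```

In this model the peak number of outstanding calls equals pipeline_depth, which is consistent with the recommendation to set the hardware buffer depth to the same value.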
pragma SDS data access_pattern
Description
This pragma must be specified immediately preceding a function declaration, or immediately preceding another #pragma SDS bound to the function declaration.
This pragma specifies the data access pattern in the hardware function. The SDSoC compiler checks the value of this pragma to determine the hardware interface to synthesize. If the access pattern is SEQUENTIAL, a streaming interface (such as ap_fifo) will be generated. Otherwise, with the RANDOM access pattern, a RAM interface will be generated. Refer to "Data Motion Network Generation in SDSoC" in the SDSoC Environment Profiling and Optimization Guide (UG1235) for more information on the use of this pragma in data motion network generation.
Syntax
The syntax for this pragma is:
#pragma SDS data access_pattern(ArrayName:<pattern>)
Where:
- ArrayName: Specifies one of the formal parameters of the function to assign the pragma to.
- <pattern>: Can be either SEQUENTIAL or RANDOM. The default is RANDOM.
Example 1
The following code snippet shows an example of using this pragma for the array argument (A):
#pragma SDS data access_pattern(A:SEQUENTIAL)
void foo(int A[1024], int B[1024]);
In the example shown above, a streaming interface will be generated for argument A, while a RAM interface will be generated for argument B. The access pattern for argument A must be A[0], A[1], A[2], ... , A[1023], and all elements must be accessed only once. On the other hand, argument B can be accessed in a random fashion, and each element can be accessed zero or more times.
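To make the access rule concrete, here is one possible function body that respects it. The body of foo, the scatter stride of 7, and the helper access_pattern_demo are assumptions for illustration only: A is read strictly in index order, once per element, while B is written in a scattered order.

```c
#include <assert.h>

#pragma SDS data access_pattern(A:SEQUENTIAL)
void foo(int A[1024], int B[1024]);

/* Assumed body: A is read as A[0], A[1], ..., A[1023], once each.
   B is written out of order, which a RAM interface permits. */
void foo(int A[1024], int B[1024])
{
    for (int i = 0; i < 1024; i++) {
        int a = A[i];            /* sequential, single read of A[i] */
        B[(i * 7) % 1024] = a;   /* scattered write into B */
    }
}

/* Hypothetical helper that runs foo in software and returns B[7]. */
int access_pattern_demo(void)
{
    static int A[1024], B[1024];

    for (int i = 0; i < 1024; i++)
        A[i] = i;
    foo(A, B);
    assert(B[0] == 0);   /* i == 0 writes A[0] to B[0] */
    return B[7];         /* i == 1 writes A[1] to B[7] */
}
```

A body that read A[i] twice, or read A out of order, would break the streaming contract for A even though it would still run correctly in software.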
Example 2
The following code snippet shows an example of using this pragma for a pointer argument:
#pragma SDS data access_pattern(A:SEQUENTIAL)
#pragma SDS data copy(A[0:1024])
void foo(int *A, int B[1024]);
In the above example, if argument A is intended to be a streaming port, the two pragmas shown must be applied. Without these, the SDSoC tool synthesizes argument A as a register (IN, OUT, or INOUT based on the usage of A in function foo).
Example 3
The following code snippet shows the combination of the ZERO_COPY pragma and the ACCESS_PATTERN pragma:
#pragma SDS data zero_copy(A)
#pragma SDS data access_pattern(A:SEQUENTIAL)
void foo(int A[1024], int B[1024]);
In the above example, the ACCESS_PATTERN pragma is ignored. After the ZERO_COPY pragma is applied to an argument, an AXI Master interface will be synthesized for that argument. Refer to "Zero Copy Data Mover" in the SDSoC Environment Profiling and Optimization Guide (UG1235) for more information.
pragma SDS data buffer_depth
Description
This pragma must be specified immediately preceding a function declaration, or immediately preceding another #pragma SDS bound to the function declaration, and applies to all the callers of the function.
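This pragma only applies to arrays that map to block RAM or FIFO interfaces. For a block RAM-mapped array, the <BufferDepth> value specifies the hardware multi-buffer depth. For a FIFO-mapped array, the <BufferDepth> value specifies the depth of the hardware FIFO allocated for the array. The following constraints apply:
- BRAM: 1 ≤ <BufferDepth> ≤ 4, and 2 ≤ ArraySize ≤ 16384.
- FIFO: <BufferDepth> = 2^n, where 4 ≤ n ≤ 20.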
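The two range constraints can be expressed as simple checks. The sketch below is only a reading of the limits stated above, with the FIFO rule taken as "a power of two between 2^4 and 2^20"; the function names are hypothetical and are not part of any SDSoC API.

```c
#include <stdbool.h>

/* BRAM rule: buffer depth in [1, 4] and array size in [2, 16384]. */
bool bram_buffer_depth_ok(unsigned depth, unsigned array_size)
{
    return depth >= 1 && depth <= 4 &&
           array_size >= 2 && array_size <= 16384;
}

/* FIFO rule: buffer depth is 2^n with 4 <= n <= 20. */
bool fifo_buffer_depth_ok(unsigned long depth)
{
    /* A power of two has exactly one bit set. */
    if (depth == 0 || (depth & (depth - 1)) != 0)
        return false;
    return depth >= (1UL << 4) && depth <= (1UL << 20);
}
```

For example, a depth of 4 on a 1024-element block RAM array passes the first check, while a FIFO depth of 24 fails the second because it is not a power of two.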
Syntax
The syntax of this pragma is:
#pragma SDS data buffer_depth(ArrayName:<BufferDepth>)
Where:
- ArrayName: Specifies one of the formal parameters of the function to assign the pragma to.
- <BufferDepth>: Must be a compile-time constant value.
- Multiple arrays can be specified as a comma separated list in one pragma. For example:
#pragma SDS data buffer_depth(ArrayName1:BufferDepth1, ArrayName2:BufferDepth2)
Example 1
This example specifies a multi-buffer of size 4 used for the RAM interface of argument a:
#pragma SDS data buffer_depth(a:4)
void foo(int a[1024], int b[1024]);
pragma SDS data copy
Description
The pragma SDS data copy | zero_copy must be specified immediately preceding a function declaration, or immediately preceding another #pragma SDS bound to the function declaration.
The COPY pragma and the ZERO_COPY pragma are mutually exclusive and should not be specified together on the same object.
The COPY pragma implies that data is explicitly copied between the host processor memory and the hardware function. A suitable data mover performs the data transfer. See "Improving System Performance" in the SDSoC Profiling and Optimization Guide for more information.
The ZERO_COPY pragma means that the hardware function accesses the data directly from shared memory through an AXI master bus interface.
Syntax
The syntax for this pragma is:
#pragma SDS data copy|zero_copy(ArrayName[<offset>:<length>])
Where:
- ArrayName[<offset>:<length>]: Specifies the function parameter or argument to assign the pragma to, and the array dimension and data transfer size.
- ArrayName: Must be one of the formal parameters of the function definition, not from the prototype (where parameter names are optional) but from the function definition.
- <offset>: Optionally specifies the number of elements from the first element in the array. It must be specified as a compile-time constant. IMPORTANT: The <offset> value is currently ignored, and should be specified as 0.
- <length>: Specifies the number of elements transferred from the array for the specified dimension. It can be an arbitrary expression as long as the expression can be resolved at runtime inside the function. TIP: As shown in the examples below, <length> can be a C arithmetic expression involving other scalar arguments of the same function.
- For a multi-dimensional array, each dimension should be separately specified. For example, for a two-dimensional array, use:
#pragma SDS data copy(ArrayName[offset_dim1:length1][offset_dim2:length2])
- Multiple arrays can be specified in the same pragma, using a comma separated list. For example, use:
#pragma SDS data copy(ArrayName1[offset1:length1], ArrayName2[offset2:length2])
- The [<offset>:<length>] argument is optional, and is only needed if the data transfer size for an array cannot be determined at compile time. When this is not specified, the COPY or ZERO_COPY pragma is only used to select between copying the memory to/from the accelerator through a data mover versus directly accessing the processor memory by the accelerator. To determine the array size, the SDSoC compiler analyzes the callers to the accelerator function to determine the transfer size based on the memory allocation APIs for the array, for example, malloc or sds_alloc. If the analysis fails, it checks the argument type to see if the argument type has a compile-time array size and uses that size as the data transfer size. If the data transfer size cannot be determined, the compiler generates an error message so that you can specify the data size with [<offset>:<length>]. If the data size is different between the caller and the callee, or different between multiple callers, the compiler also generates an error message so that you can correct the source code or use this pragma to override the compiler analysis.
Example 1
The following example applies the COPY pragma to both the "A" and "B" arguments of the accelerator function foo right before the function declaration. Notice the <length> option is specified as an expression, size*size:
#pragma SDS data copy(A[0:size*size], B[0:size*size])
void foo(int *A, int *B, int size);
The SDSoC system compiler will replace the body of the function foo with accelerator control, data transfer, and data synchronization code. The following code snippet shows the data transfer part:
void _p0_foo_0(int *A, int *B, int size)
{
    ...
    cf_send_i(&(_p0_swinst_foo_0.A), A, (size*size) * 4, &_p0_request_0);
    cf_receive_i(&(_p0_swinst_foo_0.B), B, (size*size) * 4, &_p0_request_1);
    ...
}
As shown above, the <length> value size*size is used to tell the SDSoC runtime the number of elements of arrays "A" and "B." The cf_send_i and cf_receive_i functions require the number of bytes to transfer, so the compiler multiplies the number of elements specified by <length> by the number of bytes for each element (4 in this case).
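The element-to-byte conversion in the generated stub is plain arithmetic, sketched below. The helper names are hypothetical, and the element type is written as int32_t to make the 4-byte element size explicit rather than relying on the platform's int width.

```c
#include <stddef.h>
#include <stdint.h>

/* Bytes to transfer = number of elements * bytes per element. */
size_t transfer_bytes(size_t num_elements, size_t element_size)
{
    return num_elements * element_size;
}

/* For the pragma length size*size with 4-byte elements, this mirrors
   the (size*size) * 4 expression seen in the generated stub. */
size_t foo_transfer_bytes(int size)
{
    return transfer_bytes((size_t)size * (size_t)size, sizeof(int32_t));
}
```

With size equal to 32, the length is 1024 elements and the transfer is 4096 bytes.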
Example 2
The following code snippet shows an example of applying the ZERO_COPY pragma, instead of the COPY pragma above:
#pragma SDS data zero_copy(A[0:size*size], B[0:size*size])
void foo(int *A, int *B, int size);
The body of function foo becomes:
void _p0_foo_0(int *A, int *B, int size)
{
    ...
    cf_send_ref_i(&(_p0_swinst_foo_0.A), A, (size*size) * 4, &_p0_request_0);
    cf_receive_ref_i(&(_p0_swinst_foo_0.B), B, (size*size) * 4, &_p0_request_1);
    ...
}
The cf_send_ref_i and cf_receive_ref_i functions only transfer the reference or pointer of the array to the accelerator, and the accelerator accesses the processor memory directly.
Example 3
The following example shows a ZERO_COPY pragma with multiple arrays specified to generate a direct memory interface with DDR and the hardware function:
#pragma SDS data zero_copy(in1[0:mat_dim*mat_dim], in2[0:mat_dim*mat_dim], out[0:mat_dim*mat_dim])
void matmul_partition_accel(int *in1,     // Read-Only Matrix 1
                            int *in2,     // Read-Only Matrix 2
                            int *out,     // Output Result
                            int mat_dim); // Matrix Dim (assumed only square matrix)
Example 4
A DATA COPY pragma instructs the compiler to insert the transfer size expression into the corresponding send/receive call within the stub function body. As a result, it is essential that the argument names used in the function declaration match the argument names in the function definition. The following code snippet illustrates a common mistake: using an argument name in the function declaration that is different from the argument name used in the function definition:
"foo.h"
#pragma SDS data copy(in_A[0:1024])
void foo(int *in_A, int *out_B);

"foo.cpp"
#include "foo.h"
void foo(int *A, int *B)
{
    ...
}
Any C/C++ compiler will ignore the argument name in the function declaration, because the C/C++ standard makes the argument name in the function declaration optional. Only the argument name in the function definition is used by the compiler. However, the SDSoC compiler will issue a warning when trying to apply the pragma:
WARNING: [SDSoC 0-0] Cannot find argument in_A in accelerator function foo(int *A, int *B)
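A corrected version keeps the declaration and definition names identical, so the pragma can be resolved. The sketch below merges the header and source file into one listing for brevity; the function body and the helper foo_names_demo are assumptions added only so the example can be run in software.

```c
#include <assert.h>

/* foo.h: the pragma names in_A, matching the definition below. */
#pragma SDS data copy(in_A[0:1024])
void foo(int *in_A, int *out_B);

/* foo.c: same parameter names as the declaration. */
void foo(int *in_A, int *out_B)
{
    for (int i = 0; i < 1024; i++)
        out_B[i] = in_A[i] + 1;   /* placeholder body, assumed */
}

/* Hypothetical helper that exercises foo in software. */
int foo_names_demo(void)
{
    static int a[1024], b[1024];

    a[10] = 41;
    foo(a, b);
    assert(b[0] == 1);   /* a[0] is zero-initialized */
    return b[10];
}
```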
pragma SDS data data_mover
Description
This pragma must be specified immediately preceding a function declaration, or immediately preceding another #pragma SDS bound to the function declaration. This pragma applies to all the callers of the bound function.
By default, the SDSoC compiler chooses the type of the data mover automatically by analyzing the code. The DATA_MOVER pragma can be used to override the compiler default. This pragma specifies the HW IP type, or DataMover, used to transfer an array argument.
The FASTDMA data mover supports a wider data-width to support higher bandwidth for data transfer. For Zynq® UltraScale+™ MPSoC the data-width is from 64-bits to 256-bits. For Zynq-7000 the data-width is 64-bits.
The SDSoC™ compiler automatically assigns an instance of the data mover HW IP to use for transferring the corresponding array. The :id can be specified to assign a specific data mover instance for the associated formal parameter. If two or more formal parameters have the same DataMover and the same id, they will share the same data mover HW IP instance.
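Data movers that require physically contiguous memory (AXIDMA_SIMPLE and FASTDMA) should be given arrays allocated with sds_alloc().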
Syntax
The syntax for this pragma is:
#pragma SDS data data_mover(ArrayName:DataMover[:id])
Where:
- ArrayName: Specifies one of the formal parameters of the function to assign the pragma to.
- DataMover: Must be one of the following:
  - AXIFIFO: used for non-contiguous memory, <300 bytes.
  - AXIDMA_SIMPLE: used for contiguous memory, <32MB.
  - AXIDMA_SG: can be used for either contiguous or non-contiguous memory, >300 bytes.
  - FASTDMA: contiguous memory only. The pragma is required when FASTDMA is desired.
- :id: Optional, but must be specified as a positive integer when it is used.
- Multiple arrays can be specified in one pragma, separated by a comma (,). For example:
#pragma SDS data data_mover(ArrayName1:DataMover[:id], ArrayName2:DataMover[:id])
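One way to read the size and contiguity guidelines for the data mover types is as a small lookup. The sketch below is only an interpretation of those guidelines and is not the compiler's actual selection logic. The enum and function names are hypothetical, 32MB is taken as 32 * 1024 * 1024 bytes, a transfer of exactly 300 bytes is arbitrarily assigned to AXIDMA_SG, and FASTDMA is left out because it is only used when requested by pragma.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical labels for the data mover types that can be picked
   without an explicit pragma. */
enum data_mover_hint {
    HINT_AXIFIFO,
    HINT_AXIDMA_SIMPLE,
    HINT_AXIDMA_SG
};

/* Rule-of-thumb lookup based on transfer size and memory contiguity. */
enum data_mover_hint suggest_data_mover(size_t bytes, bool physically_contiguous)
{
    const size_t fifo_limit = 300;                      /* bytes */
    const size_t simple_limit = 32UL * 1024UL * 1024UL; /* assumed 32MB */

    if (physically_contiguous && bytes < simple_limit)
        return HINT_AXIDMA_SIMPLE;   /* contiguous, under 32MB */
    if (!physically_contiguous && bytes < fifo_limit)
        return HINT_AXIFIFO;         /* small, non-contiguous */
    return HINT_AXIDMA_SG;           /* everything else */
}
```

A 4096-byte array allocated with malloc maps to AXIDMA_SG in this reading, while the same array in physically contiguous memory maps to AXIDMA_SIMPLE.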
Example 1
The following code snippet shows an example of specifying the data mover ID in the pragma:
#pragma SDS data data_mover(A:AXIDMA_SG:1, B:AXIDMA_SG:1)
void foo(int A[1024], int B[1024]);
In the example above, the same instance of the AXIDMA_SG IP is shared to transfer data for arguments A and B, because the same data mover ID has been specified.
Example 2
#pragma SDS data data_mover(A:FASTDMA, B:FASTDMA, C:FASTDMA, D:AXIDMA_SIMPLE, E:AXIDMA_SIMPLE)
void foo(float A[1024], float B[1024], float C[1024], int D[1024], int E[1024]);
The compiler will transfer arrays A, B, and C with individual FASTDMA data movers, and arrays D and E with individual AXIDMA_SIMPLE data movers.
pragma SDS data mem_attribute
Description
This pragma must be specified immediately preceding a function declaration, or immediately preceding another #pragma SDS bound to the function declaration. This pragma applies to all the callers of the function.
For an operating system like Linux that supports virtual memory, user-space allocated memory is paged, which can affect system performance. The SDSoC runtime also provides an API to allocate physically contiguous memory. The pragmas in this section can be used to tell the compiler whether the arguments have been allocated in physically contiguous memory.
Syntax
The syntax for this pragma is:
#pragma SDS data mem_attribute(ArrayName:contiguity)
Where:
- ArrayName: Specifies one of the formal parameters of the function to assign the pragma to.
- contiguity: Must be specified as either PHYSICAL_CONTIGUOUS or NON_PHYSICAL_CONTIGUOUS. The default value is NON_PHYSICAL_CONTIGUOUS:
  - PHYSICAL_CONTIGUOUS means that all memory corresponding to the associated ArrayName is allocated using sds_alloc.
  - NON_PHYSICAL_CONTIGUOUS means that all memory corresponding to the associated ArrayName is allocated using malloc or as a free variable on the stack. This helps the SDSoC compiler select the optimal data mover.
- Multiple arrays can be specified in one pragma, separated by a comma (,). For example:
#pragma SDS data mem_attribute(ArrayName:contiguity, ArrayName:contiguity)
Example 1
The following code snippet shows an example of specifying the contiguity attribute:
#pragma SDS data mem_attribute(A:PHYSICAL_CONTIGUOUS)
void foo(int A[1024], int B[1024]);
In the example above, the user tells the SDSoC compiler that array A is allocated in a memory block that is physically contiguous. The SDSoC compiler then chooses AXIDMA_SIMPLE instead of AXIDMA_SG, because the former is smaller and faster for transferring physically contiguous memory.
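On the caller side, the attribute is only truthful if the array really comes from the contiguous allocator. The sketch below shows one way a caller might allocate A with sds_alloc and B with malloc. The body of foo and the helper mem_attribute_demo are assumptions, and the fallback macros exist only so the listing builds with an ordinary C compiler; the sketch assumes the SDSoC compilers define __SDSCC__ and provide sds_alloc and sds_free through sds_lib.h.

```c
#include <assert.h>
#include <stdlib.h>

#ifdef __SDSCC__
#include "sds_lib.h"                  /* sds_alloc() and sds_free() */
#else
/* Fallback for non-SDSoC compilers (assumption for illustration). */
#define sds_alloc(size) malloc(size)
#define sds_free(ptr)   free(ptr)
#endif

#pragma SDS data mem_attribute(A:PHYSICAL_CONTIGUOUS)
void foo(int A[1024], int B[1024]);

/* Assumed body so the example can run in software. */
void foo(int A[1024], int B[1024])
{
    for (int i = 0; i < 1024; i++)
        B[i] = 2 * A[i];
}

/* Hypothetical caller: A is physically contiguous, B uses the default
   NON_PHYSICAL_CONTIGUOUS attribute. Returns B[21], or -1 on failure. */
int mem_attribute_demo(void)
{
    int *A = (int *)sds_alloc(1024 * sizeof(int));
    int *B = (int *)malloc(1024 * sizeof(int));
    int result = -1;

    if (A != NULL && B != NULL) {
        for (int i = 0; i < 1024; i++)
            A[i] = i;
        foo(A, B);
        assert(B[1] == 2);
        result = B[21];
    }
    if (A != NULL)
        sds_free(A);
    free(B);
    return result;
}
```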
pragma SDS data sys_port
Description
This pragma must be specified immediately preceding a function declaration, or immediately preceding another #pragma SDS bound to the function declaration, and applies to all the callers of the function.
This pragma overrides the SDSoC compiler default choice of memory port. If the SYS_PORT pragma is not specified for an array argument, the interface to the external memory is automatically determined by the SDSoC system compilers (sdscc/sds++) based on array memory attributes (cacheable or non-cacheable), array size, data mover used, etc.
The Zynq®-7000 device provides a cache coherent interface (S_AXI_ACP) between programmable logic and external memory, and high-performance ports (S_AXI_HP) for non-cache coherent access. The Zynq® UltraScale+™ MPSoC provides a cache coherent interface (S_AXI_HPCn_FPD) and a non-cache coherent interface (S_AXI_HPn_FPD).
Syntax
#pragma SDS data sys_port(<param_name>:<port>)
Where:
- <param_name>: Specifies one of the formal parameters of the function to assign the pragma to.
- <port>: The SDSoC compiler recognizes predefined memory port types: ACP (for Zynq-7000 devices only), HPC, HP, or MIG, which represent cache coherent access (ACP, HPC), high speed non-coherent access (HP), or memory accessible through a soft memory controller implemented in PL logic (MIG). You can also use a specific platform port name for the <port>, but this is not recommended unless the compiler does not select the correct port, which could occur for a stream port in the platform. To get a list of platform ports, in a terminal shell, run sds++ -sds-pf-info <platform>.
For example, the sds++ -sds-pf-info zcu102 command returns the following under System Ports:
System Ports
  Use the system port name in a sysport pragma, for example
  #pragma SDS data sys_port(parameter_name:system_port_name)

  System Port Name (Vivado BD instance name, Vivado BD port name)
  ps_e_S_AXI_HPC0_FPD (ps_e, S_AXI_HPC0_FPD)
  ps_e_S_AXI_HPC1_FPD (ps_e, S_AXI_HPC1_FPD)
  ps_e_S_AXI_HP0_FPD (ps_e, S_AXI_HP0_FPD)
  ps_e_S_AXI_HP1_FPD (ps_e, S_AXI_HP1_FPD)
  ps_e_S_AXI_HP2_FPD (ps_e, S_AXI_HP2_FPD)
  ps_e_S_AXI_HP3_FPD (ps_e, S_AXI_HP3_FPD)
In this case, the SYS_PORT pragma could be defined as:
#pragma SDS data sys_port(Array1:ps_e_S_AXI_HPC0_FPD)
If <port> is defined using the HPC shortcut, then the argument, Array1, could be assigned to either HPC0 or HPC1 by the SDSoC compiler. When the platform is created, the designer could specify a shortcut for a specific platform port, using the PFM.AXI_PORT property. Refer to the SDSoC Environment Platform Development Guide for more information on PFM properties. For example:
set_property PFM.AXI_PORT {M_AXIS {type "M_AXIS" sptag "Counter"}} \
    [get_bd_cells /stream_fifo]
This defines a SYS_PORT tag, "Counter", which can be specified in the pragma as:
#pragma SDS data sys_port(Array1:Counter)
This would be the same as declaring the following:
#pragma SDS data sys_port(Array1:stream_fifo_M_AXIS)
- Multiple arguments can be specified in one pragma, separated by commas:
#pragma SDS data sys_port(param1:port, param2:port)
Example 1
The following code snippet shows an example of using this pragma:
#pragma SDS data sys_port(A:HP)
void foo(int A[1024], int B[1024]);
In the above example, if the caller passes an array (A) allocated with cache coherent calls, such as malloc or sds_alloc, the SDSoC compiler uses an HP platform interface even though this might not be the best choice.
pragma SDS resource
Description
This pragma can be used at function call sites to manually specify resource binding.
The RESOURCE pragma is specified immediately preceding a call to a hardware function, directing the compiler to bind the caller to a specified accelerator instance. The SDSoC compiler identifies when multiple resource IDs have been specified for a function, and automatically generates a hardware accelerator and data motion network realizing the hardware functions in programmable logic.
Syntax
#pragma SDS resource(<ID>)
Where:
- <ID>: Must be a compile time unsigned integer constant. For the same function, each unique ID represents a unique instance of the hardware accelerator.
Example 1
The following code snippet shows an example of using this pragma with a different ID for each call:
{
    #pragma SDS resource(1)
    mmult(A, B, C);
    #pragma SDS resource(2)
    mmult(D, E, F);
    ...
}
In the previous example, the first call to function mmult will be bound to an accelerator with an ID of 1, and the second call to mmult will be bound to another accelerator with an ID of 2.
pragma SDS trace
Description
The SDSoC environment tracing feature provides a detailed view of what is happening in the system during execution of an application, through the use of hardware/software event tracing. See the SDSoC Environment User Guide for more information.
This pragma specifies the trace insertion for the accelerator with granularity at the function level or the argument level, to let you monitor the activity on the accelerator for debug purposes. When tracing is enabled, tracing instrumentation is automatically inserted into the software code, and hardware monitors are inserted into the hardware system during implementation of the hardware logic. You can monitor either the complete accelerator function, or an individual parameter of the function.
The type of trace can be SW, HW, or both. HW trace means the "start" and "stop" of the corresponding hardware component, such as the start and stop of the hardware accelerator, or the start and stop of data transfer of the specified argument. This lets you monitor activity moving onto, and off of, the hardware. The SW trace lets you observe the software stub for the hardware accelerated function, to monitor the function and arguments on the software side of the transaction. You can also monitor both the hardware and software transactions.
Syntax
This pragma must be specified immediately preceding a function declaration, or immediately preceding another #pragma SDS bound to the function declaration.
#pragma SDS trace(<var1>[:SW|:HW][,<var2>[:SW|:HW]])
Where:
- <var1>, <var2>: Specifies either the function name, or one of the parameters of the function.
- [:SW|:HW]: Specifies either HW tracing or SW tracing. The absence of this option indicates that both HW and SW traces are inserted.
Example 1
The following example traces the specified function, foo:
#pragma SDS monitor trace(foo)
void foo(int a, int b);
The absence of :HW or :SW indicates that both traces are inserted for the accelerator.
Example 2
The following example demonstrates using this pragma to trace multiple arguments of the function.
#pragma SDS monitor trace(a, b:SW, c:HW)
void foo(int a, int b, int *c);
In the previous example, both HW and SW traces are inserted for argument a. Only the SW trace is inserted for argument b. For argument c, only the HW trace is inserted.
pragma SDS wait
Description
The WAIT pragma must be paired with the ASYNC pragma to support manual control of the hardware function synchronization.
The ASYNC pragma is specified immediately preceding a call to a hardware function, directing the compiler not to automatically generate the wait based on data flow analysis. The WAIT pragma must be inserted at an appropriate point in the program to direct the CPU to wait until the associated ASYNC function call with the same ID has completed.
pragma SDS data zero_copy
Description
The COPY pragma implies that data is explicitly copied between the host processor memory and the hardware function, using a suitable data mover for the transfer. The ZERO_COPY pragma means that the hardware function accesses the data directly from shared memory through an AXI master bus interface.
Example 1
The following example shows a ZERO_COPY pragma with multiple arrays specified to generate a direct memory interface with DDR and the hardware function:
#pragma SDS data zero_copy(in1[0:mat_dim*mat_dim], in2[0:mat_dim*mat_dim], out[0:mat_dim*mat_dim])
void matmul_partition_accel(int *in1,     // Read-Only Matrix 1
                            int *in2,     // Read-Only Matrix 2
                            int *out,     // Output Result
                            int mat_dim); // Matrix Dim (assumed only square matrix)
See Also
- SDSoC Environment Profiling and Optimization Guide (UG1235)