Thursday, May 30, 2013
AMD OpenCL APP SDK 2.5 update
With this latest release, we have added key performance enhancements for APUs that free applications from the CPU-to-GPU bandwidth limitation imposed by the PCIe bus, achieving effective data transfer rates as high as 15 GB/s. See the post “CPU-to-GPU data transfers exceed 15GB/s using APU zero copy path” for additional details.
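To put the 15 GB/s figure in perspective, the following is a minimal sketch of the arithmetic for estimating how long a buffer transfer takes at a given sustained rate. The function names are hypothetical, the sketch assumes 1 GB = 10^9 bytes (the post does not say which convention it uses), and it ignores latency and launch overhead, so the result is a lower bound rather than a measurement.

```c
#include <stdio.h>

/* Estimated time in milliseconds to move `bytes` at a sustained rate of
   `gb_per_s`. Assumption: 1 GB = 1e9 bytes; fixed overheads are ignored. */
double transfer_time_ms(double bytes, double gb_per_s)
{
    return bytes / (gb_per_s * 1e9) * 1e3;
}

/* Prints the estimate for one buffer size at the 15 GB/s rate quoted above. */
void print_transfer_estimate(double bytes)
{
    printf("%.0f bytes at 15 GB/s: about %.3f ms\n",
           bytes, transfer_time_ms(bytes, 15.0));
}
```

Calling `print_transfer_estimate` with the size of a buffer your application moves every frame gives a quick sense of whether the transfer fits in the frame budget.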
On Windows platforms, the runtime now includes broad multi-GPU support. This includes OpenCL support for an APU combined with a discrete GPU, which lets compute performance scale across both GPUs, as well as support for PowerExpress.
The Khronos FP64 extension is enabled for the double-precision-capable “Cypress” family of GPUs, and we plan to enable it for all double-precision-capable GPUs going forward.
Further details on new features in this release are:
- Kernel launch times have been further reduced.
- The LLVM compiler version used for OpenCL kernels has been upgraded.
- Added support for SSE3 and SSE4 instructions.
- Added partial support for FMA4 and XOP instructions.
- It is no longer necessary to use the -fno-alias compiler command line option.
- PCIe transfer overhead has been reduced under Linux.
- Transfers between CPUs and GPUs are improved for buffers declared with either the CL_MEM_USE_HOST_PTR or the CL_MEM_ALLOC_HOST_PTR flag.
- For APUs, zero copy buffers created as CL_MEM_ALLOC_HOST_PTR | CL_MEM_READ_ONLY offer improved GPU read performance.
- The runtime supports multi-GPU operation, including simultaneous use of the GPU in an APU and a discrete GPU, on systems running Windows.
- OpenCL built-in functions leverage AVX on capable CPUs.
- Support for PowerExpress 4.0.
- Support for atomic counters for discrete GPUs.
- Support for headless GPU operation.
- OpenCL can be used by a Windows service.
- UVD3 / MPEG-2 support.
- The clFFT library supports radix 3 and radix 5, including support for mixed radix 2/3/5.
- The BLAS library supports the D/S SYRK, D/S SYR2K, D/S GEMV, D/S SYMV functions.
- The Khronos FP64 extension is supported for the ATI Radeon™ HD 5900 and 5800 series, as well as the AMD FirePro™ V8800 and V8700 series.
- The gDEBugger 6.0 extension is available for Visual Studio.
- Starting with Catalyst 11.8, improved runtime features appear regularly in the monthly Catalyst releases for Windows.
- Kernel Analyzer 1.9 supports Catalyst releases 11.4 to 11.7.
- APP Profiler provides:
  - Improved API trace.
  - Improved timeline visualization.
  - Support for analyzing OpenCL application traces.
- Thread ID and sequence number are now included in the profile output.
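The clFFT item in the list above mentions mixed radix 2/3/5 support, which in practice means transform lengths of the form 2^a · 3^b · 5^c. The following is a minimal sketch of how an application might check a length, or pad up to the next suitable one, before creating a plan. Both helper names are hypothetical and are not part of the clFFT API.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Returns true if n factors completely into 2s, 3s and 5s,
   that is, n = 2^a * 3^b * 5^c with a, b, c >= 0. */
bool is_radix_235_length(size_t n)
{
    if (n == 0) {
        return false;
    }
    while (n % 2 == 0) n /= 2;
    while (n % 3 == 0) n /= 3;
    while (n % 5 == 0) n /= 5;
    return n == 1;
}

/* Returns the smallest length >= n that is a product of 2s, 3s and 5s,
   or 0 if no such length fits in size_t. Intended for zero-padding. */
size_t next_radix_235_length(size_t n)
{
    if (n == 0) {
        n = 1;
    }
    while (!is_radix_235_length(n)) {
        if (n == SIZE_MAX) {
            return 0;
        }
        n++;
    }
    return n;
}
```

For example, a signal of 17 samples does not qualify, so under this scheme it would be padded to 18 (2 · 3 · 3) rather than all the way to the next power of two, 32.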
Overview of OpenCL AMD APP
What is AMD APP Technology?
AMD APP technology is a set of advanced hardware and software technologies that enable AMD graphics processing cores (GPU), working in concert with the system’s x86 cores (CPU), to accelerate many applications beyond just graphics. This enables better balanced platforms capable of running demanding computing tasks faster than ever, and sets software developers on the path to optimize for AMD Accelerated Processing Units (APUs).
What is the AMD APP Software Development Kit?
The AMD APP Software Development Kit (SDK) is a complete development platform created by AMD to allow you to quickly and easily develop applications accelerated by AMD APP technology. The SDK allows you to develop your applications in a high-level language, OpenCL™ (Open Computing Language).
What is OpenCL™?
OpenCL™ is the first truly open and royalty-free programming standard for general-purpose computations on heterogeneous systems. OpenCL™ allows programmers to preserve their expensive source code investment and easily target both multi-core CPUs and the latest GPUs, such as those from AMD.
Developed in an open standards committee with representatives from major industry vendors, OpenCL™ gives users what they have been demanding: a cross-vendor, non-proprietary solution for accelerating their applications on their CPU and GPU cores.
More information and the download are available at http://goo.gl/7lf7S