Parallel Thread Execution

Parallel Thread Execution (PTX or NVPTX) is a low-level parallel thread execution virtual machine and instruction set architecture used in Nvidia's Compute Unified Device Architecture (CUDA) programming environment. The Nvidia CUDA Compiler (NVCC) translates code written in CUDA, a C++-like language, into PTX instructions (an assembly language represented as American Standard Code for Information Interchange (ASCII) text), and the graphics driver contains a compiler which translates PTX instructions into executable binary code, which can run on the processing cores of Nvidia graphics processing units (GPUs). The GNU Compiler Collection also has basic ability to generate PTX in the context of OpenMP offloading. Inline PTX assembly can be used in CUDA.
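Because inline PTX is embedded through CUDA's asm() construct, a single PTX instruction can be dropped into a kernel. The following is a minimal sketch, not taken from the original article; the kernel and variable names are invented:

    __global__ void add_one(int *out, const int *in)
    {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        int v;
        // Embed one PTX instruction: a 32-bit signed add of an immediate.
        // "=r" binds v to a PTX output register; "r" binds in[i] to an input register.
        asm("add.s32 %0, %1, 1;" : "=r"(v) : "r"(in[i]));
        out[i] = v;
    }

The constraint syntax is modeled on GCC-style inline assembly, so the compiler allocates the PTX registers and substitutes them for %0 and %1.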

Registers

PTX uses an arbitrarily large processor register set; the output from the compiler is almost pure static single-assignment form, with consecutive lines generally referring to consecutive registers. Programs start with declarations of the form

    .reg .u32 %r<335>;    // declare 335 registers %r0, %r1, ..., %r334 of type unsigned 32-bit integer

It is a three-argument assembly language, and almost all instructions explicitly list the data type (in sign and width) on which they operate. Register names are preceded with a % character and constants are literal, e.g.:

    shr.u64 %rd14, %rd12, 32;     // shift an unsigned 64-bit integer in %rd12 right by 32 positions, result in %rd14
    cvt.u64.u16 %rd142, %rs32;    // convert an unsigned 16-bit integer to 64-bit

There are predicate registers, but compiled code in shader model 1.0 uses these only in conjunction with branch commands; the conditional branch is

    @%p14 bra $label;             // branch to $label if predicate register %p14 is set

The setp.cc.type instruction sets a predicate register to the result of comparing two registers of appropriate type. There is also a set instruction, where

    set.le.u32.u64 %r101, %rd12, %rd28;

sets the 32-bit register %r101 to 0xffffffff if the 64-bit register %rd12 is less than or equal to the 64-bit register %rd28; otherwise %r101 is set to 0x00000000.

There are a few predefined identifiers that denote pseudoregisters. Among others, %tid, %ntid, %ctaid, and %nctaid contain, respectively, thread indices, block dimensions, block indices, and grid dimensions.
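Putting these pieces together, a complete minimal PTX kernel might look like the sketch below. It is illustrative rather than real compiler output, and the .version, .target, and .address_size directives as well as the kernel name are assumptions; each thread uses the %tid pseudoregister to add 1 to its own element of an int array:

    .version 7.0
    .target sm_50
    .address_size 64

    .visible .entry add_one(
        .param .u64 add_one_param_0        // pointer to the array
    )
    {
        .reg .b32 %r<4>;                   // 32-bit registers %r0..%r3
        .reg .b64 %rd<4>;                  // 64-bit registers %rd0..%rd3

        ld.param.u64       %rd1, [add_one_param_0];
        cvta.to.global.u64 %rd2, %rd1;     // convert to a global state-space address
        mov.u32            %r1, %tid.x;    // this thread's index within the block
        mul.wide.u32       %rd3, %r1, 4;   // byte offset = 4 * thread index
        add.s64            %rd2, %rd2, %rd3;
        ld.global.u32      %r2, [%rd2];
        add.s32            %r3, %r2, 1;
        st.global.u32      [%rd2], %r3;
        ret;
    }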

State spaces

Load and store commands refer to one of several distinct state spaces (memory banks), e.g. ld.param. There are eight state spaces:

    .reg     registers
    .sreg    special, read-only, platform-specific registers
    .const   shared, read-only memory
    .global  global memory, shared by all threads
    .local   local memory, private to each thread
    .param   parameters passed to the kernel
    .shared  memory shared between threads in a block
    .tex     global texture memory (deprecated)

Shared memory is declared in the PTX file via lines at the start of the form:

    .shared .align 8 .b8 pbatch_cache[15744];    // define 15,744 bytes, aligned to an 8-byte boundary

Writing kernels in PTX requires explicitly registering PTX modules via the CUDA Driver API, which is typically more cumbersome than using the CUDA Runtime API and Nvidia's CUDA compiler, nvcc. The GPU Ocelot project provided an API to register PTX modules alongside CUDA Runtime API kernel invocations, though GPU Ocelot is no longer actively maintained.
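To make the Driver API flow concrete, here is a minimal sketch, not from the original article: it registers a PTX module held in a string and launches a kernel named add_one from it. The variable names, kernel name, launch configuration, and the omission of error checking are all simplifying assumptions.

    #include <cuda.h>

    extern const char *ptx_source;   // assumed to hold the PTX text, e.g. read from a .ptx file

    int main(void)
    {
        CUdevice dev;
        CUcontext ctx;
        CUmodule mod;
        CUfunction fn;
        CUdeviceptr buf;
        int n = 256;

        cuInit(0);
        cuDeviceGet(&dev, 0);
        cuCtxCreate(&ctx, 0, dev);

        // Register the PTX module; the driver JIT-compiles it to device binary.
        cuModuleLoadData(&mod, ptx_source);
        cuModuleGetFunction(&fn, mod, "add_one");

        cuMemAlloc(&buf, n * sizeof(int));
        void *args[] = { &buf };

        // One block of n threads; each thread handles one array element.
        cuLaunchKernel(fn, 1, 1, 1, n, 1, 1, 0, NULL, args, NULL);
        cuCtxSynchronize();

        cuMemFree(buf);
        cuModuleUnload(mod);
        cuCtxDestroy(ctx);
        return 0;
    }

Each of these steps is handled implicitly by the CUDA Runtime when kernels are compiled with nvcc, which is why the Runtime API is usually the less cumbersome route.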
