Implicit call-graph duplication could reduce explicit code duplication in C++ AMP

15 November 2011

Since my previous blog post about C++ AMP and how it would benefit from the call-graph duplication implemented in Offload C++, Microsoft have released another C++ AMP demo that highlights an important pattern in current multicore software: separate implementations of the same functionality for different processors. The following two functions (one for the CPU and one for the GPU), taken from the Microsoft blog post in question, implement virtually the same functionality, except that the GPU implementation is annotated with restrict(direct3d) for overloading and calls a GPU-specific, high-performance equivalent of exp.

//----------------------------------------------------------------------------
// GPU implementation - Call value at period t : V(t) = S(t) - X
//----------------------------------------------------------------------------
float expiry_call_value(float s, float x, float vdt, int t) restrict(direct3d)
{
	float d = s * direct3d::fast_exp(vdt * (2.0f * t - NUM_STEPS)) - x;
	return (d > 0) ? d : 0;
}

//----------------------------------------------------------------------------
// CPU implementation - Call value at period t : V(t) = S(t) - X
//----------------------------------------------------------------------------
float expiry_call_value(float s, float x, float vdt, int t)
{
	float d = s * ::exp(vdt * (2.0f * t - NUM_STEPS)) - x;
	return (d > 0) ? d : 0.0f;
}

To make the similarity between the GPU and CPU functions even more obvious, the GPU version could be refactored to have the same body as the CPU version by overloading ::exp with a GPU version:

static inline float exp(float arg) restrict(direct3d)  {
      return direct3d::fast_exp(arg); //just calls the fast GPU version
}

Now the GPU function is exactly the same as the CPU function except for the restrict(direct3d) directive:

float expiry_call_value(float s, float x, float vdt, int t)  restrict(direct3d)
{
	// ::exp resolves to the GPU overload because the call is inside a restrict(direct3d) context.
	float d = s * ::exp(vdt * (2.0f * t - NUM_STEPS)) - x;
	return (d > 0) ? d : 0.0f;
}
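The idea of one body resolving to a different exp depending on its context can be approximated in standard C++, which may help readers without a C++ AMP compiler. The following is only a sketch under stated assumptions: the tags cpu_target and gpu_target, the helper exp_on, and fast_exp_stub (a stand-in for direct3d::fast_exp) are hypothetical names, and NUM_STEPS is given an arbitrary value because the excerpt does not show it. An explicit template parameter plays the role that the restrict(direct3d) context plays in C++ AMP.

```cpp
#include <cassert>
#include <cmath>

constexpr int NUM_STEPS = 8;  // assumed value; not shown in the excerpt

struct cpu_target {};  // hypothetical tag: stands in for "no restrict"
struct gpu_target {};  // hypothetical tag: stands in for restrict(direct3d)

// Stand-in for direct3d::fast_exp so the sketch builds without C++ AMP.
inline float fast_exp_stub(float arg) { return std::exp(arg); }

// Two overloads of the same name, selected by the target tag.
inline float exp_on(cpu_target, float arg) { return std::exp(arg); }
inline float exp_on(gpu_target, float arg) { return fast_exp_stub(arg); }

// One shared body; Target plays the role of the calling context.
template <typename Target>
float expiry_call_value(float s, float x, float vdt, int t)
{
    float d = s * exp_on(Target{}, vdt * (2.0f * t - NUM_STEPS)) - x;
    return (d > 0) ? d : 0.0f;
}

// Small usage check: with t = 4 the exponent is zero, so exp yields 1
// and the result reduces to max(s - x, 0).
inline void check_expiry_call_value()
{
    assert(expiry_call_value<cpu_target>(10.0f, 3.0f, 0.1f, 4) == 7.0f);
    assert(expiry_call_value<gpu_target>(10.0f, 3.0f, 0.1f, 4) == 7.0f);
    assert(expiry_call_value<cpu_target>(1.0f, 3.0f, 0.1f, 4) == 0.0f);
}
```

Unlike restrict-based overloading, this emulation requires the caller to name the target explicitly, and every function in the call graph would have to be written as a template.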

The problem is that having to maintain separate libraries with the same functionality for CPU and GPU leads to the usual, well-known problems of code duplication. Call-graph duplication could address this problem in C++ AMP by not requiring library functions to be annotated for accelerators (such as the GPU) with "restrict(direct3d)". When a normal function is called from inside a "restrict(direct3d)" context, the compiler generates a "restrict(direct3d)" annotated copy of that function and of all functions it calls directly and indirectly, hence the name call-graph duplication. This allows CPU and GPU to share the same source base, while performance can still be tweaked by adding fast GPU overloads where required.

Granted, the above code is very simple, and in this example the two copies could be generated from a single definition using macros. However, not only do macros limit the debuggability of the code they generate, the power of call-graph duplication really shows on large, very complex call graphs involving argument pointers to different memory types (See issues with pointer types). There the duplication is driven behind the scenes by the (pointer) types of the arguments, something that could not be done by macros because they are not part of the C++ type system.

Call-graph duplication is part of the Offload C++ specification implemented by Codeplay's Offload compiler, which was used to boost the performance of AI and visual effects in the AAA PS3 title NASCAR The Game.
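For comparison, the macro alternative mentioned above might look roughly like the following. This is a sketch in standard C++ rather than C++ AMP: the macro DEFINE_EXPIRY_CALL_VALUE, the generated function names and fast_exp_stub (a stand-in for direct3d::fast_exp) are hypothetical, and NUM_STEPS is given an assumed value. In real C++ AMP code the macro would presumably take the restrict clause as a parameter instead of a distinct function name.

```cpp
#include <cassert>
#include <cmath>

constexpr int NUM_STEPS = 8;  // assumed value; not shown in the excerpt

// Stand-in for direct3d::fast_exp so the sketch builds without C++ AMP.
inline float fast_exp_stub(float arg) { return std::exp(arg); }

// Stamps out one copy of the function per use.
// NAME is the generated function name, EXP_FN the exponential to call.
#define DEFINE_EXPIRY_CALL_VALUE(NAME, EXP_FN)                      \
    inline float NAME(float s, float x, float vdt, int t)           \
    {                                                               \
        float d = s * EXP_FN(vdt * (2.0f * t - NUM_STEPS)) - x;     \
        return (d > 0) ? d : 0.0f;                                  \
    }

// One textual definition, two generated functions.
DEFINE_EXPIRY_CALL_VALUE(expiry_call_value_cpu, std::exp)
DEFINE_EXPIRY_CALL_VALUE(expiry_call_value_gpu, fast_exp_stub)

// Small usage check: with t = 4 the exponent is zero, so both copies
// return max(s - x, 0).
inline void check_macro_copies()
{
    assert(expiry_call_value_cpu(10.0f, 3.0f, 0.1f, 4) == 7.0f);
    assert(expiry_call_value_gpu(10.0f, 3.0f, 0.1f, 4) == 7.0f);
}
```

Even in this small case the drawbacks are visible: a debugger sees the generated functions as a single macro expansion, and each function in a deeper call graph would need its own macro, with nothing in the type system deciding which copy to call.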

Codeplay Software Ltd has published this article only as an opinion piece. Although every effort has been made to ensure the information contained in this post is accurate and reliable, Codeplay cannot and does not guarantee the accuracy, validity or completeness of this information. The information contained within this blog is provided "as is" without any representations or warranties, expressed or implied. Codeplay Software Ltd makes no representations or warranties in relation to the information in this post.

Uwe Dolinsky

Chief Scientist