Expert Post: Using Template Sorcery To Implement SYCL Interoperability

Posted on January 30, 2018 by Christopher Di Bella.

Whilst developing ComputeCpp, our implementation of the SYCL standard, we've come across some really interesting challenges. One of these is how we ensure C++ fundamental types are translated correctly from SYCL code through to OpenCL, retaining their correct size and signedness.

The SYCL programming model brings together the C++ programming language and the OpenCL platform for programming heterogeneous systems, to provide the portability of OpenCL with modern C++. However for a particular feature of OpenCL; built-in math functions such as abs() or max(), this introduces an interesting challenge. As SYCL is a C++ programming model, these functions should provide a C++ interface which takes standard C++ types such as char, int or long long, but the interface for the OpenCL built-in functions take OpenCL types such as cl_char, cl_int and cl_long. The challenge arises because ISO C++ and OpenCL make different guarantees as to the sizes and signedness of the types. This means that the SYCL interface must map to the corresponding OpenCL interface in such a way as to the size and signedness of the parameters is maintained.

If you've been writing C++ for a while, you probably know that the number of bytes in short through long long are implementation-defined. That means, given a few light constraints, it is up to the compiler to decide how many bytes are in each of the types. We're also promised that there are at least eight bits in a char, but nothing more is guaranteed by the C++ Standard. char is a distinct type to signed char and unsigned char (this is different to C, where the implementation decides char will either be signed char or unsigned char ). Finally, float, double, and long double are also implementation defined, but usually map to IEEE 754 single-precision and double-precision floating-point numbers, respectively.

This is problematic for OpenCL -- and hence SYCL -- as well as OpenGL. Since the details slightly differ for OpenCL and OpenGL, we'll focus on OpenCL and SYCL, and leave the OpenGL equivalent as an exercise for the reader.


OpenCL C defines a number of scalar types:

  • cl_char (8-bit signed integer)
  • cl_uchar (8-bit unsigned integer)
  • cl_short (16-bit signed integer)
  • cl_ushort (16-bit unsigned integer)
  • cl_int (32-bit signed integer)
  • cl_uint (32-bit unsigned integer)
  • cl_long (64-bit signed integer)
  • cl_ulong (64-bit unsigned integer)
  • cl_half (IEEE 754 half-precision floating-point)
  • cl_float (IEEE 754 single-precision floating-point), and
  • cl_double (IEEE 754 double-precision floating-point).

These scalar types are quite well-defined by the specification; unlike C++, where many of the details are left to the implementation. Another interesting point to note is that OpenCL C does not have three character types: cl_char is a signed integral, and cl_uchar is an unsigned integral.

Let's now define an OpenCL C function.

cl_int square(cl_int x)
{
   return x * x;
}

The square function represents bindings between OpenCL C built-in functions and how they can be directly used in SYCL. OpenCL C built-in functions are overloaded, and thus need to be mangled. This example demonstrates CXX Itanium mangling, but it's possible to use other mangling specifications also. We manually mangle them (as shown below) because we're calling them from C++, and because we don't know the underlying definition of the cl_* types. Since SYCL is a C++ programming model, we'd like to port the above code to C++ for general-purpose use. OpenCL provides a built-in square function for us to call.

// We use an `extern "C"` block to tell the compiler that the stuff inside has C-linkage, not C++-linkage.
extern "C" {
// OpenCL C function names as the C++ compiler sees them
// These might be compiled to x86_64, SPIR, SPIR-V, or PTX, when using ComputeCpp.
cl_char _Zsquarei8(cl_char);
cl_short _Zsquaraei16(cl_short);
cl_int _Zsquarei32(cl_int);
cl_long _Zsquarei64(cl_long);
cl_half _Zsquaref16(cl_half);
cl_float _Zsquaref32(cl_float);
cl_double _Zsquaref64(cl_double);
} // extern "C"

namespace cl { namespace sycl {
namespace detail {
inline cl_char square(const cl_char x) noexcept
{
return ::_Zsquarei8(x);
}

inline cl_short square(const cl_short x) noexcept
{
return ::_Zsquarei16(x);
}

inline cl_int square(const cl_int) noexcept
{
return ::_Zsquarei32(x);
}

inline cl_long square(const cl_long) noexcept
{
return ::_Zsquarei64(x);
}
} // namespace detail

template <typename T>
T square(const T t) noexcept
{
return ::cl::sycl::detail::square(t);
}
}} // namespace cl::sycl

Now, let's jump to a bit of user code that wants to employ this.

auto v = []{
constexpr auto size = 65535; // explained in the paragraph below
auto result = std::vector<int>(size, 0);
std::iota(begin(v), end(v), -size / 2);
return result;
}();
std::transform(std::par_unseq, begin(v), end(v), begin(v), cl::sycl::square<int>);

This code will work when you're running on exactly one implementation (e.g. x86_64), because int will map to one of cl_short , cl_int , or cl_long . If we add a second implementation (e.g. MSP430), int maps to different types when compiling with the same version of GCC. The C++ we've written is semantically valid for one implementation, when we know the details for the implementation we're coding against, but not for many. While 65535 2 fits in the range for an int on GCC-compiled x86_64 code, there is signed integral overflow on our MSP430; as prescribed by the C++ International Standard, this behavior is undefined[1]. Furthermore, different implementations for the same target are not necessarily consistent. For example, both Clang and GCC on x86_64 treat long as an 8-byte integer, but long is only four bytes on MSVC++[3]!

How do we guard against such madness? By using templates, of course!

template <typename T>
T square(const T t)
{
   return static_cast<T>(::cl::sycl::detail::square(::cl::sycl::detail::cpp_to_cl_cast(t)));
}

In reality, what we need to do is convert the parameter from its C++ type (which we can't control) to an OpenCL equivalent (which we can control), and then back to the original type (which the user expects). cl_cast is a bit tricky, so let's go slowly.

namespace cl { namespace sycl { namespace detail {
// std::conditional_t is from C++14, but SYCL 1.2.1 targets C++11, so we need to define this ourselves. template <bool B, typename T, typename F> using conditional_t = typename std::conditional<B, T, F>::type; template <typename F> using opencl_floating_t =
conditional_t<sizeof(F) == 2, cl_half,
conditional_t<sizeof(F) == 4, cl_float,
cl_double>>;
template <typename I, typename I8, typename I16, typename I32, typename I64> using opencl_integer_t = conditional_t<sizeof(I) == 1, I8, conditional_t<sizeof(I) == 2, I16, conditional_t<sizeof(I) == 4, I32,
I64>>>; template <typename I> using opencl_signed_integer_t = opencl_integer_t<I, cl_char, cl_short, cl_int, cl_long>; template <typename I> using opencl_unsigned_integer_t = opencl_integer_t<I, cl_uchar, cl_ushort, cl_uint, cl_ulong>; template <typename T> using opencl_t = conditional_t<std::is_integral<T>::value, conditional_t<std::is_signed<T>::value, opencl_signed_integer_t<T>,
opencl_unsigned_integer_t<T>>, opencl_floating_t<T>>;

We have here a series of type functions that take your C++ fundamental type as a parameter, and deduce the equivalent OpenCL scalar type.

  • opencl_floating_point_t uses conditional_t to work out if we've got a two-byte or a four-byte floating-point type. In the case that we have something other than a two-byte or four-byte floating-point number, we assume that it is eight-bytes. We'll come back to working out what happens if we have a larger type.
  • opencl_integer_t is very much the same as opencl_floating_point_t, but because we have signed and unsigned types to worry about, we parameterize the types, and delegate the actual types to opencl_signed_integer_t and opencl_unsigned_integer_t.
  • opencl_t is an abstraction over opencl_floating_point_t and opencl_integer_t. It helps simplify the next bit.

Now that we've provided our type function that deduces the type the user wants, let's write our cpp_to_cl_cast.

   template <typename T>
   constexpr auto cpp_to_cl_cast(const T t) noexcept -> opencl_t<T>
   {
      return static_cast<opencl_t<T>>(t);
   }
  
}}} // namespace cl::sycl::detail

Wow, that was simple! All we've done is ask the compiler to work out which type we want. Turning on optimisations will also inline our cpp_to_cl_cast.


Constraining types

We've worked out a neat conversion function, but when we pass in our own type, we run into some serious compiler issues. Since we're choosing to stick with C++11 in this demo, the Concepts TS isn't available to us, and we need to do a bit more hacking using type functions.

Let's now introduce a type function called enable_if . enable_if<B, T> takes a Boolean expression and a type, and behaves very much like a switch does. If the expression is true, then we get the type. If it isn't, then we do not.

This enable_if type function lets us employ something called Substitution Failure Is Not An Error (SFINAE for short). In not so many words, this means that instead of getting an error when a template fails to substitute a parameter, it moves on to the next template. Only when there are no templates left, do we get an error. SFINAE is ugly, but until we're able to use Concepts, we don't have many options.

template <typename T>
using opencl_t =
   enable_if_t<(std::is_integral<T>::value || std::numeric_limits<T>::is_iec559) && sizeof(T) <= 8,
      conditional_t<std::is_integral<T>::value,
         conditional_t<std::is_signed<T>::value, opencl_signed_integer_t<T>, opencl_unsigned_integer_t<T>>,
         opencl_floating_t<T>>>;

The constraint that we've imposed here is that opencl_t must be passed an integer or an IEEE 754 floating-point number, and that the parameter must be at most 8 bytes in size. You might be wondering just where the IEEE 754 check is being made: it turns out that there is a publication called IEC 559:1989, which is a publication identical to IEEE 754, but under the IEC banner[2]. Since C++ is an ISO/IEC International Standard, it makes sense that the C++ will refer to the IEC code, rather than the commonly known IEEE code (however frustrating this may be).

We should probably add this to our square too, but that will be left as an exercise for the reader.

Character types with more than eight bits

C++11 mandates that char has at least eight bits, but implementations are permitted to have larger character types. OpenCL, and hence SYCL, doesn't play with this, so we add a constraint on this as well:

static_assert(CHAR_BIT == 8, "SYCL requires char to have exactly 8 bits.");

Now that we've come to the end of the tutorial, you might be wondering how this all compiles. Thankfully, we can check out how various devices compile our code using Compiler Explorer. Our example has been compiled nine times:

  • GCC 7 for x86_64
  • Clang 5 for x86_64
  • GCC 6 for PowerPC 64
  • GCC 6 for MSP430
  • GCC 6 for ARM64
  • GCC 7 for ARM
  • Intel C++ Compiler 18 for x86_64
  • GCC 5 for MIPS 64
  • GCC 5 for MIPS

Conclusion

You might think this is all a bit much, and when I first started, that's what I thought too. Once you consider that your code is going to run on many implementations and thus be compiled by many compilers, you need to start considering the flexible rules that C++ offers implementation developers. What your int maps to on your CPU might be different to your GPU, which might be different again to your MSP430-based device. People don't like to be told which types to use in cases where fundamental types are freely available, so requiring everyone to use cl_int, etc., is a bad idea. They might also have to work with other libraries that aren't related OpenCL (and so they don't use OpenCL scalar types). cpp_to_cl_cast is a nice way of letting us continue to use fundamental types consistently and still use OpenCL built-in functions.

Fortunately, application programmers are protected by SYCL implementers: we implementers take the time out to write the conversion code, and give you the nice cl::sycl::square function to call. That's where all the magic happens.

So in summary, if you're an application developer, you might have learnt a little about how SYCL works under the hood. If you're looking to implement SYCL, you might have found a way to get the compiler to do your lifting for you.

References

  1. ISO/IEC JTC1/SC22/WG21. N3337 Working Draft, Standard for Programming Language C++. §3.9/4, page 69. Published 2012-01-16. Retrieved from http://www.openstd.org/jtc1/sc22/wg21/docs/papers/2012/n3337.pdf.
  2. ISO/IEC JTC1/SC22/WG14. N1605 Information Technology — Programming languages, their environments, and system software interfaces — Floating-point extensions for C — Part 1: Binary floating-point arithmetic. Page iv. Published 2012-03-16. Retrived from: http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1605.pdf. Date retrieved: 2018-01-17.
  3. Microsoft Developer Network. Fundamental Types (C++). Retrieved from: https://msdn.microsoft.com/en-us/library/cc953fe1.aspx. Date retrieved: 2018-01-29.