Buffer Reinterpret: Viewing data from a different perspective

Posted on March 9, 2018 by Marios Katsigiannis.

SYCL developers are used to strongly typed buffers, which ensure type safety for data manipulation and minimize the risk of accessing out of bounds memory. This is one of the main features of SYCL not available in OpenCL . Users, however, sometimes need more freedom to operate on their data. They might know that buffers of one type can be transformed into buffers of another type, similar to C++'s reinterpret_cast feature. There are many arguments, for and against such action, from users. As SYCL implementers, we can provide some additional guarantees to such a conversion. One of the latest additions to the SYCL 1.2.1 specification, is a feature called Buffer Reinterpret! It is similar to C++ reinterpret_cast, yet guarantees either both type and bounds safety, or graceful failure through typed exceptions.

Buffer Reinterpret comes as a method of the cl::sycl::buffer class with the following signature:

Buffer reinterpret interface

namespace cl {
namespace sycl {
template< typename T,
size_t dimensions = 1,
typename AllocatorT = cl::sycl::default_allocator<T>>
class buffer {
...
template < typename ReinterpretT,
int ReinterpretDim>
buffer< ReinterpretT,
ReinterpretDimensions,
AllocatorT>
reinterpret(
range<ReinterpretDimensions> reinterpretRange)
const;
...
}
}
}

A reinterpreted buffer is a new view of the same underlying SYCL buffer object., sharing host data and OpenCL resources, yet the type associated with the buffer is changed from T to ReinterpretT. In addition, the shape and the type of the buffer can be modified to the users need, but the total number of bytes must remain the same. As such, the buffer remains strongly typed, yet ensures correct bounds checking, while operating with a different type.

The SYCL 1.2.1 specification places the following constraints for this feature:

  1. The total size of the reinterpreted buffer, calculated as sizeof(ReinterpretT) * reinterpretRange[0] * reinterpretRange[1] * reinterpretRange[2] , should be equal to the total size of the original, calculated as sizeof(T) * get_range(0) * get_range(1) * get_range(2). The get_size() function of the reinterpreted should return a value equal to the get_size() of the original.
  2. If the size does not match, a cl::sycl::invalid_object_error exception will be thrown.
    • ComputeCpp CE 0.6.1 currently does not support throwing typed exceptions as indicated by the SYCL specification. This is an upcoming feature of a future version.
  3. The reinterpreted buffer will act like a copy of the original buffer.
      • This is not a deep copy, as the reinterpreted buffer references the data of the original buffer.
      • As such, the binary representation of the data in the original buffer will be used for the reinterpreted buffer's new data type.
        • e.g. When reinterpreting cl::sycl::buffer<uint8_t>(cl::sycl::range<1>(4)) containing {0xAB, 0xCD, 0xEF, 0x01} to cl::sycl::buffer<uint16_t>(cl::sycl::range<1>(2)) , the contents of the reinterpreted buffer will be viewed as { 0xABCD, 0xEF01 } on Intel CPU, Intel GPU, or any x86/x86_64 host.
      • Please be advised, in case of different endian-ness between the devices used and the host machine, the representation may vary between them. In such edge cases, the user is responsible for converting to a format that can be usable in all cases.

We have tested our implementation on many test cases, on Intel CPU and Intel GPU devices, as well as supported AMD devices. ComputeCPP CE 0.6.1 has a small bug that does not allow communication between reinterpreted and original buffers. It only occurs when mixing access between device and host, i.e. accessing on host, when a change has taken place on a device. For those cases there is a workaround that enforces the required communication. If a change takes place on host, then all buffers have the latest version of the data available.

The cases where a workaround is needed are when using the following:

  1. A runtime managed buffer with no user provided data, sharing data with reinterpreted. Data is set using the original buffer on the device and the change is visible on the reinterpreted buffer on host.
  2. A runtime managed buffer with no user provided data, sharing data with reinterpreted. Data is set using the reinterpreted buffer on device and the change is visible on the original buffer on host.
  3. A buffer with user-provided data, sharing data with reinterpreted. Data is set using the original buffer on the device and the change is visible on the reinterpreted buffer on host.
  4. A buffer with user-provided data, sharing data with reinterpreted. Data is set using the reinterpreted buffer on device and the change is visible on the original buffer on host.

    Step-by-step guide

    For normal reinterpret usage:

    1. Create the original buffer.
      • When using the constructor that accepts a user pointer for the initial data, initialized data will be available to the reinterpreted buffers right away.

      Original buffer
      constexpr size_t NUM = 20;
      cl::sycl::buffer<uint8_t> originalBuffer {cl::sycl::range<1>(NUM)};
    2. Use the original buffer, as usual.
    3. Create a reinterpreted buffer using:
      • At the point of creation of the reinterpreted buffer, both buffers should point to the same data.

      Reinterpreting
      // will work if (sizeof(float) == 4 && sizeof(uint8_t) == 1) otherwise it will throw an exception
      auto buffer2 = originalBuffer.reinterpret<float>(cl::sycl::range<1>(NUM / 4));
    4. Use both buffers as required. The original buffer can now be destroyed if needed.

      • Utilize the workaround outlined below to ensure copy-back from the buffer you used to the other buffers (original/reinterpreted)

    Examples

    Example 1: Buffer reinterpret on host

    {
    constexpr size_t size = 12;
    buffer<uint8_t> initial{range<1>(size)};
    {
    auto initialAcc =
    initial.get_access<access::mode::write>();
    for (uint8_t i = 0; i < size; ++i) {
    initialAcc[i] = i;
    }
    }
    auto reint = initial.reinterpret<int8_t>(range<1>(size));
    auto reintAcc =
    reint.get_access<access::mode::read>();
    for (size_t i = 0; i < size; ++i) {
    if(reintAcc[i] != i)
    std::cout << "Error, data change is not visible in the reinterpreted buffer"
    }
    }
     

    Example 2: Buffer Reinterpret on device

    {
    constexpr size_t size = 12;
    std::array<bool, size> success{false, false, false, false, false, false,
    false, false, false, false, false, false};

    buffer<uint8_t, 1> initial{range<1>(size)};
    cl::sycl::queue q{cl::sycl::default_selector()};
    // setup test data using the initial buffer accessor
    q.submit([&](cl::sycl::handler& cgh) {
    auto initialAcc = initial.get_access<access::mode::write,
    access::target::global_buffer>(cgh);
    cgh.parallel_for<class device_kernel_1>(range<1>(12), [=](id<1> i) {
    initialAcc[i] = static_cast<int8_t>(i.get(0));
    });
    });
    q.wait_and_throw();
    auto reint = initial.reinterpret<int8_t>(range<1>(size));
    {
    buffer<bool, 1> successBuffer(success.data(),
    cl::sycl::range<1>(success.size()));
    // read data using the reinterpreted buffer and verify them
    q.submit([&](cl::sycl::handler& cgh) {
    auto reintAcc =
    reint.get_access<access::mode::read, access::target::global_buffer>(cgh);
    auto success_acc =
    successBuffer
    .get_access<access::mode::write, access::target::global_buffer>(cgh);
    cgh.parallel_for<class device_kernel_2>(range<1>(size), [=](id<1> i) {
    success_acc[i] = (reintAcc[i] == i.get(0));
    });
    });
    q.wait_and_throw();
    }
    for (size_t i = 0; i < size; ++i) {
    if(!success[i])
    std::cout << "Error, data change is not visible in the reinterpreted buffer"
    }
    }
    }

    For ComputeCPP CE 0.6.1 due to the aforementioned bug, when updating any buffer on the device and reading the original or reinterpreted buffer on host, the following copy-back workaround will be required. Please note that this bug only affects ComputeCPP CE 0.6.1 and the workaround is not valid SYCL code. Later versions of ComputeCPP will not need this workaround:

    1. Create a scope, outside the submit statement.
    2. Inside the scope, create a host accessor for the buffer updated inside the submit statement
    3. Close the scope.
    4. At this point the original buffer and all the reinterpret ones will be able to view the last updates.

    Workaround Example

    cl::sycl::buffer<uint8_t> originalBuffer {cl::sycl::range<1>(NUM)};
    auto buffer2 = originalBuffer.reinterpret<float>(cl::sycl::range<1>(NUM / 4));
    //do stuff in the OpenCL with buffer2
    { //if you used buffer2
    auto buffer2Acc = buffer2.get_access<cl::sycl::access::mode::read>();
    } // now originalBuffer is updated