I am writing this blog long after my trip to the Kona C++ standard meeting because of unusually heavy business commitments after the meeting, and I am using it as an opportunity to also look ahead to the C++20 content to be reviewed in Toronto. I will publish my usual update to the C++17 content slide deck, similar to my Dec 2016 Issaquah trip report. It will contain the final scorecard for C++17, including all the features with links, and an evaluation of what made it in against what Bjarne suggested in 2015 as possible content for C++17.
The March Kona C++ Standard meeting was the last meeting at which we disposed of all remaining National Body (NB) comments, after which we integrated all changes into the C++17 Draft International Standard (DIS). This DIS will be voted on by electronic ballot after the meeting. The Toronto meeting will be held 10-15 July, and there we will not be allowed to discuss anything on C++17. Instead the committee will focus on C++20, and more specifically on any in-progress Technical Specifications (TSes) that need more attention.
This blog will summarize the results of the Kona meeting and talk about what is upcoming in Toronto, drawing on my latest keynote, given at the Italian C++ Users group meeting and at Scuderia Ferrari in Maranello, on the Programming Model for Self-driving cars and SYCL. Marco Arena will be posting a video of that keynote. In that talk I presented the final status of C++17 and discussed the status of executors, the work of a small SG1 group I have been working with to build up the future of heterogeneous computing in ISO C++.
The Kona meeting was all about clearing the remaining National Body comments, so there were no significant changes to C++17. Most of the changes are minor fixes. A complete table of how each National Body comment was treated can be found at the end of this blog.
We had hoped that the National Body ballot result would be ready for the Toronto meeting on 10-15 July, which I will be hosting, but in a last twist of fate it will be delayed until after the meeting because a few weeks of translation effort are needed before the ballot can begin. As the translation adds an unanticipated amount of time, and the ballot itself still takes 3 months, the result unfortunately won't be ready for Toronto. Previously, for C++11 and C++14, ballot results were ready by the July meeting. Most of what remains is now procedural, and the committee will focus completely on C++20 in Toronto. It will take some time for ISO to complete the publication process, but there is still a reasonable expectation that it will be completed by the end of the year, so that the standard remains C++17. There is some anticipation to see if procedural delays would make this become C++18 :) but I do not think this will happen.
It is useful at this point to look at the status of the various projects, to see what has changed since the Issaquah meeting, and to consider what is likely to make it into C++20. Of the Technical Specifications (TSes) heading towards C++20, the following are ready to go, and there is already interest at the Toronto meeting in either merging various TSes into the C++20 draft or advancing them to their next stage as a TS.
Concepts was published in November 2015 (ISO Store) and a final draft exists as n4553, but there is only one implementation (GCC 6), and the subsequent changes from the various national bodies' feedback have slowly deviated from the original implementation, though not significantly. The clang implementation still remains far behind, and in recent discussions there have been concerns about the lack of ability to future-proof the design. For example, a part of the original concepts design known as separate checking is missing, and there is concern about the C++ standard library becoming conceptified. However, recent opinion is turning towards accepting Concepts once all defects have been processed, and separate checking is likely to be discarded as it is impeding the acceptance of a perfectly good feature. The reality for the supporters of separate checking is that without it, Concepts will not provide 100% error reporting on templates, because errors on the template definition side can still go undetected. Despite the lack of separate checking, I predict the current Concepts TS will be incorporated into C++20 once all the defects are processed this year or next.
Transactional Memory (SG5) is my group, and we published in September 2015 (ISO Store), with a final draft n4514 published in May 2015. We continue to meet regularly and are currently discussing the effects of shared_ptr inside a transaction, because of how often shared_ptr exists in real code.
The central question we have been exploring is how fast atomics and smart pointers can be outside a transaction if they have to interact with a transaction. One possible way to make shared_ptr safe to use inside a transaction is to convert every non-transactional shared_ptr operation (code outside of transactions) into a mini-transaction containing just that operation. This way the transactional shared_ptr knows how to interact with them. The cost is that non-transactional code slows down, even code that never interacts with transactional code as originally written (say inside a library). Slowing the rest of the non-transactional code down feels sub-optimal to our SG and is unlikely to be accepted by the rest of the C++ committee. The main discussion points are the following.
a) Could we have a special kind of transactional shared pointer? The group said we didn’t want this if we could avoid it as it would require extra syntax.
b) Could we have mini-transactions for all shared pointer operations? This would definitely slow everything down.
A third option being explored is deferring the destruction but our group felt that the semantics are not well-defined.
The discussion continues within SG5, but if there is a chance of C++20 having transactional memory, then it will likely be just the basic synchronized construct, which offers a simple implementation for lock replacement (though it may not scale at extreme numbers of threads). This synchronized construct differs from the full atomic construct in that it does not roll back.
Library Fundamentals 2
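The lock-replacement role of the synchronized construct can be sketched in standard C++ without any transactional memory support. The sketch below does not use the TS syntax. It assumes that a synchronized block may be emulated by a single global recursive mutex, and the names `global_sync_mutex`, `synchronized_do` and `count_with_synchronized` are hypothetical helpers invented for this illustration.

```cpp
#include <cassert>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical helper (not part of the TS): one global recursive mutex
// standing in for whatever a real implementation would use.
inline std::recursive_mutex& global_sync_mutex() {
    static std::recursive_mutex m;
    return m;
}

// Hypothetical emulation of a synchronized block: run the callable while
// holding the global mutex. There is no rollback; the body runs exactly
// once, to completion.
template <typename F>
void synchronized_do(F&& body) {
    std::lock_guard<std::recursive_mutex> guard(global_sync_mutex());
    body();
}

// Increments a shared counter from several threads and returns the total.
inline long count_with_synchronized(int threads, int increments_per_thread) {
    assert(threads >= 0 && increments_per_thread >= 0);
    long counter = 0;
    std::vector<std::thread> pool;
    for (int t = 0; t < threads; ++t) {
        pool.emplace_back([&counter, increments_per_thread] {
            for (int i = 0; i < increments_per_thread; ++i) {
                synchronized_do([&counter] { ++counter; });
            }
        });
    }
    for (auto& th : pool) th.join();
    return counter;
}
```

Calling `count_with_synchronized(4, 1000)` never loses an increment because every update is serialized, which is also why this style may not scale: all threads contend on the same lock.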
This TS advanced to being published out of the Kona meeting in March of this year (ISO Store) with a draft n4617 available in November 2016 containing source code information capture and various utilities. No further work will be done on this and it is a prime candidate to be advanced into C++20.
Ranges also advanced to a Preliminary Draft Technical Specification (PDTS) in Kona, with draft n4651 available in March this year. Resolution of comments on the PDTS is now in active progress, and it is hoped that we can complete the ballot comment resolution in Toronto and advance this to publication. Once publication occurs it can be a candidate for entry into C++20, but much of this depends on Concepts advancing into the C++20 draft. Without that, Ranges is unlikely to be able to enter C++20. At Codeplay, we are also working on an interesting project on the intersection of ranges with Parallel STL, which we have implemented using SYCL on CPUs and GPUs. Expect to hear more on that.
Networking has advanced to a PDTS in Kona with draft n4656 (2017-03-17). It contains a sockets library that is based on Boost.ASIO. Some parts of it also contain an executor-like system that is being incorporated into the executor discussions. Resolution of the comments on the PDTS is now in active progress, and it is hoped that we can complete the ballot comment resolution in Toronto and advance this to publication. Once that is completed, there is every reason to see this advance into the C++20 draft.
Modules is still in development, with the latest draft n4647 available in March this year. The first version is based mostly on Microsoft's design, which puts macros in a side file. The Google group has returned with comments on the design based on their experience with incorporating macros, and it is the intention of the committee to add that experience to the draft. If that works out, we will vote out a PDTS at the Toronto meeting or a subsequent one. This remains one of the most hotly requested features and every accommodation will be made to prioritize it.
I am now also the editor of the Concurrency TS, which was published in January last year (ISO Store), with a final draft available as p0159r0 from October 2015. It contains improvements to futures that make them non-blocking, unlike the current std::async futures. Latches and barriers as well as atomic smart pointers are also included in this TS. It was published too close to C++17 and there was deemed to be not enough usage experience. Most of the implementation experience comes from MS Visual Studio and HPX, and I think Anthony Williams' Just Threads! has the most complete implementation. However, two papers in the pre-Toronto mailing, P0676 and P0701, could potentially block this TS from moving ahead. They propose a new kind of future and could have the effect of modifying the TS before it is accepted into C++20, which means it could be a few meetings before that happens.
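To give a feel for what chaining work onto a future looks like, here is a minimal sketch in standard C++ only. The free function `then` and the function `chained_answer` are hypothetical names made up for this example, not the TS interface. This emulation also spends a thread blocked on the input future, which is exactly the cost a real non-blocking continuation is meant to avoid.

```cpp
#include <cassert>
#include <future>
#include <utility>

// Hypothetical free function emulating a continuation on a future.
// Assumption: we simply launch a task that waits for the input and then
// calls the continuation with its value.
template <typename T, typename F>
auto then(std::future<T> input, F continuation)
    -> std::future<decltype(continuation(std::declval<T>()))> {
    assert(input.valid());
    return std::async(std::launch::async,
                      [](std::future<T> in, F k) { return k(in.get()); },
                      std::move(input), std::move(continuation));
}

// Builds a small pipeline: produce 20, add 1, then double the result.
inline int chained_answer() {
    std::future<int> first = std::async(std::launch::async, [] { return 20; });
    auto second = then(std::move(first), [](int v) { return v + 1; });
    auto third = then(std::move(second), [](int v) { return v * 2; });
    return third.get();  // only the caller of the final result waits here
}
```

The caller describes the whole pipeline up front and waits once at the end, rather than calling `get()` between every stage.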
A PDTS was published in Kona with draft n4663 that contains a resumable function based on the Microsoft await design. National Body comments will be returned at the Toronto meeting for review, and there are now two implementations. Microsoft has always had one and Gor has recently been busy adding this to clang. This is also known as the stack-less design as it uses the heap to transfer functions. The original stack-based proposal from Linden Lab (who delivers Second Life) remains in competition, and while there has been hope for unification of both designs, it seems increasingly unlikely. More likely, we will have both designs, as they serve different domains, and both will progress independently. There is every reason to expect at least one or maybe both forms to be in C++20.
There is now an early development draft p0194r2 from October 2015 with the rationale in p0385r2 from February this year. There is an alternative design p0590r0 from February 2017. This TS is based on code introspection and has now also added code reification mechanisms. The Introspection proposal passed core language design review and the next stop is the design review of the library components. The group is targeting a Reflection TS at one of the upcoming C++ standard meetings and it too will have every chance of making C++20. Reflection has also taken on the added responsibility of driving metaprogramming, which has become popular recently based on Boost::Hana and similar libraries. Some aspect of reflection is important for heterogeneous C++.
There is now a unified Contracts proposal which has passed core language design review. The next step is a design review of the library components, and it could have its own TS or target C++20 directly. It has pre and post conditions and establishes a baseline for safety critical support in C++. On the safety critical side, I have noticed increasing industry pressure for safe C++, especially in the automotive embedded domain but also in medicine and of course cyber security. One area I am facilitating is a safety critical SG, which may join forces with SG12's Undefined Behavior work, possibly using the Core Guideline Libraries to define safety critical components for each domain (automotive, aviation, medical, cybersecurity).
This is still in early development and contains hazard pointers and read-copy-update libraries. These are lock-free programming libraries I have been working on with Maged Michael and Paul McKenney. We recently completed the wording for this. Also included are atomic views as well as other forms of concurrent data structures and queues. It is under such active development that I can’t say whether it will make C++20 but I hope it does as many of the facilities in here are in demand.
Parallelism TS version 2 takes us beyond the original Parallelism TS, which contains Parallel STL for CPUs. At Codeplay, we have implemented it not just for CPUs but also for GPUs. Version 2 is in early development, but there is a draft n4578 from February 2016 that contains task blocks and SIMD support in the form of data_par. It is under such active development that again I can't say whether it will make C++20, but the features are in demand so I hope it does.
Transactional Memory 2
More recently, because the emerging design for Executors aims to have a uniform interface for all concurrency constructs, SG5 has started discussing the design of a lambda interface for transactional memory. This will enable direct plug-ins for executors, which can control transaction-safety as part of the how of execution, while also enabling the execution context to control where transactional memory would run.
In a recent talk at the Embedded Multi-Core conference, which showcases the software system for the Infineon AURIX chipsets used in automotive electronic control units, I heard a surprising requirement for transactional memory in the domain of embedded controllers for automotive, because they require a scalable system that does not deadlock. This could provide the usage basis that enables it to be added to C++20.
This is in early development with a draft p0101 from September 2015. It provides many needed numeric facilities including:
- Built-in types such as decimal floating point
- Bounded types such as fixed point types
- Unbounded types
- Rounding and overflow
- Utilities such as overflow-detection, and multiprecision arithmetic
Some of this will make it into C++20. In particular, SG14 (Low Latency) has championed bounded types such as fixed-point numbers because they give games developers the same well-defined precision across the whole range, instead of precision that decreases as values move far away from zero.
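The precision argument for fixed point can be made concrete with a small sketch. The type `fixed16` below is a hypothetical 16.16 fixed-point type written for this illustration only. It is not taken from p0101 and it ignores rounding modes and overflow handling, which a real proposal has to specify.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical Q16.16 fixed-point type: 16 integer bits, 16 fractional bits.
// The gap between adjacent representable values is always 1/65536.
struct fixed16 {
    std::int32_t raw;  // stored as value * 65536

    static constexpr int fraction_bits = 16;

    static fixed16 from_int(std::int32_t v) {
        assert(v >= -32768 && v <= 32767);  // range of the integer part
        return fixed16{static_cast<std::int32_t>(v * 65536)};
    }
    static fixed16 from_raw(std::int32_t r) { return fixed16{r}; }
    double to_double() const { return raw / 65536.0; }
};

inline fixed16 operator+(fixed16 a, fixed16 b) { return fixed16{a.raw + b.raw}; }
inline fixed16 operator-(fixed16 a, fixed16 b) { return fixed16{a.raw - b.raw}; }
inline fixed16 operator*(fixed16 a, fixed16 b) {
    // Widen to 64 bits so the intermediate product cannot overflow, then
    // drop the extra fractional bits (assumes an arithmetic right shift).
    std::int64_t wide = static_cast<std::int64_t>(a.raw) * b.raw;
    return fixed16{static_cast<std::int32_t>(wide >> fixed16::fraction_bits)};
}
inline bool operator==(fixed16 a, fixed16 b) { return a.raw == b.raw; }

// Distance from x to the next representable value. It is one raw unit
// everywhere, near zero and far from zero alike.
inline fixed16 step_at(fixed16 x) {
    return (x + fixed16::from_raw(1)) - x;
}
```

Here `step_at` returns the same one-unit step at 0 and at 30000. A 32-bit float, by contrast, has a step of about 0.002 near 30000 but a far smaller one near zero, which is the decreasing precision games developers want to avoid in large worlds.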
This also is in early development with a draft p0267r0 from February 2015 supporting a 2D drawing API based on the stateful Cairo interface. SG14 has argued for a stateless interface and the latest version P0267 aims to add the stateless interface in Toronto. This discussion will occur during the week of the 10th July 2017.
Executors describe execution and are the first and possibly most important step towards supporting heterogeneous computing, acting as an interface between concurrency constructs and the agents/resources, one of which can be a GPU core or a SIMD unit. I have led a small group of experts from Google, NVidia, Codeplay, and NASDAQ to define a specification that enables the separation of concerns in defining where, when, and how execution is done in service of the constructs that exist in the current and future C++ standard - Concurrency, Parallelism, Transactional Memory, and Networking TS.
While originally meant as part of Concurrency TS the current thinking is to create a separate Executor TS for C++20. Work is continuing but this is one of the key cornerstones and I feel confident that it will make it into C++20. The video from the Italian C++ Users conference will describe the current state of executors, though expect that to change quickly as we iterate over the design.
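The separation of "what to run" from "where and how to run it" can be illustrated with a toy sketch. The names `inline_executor`, `new_thread_executor`, `execute`, `wait` and `sum_of_squares` are hypothetical, chosen for this example. They are not the interface of the unified executors proposal, which is still changing.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical executor: runs work immediately on the calling thread.
struct inline_executor {
    template <typename F>
    void execute(F&& work) { std::forward<F>(work)(); }
    void wait() {}  // nothing outstanding, work already ran
};

// Hypothetical executor: runs each piece of work on a new std::thread.
class new_thread_executor {
public:
    template <typename F>
    void execute(F&& work) { threads_.emplace_back(std::forward<F>(work)); }
    void wait() {
        for (auto& t : threads_) t.join();
        threads_.clear();
    }
    ~new_thread_executor() { wait(); }
private:
    std::vector<std::thread> threads_;
};

// The algorithm states what to compute; the executor decides where and how.
template <typename Executor>
int sum_of_squares(Executor& ex, int n) {
    assert(n >= 0);
    std::atomic<int> total{0};
    for (int i = 1; i <= n; ++i) {
        ex.execute([&total, i] { total += i * i; });
    }
    ex.wait();  // all submitted work has finished after this call
    return total.load();
}
```

`sum_of_squares` is unchanged whether it is handed an `inline_executor` or a `new_thread_executor`. In principle an executor that dispatches to a GPU or a SIMD unit could be swapped in the same way, which is the property that makes executors a stepping stone for heterogeneous computing.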
For a long time, C++ has been outpaced by other models such as CUDA, OpenMP, OpenCL and HSA that have offered heterogeneous computing capability to enable dispatch to GPU, DSP, FPGA and other accelerators.
SG14 is an ISO C++ SG that works on low latency in the areas of games, financial and embedded programming. One of its mandates is the support of heterogeneous programming in native C++, without having to drop down to the other models that are needed today.
Since an SG14 evening session in Jacksonville demonstrating SYCL and HPX, there has been interest in supporting a mandate to drive towards a future C++ that supports distributed and heterogeneous computing. At a later meeting, papers were submitted providing feedback on a variety of concurrency and language features from the perspective of heterogeneous devices. It was then decided that the best path forward was to complete the unified executors proposal, since this will provide a way to define where, when and how work is executed. All of the current C++ standard execution functions such as std::invoke, std::async, parallel algorithms, etc. assume the usage of std::thread on CPUs only. Progress is being made on a unified proposal for executors, and it lays the groundwork for distributed and heterogeneous computing, but executors deliberately make no attempt to support distributed and heterogeneous computing directly because that is simply too wide a scope.
Since then, there have been several papers trying to address the issues of distributed and heterogeneous computing that require a solution in the C++ standard, though none has identified all the issues faced by this domain in a complete manner. More specifically, a paper on the managed pointer identifies a possible solution for synchronizing data between different nodes. A paper on asynchronous algorithms adds asynchronous versions of algorithms alongside the current synchronous ones.
For the Toronto meeting we are proposing a meta-level paper on one of the key issues (that of data movement) for heterogeneous and distributed computing, and continuing discussions on the asynchronous managed pointers. Although we do not mandate a solution in the Data Movement paper, we are aiming at supporting channels in C++.
- Data Movement in C++: http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2017/p0687r0.pdf
- Asynchronous Managed Pointers for Heterogeneous and Distributed Computing: http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2017/p0567r1.html
Both papers are in collaboration with HPX and are based on our implementation experience in SYCL and HPX.
Finally, I would like to announce that I have been nominated to be the Chair of the SYCL standard replacing Andrew Richards who has guided the development of SYCL within Khronos to firmly establish a modern C++ style heterogeneous computing language. I will take this experience and bring it as one of the candidates to support heterogeneous ISO C++.