Sunday, 15 January 2012

The "C++ renaissance"

According to Herb Sutter, C++ (along with C and Fortran) is unmatched for performance per dollar and, therefore, he believes C++ will once again top the programming language charts as mobile devices make power consumption (which he equates with performance) a high priority again:


Herb said this in his recent lecture, Why C++?, and other Microsoft employees from the Visual C++ team have called this prophecy the "C++ Renaissance".

Similar statements about the superior performance of C++ were commonplace over 15 years ago, when Java was new and unproven, but we now know they were almost entirely wrong. The world migrated from native to managed languages over 10 years ago and never looked back because their performance is more than adequate. Furthermore, the implementations of managed languages have improved substantially since then and even toy benchmarks now show them competing with or even beating native code. Finally, the difficulty of optimizing large code bases means that many real-world applications are substantially faster when written using modern tools. For example, we recently helped a client translate 10,000 lines of numerical C++ code into F# and were easily able to make the result 10× faster than the original C++. This is typical because C++ is much more difficult to optimize, particularly in the context of multicore parallelism, and developers in the real world rarely have the time to do so if they are still using C++.

So why would anyone be promoting C++ now, when it has so many disadvantages? Several reasons, we believe. Firstly, C++0x was finally released as C++11 last year and Microsoft's Visual C++ team hope to ride a wave of hype following this. Secondly, Microsoft want to draw people back into C++ as a route to vendor lock-in. We have seen several big code bases written in "C++" using Microsoft's tools and none of them were remotely portable due to the extensive use of proprietary features. Ironically, Herb tries to play to portability in his preaching and even goes so far as to assert that C++ has the advantage of offering a single string type that is compatible with the operating system. In addition, C++ is particularly bad for metaprogramming and, consequently, it is unusually difficult to translate code mechanically from C++ to other languages. Thirdly, what else do Microsoft's Visual C++ team have left to hype?

Fortunately, this recent hype from Microsoft seems to have been largely ignored...

Here is the current trend in the UK job market for C++ developers:


Is it about to change?

EDIT

Herb Sutter responded to some of my tweets about this. His responses are remarkable.

Firstly, he states that "no way should you need barriers to get sequential consistency". This is an amazing statement because all mainstream languages (Java, .NET, C++) and all dominant CPU architectures (ARM, Intel, AMD, PowerPC) require barriers to achieve sequential consistency. This requirement is ubiquitous because relaxed memory ordering is precisely what allows latency hiding to work effectively. The paper "Realization And Performance Comparison Of Sequential And Weak Memory Consistency Models In Network-on-Chip Based Multi-Core Systems" found that performance is around 40% worse when sequential consistency is imposed. ARM's Principal Software Engineer even went so far as to call sequentially consistent memory models a "nostalgic fantasy".

In reality, ARM and PowerPC have weak memory models that allow many memory operations to be reordered and x86 has a stronger memory model that prohibits most reordering but still allows reads to be moved before independent writes and, consequently, is also not sequentially consistent. Note that Herb's dream of sequential consistency is at odds with the desire for more latency hiding that he expressed in his latest article.

Then he predicts that CPU vendors will adopt stronger memory models within 2 years. Perhaps Herb has inside knowledge of Microsoft leaning on ARM to adopt the x86 memory model in order to ease Microsoft's Windows 8 port. Only time will tell if Herb's prediction is accurate but two interesting observations can be made. Firstly, Apple have sold tens of millions of iPad 2s that run the 500,000 apps on their App Store without anyone having complained about bugs caused by the weak memory model of the multicore ARMs inside them. Secondly, Intel weakened the x86 memory model when they added SSE and, consequently, also added the LFENCE, SFENCE and MFENCE instructions. So Herb is predicting that Intel will do a U-turn and that ARM will throw away one of their largest performance advantages.

Finally, Herb asserts that "x86 is the canonical example of a strong hardware memory model". Although many people can be seen asserting that x86 has a strong memory model, many experts describe it as a weak memory model because it does reorder memory operations and does not provide sequential consistency. For example, computer science researchers in this field from the University of Cambridge describe x86 as having a weak memory model in their paper "x86-TSO: A Rigorous and Usable Programmer’s Model for x86 Multiprocessors". And computer science researchers from the University of Oxford describe it as a weak memory model in their paper "Soundness of Data Flow Analyses for Weak Memory Models".

27 comments:

Mike said...

+1 for your post. But what's the alternative in your view? Isn't F# MS lock-in as well? You even have to pay for Visual Studio 2010; no free Express Edition is available. BR

David said...

@Mike, well, contrary to C#, the F# compiler is open source, which means the Mono team (or anyone) didn't have to rewrite it for you to be able to use it on Linux.

For Express, you can install the Visual Studio Integrated Shell + F# and you get a free environment. I'd actually rather use that than my own VS Pro licence.

Dean Wampler said...

I recently wrote a high-performance network traffic server in C and Ruby; C for the performance and Ruby for everything else. We considered using C++, but rejected the idea. IMHO, the modularity features of C++ don't justify the complexity of using the language. You're far better off writing performance-critical code in C and gluing the app together with a scripting language that doesn't suck. If I had to pick one language and it had to compile natively, I would pick D or maybe Go before using C++ again.

Flying Frog Consultancy Ltd. said...

@Mike: The utopian programming language they describe does not exist so there is no alternative but there is a whole zoo of languages out there and at least one will come closest to any given set of requirements. F# is a fantastic tool for a wide variety of problems but, as you say, it is not portable. You don't have to pay for Visual Studio 2010 to use it though, as David describes.

@David: F# being open source is a good thing but, in practice, I don't see anyone developing a production-quality implementation of the CLR so that benefit is academic for now.

@Dean Wampler: Absolutely. I'm expecting someone to develop something awesome using LLVM.

kripken said...

I agree with everything but this:

> Furthermore, the implementations of managed languages have improved substantially since then and even toy benchmarks now show them competing with or even beating native code.

Sure, on toy benchmarks. But the word "even" seems misplaced there - this isn't the case on any significant number of real-world projects I've seen.

As you say, managed languages are fast enough, and have gotten closer to C and C++. I'm open to being shown data otherwise, but AFAIK they aren't matching or exceeding them.

Flying Frog Consultancy Ltd. said...

@kripken: The Successive Over-Relaxation benchmark from the SciMark2 suite is one place people have reported Java beating C/C++ (and I have verified it myself). The reason is that pointers in C and C++ make alias analysis practically impossible whereas managed languages make it trivial, so C and C++ compilers are forced to emit code containing many more memory accesses than are actually necessary. This is one of the big advantages of Fortran in the context of numerical loops.

In theory, the restrict keyword addresses the problem in C and C++ but I was not able to get competitive performance from C or C++ on that benchmark even using restrict.

There are several other places where managed languages can and do beat C and C++. Tail call elimination can provide substantial performance improvements between mutually recursive functions for which there is no equally-capable alternative in C or C++. Metaprogramming can provide huge performance improvements in, for example, regular expression engines thanks to run-time code generation and compilation which, again, vanilla C and C++ cannot do. I have tested vector code using my own HLVM project and got LLVM generating code 4× faster than C or C++ compiled with GCC. Again, there is no standard way to write SSE code in C or C++...

So there are plenty of places where managed languages can beat C and C++.

kripken said...

There are some places, sure. For example, as you mention, Fortran can outperform C and C++ because it has superior alias semantics. Some managed languages have that too, but it isn't a managed language feature (Fortran isn't managed). It's a problem with C and C++, not a benefit of managed languages.

There is no standard way to do SSE etc. in C and C++, true as well, but there are ways. This is an advantage of modern languages making it easier to do development I would say, and that includes managed languages, but it isn't a feature of their being managed - you can add intrinsic SSE-like support to a new non-managed language.

I'm not criticizing managed languages, I love them and use them. But I don't think they outperform C and C++ in all my experience, and when they do, it isn't because of their being a managed language.

Flying Frog Consultancy Ltd. said...

@kripken: "...but it isn't a managed language feature". The point is that real managed languages do outperform C++ because pointer arithmetic is an impediment.

"it isn't a feature of their being managed - you can add intrinsic SSE-like support to a new non-managed language". They could have added vector intrinsics to C++ but the point is that they did not.

"I don't think they outperform C and C++ in all my experience". I just gave you several examples where you can see managed languages outperforming both C and C++ for yourself. Whether or not your experience covers them, the fact is that counterexamples to such claims were already widely known.

kripken said...

Fair enough; my point, though, was that exceeding C/C++ in a few corner cases is different from matching C/C++ in performance in general.

I don't disagree that they outperform C/C++ in those very specific benchmarks - in fact, even JavaScript can outperform C/C++ in some (very contrived) benchmarks.

Flying Frog Consultancy Ltd. said...

@kripken: "my point though was that exceeding C/C++ in a few corner cases is different than matching C/C++ in performance in general". Agreed but experience tells me that larger C++ projects are even less competitive.

I have worked on two separate million-line C++ projects over the past 3 years and obtained 100× speedup over C++ on the first and 10× on the second so far (and it is early days yet).

The first OCaml project I worked on, 8 years ago now, was 5× faster than the original C++. That was about 50kLOC.

I can believe that companies that employ elite developers and invest huge effort in optimization can attain excellent performance using C++ but I believe that is completely unrepresentative of the industry as a whole. The vast majority of C++ developers are (or were) non-experts with insufficient time to optimize significantly. In such cases, managed code makes it easier to see and use appropriate data structures and algorithms and, consequently, yields the most efficient outcome for them.

For example, our current client has 50,000 employees and runs on tens of millions of lines of C++ code but, of the employees in the team I am working in, we have two Mathematica experts, one C# expert and no C++ experts.

I have consulted for many large companies over the years and I see this pattern repeated over and over. The single most important goal is simplicity and, therefore, C++ is a big step in the wrong direction.

kripken said...

I agree that for big projects with a typical dev team, C++ is a bad idea. It's too complicated, and too risky in terms of possible bugs.

For performance though, I don't agree based on my experience. I've seen very large C++ codebases be very fast. It can be easier to optimize simpler languages, but I've not seen a clear case where that meant serious optimizations (10X) were just not done in C++.

In general I recommend simpler languages, C++ has almost no place in new projects - except where speed is of the utmost importance, in which case it is the best option (which is why AAA game engines, web browsers, etc. are all in C++). But 99% of projects don't need that amount of speed, and C++'s complexity is just a burden.

I would love however to see a counterexample.

Flying Frog Consultancy Ltd. said...

@kripken: "AAA game engines, web browsers". Games ok (although I'd also look at assembler) but web browsers would be much better written in managed code.

"I would love however to see a counterexample". The two big C++ projects I mentioned were a trading system and an insurance quotes system. The medium-size C++ project was a visualization library.

saynte said...

You left out (by the way, how do you misquote a Twitter post?) Herb Sutter's qualification that you should not require barriers on a load or read for SC.

The only thing that x86 can do is reorder the stores past loads. This is why it is stronger than ARM, as ARM can reorder independent memory operations in any way it wants. This means x86 only requires barriers on stores to prevent this reordering and attain sequential consistency; this is in line with what Herb said, and exactly what Intel and AMD do.

So, what is so remarkable about that comment of Herb's?

Flying Frog Consultancy Ltd. said...

@Saynte: "Herb Sutter's qualification that you should not require barriers on a load or read for SC". That is nonsensical. There is no such thing as a barrier "on" a memory operation. Barriers go between memory operations, not "on" them.

"The only thing that x86 can do is reorder the stores past loads... This means x86 only requires barriers on stores to prevent this reordering and attain sequential consistency; this is in line with what Herb said, and exactly what Intel and AMD do.". That is incorrect. Ancient x86 required barriers between writes and reads to prevent them from being reordered in the way you describe. Since 1999, x86 has performed more reordering, including for streaming instructions and accesses to write-combined memory, and, consequently, has the LFENCE and MFENCE instructions mentioned in this blog post.

"So, what is so remarkable that comment of Herb?". His comments are remarkable because they are self-contradictory. His argument was originally all about performance but now he wants to sacrifice ARM's efficient memory model and restrict consideration to inefficient non-vectorized user code.

saynte said...

You're correct, barriers do go between, but did you really not understand Herb's statement enough that you edited it out of your quote? I feel it's not a huge error.

Maybe it's up for debate, but the fact that new instructions were included that have weaker guarantees doesn't mean that the memory model was weakened. How could it? The pre-SSE2 model couldn't possibly describe the behaviour of instructions that didn't exist.

Flying Frog Consultancy Ltd. said...

@Saynte: "the fact that new instructions were included that have weaker guarantees doesn't mean that the memory model was weakened". On the contrary, that is exactly what it means because the modern x86 instruction set (as a whole) provides far fewer guarantees than it used to.

saynte said...

The SSE2/x86 instruction set provides all the same guarantees it used to, plus new guarantees for new instructions (with lfence/mfence to help out).

This is what I mean by debatable phrasing, you say it's weaker on the whole, but I don't consider that weaker, for my above reason.

Perhaps Herb doesn't either, but he definitely was not saying that Intel is SC without barriers, and his original phrasing makes that clear. Are you going to restore his quote properly, by the way?

Jules said...

Obviously Intel won't sacrifice backwards compatibility. That's 90% of the reason people buy their stuff. It was perfectly clear that he didn't mean that the memory model of the old subset of the instruction set was weakened.

Location: said...

I do not quite share your optimism regarding managed languages.
What do you set off against stack allocation? Yes, you can nicely split a job into subtasks with ITask-derived library/compiler implementations. But somewhere deeper, you have to do the number crunching, and having a swarm of heap objects flying around is not efficient.
Also take c++ templates plus optimizers - whole clouds of performance critical code may be collapsed even across method calls. Jitters just have no time to do it.

I think that 70% of code can be written in some gluing language like F# or Go - but not numerical libraries.

PS If you are a happy VS10 user, just try to open an average project in VS05 or VS08 and sense the difference. Perhaps you have just got used to the slowness of VS2010. And guess what was introduced in VS10 to cause such the degradation?

Flying Frog Consultancy Ltd. said...

@Location: "What do you set off against stack allocation?". Managed languages can do stack allocation. You have some restrictions on interior pointers but, other than that, they are remarkably expressive.

"Also take c++ templates plus optimizers - whole clouds of performance critical code may be collapsed even across method calls. Jitters just have no time to do it". That is also possible in managed languages. In fact, this is the reason for inline in F#.

"I think that 70% of code can be written in some gluing language like F# or Go - but not numerical libraries". I see no reason to avoid managed languages for numerical libraries. You need the ability to remove most bounds checks and it must be possible to program without the run-time impeding throughput or latency significantly but that is already possible.

"And guess what was introduced in VS10 to cause such the degradation?". There is no evidence that managed languages caused that problem.

Tezka said...

I am interested to see some code examples of F# vs. C++, where the former beats the latter.

Tezka said...

btw,
from the perspective of a developer, investing time in learning C++ is probably more rewarding than Java or C#. I think part of the reason for the declining number of jobs is the shortage of good C++ programmers. However, there are certain domains in which C++ remains the lingua franca, and the continually shrinking pool of competent candidates creates very attractive opportunities, for many years to come, for those who have a decent command of C++. Just because a huge fleet of web companies and enterprise junk are using Java and C# does not mean that companies with core competency in high-performance applications are not doing well.

Serv said...

I just love your blog. What do you think about the language Nemerle? (http://nemerle.org/)

Flying Frog Consultancy Ltd. said...

@Tezka: See F# vs Unmanaged C++ for parallel numerics for example.

@Serv: Nemerle was another interesting language because it pushed the envelope in new directions. There is so much to learn from such projects.

Chris Farley said...

Great article! However I ran into some problems reading it from my Java web browser, because it turns out my browser doesn't exist. Instead I was able to read it in a browser written in C++, which was called "every browser in use today."

However I am eager to reread it with a managed language based browser, because as I learned from your post, rewriting C++ into managed languages typically results in a 10x speedup. So I am sure it will be much faster!

That's good, because the labels on your graph are taking a long time to load for me.

dT/dZ said...

You should never use C++ alone nowadays!

C++ is *very* good for the processing-intensive parts of your applications (bottlenecks, numerics, raw processing). C++ (and C) also has the largest number of libraries available, and you can develop code with lots of abstractions and layers *without* a performance penalty.

For all other parts, you should use Python, the best practical language nowadays, and it binds very well to C++, thanks to Boost.Python.

Jon Harrop said...

@Chris Farley: The existence of legacy code written in old languages is a strawman argument. The only major development in web browsers over the past decade was to use process isolation as a poor man's alternative to a managed VM.

@dT/dZ: C++ and Python both have awful support for parallel and concurrent programming.