Wednesday, 27 October 2010

"The F# Asynchronous Programming Model" by Don Syme et al.

The creator of the F# programming language at Microsoft Research in Cambridge, Don Syme, recently had a paper accepted for the Practical Aspects of Declarative Languages conference. This paper provides a great introduction to asynchronous workflows and the MailboxProcessor in F#.

In fact, the use of monads to sequence asynchronous operations has a long history. For example, this approach has been used in the OCaml programming language, from which F# is descended, for at least 8 years. Specifically, Jérôme Vouillon's LWT library for OCaml made asynchronous programming easy. For example, the first F# sample given in this paper:

async { let! html = getWebPage "http://www.google.com"
return html.Length }

Could have been written in OCaml as follows in 2002:

getWebPage "http://www.google.com" >>=
String.length

In 2005, Jacques Carette et al.'s pa_monad syntax extension even added the same kind of syntactic sugar that F# provides for its asynchronous workflows, allowing the sample to be written in OCaml as:

perform
html <-- getWebPage "http://www.google.com"
return String.length html

For more information on asynchronous programming in F#, read Visual F# 2010 for Technical Computing.

Wednesday, 20 October 2010

What is the difference between parallel and concurrent programming?

Concurrent programming regards operations that appear to overlap and is primarily concerned with the complexity that arises due to non-deterministic control flow. The quantitative costs associated with concurrent programs are typically both throughput and latency. Concurrent programs are often IO bound but not always, e.g. concurrent garbage collectors are entirely on-CPU. The pedagogical example of a concurrent program is a web crawler. This program initiates requests for web pages and accepts the responses concurrently as the results of the downloads become available, accumulating a set of pages that have already been visited. Control flow is non-deterministic because the responses are not necessarily received in the same order each time the program is run. This characteristic can make it very hard to debug concurrent programs. Some applications are fundamentally concurrent, e.g. web servers must handle client connections concurrently. Erlang is a language designed specifically for distributed concurrent programming with fault tolerance but many other languages provide features for concurrent programming, such as asynchronous workflows in the F# programming language.

Parallel programming concerns operations that are overlapped for the specific goal of improving throughput. The difficulties of concurrent programming are evaded by making control flow deterministic. Typically, programs spawn sets of child tasks that run in parallel and the parent task only continues once every subtask has finished. This makes parallel programs much easier to debug. The hard part of parallel programming is performance optimization with respect to issues such as granularity and communication. The latter is still an issue in the context of multicores because there is a considerable cost associated with transferring data from one cache to another. Dense matrix-matrix multiply is a pedagogical example of parallel programming and it can be solved efficiently by using Strassen's divide-and-conquer algorithm and attacking the sub-problems in parallel. Cilk is pioneered the most promising techniques for high-performance parallel programming on shared-memory computers (including multicores) and its technology is now offered by Intel in their Threaded Building Blocks (TBB) and Microsoft in .NET 4. So this is also easily accessible from the F# programming language.


Don Syme on "Functional approaches to parallelism and concurrency"

Don Syme, creator of the F# programming language, recently gave a superb lecture on parallel and concurrent programming using F# at QCon 2010. Video and slides hosted by InfoQ here.

Parallel programming continues to be a hot topic in the face of multicore computing but, as Don points out, the world is also moving steadily towards more concurrent programming.

Work continues on our forthcoming "Multicore .NET" book that studies parallel programming using C# and F# in detail...


Tuesday, 19 October 2010

Can you repro this 64-bit .NET GC bug?

Update: Maoni Stephens of Microsoft, the author of this new "Background" garbage collector in .NET 4, has now reproduced the bug and acknowledged that we really are seeing GC pauses lasting several minutes.

Whilst developing low latency software on .NET, we have discovered a serious bug in the .NET 4 concurrent workstation garbage collector that can cause applications to hang for up to several minutes at a time.

On three of our machines the following simple C# program causes the GC to leak memory until none remains and a single mammoth GC cycle kicks in, stalling the program for several minutes (!) while 11Gb of heap is recycled:

static void Main(string[] args) {
  var q = new System.Collections.Generic.Queue(); 
  while (true) { 
    q.Enqueue(0); 
    if (q.Count > 1000000) 
      q.Dequeue(); 
  } 
}

You need to compile for x64 on a 64-bit Windows OS with .NET 4 and run with the default (concurrent workstation) GC using the default (interactive) latency setting.

Here's what the Task Manager looks like when running this program on this machine:

Note that 11Gb of heap have been leaked here when this program requires no more than 100Mb of memory.

We have now accumulated around a dozen repros of this bug, written in F# as well as C#, and it appears to be related to a bug in the GC write barrier when most of gen0 survives. However, Microsoft have not yet been able to reproduce it. Can you?

Saturday, 16 October 2010

ARM-based iPads choke Intel-based netbook sales

The recent news that Apple are selling around 18 million of their ARM-based iPads per year reminded us of our article Will Intel lose the computer market to ARM in 2012? from January. Following their success, there are now a growing number of competitors itching to release ARM-based tablet PCs of their own, like Marvell's $99 Moby tablet.

Compare just those iPad sales to the 35 million netbooks of all brands sold in 2009 and the predicted 36 million netbooks to be sold in 2010 and it looks as though Intel may at least lose the mobile market to ARM in 2012.


Sunday, 10 October 2010

Towards concurrent garbage collection for GHC

Simon Marlow of Microsoft Research recently published a blog post entitled First results from GHC's new garbage collector. As his beautiful graphs show so clearly, this is a first step towards concurrent garbage collection. The blog post describes this advancement entirely from the perspective of throughput because the ability to collect per-thread nursery generations independently removes some of the blocking that was wasting mutator time in the previous version.

However, we believe that concurrent programming may become a killer application domain for Haskell and, in that context, latency can be critical. If GHC's garbage collector is made more concurrent, by allowing the old generation to be collected independently as well, then pause times could be greatly reduced and Haskell would have a considerable advantage over competing technologies like .NET.

We have found that even the best-behaved .NET programs that allocate still suffer GC pauses of around 20ms, over an order of magnitude longer than the 600µs pause times indicated on the graphs for this new version of GHC. Real .NET applications that were not designed from the ground up to attain low latency suffer stalls lasting several seconds!