Thursday, 10 October 2013

Herb Sutter's favorite C++ 10-liner has a memory management bug

In a recently-posted video, Herb Sutter (a prominent C++ expert) describes his favorite C++ 10-liner as “a thread-safe reference-counted object cache”:

shared_ptr<widget> get_widget(int id) {
  static map<int, weak_ptr<widget>> cache;
  static mutex m;

  lock_guard<mutex> hold(m);
  auto sp = cache[id].lock();
  if (!sp) cache[id] = sp = load_widget(id);
  return sp;
}

This example is very interesting. Firstly, it manages to pull in reference counting, weak references and a mutex, all of which are very rare in modern programming. Secondly, it contains a memory leak that is difficult to fix in C++ because APIs are burdened with memory management details: this API cannot express deterministic cleanup, because there is no way for a widget's destructor to remove the widget's entry from the map. Finally, the correct name for this data structure is a concurrent weak dictionary, specifically one with weak values. Correct implementations of this data structure are widely available for C#, F# and Java, such as the one here.
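
To make this concrete, here is a small driver (my own illustration, assuming a trivial load_widget that just allocates a fresh widget): every distinct id that is ever requested leaves behind an entry that nothing removes, so the map grows without bound even though the widgets themselves die promptly.

int main() {
  for (int id = 0; id < 1000000; ++id) {
    auto sp = get_widget(id);  // sp is the only strong reference, so the
  }                            // widget dies here when sp goes out of scope
  // Every widget is long gone, yet the cache still holds 1,000,000
  // expired weak_ptr entries that nothing will ever remove.
}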

The obvious fix is to sweep stale entries from the map whenever get_widget is called, but this leaves floating garbage in the map between calls to get_widget, is asymptotically less efficient and incurs unbounded pauses for an unbounded number of threads.
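
For reference, such a sweep might look something like this (a sketch only, reusing the names from Herb's example): the loop runs while the mutex is held and touches every entry in the map, which is where the extra asymptotic cost and the pauses come from.

shared_ptr<widget> get_widget(int id) {
  static map<int, weak_ptr<widget>> cache;
  static mutex m;

  lock_guard<mutex> hold(m);
  // Sweep: erase every expired entry before servicing the request.
  // This is O(cache size) per call and blocks every other caller
  // waiting on the mutex while it runs.
  for (auto it = cache.begin(); it != cache.end(); ) {
    if (it->second.expired()) it = cache.erase(it);
    else ++it;
  }
  auto sp = cache[id].lock();
  if (!sp) cache[id] = sp = load_widget(id);
  return sp;
}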

Update: Matthew Avery (from the USA) suggests altering the API and semantics of the functions involved so that load_widget returns a shared_ptr with a custom deleter that removes the stale map entry as soon as its widget is destroyed. If this idea can be made to work then it would be the only deterministic solution to have been proposed to date.
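
Here is a sketch of how that might look if the deleter is attached inside get_widget itself (my own reading of the proposal, not Avery's code; make_widget, a hypothetical loader returning a raw widget*, stands in for load_widget, and error handling is omitted). The expired() check guards against racing with a reload of the same id between the last strong reference dying and the deleter acquiring the mutex.

shared_ptr<widget> get_widget(int id) {
  static map<int, weak_ptr<widget>> cache;
  static mutex m;

  lock_guard<mutex> hold(m);
  auto sp = cache[id].lock();
  if (!sp) {
    // The deleter runs when the last strong reference is dropped: it
    // erases the (now expired) cache entry and then destroys the widget.
    sp = shared_ptr<widget>(make_widget(id), [id](widget *w) {
      lock_guard<mutex> hold(m);
      auto it = cache.find(id);
      if (it != cache.end() && it->second.expired()) cache.erase(it);
      delete w;
    });
    cache[id] = sp;
  }
  return sp;
}

Note that, as discussed in the comments below, this still amounts to an API change: the loader and the cache have to cooperate.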

9 comments:

Trillian said...

You haven't actually said what the leak is. I presume it is that the map can grow to be arbitrarily big even though the widgets stop being referenced, but I don't see how a GC would solve this particular problem. .NET's WeakReference doesn't provide any more hooks than weak_ptr to let the cache know that its target has been collected. What am I missing?

Jon Harrop said...

@Trillian: You're spot on (a commenter described the leak). A direct port to a garbage collected language would also leak in exactly the same way, yes. But you wouldn't do a direct port: you'd use a weak hash table and it would use the GC to clear itself out incrementally.

Sandro Magi said...

Not exactly true, Trillian, because on the CLR you can create your own WeakReference from a GCHandle struct and then eliminate the dictionary entry in the finalizer. It's longer than these 10 lines of code, of course, but eminently doable.

Trillian said...

@Sandro Magi: Not sure how GCHandle or finalizers are helpful here. To run code when a value gets collected, you would have to use TValue's finalizer, and that's intrusive. In C++ though, you could have a custom smart pointer which removes the cache entry when the strong reference count drops to zero (not that it's a great idea).

@Jon Harrop: Weak hash tables sound neat. Can they be implemented using the primitives provided by the CLR? Don't you need to be able to run arbitrary code (cache entry removal) when an arbitrary object (TValue) gets GC'ed? Either that or scan through the hash table to remove "dead" entries once in a while, but then you can do that in C++.

Just being curious here, weak references are not a topic I'm super familiar with.

Sandro Magi said...

Sorry, to clarify I meant something like:

Dictionary<TKey, WeakReference<Slot<TValue>>>

where Slot implements the finalizer logic to clean up reclaimed slots.

Herb Sutter said...

Hi Jon,

Right, I didn't bother with cleaning up unused entries, which didn't matter in the original app (which reused a small set of popular id's). Still, "leak" is an overstatement.

Refcounting and weak refs and mutexes may be rare in your world but they're common and quite practical (and deterministic, unlike relying on GC for more than raw memory cleanup).

The statement that the API is inherently unable to do cleanup is not true -- there are many solutions that don't disturb the API, one of which is to periodically remove detached weak_ptrs, such as on every Nth call to get_widget while you're holding the mutex anyway. And it's still 10 lines of code, since the original was actually only seven. :) People emailed me referring to this post and showing similarly short and clear solutions that remove the unused weak_ptrs immediately and deterministically when the last strong ref goes away (no cleanup sweep needed) and still fit into ~10 lines -- an exercise for the reader; it takes 3-4 extra lines and is similar to Sandro's approach of using the finalizer (but uses the shared_ptr deleter, which is better because it's deterministic).

Cheers,

Herb

Jon Harrop said...

@Herb: I agree that "leak" is an overstatement in the context of the original problem (widgets) but describing this code as "a thread-safe reference-counted object cache" is a triumph of hope over reality when it leaks memory for every single object it has ever seen.

"Refcounting and weak refs and mutexes may be rare in your world but they're common and quite practical (and deterministic, unlike relying on GC for more than raw memory cleanup)". Refcounting and mutexes are, in general, non-deterministic (because decrements and acquires race).

"...periodically remove detached weak_ptrs, such as on every Nth call to get_widget while you're holding the mutex anyway". That assumes get_widget is called periodically. If get_widget really is called periodically (which is unlikely), that solution incurs unbounded pauses for an unbounded number of threads.

Matthew Avery has proposed a fix where load_widget returns a shared_ptr with a custom deleter that removes the stale map entry. If that can be made to work then it would be the only potentially-deterministic solution to have been proposed. And, of course, it requires API changes.

Helmut Zeisel said...

What about just clearing the cache if it is too big?

if (!sp)
{
  const static int max_sz = 1000;
  if (cache.size() > max_sz)
  {
    cache.clear();
  }
  cache[id] = sp = load_widget(id);
}

Of course, after clearing the cache, creating the objects takes longer, but in typical use cases of a cache this is no problem.

Helmut

Helmut Zeisel said...
This comment has been removed by the author.