del.icio.us Digg DZone Reddit StumbleUpon
Concurrent Programming for Practicing Software Engineers - Willie Wheeler
« Previous | 1 | 2 | 3 | 4 | 5 | 6 | Next »

How parallelism gives rise to race conditions

Even among parallelizable problems, some are harder than others. The easy ones are the ones that don't involve threads sharing state with one another. For example, if you have five independent work queues, and the goal is simply to process all the tasks in all the queues, then there's nothing especially tough about this. You start five threads, run them against the five queues, and then move on once they're done.

The trickier problems are the ones that involve threads sharing state with one another. Let's see why.

Threads and shared state

Shared state presents a challenge in concurrent programming because threads can interfere with one another if their access to that shared state isn't properly coordinated. If you've ever tried to collaborate with other authors on a single document, you might have some experience with this problem. (Though Google Docs does a nice job of supporting multiple simultaneous authors.) You make some changes to the document, hoping that your coauthor didn't make his own changes to the part that you worked on. If he did, then there may be a mess to clean up.

The essence of the problem is that at least one guy is updating the document while somebody else is either reading it or writing it. We talked about the case where you have two authors writing to the document, but it can be one guy writing and someone else reading. If one person is in the middle of updating emergency information for hurricane evacuees and another person reads a partial update, the result could be dangerous.

Note that there's no problem if you have a bunch of people simply reading a document at the same time. The problem only comes up if at least one of them is writing to it.

The problem we've just described is called a race condition. The idea is that if threads aren't properly coordinated, then their concurrent work is vulnerable to problems associated with "unlucky timing". There can be situations in which one thread is trying to update shared state and one or more other threads are trying to access it (either read or write). In effect the threads "race" against one another and the outcome of their work is dependent on the nondeterministic outcome of their race.

Let's look at a code example.

An example of code that allows race conditions

Originally I was going to present the standard (and probably a bit tired) web page hit counter example. But then today I saw somebody's purportedly threadsafe class that is in fact susceptible to race conditions. The code was written by an experienced engineer. So let's use that example instead, shown in Listing 1 below, as it's instructive.

Listing 1. A class that's supposed to be threadsafe, but isn't
public class InMemoryUserDao implements UserDao {
	private Map<String, User> users;

	public InMemoryUserDao() {
		users = Collections.synchronizedMap(new HashMap<String, User>());
	}

	public boolean userExists(String username) {
		return users.containsKey(username);
	}

	public void createUser(User user) {
		String username = user.getUsername();
		if (userExists(username)) {
			throw new DuplicateUserException();
		}
		users.put(username, user);
	}

	public User findUser(String username) {
		User user = users.get(username);
		if (user == null) {
			throw new NoSuchUserException();
		}
		return user;
	}

	public void updateUser(User user) {
		String username = user.getUsername();
		if (!userExists(username)) {
			throw new NoSuchUserException();
		}
		users.put(username, user);
	}

	public void removeUser(String username) {
		if (!userExists(username)) {
			throw new NoSuchUserException();
		}
		users.remove(username);
	}
}

What do you think? See anything? The engineer claims that the class is threadsafe because we've wrapped our hashmap with a synchronized instance.

Threadsafety problems with the code

It turns out that that's not correct. It's a common misconception that as long as you're dealing with a threadsafe collection (like a synchronized map), you've taken care of any threadsafety issues you might have otherwise had. We'll use this example to show why this belief is just plain wrong.

Before I expose the race conditions, let me state some plausible assumptions about the intended semantics for the class:

  • Calling createUser() with an user whose username already exists in the map should generate a DuplicateUserException.
  • Calling updateUser() with an user whose username does not already exist should generate a NoSuchUserException.
  • Calling removeUser() with an user whose username does not already exist should generate a NoSuchUserException.

Now let's see how the implementation fails to support those semantics, and in particular how the implementation permits race conditions.

createUser() is broken. Here's a race that shows that createUser() does not have the intended semantics:

  1. Thread T1 enters createUser(), with argument user U1 having username N. N is not already in the map.
  2. Thread T1 successfully makes it past the if test, but not yet to the line that puts N in the map.
  3. Context switch to thread T2, which also enters createUser(), this time with argument user U2 having the same username N that U1 has. (You can imagine, for example, two users registering with the same username at the same time.) N is still not in the map, because thread T1 hasn't put it there yet.
  4. Thread T2 completes its call to createUser() and exits. User U2 and username N are now in the map.
  5. Thread T1 completes the method and executes. User U1 has been placed into the map under the key N.

Poof! User U2 simply disappears from the map, having been overwritten by user U1. T1 never encounters the DuplicateUserException it's supposed to encounter.

updateUser() is broken. This one is a little more subtle than the createUser() case. Here's the race:

  1. Thread T1 enters updateUser() with user U having name N. The user already exists in the map, so T1 passes the if test successfully. It hasn't reached the put() part yet.
  2. Context switch to thread T2, which enters and exits removeUser() with username N (the same one that T1 was using). The user U is removed from the map.
  3. Thread T1 completes, placing U back in the map under key N. User U has essentially been reinstated.

The problem here is that the removeUser() method completed successfully, there was no createUser() call, and yet there's still a user sitting in the map with no error having been thrown. Given the semantics described above, that should not be possible. The only two possible outcomes ought to be: (1) the update occurs before the remove, which would mean that when the two threads complete, U is no longer in the map; or (2) the remove occurs before the update, which would mean that U should be gone, and the update should have generated an exception. In either case, U should not be in the map when the two threads complete, and yet there it is. So if T2 belongs to an admin who thought he was removing a troublesome user, guess again.

removeUser() is broken. We've already seen in the race above that there are situations under which removeUser() would be expected to remove a user but does not actually do that. Here's another race for you, probably less significant, but still a bug:

  1. Thread T1 enters removeUser() with existing username N, and makes it past the if test.
  2. Context switch to thread T2, which enters and exits removeUser() with the same username N.
  3. Thread T1 completes and exits the method.

In this case we have two methods that called the removeUser() method with the same username, but neither thread encountered a NoSuchUserException. That is not compatible with the semantics of the class.

Now that we've seen some problems, let's try to understand what's happening here.

What the three broken methods have in common

One thing that all three cases have in common is that they follow a "check-then-act" pattern, where both the "check" and "act" parts reference shared data (in this case, the hashmap). The createUser() method checks whether the username already exists, and if not, adds the user to the map. The updateUser() method checks whether the username already exists, and if so, updates the user. removeUser() checks whether the username exists, and if so, deletes the user.

Note that the findUser() method does not follow the same pattern. It does a single get() against the map, and then does a check that checks the returned user rather than checking something against the map. Though it looks like almost the same thing, it is really a completely different situation because we're performing only one action against the shared data instead of multiple actions.

It turns out that the difference between createUser(), updateUser() and removeUser(), on the one hand, and the findUser() method, on the other, is important from a threadsafety point of view. Essentially what's happening is that the findUser() method is atomic (with a certain qualification, which we'll discuss in the next section), meaning that it executes as a single action. The createUser(), updateUser() and removeUser() methods, on the other hand, are not atomic. The various check and act operations can be interleaved in arbitrary ways across different threads. Intuitively what that means is that checks, which are supposed to provide safety, can become invalid because other actions can be scheduled in between any given check-act pair. The safety checks are invalidated and the class behaves in strange ways (especially as load increases, which is usually the most inopportune time for such issues to arise).

Now let's look at how we can solve race conditions such as the ones we've been considering.

Social bookmarks: del.icio.us Digg DZone Reddit StumbleUpon
« Previous | 1 | 2 | 3 | 4 | 5 | 6 | Next »

Comments (6)

Hi,

I like your article because you described the classic problems and solutions well. But there are some minor issues with the code:

1) the userMap in the InMemoryUserDao is not published safely and therefor relies on an external happens before rule to prevent reordering and visibility problems. This could lead to a NullPointerException when accessing the users reference. In most cases with constructors this is not an issue, but imho still a bad practice. It can be fixed by making users final. For more information see http://pveentjer.wordpress.com/2008/04/02/jmm-constructors-dont-have-visiblity-problems/

2) I understand that you are working towards a solution with the InMemoryUserDao and step by step solve the problem. But I would add the advice to less experienced readers that they should look at really concurrent structures like the ConcurrentMap. It has atomic operations for the check/act problems (like putIfAbsent) but implementations can also perform better because they use lock striping for example instead of a big lock.

3) Instead of using AND the internal synchronization of the SynchronizedMap AND the synchronization in the InMemoryUserDao, I would drop the SynchronizedMap completely and replace it by a standard Hashmap (make sure that all methods are going through the lock) It makes reasoning about locking easier. So if you can't use the ConcurrentMap (e.g. a non standard datastructure) just have one location where the locking is done.
By Peter Veentjer on Oct 13, 2008 at 9:01 AM PDT
Well-written clear article handling threading and concurrency. Outstanding at the conceptual level.

The implementation details will of course be language specific and pattern specific, and there exists more than one way to accomplish it.

The value to me in the article is the clear handling of the concepts of threading, concurrency, related issues, and strategies to manage them.
By Dave Milner on Oct 13, 2008 at 10:44 AM PDT
Great article Willie. C# programmers that may be interested in how to accomplish synchronized methods should look into decorating the methods with the MethodImpl attribute

[MethodImpl(MethodImplOptions.Synchronized)]

or just wrap the entire content of your method in a lock statement.

e.g.

public class MyClass
{
public void MyMethod()
{
lock(typeof(MyClass))
{
// contents of method
}
}
}
By Collin Couch on Oct 13, 2008 at 11:01 AM PDT
@Peter: Ah, good catch re #1. Yes, declaring users final will prevent consumer threads from seeing the DAO instance half-published (object created but state not initialized) by the publisher thread. I agree with both of your other points as well. In particular, I like your #3. Concurrency semantics are communicated through documentation more than through language features and so stylistic issues become important. Thanks for the feedback.
By Willie Wheeler on Oct 13, 2008 at 10:11 PM PDT

he myth of pandora is ancient, appears in several distinct Greek versions, pandora armbandand has been interpreted in many ways. In all literary versions, Neu Eingetroffen however, Pandora Armbänder the myth is a rosetta stone kind of theodicy, addressing the question pop information, web easy get, sports fashion, news-fashionof why there is evil in the world. In the seventh hot-winter century BC, Hesiod, both in his Theogony (briefly, without naming Pandora outright rosetta stone language, rosetta stone spanish, abercrombie and fitch, Abercrombie Fitch

By pandora schmuck on Aug 30, 2010 at 10:57 PM PDT

Post a comment

Your name:
Your e-mail address (won't be displayed):
Your web site (optional):
example: www.xyz.com
Your comment:
Preview:
By You
Please help us reduce comment spam:
Spring Annotations RefCard
Check out the new DZone Spring Annotations Refcard by Craig Walls!

What's New?

2009-08-30 - Check out my two-part series on DZone: Spring Integration: A Hands-On Tutorial.
2009-03-25 - My new article Getting Started with Spring Batch 2.0 is available on DZone.
Home | Consulting | Tech Articles | Mailing List | Contact | Spring Blog
Copyright © 2008 Wheeler Software, LLC.