The key to avoiding race conditions is to coordinate the multiple threads when accessing shared data. There are different ways to do this, but the simplest is the one your mom taught you when you were a kid: take turns.
Here's how you take turns in software. Suppose that you have some shared data that you want to protect, such as a list of all the posts in a given web-based discussion forum, or a web page hit counter. You have various blocks of code throughout your application that access that data, both reading it and writing it. These blocks of code don't have to be in the same class (though they often are), but they're all accessing the same data.
(Keep in mind that when we describe the data as shared data, we mean that it's shared by multiple threads, not that multiple blocks of code access it. If multiple blocks of code access the data but only one thread ever does, then it's not shared data.)
The trick is to protect the code so that only one thread can enter it at a time. I don't mean that only one thread can enter one of the code blocks at a time. I mean that if you consider the whole set of code blocks as a single group, only one thread can enter that group at a time. So if one thread enters one of the blocks, then all other threads need to be kept out of all other blocks until that first thread is done.
The way we enforce this behavior is to associate with each block of code you want to protect something called a mutual exclusion lock, or mutex. People often just call it a lock as well. You associate a lock with any given piece of data you want to protect, and all the code blocks that access that data use that same lock. A thread is not allowed to enter the code block unless it acquires the lock. So whenever thread T1 wants to enter, it attempts to grab the lock. If the lock is available, then T1 takes it and other threads have to wait until T1 is done. If instead some other thread T2 has the lock, then T1 has to wait until T2 is done.
When a thread leaves the block of code protected by the mutex, known as the critical section, it releases the lock.
Incidentally, I always thought it was kind of funny we talk about grabbing locks. It seems like we should say we grab a key so we can get past the lock. But that's not how people talk about it.
We now introduce the principal way of implementing a mutex in Java, the
synchronized keyword.
synchronized keywordIn Java the primary mechanism for enforcing turn-taking across threads is
the synchronized keyword. You basically put the code you want to
protect inside a synchronized block, specifying a lock object (or
more exactly, specifying an object whose monitor will be used as a
lock—in Java, each object has a "monitor" that can be used as a mutex).
If you want to synchronize access to the whole method, you just use the
synchronized keyword on the method itself. In this latter case, the
locking object is the one on which the method is being called.
Here's an example of a synchronized block:
synchronized (myList) {
int lastIndex = myList.size() - 1;
myList.remove(lastIndex);
}
Here, we're using the myList instance as the lock. Anytime a
thread wants to enter this block of code, or any other block of code protected
by the same lock, it needs to grab the lock on the myList
instance first. And it releases the lock upon exiting the synchronized
block (i.e., the critical section).
Here's an example of a synchronized method:
public class Order {
...
public synchronized void removeLastLineItem() {
int lastIndex = myList.size() - 1;
myList.remove(lastIndex);
}
}
In this case, a thread wanting to enter the removeLastLineItem()
method would need to grab the lock on the relevant Order
instance first, and would release the lock after exiting the method.
Armed with our understanding of mutexes and the synchronized
keyword, let's revisit our example from the previous page.
Listing 2 shows how to apply thread synchronization to make the class threadsafe.
public final class InMemoryUserDao implements UserDao {
private Map<String, User> users;
public InMemoryUserDao() {
users = Collections.synchronizedMap(new HashMap<String, User>());
}
public boolean userExists(String username) {
return users.containsKey(username);
}
public void createUser(User user) {
String username = user.getUsername();
synchronized (users) {
if (userExists(username)) {
throw new DuplicateUserException();
}
users.put(username, user);
}
}
public User findUser(String username) {
User user = users.get(username);
if (user == null) {
throw new NoSuchUserException();
}
return user;
}
public void updateUser(User user) {
String username = user.getUsername();
synchronized (users) {
if (!userExists(username)) {
throw new NoSuchUserException();
}
users.put(username, user);
}
}
public void removeUser(String username) {
synchronized (users) {
if (!userExists(username)) {
throw new NoSuchUserException();
}
users.remove(username);
}
}
}
In Listing 2 we've addressed the three issues that we raised earlier. First
note that the users instance is entirely encapsulated by our
InMemoryUserDao class (I've made the class final to
prevent inheritance from breaking our encapsulation here), so that means it
is entirely within our power to protect every possible access to the
users instance. (Hm, that's probably a good interview question:
"Discuss the relationship between encapsulation and threadsafety.")
Next note that we've addressed the general problem that we saw by making
every "check-then-act" occurrence atomic. Now the code is such that if thread
T checks something (such as checking whether a username exists), we can be sure
that the answer doesn't change before the action part, because no other thread
is able to access the users instance at all until T is done with
it.
There's one detail to discuss. Take a look at the userExists()
and findUser() methods. In both cases we access the
users instance, but there's no synchronized block.
The reason this isn't a problem is that both of the following are true:
users map, andCollections.synchronizedMap(...) wrapper.If either of those two conditions had failed, we'd need to put the calls in a
synchronized block. It's useful to discuss these a little more
carefully.
Let's take (1) first. Note that even though we don't have an explicit
synchronized block, each individual method call we make against
users is threadsafe. That's because the users map is
internally synchronized on the monitor for the users instance
itself. That is, the map handles synchronization for individual method calls
so we don't have to. If we're just going to call a single method on an internally
synchronized method, we're usually OK to do so.
The place where we have to pay attention to threadsafety is where we call
multiple methods on the users instance. There, the internal
synchronization doesn't help us at all because that synchronization doesn't
prevent the thread from releasing the lock in between the multiple method calls.
And when it releases the lock, another thread can come in and change the data.
If your code logic assumes that the multiple method calls behave as an atomic
unit, you have a problem. Therefore you must provide your own synchronization
on the relevant lock when you use check-then-act and similar patterns.
It is a common error to assume that just because the object is internally synchronized, you don't have to worry about threadsafety anymore. That is simply not true, as we've just explained.
Now take (2). Clearly if the map didn't have any internal synchronization,
then we'd need to provide it ourselves, even for single method calls against
the map. Otherwise threads could get in and muck things up for other threads.
This is true even in many cases where it looks like individual
threads are performing atomic actions. For example, if you have some threads
making get() calls against a HashMap, and another
set of threads making put() calls against the same HashMap,
you may think that they are all performing atomic operations and therefore it's
OK to do this. But that is not correct. The problem is that they only look
like atomic operations; behind the scenes, they're impacting the size of a shared
backing array, which in turn impacts how hashcodes are mapped to buckets, which impacts
the get() and put() calls themselves. If the class had
internal synchronization then get() and put() would
both be atomic from the perspective of external code, but without that, they
aren't atomic at all.
That was your crash course on solving race conditions with thread synchronization. It's not a simple topic, but you should now have the basics. But even though thread synchronization helps solve one problem, unfortunately it creates another. The problem is that in forcing threads to stop and wait their turn, you create the potential for the situation in which threads might wait indefinitely before they can move on. This is called a deadlock, and it's the subject of the next section.