del.icio.us Digg DZone Reddit StumbleUpon
Concurrent Programming for Practicing Software Engineers - Willie Wheeler
« Previous | 1 | 2 | 3 | 4 | 5 | 6 | Next »

What is a thread?

Let's begin by introducing the star of the show, the thread. The following is probably more accurately considered a description rather than a definition, even though it reads a little like the latter.

A thread is a sequential flow of control with its own program counter and call stack. It shares state, memory and resources with other threads in the same process. Since each thread gets its own call stack, local variables aren't shared. Instance and class variables, however, are shared across threads. It's also possible to define variables that exist outside of methods but are still local to a specific thread, and these are called thread-local variables.

OK, so that's what a thread is. What can we do with them?

Serialism and parallelism

In software development we're asked to solve different kinds of problem. We may be asked to write code that takes a bunch of information about a loan applicant and produces a verdict as to whether that applicant should get the loan. Maybe we have to write a web-based shopping cart. Or maybe we have to write code to make a guy's arms and legs flail about in a realistic fashion when he falls off a cliff in a video game.

For any given problem there are typically multiple ways to solve it, and different ways to categorize the solutions. One useful way of distinguishing solutions is to separate serial from parallel solutions. Let's look at that distinction in more detail.

Serial approaches to solving problems

Some problems have solutions that are essentially a single series of steps. If you're writing a home affordability calculator, for example, the steps might be as follows:

  1. Collect data from the user, such as their annual salary, the amount of the down payment, their monthly bill payments, whatever.
  2. Crunch some numbers.
  3. Show the maximum dollar amount the user can afford and an ad for a local Realtor.

Our recipe for solving the "home affordability problem" is said to be serial because it's essentially just a series of steps to carry out. We might even go so far as to describing the problem itself as a serial problem, which is just a shorthand way of saying that the most plausible and reasonable solutions to the problem are serial solutions. Either way, the essence of serialism is that we have a series of steps that a single control flow can carry out.

Now let's look at the alternative to serialism, which would be parallelism.

kevindooley
Photo credit: kevindooley

Parallel approaches to solving problems

Some problems are amenable to solutions involving multiple concurrent control flows, which is to say that they have parallel solutions. To take a non-computing example, say you have a room with toys scattered all over the floor. You want the room clean. As potential participants to the cleanup effort you have yourself and three kids. A possibly-relevant piece of background information is that one child is entirely responsible for the mess.

Readers with young children will no doubt recognize this classic problem from the field of parenting algorithms. Fortunately, well-known serial and parallel solutions are available. The best approach in any given case depends strongly upon your goals:

  1. If you're trying to emphasize fairness and personal responsibility, you might opt for the obvious serial solution even though it's slower. (In reality this isn't a serial solution; it inevitably involves close supervisory involvement by a supervisory parent thread. We can ignore that detail.)
  2. If you're in a big hurry because you need to get the kids off to school, you might choose the obvious parallel solution, despite the unfairness.
  3. If you're in an even bigger hurry you might choose the "other" serial solution. (Hint: it doesn't involve any kids.)

Whether we're talking about serialism or parallelism, there is a recurring concept, which is that of a flow of control. In the serial case we have exactly one flow. In the parallel case we have more.

Computer scientists, hardware engineers and software engineers spend a lot of time with the distinction between serialism and parallelism. Let's look at why that is.

Serialism vs. parallelism: why do we care?

In a word, performance. In many cases—not all, but many—parallel solutions are simply faster than serial solutions, owing to the "many hands make light work" effect. And from a practical perspective, that's the main reason we care about concurrency (i.e., parallelism) despite its many attendant complications.

The rest of this article explains what some of those complications are and some ways to overcome them.

Social bookmarks: del.icio.us Digg DZone Reddit StumbleUpon
« Previous | 1 | 2 | 3 | 4 | 5 | 6 | Next »

Comments (4)

Hi,

I like your article because you described the classic problems and solutions well. But there are some minor issues with the code:

1) the userMap in the InMemoryUserDao is not published safely and therefor relies on an external happens before rule to prevent reordering and visibility problems. This could lead to a NullPointerException when accessing the users reference. In most cases with constructors this is not an issue, but imho still a bad practice. It can be fixed by making users final. For more information see http://pveentjer.wordpress.com/2008/04/02/jmm-constructors-dont-have-visiblity-problems/

2) I understand that you are working towards a solution with the InMemoryUserDao and step by step solve the problem. But I would add the advice to less experienced readers that they should look at really concurrent structures like the ConcurrentMap. It has atomic operations for the check/act problems (like putIfAbsent) but implementations can also perform better because they use lock striping for example instead of a big lock.

3) Instead of using AND the internal synchronization of the SynchronizedMap AND the synchronization in the InMemoryUserDao, I would drop the SynchronizedMap completely and replace it by a standard Hashmap (make sure that all methods are going through the lock) It makes reasoning about locking easier. So if you can't use the ConcurrentMap (e.g. a non standard datastructure) just have one location where the locking is done.
By Peter Veentjer on Oct 13, 2008 at 9:01 AM PDT
Well-written clear article handling threading and concurrency. Outstanding at the conceptual level.

The implementation details will of course be language specific and pattern specific, and there exists more than one way to accomplish it.

The value to me in the article is the clear handling of the concepts of threading, concurrency, related issues, and strategies to manage them.
By Dave Milner on Oct 13, 2008 at 10:44 AM PDT
Great article Willie. C# programmers that may be interested in how to accomplish synchronized methods should look into decorating the methods with the MethodImpl attribute

[MethodImpl(MethodImplOptions.Synchronized)]

or just wrap the entire content of your method in a lock statement.

e.g.

public class MyClass
{
public void MyMethod()
{
lock(typeof(MyClass))
{
// contents of method
}
}
}
By Collin Couch on Oct 13, 2008 at 11:01 AM PDT
@Peter: Ah, good catch re #1. Yes, declaring users final will prevent consumer threads from seeing the DAO instance half-published (object created but state not initialized) by the publisher thread. I agree with both of your other points as well. In particular, I like your #3. Concurrency semantics are communicated through documentation more than through language features and so stylistic issues become important. Thanks for the feedback.
By Willie Wheeler on Oct 13, 2008 at 10:11 PM PDT

Post a comment

Your name:
Your e-mail address (won't be displayed):
Your web site (optional):
example: www.xyz.com
Your comment:
Preview:
By You
Please help us reduce comment spam:
Spring in Practice
My brother and I are writing Spring in Practice for Manning!

What's New?

2009-08-30 - Check out my two-part series on DZone: Spring Integration: A Hands-On Tutorial.
2009-03-25 - My new article Getting Started with Spring Batch 2.0 is available on DZone.
Home | Consulting | Tech Articles | Mailing List | Contact | Spring Blog
Copyright © 2008 Wheeler Software, LLC.