Sunday, March 15, 2015

Lambda Expressions, Captured Variables, and For Loops: A Dangerous Combination

I get a lot of great questions and comments when I do live presentations. Last weekend at So Cal Code Camp was no different. Now it took me a full week to get to these because I've done 2 presentations and about 900 miles of driving this past week (and I'm still getting back into the swing of things). But better late than never.

Today we'll look at the dangers of capturing a variable from a "for" loop. This is something that I often mention in my presentations, but I've never actually put together a sample for this. At the So Cal Code Camp, I had someone come up and comment that she had run into this exact issue in her code. That's a great reason to sit down and put together a code sample to help others.

[Update 4/20/2015: A video version of this topic is also available: Captured Variables and for Loops.]

Captured Variables in Lambdas
In my presentation on Lambda expressions (Learn to Love Lambdas (and LINQ, Too!)), I show how captured variables work and what they're good for. If you're not familiar with captured variables, they are referred to as "closures" in other languages (leave it to Microsoft to pick their own terminology).

This means that we can "capture" a variable that is currently in scope when we assign our lambda expression (or anonymous delegate) and then use that variable later, even if it would have normally gone out of scope. For more information, you can check the PDF document or video with the session materials.

One of the great things about captured variables is that they allow us to scope things more appropriately. Instead of having a class-level variable that is accessible to everything in our class, we can have a method-level variable that is accessible to the original method and the anonymous delegate. (And this is exactly what the demo code from the session shows. So check out the code samples and walk-through.)
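As a minimal sketch of that scoping idea (the class, method, and variable names here are hypothetical and not taken from the session demo), a method-level variable can be shared with a lambda without promoting it to a class-level field:

```csharp
using System;
using System.Collections.Generic;

public static class ScopingSketch
{
    // Hypothetical helper for illustration only.
    public static int CountLongNames(List<string> names, int minLength)
    {
        // Method-level variable: visible to this method and to the lambda
        // below, but to nothing else in the class.
        int matches = 0;

        // The lambda captures "matches" and updates it on each call.
        names.ForEach(name =>
        {
            if (name.Length >= minLength)
            {
                matches++;
            }
        });

        return matches;
    }

    public static void Main()
    {
        var names = new List<string> { "Ann", "Roberto", "Li", "Samantha" };
        Console.WriteLine(CountLongNames(names, 5)); // prints 2
    }
}
```

The counter never needs to live outside the method, which is the point: the lambda gets access to it, and nothing else does.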

There's something very important to keep in mind about captured variables:
The value of a captured variable is the value at the time it is used, not the value at the time it was captured.
This means that if the value of the variable is changed after we capture it, we may get unexpected results.
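Here is a small sketch of that rule outside of any loop. The names are made up for illustration; the only thing it relies on is that a lambda captures the variable itself, not a snapshot of its value:

```csharp
using System;

public static class TimeOfUseSketch
{
    public static string ReadAfterChange()
    {
        int value = 1;

        // Capture the variable "value" (not the number 1).
        Func<string> read = () => "value is " + value;

        // Change the variable after it has been captured.
        value = 99;

        // The lambda sees the current value at the time it runs.
        return read();
    }

    public static void Main()
    {
        Console.WriteLine(ReadAfterChange()); // prints "value is 99"
    }
}
```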

Captured Variables in For Loops
Normally, we don't need to worry about this too much. But there is one situation where we may run into trouble: capturing the indexer (the loop variable) of a "for" loop.

Let's run through some code to see the problem. And then we'll see that there is a very simple fix.

This code is available in the "BONUSCapturedVariables" branch of the "lambdas-and-linq" project on GitHub: https://github.com/jeremybytes/lambdas-and-linq. The code is in the "CapturedVariables" project of the solution.

We'll start with a class for our data. This is a "Person" object, and it's pretty simple:


This has 5 properties and an override of the "ToString" method. We'll be using this to output our objects to the console.
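The original listing isn't reproduced here. As a rough sketch only, a class of that shape might look like the following. The property names and the "ToString" format are assumptions for illustration, not copied from the project on GitHub:

```csharp
using System;

// Hypothetical stand-in for the "Person" class. The five property names
// below are assumptions; check the GitHub project for the real ones.
public class Person
{
    public string FirstName { get; set; }
    public string LastName { get; set; }
    public DateTime StartDate { get; set; }
    public int Rating { get; set; }
    public string FormatString { get; set; }

    // Used when we write a Person to the console.
    public override string ToString()
    {
        return string.Format("{0} {1}", FirstName, LastName);
    }
}

public static class PersonSketch
{
    public static void Main()
    {
        var person = new Person { FirstName = "Ada", LastName = "Lovelace" };
        Console.WriteLine(person); // prints "Ada Lovelace"
    }
}
```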

Capturing the Wrong Thing
In the console application, we'll start off with doing things the wrong way -- which also happens to be the logical way if we're new to captured variables:


In the "BadCapture" method, we have a "for" loop. Inside the loop, we create a new Task and use a lambda expression to call the "OutputPerson" method.

Notice the parameters of "OutputPerson": we have a list of person objects and an integer to represent the index of the item we want to output. (Granted, this is a bit of a contrived example, but it will show the problem that we run into.)

Now back in our lambda expression, we "capture" the indexer from the for loop (the "i" variable). So we would expect that for each new task (and each call to "OutputPerson") we would have a different value that would represent the current index of the "for" loop.

Let's see what happens. Here's the rest of the console application:


When we run the application, we get a runtime error:


This isn't good. If we look at this closely, we get an "ArgumentOutOfRange" exception. This probably means that we're trying to index past the end of our "people" collection.

Let's add a "try/catch" block to try to figure out what's going on:
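A sketch of what that might look like, under a couple of assumptions: the list holds plain strings instead of "Person" objects to keep things short, the message text is made up, and the method returns what it prints so the behavior is easy to check (the original most likely just writes to the console):

```csharp
using System;
using System.Collections.Generic;

public static class OutputSketch
{
    // Simplified stand-in for "OutputPerson": a list plus the index to output.
    public static string OutputPerson(List<string> people, int index)
    {
        string line;
        try
        {
            line = people[index];
        }
        catch (ArgumentOutOfRangeException)
        {
            // Instead of crashing, report the index we were handed.
            line = string.Format("Bad index: {0}", index);
        }

        Console.WriteLine(line);
        return line;
    }

    public static void Main()
    {
        var people = new List<string> { "Ann", "Ben", "Cara" };
        OutputPerson(people, 0); // prints "Ann"
        OutputPerson(people, 3); // prints "Bad index: 3"
    }
}
```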


Instead of getting a runtime exception, we'll get output to our console to show us the value of our parameter. And this is probably not what we expect:


This shows us that each time we call "OutputPerson", it is called with "7" as the index. Why? Because...
The value of a captured variable is the value at the time it is used, not the value at the time it was captured.
When the captured variables actually get used, the "for" loop has completed its run. In this case, we have 7 items in our "people" collection. That means the final value for "i" (our indexer) is "7". So when the captured variable gets used, its value is "7" (which happens to be beyond the end of our zero-based collection).
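We can see the same effect without any tasks at all. This sketch swaps "Task.Run" for a plain list of delegates that are only invoked after the loop has finished, which makes the result deterministic. The names are hypothetical:

```csharp
using System;
using System.Collections.Generic;

public static class LoopCaptureSketch
{
    public static List<int> IndexesSeenByLambdas(int itemCount)
    {
        var readers = new List<Func<int>>();

        for (int i = 0; i < itemCount; i++)
        {
            // Every lambda captures the same "i" variable.
            readers.Add(() => i);
        }

        // The loop is done, so "i" now equals itemCount.
        var seen = new List<int>();
        foreach (var reader in readers)
        {
            seen.Add(reader());
        }

        return seen;
    }

    public static void Main()
    {
        Console.WriteLine(string.Join(", ", IndexesSeenByLambdas(7)));
        // prints "7, 7, 7, 7, 7, 7, 7"
    }
}
```

There is only one "i" for the whole loop, so all 7 lambdas share it and all 7 read its final value.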

Fixing Things Up a Bit
Now before we look at the right way to do this, we're going to make a bit of a change to the "BadCapture" method. Since we're dealing with tasks, and we're not quite sure what order things will run/complete in, I want to make sure that all of the tasks from this method have completed before we move on. Here's the code for that:


We start by creating an array of Task objects. The length of the array is the same as the number of items that we have in our collection (7).

Then inside the "for" loop, we save off the result of "Task.Run" (which happens to be a Task) as an element of our array.

Finally, by calling "Task.WaitAll(tasks)", we're specifying that this "BadCapture" method should not return until all of the tasks have completed. This will make sure that things don't get mixed up with our next set of tasks.
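The steps above might be sketched like this. It is a simplification, not the code from the project: the list holds strings, and instead of calling "OutputPerson" each task records the index it saw in a thread-safe queue (an assumption made so the sketch can return something to inspect):

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading.Tasks;

public static class BadCaptureSketch
{
    public static List<int> BadCapture(List<string> people)
    {
        var seen = new ConcurrentQueue<int>();

        // One slot per item in the collection.
        var tasks = new Task[people.Count];

        for (int i = 0; i < people.Count; i++)
        {
            // Still the problem version: the lambda captures "i" itself.
            tasks[i] = Task.Run(() => seen.Enqueue(i));
        }

        // Don't return until every task has completed.
        Task.WaitAll(tasks);
        return new List<int>(seen);
    }

    public static void Main()
    {
        var people = new List<string> { "A", "B", "C", "D", "E", "F", "G" };
        // Usually every entry is 7, but a task that starts before the
        // loop finishes could see a smaller value.
        Console.WriteLine(string.Join(", ", BadCapture(people)));
    }
}
```

Waiting on the tasks doesn't fix the capture problem. It only guarantees that all 7 tasks are done before the method returns.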

Capturing an Index the Right Way
The fix to this problem is pretty simple. Instead of capturing the indexer of the "for" loop, we create a local variable which holds a copy of the value. Here's that code in a separate method:


All we did here was create a variable that is local to the body of the "for" loop called "capturedIndex". This will be a variable that is a copy of the value of the "i" indexer. Since this is a value type (an integer), this makes a copy of the value.

Then notice that when we call "OutputPerson", we use the new "capturedIndex" as a parameter. This has the effect of capturing this new locally-scoped variable.

And on each iteration through the "for" loop, we get a new "capturedIndex" variable. So in the end, we capture 7 different variables. And each variable has the expected value because each separate instance of "capturedIndex" does not get changed after it is captured.
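Here is a sketch of the fix under the same simplifying assumptions as before: strings instead of "Person" objects, and each task records the index it saw rather than calling "OutputPerson". It also waits on the tasks and sorts the results so that the output is predictable:

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading.Tasks;

public static class GoodCaptureSketch
{
    public static List<int> GoodCapture(List<string> people)
    {
        var seen = new ConcurrentQueue<int>();
        var tasks = new Task[people.Count];

        for (int i = 0; i < people.Count; i++)
        {
            // A new variable on every iteration, holding a copy of "i".
            int capturedIndex = i;

            // Capture the copy, which never changes after this point.
            tasks[i] = Task.Run(() => seen.Enqueue(capturedIndex));
        }

        Task.WaitAll(tasks);

        // Tasks finish in no particular order, so sort before returning.
        var result = new List<int>(seen);
        result.Sort();
        return result;
    }

    public static void Main()
    {
        var people = new List<string> { "A", "B", "C", "D", "E", "F", "G" };
        Console.WriteLine(string.Join(", ", GoodCapture(people)));
        // prints "0, 1, 2, 3, 4, 5, 6"
    }
}
```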

Here's our updated console application:


And when we run this, we get a much better output:


This shows us the original bad output plus the new good output. One thing to note is that our values are not coming out in any particular order. This is because the Tasks run when the task scheduler allows them to. If we run the application again, we get results in a different order:


This has to do with how Tasks work and is not affected by our lambda expressions or captured variables. We'll save this as a topic for another day. (In the meantime, feel free to take a look at my collected articles on Task, await, and asynchronous methods.)

Now as a final step, I'll add the same Task-handling to the "GoodCapture" method as we did to the "BadCapture" method. This will just make sure that all the tasks have finished running before moving on in the console application:


As a reminder, you can get all of this code in the "CapturedVariables" project of the lambdas-and-linq project on GitHub. The changes have all been rolled into the "master" branch, but you can also see the specific branch with this code: the "BONUSCapturedVariables" branch on lambdas-and-linq.

Wrap Up
Captured variables are really cool. They let us scope our variables more appropriately, and they can give us easy access to data without having to create a class-level or global variable. But we need to remember that the value of the captured variable is the value at the time we *use* it, not the value at the time we capture it.

We usually only run into problems when we try to capture variables that are constantly changing -- such as indexers in "for" loops. But as we've seen, there is a pretty simple solution to this. We just need to make a copy to a separate, unchanging local variable, and then capture that.

If you attend one of my presentations, please feel free to ask questions or make comments. These help me know what I should be writing about or expanding on. In my experience, if one person has a particular question, there are probably plenty of other folks who have it as well.

Happy Coding!
