Introduction

In this chapter we’re going to talk about the idea of the “meaning” of a program via its intent, which is not always apparent by looking at the code alone. For this, we will start by first talking about indexing, which is a very common thing for students to fudge.

Thoughts like:

I’ll just add a +1 or a -1 and hope for the best

or:

either < or should work

are very common thoughts behind programmers who employ random tweaks in their code, hoping for the best.

In this chapter we’ll try to fix that, and more.

What I want you to focus on is the meaning behind the lines of code, rather than the lines of code themselves. If the meaning becomes clearer, the hope is that:

  1. You’ll be able to read code and figure out its intent easier;
  2. Evaluate code that you write in a more precise and less wishy-washy way.

Indexing Woes

As you progress through your time in programming (especially when you eventually have to come up with your own algorithms), at some point you’ll start wrestling with indexing into arrays. A classic example of this is writing binary search in a way that doesn’t cause a segmentation fault or one that fails to terminate.

So before we talk about the main meat of this chapter, we’ll first address indexing.

Indexing actually comes in two forms. An index either:

  1. Points at an element of an array.
  2. Points at the boundary between two elements of an array.

For example, in an array like this:

index:       0   1   2   3
element:    8   6   7   5
boundary: |   |   |   |   |
          0   1   2   3   4

The element index 2 points at the value 7. The boundary index 2, on the other hand, points between 6 and 7. Both are useful, but they mean different things.

What this means is that when you make indices, depending on what you want to do, you can use these to represent different things.

Getting indices to represent ideas

Let’s start simple with an example you might be familiar with, adding up all elements in the array.

 
int total_elements(int arr[], size_t size){
	int total = 0;
	for(size_t idx = 0; idx < size; ++idx){
		total += arr[idx];
		// Point A
	}
	return total;
}

At this point you might be wondering “What’s so special about this? We’re just going through each element and adding them to our total.” Which isn’t wrong, but we’re going to try to establish a stronger link between the indices or our loops, and the intermediate state of the program as it iterates.

I want you to take note here that no matter the iteration, we can actually make the following statement that relates total and idx:

At every iteration, at point A, total is actually the sum of all elements from index 0 to to idx (inclusive). I.e. total = arr[0] + arr[1] + ... + arr[idx].

So based on this interpretation, you could say that idx is pointing at the last element that we have summed up to in total.

More importantly, this statement that holds true no matter the iteration is what we call an invariant. Invariants are statements that help us reason about the correctness of our programs. For example, given the above program and its invariant, since the invariant holds for every iteration, it must also hold for the very final iteration. By then, notice that idx == size - 1, so it must be the case that at point A, total is arr[0] + arr[1] + ... + arr[size - 1], i.e. the sum of the whole array.

Before we start moving on to more complicated examples I want you to take a short while to think about how the whole thing ties together conceptually.

  1. We want to look at our array indices as representing something beyond just an index.
  2. That representation should come in the form a statement about the program’s behaviour/correctness.

If this makes sense, move onto the next part where we’ll start showing a more involved example to take this one step further and try to see meaning from lines of code.


Indices for intervals

Along a similar vein, we can represent things like intervals with a pair of indices. Let’s use binary search as an example. There are many ways to write binary search. But we’ll give one such implementation here.

 
// returns the rightmost index whose value is at most `value`
size_t binary_search(int arr[], size_t length, int value){
	size_t left = 0, right = length; // Line A
	while(left + 1 < right) { // Line B
		size_t mid = (right - left) / 2 + left;
		if(arr[mid] <= value){ // Line C
			left = mid; // Line D
		} else {
			right = mid; // Line E
		}
	}
 
	return left; // Line F
}

Again, there are many ways to write binary search (and many wrong ways too), but knowing the choices behind every line will help you discern between working ones and buggy ones. We’ll use this example to show you a few techniques.

The Goal of the Program

This binary search aims to return the rightmost index i such that arr[i] <= value. That is to say, arr[i+1] > value. So for example, given an array like:

[-10, -5, 3, 3, 4, 4, 4, 6, 6, 15, 21]

If we were to look for value 6, this program should return index 8. Since arr[8] <= 6 and arr[9] > 6.

If we were to look for a value that doesn’t exist, like , then it should return index since arr[1] <= 0 and arr[2] > 0.

There is one notable exception here, what if we wanted to look for a value like ? Then technically our program can’t really return any index, though you’ll notice the code does return in that case. Something the caller of the function is going to have to handle as a separate case.

Indexing for intervals

Take a look at line A, why is it left = 0 and right = length? Why not other values like left = 0 and right = length - 1? In fact, many common implementations prefer setting right = length - 1. So does this mean we have a buggy implementation? Not quite.

We’ll need to see the program as a whole, of course. But let’s start with the point of the program, and how line A serves that goal.