There is something called the maximum subarray problem, and it goes something like this.

You are given an array of numbers: something like 3,5,-2,-1,5,3,1,-4,-6,5. Your job is to come up with the subarray with the highest sum.

For this example it would be the subarray [3,5,-2,-1,5,3,1], which has a sum of 14. No other subarray gives a higher sum.

Let’s look at a few more:

- The sequence -3,-3,-2,-4,-2,-5,-4 has the highest sum being -2, given by the subarray [-2].
- The sequence 1,2,-1,-2,1 has the highest sum being 3, given by the subarray [1,2].
- The sequence 3,2,1,4 has the highest sum being 10, given by the subarray [3,2,1,4].

It’s an interesting problem with many applications. How do we solve it?

### Solving the problem

While there are many ways to solve this problem, the best solution is Kadane’s Algorithm. It’s a deceptively simple algorithm, yet also deceptively difficult to understand.

Here’s the Python algorithm given on Wikipedia:

1 2 3 4 5 6 |
def max_subarray(A): max_ending_here = max_so_far = A[0] for x in A[1:]: max_ending_here = max(x, max_ending_here + x) max_so_far = max(max_so_far, max_ending_here) return max_so_far |

Here’s an R implementation, with a trace added to help figure out what’s going on.

1 2 3 4 5 6 7 8 9 10 11 |
find_max_subarray = function(arr){ max_ending_here = arr[1] max_so_far = arr[1] for(i in 2:length(arr)){ max_ending_here = max(arr[i], max_ending_here + arr[i]) max_so_far = max(max_so_far, max_ending_here) print(paste( max_ending_here,max_so_far,arr[i])) #for tracing } return(max_so_far) } |

Try running the algorithm yourself. Feed it different subsequences and look at the trace to see if you can figure out how it works.

Understanding this algorithm is not a simple task. Here’s how I understand it.

### Understanding Kadane’s Algorithm

For each number in the sequence, we do this:

Each number “decides” if it wants to use the previous subarray sum, or if it wants to start a new subarray from scratch. At every comparison, it also checks to see if its subarray sum is higher than any previous one.

Let’s look at an example – the sequence 1,-2,3,-1,2.

We iterate through the numbers, starting at the second number in the sequence (-2).

At this point, the maximum subarray sum seen so far is 1 – the sum of the first element in the sequence. This is also the previous subarray sum.

The new subarray sum of -1 is passed to the next number. This is the sum of the current subarray [1,-2].

The next number (3) gets the previous subarray sum of-1:

We start a new sequence at this point. The current subarray is [3] and the current subarray sum is 3. The maximum subarray sum is also updated, since 3 is higher than everything we had before.

The next number (-1) is passed the previous subarray sum of 3:

Now the subarray is [3,-1] and the subarray sum is 2. Do you see how this goes now?

The last number is passed the subarray sum of 2:

The subarray at this point is [3,-1,2] and the subarray sum is 4. The max_so_far variable is updated to read 4.

At this point the sequence has ended and max_so_far is returned – which is 4. This is the highest subarray sum of the sequence, and the subarray was [3,-1,2].

And that’s it!

Hope you enjoyed this post! Let me know by leaving a comment if you need clarification on anything.