Letting the compiler guarantee sortedness

It's not often that I wake up on a Sunday and the first thing my brain thinks of is a leetcode problem from two years ago. In fact, it's never. Until today.

I woke up and found myself mulling over a "gotcha" that I left unaddressed back in June 2023 when solving a fine little leetcode problem.

The tl;dr of the problem is you're given a list of stock prices for subsequent days and you have to find the maximum profit possible if you bought and sold the stock at most twice. That is, if you had a stock price list of [ 3, 1, 0, 0, 1, 2, 3 ], the best you can do is buy on the 4th day (for ₹0) and then sell it on 7th day for ₹3, you make a profit of ₹3. (There are some constraints here: you can only buy or sell on a given day; you cannot buy more unless you sell etc. Check out the writeup for more info)

SortedArray type is not all that safe

The solution I managed to cook up worked well but one of the flaws it had was something to do with sorted arrays. In the solution, I needed a sorted array of elements (elements here being some kind of pairs/tuples). In order to ensure some of the functions only allowed a sorted array (and guaranteed this at the type-level), I created a new type called SortedArray which, parameterized, looked like this:

newtype SortedArray a = SortedArray (Array a)

The problem was, I could construct a SortedArray but it could end up being completely unsorted. I just had to do this:

totallyMisleading = SortedArray ([7,6,5,7,2,0])

If I had to actually make a true-to-its-name SortedArray, I had to do this:

betterArray = SortedArray (sort [5,4,2,8,1,9,8])

It still doesn't prevent me from doing SortedArray (unsortedArray) which is a disaster waiting to happen.

Making it impossible to construct a SortedArray unless you use a function

The way to prevent this from happening is to prevent SortedArray constructor from being usable. That is easy: we put this SortedArray in a new module and only export the type.

module SortedArray (SortedArray) where
import Prelude

newtype SortedArray a = SortedArray (Array a)

Now, we add a specific function that generates a SortedArray from any array:

import Data.Array (sort)

toSorted :: forall a. Ord a => Array a -> SortedArray a
toSorted xs = SortedArray (sort xs)

You could, at this point, go: what's that Ord a thingy?

Custom data types are not really sortable... unless we tell the compiler so

The key function here is the sort which is the default "array sort". The expression sort xs works only because we tell the compiler, through the annotation, that whatever elements we're dealing with in the array are "orderable" (through that Ord a constraint in the type declaration).

So this will work out of the box:

sortedNumbers = toSorted [4,3,5,2,6]
-- sortedNumbers == [2,3,4,5,6]

But in my original solution, I was dealing with custom data types. Like these:

data StockDay = StockDay Int Int
data BuySell = BuySell StockDay StockDay

An array of elements of these kinds does not lend itself to be sorted.

sort ([StockDay 1 2, StockDay 3 4, StockDay 1 5])
-- Does not compile.

The reason is that Ord a constraint. The compiler does not know how to order an array of StockDays or BuySells because there is no Ord implementation for these data types. In English, the compiler just doesn't know how to compare two StockDays or two BuySells. If it knew that, it would know how to sort an array of StockDays or BuySells.

So, all that was left was to write some Ord instances for these two custom data types:

instance ordStockDay :: Ord StockDay where
  compare = sortStockDay

instance ordBuySell :: Ord BuySell where
  compare = sortBuySell

-- implementations of these sort functions are cut for brevity
-- sortStockDay :: StockDay -> StockDay -> Ordering
-- sortBuySell :: BuySell -> BuySell -> Ordering

derive instance eqStockDay :: Eq StockDay
derive instance eqBuySell :: Eq BuySell

(The Eq instance derivations are required in order to be able to write Ord instances for those data types.)

With these small changes, I could refactor pertinent bits of the code to get a type-level, compile-time guarantee that whatever function needed to use a SortedArray was getting a sorted array for sure.

Mucking about in other languages

Typically, these posts conclude with a sentence or two about how great this typeclass-supported polymorphism is and how you could reap the benefits mostly in languages like Haskell/Purescript (or OCaml and maybe Rust).

But in digging a little deeper into how other languages would implement a type-level guarantee of array sorted-ness even for custom data types and interfaces, I learnt that many languages support these kinds of shenanigans almost out of the box.

Mostly, it's a matter of declaring the custom "compare" functions for the custom data types and then passing it to the constructors that construct the SortedArray type.

Some allow you to "auto-derive" the sorting if the underlying types already have comparison methods defined in the standard library. For example, a Tuple/Pair. Haskell/Purescript allows us to auto-derive too but auto-derived instances for tuples and custom data types can be tricky. They can be tricky for strings too (should I compare strings lexically or by length?). The key concept here is defining what is a "comparison" is hard for some kinds of data where you have more than one dimension on which you can compare.

In these implementations, a pattern emerges. We define a custom compare function for the custom data type, then that is passed to the constructor.

Are you passing the comparator implicitly or explicitly?

When I say that's passed to the constructor, there are two ways this happens. Either it's explicit: that is, the custom compare function is passed as an argument to the constructor. Or it's implicit: the custom compare function is nowhere in the construction arguments but because a comparison method is available and associated with the datatype, the compiler (or the runtime) knows what to do, how to sort etc.

In the following examples, assume two things

we have written a constructor that would construct a SortedArray. The interface of this constructor will either need explicit passing of the compare function or not.
the code is not exactly correct. It exists more as an approximation, for illustrating how one gets things done in the langauge.

For instance, here's Typescript, an explicit beast:

type SD = { price: number; day: number };
type BuySell = { first: SD; second: SD };

// Custom comparators
const compareSD = (a: SD, b: SD) => a.price < b.price;
const compareBuySell = (a: BuySell, b: BuySell) =>
  a.second.price - a.first.price < b.second.price - b.first.price;

// Constructors; assume "SortedArray" is some class-like implementation that has a "from" method
// from: <T>(T[], compareFn: (a:T,b:T) => boolean)
const sortedSD = SortedArray.from([{ first: 3, second: 1 }], compareSD);
const sortedBuySell = SortedArray.from([buySell1, buySell2], compareBuySell);

Or Golang:

// Custom comparison functions
func compareSD(a, b SD) bool {
    // some comparison here
}

func compareBuySell(a, b BuySell) bool {
    // some comparison here
}

// Constructors
sortedSD := NewSortedArrayWith([]SD{{1,2}, {3,1}}, compareSD)
sortedBuySell := NewSortedArrayWith([]BuySell{bs1, bs2}, compareBuySell)

Contrast this with Kotlin:

data class SD(val first: Int, val second: Int) : Comparable<SD> {
    override fun compareTo(other: SD) = {} // some custom compare method here
}

data class BuySell(val first: SD, val second: SD) : Comparable<BuySell> {
    override fun compareTo(other: BuySell) = {} // some custom compare method here
}

// Constructors
val sortedSD = SortedArray.from(listOf(SD(3,1), SD(1,2)))
val sortedBuySell = SortedArray.from(listOf(BuySell(sd1, sd2), BuySell(sd3, sd4)))

Or even the more verbose, Python, we override __lt__, __eq__ and __gt__ methods:

@dataclass
class SD:
    first: int
    second: int

    def __lt__(self, other: 'SD') -> bool:
        # some custom comparison goes here

    def __eq__(self, other: 'SD') -> bool:
        # some custom equality check goes here

@total_ordering  # Generates other comparison methods from __lt__ and __eq__
@dataclass
class BuySell:
    first: SD
    second: SD

    def __lt__(self, other: 'BuySell') -> bool:
        # Custom: compare by sum of SD values
        self_sum = self.first.first + self.first.second + self.second.first + self.second.second
        other_sum = other.first.first + other.first.second + other.second.first + other.second.second
        return self_sum < other_sum

    def __eq__(self, other: 'BuySell') -> bool:
        return (self.first, self.second) == (other.first, other.second)


def main() -> None:
    sd_items: List[SD] = [SD(3, 1), SD(1, 2), SD(2, 1)]
    sorted_sd: SortedArray[SD] = SortedArray.from_list(sd_items)

    buysell_items: List[BuySell] = [
        BuySell(SD(1, 2), SD(3, 4)),  # sum = 10
        BuySell(SD(2, 1), SD(1, 1)),  # sum = 5
        BuySell(SD(5, 5), SD(5, 5))   # sum = 20
    ]
    sorted_buysell: SortedArray[BuySell] = SortedArray.from_list(buysell_items)

The beauty of the implicit way is that we can abstract away (from our working memory) how a list/array of custom data type is sorted (or how the elements are compared). And we don't need to pass it on every time we try to generate a sorted list of things.

Kotlin solves with the Comparable<T> type which mandates that you write a compareTo function. Haskell/Purescript solve this with the Ord a constraint which requires you to provide a compare function for your custom data types (if you need sorting, comparing functionality for those data types).

The difference between explicit and implicit is kind of stark if we think about it in layman terms. When we try to construct a SortedArray, the explicit ones are going, "Okay you asked me to sort a list of things, but can you also tell me how to compare the things?" every time we try to generate a sorted array. The implicit ones are going, "Okay you asked me to sort a list of comparable things and somewhere in the codebase you've mentioned how to compare this specific thing so I will now construct the sorted array for you."

There is one pitfall though.

What does it even mean to "compare" two things?

It is mostly straightforward if we think of numbers. Comparison is greater-than or less-than or equal-to. Booleans can be compared too if we assume True > False.

But what about strings? Should we compare them just lexically or should we also include the length in the comparison? Or what if, for a specific use-case, we only want to compare the length?

As an example, how to compare "apple" and "pear" depends on the context. Lexically, "apple" comes first. By length, "pear" comes first. Typically, we would do lexical followed by length — a useful but arbitrary convention.

Things get complex when we talk about product types. Product types are your objects or structs or dataclasses with multiple fields. These have multiple "dimensions" and what that means is there are many ways of comparing them. And each way could be valid and multiple comparing methods could be necessary.

Take my StockDay.

data StockDay = StockDay Price Day
type Price = Int
type Day = Int

This is basically a Tuple, but represented as data for type-level convenience.

What does comparing or sorting two StockDays mean? In my limited use-case, it just means comparing on Price. But there could be a use-case where I want to sort them by Day.

The Ord instance will completely fail me in this case.

A naive, incomplete, pair-product-only solution would be:

class Sortable a where
  compareFst :: a -> a -> Ordering
  compareSnd :: a -> a -> Ordering
  compareBoth :: a -> a -> Ordering

instance sortableStockDay :: Sortable StockDay where
  compareFst (StockDay p1 _) (StockDay p2 _) =
    if p1 < p2 then LT else GT
  compareSnd (StockDay _ d1) (StockDay _ d2) =
    if d1 < d2 then LT else GT
  compareBoth s1 s2 =
    case compareFst s1 s2 of
      EQ -> compareSnd s1 s2
      ord -> ord

But there is no way for me to let the compiler or the runtime know dynamically which comparing function to use when I sort an array:

toSorted :: Array a -> SortedArray a
toSorted xs = SortedArray (Array.sort xs)
-- but notice how I cant tell the sort algo which sort to use

I would have to write a different function for each type of sort:

toSortedFst :: forall a. Sortable a => Array a -> SortedArray a
toSortedFst xs = SortedArray (Array.sortBy compareFst xs)

toSortedSnd :: forall a. Sortable a => Array a -> SortedArray a
toSortedSnd xs = SortedArray (Array.sortBy compareSnd xs)

And now notice how it's lost some guarantees again because I could end up using the toSortedFst where I should be using toSortedSnd and there would be nothing preventing me from doing that. The types would check.

Languages like OCaml and Rust have context-specific comparisons. So you could essentially have multiple ways to compare and when you try to sort, you can pass it an additional information that tells the compiler/runtime which comparison to use.

(* Define the interface *)
module type COMPARABLE = sig
  type t
  val compare : t -> t -> int
end

(* Different comparison strategies *)
module StringLexical : COMPARABLE with type t = string = struct
  type t = string
  let compare = String.compare
end

module StringByLength : COMPARABLE with type t = string = struct
  type t = string
  let compare s1 s2 = compare (String.length s1) (String.length s2)
end

module StringByLengthThenLexical : COMPARABLE with type t = string = struct
  type t = string
  let compare s1 s2 =
    let len_cmp = compare (String.length s1) (String.length s2) in
    if len_cmp = 0 then String.compare s1 s2 else len_cmp
end

(* Generic sort function *)
let sort (module Cmp : COMPARABLE) list =
  List.sort Cmp.compare list

(* Usage - explicit choice of comparison *)
let words = ["hello"; "hi"; "world"; "a"; "programming"]

let () =
  (* Sort lexicographically *)
  let lexical = sort (module StringLexical) words in
Printf.printf "Lexical: %s\n" (String.concat "; " lexical);
  (* Output: Lexical: a; hello; hi; programming; world *)

  (* Sort by length *)
  let by_length = sort (module StringByLength) words in
  Printf.printf "By length: %s\n" (String.concat "; " by_length);
  (* Output: By length: a; hi; hello; world; programming *)

  (* Sort by length then lexical *)
  let combined = sort (module StringByLengthThenLexical) words in
  Printf.printf "Length then lexical: %s\n" (String.concat "; " combined)
  (* Output: Length then lexical: a; hi; hello; world; programming *)

(* You can even parameterize functions by comparison strategy *)
let find_max (module Cmp : COMPARABLE) list =
  match list with
  | [] -> None
  | hd :: tl -> Some (List.fold_left (fun acc x ->
      if Cmp.compare x acc > 0 then x else acc) hd tl)

let () =
  let words = ["cat"; "elephant"; "a"; "dog"] in

  (* Max by lexical order *)
  (match find_max (module StringLexical) words with
   | Some max -> Printf.printf "Lexical max: %s\n" max  (* elephant *)
   | None -> ());

  (* Max by length *)
  (match find_max (module StringByLength) words with
   | Some max -> Printf.printf "Length max: %s\n" max   (* elephant *)
   | None -> ())

We're using the same "sort" keyword but we get to instruct the language to use a different algorithm using that (module ....) declaration. That is rad.

Arguably, these are edge-cases. It's not often that one needs multiple strategies to compare two things which then affects how one sorts their list. But the fact that language designers in Rust and OCaml thought of these cases as well — and generalised them enough — is quite fascinating.

But it's just a wrapper! I shouldn't have to rewrite all methods

The SortedArray is merely a wrapper around Array. But because of the wrapping, any time I want to use an Array-like function (eg head, or tail or !! which is an operator to access the element at an index), I have to re-implement it. Like this:

head :: forall a. Ord a => SortedArray a -> Maybe a
head (SortedArray xs) = Array.head xs

filter :: forall a. Ord a => (a -> Boolean) -> SortedArray a -> SortedArray a
filter pred (SortedArray xs) = SortedArray (Array.filter pred xs)

-- and so on

You'd think that with all the advancements in compilers and type systems and type theory, this overhead wouldn't be required. After all, SortedArray is just a wrapper around a simple Array.

Of course, languages like Haskell/Purescript lets you kind of get rid of this overhead.

To do this, we just do some "derive"s:

derive instance newtypeSortedArray :: Newtype (SortedArray a) _
derive newtype instance functorSortedArray :: Functor SortedArray
derive newtype instance foldableSortedArray :: Foldable SortedArray
derive newtype instance traversableSortedArray :: Traversable SortedArray
derive newtype instance applySortedArray :: Apply SortedArray
derive newtype instance applicativeSortedArray :: Applicative SortedArray
derive newtype instance bindSortedArray :: Bind SortedArray
derive newtype instance monadSortedArray :: Monad SortedArray

This gives us the ability to run operations like map and fold but for other array operations like length, we have to write our functions. Instead of doing unwrap/wrap over and over again, the newtype derivation allows us to simply use a generic function called coerce which does the unwrapping and wrapping for us:

import Data.NewType (coerce)

length :: SortedArray a -> Int
length = coerce Array.length

All this is great but there are downsides and flaws to this approach.

When we derive newtype instance for a type like SortedArray, the compiler makes the SortedArray constructor "exposed". That goes all the way back to my original problem with my old solution — exposing the constructor means I could construct a non-sorted-array and wrap it in SortedArray to make it look like it is sorted. But it won't be.

The other issue, specific to this use-case, is that deriving functions like map automatically does not keep the sorted-ness guarantee.

This is wrong (and depending on the usage, dangerous):

map :: (a -> b) -> SortedArray a -> SortedArray b
map = coerce Array.map

wrong = map (\x -> x * -1) (SortedArray [1,2,3,4])
-- wrong == SortedArray [-1,-2,-3,-4]

alsoWrong = map (\x -> if x > 4 then x * -1 else x) (SortedArray [1,2,3,6,7])
-- alsoWrong == SortedArray [1,2,3,-6,-7]

And there's nothing that informs me of that unless I peak into the mapping function.

The correct map goes like this:

map : forall a b. Ord a => Ord b => (a -> b) -> SortedArray a -> SortedArray b
map f xs = toSorted (Array.map f (fromSorted xs))
-- or more succinctly
-- map f = toSorted <<< Array.map f <<< fromSorted

There are a handful of other functions that would break sorting if coerce-d through newtype derivations. Like mapWithIndex / indexedMap, insert, update, cons/snoc/push etc. Anything that updates elements (in-place or as a new result) while retaining the array-like structure is a problem.

I started this writeup as a journal entry after toying around with SortedArray's compiler guarantees. But as I wrote, I began exploring around and ended up in various rabbit holes of parameterisation, polymorphism and typeclass-like behaviours of other languages.

Some wonderful abstractions here if you are mindful of the gotchas.