Merging Case Classes with Monoid and Shapeless

April 26, 2018

The other day at work I came across a structure similar to the following. The data, types, and logic have all been greatly simplified to protect you from irrelevant noise.

Essentially, all of our fields are either Option or Vector. Later we have some code that’s building our Info data using something like the following.

That last getAllInfo method stuck out to me. It seemed that all we were really trying to do there was combine Info values. Sound familiar?

Making a Monoid

Data types which are combinable form a Monoid (fancy math term from Category Theory). For this example, we’ll be using the cats library.

Monoid is a type class which contains two methods -

Aside: Astute readers may notice that I’ve left out Monoid’s superclass Semigroup, which is actually where its combine method comes from. Monoid builds on Semigroup by introducing the empty method, sometimes also referred to as the identity. As such, Monoid is often defined as a Semigroup with identity. Not really a prerequisite for understanding this article, but worth pointing out for completeness.

Let’s define a Monoid instance for Info -

Now we can combine Info values!

Alternatively we can use the |+| operator -

You can also use Foldable to collapse a sequence of values with combineAll. This can be useful if you have many or even an indeterminate number of values to combine -

In our case, we actually had Option[Info], so we can piggy-back off of Option’s instance of Monoid which simply appends the values inside of Somes, discarding the Nones.

Putting it all together, our getAllInfo logic can become -

Much simpler right?

Generalizing Merge

Now, our Monoid definition is just a little tedious in this case, but imagine we have more fields on our case class. Writing this by hand can get a little unwieldy and error-prone (particularly since our case class has default fields). Can we abstract this further?

Let’s look at that Monoid definition again, specifically where we’re merging the fields -

So for Option values we need to use .orElse, and for Vector values we need to use ++. Is there some abstraction that does this for us? Why yes, yes there is!

Enter MonoidK. It’s very similar to Monoid except it operates on a higher kind. It has a superclass: SemigroupK, but we’ll simplify the definition for now.

So we could abstract away our hand-written combines with -

But that doesn’t really feel much better. It still requires us to write each field by hand and, strictly by programmer diligence, to not forget a field.

Ad tedium ⇒ Correctness

Let’s try to abstract away the field boilerplate with shapelessGeneric type class.

Generic allows us to convert our case class into an HList and back again. The intermediate HList representation provides us with ways to perform generic operations on the structure before we re-assemble it.

Let’s introduce our own type class called GMerge which will provide generic merging functionality for types which have a Generic instance (which includes all case classes) where all of the fields support MonoidK.

Goal now is to implement the default method which derives our GMerge instance. Let’s start by just attempting to think about the logical flow.

  1. Obtain the generic representation (HList) of both x and y
  2. Zip the HList values together, producing an HList of Tuple2 ((B, B))
  3. Run a merging function over the zipped HList, merging the elements of each Tuple2 together with MonoidK, returning a new HList matching the original type structure.
  4. Convert the “merged” HList

In the case of Info, the types would look something like -

You can actually run all of the code above in an Scala console for some defined x and y values. The next part, however, won’t work, but demonstrates the principle of what we need to do.

So somehow we need to build a function which can be mapped over an HList of Tuple2, merging the fields accordingly with MonoidK. Let’s take a whack at it.

Let’s try it out -

Alright! Let’s supply it to .map and be done with this.

error: polymorphic expression cannot be instantiated to expected type;
 found   : [F[_], V](t: (F[V], F[V]))(implicit F: cats.MonoidK[F])F[V]
 required: shapeless.Poly

Huh? We gave it a perfectly fine function to work with, but it wants us to give it a…Poly? Why?

Polymorphic Functions

The problem is that simple (non-generic) data structures like List are homogeneously typed: they only really contain elements of a single type.

In the above case, our List elements are all typed (Option[String], Option[String]) and there is no way to have different types at different element positions. Contrast this with HList -

Here each element has its own type. Scala doesn’t have a built-in way to deal with generic polymorphism (not to be confused with generics which is actually parametric polymorphism). This is where shapeless’ Poly comes in.

In languages like Haskell, functions are already polymorphic. In Scala we don’t have that luxury, so Poly provides us with polymorphic function support. There are arity variants such as Poly1, Poly2, etc. Let’s reimplement our mergeTuple function in terms of Poly1 -

To get a deeper understanding of how Poly works, check out the Shapeless 2.0 overview documentation.

Let’s take our new polyMerge function for a spin -

It works! Now we can add in our polyMerge function into our Info example. Picking up from where we left off -

Nice! One simplification we can make is to use .zipWith instead of .zip().map(). This also means that we need to use Poly2 instead of Poly1, which actually turns out to be simpler -

Alright, let’s try to put it all together now into our default method and get to auto-deriving already!

Deriving GMerge

When working with Shapeless, compiler errors can be a little cryptic, so I’ll walk through each step, demonstrating how to resolve them as we encounter each one.

Here’s our original implementation with Info removed for type A.

Error:(45, 20) could not find implicit value for parameter gen: shapeless.Generic[A]
    val g = Generic[A]

Following the compiler’s orders, let’s move the Generic from the method body and into the implicit constraints.

Error:(50, 27) value zipWith is not a member of g.Repr
      val hlistR = hlistX.zipWith(hlistY)(polyMerge)

Problem now is that the compiler doesn’t know that our generic Repr is an HList. Let’s tell it.

Error:(50, 42) could not find implicit value for parameter
  zipWith: shapeless.ops.hlist.ZipWith[g.Repr,g.Repr,polyMerge.type]
      val hlistR = hlistX.zipWith(hlistY)(polyMerge)

This part might be a little more confusing, so let’s look at (a simplified version of) the ZipWIth type class.

The type arguments correspond as follows -

  • L - The left HList being zipped
  • R - The right HList being zipped
  • P - The Poly function being applied
  • Out - The return type of zipWith, which is the resulting HList type returned from applying our Poly function to each of the zipped elements.

In our case the left and right element types of each HList should be the same since we’re just merging As together, and the return type should also be the same. For the Poly type, we can name function directly with polyMerge.type. This gives us a constraint that looks like -

Putting it all together gives us -

And it compiles! Now let’s try to use it to derive a combine implementation for Monoid[Info] -

Error:(24, 58) could not find implicit value for evidence parameter of type GMerge[Info]
    override def combine(x: Info, y: Info): Info = GMerge[Info].merge(x, y)

This error is extraordinarily cryptic and will happen if you are missing one of the necessary implicits to allow polyMerge to work for each of your case class fields. In this case, we need to bring in the MonoidK instances for Option and Vector, so let’s do that -

And it compiles!

The Final Product

For completeness, here’s the full implementation. It’s a good practice to keep the Info case class and GMerge trait in their own respective files, but for simplicity (yet again) just including it in a single, working snippet.

Aside: You could go even further and derive your own empty method using Shapeless as well; however, as of this writing I found this a little more challenging than expected due to a scalac bug. The solution involves writing a proxy type class to delegate to MonoidK. See the linked ticket for more info.

A good next step would be to write a test for your case class to ensure that your Monoid instance obeys the laws.

Happy Deriving!