Merging Case Classes with Monoid and Shapeless
The other day at work I came across a structure similar to the following. The data, types, and logic have all been greatly simplified to protect you from irrelevant noise.
final case class Info(
name: Option[String] = None,
city: Option[String] = None,
friendIds: Vector[Int] = Vector.empty
)
Essentially, all of our fields are either Option
or Vector
. Later we have some code that’s building our Info
data using something like the following.
// Query for profile info, setting the name and city fields if they exist.
def getProfileInfo(id: Int): Future[Option[Info]] =
db.getProfileRecord(id).map(_.map(r => Info(name = r.name, city = r.city)))
// Query for friend info, setting the friendIds field.
def getFriendInfo(id: Int): Future[Option[Info]] =
db.getFriends(id).map(_.map(r => Info(friendIds = r.friendIds)))
// Fetch all info sources, joining the results.
def getAllInfo(id: Int): Future[Either[String, Info]] =
for {
optProfileInfo <- getProfileInfo(id)
optFriendInfo <- getFriendInfo(id)
} yield (optProfileInfo, optFriendInfo) match {
case (Some(profileInfo), Some(friendInfo)) =>
Right(profileInfo.copy(friendIds = friendInfo.friendIds))
case (Some(profileInfo), None) =>
Right(profileInfo)
case (None, Some(friendInfo)) =>
Right(friendInfo)
case (None, None) =>
Left("Failed to fetch info")
}
That last getAllInfo
method stuck out to me. It seemed that all we were really trying to do there was combine Info
values. Sound familiar?
Making a Monoid
Data types which are combinable form a Monoid (fancy math term from Category Theory). For this example, we’ll be using the cats library.
Monoid
is a type class which contains two methods -
Aside: Astute readers may notice that I’ve left out Monoid
’s superclass Semigroup
, which is actually where its combine
method comes from. Monoid
builds on Semigroup
by introducing the empty
method, sometimes also referred to as the identity. As such, Monoid
is often defined as a Semigroup
with identity. Not really a prerequisite for understanding this article, but worth pointing out for completeness.
Let’s define a Monoid
instance for Info
-
import cats.Monoid
object Info {
implicit val monoid: Monoid[Info] = new Monoid[Info] {
override def empty: Info = Info()
override def combine(x: Info, y: Info): Info = {
val Info(name1, city1, friendIds1) = x
val Info(name2, city2, friendIds2) = y
Info(name1.orElse(name2), city1.orElse(city2), friendIds1 ++ friendIds2)
}
}
}
Now we can combine Info
values!
Monoid[Info].combine(
Info(None, Some("a"), Vector(1)),
Info(Some("c"), Some("d"), Vector(2))
)
// Info(Some(c),Some(a),Vector(1, 2))
Alternatively we can use the |+|
operator -
import cats.syntax.monoid._
Info(None, Some("a"), Vector(1)) |+| Info(Some("c"), Some("d"), Vector(2))
// Info(Some(c),Some(a),Vector(1, 2))
You can also use Foldable
to collapse a sequence of values with combineAll
. This can be useful if you have many or even an indeterminate number of values to combine -
import cats.instances.list._
import cats.syntax.foldable._
List(
Info(None, Some("a"), Vector(1)),
Info(Some("c"), Some("d"), Vector(2))
).combineAll
// Info(Some(c),Some(a),Vector(1, 2))
In our case, we actually had Option[Info]
, so we can piggy-back off of Option
’s instance of Monoid
which simply appends the values inside of Some
s, discarding the None
s.
List(None: Option[Info]).combineAll
// None
List(
None,
Some(Info(Some("c"), Some("d"), Vector(2)))
).combineAll
// Some(Info(Some(c),Some(d),Vector(2)))
List[Option[Info]](
Some(Info(None, Some("a"), Vector(1))),
Some(Info(Some("c"), Some("d"), Vector(2)))
).combineAll
// Some(Info(Some(c),Some(a),Vector(1, 2)))
Putting it all together, our getAllInfo
logic can become -
import cats.instances.list._
import cats.instances.option._
import cats.syntax.foldable._
def getAllInfo(id: Int): Future[Either[String, Info]] =
for {
optProfileInfo <- getProfileInfo(id)
optFriendInfo <- getFriendInfo(id)
} yield List(
optProfileInfo,
optFriendInfo
).combineAll.toRight("Failed to fetch info")
Much simpler right?
Generalizing Merge
Now, our Monoid definition is just a little tedious in this case, but imagine we have more fields on our case class. Writing this by hand can get a little unwieldy and error-prone (particularly since our case class has default fields). Can we abstract this further?
Let’s look at that Monoid
definition again, specifically where we’re merging the fields -
So for Option
values we need to use .orElse
, and for Vector
values we need to use ++
. Is there some abstraction that does this for us? Why yes, yes there is!
Enter MonoidK. It’s very similar to Monoid
except it operates on a higher kind. It has a superclass: SemigroupK
, but we’ll simplify the definition for now.
So we could abstract away our hand-written combines with -
import cats.MonoidK.ops._
import cats.instances.vector._
import cats.instances.option._
...
Info(name1.combineK(name2), city1.combineK(city2), friendIds1.combineK(friendIds2))
But that doesn’t really feel much better. It still requires us to write each field by hand and, strictly by programmer diligence, to not forget a field.
Ad tedium ⇒ Correctness
Let’s try to abstract away the field boilerplate with shapeless’ Generic
type class.
import shapeless._
val g = Generic[Info]
// g: shapeless.Generic[Info]{type Repr = Option[String] :: Option[String] :: scala.collection.immutable.Vector[Int] :: shapeless.HNil} = anon$macro$8$1@381fe083
g.to(Info())
// None :: None :: Vector() :: HNil
val hlist = g.to(Info(Some("foo"), None, Vector(1, 2)))
// Some(foo) :: None :: Vector(1, 2) :: HNil
g.from(hlist)
// Info(Some(foo),None,Vector(1, 2))
Generic
allows us to convert our case class into an HList
and back again. The intermediate HList
representation provides us with ways to perform generic operations on the structure before we re-assemble it.
Let’s introduce our own type class called GMerge
which will provide generic merging functionality for types which have a Generic
instance (which includes all case classes) where all of the fields support MonoidK
.
trait GMerge[A] {
def merge(x: A, y: A): A
}
object GMerge {
// Provide a helper method for summoning a GMerge instance.
def apply[A: GMerge]: GMerge[A] = implicitly
// Auto-derive an instance of GMerge.
implicit def default[A]: GMerge[A] = new GMerge[A] {
override def merge(x: A, y: A): A = ???
}
}
Goal now is to implement the default
method which derives our GMerge
instance. Let’s start by just attempting to think about the logical flow.
- Obtain the generic representation (
HList
) of bothx
andy
- Zip the
HList
values together, producing anHList
ofTuple2
((B, B)
) - Run a merging function over the zipped
HList
, merging the elements of eachTuple2
together withMonoidK
, returning a newHList
matching the original type structure. - Convert the “merged”
HList
In the case of Info
, the types would look something like -
val x = Info(Some("foo"), None, Vector(1, 2))
val y = Info(None, Some("bar"), Vector.empty)
val g = Generic[Info]
val hlistX: Option[String] :: Option[String] :: Vector[Int] :: HNil = g.to(x)
// Some(foo) :: None :: Vector(1, 2) :: HNil
val hlistY: Option[String] :: Option[String] :: Vector[Int] :: HNil = g.to(y)
// None :: Some(bar) :: Vector() :: HNil
val zipped: (Option[String], Option[String]) ::
(Option[String], Option[String]) ::
(Vector[Int], Vector[Int]) ::
HNil = hlistX.zip(hlistY)
// (Some(foo),None) :: (None,Some(bar)) :: (Vector(1, 2),Vector()) :: HNil
You can actually run all of the code above in an Scala console for some defined x
and y
values. The next part, however, won’t work, but demonstrates the principle of what we need to do.
def f = ??? // Function for merging fields.
val hlistR: Option[String] :: Option[String] :: Vector[Int] :: HNil = zipped.map(f)
val result: Info = g.from(hlistR)
So somehow we need to build a function which can be mapped over an HList
of Tuple2
, merging the fields accordingly with MonoidK
. Let’s take a whack at it.
def mergeTuple[F[_], V](t: (F[V], F[V]))(implicit F: MonoidK[F]): F[V] = t match {
case (x, y) => F.combineK(x, y)
}
Let’s try it out -
mergeTuple((Option.empty[String], Option("bar")))
// res10: Option[String] = Some(bar)
mergeTuple((Option("foo"), Option("bar")))
// res11: Option[String] = Some(foo)
mergeTuple((Vector(1), Vector(2)))
// res12: scala.collection.immutable.Vector[Int] = Vector(1, 2)
Alright! Let’s supply it to .map
and be done with this.
error: polymorphic expression cannot be instantiated to expected type;
found : [F[_], V](t: (F[V], F[V]))(implicit F: cats.MonoidK[F])F[V]
required: shapeless.Poly
zipped.map(mergeTuple)
^
Huh? We gave it a perfectly fine function to work with, but it wants us to give it a…Poly
? Why?
Polymorphic Functions
The problem is that simple (non-generic) data structures like List
are homogeneously typed: they only really contain elements of a single type.
List((None, Option("bar")), (Option("baz"), None))
// res14: List[(Option[String], Option[String])] = List((None,Some(bar)), (Some(baz),None))
res14.map(mergeTuple[Option, String])
// res15: List[Option[String]] = List(Some(bar), Some(baz))
In the above case, our List
elements are all typed (Option[String], Option[String])
and there is no way to have different types at different element positions. Contrast this with HList
-
hlistX
// res16: Option[String] :: Option[String] :: Vector[Int] :: shapeless.HNil = Some(foo) :: None :: Vector(1, 2) :: HNil
Here each element has its own type. Scala doesn’t have a built-in way to deal with generic polymorphism (not to be confused with generics which is actually parametric polymorphism). This is where shapeless’ Poly
comes in.
In languages like Haskell, functions are already polymorphic. In Scala we don’t have that luxury, so Poly
provides us with polymorphic function support. There are arity variants such as Poly1
, Poly2
, etc. Let’s reimplement our mergeTuple
function in terms of Poly1
-
object polyMerge extends Poly1 {
implicit def cases[F[_], V](implicit F: MonoidK[F]): Case.Aux[(F[V], F[V]), F[V]] =
at[(F[V], F[V])] { case (x, y) => F.combineK(x, y) }
}
To get a deeper understanding of how Poly
works, check out the Shapeless 2.0 overview documentation.
Let’s take our new polyMerge
function for a spin -
polyMerge((Vector(1), Vector(2)))
// res17: scala.collection.immutable.Vector[Int] = Vector(1, 2)
polyMerge((None: Option[String], Option("bar")))
// res18: Option[String] = Some(bar)
polyMerge((Option("foo"), Option("bar")))
// res19: Option[String] = Some(foo)
It works! Now we can add in our polyMerge
function into our Info
example. Picking up from where we left off -
val hlistR = zipped.map(polyMerge)
// Some(foo) :: Some(bar) :: Vector(1, 2) :: HNil
val res: Info = g.from(hlistR)
// Info(Some(foo),Some(bar),Vector(1, 2))
Nice! One simplification we can make is to use .zipWith
instead of .zip().map()
. This also means that we need to use Poly2
instead of Poly1
, which actually turns out to be simpler -
object polyMerge extends Poly2 {
implicit def cases[F[_], V](implicit F: MonoidK[F]): Case.Aux[F[V], F[V], F[V]] =
at[F[V], F[V]](F.combineK)
}
Alright, let’s try to put it all together now into our default
method and get to auto-deriving already!
Deriving GMerge
When working with Shapeless, compiler errors can be a little cryptic, so I’ll walk through each step, demonstrating how to resolve them as we encounter each one.
Here’s our original implementation with Info
removed for type A
.
implicit def default[A]: GMerge[A] = new GMerge[A] {
override def merge(x: A, y: A): A = {
val g = Generic[A]
val hlistX = g.to(x)
val hlistY = g.to(y)
val hlistR = hlistX.zipWith(hlistY)(polyMerge)
g.from(hlistR)
}
}
Error:(45, 20) could not find implicit value for parameter gen: shapeless.Generic[A]
val g = Generic[A]
Following the compiler’s orders, let’s move the Generic
from the method body and into the implicit constraints.
implicit def default[A](
implicit
g: Generic[A]
): GMerge[A] = new GMerge[A] {
override def merge(x: A, y: A): A = {
val hlistX = g.to(x)
val hlistY = g.to(y)
val hlistR = hlistX.zipWith(hlistY)(polyMerge)
g.from(hlistR)
}
}
Error:(50, 27) value zipWith is not a member of g.Repr
val hlistR = hlistX.zipWith(hlistY)(polyMerge)
Problem now is that the compiler doesn’t know that our generic Repr
is an HList
. Let’s tell it.
implicit def default[A, L <: HList](
implicit
g: Generic.Aux[A, L]
): GMerge[A] = new GMerge[A] {
override def merge(x: A, y: A): A = {
val hlistX = g.to(x)
val hlistY = g.to(y)
val hlistR = hlistX.zipWith(hlistY)(polyMerge)
g.from(hlistR)
}
}
Error:(50, 42) could not find implicit value for parameter
zipWith: shapeless.ops.hlist.ZipWith[g.Repr,g.Repr,polyMerge.type]
val hlistR = hlistX.zipWith(hlistY)(polyMerge)
This part might be a little more confusing, so let’s look at (a simplified version of) the ZipWIth
type class.
trait ZipWith[L <: HList, R <: HList, P <: Poly2] { type Out <: HList }
object ZipWith {
type Aux[L <: HList, R <: HList, P <: Poly2, Out0 <: HList] =
ZipWith[L, R, P] { type Out = Out0 }
}
The type arguments correspond as follows -
L
- The leftHList
being zippedR
- The rightHList
being zippedP
- ThePoly
function being appliedOut
- The return type ofzipWith
, which is the resultingHList
type returned from applying ourPoly
function to each of the zipped elements.
In our case the left and right element types of each HList
should be the same since we’re just merging A
s together, and the return type should also be the same. For the Poly
type, we can name function directly with polyMerge.type
. This gives us a constraint that looks like -
Putting it all together gives us -
implicit def default[A, L <: HList](
implicit
g: Generic.Aux[A, L],
z: ZipWith.Aux[L, L, polyMerge.type, L]
): GMerge[A] = new GMerge[A] {
override def merge(x: A, y: A): A = {
val hlistX = g.to(x)
val hlistY = g.to(y)
val hlistR = hlistX.zipWith(hlistY)(polyMerge)
g.from(hlistR)
}
}
And it compiles! Now let’s try to use it to derive a combine
implementation for Monoid[Info]
-
object Info {
implicit val monoid: Monoid[Info] = new Monoid[Info] {
override def empty: Info = Info()
override def combine(x: Info, y: Info): Info = GMerge[Info].merge(x, y)
}
}
Error:(24, 58) could not find implicit value for evidence parameter of type GMerge[Info]
override def combine(x: Info, y: Info): Info = GMerge[Info].merge(x, y)
This error is extraordinarily cryptic and will happen if you are missing one of the necessary implicits to allow polyMerge
to work for each of your case class fields. In this case, we need to bring in the MonoidK
instances for Option
and Vector
, so let’s do that -
And it compiles!
The Final Product
For completeness, here’s the full implementation. It’s a good practice to keep the Info
case class and GMerge
trait in their own respective files, but for simplicity (yet again) just including it in a single, working snippet.
import cats.{Monoid, MonoidK}
import cats.instances.vector._
import cats.instances.option._
import shapeless._
import shapeless.ops.hlist.ZipWith
final case class Info(
name: Option[String] = None,
city: Option[String] = None,
friendIds: Vector[Int] = Vector.empty
)
object Info {
implicit val monoid: Monoid[Info] = new Monoid[Info] {
override def empty: Info = Info()
override def combine(x: Info, y: Info): Info = GMerge[Info].merge(x, y)
}
}
trait GMerge[A] {
def merge(x: A, y: A): A
}
object GMerge {
def apply[A: GMerge]: GMerge[A] = implicitly
implicit def default[A, L <: HList](
implicit
g: Generic.Aux[A, L],
z: ZipWith.Aux[L, L, polyMerge.type, L]
): GMerge[A] = new GMerge[A] {
override def merge(x: A, y: A): A = g.from(g.to(x).zipWith(g.to(y))(polyMerge))
}
object polyMerge extends Poly2 {
implicit def cases[F[_], V](implicit F: MonoidK[F]): Case.Aux[F[V], F[V], F[V]] =
at[F[V], F[V]](F.combineK)
}
}
Aside: You could go even further and derive your own empty
method using Shapeless as well; however, as of this writing I found this a little more challenging than expected due to a scalac bug. The solution involves writing a proxy type class to delegate to MonoidK
. See the linked ticket for more info.
A good next step would be to write a test for your case class to ensure that your Monoid
instance obeys the laws.
import cats.kernel.laws.discipline.MonoidTests
import org.scalacheck.Arbitrary
import org.scalatest.FunSuite
import org.typelevel.discipline.scalatest.Discipline
class InfoSpec extends FunSuite with Discipline {
implicit val arbInfo: Arbitrary[Info] = ???
checkAll("Monoid[Info]", MonoidTests[Info].monoid)
}
Happy Deriving!