Announcing the launch of vocab!

I am pleased to announce that my command line utility vocab is available on GitHub and ready for use by the community!

vocab is a tool to help you expand your vocabulary, designed with simplicity in mind.

Motivation

As an avid reader of nonfiction, I often come across words I don’t know. Sometimes I can infer the meaning of an unknown word from its context, and sometimes I can’t. When I can’t, I’m forced to pause my reading and look it up. Sometimes the word comes up a second time and I’ve already forgotten its meaning! I want to better remember those words so I can expand my vocabulary without interrupting my reading flow.

As a developer, I spend a lot of my computer time in the terminal, so I created vocab as a simple, flashcard-like vocabulary tool for the terminal. The goal of vocab is to help the user remember the words they want to remember. Implementation details below!

Implementation Details

This section describes the inner workings of vocab, such as data persistence, how it selects words for a practice session, and the functional programming techniques it uses. If you haven’t read vocab’s README, please do so before reading this section.

Data Persistence

Word and practice session data are stored in CSV files on your machine. The location of these files is OS-dependent, so I used the directories-jvm project to abstract that away from my program.
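Here is a minimal sketch of appending rows to, and reading rows back from, such a CSV file using only java.nio. It is not vocab’s code: the CsvStore name is hypothetical, rows are assumed to arrive already encoded as CSV strings, and in vocab the directory would come from directories-jvm rather than from the caller.

```scala
import java.nio.charset.StandardCharsets
import java.nio.file.{Files, Path, StandardOpenOption}
import scala.jdk.CollectionConverters._

// Hypothetical helper: vocab's real persistence code may look different.
object CsvStore {
  // Appends one already-encoded CSV row, creating the file on first use.
  def appendRow(file: Path, row: String): Unit = {
    Files.write(
      file,
      (row + "\n").getBytes(StandardCharsets.UTF_8),
      StandardOpenOption.CREATE,
      StandardOpenOption.APPEND
    )
    ()
  }

  // Reads all rows back; a missing file means no data has been saved yet.
  def readRows(file: Path): List[String] =
    if (Files.exists(file)) Files.readAllLines(file, StandardCharsets.UTF_8).asScala.toList
    else Nil
}
```

Appending keeps each write cheap, and treating a missing file as an empty data set conveniently covers the very first run.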

Ad-hoc polymorphism for converting a data object to its CSV representation

Ad-hoc polymorphism is a technique for adding functionality to existing types on the fly, as an alternative to subclassing. In Scala, the type class pattern is one technique for achieving ad-hoc polymorphism, and implicits are the language feature that makes it work. Let’s see how vocab uses implicits to achieve ad-hoc polymorphism.

There are two main case classes in vocab representing the data model:

case class Word(
  word: String,
  definition: String,
  partOfSpeech: Option[SpeechPart],
  numTimesPracticed: Int
)

case class PracticeSession(
  sessionType: PracticeSessionType,
  numWords: Int, 
  duration: Int,
  timestamp: Int,
  didFinish: Boolean 
)

Before Word and PracticeSession objects are written to the storage files, they need to be converted to comma-separated values. Using Scala’s implicit classes, we can wrap Word and PracticeSession objects in types that provide this behavior:

// Implicit classes can't be top-level, so they live inside an object
object CSVSyntax {
  sealed trait ToCSVRepr {
    // A comma-separated value representation of the wrapped object
    def toCSVRepr: String
  }

  // A wrapper class for converting a word to its CSV representation
  implicit class WordToCSVRepr(word: Word) extends ToCSVRepr {
    def toCSVRepr: String = ???
  }

  // A wrapper class for converting a practice session to its CSV representation
  implicit class PracticeSessionToCSVRepr(practiceSession: PracticeSession) extends ToCSVRepr {
    def toCSVRepr: String = ???
  }
}

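We’ve defined two implicit classes, one to wrap Words and one to wrap PracticeSessions. With them in scope, we can call toCSVRepr directly on Word and PracticeSession objects, even though toCSVRepr isn’t defined in those classes! When the compiler sees word.toCSVRepr and finds no such member on Word, it looks for an implicit conversion and rewrites the call to WordToCSVRepr(word).toCSVRepr.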
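Strictly speaking, implicit classes like these are extension methods (sometimes called the "enrich my library" pattern). In the type class formulation, the behavior lives in a trait parameterized on the target type, instances are supplied implicitly, and generic code can ask for "any A that has an instance". The sketch below shows that shape for Word. It is an illustration rather than vocab’s code: SpeechPart is replaced by a hypothetical two-case stand-in, CSVEncoder and CSVWriter are made-up names, the column order is assumed, and fields are not quoted.

```scala
// Hypothetical stand-in for vocab's SpeechPart; the real ADT may differ.
sealed trait SpeechPart
case object Noun extends SpeechPart
case object Verb extends SpeechPart

case class Word(
  word: String,
  definition: String,
  partOfSpeech: Option[SpeechPart],
  numTimesPracticed: Int
)

// The type class: "values of type A can be encoded as a CSV row".
trait CSVEncoder[A] {
  def encode(a: A): String
}

object CSVEncoder {
  // Summoner, so callers can write CSVEncoder[Word].encode(w).
  def apply[A](implicit enc: CSVEncoder[A]): CSVEncoder[A] = enc

  // Instance for Word. Column order is an assumption, and fields are not
  // quoted, so a definition containing a comma would corrupt the row.
  implicit val wordEncoder: CSVEncoder[Word] = new CSVEncoder[Word] {
    def encode(w: Word): String =
      List(w.word, w.definition, w.partOfSpeech.fold("")(_.toString), w.numTimesPracticed.toString)
        .mkString(",")
  }

  // Syntax: one implicit class that works for any A with an encoder in scope.
  implicit class CSVEncoderOps[A](a: A) {
    def toCSVRepr(implicit enc: CSVEncoder[A]): String = enc.encode(a)
  }
}

object CSVWriter {
  // Generic code only requires that some CSVEncoder[A] exists.
  def toCSVLines[A: CSVEncoder](as: Seq[A]): Seq[String] =
    as.map(a => CSVEncoder[A].encode(a))
}
```

With this shape, supporting a new type means writing one more instance, and generic functions like toCSVLines work with it unchanged.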

Using Phantom Types To Make The Application Class Safer

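A phantom type is a type parameter that never appears in a class’s fields; it exists only at compile time to carry information. Phantom types are useful for enforcing an ordering in a workflow: we can tell the compiler to allow certain actions only in a specific order.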

There are two distinct steps in running the vocab application:

1. Interpreting the command line arguments
2. Running the provided command

Each step is encapsulated in a method on the Application class:

// Parses the command line arguments
def parseArgs(...) = ???

// Runs the command generated from command line parsing
def runCommand(...) = ???

It doesn’t make sense to call runCommand before calling parseArgs. Using phantom types, we can have the compiler reject an invalid ordering like this.

Let’s define a trait for each state of the Application: parsing the arguments, running the command, and being done:

object Application {
  sealed trait State
  object State {
    sealed trait ParseArgs extends State
    sealed trait RunCommand extends State
    sealed trait PostCommand extends State
  }
}

Now let’s make the Application class generic and impose the restriction that the type parameter must be a subtype of Application.State:

case class Application[S <: Application.State](...) { ... }

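Using implicit evidence, we tell the compiler that each method on Application can only be called when S is a particular state. A parameter like ev: S =:= ParseArgs is a compile-time proof that S and ParseArgs are the same type; if the compiler can’t produce one, the call doesn’t compile.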

// Application must be in ParseArgs state to call this method
def parseArgs(args: Seq[String])(implicit ev: S =:= ParseArgs): Application[RunCommand]

// Application must be in RunCommand state to call this method
def runCommand(implicit ev: S =:= RunCommand): Application[PostCommand]

This tells the compiler that parseArgs can only be called on Applications in the ParseArgs state and runCommand can only be called on Applications in the RunCommand state.

Trying to call runCommand before parseArgs would yield the following error:

> val app = Application[Application.State.ParseArgs]()
> app.runCommand

Cannot prove that application.Application.State.ParseArgs =:= application.Application.State.RunCommand.

While this example is almost trivial, enforcing an ordering on method calls could add a great deal of safety for a complex application with many states.
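To see all the pieces work together, here is a small self-contained version. The state traits and method signatures follow the ones above, but the fields (command, output), the start constructor, and the method bodies are hypothetical placeholders for vocab’s real parsing and command logic.

```scala
object Application {
  sealed trait State
  object State {
    sealed trait ParseArgs extends State
    sealed trait RunCommand extends State
    sealed trait PostCommand extends State
  }

  // Hypothetical entry point: a fresh application always starts in ParseArgs.
  def start: Application[State.ParseArgs] = Application[State.ParseArgs](Nil, None)
}

// S is a phantom type: no field has type S, it only records the current state.
// `command` and `output` are hypothetical fields for this sketch.
case class Application[S <: Application.State](command: List[String], output: Option[String]) {
  import Application.State._

  // Only callable when S is ParseArgs; moves the application to RunCommand.
  def parseArgs(args: Seq[String])(implicit ev: S =:= ParseArgs): Application[RunCommand] =
    Application[RunCommand](args.toList, None)

  // Only callable when S is RunCommand; a placeholder stands in for real work.
  def runCommand(implicit ev: S =:= RunCommand): Application[PostCommand] =
    Application[PostCommand](command, Some(s"ran: ${command.mkString(" ")}"))
}

// Application.start.runCommand  // does not compile: no ParseArgs =:= RunCommand evidence
```

Calling the two methods in order type-checks, while swapping them is rejected with a "Cannot prove that ..." error like the one above. The check happens entirely in the type checker.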

Reflection

The project’s first release involved writing 6,332 lines of code (LOC) and removing 4,035 LOC. Roughly 2 lines of code were removed for every 3 written! I attribute this to unnecessary complexity and over-engineering early on that I eventually refactored away. I think I started planning and generalizing the application for use cases that were never going to exist. Once I fleshed out a reasonable feature set and stuck to implementing only that feature set, I stopped over-engineering and over-generalizing.

Functional handling of side effects

Cats Effect and Scalaz offer datatypes for handling side effects, and I’m aware there is a great benefit to using IO monads for a program’s input/output. This is something I hope to understand better in the future, but I prioritized launching the project over learning how to work with side-effect monads because I was eager to start using vocab myself!
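The core idea behind an IO type can be sketched in a few lines: wrap a side effect in a thunk so that describing a program and running it become separate steps. This toy IO is only an illustration of that idea; it is not the API of Cats Effect or Scalaz, whose IO types are far more complete (error handling, stack safety, concurrency).

```scala
// Toy IO for illustration only: a side effect suspended in a thunk.
// Building an IO value performs nothing; unsafeRun() performs the effects.
final class IO[A](val unsafeRun: () => A) {
  // Transform the eventual result without running anything yet.
  def map[B](f: A => B): IO[B] = new IO(() => f(unsafeRun()))

  // Sequence two effects: run this one, then the one computed from its result.
  def flatMap[B](f: A => IO[B]): IO[B] = new IO(() => f(unsafeRun()).unsafeRun())
}

object IO {
  // `a` is by-name, so the effect is captured rather than executed here.
  def apply[A](a: => A): IO[A] = new IO(() => a)
}
```

Because nothing runs until unsafeRun is called, a program built from IO values can be passed around, combined with for-comprehensions, and tested like any other value.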

Manually reading/writing CSV files

CSV reading and writing is a solved problem, and there are many libraries available. I wrote my own as an exercise in understanding the type class pattern, which I’m happy to say I’m now comfortable with.
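One detail that makes CSV harder than it looks is quoting, since a word’s definition can easily contain a comma. Under RFC 4180, a field containing a comma, a double quote, or a line break is wrapped in double quotes, and any embedded double quotes are doubled. The sketch below is a hypothetical Csv helper, not vocab’s implementation; its parser handles a single line and assumes no embedded line breaks.

```scala
// Hypothetical helper illustrating RFC 4180-style quoting.
object Csv {
  // Quote a field only when needed, doubling any embedded double quotes.
  def escapeField(field: String): String =
    if (field.exists(c => c == ',' || c == '"' || c == '\n' || c == '\r'))
      "\"" + field.replace("\"", "\"\"") + "\""
    else field

  def toRow(fields: Seq[String]): String = fields.map(escapeField).mkString(",")

  // Split one line back into fields, honoring quotes.
  // Assumes no embedded line breaks; a full reader must handle those too.
  def parseRow(line: String): List[String] = {
    val fields = List.newBuilder[String]
    val current = new StringBuilder
    var inQuotes = false
    var i = 0
    while (i < line.length) {
      val c = line.charAt(i)
      if (inQuotes) {
        if (c == '"' && i + 1 < line.length && line.charAt(i + 1) == '"') {
          current += '"' // a doubled quote is a literal quote
          i += 1
        } else if (c == '"') inQuotes = false // closing quote
        else current += c
      } else {
        if (c == ',') { fields += current.toString; current.clear() }
        else if (c == '"') inQuotes = true // opening quote
        else current += c
      }
      i += 1
    }
    fields += current.toString
    fields.result()
  }
}
```

For example, toRow(Seq("ameliorate", "to make better, or improve")) produces ameliorate,"to make better, or improve", and parseRow turns that line back into the original two fields.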

Manually Parsing Command Line Arguments

vocab was simple enough for this to be tenable, but I really should invest time in learning a solid argument parsing framework. I recently discovered scopt and it looks promising. I will definitely use a third-party argument parsing framework next time I build a command line utility.

Written on April 9, 2020