Announcing the launch of vocab!
I am pleased to announce that my command-line utility
vocab is available on GitHub and ready
for use by the community!
vocab is a tool to help you expand your vocabulary, designed with simplicity in
mind.
Motivation
As an avid reader of nonfiction, I often come across words I don’t know. Sometimes I can infer the meaning of an unknown word from its context, and sometimes I can’t. When I can’t, I’m forced to pause my reading and look it up. Sometimes the word comes up a second time and I’ve already forgotten its meaning! I want to remember those words better so I can expand my vocabulary and avoid interrupting my reading flow.
As a developer, I spend a lot of my computer time in the terminal, so I created
vocab as a flashcard-like vocabulary tool for the terminal. The goal of vocab is
to help the user remember the words they want to remember. Implementation
details below!
Implementation Details
This section describes the inner workings of vocab, such as data
persistence, how it selects words for a practice session, and the functional
programming techniques it uses. If you haven’t read vocab’s
README,
please do that before reading this section.
Data Persistence
Word and practice session data are stored in CSV files on your machine. The path to these files is OS-dependent, so I used the directories-jvm project to abstract that detail away from my program.
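To make the storage idea concrete, here is a minimal sketch of appending rows to a CSV file and reading them back, using only the JDK's `java.nio.file` API. The `CsvStore` object and its method names are hypothetical and are not taken from vocab. The sketch also takes the file path from the caller instead of resolving it with directories-jvm, so it shows the general shape, not vocab's actual code.

```scala
import java.nio.charset.StandardCharsets
import java.nio.file.{Files, Path, StandardOpenOption}
import scala.jdk.CollectionConverters._ // requires Scala 2.13 or later

// Hypothetical helper, not vocab's real storage layer.
object CsvStore {
  // Append one already-formatted CSV row, creating the file
  // (and any missing parent directories) on first use.
  def appendLine(file: Path, line: String): Unit = {
    Option(file.getParent).foreach(dir => Files.createDirectories(dir))
    Files.write(
      file,
      (line + System.lineSeparator()).getBytes(StandardCharsets.UTF_8),
      StandardOpenOption.CREATE,
      StandardOpenOption.APPEND
    )
  }

  // Read every row back; a file that does not exist yet has no rows.
  def readLines(file: Path): List[String] =
    if (Files.exists(file))
      Files.readAllLines(file, StandardCharsets.UTF_8).asScala.toList
    else Nil
}
```

Opening the file with `CREATE` and `APPEND` means the first write and every later write go through the same code path, which keeps the caller free of "does the file exist yet?" checks.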
Ad-hoc polymorphism for converting a data object to its CSV representation
Ad-hoc polymorphism is a technique for adding functionality to types on the fly
as an alternative to subclassing. In Scala, the type class pattern is one
technique to achieve ad-hoc polymorphism. Let’s see how vocab applies this idea
using implicit wrapper classes.
There are two main case classes in vocab representing the data model:
case class Word(
  word: String,
  definition: String,
  partOfSpeech: Option[SpeechPart],
  numTimesPracticed: Int
)
case class PracticeSession(
  sessionType: PracticeSessionType,
  numWords: Int,
  duration: Int,
  timestamp: Int,
  didFinish: Boolean
)
Before writing Word and PracticeSession objects to the storage files, they
need to be converted to comma-separated values. Using Scala’s implicits, we can
wrap Word and PracticeSession objects in objects with types that provide
this behavior:
sealed trait ToCSVRepr {
  // A comma-separated value representation of the implementing class
  def toCSVRepr: String
}
// A wrapper class for converting a word to its CSV representation
implicit class WordToCSVRepr(word: Word) extends ToCSVRepr {
  def toCSVRepr: String = ???
}
// A wrapper class for converting a practice session to its CSV representation
implicit class PracticeSessionToCSVRepr(practiceSession: PracticeSession) extends ToCSVRepr {
  def toCSVRepr: String = ???
}
We’ve defined two implicit classes, one to wrap Words and one to wrap
PracticeSessions, and now we can call toCSVRepr directly on Word and
PracticeSession objects, even though toCSVRepr isn’t defined in those
classes!
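For readers who want to run the idea end to end, here is a minimal sketch with the `???` body filled in for Word. Several parts are assumptions and not vocab's actual code: the `SpeechPart` cases, the `CSVSyntax` wrapper object, the field order, and the quoting rule in `escape`. PracticeSession would follow the same shape.

```scala
// Hypothetical stand-in: the post references SpeechPart but does not show it.
sealed trait SpeechPart
object SpeechPart {
  case object Noun extends SpeechPart
  case object Adjective extends SpeechPart
}

case class Word(
  word: String,
  definition: String,
  partOfSpeech: Option[SpeechPart],
  numTimesPracticed: Int
)

// Implicit classes must live inside an object, class, or trait.
object CSVSyntax {
  sealed trait ToCSVRepr {
    // A comma-separated value representation of the implementing class
    def toCSVRepr: String
  }

  // Assumed quoting rule: wrap a field in double quotes (doubling any
  // embedded quotes) when it contains a comma, a quote, or a line break.
  private def escape(field: String): String =
    if (field.exists(c => c == ',' || c == '"' || c == '\n' || c == '\r'))
      "\"" + field.replace("\"", "\"\"") + "\""
    else field

  implicit class WordToCSVRepr(word: Word) extends ToCSVRepr {
    def toCSVRepr: String =
      Seq(
        word.word,
        word.definition,
        word.partOfSpeech.map(_.toString).getOrElse(""), // empty when absent
        word.numTimesPracticed.toString
      ).map(escape).mkString(",")
  }
}
```

With `import CSVSyntax._` in scope, `Word("terse", "brief, concise", Some(SpeechPart.Adjective), 0).toCSVRepr` evaluates to `terse,"brief, concise",Adjective,0`. The definition is quoted because it contains a comma.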
Using Phantom Types To Make The Application Class Safer
Phantom types are useful for enforcing an ordering in a workflow. We can tell the compiler to only allow certain actions to occur in a specific order.
There are two distinct steps in running the vocab application:
- Interpreting the command line arguments
- Running the provided command
The steps are encapsulated in functions called on an instance of the
Application class.
// Parses the command line arguments
def parseArgs(...) = ???
// Runs the command generated from command line parsing
def runCommand(...) = ???
It obviously doesn’t make sense to call runCommand before calling parseArgs.
We can tell the compiler to make an invalid ordering like this impossible
using phantom types.
Let’s define a trait for each state of the Application: parsing the arguments, running the command, and being done:
object Application {
  sealed trait State
  object State {
    sealed trait ParseArgs extends State
    sealed trait RunCommand extends State
    sealed trait PostCommand extends State
  }
}
Now let’s make the Application class generic and impose the restriction that
the type parameter must be a subtype of Application.State:
case class Application[S <: Application.State](...) { ... }
Using implicit evidence parameters, we tell the compiler that each
method defined on Application can only be called when S is exactly one particular state type.
// Application must be in ParseArgs state to call this method
def parseArgs(args: Seq[String])(implicit ev: S =:= ParseArgs): Application[RunCommand]
// Application must be in RunCommand state to call this method
def runCommand(implicit ev: S =:= RunCommand): Application[PostCommand]
This tells the compiler that parseArgs can only be called on Applications in
the ParseArgs state and runCommand can only be called on Applications in the
RunCommand state.
Trying to call runCommand before parseArgs yields the following
compile-time error:
> val app = Application[Application.State.ParseArgs]()
> app.runCommand
Cannot prove that application.Application.State.ParseArgs =:= application.Application.State.RunCommand.
While this example is almost trivial, enforcing an ordering on method calls could add a great deal of safety for a complex application with many states.
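Putting the pieces together, here is a self-contained sketch of the phantom-type workflow that compiles and runs. The state traits and method signatures follow the snippets above, but the `command` and `output` fields, the `initial` constructor, and the method bodies are hypothetical placeholders. The post elides the real ones.

```scala
// `command` and `output` are hypothetical fields; the post elides the real ones.
final case class Application[S <: Application.State](
  command: Option[String],
  output: Option[String]
) {
  import Application.State._

  // Application must be in the ParseArgs state to call this method.
  def parseArgs(args: Seq[String])(implicit ev: S =:= ParseArgs): Application[RunCommand] =
    Application[RunCommand](args.headOption, None)

  // Application must be in the RunCommand state to call this method.
  def runCommand(implicit ev: S =:= RunCommand): Application[PostCommand] = {
    val name = command.getOrElse("help")
    Application[PostCommand](command, Some(s"ran $name"))
  }
}

object Application {
  sealed trait State
  object State {
    sealed trait ParseArgs extends State
    sealed trait RunCommand extends State
    sealed trait PostCommand extends State
  }

  // Hypothetical entry point: every application starts in the ParseArgs state.
  def initial: Application[State.ParseArgs] =
    Application[State.ParseArgs](None, None)
}
```

`Application.initial.parseArgs(Seq("practice")).runCommand` type-checks because each call returns an Application in the state the next call requires. `Application.initial.runCommand` is rejected at compile time, since no evidence exists that ParseArgs and RunCommand are the same type. The state traits are never instantiated; they exist only at the type level, which is why they are called phantom types.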
Reflection
The project’s first release has 6,332 lines of code (LOC) written and 4,035 LOC removed. Roughly 2 lines of code were removed for every 3 written! I attribute this to a lot of unnecessary complexity and over-engineering in the beginning that I eventually removed or refactored. I think I started planning and generalizing the application for use cases that were never going to exist. Once I fleshed out a reasonable feature set and stuck to implementing only that feature set, I was no longer over-engineering or over-generalizing.
Functional handling of side effects
Cats and
Scalaz offer data types for handling side
effects, and I’m aware there is a great benefit to using IO monads for your
program’s input/output. This is something I hope to understand better in the
future, but I prioritized launching the project over learning how to work with
side-effect monads because I was eager to start using vocab myself!
Manually reading/writing CSV files
CSV reading and writing is a solved problem and there are many libraries available. I wrote my own as an exercise in understanding the type class pattern, which I’m happy to say I’m now comfortable with.
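For comparison with the implicit wrapper classes shown earlier, here is a minimal sketch of the same CSV conversion written as a type class with implicit instances. None of these names (`CSVEncoder`, `Entry`, `CSV`) come from vocab; they are assumptions made for illustration, and field escaping is left out to keep the sketch short.

```scala
// The type class: evidence that a value of type A can be written as CSV.
trait CSVEncoder[A] {
  def encode(value: A): String
}

object CSVEncoder {
  // Summon the instance for A from implicit scope.
  def apply[A](implicit enc: CSVEncoder[A]): CSVEncoder[A] = enc

  // Build an instance from a plain function.
  def instance[A](f: A => String): CSVEncoder[A] = new CSVEncoder[A] {
    def encode(value: A): String = f(value)
  }

  implicit val stringEncoder: CSVEncoder[String] = instance[String](s => s)
  implicit val intEncoder: CSVEncoder[Int] = instance[Int](_.toString)

  // An Option encodes as its content, or as an empty field when absent.
  implicit def optionEncoder[A](implicit enc: CSVEncoder[A]): CSVEncoder[Option[A]] =
    instance[Option[A]](_.map(enc.encode).getOrElse(""))
}

// Hypothetical record type used only for this sketch.
final case class Entry(word: String, timesPracticed: Int, note: Option[String])

object Entry {
  // A row encoder assembled from the field encoders above.
  implicit val entryEncoder: CSVEncoder[Entry] = CSVEncoder.instance[Entry] { e =>
    Seq(
      CSVEncoder[String].encode(e.word),
      CSVEncoder[Int].encode(e.timesPracticed),
      CSVEncoder[Option[String]].encode(e.note)
    ).mkString(",")
  }
}

object CSV {
  // Works for any A that has a CSVEncoder instance in scope.
  def line[A](value: A)(implicit enc: CSVEncoder[A]): String = enc.encode(value)

  def lines[A: CSVEncoder](values: Seq[A]): String =
    values.map(v => line(v)).mkString("\n")
}
```

Here `CSV.line(Entry("terse", 3, None))` yields `terse,3,`. The generic `CSV.lines` function never needs to know about Entry: supporting a new type only requires a new implicit instance, with no changes to the type itself or to the writer.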
Manually Parsing Command Line Arguments
vocab was a simple enough program for this to be tenable, but I really should
invest time in learning how to use a solid argument-parsing framework. I
recently discovered scopt and it looks
promising. I will definitely use a third-party argument-parsing framework next
time I build a command-line utility.