Mangling Swift Source Code With SwiftSyntax
Introduction
At work I recently added SwiftLint to one of our iOS apps. I highly recommend it to any iOS devs out there with no linting on their projects!
In order for SwiftLint to work its magic it has to be able to inspect and modify Swift source code. The authors of SwiftLint opted for SourceKit, a “framework for supporting IDE features like indexing, syntax-coloring, code-completion, etc.” Working with SwiftLint piqued my curiosity about other frameworks that facilitate the handling of Swift source code. My googling led me to SwiftSyntax, a set of Swift bindings for libSyntax.
I started tinkering with SwiftSyntax just for fun and built some linter-inspired source code rewriters. Here is a short tutorial by example on getting started with the framework. This tutorial assumes you are familiar with Swift and have an elementary understanding of programming language parsing.
Building a Syntax tree from a source file
Each element in the source code’s abstract syntax tree (AST) is represented as a struct that adopts the SyntaxProtocol protocol and can be converted to the type-erased Syntax struct (Swift structs cannot inherit from one another). This protocol defines common attributes of nodes in the AST such as child nodes, parent nodes, and more. You can generate a syntax node representing the entire source file using the SyntaxParser. In the release I used, this is named SyntaxTreeParser.
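import Foundation // provides URL
import SwiftSyntax

// Parse the file into a tree of syntax nodes.
// try! crashes if parsing fails, which is acceptable for a quick experiment.
let url = URL(fileURLWithPath: "HelloWorld.swift")
let sourceFile = try! SyntaxTreeParser.parse(url)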
Voila! We now have a root syntax node for the file at url.
Traversing the Syntax Tree
Traversing the tree generated in the previous section is made simple by the SyntaxRewriter class. If you take a look at its implementation, there is a corresponding visit(_:) method for every type of node that can appear in a syntax tree. The default implementation simply visits the node’s children recursively. We can override visit(_:) for each type of node we’re interested in.
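To make the recursive default concrete, here is a toy model of the pattern. Node, ToyRewriter and Shouter are hypothetical stand-ins invented for this illustration. They are not SwiftSyntax types, and the real SyntaxRewriter has one visit(_:) overload per node type rather than the two methods sketched here.

```swift
// Toy stand-ins for illustration only; none of these are SwiftSyntax types.
enum Node: Equatable {
    case token(String)   // a leaf, like a token in the source
    case list([Node])    // an interior node with children
}

class ToyRewriter {
    // Default behaviour: rebuild the node from its rewritten children.
    func visit(_ node: Node) -> Node {
        switch node {
        case .token(let text):
            return visitToken(text)
        case .list(let children):
            return .list(children.map { visit($0) })
        }
    }

    // Default behaviour for leaves: return the token unchanged.
    func visitToken(_ text: String) -> Node {
        return .token(text)
    }
}

// A subclass overrides only the hook it cares about.
final class Shouter: ToyRewriter {
    override func visitToken(_ text: String) -> Node {
        return .token(text.uppercased())
    }
}

let tree = Node.list([.token("let"), .list([.token("x")])])
let rewritten = Shouter().visit(tree)
```

The base class does all the walking, so the subclass never has to mention children. That is the same division of labour the rewriters in this post rely on.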
Example 1 - Rewriting Integer Literals
In this example we’re going to clean up the integer literals in our code base. Longer integer literals that aren’t separated by underscores are hard to read. For example, it’s much easier to discern the value of x written this way:
let x = 1_000_000_000
than it is written this way:
let x = 1000000000
We aim to group the digits of large integer literals into threes, making them easier to read.
A visit to this nice AST explorer tells us that 1000000000 corresponds to an IntegerLiteralExpr. After some digging in the SwiftSyntax source code I discovered that what we’re looking for is a TokenSyntax whose “kind” is an integer literal, similar to the example in SwiftSyntax’s README.
Let’s create a skeleton of our integer literal rewriter class, overriding visit(_:) for TokenSyntax nodes.
final class IntegerLiteralRewriter: SyntaxRewriter {
    override func visit(_ token: TokenSyntax) -> Syntax {
        return super.visit(token)
    }
}
The next step is to implement visit(_:).
Filtering for Integer Literal Nodes
Let’s return early if the TokenSyntax kind is not what we’re looking for:
guard case .integerLiteral(let digits) = token.tokenKind else {
    return super.visit(token)
}
A full list of kinds for TokenSyntax nodes is available in the TokenKind enum.
Reformatting the digits
Here is my implementation for reformatting the digits. Feel free to write your own as an exercise!
There are two steps:
- Remove any existing underscores - this is in case underscores have been used to format the integer literal in a way that is different from our desired format. For example, we don’t want to deal with a literal like 100_0_0_0.
- Add in the underscores.
// Remove existing underscores. `digits` is the string bound by the guard above.
let integerTextWithoutUnderscores = String(digits.filter {
    ("0"..."9").contains($0) })
// Starting from the least significant digit, we will add an underscore
// every three digits
var integerTextWithUnderscores = ""
for (i, c) in integerTextWithoutUnderscores.reversed().enumerated() {
    if i % 3 == 0 && i != 0 { // don't add an underscore to the beginning!
        integerTextWithUnderscores.append("_")
    }
    integerTextWithUnderscores.append(c)
}
integerTextWithUnderscores = String(integerTextWithUnderscores.reversed())
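If you would like to try the digit grouping without SwiftSyntax in the loop, the two steps can be sketched as a standalone function. The name groupedInThrees is made up for this sketch, and the early return for 0x, 0o and 0b literals is my own assumption rather than part of the rewriter above: filtering down to the characters 0 through 9 would otherwise strip the radix prefix and any hex letters.

```swift
// A minimal sketch; `groupedInThrees` is a hypothetical helper name.
func groupedInThrees(_ literal: String) -> String {
    // Assumption (not in the original rewriter): leave hexadecimal, octal
    // and binary literals untouched, since the digit filter below would
    // throw away their prefix.
    if literal.hasPrefix("0x") || literal.hasPrefix("0o") || literal.hasPrefix("0b") {
        return literal
    }

    // Step 1: drop everything that is not a decimal digit,
    // including any existing underscores.
    let digits = literal.filter { ("0"..."9").contains($0) }

    // Step 2: walk from the least significant digit and insert an
    // underscore before every group of three (except the first).
    var grouped = ""
    for (index, digit) in digits.reversed().enumerated() {
        if index > 0 && index % 3 == 0 {
            grouped.append("_")
        }
        grouped.append(digit)
    }
    return String(grouped.reversed())
}

let tidy = groupedInThrees("100_0_0_0")
let billion = groupedInThrees("1000000000")
```

Keeping the string work in a plain function like this also makes it easy to unit test separately from the tree traversal.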
Returning A New TokenSyntax Node
All syntax nodes are structs whose members cannot be modified, so we need to return a copy of the original node with the updated integer literal. We can use one of the node’s with APIs for this, in our case withKind(_:).
// Return the same integer literal token, but with the underscores
let newToken = token.withKind(.integerLiteral(integerTextWithUnderscores))
return super.visit(newToken)
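The copy-and-replace idea behind the with APIs can be sketched with a toy struct. ToyToken and ToyTokenKind are hypothetical types invented for this illustration, with a single extra field standing in for everything else a real token carries. They only mimic the shape of the pattern and say nothing about how SwiftSyntax implements it.

```swift
// Toy types for illustration only; not SwiftSyntax's TokenSyntax or TokenKind.
enum ToyTokenKind: Equatable {
    case integerLiteral(String)
    case identifier(String)
}

struct ToyToken: Equatable {
    let kind: ToyTokenKind
    let leadingTrivia: String   // stands in for whitespace and comments

    // Return a copy that differs only in its kind; `self` is left untouched.
    func withKind(_ newKind: ToyTokenKind) -> ToyToken {
        return ToyToken(kind: newKind, leadingTrivia: leadingTrivia)
    }
}

let original = ToyToken(kind: .integerLiteral("1000000"), leadingTrivia: " ")
let updated = original.withKind(.integerLiteral("1_000_000"))
```

Because every property is a let, the only way to "change" a token is to build a new one, and the with method makes sure the untouched fields come along for the ride.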
And there we have it! Let’s see how it behaves on this test case:
Test Input
let x = 100000000
let y = 1198756
let z = 987654321
Test Output
let x = 100_000_000
let y = 1_198_756
let z = 987_654_321
Example 2 - Converting Snake Case Declarations To Camel Case
As Swift programmers we detest snake case. It is anathema to writing beautiful, swifty code ;). We’re going to write a class to convert any snake case declarations to camel case. Again, I used the AST explorer to determine what types of nodes we need to visit.
Here are the two functions we need to override:
open func visit(_ node: IdentifierExprSyntax) -> ExprSyntax
open func visit(_ node: FunctionParameterSyntax) -> Syntax
The first covers expressions like
let big_snake = small_snake + medium_snake
while the second covers function parameters like
func eatRats(with_snake snake: Snake, some_rats: [Rat]) {}
Transforming a snake case string to a camel case string
Before we deal with the syntax nodes we need a method for doing the conversion. As an exercise feel free to write your own implementation :). Here is mine:
/// Removes all underscores from the identifier and capitalizes characters
/// following underscores. Assumes `identifier` is a valid identifier. Ignores
/// leading underscores.
private func convertToCamelCase(_ identifier: String) -> String {
    let identifier = Array(identifier)
    var newIdentifier = ""
    var i = 0
    var hasSeenNonUnderscoreCharacter = false
    while i < identifier.count {
        if identifier[i] != "_" {
            hasSeenNonUnderscoreCharacter = true
        }
        if identifier[i] == "_" && !hasSeenNonUnderscoreCharacter {
            newIdentifier.append("_")
            i += 1
        } else if identifier[i] == "_" &&
            i+1 < identifier.count &&
            ("a"..."z").contains(identifier[i+1]) {
            newIdentifier.append(identifier[i+1].uppercased())
            i += 2
        } else if identifier[i] != "_" {
            newIdentifier.append(identifier[i])
            i += 1
        } else {
            i += 1
        }
    }
    return newIdentifier
}
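If you take up the exercise, here is one alternative sketch to compare against. It uses a flag instead of index arithmetic but is meant to follow the same rules as my version: leading underscores are kept, an underscore followed by a lowercase ASCII letter capitalizes that letter, and every other underscore is dropped. The name camelCased is made up for this sketch.

```swift
// An alternative sketch of the same conversion; `camelCased` is a
// hypothetical name, not part of the post's source code.
func camelCased(_ identifier: String) -> String {
    // Keep leading underscores exactly as written.
    let leading = identifier.prefix(while: { $0 == "_" })
    var result = String(leading)
    var capitalizeNext = false

    for character in identifier.dropFirst(leading.count) {
        if character == "_" {
            // Drop the underscore and remember to capitalize what follows.
            capitalizeNext = true
        } else if capitalizeNext && ("a"..."z").contains(character) {
            result.append(character.uppercased())
            capitalizeNext = false
        } else {
            // Digits, uppercase letters and ordinary characters pass through.
            result.append(character)
            capitalizeNext = false
        }
    }
    return result
}

let plain = camelCased("scary_cobra")
let withLeadingUnderscores = camelCased("__big_snake")
```

A few edge cases are worth checking against whichever implementation you end up with: a digit after an underscore, a trailing underscore, and an identifier that is already camel case.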
Visiting IdentifierExprSyntax Nodes
This is the easy one. It turns out an IdentifierExprSyntax node’s identifier is a TokenSyntax object, so the implementation is very similar to our IntegerLiteralRewriter.
override func visit(_ node: IdentifierExprSyntax) -> ExprSyntax {
    guard case .identifier(let identifier) = node.identifier.tokenKind else { return node }
    if isSnakeCase(identifier) {
        let newIdentifier = convertToCamelCase(identifier)
        let newToken = node.identifier.withKind(.identifier(newIdentifier))
        let newNode = node.withIdentifier(newToken)
        return super.visit(newNode)
    }
    return super.visit(node)
}
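The isSnakeCase helper used above is not shown in this post. One possible sketch, under my own assumption about what should count as snake case, is to ignore leading underscores and then look for an underscore that is immediately followed by a lowercase ASCII letter, which is exactly the situation convertToCamelCase rewrites. The version in the accompanying package may differ.

```swift
// A hypothetical implementation; the definition of "snake case" used here
// is an assumption made for this sketch.
func isSnakeCase(_ identifier: String) -> Bool {
    // Leading underscores (as in `_private`) do not make a name snake case.
    let body = identifier.drop(while: { $0 == "_" })

    // Pair every character with its successor and look for "_" + lowercase.
    return zip(body, body.dropFirst()).contains { pair in
        pair.0 == "_" && ("a"..."z").contains(pair.1)
    }
}

let snake = isSnakeCase("scary_cobra")
let camel = isSnakeCase("scaryCobra")
```

With this definition a name such as trailing_ or _private is left alone, which keeps the rewriter from touching identifiers it would not meaningfully change.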
Visiting FunctionParameterSyntax Nodes
This one is a bit more complicated. We have two cases to deal with:
- The function parameter has only a local name - this is the only option in most programming languages: e.g. func eatRats(snake: Snake)
- The function parameter has an external and local name. If you write Swift code you have seen this before: e.g. func eatRats(withSnake snake: Snake)
FunctionParameterSyntax nodes have firstName and secondName properties. They are both TokenSyntax objects. Interestingly, depending on the case, the property that holds the local parameter name is different. When the parameter only has a local name it’s stored in the firstName property of the node. When the parameter has both the external and local name the external name is stored in the firstName property and the local name is stored in the secondName property.
We’ll handle each case separately.
override func visit(_ node: FunctionParameterSyntax) -> Syntax {
    // If both firstName and secondName are non nil then it's a function
    // parameter name like foo(withX x: ...) and we want to modify the
    // secondName. If only firstName is non nil then it's a function parameter
    // like foo(x: ...) and we want to modify the first name.
    if let _ = node.firstName,
        let secondNameToken = node.secondName,
        case .identifier(let identifier) = secondNameToken.tokenKind,
        isSnakeCase(identifier) {
        let newIdentifier = convertToCamelCase(identifier)
        let newSecondName = secondNameToken.withKind(.identifier(newIdentifier))
        let newNode = node.withSecondName(newSecondName)
        return super.visit(newNode)
    } else if let firstNameToken = node.firstName,
        case .identifier(let identifier) = firstNameToken.tokenKind,
        isSnakeCase(identifier) {
        let newIdentifier = convertToCamelCase(identifier)
        let newFirstName = firstNameToken.withKind(.identifier(newIdentifier))
        let newNode = node.withFirstName(newFirstName)
        return super.visit(newNode)
    }
    // Nothing to rename: no candidate name was snake case.
    return super.visit(node)
}
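The rule for locating the local name can be isolated in a small toy model. ToyParameter, NameSlot and localNameSlot(of:) are hypothetical names invented for this sketch, with plain optional strings standing in for the TokenSyntax properties.

```swift
// Toy model for illustration only; not SwiftSyntax's FunctionParameterSyntax.
struct ToyParameter {
    let firstName: String?
    let secondName: String?
}

enum NameSlot {
    case first, second
}

// Which property holds the local parameter name?
func localNameSlot(of parameter: ToyParameter) -> NameSlot? {
    if parameter.firstName != nil && parameter.secondName != nil {
        return .second   // foo(withX x: ...): external name first, local second
    }
    if parameter.firstName != nil {
        return .first    // foo(x: ...): the single name lives in firstName
    }
    return nil           // no name information at all
}

let labelled = ToyParameter(firstName: "withSnake", secondName: "snake")
let unlabelled = ToyParameter(firstName: "snake", secondName: nil)
```

One consequence of the rewriter as written above is that when both names are present only the local name is a candidate for renaming, so an external name such as with_snake is left untouched.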
And there we have it! Let’s try a test case:
Test Input
import scary_snek
let scary_anaconda = 5
var scary_cobra = 10
let big_python = scary_cobra * scary_anaconda
Test Output
import scary_snek
let scaryAnaconda = 5
var scaryCobra = 10
let bigPython = scaryCobra * scaryAnaconda
Notice how it didn’t modify the import. That’s as expected because we didn’t override the visit(_:) function for imports!
The Meta Testcase
For a more thorough test I rewrote SnakeCaseRewriter using snake case declarations! After running the snake case version of SnakeCaseRewriter through SnakeCaseRewriter, the output was the original SnakeCaseRewriter source, as we would expect!
Conclusion
SwiftSyntax provides a great API for modifying Swift source code! All of my examples were linter-inspired, but if you come up with a creative use for SwiftSyntax let me know via email! :)
Source Code
All the source code and test cases for this post are available as a Swift Package here.