Elixir might not seem the obvious first choice for a language to build a compiler in but it might be a better fit than you think. Here are some of the techniques I used developing a compiler in Elixir.

Compiler?

This compiler takes as input

These both need to be scanned (lexical analysis), symbol tables built and the code generated to implement an arrangement based on the supplied program. This runs as a component of a web service.

Elixir

Elixir is a modern language, strong on textual processing with comprehensive regular expression and string manipulation libraries. It has fundamental support for textual lists (implemented as linked lists) and maps. Elixir maps are a data structure where a key relates to a value. A key does not have to be a scalar type. It can be a string for instance. In this way, they are similar to a hash in perl or an associative array in awk.

Why Elixir?

Elixir is built on the Erlang VM. It is a purely functional language and integrates with Erlang. The Erlang Open Telecom Platform (OTP) is a good infrastructure for process supervision and resilience generally. Importantly, programs which are written in Elixir, being functions, scale well. It was a design goal that the underlying web service scales well. Phoenix is a web server written in Elixir. Phoenix / Elixir was chosen as the web-service framework, with the service implemented in Elixir.

Lexical Analysis

No Lex and Yacc style tools were used. Scanners are implemented using recursive decent and regular expressions.

Syntax Analysis

A recursive descent approach was taken for the syntax analysis with progressive decomposition by calling functions. At each level, lexicons were generated with regular expressions, the syntax analysed and then further decomposed by calling appropriate functions to continue the lexical and syntactic analysis.

Example - Arrangement Parser

Here is an example of the lexical and syntactic approach with the parsing of an arrangement implemented as fileName (fName).

This function determines the syntactical structure and then calls the appropriate next-level function based on the match. There are four possible matches. All have the "N" part with both the "I" and "E" parts being optional. These are for introduction and ending sctions. The next level parsing function is passed the entire string. Often though, next level functions are passed components derived from the higher level. This function has been designed for readability and verification, not performance. The performance of the algorithm can be improved by not performing any further Regex.scan()'s once one is successful. As it stands, I find it easier to read.

Here is a fragment of one of the next-level functions which continues the parsing further.

The Regex.scan()'s regular expression used may return 4, 5, or 6 parts depending on the supplied string. These are extracted and then the logic continues to parse those components as required. This is a classical recursive decent approach. The final line of the function returns the parsed and converted string. The function of this library is to parse names converting them into an intermediary code that is more convenient for the code generation phase. Effectively, it expands macros and converts everything into a comma-separated sequence of musical-section-identifiers. These indicates a section's

Here's an example of some basic intermediary code,

KF,6A,6A,8B,8A,KM,6A,6A,8B,8A,KL,6A,6A,8B,8A,KE,2A

Symbol Table Management in Elixir

To generate code a symbol table needs to be maintained that holds the meta-data for the source audio file. This way values can be directly looked up in the table instead of repeatedly scanning the meta-data text file.

How to maintain a symbol table in Elixir?

There are a number of approaches to maintaining data in Elixir. One is Ecto, an Elixir database. However, we have no strict requirement for the data to persists beyond the execution of this compilation and also no requirement for concurrent access. A classic functional programming way to maintain a data structure with functional languages is to both pass the structure to a function and also to return the same structure from that function. This way the function can return any changes made to the data [Strictly, there are no variables in purely functional languages, so no data changes, rather new data is created based on preceding data. Being so used to the imperative world, we might refer to it as if it does though]. Below is an example with the highest level code generation function, generateSections().

generateSections()

The intermediary code has already been generated and placed in the Elixir map, configMap at configMap.sectionsList. generateSections() calls genSection() with an Elixir list for the intermediary code. genSection() generates a single section, depositing the generated code into an element of the ConfigMap and returns the updated configMap.

genSection()

Here's the genSection() function. The 2nd parameter is the Elixir list of the intermediary code. It's a typical Elixir recursive function which processes the 'head' of the list and calls itself recursively on its tail. This function updates the configMap and returns it as it's value. It uses a Regex.match?() to determine the syntax of the intermediary code supplied, and then calls Regex.scan() to scan and extract what is required. There is some logic to process any holds or pushes within the section and then finally it calls the next-level function, buildSection(), to generate the code into configMap. This is typical functional programming that is common to languages such as lisp and schema.

Conclusion

I hope I have shown the feasibility of developing compiler services in Elixir and some of the techniques that can be used. To simplify the description, I have not described these additional aspects -

This compiler service has been in production for 6 months and proven stable and performant. New extensions have been added to the service and releasing new versions has been straightforward. I recommend considering Elixir and Phoenix for complex web services incorporating compilation features.

Software Engineering, Cyber Threat Hunting and Intelligence, systems resilience and reliability.

Get the Medium app

A button that says 'Download on the App Store', and if clicked it will lead you to the iOS App store
A button that says 'Get it on, Google Play', and if clicked it will lead you to the Google Play store