Elixir might not seem the obvious first choice for a language to build a compiler in but it might be a better fit than you think. Here are some of the techniques I used developing a compiler in Elixir.
This compiler takes as input
- The Program — A textual description of a musical arrangement
- Libraries — A textual meta-data definition of an audio file, containing long-form audio samples, detailing aspects such as tempo, musical meter, alignment marks, audio level calibrations, the degree of rhythmic swing, and the positions, categorisations and characteristics of the long-form musical phrases it contains.
Long-form musical phrases might be anything up to 8-bars of playing.
These both need to be scanned (lexical analysis), symbol tables built and the code generated to implement an arrangement based on the supplied program. This runs as a component of a web service.
Elixir is a modern language, strong on textual processing with comprehensive regular expression and string manipulation libraries. It has fundamental support for textual lists (implemented as linked lists) and maps. Elixir maps are a data structure where a key relates to a value. A key does not have to be a scalar type. It can be a string for instance. In this way, they are similar to a hash in perl or an associative array in awk.
Elixir is built on the Erlang VM. It is a purely functional language and integrates with Erlang. The Erlang Open Telecom Platform (OTP) is a good infrastructure for process supervision and resilience generally. Importantly, programs which are written in Elixir, being functions, scale well. It was a design goal that the underlying web service scales well. Phoenix is a web server written in Elixir. Phoenix / Elixir was chosen as the web-service framework, with the service implemented in Elixir.
No Lex and Yacc style tools were used. Scanners are implemented using recursive decent and regular expressions.
A recursive descent approach was taken for the syntax analysis with progressive decomposition by calling functions. At each level, lexicons were generated with regular expressions, the syntax analysed and then further decomposed by calling appropriate functions to continue the lexical and syntactic analysis.
Example - Arrangement Parser
Here is an example of the lexical and syntactic approach with the parsing of an arrangement implemented as fileName (fName).
This function determines the syntactical structure and then calls the appropriate next-level function based on the match. There are four possible matches. All have the "N" part with both the "I" and "E" parts being optional. These are for introduction and ending sctions. The next level parsing function is passed the entire string. Often though, next level functions are passed components derived from the higher level. This function has been designed for readability and verification, not performance. The performance of the algorithm can be improved by not performing any further Regex.scan()'s once one is successful. As it stands, I find it easier to read.
Here is a fragment of one of the next-level functions which continues the parsing further.
The Regex.scan()'s regular expression used may return 4, 5, or 6 parts depending on the supplied string. These are extracted and then the logic continues to parse those components as required. This is a classical recursive decent approach. The final line of the function returns the parsed and converted string. The function of this library is to parse names converting them into an intermediary code that is more convenient for the code generation phase. Effectively, it expands macros and converts everything into a comma-separated sequence of musical-section-identifiers. These indicates a section's
- Length in bars,
- Energy Type - e.g. whether or not it is a bridge,
- Arrangement Function - whether it is an introduction, chorus or ending,
- Chorus Type - whether it is a component of a first, middle or last chorus, and
- Rhythmic Adjustments - the position of any breaks (holds) or pushes to the rhythm.
Here's an example of some basic intermediary code,
Symbol Table Management in Elixir
To generate code a symbol table needs to be maintained that holds the meta-data for the source audio file. This way values can be directly looked up in the table instead of repeatedly scanning the meta-data text file.
How to maintain a symbol table in Elixir?
There are a number of approaches to maintaining data in Elixir. One is Ecto, an Elixir database. However, we have no strict requirement for the data to persists beyond the execution of this compilation and also no requirement for concurrent access. A classic functional programming way to maintain a data structure with functional languages is to both pass the structure to a function and also to return the same structure from that function. This way the function can return any changes made to the data [Strictly, there are no variables in purely functional languages, so no data changes, rather new data is created based on preceding data. Being so used to the imperative world, we might refer to it as if it does though]. Below is an example with the highest level code generation function, generateSections().
The intermediary code has already been generated and placed in the Elixir map, configMap at configMap.sectionsList. generateSections() calls genSection() with an Elixir list for the intermediary code. genSection() generates a single section, depositing the generated code into an element of the ConfigMap and returns the updated configMap.
Here's the genSection() function. The 2nd parameter is the Elixir list of the intermediary code. It's a typical Elixir recursive function which processes the 'head' of the list and calls itself recursively on its tail. This function updates the configMap and returns it as it's value. It uses a Regex.match?() to determine the syntax of the intermediary code supplied, and then calls Regex.scan() to scan and extract what is required. There is some logic to process any holds or pushes within the section and then finally it calls the next-level function, buildSection(), to generate the code into configMap. This is typical functional programming that is common to languages such as lisp and schema.
I hope I have shown the feasibility of developing compiler services in Elixir and some of the techniques that can be used. To simplify the description, I have not described these additional aspects -
- Peep-hole optimisations of the generated code,
- Multi-level caches implemented to improve performance,
- Persistence to disk of symbol tables, again to improve the performance of subsequent compilations using exactly the same parameters.
This compiler service has been in production for 6 months and proven stable and performant. New extensions have been added to the service and releasing new versions has been straightforward. I recommend considering Elixir and Phoenix for complex web services incorporating compilation features.