Monday, 31 October 2016

Parsing Rebol Data with Lua PEG

I diverted myself from learning Lua by working through the stockfetch program, as I wanted to test how tricky it would be to read a Rebol data file into a Lua table using LPeg, the Parsing Expression Grammar library for Lua. I didn't attempt to write an actual conversion tool, just something that would give a good feel for how difficult it would be to write such a tool.

It is necessary to understand that there really isn't such a thing as a Rebol data file. In Rebol, code is data and data is code. Any syntactically valid Rebol file can be evaluated; if it contains functions, they will be run. (The same is true of the languages descended from Rebol: Red, Boron and World.)

Whilst functions are first-class values in Lua, I don't think that variables can be treated that way (though I could well be wrong).

If I am correct, this means that it will only ever be possible to convert a subset of Rebol files to Lua tables.

For my purposes, I felt it was sufficient to confirm that both Rebol objects and blocks could be converted into Lua tables. 

When considering objects, I excluded the possibility of them containing functions or code that is executed as the object is loaded. For this test, a Rebol object was restricted to being akin to a keyword:value store.

Rebol blocks are ordered series of values, somewhat similar to a Lua table using integer keys.
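As a rough illustration of the mapping described above, here is a sketch of the Lua tables one would expect for a few small inputs. The Rebol snippets in the comments and the variable names are invented for this example, and the values are kept as strings on the assumption that the converter captures text rather than numbers.

```lua
-- Rebol block:  [1 2 3]
-- An ordered series becomes a table with integer keys 1..n.
local block = { '1', '2', '3' }

-- Rebol object: make object! [a: 1 b: 2]
-- A keyword:value store becomes a table with string keys.
local object = { a = '1', b = '2' }

-- Rebol object holding a block: make object! [a: 1 b: [2 3]]
-- Nesting carries over directly, since Lua tables can hold tables.
local nested = { a = '1', b = { '2', '3' } }

print(#block, block[1])     -- length of the block and its first value
print(object.a, object.b)   -- values looked up by word
print(nested.b[2])          -- second value of the inner block
```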

I also applied the following restrictions for my "proof of concept":

  • Words can only be a single letter: a to z
  • Values can be a block, an object or a single-digit string: 0 to 9
  • Blocks and objects cannot be empty

Once I started to understand LPeg, I found writing the trial converter to be quite straightforward. I'm happy with the solution that I came up with as it appears very readable to me (especially compared with regex). I'm sure that it could be improved by someone more familiar with Lua and LPeg than I currently am.
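To give a feel for how such patterns read, here is a minimal sketch of the restricted words and values expressed in LPeg. It assumes the lpeg module is installed; the names Space, Word, Digit and Pair are chosen for this illustration only.

```lua
local lpeg = require 'lpeg'

-- Hypothetical names for this sketch, mirroring the restrictions listed above.
local Space = lpeg.S(' \n\t')^0   -- optional whitespace
local Word  = lpeg.R('az')        -- a single letter, a to z
local Digit = lpeg.R('09')        -- a single digit, 0 to 9

-- A word followed by a colon (e.g. "a:") and a single-digit value.
-- lpeg.C marks the parts of the match that should be returned as captures.
local Pair = lpeg.C(Word) * ':' * Space * lpeg.C(Digit)

-- lpeg.match returns the captures on success, or nil if the pattern fails.
local key, value = lpeg.match(Pair, 'a: 1')   -- key is 'a', value is '1'
local bad = lpeg.match(Pair, 'ab: 1')         -- nil: words are a single letter
print(key, value, bad)
```

Each pattern is an ordinary Lua value, so larger patterns are built by combining named pieces with * (sequence) and + (ordered choice) rather than by packing everything into one string.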

Here's my function:

local lpeg = require 'lpeg'
local reboldata = {}  -- module table; assumed to be set up like this in the full source

function reboldata.import (reb)
  -- Pair up a flat list of captures (key1, value1, key2, value2, ...) into a table.
  local bind = function (...)
    local args = {...}
    local t = {}
    for i = 1, #args, 2 do
      t[args[i]] = args[i + 1]
    end
    return t
  end
  local End = lpeg.P(-1)                       -- matches only at end of input
  local Space = lpeg.S(' \n\t')^0              -- optional whitespace
  local OpenBlock = Space * '['
  local CloseBlock = ']' * Space
  local EndString = lpeg.S(' \n\t') + CloseBlock
  local String = Space * lpeg.C(1 - EndString)^1 * Space
  local NotCloseBlock = (1 - CloseBlock)^0
  local Word = lpeg.R('az')                    -- single letter, a to z
  local GetWord = lpeg.C(Word) * ':' * Space
  local Value = lpeg.R('09')                   -- single digit, 0 to 9
  local Element = (String + lpeg.V'Block') + lpeg.V'Object'
  local GetWordValue = GetWord *
                       ((lpeg.C(Value) + lpeg.V'Block') + lpeg.V'Object') * Space
  local ObjectContent = (GetWordValue^0 / bind) + CloseBlock
  local ParseRebol = lpeg.P{
    'PR',  -- name of the initial rule; LPeg expects it at index 1 of the grammar
    PR = lpeg.Ct(lpeg.V'BO'),
    BO = (lpeg.V'O' + lpeg.V'B') + End,
    O = (lpeg.V'Object' * lpeg.V'B') + lpeg.V'Object',
    B = lpeg.V'Block' * lpeg.V'BO',
    Object = lpeg.P('make object!' * Space *
             OpenBlock * ObjectContent * CloseBlock),
    Block = lpeg.Ct(OpenBlock * Element^0 * CloseBlock)
  }
  return lpeg.match(ParseRebol, reb)
end
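The part of the function that turns an object's contents into a keyed table is the function capture (pattern / bind). The following is a stand-alone sketch of just that technique, assuming the lpeg module is installed; Pair and Pairs are names invented for this example and the grammar is deliberately simpler than the one above.

```lua
local lpeg = require 'lpeg'

-- Same idea as the bind helper above: captures arrive as a flat list
-- (key1, value1, key2, value2, ...) and are paired up into a table.
local function bind(...)
  local args = {...}
  local t = {}
  for i = 1, #args, 2 do
    t[args[i]] = args[i + 1]
  end
  return t
end

local Space = lpeg.S(' \n\t')^0

-- One "word: digit" pair, producing two captures.
local Pair = lpeg.C(lpeg.R('az')) * ':' * Space * lpeg.C(lpeg.R('09')) * Space

-- patt / func passes every capture made by patt to func and
-- replaces them with whatever func returns.
local Pairs = Pair^0 / bind

local t = lpeg.match(Pairs, 'a: 1 b: 2')
print(t.a, t.b)   -- t.a is '1', t.b is '2'
```

Collecting the pairs this way keeps the grammar itself free of table-building logic; LPeg's group and fold captures would be an alternative, but a plain function is arguably easier to read.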

Full source and tests