What I'm up to these days

Syntax Trees in Guile

A few weeks ago my ling 200 class studied baby syntax. Baby syntax is like regular syntax, but a whole lot easier. We were given a small grammar which in total had about 10 phrase structure rules as well as an additional rule given in the exercise. These rules allow one to construct a tree which will represent the syntactic structure of the sentence. For example the phrase structure rule that we were first given is S -> NP VP. This rule gives a description of a sentence (S) whose children are a noun phrase (NP) and a verb phrase (VP). "Sally ate" would be an example of a noun phrase followed by a verb phrase. And here it is diagrammed in an image.

syntax tree for sally ate

I made this image in not very many lines of guile, and I want to show you how I did it.

Baby syntax in Guile

I thought it would be fun to try to implement some of the baby syntax from my text book in guile. For this exercise all I want is the syntax tree itself. I don't need to check syntax or parse a sentence into a tree or aynthing. To that end the program just fills the tree with silly nonsense words and it can produce rather large and uncombersome trees, in fact I believe the program can run indefinitely (although it's unlikely), but that just adds to the fun.

Scheme is a really good choice for this task. Guile's syntax is such that it's already easy to visualize the tree structure of it's own grammar. In fact these syntax trees are just quoted s-expressions that are built up one by one using small scheme funtions.

Let's start with the first rule for sentences, recall that it is S -> NP VP. To do this in guile all we need is a list with two children -- an np and a vp. Here is how I accomplished this in guile.

(define* (sentence #:optional
            (np (noun-phrase))
            (vb (verb-phrase)))
    `(s ,np ,vb))

This function can accept a predetermined phrase in case you have a sentence in mind, but the default argument is a call to the functions noun-phrase and verb-phrase, which you guessed it, produce trees for NP's and VP's.

Let's see how noun phrases are created.

(define (verb-phrase)
    (let ((choice (random 7)))
        (match choice
        (0 `(vp ,(random-word 'vp)))
        (1 `(vp ,(verb-phrase) ,(adverb)))
        (2 `(vp ,(transitive-verb) ,(noun-phrase)))
        (3 `(vp ,(ditransitive-verb) ,(noun-phrase) ,(noun-phrase)))
        (4 `(vp ,(sentence-verb) ,(sentence)))
        (5 `(vp ,(verb-phrase) ,(prepositional-phrase)))
        (6 `(vp ,(prepositional-verb-phrase)

Here you can get more of a sense of what is really going on. The switch statement corresponds almost exactly to the phrase-structure rules that produce verb phrases that are listed in the ling 200 textbook. The match statement is there to let randomness decide which phrase-structure rule we apply. The simplest verb-phrase would be the first item in the list (right next to 0), this is nothing more than and intransitive verb like "slept". The other phrase structure rules can include things like adverbs and preopositions.

All these words possible choices for words are stored in an associative list.

(define words (make-parameter
            '((n ("dog" "cat"  "coffee" "tea" "boy"))
                (np ("Sally" "Fluffy" "Rod" "Tod" "Seattle"))
                (det ("the" "a" "some"))
                (vp ("slept" "barked" "ran" "sneezed"))
                (tv ("liked" "devoured"))
                (dtv ("gave" "sent"))
                (sv ("thought" "said"))
                (pvp ("put"))
                (p ("to" "for" "with" "on" "under"))
                (adv ("quickly" "carefully" "furiously" "yesterday"))
                (adj ("fluffy" "cute" "grey")))))

These are then accessed randomly by another function called random-word

The rest of the tree building is done in a similar way. For each phrase structure rule there is a function named after the item on the left of the rule (i.e the S in S -> NP VP) and this function will (randomly in this case) produce a tree or a lexical entry that appears on one of the right hand sides for the rule.

Calling (sentence) will run through all the production rules and eventually (hopefully) produce a grammtically correct tree that conforms to our specification.

For example.

(s (np (det "a") (n (adj "fluffy") (n (adj "cute") (n "boy")))) (vp "barked"))


Now bracketed trees are all well and good until you get something like this

(s (np (det "the") (n (adj "cute") (n (n (adj "grey") (n "cat")) (pp (p "for") (np (det "some") (n (adj "grey") (n (n (adj "grey") (n (n "dog") (pp (p "to") (np (det "a") (n (adj "fluffy") (n (n "boy") (pp (p "for") (np "Fluffy")))))))) (pp (p "on") (np (det "a") (n "boy")))))))))) (vp (v "devoured") (np (det "a") (n "tea"))))

Enter Graphviz, the library made just for visualizing graphs! Now there is a graphviz library for guile, but I could never get it to work. Fortunately the input to graphviz can be almost sort of simple. The approach I took was a bit inelegant, but it works.

Here is the previous sentence diagrammed with the graphviz portion of the program.

big syntax tree

Which is something, I would say better, or atleast more fun anyway.


The entire source code can be found at https://gitlab.com/snippets/1775486

an example syntax tree

**This post was edited on 2018-11-20