This post will be quite a bit different from the content I usually share on my blog, but as the blog played a role in the project itself I felt it was worth sharing the work I did. I am by no means an expert in this field, and many of my ideas were inspired by my lecturers and supervisor.
After a fairly quick and productive start to the year on the blogging front, my output dropped in the last few months with regards to the poetry I was writing. This was probably in part due to the project I was undertaking for my final year at university. With this post I wanted to share some of that work.
My project was a poetry generator; I focused on haiku for ease of generation. The list of words it could choose from came from selecting a small sample of the poems on my blog and breaking them down into words. Whilst the system is far from perfect and the generation still lacking, here are a few of the poems it came up with:
Rife scar when pain eye
Our guilty rope were whole
Until the heaven
Rather muffled pass
We are the broken we were
Thrown were help like shame
The System (Input, Generation and Evaluation)
The system breaks into a few components: the input, the generation and the evaluation. The input stage takes in words from a selection of text files and tags them with part-of-speech tags (noun, adjective etc.). I used the Natural Language Toolkit (NLTK) in Python for this. Without going into the complexities of the system, the part-of-speech tags from my own poems were added for the purpose of generating tag sequences. In addition to tagging the words in the word list, I also tagged a large corpus and used it to save the common part-of-speech bi-grams (the data used to identify probable tag sequences).
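The bi-gram counting can be sketched in a few lines. This is a minimal illustration, not the code from my project: the hard-coded (word, tag) pairs stand in for what would really come out of NLTK's tagger over a large corpus.

```python
from collections import Counter, defaultdict

# Hypothetical tagged corpus: in the real system these (word, tag) pairs
# would come from running NLTK's part-of-speech tagger over a large corpus.
tagged = [("the", "DT"), ("green", "JJ"), ("apple", "NN"),
          ("falls", "VBZ"), ("in", "IN"), ("the", "DT"),
          ("quiet", "JJ"), ("garden", "NN")]

# Count part-of-speech bi-grams: how often tag B immediately follows tag A.
bigram_counts = defaultdict(Counter)
for (_, a), (_, b) in zip(tagged, tagged[1:]):
    bigram_counts[a][b] += 1
```

After counting, `bigram_counts["DT"]` records which tags followed a determiner and how often, which is exactly the data the generator needs to guess a probable next tag.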
Tag sequences were the primary focus of the generation step. The generator selects a random word from the word list to start the line and then attempts to predict (via probabilities) the next part-of-speech tag in the line. A simple example: an adjective is often followed by a noun (e.g. green apple). To do this I used the part-of-speech bi-grams mentioned previously to gather the possible next tags, essentially threw them into a bag and picked one at random. With some simple book-keeping, tracking the previous part-of-speech tag, a continuous sequence of tags could be maintained (although considering n-grams would be a valid extension of these ideas).
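The bag-sampling step above can be sketched as follows. The bi-gram counts here are made-up placeholders; the real ones come from the tagged corpus.

```python
import random
from collections import Counter

# Hypothetical bi-gram counts (previous tag -> Counter of following tags);
# in the real system these are gathered from the tagged corpus.
bigram_counts = {"JJ": Counter({"NN": 8, "JJ": 2}),
                 "DT": Counter({"NN": 5, "JJ": 5})}

def next_tag(prev_tag):
    """Throw every observed follower into a bag (once per occurrence)
    and draw one at random, so tags that follow prev_tag more often
    are proportionally more likely to be drawn."""
    counts = bigram_counts[prev_tag]
    bag = [tag for tag, n in counts.items() for _ in range(n)]
    return random.choice(bag)
```

The book-keeping is then just holding on to the returned tag and feeding it back in as `prev_tag` for the next word slot.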
The evaluation step is where things get a little more complicated. For my project I wanted to investigate a few ideas that I believed were important to poetry: novelty, coherence and relevance. I chose these buzzwords to label the three measures that were incorporated into the evaluation function.
Novelty refers to the use of less common words, penalising stop-words (things like AND or THE) and also penalising words that are common in my own poetry (words such as THINK and WORLD came up frequently in the poems I selected, for example). With this measure I hoped to ensure that the generator's bias towards words I favour was not rewarded in the evaluation stage. It does, however, mean my own poetry scores poorly under the evaluation function; an unfortunate consequence, and the best solution may be to gather works from a wider range of poets to avoid the imposed bias.
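A minimal sketch of that novelty measure might look like this. The word lists and penalty weights are illustrative assumptions, not the values from my project.

```python
# Illustrative word lists: a tiny stop-word set and a couple of words
# that appear frequently in my own poems. Both are assumptions here.
STOP_WORDS = {"and", "the", "a", "of", "in", "we", "were"}
MY_COMMON_WORDS = {"think", "world"}

def novelty_score(line):
    """Reward less common words; penalise stop-words heavily and
    my own overused words more lightly. Normalised by line length."""
    words = line.lower().split()
    score = 0.0
    for w in words:
        if w in STOP_WORDS:
            score -= 1.0      # heavy penalty for stop-words
        elif w in MY_COMMON_WORDS:
            score -= 0.5      # lighter penalty for my pet words
        else:
            score += 1.0      # reward everything else
    return score / len(words)
```

Run over a line of all uncommon words this returns 1.0; a line built from stop-words and my pet words goes negative, which is the bias-countering behaviour described above.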
Coherence refers to the structure of the sentences. Since the generator deals with the forward-looking ideas (the part of speech from the first word to the second, and so on), this measure dealt with the opposite end. In the version I submitted, the coherence of a line was calculated from the number of times in the frequency distribution that a line ended with a specific part-of-speech tag (making the assumption that each line ending was adequate as a sentence ending which, as evidenced throughout, is not often the case). The number of occurrences was scaled down so as not to carry a huge weight within the system, but this is one area where further adjustment would have been ideal: accommodating the larger counts without letting this measure overwhelm the others (either through some form of normalisation or a 'maximum per line' to cap the score).
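The capping idea suggested above could be sketched like this; the frequency distribution and cap value are hypothetical stand-ins, not figures from the project.

```python
from collections import Counter

# Hypothetical frequency distribution of line-final part-of-speech tags
# gathered from the corpus (Penn Treebank-style tags).
END_TAG_FREQ = Counter({"NN": 900, "VBD": 300, "DT": 12})

MAX_PER_LINE = 100  # cap so a very common end tag cannot dominate

def coherence_score(final_tag):
    """Score a line by how often its final tag ends lines in the corpus,
    capped and normalised into [0, 1]."""
    return min(END_TAG_FREQ.get(final_tag, 0), MAX_PER_LINE) / MAX_PER_LINE
```

With the cap in place a line ending on a noun scores 1.0 rather than 900, keeping this measure on the same scale as the others.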
Relevance refers to how on-topic the poems were. When building the generator, one of the things that dictated the word list was the use of a seed word, which I did not cover as thoroughly as I would have liked. Since all the words in the word list come from my own work, relevance was entirely dependent on how well my poems match the tags I allocated them; words that score poorly on relevance can therefore be partially attributed to my own work (whether that means the measure is insufficient or my poems are more abstract than would be optimal for this system).
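One way the seed-word idea could work is sketched below. This is an assumption about the mechanism, not the project's actual implementation: each word inherits the topic tags I allocated to the poems it came from, and relevance is the fraction of a line's words that trace back to the seed topic.

```python
# Hypothetical word -> topic-tag mapping, inherited from hand-tagged poems.
WORD_TOPICS = {"scar": {"pain"}, "pain": {"pain"},
               "rope": {"pain", "loss"}, "heaven": {"faith"}}

def relevance_score(line, seed):
    """Fraction of the line's words whose source poems were tagged
    with the seed word's topic."""
    words = line.lower().split()
    hits = sum(1 for w in words if seed in WORD_TOPICS.get(w, set()))
    return hits / len(words)
```

Under this sketch a line scores highly only when most of its words were drawn from poems tagged with the seed topic, which makes the measure exactly as good as my own tagging, as noted above.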
As with any probabilistic system, the reliance on data was a problem. If the data was tagged poorly, or the corpus contained lots of structural outliers (due to domain or other reasons), the statistics could be skewed, leading to unwanted sequences of words:
A guy the old skin
In the surprise whether scales
They you are call pain
The last line in particular, stacking pronouns, is an unfortunate occurrence, but it demonstrates the issue with data dependence. Building a large enough data set that contains no mistakes would be a sizeable task in itself, since tagged data is expensive to obtain (either through personal time investment or by paying someone else to invest theirs), and even then the human gold standard may not be 100% accurate.
The system as a whole has many flaws (as well as many positives) which I could explore as individual posts, but for now I will summarise a few of the other limitations without going into too much detail.
- Part of Speech does not guarantee good English – Some words just aren’t supposed to go in sequence
- Human Evaluation, whilst subjective, is better for comparing computer poetry to a human gold standard
- Variety in form (haiku, quatrain etc.) must be adequately compensated for in the evaluation function if it is to be considered generalised
- Evaluating poetry without techniques limits the ‘poetic’ nature of the final product
So there you have it. That’s what I’ve been up to the past 7-8 months. It has been quite an adventure and I thoroughly enjoyed much of the work I was doing (although as with any project there were those moments where everything stopped progressing and I wanted to bash my head on the desk repeatedly).
I wrote a full explanation in my dissertation, which I may post more from at another time if there is interest, but for now I wanted to write things up in a way that the people who frequent my blog might find interesting. Ultimately, what I did is not exceedingly complicated, nor does it solve any issues that currently plague the field of Natural Language Processing, but the results are quite entertaining to share and I may post more example poems in the future (the system can generate multiple poems in just a few minutes, hence the explanation alongside the poems).
Figure is of home
Dial in head the weather
We enough table
That fractured nightmare
That a life an depression
Confined them late us
Speak understand pass
Another familiar need
We we think all this