Humans are naturally good at pattern recognition, which is why we can identify a familiar person from across a crowded street. We can also read a word even when the vowels are removed or its internal letters rearranged.
Artificial neural networks are a computational abstraction inspired by the neurons in our brain. Rather than applying high-level rules, they perform pattern recognition with a network of nodes, each behaving according to a mathematical function. At first, the network behaves somewhat randomly. But we can train it by feeding it sample data with known results, and adjusting the strengths of the connections between nodes according to how close the network's output came to the desired one. Gradually, the network learns the desired behaviour.
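The training idea can be sketched with a toy, one-weight "network". Everything here is illustrative, including the toy task of learning the rule y = 2x, and has nothing to do with the ImprovBot code itself; it just shows the loop of predict, measure the error, and nudge the connection strength.

```python
import random

def train_single_weight(samples, epochs=100, lr=0.01):
    """Learn a weight w so that w * x approximates the target y."""
    w = random.uniform(-1.0, 1.0)    # the network starts off behaving randomly
    for _ in range(epochs):
        for x, y in samples:
            prediction = w * x
            error = y - prediction   # how close did we come to the desired output?
            w += lr * error * x      # adjust the connection strength accordingly
    return w

samples = [(x, 2.0 * x) for x in range(1, 6)]  # sample data with known results
w = train_single_weight(samples)
print(round(w, 2))                             # converges towards 2.0
```

A real network repeats the same idea across thousands of weights at once, but the principle — error feedback gradually shaping connection strengths — is unchanged.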
Image recognition is a classic application for a neural network. Does a picture depict a cat? A postbox? If we feed the network enough example images and give feedback on its outputs, its performance can improve. A successful neural network can rise to eerie levels of accuracy in identifying postboxes, and it is hard to explain “how it works” in human terms. Its expertise is a mathematical pattern, distributed throughout layers of nodes and connections.
Once trained, the network can be used to generate new images, such as the trippy visions of DeepDream. By requesting a higher confidence value on the ‘postbox’ output, we can see the network attempt to insert its concept of a postbox into an original image.
An image can be fed to the network in a single moment: under the hood, it is just an array of numbers representing pixel colour values. But text is different, because it unfolds as a sequence of letters. So how do we train a neural network to generate text?
The answer is to allow cycles within the network. In a traditional network, computations always feed forward through the layers of nodes. In a recurrent neural network (RNN), layers also connect in cycles, and this means that the network maintains a hidden state. We might think of this as remembering some context, and it is what allows the network to process a sequence, like written text or spoken word. The RNN in this project is a Long Short-Term Memory (LSTM) network.
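To make the notion of hidden state concrete, here is a minimal sketch in Python. This is not the project's LSTM: the two weights and the crude character encoding are toy values chosen for illustration. It shows a single recurrent step that mixes each new character into a running state, so the state after the last character depends on everything that came before it.

```python
import math

W_INPUT = 0.5    # toy connection strength from the input to the hidden state
W_HIDDEN = 0.8   # toy connection strength from the hidden state to itself

def rnn_step(hidden, char):
    """One recurrent step: blend the previous state with the new input."""
    x = ord(char) / 128.0                      # crude numeric encoding of the character
    return math.tanh(W_INPUT * x + W_HIDDEN * hidden)

def encode_sequence(text):
    """Run the text through the recurrent step, threading the state along."""
    hidden = 0.0
    for char in text:
        hidden = rnn_step(hidden, char)
    return hidden

# The same final letter yields different states in different contexts:
print(encode_sequence("cat"))
print(encode_sequence("hat"))
```

The two printed states differ even though both words end in "at" — the cycle lets earlier characters leave a trace, which is the "remembering some context" described above. An LSTM refines this with gates that control what the state keeps and forgets.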
We must choose between a character-based and a word-based approach. With a character-based approach, the network predicts the next letter from the current letter and the network's internal context. Taking one character at a time might seem unlikely to produce readable results, but the outputs are often startling and recognisable, as we see in this project.
During training, the network repeatedly consumes examples of the text it will learn to imitate. Feed it the complete works of H.P. Lovecraft, and it will apparently see everything in terms of eldritch horrors. Feed it Jane Austen, and it will obsess over parties and suitors. Once sufficiently trained, the network can be used to generate new text by repeatedly predicting the next character, either from a random beginning or from some seed text. These characters assemble themselves into words, sentences, readable text; the result can feel magical.
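The generation loop just described can be sketched as follows. The toy "model" here is nothing more than character-to-follower frequency counts standing in for the trained LSTM, but the shape of the loop — predict a next character, append it, repeat from a seed — is the one described above.

```python
import random
from collections import defaultdict

def build_model(text):
    """Count which characters tend to follow each character in the training text."""
    followers = defaultdict(list)
    for current, following in zip(text, text[1:]):
        followers[current].append(following)
    return followers

def generate(model, seed, length, rng=random.Random(0)):
    """Grow text from a seed by repeatedly predicting the next character."""
    output = seed
    for _ in range(length):
        candidates = model.get(output[-1])
        if not candidates:
            break                    # no prediction available: stop early
        output += rng.choice(candidates)
    return output

model = build_model("the theatre then thundered there and then")
print(generate(model, seed="th", length=20))
```

A trained LSTM replaces the frequency counts with a learned probability distribution conditioned on its hidden state, which is why its output holds together over whole sentences rather than just adjacent letter pairs.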
Sources and Process
For the ImprovBot project, we obtained 100-word text descriptions from the Fringe of every show from 2011 to 2019. This amounted to 2,098,140 words. We forced The Bot to read all nine brochures — including additional shows that missed the print deadline — thirteen times in a row over four days. One initial idea was to repeat the process with an undergraduate student as a control, but this experimental refinement had to be dropped for ethical reasons.
The Bot’s output effectively mimicked Fringe show descriptions. Although impressive, it did not unswervingly produce output that fit the tight word limits the project demanded: 100-word show blurbs and 280-character tweets. So we developed strict Rules of Curation that we could use to select the most entertaining or interesting outputs. These walk a difficult line, dealing with the practicalities of an entertainment project whilst protecting The Bot’s output from human creative interference.
About halfway through the process of curation, I decided that certain types of show were dominating the output and that it would be interesting to train a second network on only the theatre section of the programmes: a mere 573,037 words. This reduced word count meant The Bot was able to read the material forty times, adding a second flavour to its output.
Throughout Fringe 2020 you can read The Bot’s ideas for new shows as they appear here on the website, or in tweet-sized chunks from @improvbot_ai on Twitter.