We’ve covered how @improvbot_ai works, and how the feed is curated. But what about permission to do this? Are you allowed to take 2 million words from 28,000 @edfringe shows, over 8 years, and just play with them? Well, yes and no. And what about ethics & copyright? We acknowledge that this work builds upon the programming information of years of previous shows, and that its success depends on how effectively it can mimic them. So we need access to the data, and permission to use it. And we have to be sure that doing so is ok, both legally and ethically.
The data: the last 8 years of Fringe programmes are available in digital form, over at the @edfestsAPI, https://api.edinburghfestivalcity.com. If you are interested in developing with Festivals data you can ask permission to get access to the various data sets on offer! ImprovBot is a @CreateInf project (more on that later!) – and that larger programme partners with @edfringe, so they knew we were legit. We at @EdinburghUni and @CreateInf pitched our idea to @edfringe and they have been very supportive of ImprovBot. We also got ethical clearance, following our internal processes for research ethics at the University of Edinburgh, as the chance that this will have a negative effect on a named individual or institution is very low.
What about copyright? These listings hold a huge amount of IP. The important thing is that this is – for now – a non-profit research project, which means we are covered by the UK copyright exceptions for text and data mining, and for parody and pastiche. There’s a good guide over on the Jisc website about copyright, and the exceptions for text and data mining when it’s tied to research, if you want to learn more.
We’ve been careful to screen for negative mentions of real people and institutions as we curate the feed. Any likeness to shows or individuals created by our AI is purely coincidental, although there are some very famous folks who appear (so far, Michael McIntyre, Carol Thatcher)! So we have a takedown policy, which follows the usual takedown policy guidance: folks can contact us if there’s an issue and we will review the post straight away.
A lot of what we are doing is taking this idea for a walk. We hope that our transparency is useful, and encourages our audience to think about where data goes & comes from, how you can get to use it, and what it means to bridge this gap between the digital and creative industries.
If you are interested in learning more about these ethical and legal considerations when setting up a digital project, Creative Informatics (@CreateInf) has published its research ethics statement, and guidelines on how we will operationalise it, with a data management plan to follow soon. These are CC-BY licensed for other researchers to reuse! If you have any further questions, do ask us over on the Twitters.