AI-Generated Content Creation: A Dog, a Cat, and Several Unholy Hybrids

Social media platforms have, for many years, relied on digital fingerprinting, metadata analysis, and advanced pattern recognition to flag altered content for copyright infringement. With AI’s rapid advancements in visual content generation, it seems easier than ever to appropriate copyrighted content and create high-quality derivative works that can undermine the economic interests attached to an original copyrighted work. This development threatens the status quo of authorship, labour, and control over expressive works. At the same time, I suspected that we may be taking for granted the labour involved in producing AI-generated derivatives.

As such, to understand what is involved in creating AI-generated content, and to test how accessible AI tools are to the public, I set out to alter an action sequence from the 1976 movie “The Shaggy D.A.”, the clip for which is displayed above. My project was ‘simple’: replace the main subject, a dog, with my cat, Cammy (short for Camembert). I used this project to test the limits of what video-altering AIs can do, and what I discovered was a marked disparity between the perceived ease of creating AI-generated content and the technical realities of modifying video works. While it may be relatively easy for an AI to generate content from scratch, making precise alterations to an existing video while otherwise preserving its original form proved to be an exceedingly complex affair, one which required a surprising amount of skill and judgment to achieve.

The following discussion outlines the process undertaken to complete this project, the innovations made to achieve an acceptable result, and the limitations encountered during this process.

Original Clip from The Shaggy D.A. (1976)
AI-Assisted Video, Replacing the Dog with a Cat.

Preliminary Matters

The first challenge was finding an AI tool designed to make edits to a video while preserving the original image. Mainstream AI tools tend not to provide a means of superimposing the contents of a reference image onto an AI-generated video, for fear that the public would use the feature for perverse or criminal purposes. As such, browsing the web for an ad-hoc AI tool that would let me replace the dog with a reference image of my cat was not easy; many of the tools I looked at did not work, and some probably contained viruses. Eventually, I did find this AI editing tool.

This AI tool offers a ‘free trial’ which allows you to alter a 3-5 second clip when you sign up with your email. But the site seemed very ‘sus’, and it would have been patently unwise to give it either my credit card details or my personal email; as such, for each 3-5 second clip I wanted to generate, I created a new account using a temporary email address. You also have to do all of this in a new incognito browser window each time for the site to offer you the free trial.

The Video-Generating Process

Now onto the video generation itself. The escape sequence posed two primary challenges which needed to be reconciled. First, a number of scenes use a stuntperson in a dog suit, and preserving the unique ‘character’ of the scene demanded replicating this visual element. Second, and what made the first challenge harder, the low resolution of the footage confused the AI model, which could not tell what it was looking at wherever the stuntperson was involved.

Take this clip as an example:

When I first asked the AI to replace the stuntperson in this clip with a ‘stuntperson in an orange-and-white cat costume’, this was what it came up with:

Eventually, it did better with the form after I stopped using the language of ‘person in a fluffy dog/cat costume’ and switched to ‘orange and white fluffy blob’ or ‘mascot’.

After many attempts at giving a detailed description of what I wanted it to create— even just asking it to recolour the ‘black and white object in the image’ to be simply orange and white— it was clear that no amount of written prompting would get it to understand what sort of ‘fur suit’ or ‘mascot’ I was trying to replicate. My first ‘breakthrough’ came when I gave the AI a nondescript image of Cammy, as a kitten, from a top-down view.

The subsequent generations, which used this nondescript image of Cammy as a reference, came much closer to the look of the original stuntperson-in-a-fur-suit, but were still not great.

To better replicate the stuntperson in the original clip, I then turned to ChatGPT to generate a reference image of a person in a cat suit resembling the one in the original footage. After several iterations, the result is the image to the left.

This was the result of the first prompt to use the above reference image.

Though the figure looked more humanoid at the beginning, its distortion as it slid down the rope was unsatisfactory, and I suspected that the AI simply did not understand what physical motions the stuntperson was making in the scene, especially given the low resolution of the video.

I thus had ChatGPT give the AI a detailed second-by-second summary of the exact motions that the stuntperson was making in the original clip, which created much more creature-like results throughout.

The novel combination of elements also confused the AI in other fur-suit scenes, as it was not sure what to replace even after extensive ChatGPT-assisted prompting. At some point, though we were getting closer to the cat-costume concept, the AI started adding random elements, such as the cat monster which appears at the end of the clip on the right.

Rendering a cat with complete features and realistic movements, and one which actually looked like Cammy, also presented challenges, as only one reference photo could be given with each video-generation request.

A GPT-generated collage of reference images for my cat from all angles proved very useful for solving this problem.

The Final Product:

I hope this project has demonstrated that the exercise of considerable human skill and judgment can be involved in creating AI-assisted content, tempering the popular assumption that such work is inherently characterised by trivial or negligible human input. And while the process has its challenges, AI does enable laypersons to significantly expand the scope of their creative abilities without vocational training in matters such as CGI.
