I started working on Assignment 2 in Week 4, which requires us to produce a 2-minute narrative video made entirely with AI. I barely had any direction on how to start this project, so I searched YouTube for videos showcasing AI tools for producing video, which turned out to be extremely useful. Looking back at this process now, I realise that it was probably the most useful step I took throughout the assignment, which leads me to the first point I want to discuss in this blog: the limited range of AI tools that are accessible to visual learners like me. As the name suggests, visual learners learn most effectively when they are given visual guidance. As a visual learner, I hate reading long paragraphs (especially in English, which is my second language), so I found chatbots like ChatGPT and Claude harder to use for things like tool tutorials, because their responses lack any visual demonstration.
While browsing YouTube, I found a video with tips on how to produce cinematic videos using Pika Labs (Curious Refuge 2023). From the video, I learnt that videos generated directly in Pika Labs tend to have inconsistent styles and/or lighting. To resolve this, the video suggests generating images first and then converting them into videos. This suggestion reminded me of studies I had read comparing image and video generation, which state that video generation is inherently more complex than image generation because the model must maintain 'temporal coherence' (Chen et al. 2020:662). Maintaining temporal coherence means the model has to keep elements such as lighting, shadows, and each object's appearance consistent across frames, which is substantially harder than producing a single image. As a result, AI-generated images tend to be more accurate, detailed, and consistent, and to have higher resolution. Consequently, I decided to generate images first before generating any videos. After I described my story ideas, ChatGPT helped me explore suitable image generation tools, and after rounds of experiments I settled on Midjourney, as I discovered that it has a strong reputation for futuristic art.
Before I started generating any images, I looked through the gallery on Midjourney and discovered something interesting: many professional AI artists include the camera and lens types in their prompts as a way of telling the model what specific look they want their images to have. I found that fascinating, but as someone who is not a professional with camera gear, I decided instead to browse the gallery for styles I liked and copy those prompts' camera and lens types. I also used Midjourney's image reference feature, which lets the tool take images I upload into account for either the character or the style. I found the style reference to be the most accurate and useful one – I might not have been using it correctly, but the character and plot references often gave me dodgy results.
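To give a sense of what this camera-and-lens prompting looks like, here is a hypothetical example in the style I ended up copying (the scene, camera, and lens here are made up for illustration, not taken from any artist's actual prompt): 'a lone figure walking through a rain-soaked futuristic city at night, neon reflections, cinematic lighting, shot on a Sony A7 III with a 35mm f/1.4 lens'.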
All in all, Midjourney has been a great tool, and I was able to generate 139 images with it. Next week, I plan to start using Pika Labs to convert my images into videos, and to investigate AI tools for music and sound-effect generation.
Word count: 523
References:
Curious Refuge (2023) ‘Create Cinematic AI Videos with Pika Labs’ [YouTube video], Curious Refuge, YouTube, accessed 15 August 2024. https://youtu.be/e2VEkTpOlR8?si=wukXf8MP3Vmtrxmn
Chen G, Rao Y, Liu J and Zhou J (2020) 'World-Consistent Video-to-Video Synthesis', in Computer Vision – ECCV 2020, Springer, Cham, accessed 15 August 2024.