Week 4 (Augmenting Creativity)

I started assignment 2 in Week 4, which requires us to produce a two-minute narrative video made entirely with AI. At first I barely had any direction on how to start, so I searched YouTube for AI video tools and ways of using them, which turned out to be extremely useful. Looking back at the process now, I realise this was probably the most useful step I took throughout the assignment, and it leads to the first point I want to discuss in this blog: the limited range of AI tools that are accessible to visual learners like me. As the name suggests, visual learners learn most effectively when they are given visual guidance. As a visual learner, I hate reading long paragraphs (especially in English, which is my second language), so I find chatbots like ChatGPT and Claude harder to use for things like tool tutorials, because their responses are almost entirely text with no visual demonstration.

While browsing YouTube, I found a video with tips on how to produce cinematic videos using Pika Labs (Curious Refuge 2023). From it, I learnt that videos generated directly in Pika Labs are prone to inconsistent styles and lighting. To get around this, we can generate still images first and then convert them into videos. This suggestion reminded me of studies I had read comparing image and video generation, which state that video generation is inherently more complex because the model must maintain 'temporal coherence' (Chen et al. 2020:662). Maintaining temporal coherence means the model has to keep things like lighting, shadows and each object's appearance consistent from frame to frame, which is substantially harder than generating a single image. This is why AI-generated images tend to be more accurate, detailed and consistent, and to have a higher resolution, than AI-generated video. Consequently, I decided to generate my images before generating any videos. After I described my story ideas to ChatGPT, it helped me explore suitable image generation tools, and after a few rounds of experimenting I settled on Midjourney, which I discovered has a strong reputation for futuristic art.

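To make the image-first workflow a little more concrete, here is a rough sketch of the same two-step idea using an open-source image-to-video model (Stable Video Diffusion, via the Hugging Face diffusers library). Pika Labs and Midjourney run through their own interfaces rather than code, so this is not what I actually do for the assignment; it is just an illustration of the principle, with the model ID and settings taken from the standard diffusers examples.

```python
import torch
from diffusers import StableVideoDiffusionPipeline
from diffusers.utils import load_image, export_to_video

# Step 1 happens elsewhere: generate a still image you are happy with
# (for me, that is Midjourney) and save it to disk.
still = load_image("generated_still.png").resize((1024, 576))

# Step 2: animate the still with an image-to-video model. Every frame is
# conditioned on the same starting image, so style and lighting stay
# anchored to it, which is exactly the kind of consistency a video model
# has to maintain.
pipe = StableVideoDiffusionPipeline.from_pretrained(
    "stabilityai/stable-video-diffusion-img2vid-xt",
    torch_dtype=torch.float16,
    variant="fp16",
).to("cuda")

frames = pipe(still, decode_chunk_size=8).frames[0]
export_to_video(frames, "shot_01.mp4", fps=7)
```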
Before I started generating any images, I looked through the gallery on Midjourney and discovered something interesting: many professional AI artists include specific camera and lens types in their prompts as a way of telling the AI the exact look they want their images to have (for example, terms along the lines of 'shot on 35mm film' or '85mm lens, shallow depth of field'). I found that fascinating, but as someone who knows very little about camera gear, I decided to browse the gallery for styles I liked and copy the camera and lens types from those prompts. I also used Midjourney's image reference feature, which lets the tool take images I upload into account as either a character or a style reference. I found the style reference the most accurate and useful one – I might not have been using them correctly, but the character and plot references often gave me dodgy results.

All in all, Midjourney has been a great tool, and I was able to generate 139 images with it. Next week I plan to start using Pika Labs to convert my images into videos, and to investigate AI tools for generating music and sound effects.

Word count: 523

References:

Curious Refuge (2023) ‘Create Cinematic AI Videos with Pika Labs’ [YouTube video], Curious Refuge, YouTube, accessed 15 August 2024. https://youtu.be/e2VEkTpOlR8?si=wukXf8MP3Vmtrxmn

Chen G, Rao Y, Liu J, Zhou J (07 November 2020) ‘World-Consistent Video-to-Video Synthesis’, Computer Vision – ECCV 2020, accessed 15 August 2024, Springer, Cham.

Week 3 (Augmenting Creativity)

One of the things I like most about this studio is that it pushes us to learn actively through research. Throughout my learning journey, I've never taken a course where the content is so new, fresh and constantly changing. I really enjoyed the mini presentation we did in Wednesday's class, although I have to admit I wasn't fully awake for the first half of the lesson.

To answer the questions we were given, we had to do quite a bit of research. After considering topics like Google's Gemini 1.5 Pro claiming to surpass GPT-4o, we decided to look into something closer to our everyday lives: the impact of artificial intelligence on our ecosystem. To be honest, as someone who pays $30 a month for ChatGPT and uses it every day, I felt a bit guilty when I discovered that 'one query to ChatGPT uses approximately as much electricity as could light one light bulb for about 20 minutes' (Dodge 2024). What surprised me even more is that, although the big tech companies have announced plans to reduce their environmental impact, their energy consumption and carbon emissions have shown no sign of decreasing in recent years. On the contrary, according to the Environmental Report Google published earlier this year, its carbon dioxide emissions have increased by 48% since 2019 (Google 2024).

I know that climate change and global warming have become tired, almost clichéd topics that people are sick of hearing about, but there were definitely moments during my research when I doubted whether AI is even worth developing, given the damage it does to our ecosystem, and whether people have been blinded and actively chosen not to look at the negative side because the positives are deemed so 'revolutionising'. But in the next moment, I heard the interesting stories presented by another group about how AI has contributed to analysing climate data and animal behaviour to help conserve the earth. For example, as Lily wrote in her blog, AI could be used to 'speak or communicate with endangered birds' and 'lead us to understand why this breed is endangered' (Mao 2024).

Hearing all these different ideas through the activity, I found myself quite intrigued by the debate over whether the negative impacts of AI outweigh the positives, so I decided to do some further research. I found an interesting article published in April this year which argues that current evaluations of the environmental impact of AI are insufficient (Bugeau, Combaz et al. 2024): electricity use and greenhouse gases have been almost the sole focus of many studies, while material flows such as water use, human toxicity and non-renewable materials are often overlooked. I found this article interesting because it addresses an issue I hadn't noticed about the news report I used for the presentation: just as the study states, the news article only looks at electricity and greenhouse gases and lacks information on these other aspects of AI.

Although there are already many studies evaluating the sustainability of AI, a lot of information is still missing compared with other areas of environmental research, because AI is still developing so quickly. I will definitely make a note of the discoveries I made this week, as I believe a topic that interests me this much will be of greater use somewhere in my future.

References:

Kaur D (2 August 2024) 'The hidden climate cost of AI: how tech giants are struggling to go green', AI News, https://www.artificialintelligence-news.com/news/the-hidden-climate-cost-ai-how-tech-giants-struggling-go-green/

Google (2024) Environmental Report, Google, https://www.gstatic.com/gumdrop/sustainability/google-2024-environmental-report.pdf

Dodge J (10 July 2024) 'Artificial intelligence's thirst for electricity' [transcript], NPR, https://www.npr.org/2024/07/10/nx-s1-5028558/artificial-intelligences-thirst-for-electricity

Mao L (11 August 2024) 'Week 3 blog post – Augmenting Creativity', https://www.mediafactory.org.au/lily-mao/2024/08/11/week-3-blog-post-augmenting-creativity/

Week 2 Blog (Augmenting Creativity)

One of the observations that intrigued me most during the experiment with Leonardo was the difficulty the AI had when given negative instructions. While attempting to construct a self-portrait using Leonardo, I used descriptions such as 'a girl not wearing glasses'. To my surprise, almost all the responses I received were of a girl wearing glasses. I then tried another prompt with a similar negative instruction, 'a girl not smiling', and again the responses were the opposite of what I asked for.

It immediately made me think of the 'white bear problem', which describes the phenomenon where a person instructed not to think about a white bear finds that the white bear is the first thing that comes to mind. I then wondered: do AI models have a similar problem? Or perhaps, rather than thinking of the white bear first and then suppressing it like a human would, the AI is only capable of the first step, i.e. responding with an image of a white bear?

I then did some research into this. According to a study on large multimodal models (LMMs) and hallucination, AI models are mostly trained to follow positive instructions and are therefore inherently worse at processing negative ones (Liu et al. 2024). They may also 'over-rely on language priors and generate words more likely to go together with the instruction text regardless of the image content' (Liu et al. 2024:1), which suggests that AI models do not generate content by accurately interpreting the meaning of an instruction the way humans do, but by analysing language patterns and producing the response most strongly associated with the words in the description.

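Reading this also helped me see why many diffusion-based image tools offer a separate 'negative prompt' field instead of relying on negations written into the prompt itself. As a rough sketch of the difference (using the open-source Stable Diffusion pipeline from the Hugging Face diffusers library rather than Leonardo, whose internals I can only guess at):

```python
import torch
from diffusers import StableDiffusionPipeline

# Any public Stable Diffusion checkpoint will do; this is a common one.
pipe = StableDiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-2-1", torch_dtype=torch.float16
).to("cuda")

# Negation inside the prompt tends to backfire: the word "glasses" still
# pulls the image towards glasses, because the model keys on word patterns
# rather than the logical meaning of "not".
unreliable = pipe("portrait of a girl not wearing glasses").images[0]

# The usual workaround: keep the prompt positive and push unwanted
# concepts into the separate negative_prompt channel.
better = pipe(
    "portrait of a girl",
    negative_prompt="glasses, eyewear",
).images[0]

better.save("portrait_no_glasses.png")
```

Whether Leonardo does something equivalent under the hood I am not sure, but the pattern matches what I saw: the model reacts to the words that are present, not to the logic connecting them.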
The paper also discusses another type of negative instruction called 'Existent Object Manipulation' (Liu et al. 2024), which involves describing inconsistent attributes of an object that actually exists in the image – for example, asking the model about a 'woman in blue pants' when the woman in the image is in fact wearing red pants. I think this might be relevant to another frustration I encountered while using Leonardo: after it gave me an image of students with darker skin tones, I asked it to produce another image with students from different races, and somehow it failed to do so. Is this an example of incomplete training on existent object manipulation? I think that is a question I can only answer after reading more papers.

Overall, although the experiment with Leonardo gave me some frustrations, it has certainly sparked my interest in learning more about the mechanisms of AI and how scientists train different models.

References:

Liu, F., Lin, K., Li, L., et al., 2024, ‘Mitigating hallucination in large multi-modal models via robust instruction tuning’, International Conference on Learning Representations (ICLR), Ithaca, accessed 31 August 2024, ProQuest One Academic database.

Week 1 Blog (Augmenting Creativity)

Honestly, Week One's content was already a challenge for me. After only an hour in the classroom, I started to worry about what I might be facing for the rest of the term. I began to question whether I was in my right mind when I put this studio as my first preference – what did I get myself into? Coding? Do I honestly know anything about coding that will get me through this course?

The answer is obviously no. I know a lot of us came into the classroom saying we had never learnt coding, but I doubt any of them is as clueless as I am. A simple example to prove my point: the first time I ever heard the names 'JavaScript' and 'Python' was in our first lesson. I had to google what they were while almost the whole class was nodding at the question 'do you know what these mean?'.

The Rock Paper Scissors game – which is supposed to be a 'mini exercise' – took me an entire hour, and that was with a Le Wagon tutor helping me the whole way. Frankly, to me the code was like an undercover cop and I was the outlaw: whenever I thought I knew what it was, it turned out to be something else entirely.

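For future me, it is worth recording that the whole exercise boils down to surprisingly little code. Here is a rough Python sketch of the game logic – not the exact version I wrote with the tutor (which may well have looked different, or been in JavaScript), just the general shape of it:

```python
import random

CHOICES = ("rock", "paper", "scissors")
# Each key beats the move it maps to.
BEATS = {"rock": "scissors", "paper": "rock", "scissors": "paper"}

def play(player: str) -> str:
    """Play one round against a random computer move and describe the result."""
    computer = random.choice(CHOICES)
    if player == computer:
        return f"Draw: we both chose {computer}."
    if BEATS[player] == computer:
        return f"You win: {player} beats {computer}."
    return f"You lose: {computer} beats {player}."

if __name__ == "__main__":
    move = input("rock, paper or scissors? ").strip().lower()
    print(play(move) if move in CHOICES else "That is not a valid move.")
```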
But now, standing at the end of week one, I don't regret my choice of AI as my first studio. I could have chosen plenty of other studios in areas I'm better at, such as documentaries and interviews, but then I would not learn as much as I will in this studio. I knew from the start that, for my first studio in college, I should choose what I am bad at, not the other way around.

I think this answers this week's question about what I expect of this studio and the studio leader: I hope to gain more practical knowledge and experience in digital technology, while still maintaining what I'm good at by combining it with AI throughout the course. Although I know very little about coding, I can see from the news across every media platform that the role AI can play in media production is almost unlimited, which is why I believe the knowledge I gain from this studio will carry forward into my second, third, and even my life-long studio as a media producer.