Module 3: Text-to-Image Machine Learning

More details to come

Project 3: Using Text-to-image

Your goal is to get first-hand experience using a text-to-image model (Stable Diffusion) to generate a piece of media.

You will need to:

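# Read the full text of the book; the with-block closes the file automatically.
with open("/content/pg2641.txt", "r", encoding="utf-8") as f:
    text = f.read()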

This will require you to leverage your Python skills from the first half of the class, as well as a bit of prompt engineering.

Optional: fine-tune the model to produce images in a particular style

As an example, I used the book A Room With A View, by E. M. Forster. I obtained the following noun phrase frequencies: (‘lucy’, 449), (‘cecil’, 235), (‘miss bartlett’, 198), (‘freddy’, 124).
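Frequencies like these can be tallied with a few lines of Python. The sketch below is one possible approach, not necessarily the one used for the example: it assumes the noun phrases have already been pulled out of the text by an NLP library of your choice, and the function name top_noun_phrases and the toy input list are hypothetical.

```python
from collections import Counter


def top_noun_phrases(phrases, n=4):
    """Return the n most frequent phrases as (phrase, count) pairs."""
    # Lowercase and strip whitespace so "Lucy" and "lucy " count as one phrase.
    normalized = [p.strip().lower() for p in phrases if p.strip()]
    return Counter(normalized).most_common(n)


# Hypothetical toy input standing in for the output of a noun-phrase extractor.
sample = ["Lucy", "Cecil", "lucy", "Miss Bartlett", "Lucy", "cecil", "Freddy"]
print(top_noun_phrases(sample))
```

With a real book, the list passed in would hold every noun phrase found in the text, and the top four entries become the subjects of your four images.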

Then, I generated the following four images. For each image, I used a prompt that included one of the most frequent noun phrases and the title of the book.

Lucy Cecil Miss Bartlett Freddy
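As a rough illustration of how such prompts might be assembled, the sketch below combines each frequent phrase with the book title. The helper name build_prompt and the exact prompt wording are assumptions made for illustration; the prompts actually used for the images above are not recorded here, and you should experiment with your own phrasing.

```python
def build_prompt(phrase, title):
    """Combine one noun phrase and the book title into a text prompt."""
    # The wording of this template is a hypothetical starting point.
    return f"{phrase}, from the book {title}"


title = "A Room With A View"
top_phrases = ["lucy", "cecil", "miss bartlett", "freddy"]

# str.title() capitalizes each word, e.g. "miss bartlett" -> "Miss Bartlett".
prompts = [build_prompt(p.title(), title) for p in top_phrases]
for prompt in prompts:
    print(prompt)
```

Each string in prompts can then be passed to the model to generate one image.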

Tips

Students who were in class on Thursday got some good tips, so reach out and talk to them if you are stuck! One issue that came up was with the pipe.enable_xformers_memory_efficient_attention() line, which seems to have broken since last week. You can comment this line out and change num_images = 4 to num_images = 1, which should fix the out-of-memory error.
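If you would rather not delete the line, one alternative is to guard the call so the notebook keeps running whether or not xformers works. This is only a sketch: the helper name configure_pipe and its use_xformers flag are hypothetical, and pipe is assumed to be the pipeline object already created in the notebook.

```python
def configure_pipe(pipe, use_xformers=False):
    """Optionally enable xformers attention; return True if it was enabled."""
    if not use_xformers:
        # Equivalent to commenting the line out.
        return False
    try:
        pipe.enable_xformers_memory_efficient_attention()
        return True
    except Exception as err:
        # Fall back to the default attention if xformers is missing or broken.
        print(f"xformers attention unavailable, continuing without it: {err}")
        return False


# Generating one image per call instead of four lowers peak GPU memory use.
num_images = 1
```

Calling configure_pipe(pipe) with the default flag skips xformers entirely, which matches the fix described above.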

To turn in

Download any code you have written as .ipynb files (from Colab and/or Jupyter). You will submit the code and the four images as separate files. Most of you should expect to submit 5 files (if you only used Colab) or 6 files (if you used Colab and Jupyter), including the 4 images. You don’t need to include the .txt file unless you want to.


Lecture 3-1: Intro to Text-to-image

https://docs.google.com/presentation/d/16KVanb8DUQrtvyibYqlryn-ZnpCHDidwuv-aK8FjnhU/edit?usp=sharing


Lecture 3-2: Prompt Engineering

https://docs.google.com/presentation/d/1Tm-CL_Pynli3ZJmQeA3VwJ6TKikhSpRdfhetPLvTtpM/edit?usp=sharing


Lab 3: Testing out Stable Diffusion

The goal of this lab is to ensure you are ready to start working with Stable Diffusion. You will explore the impact of the inference_steps parameter. Start by setting a random seed, then design a prompt of your choosing and generate 10 images, each with a larger number of inference_steps than the last (a range of roughly 1-150 is reasonable). Write up a short description of what you observe. How does the image change with more inference steps?
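One way to organize the experiment is to pick the 10 step counts up front and reuse the same seed for every image, so the step count is the only thing that changes. The sketch below assumes you wrap the notebook's generation code in a function of your own; step_schedule, run_sweep, and the generate callable are hypothetical names. In the Hugging Face diffusers library the step count is normally passed as num_inference_steps and the seed through a torch.Generator, but check the notebook for the names it actually uses.

```python
def step_schedule(low=1, high=150, count=10):
    """Return `count` roughly evenly spaced step counts from low to high."""
    if count < 2:
        raise ValueError("count must be at least 2")
    span = high - low
    return [round(low + i * span / (count - 1)) for i in range(count)]


def run_sweep(generate, prompt, steps, seed=0):
    """Call generate(prompt, n_steps, seed) once per step count.

    `generate` is a hypothetical wrapper around the notebook's pipeline;
    passing the same seed each time keeps the starting noise fixed.
    """
    return {n: generate(prompt, n, seed) for n in steps}


steps = step_schedule()
print(steps)
```

The dictionary returned by run_sweep maps each step count to its image, which makes it easy to save or display the results in order.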

You should use this Colab notebook as a starting point: [https://colab.research.google.com/drive/16vaTJi1o5139AlPPU-k-dggc3bx9qnL1?usp=sharing]. Make a copy of the notebook in your own drive, then generate the images. Turn in your


Lecture 3-1: Fine tuning


Lecture 3-1: More fine tuning and wrapping up