The School of Computer Science is pleased to present…
Comparative Study of Generative Models for Text-to-Image Generation
MSc Thesis Defense by: Nazia Siddiqui
Date: Thursday, January 19, 2023
Time: 2:00 pm - 3:30 pm
Location: Essex Hall, Room 122
Abstract:
The development of deep learning algorithms has greatly benefited computer vision, image processing, Artificial Intelligence, and Natural Language Processing. One such application is text-to-image synthesis, the creation of new images from text descriptions. Recent techniques for text-to-image synthesis offer an intriguing yet straightforward conversion capability from text to image and have become a popular research topic. Synthesis of images from text descriptions has practical and creative applications in computer-aided design, multimodal learning, digital art creation, etc. Non-Fungible Tokens (NFTs) are a form of digital art that is being used as tokens for trading across the globe. Text-to-image generators let anyone with enough creativity develop digital art, which can be used as NFTs. They can also be beneficial for the development of synthetic datasets. Generative Adversarial Networks (GANs) are generative models that learn from a training set to generate new data. Diffusion models are another type of generative model: they are trained by gradually adding random noise to the data and learning to reverse this diffusion process, so that new samples can be generated from noise. This thesis compares the two types of model to determine which is better at producing images that match the given description. We have implemented the Vector-Quantized GAN (VQGAN) + Contrastive Language-Image Pre-training (CLIP) model, which combines the VQGAN and CLIP machine learning techniques to create images from text input. The diffusion model that we have implemented is Guided Language to Image Diffusion for Generation and Editing (GLIDE). For both models, we use text input from the MS-COCO dataset. We assess and compare the images that both models generate from text using metrics such as Inception Score (IS) and Fréchet Inception Distance (FID). The Semantic Object Accuracy (SOA) score is another metric, which also takes into account the caption used during image generation.
Keywords: Text-to-Image Generation, Generative Models, GANs, Diffusion Models
MSc Thesis Committee:
Internal Reader: Dr. Boubakeur Boufama
External Reader: Dr. Mohammed Khalid
Advisor: Dr. Imran Ahmad
Chair: Dr. Robin Gras