Computer Vision and Generative AI

Do you want to understand how computers see and interpret images, and learn how to use AI to generate impressive visual content? This training is for anyone interested in computer vision and generative AI models who wants to apply these technologies in practice. From object recognition to text-to-image generation, you will learn both the theoretical foundations and the practical application of the latest AI image processing and generation technologies. In hands-on exercises in Google Colab, you will train your own models and experiment with state-of-the-art tools (e.g. Stable Diffusion).

Certificate of attendance from Spirit in Projects Advanced

Goals

Understand the basics of computer vision and image processing
Get to know convolutional neural networks (CNNs) and vision transformers
Apply object detection and image segmentation in practice
Understand and use generative models (GANs, diffusion models)
Master text-to-image and image-to-image generation
Gain practical experience with current tools (Stable Diffusion, DALL-E, Midjourney)
Fine-tune and adapt computer vision models

Target Groups

AI experts
Data scientists
Software developers
ML engineers
System architects
Software architects
UX/UI designers
Content creators
Anyone who wants to get involved with computer vision and generative AI

Content

1. Basics of image processing

Digital images: pixels, colors, resolutions
Image preprocessing and augmentation
Feature Extraction
Classic computer vision methods
From classic methods to deep learning

2. Convolutional Neural Networks (CNNs)

Architecture of CNNs
Convolutional layers, pooling, activation functions
Well-known CNN architectures: VGG, ResNet, EfficientNet
Transfer learning with pre-trained models
Practical exercise: Image classification with transfer learning (Google Colab)

3. Vision Transformers (ViT)

Transformer architecture for computer vision
Self-attention mechanism for images
ViT vs. CNNs: Pros and Cons
Hybrid approaches
Practical exercise: ViT for image classification

4. Object detection and localization

Object detectors: YOLO and the R-CNN family
Single-stage vs. two-stage detectors
Bounding boxes and confidence scores
Real-time object detection
Practical exercise: Object detection in images and videos (Google Colab)

5. Image segmentation

Semantic vs. instance segmentation
U-Net and Mask R-CNN
Use cases: medicine, autonomous driving
Practical exercise: Image segmentation with pre-trained models

6. Other computer vision applications

Face recognition and facial landmarks
Pose estimation
OCR (Optical Character Recognition)
Video analysis and action recognition
Practical exercise: Multi-task computer vision pipeline

7. Basics of Generative Models

What are generative models?
Difference from discriminative models
Latent space and embeddings
Quality metrics for generated images (FID, IS)
Areas of application and ethics

8. Generative Adversarial Networks (GANs)

Architecture: Generator and Discriminator
Training Dynamics and Mode Collapse
StyleGAN and Progressive Growing
Conditional GANs
Practical exercise: Your own GAN experiments (Google Colab)

9. Diffusion Models

Basics of Diffusion Models
Forward and reverse process
Stable Diffusion architecture
Latent diffusion models
Advantages over GANs

10. Text-to-image generation

Text-to-image models from OpenAI and Google
Open-source alternatives (Stable Diffusion, etc.)
Prompt engineering for image generation
Practical exercise: Text-to-image with Stable Diffusion (Google Colab)

11. Image-to-image translation and editing

Style transfer
Image inpainting and outpainting
Super-resolution
ControlNet for precise image control
Practical exercise: Image manipulation with AI tools

12. Video Generation

Text-to-video: Runway Gen-2, Google Veo
Video editing with AI
Frame interpolation
Challenges of video generation

13. Fine-tuning and customization

Fine-tuning of pre-trained models
LoRA (Low-Rank Adaptation)
DreamBooth for personalized models
Dataset preparation
Practical exercise: Custom model fine-tuning (Google Colab)

14. Tools and Platforms

Hugging Face Diffusers
Stability AI
ComfyUI and Automatic1111 for Stable Diffusion
Commercial APIs: OpenAI DALL-E
Cloud platforms for computer vision

15. Ethics and legal aspects

Deepfakes and misuse
Copyright for AI-generated images
Bias in computer vision models
Watermarks and Provenance
Responsible AI use

Certification

For this training, you will receive a certificate of attendance from Spirit in Projects.