Computer Vision and Generative AI
Do you want to understand how computers see and interpret images, and learn how to use AI to generate visual content? If you are interested in computer vision and generative AI models and want to apply these technologies in practice, this training is for you. From object detection to text-to-image generation, you will learn both the theoretical foundations and the practical application of current AI image processing and generation technologies. In hands-on exercises in Google Colab, you will train your own models and experiment with state-of-the-art tools (e.g. Stable Diffusion).
Certificate of attendance from Spirit in Projects Advanced
Goals
- Understand the basics of computer vision and image processing
- Get to know convolutional neural networks (CNNs) and vision transformers
- Apply object detection and image segmentation in practice
- Understand and use generative models (GANs, diffusion models)
- Master text-to-image and image-to-image generation
- Gain practical experience with current tools (Stable Diffusion, DALL-E, Midjourney)
- Fine-tune and adapt computer vision models
Target Groups
Content
1. Basics of image processing
- Digital images: pixels, colors, resolutions
- Image preprocessing and augmentation
- Feature extraction
- Classic computer vision methods
- From classic methods to deep learning
2. Convolutional Neural Networks (CNNs)
- Architecture of CNNs
- Convolutional layers, pooling, activation functions
- Well-known CNN architectures: VGG, ResNet, EfficientNet
- Transfer learning with pre-trained models
- Practical exercise: Image classification with transfer learning (Google Colab)
3. Vision Transformers (ViT)
- Transformer architecture for computer vision
- Self-attention mechanism for images
- ViT vs. CNNs: pros and cons
- Hybrid approaches
- Practical exercise: ViT for image classification
4. Object detection and localization
- Object detectors: YOLO and the R-CNN family
- Single-stage vs. two-stage detectors
- Bounding boxes and confidence scores
- Real-time object detection
- Practical exercise: Object detection in images and videos (Google Colab)
5. Image segmentation
- Semantic vs. instance segmentation
- U-Net and Mask R-CNN
- Use cases: medicine, autonomous driving
- Practical exercise: Image segmentation with pre-trained models
6. Other computer vision applications
- Face recognition and facial landmarks
- Pose estimation
- OCR (Optical Character Recognition)
- Video analysis and action recognition
- Practical exercise: Multi-task computer vision pipeline
7. Basics of Generative Models
- What are generative models?
- Difference from discriminative models
- Latent space and embeddings
- Quality metrics for generated images (FID, IS)
- Areas of application and ethics
8. Generative Adversarial Networks (GANs)
- Architecture: generator and discriminator
- Training dynamics and mode collapse
- StyleGAN and progressive growing
- Conditional GANs
- Practical exercise: Your own GAN experiments (Google Colab)
9. Diffusion Models
- Basics of diffusion models
- Forward and reverse process
- Stable Diffusion architecture
- Latent diffusion models
- Advantages over GANs
10. Text-to-image generation
- Text-to-image models from OpenAI and Google
- Open-source alternatives (Stable Diffusion, etc.)
- Prompt engineering for image generation
- Practical exercise: Text-to-image with Stable Diffusion (Google Colab)
11. Image-to-image translation and editing
- Style transfer
- Image inpainting and outpainting
- Super-resolution
- ControlNet for precise image control
- Practical exercise: Image manipulation with AI tools
12. Video Generation
- Text-to-video: Runway Gen-2, Google Veo
- Video editing with AI
- Frame interpolation
- Challenges of video generation
13. Fine-tuning and customization
- Fine-tuning of pre-trained models
- LoRA (Low-Rank Adaptation)
- DreamBooth for personalized models
- Dataset preparation
- Practical exercise: Custom model fine-tuning (Google Colab)
14. Tools and Platforms
- Hugging Face Diffusers
- Stability AI
- ComfyUI and Automatic1111 for Stable Diffusion
- Commercial APIs: OpenAI DALL-E
- Cloud platforms for computer vision
15. Ethics and legal aspects
- Deepfakes and abuse
- Copyright for AI-generated images
- Bias in computer vision models
- Watermarks and provenance
- Responsible AI use
Certification
For this training you will receive a certificate of participation from Spirit in Projects.