
Course Outline

Introduction to Multimodal LLMs in Vertex AI

  • Overview of multimodal capabilities within Vertex AI.
  • Deep dive into Gemini models and their supported modalities.
  • Exploration of relevant use cases in enterprise and research settings.

Setting Up the Development Environment

  • Configuring Vertex AI for multimodal workflows.
  • Managing datasets across various modalities.
  • Hands-on lab: Environment setup and dataset preparation.

Long Context Windows and Advanced Reasoning

  • Understanding the mechanics of long-context workflows.
  • Practical applications in planning and decision-making processes.
  • Hands-on lab: Implementing long-context analysis.

Cross-Modal Workflow Design

  • Integrating text, audio, and image analysis into cohesive solutions.
  • Chaining multimodal steps within streamlined pipelines.
  • Hands-on lab: Designing a comprehensive multimodal pipeline.

Working with Gemini API Parameters

  • Configuring inputs and outputs for multimodal scenarios.
  • Strategies for optimizing inference speed and resource efficiency.
  • Hands-on lab: Tuning Gemini API parameters.

Advanced Applications and Integrations

  • Developing interactive multimodal agents and virtual assistants.
  • Integrating external APIs and supplementary tools.
  • Hands-on lab: Building a functional multimodal application.

Evaluation and Iteration

  • Methods for testing multimodal performance.
  • Key metrics for assessing accuracy, alignment, and data drift.
  • Hands-on lab: Evaluating multimodal workflows.

Summary and Next Steps

Requirements

  • Strong proficiency in Python programming.
  • Previous experience in machine learning model development.
  • Familiarity with multimodal data types, including text, audio, and images.

Target Audience

  • AI researchers.
  • Senior software developers.
  • Machine learning scientists.

Duration: 14 hours
