Introduction to Multimodal AI Agents and Tool Use
Learn to build intelligent AI agents capable of analyzing documents, interpreting images, and interacting with external tools from the ground up.
About this course
The next evolution of artificial intelligence goes beyond text. Multimodal agents can now analyze images, read complex documents, and take action using external tools. In this foundational written course, you will learn how to design and build AI agents that process visual and textual data simultaneously. You will start with the core concepts of agentic AI and vision-language models, then progress to practical implementation strategies for document extraction, screenshot analysis, and dynamic tool calling.

What you will learn:
- Understand the foundational terminology of multimodal AI and agentic workflows.
- Process and extract structured data from images, screenshots, and complex documents.
- Implement modern tool calling patterns to allow your agents to interact with external systems.
- Apply prompt engineering techniques specifically designed for vision-language tasks.
- Explore fundamental Retrieval-Augmented Generation (RAG) concepts for handling multimodal data.
- Design robust agent architectures that gracefully manage multi-step reasoning.

The course begins by establishing essential definitions and the basic architecture of multimodal systems. From there, you will read through step-by-step written tutorials and code snippets to build your own document and vision-processing agents.

This course is designed for beginners and developers new to AI agents; no prior experience with machine learning is required. Start building the next generation of intelligent, action-oriented AI agents today.
What you'll get
- Certificate of completion: Add it to your LinkedIn profile
- Audio version included: Learn on the go, no screen needed
- Lifetime access: Come back anytime, no expiry
- Phone or computer: Works anywhere, any device
- 14-day refund: No questions asked
- Short & focused: 1h 15m of practical content
Reviews
No reviews yet. Be the first to share your experience.
Learners also took
- Create AI Videos with Runway Gen-2 (With certificate). Certificate, Hands-on. ₪45.00
- LLM Fundamentals: Architecture and GPU Strategies (Job-ready). Certificate, Hands-on. ₪45.00
- Building Agentic and Modular RAG Systems with LangGraph (Students' pick). Certificate, Hands-on. ₪45.00
- Build Local LLM Q&A Systems with RAG and Docker (Most popular). Certificate, Hands-on. ₪45.00
Frequently asked
What do I need to take this course?
Just a phone or computer with internet. No installs, no special hardware.
How do I pay?
By card via Stripe. We don't store card details; Stripe handles them securely.
Can I get a refund?
Yes, a full refund within 14 days, no questions asked.
How long will I have access?
Forever. Once you purchase, the course is yours to revisit anytime.
Will I get a certificate?
Yes. On completion you'll receive a certificate you can add to your LinkedIn profile.
Built for learners in
Tech
Design
Finance
Marketing
Healthcare
Education
Hospitality
Manufacturing