Optimizing LLM Performance: Framework-Agnostic Techniques for Speed, Scalability, and Cost-Efficient Inference Across PyTorch, ONNX, VLLM, and More

Peter E Poisson

Language English

Cover Softcover

Published 2025-07-26

Price €22.90 (list price €28.63; -20% with code BOOKS)

Are you struggling to scale your large language models (LLMs) without breaking the bank or sacrificing latency? This book offers a clear roadmap to optimize inference, reduce costs, and scale seamlessly across platforms like PyTorch, ONNX, vLLM, and more.

Description

Optimizing LLM Performance is your hands-on guide to boosting the efficiency of large language models in production environments. Whether you're building chatbots, document summarizers, or enterprise AI tools, this book teaches proven methods to accelerate inference while maintaining accuracy. It dives deep into hardware-aware optimizations, quantization, model pruning, compiler acceleration, and memory-efficient runtime strategies without locking you into any single framework.

Written with clarity and real-world use in mind, the book features practical case studies, side-by-side performance comparisons, and up-to-date techniques from the cutting edge of AI deployment. If you're building, serving, or scaling LLMs in 2025, this is the performance engineering guide you've been waiting for.

Key Features:
- Framework-agnostic optimization techniques using PyTorch, ONNX Runtime, vLLM, llama.cpp, and more
- Deep dive into quantization (INT8/4-bit), distillation, pruning, and KV caching
- Hands-on examples with FastAPI, Hugging Face Transformers, and serverless deployment
- Covers performance profiling, streaming, batching, and cost-efficient scaling
- Future-proof insights on compiler-aware models, LoRA 2.0, and edge inference

Ready to build LLM systems that are faster, cheaper, and more scalable?
Grab your copy of Optimizing LLM Performance today and deploy smarter.

More Information

Author	Peter E Poisson
Publisher	Amazon Digital Services LLC - KDP
Release year	2025
Cover type	Softcover
EAN	9798294338459

You May Also Like

The God Test

If Anyone Builds It, Everyone Dies: Why Superhuman AI Would Kill Us All

Gödel, Escher, Bach: An Eternal Golden Braid

AI Engineering: Building Applications with Foundation Models

The Rust Programming Language

The Art of Game Design: A Book of Lenses

How To Think About AI: A Guide For The Perplexed

Empire of AI: Dreams and Nightmares in Sam Altman's OpenAI

The Pragmatic Programmer: Your Journey to Mastery, 20th Anniversary Edition, 2/e

HBR Guide to Generative AI for Managers

Hackers. 25th Anniversary Edition: Heroes of the Computer Revolution

The Web Application Hacker's Handbook: Discovering and Exploiting Security Flaws

How Linux Works: What Every Superuser Should Know

Artificial Intelligence: A Modern Approach, Global Edition

Fundamentals of Software Architecture: A Modern Engineering Approach

Speak Data: Artists, Scientists, Thinkers, and Dreamers on How We Live Our Lives in Numbers

Linux Basics for Hackers, 2nd Edition: Getting Started with Networking, Scripting, and Security in Kali

Mathematics for Machine Learning

The Art of Death Stranding

Designing Machine Learning Systems: An Iterative Process for Production-Ready Applications
