
MOA: Measuring and Optimizing Heterogeneous AI Architectures

MICRO 2025 Workshop (October 19, 2025  |  Lotte Hotel Seoul)

1. Organizers
  • Prof. Hyuk-Jae Lee, CAPP Lab, Seoul National University (http://capp.snu.ac.kr/index.php?p=people)

  • CEO Lokwon Kim, DeepX (https://deepx.ai)

  • Master Kyo-min Seo, Samsung Electronics

  • Prof. Soojung Ryu, Seoul National University

  • Prof. Xuan Truong Nguyen, Seoul National University

  • Prof. Hyun Kim, Seoul National University of Science and Technology

2. Abstract

The proliferation of sophisticated AI models, from large-scale foundation models to complex on-device networks, has created new system-level challenges across the entire computing spectrum. Efficiently deploying these models requires a paradigm shift toward heterogeneous architectures in both large-scale serving infrastructure and resource-constrained edge devices. This workshop, MOA, provides a focused forum to address this full-spectrum challenge. We will explore architectural innovations and hardware-software co-design strategies for a range of platforms, including CPU-GPU-NPU servers and embedded systems with dedicated accelerators. The workshop features a technical session with high-impact papers and culminates in a live benchmarking competition on an embedded NPU, providing a practical deep dive into the critical challenges of edge AI optimization.

3. Motivation

Delivering efficient AI services, from large cloud models to power-constrained edge devices, presents shared architectural hurdles, such as memory bottlenecks, and demands careful hardware-software co-design. This workshop provides a venue to address these full-spectrum challenges.

Moving beyond discussion alone, we introduce a hands-on benchmarking competition. This practical framework allows participants to turn theoretical knowledge into empirical results by directly observing and quantifying the impact of their optimizations on real hardware, offering a deeper and more impactful experience.

4. Topics of Interest

We invite discussion and contributions on topics including, but not limited to:

  • Heterogeneous system architectures for both cloud servers and edge devices

  • Hardware-software co-design for inference serving and on-device AI

  • Scalable benchmarking methodologies for diverse AI workloads

  • NPU-PIM integration and CXL-based memory fabrics

  • Efficient model execution and optimization for resource-constrained systems

  • Scheduling and real-time latency analysis for cloud and edge scenarios

  • Hardware-aware quantization, pruning, and neural architecture search (NAS)

5. Workshop Format

This will be a half-day workshop structured to maximize interaction and learning.

  • Technical Paper Session (2.5 hours): Rather than an open call for papers, this session will feature presentations of recently published, high-impact papers from premier venues (e.g., ASPLOS, ISCA, HPCA), ensuring the highest-quality content.

  • Break & Networking (15 minutes): A coffee break to facilitate informal discussions.

  • Benchmarking Competition Results & Awards (1.25 hours): The session will feature presentations from the finalists of our hands-on competition, followed by an awards ceremony. This provides a practical, results-driven conclusion to the workshop.

6. Benchmarking Competition: A Unique Feature

A key differentiator of this workshop is its hands-on benchmarking competition, designed to move beyond theoretical analysis to real-world application. This event seeks to create a venue for discovering and sharing innovative optimization techniques that maximize AI model efficiency under hardware constraints. For this inaugural competition, we will use the DeepX M1 SoC as the standard evaluation platform to validate model performance in a realistic edge environment.

For more details, please visit our MOA competition overview page.

7. Expected Outcomes and Impact
  • Foster Collaboration: Bridge the communities of computer architects, systems researchers, and AI model developers.

  • Disseminate Insights: Provide attendees with deep technical insights into real-world performance bottlenecks in modern AI systems.