MICRO 2025 Workshop (October 19, 2025 | Lotte Hotel Seoul)
Prof. Hyuk-Jae Lee, CAPP Lab, Seoul National University (http://capp.snu.ac.kr/index.php?p=people)
CEO Lokwon Kim, DeepX (https://deepx.ai)
Master Kyo-min Seo, Samsung Electronics
Prof. Soojung Ryu, Seoul National University
Prof. Xuan Truong Nguyen, Seoul National University
Prof. Hyun Kim, Seoul National University of Science and Technology
The proliferation of sophisticated AI models, from large-scale foundation models to complex on-device networks, has created new system-level challenges across the entire computing spectrum. Deploying these models efficiently requires a shift toward heterogeneous architectures in both large-scale serving infrastructure and resource-constrained edge devices. This workshop, MOA, provides a focused forum to address this full-spectrum challenge. We will explore architectural innovations and hardware-software co-design strategies for a range of platforms, including CPU-GPU-NPU servers and embedded systems with dedicated accelerators. The workshop features a technical session with high-impact papers and culminates in a live benchmarking competition on an embedded NPU, offering a practical deep dive into the critical challenges of edge AI optimization.
Delivering efficient AI services, from large cloud models to power-constrained edge devices, presents shared architectural hurdles, such as memory bottlenecks and the need for tight hardware-software co-design. This workshop provides a venue to address these full-spectrum challenges.
Moving beyond discussion alone, we introduce a hands-on benchmarking competition. This practical framework lets participants turn theoretical knowledge into empirical results by directly observing and quantifying the impact of their optimizations on real hardware.
We invite discussion and contributions on topics including, but not limited to:
Heterogeneous system architectures for both cloud servers and edge devices
Hardware-software co-design for inference serving and on-device AI
Scalable benchmarking methodologies for diverse AI workloads
NPU-PIM integration and CXL-based memory fabrics
Efficient model execution and optimization for resource-constrained systems
Scheduling and real-time latency analysis for cloud and edge scenarios
Hardware-aware quantization, pruning, and neural architecture search (NAS) (see the example sketch after this list)
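To make the quantization topic above concrete, the following is a minimal sketch of post-training dynamic quantization using PyTorch. The toy model and layer choices are our own illustrative assumptions, not a prescribed competition workflow; hardware-aware variants would additionally calibrate against the target NPU's supported operator set.

    # Minimal post-training dynamic quantization sketch (PyTorch).
    # The two-layer model is a stand-in; real entries target full edge networks.
    import torch
    import torch.nn as nn

    model = nn.Sequential(nn.Linear(512, 256), nn.ReLU(), nn.Linear(256, 10))
    model.eval()

    # Convert Linear weights to int8: lower memory traffic and integer
    # arithmetic are exactly what NPU-oriented deployments exploit.
    quantized = torch.quantization.quantize_dynamic(
        model, {nn.Linear}, dtype=torch.qint8
    )

    x = torch.randn(1, 512)
    print(quantized(x).shape)  # torch.Size([1, 10])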
This will be a half-day workshop structured to maximize interaction and learning.
Technical Paper Session (2.5 hours): Rather than running a traditional call for papers, this session will feature presentations of recently published, high-impact papers from premier venues (e.g., ASPLOS, ISCA, HPCA), ensuring high-quality content.
Break & Networking (15 mins): A coffee break to facilitate informal discussions.
Benchmarking Competition Results & Awards (1.25 hours): The session will feature presentations from the finalists of our hands-on competition, followed by an awards ceremony. This provides a practical, results-driven conclusion to the workshop.
A key differentiator of this workshop is its hands-on benchmarking competition, designed to move beyond theoretical analysis to real-world application. This event seeks to create a venue for discovering and sharing innovative optimization techniques that maximize AI model efficiency under hardware constraints. For this inaugural competition, we will use the DeepX M1 SoC as the standard evaluation platform to validate model performance in a realistic edge environment.
For more details, please visit our MOA competition overview page.
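As a rough illustration of what competition entries will measure, the sketch below shows a generic latency benchmarking loop. The run_inference callable is a hypothetical placeholder for a vendor runtime invocation (for example, through the DeepX SDK); the official harness, metrics, and rules are those defined on the competition overview page.

    # Hypothetical latency benchmarking loop; run_inference is a placeholder
    # for an actual NPU runtime call, not a real DeepX API.
    import time
    import statistics

    def benchmark(run_inference, sample, warmup=10, iters=100):
        for _ in range(warmup):      # warm up caches and clock/DVFS states
            run_inference(sample)
        latencies_ms = []
        for _ in range(iters):
            t0 = time.perf_counter()
            run_inference(sample)
            latencies_ms.append((time.perf_counter() - t0) * 1e3)
        return {
            "mean_ms": statistics.mean(latencies_ms),
            "p99_ms": sorted(latencies_ms)[max(0, int(0.99 * iters) - 1)],
        }

Reporting both mean and tail latency matters on edge platforms, where scheduling jitter and thermal throttling can make averages alone misleading.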
Foster Collaboration: Bridge the communities of computer architects, systems researchers, and AI model developers.
Disseminate Insights: Provide attendees with deep technical insights into real-world performance bottlenecks in modern AI systems.