
MOA: Measuring and Optimizing Heterogeneous AI Architectures

MICRO 2025 Workshop (October 19, 2025  |  Lotte Hotel Seoul)

Agenda
  • 13:00 – 13:05   Welcome and Overview

  • 13:05 – 13:35   Talk 1: MLPerf Power: Benchmarking the Energy Efficiency of Machine Learning Systems from μWatts to MWatts for Sustainable AI
    Arya Tschand, PhD Student, Harvard University, USA

  • 13:35 – 14:05   Talk 2: Machine Learning for Analytics Architecture: AI to Design AI
    Chris Gwo Giun Lee, Professor, National Cheng Kung University; Founder, CogniNU Technologies, Taiwan

  • 14:05 – 14:35   Talk 3: ATiM: Autotuning Tensor Programs for Processing-in-DRAM
    Yongwon Shin, PhD Student, POSTECH, South Korea

  • 14:35 – 15:05   Talk 4: APIM: An Antiferromagnetic MRAM-Based Processing-In-Memory System for Efficient Bit-Level Operations of Quantized Convolutional Neural Networks
    Yongfu (Yueting) Li, Associate Professor, Shanghai Jiao Tong University, China

  • 15:05 – 15:35   Talk 5: IANUS: Integrated Accelerator based on NPU-PIM Unified Memory System
    Minseok Seo, PhD Student, Seoul National University, South Korea

  • 15:35 – 16:00   Break

  • 16:00 – 16:45   MOA Competition: AI Model Benchmarking
    Jonghyun Shin, PhD Student, Seoul National University, South Korea

    • Introduction of AI-BMT and MICRO 2025 MOA (20 minutes)

    • Presentation by Winners (20 minutes)

      • 1st Place – 10 minutes

      • 2nd Place – 10 minutes

    • Awards Ceremony (5 minutes)

  • 16:45 – 16:50   Closing Remarks


Talk Details

[Talk 1] MLPerf Power: Benchmarking the Energy Efficiency of Machine Learning Systems from μWatts to MWatts for Sustainable AI

Rapid adoption of machine learning (ML) technologies has led to a surge in power consumption across diverse systems, from tiny IoT devices to massive datacenter clusters. Benchmarking the energy efficiency of these systems is crucial for optimization, but presents novel challenges due to the variety of hardware platforms, workload characteristics, and system-level interactions. This talk introduces MLPerf® Power, a comprehensive benchmarking methodology with capabilities to evaluate the energy efficiency of ML systems at power levels ranging from microwatts to megawatts. Developed by a consortium of industry professionals from more than 20 organizations, coupled with insights from academia, MLPerf Power establishes rules and best practices to ensure comparability across diverse architectures. Using representative MLPerf workloads, we analyze 1,841 reproducible measurements from 60 systems covering the full deployment scale to reveal trade-offs and provide actionable insights for sustainable AI.
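
As a rough illustration of the kind of metric such a methodology reports, the sketch below computes energy efficiency as inferences per joule by integrating timestamped power samples over a benchmark run. This is a minimal Python sketch, not the MLPerf Power toolchain; all names and numbers are hypothetical.

    def energy_joules(samples, t_start, t_end):
        # Trapezoidal integration of (timestamp_s, watts) samples over the
        # measurement window; assumes samples are sorted by timestamp.
        window = [(t, w) for t, w in samples if t_start <= t <= t_end]
        return sum((t1 - t0) * (w0 + w1) / 2.0
                   for (t0, w0), (t1, w1) in zip(window, window[1:]))

    def inferences_per_joule(num_inferences, samples, t_start, t_end):
        return num_inferences / energy_joules(samples, t_start, t_end)

    # Hypothetical run: 10,000 inferences over 60 s at a steady 30 W.
    samples = [(float(t), 30.0) for t in range(61)]
    print(inferences_per_joule(10_000, samples, 0.0, 60.0))  # ~5.56 inf/J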

Speaker

Arya Tschand — PhD Student, Harvard University (Advisor: Prof. Vijay Janapa Reddi); NSF GRFP fellow. Research: energy-efficient ML systems, hardware-aware autonomous performance engineering, and ML for systems.

[Talk 2] Machine Learning for Analytics Architecture: AI to Design AI

Lightweight AI is transforming applications from cloud to edge. This talk presents a vertical-integration design methodology in which software/hardware and algorithm/architecture co-design (AAC) maps lean algorithms onto embedded SoCs. Using graph-theoretic analytics to characterize intrinsic algorithmic complexity, including parallelism, storage, and data transfer, the talk traverses from functionality to synthesizable microarchitectures. A case study on mobile-edge skin cancer detection with a two-layer CNN (97% recognition accuracy using limited data) will be discussed.
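
As a loose illustration of graph-theoretic complexity analysis, the sketch below models an algorithm as a dataflow DAG and estimates average parallelism as total operations divided by critical-path length. The graph and metrics are hypothetical, not taken from the talk.

    from functools import lru_cache

    # Hypothetical dataflow DAG: node -> successor operations.
    edges = {
        "load_a": ["mul"], "load_b": ["mul"], "mul": ["add"],
        "load_c": ["add"], "add": ["store"], "store": [],
    }

    @lru_cache(maxsize=None)
    def critical_path(node):
        # Longest chain of dependent operations starting at `node`.
        return 1 + max((critical_path(s) for s in edges[node]), default=0)

    ops = len(edges)                              # total operations
    depth = max(critical_path(n) for n in edges)  # critical-path length
    print(f"ops={ops} depth={depth} avg parallelism={ops / depth:.2f}")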

Speaker

Chris Gwo Giun Lee — Professor, National Cheng Kung University; Founder, CogniNU Technologies, Taiwan. Research spans algorithm/architecture co-design, multimedia/bioinformatics systems, and digital health; 130+ publications and 50+ patents; former Philips Semiconductor system architect.

[Talk 3] ATiM: Autotuning Tensor Programs for Processing-in-DRAM

Processing-in-DRAM (DRAM-PIM) has emerged as a promising technology for accelerating memory-intensive operations in modern applications, such as Large Language Models (LLMs). Despite its potential, current software stacks for DRAM-PIM face significant challenges, including reliance on hand-tuned libraries that hinder programmability, limited support for high-level abstractions, and the lack of systematic optimization frameworks. To address these limitations, we present ATiM, a search-based optimizing tensor compiler for UPMEM. Key features of ATiM include: (1) automated searches of the joint search space for host and kernel tensor programs, (2) PIM-aware optimizations for efficiently handling boundary conditions, and (3) improved search algorithms for the expanded search space of UPMEM systems. Our experimental results on UPMEM hardware demonstrate performance gains of up to 6.18× for various UPMEM benchmark kernels and 8.21× for GPT-J layers. To the best of our knowledge, ATiM is the first tensor compiler to provide fully automated, autotuning-integrated code generation support for a DRAM-PIM system. By bridging the gap between high-level tensor computation abstractions and low-level hardware-specific requirements, ATiM establishes a foundation for advancing DRAM-PIM programmability and enabling streamlined optimization.

Link: https://doi.org/10.1145/3695053.3731096
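
To give a flavor of search-based autotuning, the generic sketch below (not ATiM's implementation) samples a joint space of host-side and kernel-side schedule parameters, measures a stand-in cost for each candidate, and keeps the best configuration. All parameter names and values are hypothetical.

    import itertools
    import random

    host_tiles = [64, 128, 256]      # hypothetical host-side tiling factors
    kernel_unrolls = [1, 2, 4, 8]    # hypothetical kernel-side unroll factors

    def measure(tile, unroll):
        # Stand-in for compiling and timing the candidate on real hardware.
        return abs(tile - 128) / 128 + abs(unroll - 4) / 4 + random.random() * 0.1

    space = list(itertools.product(host_tiles, kernel_unrolls))
    trials = random.sample(space, k=8)   # random search over the joint space
    best = min(trials, key=lambda cfg: measure(*cfg))
    print("best (tile, unroll):", best)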

Speaker

Yongwon Shin — PhD Student, Graduate School of Artificial Intelligence, POSTECH, South Korea. Research: compilers and runtimes for modern heterogeneous systems.

[Talk 4] APIM: An Antiferromagnetic MRAM-Based Processing-In-Memory System for Efficient Bit-Level Operations of Quantized Convolutional Neural Networks

Quantized CNNs reduce hardware overheads yet still demand notable resources. This talk proposes an antiferromagnetic MRAM (ARAM)-based PIM system that leverages bit-level sparsity through three optimizations: (1) an ARAM-based memory subsystem enabling dynamic, per-layer variable bit-widths, (2) a bit-level accelerator using a bit-fusion format tailored for ARAM, and (3) a customized RISC-V datapath coordinating ARAM and accelerator operations. Results show a 50%–83% reduction in data movement, average throughput and latency improvements of 5× and 10× over the state of the art, and speedups of 1.63×–2.96× on AlexNet, VGG16, and ResNet18.
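
The benefit of bit-level sparsity can be seen in a bit-serial multiply, where only the set bits of a quantized weight generate shift-and-add work. The sketch below is a generic illustration of that idea, not APIM's datapath.

    def bit_serial_mul(weight, activation):
        # Shift-and-add multiply that issues work only for set weight bits.
        acc, work, bit = 0, 0, 0
        while weight >> bit:
            if (weight >> bit) & 1:      # zero bits are skipped entirely
                acc += activation << bit
                work += 1
            bit += 1
        return acc, work

    product, ops = bit_serial_mul(0b0101, 7)  # weight 5: 2 of 4 bits set
    print(product, ops)                       # 35, done in 2 shift-add steps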

Speaker

Yongfu (Yueting) Li — Associate Professor, Shanghai Jiao Tong University. Research interests: analog/mixed-signal circuits, data converters, and power converters.

[Talk 5] IANUS: Integrated Accelerator based on NPU-PIM Unified Memory System

End-to-end LLM inference exhibits diverse compute characteristics, challenging accelerators that target only specific stages. IANUS combines an NPU with PIM in a unified main memory system that serves both PIM operations and NPU memory accesses, minimizing data movement. Because normal memory accesses and PIM computations cannot occur simultaneously in the shared memory, a novel PIM Access Scheduling technique maps and schedules workloads across the PIM and the NPU. Simulations show average speedups of 6.2× over an NVIDIA A100 GPU and 3.2× over a state-of-the-art accelerator on GPT-2. A prototype built from a commercial PIM, an NPU, and an FPGA-based PIM controller demonstrates feasibility.
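
The scheduling constraint at the heart of this design can be shown with a toy model (not IANUS's actual PIM Access Scheduling algorithm): the unified memory serves either PIM computation or normal NPU accesses at any instant, and each mode switch costs time, so batching independent same-mode requests shortens the schedule. The switch cost and request durations below are made up.

    SWITCH_COST = 2  # hypothetical cost of switching the memory's mode

    def makespan(requests):
        # requests: list of (mode, duration) with mode in {"npu", "pim"};
        # the shared memory serves one mode at a time.
        t, mode = 0, None
        for kind, dur in requests:
            if kind != mode and mode is not None:
                t += SWITCH_COST
            mode = kind
            t += dur
        return t

    mixed = [("npu", 3), ("pim", 5), ("npu", 2), ("pim", 4)]
    batched = sorted(mixed, key=lambda r: r[0])  # assumes requests are independent
    print(makespan(mixed), makespan(batched))    # 20 vs 16 time units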

Speaker

Minseok Seo — PhD Student, Seoul National University. Research: computer architecture, memory systems, and AI acceleration with a focus on LLM inference.