MICRO 2025 Workshop (October 19, 2025 | Lotte Hotel Seoul)
13:00 – 13:05 Welcome and Overview
13:05 – 13:35 Talk 1: MLPerf Power: Benchmarking the Energy Efficiency of Machine Learning Systems from μWatts to MWatts for Sustainable AI
Arya Tschand, PhD Student, Harvard University, USA
13:35 – 14:05 Talk 2: Machine Learning for Analytics Architecture: AI to Design AI
Chris Gwo Giun Lee, Professor, National Cheng Kung University; Founder, CogniNU Technologies, Taiwan
14:05 – 14:35 Talk 3: ATiM: Autotuning Tensor Programs for Processing-in-DRAM
Yongwon Shin, PhD Student, POSTECH, South Korea
14:35 – 15:05 Talk 4: APIM: An Antiferromagnetic MRAM-Based Processing-In-Memory System for Efficient Bit-Level Operations of Quantized Convolutional Neural Networks
Yueting (Yongfu) Li, Associate Professor, Shanghai Jiao Tong University, China
15:05 – 15:35 Talk 5: IANUS: Integrated Accelerator based on NPU-PIM Unified Memory System
Minseok Seo, PhD Student, Seoul National University, South Korea
15:35 – 16:00 Break
16:00 – 16:45 MOA Competition: AI Model Benchmarking
Jonghyun Shin, PhD Student, Seoul National University, South Korea
Introduction to AI-BMT and MICRO 2025 MOA (20 minutes)
Presentation by Winners (20 minutes)
1st Place – 10 minutes
2nd Place – 10 minutes
Awards Ceremony (5 minutes)
16:45 – 16:50 Closing Remarks
[Talk 2] Machine Learning for Analytics Architecture: AI to Design AI
Lightweight AI is transforming applications from cloud to edge. This talk presents a vertical-integration design methodology in which software/hardware and algorithm/architecture co-design (AAC) map lean algorithms onto embedded SoCs. Using graph-theoretic analytics to characterize intrinsic algorithmic complexity (including parallelism, storage, and data transfer), the talk traces the design flow from functional specification to synthesizable microarchitectures. A case study on mobile-edge skin cancer detection with a two-layer CNN, reaching 97% recognition accuracy with limited data, will be discussed.
Chris Gwo Giun Lee — Professor, National Cheng Kung University; Founder, CogniNU Technologies, Taiwan. Research spans algorithm/architecture co-design, multimedia/bioinformatics systems, and digital health; 130+ publications and 50+ patents; former Philips Semiconductor system architect.
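As a rough illustration of graph-theoretic parallelism analysis, the Python below levels a dataflow graph with ASAP (as-soon-as-possible) scheduling and reports its critical path and degree of parallelism. This is a generic sketch, not the speaker's AAC tooling. The function names and the two 4-element dot-product graphs are hypothetical examples, and the storage and data-transfer metrics mentioned in the abstract are not modeled.

```python
from collections import defaultdict, deque

def asap_levels(nodes, edges):
    """Assign each operation its earliest (ASAP) level in a dataflow DAG via Kahn's algorithm."""
    succs = defaultdict(list)
    indeg = {n: 0 for n in nodes}
    for u, v in edges:  # edge u -> v means v consumes u's result
        succs[u].append(v)
        indeg[v] += 1
    level = {n: 0 for n in nodes}
    ready = deque(n for n in nodes if indeg[n] == 0)
    visited = 0
    while ready:
        u = ready.popleft()
        visited += 1
        for v in succs[u]:
            level[v] = max(level[v], level[u] + 1)
            indeg[v] -= 1
            if indeg[v] == 0:  # all producers of v are leveled
                ready.append(v)
    if visited != len(nodes):
        raise ValueError("dataflow graph contains a cycle")
    return level

def parallelism_profile(nodes, edges):
    """Critical-path length and degree of parallelism of a dataflow graph."""
    level = asap_levels(nodes, edges)
    depth = max(level.values()) + 1
    width = [0] * depth
    for lv in level.values():
        width[lv] += 1  # operations that can run concurrently at this level
    return {
        "ops": len(nodes),
        "critical_path": depth,
        "max_parallelism": max(width),
        "avg_parallelism": len(nodes) / depth,
    }

# Hypothetical example: two ways to compute a 4-element dot product.
muls = ["m0", "m1", "m2", "m3"]
tree = parallelism_profile(
    muls + ["a0", "a1", "a2"],
    [("m0", "a0"), ("m1", "a0"), ("m2", "a1"), ("m3", "a1"), ("a0", "a2"), ("a1", "a2")],
)
chain = parallelism_profile(
    muls + ["s1", "s2", "s3"],
    [("m0", "s1"), ("m1", "s1"), ("s1", "s2"), ("m2", "s2"), ("s2", "s3"), ("m3", "s3")],
)
print(tree)
print(chain)
```

Both graphs perform the same seven operations, but the adder tree has a critical path of 3 levels versus 4 for the sequential accumulation chain. Exposing that kind of intrinsic difference before committing to an architecture is the point of this style of analysis.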
[Talk 3] ATiM: Autotuning Tensor Programs for Processing-in-DRAM
Processing-in-DRAM (DRAM-PIM) has emerged as a promising technology for accelerating memory-intensive operations in modern applications, such as Large Language Models (LLMs). Despite its potential, current software stacks for DRAM-PIM face significant challenges, including reliance on hand-tuned libraries that hinder programmability, limited support for high-level abstractions, and the lack of systematic optimization frameworks. To address these limitations, we present ATiM, a search-based optimizing tensor compiler for UPMEM. Key features of ATiM include: (1) automated searches of the joint search space for host and kernel tensor programs, (2) PIM-aware optimizations for efficiently handling boundary conditions, and (3) improved search algorithms for the expanded search space of UPMEM systems. Our experimental results on UPMEM hardware demonstrate performance gains of up to 6.18× for various UPMEM benchmark kernels and 8.21× for GPT-J layers. To the best of our knowledge, ATiM is the first tensor compiler to provide fully automated, autotuning-integrated code generation support for a DRAM-PIM system. By bridging the gap between high-level tensor computation abstractions and low-level hardware-specific requirements, ATiM establishes a foundation for advancing DRAM-PIM programmability and enabling streamlined optimization.
Link: https://doi.org/10.1145/3695053.3731096
Yongwon Shin — PhD Student, POSTECH, South Korea.
Yongwon Shin is a graduate student in the Graduate School of Artificial Intelligence at POSTECH. He is interested in developing compilers and runtimes for modern heterogeneous systems.
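To make "joint search space for host and kernel tensor programs" concrete, here is a toy sketch, not ATiM's implementation. It exhaustively scores combinations of a host-level choice (how many DPUs an elementwise kernel is split across) and kernel-level choices (tasklets per DPU and tile size) using a hypothetical analytical cost model. The constants, candidate values, and the names toy_cost and grid_search are illustrative assumptions. A practical autotuner would typically measure candidates on hardware and use a smarter search than brute force.

```python
import itertools
import math

# Hypothetical cost constants (arbitrary units), chosen only to create trade-offs.
LAUNCH_PER_DPU = 2.0  # host-side setup cost per DPU used
XFER_PER_ELEM = 0.01  # host <-> PIM transfer cost per element
DMA_PER_TILE = 20.0   # fixed cost to stage one tile into the DPU scratchpad
ELEM_COST = 1.0       # compute cost per element, padded elements included

# Joint search space: host level (DPU count) x kernel level (tasklets, tile size).
DPUS = (64, 128, 256, 512, 1024, 2048)
TASKLETS = (1, 2, 4, 8, 16)
TILES = (8, 16, 32, 64, 128, 256)

def toy_cost(n, n_dpus, n_tasklets, tile):
    """Toy latency estimate for an n-element elementwise kernel (not ATiM's or UPMEM's model)."""
    per_dpu = math.ceil(n / n_dpus)                # host split; the last chunk may be short
    per_tasklet = math.ceil(per_dpu / n_tasklets)  # kernel split across tasklets
    n_tiles = math.ceil(per_tasklet / tile)        # naive boundary handling: pad the last tile
    host = n_dpus * LAUNCH_PER_DPU + n * XFER_PER_ELEM
    kernel = n_tiles * (DMA_PER_TILE + tile * ELEM_COST)  # tasklets assumed fully parallel
    return host + kernel

def grid_search(n):
    """Score every joint configuration and return the cheapest (dpus, tasklets, tile)."""
    space = itertools.product(DPUS, TASKLETS, TILES)
    return min(space, key=lambda cfg: toy_cost(n, *cfg))

best = grid_search(1_000_000)
print("best (dpus, tasklets, tile):", best, "cost:", toy_cost(1_000_000, *best))
```

The ceil-divisions show why boundary conditions matter. When a tile size does not evenly divide a tasklet's share, the padded final tile still pays full cost in this model, so handling the remainder efficiently directly changes which configuration wins.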
[Talk 4] APIM: An Antiferromagnetic MRAM-Based Processing-In-Memory System for Efficient Bit-Level Operations of Quantized CNNs
Quantized CNNs reduce hardware overheads yet still demand notable resources. This talk proposes an antiferromagnetic MRAM (ARAM)-based PIM system that leverages bit-level sparsity through three optimizations: (1) an ARAM-based memory subsystem enabling dynamic variable bit-width per layer, (2) a bit-level accelerator using a bit-fusion format tailored for ARAM, and (3) a customized RISC-V datapath coordinating ARAM and accelerator operations. Results show a 50%–83% reduction in data movement and average improvements of 5× in throughput and 10× in latency over state-of-the-art designs, with speedups of 1.63×–2.96× on AlexNet, VGG16, and ResNet18.
Yongfu (Yueting) Li — Associate Professor, Shanghai Jiao Tong University. Research interests: analog/mixed-signal circuits, data converters, and power converters.
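The bit-level sparsity idea can be illustrated with generic bit-serial arithmetic. A dot product against b-bit quantized weights is evaluated one weight bit-plane at a time, and shift-and-add work is spent only on bits that are 1. This Python sketch assumes unsigned weights and a per-layer bit width passed as bits. It does not model APIM's bit-fusion format, the ARAM array, or the RISC-V datapath, and the name bit_serial_dot is hypothetical.

```python
def bit_serial_dot(acts, weights, bits):
    """Bit-serial dot product of integer activations with unsigned `bits`-bit weights.

    Returns (result, adds), where adds counts the accumulations actually performed;
    a dense bit-serial unit would always perform len(weights) * bits of them.
    """
    if len(acts) != len(weights):
        raise ValueError("acts and weights must have the same length")
    if any(not 0 <= w < (1 << bits) for w in weights):
        raise ValueError(f"weights must be unsigned {bits}-bit integers")
    result, adds = 0, 0
    for b in range(bits):              # one weight bit-plane per step
        plane = 0
        for a, w in zip(acts, weights):
            if (w >> b) & 1:           # zero bits are skipped: bit-level sparsity
                plane += a
                adds += 1
        result += plane << b           # bit b has significance 2**b
    return result, adds

# Hypothetical 3-bit layer: weight bits set = 2 + 0 + 1 + 3 = 6 of 12.
res, adds = bit_serial_dot([3, 1, 4, 1], [5, 0, 2, 7], bits=3)
print(res, adds)  # 30 6
```

Here half of the twelve possible bit-serial accumulations are skipped. Lowering bits for a layer, as a variable bit-width scheme allows, also shrinks the dense bound len(weights) * bits directly.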
[Talk 5] IANUS: Integrated Accelerator based on NPU-PIM Unified Memory System
End-to-end LLM inference exhibits diverse compute characteristics, challenging accelerators that target only specific stages. IANUS combines an NPU with PIM in a unified main memory system used for both PIM operations and NPU memory, minimizing data movement. Because normal memory accesses and PIM computations cannot occur simultaneously, a novel PIM Access Scheduling technique maps and schedules workloads across the PIM and the NPU. Simulations show average speedups of 6.2× over an NVIDIA A100 GPU and 3.2× over a state-of-the-art accelerator on GPT-2. A prototype built from a commercial PIM, an NPU, and an FPGA-based PIM controller demonstrates feasibility.
Minseok Seo — PhD Student, Seoul National University. Research: computer architecture, memory systems, and AI acceleration with a focus on LLM inference.
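The constraint behind IANUS's scheduling is that the unified memory cannot serve normal accesses and PIM computation at the same time. One way to capture this is to treat memory as a single exclusive resource. The toy list scheduler below is not IANUS's PIM Access Scheduling; operation names and durations are hypothetical. It starts each operation in program order as soon as its dependencies finish and its resources are free. It also assumes that NPU compute works from on-chip buffers and does not hold the memory.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Op:
    name: str
    resources: frozenset  # "mem": unified memory (NPU access or PIM compute); "npu": NPU cores
    duration: float
    deps: tuple = ()

def schedule(ops):
    """Greedy in-order scheduler; returns {name: (start, end)}."""
    free_at = {}  # resource -> time at which it becomes free
    times = {}
    for op in ops:
        ready = [times[d][1] for d in op.deps] + [free_at.get(r, 0.0) for r in op.resources]
        start = max(ready, default=0.0)
        end = start + op.duration
        for r in op.resources:
            free_at[r] = end  # resource is held exclusively until the op ends
        times[op.name] = (start, end)
    return times

def makespan(times):
    return max(end for _, end in times.values())

MEM, NPU = frozenset({"mem"}), frozenset({"npu"})
load_kv = Op("load_kv", MEM, 4.0)               # NPU reads the KV cache from unified memory
attn = Op("attn", NPU, 6.0, deps=("load_kv",))  # attention computed on the NPU
fc_pim = Op("fc_pim", MEM, 5.0)                 # FC layer as a PIM GEMV in the same memory

good = schedule([load_kv, attn, fc_pim])
bad = schedule([fc_pim, load_kv, attn])
print(makespan(good), makespan(bad))  # 10.0 15.0
```

In the first order, the PIM GEMV overlaps with NPU attention compute. Issuing it first instead blocks the KV-cache load and stretches the makespan from 10 to 15 time units. A PIM-aware scheduler has to make exactly this kind of ordering decision.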