BeaconGNN: Large-Scale GNN Acceleration with Out-of-Order Streaming In-Storage Computing

Published in IEEE International Symposium on High-Performance Computer Architecture (HPCA), 2024

Prior in-storage computing (ISC) solutions show fundamental drawbacks when applied to GNN acceleration. First, they enforce a strict ordering on GNN neighbor sampling, and this serialization fails to exploit the internal parallelism of flash. Second, the I/O sizes generated by GNN workloads are much smaller than the minimum flash access granularity, so the limited channel bandwidth is wasted when serving these requests. Third, they rely on firmware-based request processing, which leaves backend I/O throughput constrained by the processing power of the embedded cores. To address these challenges, we propose BeaconGNN, an ISC design for GNNs that supports both large-scale graph structures and feature tables. First, it uses a novel graph format to enable out-of-order GNN neighbor sampling, improving flash resource utilization. Second, it deploys near-data processing engines across multiple levels of the flash hierarchy (i.e., controller, channel, and die). Specifically, flash-die-level samplers perform neighbor sampling while simultaneously reducing channel traffic, and flash-channel-level command routers communicate with backend dies without involving the flash firmware. Finally, a spatial accelerator is attached to the device bus to accelerate GNN computation. With our software and hardware co-design, BeaconGNN achieves up to 11.6× higher throughput and 4× better energy efficiency than the state-of-the-art ISC design.
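
To make the second drawback concrete, the short sketch below estimates how much of a flash page transfer is actually useful when a request is far smaller than the page. It is only an illustration: the function name `useful_fraction`, the 16 KiB page size, and the 8-byte and 512-byte request sizes are assumptions chosen for the example and are not figures taken from the paper.

```python
def useful_fraction(request_bytes: int, page_bytes: int) -> float:
    """Fraction of transferred bytes that the requester actually needs,
    assuming every request moves whole flash pages over the channel."""
    if request_bytes <= 0 or page_bytes <= 0:
        raise ValueError("sizes must be positive")
    pages = -(-request_bytes // page_bytes)  # ceiling division
    return request_bytes / (pages * page_bytes)


# Hypothetical sizes, for illustration only (not from the paper).
PAGE_BYTES = 16 * 1024      # assumed minimum flash access granularity
NEIGHBOR_ID_BYTES = 8       # assumed size of one sampled neighbor ID
FEATURE_BYTES = 512         # assumed size of one node feature vector

if __name__ == "__main__":
    for label, size in [("neighbor ID", NEIGHBOR_ID_BYTES),
                        ("feature vector", FEATURE_BYTES)]:
        share = useful_fraction(size, PAGE_BYTES)
        print(f"{label}: {share:.4%} of the page transfer is useful")
```

Under these assumed sizes, nearly all of each page transfer is overhead, which is the kind of waste that sampling at the flash die, before data crosses the channel, is meant to reduce.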

Paper download
