The rapid influx of biosequence data, coupled with the stagnation of the processing power of conventional computing systems, highlights the critical need to explore high-performance accelerator designs that can meet the ever-increasing throughput demands of modern bioinformatics pipelines. This work argues that processing-in-memory (PIM) is a viable and effective solution to alleviate the bottleneck of k-mer matching, a widely used genome sequence comparison and classification algorithm characterized by highly random access patterns and low computational intensity.
This work proposes and evaluates Sieve, a set of three DRAM-based in-situ k-mer matching accelerator designs: one optimized for area, one optimized for throughput, and one that strikes a balance between hardware cost and performance. The designs leverage a novel data mapping scheme that allows simultaneous comparison of millions of DNA base pairs, lightweight matching circuitry for fast pattern matching, and an early termination mechanism that prunes unnecessary DRAM row activations to reduce latency and save energy. Evaluation of Sieve using state-of-the-art workloads with real-world datasets shows that the most aggressive design provides an average of 408x/41x speedup and 93x/61x energy savings over multi-core-CPU/GPU baselines for k-mer matching. Sieve's performance scales linearly with the size of the reference sequence data, substantially boosting the efficiency of modern genome sequence pipelines.
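The idea behind early termination can be illustrated with a minimal software sketch. It is not the Sieve hardware or its actual data mapping: the 2-bit base encoding, the bit-transposed layout (one "row" per bit position across all reference k-mers), and the names `encode`, `transpose`, and `match` are all assumptions made for illustration. The sketch compares a query against every reference one bit row at a time and stops as soon as no reference can still match, counting how many rows it had to touch.

```python
# Illustrative sketch only; encoding, layout, and function names are assumptions.
ENCODING = {"A": 0b00, "C": 0b01, "G": 0b10, "T": 0b11}


def encode(kmer):
    """Return the k-mer as a list of bits, two bits per base, high bit first."""
    bits = []
    for base in kmer:
        code = ENCODING[base]
        bits.extend([(code >> 1) & 1, code & 1])
    return bits


def transpose(reference_kmers):
    """Build bit rows: bit j of rows[i] is bit i of reference k-mer j."""
    encoded = [encode(k) for k in reference_kmers]
    rows = []
    for i in range(len(encoded[0])):
        row = 0
        for j, bits in enumerate(encoded):
            row |= bits[i] << j
        rows.append(row)
    return rows


def match(rows, n_refs, query):
    """Return (indices of matching references, number of rows examined)."""
    all_ones = (1 << n_refs) - 1
    alive = all_ones  # bit j set means reference j still matches so far
    activated = 0
    for row, qbit in zip(rows, encode(query)):
        activated += 1
        # Keep only references whose bit in this row equals the query bit.
        agree = row if qbit else (~row & all_ones)
        alive &= agree
        if alive == 0:
            break  # early termination: remaining rows cannot produce a match
    hits = [j for j in range(n_refs) if (alive >> j) & 1]
    return hits, activated


refs = ["ACGT", "ACGA", "TTTT"]
rows = transpose(refs)
hit_result = match(rows, len(refs), "ACGA")
miss_result = match(rows, len(refs), "GGGG")
```

In this toy example the matching query examines all eight bit rows, while the non-matching query is rejected after two, which is the kind of saving in row accesses that an early termination mechanism targets.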