
Add LLM benchmarking framework to staging #2405

Open

kubraaksux wants to merge 1 commit into apache:main from kubraaksux:llm-benchmark

Conversation

@kubraaksux

Generic LLM benchmark suite for evaluating inference performance across different backends (vLLM, Ollama, OpenAI, MLX).

Features:

  • Multiple workload categories: math (GSM8K), reasoning (BoolQ, LogiQA), summarization (XSum, CNN/DM), JSON extraction
  • Pluggable backend architecture for different inference engines (see the interface sketch after this list)
  • Performance metrics: latency, throughput, memory usage
  • Accuracy evaluation per workload type
  • HTML report generation
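
The pluggable backend idea could look roughly like the following. This is a minimal sketch under assumed names (`InferenceBackend`, `GenerationResult`, and `OllamaBackend` are illustrative, not the classes actually in this PR); the Ollama adapter targets Ollama's standard `/api/generate` HTTP endpoint.

```python
# Minimal sketch of a pluggable backend interface. All class and method
# names here are hypothetical illustrations, not the PR's actual code.
import abc
import time
from dataclasses import dataclass


@dataclass
class GenerationResult:
    text: str          # model output
    latency_s: float   # wall-clock time for the request
    tokens: int        # completion tokens, used for throughput


class InferenceBackend(abc.ABC):
    """Common interface each engine adapter (vLLM, Ollama, OpenAI, MLX) implements."""

    @abc.abstractmethod
    def generate(self, prompt: str, max_tokens: int = 256) -> GenerationResult:
        ...


class OllamaBackend(InferenceBackend):
    """Example adapter: calls a local Ollama server over its HTTP API."""

    def __init__(self, model: str = "llama3", host: str = "http://localhost:11434"):
        self.model = model
        self.host = host

    def generate(self, prompt: str, max_tokens: int = 256) -> GenerationResult:
        import requests  # assumed dependency of the benchmark suite
        start = time.perf_counter()
        resp = requests.post(
            f"{self.host}/api/generate",
            json={
                "model": self.model,
                "prompt": prompt,
                "options": {"num_predict": max_tokens},
                "stream": False,
            },
            timeout=120,
        )
        resp.raise_for_status()
        body = resp.json()
        return GenerationResult(
            text=body.get("response", ""),
            latency_s=time.perf_counter() - start,
            tokens=body.get("eval_count", 0),
        )
```

Adding a new engine then only means writing another `InferenceBackend` subclass, which is what makes the architecture pluggable.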

This framework can be used to evaluate SystemDS LLM inference components once they are developed.
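
To illustrate how such a harness might drive one backend over a workload and aggregate the latency, throughput, and accuracy metrics listed above, here is a rough sketch building on the `InferenceBackend` interface from the previous block (`run_workload` and the substring accuracy check are hypothetical):

```python
# Hypothetical driver loop; names and the simple substring accuracy
# check are illustrative, not the PR's actual evaluation logic.
def run_workload(backend: InferenceBackend, samples: list[dict]) -> dict:
    """Run (prompt, answer) pairs and aggregate latency, throughput, accuracy."""
    results, correct = [], 0
    for sample in samples:
        out = backend.generate(sample["prompt"])
        results.append(out)
        # Crude exact-match check, e.g. for GSM8K-style numeric answers.
        if sample["answer"] in out.text:
            correct += 1
    total_time = sum(r.latency_s for r in results)
    total_tokens = sum(r.tokens for r in results)
    return {
        "mean_latency_s": total_time / len(results),
        "throughput_tok_per_s": total_tokens / total_time if total_time else 0.0,
        "accuracy": correct / len(samples),
    }
```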

