RagView Milestone Plan
Key Features
- Test Set Auto-Generation: Split each document in the document set with a naive chunking method and use an LLM to generate Q&A pairs from every chunk, producing the test-set data (a minimal sketch follows this list).
- Custom RAG Integration: Provide an SDK/API for developers to integrate their own RAG solutions into RagView, enabling comparison between their solutions and the open-source solutions listed below (an illustrative adapter sketch follows this list).
- Evaluation Task Optimization: Support setting up and comparing multiple configurations (different hyperparameters) of the same RAG solution.
- Evaluation Report Generation: Support automatic generation of PDF reports from evaluation results.
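A minimal sketch of how the test-set auto-generation step could look, assuming a fixed-size character chunker and a hypothetical `call_llm(prompt) -> str` helper that wraps whichever LLM backend is configured; the actual RagView pipeline may differ:

```python
import json
from typing import Callable

def naive_chunk(text: str, chunk_size: int = 800, overlap: int = 100) -> list[str]:
    """Split a document into fixed-size character chunks with a small overlap."""
    chunks, start = [], 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        start += chunk_size - overlap
    return chunks

QA_PROMPT = (
    "You are building an evaluation set for a RAG system.\n"
    "Given the passage below, write one question that can be answered "
    "from the passage alone, and its answer.\n"
    'Reply as JSON: {{"question": "...", "answer": "..."}}\n\n'
    "Passage:\n{chunk}"
)

def generate_test_set(documents: list[str], call_llm: Callable[[str], str]) -> list[dict]:
    """Produce (question, answer, source_chunk) triples for every chunk.

    call_llm is a hypothetical wrapper around the configured LLM backend.
    """
    test_set = []
    for doc in documents:
        for chunk in naive_chunk(doc):
            raw = call_llm(QA_PROMPT.format(chunk=chunk))
            try:
                qa = json.loads(raw)
            except json.JSONDecodeError:
                continue  # skip chunks where the model did not return valid JSON
            test_set.append({"question": qa["question"],
                             "answer": qa["answer"],
                             "source_chunk": chunk})
    return test_set
```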
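The integration SDK is still being designed; one plausible shape is an adapter interface that a custom RAG solution implements so RagView can drive ingestion and querying uniformly. All names below (`RagAdapter`, `RagAnswer`) are illustrative, not the final API:

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass, field

@dataclass
class RagAnswer:
    """What RagView needs back from a pipeline to score a single query."""
    answer: str                                          # generated answer text
    contexts: list[str] = field(default_factory=list)    # retrieved passages used
    latency_ms: float = 0.0                              # wall-clock time for the query
    tokens_used: int = 0                                 # token consumption, if available

class RagAdapter(ABC):
    """Illustrative interface a custom RAG solution would implement for RagView."""

    @abstractmethod
    def ingest(self, documents: list[str]) -> None:
        """Index the evaluation document set into the custom pipeline."""

    @abstractmethod
    def query(self, question: str) -> RagAnswer:
        """Answer one test-set question and report the retrieved contexts."""
```

An evaluation run would then iterate the generated test set, call `query()` for each question, and hand the returned answers and contexts to the metric layer.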
Usability Enhancements
- Email Notifications: Since evaluations are asynchronous and may take minutes to tens of minutes, add email notifications to inform users when evaluation results are ready.
- Result Charting: Generate bar charts, pie charts, radar charts, etc., from metric scores to facilitate visual comparison (a radar-chart sketch follows this list).
- Hardware Resource Profiling: Collect statistics on hardware resource usage for each evaluation pipeline, helping developers assess production feasibility (a profiling sketch follows this list).
- Optional Metrics: Make evaluation metrics optional (no longer mandatory), allowing users to select only the metrics they are interested in.
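For the result-charting item, a minimal matplotlib sketch of a radar chart comparing pipelines over a shared set of metrics; it assumes scores are already normalized to [0, 1], and the example numbers are dummy values, not real evaluation results:

```python
import numpy as np
import matplotlib.pyplot as plt

def radar_chart(scores: dict[str, dict[str, float]], out_path: str = "radar.png") -> None:
    """Plot one polygon per RAG solution over a shared set of metric axes."""
    metrics = list(next(iter(scores.values())).keys())
    angles = np.linspace(0, 2 * np.pi, len(metrics), endpoint=False).tolist()
    angles += angles[:1]  # repeat the first angle to close the polygon

    fig, ax = plt.subplots(subplot_kw={"polar": True})
    for name, metric_scores in scores.items():
        values = [metric_scores[m] for m in metrics]
        values += values[:1]
        ax.plot(angles, values, label=name)
        ax.fill(angles, values, alpha=0.15)
    ax.set_xticks(angles[:-1])
    ax.set_xticklabels(metrics)
    ax.set_ylim(0, 1)
    ax.legend(loc="lower right")
    fig.savefig(out_path, bbox_inches="tight")

if __name__ == "__main__":
    # Dummy illustrative numbers, not real evaluation results.
    radar_chart({
        "Pipeline A": {"Recall@5": 0.82, "Faithfulness": 0.77, "Latency (norm.)": 0.60},
        "Pipeline B": {"Recall@5": 0.74, "Faithfulness": 0.81, "Latency (norm.)": 0.85},
    })
```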
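For the hardware resource profiling item, a minimal sketch that samples CPU and resident memory of the current process with `psutil` while an evaluation pipeline runs; the sampling strategy and reported fields are assumptions, not the final design:

```python
import threading
import time

import psutil

def profile_resources(run_pipeline, interval: float = 1.0) -> dict:
    """Run an evaluation pipeline while sampling CPU and RSS memory of this process."""
    proc = psutil.Process()
    samples: list[tuple[float, int]] = []   # (cpu_percent, rss_bytes)
    stop = threading.Event()

    def sampler() -> None:
        proc.cpu_percent(None)              # prime the CPU counter
        while not stop.is_set():
            samples.append((proc.cpu_percent(None), proc.memory_info().rss))
            time.sleep(interval)

    thread = threading.Thread(target=sampler, daemon=True)
    start = time.time()
    thread.start()
    try:
        run_pipeline()                      # the evaluation task being profiled
    finally:
        stop.set()
        thread.join()

    cpu = [c for c, _ in samples] or [0.0]
    rss = [m for _, m in samples] or [0]
    return {
        "wall_time_s": time.time() - start,
        "avg_cpu_percent": sum(cpu) / len(cpu),
        "peak_rss_mb": max(rss) / 1e6,
    }
```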
More RAG Solutions
Legend:
✅ = Integrated | 🚧 = In Progress | ⏳ = Pending Integration
| No. | Name | GitHub Link | Features | Status |
|---|---|---|---|---|
| 0 | Langflow | langflow-ai/langflow | Build, scale, and deploy RAG and multi-agent AI apps; in RagView it is used to build the naive RAG baseline. | ✅ |
| 1 | R2R | SciPhi-AI/R2R | SoTA production-grade RAG system with Agentic RAG architecture and RESTful API support. | ✅ |
| 2 | KAG | OpenSPG/KAG | Retrieval framework combining OpenSPG engine and LLM, using logical forms for guided reasoning; overcomes traditional vector similarity limitations; supports domain-specific QA. | ⏳ |
| 3 | GraphRAG | microsoft/graphrag | Modular graph-based retrieval RAG system from Microsoft. | 🚧 |
| 4 | LightRAG | HKUDS/LightRAG | "Simple and Fast Retrieval-Augmented Generation," designed for simplicity and speed. | 🚧 |
| 5 | dsRAG | D-Star-AI/dsRAG | High-performance retrieval engine for unstructured data, suitable for complex queries and dense text. | 🚧 |
| 6 | paper-qa | Future-House/paper-qa | Scientific literature QA system with citation support and high accuracy. | ⏳ |
| 7 | cognee | topoteretes/cognee | Lightweight memory management for AI agents ("Memory for AI Agents in 5 lines of code"). | ⏳ |
| 8 | trustgraph | trustgraph-ai/trustgraph | Next-generation AI product creation platform with context engineering and LLM orchestration; supports API and private deployment. | ⏳ |
| 9 | graphiti | getzep/graphiti | Real-time knowledge graph builder for AI agents, supporting enterprise-grade applications. | ⏳ |
| 10 | DocsGPT | arc53/DocsGPT | Private AI platform supporting Agent building, deep research, document analysis, multi-model support, and API integration. | ✅ |
| 11 | youtu-graphrag | youtugraph/youtu-graphrag | Graph-based RAG framework from Tencent Youtu Lab, focusing on knowledge graph construction and reasoning for domain-specific applications. | ⏳ |
| 12 | Kiln | Kiln-AI/Kiln | Desktop app for zero-code fine-tuning, evals, synthetic data, and built-in RAG tools. | ⏳ |
| 13 | Quivr | QuivrHQ/quivr | Opinionated, fast, and efficient RAG framework that lets you focus on your product. | ⏳ |
More RAG Evaluation Metrics
We will gradually add effectiveness and efficiency metrics for RAG evaluation, including the following (reference implementations of the top-k retrieval metrics are sketched after the table):
| Metric Type | Metric Name | Description |
|---|---|---|
| Effectiveness / Quality Metrics | Recall@k | Proportion of queries where the correct answer appears in the top k retrieved documents |
| | Precision@k | Proportion of relevant documents among the top k retrieved documents |
| | MRR (Mean Reciprocal Rank) | Average reciprocal rank of the first relevant document |
| | nDCG (Normalized Discounted Cumulative Gain) | Ranking relevance metric that considers the importance of document order |
| | Answer Accuracy / F1 | Match between generated answers and reference answers (Exact Match or F1) |
| | ROUGE / BLEU / METEOR | Text overlap / language quality metrics |
| | BERTScore / MoverScore | Semantic-based answer matching metrics |
| | Context Precision | Proportion of retrieved documents that actually contribute to the answer |
| | Context Recall | Proportion of reference answer information covered by retrieved documents |
| | Context F1 | Combined score of Context Precision and Context Recall |
| | Answer-Context Alignment | Whether the answer strictly derives from the retrieved context |
| | Overall Score | Composite metric, usually a weighted combination of answer quality and context utilization |
| Efficiency / Cost Metrics | Latency | Time required from input to answer generation |
| | Token Consumption | Number of tokens consumed during answer generation |
| | Memory Usage | Memory or GPU usage during model execution |
| | API Cost / Compute Cost | Estimated cost of calling the model or retrieval API |
| | Throughput | Number of requests the system can handle per unit time |
| | Scalability | System performance change when data volume or user requests increase |
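As a concrete reference for the top-k retrieval metrics above, a minimal sketch using standard binary-relevance definitions (with a single gold chunk per query, the mean per-query recall below equals the hit rate described in the table); RagView's actual implementations may differ, e.g. in tie handling or graded relevance for nDCG:

```python
import math

def recall_at_k(retrieved: list[str], relevant: set[str], k: int) -> float:
    """Fraction of relevant documents that appear in the top-k results."""
    if not relevant:
        return 0.0
    return len(set(retrieved[:k]) & relevant) / len(relevant)

def precision_at_k(retrieved: list[str], relevant: set[str], k: int) -> float:
    """Fraction of the top-k results that are relevant."""
    if k == 0:
        return 0.0
    return len(set(retrieved[:k]) & relevant) / k

def mrr(all_retrieved: list[list[str]], all_relevant: list[set[str]]) -> float:
    """Mean reciprocal rank of the first relevant document over all queries."""
    if not all_retrieved:
        return 0.0
    total = 0.0
    for retrieved, relevant in zip(all_retrieved, all_relevant):
        for rank, doc in enumerate(retrieved, start=1):
            if doc in relevant:
                total += 1.0 / rank
                break
    return total / len(all_retrieved)

def ndcg_at_k(retrieved: list[str], relevant: set[str], k: int) -> float:
    """Binary-relevance nDCG@k: DCG of the ranking divided by the ideal DCG."""
    dcg = sum(1.0 / math.log2(i + 2)               # i is 0-based, so rank = i + 1
              for i, doc in enumerate(retrieved[:k]) if doc in relevant)
    ideal = sum(1.0 / math.log2(i + 2) for i in range(min(k, len(relevant))))
    return dcg / ideal if ideal > 0 else 0.0
```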