RagView Milestone Plan

Key Features

  • Test Set Auto-Generation: Split the documents in the document set into chunks with a naive chunking method, then use an LLM to automatically generate Q&A pairs from each chunk, producing the test-set data (see the sketch after this list).
  • Custom RAG Integration: Provide an SDK/API for developers to integrate their own RAG solutions into RagView, enabling comparison between their solutions and open-source solutions (an illustrative interface sketch follows this list).
  • Evaluation Task Optimization: Support setting up and comparing multiple configurations (different hyperparameters) of the same RAG solution.
  • Evaluation Report Generation: Support automatic generation of PDF reports from evaluation results.
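
A minimal sketch of the test-set auto-generation flow described above, assuming a hypothetical `llm_generate_qa` callable that wraps whatever LLM client is configured and returns question/answer pairs grounded in a chunk; the fixed-size split below stands in for the naive chunking method.

```python
from typing import Iterable

def naive_chunk(text: str, chunk_size: int = 1000, overlap: int = 100) -> Iterable[str]:
    """Naively split a document into fixed-size, slightly overlapping chunks."""
    step = chunk_size - overlap
    for start in range(0, len(text), step):
        yield text[start:start + chunk_size]

def build_test_set(documents: list[str], llm_generate_qa) -> list[dict]:
    """Generate Q&A pairs per chunk with an LLM to form the evaluation test set.

    `llm_generate_qa(chunk)` is a placeholder for an LLM call that returns a
    list of {"question": ..., "answer": ...} dicts grounded in the chunk.
    """
    test_set = []
    for doc_id, doc in enumerate(documents):
        for chunk in naive_chunk(doc):
            for qa in llm_generate_qa(chunk):
                test_set.append({
                    "doc_id": doc_id,
                    "context": chunk,
                    "question": qa["question"],
                    "reference_answer": qa["answer"],
                })
    return test_set
```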
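
The custom-integration SDK/API is still to be designed; the adapter below is only an illustrative assumption of what a pluggable interface could look like, with hypothetical method names (`ingest`, `query`).

```python
from abc import ABC, abstractmethod

class CustomRAGAdapter(ABC):
    """Hypothetical adapter a developer would implement to plug a custom
    RAG pipeline into RagView's evaluation runner (illustrative only)."""

    @abstractmethod
    def ingest(self, documents: list[str]) -> None:
        """Index the document set with the custom pipeline."""

    @abstractmethod
    def query(self, question: str) -> dict:
        """Answer a question and return the retrieved contexts,
        e.g. {"answer": str, "contexts": list[str]}."""
```

Because an adapter instance can be constructed with different hyperparameters (chunk size, top-k, reranking on or off), registering several instances of the same class is one way the multi-configuration comparison described under Evaluation Task Optimization could be realized.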

Usability Enhancements

  • Email Notifications: Since evaluations are asynchronous and may take minutes to tens of minutes, add email notifications to inform users when evaluation results are ready.
  • Result Charting: Generate bar charts, pie charts, radar charts, etc., based on metric scores to facilitate visual comparison (a charting sketch follows this list).
  • Hardware Resource Profiling: Collect statistics on hardware resource usage for different evaluation pipelines, aiding developers in assessing production feasibility (see the profiling sketch after this list).
  • Optional Metrics: Make evaluation metrics optional (no longer mandatory), allowing users to select only the metrics they are interested in.
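
A minimal charting sketch, assuming metric scores are already collected as a pipeline-to-scores mapping and that matplotlib/numpy are available (an assumption, not a committed dependency):

```python
import matplotlib.pyplot as plt
import numpy as np

def plot_metric_bars(scores: dict[str, dict[str, float]], out_path: str = "comparison.png") -> None:
    """Draw a grouped bar chart: one group per metric, one bar per pipeline."""
    pipelines = list(scores)                     # e.g. ["R2R", "LightRAG"]
    metrics = list(next(iter(scores.values())))  # e.g. ["Recall@5", "MRR", ...]
    x = np.arange(len(metrics))
    width = 0.8 / len(pipelines)

    for i, name in enumerate(pipelines):
        values = [scores[name][m] for m in metrics]
        plt.bar(x + i * width, values, width, label=name)

    plt.xticks(x + width * (len(pipelines) - 1) / 2, metrics)
    plt.ylabel("Score")
    plt.legend()
    plt.tight_layout()
    plt.savefig(out_path)
```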
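
A rough sketch of per-pipeline resource profiling using psutil (an assumed dependency); GPU statistics would need an additional library such as pynvml and are omitted here.

```python
import threading
import time
import psutil

def profile_run(run_evaluation, sample_interval: float = 0.5) -> dict:
    """Run one evaluation pipeline while sampling process memory in the background.

    `run_evaluation` is any zero-argument callable that executes the pipeline.
    """
    proc = psutil.Process()
    samples = []
    stop = threading.Event()

    def sampler():
        while not stop.is_set():
            samples.append(proc.memory_info().rss)
            time.sleep(sample_interval)

    thread = threading.Thread(target=sampler, daemon=True)
    start = time.perf_counter()
    thread.start()
    result = run_evaluation()
    stop.set()
    thread.join()
    elapsed = time.perf_counter() - start

    return {
        "result": result,
        "wall_time_s": elapsed,
        "peak_rss_mb": max(samples, default=proc.memory_info().rss) / 1024 / 1024,
    }
```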

More RAG Solutions

Legend:
✅ = Integrated | 🚧 = In Progress | ⏳ = Pending Integration

| No. | Name | GitHub Link | Features | Status |
|-----|------|-------------|----------|--------|
| 0 | Langflow | langflow-ai/langflow | Build, scale, and deploy RAG and multi-agent AI apps; in RagView it is used to build a naive RAG. | |
| 1 | R2R | SciPhi-AI/R2R | SoTA production-grade RAG system with an Agentic RAG architecture and RESTful API support. | |
| 2 | KAG | OpenSPG/KAG | Retrieval framework combining the OpenSPG engine and LLMs, using logical forms for guided reasoning; overcomes traditional vector-similarity limitations; supports domain-specific QA. | |
| 3 | GraphRAG | microsoft/graphrag | Modular graph-based retrieval RAG system from Microsoft. | 🚧 |
| 4 | LightRAG | HKUDS/LightRAG | "Simple and Fast Retrieval-Augmented Generation," designed for simplicity and speed. | 🚧 |
| 5 | dsRAG | D-Star-AI/dsRAG | High-performance retrieval engine for unstructured data, suitable for complex queries and dense text. | 🚧 |
| 6 | paper-qa | Future-House/paper-qa | Scientific literature QA system with citation support and high accuracy. | |
| 7 | cognee | topoteretes/cognee | Lightweight memory management for AI agents ("Memory for AI Agents in 5 lines of code"). | |
| 8 | trustgraph | trustgraph-ai/trustgraph | Next-generation AI product creation platform with context engineering and LLM orchestration; supports API and private deployment. | |
| 9 | graphiti | getzep/graphiti | Real-time knowledge graph builder for AI agents, supporting enterprise-grade applications. | |
| 10 | DocsGPT | arc53/DocsGPT | Private AI platform supporting Agent building, deep research, document analysis, multi-model support, and API integration. | |
| 11 | youtu-graphrag | youtugraph/youtu-graphrag | Graph-based RAG framework from Tencent Youtu Lab, focusing on knowledge graph construction and reasoning for domain-specific applications. | |
| 12 | Kiln | Kiln-AI/Kiln | Desktop app for zero-code fine-tuning, evals, synthetic data, and built-in RAG tools. | |
| 13 | Quivr | QuivrHQ/quivr | Opinionated, fast, and efficient RAG so you can focus on your product. | |

More RAG Evaluation Metrics

We will gradually add effectiveness/quality and efficiency/cost metrics for RAG evaluation, including:

| Metric Type | Metric Name | Description |
|-------------|-------------|-------------|
| Effectiveness / Quality Metrics | Recall@k | Proportion of queries where the correct answer appears in the top k retrieved documents |
| | Precision@k | Proportion of relevant documents among the top k retrieved documents |
| | MRR (Mean Reciprocal Rank) | Average reciprocal rank of the first relevant document |
| | nDCG (Normalized Discounted Cumulative Gain) | Ranking relevance metric that considers the importance of document order |
| | Answer Accuracy / F1 | Match between generated answers and reference answers (Exact Match or F1) |
| | ROUGE / BLEU / METEOR | Text overlap / language quality metrics |
| | BERTScore / MoverScore | Semantic-based answer matching metrics |
| | Context Precision | Proportion of retrieved documents that actually contribute to the answer |
| | Context Recall | Proportion of reference-answer information covered by retrieved documents |
| | Context F1 | Combined score of context precision and recall |
| | Answer-Context Alignment | Whether the answer strictly derives from the retrieved context |
| | Overall Score | Composite metric, usually a weighted combination of answer quality and context utilization |
| Efficiency / Cost Metrics | Latency | Time required from input to answer generation |
| | Token Consumption | Number of tokens consumed during answer generation |
| | Memory Usage | Memory or GPU usage during model execution |
| | API Cost / Compute Cost | Estimated cost of calling the model or retrieval API |
| | Throughput | Number of requests the system can handle per unit time |
| | Scalability | Change in system performance as data volume or user requests increase |
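
For the retrieval metrics above, the small self-contained sketch below shows one common way to compute Recall@k, Precision@k, MRR, and nDCG from ranked results; it is illustrative and not tied to RagView's internal scorer. Note that Recall@k here uses the per-query definition (share of relevant documents found in the top k), which reduces to the hit rate given in the table when each query has a single reference document.

```python
import math

def recall_at_k(retrieved: list[str], relevant: set[str], k: int) -> float:
    """Fraction of relevant documents that appear in the top-k results."""
    hits = sum(1 for doc in retrieved[:k] if doc in relevant)
    return hits / len(relevant) if relevant else 0.0

def precision_at_k(retrieved: list[str], relevant: set[str], k: int) -> float:
    """Fraction of the top-k results that are relevant."""
    hits = sum(1 for doc in retrieved[:k] if doc in relevant)
    return hits / k if k else 0.0

def mrr(all_retrieved: list[list[str]], all_relevant: list[set[str]]) -> float:
    """Mean reciprocal rank of the first relevant document over all queries."""
    total = 0.0
    for retrieved, relevant in zip(all_retrieved, all_relevant):
        for rank, doc in enumerate(retrieved, start=1):
            if doc in relevant:
                total += 1.0 / rank
                break
    return total / len(all_retrieved) if all_retrieved else 0.0

def ndcg_at_k(retrieved: list[str], relevant: set[str], k: int) -> float:
    """Binary-relevance nDCG@k: DCG of the ranking divided by the ideal DCG."""
    dcg = sum(1.0 / math.log2(rank + 1)
              for rank, doc in enumerate(retrieved[:k], start=1) if doc in relevant)
    ideal_hits = min(len(relevant), k)
    idcg = sum(1.0 / math.log2(rank + 1) for rank in range(1, ideal_hits + 1))
    return dcg / idcg if idcg else 0.0
```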