RAG-QA-ARENA
Request for data
You can obtain the dataset by emailing any of the authors.
Process
You will receive access to a RAG Arena folder on Google Drive. Place the data files from that folder into the rag-qa-arena folder of your local clone of the repository.
Index and Answer
Add the -a flag to the original command to skip evaluation and obtain the raw parquet file. For example:
python -m eval.eval_node -f path/to/main_folder -q path/to/question_parquet -a
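To verify the run produced usable output before converting it, you can peek at the parquet with pandas. This is a minimal sketch; the file name below is a placeholder for whatever the command writes into your main folder.

import pandas as pd

# Placeholder path: point this at the parquet produced by the -a run.
df = pd.read_parquet("path/to/main_folder/results.parquet")

# Check the columns and a few rows before the JSON conversion step.
print(df.columns.tolist())
print(df.head())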
Use change.ipynb in the rag-qa-arena folder to convert the parquet to the evaluation JSON format, then place the processed JSON files in the data/pairwise_eval folder, following this structure (a minimal conversion sketch follows the folder layout below):
📁 rag-qa-arena
└── 📁 data
    └── 📁 pairwise_eval
        ├── 📁 GraphRAG
        │   ├── 📄 fiqa.json
        │   ├── 📄 lifestyle.json
        │   ├── 📄 recreation.json
        │   ├── 📄 science.json
        │   ├── 📄 technology.json
        │   └── 📄 writing.json
        ├── 📁 NodeRAG
        │   ├── 📄 fiqa.json
        │   ├── 📄 lifestyle.json
        │   ├── 📄 recreation.json
        │   ├── 📄 science.json
        │   ├── 📄 technology.json
        │   └── 📄 writing.json
        └── 📁 NaiveRAG
            ├── 📄 fiqa.json
            ├── 📄 lifestyle.json
            ├── 📄 recreation.json
            ├── 📄 science.json
            ├── 📄 technology.json
            └── 📄 writing.json
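If you want to see what the notebook's conversion step boils down to, the sketch below reads the parquet and writes one JSON file. The column names and output schema here are assumptions, so align them with what change.ipynb actually produces.

import json
import pandas as pd

# Assumed columns ("query", "answer"); adjust them to the real parquet schema
# and to the JSON layout expected by the pairwise evaluation scripts.
df = pd.read_parquet("path/to/question_parquet")
records = [{"query": row["query"], "answer": row["answer"]} for _, row in df.iterrows()]

# Example output location, matching the folder layout above.
with open("data/pairwise_eval/NodeRAG/fiqa.json", "w") as f:
    json.dump(records, f, indent=2)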
Compare with LFRQA directly.
Modify the script by adding your openai_key.
For macOS and Linux,
bash run_pairwise_eval_lfrqa.sh
For Windows,
run_pairwise_eval_lfrqa.bat
Compare a pair of LLM generations.
Modify the script by adding your openai_key.
For macOS and Linux,
bash run_pairwise_eval_llms.sh
For Windows,
run_pairwise_eval_llm.bat
Modify model1 and model2 so that each model is compared against every other model. For example, compare NaiveRAG against the other four models, then compare Hyde against the remaining three (excluding NaiveRAG), and so on until all pairwise comparisons are complete; the sketch below enumerates these pairs.
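Since every unordered pair of models needs exactly one run, you can enumerate the required model1/model2 settings programmatically. The model list below is an assumption based on the systems named in this guide, with a placeholder for any others you evaluate.

from itertools import combinations

# Assumed model list: NaiveRAG, Hyde, GraphRAG and NodeRAG appear in this guide;
# replace "OtherRAG" with the remaining system(s) you evaluate.
models = ["NaiveRAG", "Hyde", "GraphRAG", "NodeRAG", "OtherRAG"]

# One pairwise-evaluation run per unordered pair: 5 models -> 10 comparisons.
for model1, model2 in combinations(models, 2):
    print(f"Set model1={model1} and model2={model2} in the script, then run it.")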
Complete Pairs
python code/report_results.py --use_complete_pairs
This script reports the win and win+tie rates for all comparisons and outputs an all_battles.json file.
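For reference on how the two numbers relate, the win rate counts only outright wins while the win+tie rate also counts ties. The sketch below recomputes both from all_battles.json; the record fields used here (model and result) are illustrative assumptions, not the script's actual schema.

import json

# Illustrative only: assumes each battle record stores the evaluated model's
# name and a result of "win", "tie", or "loss". The real all_battles.json
# written by report_results.py may use a different structure.
with open("all_battles.json") as f:
    battles = json.load(f)

results = [b["result"] for b in battles if b["model"] == "NodeRAG"]
wins = results.count("win")
ties = results.count("tie")

print(f"win rate: {wins / len(results):.3f}")
print(f"win+tie rate: {(wins + ties) / len(results):.3f}")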