Indexing

Learn how to index a NodeRAG knowledge base from your original corpus. This guide provides step-by-step instructions for building the index and updating it as the corpus grows.

1 - Build
2 - Incremental Update

1 - Build

Learn how to build and configure the NodeRAG project. This guide provides step-by-step instructions for setting up the project structure and configuration.

Build

Get familiar with the project structure

A NodeRAG project has the following structure. Create it manually: make a project folder (main_folder below), create an input folder inside it, and place the corpus documents you want NodeRAG to index in the input folder.

main_folder/
├── input/
│   ├── file1.md
│   ├── file2.txt
│   ├── file3.docx
│   └── ...

Key Directories

main_folder: The root directory of the project.
input: Contains all input files to be processed by NodeRAG. Supported file formats include .md, .txt, and .doc/.docx.
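The folder setup above can also be scripted. The following is a minimal Python sketch using only the standard library; `create_project` and `list_corpus_files` are hypothetical helpers, not part of NodeRAG, and the extension set is an assumption based on the formats listed above plus the .docx file shown in the example tree.

```python
import tempfile
from pathlib import Path

# Assumed extension set: the formats listed above plus .docx from the example tree.
SUPPORTED_EXTENSIONS = {".md", ".txt", ".doc", ".docx"}


def create_project(main_folder: str) -> Path:
    """Hypothetical helper: create main_folder/input and return the input path."""
    input_dir = Path(main_folder) / "input"
    input_dir.mkdir(parents=True, exist_ok=True)  # also creates main_folder
    return input_dir


def list_corpus_files(input_dir: Path) -> list[Path]:
    """Hypothetical helper: supported files in input_dir, sorted by path."""
    return sorted(
        p for p in input_dir.iterdir()
        if p.is_file() and p.suffix.lower() in SUPPORTED_EXTENSIONS
    )


if __name__ == "__main__":
    # Demo in a temporary directory; use a real path such as "path/to/main_folder".
    with tempfile.TemporaryDirectory() as tmp:
        input_dir = create_project(str(Path(tmp) / "main_folder"))
        (input_dir / "file1.md").write_text("# Example\n", encoding="utf-8")
        (input_dir / "notes.png").write_bytes(b"")  # ignored: unsupported extension
        print([p.name for p in list_corpus_files(input_dir)])  # ['file1.md']
```

Filtering by extension is only a convenience for checking what you have placed in the folder; NodeRAG itself decides which files it reads.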

Quick Input Example

As a quick example, download this .txt file into your input folder.

Config

python -m NodeRAG.build -f path/to/main_folder

The first time you run this command, it creates a Node_config.yaml file in the main_folder directory.

[Figure: create config]

Edit the config file as described below (add your API key and service provider) so that NodeRAG can access the correct API.

To try the NodeRAG demo quickly, set the API key for your OpenAI account. If you don't have an API key, see the OpenAI Auth documentation. Make sure you enter the API key in both the model_config and embedding_config sections.

For detailed configuration and modification instructions, see the Configuration Guide.

#==============================================================================
# AI Model Configuration
#==============================================================================
model_config:
  model_name: gpt-4o-mini            # Model name for text generation
  api_keys:    # Your API key (optional)

embedding_config:
  api_keys:    # Your API key (optional)

Building

After setting up the config, rerun the following command:

python -m NodeRAG.build -f path/to/main_folder

The terminal will display the state tree:

[Figure: state tree]

Press y to continue, then wait for the workflow to complete.

[Figure: processing]

[Figure: finished]

When the workflow completes, indexing is finished. The resulting file structure is explained in the NodeRAG file structures documentation.
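If you prefer to launch the build from a Python script instead of the terminal, the command can be wrapped with the standard `subprocess` module. This is a minimal sketch: `build_command` and `run_build` are hypothetical helpers, and it assumes that the "press y to continue" prompt reads from standard input, which this guide does not confirm. If it does not, run the command interactively.

```python
import subprocess
import sys


def build_command(main_folder: str) -> list[str]:
    """Argument list for `python -m NodeRAG.build -f <main_folder>`."""
    return [sys.executable, "-m", "NodeRAG.build", "-f", main_folder]


def run_build(main_folder: str) -> int:
    """Run the build and return its exit code.

    Assumption: the confirmation prompt reads standard input, so sending
    "y" answers it.
    """
    result = subprocess.run(build_command(main_folder), input="y\n", text=True)
    return result.returncode
```

For example, `run_build("path/to/main_folder")` runs the same command as above using the current interpreter (`sys.executable`), so NodeRAG must be installed in that environment; by the usual convention, a non-zero return code suggests the build failed.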

For the next step, see the Answer documentation to generate domain-specific answers.

2 - Incremental Update

NodeRAG supports incremental updates of the corpus.

Incremental Update Support

NodeRAG tracks the hash IDs of previously indexed documents, so on each run it can tell which files are new.
Do not modify files that have already been indexed, as this may lead to duplicate indexing or unpredictable errors.
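To illustrate why editing an indexed file is risky, here is a minimal sketch of hash-based change detection. It uses SHA-256 over the file bytes as a stand-in; NodeRAG's actual hash IDs may be computed differently, and `content_hash` and `find_unindexed` are hypothetical helpers.

```python
import hashlib
import tempfile
from pathlib import Path


def content_hash(path: Path) -> str:
    """SHA-256 of the file's bytes (a stand-in for NodeRAG's own hash IDs)."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def find_unindexed(input_dir: Path, indexed_hashes: set[str]) -> list[Path]:
    """Files in input_dir whose content hash is not among indexed_hashes."""
    return sorted(
        p for p in input_dir.iterdir()
        if p.is_file() and content_hash(p) not in indexed_hashes
    )


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        input_dir = Path(tmp)
        doc = input_dir / "file1.md"
        doc.write_text("original text", encoding="utf-8")
        indexed = {content_hash(doc)}  # pretend file1.md has been indexed
        print([p.name for p in find_unindexed(input_dir, indexed)])  # []

        doc.write_text("edited text", encoding="utf-8")  # modify an indexed file
        print([p.name for p in find_unindexed(input_dir, indexed)])  # ['file1.md']
```

In this sketch the edited file gets a new hash and looks like a brand-new document, while the hash of its earlier version is still recorded, which is one way the old and new content can both end up indexed.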

Best Practice for Adding New Documents

To add new documents, place the new files in the input folder, then rerun the indexing command:

python -m NodeRAG.build -f path/to/main_folder

NodeRAG will automatically detect new files and convert them into its internal database format without reprocessing existing data.
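The detect-and-skip behaviour can be sketched as a loop over a persisted set of hashes. This illustrates the idea only and is not NodeRAG's implementation: the manifest file name `indexed_hashes.json`, the helper names, and the `index_fn` hook are all hypothetical.

```python
import hashlib
import json
import tempfile
from pathlib import Path
from typing import Callable

MANIFEST_NAME = "indexed_hashes.json"  # hypothetical manifest file name


def load_manifest(main_folder: Path) -> set[str]:
    """Hashes recorded by previous runs (empty on the first run)."""
    path = main_folder / MANIFEST_NAME
    if not path.exists():
        return set()
    return set(json.loads(path.read_text(encoding="utf-8")))


def save_manifest(main_folder: Path, hashes: set[str]) -> None:
    (main_folder / MANIFEST_NAME).write_text(json.dumps(sorted(hashes)), encoding="utf-8")


def incremental_update(main_folder: Path, index_fn: Callable[[Path], None]) -> list[Path]:
    """Call index_fn only on input files whose hash is not yet recorded."""
    indexed = load_manifest(main_folder)
    new_files = []
    for path in sorted((main_folder / "input").iterdir()):
        if not path.is_file():
            continue
        digest = hashlib.sha256(path.read_bytes()).hexdigest()
        if digest in indexed:
            continue  # already indexed: existing data is not reprocessed
        index_fn(path)  # hypothetical hook standing in for the real indexing step
        indexed.add(digest)
        new_files.append(path)
    save_manifest(main_folder, indexed)
    return new_files


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "input").mkdir()
        (root / "input" / "file1.md").write_text("first document", encoding="utf-8")
        print([p.name for p in incremental_update(root, print)])  # first run
        (root / "input" / "file2.txt").write_text("second document", encoding="utf-8")
        print([p.name for p in incremental_update(root, print)])  # only file2.txt
```

Because this sketch writes the manifest only at the end of a run, an exception during indexing leaves it unchanged and the whole batch is retried next time; a more robust variant might save after each file.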

For more details on incremental mode and a comparison between GraphRAG and LightRAG approaches to incremental updates, see this blog post.