The internal parameters of LLMs are often insufficient to capture external knowledge that is both rapidly evolving and extremely long-tailed. Consequently, a framework that enables LLMs to interact with external knowledge sources is necessary for improving their factuality and reliability. Retrieval-augmented generation (RAG) has become the mainstream solution for this purpose. However, when external knowledge is organized not as plain-text corpora but as interconnected graph structures, two major challenges arise.
Challenge 1: Heterogeneous Relations. Graphs may contain heterogeneous node and edge types (e.g., [Gene] is upregulated in [Anatomy] vs. [Gene] is downregulated in [Anatomy]), so retrieving information through semantic similarity search alone is inadequate. Instead, effective information access requires precisely defined function calls (e.g., finding genes connected to a given anatomy entity via "is upregulated in").
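To illustrate what such a relation-aware function call might look like, the sketch below indexes a toy heterogeneous edge list by edge type and answers a typed query exactly rather than by embedding similarity. The edge list, node names, and the helper get_genes_by_anatomy are hypothetical and introduced purely for illustration, not taken from any particular system.

```python
# Minimal sketch: relation-specific retrieval over a toy heterogeneous graph.
from collections import defaultdict

# Each edge is (head, relation, tail); all names below are illustrative.
EDGES = [
    ("GENE:BRCA1", "is_upregulated_in", "ANATOMY:breast"),
    ("GENE:TP53", "is_downregulated_in", "ANATOMY:breast"),
    ("GENE:EGFR", "is_upregulated_in", "ANATOMY:lung"),
]

# Index edges by (tail, relation) so typed queries can be answered exactly,
# which a purely semantic similarity search over node text cannot guarantee.
index = defaultdict(list)
for head, relation, tail in EDGES:
    index[(tail, relation)].append(head)

def get_genes_by_anatomy(anatomy: str, relation: str) -> list[str]:
    """Return genes linked to an anatomy node via a specific edge type."""
    return index[(anatomy, relation)]

print(get_genes_by_anatomy("ANATOMY:breast", "is_upregulated_in"))
# ['GENE:BRCA1']  -- the downregulated gene TP53 is correctly excluded
```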
Challenge 2: Multi-hop Reasoning. Reasoning over graphs usually requires capturing complex, multi-hop connections between nodes. This makes it impractical to force the model to acquire all necessary information in a single round. Instead, an adaptive, multi-round process is needed, in which information-seeking and reasoning steps are interleaved (e.g., subsequent function calls should depend on the gene nodes obtained in the current round).
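One possible shape of such an interleaved retrieve-and-reason loop is sketched below. The helpers llm_step (a single LLM reasoning step that returns either a function call or a final answer) and call_graph_api (which executes a typed call like the one sketched above) are assumptions introduced only for illustration and do not refer to any real API.

```python
# Minimal sketch of an adaptive, multi-round graph interaction loop.
from typing import Callable

def interleaved_qa(
    question: str,
    llm_step: Callable[[str, list], dict],      # hypothetical LLM reasoning step
    call_graph_api: Callable[[str, dict], list],  # hypothetical typed graph call
    max_rounds: int = 5,
) -> str:
    evidence: list = []  # facts gathered so far, re-fed to the model each round
    for _ in range(max_rounds):
        step = llm_step(question, evidence)  # reason over question + evidence
        if step["type"] == "answer":         # model decides it has enough
            return step["content"]
        # Otherwise execute the requested call; its results (e.g., gene nodes
        # found in this round) condition the calls proposed in the next round.
        evidence.extend(call_graph_api(step["name"], step["args"]))
    return "insufficient evidence after max_rounds"
```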
To elicit multi-round graph interaction capabilities in LLMs, existing studies have proposed various prompting strategies. However, compared with actual model training, prompting-based approaches are less effective at helping models internalize sophisticated graph-interaction skills. On the other hand, models trained via supervised fine-tuning (SFT) to access external knowledge struggle to generalize to new graphs and domains, such as those with novel node/edge types, function calls, or tasks.