Document Retriever
Document Retriever links AI models to your chosen documents and URLs, enabling accurate, up-to-date, and relevant AI responses for your specific use case.

Component description
How the Document Retriever component works
The most significant drawback of large language models is their tendency to present vague, outdated, or outright false information. To keep answers up to date and relevant to your use case, generative models need to be pointed to the right knowledge sources.
This approach, called Retrieval-Augmented Generation (RAG), supplies generative models with your own knowledge sources. Retriever components, including the Document Retriever, let you use this method.
What is the Document Retriever component?
This component allows the chatbot to retrieve knowledge from your own sources, ensuring that the information is relevant, reliable, and up-to-date. This information comes directly from the sources you specified in Documents and Schedules. The role of this component is to control the retrieval.

Input Query
Specifies the query that’s used to look up relevant information. It can either be linked from another component or entered manually. In most cases, your input query will be the Chat Input.
Document Count
This setting limits the number of documents the flow should retrieve from, so that the results remain relevant and don’t take too long to generate.
Document categories
This optional setting lets you limit the retrieval to one of the categories you’ve created in the Documents screen of Knowledge Sources.
Schedules
Lets you limit the retrieval to one of the Schedules you’ve specified in the Schedules screen of Knowledge Sources.
Threshold
The sources in your knowledge database will match the query to varying degrees. The AI scores each source’s relevance on a scale from 0 to 1. This setting lets you control how closely a source must match the query to be included in the output.
The exact threshold depends on your use case, but generally, 0.7-0.8 is recommended for highly relevant answers from a reasonable number of sources.
Imagine you set the threshold to 0.6 and have the following articles:
- Article A: 0.8
- Article B: 0.65
- Article C: 0.5
- Article D: 0.9
Only the articles with a relevance score of over 0.6 will make it into the output, that is, only A, B, and D.
- A high threshold, such as 0.9, will return very relevant results that closely match the query, but it might struggle to find enough documents and miss some relevant ones.
- A low threshold, for example, one below 0.5, will provide information from more documents, but it runs the risk of returning irrelevant information.
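The threshold example above can be sketched in a few lines of Python. This is only an illustration of the filtering logic, not FlowHunt's implementation: the function name `filter_by_threshold` and its parameters are hypothetical, the scores are the ones from the example, and the strict "greater than" comparison is an assumption based on the wording "over 0.6".

```python
# Relevance scores from the example above (illustrative data only).
scores = {
    "Article A": 0.8,
    "Article B": 0.65,
    "Article C": 0.5,
    "Article D": 0.9,
}


def filter_by_threshold(scores, threshold, document_count=None):
    """Keep sources scoring above the threshold, best match first.

    Hypothetical helper: `document_count` mimics the Document Count
    setting by capping how many results are returned.
    """
    kept = [(name, score) for name, score in scores.items() if score > threshold]
    kept.sort(key=lambda item: item[1], reverse=True)
    if document_count is not None:
        kept = kept[:document_count]
    return kept


for name, score in filter_by_threshold(scores, threshold=0.6):
    print(name, score)
```

With a threshold of 0.6, Article C is dropped and the remaining three articles are listed from most to least relevant. Passing `document_count=2` would additionally cap the result at the two best matches.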
How to connect the Document Retriever component to your flow
The component contains just one input and one output handle:
- Input Query: The query can be any text output. Common use cases would be connecting human Chat Input or a Generator.
- Output: The output of any retriever-type component is always a Document.
The Document output contains structured data unsuitable for the final chat output. All components that take Documents as their input transform them into a user-friendly format. These are either Widget components or the Document to Text transformer.
Why Use the Document Retriever?
- Grounding AI Models: Enhance the factual accuracy and relevance of AI outputs by providing real, contextual information from your organization’s knowledge base.
- Contextual Augmentation: Supply LLMs or chatbots with supporting documents or reference material for more informed responses.
- Flexible Filtering: Search can be fine-tuned by category, schedule, URL, document structure, or metadata, ensuring you surface only the most relevant information.
- Custom Output: Choose how much content to retrieve, how to split it, and which metadata to include, making it easy to adapt for downstream AI processes or UI needs.
- Agent Integration: With tool descriptions and naming, the component can be referenced as a tool in agent-based architectures.
Example Use Cases
- Retrieval-Augmented Generation (RAG): Provide LLMs with supporting documents to generate accurate, knowledge-backed responses.
- Chatbots and Virtual Assistants: Quickly surface FAQs or policy documents in response to employee/customer questions.
- Data Enrichment: Pull in product, author, or other metadata for further AI-driven analysis or workflow automation.
Example
Let’s try it now! Before building the flow, make sure you have created relevant Documents or Schedules. If no suitable source is present, the chatbot will apologize for being unable to answer.
Steps:
- Start with Chat Input.
- Add the Document Retriever and connect Chat Input as the Input Query.
- The output is a Document that needs to be transformed; for this example, we will use the Document to Text.
- Next, connect an AI Generator.
- You’re ready to chat.

Now our Flow can search our sources based on a human query, transform the structured data into readable text, and pass it to AI to generate a user-friendly answer.
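For readers who think in code, the same chain of steps can be sketched conceptually in Python. Everything here is a hypothetical stand-in: the `Document` class, the functions `retrieve`, `documents_to_text`, and `build_prompt`, the toy word-overlap scoring, and the placeholder source texts are assumptions made for illustration and do not reflect FlowHunt's internals or API. The final call to an AI model is left out; the sketch stops at the prompt a generator would receive.

```python
from dataclasses import dataclass


@dataclass
class Document:
    """Structured retrieval result (hypothetical shape)."""
    title: str
    content: str
    score: float


def retrieve(query, sources, threshold=0.7, document_count=3):
    """Toy retriever: scores each source by the share of query words it contains.

    A real retriever would use semantic search; word overlap is only a
    stand-in so the example runs without external services.
    """
    words = set(query.lower().split())
    docs = []
    for title, content in sources.items():
        overlap = words & set(content.lower().split())
        score = len(overlap) / len(words) if words else 0.0
        if score > threshold:
            docs.append(Document(title, content, score))
    docs.sort(key=lambda d: d.score, reverse=True)
    return docs[:document_count]


def documents_to_text(docs):
    """Flatten structured documents into plain text (the Document to Text step)."""
    return "\n\n".join(f"{d.title}: {d.content}" for d in docs)


def build_prompt(query, context):
    """Combine the retrieved context and the user query for the generator."""
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"


# Placeholder knowledge sources, not real data.
sources = {
    "Pricing page": "example text describing pricing plans and tiers",
    "About page": "example text describing the team",
}

query = "pricing plans"                      # Chat Input
docs = retrieve(query, sources, threshold=0.5)  # Document Retriever
context = documents_to_text(docs)            # Document to Text
print(build_prompt(query, context))          # input for the AI Generator
```

The point of the sketch is the order of operations: the retriever returns structured documents, a separate step turns them into text, and only then does the generator see them.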
Our Knowledge Sources contain a Schedule set to crawl FlowHunt’s pricing page for up-to-date information, so we can now ask the bot about pricing.

Examples of flow templates using Document Retriever component
To help you get started quickly, we have prepared several example flow templates that demonstrate how to use the Document Retriever component effectively. These templates showcase different use cases and best practices, making it easier for you to understand and implement the component in your own projects.
Frequently asked questions
- What is the Document Retriever component?
This component allows the Flow to retrieve knowledge from your own sources, such as documents and URLs, ensuring the returned information is relevant, reliable, and up-to-date.
- Why can’t I connect a Document Retriever to Chat Output?
Retriever components create structured data that is not suitable for output. It must first be transformed into text or a visual format before being sent to the Chat Output component.
- Where does the Document Retriever get information from?
The component searches for the closest query match within the information from user-specified URLs, documents, and schedules.
- How many documents does it return?
You can set a limit for the number of results returned, ensuring only the most relevant content is included in your flow.
- Can I filter which documents are searched?
Yes, you can filter by document categories, schedules, or URLs, focusing the search on specific segments of your knowledge base.
- Can I connect both the Document Retriever and GoogleSearch? If so, which one is prioritized?
You can use both simultaneously. Each retriever leads to its own output, with priority set by the order of outputs in the canvas. The first output from the top is prioritized.
Try FlowHunt's Document Retriever
Build smarter AI solutions by connecting your knowledge sources and ensuring your chatbot always delivers relevant, up-to-date answers.