Skip indexing content

Use FlowHunt’s skip indexing feature to exclude repetitive or unsuitable content from your AI chatbot’s knowledge base, ensuring relevant and safe interactions.

No matter how powerful, AI is still a machine that relays the information it learns. It often misreads jokes, hypotheticals, and sarcasm, which are behind some of the most hilariously awful (and sometimes seriously harmful) answers. To keep your chatbot from starring in the next AI scandal and to help it understand your content better, you can tell it which content to skip.

You keep a chatbot reliable by controlling the information it learns from. Not all of your content will be suitable for the Chatbot to use. The flowhunt-skip class lets you mark the content FlowHunt should not index. Any HTML element with this class is ignored when the content is processed.

When to use the flowhunt-skip class

There are two main reasons you should use this class, but feel free to use it on any content you find unnecessary or inappropriate for the bot to use.

  1. Skipping repetitive content: If similar content keeps getting indexed, the AI has a harder time telling pages apart and working out what each one is about. Skipping duplicate information also saves you money on text processing in the long run.

  2. Skipping risky or inappropriate information: You should skip any information that may cause the AI to give wrong, harmful, or out-of-context answers. Be especially cautious if your brand tone often uses jokes or strong language. That tone may work well in other content, but users might not appreciate a snarky bot.

How to use the flowhunt-skip class

FlowHunt crawls and indexes your website to give context to the Chatbot. Your Chatbot may use anything FlowHunt indexes at some point.

Adding the flowhunt-skip class to HTML elements lets you mark the content you don’t want to index. Any element featuring this class will be ignored and never reach the Chatbot.

Here’s an example of using the class:

<div class="flowhunt-skip">
  <h2>Duplicate content</h2>
  <p>This content is duplicate. I don’t want FlowHunt to index it again.</p>
</div>

You can also skip a single paragraph inside a larger element:

<div>
  <h2>My content</h2>
  <p>This paragraph should be indexed.</p>
  <p class="flowhunt-skip">I don't want the Chatbot to use this information.</p>
  <p>This paragraph should be indexed.</p>
</div>
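
To picture what "ignored" means in practice, here is a minimal Python sketch of a parser that drops every element carrying the class, together with everything nested inside it. This is an illustration only, not FlowHunt's actual crawler: the names SkipAwareTextExtractor and extract_indexable_text are hypothetical, and the sketch assumes well-formed HTML in which every non-void element has a closing tag.

```python
from html.parser import HTMLParser

# Void elements never get a closing tag, so they must not change the nesting depth.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class SkipAwareTextExtractor(HTMLParser):
    """Collects text, ignoring elements marked with the skip class (hypothetical helper)."""

    def __init__(self, skip_class="flowhunt-skip"):
        super().__init__()
        self.skip_class = skip_class
        self.skip_depth = 0  # greater than 0 while inside a skipped element
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        classes = (dict(attrs).get("class") or "").split()
        # Once inside a skipped element, every nested element is skipped too.
        if self.skip_depth or self.skip_class in classes:
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag in VOID_TAGS:
            return
        if self.skip_depth:
            self.skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and not self.skip_depth:
            self.parts.append(text)


def extract_indexable_text(html):
    """Returns the text pieces that would remain after skipping marked elements."""
    parser = SkipAwareTextExtractor()
    parser.feed(html)
    parser.close()
    return parser.parts


page = """<div>
  <h2>My content</h2>
  <p>This paragraph should be indexed.</p>
  <p class="flowhunt-skip">I don't want the Chatbot to use this information.</p>
  <p>This paragraph should be indexed.</p>
</div>"""

print(extract_indexable_text(page))
# ['My content', 'This paragraph should be indexed.', 'This paragraph should be indexed.']
```

The depth counter is what makes a skipped div hide its headings and paragraphs as well, which matches the first example above.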

How does the indexing work?

The crawling process runs in the background and is based on the schedules you set up. It only downloads the HTML page. Any images or media simply get stored as links. Any redirects are followed, and canonical URLs are evaluated.

Once crawled, the HTML content gets converted into plain markdown text. Some information may be removed during this process. The final markdown text is offered to the Chatbot as context. The bot can then retrieve this information whenever needed.
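
As a rough picture of the conversion step, the Python sketch below turns headings and paragraphs into markdown and keeps images only as links. It is a simplified illustration under stated assumptions, not FlowHunt's converter: SimpleMarkdownConverter and html_to_markdown are hypothetical names, and the sketch deliberately drops any text that sits outside headings and paragraphs, which mirrors the point that some information may be removed.

```python
from html.parser import HTMLParser

# Maps "h1".."h6" to the matching markdown heading prefix.
HEADING_PREFIX = {f"h{level}": "#" * level for level in range(1, 7)}


class SimpleMarkdownConverter(HTMLParser):
    """Converts headings and paragraphs to markdown (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.blocks = []     # finished markdown blocks
        self.current = None  # text pieces of the block being built, if any
        self.prefix = ""

    def handle_starttag(self, tag, attrs):
        if tag in HEADING_PREFIX or tag == "p":
            self.prefix = HEADING_PREFIX.get(tag, "")
            self.current = []
        elif tag == "img":
            # Media is not downloaded; only a link to it is kept.
            attributes = dict(attrs)
            alt = attributes.get("alt") or ""
            src = attributes.get("src") or ""
            self.blocks.append(f"![{alt}]({src})")

    def handle_data(self, data):
        if self.current is not None:
            self.current.append(data)

    def handle_endtag(self, tag):
        if (tag in HEADING_PREFIX or tag == "p") and self.current is not None:
            # Collapse runs of whitespace left over from the HTML layout.
            text = " ".join("".join(self.current).split())
            if text:
                self.blocks.append(f"{self.prefix} {text}".strip())
            self.current = None


def html_to_markdown(html):
    """Returns the markdown blocks joined by blank lines."""
    converter = SimpleMarkdownConverter()
    converter.feed(html)
    converter.close()
    return "\n\n".join(converter.blocks)


sample = '<h2>My content</h2><p>This paragraph  should be\n indexed.</p><img src="/cat.png" alt="A cat">'
print(html_to_markdown(sample))
```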

How does AI know which information to pick?

The markdown text gets split into chunks, vectorized, and stored in a vector database. Vectorizing turns each chunk into a list of numbers (a vector) that represents its meaning, so texts with similar meanings end up with similar vectors. As a result, AI can match related words instead of needing an exact word match.
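
The splitting step can be sketched in a few lines of Python. This is one simple strategy (grouping whole paragraphs up to a size limit), shown as an assumption rather than as FlowHunt's actual chunking rules; the function name split_into_chunks and the max_chars limit are hypothetical.

```python
def split_into_chunks(markdown, max_chars=200):
    """Groups paragraphs into chunks of at most max_chars characters.

    A single paragraph longer than max_chars is kept whole as its own chunk.
    """
    chunks = []
    current = ""
    for paragraph in markdown.split("\n\n"):
        paragraph = paragraph.strip()
        if not paragraph:
            continue
        candidate = f"{current}\n\n{paragraph}" if current else paragraph
        if current and len(candidate) > max_chars:
            # The chunk is full: store it and start a new one.
            chunks.append(current)
            current = paragraph
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks


text = (
    "## Shipping\n\nWe ship within two days.\n\n"
    "## Returns\n\nReturns are free for 30 days."
)
for chunk in split_into_chunks(text, max_chars=45):
    print(repr(chunk))
```

Each chunk would then be passed to an embedding model to get its vector; that step depends on the model used and is not shown here.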

The words get spread on a grid based on their assigned values. This allows the computer to understand which words are close in meaning to each other:

[Figure: text split into chunks, vectorized, and stored in a vector database]

Note: This is a very simplified model. In practice, AI does this with thousands of words, phrases, and entire sentences.

Retrieving information from a vector database by meaning is called semantic search. It's the AI's ability to search the vector database for content that is close in meaning to the query and use it to provide answers.

When a user posts a query, the bot converts the query into a vector. It then searches the database for close matches from your content. When it finds matching or similar content, it uses that information to craft an answer.
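
The lookup described above can be sketched with cosine similarity, a common way to measure how close two vectors are. Everything specific here is an assumption for illustration: the three-number vectors are assigned by hand, while a real system would get much longer vectors from an embedding model, and the source does not say which similarity measure FlowHunt uses.

```python
import math


def cosine_similarity(a, b):
    """Returns the cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def search(query_vector, index, top_k=2):
    """Returns the top_k chunk texts ranked by similarity to the query vector."""
    ranked = sorted(
        index,
        key=lambda entry: cosine_similarity(query_vector, entry["vector"]),
        reverse=True,
    )
    return [entry["text"] for entry in ranked[:top_k]]


# Hand-assigned toy vectors standing in for real embeddings.
index = [
    {"text": "We ship orders within two days.", "vector": [0.9, 0.1, 0.0]},
    {"text": "Returns are free for 30 days.", "vector": [0.1, 0.9, 0.1]},
    {"text": "Our office is closed on Sundays.", "vector": [0.0, 0.1, 0.9]},
]

# Pretend this is the vector for the query "How fast is delivery?"
query_vector = [0.8, 0.2, 0.1]
results = search(query_vector, index, top_k=1)
print(results)  # ['We ship orders within two days.']
```

The query shares no words with the shipping chunk, yet it ranks first because the two vectors point in nearly the same direction.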

Why is semantic search so important?

Imagine you own an online pet shop. A customer asks the following query:

“Do you sell food for kittens?”

You do, but the product name features the word “junior” instead of “kitten”. The bot can understand that “junior cat food” means the same as (or something very similar to) “food for kittens” and successfully guide the customer to the right product.

Without semantic search in the vector database, the Chatbot would simply reply that you don’t carry “food for kittens,” which could cost you a customer. FlowHunt’s semantic search is there to prevent exactly this.
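
The difference between the two approaches can be shown with a toy comparison in Python. The product list and the two-number vectors are made up for this sketch (a real embedding model would produce them), and keyword_match and semantic_match are hypothetical helper names.

```python
import math

# Hand-assigned toy vectors; products with similar meaning point in similar directions.
PRODUCTS = {
    "Junior cat food": [0.9, 0.8],
    "Adult cat food": [0.9, -0.6],
    "Dog toy": [-0.8, 0.1],
}

QUERY_TEXT = "food for kittens"
QUERY_VECTOR = [0.8, 0.9]  # assumed vector for the customer's question


def keyword_match(query, product_names):
    """Exact-phrase lookup: finds a product only if its name contains the query."""
    return [name for name in product_names if query.lower() in name.lower()]


def semantic_match(query_vector, products):
    """Returns the product whose vector is most similar to the query vector."""
    def cosine(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        return dot / (math.hypot(*a) * math.hypot(*b))

    return max(products, key=lambda name: cosine(query_vector, products[name]))


print(keyword_match(QUERY_TEXT, PRODUCTS))     # []
print(semantic_match(QUERY_VECTOR, PRODUCTS))  # Junior cat food
```

The exact-phrase lookup comes back empty because no product name contains “food for kittens”, while the vector comparison still lands on the junior cat food.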

Frequently asked questions

What is the skip indexing feature in FlowHunt?

The skip indexing feature lets you exclude specific content from being used by your AI chatbot. By adding the flowhunt-skip class to HTML elements, you ensure that unsuitable or repetitive content is not indexed or used in chatbot responses.

Why should I skip certain content when training my AI chatbot?

Skipping repetitive, inappropriate, or potentially misleading content helps your AI chatbot provide more relevant, safe, and accurate responses. It also improves performance and reduces unnecessary processing costs.

How do I use the flowhunt-skip class?

Add the flowhunt-skip class to any HTML element you don't want indexed. FlowHunt will ignore these elements during its crawling process, keeping them out of your chatbot's knowledge base.

How does FlowHunt process and store indexed content?

FlowHunt crawls your site, converts HTML to markdown, splits text into chunks, and stores them in a vector database. This allows semantic search so the AI can understand related words and deliver relevant answers to user queries.

What is semantic search and why is it important?

Semantic search uses vector databases to understand word meanings and relationships, not just exact matches. This enables your chatbot to provide smarter, context-aware responses, even if users use different wording.
