Agenty’s Algolia integration automatically crawls your website pages and adds them to your Algolia indices, so you can create, re-create or refresh your custom search engine on a schedule or on demand.
Algolia is a hosted search company with a SaaS pricing model, known for fast and relevant results in the custom search engine market. If you want to add an on-site search engine to your website or blog, Algolia is a strong choice.
However, Algolia doesn’t crawl your website automatically the way Google does, so it won’t refresh the search engine or index new pages, blog posts or products as soon as they are published on your site.
So the problem with Algolia, and with almost every custom search tool, is that you need to manually upload a CSV file, write your own code to crawl your website pages, or pull content from your database, and then use Algolia’s open-source library to add those pages to your indices (search index) to make them searchable. That’s the pain!
This is where Agenty comes in. Our Algolia integration allows you to:
- Automatically crawl your website on-demand, on schedule or via API
- Extract fields of your choice, like title, canonical, html_body, description, crawled_at etc.
- Crawl HTML sitemaps, RSS, JSON or XML feeds to make a workflow of steps. For example, scrape the sitemap first with web scraping agent #1, then scrape each page’s details using agent #2
- Schedule the crawler to run daily, weekly etc.
- Send the agent result to your Algolia indices to refresh them
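As an illustration of what one crawled page might look like once it is ready for a search index, here is a minimal sketch. The field names come from the list above; the helper name toSearchRecord and the choice of the canonical URL as objectID are assumptions for this example, not a description of how Agenty builds its records internally.

```javascript
// Hypothetical helper: turn one crawled row into a search-index record.
// Assumption: the canonical URL is unique per page, so it is reused as objectID.
function toSearchRecord(row) {
  if (!row.canonical) {
    throw new Error('row is missing a canonical URL');
  }
  return {
    objectID: row.canonical,
    title: (row.title || '').trim(),
    canonical: row.canonical,
    description: (row.description || '').trim(),
    html_body: row.html_body || '',
    // Fall back to the current time if the crawler did not record one.
    crawled_at: row.crawled_at || new Date().toISOString(),
  };
}
```

Using a stable objectID means that re-crawling the same page replaces its existing record instead of creating a duplicate.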
Prerequisites
- Agenty professional or higher plan to get access to the Algolia plugin
- Algolia account to get your application_id and api_key to authorize Agenty to connect to your indices
Algolia API Key
- Log in to your Algolia account
- Go to API Keys page
- Copy the application id and admin API key
- Agenty uses this application id and API key to authenticate with your Algolia account and to add or update objects in your indices
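To make the role of the two credentials concrete, the sketch below builds (but does not send) a batch write request against Algolia's REST API, where the application id and API key travel as the X-Algolia-Application-Id and X-Algolia-API-Key headers. The function name buildBatchRequest is hypothetical, and whether Agenty calls this exact endpoint is not stated in this article; treat it as an illustration of what the credentials authorize.

```javascript
// Hypothetical helper: describe a batch write to an Algolia index.
// Nothing is sent over the network; the caller decides how to issue the request.
function buildBatchRequest(applicationId, apiKey, indexName, records) {
  return {
    url: `https://${applicationId}.algolia.net/1/indexes/${encodeURIComponent(indexName)}/batch`,
    method: 'POST',
    headers: {
      'X-Algolia-Application-Id': applicationId,
      'X-Algolia-API-Key': apiKey, // must be a key with write access
      'Content-Type': 'application/json',
    },
    // "updateObject" adds or replaces the record whose objectID is in the body.
    body: JSON.stringify({
      requests: records.map((record) => ({ action: 'updateObject', body: record })),
    }),
  };
}
```

Because this key can modify your indices, it belongs only in server-side tools such as the plugin configuration, never in pages served to visitors.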
Set up your web crawler
Setting up a crawler is easy using our Chrome extension, available on the Chrome store. Just go to the website page you want to scrape and add the fields of your choice by clicking on the elements to generate a selector, or write the selector manually if you know how CSS selectors work. See this detailed article to learn how to create a scraping agent, or the video tutorial here.
In this example, I created 2 scraping agents to crawl the agenty.com website:
- The first agent crawls hyperlinks from the sitemap and home page
- The second agent is connected to the first: it navigates to each page from agent #1’s result and scrapes the page details
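The first step of this two-agent chain can be pictured with a small sketch: collect the page URLs listed in an XML sitemap so that a second step can visit each one. The function name extractSitemapUrls is hypothetical, and the regular expression is a simplification for illustration (it does not decode XML entities such as &amp;amp;); Agenty's agents are configured through the extension, not through code like this.

```javascript
// Hypothetical helper: pull every <loc> URL out of a sitemap XML string.
function extractSitemapUrls(sitemapXml) {
  const urls = [];
  const locPattern = /<loc>\s*([^<\s]+)\s*<\/loc>/gi;
  let match;
  while ((match = locPattern.exec(sitemapXml)) !== null) {
    urls.push(match[1]);
  }
  // Remove duplicates while keeping the original order.
  return [...new Set(urls)];
}

// Example input for the first step; the second step would visit each URL.
const sampleSitemap = `
<urlset>
  <url><loc>https://agenty.com/</loc></url>
  <url><loc>https://agenty.com/pricing</loc></url>
  <url><loc>https://agenty.com/</loc></url>
</urlset>`;
const pagesToScrape = extractSitemapUrls(sampleSitemap);
// pagesToScrape is ['https://agenty.com/', 'https://agenty.com/pricing']
```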
Configure Algolia plugin
- Go to the Agenty plugin page
- Click on the Add button in the Algolia plugin row
- The plugin page will open. Select the final agent to attach the plugin to (that’s the one with the final result we want to send to Algolia), then enter the application id, API key and the name of the indices where the crawling job result will be sent
- Click on the Save button to attach this plugin to your agent
The plugin fires on the job completion event. For example, if you are crawling more than 5,000 pages from your website, the plugin starts executing only after every page has been crawled.
Start your web crawler
Once the web crawling agent has been created and the plugin has been attached, we are ready to start our web crawling job.
When the job completes, check the logs:
2019-05-30 13:00:09.1279 TRACE Algolia plugin started with timeout: 15 minutes
2019-05-30 13:00:15.7217 TRACE Algolia Indices: Cleared successfully
2019-05-30 13:00:15.7217 TRACE Rows 0 to 1000 sent to Algolia successfully
2019-05-30 13:00:15.7217 TRACE Plugin task completed successfully. Duration: 00:00:06.5624766
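The "Rows 0 to 1000" line suggests that results are sent in batches. A minimal sketch of that batching idea follows; the batch size of 1000 is inferred from the log line above rather than documented here, and the function name toBatches is hypothetical.

```javascript
// Hypothetical helper: split crawl results into fixed-size batches.
// Assumption: 1000 rows per batch, inferred from the log output.
function toBatches(rows, batchSize = 1000) {
  if (!Number.isInteger(batchSize) || batchSize < 1) {
    throw new RangeError('batchSize must be a positive integer');
  }
  const batches = [];
  for (let start = 0; start < rows.length; start += batchSize) {
    batches.push(rows.slice(start, start + batchSize));
  }
  return batches;
}
```

With 2,500 crawled rows, for example, this yields three batches of 1000, 1000 and 500 rows, each of which could be sent as one request.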
Preview Algolia Custom Search Engine
- Now the search index is ready. We can integrate Algolia into the website or use their built-in UI for searching.
- Go back to your Algolia account
- Go to the Indices page, and you’ll find your indices have been created, re-created or refreshed with the data sent from Agenty.
- Now you can generate the UI demo or use one of their open-source libraries, available in almost every language, to add the search feature to your website. For example, we are using the instantsearch.js JavaScript library to add the search engine to our website.
- Include the algoliasearch client and the main instantsearch.js library
<script src="https://cdn.jsdelivr.net/npm/algoliasearch@3.32.0"></script>
<script src="https://cdn.jsdelivr.net/npm/instantsearch.js"></script>
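- Modify this code with your_app_id and your_api_key, and any other optional variables if needed. Because this code runs in the browser, use a search-only API key here rather than the admin API key. The page also needs elements with the ids searchbox and searchResult for the widgets to render into.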
// 1. Instantiate the search
const search = instantsearch({
  indexName: 'Agenty-Search-Index',
  searchClient: algoliasearch('your_app_id', 'your_api_key'),
});

// 2. Create an interactive search box
search.addWidget(
  instantsearch.widgets.searchBox({
    container: '#searchbox',
    placeholder: 'Search...',
  })
);

// 3. Plug the search results into the results container
search.addWidget(
  instantsearch.widgets.hits({
    container: '#searchResult',
    templates: {
      item: '{{#helpers.highlight}}{ "attribute": "title" }{{/helpers.highlight}}',
    },
  })
);

// 4. Start the search!
search.start();
- Publish your website to a server, or test it on localhost