The Scrapping bot is the opposite of the Website Crawler.

It is absolutely not a self-sufficient bot! You must take it by the hand and show it where to find the title of an article, its image, etc.

The Scrapping bot will visit only the page you indicated. Unlike the crawler, it is not interested in the entire website (domain name).
If you launch it on the press releases page, it will stay in that sub-category.

What is the advantage, then?
The Scrapping bot, if set up correctly, collects data very accurately and is fully customisable.

Create a Scrapping bot

  1. Source > Create source > Scrapping bot
  2. Paste the URL of the page you are interested in
  3. Click on “Verify”
  4. Fill in the form
  5. Then click on the “Edit selectors” button

Cikisi will try to display the page of the site, as a browser would. Note that you are still in Cikisi.

The goal is simple. An article is made up of several elements: a title, a URL link, a date, etc.
It’s up to you to show the robot where to find them on the page.

To do so, click on the selector type on the right, then on its equivalent on the left (preview).

Most of these selectors are obvious. The very first, however, deserves an explanation.

The “wrapper”?

A wrapper is the envelope of one single article, whichever one you pick. It allows the robot to distinguish one item from another.
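
To make the idea concrete, here is a minimal Python sketch. It is not how Cikisi works internally, only an illustration. It assumes a small, well-formed listing page in which every article sits in a `div` with the hypothetical class `post`; that `div` plays the role of the wrapper.

```python
import xml.etree.ElementTree as ET

# Hypothetical listing page: each <div class="post"> wraps exactly one article.
PAGE = """
<html><body>
  <div id="list">
    <div class="post"><h2>First release</h2><a href="/news/1">Read more</a></div>
    <div class="post"><h2>Second release</h2><a href="/news/2">Read more</a></div>
  </div>
</body></html>
"""

def split_into_items(page, wrapper_class):
    """Return one element per article, using the wrapper class as the boundary."""
    root = ET.fromstring(page.strip())
    return root.findall(f".//div[@class='{wrapper_class}']")

items = split_into_items(PAGE, "post")
titles = [item.findtext("h2") for item in items]
print(len(items), titles)
```

Because the wrapper matches each `post` block separately, every title ends up attached to its own item.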

Point & click

  1. Click on one of the selectors on the right (e.g. “Wrapper”)
  2. It will turn orange. This means that it is waiting for you to click on the corresponding element on the page you are interested in (on the left).
  3. Move the mouse over the site page: orange boxes will appear.
  4. Click on the corresponding box.

First and second level?

As you may have noticed, the selectors on the right are divided into two groups: the first and the second level.

  • First level: This is the first page, the URL link you provided when you created the robot. Often, the first level displays a list of several articles.
  • Second level: This is the page you reach when you click on “read more” or on the title of an article listed at the first level. It is the page dedicated to the article, with its full content.

The Scrapping bot will therefore collect some of the elements of a “future Cikisi article” at the first level, on the page you indicated. It will then pick up the rest at the second level by following the article link to the full article.
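
The two-step collection can be sketched in a few lines of Python. This is an illustration under stated assumptions, not Cikisi's implementation: the site is a hypothetical in-memory dictionary (so the sketch runs offline), and the class names `post` and `content` are invented for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical site, stored in memory so that the sketch runs offline.
SITE = {
    "/press": """<html><body>
        <div class="post"><h2>Launch</h2><a href="/press/launch">Read more</a></div>
        <div class="post"><h2>Results</h2><a href="/press/results">Read more</a></div>
    </body></html>""",
    "/press/launch": "<html><body><div class='content'>We launched today.</div></body></html>",
    "/press/results": "<html><body><div class='content'>Revenue grew.</div></body></html>",
}

def collect(start_url):
    """Collect title and link at the first level, then content at the second."""
    articles = []
    first_level = ET.fromstring(SITE[start_url])
    for wrapper in first_level.findall(".//div[@class='post']"):
        link = wrapper.find("a").get("href")
        # Second level: follow the link to the page dedicated to the article.
        second_level = ET.fromstring(SITE[link])
        content = second_level.findtext(".//div[@class='content']")
        articles.append(
            {"title": wrapper.findtext("h2"), "link": link, "content": content}
        )
    return articles

articles = collect("/press")
for article in articles:
    print(article["title"], "->", article["link"])
```

Note how the link found at the first level is the only way to reach the second level: without it, the content could not be collected.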

Which selector corresponds to what element?

As we have already mentioned, a Cikisi article has several elements. Each element corresponds to a selector.

Wrapper: The envelope of a single article. Allows the robot to distinguish one post from another.
Title: The title of the article (it will also serve as the title in Cikisi).
Link: The URL link to read the full article. In Cikisi, when you click on the title of an item, you are redirected to the original page. Behind the title are actually the “title” and “link” selectors. If this selector is not filled in, your Cikisi articles will not redirect to any particular content other than the Scrapping bot’s starting page.
Description: The description of an item, present at the first level. If a Cikisi article does not have one, we create one based on the beginning of the article.
Image: The illustrative image.
Date: The date the article was published. If it is not displayed, we will date the article as “today”.
Pagination: The navigation system used to turn the pages of results or display more articles at the first level. You can teach your Scrapping bot to turn pages and find archives or old articles. Simply show it the area where it can find the page numbers or the “load more” button (select an area that includes all the page numbers) and tell the robot the format (numeric “page 1, 2, 3…”, a simple “view more” or a “scroll down”).

Content: The rest of the article, found at the second level.
PDF: A button to download a PDF document. The document will be indexed with your article in Cikisi.
Author: The author of the post.
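
The two fallbacks mentioned above (a missing description and a missing date) can be pictured with the following Python sketch. It is only an approximation of the behaviour described: the function name `apply_fallbacks`, the dictionary keys and the 200-character limit are assumptions made for the example, not documented Cikisi values.

```python
from datetime import date

def apply_fallbacks(article, today=None, max_chars=200):
    """Fill in a missing description or date on a collected article."""
    today = today or date.today()
    filled = dict(article)
    if not filled.get("description"):
        # No description at the first level: reuse the beginning of the content.
        # The max_chars limit is an assumption made for this sketch.
        filled["description"] = filled.get("content", "")[:max_chars]
    if not filled.get("date"):
        # No date displayed on the page: fall back to the collection day.
        filled["date"] = today
    return filled

raw = {"title": "Launch", "content": "We launched today. More details follow."}
print(apply_fallbacks(raw, max_chars=18))
```

An article that already has a description and a date passes through unchanged.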

Move the selectors

Is the date present on the second level rather than the first? You can move a selector to the other level by clicking on it, holding the button down and dragging it there.

Do you speak HTML? Our robot too!

You can manually enter your selectors in the blue bar of the “edit selectors” menu, or by clicking on “edit selectors manually” in the previous menu.

The following tutorial shows you how to find your own selectors and explains the rules for writing them.
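
Ahead of that tutorial, here is a rough idea of what a hand-written selector expresses. The sketch assumes CSS-style selectors and handles only a tiny subset (`tag`, `.class` and `tag.class`). The `select` helper is hypothetical and written for this example; real selectors can be far richer.

```python
import xml.etree.ElementTree as ET

def select(root, selector):
    """Match a very small subset of CSS: 'tag', '.class' or 'tag.class'."""
    tag, _, css_class = selector.partition(".")
    matches = []
    for element in root.iter():
        classes = element.get("class", "").split()
        if tag and element.tag != tag:
            continue
        if css_class and css_class not in classes:
            continue
        matches.append(element)
    return matches

# Hypothetical page fragment: one article and one advertisement.
root = ET.fromstring(
    "<body><div class='post featured'><h2 class='title'>Launch</h2></div>"
    "<div class='ad'><h2>Buy now</h2></div></body>"
)
print([element.text for element in select(root, "h2.title")])
print(len(select(root, "div.post")))
```

Here `h2.title` reads as “an `h2` element carrying the class `title`”, which is why the advertisement heading is not picked up.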

Classic mistakes with the Scrapping bot

  1. The wrapper must be placed on a single article! A common beginner’s mistake is to place one large wrapper around all the articles. 1 wrapper = 1 item.
  2. A wrapper must contain only articles, not an area that has nothing to do with them. Click again on the wrapper selector in the right panel and check that only the content of interest is highlighted in blue. If noise (such as an ad) is also captured, it will generate an error.
  3. One of the wrappers has no title. A Cikisi article always needs a title!
  4. The first level is collected correctly but the full article is missing? Your “link” selector is probably faulty, so the Scrapping bot cannot reach the full content (second level). Click on the link of a Cikisi article (item view). Are you redirected to the full original article? If not, your link is wrong.
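
These checks can be pictured with a short Python sketch. It is not a Cikisi feature: the `check_wrappers` helper, its messages and the sample page are hypothetical, and it assumes titles are `h2` elements and links are `a` elements.

```python
import xml.etree.ElementTree as ET

# Hypothetical listing page: two articles and one advertisement block.
PAGE = """<html><body><div id="list">
  <div class="post"><h2>Launch</h2><a href="/1">Read more</a></div>
  <div class="post"><h2>Results</h2><a href="/2">Read more</a></div>
  <div class="post"><p>Advertisement</p></div>
</div></body></html>"""

def check_wrappers(page, wrapper_path, title_tag="h2", link_tag="a"):
    """Return a list of human-readable problems found with the wrapper choice."""
    wrappers = ET.fromstring(page).findall(wrapper_path)
    problems = []
    for index, wrapper in enumerate(wrappers, start=1):
        titles = wrapper.findall(f".//{title_tag}")
        if len(titles) > 1:
            problems.append(f"wrapper {index} contains {len(titles)} titles: it is too large")
        elif not titles:
            problems.append(f"wrapper {index} has no title")
        if wrapper.find(f".//{link_tag}") is None:
            problems.append(f"wrapper {index} has no link: the second level is unreachable")
    return problems

# Mistake 1: a single large wrapper around the whole list.
print(check_wrappers(PAGE, ".//div[@id='list']"))
# Mistakes 2 to 4: a wrapper that also captures noise without title or link.
print(check_wrappers(PAGE, ".//div[@class='post']"))
```

An empty list of problems would mean that every wrapper holds exactly one title and a link to follow.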

Revision: 5
