The Website crawler is:

  • easy to set up.
  • clever (requires little or no configuration).
  • not always perfect in its collection.
  • ideal for rapidly building up a large volume of sources.

Initialization crawl

A few collection errors may occur, such as a wrong article publication date. For example, if several dates are present on the article, the robot won’t necessarily pick the right one. What’s more, not all pages on a site are dated. The robot will date undated pages as “today”.

This can lead to a phenomenon known as the “Initialization Crawl”.

Just after you create this kind of robot, many articles will be created and dated today. They will then all arrive at once in your theme results. This phenomenon is temporary and only affects the site’s archives: new pages published over the following days will be dated within six hours of their original publication.
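
To illustrate why the archives all arrive at once, here is a minimal sketch of the dating behaviour described above. It is not the product’s actual logic: the function `assign_date`, the sample pages and the crawl day are all hypothetical, and the rule “take the first date found, otherwise use today” is an assumption made for the example.

```python
from datetime import date

def assign_date(found_dates, today):
    """Return the date a crawler might give an article (assumed rule).

    If the page carries one or more dates, take the first one found,
    which is not necessarily the right one. If the page carries no
    date at all, fall back to the day of the crawl.
    """
    if found_dates:
        return found_dates[0]
    return today

today = date(2024, 5, 1)  # hypothetical day of the first crawl

# Hypothetical pages and the dates found on each of them.
pages = {
    "/archive/old-story": [],                   # undated archive page
    "/archive/older-story": [],                 # undated archive page
    "/news/dated-story": [date(2023, 11, 20)],  # dated page
}

dated = {url: assign_date(found, today) for url, found in pages.items()}

# Undated archive pages all end up dated "today" and arrive together.
arriving_today = [url for url, d in dated.items() if d == today]
print(arriving_today)
```

In this sketch, the two undated archive pages are dated to the crawl day, while the dated page keeps its own date.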

How the crawler works

The Website crawler goes to the URL you’ve entered. It then moves from link to link and from page to page, clicking on every button it encounters.

This robot will try to create one article per page encountered.
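
The link-to-link behaviour described above can be sketched as a simple breadth-first traversal. This is only an illustration under stated assumptions: the `SITE` dictionary is a hypothetical in-memory stand-in for a real website, `crawl` is not the product’s implementation, and no network access or button clicking is performed.

```python
from collections import deque

# Hypothetical site: each page maps to the links found on that page.
SITE = {
    "/": ["/topnews/", "/about"],
    "/topnews/": ["/topnews/story-1", "/topnews/story-2", "/"],
    "/topnews/story-1": ["/topnews/"],
    "/topnews/story-2": ["/about"],
    "/about": [],
}

def crawl(start, links_of):
    """Visit every page reachable from `start`, once each.

    Returns the visited pages in visiting order; each visited page
    stands for one article the robot would try to create.
    """
    seen = {start}
    queue = deque([start])
    articles = []
    while queue:
        page = queue.popleft()
        articles.append(page)          # one article per page encountered
        for link in links_of(page):
            if link not in seen:       # never visit the same page twice
                seen.add(link)
                queue.append(link)
    return articles

articles = crawl("/", lambda page: SITE.get(page, []))
print(articles)
```

Starting from the entered URL, the sketch reaches all five pages of the hypothetical site and produces one entry per page.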

Create a Website crawler

  1. Create source > enter the desired URL and verify > “I would like to create my crawling bot now” + Create.
  2. Fill in the form (only the name is mandatory).
  3. Click on “Create”.

This clever robot doesn’t really need any settings.

You can, however, filter the collection with the “must include” and “add block” fields.

Paste URL parts into these fields to force the crawler to collect only pages with that part in their URL (“must include”), or to ignore those pages altogether (“add block”).

Example: https://www.floorcoveringweekly.com/
On this site, we want to collect all articles with /topnews/ in their URL.

So you can add /topnews/ to the “must include” field to force the robot to collect only articles.

Don’t forget to include the surrounding slashes “/ /” when you paste this URL part (see example above).
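
The effect of the two filters, and the reason the slashes matter, can be sketched as follows. This assumes the filters work as plain substring matches on the URL, which the article does not confirm. The function `keep_url` and every URL path after the domain are hypothetical.

```python
def keep_url(url, must_include=(), block=()):
    """Decide whether a URL would be collected (assumed substring rule).

    - A URL containing any `block` part is ignored.
    - If `must_include` is set, the URL must contain at least one part.
    """
    if any(part in url for part in block):
        return False
    if must_include and not any(part in url for part in must_include):
        return False
    return True

BASE = "https://www.floorcoveringweekly.com"

# Hypothetical paths, for illustration only.
urls = [
    BASE + "/topnews/example-article",
    BASE + "/about",
    BASE + "/tag/topnewsletter",
]

# With the slashes, only the article URL is kept.
kept = [u for u in urls if keep_url(u, must_include=["/topnews/"])]
print(kept)

# Without the slashes, an unrelated URL also matches.
kept_loose = [u for u in urls if keep_url(u, must_include=["topnews"])]
print(kept_loose)
```

Under this assumption, pasting “topnews” without its slashes would also let through the hypothetical /tag/topnewsletter page, which is why the slashes should be kept.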

Revision: 2
