Package Data | |
---|---|
Maintainer Username: | Sukohi |
Maintainer Contact: | capilano.sukohi@gmail.com (Sukohi) |
Package Create Date: | 2017-02-15 |
Package Last Update: | 2017-02-15 |
Language: | PHP |
License: | MIT |
Last Refreshed: | 2024-11-18 03:03:05 |
Package Statistics | |
---|---|
Total Downloads: | 12 |
Monthly Downloads: | 0 |
Daily Downloads: | 0 |
Total Stars: | 2 |
Total Watchers: | 3 |
Total Forks: | 0 |
Total Open Issues: | 0 |
A Laravel package for crawling websites (Laravel 5+).
Run the following command:
composer require sukohi/search-bot:1.*
Register the service providers in config/app.php:
'providers' => [
...Others...,
Sukohi\SearchBot\SearchBotServiceProvider::class,
Sukohi\LaravelAbsoluteUrl\LaravelAbsoluteUrlServiceProvider::class,
]
Also add the aliases:
'aliases' => [
...Others...,
'LaravelAbsoluteUrl' => Sukohi\LaravelAbsoluteUrl\Facades\LaravelAbsoluteUrl::class,
'SearchBot' => Sukohi\SearchBot\Facades\SearchBot::class,
]
Then run the following commands:
php artisan vendor:publish
php artisan migrate
You now have config/search_bot.php, where you can set domain restrictions:
return [
'main' => '*',
'yahoo' => ['yahoo.com', 'www.yahoo.com'],
'reddit' => ['www.reddit.com']
];
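To illustrate how such a restriction list can be read, here is a minimal sketch. It is not the package's actual implementation: the function name `isAllowedDomain` is hypothetical, and treating `'*'` as "allow every host" is an assumption based on the `main` entry above.

```php
<?php
// Hypothetical helper, not part of sukohi/search-bot.
// Assumes '*' means "no restriction" and an array lists the allowed hosts.
function isAllowedDomain(array $config, string $type, string $url): bool
{
    if (!array_key_exists($type, $config)) {
        return false; // unknown type: nothing is allowed in this sketch
    }

    $allowed = $config[$type];

    if ($allowed === '*') {
        return true;
    }

    $host = parse_url($url, PHP_URL_HOST);

    if (!is_string($host)) {
        return false; // malformed URL or no host component
    }

    return in_array(strtolower($host), (array) $allowed, true);
}

$config = [
    'main' => '*',
    'yahoo' => ['yahoo.com', 'www.yahoo.com'],
    'reddit' => ['www.reddit.com'],
];

var_dump(isAllowedDomain($config, 'yahoo', 'http://yahoo.com/news'));  // bool(true)
var_dump(isAllowedDomain($config, 'reddit', 'http://yahoo.com/news')); // bool(false)
```

The host comparison here is exact, which is why the sample configuration lists both `yahoo.com` and `www.yahoo.com`.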
$starting_url = 'http://yahoo.com';
$options = [
    'type' => 'main',       // Optional (default: 'main')
    'url_deletion' => true  // Optional (default: true)
];
$result = \SearchBot::request($starting_url, $options);

if($result->exists()) {

    // Symfony\Component\BrowserKit\Response
    // See http://api.symfony.com/2.3/Symfony/Component/BrowserKit/Response.html
    $response = $result->response();

    // Symfony\Component\DomCrawler\Crawler
    // See http://api.symfony.com/2.3/Symfony/Component/DomCrawler/Crawler.html
    $crawler = $result->crawler();

    $result->links(function($url, $text){

        // Every link on the page arrives here with its URL and text.

    });

    $result->queues(function($crawler_queue, $url, $text){

        // Only links that do not yet exist in the DB arrive here.
        // $crawler_queue already has its type and url set.
        $crawler_queue->save();

    });

} else {

    // The request failed: inspect the exception and the request details.
    $e = $result->exception();
    echo $e->getMessage();

    $type = $result->type();
    $url = $result->url();

}
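The callback passed to `links()` receives each link's URL and text, so ordinary PHP can be used to decide what to keep. The sketch below is only an illustration: `$collect` and `$found` are hypothetical names, and the callback is invoked directly with sample data so that it runs without the package. In real use it would presumably be passed as `$result->links($collect)`.

```php
<?php
// Hypothetical callback with the ($url, $text) signature shown above.
$found = [];

$collect = function ($url, $text) use (&$found) {
    $scheme = parse_url($url, PHP_URL_SCHEME);

    // Keep only http/https links; skip mailto:, javascript:, and so on.
    if (!is_string($scheme) || !in_array(strtolower($scheme), ['http', 'https'], true)) {
        return;
    }

    // Index by URL so duplicates collapse into one entry.
    $found[$url] = trim($text);
};

// Sample data standing in for links found on a page.
$collect('http://yahoo.com/news', ' News ');
$collect('mailto:someone@example.com', 'Mail');
$collect('http://yahoo.com/news', 'News again');

print_r(array_keys($found));
```

Indexing by URL means a link that appears several times on a page is kept once, with the text of its last occurrence.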
type
The type is a string of your choosing. Default is main.
url_deletion
If true, the accessed URL is removed from the DB. Default is true.
This package is licensed under the MIT License.
Copyright 2017 Sukohi Kuhoh