Block The Baidu Spider From My Site?
Hello... does anyone know how I can block the Baidu spider from visiting my website? I've tried a robots.txt but that hasn't seemed to work.
Re: Block The Baidu Spider From My Site?
You can do this, but it will still make server requests - just not index anything:
Code:
SetEnvIfNoCase User-Agent "^Baidu" bad_bot
Code:
RewriteEngine On
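On their own those two fragments don't refuse anything; here is a minimal sketch of how each is usually completed in httpd.conf or a .htaccess file (the bad_bot variable name comes from the snippet above - use one approach or the other, not both):
Code:
# Approach 1 (mod_setenvif): tag Baidu requests, then deny anything tagged
SetEnvIfNoCase User-Agent "^Baidu" bad_bot
<Limit GET POST HEAD>
    Order Allow,Deny
    Allow from all
    Deny from env=bad_bot
</Limit>

# Approach 2 (mod_rewrite): match the User-Agent and return 403 Forbidden
RewriteEngine On
RewriteCond %{HTTP_USER_AGENT} ^Baidu [NC]
RewriteRule .* - [F,L]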
Re: Block The Baidu Spider From My Site?
Why do you want to block it?
Re: Block The Baidu Spider From My Site?
I want to block it because my site is on my home server and it visits way too often, eating my bandwidth.
Acathla, where do I put those?
Re: Block The Baidu Spider From My Site?
Quote:
I'm not sure why anyone would want to be on a Chinese search engine anyway?
Re: Block The Baidu Spider From My Site?
I've realised it ignores the robots.txt...
Code:
User-agent: BaiDuSpider
Disallow: /

User-agent: Baidu
Disallow: /

User-agent: Baiduspider+(+http://www.baidu.com/search/spider.htm)
Disallow: /

User-agent: Baiduspider+(+http://www.baidu.com/search/spider_jp.html)
Disallow: /
The only annoying thing is that it visits every 5-10 mins, and it's eating my bandwidth. I can't block it by IP either because it has loads of IPs.
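(For reference, robots.txt matches on the crawler's name token rather than its full User-Agent string, so the last two entries above would never match anything; the canonical form is just the token. That said, if the bot is ignoring robots.txt entirely, this won't help either.)
Code:
User-agent: Baiduspider
Disallow: /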
Re: Block The Baidu Spider From My Site?
OK, I guess I should really have asked about your hosting/server setup, as this is for Apache.
It goes into your httpd.conf. Let me know what you can configure and I'll see if I can help.
Re: Block The Baidu Spider From My Site?
I use Abyss.
Re: Block The Baidu Spider From My Site?
Quote:
Unless you have some reason to want Chinese visitors, that'll keep a lot of the spiders out.
Re: Block The Baidu Spider From My Site?
I can't see how one spider can eat all your bandwidth. I used to host my site on half the bandwidth you have, and I had the Google spider and loads of other spiders crawling it all the time; it being a forum, there were tens of thousands of pages to crawl. They used barely anything - a spider is just like a normal visitor.
The only reason it might use a lot is if it has no request delay like the other crawlers do. Google's default is 5 seconds per page, so if it's requesting more than one page per second it could use noticeably more. As said above, the easiest way to block it if it's ignoring the robots file is to block the IP range.
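If you do go the IP-range route on an Apache setup, here is a minimal sketch (the CIDR block below is purely illustrative - the ranges Baidu actually crawls from need to be looked up and will change over time):
Code:
# Deny an entire address range; substitute the ranges the bot really uses
<Limit GET POST HEAD>
    Order Allow,Deny
    Allow from all
    Deny from 180.76.0.0/16
</Limit>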