Block The Baidu Spider From My Site?
08-06-2009, 22:52
|
#1
|
|
Inactive
Join Date: Jul 2007
Posts: 1,480
|
Block The Baidu Spider From My Site?
Hello... does anyone know how I can block the Baidu spider from visiting my website? I've tried a robots.txt but that doesn't seem to have worked.
|
|
|
08-06-2009, 23:26
|
#2
|
|
Inactive
Join Date: Jun 2003
Posts: 1,354
|
Re: Block The Baidu Spider From My Site?
You can do this, but the spider will still make requests to your server; they will just be refused, so it won't be able to index anything:
Code:
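# Flag any request whose User-Agent starts with "Baidu" (case-insensitive)
SetEnvIfNoCase User-Agent "^Baidu" bad_bot
<Directory />
    # Apache 2.2 syntax: allow everyone, then deny the flagged requests
    Order Allow,Deny
    Allow from all
    Deny from env=bad_bot
</Directory>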
You could probably also do it with mod_rewrite, something like:
Code:
RewriteEngine On
# Answer 403 Forbidden when the User-Agent starts with "baiduspider" (NC = ignore case)
RewriteCond %{HTTP_USER_AGENT} ^baiduspider [NC]
RewriteRule .* - [F]
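If you want to sanity-check those two patterns before touching the server config, something like the Python sketch below might help. It is only an approximation: Python's `re` module is not the regex engine Apache uses, although simple anchored patterns like these should behave the same, and the sample User-Agent strings are assumptions for illustration (the first is based on the one quoted later in this thread).

```python
import re

# Patterns copied from the two Apache snippets above; both ignore case.
SETENVIF_PATTERN = re.compile(r"^Baidu", re.IGNORECASE)
REWRITE_PATTERN = re.compile(r"^baiduspider", re.IGNORECASE)

# Sample User-Agent strings (assumed for illustration, not taken from server logs).
SAMPLE_AGENTS = [
    "Baiduspider+(+http://www.baidu.com/search/spider.htm)",
    "Mozilla/5.0 (compatible; Baiduspider/2.0; +http://www.baidu.com/search/spider.html)",
    "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)",
]


def is_blocked(user_agent, pattern):
    """Return True if the pattern matches anywhere it is allowed to match."""
    return pattern.search(user_agent) is not None


if __name__ == "__main__":
    for agent in SAMPLE_AGENTS:
        print(
            is_blocked(agent, SETENVIF_PATTERN),
            is_blocked(agent, REWRITE_PATTERN),
            agent,
        )
```

Note the `^` anchor: a User-Agent that begins with "Mozilla/5.0 (compatible; Baiduspider..." would slip past both rules, so dropping the `^` may be safer if the spider ever identifies itself that way.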
|
|
|
09-06-2009, 01:55
|
#3
|
|
Dr Pepper Addict
Cable Forum Admin
Join Date: Oct 2003
Location: Nottingham
Age: 63
Services: IDNet FTTP (1000M), Sky Q TV, Sky Mobile, Flextel SIP
Posts: 30,574
|
Re: Block The Baidu Spider From My Site?
Why do you want to block it?
__________________
Baby, I was born this way.
|
|
|
09-06-2009, 17:23
|
#4
|
|
Inactive
Join Date: Jul 2007
Posts: 1,480
|
Re: Block The Baidu Spider From My Site?
I want to block it because my site is on my home server and it visits way too often, eating my bandwidth.
Acathla, where do I put those?
|
|
|
09-06-2009, 19:03
|
#5
|
|
Inactive
Join Date: Dec 2006
Location: Lincoln UK
Age: 77
Services: 50Mb, TV & Phone
Posts: 3,673
|
Re: Block The Baidu Spider From My Site?
Quote:
Originally Posted by Callumpy
I want to block it because my site is on my home server and it visits way too often, eating my bandwidth.
Acathla, where do I put those?
|
Apparently it also ignores the robots.txt file so you may find it indexing bits you would rather keep private.
I'm not sure why anyone would want to be on a Chinese search engine anyway?
|
|
|
09-06-2009, 22:27
|
#6
|
|
Inactive
Join Date: Jul 2007
Posts: 1,480
|
Re: Block The Baidu Spider From My Site?
I've realised it ignores the robots.txt... this is what I have in it:
User-agent: BaiDuSpider
Disallow: /
User-agent: Baidu
Disallow: /
User-agent: Baiduspider+(+ http://www.baidu.com/search/spider.htm)
Disallow: /
User-agent: Baiduspider+(+ http://www.baidu.com/search/spider_jp.html)
Disallow: /
The only annoying thing is that it visits about every 5 - 10 mins. It's eating my bandwidth.
I can't block it by IP either, because it has loads of IPs.
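One thing that can at least be ruled out is a syntax problem in the file itself. The sketch below uses Python's standard `urllib.robotparser` to check that a standards-following parser would treat the first two entries above as a full block for a crawler calling itself "Baiduspider". The helper name `allowed` and the "Googlebot" comparison are assumptions for illustration, and the check says nothing about whether Baidu's crawler actually obeys the file.

```python
from urllib.robotparser import RobotFileParser

# The first two entries of the robots.txt quoted above.
ROBOTS_TXT = """\
User-agent: BaiDuSpider
Disallow: /
User-agent: Baidu
Disallow: /
"""


def allowed(robots_txt, user_agent, url="/"):
    """Return True if a compliant crawler with this name may fetch the URL."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    print("Baiduspider allowed:", allowed(ROBOTS_TXT, "Baiduspider"))
    print("Googlebot allowed:", allowed(ROBOTS_TXT, "Googlebot"))
```

If the file parses as a block and the spider keeps coming, the problem is on the crawler's side rather than in the robots.txt, and a server-side block is the remaining option.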
|
|
|
09-06-2009, 22:45
|
#7
|
|
Inactive
Join Date: Jun 2003
Posts: 1,354
|
Re: Block The Baidu Spider From My Site?
OK, I guess I should really have asked about your hosting / server setup first, as this is for Apache.
Those directives go into your httpd.conf.
Let me know what you can configure and I'll see if I can help.
|
|
|
10-06-2009, 19:02
|
#8
|
|
Inactive
Join Date: Jul 2007
Posts: 1,480
|
Re: Block The Baidu Spider From My Site?
I use Abyss.
|
|
|
10-06-2009, 20:38
|
#9
|
|
Inactive
Join Date: Dec 2006
Location: Lincoln UK
Age: 77
Services: 50Mb, TV & Phone
Posts: 3,673
|
Re: Block The Baidu Spider From My Site?
Quote:
Originally Posted by Callumpy
The only annoying thing is that it visits about every 5 - 10 mins. It's eating my bandwidth.
I can't block it by IP either, because it has loads of IPs.
|
Block the whole 60.28.*.* range.
Unless you have some reason to want Chinese visitors, that'll keep a lot of the spiders out.
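For anyone whose server or firewall wants the range in CIDR form, 60.28.*.* is 60.28.0.0/16. A minimal Python sketch of the membership test follows; the function name `in_blocked_range` and the sample addresses are made up for illustration, and whether this one range covers every address the spider uses is not something the sketch can tell you.

```python
import ipaddress

# 60.28.*.* from the post above, written as a CIDR block.
BLOCKED_NETWORK = ipaddress.ip_network("60.28.0.0/16")


def in_blocked_range(client_ip):
    """Return True if the client address falls inside the blocked range."""
    return ipaddress.ip_address(client_ip) in BLOCKED_NETWORK


if __name__ == "__main__":
    # Hypothetical client addresses, not taken from any real log.
    for ip in ("60.28.17.5", "60.29.0.1", "192.0.2.10"):
        print(ip, in_blocked_range(ip))
```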
|
|
|
11-06-2009, 14:14
|
#10
|
|
Inactive
Join Date: Oct 2005
Location: Merseyside
Age: 37
Services: BT Infinity Option 2, HH5, synced at maximum 80Mbps/20Mbps.
Posts: 2,221
|
Re: Block The Baidu Spider From My Site?
I can't see how one spider can eat all your bandwidth. I used to host my site on half the bandwidth you have, and I had the Google spider and loads of other spiders crawling it all the time; being a forum, it had tens of thousands of pages to crawl. They used barely anything, as each request is just like a normal visitor.
The only reason it might use lots is if it has no request delay like the other crawlers do. As far as I know, Google's default is about 5 seconds/page, so if it's doing more than 1 request/second it might use more.
As said above, the easiest way to block it, if it is ignoring the robots file, is to block the IP range.
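To put rough numbers on that, here is a small Python sketch of the arithmetic. The 50 KB average response size is purely an assumed figure for illustration (check your own logs for the real one); the request intervals are the "every 5 mins" mentioned earlier in the thread and the 1 request/second worst case.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def daily_megabytes(seconds_between_requests, avg_response_kb):
    """Estimate the traffic per day, in MB (1 MB = 1000 KB), for one crawler."""
    requests_per_day = SECONDS_PER_DAY / seconds_between_requests
    return requests_per_day * avg_response_kb / 1000


if __name__ == "__main__":
    ASSUMED_RESPONSE_KB = 50  # hypothetical average page size
    print("one request every 5 minutes:", daily_megabytes(300, ASSUMED_RESPONSE_KB), "MB/day")
    print("one request every second:", daily_megabytes(1, ASSUMED_RESPONSE_KB), "MB/day")
```

Under that assumption, a visit every 5 minutes is small change, while a crawler with no delay at all is a very different story, which is why the request rate matters more than the number of spiders.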
|
|
|