Cable Forum


Callumpy 08-06-2009 22:52

Block The Baidu Spider From My Site?
 
Hello... does anyone know how I can block the Baidu spider from visiting my website? I've tried a robots.txt but that doesn't seem to have worked.

Acathla 08-06-2009 23:26

Re: Block The Baidu Spider From My Site?
 
You can do this, but the spider will still make requests to your server. They just get refused, so it can't index anything:

Code:

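# Flag any request whose User-Agent starts with "Baidu" (case-insensitive)
SetEnvIfNoCase User-Agent "^Baidu" bad_bot
<Directory />
    # Allow everyone, then deny the requests flagged above (they get a 403)
    Order Allow,Deny
    Allow from all
    Deny from env=bad_bot
</Directory>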

You could probably also do it with mod_rewrite, something like:

Code:

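RewriteEngine On
# Case-insensitive match on user agents that start with "baiduspider"
RewriteCond %{HTTP_USER_AGENT} ^baiduspider [NC]
# Answer every matching request with 403 Forbidden
RewriteRule .* - [F]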
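
To see which user agents the two patterns above would catch, here is a minimal Python sketch. It is only a stand-in for Apache's own matching: the helper name `is_blocked` is made up for illustration, the first sample string is the one quoted later in this thread, and the other two sample strings are assumptions picked to show the effect of the `^` anchor.

```python
import re

# Python stand-ins for the two Apache patterns above. Both are anchored
# at the start of the string and matched case-insensitively.
SETENVIF_PATTERN = re.compile(r"^Baidu", re.IGNORECASE)
REWRITE_PATTERN = re.compile(r"^baiduspider", re.IGNORECASE)


def is_blocked(user_agent: str) -> bool:
    """Return True if either pattern would flag this User-Agent."""
    return bool(
        SETENVIF_PATTERN.search(user_agent)
        or REWRITE_PATTERN.search(user_agent)
    )


# User agent string quoted later in this thread.
THREAD_UA = "Baiduspider+(+http://www.baidu.com/search/spider.htm)"
# Hypothetical string that only names the spider after a prefix.
PREFIXED_UA = "Mozilla/5.0 (compatible; Baiduspider/2.0)"
# Hypothetical ordinary browser string.
BROWSER_UA = "Mozilla/5.0 (Windows NT 10.0; rv:109.0) Gecko/20100101 Firefox/115.0"

if __name__ == "__main__":
    for ua in (THREAD_UA, PREFIXED_UA, BROWSER_UA):
        print(is_blocked(ua), ua)
```

Because both patterns start with `^`, a user agent that only mentions Baiduspider after some prefix would slip through. Dropping the anchor would catch it, at the cost of matching any string that contains the word.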


Paul 09-06-2009 01:55

Re: Block The Baidu Spider From My Site?
 
Why do you want to block it?

Callumpy 09-06-2009 17:23

Re: Block The Baidu Spider From My Site?
 
I want to block it because my site is on my home server and it visits way too often, eating my bandwidth.

Acathla, where do I put those?

Dai 09-06-2009 19:03

Re: Block The Baidu Spider From My Site?
 
Quote:

Originally Posted by Callumpy (Post 34811019)
I want to block it because my site is on my home server and it visits way too often, eating my bandwidth.

Acathla, where do I put those?

Apparently it also ignores the robots.txt file so you may find it indexing bits you would rather keep private.

I'm not sure why anyone would want to be on a Chinese search engine anyway?

Callumpy 09-06-2009 22:27

Re: Block The Baidu Spider From My Site?
 
I've realised it ignores the robots.txt. This is what I have in it:

User-agent: BaiDuSpider
Disallow: /

User-agent: Baidu
Disallow: /

User-agent: Baiduspider+(+http://www.baidu.com/search/spider.htm)
Disallow: /

User-agent: Baiduspider+(+http://www.baidu.com/search/spider_jp.html)
Disallow: /

The only annoying thing is that it visits about every 5 - 10 mins. It's eating my bandwidth.
I can't block it by IP either, because it has loads of IPs.
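
One way to check that the rules themselves are well formed is to feed them to Python's standard `urllib.robotparser`. This is a rough sketch: the helper names and the example.com URL are made up for illustration, and it only shows how a rule-following parser reads the file, not what the Baidu spider actually does with it.

```python
from urllib.robotparser import RobotFileParser

# The robots.txt rules exactly as posted above.
ROBOTS_TXT = """\
User-agent: BaiDuSpider
Disallow: /

User-agent: Baidu
Disallow: /

User-agent: Baiduspider+(+http://www.baidu.com/search/spider.htm)
Disallow: /

User-agent: Baiduspider+(+http://www.baidu.com/search/spider_jp.html)
Disallow: /
"""


def build_parser(text: str) -> RobotFileParser:
    """Parse robots.txt text without fetching anything over the network."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


def allowed(parser: RobotFileParser, user_agent: str,
            url: str = "http://example.com/index.html") -> bool:
    """Return True if a rule-following crawler may fetch the URL."""
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    rules = build_parser(ROBOTS_TXT)
    for ua in ("Baiduspider", "Googlebot"):
        print(ua, "allowed" if allowed(rules, ua) else "disallowed")
```

If a parser like this reports Baiduspider as disallowed and the visits keep coming, the file is not the problem and the block has to happen on the server side.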

Acathla 09-06-2009 22:45

Re: Block The Baidu Spider From My Site?
 
OK, I guess I should really have asked what your hosting / server setup is, as this is for Apache.

Those go into your httpd.conf.

Let me know what you can configure and I'll see if I can help.

Callumpy 10-06-2009 19:02

Re: Block The Baidu Spider From My Site?
 
I use Abyss.

Dai 10-06-2009 20:38

Re: Block The Baidu Spider From My Site?
 
Quote:

Originally Posted by Callumpy (Post 34811271)

The only annoying thing is that it visits about every 5 - 10 mins. It's eating my bandwidth.
I can't block it by IP either, because it has loads of IPs.

Block the whole 60.28.*.* range.
Unless you have some reason to want Chinese visitors, that'll keep a lot of the spiders out.
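
For anyone whose server or firewall wants the range in CIDR form, 60.28.*.* is the same thing as 60.28.0.0/16. A small Python sketch with the standard `ipaddress` module shows the membership check. The helper name `is_blocked_ip` and the sample addresses are made up for illustration, and whether the spider really stays inside this range is not verified here.

```python
import ipaddress

# 60.28.*.* written as a CIDR network: the first 16 bits are fixed.
BLOCKED_RANGE = ipaddress.ip_network("60.28.0.0/16")


def is_blocked_ip(address: str) -> bool:
    """Return True if the address falls inside the blocked range."""
    return ipaddress.ip_address(address) in BLOCKED_RANGE


if __name__ == "__main__":
    print("addresses covered:", BLOCKED_RANGE.num_addresses)
    for ip in ("60.28.22.36", "60.29.0.1", "192.168.1.10"):
        print(ip, "blocked" if is_blocked_ip(ip) else "allowed")
```

A /16 covers 65,536 addresses, so this blocks a lot more than one crawler.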

AbyssUnderground 11-06-2009 14:14

Re: Block The Baidu Spider From My Site?
 
I can't see how one spider can eat all your bandwidth. I used to host my site on half the bandwidth you have, and I had the Google spider and loads of other spiders crawling my site all the time. Being a forum, it had tens of thousands of pages to crawl. It used barely anything. It's just like a normal visitor.

The only reason it might use lots is if it has no request delay like the other crawlers do. Google's default is 5 seconds/page, so if it's doing more than 1 page/second it might use more.

As said above, the easiest way to block it, if it is ignoring the robots file, is to block the IP range.
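
To put rough numbers on the request rates mentioned in this thread, here is a back-of-the-envelope sketch in Python. The 50 KB average page size is purely an assumption for illustration, as are the function names. Only the intervals (one visit every 5 minutes, one page every 5 seconds, one page per second) come from the posts above.

```python
SECONDS_PER_DAY = 24 * 60 * 60

# Assumed average response size in kilobytes (not from the thread).
ASSUMED_PAGE_KB = 50


def requests_per_day(interval_seconds: float) -> float:
    """Number of requests in a day at one request per interval."""
    return SECONDS_PER_DAY / interval_seconds


def megabytes_per_day(interval_seconds: float,
                      page_kb: float = ASSUMED_PAGE_KB) -> float:
    """Daily transfer in MB (1 MB = 1000 KB) at the given request interval."""
    return requests_per_day(interval_seconds) * page_kb / 1000


if __name__ == "__main__":
    for label, interval in (("every 5 minutes", 300),
                            ("every 5 seconds", 5),
                            ("every second", 1)):
        print(f"{label}: {requests_per_day(interval):.0f} requests, "
              f"{megabytes_per_day(interval):.1f} MB per day")
```

Under that assumption, one visit every 5 minutes comes to about 14 MB a day, while one page per second would be over 4 GB a day, so the request interval matters far more than the number of different spiders.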


All times are GMT.
