Welcome to my Blog!
While the main function of this .htaccess entry is to prevent wget site rips, you can use the same approach to flag any user agent as malicious and deny it access to the site. The first line tags any request whose User-Agent header starts with "Wget" (case-insensitive) with the malicious_agent environment variable. The Limit block then denies GET and POST requests from every client carrying that tag.
SetEnvIfNoCase User-Agent "^Wget" malicious_agent
<Limit GET POST>
    Order Allow,Deny
    Allow from all
    Deny from env=malicious_agent
</Limit>
Attempts to rip your site with wget now receive a 403:
root@webserver:/# wget webserver.tld
--2009-04-17 13:46:16--  http://webserver.tld/
Resolving webserver.tld... 127.0.0.1
Connecting to webserver.tld|127.0.0.1|:80... connected.
HTTP request sent, awaiting response... 403 Forbidden
2009-04-17 13:46:16 ERROR 403: Forbidden.
One thing to note, though: not everyone hitting your site with wget or curl is malicious. If you share feeds with other sites, there is a chance they use these agents to retrieve them. But if your site is getting ripped off, by all means do what it takes to prevent it.
It is also worth noting that wget can identify itself as another user agent quite easily, so this effort really only blocks unintelligent automated bots or clowns from ripping your site.
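To see why a spoofed user agent slips through, here is a rough shell sketch of the check the rule performs. It is only an approximation: Apache does the real matching with its own regex engine, and the ua_is_blocked helper and the sample User-Agent strings are made up for illustration.

```shell
# Approximate the .htaccess rule: a case-insensitive match of ^Wget
# against a User-Agent string. Returns 0 (true) if the rule would tag it.
# ua_is_blocked is a hypothetical helper, not part of Apache or wget.
ua_is_blocked() {
    printf '%s\n' "$1" | grep -qiE '^Wget'
}

# Sample User-Agent strings (illustrative values only).
for ua in "Wget/1.11.4" "wget/1.11.4" "Mozilla/5.0 (X11; Linux x86_64)"; do
    if ua_is_blocked "$ua"; then
        echo "blocked: $ua"
    else
        echo "allowed: $ua"
    fi
done

# A client can send the last string using wget's --user-agent option:
#   wget --user-agent="Mozilla/5.0 (X11; Linux x86_64)" http://webserver.tld/
```

The first two strings match and would be denied. The third does not start with "Wget", so a request carrying it would pass straight through the rule.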
There is nothing so easy but that it becomes difficult when you do it
reluctantly.
-- Publius Terentius Afer (Terence)