Spam-proofing "Plans"
Looked at your "View Pending Calendars" lately? You might get a huge surprise! Ads and links for various products and services, legit and non-legit, and, as was our case, a link to a porn website! I felt personally violated when I found that, like someone had entered my house and vandalized it. It started me on a quest to spam-proof our Plans-based web calendar.

I started using that script back in 2002 or 2003 and have been very pleased with its look, ease of use, easy setup and more. Apparently many others have as well. I first noticed the vandalism around the end of February 2008, when updating our calendar with some new and changed events. I immediately went to our logs and found many lines such as:

80.227.1.101 - - [01/Mar/2008:14:49:23 -0500] "GET /cgi-bin/calendar/plans.cgi?active_tab=2&add_edit_cal_action=add HTTP/1.1" 200 29338 "-" "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.8) Gecko/20071008 Firefox/2.0.0.8 RPT-HTTPClient/0.3-3"

And

80.227.1.101 - - [01/Mar/2008:14:49:26 -0500] "POST /cgi-bin/calendar/plans.cgi HTTP/1.1" 200 29738 "-" "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.8) Gecko/20071008 Firefox/2.0.0.8 RPT-HTTPClient/0.3-3"
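Lines like these follow Apache's "combined" log format: client IP, two placeholder fields, timestamp, request line, status code, response size, referrer, and UserAgent. If you would rather let a script pull those fields apart, here is a minimal Python sketch. The names LOG_PATTERN and parse_log_line are my own for illustration, and the pattern assumes the log format has not been customized on your server.

```python
import re

# Assumed layout: Apache "combined" log format.
# IP, identd, user, [time], "METHOD URL PROTOCOL", status, size, "referer", "agent"
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) (?P<ident>\S+) (?P<user>\S+) '
    r'\[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<url>\S+) (?P<protocol>[^"]+)" '
    r'(?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)


def parse_log_line(line):
    """Return a dict of fields for one log line, or None if it does not match."""
    match = LOG_PATTERN.match(line)
    return match.groupdict() if match else None


# The GET request quoted from the logs, split across literals for readability.
sample = (
    '80.227.1.101 - - [01/Mar/2008:14:49:23 -0500] '
    '"GET /cgi-bin/calendar/plans.cgi?active_tab=2&add_edit_cal_action=add HTTP/1.1" '
    '200 29338 "-" "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.8) '
    'Gecko/20071008 Firefox/2.0.0.8 RPT-HTTPClient/0.3-3"'
)

fields = parse_log_line(sample)
print(fields["ip"], fields["method"], fields["url"], fields["status"])
```

A referrer of "-" in the parsed result means the request arrived without any referring page, which turns out to matter later on.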
By way of translation, for those who don't know how to read Apache server logs: those are actually 2 long lines, each beginning with the IP address of the spammer's computer, in this case 80.227.1.101, which comes from an ISP in Dubai, United Arab Emirates (of all places). After the 2 dashes is the date/time string of the request. Then comes GET, which tells the server, "retrieve this file (or URL) and send it to the requesting computer", followed by the requested URL, /cgi-bin/calendar/plans.cgi?active_tab=2&add_edit_cal_action=add. This URL is a form in the Plans calendar that posts a requested "New Calendar" to another page to await approval from the administrator. The next line contains POST, which means that form was filled out and submitted and is now viewable, in older versions of Plans, on the "View Pending Calendars" page.

In those older versions of Plans this is a huge hole in spam security which has become rather well known (notice how many sites in this Google search have been hit!), and robots will hammer your website attempting to spam your calendar. In the later versions (7.9.x) you can set a password on that form so that the requested calendar cannot be submitted without that password; more on that below.

To make a long 2-week story a little shorter, and to get to the meat of what you need to do (which is, after all, probably why you're here), I stumbled upon the following things to do to both spam-proof your calendar and cut down on bandwidth at the same time, since each request probably sends anywhere from 30 to 100 KB of information and our calendar has been getting hit with anywhere from 150 to 300 similar requests per day.

Spam-proofing steps
If you ever need to legitimately add a calendar, open your .htaccess file and comment out the 2nd and 3rd lines so they aren't executed:

RewriteEngine On
# RewriteCond %{QUERY_STRING} action=add
# RewriteRule .* - [F]
Upload .htaccess, change the permissions of (CHMOD) /data/new_calendars.xml to 644 (-rw-r--r--), add your calendar(s), uncomment the lines, upload .htaccess again, CHMOD /data/new_calendars.xml back to 444, and you're protected again.

Update: May 2, 2008

For most of the last couple of months the IPs these attempts were coming from were fairly random. During that time I had a theory that they were coming from "internet cafes", notorious for their computers being infected with all sorts of viruses, malware, keyloggers, etc. However, in the last couple of weeks the vast majority of attempts have come from just a few IPs traceable back to even fewer web servers/hosts. One, velcom.com from Canada, I must give props to. They put up with me and a couple of rants very professionally for 48 hours and resolved an issue with a customer running one of these bots or scripts on one of their IPs. So if you had been seeing spam attempts from 206.53.51.50 but no longer do, you can thank Julia, John, Kate and others from velcom.com. They run a class act.

The battle against others continues, however. After almost 2 months of having the above code in our .htaccess file, the frequency of attempts has not abated. As mentioned in the above paragraph, the number of IPs seems to have greatly decreased to only 7 or 8. The majority of them can be traced back to svservers.com. I have sent a complaint to them using their abuse form and nothing has happened. This "company" seems to be from Russia, judging by the Cyrillic alphabet used on their website. There is confusing information from various whois and traceroute data on where they are and who provides hosting/bandwidth to them. According to traceroute and IP whois information they seem to be a reseller for one company (possibly ezzi.net?) but a domain name whois indicates another (hopone.net?). Yet another company (layeredtech.com) shows up in other IP whois information. So there is a slight (very slight, bordering on remote) possibility that these other companies are resellers for svservers.com. As I said, all very confusing.

Update: May 4, 2008

It appears to be over! Woohoo!!! After not receiving a response to my complaint to svservers.com I redirected all requests from their IPs to their abuse form. Yes, it's a rather evil thing to do and I won't show how to do it here (hint: Google is your friend...), but I was more than a little perturbed.

I checked our logs for this morning and noticed that spam requests stopped right after 1:00 AM Eastern time. There were 7 requests in the midnight to 1:00 hour, then there was one at 1:05 and one at 1:12. I was trying to figure out what happened when I noticed that an IP similar to 2 of the spamming IPs hit our main page between the last 2 requests. Then I noticed the UserAgent string: "Opera/9.25 (Windows NT 5.1; U; ru)". "ru" indicates the Russian language version of Opera. If it had been an English language version the last 2 letters would have been "en". So I traced the IP and, lo and behold, it landed on svservers.com.

So, I don't know whether it was the complaint, the redirecting of 200-300 hits a day, or a combination thereof. But somebody took care of it right after 1:12 AM, May 4. My thanks to this unseen and unknown person.

Update: May 27, 2008

Over the last 3+ weeks there has been about 1 attempt a day to spam our calendar. These attempts are again coming from random IPs. One attempt and gone. They all have this same pattern so it's got to be a robot or bot-net.
An edited-for-clarity excerpt from our logs for one such attempt:

GET /cgi-bin/calendar/plans.cgi?active_tab=2
GET /cgi-bin/calendar/plans.cgi?active_tab=2&add_edit_cal_action=add (they get a "403" here)
GET /cgi-bin/calendar/plans.cgi?active_tab=2&add_edit_cal_action=view_pending
GET /cgi-bin/calendar/plans.cgi?active_tab=0
GET /cgi-bin/calendar/plans.cgi?active_tab=1
POST /cgi-bin/calendar/plans.cgi
GET /cgi-bin/calendar/plans.cgi
GET /cgi-bin/calendar/plans.cgi?active_tab=2
GET /cgi-bin/calendar/plans.cgi?active_tab=1
GET /cgi-bin/calendar/ (they get a "403" here)
GET /cgi-bin/calendar/plans.cgi?active_tab=0
GET /calendar/spam-proof.htm (I find this hilarious since that URL is this page!)

Additionally, all of these spamming attempts have the same UserAgent string:

The other commonality among them is that they go through all those hits without "referral pages". In other words, they go from "active_tab=2" to "add_edit_cal_action=add" without the former being the referring URL for the latter. The only exception to this is the POST request. That does have "active_tab=1" as the referring URL. Notice also that they are now trying to spam the Add/Edit Events section (tab 1). They fail because of the password requirement but they still try.

My first thought was banning by UserAgent since I had only seen that string used in these attempts. I actually tried to do that but apparently didn't write the .htaccess code correctly since it threw 500 Internal Server Errors. I probably wasn't "escaping" the correct characters. But then today I saw another visitor to our site with the exact same UserAgent string, traced it back, and found it was from a school that one of our Scouts attends. Since school and the library are the only places this Scout has access to the internet, I can't ban by that UserAgent now. Back to the drawing board...

Some more research and a little "thinking outside the box" led to some "anti-leeching" or "anti-theft" code. This is normally used to keep other sites from "hot-linking" to content (usually images, videos, or music) but I thought it just might work for our calendar. Here's the code:

SetEnvIfNoCase Referer "troop53\.net" local_ref=1
<FilesMatch "\.(cgi|xml|pl)">
Order Allow,Deny
Allow from env=local_ref
</FilesMatch>

This tells the server, "If the referring URL is blank, or is from anywhere except troop53.net, do not allow access to any file with a .cgi, .xml, or .pl extension." Adding the .xml file extension is probably overkill, but it can't hurt. In contrast to the previous code, I created a new .htaccess file and placed it in the /cgi-bin/calendar/ folder, since there are a couple of scripts to which I do need direct access, i.e. there would be no referring URL. Since those scripts are in different folders, this code does not affect them.

I am happy to report that this works magnificently. In fact, you might be able to dispense with the original QUERY_STRING code above. For that matter, it should take care of all the "annoyances" since none of the scrapers or injectors have referring URLs. I've removed all that stuff from our .htaccess file; analysis to come...

I'm also happy to report that it doesn't affect cronjobs for e-mail reminders of events. Cron uses a server path rather than a URL to run email_reminders.cgi.

Further information
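For anyone who wants to reason about which requests the referrer rule above should let through, here is a rough Python model of its logic. This is only a sketch of my reading of the rule, not how Apache implements it. The names is_allowed, REFERER_PATTERN and PROTECTED_FILES are made up for illustration, and the example URLs are hypothetical.

```python
import re

# Mirrors: SetEnvIfNoCase Referer "troop53\.net" local_ref=1
REFERER_PATTERN = re.compile(r"troop53\.net", re.IGNORECASE)

# Mirrors: <FilesMatch "\.(cgi|xml|pl)">
PROTECTED_FILES = re.compile(r"\.(cgi|xml|pl)")


def is_allowed(filename, referer):
    """Model of the rule: protected files need a referrer containing troop53.net."""
    if not PROTECTED_FILES.search(filename):
        # Files outside the FilesMatch block are not restricted by this rule.
        return True
    # "Order Allow,Deny" denies by default; only a matching referrer is allowed.
    return bool(referer) and REFERER_PATTERN.search(referer) is not None


# Hypothetical requests for illustration.
print(is_allowed("plans.cgi", "http://www.troop53.net/cgi-bin/calendar/plans.cgi?active_tab=1"))
print(is_allowed("plans.cgi", ""))
print(is_allowed("plans.cgi", "http://example.com/"))
print(is_allowed("logo.gif", ""))
```

Two things worth keeping in mind: the pattern is not anchored, so any referrer that merely contains "troop53.net" passes, and the referrer is supplied by the visitor's software, so a determined spammer could fake it. So far the bots hitting our calendar haven't bothered.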
Copyright © 2002-13 BSA Troop 53