Bain Posted January 10, 2009

If you've ever checked out the Active Users page, you'll have noticed several spiders from Google, Yahoo, etc. crawling the site. I just wound up on the Active Users page by mistake and noticed that three of the active users are Twiceler cull spiders. I Googled these spiders, and only ten hits were returned. Does anyone know what these guys are about? Apparently they're some sort of spam. One of the Google hits contained the following post on a forum:

Has anyone else been having trouble getting massive amounts of connections from a web spider called Twiceler? I'm getting hit literally thousands of times per "session", and it seems to have been happening for months, judging by my logs. Each time it goes into one of its fits, it comes from the same IP. But the IPs are hardly ever the same from one rampant attempt to the next.

They're not actually loading pages, though; the bot, or at least certain instances of it, gets hung at a 302 when my page redirects to add the www. to the URL. Thank goodness they're not actually loading the root of my website that many times, or I'm pretty sure it'd be unusable. And I just came to the conclusion today that this probably means it can't even get robots.txt (since it can't get past the redirection), because it has been disallowed there for a while now. But what's odd is that I've seen certain Twiceler bots actually crawling my site properly.

I wrote to the company before, but nothing's been done. They asked for my log files despite me telling them exactly what the problem was, with lots of info on attempts and IPs. I mean, the log file was full of nearly identical lines of identical attempts, sometimes even with the same timestamp since it happens so quickly (as I explained to and showed them), but I sent them the huge log of thousands of nearly identical lines anyway.

I even checked Google a moment ago, and it appears I'm not the only one getting hit by this thing. How annoying is it that people can't control their bots, and don't even pull them down despite knowing they have a problem? I'd bet this happens to a lot of folks, many without even realizing it, since a lot of websites have redirections in place.

I really don't know what to do about it short of banning the dozens of IPs I've seen so far, or telling them to add me to a block list the guy mentioned. But that's not a solution, especially if it's still affecting others. I wrote them again today because I'm growing tired of it (over 4000 attempts to crawl my site were cluttering up my logs when I got up this morning), but I won't hold my breath after all the months that have gone by with the problem still there.
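For anyone who wants to check their own logs for the pattern described in that post (bursts from one IP, most of them stuck on a 302), here is a minimal Python sketch. It assumes the server writes the common Apache/Nginx "combined" log format and that the bot's user-agent string contains "Twiceler"; the function name, the regex, and the sample lines are all made up for illustration and are not from the quoted post.

```python
import re
from collections import Counter

# Assumption: "combined" access-log format. Adjust the pattern for other formats.
LOG_LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[[^\]]*\] "[^"]*" '
    r'(?P<status>\d{3}) \S+ "[^"]*" "(?P<agent>[^"]*)"'
)

def summarize_bot_hits(lines, agent_substring="twiceler"):
    """Count requests per IP and per status code for one user agent."""
    per_ip = Counter()
    per_status = Counter()
    needle = agent_substring.lower()
    for line in lines:
        match = LOG_LINE.match(line)
        if match is None:
            continue  # skip lines that do not fit the assumed format
        if needle not in match.group("agent").lower():
            continue  # some other client
        per_ip[match.group("ip")] += 1
        per_status[match.group("status")] += 1
    return per_ip, per_status

# Hypothetical sample lines (documentation IP ranges, invented user agents).
SAMPLE = [
    '192.0.2.10 - - [10/Jan/2009:06:00:01 +0000] "GET / HTTP/1.1" 302 0 "-" "Twiceler-example"',
    '192.0.2.10 - - [10/Jan/2009:06:00:01 +0000] "GET / HTTP/1.1" 302 0 "-" "Twiceler-example"',
    '198.51.100.7 - - [10/Jan/2009:06:05:00 +0000] "GET /page HTTP/1.1" 200 512 "-" "Twiceler-example"',
    '203.0.113.5 - - [10/Jan/2009:06:06:00 +0000] "GET / HTTP/1.1" 200 900 "-" "ExampleBrowser/1.0"',
]

if __name__ == "__main__":
    ips, statuses = summarize_bot_hits(SAMPLE)
    for ip, count in ips.most_common():
        print(ip, count)
    print(dict(statuses))
```

To run it against a real log, pass an open file instead of SAMPLE, e.g. `summarize_bot_hits(open("access.log"))`, where the file name is again just a placeholder. A high 302 count concentrated on a handful of IPs would match what the poster describes.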
Bill Kibbel Posted January 11, 2009

I'm pretty sure Twiceler is the indexing spider for a new search engine, cuil. It's reported to search pretty deep into pages of web sites.
Bain Posted January 11, 2009

Originally posted by inspecthistoric: I'm pretty sure Twiceler is the indexing spider for a new search engine, cuil. It's reported to search pretty deep into pages of web sites.

I've read about Cuil. It was started by some people who left Google because they thought they had a better idea.
Terence McCann Posted January 11, 2009

Originally posted by inspecthistoric: I'm pretty sure Twiceler is the indexing spider for a new search engine, cuil. It's reported to search pretty deep into pages of web sites.

A lot of collection spiders comb this web site. I'm waiting for "Boris the spider" to show up.