Toomre Capital Markets LLC

Real-Time Capital Markets -- Analytics, Visualization, Event Processing, and Intelligence


Dealing with Aggressive Spiders and Bots on Drupal Websites

IP address maps to a spider computer hosted on the Yandex Enterprise Network. Roughly every ninety seconds, a spider process on that computer still attempts to retrieve yet another piece of content from the Toomre Capital Markets ("TCM") website. Many of the pages this spider requests either do not exist or are excluded by Disallow rules in the site's robots.txt file. This spider certainly is aggressive, and it ignores the rules that many other bots seem to respect.

A few months ago, after watching this particular misbehaving spider consume more than five percent of the total hosting bandwidth used that month, we had had enough. We therefore modified a custom Drupal module running on the TCM website that tracks visitor information (including various bots) and the specific content each visitor requests. As a result, when this Yandex spider now comes looking for a page like "search/node/facebook", it somehow ends up redirected to a page on a third-party website.
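The TCM module itself is Drupal (PHP) code that is not reproduced here. As a rough illustration only, the Python sketch below shows one way such a rule could work: count, per client IP, the requests for paths that robots.txt disallows, and once a hypothetical threshold is reached, answer that client with a redirect instead of the page. The `BotPolicy` class, the `max_violations` and `redirect_url` parameters, the placeholder redirect target, and the robots.txt excerpt are all assumptions for the example; only `urllib.robotparser` is a real standard-library module.

```python
from collections import defaultdict
from typing import Optional, Tuple
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt excerpt (an assumption, not the TCM site's file).
ROBOTS_TXT = """\
User-agent: *
Disallow: /search/
Disallow: /user/
"""


class BotPolicy:
    """Hypothetical policy: redirect clients that keep ignoring robots.txt."""

    def __init__(self, robots_txt: str, max_violations: int, redirect_url: str):
        self.parser = RobotFileParser()
        # parse() takes an iterable of lines and marks the rules as loaded.
        self.parser.parse(robots_txt.splitlines())
        self.max_violations = max_violations
        self.redirect_url = redirect_url
        self.violations = defaultdict(int)  # client IP -> disallowed requests

    def decide(self, client_ip: str, user_agent: str,
               path: str) -> Tuple[str, Optional[str]]:
        """Return ("serve", None) or ("redirect", url) for one request."""
        if not self.parser.can_fetch(user_agent, path):
            self.violations[client_ip] += 1
        # Once flagged, every later request from that IP is redirected.
        if self.violations[client_ip] >= self.max_violations:
            return ("redirect", self.redirect_url)
        return ("serve", None)


if __name__ == "__main__":
    policy = BotPolicy(ROBOTS_TXT, max_violations=3,
                       redirect_url="https://example.org/")  # placeholder URL
    ua = "Mozilla/5.0 (compatible; YandexBot/3.0)"
    for _ in range(3):
        result = policy.decide("192.0.2.10", ua, "/search/node/facebook")
    print(result)  # ('redirect', 'https://example.org/')
    print(policy.decide("198.51.100.7", ua, "/node/1"))  # ('serve', None)
```

Keying on the client IP rather than the user-agent string reflects the situation described above, where the offending spider was identified by the address it came from; a real module would also need to persist the counters between requests.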

One would think that the person(s) controlling the spider would get the message after some fifty thousand plus attempts to retrieve information from the TCM website. Eventually, a human operator might wonder why attempts to retrieve information on structured finance products, risk management and/or MATLAB topics always result in a page full of "gay anal porn" or other similar material. Until then, the TCM website might well become a frequent referrer to certain pornographic websites.

Microsoft IE Browser Is So Frustrating!!

The Microsoft Internet Explorer browser, in its various versions, is so frustrating to deal with, especially in its non-standard ways of rendering XHTML elements and CSS markup. Working with Internet Explorer 7 during the past few days, I have been well reminded why we here at Toomre Capital Markets LLC ("TCM") fled first to Firefox and then, more recently, to Google's Chrome as our web browser of choice. Unfortunately, slightly more than sixty percent of this site's visitors still use Microsoft IE to browse content here. Hence, website changes must still render correctly in IE as well.

Over the past few weeks, we have been doing quite a bit of work on the plumbing, so to speak, that enables this website to function. At its core, the public side of this website relies on the excellent content management system known as Drupal. The core software was upgraded to the most recent Drupal release, 6.12, and all sixty or so add-on modules were upgraded as well. We also have begun implementing a number of new features, such as the ability to print content in a printer-friendly format, to e-mail content to professional contacts, and to share content on various social networking sites.

As part of that overhaul, we also have rewritten the main Drupal theme used to display the website's pages. That new theme works really well in both the Chrome and Firefox browsers. Of course, however, the Microsoft IE browser has other ideas. It appears either not to recognize some CSS selectors or not to implement them at all. Elements styled through other selectors seem to have padding and/or margin issues that throw neat rows of graphical elements out of whack.

Hopefully, it will only take a short while to track down solutions to each of these Microsoft IE-specific issues. Until then, we will hold off on putting the new theme into daily production. Thank you for bearing with us during this upgrade and redesign process. We are off to find some tools that might assist with debugging what is going on with the IE rendering engine.

Website Upgrade

During the past week, Toomre Capital Markets LLC ("TCM") has been working behind the scenes to upgrade the underlying software used to present web pages dynamically to the reader based on various criteria and preferences. The initial phase of this upgrade is now largely complete. As a result, the website's complete content is available again, and the website has a new appearance.