Update: The technique described in this series of blog articles has since been improved upon and integrated into WebSpy Vantage 3.0 via the Origin Domain summary, which is available when analyzing any log files that contain URLs. We’ve called this feature Site Clean. It is also available in our separate Fastvue Reporter applications. See further details about our unique Site Clean engine.
In parts one, two, and three of this series, we explained the challenges of creating employee Internet reports for the Modern Web, proposed a solution, and implemented it using Custom Expressions in WebSpy Vantage. In this fourth part of the series, we will look at some ways to further improve the Custom Expression.
Tweaking the Sensible Sites Expression
To see why these ‘junk’ sites are still appearing, let’s add a new node to the report called ‘Debugging’ that shows the Sensible Site alongside the original domain and the referrer domain. We’ll also show the Mime Type and URL Category.
- Go back to your report template and duplicate the Sensible Sites node by copying and pasting it:
- Right-click the Sensible Sites section and click Copy.
- Then click the top/root node (Sensible Site Report node) and click Paste.
- Now double-click the second ‘Sensible Sites’ node that you just pasted.
- On the General page, rename the node to Debugging.
- Still on the General page under the columns section, click Add | Key.
- Select Site Domain and click OK.
- Click Add | Key again. Select Referrer Domain and click OK.
- Click Add | Key again. Select Mime Type and click OK.
- As I’m using Forefront TMG, I’m also going to check out the URL Categories for the sites. If you are too, click Add | Key again. Select URL Category and click OK.
- Rearrange the new key columns to push them all up to the top of the column listing.
- Click Next and sort the node by Sensible Sites Ascending. This will sort the list alphabetically by the sensible site.
- Click OK to save the new node to your report.
Now run the report again. Your new report will have a new section called ‘Debugging’ that looks like this.
Fixing Blank Referrer URLs
You can see why the third site in my Sensible Sites report was blank: there are 49 hits in my data set where the Referrer URL is blank. We can modify the custom expression to show the original requesting URL when the Referrer URL is blank.
Here’s the updated custom expression:
iif([MimeType] = "text/plain" || [MimeType] = "text/html" || [MimeType] = "text/html;charset=utf-8" || [MimeType] = "text/html; charset=iso-8859-1", domain([Site.Host]), iif(domain([Referrer.Host]) = '' || domain([Referrer.Host]) = '-', domain([Site.Host]), domain([Referrer.Host])))
In other words, if the Mime Type is any one of our four ‘normal page’ Mime Types (as discussed in part two), show the original Site Domain; otherwise, if the Referrer Domain is blank or ‘-’, show the original Site Domain; otherwise, show the Referrer Domain.
Phew… Got that? 🙂
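If nested iif() calls are hard to scan, the same decision order can be sketched in Python. This is only an illustration of the logic, not Vantage syntax: the function name `sensible_site` and its parameters are made up for this sketch, the example domains are placeholders, and it assumes the domains have already been extracted (the job of domain() in the real expression) and that comparisons are exact string matches.

```python
# Illustrative Python mirror of the custom expression's logic (not Vantage syntax).
# The four 'normal page' Mime Types listed in the expression.
NORMAL_PAGE_MIME_TYPES = {
    "text/plain",
    "text/html",
    "text/html;charset=utf-8",
    "text/html; charset=iso-8859-1",
}


def sensible_site(mime_type: str, site_domain: str, referrer_domain: str) -> str:
    """Return the domain a hit should be reported under (hypothetical helper)."""
    # Outer iif: a 'normal page' hit is reported under its own domain.
    if mime_type in NORMAL_PAGE_MIME_TYPES:
        return site_domain
    # Inner iif: a blank or '-' referrer falls back to the original domain.
    if referrer_domain in ("", "-"):
        return site_domain
    # Everything else is rolled up under the referring site.
    return referrer_domain


# Placeholder domains, chosen only for illustration.
print(sensible_site("text/html", "news.example", "search.example"))
print(sensible_site("image/png", "cdn.example", "news.example"))
print(sensible_site("video/mp4", "video.example", ""))
```

The three calls exercise one branch each: a normal page keeps its own domain, an image with a referrer is grouped under the referrer, and streaming content with a blank referrer falls back to its own domain instead of landing under a blank site.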
Even though this new expression places more of the ‘junky’ sites back into the report, you’ll notice that the actual sites are still in a dominating position.
You may be tempted to leave the Custom Expression the way it was so that these ‘junky’ sites get grouped under the blank site, but I recommend against that.
Embedded YouTube videos unfortunately do not include a Referrer URL for the streaming media content, nor do many other embedded web page elements that use iFrames. Other applications, such as Windows Update, also omit the Referrer URL. These important applications will therefore be hidden under the blank referrer if you do not use the new expression above.
Improving with URL Categories
You may also notice some other situations where it makes sense to show the Referrer URL even when the Mime Type is text/plain or text/html. For example:
In this case, while browsing techcrunch.com, my browser requested a resource from the advertising site atwola.com that has the Mime Type text/html. This is common: for example, some normal HTML is required to display a Facebook ‘Like’ button on a page, in addition to scripts and images.
My Forefront TMG server has correctly classified this hit as Web Ads. We can improve the custom expression to always show the Referrer URL for Web Ads.
iif( ([MimeType] = "text/plain" || [MimeType] = "text/html" || [MimeType] = "text/html;charset=utf-8" || [MimeType] = "text/html; charset=iso-8859-1" ) && [UrlCategoryName] != 'Web Ads', domain([Site.Host]), iif( domain([Referrer.Host]) = '' || domain([Referrer.Host]) = '-', domain([Site.Host]), domain([Referrer.Host]) ) )
Let’s take this one step further to include the URL Category for CDNs. Forefront TMG categorizes CDNs as Edge Content Servers/Infrastructure.
iif( ([MimeType] = "text/plain" || [MimeType] = "text/html" || [MimeType] = "text/html;charset=utf-8" || [MimeType] = "text/html; charset=iso-8859-1" ) && ([UrlCategoryName] != 'Web Ads' && [UrlCategoryName] != 'Edge Content Servers/Infrastructure'), domain([Site.Host]), iif( domain([Referrer.Host]) = '' || domain([Referrer.Host]) = '-', domain([Site.Host]), domain([Referrer.Host]) ) )
For those playing at home, this basically says: if the Mime Type is any one of our four ‘normal page’ Mime Types AND the URL Category is not Web Ads AND the URL Category is not Edge Content Servers/Infrastructure, show the original Site Domain; otherwise, if the Referrer Domain is blank or ‘-’, show the original Site Domain; otherwise, show the Referrer Domain.
The great thing about including URL Categories in the expression is that it gives you a way to bring additional sites under this rule. You can use Forefront TMG’s URL Overrides to re-classify sites as Web Ads or Edge Content Servers/Infrastructure to ensure that the Referrer URL is shown whenever possible.
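As a rough cross-check of that reading, here is the category-aware logic sketched in Python. Again, this is an illustration rather than Vantage syntax: `sensible_site` and its parameter names are hypothetical, the domains are assumed to be already extracted, and string comparisons are assumed to be exact.

```python
# Illustrative Python mirror of the category-aware expression (not Vantage syntax).
NORMAL_PAGE_MIME_TYPES = {
    "text/plain",
    "text/html",
    "text/html;charset=utf-8",
    "text/html; charset=iso-8859-1",
}

# URL Categories whose hits should prefer the referrer even for 'normal page' types.
REFERRER_PREFERRED_CATEGORIES = {
    "Web Ads",
    "Edge Content Servers/Infrastructure",
}


def sensible_site(mime_type: str, url_category: str,
                  site_domain: str, referrer_domain: str) -> str:
    """Return the domain a hit should be reported under (hypothetical helper)."""
    is_normal_page = mime_type in NORMAL_PAGE_MIME_TYPES
    # A normal page keeps its own domain unless it is an ad or CDN hit.
    if is_normal_page and url_category not in REFERRER_PREFERRED_CATEGORIES:
        return site_domain
    # A blank or '-' referrer falls back to the original domain.
    if referrer_domain in ("", "-"):
        return site_domain
    # Otherwise the hit is rolled up under the referring site.
    return referrer_domain


# The atwola.com hit from the example: text/html, categorized as Web Ads,
# requested while browsing techcrunch.com.
print(sensible_site("text/html", "Web Ads", "atwola.com", "techcrunch.com"))
```

With the category check in place, the text/html ad hit is grouped under techcrunch.com, while an ad hit with a blank referrer would still fall back to its own domain.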
Let’s rerun the report and check out the Debugging section again.
In the screenshot above, I’ve highlighted the rows where the URL Category is either Web Ads or Edge Content Servers/Infrastructure and the Mime Type is one of the four ‘normal page’ Mime Types, and you can see that the Sensible Site is now using the Referrer URL correctly.
So let’s check out our report in the fifth and final part of this series.
See also:
- Making Sensible Employee Internet Reports for the Modern Web (Part 3)
- Making Sensible Employee Internet Reports for the Modern Web (Part 5)
- Making Sensible Employee Internet Reports for the Modern Web (Part 2)
- How to Create Anonymous Internet Reports in Vantage
- How to Categorize Search Terms Typed Into Search Engines