
Making Sensible Employee Internet Reports for the Modern Web (Part 5)

Update: The technique described in this series of blog articles has since been improved upon and integrated into WebSpy Vantage 3.0 via the Origin Domain summary, which is available when analyzing any log files that contain URLs. We’ve called this feature Site Clean. It is also available in our separate Fastvue Reporter applications. See further details about our unique Site Clean engine.

In parts one, two, three and four of this series, we investigated the challenges of reporting on the Modern Web in employee Internet reports and solved them using Custom Expressions in WebSpy Vantage. Now let’s take a look at the results!

Final Custom Expression

In case you missed it in part four, the final custom expression we’re using for the Sensible Sites node is:

iif( ([MimeType] = "text/plain" || [MimeType] = "text/html" || [MimeType] = "text/html;charset=utf-8" || [MimeType] = "text/html; charset=iso-8859-1" ) && ([UrlCategoryName] != 'Web Ads' && [UrlCategoryName] != 'Edge Content Servers/Infrastructure'), domain([Site.Host]), iif( domain([Referrer.Host]) = '' || domain([Referrer.Host]) = '-', domain([Site.Host]), domain([Referrer.Host]) ) )

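Charming, isn’t it? This takes into account Mime Types, Referrer URLs and URL Categories to associate web resources (such as advertising, visitor tracking, CDNs, social sharing widgets and APIs) with the site the user was actually looking at. In plain terms: if a request has a page-like Mime Type and is not categorized as an ad or edge content, it is counted against its own domain. Otherwise it is counted against the Referrer’s domain, falling back to its own domain when the Referrer is empty or ‘-’.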
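To make the branching easier to follow, here is a minimal Python sketch of the same decision logic. It is not how Vantage evaluates the expression. The `Hit` class and its attribute names are hypothetical stand-ins for the log fields, and `domain()` is a naive approximation (last two labels of the host), since the exact behavior of Vantage’s domain() function is not described in this article.

```python
from dataclasses import dataclass

# MIME types the expression treats as "a page the user actually opened".
PAGE_MIME_TYPES = {
    "text/plain",
    "text/html",
    "text/html;charset=utf-8",
    "text/html; charset=iso-8859-1",
}

# Forefront TMG category names that the expression excludes.
NOISE_CATEGORIES = {"Web Ads", "Edge Content Servers/Infrastructure"}


@dataclass
class Hit:
    """One log record (hypothetical attribute names, not Vantage field names)."""
    site_host: str
    referrer_host: str
    mime_type: str
    category: str


def domain(host: str) -> str:
    """Assumed stand-in for Vantage's domain(): keep the last two host labels."""
    return ".".join(host.split(".")[-2:])


def sensible_site(hit: Hit) -> str:
    """Mirror the nested iif() calls of the Custom Expression."""
    is_page = hit.mime_type in PAGE_MIME_TYPES
    is_noise = hit.category in NOISE_CATEGORIES
    if is_page and not is_noise:
        # A real page: count it against its own domain.
        return domain(hit.site_host)
    referrer = domain(hit.referrer_host)
    if referrer in ("", "-"):
        # No usable Referrer: fall back to the request's own domain.
        return domain(hit.site_host)
    # Otherwise credit the page that requested this resource.
    return referrer


# Hypothetical ad image requested by a TechCrunch page.
ad = Hit("ads.example.net", "www.techcrunch.com", "image/gif", "Web Ads")
print(sensible_site(ad))  # techcrunch.com
```

The category and Mime Type values are taken from the expression above. Everything else, including the example hosts, is illustrative only.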

Final Report

Let’s take a look at the new Sensible Sites summary with the latest Custom Expression above.

Employee Internet Reports - Sensible Sites Summary

As you can see, the actual sites I visited are no longer buried in the 7th and 9th spots.

To better see what’s going on, I added a ‘Site URL’ under the Sensible Sites node in the Report Template.

Here are the URLs being grouped into the techcrunch.com Sensible Site:

Employee Internet Reports - Sensible Site Drilldown

And here are the URLs being grouped into the facebook.com Sensible Site:

Sensible Site Drilldown into Facebook

Instead of looking at screenshots, feel free to browse around the actual report itself.

You’ll also notice that 5min.com has been reduced from 124 MB in the original report to just 15 KB, akamaihd.net has gone from 3.3 MB down to 16 KB, and all the other sites have been reduced to less than 200 KB. Google.com is the only other prominent site, as I was also logged into my Gmail account.

Keep in mind, these employee internet report tests have been performed with a small data set and use case. Even though these ‘junk’ sites still appear in the report above, they are now more likely to be ‘drowned out’ in a large production network.

Issues

There are a couple of issues to be aware of with this Custom Expression approach.

IFrames

The ‘junk’ sites in the report above appear either because there is no Referrer URL to display, or because the Referrer itself is a Web Ad, Tracker, CDN, Widget or API. This occurs when these web resources pull in additional web resources themselves, which is common with embedded widgets that use IFrames, such as embedded YouTube videos.

For example, if you browse to mashable.com and watch an embedded YouTube video, you will see youtube.com in the Sensible Sites report, not mashable.com.

Unfortunately there is not much we can do for these situations using Custom Expressions, but this is an area that WebSpy and Fastvue are looking forward to improving through code.
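The IFrame limitation can be illustrated with a small Python simulation of the mashable.com example. This is only a sketch: the three log records, their byte counts, the ‘News’ and ‘Streaming Media’ category names and the `media.youtube.com` host are all made up, and `domain()` is a naive stand-in for the Vantage function of the same name.

```python
from collections import defaultdict

PAGE_MIME_TYPES = {"text/plain", "text/html",
                   "text/html;charset=utf-8", "text/html; charset=iso-8859-1"}
NOISE_CATEGORIES = {"Web Ads", "Edge Content Servers/Infrastructure"}


def domain(host):
    # Assumed approximation of Vantage's domain(): last two labels of the host.
    return ".".join(host.split(".")[-2:])


def sensible_site(site_host, referrer_host, mime_type, category):
    # Same branching as the Custom Expression's nested iif() calls.
    if mime_type in PAGE_MIME_TYPES and category not in NOISE_CATEGORIES:
        return domain(site_host)
    referrer = domain(referrer_host)
    return domain(site_host) if referrer in ("", "-") else referrer


# Hypothetical records: (site host, referrer host, MIME type, category, bytes).
hits = [
    # The article the user opened.
    ("mashable.com", "-", "text/html", "News", 60_000),
    # The embedded player: an IFrame, so it is itself a text/html document.
    ("www.youtube.com", "mashable.com", "text/html", "Streaming Media", 20_000),
    # Video data requested by the IFrame, so its Referrer is YouTube.
    ("media.youtube.com", "www.youtube.com", "video/mp4", "Streaming Media", 5_000_000),
]

bytes_by_site = defaultdict(int)
for site_host, referrer_host, mime_type, category, size in hits:
    bytes_by_site[sensible_site(site_host, referrer_host, mime_type, category)] += size

for site, total in sorted(bytes_by_site.items()):
    print(site, total)
```

In this made-up data set the video traffic lands on youtube.com rather than mashable.com, because the IFrame document passes the Mime Type test on its own and then becomes the Referrer for everything it loads.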

Performance

As you can imagine, Vantage now has extra work to do to calculate Sensible Sites. This will have a slight effect on memory usage, CPU usage and the length of time it takes your reports to run.

Other Log Formats

It is important to note that the Custom Expressions in these articles relate only to the Microsoft Forefront TMG Web Proxy schema in WebSpy Vantage. If you’re using a different format such as Cisco IronPort (or any of the other formats mentioned in part three), the names of the fields may be different, as well as the web category names.

To find the names of your fields, right-click in the Custom Expression edit box and select Insert field. You will then see the full list of fields you can use in your Custom Expression.

Finding Field Names and Expressions

To find the URL categories, run an ad-hoc analysis on your storage and go to the Category summary. By glancing through the list of categories, you should be able to find the equivalent ‘Web Ads’ and ‘Edge Content Servers/Infrastructure’ categories.

For example, if you’re analyzing Cisco IronPort’s W3C Access Logs, the equivalent web categories are called Advertisements and Infrastructure and Content Delivery Networks. The expression for the Mime Type field is still [MimeType], but the Category expression is [Category] instead of [UrlCategoryName]. The Custom Expression for IronPort W3C Logs would therefore be:

iif( 
	(
		[MimeType] = "text/plain" || 
		[MimeType] = "text/html" || 
		[MimeType] = "text/html;charset=utf-8" || 
		[MimeType] = "text/html; charset=iso-8859-1" 
	) 
	&& 
	(
		[Category] != 'Advertisements' && 
		[Category] != 'Infrastructure and Content Delivery Networks'
	), 
	domain([Site.Host]), 
	iif( 
		domain([Referrer.Host]) = '' || 
		domain([Referrer.Host]) = '-', 
		domain([Site.Host]), domain([Referrer.Host]) 
	) 
)
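If you maintain reports for more than one log format, the only parts that change are the category field and the two category names. As a rough sketch (this is a helper script idea, not a Vantage feature), the Python snippet below generates the expression text for each format. The `LogFormat` class, the `FORMATS` keys and `build_expression` are hypothetical names. The field expressions and category names are the ones given in this article, and the snippet assumes that [Site.Host] and [Referrer.Host] are available in both formats, as they are in the two expressions shown here.

```python
from dataclasses import dataclass

MIME_TYPES = [
    "text/plain",
    "text/html",
    "text/html;charset=utf-8",
    "text/html; charset=iso-8859-1",
]


@dataclass(frozen=True)
class LogFormat:
    """Per-format details (hypothetical helper, not part of Vantage)."""
    category_field: str  # field expression that holds the URL category
    ad_category: str     # category name used for advertising
    cdn_category: str    # category name used for CDNs / infrastructure
    mime_field: str = "[MimeType]"


FORMATS = {
    "tmg": LogFormat("[UrlCategoryName]", "Web Ads",
                     "Edge Content Servers/Infrastructure"),
    "ironport_w3c": LogFormat("[Category]", "Advertisements",
                              "Infrastructure and Content Delivery Networks"),
}


def build_expression(fmt: LogFormat) -> str:
    """Return the Sensible Sites Custom Expression text for one log format."""
    mime_test = " || ".join(f'{fmt.mime_field} = "{m}"' for m in MIME_TYPES)
    category_test = (f"{fmt.category_field} != '{fmt.ad_category}' && "
                     f"{fmt.category_field} != '{fmt.cdn_category}'")
    site = "domain([Site.Host])"
    referrer = "domain([Referrer.Host])"
    return (f"iif( ({mime_test}) && ({category_test}), {site}, "
            f"iif( {referrer} = '' || {referrer} = '-', {site}, {referrer} ) )")


print(build_expression(FORMATS["ironport_w3c"]))
```

The generated text can then be pasted into the Custom Expression edit box. Check the field and category names against your own storage first, as described above.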

Summary

Modern web sites are made up of many different components, most of which are hosted on different domains than the one you’re browsing. When analyzing logs from your web gateway, you see all the web requests to these sites and they clutter up your web reports. Some of the top culprits are advertising sites, CDNs, visitor tracking scripts, widgets and API calls.

This five-part series has focused on a method for making sense of all this noise, and on how to use Custom Expressions in WebSpy Vantage to implement it.

The method relies on your log format containing the Referrer URL and Mime Type, as well as the original URL.

By replacing the Site nodes in your report templates with the Custom Expression above, you can generate a report that more accurately reflects what sites users were actually going to in their web browser.

The Custom Expression will not completely eradicate these sites from your reports, but will greatly reduce the amount of traffic associated with them. Also be aware that Vantage will utilize more system resources to run reports with this custom expression.

So please go ahead and try out the custom expressions above and let us know how it goes in the comments!

Resources:

Final Report

Browse the final Sensible Web Report.

Vantage Report Templates

Download the Sensible Web Report Templates for Forefront TMG that I used when creating the reports above. The zip file includes two templates: one that drills down into each Sensible Site to show URLs, and one without the drilldowns. I recommend using the one without the drilldowns if you’re running the report on a large data set.

Each report template starts with the normal Site Domain section for comparison, and then shows the Sensible Sites section using the last custom expression above. It also has the Debugging section so you can see how the ‘Sensible Site’ is being calculated. Please be aware that the Debugging node in these reports will also be very resource intensive on large datasets.

To use the templates:

  1. Open WebSpy Vantage
  2. Go to Reports and click Open Templates.
  3. Select the files in the zip and click Open.

Then go ahead and generate the reports on your storage. You can also copy and paste the Sensible Sites node into your other Forefront TMG reports.

See also:

  • Making Sensible Employee Internet Reports for the Modern Web (Part 4)
  • Making Sensible Employee Internet Reports for the Modern Web (Part 3)
  • Making Sensible Employee Internet Reports for the Modern Web (Part 2)
  • Making Sensible Employee Internet Reports for the Modern Web (Part 1)
  • The Best Way To Report On Websites

By Scott | October 3rd, 2013 | Employee Internet Reports, How To, Log File Analysis, Microsoft Threat Management Gateway, Reports, Tips and Best Practices, Uncategorized, Vantage, Web Browsing Analysis, WebSpy


About the Author: Scott

Co-founder and Chief Product Officer at Fastvue. I spend my time making sense of the way firewalls and web gateways log traffic so that our customers don't have to!
