A Complete Guide to Useful Reverse Proxy Reporting

Reverse proxy reporting (using WebSpy Vantage Ultimate) is a great way to gain insight into how the Internet is using your published web sites or web applications.

Forward proxy reporting is all about the users accessing content on the Internet from within your corporate network. Reverse proxy reporting is the opposite. When you have a web site that is hosted on your internal or DMZ network and you publish it to the Internet through a device like Microsoft Forefront TMG or Sophos UTM’s Web Application Firewall feature, you have a reverse proxy scenario.

Proxy Logs vs Web Analytics Apps

One question that often comes up is why you should use reverse proxy logs to analyse a site’s usage rather than something like Google Analytics. There are numerous reasons, but if your published site is a business application rather than simply a public website, the following are the most important:

  • Proxy logs show you what is really happening, first-hand
  • No need to include any external content or javascript on your site to collect the information you require
  • The site remains single source
  • The site remains completely private
  • You get a holistic picture of multiple sites on the same network

Sounds good, how do I set up Reverse Proxy Reports?

This guide will step you through the basics to get going. We always recommend working with a small, manageable dataset to speed up the development of filters, templates and reports. If you are following along, we suggest importing a single log file or a limited number of files. That said, if the sample is too small you might not get good visibility, but if it’s too big it will slow down your testing.

Filter by Source Network

Some environments are fortunate enough to have separate, discrete forward and reverse proxy setups. That is, the forward proxy and the reverse proxy do not share a device, such as a single Microsoft Forefront TMG Server instance. Even if your environment is consolidated, with both the forward and reverse proxy on the same device, you can still make use of this guide. The key is knowing how to filter by source network.

Reverse proxy traffic always has the Internet as the source. In Microsoft Forefront TMG, that is defined as the External Network. By isolating traffic that was initiated on the External Network you eliminate other sources of traffic such as internal, DMZ or VPN.

Since you would generally be running very different reports for forward and reverse proxy traffic, it makes sense to create a separate Storage in WebSpy Vantage for reverse proxy reporting. Specifying the source network in an Import Filter makes sure that your Storage only contains the reverse proxy log data, which makes further filtering and analysis simpler and faster.
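The idea behind the import filter can also be sketched outside Vantage. The Python snippet below is only an illustration, not how Vantage implements filtering. It assumes the log records have already been parsed into dictionaries; the key names (`source_network`, `rule`, `bytes`) and the sample values are hypothetical.

```python
# Hypothetical, already-parsed log records. The key names are illustrative
# and do not correspond to any specific log file format.
records = [
    {"source_network": "External", "rule": "Publish Webmail", "bytes": 5120},
    {"source_network": "Internal", "rule": "Allow Web Access", "bytes": 2048},
    {"source_network": "VPN Clients", "rule": "Allow Web Access", "bytes": 512},
    {"source_network": "External", "rule": "Publish Intranet", "bytes": 1024},
]


def keep_external(rows, network="External"):
    """Return only the rows whose traffic originated on the given network."""
    return [row for row in rows if row["source_network"] == network]


# Only reverse proxy traffic (initiated from the Internet) remains.
reverse_proxy_rows = keep_external(records)
print(len(reverse_proxy_rows), "of", len(records), "records kept")
```

Filtering once, at import time, means every later summary and report works on the smaller reverse proxy dataset only.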

Create a Storage and import your log files

Your raw log files need to be imported into a Storage, which will act as a “database” against which the reports will be generated.

  • Open WebSpy Vantage Ultimate, and select the Storages tab
  • On the left-hand side click New Storage
  • Specify a Name and click OK, then Next
  • Input Type: Local or networked files or folders
  • Loader Selection: Pick your proxy type (Microsoft FTMG in my case), then click Next
  • Input Selection: Click Add and specify the path to a log file or folder of log files
  • Filters: Select the Forefront TMG Web tab and click Add for a new Filter
  • For the Summary select Source Network, and select the Include option (note: the Source Network field is unique to Microsoft Forefront TMG. For other reverse proxy formats, you might like to use the Source Address summary together with the Subnets Alias)
  • Click Add and type External as the value
  • Click OK twice to return to the Import Wizard
  • Click OK to complete the wizard and start importing your log files.

[Screenshot: Reverse Proxy Reporting - Importing Logs]
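For log formats without a Source Network field, the note in the steps above suggests classifying traffic by source address and subnet instead. A minimal Python sketch of that idea follows, using the standard `ipaddress` module. The subnet list is an assumption; replace it with the ranges used in your own environment. This is not how the Subnets Alias is implemented in Vantage.

```python
import ipaddress

# Assumed internal ranges; replace with the subnets used in your environment.
INTERNAL_SUBNETS = [
    ipaddress.ip_network("10.0.0.0/8"),
    ipaddress.ip_network("192.168.0.0/16"),
]


def classify_source(address):
    """Label a source address 'Internal' or 'External' by subnet membership."""
    ip = ipaddress.ip_address(address)
    if any(ip in subnet for subnet in INTERNAL_SUBNETS):
        return "Internal"
    return "External"


for sample in ("10.1.2.3", "8.8.8.8"):
    print(sample, "is", classify_source(sample))
```

Anything that does not match one of your own subnets is treated as coming from the Internet, which is the traffic you want to keep for reverse proxy reporting.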

Summary analysis

Once your (filtered) log data has been imported you can have a look to see if everything is as you expect it to be. The Summary Analysis in Ad-hoc mode will also show you what log data is available for use in generating templates and reports.

  • From the WebSpy Vantage console select the Summaries tab
  • Click New Analysis
  • Select your Reverse Proxy Storage created earlier and click Next
  • Select the Ad-hoc Analysis Type and click OK (you can ignore the filters and summaries tabs)

To verify that the import filter was successful check the Source Network node and confirm that only External is listed.

Identifying your published web sites

Using the same ad-hoc Summary Analysis, check the Rule node and verify that your “publishing rules” are the only ones listed. Having a separate publishing rule for each site makes analysis and reporting a little easier because each rule can be considered a ‘site’. If, however, you have multiple sites published through a single rule, you can use the Site Name field to differentiate between sites.

The inverse of this is when you have a single application that is published on multiple servers with multiple rules. In this case you would want to combine the rules for reporting purposes. The best way to do this is by using an Alias.

  • Select the Aliases tab and click New Alias
  • Name: Rule Consolidation
  • Click the Scheme down arrow and select Forefront TMG Web
  • Scroll down the available fields and select Rule
  • Click the Right Arrow
  • Click OK

[Screenshot: Reverse Proxy Reporting - Create Alias]

Now that the Alias has been created we can define groups and add values to them. The easiest way is to simply use the Analysis we already have open.

  • Return to the Summaries tab and select the Rule summary
  • Change the Aliases view from ‘No Alias’ to ‘Rule Consolidation’
  • Right-click the first rule you want to consolidate and select Add to Alias
  • In the As: field specify the consolidated name you would like to use
  • Click OK
  • You will see the name change to be the same as the alias value
  • Select the second rule you want to consolidate and Right-click | Add to Alias
  • This time you can simply select the existing value from the drop down list
  • Click OK to group the second rule into the same alias as the first.
  • Repeat for all the relevant rules

[Screenshot: Populate Alias]

This gives you a single entry for the whole application, even though it is published across multiple rules or servers. The alias not only cleans up your view, but it can also be used when specifying filters and generating reports. The alias consolidates the log data for display, but the underlying per-rule data is still available if you want to break it up again.

Selecting the alias will trigger a drill down and from here you will be able to see the individual rules again if you change the Alias view to ‘No Alias’.

[Screenshot: Consolidated Rules]
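Conceptually, an alias is a lookup from raw values to group names that is applied when the data is summarised. The Python sketch below illustrates that idea only; the rule names, the group name and the byte counts are all made up, and this is not a description of how Vantage stores aliases.

```python
from collections import defaultdict

# Hypothetical alias: publishing rule name -> consolidated application name.
RULE_ALIAS = {
    "Publish App Server 1": "Sales App",
    "Publish App Server 2": "Sales App",
}

# Hypothetical (rule, bytes) pairs taken from imported log data.
hits = [
    ("Publish App Server 1", 700),
    ("Publish App Server 2", 300),
    ("Publish Webmail", 500),
]


def consolidate(rows, alias):
    """Sum bytes per alias group; rules without an alias keep their own name."""
    totals = defaultdict(int)
    for rule, size in rows:
        totals[alias.get(rule, rule)] += size
    return dict(totals)


totals = consolidate(hits, RULE_ALIAS)
print(totals)
```

Because the lookup is applied at summary time, the original rows in `hits` are untouched, which mirrors how you can still switch back to ‘No Alias’ and see the individual rules.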

In practice you would use both methods. You may for instance want to get an overall picture of your Exchange web site usage. In many cases this would be published with a single rule, but it will contain multiple site names such as webmail, autodiscover, legacy and so on. If you want to get more details on which components are used, you would use Site Names without an alias.

Useful information for reverse proxy reports

Now that we have a method to isolate the various published web sites or applications, we can investigate the kind of information you might want to report on. In the reverse proxy scenario you typically want to know:

  • Who
  • What
  • When
  • Where

Who

Since usernames are generally not known, you can approximate who the user is by looking at the Source IP and User Agent fields together. This helps identify unique users and the device types they are using.
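As a rough illustration of that approach, the Python sketch below treats each distinct combination of source IP and user agent as one visitor. The sample values are made up, and the result is only an estimate: several people behind the same NAT address with the same browser will be counted as one.

```python
from collections import Counter

# Hypothetical (source IP, user agent) pairs from parsed log records.
requests = [
    ("203.0.113.10", "Mozilla/5.0 (iPhone)"),
    ("203.0.113.10", "Mozilla/5.0 (iPhone)"),
    ("203.0.113.10", "Mozilla/5.0 (Windows NT 10.0)"),
    ("198.51.100.7", "Mozilla/5.0 (Macintosh)"),
]

# Each distinct pair is treated as one approximate visitor.
requests_per_visitor = Counter(requests)
approx_unique_visitors = len(requests_per_visitor)

print("Approximate unique visitors:", approx_unique_visitors)
```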

What

Knowing what your user base is accessing can tell you many things about your application. The following fields will give you better insight: Site URL, MIME Type, Operation (GET, POST) and Protocol (HTTP or HTTPS).

When

Typically, this would be information requiring date and time, so the fields of interest here would be Date, Day of Week and Hour.
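Day of Week and Hour are both derived from the request timestamp. The Python sketch below shows one way to derive and count them; the timestamps and the timestamp format are assumptions for illustration, so adjust the format string to match your own logs.

```python
from collections import Counter
from datetime import datetime

DAY_NAMES = ["Monday", "Tuesday", "Wednesday", "Thursday",
             "Friday", "Saturday", "Sunday"]

# Hypothetical request timestamps; the format string is an assumption.
timestamps = ["2014-03-03 09:15:02", "2014-03-03 09:47:30", "2014-03-04 14:05:11"]

parsed = [datetime.strptime(ts, "%Y-%m-%d %H:%M:%S") for ts in timestamps]

# Count hits per hour of day and per day of week.
hits_per_hour = Counter(dt.hour for dt in parsed)
hits_per_weekday = Counter(DAY_NAMES[dt.weekday()] for dt in parsed)

print(hits_per_hour.most_common(1))
print(hits_per_weekday.most_common(1))
```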

Where

Knowing where your site’s users are coming from can be very useful, especially if you are trying to measure the effectiveness of advertising campaigns. The fields of interest here would include Referrer Domain and Referrer URL.
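A referrer domain is simply the host part of the referrer URL. The Python sketch below extracts it with the standard `urllib.parse` module and counts hits per domain. The URLs are made up, and Vantage may normalise domains differently (for example by dropping a leading `www.`), so treat this as an illustration only.

```python
from collections import Counter
from urllib.parse import urlsplit

# Hypothetical referrer URLs from parsed log records.
referrer_urls = [
    "https://www.google.co.za/search?q=example",
    "https://www.facebook.com/somepage",
    "https://www.google.co.za/",
]


def referrer_domain(url):
    """Return the host part of a referrer URL, or '' if there is none."""
    return urlsplit(url).hostname or ""


domain_hits = Counter(referrer_domain(url) for url in referrer_urls)
print(domain_hits.most_common())
```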

The ad-hoc analysis would have shown you that there are a huge number of fields available, most of them adding little or no value to reverse proxy reporting.

To consolidate things, but still give you enough information to work with, we can reduce the required summaries down to the following:

  • Source IP
  • User Agent
  • Site URL
  • MIME Type
  • Operation
  • Protocol
  • Date
  • Day of Week
  • Hour
  • Referrer Domain
  • Referrer URL
  • Rule
  • Site Name

Create a Reverse Proxy Report Template

Having all of the data in an analysis is great for doing ad-hoc drill-down investigations, but most likely you will need the information to be reduced and condensed into an easy-to-digest report. Next we are going to set up a simple report template that will show the key pieces of information per site:

  • Select the Reports tab and click New Template
  • Specify a Name and select Forefront TMG Web as the Schema
  • Report Type should be Analysis
  • Click OK

Now that a blank template has been created, we can add the fields we are interested in. The steps below add a top-level Web Sites node and seven sub nodes beneath it. This looks like a lot to do, but it is really quick and easy. The template will also be available for download below so you can skip this step if you’re using Microsoft Forefront TMG.

  • Double click the Template to edit it
  • Click New Node
  • Name: Web Sites
  • Summary: Rule
  • Alias: Rule Consolidation (created above)
  • Sort by: Size
  • OK
  • Select the Web Sites Node and click New Node to create a sub node
  • Name: Day of Week
  • Summary: Day of Week
  • Alias: Weekdays
  • Sort By Day of Week | Ascending
  • OK
  • Select the Web Sites Node and click New Node to create a sub node
  • Name: Time
  • Summary: Hour
  • Alias: Hours
  • Sort By Hours | Ascending
  • OK
  • Select the Web Sites Node and click New Node to create a sub node
  • Name: Protocol
  • Summary: Protocol
  • Alias: None
  • Sort By Size | Descending
  • OK
  • Select the Web Sites Node and click New Node to create a sub node
  • Name: MIME Type
  • Summary: MIME Type
  • Alias: None
  • Sort By Size | Descending
  • OK
  • Select the Web Sites Node and click New Node to create a sub node
  • Name: Source IP
  • Summary: Source IP
  • Alias: None
  • Sort By Size
  • Click Advanced
  • Select Having and for TOP N Select 25
  • OK
  • Select the Web Sites Node and click New Node to create a sub node
  • Name: User Agent
  • Summary: User Agent
  • Alias: User Agents
  • Sort By Size
  • Click Advanced
  • Select Having and for TOP N Select 25
  • OK
  • Select the Web Sites Node and click New Node to create a sub node
  • Name: Referrer Domain
  • Summary: Referrer Domain
  • Alias:
  • Sort By Hits

By the time you are done you should have something that looks like this:

[Screenshot: Reverse Proxy Reporting - Template Editor]
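The template structure amounts to grouping by site first and then summarising each field within that site. As a rough Python illustration of the Source IP node (sorted by size and limited to the top N), consider the sketch below. The rows and names are hypothetical, and this is not how Vantage computes its summaries.

```python
import heapq
from collections import defaultdict

# Hypothetical (site, source IP, bytes) rows from parsed log records.
rows = [
    ("Webmail", "203.0.113.10", 500),
    ("Webmail", "198.51.100.7", 900),
    ("Webmail", "203.0.113.10", 700),
    ("Webmail", "192.0.2.1", 100),
    ("Intranet", "203.0.113.10", 50),
]


def top_sources(rows, n=25):
    """For each site, return the n source IPs with the largest byte totals."""
    per_site = defaultdict(lambda: defaultdict(int))
    for site, ip, size in rows:
        per_site[site][ip] += size
    return {
        site: heapq.nlargest(n, totals.items(), key=lambda item: item[1])
        for site, totals in per_site.items()
    }


report = top_sources(rows, n=2)
for site, sources in report.items():
    print(site, sources)
```

Limiting the long-tail fields (Source IP and User Agent) to a top N keeps each per-site report short enough to read.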

Using the Reverse Proxy Template in an Analysis

The template we created can be used to generate a report (Word, PDF, CSV etc), but the same template can also be used for doing a Summary Analysis. This is a great way to check your template structure.

  • Select the Summaries tab
  • Click New Analysis
  • Select your reverse proxy Storage
  • Analysis Type: select Template-based Analysis, then select your new Reverse Proxy template
  • Click OK to load up the Analysis

[Screenshot: Template Analysis]

You can now browse through the Analysis and you will see that it is much cleaner to look through than the default ad-hoc analysis we ran earlier. A nice feature of running the template on the Summaries tab as a template-based analysis is that you can still drill down past the bounds of what you defined in the report template. For instance, you can click the HTTP protocol to get more details on the resources not served over HTTPS.

[Screenshot: Summary View]

When you click on an item to drill down into it, all of the available summaries are displayed. Also note that in the navigation bar at the top, your filters are being applied cumulatively.

[Screenshot: Summary Drill Down]

Distributing Reverse Proxy Reports

Next you can generate a report in one of the many different formats available within WebSpy Vantage. These reports can then be automatically emailed out on a schedule, or could be published and made available through the web module.

  • Select the Reports tab
  • Locate the Reverse Proxy Template and click the Generate Report button
  • Select the Storage to use
  • Select a Format (Web Document, PDF, Word, CSV, or Text. I’m using Web Document)
  • Publish: Specify where you want it to go
  • Documents: Since I want to give each web application owner their own report, I select ‘Create a Separate report documents for each’: Rule with the Alias Rule Consolidation.
  • Filter: I want to run the report on the past week, so click Add Relative Date Filter | Last 7 Days
  • Specify an email address to send all reports to (if desired)
  • OK to start the report

Once the task is done you will have a separate report for each Web Application for the past week only.
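A relative date filter is re-evaluated against the current date each time the report runs, which is what makes it suitable for scheduled reports. The Python sketch below shows the general idea. Exactly how Vantage defines ‘Last 7 Days’ (for example, whether today is included) is not covered here, so the inclusive window used below is an assumption.

```python
from datetime import date, timedelta


def in_last_days(day, today, days=7):
    """True if `day` falls in the `days`-day window ending on `today` (inclusive)."""
    return today - timedelta(days=days) < day <= today


# Hypothetical log dates, evaluated as if the report ran on 10 March 2014.
run_date = date(2014, 3, 10)
log_dates = [date(2014, 3, 1), date(2014, 3, 4), date(2014, 3, 10)]

recent = [d for d in log_dates if in_last_days(d, run_date)]
print(recent)
```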

[Screenshot: Report Per App]

The Results!

Each report contains loads of useful information. We can determine that the bulk of the site’s traffic peaks between 9 and 10 AM. The bulk of the users are connecting via Safari browsers, and the biggest external sources for users connecting to the site are google.co.za followed by facebook.com, while only 87 hits came from pinterest.com (30th on the list).

By looking at the source IPs, we can determine that a lot of users of the site stick around, click through multiple links and download a fair amount of content.

Applying some knowledge about which of the site’s content is served over HTTP and which over HTTPS, we can tell that a high percentage of users are actively logging in rather than simply browsing without converting to actual sales.

Furthermore, because we have a holistic picture of all the sites being published, we know that this particular site consumes 70% of all the available bandwidth to the hosting site.
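A bandwidth share like this is each site’s byte total divided by the total for all published sites. A minimal Python sketch follows; the site names and byte counts are made up for illustration and are not taken from the report above.

```python
def bandwidth_share(bytes_per_site):
    """Return each site's share of the total bytes, as a percentage."""
    total = sum(bytes_per_site.values())
    if total == 0:
        # Avoid dividing by zero when there is no traffic at all.
        return {site: 0.0 for site in bytes_per_site}
    return {site: 100.0 * size / total for site, size in bytes_per_site.items()}


# Hypothetical byte totals per published site.
shares = bandwidth_share({"Online Store": 700, "Webmail": 300})
for site, percent in shares.items():
    print(f"{site}: {percent:.1f}%")
```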

This level of information can be extremely useful in retail settings, but can be equally important for corporate web applications. Usage patterns can, for example, help you determine the best windows for system maintenance.

Since report templates are almost infinitely customizable, you can tune them to show the exact data you are interested in.

[Screenshot: Report Sample]

What next?

This was an introduction on how to explore the information available to you in a reverse proxy log file. Typically you would want to import more log files into your Storage, and report across larger time frames. At this point the summary and report generation time will start to increase, but the bigger picture becomes more accurate and valuable.

You may also want to tweak your template to show just the right level of information for your report audience.

Lastly, you should automate log importing, report generation and publishing, and purging of log data older than what is required. All of this is easily accomplished via the Tasks tab in WebSpy Vantage.

I hope this basic ‘start to finish’ tutorial was helpful in not just creating a basic reverse proxy report, but also in showing you some of the awesome reporting possibilities with WebSpy Vantage.

Vantage supports log files from over 200 popular network devices, and with its comprehensive aliasing feature, is the most flexible log analysis and reporting framework you’ll find.

Etienne Liebetrau

Technical Specialist at FixMyITSystem
Based in Cape Town, South Africa, Etienne is an IT professional who builds, tests and maintains systems for a large national retail chain. An IT professional since 1996, Etienne has worked in various environments and is certified by CompTIA, Dell and Microsoft. Etienne is the technical blogger and primary technical consultant for FixMyITsystem.com, a solutions provider company based in Cape Town with a global client base.