Types of Non-Functional Tests

Definitions of the different types of non-functional test vary widely.  While we are not attempting to be definitive, Testing Performance would like to state what we mean by each type of non-functional test.

 

The Performance Test

This is a test which measures, or determines, the performance of an application or an application component.  Of all non-functional testing, this is probably the most commonly executed type of test.
  

The overall purpose of a performance test is to determine whether the application will remain functionally correct and responsive even at high workloads.
The objectives of a performance test would be something along the lines of:

    * Determine if the application can support the expected workload
    * Find and resolve any bottlenecks

It is very difficult (i.e., time consuming and expensive) to build and replicate in a test environment an exact simulation of the workload that the application will be expected to process in production.  It is much easier (i.e., quicker and cheaper) to build an approximation of the workload.  Often the 80:20 rule is used to persuade project managers that an approximation makes more sense: 80% of the workload is generated by 20% of the functionality.  Of course, no two applications are the same: in some we can easily achieve 90:10, in others it is more like 70:30.  Careful analysis by the performance tester will help determine the volumetrics for the application and therefore which functions should be included in a performance test.
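Selecting which functions to script from the volumetrics can be sketched as below. The function names and daily volumes are purely hypothetical illustration data, not figures from any real application:

```python
# Sketch: choose which business functions to script, using the 80:20 idea.
# The transaction names and volumes below are hypothetical illustration data.
def select_functions(volumes, coverage=0.80):
    """Return the smallest set of functions whose combined volume
    reaches the requested share of the total workload."""
    total = sum(volumes.values())
    selected, running = [], 0
    for name, count in sorted(volumes.items(), key=lambda kv: kv[1], reverse=True):
        selected.append(name)
        running += count
        if running / total >= coverage:
            break
    return selected

daily_volumes = {
    "search_customer": 40000,
    "view_account":    25000,
    "create_order":    15000,
    "amend_order":      6000,
    "run_report":       3000,
    "admin_tasks":      1000,
}
print(select_functions(daily_volumes))  # the few functions carrying ~80% of the load
```

Here three of the six functions cover just under 90% of the workload, which is exactly the kind of approximation the 80:20 argument is making.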
 

Using the 80:20 rule is in essence compromising the testing effort.  While some or most performance issues will be detected, performance issues associated with functionality not included in the performance test could still cause problems on release to production.  Further steps can be made to minimise this possibility, including:

    * Manually exercising functions not included in the automated workload while a performance test is executing
    * Observing and measuring performance, especially database performance, in functional test environments

Once an approximation of the production workload has been determined and agreed, the performance tester works towards building the automation into a workload that can be executed in an orderly and controlled fashion.  The work early on in the performance testing process becomes a good foundation on which to analyse and publish results, ultimately determining if the application can or cannot meet the specified objectives.
 

Performance tests usually need to be run multiple times as part of a series of test-tune cycles.  Where a performance bottleneck is detected, further tests are run with an ever-increasing amount of tracing, logging or monitoring in place.  When the cause of the problem is identified, a solution is devised and implemented.  The performance test is then re-run to confirm the bottleneck has been removed.
 

It is of course quite difficult to determine in advance how many performance issues will be detected as part of a performance testing exercise.  The table below is a simplistic guide to the number of test-tune cycles that may be required, depending on the origins of the application.

 
 

| Application origins | Test-tune cycles for first release | Test-tune cycles for a maintenance drop in the first 6 months after first release | Test-tune cycles for a maintenance drop more than 6 months after first release |
|---|---|---|---|
| An off-the-shelf package with a minimum of customisation | 4 | 3 | 2 |
| An off-the-shelf package heavily customised | 6 | 3 | 2 |
| A bespoke application | 10 | 6 | 3 |

 

The Stress Test


This is a test which determines the breaking point of an application or an application component.  Stressing the application implies that the workload which the system will be subjected to is in excess of the maximum peak workload that the application would have to support in production.

The overall purpose of a stress test is to determine the breaking point of the application.

The objectives of a stress test would be something along the lines of:

    * Generate an ever increasing workload until breaking point is achieved
    * Find and resolve any bottlenecks
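The first objective, generating an ever-increasing workload, can be sketched as a stepped ramp that stops when extra load no longer produces extra throughput. This is a minimal illustration; `measure_throughput` is a stand-in for whatever figure your load tool reports at each load step, and the toy model's numbers are invented:

```python
# Sketch: step the injected workload up until throughput stops increasing
# (the breaking point). `measure_throughput` stands in for a load tool's
# per-step throughput figure.
def find_breaking_point(measure_throughput, start=10, step=10, max_users=1000):
    best = 0.0
    users = start
    while users <= max_users:
        tps = measure_throughput(users)
        if tps <= best:          # more load generated, no more work done
            return users - step  # last load level that still scaled
        best = tps
        users += step
    return max_users

# Toy model: throughput scales linearly up to 70 users, then flattens.
toy = lambda users: min(users, 70) * 2.0
print(find_breaking_point(toy))  # 70
```

In a real stress test each step would be held long enough for the system to settle before the throughput reading is taken.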

What does the breaking point look like?  How do you know if you have reached it?  Some breaking points are obvious.  Response times increase and throughput of workload whether measured as hits per second, transactions per second, network utilisation or some other measure, decreases to zero.  One or more components have stopped working and the workload is no longer being processed.  This type of failure can cause other components to break, as suddenly they become overwhelmed by rapidly building queues.  An example:

The database log fills up; the log archive job has fallen behind and no spare logs are available.  The log switch process is held waiting on the log archive job to complete.  The database stops dead in the water, refusing to do anything until it can begin logging again.  The application server queues requests to the database in an orderly fashion.  With the database not responding, the queue builds very quickly, reaching maximum queue length.  Unable to queue any further requests for the database, the application server begins to reject requests from the web server.

The performance tester must piece together this sequence of events through careful monitoring and analysis.  Increasing the maximum size of the queue that feeds requests to the database will not solve this problem.  There are a number of possible solutions, including:

   1. Look at the priority of the archive job
   2. Look at the size and speed of the disks holding the archive logs
   3. Increase the number of logs available to the database
   4. Tune the database logging parameters, such as:

    * Log buffer size
    * Size of the log buffer pool
    * Size of the log files themselves

Other breaking points are less obvious.  An increase of the workload generated does not cause an increase in the workload progressed.  The only evidence that the generated workload has increased is that response times have begun to lengthen.  Again, careful analysis of the system software and hardware is required to find the area which is unable to process any more workload. An example:

Messages are building up in the application server, within the queue that feeds the database.  The queue feeds just one thread (a single connection to the database) which is constantly active.  Requests for the database are arriving faster than they are being processed.  Increasing the generated workload simply pushes more messages onto the queue feeding that single database connection.

There is really only one solution to this - increase the number of threads between the application and the database server.
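The arithmetic behind this bottleneck is simple enough to sketch. The arrival and service rates below are illustrative numbers, not measurements from any real system:

```python
# Back-of-envelope check of the single-thread bottleneck described above.
# Arrival and per-thread service rates are illustrative, not measured.
def queue_growth_per_second(arrival_rate, per_thread_rate, threads):
    """Requests added to the queue each second; <= 0 means the queue drains."""
    return arrival_rate - per_thread_rate * threads

# 50 requests/s arriving, each DB connection clearing 20 requests/s:
print(queue_growth_per_second(50, 20, 1))  # one thread: queue grows by 30/s
print(queue_growth_per_second(50, 20, 3))  # three threads: queue drains
```

As soon as the combined service rate of the threads exceeds the arrival rate, the queue stops building and response times recover.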

What are the benefits of Stress testing?  When a component breaks, that component is essentially the weakest link in the architecture.  Tuning at high workloads can improve performance, stability and response times at lower workloads. Stress testing can provide valuable information if workloads were to unexpectedly increase in the future.

When have you finished your stress testing?  Look at CPU utilisation.  If one or more of the servers is at or near 100%, then you have reached the final bottleneck, the CPU bottleneck.  How do you get around this?  This is the one instance where throwing more CPU at the application will help. 
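One quick way to check for CPU saturation on a POSIX server is the system load average, available from Python's standard library (this sketch will not run on Windows, where `os.getloadavg` does not exist; the "load above core count" rule of thumb is a common heuristic, not a hard threshold):

```python
# Rough CPU-saturation check on a POSIX box, standard library only.
# os.getloadavg() returns the 1-, 5- and 15-minute run-queue averages;
# a 1-minute figure consistently above the core count suggests the CPU
# has become the final bottleneck.
import os

load_1m, load_5m, load_15m = os.getloadavg()
cores = os.cpu_count() or 1
print(f"1-minute load {load_1m:.2f} across {cores} cores")
if load_1m > cores:
    print("CPU saturated - adding CPU capacity is the likely fix")
```

In practice the tester would watch this (or the equivalent monitor in the load tool) across every server in the architecture, not just one.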

 

The Benchmark Test

This is testing that is normally carried out in conjunction with tuning.

In order to understand whether a tuning change has had a positive effect, it is necessary to run a repeatable load test that produces statistics varying by as small a percentage as possible each and every time it is executed.

For instance, if a tuning change results in a two percent improvement yet the benchmark test has a natural variation of five percent, then it will not be possible to determine that the tuning change was successful.

One way of getting around this is to test three tuning changes in isolation.  If the improvement or degradation of each is less than the benchmark test's margin of error, then a fourth test can be run measuring the impact of all three tuning changes implemented together.  If each tuning change on average produced a two percent improvement, then the overall improvement of six percent should be recognisable.
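The reasoning above reduces to a simple comparison, sketched here with the same illustrative figures (two percent per change, five percent natural variation):

```python
# Arithmetic behind the paragraph above: a change is only attributable
# when it exceeds the benchmark's natural run-to-run variation.
def detectable(improvement_pct, variation_pct):
    return improvement_pct > variation_pct

single_change = 2.0     # one tuning change, percent improvement
combined = 3 * 2.0      # three changes applied together
variation = 5.0         # benchmark's natural variation, percent

print(detectable(single_change, variation))  # False: lost in the noise
print(detectable(combined, variation))       # True: 6% stands out
```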

On the whole, if natural variation of less than three percent can be obtained, then you have an excellent benchmark test.
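One way to quantify that natural variation is to run the identical benchmark several times and compute the relative spread of the results. The response-time figures below are invented for illustration:

```python
# Measuring "natural variation": run the identical benchmark several
# times and express the spread as a percentage of the mean.
from statistics import mean, pstdev

runs_ms = [412, 405, 419, 408, 411]   # illustrative mean response times
variation_pct = pstdev(runs_ms) / mean(runs_ms) * 100
print(f"natural variation: {variation_pct:.1f}%")
```

With these figures the variation comes out close to one percent, comfortably inside the three percent threshold for an excellent benchmark test.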

There is a debate in load testing circles as to whether a benchmark test should have any degree of randomness about it.  While this can vary from one test to the next, in general randomness is more realistic, so overall the benefits to the testing process are greater with randomness in place.  For example, the cost of searching on the surname Smith will be reasonably similar to the cost of searching on the surname Jones, so using a random surname when searching for a customer probably will not affect the consistency of the benchmark test.

The benefit to the test is that the amount of work that the database has to do in order to obtain the database rows is more realistic so that the overall results are of a better quality.

 

The Load Test

Load Testing is a very generic term.  In many respects it is the term that best represents the activity of generating a large, representative workload against an application.

There is often confusion when Load Testing is being executed.  Errors will often occur, especially as the Load Test is really starting to get going.  In the situation whereby there were not many errors at the start of the Load Test, but the frequency of errors is increasing as the Load Test workload increases, there are two possibilities:

1. The rate of errors is increasing proportionally as the load increases
2. The rate of errors is increasing faster than the rate that the load is increasing

There is often some kind of expectation from the project that errors should not occur in a load test, and that if errors are occurring then it is a problem with the load test automation or maybe even the load test tool itself.  This confusion can extend to the load testers themselves.  They can see that when they run a load test at a low workload, the load test is stable and works well.  When the load test is executing at a high workload, any errors can be baffling.

The explanation is simple.  When a load test is running at a high workload, the application can stop responding correctly to a request.  The errors will often look like a functional problem; for instance, when running a web load test, an element such as a button may be missing.  Without the button available, the automated test script cannot navigate to the next screen.

When running a load test, whether it be a web load test, a Citrix load test or some other protocol, expect errors to appear.  When these errors start appearing, they are symptomatic of a performance bottleneck.