
Continuous Delivery Digest: Ch.9 Testing Non-Functional Requirements

January 8, 2015

(Cross-posted from blog.iterate.no)

Digest of chapter 9 of the Continuous Delivery bible by Humble and Farley. See also the digest of ch 8: Automated Acceptance Testing.

("cross-functional" might be better as they too are crucial for functionality)

  • f.ex. security, usability, maintainability, auditability, configurability, but especially capacity, throughput, and performance
  • performance = time to process 1 transaction (tx); throughput = number of tx per period of time; capacity = max throughput we can handle under a given load while maintaining acceptable response times (see the worked example after this list)
  • NFRs determine the architecture, so we must define them early; all (ops, devs, testers, customer) should meet to estimate their impact (it costs to increase any of them, and they often work against each other, e.g. security and performance)
  • as appropriate, you can either create specific stories for NFRs (e.g. capacity; => easier to prioritize explicitly) and/or add them as requirements to feature stories
  • if poorly analyzed then they constrain thinking, lead to overdesign and inappropriate optimization
  • only ever optimize based on facts, i.e. realistic measurements (if you don't know it: developers are really terrible at guessing the source of performance problems)
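
A quick worked example (mine, not from the book) of how these three quantities relate, via Little's law; the 200 ms latency and 50 concurrent workers are made-up numbers:

```java
// Back-of-the-envelope capacity arithmetic using Little's law:
// throughput (tx/s) = concurrency / latency (s per tx).
public class CapacityBackOfEnvelope {
    public static void main(String[] args) {
        double latencySeconds = 0.2;  // performance: 200 ms per transaction (assumed)
        int concurrentWorkers = 50;   // transactions in flight at once (assumed)

        double throughputTxPerSecond = concurrentWorkers / latencySeconds; // = 250 tx/s

        // Capacity is then the highest such throughput the system can sustain
        // while response times still stay within the agreed limit.
        System.out.printf("Estimated throughput: %.0f tx/s%n", throughputTxPerSecond);
    }
}
```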
A strategy to address capacity problems:
  1. Decide upon an architecture; beware process/network boundaries and I/O in general
  2. Apply stability/capacity patterns and avoid antipatterns - see Release It!
  3. Other than that, avoid premature optimization; prefer clear, simple code; never optimize without proof that it is necessary
  4. Make sure your algorithms and data structures are suitable for your app (O(n) etc.)
  5. Be extremely careful about threading (-> "blocked threads anti-pattern")
  6. Create automated tests asserting the desired capacity; they will also guide you when fixing failures (a minimal sketch of such a test follows this list)
  7. Only profile to fix issues identified by tests
  8. Use real-world capacity measures whenever possible - measure in your prod system (# users, patterns of behavior, data volumes, ...)
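
The sketch referred to in point 6 above - a minimal JUnit 5 capacity test that fires concurrent requests and asserts a throughput floor. The endpoint URL, the request counts, and the 100 tx/s threshold are illustrative assumptions, not figures from the book:

```java
import static org.junit.jupiter.api.Assertions.assertTrue;

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.*;
import org.junit.jupiter.api.Test;

class OrderServiceCapacityTest {

    private static final URI ENDPOINT = URI.create("http://test-env.example.com/orders"); // assumed test-env URL
    private static final int REQUESTS = 1_000;
    private static final int CONCURRENCY = 50;
    private static final double MIN_TX_PER_SECOND = 100.0; // tune this threshold up as the system improves

    @Test
    void sustainsMinimumThroughput() throws Exception {
        HttpClient client = HttpClient.newHttpClient();
        HttpRequest request = HttpRequest.newBuilder(ENDPOINT).GET().build();
        ExecutorService pool = Executors.newFixedThreadPool(CONCURRENCY);

        long start = System.nanoTime();
        List<Future<Integer>> results = new ArrayList<>();
        for (int i = 0; i < REQUESTS; i++) {
            results.add(pool.submit(() ->
                    client.send(request, HttpResponse.BodyHandlers.discarding()).statusCode()));
        }
        for (Future<Integer> f : results) {
            f.get(); // propagate any request failure
        }
        double seconds = (System.nanoTime() - start) / 1e9;
        pool.shutdown();

        double throughput = REQUESTS / seconds;
        assertTrue(throughput >= MIN_TX_PER_SECOND,
                "Measured " + throughput + " tx/s, expected at least " + MIN_TX_PER_SECOND);
    }
}
```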

Measuring Capacity

There are different possible tests, f.ex.:
  • Scalability testing - how do the response time of an individual request and the number of concurrent users change as we add more servers, services, or threads?
  • Longevity testing - observe how performance changes when the system runs for a long time - detects memory leaks and stability problems (see the sketch after this list)
  • Throughput testing - transactions/messages/page hits per second
  • Load testing - capacity as a function of load, up to and beyond prod-like volumes; this is the most common
  • it's vital to use realistic scenarios; in contrast, technical benchmark-style measurements (reads/s from the DB, ...) can sometimes be useful to guard against specific problems, to optimize specific areas, or to choose a technology
  • systems do many things, so it's important to run different capacity tests in parallel; it's impossible to replicate prod traffic exactly => use traffic analysis, experience, and intuition to get as close a simulation as possible
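
For the longevity testing mentioned above, a minimal sketch (not from the book) of the idea: keep exercising the system for hours and log heap usage so leaks and gradual degradation become visible in a graph. The 8-hour duration and the exerciseSystemOnce() workload are placeholders:

```java
import java.time.Duration;
import java.time.Instant;

public class LongevityRun {
    public static void main(String[] args) throws Exception {
        Duration duration = Duration.ofHours(8);   // assumed test length
        Instant end = Instant.now().plus(duration);
        Runtime rt = Runtime.getRuntime();

        while (Instant.now().isBefore(end)) {
            exerciseSystemOnce();                  // hypothetical: fire one realistic scenario

            long usedMb = (rt.totalMemory() - rt.freeMemory()) / (1024 * 1024);
            // Append timestamp + heap usage as CSV so the trend can be graphed afterwards.
            System.out.println(Instant.now() + "," + usedMb);
            Thread.sleep(1_000);
        }
    }

    private static void exerciseSystemOnce() {
        // placeholder for a recorded, realistic interaction with the system under test
    }
}
```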

How to Define Success or Failure

  • tip: collect measurements (absolute values, trends) during the testing and present them in a graphical form to gain insight into what happened
  • too strict limits will lead to intermittent failures (f.ex. when the network is overloaded by another operation) vs. too relaxed limits, which won't reveal a partial drop in capacity; therefore:
    1. Aim for stable, reproducible results - isolate the test env as much as possible
    2. Tune the pass threshold up once it passes at a minimum acceptable level; back it down if it starts failing after a commit for a well-understood and acceptable reason (the sketch after this list shows one way to keep the threshold configurable)
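
One way (my assumption, not the book's) to make that tuning cheap is to keep the threshold outside the test code, e.g. in a system property set by the pipeline; the property name and the 100 tx/s default are illustrative:

```java
public class CapacityThreshold {
    /** Reads -Dcapacity.minTxPerSecond, falling back to the minimum acceptable level. */
    public static double minTxPerSecond() {
        return Double.parseDouble(System.getProperty("capacity.minTxPerSecond", "100"));
    }
}
```

A capacity test would then assert its measured throughput against CapacityThreshold.minTxPerSecond(), so ratcheting the threshold up (or temporarily relaxing it) is a pipeline-configuration change rather than a code change.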

Capacity-Testing Environment

  • replicates Prod as much as possible; extrapolation from a different environment is highly speculative, unless based on good measurements. "Configuration changes tend to have nonlinear effect on capacity characteristics." p234
  • an exact replica of Prod is sometimes impossible or not sensible (small project, capacity matters little, or prod has 100s of servers) => capacity testing can be done on a subset of prod servers as part of Canary Releasing, see p263
  • scaling is rarely linear, even if the app is designed for it; if the test env is a scaled-down prod, do a few scaling runs at different sizes to measure the effect
  • saving money on a downscaled test env is a false economy if capacity is critical; it won't be able to find all the issues no matter what, and fixing them later will be expensive - see the story on p236

Automating Capacity Testing

  • it's expensive, but if capacity is important, it must be part of the deployment pipeline
  • these tests are complex, fragile, easily broken with minor changes
  • Ideal tests: use real-world scenarios; predefine success threshold; relatively short duration to finish in a reasonable time; robust wrt. change to improve maintainability; composable into larger-scale scenarios so that we can simulate real-world patterns of use; repeatable and runnable sequentially or in parallel => suitable both for load and longevity testing
  • start with some existing (robust and realistic) acceptance tests and adapt them for capacity testing - add a success threshold and the ability to scale up
Goals:
  1. Create realistic, prod-like load (in form and volume)
  2. Test realistic but pathological real-life loading scenarios, i.e. not just the happy path; tip: identify the most expensive transactions and double/triple their proportion
To scale up, you can record the communication generated by acceptance tests, postprocess it to scale it up (multiply it, insert unique data where necessary), and replay it at high volume
  • Question: Where to record and play back:
    1. UI - realistic but impractical for 10,000s of users (and expensive)
    2. Service/public API (e.g. HTTP req.)
    3. Lower-level API (such as a direct call to the service layer or DB)

Testing via UI

  • Not suitable for high-volume systems, where too many clients would be necessary to generate a high load (partly due to the overhead of the UI client [browser]); it is also expensive to run many machines
  • The UI condenses a number of user actions (clicks, selections) into a few interactions with the back-end (e.g. 1 form submission), which has a more stable API. The question to answer: are we interested in the performance of the clients or of the back-end?
  • "[..] we generally prefer to avoid capacity testing through the UI." - unless the UI itself or the client-server interaction are of a concern

Recording Interactions against a Service or Public API

  • run acceptance tests, record the inputs/outputs (e.g. SOAP XML, HTTP), replace what must vary with placeholders (e.g. ${ORDER_ID}), create test data, and merge the two (see the sketch below)
  • Recommended compromise: Aim to change as little as possible between instances of a test - less coupling between the test and test data, more flexible, less fragile. Ex.: unique orderId, customerId but same product, quantity.
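
A rough sketch (mine, not the book's tooling) of the replay step: a recorded request body with a ${ORDER_ID} placeholder is merged with freshly generated test data and posted repeatedly. The endpoint, the JSON template, and the count of 10,000 requests are assumptions; in practice several such senders would run in parallel:

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.UUID;

public class RecordedInteractionReplay {
    // Recorded body, then parameterized: only the order id varies between instances.
    private static final String TEMPLATE =
            "{\"orderId\":\"${ORDER_ID}\",\"product\":\"SKU-42\",\"quantity\":1}";

    public static void main(String[] args) throws Exception {
        HttpClient client = HttpClient.newHttpClient();
        URI endpoint = URI.create("http://test-env.example.com/orders"); // assumed test-env URL

        for (int i = 0; i < 10_000; i++) {
            // Merge the template with unique test data; product and quantity stay fixed
            // so as little as possible changes between instances of the test.
            String body = TEMPLATE.replace("${ORDER_ID}", UUID.randomUUID().toString());

            HttpRequest request = HttpRequest.newBuilder(endpoint)
                    .header("Content-Type", "application/json")
                    .POST(HttpRequest.BodyPublishers.ofString(body))
                    .build();
            client.send(request, HttpResponse.BodyHandlers.discarding());
        }
    }
}
```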

Using Capacity Test Stubs To Develop Tests

In high-performance systems, testing may fail because the tests themselves do not run fast enough. To detect this case, first run them against a no-op stub of the application.
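
A minimal sketch of such a stub, assuming the JDK's built-in com.sun.net.httpserver server (the port and pool size are arbitrary): it acknowledges every request immediately with an empty 200 response, so any throughput ceiling you hit when pointing the capacity tests at it belongs to the test harness, not the application.

```java
import com.sun.net.httpserver.HttpServer;
import java.net.InetSocketAddress;
import java.util.concurrent.Executors;

public class NoOpCapacityStub {
    public static void main(String[] args) throws Exception {
        HttpServer server = HttpServer.create(new InetSocketAddress(8080), 0);
        server.createContext("/", exchange -> {
            exchange.sendResponseHeaders(200, -1); // -1 = no response body
            exchange.close();
        });
        server.setExecutor(Executors.newFixedThreadPool(16)); // keep the stub from becoming the bottleneck
        server.start();
        System.out.println("No-op stub listening on :8080 - point the capacity tests here first");
    }
}
```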

Adding Capacity Tests to the Deployment Pipeline

  • beware that warm-up time may be necessary (JIT, ...)
  • for known hot spots, you can add simple "guard tests" already to the commit stage (a minimal sketch follows this list)
  • typically we run them separately from acceptance tests - they have different environment needs, may be long-running, and we want to avoid undesirable interactions between acceptance and capacity tests; the acceptance test stage may include a few performance smoke tests
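
The guard-test sketch mentioned above - a commit-stage smoke test for a known hot spot that warms up the JIT and then asserts a deliberately generous latency bound; calculatePrice() and the 50 ms limit are placeholders:

```java
import static org.junit.jupiter.api.Assertions.assertTrue;
import org.junit.jupiter.api.Test;

class PriceCalculationGuardTest {

    @Test
    void priceCalculationStaysFastEnough() {
        for (int i = 0; i < 10_000; i++) {
            calculatePrice();                      // warm-up so the JIT has kicked in
        }

        long start = System.nanoTime();
        for (int i = 0; i < 1_000; i++) {
            calculatePrice();
        }
        double avgMillis = (System.nanoTime() - start) / 1_000.0 / 1_000_000.0;

        // Generous bound: this is a smoke test, not the full capacity stage.
        assertTrue(avgMillis < 50.0, "Average " + avgMillis + " ms per calculation");
    }

    private void calculatePrice() {
        // placeholder for the real hot-spot call under test
    }
}
```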

Other Benefits of Capacity Tests

Composable, scenario-based tests enable us to simulate complex interactions; together with a prod-like environment, we can:
  • reproduce complex prod defects
  • detect/debug memory leaks
  • evaluate impact of garbage collection (GC); tune GC
  • tune app config and 3rd party app (OS, AS, DB, ...) config
  • simulate worst-day scenarios
  • evaluate different solutions to a complex problem
  • simulate integration failures
  • measure scalability with different hardware configs
  • load-test communication with external systems even though the tests were originally designed for stubbed interfaces
  • rehearse rollback
  • and many more ...
