Safety/Security and Extensibility/Scalability in Software System Design and Architecture

Introduction

Safety/security and extensibility/scalability are two pairs of quality attributes that are of great importance in software architecture. This article discusses their definitions and compares them, then presents a general scenario for each attribute, complemented by typical concrete scenarios to give the reader a clearer picture. Finally, some strategies and tactics for improving each attribute are given, each with a description, benefits and penalties.

This is the first assignment for the Software Architecture course at NJUSE.

Definitions and Comparisons

This part analyzes the definitions of the selected pairs of quality attributes (safety/security and extensibility/scalability), along with the relationships and differences within each pair, and illustrates them with examples.

Safety/Security

A “safe” system is one in which the harm caused by accidental mishaps to the system itself can be minimized or avoided. A “secure” system is one in which important properties of the system (such as integrity, access, accountability, availability and confidentiality) can still be maintained under intentional attacks.

In other words, “safety” is the ability to reduce the risk of, and harm from, unintentional mishaps to the system’s stakeholders and valuable assets, whereas “security” is the ability to reduce the risk of, and harm from, intentional attacks.

Both attributes require a system to preserve its important properties and minimize harm, but they address different types of incidents.

“Safety” focuses on unintentional incidents, i.e. incidents that are not meant to damage the system. For example, a “safe” social media system should remain functional if one of its servers is disconnected from the Internet by an operational mistake during construction work (such as accidentally cutting a critical cable); an artificial satellite should keep its major components intact and operational when hit by strong cosmic rays, although some degree of slowdown is tolerable; and in a safe system, important data (such as financial transactions) will not be irrecoverably lost if a disk containing the data fails unexpectedly.

“Security”, on the other hand, is about intentional attacks, i.e. attacks that target the system from the very beginning. For example, in a secure system, no plaintext passwords should be exposed when hackers attack the database; a medical system should hold out long enough under a terrorist cyber attack to get professional assistance from law enforcement, since a breakdown of such a system might be disastrous; and if a hacker has gained illegal access to the system, the system should be able to detect their presence, revoke their privileges, and report the exploited vulnerability as soon as possible.
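To make the "no plaintext passwords" example concrete, here is a minimal sketch of storing a salted, slow hash instead of the password itself, using only Python's standard library. The function names and the iteration count are assumptions chosen for illustration, not recommendations from the article; a production system would tune the work factor and might prefer a dedicated password-hashing library.

```python
import hashlib
import hmac
import os
from typing import Tuple

# Assumed work factor for illustration only; tune it for real hardware.
ITERATIONS = 200_000


def hash_password(password: str) -> Tuple[bytes, bytes]:
    """Return (salt, derived_key) to store in place of the plaintext."""
    salt = os.urandom(16)  # fresh random salt for every password
    key = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return salt, key


def verify_password(password: str, salt: bytes, expected_key: bytes) -> bool:
    """Recompute the key from the candidate password and compare in constant time."""
    key = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return hmac.compare_digest(key, expected_key)
```

With this approach, an attacker who dumps the database obtains only salts and derived keys, and has to pay the hashing cost for every guess.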

Extensibility/Scalability

Systems always grow over time, in one of two directions, which this article calls vertical and horizontal. A system might need to implement more functions than originally anticipated, or change an implementation that already exists (vertical growth); it might also be required to process more requests in the future without excessive changes to the system itself (horizontal growth). These terms are used loosely here and differ from the "scale up" and "scale out" meanings of vertical and horizontal scaling.

Both extensibility and scalability concern the growth of a system, but from different perspectives. Extensibility is the ability to extend vertically, that is, to add “extensions” (new features, modifications to existing modules, etc.) without excessive changes or unexpected impact. Scalability, on the other hand, is the ability to scale horizontally, i.e. to handle a growing or shrinking amount of work effectively, using the existing system or with minimal changes to it.

For example, in an extensible frontend project, adding a new page (to meet newly derived business requirements) can usually be done without a deep dive into the existing code. It might also mean that the style of a shared component can be changed in one place, with the change taking effect everywhere the component is used. An extensible system with a complex dataflow should be able to integrate a new data processing module without an overhaul of the whole dataflow.
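The dataflow example can be illustrated with a small registry pattern. This is only a sketch under assumed names (`register`, `run_pipeline` and the two sample steps are hypothetical): new processing modules plug in by registering themselves, and the pipeline runner never has to change.

```python
from typing import Callable, Dict, List

# A processing step takes a list of numbers and returns a new list.
Step = Callable[[List[float]], List[float]]

# Registry mapping step names to implementations (hypothetical design).
_STEPS: Dict[str, Step] = {}


def register(name: str) -> Callable[[Step], Step]:
    """Decorator that adds a step to the registry under the given name."""
    def decorator(func: Step) -> Step:
        _STEPS[name] = func
        return func
    return decorator


def run_pipeline(data: List[float], step_names: List[str]) -> List[float]:
    """Apply the named steps in order; the runner knows nothing about each step."""
    for name in step_names:
        data = _STEPS[name](data)
    return data


@register("drop_negative")
def drop_negative(values: List[float]) -> List[float]:
    return [v for v in values if v >= 0]


@register("double")
def double(values: List[float]) -> List[float]:
    return [v * 2 for v in values]
```

Adding a third step later means writing one more decorated function; `run_pipeline` and the existing steps stay untouched, which is the property the paragraph above describes.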

As for scalability, cloud service providers (Microsoft Azure, AWS, etc.) all provide scalable infrastructure that can gradually accommodate more demand as more companies migrate their services to cloud-based platforms for better performance, maintainability and cost. A newer computing model, serverless or function computing, is gaining ground among cloud services because of its “infinite scalability”: the platform adapts to whatever number of requests the service is actually facing, so that developers do not have to change the infrastructure or the codebase themselves as the number of requests increases.

General and Concrete Scenarios

In this part, scenario-based analysis is applied to each of the four quality attributes to create one general scenario and two concrete scenarios per attribute.

Safety

| Portion of Scenario | Possible Values |
| --- | --- |
| Source | Internal or external to the system |
| Stimulus | Events that are not intended to cause damage but will affect the system |
| Artifact | The system, or one or more modules of the system |
| Environment | A portion of the system might be influenced by the event |
| Response | Detect the event: detect the event’s occurrence; analyze the affected modules; notify related entities |
| | Avoid or minimize the damage: disable or isolate affected modules; deploy backup assets to restore functionality; switch into a degraded mode; go offline during repair; restore to normal after the damage is fixed |
| | Avoid future occurrence: find the vulnerabilities; report the vulnerabilities; fix the vulnerabilities |
| Response Measure | Time to detect the event; time to notify the related entities; time to mask affected modules; time to restore functionality; time to repair critical modules; estimated damage from the accident; time to find the vulnerabilities; time to fix the vulnerabilities |

Samples

| Portion | Description |
| --- | --- |
| Source | A construction team unrelated to the system |
| Stimulus | Cuts a network cable that connects the system to the Internet by mistake |
| Artifact | Network module |
| Environment | The affected network module is using the cable when the event occurs |
| Response | The module detects the event and switches to another router, bypassing the broken cable |
| Response Measure | Detection and reaction take only 30 s before the system goes back to normal, during which throughput drops by 5%. |
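The response in this scenario (detect the broken link, then switch to another route) could be sketched roughly as below. The route functions and the `AllRoutesFailed` exception are hypothetical stand-ins; a real network module would work at a much lower level, but the control flow is the same.

```python
from typing import Callable, List, Sequence


class AllRoutesFailed(Exception):
    """Raised when no configured route could deliver the payload."""


def send_with_failover(payload: bytes, routes: Sequence[Callable[[bytes], str]]) -> str:
    """Try each route in order and return the first successful result."""
    errors: List[Exception] = []
    for route in routes:
        try:
            return route(payload)
        except ConnectionError as exc:
            errors.append(exc)  # detection: remember the failure, then try the next route
    raise AllRoutesFailed(f"{len(errors)} route(s) failed")


def broken_primary(payload: bytes) -> str:
    # Simulates the cable that was cut by the construction team.
    raise ConnectionError("wire cut")


def backup(payload: bytes) -> str:
    return f"sent {len(payload)} bytes via backup"
```

Calling `send_with_failover(b"ping", [broken_primary, backup])` falls through to the backup route, which mirrors the "switch to another router" response in the table.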

| Portion | Description |
| --- | --- |
| Source | Nature |
| Stimulus | Unexpected earthquake |
| Artifact | A disaster control system for a nuclear power plant |
| Environment | The earthquake damages containers and causes radioactive materials to leak |
| Response | Detect the leak, initiate the early emergency process (such as shutting down the related reactors), notify the authorities |
| Response Measure | Detection, initiation and notification take only 3 min. The leaked materials are under control and won’t cause a severe biohazard. |

Security

| Portion of Scenario | Possible Values |
| --- | --- |
| Source | Internal or external to the system |
| Stimulus | Intentional attacks on the system |
| Artifact | The system, or one or more of its components |
| Environment | The system doesn’t foresee the attack |
| Response | Detect the attack: detect the attack’s occurrence; analyze the damage; report the attack |
| | Maintain the properties: disable compromised modules; deploy backup assets; lock critical modules and data; remove attackers from the system; switch to degraded mode; restore to normal after the attack is resolved |
| | Avoid future occurrence: find the vulnerabilities; report the vulnerabilities; fix the vulnerabilities |
| Response Measure | Time to detect the attack; time to switch to degraded mode; time to secure critical modules and data; time to remove or block the attackers; time to deploy backup assets; estimated damage of the attack; time to find the vulnerabilities; time to fix the vulnerabilities |

Samples

| Portion | Description |
| --- | --- |
| Source | A malicious hacker team |
| Stimulus | A well-coordinated, massive DDoS attack |
| Artifact | Some part of the system that can barely withstand the attack |
| Environment | The hackers have initiated a massive DDoS attack on the system |
| Response | Detect the attack; detect and block the attack sources; disable compromised modules; deploy backup server resources; notify the security team |
| Response Measure | Detecting the attack and notifying the team takes 30 s. 80% of the attack sources are blocked after 30 minutes. The system goes back to normal in 1 hour. |

| Portion | Description |
| --- | --- |
| Source | A lone-wolf hacker |
| Stimulus | An unauthorized entry into a critical database |
| Artifact | A database that contains critical and confidential data |
| Environment | The database is operational |
| Response | Detect the entry; block the entry; report the event to the authorities; report the vulnerabilities the intruder used |
| Response Measure | Detection, blocking and reporting take 15 seconds and no data is leaked. The vulnerabilities are fixed within 1 day. |

Extensibility

| Portion of Scenario | Possible Values |
| --- | --- |
| Source | End user, developers, requestor |
| Stimulus | A directive to add, delete or modify functionality of an existing system |
| Artifact | Code, data, interfaces, components, resources, configurations… |
| Environment | Business analysis time, runtime, compile time, build time, initiation time, design time, test time |
| Response | Understand the extension; design the extension; make the extension; test the extension; deploy the extension |
| Response Measure | Time and material cost of communicating, understanding, designing, making, testing and deploying the system extension; time and material cost of re-educating users or other stakeholders after the extension; other affected modules that were not originally anticipated; potential time and material cost of newly introduced defects |

Samples

| Portion | Description |
| --- | --- |
| Source | Requestor |
| Stimulus | Wishes to add new functionality to an existing website |
| Artifact | Code |
| Environment | Design time, business analysis time |
| Response | Understand, design, make, test and deploy the new requirement |
| Response Measure | All changes are deployed in 3 days but bring 70 more bugs, which take 3 more days to resolve. New tutorials are written to teach end users about the new functionality. |

| Portion | Description |
| --- | --- |
| Source | Developer |
| Stimulus | Wishes to change the provider of a service that depends on third-party services |
| Artifact | Interfaces |
| Environment | Design time, runtime, test time |
| Response | Design, make, test and deploy the new requirement |
| Response Measure | All changes are made in 1 day. Only 1 module is affected, and no new defects are introduced. |

Scalability

| Portion of Scenario | Possible Values |
| --- | --- |
| Source | End user, developers, requestor |
| Stimulus | The need to use the existing system, with minimal change, to handle a number of requests different from what it was originally designed for |
| Artifact | The system, or one or more of its components |
| Environment | The system’s operation mode |
| Response | Evaluate possibilities and potential changes; make the change, if necessary; process requests |
| Response Measure | Latencies at different levels of load; maximum and minimum number of requests; the improvement brought by scaling; the cost to expand or shrink the scale; the cost to change the existing system to adapt to the change; the cost to resolve defects and interference when executing a scaling |

Samples

| Portion | Description |
| --- | --- |
| Source | Requestor |
| Stimulus | Wishes to handle 3 times more requests per unit of time than originally planned |
| Artifact | The whole system |
| Environment | The system hasn’t gone online yet |
| Response | The system is evaluated as capable of accommodating the increase in requests, so no changes need to be made |
| Response Measure | The maximum number of requests is 5 times more than planned, and latency increases by only 10% after the 3-times increase. No additional cost is needed. |

| Portion | Description |
| --- | --- |
| Source | Developer, end user |
| Stimulus | The need to improve the calculation precision of a complex algorithm within the original running time |
| Artifact | The calculating module |
| Environment | The system is already operational |
| Response | 800 more CPUs are added to the mainframe as the evaluation indicates; no other changes are required |
| Response Measure | The process doesn’t interfere with normal operation. No cost or change other than the CPUs is needed. The precision improvement increases sales of the system by 30%. |

Strategies and Tactics

In this part, strategies and tactics for improving each quality attribute (QA) are presented, together with their benefits and their penalties for other attributes.

Safety

| Strategy | Tactic | Description | Benefits | Penalties |
| --- | --- | --- | --- | --- |
| Avoid | Redundancy | Introduce redundant assets into the system | Increases robustness and security; avoids a single point of failure | Increases cost; complicates the architecture design |
| Avoid | Avoid risky design | Avoid architecture designs that have a high probability of causing problems in the future | Reduces the possibility of problems occurring | Rules out decisions that could be beneficial from other perspectives |
| Detect | Designated monitoring system | A complete, separate system that constantly monitors the critical aspects of the system | Provides accurate, complete data and error reports in time without interfering with the original system | Increases cost; a new point of failure to keep watch on |
| Detect | Heartbeat | The system sends a signal at a fixed interval to report its status | Easier to implement and integrate; detects events in time | May affect system performance |
| Handle | Degrade | Limit the system’s functionality to limit the potential damage | Maintains basic functionality while the problem is handled | Affects user experience during degradation |
| Handle | Disable affected modules | Disable the affected modules completely, fix them and then go back to normal | Completely avoids further damage; the modules can be fixed quickly | Might cause a complete breakdown of a function |
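As an illustration of the heartbeat tactic, the following sketch tracks when each component last reported in and lists the ones that have been silent for longer than a timeout. The class name, the `timeout` parameter and the injectable `clock` are assumptions made for this example; a real deployment would receive the beats over the network and run the check periodically.

```python
import time
from typing import Callable, Dict, List


class HeartbeatMonitor:
    """Tracks the last heartbeat of each component (hypothetical helper)."""

    def __init__(self, timeout: float, clock: Callable[[], float] = time.monotonic):
        self.timeout = timeout            # seconds of silence tolerated
        self._clock = clock               # injectable so the logic can be tested
        self._last_seen: Dict[str, float] = {}

    def beat(self, component: str) -> None:
        """Record that a component has just reported its status."""
        self._last_seen[component] = self._clock()

    def silent_components(self) -> List[str]:
        """Return, sorted by name, the components silent for longer than the timeout."""
        now = self._clock()
        return sorted(
            name for name, seen in self._last_seen.items()
            if now - seen > self.timeout
        )
```

The penalty noted in the table shows up here as a trade-off: a shorter interval and timeout detect failures sooner, but generate more traffic and more work for every monitored component.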

Security

| Strategy | Tactic | Description | Benefits | Penalties |
| --- | --- | --- | --- | --- |
| Avoid | Test | Find and fix as many vulnerabilities as possible before putting the system into use | Avoids the further, and usually greater, damage of a security breach at runtime | Increases development time and material cost |
| Avoid | Simplify design | Simplify the architecture design to avoid vulnerabilities that come with unnecessary components | Avoids vulnerabilities and saves resources | Might be negative for other QAs such as modifiability and extensibility |
| Avoid | Add security strategies | Add stricter security methods (such as 2-step authentication) to protect the system from being hacked | Increases the cost of hacking, reducing the hacker’s benefit and interest | Increases complexity; negative impact on usability and efficiency |
| Detect | Log | Log all entries to protected areas | Easy to integrate and implement | Might not be effective against well-prepared attacks; there are too many entries to check |
| Detect | Report | Report suspicious and abnormal operations | Reduces the work needed to check all the logs | Some operations might be mistakenly ignored |
| Handle | Isolate or shut down compromised modules | Isolate or shut down the compromised modules to limit the damage | Completely avoids further damage | Might cause a complete breakdown of a function |
| Handle | Delete critical data | Delete critical and confidential data, if backed up, to avoid data leaks | Avoids data leaks | Not applicable if no backup is available |
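The "Log" and "Report" tactics work together: every access is recorded, and a simple rule picks out the entries worth a human's attention. The sketch below assumes a hypothetical `AccessEvent` record and a fixed threshold of denied attempts; real systems would use richer rules, which is exactly where "some operations might be mistakenly ignored".

```python
from collections import Counter
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class AccessEvent:
    user: str
    resource: str
    allowed: bool


class AuditLog:
    """Records every access and reports users with repeated denials (illustrative)."""

    def __init__(self, max_denied: int = 3):
        self.max_denied = max_denied      # assumed threshold for "suspicious"
        self.events: List[AccessEvent] = []

    def record(self, event: AccessEvent) -> None:
        """Log tactic: keep every entry to the protected area."""
        self.events.append(event)

    def suspicious_users(self) -> List[str]:
        """Report tactic: list users who reached the denied-access threshold."""
        denied = Counter(e.user for e in self.events if not e.allowed)
        return sorted(user for user, count in denied.items() if count >= self.max_denied)
```

The report narrows a potentially huge log down to a short list, at the price of missing attackers who stay under the threshold.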

Extensibility

| Strategy | Tactic | Description | Benefits | Penalties |
| --- | --- | --- | --- | --- |
| Improve inner architecture | Split by function | Split a large system by function so that each function can run individually | Adding or modifying a function won’t affect existing ones | Needs careful design; might not be the most efficient or performant |
| Improve inner architecture | Constant refactoring | Constantly refactor the architecture as development goes on, instead of relying on an unrealistic “perfect” architecture | A good balance between cost and quality within a development cycle | High skill requirements for developers and teams |
| Improve outer interface design | Expose only necessary interfaces | Expose only the necessary APIs | Increases implementation flexibility; reduces interface changes; improves security | Reduces flexibility of usage; hard to determine the “necessity” of interfaces |
| Improve outer interface design | Do one thing, do it well | An interface should focus on one small piece of work and do it well | Improves usability, implementation flexibility and interoperability; also helps scalability | Needs careful design |
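A narrow interface is what makes the earlier "change a third-party provider" scenario cheap. The sketch below uses `typing.Protocol` to expose a single method; `SmsSender`, `ProviderA`, `ProviderB` and `notify_user` are hypothetical names invented for this illustration, not part of any real SDK.

```python
from typing import Protocol


class SmsSender(Protocol):
    """The only capability callers are allowed to depend on."""

    def send(self, number: str, text: str) -> bool: ...


class ProviderA:
    def send(self, number: str, text: str) -> bool:
        print(f"[A] {number}: {text}")   # stand-in for a real network call
        return True


class ProviderB:
    def send(self, number: str, text: str) -> bool:
        # This stand-in provider refuses empty messages.
        return bool(text)


def notify_user(sender: SmsSender, number: str, text: str) -> str:
    """Business code sees only the narrow interface, never a concrete provider."""
    return "delivered" if sender.send(number, text) else "failed"
```

Swapping `ProviderA()` for `ProviderB()` at the call site changes one line, and `notify_user` is unaffected, which matches the "only 1 module is affected" response measure. The penalty is also visible: callers cannot reach provider-specific features that the interface does not expose.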

Scalability

| Strategy | Tactic | Description | Benefits | Penalties |
| --- | --- | --- | --- | --- |
| Split | Split by responsibility | Split a system by responsibility into different layers (data access, calculation, view, etc.) | Each layer can be optimized for its own characteristics and scaled accordingly | More complicated architecture design; more time and material cost |
| Split | Partition database | Partition the database so that the pressure on it can be “divided and conquered” | More throughput and scalability from the database | Complicated architecture design; not always applicable; inappropriate partitioning may lower performance |
| Make use of cache | Deliver static contents from cheap sources | Split static contents from dynamic parts and deliver them from cheaper, more scalable sources (such as a CDN) | Reduces server pressure and makes the most of precious computing resources | Data synchronization might be a problem |
| Make use of cache | Use in-memory database as cache | Use an in-memory database (such as Redis) to avoid frequent access to the actual database | Reduces access to the database; improves performance and responsiveness | A new layer to worry about; more complicated architecture design |
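The "Partition database" and "Use in-memory database as cache" tactics can be combined in a few lines. The sketch below is a toy model under stated assumptions: plain dictionaries stand in for both the database shards and the cache (in practice the cache might be Redis), and `shard_for` and `ShardedStore` are hypothetical names.

```python
import hashlib
from typing import Dict, List, Optional


def shard_for(key: str, shard_count: int) -> int:
    """Map a key to a shard with a stable hash, so the same key always lands on the same shard."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % shard_count


class ShardedStore:
    """Dictionaries stand in for database partitions and for the in-memory cache."""

    def __init__(self, shard_count: int):
        self.shards: List[Dict[str, str]] = [{} for _ in range(shard_count)]
        self.cache: Dict[str, str] = {}
        self.db_reads = 0  # counts how often a "database" shard is actually read

    def put(self, key: str, value: str) -> None:
        self.shards[shard_for(key, len(self.shards))][key] = value
        self.cache.pop(key, None)  # invalidate any stale cached copy

    def get(self, key: str) -> Optional[str]:
        if key in self.cache:          # cache hit: no database access
            return self.cache[key]
        self.db_reads += 1             # cache miss: go to the owning shard
        value = self.shards[shard_for(key, len(self.shards))].get(key)
        if value is not None:
            self.cache[key] = value    # populate the cache for later reads
        return value
```

The invalidation in `put` hints at the penalties listed in the table: the cache is a new layer that has to be kept consistent, and a simple modulo scheme like this one would move most keys if the shard count changed.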

References

5.10 Measuring the System Scalability. (n.d.). Retrieved from Lebanese Republic Office of the Minister of State for Administrative Reform: http://www.omsar.gov.lb/ICTSG/105OS/5.10_Measuring_the_System_Scalability.htm

Bloch, J. (2006, October 22-26). How to Design a Good API and Why it Matters. Proceeding OOPSLA ‘06, (pp. 506-507). Portland, Oregon, USA. doi:10.1145/1176617.1176622

Firesmith, D. G. (2010). Engineering Safety- and Security-Related Requirements for Software-Intensive Systems. Carnegie Mellon University, Software Engineering Institute, Pittsburgh, PA 15213.

Kellyh, T. (2008). Safety Tactics for Software Architecture Design. The University of York, High Integrity Systems Engineering Group, Department of Computer Science.

Seovic, A. (2010). Achieving Performance, Scalability and Availability Objectives. In M. F. Aleksandar Seovic, Oracle Coherence 3.5.

Serhiy. (2017, April 14). How to Increase The Scalability of a Web Application. Retrieved from Romexsoft: https://www.romexsoft.com/blog/improve-scalability/

Shoup, R. (2008, May 27). Scalability Best Practices: Lessons from eBay. Retrieved from InfoQ: https://www.infoq.com/articles/ebay-scalability-best-practices
