Develop Your Architectural Risk Radar
Risks are part of life. There is uncertainty everywhere. And as more and more of our lives and businesses depend on IT, it becomes critical to be able to identify, understand, document and ultimately manage the risk of IT failures and problems.
Architectural Risk is a critical discipline in the Systems Flow process. It’s a huge selling point in the “design process” sales pitch we are often called on to deliver, usually not to prospective clients but rather to recalcitrant stakeholders or project sponsors who need to be convinced to spend time in their projects sorting out architecture and design before diving into coding.
Unfortunately, we’ve found through experience that identifying and documenting architectural risks is a very difficult skill to acquire, even for seasoned IT professionals. Some of them simply need a “Risk 101” crash course, similar to that offered by most project management methodologies.
For instance, if an architect has trouble differentiating an issue from a risk, we like to break it down very simply:
- A risk is a problem that may happen
- An issue is a problem that has happened (or is still happening)
But by far the greatest challenge for an architect’s “risk radar”, once they’re clear on what a risk is, is to differentiate a project risk from an architectural risk. The two are completely different.
Architectural Risks are those that either directly impact or directly result from the architecture. They are not project risks, but operational risks that will be present once the architecture is in production use.
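The two distinctions above (has the problem happened yet, and is it felt in the project or in production) can be sketched as a small decision function. This is only an illustration of the reasoning, not part of any tool we use; the names `Problem`, `Scope` and `classify` are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum


class Scope(Enum):
    PROJECT = "project"        # felt during delivery: testing, schedule, budget
    PRODUCTION = "production"  # felt once the architecture is in operational use


@dataclass(frozen=True)
class Problem:
    description: str
    has_occurred: bool  # True if it has happened or is still happening
    scope: Scope


def classify(problem: Problem) -> str:
    """Apply the two questions in order: has it happened, and where is it felt?"""
    if problem.has_occurred:
        return "issue"
    if problem.scope is Scope.PRODUCTION:
        return "architectural risk"
    return "project risk"


slow_in_qa = Problem("Interface code is running slowly in QA", True, Scope.PROJECT)
slow_in_prod = Problem("Interface code may run slowly in production", False, Scope.PRODUCTION)
```

Here `classify(slow_in_qa)` comes back as an issue, while `classify(slow_in_prod)` comes back as an architectural risk, which is the distinction the rest of this post works through.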
Here’s an example of a mis-identified risk:
|01||Interface code is running slowly in QA||High||Testing time is being elongated||Implement fixes once testing completes||
This is a classic issue - a problem that has occurred during a project, which impacts project activities, not production operational activities, and whose mitigation includes…resolving the issue! It’s good that this issue was identified, but it certainly doesn’t belong in the risk bucket, unless we’re going to get silly and call this a “risk with a likelihood of Yes!”.
Now, there is a conceivable scenario in which this project issue could result in a production architectural risk. Imagine that the testing was inconclusive and the interface code continued running sluggishly, but the project and post-production support teams decided that the issue would probably not exist in production. Since the production infrastructure is much larger and has more capacity, they accept the poor performance in testing in the belief that there will be no issue once the code is released to production.
But…there’s a risk that the team may decide wrongly. And what’s the best way to document this? With a Risk!
|01||Interface code may run slowly in production and may cause SLA to be missed.||Low||Customers will see incorrect balances as a result of missing data from XYX interfacing system||Implement monitoring script to alert support team if job does not end on time||
There are a couple of key things here that make this a good risk:
- The Risk is stated as a possibility – i.e. the key word “may” is used. This might seem pedantic, but the subtle shift in language here helps clarify if this is really a risk. An issue would not say that it “may happen”, since an issue “is happening”.
- The Impact truly specifies the business impact if the Risk is realized. Newbie architects will frequently gloss over – or not even bother to understand – the potential business impact of a risk. For example, they will instead document as the impact that “The SLA may be missed”, when this is really part and parcel of the risk. One needs to ask: so what if the SLA is missed? This example makes it clear: customers will be having trouble.
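The two checks above are mechanical enough to sketch as a crude wording review for risk register entries. This is a rough heuristic under our own assumptions, not a substitute for judgment: the `RiskEntry` fields, the `review` function and the `HEDGE_WORDS` list are all hypothetical, and the `rating` field simply holds the High/Low column from the examples.

```python
import re
from dataclasses import dataclass

# Assumed list of words that mark a statement as a possibility.
HEDGE_WORDS = {"may", "might", "could"}


@dataclass(frozen=True)
class RiskEntry:
    entry_id: str
    risk: str
    rating: str  # the High/Low column from the example rows
    impact: str
    mitigation: str


def _words(text: str) -> set[str]:
    """Lowercased alphabetic words in the text."""
    return set(re.findall(r"[a-z]+", text.lower()))


def review(entry: RiskEntry) -> list[str]:
    """Return wording findings; an empty list means both checks passed."""
    findings = []
    # Check 1: a risk "may happen"; an issue "is happening".
    if not HEDGE_WORDS & _words(entry.risk):
        findings.append("Risk is not stated as a possibility; is this really an issue?")
    # Check 2: a hedged impact is probably still describing the risk itself.
    if HEDGE_WORDS & _words(entry.impact):
        findings.append("Impact reads like part of the risk; ask 'so what?'")
    return findings


misidentified = RiskEntry(
    "01", "Interface code is running slowly in QA", "High",
    "Testing time is being elongated", "Implement fixes once testing completes",
)
well_formed = RiskEntry(
    "01", "Interface code may run slowly in production and may cause SLA to be missed.",
    "Low", "Customers will see incorrect balances as a result of missing data from XYX interfacing system",
    "Implement monitoring script to alert support team if job does not end on time",
)
```

Calling `review(misidentified)` flags the first row because its risk statement has no "may", while `review(well_formed)` returns no findings. An entry whose impact is "The SLA may be missed" would trip the second check. A word list like this will miss plenty of cases, so treat it as a prompt for the reviewer rather than a gate.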