Friday, May 08, 2009

A FAIR measure of defense in depth

Recently, the owners of a system containing sensitive information where I work began planning an upgrade to the latest available version. In addition to performance improvements and bug fixes, the new release also modified authentication and authorization processes. Compared to the current model, these changes would offer significant cost improvements in administration and support. But before flipping the switch, business and security stakeholders wanted to know “which configuration is more secure?”

To provide an objective answer to that question, we defined “more secure” as the configuration and support processes (i.e. security controls) that would result in the smallest amount of residual risk to the organization.

Initially, this looked like a simple consulting request from a business unit. Normally, the security team reviews the proposed security architecture and provides a recommendation. But long after the system upgrade, system owners and administrators will face new decisions that impact the security of the system. They needed advice and a full knowledge transfer on how security controls work together.

The Factor Analysis of Information Risk (FAIR) methodology makes this kind of analysis transparent by decomposing the analysis into lower levels of detail only as needed. In situations where the alternatives are very similar, FAIR allows an analyst to identify and focus only on the relevant differences.

FAIR defines risk as “The probable frequency and probable magnitude of future loss.” Given that we looked at two possible configurations of the same system, the underlying information value was equivalent in both situations, and the expected threats were also the same. So the probable magnitude of loss could only be determined by the controls, not by any differences in the underlying information. Since the FAIR framework structures the analysis in a hierarchy along impact and likelihood boundaries, an analyst can isolate the comparison to the subset of factors that actually differ. In this case, the focus was on control strength and control depth against the range of expected threats.

Using FAIR, we looked at the type and frequency of contact that threats would have with the system, their probability of action, and the number and strength of controls applied in each case.

In the end, the analysis objectively determined which configuration had enabled more layers of security controls that could not be circumvented by an attacker. (As an example of circumventing a control: logon banners may be required for legal reasons on certain systems, but an attacker may not interact with the system through those defined interfaces and thus circumvent the control. The AOL mumble attack is another.) And the threat-oriented focus provided a context for evaluating future system changes: owners, auditors and security team members now share a common understanding of how control changes add or remove layers of protection between threats and assets.

Eventually, we wound up with a reasonably portable security metric for comparison: defensive layers per threat vector.
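
Once controls are mapped to the threat vectors they cover, the metric is easy to compute. A minimal sketch, with hypothetical threat vectors and control names rather than anything from the actual analysis:

```python
# Sketch: comparing two configurations by "defensive layers per threat vector".
# The threat vectors and control names below are illustrative only.

def layers_per_vector(control_map):
    """Count the non-circumventable controls covering each threat vector."""
    return {vector: len(controls) for vector, controls in control_map.items()}

current = {
    "remote network access": ["network ACL", "host firewall"],
    "interactive logon":     ["password policy"],
}
upgraded = {
    "remote network access": ["network ACL", "host firewall", "mutual TLS"],
    "interactive logon":     ["password policy", "centralized IAM"],
}

for name, config in (("current", current), ("upgraded", upgraded)):
    depth = layers_per_vector(config)
    weakest = min(depth, key=depth.get)
    print(name, depth, "weakest vector:", weakest)
```

The comparison that matters is the depth on each configuration's weakest vector, since that is where an attacker would concentrate.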

It’s not the number of controls or compliance gaps that determines the security of a system, but the strength and depth of the protection that attackers can’t sidestep.

Friday, April 24, 2009

Security policy pest control: Exterminate weasel words

Do your security policies suffer from an infestation of “weasel words?” If so, they need to be captured and destroyed. If that seems inhumane, they can also be recycled and sold to professional politicians, United States Federal Reserve Bank chairmen, or used in ready-to-make waffle mix.

What are weasel words, why don’t they belong in a security policy, and why are they associated with “waffling?” In the information security policy space, weasel words fall into two basic categories: undefined terms, and inherently vague phrases. For example:

Undefined terms:
“Shall be limited to authorized personnel…”
“…only IT-approved software may be installed”
“…must be restricted.”

Inherently vague phrases:
“…where possible…”
“…where feasible…”
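
A draft policy can be screened for these phrases automatically. A minimal sketch in Python; the phrase list is seeded from the examples above, and “as appropriate” is my own addition:

```python
import re

# Minimal weasel-word scanner for draft policy text. The phrase list below is
# seeded from the examples in this post ("as appropriate" is an assumed extra);
# extend it with your organization's own offenders.
WEASEL_PHRASES = [
    r"where possible", r"where feasible", r"as appropriate",
    r"authorized personnel", r"must be restricted",
]
PATTERN = re.compile("|".join(WEASEL_PHRASES), re.IGNORECASE)

def find_weasel_words(policy_text):
    """Return (line_number, phrase) pairs for each vague phrase found."""
    hits = []
    for lineno, line in enumerate(policy_text.splitlines(), start=1):
        for match in PATTERN.finditer(line):
            hits.append((lineno, match.group(0)))
    return hits

draft = "Access shall be limited to authorized personnel, where feasible."
print(find_weasel_words(draft))
```

A flagged phrase isn’t automatically wrong, but each hit should force the question: what specific boundary did we mean?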

So what’s the problem? Left unchecked, weasel words weaken an information security program by:
1. Generating an excessive amount of consulting requests for the security team. Scarce analyst time is consumed answering questions about the meaning of security requirements, instead of advising on how to implement them.
2. Creating uncertainty for functional teams. If requirement boundaries aren’t clear, team leaders won’t know how to prepare for audits, or how they will perform when examined.
3. Allowing inconsistent implementation of security controls. Unspecified requirements are not requirements: the phrases used must constrain action in some way. Otherwise, you’ll see 20 different interpretations of each, and no consistency across organizational boundaries. And as the 2008 and 2009 Verizon breach investigation reports and the recent Joint Strike Fighter intrusion incident show, successful attacks gravitate to the areas of weakest security.
4. Leading to weak enforcement. What is the boundary between authorized and unauthorized? Where and how is IT approval granted? Without being specific, it isn’t possible to enforce.
5. Causing ineffective reporting. If there isn’t a clear threshold for when a requirement is “met” or “not met,” then how can you report on the state of security? If each control allows for a wide span of interpretation, a list of “met” controls doesn’t cover it. One caveat here: Fusion Risk Management has a great solution to this issue; when assessing current implementations, their processes allow for the assignment of a maturity level to each control implementation. This gives greater context than a simple “met” or “not met.” But even in this setting, there are defined thresholds that separate each level of maturity, which is the key to visibility and continuous improvement.

Policies are an opportunity to set direction for an organization at a high level. What is the intent of management? It’s important to be flexible, but vague is not the same as “high-level.”

The appeal of using inherently vague phrases is that they can be quickly inserted at draft time, and at first look they appear to allow for flexibility. The intent is to account for the give-and-take between risk and cost at the policy origination stage, since organizations do not have the resources to evaluate the cost of dozens (or hundreds) of controls across a wide range of teams, departments and business groups.

But weasel words are not a substitute for meaningful security governance. If a control is too restrictive, or isn’t clear, it needs to be reviewed by leadership and aligned with the needs and capabilities of the organization. And if there are substantial differences between units, then there needs to be an explicit documentation of how that risk will be handled. But a well-designed ISO 27001 Information Security Management System (ISMS) accounts for this.

When documenting a security requirement, follow this simple rule: if the organizational impact of a requirement isn’t clear enough to specify management intent in a given category, then leave it out until that impact is known.

Good security hygiene requires a pest-free environment. Find and exterminate all weasel words, and use governance to weigh risks and costs in a planned approach. This will help you trap them before they get back in again. Catch and release …

Sunday, April 19, 2009

Getting the most out of virtual teams

Most of the big challenges in information security require a multi-disciplinary approach. It takes specialized knowledge and input from many different areas for leaders to successfully balance costs to the business against the expected benefits of reducing risk while ensuring that operational goals are reached.

In global organizations, this usually involves virtual teams working with a mix of collaboration tools, with relatively few opportunities for face to face interaction. These matrixed teams can often feature a more diverse mix of countries, cultures, educational backgrounds and perspectives. But their value can be easily lost if one or more dominant voices crowd out the rest.

To keep that from happening, there are several decision making tools that can be helpful in a virtual setting which encourage collaborative and creative development within a project structure.

Spiral Development Methodology
If the goal of the project is to develop a process or internal service offering under tight timelines, and if role definitions and/or project deliverables have a significant amount of ambiguity, it may make sense to use the spiral development approach to ensure that a working process is implemented right away. While it isn’t labeled a “spiral” methodology, Kevin Behr, Gene Kim and George Spafford detail the essential steps for establishing control over change management in their book The Visible Ops Handbook: Implementing ITIL in 4 Practical and Auditable Steps.

In contrast to traditional development methodologies that use a top-down approach which begins with fully specified requirements and ends with a final product, the spiral approach uses these steps:
1. Plan – specify requirements in as much detail as possible
2. Design – design the solution based on known requirements
3. Prototype – build a working process / solution and deploy it
4. Evaluate – compare prototype performance against expected performance; have the initial goals been met? Identify lessons learned and new requirements, and repeat steps 1-4 as needed.

By taking an iterative approach, the team can deliver a working solution that meets immediate operational and/or regulatory requirements while gaining experience that will be helpful in refining and improving the solution.

Improving decision making in virtual teams
As typically implemented, brainstorming in a team setting involves a facilitator documenting alternatives in the order in which they are most loudly, and frequently, repeated. Because they’re generated one at a time, some ideas get lost along the way, and at a certain point the list seems “long enough” and that’s the end of the input.

Even in a motivated team with good interpersonal relations, the “tyranny of the enthusiastic” may unwittingly crowd out other options. One way to prevent this is to use what is called the Nominal group technique:
1. Before the meeting, each team member writes down their own ideas on the problem; requirements, design issues, and solution approaches.
2. The team meets:
a. Each member presents one idea to the group; no discussion takes place until all ideas have been recorded.
b. The team asks questions to each presenter to ensure that their approach is clearly understood, and then evaluates it.
3. Each team member ranks the ideas presented and sends their “votes” to the facilitator. A final decision is based on the highest aggregate ranking.

While this involves more pre-work and coordination than the typical “brainstorming” approach, the advantage is a much fuller reflection of the capabilities of the team. And since all team members must present, it makes “social loafing” much less likely as everyone is expected to provide input.
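
A simple way to tally the votes in step 3 is to sum the ranks each idea receives and take the lowest total. A sketch with made-up ideas and ballots:

```python
# One way to tally nominal-group votes: each member ranks the ideas
# (1 = best), and the idea with the lowest total rank wins.
# The ideas and ballots below are made up for illustration.

def aggregate_ranking(ballots):
    """ballots: list of dicts mapping idea -> rank (1 = best).
    Returns ideas ordered from highest to lowest aggregate preference."""
    totals = {}
    for ballot in ballots:
        for idea, rank in ballot.items():
            totals[idea] = totals.get(idea, 0) + rank
    # Lowest total rank = highest aggregate preference.
    return sorted(totals, key=totals.get)

ballots = [
    {"single sign-on": 1, "jump host": 2, "cert rotation": 3},
    {"single sign-on": 2, "jump host": 1, "cert rotation": 3},
    {"single sign-on": 1, "jump host": 3, "cert rotation": 2},
]
print(aggregate_ranking(ballots))  # winner listed first
```

Summing ranks this way weighs every member’s full ballot equally, which is exactly what keeps the loudest voice from deciding the outcome.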

Another approach, originally pioneered by RAND as a forecasting tool is called the “Delphi” method:
1. Each member provides a written forecast, along with supporting arguments and assumptions.
2. The facilitator edits, clarifies and summarizes the data
3. Data is returned as feedback to the members, along with a second round of questions.
4. The process continues, usually for about 4 rounds, until a consensus is reached.

Sometimes it’s possible to throw people on a conference call and hash it out. But other times, you need all of the creativity, engagement and effort that a matrixed team can muster, and all on a very short deadline. In those circumstances, an ounce of smart structure can yield a pound of results.

Saturday, April 04, 2009

Boiling the O.C.E.A.N.

Metrics projects that are intended to consolidate and report on the state of security for an organization rarely fail for a lack of measures. Information technology systems, processes and projects all throw off an impressive amount of data that can be captured and counted. The Complete Guide to Security and Privacy Metrics suggests over 900 metrics, and NIST Special Publication (SP) 800-55 Rev. 1, Performance Measurement Guide for Information Security extends this analysis from the system level to an executive view by providing a framework for summarizing the results.

So given all of the measures, structure and guidance available, why is it so tough to be successful? The silent killer in this space is often a lack of focus: too many metrics, too much aggregation, and too little analysis connected to business problems and goals to provide useful insight.

Instead, it’s better to start with the stakeholders and focus on fully understanding their goals and decisions, without limiting the conversation with assumptions about what is or isn’t going to be measurable.

Consider this subset of stakeholders, and some of their goals:
Executive management – financial health and strategic direction of the organization. Are we profitable and are we executing effectively in the markets we serve?
Risk governance / Security management – are we keeping risk at an acceptable level? Are we making the best use of the security resources we have?
Line Management – are we achieving operational goals, and aligning with strategic initiatives?
These questions become an effective filter for removing the measures that don’t matter, and for finding common measures that, with analysis, can serve many different purposes. Here’s where it may be useful to classify measures from a stakeholder perspective in terms of the types of decisions that they enable:
Output measures - what is the primary deliverable from a given team?
Coverage measures – how many locations, systems or groups are covered by a given process or policy?
Exposure measures – what proportion of the environment stores or processes regulated information?
Activity measures – how many requests have been received during a given reporting period? Addressed?
Null measures – which teams have not provided data?

The last category is an important one, as it highlights the difference between a measure and a metric. A measure is an observation that increases your understanding about a situation and improves the quality of decision making; a metric is a standardized measurement. Inconsistent, incomplete and missing data from key teams or groups are an important measure of program maturity. Sometimes it’s what you can’t count that counts.
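
In practice, a “null measure” report can be as simple as comparing the teams you expected to hear from against the ones that actually submitted data. A sketch, with illustrative team names and fields:

```python
# Sketch: a "null measure" report -- which teams owe data this period?
# The team names and submission fields below are illustrative only.

expected_teams = {"network ops", "server ops", "app dev", "help desk"}
submissions = {
    "network ops": {"requests_received": 41, "requests_closed": 39},
    "app dev":     {"requests_received": 12, "requests_closed": 12},
}

missing = sorted(expected_teams - submissions.keys())
coverage = len(submissions) / len(expected_teams)
print(f"coverage: {coverage:.0%}, missing: {missing}")
```

The `missing` list is itself a maturity measure: a team that never reports is telling you something no dashboard of “met” controls will.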

Above all else, resist the pressure to measure everything. A few well-chosen measures will allow for versatile and powerful analysis. There are literally dozens of ways to analyze and present a limited number of well-chosen data points. And when captured consistently over time, the correlations between seemingly unrelated activities offer the opportunity to surprise.

Saturday, March 28, 2009

Security Policy as concept car

In the JMU Information Security MBA program, the main assignment for the second class is to put together an information security policy manual. During the lectures we spent most of our time focusing on frameworks and sources such as ISO 27001, COBIT, ITIL, NIST, SANS and many other sources of policy content. Thankfully, we also spent time working through some themes from The Design of Everyday Things by Donald Norman.

My favorite takeaway from the class was the realization that "fit" is an important concept in information security; so much so that it should be explicitly recognized in the policy framework. Policies must fit the security requirements, cost constraints, culture and capabilities of an organization.

At the risk of leaving out a number of "must haves" in my policy manual, I wound up putting together a Concept Car for security -- a collection of statements and requirements oriented around three questions:
* What does your business need?
* What can you execute?
* What can you afford?

They're not complete, but hopefully reflect a decent start in each of the categories that they address. I've also included links to all reference sources for more detail:

Information Security Strategy and Architecture
Information Security Charter
Acceptable Use Policy
Data Owner Security Policy
System Owner Security Policy
Platform Infrastructure Security Policy
Messaging Security Policy
Network Security Policy

Tuesday, March 24, 2009

Information Supply Chain Security

Abraham Maslow once wrote “I suppose it is tempting, if the only tool you have is a hammer, to treat everything as if it were a nail.” But what if your toolbox has everything except a hammer? At the very least, it limits what you can build.

Last week at the University of Maryland I had the opportunity to be a part of a workshop to develop a Cyber-Supply Chain Assurance Reference Model, sponsored by the RH Smith School of Business and SAIC. Looking at the security challenges that organizations are now facing, the old toolbox seems about half empty.

Prior to the workshop I was very comfortable with confidentiality, integrity, availability, authenticity, and non-repudiation along with risk management definitions of loss expectancy as the basic language of information assurance. But after a few hours of looking at information technology in the context of a cyber-supply chain, it became apparent that we need better tools to characterize and manage emerging risks. There were a number of different perspectives represented at the meeting, but here’s my take:

Traditionally, assets are assessed individually and independently as part of the information assurance process. For internally facing systems with limited or explicit interdependencies, this isn’t a bad representation. But for organizations where boundaries with suppliers and customers are blurring, the interdependencies among these systems eclipse the value of the data they hold. From a risk perspective, Verizon’s 2008 Data Breach survey shows how attacks against vendors and suppliers become the entry point into “secure” organizations because of trust relationships. And from a financial perspective, high confidentiality requirements can make it difficult to ensure high availability in a cost-effective way.

Existing risk frameworks such as COBIT and ISO 27001 can describe these issues, but they are not designed to model the trade-offs in a way that helps security leaders optimize.

This is the point where the information security toolbox needs to draw on research capabilities from other disciplines. The Supply-Chain Operations Reference Model (SCOR) provides a proven framework for analysis that captures these dependencies.

The information supply chain analyst asks: where is information captured (created) and processed? What are the storage and delivery requirements? Risk, cost and the traditional “CIA” triad are variables in a business decision, rather than optimization goals on their own.

In contrast, infrastructure protection often takes an asset-centric view that attempts to identify the intrinsic value of an application or environment, separate from its role within an extended system. This makes the connection to business value more difficult to express, and to optimize.

The reference model will be published in April. In the meantime, there are still a few details that are being … hammered out …

Wednesday, March 18, 2009

Securing the organization, despite management’s best efforts to stop you

Looks like the abstract below is going to get the green light for the May 2009 Grand Rapids ISSA meeting. Ok, so the title is a bit of "red meat" for a largely technical audience, but the straw man here isn't management ... or security: it's the "ivory tower" textbook description of how security is supposed to work.

In reality, the most effective leaders that I have seen have been the ones who are pragmatic, patient, and unconcerned about "style points" when it comes to building an effective program. They just make sure that the number and severity of incidents keep trending in the right direction, even if the drivers of that success come from other parts of the organization.

Hopefully, I'll capture some of that in the slides that go with this presentation:

"Every text on information security says “be sure to get executive management support” before you start. But what should you do when that support is less than what you need, as is often the case in today’s cost-conscious environment? Management isn’t really out to stop you, although at times it may seem that way because of the contradictory pressures that affect the entire business.

Meanwhile, threats to information security are recession-proof: they don’t have layers of approval to contend with, and they’re not going to go away any time soon. Information security professionals need to respond to these threats regardless of the organizational challenges, and in the process build support by demonstrating the value of the work they do. They also need to be strategic when requesting additional resources and support. The purpose of this presentation is to build on the concepts introduced in the Harvard Business Review “Managing Up” article collection, and to present the impact of security with management-centric measures and analysis that build the case for improving security by highlighting facts, rather than fear, uncertainty and doubt."

Suggestions, success stories and one-line management rebuttals are welcome.

Something to the effect of: "Enabling the business / Serving customers / earning a profit ... despite security's best effort to stop you ..."

Monday, March 16, 2009

Making the right call

Cloud computing, or on-premises: which is more secure, and which is the better option for your organization?

It seems like a simple question, and yet it shows just how much further security risk management needs to mature in order to command the stature of marketing or finance in driving company strategy. This isn’t to suggest that security is less important to an organization; it just hasn’t made as much progress formalizing and defending its decision-making processes. Financial analysis tools can help in this category, so long as they’re not applied too literally.

For example, the “cloud vs. onsite” decision shares some important similarities with the “lease vs. buy” decisions that finance supports all the time. Finance uses a very simple decision rule to choose between alternatives: accept the decision that maximizes the net present value of the investment. Specifically: what is the sum of all cash flows (i.e. investments, expenses and revenues generated) and what discount rate should be applied to reflect the rate of return that is appropriate for this kind of investment decision?
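
The decision rule itself is only a few lines of code. A minimal sketch; the cash flows and discount rate below are placeholder figures, not real analysis:

```python
# Minimal net-present-value comparison for a "cloud vs. onsite" style
# decision. All cash flows and the discount rate are placeholder figures.

def npv(rate, cashflows):
    """cashflows[0] occurs today; later entries are one period apart."""
    return sum(cf / (1 + rate) ** t for t, cf in enumerate(cashflows))

onsite = [-100_000, -10_000, -10_000, -10_000]  # big upfront buy, low opex
cloud  = [-5_000,  -40_000, -40_000, -40_000]   # low upfront, higher opex

rate = 0.08  # assumed required rate of return
print("onsite NPV:", round(npv(rate, onsite), 2))
print("cloud  NPV:", round(npv(rate, cloud), 2))
# Decision rule: accept the alternative with the higher (least negative) NPV.
```

Note how much of the answer lives in the assumptions (the discount rate, the projected flows) rather than the arithmetic, which is exactly why the analysis needs to travel with the recommendation.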

Often the underlying assumptions and analysis are as important to decision makers as the final recommendation, so transparency is essential.

Given the rate of change in most organizations, security isn’t often asked to weigh in on a single investment choice in isolation. Usually, the decision involves picking the best course among alternatives, so it just needs to be clear, based on a consistent set of evaluation criteria, which alternative is comparatively better. And just as with the “lease vs. buy” scenario, decision makers need to see the analysis as well as the recommendation.

To compare alternatives, objectively, from a security perspective:
* Compare architectures. Which has greater complexity, and why? Higher complexity works against high availability.
* Compare security models: count the number and severity of exposures in each environment to attack.
* Compare control strength, using a common framework such as COBIT or ISO 27001: which environment provides greater defense in depth? What controls must perform effectively in order to ensure the security of systems and critical processes?

So long as both alternatives are assessed with standard, open frameworks the analysis will provide both a recommendation and a basis for evaluating all of the essential underlying assumptions. The intent is not to reduce the inherent variability of threat behavior into a single score that can be applied to both environments, or to conduct an expensive, overly detailed exercise. If there is a significant difference among the alternatives, it will begin to appear with a basic review of high level architectures and security models. If there isn’t much difference, then the decision threshold for security is likely to be met by either environment, and the decision rightly shifts to an evaluation of business benefits.

It only becomes difficult when you’re trading off performance and risk. But there’s a way to deal with that as well …

Sunday, March 08, 2009

Strategy-based Bracketology

In the information economy, it’s important to cross-train on select skills from other fields: there’s Operations Management for MBAs, Finance for Senior Managers, and perhaps the most important of all, Bracketology for Information Security Risk Managers.

Managing bracket risk
In the NCAA tournament, on average, the higher seed wins about 70% of the time. Most bracket pools score the results of each round the same, with 32 possible points for picking all of the winners in that round. There are six rounds, so the maximum possible score is 192. If you follow a high-seed strategy (i.e. pick the higher-ranked team) you’ll likely wind up with a score that's better than average.

Of course, if you pick straight seeds, you can expect the following:
* You’ll do well in tournament years that feature exceptionally strong top teams.
* You’ll be ridiculed by your friends for having no imagination and playing it safe.
* In a bracket pool of any size, your odds of winning are very, very low.

Everyone else picks upsets. Most people get most of them wrong, but a few get lucky, and the lucky ones come out on top. To have a shot at winning against your friends, prognosticators, or the masses on Pickmanager, you have to go with some underdogs. Each year there are usually a bunch of upsets, and the more you pick, the higher your potential score will be--at least in theory.

(As an aside, this perspective sheds a little light on how the current Wall Street mess started, and why it was so hard to stop: to attract investors, you have to produce top returns. And you’re not going to get top returns by always playing conservative.)

Start with history
Obviously, seeds are a strong indicator of performance, so it doesn’t make sense to pick upsets at random. It’s good to look at the historical performance of each seed as a starting point. There are some upsets that happen every year, and the conventional wisdom is that they are “safe” to pick. For example, let’s look at the 5 v. 12 first-round matchup. Historically, the 5 seed wins 67% of these games; an average of 1 to 2 upsets per year.

If you pick all 5 seeds, you’ll usually get 3 out of 4 possible points in the first round from those matchups. Sometimes they’ll all win and you’ll get 4 points; other times there will be two upsets and you’ll only get 2.

So, putting your risk management hat on, which is the best approach? Without any additional information, what strategy will give you the highest payoff? Consider the 2008 tournament 5 v. 12 pairings:

(5) Notre Dame v (12) George Mason
(5) Clemson v. (12) Villanova
(5) Michigan State v. (12) Temple
(5) Drake v. (12) Western Kentucky

The left column on the chart below shows the 16 possible outcomes, with the historical probability of each. To see which one has the highest payoff, compare the columns to the right for each strategy: no upsets, 1 upset, or 2 upsets. (In the table, 2008 team names are listed instead of scenarios for clarity.)
                                               -------------- Pick strategy --------------
                                               All high       Notre Dame     MSU and Drake
                                               seeds win      upset          upset
Outcome                               Prob.    Pts    EV      Pts    EV      Pts    EV
All high seeds win                    20.2%     4    0.81      3    0.60      2    0.40
Notre Dame upset                       9.9%     3    0.30      4    0.40      1    0.10
Clemson upset                          9.9%     3    0.30      2    0.20      1    0.10
Michigan State upset                   9.9%     3    0.30      2    0.20      3    0.30
Drake upset                            9.9%     3    0.30      2    0.20      3    0.30
MSU and Drake upset                    4.9%     2    0.10      1    0.05      4    0.20
Clemson and Drake upset                4.9%     2    0.10      1    0.05      2    0.10
Clemson and MSU upset                  4.9%     2    0.10      1    0.05      2    0.10
Notre Dame and Drake upset             4.9%     2    0.10      3    0.15      2    0.10
Notre Dame and MSU upset               4.9%     2    0.10      3    0.15      2    0.10
Notre Dame and Clemson upset           4.9%     2    0.10      3    0.15      0    0.00
Clemson, MSU and Drake upset           2.4%     1    0.02      0    0.00      3    0.07
Notre Dame, MSU and Drake upset        2.4%     1    0.02      2    0.05      3    0.07
Notre Dame, Clemson and Drake upset    2.4%     1    0.02      2    0.05      1    0.02
Notre Dame, Clemson and MSU upset      2.4%     1    0.02      2    0.05      1    0.02
All low seeds win                      1.2%     0    0.00      1    0.01      2    0.02
Expected value (number of wins)      100.0%          2.68           2.34           2.00
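
The expected values can be checked by enumerating the 16 outcomes directly, assuming each 5 seed wins independently with probability 0.67:

```python
from itertools import product

# Enumerate the 16 possible outcomes of the four 5 v. 12 games and compute
# each pick strategy's expected number of correct picks, assuming each
# 5 seed wins independently with probability 0.67.

P_HIGH = 0.67
GAMES = ["Notre Dame", "Clemson", "Michigan State", "Drake"]

def expected_correct(picks):
    """picks[i] is True if we pick the 5 seed (high seed) to win game i."""
    total = 0.0
    for outcome in product([True, False], repeat=len(GAMES)):  # True = high seed wins
        prob = 1.0
        for high_wins in outcome:
            prob *= P_HIGH if high_wins else (1 - P_HIGH)
        correct = sum(pick == result for pick, result in zip(picks, outcome))
        total += prob * correct
    return total

strategies = {
    "all high seeds":      [True, True, True, True],
    "Notre Dame upset":    [False, True, True, True],
    "MSU and Drake upset": [True, True, False, False],
}
for name, picks in strategies.items():
    print(f"{name}: {expected_correct(picks):.2f}")
```

Each upset you pick trades 0.67 of an expected win for 0.33, which is why the expected value drops by 0.34 per upset no matter which game you choose.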

Focus on specific outcomes, not typical results
It seems that if you know that one high seed is going to lose, you should pick at least one upset --and yet picking only the high seeds has the highest expected payoff (2.68). So what’s going on here?

Across the 16 possible outcomes of the four 5 v 12 games, a “no upset” strategy for this particular matchup ensures that the most likely scenario gives you the highest possible payoff, and the least likely scenario is the one that would leave you with the lowest possible payoff. (It does not hold true in the 8 v 9 case.) Knowing that 33% of the number 12 seeds are going to come out on top doesn’t help you pick the right ones. (Clemson and Drake were knocked out in the first round last year.)

The moral of the story: historical averages are important, but there’s a world of difference between knowing what typically happens and predicting what will specifically happen. You need a much higher level of confidence about specific outcomes (i.e. risks) in order to be more effective than just playing the odds.

How much more confident? Working backwards, if you adjust the probability of the scenario you think is most likely (e.g. MSU and Notre Dame as the only 5 seed winners) you can see what level of confidence you need in your prediction to justify making that choice.

Getting to that level of confidence requires research; knowing that you’ve reached it takes practice ...

Saturday, March 07, 2009

March Madness and Risk Management Strategy

Every vice, if it hangs around long enough, starts attracting self-justifying quotes. Ben Franklin came up with one of my favorites: “Beer is proof that God loves us and wants us to be happy.” I don’t necessarily agree, but I can empathize with anyone looking for ways to reduce their own cognitive dissonance. I also have a vice that I find virtuous: "March Madness," the annual NCAA college basketball tournament.

Each year, along with about 2 million other people, I sign up for Yahoo’s College Basketball Tournament Pick’em to see how many I can get right. Personal obsessions and Izzomania aside, I will proclaim with all sincerity that the skills you need to consistently make good picks in the NCAA tournament will also make you better at security risk management. Both risk management and tournament bracketology are based on making risk choices under uncertainty; both involve the judicious use of outside experts, rich statistical data, and intangibles. They also share the trait that over the short term, it’s really tough to tell the difference between luck and skill.

March Madness 101
The single elimination tournament is played in six rounds, with 64 teams seeded in 4 regions. In the first round, teams are paired with the highest seed playing the lowest seed e.g. 1 plays number 16, 2 goes against 15, all the way down to 8 against the 9th seeded team. Winners advance, so assuming that the high seed wins each game, in the second round the number one seed would then play the number eight team in the region; the two seed will play number seven, etc. Of course, the high-seed teams are regularly upset by lower seeds with a randomness and regularity that is … maddening.

Points are awarded during each round for correct picks as follows:

Round                      Points per correct pick   Games   Possible points
1                          1                         32      32
2                          2                         16      32
3 ("Sweet 16")             4                          8      32
4 ("Elite 8")              8                          4      32
5 ("Final Four")           16                        2       32
6 (National Championship)  32                        1       32

Maximum possible: 192 points

So there are 63 decisions to make before the first game begins, and the goal is to predict the winner of each game, in each round, in such a way as to maximize your total score:

Score for the round = points per correct pick * number of correct picks

This equation bears a very strong resemblance to the standard information risk equation below, which is used to calculate loss expectancy as part of the risk assessment process. Both equations define a payoff as the product of something you know quite a bit about (impact) and something that you can estimate to some level of confidence but not perfectly predict:

Risk exposure = risk impact * event probability
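The parallel can be made concrete with a tiny sketch. The team probability and loss figures below are hypothetical, chosen only to show that both formulas multiply a known payoff by an estimated probability:

```python
# A minimal sketch of the analogy between bracket scoring and loss
# expectancy. The probabilities and dollar amounts are hypothetical.

def expected_value(payoff, probability):
    """Payoff you know exactly, probability you can only estimate."""
    return payoff * probability

# Bracket: a Sweet 16 pick is worth 4 points if correct.
pick_ev = expected_value(payoff=4, probability=0.65)

# Risk: a $250,000 incident estimated at a 10% annual likelihood.
loss_expectancy = expected_value(payoff=250_000, probability=0.10)

print(pick_ev)          # 2.6 expected points
print(loss_expectancy)  # 25000.0 annualized loss expectancy
```

In both cases, skill lives entirely in the probability estimate; the payoff side is given.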

So if you get pushback for following the tournament in minute detail, obsessing over your picks and constantly checking your rankings every time there’s an update, take heart: It's not just a tournament, it’s a huge learning opportunity. Decision making in a dynamic, competitive situation with limited information and lots of uncertainty is a great environment for building your risk optimization skills.

Wednesday, March 04, 2009

Organizational Agility

It seems that 2009 is stacked against just about everyone trying to get new security initiatives off the ground. First we saw the waves of cuts and layoffs, with information security budgets left largely intact. But now the freeze is turning into cuts for security departments as well.

If only the threats to our environment were also struggling with the pressures of downsizing. But they’re not, so we have to stand up the most robust set of administrative, technical and physical controls we can muster with the resources we have.

Security departments aren’t the only teams that have to figure out how to win under these circumstances. Hockey teams are used to playing outnumbered for short periods of time. When a player is sent off to the penalty box, their team must carry on short-handed until the penalty time expires.

During this “power play,” the penalized team changes its defensive stance. They still directly challenge the attacking player with the puck, and maintain a depth of defenders in front of the goal to take away any open shots. But the defense can’t cover everything, and so they do their best to recognize and respond quickly as their opponent constantly shifts the point of attack.

Until the economy rebounds and budgets recover, many organizations won’t be able to fully staff every function and administer every control. It might take a year or two, but for now we’re in “penalty kill” mode. Situational awareness and the ability to respond quickly and cohesively are going to be especially important.

So how agile is your organization, and how does that agility impact your short-handed security strategy in a “power play” environment?

Measuring agility
Organizational agility is the ability of groups and teams to react to change in a way that benefits the overall organization. Agile business organizations observe market conditions, analyze opportunities, decide on a course of action and execute those plans effectively. (Well, in theory anyway. As military strategists like to say: “No plan survives contact with the enemy.”)

An organization with staff overburdened with responsibilities isn’t agile. So before trying to press on with a labor-intensive approach to security, it’s important for management to assess the organizational capacity to carry it out.

A good indicator of staff workload is meeting availability. So to measure agility, pick 30 people at random across the company and schedule a meeting without sending it. See how many are available during 2 or 3 different time slots this week. Then push it out 2 weeks, and choose a few more time slots. Then push it out a month. With a random spot sample of time availability, you can get a sense of the capacity of the organization to support key security initiatives.
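The arithmetic behind that spot sample can be sketched as follows. The availability responses here are made-up stand-ins for what the free/busy lookups would return for the 30 randomly chosen people:

```python
import math

# A sketch of the spot-sampling idea: check a random sample of 30
# calendars and estimate organizational capacity. The responses below
# are hypothetical; in practice they come from free/busy lookups.

# 1 = had an open slot in the proposed windows, 0 = fully booked.
availability = [1, 0, 1, 1, 0, 1, 1, 1, 0, 1,
                1, 0, 0, 1, 1, 1, 0, 1, 0, 1,
                1, 1, 0, 1, 0, 1, 1, 0, 1, 0]

n = len(availability)
p_hat = sum(availability) / n          # sample proportion with free time
# 95% margin of error (normal approximation to the binomial).
margin = 1.96 * math.sqrt(p_hat * (1 - p_hat) / n)

print(f"Estimated capacity: {p_hat:.0%} ± {margin:.0%}")
```

With only 30 calendars the margin of error is wide (about ±17 points here), but that is usually enough to distinguish “plenty of slack” from “fully booked.”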

If you find that the capacity is there, then labor-intensive activities such as security awareness training, information classification and risk assessment work can be sustained with a good chance of uptake and success. But if the calendar space isn’t there, it’s likely that your strategy will need to change. It may be better to focus on delivering technical security controls to your organization, instead of expecting as much from them.

Friday, February 27, 2009

Security, Functionality, and Profitability

As a security manager, are you frequently at odds with your business leadership regarding risk decisions? If the answer is yes, then good … the process is working.

So long as it is surfaced and resolved, conflict can lead to better decisions: but only if the process considers in detail how adjustments to the mix of security and functionality within IT systems affect the long run profitability of the organization. To quote Alfred P. Sloan: “If we are all in agreement on the decision - then I propose we postpone further discussion of this matter until our next meeting to give ourselves time to develop disagreement and perhaps gain some understanding of what the decision is all about.”

To be useful, IT systems need to operationalize business processes at a cost that allows the organization a reasonable return on investment. At the same time, these systems and the data they contain must be protected from unauthorized disclosure, modification, or loss.

Security professionals are hired for their specialized knowledge in deploying and managing systems that provide defense in depth: multiple layers of independent security controls that reduce the exposure of these systems to security incidents, and reduce the impact of these incidents when they do occur. Likewise, business leaders bring a similar level of specialization to key business processes, but with a focus on maximizing functionality and performance; reduced overhead, increased throughput, and so on.

So if both expertise and incentives pull in opposite directions, what is the solution? Split the difference? Each time a firewall rule change, or configuration exception, or other deviation from best practices is under review, flip a coin? Well, not exactly. Compromise is important – but not to the exclusion of understanding the forces at work in the situation.

There are a lot of ways to represent this, but in the interest of promoting “Green IT” I’m recycling a few things from my microeconomics classes. The graph below shows the financial impact of securing an IT system. The vertical axis represents profitability; higher is better. The horizontal axis is a continuum: the left side represents a high degree of functionality, but lower security. Moving to the right involves adding layers of security controls, which in turn reduces the functionality and efficiency of the system from the perspective of the end user. The semi-circle on the graph is a benefit curve, which shows what happens to profitability as more controls are implemented. Moving from left to right, increasing protection up to point “A” makes the company more secure and more profitable. Functionality begins to decrease, but the value of protection over the long run pays for itself...up to a point. Eventually, adding “more security” begins to frustrate end users and slow business processes. And at point “B” the company is more secure, but worse off.
Ideally, leadership will recognize the trade-off that maximizes profitability, work to reach point “A,” and when they get there, stop. If the company finds itself at point “B,” exception requests which greatly ease the business process without significantly eroding the quality of protection should be approved until point “A” is reached.
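The logic of finding point “A” can be sketched numerically with an assumed concave benefit curve. The quadratic shape and every number below are illustrative assumptions, not measured data:

```python
# An illustrative model of the benefit curve: profitability rises with
# added controls up to a point, then falls. The quadratic shape and the
# numbers are assumptions for demonstration only.

def profitability(controls):
    """Concave benefit curve over a 0..10 control-intensity scale."""
    return -(controls - 4.0) ** 2 + 100.0  # peak ("A") at 4.0

# Sweep the control-intensity scale and find the profit-maximizing mix.
levels = [i / 10 for i in range(0, 101)]
point_a = max(levels, key=profitability)

print(point_a)             # 4.0 -- point "A", the optimal mix
print(profitability(9.0))  # 75.0 -- point "B": more secure, but worse off
```

Any concave curve tells the same qualitative story: keep adding controls while marginal protection pays for itself, and stop at the peak.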

If it's this obvious, then why does the process break down? Typically, security managers have more experience finding risks than business opportunities, and are rewarded for decreasing the former, rather than increasing the latter. Perhaps it's written into the annual goals this way:
  • Manage information security threats (30%)
  • Define security architecture, direct daily operations of staff. (50%)
  • Support financial targets of company (20%)
In this scenario, security incentives outweigh profitability incentives by a 4:1 ratio.

So the second illustration below shows how a security manager might evaluate different levels of functionality and protection. The curve, “U”, that runs from the top left to bottom right of the graph represents the trade-offs between security and profitability that a manager is willing to make. At any point on the curve, the security manager is indifferent (equally satisfied) with a given mix of security and profitability. The point at which the indifference curve "U" touches the profitability curve is the point a security leader sees as optimal.

The shape of this curve implies that to accept low levels of security, the organization has to be exceptionally profitable. Moving to the right, a security manager might be willing to continue locking systems down even when there is a measurable profit impact.

And finally, one last graph below. Consider a business manager who understandably wants to maximize functionality, specifying requirements for a new customer-facing application. Business requirements put the tradeoff at “X” while the security chief pushes for “B”. Point “A” again represents the maximum benefit to the company. Sometimes “X” is closer to “A”; other times “B” is. So how do you determine where you actually are, and then make the improvements needed to get closer to “A?”
Risk Governance
IT governance processes, if properly designed and well managed, can be a huge help in bridging the natural divide between specialized experts with widely differing preferences. While it’s important over the long run to teach security professionals the fundamentals of the business, and equally important to have business leaders recognize the impact of security vulnerabilities, the reality is that rational decision makers will be influenced most strongly by the incentives that directly apply. Or as they say in the political realm: “where you stand depends on where you sit.” But back to Sloan -- what really matters is the shape of the curve, and how well the governance group understands it. Where is the “A” investment, and given the available architecture and implementation choices, how close to “A” are the various alternatives?

The governance process should seek to draw out all of the pieces of the proposed solution: what are the key components of the business process? Which elements are the most important contributors to the business value produced? What are the constraints? Likewise with security: what configuration requirements, administrative overhead, monitoring capabilities or other concerns are involved?

Without a sense of the size and shape of the benefit curve, and the location of various options on it, decisions will be based on the relative political strength of the participants. It's possible to do better than that. While it is always going to be difficult to tell if you’ve actually reached “A,” it can be very apparent that you’re doing better than “X” or “B.” And if that decision comes at the cost of some challenging discussions, it’s a debate worth having.

Friday, February 20, 2009

The next 12 months

Yesterday at the Chicago ISACA meeting I had the opportunity to hear Dave Ostertag from Verizon walk through the 2008 Verizon Data Breach Investigations Report, point by point. At the time of publication, the report included over 100 data points from 500 cases, but the base is now up to 700 cases and still more interesting patterns in the data continue to emerge.

The report is 27 pages long, but it informs an information security strategy by persuasively answering one simple question: “What changes can I make in the next 12 months that will significantly reduce the likelihood and impact of a security incident in my organization?”

Across all the activities lumped under the banner of information security, Verizon found that a surprisingly small set of outcomes (or more accurately, the absence of these outcomes) mattered most. The survey lists nine recommendations, but I’ve re-worded and consolidated them a bit here:
1. Execute: ensure that security processes implement the identity management, patch management and configuration management basics. From the survey: “Eighty-three percent of breaches were caused by attacks not considered to be highly difficult. Eighty-five percent were opportunistic…criminals prefer to exploit weaknesses rather than strengths. In most situations, they will look for an easy opportunity and, finding none, will move on.” In contrast, among poor-performers, “…the organization had security policies … but these were not enacted through actual processes…victims knew what they needed to do … but did not follow through.”
2. Inventory, segment and protect sensitive information: “Sixty-six percent of breaches involved data that the victim did not know was on the system.” Know where critical data is captured and processed, and where it flows. Secure partner connections, and consider creating “transaction zones” at the network level to separate baseline business activities from high sensitivity environments.
3. Increase awareness. “Twelve percent of data breaches were discovered by employees of the victim organization. This may not seem like much, but it is significantly more than any other means of internal discovery observed during investigations.”
4. Strengthen incident handling capabilities. Monitor event logs, create an incident response plan, and engage in mock incident testing.

Steps 1 and 2 reduce the likelihood of an incident; steps 3 and 4 primarily reduce the potential impact by decreasing the time lag between an intrusion and its eventual identification and containment.

As for step four, my first thought is that most incident response teams won’t need much mock testing, because of the natural cycle of event monitoring, suspected incident reporting, and initial response to events that are often false positives. Organizations that promote active reporting of suspicious events, and treat each one as an actual incident, will get much of the practice in a live setting that mock drills would otherwise offer. Instead of trying to prevent false positives from occurring, an IR team should work to become more efficient at quickly ruling them out. As they do, the threshold for activating an initial review will drop, and ultimately they’ll catch more events closer to the time of occurrence.

It’s still a good idea to ensure that all stages from identification through remediation and recovery are fully practiced, but in general achieving containment quickly reduces the number of records exposed, and thus the eventual full cost of the breach.

Which brings us to next steps for Verizon; it seems that they’re now working on developing an incident costing model. This will be huge, because without it, organizations will continue to struggle with how to set specific protection goals that align with their cost structure and business strategy.

As an example, the survey looked at four sectors. Retail was one that contributed a sizeable amount of data (which is a polite way to say they got hacked a lot). No surprise that simple survival is usually a bigger concern than security for many retailers: net profit margin among publicly traded companies in this sector often ranges between two and six percent. At those margins, an additional dollar spent on physical security needs to be matched by $17 to $50 in additional sales … just to break even. Considering the wholesale cost of merchandise, it’s understandable why management accepts the risk of physical theft, formally accounting for it as “shrinkage.”
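The break-even arithmetic is simple: to recover a dollar of cost out of sales carrying a net margin m, you need $1/m in new revenue. A quick sketch over the retail margin range cited above:

```python
# Break-even sales needed to recover $1 of added security cost at a
# given net profit margin. The margins span the 2-6% retail range
# mentioned above.

def breakeven_sales(cost, net_margin):
    """Sales required so that sales * margin covers the added cost."""
    return cost / net_margin

for margin in (0.02, 0.04, 0.06):
    needed = breakeven_sales(1.0, margin)
    print(f"{margin:.0%} margin -> ${needed:,.2f} in sales per $1 of cost")
```

At a 4% margin the multiplier is $25; at the thin 2% end of the range it climbs to $50 of sales for every dollar of security spend.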

Unfortunately, while this mindset towards risk carries over into the electronic space, the analogy doesn’t. A dollar lost to computer crime, either through the cost of the incident itself, or the cost of organizational response, comes straight out of profits. It’s a much more damaging effect.

But, without a clear measure of the cost of an incident, the value of steps 1-4 to the CFO are murky at best. It doesn’t need to stay this way: calculating the direct and indirect handling costs of an incident isn’t a terribly difficult exercise, and most organizations already have the data needed to put it together. At JMU I started down this path with Dr. Mike Riordan in his Managerial Accounting class, drawing heavily on Gary Cokins’ paper Identifying and Measuring the Cost of Error and Waste to frame the problem. We need a credible model backed by lots of data, and I’m really hoping Verizon is able to put it together.

As for the next 200+ cases, I can’t wait to see how they present the 2009 findings. To characterize the survey as “pathology” might be a bit strong, but I thought it was interesting to note Dave’s background as a former homicide investigator. During the live session, you get some answers to the “so then what happened?” questions that the report doesn’t touch.

On our end it may feel like a never ending battle, so it’s good to talk to someone with a broad view of what is going on internationally. It’s more than a little comforting to learn how much progress is being made in locating and taking legal action against the bad guys…

Tuesday, February 10, 2009

Change as a catalyst for security

IT Budgets are expected to be flat for just about everybody in 2009; IT security spending will likely be the same. After years of relatively strong management support this may seem like a setback, but I’m convinced that the proverbial glass is still at least half full.

Even if new security technology rollouts are being delayed, that doesn’t mean the entire organization is standing still. Management faces pressure on revenues and costs, and they’re going to be very active pursuing any and all strategies to make improvements in both of those categories. These pressures are going to drive change, and change can become a powerful catalyst if you can influence the organization to address security issues opportunistically.

There are two keys to an opportunistic security strategy: first, a thorough understanding of the gaps in administrative, technical and physical controls across the enterprise. And second, an equally sound understanding of how to produce better security as a side effect of operational improvements.

As an example, the Visible Ops Handbook describes high performance organizations which have gained control over their change management processes, boosting efficiency. More importantly, “by putting in controls to find variance, they have implemented preventative and detective procedures to manage risk.” Security is a side effect; an externality of operational improvements.

The output of security control gap assessments effectively becomes a shopping list for an opportunistic security manager. Once you start looking at security as a positive side effect, there are at least four main opportunistic strategies available:
1. Attrition: retire systems with known gaps. Network gear with password length / strength limitations? Applications on end-of-life operating systems? Security won’t drive these retirement decisions – but it makes a good tiebreaker.
2. Relocation: consolidate critical systems from environments with low control coverage in areas with better protection capabilities.
3. Extension: broaden the asset base covered by compliant platforms, reducing configuration diversity and streamlining support costs.
4. Outsourcing: when transitioning to an outside provider, fully document procedural controls that had been implemented informally or inconsistently.

Visible Ops describes the mechanics of strategies 3 and 4, but in a different context. They’re two instances of a common theme: quality and control make a strong foundation for both security and cost efficiency. Some organizations will be better positioned to take an opportunistic approach in 2009. A lot depends on the manager, but there are other factors that will also play a significant role:
1. Metrics maturity: does the organization have an objective view of control coverage and control strength?
2. Communications: Accountable system owners and project sponsors need to be aware of the current state of protection, and the expected effects (benefits) of proposed changes.
3. Line of sight to business objectives: how does coverage and exposure impact profit and loss?
4. A significant volume of organizational change.
5. Operational flexibility and creativity to modify projects, ensuring that opportunities to improve security are incorporated.
6. Continuous improvement: once a change has been made, capture and replicate it. And just as important: make sure that subsequent changes in these environments do not reopen old vulnerabilities.

“Progress, of the best kind, is comparatively slow. Great results cannot be achieved at once; and we must be satisfied to advance in life as we walk, step by step.”
--Samuel Smiles [Scottish author, 1812-1904]

Thursday, February 05, 2009

Assessing Enterprise Risk with forensic tools

There’s no need for FUD (fear, uncertainty and doubt) or guesswork when making the case to management for improving the protection of sensitive information. A serious incident or close call is often the most effective form of persuasion, but it’s not the most desirable. Ironically, forensic investigation tools can be just as useful in preventing incidents as they are in responding to them. But the key is how they’re used. To make the case for change, build on a foundation of reasonably sized data samples, transparent criteria for characterizing results, and focus on the decisions these data are intended to support.

For example: in the 2008 Global State of Information Security Survey, authored by CSO Magazine, CIO Magazine and PriceWaterhouseCoopers, 54% of executives surveyed admitted that they did not have “an accurate inventory of where personal data for employees and customers is collected, transmitted or stored.”

Organizations that don’t normally handle personal data in the course of business might not put the risk of sensitive information loss high on their priority list. Businesses that routinely process high volumes of sensitive information may reach the same conclusion if they feel confident that all systems are consistently protected with highly restricted access. But in either case, without knowing how many copies of these records have been created and shared across end user systems--over the course of several years—a blind decision to either accept or mitigate this risk is likely to be off the mark.

Enter the forensic investigator, often overworked, with relatively little down time to spare. Armed with forensic tools and a basic understanding of what and how much to measure, they can provide a compelling case for decision makers without the expense of a huge data gathering exercise.

With sample results from 30 systems chosen at random, using predefined search strings that are applied the same way to each search, you can get a good feel for the scale of the problem with a reasonable margin of error, where reasonable is defined as: “precise enough to support a decision, while maintaining confidence in your conclusions and credibility with your audience.”

Consider a company of 40,000 employees, with no prior formal assessment of how much sensitive information is on its end user systems. Even a basic estimate would be a huge improvement in understanding the problem. Using output from this online calculator, the table below shows the confidence interval for sample proportions that range from 0 to 6 out of 30, and an estimate of the fraction of the 40,000 that these results most likely represent:
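A sketch of what such a proportion calculator computes is below, using the Wilson score interval; that choice of method is an assumption, since the original calculator’s algorithm isn’t stated, so its table’s bounds may differ slightly:

```python
import math

# A 95% Wilson score confidence interval for a sample proportion.
# Using the Wilson method is an assumption; the online calculator
# referenced above may compute its bounds differently.

def wilson_interval(hits, n, z=1.96):
    """Return (lower, upper) bounds on the true proportion."""
    p = hits / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return center - half, center + half

low, high = wilson_interval(5, 30)
print(f"5 of 30 -> 95% CI {low:.1%} to {high:.1%}")
print(f"Across 40,000 systems: {low*40000:,.0f} to {high*40000:,.0f}")
```

By this method the upper bound for 5 of 30 is roughly a third of systems; applied to 40,000 endpoints that is on the order of 13,000 affected machines, in the same ballpark as the figures discussed below.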

So if it turns out that 5 of the 30 systems from across the company contained sensitive information, you could reasonably conclude that up to 12,000 systems are affected. Is this too much risk? Depending on the threats and current protection capabilities, it could be. It may justify putting more education and enforcement behind a records retention policy, strengthening access controls and account reviews, or implementing a data loss prevention (DLP) solution.

One word of caution: while the initial sample showing 5 out of 30 may make the case for an awareness campaign, a second random test several months later with another small sample may not definitively show that things are improving. If the second sample shows 6 out of 30 (20%) still contain sensitive information, this sample proportion is within the margin of error of the first assessment (9% to 31%). That is, with a population of 40,000 end users, you’re about as likely to get 6 out of 30 as you are to get 5 out of 30 in a random draw. However, if you get zero out of 30 – then you’re much more likely to have achieved a (statistically) significant improvement.

How much more likely? To test against a threshold, use this calculator:
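In the same spirit, the comparison can be sketched directly with the binomial distribution, assuming, purely for illustration, that the true rate of affected systems really is 5 in 30:

```python
from math import comb

# A sketch of the threshold test: how likely is each follow-up sample
# outcome if the true rate of affected systems really is 5 in 30?
# (That assumed true rate is an illustration, not a measured value.)

def binom_pmf(k, n, p):
    """Probability of exactly k hits in n independent draws."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

p = 5 / 30
p5 = binom_pmf(5, 30, p)   # probability a fresh sample shows 5 of 30
p6 = binom_pmf(6, 30, p)   # probability it shows 6 of 30
p0 = binom_pmf(0, 30, p)   # probability it shows 0 of 30

print(f"P(5/30)={p5:.3f}  P(6/30)={p6:.3f}  P(0/30)={p0:.4f}")
```

Under that assumption, drawing 6 of 30 is nearly as probable as drawing 5 of 30, while drawing 0 of 30 would happen well under one percent of the time, which is why the zero result is strong evidence of real improvement.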