Friday, February 27, 2009

Security, Functionality, and Profitability

As a security manager, are you frequently at odds with your business leadership regarding risk decisions? If the answer is yes, then good … the process is working.

So long as it is surfaced and resolved, conflict can lead to better decisions, but only if the process considers in detail how adjustments to the mix of security and functionality within IT systems affect the long-run profitability of the organization. To quote Alfred P. Sloan: “If we are all in agreement on the decision - then I propose we postpone further discussion of this matter until our next meeting to give ourselves time to develop disagreement and perhaps gain some understanding of what the decision is all about.”

To be useful, IT systems need to operationalize business processes at a cost that allows the organization a reasonable return on investment. At the same time, these systems and the data they contain must be protected from unauthorized disclosure, modification, or loss.


Security professionals are hired for their specialized knowledge in deploying and managing systems that provide defense in depth: multiple layers of independent security controls that reduce the exposure of these systems to security incidents, and reduce the impact of these incidents when they do occur. Likewise, business leaders bring a similar level of specialization to key business processes, but with a focus on maximizing functionality and performance: reduced overhead, increased throughput, and so on.

So if both expertise and incentives pull in opposite directions, what is the solution? Split the difference? Each time a firewall rule change, or configuration exception, or other deviation from best practices is under review, flip a coin? Well, not exactly. Compromise is important, but not to the exclusion of understanding the forces at work in the situation.

There are a lot of ways to represent this, but in the interest of promoting “Green IT” I’m recycling a few things from my microeconomics classes. The graph below shows the financial impact of securing an IT system. The vertical axis represents profitability; higher is better. The horizontal axis is a continuum: the left side represents a high degree of functionality, but lower security. Moving to the right involves adding layers of security controls, which in turn reduces the functionality and efficiency of the system from the perspective of the end user. The semi-circle on the graph is a benefit curve, which shows what happens to profitability as more controls are implemented. Moving from left to right, increasing protection up to point “A” makes the company more secure and more profitable. Functionality begins to decrease, but the value of protection over the long run pays for itself...up to a point. Eventually, adding “more security” begins to frustrate end users and slow business processes. And at point “B” the company is more secure, but worse off.
Ideally, leadership will recognize the trade-off that maximizes profitability, work to reach point “A,” and when they get there, stop. If the company finds itself at point “B,” exception requests that greatly ease the business process without significantly eroding the quality of protection should be approved until point “A” is reached.
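To make the shape concrete, here is a minimal sketch in Python, assuming a purely hypothetical quadratic profit function; none of the numbers come from real data, and any real benefit curve would have to be estimated from your own cost and incident figures.

```python
# A minimal sketch of the benefit curve, using an assumed quadratic
# profit function: early controls add value by preventing incidents,
# but past the peak, lost functionality outweighs the protection gained.
import numpy as np

def profit(security_level):
    """Hypothetical profit as a function of control level in [0, 1]."""
    return 100 + 60 * security_level - 80 * security_level ** 2

controls = np.linspace(0, 1, 1001)       # candidate control levels
profits = profit(controls)

point_a = controls[np.argmax(profits)]   # point "A": the peak of the curve
print(f'Point "A" at control level {point_a:.2f}, profit {profits.max():.1f}')
```

The particular coefficients don’t matter; what matters is that the curve is concave, so there is a single interior peak to aim for.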

If it's this obvious, then why does the process break down? Typically, security managers have more experience finding risks than business opportunities, and are rewarded for decreasing the former, rather than increasing the latter. Perhaps it's written into the annual goals this way:
  • Manage information security threats (30%)
  • Define security architecture, direct daily operations of staff (50%)
  • Support financial targets of company (20%)
In this scenario, security incentives outweigh profitability incentives by a 4:1 ratio (80% versus 20%).

So the second illustration below shows how a security manager might evaluate different levels of functionality and protection. The curve “U” that runs from the top left to the bottom right of the graph represents the trade-offs between security and profitability that a manager is willing to make. At any point on the curve, the security manager is indifferent: every mix of security and profitability along it is equally satisfying. The point at which the indifference curve “U” touches the profitability curve is the point a security leader sees as optimal.

The shape of this curve implies that the organization must be exceptionally profitable before the manager will accept low levels of security. Moving to the right, a security manager might be willing to continue locking systems down even when there is a measurable profit impact.
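The tangency can also be made concrete with a few lines of Python. The utility function below is invented for illustration; it simply encodes a manager who values security at least as much as profit, and it reuses the hypothetical profit curve from the earlier sketch.

```python
# A numeric sketch of the tangency. Both functions are assumptions:
# the same illustrative benefit curve as before, plus a Cobb-Douglas
# style utility that weights security and profit equally.
import numpy as np

def profit(s):
    return 100 + 60 * s - 80 * s ** 2    # hypothetical benefit curve

def utility(s, p):
    return (s ** 0.5) * (p ** 0.5)       # assumed manager preferences

controls = np.linspace(0.01, 1, 1000)
profits = profit(controls)

point_a = controls[np.argmax(profits)]                      # company optimum
point_b = controls[np.argmax(utility(controls, profits))]   # manager optimum

print(f'Point "A" (max profit):  s = {point_a:.2f}')
print(f'Point "B" (max utility): s = {point_b:.2f}')  # lands well right of "A"
```

Even with an even weighting, the manager’s optimum lands far to the right of point “A”; weighting security more heavily pushes it further still.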

And finally, one last graph below. Consider a business manager who understandably wants to maximize functionality, specifying requirements for a new customer-facing application. Business requirements put the trade-off at “X,” while the security chief pushes for “B.” Point “A” again represents the maximum benefit to the company. Sometimes “X” is closer to “A”; other times “B” is. So how do you determine where you actually are, and then make the improvements needed to get closer to “A”?
Risk Governance
IT governance processes, if properly designed and well managed, can be a huge help in bridging the natural divide between specialized experts with widely differing preferences. While it’s important over the long run to teach security professionals the fundamentals of the business, and equally important to have business leaders recognize the impact of security vulnerabilities, the reality is that rational decision makers will be influenced most strongly by the incentives that directly apply. Or as they say in the political realm: “where you stand depends on where you sit.” But back to Sloan: what really matters is the shape of the curve, and how well the governance group understands it. Where is the “A” investment, and given the available architecture and implementation choices, how close to “A” are the various alternatives?

The governance process should seek to draw out all of the pieces of the proposed solution: what are the key components of the business process? Which elements are the most important contributors to the business value produced? What are the constraints? Likewise with security: what configuration requirements, administrative overhead, monitoring capabilities or other concerns are involved?

Without a sense of the size and shape of the benefit curve, and the location of various options on it, decisions will be based on the relative political strength of the participants. It’s possible to do better than that. While it is always going to be difficult to tell if you’ve actually reached “A,” it can be very apparent that you’re doing better than “X” or “B.” And if that decision comes at the cost of some challenging discussions, it’s a debate worth having.

Friday, February 20, 2009

The next 12 months

Yesterday at the Chicago ISACA meeting I had the opportunity to hear Dave Ostertag from Verizon walk through the 2008 Verizon Data Breach Investigations Report, point by point. At the time of publication, the report included over 100 data points from 500 cases, but the base is now up to 700 cases, and more interesting patterns in the data continue to emerge.

The report is 27 pages long, but it informs an information security strategy by persuasively answering one simple question: “What changes can I make in the next 12 months that will significantly reduce the likelihood and impact of a security incident in my organization?”

Across all the activities lumped under the banner of information security, Verizon found that a surprisingly small set of outcomes (or more accurately, the absence of these outcomes) mattered most. The survey lists nine recommendations, but I’ve re-worded and consolidated them a bit here:
1. Execute: ensure that security processes implement the identity management, patch management and configuration management basics. From the survey: “Eighty-three percent of breaches were caused by attacks not considered to be highly difficult. Eighty-five percent were opportunistic…criminals prefer to exploit weaknesses rather than strengths. In most situations, they will look for an easy opportunity and, finding none, will move on.” In contrast, among poor-performers, “…the organization had security policies … but these were not enacted through actual processes…victims knew what they needed to do … but did not follow through.”
2. Inventory, segment and protect sensitive information: “Sixty-six percent of breaches involved data that the victim did not know was on the system.” Know where critical data is captured and processed, and where it flows. Secure partner connections, and consider creating “transaction zones” at the network level to separate baseline business activities from high sensitivity environments.
3. Increase awareness. “Twelve percent of data breaches were discovered by employees of the victim organization. This may not seem like much, but it is significantly more than any other means of internal discovery observed during investigations.”
4. Strengthen incident handling capabilities. Monitor event logs, create an incident response plan, and engage in mock incident testing.

Steps 1 and 2 reduce the likelihood of an incident; steps 3 and 4 primarily reduce the potential impact by decreasing the time lag between an intrusion and its eventual identification and containment.

As for step 4, my first thought is that most incident response teams won’t need mock testing much, because of the natural cycle of event monitoring, suspected incident reporting, and initial response to events that often turn out to be false positives. Organizations that promote active reporting of suspicious events, and that treat each one as an actual incident, will get much of the practice in a live setting that mock drills would otherwise offer. Instead of trying to prevent false positives from occurring, an IR team should work to become more efficient at quickly ruling them out. As they do, the threshold for activating an initial review will drop, and ultimately they’ll catch more events closer to the time of occurrence.

It’s still a good idea to ensure that all stages from identification through remediation and recovery are fully practiced, but in general achieving containment quickly reduces the number of records exposed, and thus the eventual full cost of the breach.

Which brings us to next steps for Verizon: it seems that they’re now working on an incident costing model. This will be huge, because without it, organizations will continue to struggle with how to set specific protection goals that align with their cost structure and business strategy.

As an example, the survey looked at four sectors. Retail was one that contributed a sizeable amount of data (which is a polite way to say they got hacked a lot). No surprise that simple survival is usually a bigger concern than security for many retailers: net profit margin among publicly traded companies in this sector often ranges between two and six percent. At those margins, an additional dollar spent on physical security needs to be matched by roughly $17 to $50 in additional sales (one dollar divided by the margin) just to break even. Considering the wholesale cost of merchandise, it’s understandable why management accepts the risk of physical theft, formally accounting for it as “shrinkage.”
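The arithmetic is simple enough to sanity-check in a few lines; the margins below are the ones quoted above, not figures from the report.

```python
# Back-of-the-envelope break-even check: every dollar of added cost
# must be recovered from the net margin on incremental sales.
for margin in (0.02, 0.04, 0.06):        # illustrative retail net margins
    required_sales = 1.00 / margin       # sales needed to recover $1 of cost
    print(f"At a {margin:.0%} margin, $1 of cost needs ${required_sales:.0f} in new sales")
```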

Unfortunately, while this mindset towards risk carries over into the electronic space, the analogy doesn’t. A dollar lost to computer crime, either through the cost of the incident itself, or the cost of organizational response, comes straight out of profits. It’s a much more damaging effect.

But without a clear measure of the cost of an incident, the value of steps 1-4 to the CFO is murky at best. It doesn’t need to stay this way: calculating the direct and indirect handling costs of an incident isn’t a terribly difficult exercise, and most organizations already have the data needed to put it together. At JMU I started down this path with Dr. Mike Riordan in his Managerial Accounting class, drawing heavily on Gary Cokins’ paper Identifying and Measuring the Cost of Error and Waste to frame the problem. We need a credible model backed by lots of data, and I’m really hoping Verizon is able to put it together.
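As a gesture toward what such a model might look like, here is a deliberately oversimplified sketch; every category and rate in it is a placeholder of my own invention, not a figure from Verizon, Cokins, or anyone else.

```python
# A minimal sketch of an incident costing model. All categories and
# rates are placeholders; the point is that the arithmetic is easy
# once an organization plugs in its own numbers.

def incident_cost(records_exposed, response_hours, hourly_rate,
                  cost_per_record, lost_business):
    direct = response_hours * hourly_rate          # investigation and response labor
    direct += records_exposed * cost_per_record    # notification, credit monitoring
    indirect = lost_business                       # churn, lost sales, distraction
    return direct + indirect

# Hypothetical example: 10,000 records exposed, 400 response hours at
# $90/hour, $5 per record to notify, $50,000 in estimated lost business.
total = incident_cost(10000, 400, 90, 5.00, 50000)
print(f"Estimated incident cost: ${total:,.0f}")   # $136,000
```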

As for the next 200+ cases, I can’t wait to see how they present the 2009 findings. To characterize the survey as “pathology” might be a bit strong, but I thought it was interesting to note Dave’s background as a former homicide investigator. During the live session, you get some answers to the “so then what happened?” questions that the report doesn’t touch.

On our end it may feel like a never ending battle, so it’s good to talk to someone with a broad view of what is going on internationally. It’s more than a little comforting to learn how much progress is being made in locating and taking legal action against the bad guys…

Tuesday, February 10, 2009

Change as a catalyst for security

IT budgets are expected to be flat for just about everybody in 2009, and IT security spending will likely be the same. After years of relatively strong management support this may seem like a setback, but I’m convinced that the proverbial glass is still at least half full.

Even if new security technology rollouts are being delayed, that doesn’t mean the entire organization is standing still. Management faces pressure on revenues and costs, and they’re going to be very active pursuing any and all strategies to make improvements in both of those categories. These pressures are going to drive change, and change can become a powerful catalyst if you can influence the organization to address security issues opportunistically.

There are two keys to an opportunistic security strategy: first, a thorough understanding of the gaps in administrative, technical and physical controls across the enterprise. And second, an equally sound understanding of how to produce better security as a side effect of operational improvements.

As an example, the Visible Ops Handbook describes high-performance organizations that have gained control over their change management processes, boosting efficiency. More importantly, “by putting in controls to find variance, they have implemented preventative and detective procedures to manage risk.” Security is a side effect: an externality of operational improvements.

The output of security control gap assessments effectively becomes a shopping list for an opportunistic security manager. Once you start looking at security as a positive side effect, there are at least four main opportunistic strategies available:
1. Attrition: retire systems with known gaps. Network gear with password length / strength limitations? Applications on end-of-life operating systems? Security won’t drive these retirement decisions – but it makes a good tiebreaker.
2. Relocation: consolidate critical systems from environments with low control coverage into areas with better protection capabilities.
3. Extension: use compliant platforms as an overlay to broaden the asset base they cover, reducing configuration diversity and streamlining support costs.
4. Outsourcing: when transitioning, fully document procedural controls that were implemented informally, but not consistently.

Visible Ops describes the mechanics of strategies 3 and 4, but in a different context. They’re two instances of a common theme: quality and control make a strong foundation for both security and cost efficiency. Some organizations will be better positioned to take an opportunistic approach in 2009. A lot depends on the manager, but there are other factors that will also play a significant role:
1. Metrics maturity: does the organization have an objective view of control coverage and control strength?
2. Communications: Accountable system owners and project sponsors need to be aware of the current state of protection, and the expected effects (benefits) of proposed changes.
3. Line of sight to business objectives: how does coverage and exposure impact profit and loss?
4. A significant volume of organizational change.
5. Operational flexibility and creativity to modify projects, ensuring that opportunities to improve security are incorporated.
6. Continuous improvement: once a change has been made, capture and replicate it. And just as important: make sure that subsequent changes in these environments do not reopen old vulnerabilities.

“Progress, of the best kind, is comparatively slow. Great results cannot be achieved at once; and we must be satisfied to advance in life as we walk, step by step.”
--Samuel Smiles [Scottish author, 1812-1904]

Thursday, February 05, 2009

Assessing Enterprise Risk with forensic tools

There’s no need for FUD (fear, uncertainty and doubt) or guesswork when making the case to management for improving the protection of sensitive information. A serious incident or close call is often the most effective form of persuasion, but it’s not the most desirable. Ironically, forensic investigation tools can be just as useful in preventing incidents as they are in responding to them. The key is how they’re used. To make the case for change, build on a foundation of reasonably sized data samples and transparent criteria for characterizing results, and focus on the decisions these data are intended to support.

For example: in the 2008 Global State of Information Security Survey, authored by CSO Magazine, CIO Magazine and PricewaterhouseCoopers, 54% of executives surveyed admitted that they did not have “an accurate inventory of where personal data for employees and customers is collected, transmitted or stored.”

Organizations that don’t normally handle personal data in the course of business might not put the risk of sensitive information loss high on their priority list. Businesses that routinely process high volumes of sensitive information may reach the same conclusion if they feel confident that all systems are consistently protected with highly restricted access. But in either case, without knowing how many copies of these records have been created and shared across end user systems over the course of several years, a blind decision to either accept or mitigate this risk is likely to be off the mark.

Enter the forensic investigator, often overworked, with relatively little down time to spare. Armed with forensic tools and a basic understanding of what and how much to measure, they can provide a compelling case for decision makers without the expense of a huge data gathering exercise.

With sample results from 30 systems chosen at random, using predefined search strings applied the same way to each system, you can get a good feel for the scale of the problem with a reasonable margin of error, where reasonable is defined as: “precise enough to support a decision, while maintaining confidence in your conclusions and credibility with your audience.”

Consider a company of 40,000 employees, with no prior formal assessment of how much sensitive information is on its end user systems. Even a basic estimate would be a huge improvement in understanding the problem. Using output from this online calculator, the table below shows the confidence interval for sample proportions that range from 0 to 6 out of 30, and an estimate of the fraction of the 40,000 systems that these results most likely represent:

[Table: confidence intervals for sample results of 0 through 6 positives out of 30, with the corresponding estimated range of affected systems out of 40,000]
So if it turns out that 5 of the 30 systems from across the company contained sensitive information, you could reasonably conclude that up to 12,000 systems are affected. Is this too much risk? Depending on the threats and current protection capabilities, it could be. It may justify putting more education and enforcement behind a records retention policy, strengthening access controls and account reviews, or implementing a data loss prevention (DLP) solution.
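If you’d rather compute the intervals than use the calculator, a Wilson score interval is one common choice; I’m assuming it here since the calculator’s exact method isn’t specified, and other methods (exact binomial, adjusted Wald) give similar ranges at this sample size.

```python
# Wilson score confidence intervals for each possible sample result,
# projected onto a population of 40,000 end user systems.
import math

def wilson_interval(x, n, z=1.96):
    """Approximate 95% confidence interval for a sample proportion x/n."""
    p = x / n
    denom = 1 + z ** 2 / n
    center = (p + z ** 2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z ** 2 / (4 * n ** 2)) / denom
    return max(center - half, 0.0), center + half

population = 40000
for x in range(7):                      # sample results of 0 to 6 out of 30
    lo, hi = wilson_interval(x, 30)
    print(f"{x}/30: {lo:.1%} to {hi:.1%}"
          f" (~{lo * population:,.0f} to {hi * population:,.0f} systems)")
```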

One word of caution: while the initial sample showing 5 out of 30 may make the case for an awareness campaign, a second random test several months later with another small sample may not definitively show that things are improving. If the second sample shows 6 out of 30 (20%) still contain sensitive information, this sample proportion is within the margin of error of the first assessment (9% to 31%). That is, with a population of 40,000 end users, you’re about as likely to get 6 out of 30 as you are to get 5 out of 30 in a random draw. However, if you get zero out of 30, you’re much more likely to have achieved a (statistically) significant improvement.

How much more likely? To test against a threshold, use this calculator: http://www.measuringusability.com/onep.php.
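If you’d rather script that test too, scipy’s one-proportion binomial test does the same job; the sketch below asks how surprising a follow-up result of 0 out of 30 would be if the true rate were still the roughly 17% (5 of 30) seen in the first sample.

```python
# One-sided binomial test: could a follow-up sample of 0 out of 30
# plausibly come from a population where ~17% of systems still hold
# sensitive data? A small p-value suggests a real improvement.
from scipy.stats import binomtest

result = binomtest(k=0, n=30, p=5 / 30, alternative="less")
print(f"p-value: {result.pvalue:.4f}")   # ~0.004, so unlikely by chance alone
```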