Tuesday, January 27, 2009

Making the most of forensic downtime

As a computer forensic investigator, you may find that the caseload becomes a bit overwhelming at times. Sometimes the requests come pouring in; other times the queue is empty. Looking back through several months of work, you can put together a reasonable estimate of the average arrival rate and average completion rate for forensic investigation requests. Armed with these two pieces of data and some equations from queuing theory, you’ll be able to estimate the amount of cumulative non-investigation time that will likely be available for other tasks over the course of a year.

Naturally, much of that downtime should be spent “sharpening the saw”: maintaining tools and scripts, etc. But as discussed in the last post, it may also be helpful to leverage those forensic tools and skills to measure risks in the end user environment. Taken periodically, these measurements support the execution of a security strategy by providing the evidence needed to drive changes that will ultimately reduce the frequency and impact of incidents that do occur.

Queuing theory can become complex in a hurry, but there are a few formulas that are easy to use and very helpful if you make a few reasonable simplifying assumptions. For the long version, check out Contemporary Management Science by Anderson, Sweeney and Williams. Alternatively, an online excerpt of the relevant chapter is available here.

If you know the average arrival rate of requests, and have a good feel for how long it takes to complete the typical investigation, you can calculate the following:

1. Probability of an empty queue with no requests in process:
1 – (arrival rate / service rate)
2. Average number of pending requests:
((arrival rate) ^ 2) / (service rate * (service rate – arrival rate))
3. Average number of investigations in the system:
Average number of pending requests + (arrival rate / service rate)
4. Average time a request has to wait before the investigation starts:
Average number of pending requests / arrival rate
5. Average resolution time; from initial request to completed resolution:
Request wait time + (1 / service rate)
6. Probability that a new request has to wait for service:
Arrival rate / service rate
7. Probability that exactly N investigations and requests are in the system at a given point in time:
(arrival rate / service rate)^N * probability of an empty queue
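The seven formulas above translate almost line for line into code. Here is a minimal Python sketch; the function name `mm1_metrics`, the dictionary keys and the sample rates of 0.75 and 1.0 per week are my own illustrative choices, and the formulas only make sense when the arrival rate is below the service rate.

```python
def mm1_metrics(arrival_rate, service_rate, n=0):
    """Single-investigator queue estimates, one entry per formula in the list."""
    if arrival_rate <= 0 or arrival_rate >= service_rate:
        raise ValueError("need 0 < arrival rate < service rate")

    utilization = arrival_rate / service_rate
    p_empty = 1 - utilization                                   # formula 1
    pending = arrival_rate ** 2 / (
        service_rate * (service_rate - arrival_rate))           # formula 2
    in_system = pending + utilization                           # formula 3
    wait_time = pending / arrival_rate                          # formula 4
    resolution_time = wait_time + 1 / service_rate              # formula 5
    p_wait = utilization                                        # formula 6
    p_n_in_system = utilization ** n * p_empty                  # formula 7

    return {
        "p_empty": p_empty,
        "pending": pending,
        "in_system": in_system,
        "wait_time": wait_time,
        "resolution_time": resolution_time,
        "p_wait": p_wait,
        "p_n_in_system": p_n_in_system,
    }


# Illustrative rates only: 0.75 requests arriving and 1.0 completed per week.
metrics = mm1_metrics(0.75, 1.0, n=2)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

Times come out in whatever unit the rates use, so weekly rates give waits measured in weeks.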

These equations provide a good approximation if the following assumptions hold true:
1. For a given time period, you’re almost always going to get between zero and 2 requests (i.e. 95% likely) and only rarely do you get a bunch of requests (5% chance of 3 or more requests arriving at once).
2. Few service requests will take significantly longer than the average service time to complete.
3. You’ve got one investigator servicing requests – a “single service channel.”
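The first assumption reads like the Poisson arrival pattern that single-channel queuing models normally assume; that mapping is my interpretation rather than something spelled out here. Under it, you can check how often a period brings zero to two requests. The weekly rate of 0.75 below is just an illustrative value.

```python
import math


def poisson_pmf(k, rate):
    """Probability of exactly k arrivals in one period at the given average rate."""
    return math.exp(-rate) * rate ** k / math.factorial(k)


weekly_rate = 0.75  # illustrative average number of requests per week

p_zero_to_two = sum(poisson_pmf(k, weekly_rate) for k in range(3))
p_three_plus = 1 - p_zero_to_two

print(f"P(0 to 2 requests in a week): {p_zero_to_two:.3f}")
print(f"P(3 or more requests):        {p_three_plus:.3f}")
```

If your own request log shows bursts of three or more far more often than this calculation suggests, the simple formulas are probably a poor fit.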

So, as an example, suppose the average request arrival rate is about 3 cases per month, and an investigator can complete about 4 cases each month. Calculate expected downtime using this process:

First, convert the monthly numbers to weekly rates (treating a month as four weeks): 3 arrivals per month is a weekly arrival rate of 0.75, and 4 completions per month is a weekly service rate of 1.0. Then, plug and go:

The probability of zero requests in queue is:

1 – (.75 / 1) = 0.25

A 25% chance of having a week with no requests in progress.

So in this “best case” scenario, roughly 25% of an annual 2000 hours worked (500 hours) won’t be directly allocated to investigating live cases. In reality, this estimate is almost certainly too high for a couple of reasons. First, management isn’t likely to over-staff a forensic role; demand will rise to fill that capacity, and inevitably many long-running difficult cases will come up that fall outside the average completion rate by a big margin. Second, investigators will need to factor in time for script development, system administration, and other tasks.
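For anyone who wants to rerun the arithmetic with their own numbers, here is a small Python sketch of the example. The four-weeks-per-month conversion and the 2000-hour working year come from the example itself; the variable names are mine.

```python
WEEKS_PER_MONTH = 4      # conversion used in the example
HOURS_PER_YEAR = 2000    # annual hours worked, as in the example

arrivals_per_month = 3
completions_per_month = 4

arrival_rate = arrivals_per_month / WEEKS_PER_MONTH     # requests per week
service_rate = completions_per_month / WEEKS_PER_MONTH  # completions per week

# Probability of an empty queue with no requests in process.
p_idle = 1 - arrival_rate / service_rate
idle_hours = p_idle * HOURS_PER_YEAR

print(f"Weekly arrival rate: {arrival_rate}")
print(f"Weekly service rate: {service_rate}")
print(f"Idle probability:    {p_idle:.0%}")
print(f"Idle hours per year: {idle_hours:.0f}")
```

Because the idle probability depends only on the ratio of the two rates, the monthly figures would give the same 25% without any conversion.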

Assuming that a residual 200 hours (10%) of time remains available throughout the course of the year, this can provide the perfect opportunity to quantify policy compliance against specific goals.

So how much can you do with 200 hours? Turns out, quite a bit.

Saturday, January 17, 2009

Four things to do with computer forensic tools (besides forensics)

When staffing an internal computer forensics capability for an organization, management needs to determine how to balance capacity with demand. At the extremes, you either have a backlog of cases waiting on available investigators, or investigators waiting on requests for support.

Even under the best of circumstances, the investigative caseload won’t follow a regular schedule and some amount of downtime is inevitable. Forensic analysts will need to spend some of that time putting together hash sets, updating scripts, evaluating new tools and doing all of the other arcane tasks that go along with keeping pace with the changing needs of the function. But for an IT risk manager, if you can tap into it, unused forensic capacity is an asset that can be extremely helpful in other contexts as well. Here are just a few examples:

1. Identify the prevalence of sensitive information on end user systems. Because they’re fast, thorough, minimally disruptive and often support remote data capture, forensic tools can help determine the “hit rate” of confidential documents across a randomly selected cross-section of the end user environment.
2. Measure the compliance rate against system usage policies. A scan of Internet usage can show the proportion of systems accessing content that poses a risk to the organization and/or its users. Over time, the amount should decrease if the training and awareness efforts are having an effect.
3. Estimate the amount of data at risk that is not being backed up. Depending on the architecture, this may be a bit more difficult to determine. A comparison of data files created or edited locally that are outside of backup routines will give a good sense of the amount of work lost each time a hard drive crashes, or a laptop is stolen.
4. Identify the level of unauthorized configuration changes. How long is the screen saver timeout supposed to be? What applications or changes are not allowed on a standard system build? This is less of an issue in organizations where IT has locked down the desktop. But where this is a contested issue, actually quantifying the impact can show the best tradeoff between control and usability for a given department or organization.

It goes without saying that nobody likes to be investigated. If the purpose, scope, approach and usage of this information aren’t spelled out in advance (i.e. good-faith random anonymous survey, not a warrantless wiretap) and communicated with the proper level of support, it’ll be the last time you get to try using forensic capabilities to tune security policies and practices.

But let’s face it – all of the critical data in any business either originates or is viewed from an end user system, which is often the least-defended part of the environment. Attackers realize this, and end user systems will always be a popular target. Unless you know what your exposure is, you won’t have a good understanding of what your policies and protection capabilities should be.

Saturday, January 10, 2009

Getting privileged accounts under control: spend less time finding, more time fixing

Are there too many privileged accounts on the business critical systems in your organization? If you suspect so, how would you find out, and how would you energize the leadership in your organization to act? And once you get management endorsement, what number would you set as the maximum allowable number of accounts on a system as a benchmark for non-compliant system owners to shoot for? You'll want all owners to verify compliance, but would a positive response from 50% of those owners justify the call to action?

Perhaps most important of all, after driving this change and moving on to the next problem, will you have the time and resources needed to follow up later in the year and make sure that the problem hasn’t reappeared?

As with any security issue, a small amount of effort should go into finding the problem, and the majority into solving it. To paraphrase Tom Clancy from Into the Storm: “The art of command is to husband that strength for the right time and the right place. You want to conduct your attack [in this example, on the problem] in such a way that you do not spend all your energy before you reach the decisive point." (page 153)

Using a tool like dumpsec for Windows, it doesn’t take long to pull group memberships remotely from any given system. But if you’re dealing with hundreds or even thousands of systems, well, that’s a lot of energy to spend before reaching the decisive point, i.e. when system owners start removing excessive accounts.

Intuitively, it makes sense that you wouldn’t want to poll every system in a large environment. Instead, you’d take a sample. But how big of a sample is needed for you – and senior management – to be confident that you know the current state?

Turns out, you (and your boss) can be 90% confident of knowing the median number of privileged accounts on all systems across the server population if you start with a randomly selected sample of 18 systems. And because by definition the median is the middle value, you know that half of the systems are above the sampled value. If this value is too high based on the risk requirements of the environment, you can set a compliance goal such as “reduce the number of privileged accounts on each Windows system to X by the end of the year.”

To find the median, follow these steps:
1. Pick 18 systems at random across the system population. Dump the list of users with privileged access from each system.
2. Arrange them from fewest to most accounts.
3. Throw out the lowest six and the highest six values, and keep the middle six.

The median number of privileged accounts will be between the low value and the high value of the middle six numbers out of the sample of 18.

For example, if I dumped the local admins group across a set of systems, I might get a result like this. (Sorted from fewest to most, the “middle six” values are 17, 17, 23, 25, 28 and 29):

49, 23, 17, 33, 17, 16, 28, 14, 29, 40, 12, 44, 34, 12, 25, 9, 10, 32**

So based on this sample, the median number of privileged accounts across all systems is 90% likely to be between 17 and 29. Granted, due to the architecture certain accounts may be present across all systems. And other factors may help determine if 29 is too high … or 17. But once you decide, you have a baseline value that defines the boundary between acceptable risk and excessive access, which can be communicated across the organization.
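The sort-and-trim steps are easy to script. Here is a minimal Python sketch using the 18 sample values from the example; the helper name `middle_range` is my own.

```python
def middle_range(values, drop_each_end):
    """Sort the sample, discard the lowest and highest values, return what's left."""
    if 2 * drop_each_end >= len(values):
        raise ValueError("nothing left after trimming")
    ordered = sorted(values)
    return ordered[drop_each_end:len(ordered) - drop_each_end]


# Privileged account counts from the 18 sampled systems in the example.
sample = [49, 23, 17, 33, 17, 16, 28, 14, 29, 40, 12, 44, 34, 12, 25, 9, 10, 32]

kept = middle_range(sample, drop_each_end=6)
low, high = kept[0], kept[-1]

print("Middle six:", kept)
print(f"Median is estimated to lie between {low} and {high}")
```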

Once you’ve gotten buy-in and communicated the requirement, each system owner who wasn’t sampled can compare and confirm that they comply. And in keeping with Clancy’s principle above, only a fraction of your time was spent identifying the problem and communicating it: the rest goes into helping fix it.

But why does 18 work? Where does the 90% confidence come from, and why throw out the bottom six and top six values?

Doug Hubbard explains it in Chapter 3 of his book “How to Measure Anything.” And while this isn’t a specific example in the text, there are a lot of intriguing applications to information security that he does cover.

Hubbard introduces the idea of finding the median from a small sample as “the rule of five:”

“When you get answers from five people, stop…Take the highest and lowest values in the sample…There is a 93% chance that the median of the entire population … is between those two numbers.” Why? “The chance of randomly picking a value above the median is, by definition, 50% -- the same as a coin flip resulting in ‘heads.’ The chance of randomly selecting five values that happen to be all above the median is like flipping a coin and getting heads five times in a row.” (pp. 28-29)

In other words: 0.5 x 0.5 x 0.5 x 0.5 x 0.5 = 0.03125. With a random sample of five, there’s only a 3.125% chance of being above the median all five times, and the same 3.125% chance of being below the median all five times. So each time you take five random samples, you’re going to get values on both sides of the median about 93% of the time (1 - 2 x 0.03125 = 0.9375) -- the median will very frequently be between your lowest and highest value.

So if five samples gives you 93% confidence, why take 18 samples? From the example above, if you picked the first five at random and stopped, you would have found this:

49, 23, 17, 33, 17

With 93% confidence, you’d be able to assert that the median number of privileged accounts per system is between 17 and 49. With small samples randomly chosen, high confidence comes at the expense of intervals that are often quite wide. And in this case, it may be too wide to be useful. But picking more samples and tossing out six of the lows and six of the highs retains roughly the same level of confidence in the middle six, with the advantage of a much smaller range between the low and high values. And it’s the smaller range that allows you to understand the state of the environment, and set a credible level of improvement that the organization can meet.
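The coin-flip argument extends to any sample size and any number of discarded values. Here is a short Python sketch of the exact binomial calculation, assuming a truly random sample from a population with no ties at the median; the function name `median_coverage` is mine.

```python
from math import comb


def median_coverage(sample_size, drop_each_end):
    """Chance that the population median lies between the lowest and highest
    values kept after discarding `drop_each_end` values from each end."""
    # The interval misses on one side when at most `drop_each_end` of the
    # sampled values fall on that side of the median (a coin-flip count).
    one_tail = sum(comb(sample_size, k) for k in range(drop_each_end + 1))
    return 1 - 2 * one_tail / 2 ** sample_size


for n, drop in [(5, 0), (18, 5), (18, 6)]:
    print(f"n={n:2d}, drop {drop} from each end: {median_coverage(n, drop):.1%}")
```

By this calculation, the rule of five comes out at 93.75%. For a sample of 18, dropping five from each end (which would give 16 to 32 in the example) lands at about 90%, while keeping only the middle six comes out closer to 76%. It is worth rechecking the cut-off against the source before quoting a confidence level.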

More info I found useful:
How to Measure Anything http://www.howtomeasureanything.com/ Lots of gems on the site; check out the PowerPoint on measuring unobserved intrusions in information systems.

Confidence intervals for a median, with different size samples: http://www.math.unb.ca/~knight/utility/MedInt95.htm

**These numbers were generated by Excel; try it out for yourself. For this example I used the formula =5+(40*RAND()) to give a higher starting value than just "1."
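If you'd rather not open Excel, a rough Python equivalent of that formula looks like this. Truncating to whole numbers is my assumption, since account counts are integers; the function name and the optional seed are also mine.

```python
import random


def simulated_counts(n=18, low=5, spread=40, seed=None):
    """Mimic =5+(40*RAND()) n times, truncated to whole account counts."""
    rng = random.Random(seed)
    return [int(low + spread * rng.random()) for _ in range(n)]


counts = simulated_counts(seed=2009)  # fixed seed so a rerun gives the same list
print(counts)
```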

Sunday, January 04, 2009

Security career snapshot - January 2, 2009

Now that the holiday break has ended and everyone is heading back to work, it seems like a good time for information security professionals at every level to take stock of available opportunities and chart a course for the new year.

Is it safer to stay put, or move?

While there's an abundance of forecasts available that predict where 2009 is headed, most are discouraging, few will turn out to be correct, and there doesn’t seem to be a method for sorting between the good and bad estimates that’s any more trustworthy than the estimates themselves.

Instead, I'd argue that it makes more sense to take a second look at the current role, the financial health of the organization, external opportunities, and the stability of the regional and national economy ... and plan according to current actualities.

To cut through that uncertainty, I spent some time over the break going through online job postings to compile a snapshot of security jobs that are currently open and available. I looked at job titles, years of experience required, expected regulatory / compliance background, certifications, and the most active hiring locations. This snapshot won’t show hiring trends for 2009, but my hope is that it’ll at least make a decent starting point for figuring out where the holes in the resume are, and which types of work assignments today may open doors for the next role.

I started with a query of security jobs using an aggregator site, and randomly selected a subset of 200 for analysis. I downloaded each full post directly from the offering website and parsed them locally using some scripts. Below are some of the high points. The margin of error on the survey should be plus or minus 7%. If you want a detailed look at the approach, or the data itself, just drop me a line.
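The plus or minus 7% figure is consistent with the usual margin-of-error formula for a proportion. The sketch below assumes a 95% confidence level (z = 1.96), a simple random sample, and the worst-case proportion of 50%; none of these is stated above, so treat them as my assumptions.

```python
import math


def margin_of_error(sample_size, proportion=0.5, z=1.96):
    """Margin of error for a proportion estimated from a simple random sample."""
    return z * math.sqrt(proportion * (1 - proportion) / sample_size)


moe = margin_of_error(200)
print(f"Margin of error for 200 postings: +/- {moe:.1%}")
```

Percentages computed on a subset of the postings (for example, only those that mention a certification) rest on a smaller sample and carry a wider margin.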

Here’s what I found:

Most common job titles
A bit less than half of all security job openings are for the role of engineer, analyst, or administrator. Manager jobs appear less than 5% of the time, and director level only 1%.

Without more information it's tough to be definitive, but the numbers could imply a couple of things: first, that security organizations may be flattening right now as managers hire more staff; and second, that “individual contributor” roles may have more mobility across organizations than leadership positions. It’s also possible that management roles are filled through other means (internal candidates, etc.) more frequently than staff positions are.

Position title Number of postings Percent

Years of experience expected for each role
Across all positions, five years was the median level of experience required. Only 30% of positions expected two or fewer years of prior relevant work history. One interesting fact was that out of 41 postings with a specific requirement, that requirement was described 21 different ways (e.g. 1 to 4 years, 2 or more, 4-6 years, etc.). It seems the industry has generally standardized on which certifications and skills are expected, but not on the level of experience associated with those skills that represents an appropriate minimum requirement.

Years of experience required    Number of job postings
0 to 1                          3
2 or more                       10
3 or more                       3
4 or more                       2
5 or more                       12
6 or more                       3
7 or more                       2
8 or more                       1
9 or more                       1
10 or more                      5
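The median and the share of entry-level postings can be re-derived from the table's counts. In this quick Python sketch, each row is represented by its minimum number of years, with the “0 to 1” row entered as 0; that encoding is my simplification.

```python
from statistics import median

# Minimum years of experience -> number of postings, as listed in the table.
postings_by_min_years = {0: 3, 2: 10, 3: 3, 4: 2, 5: 12, 6: 3, 7: 2, 8: 1, 9: 1, 10: 5}

# Expand the counts into one entry per posting.
years = [y for y, count in postings_by_min_years.items() for _ in range(count)]

median_years = median(years)
share_two_or_fewer = sum(1 for y in years if y <= 2) / len(years)

print(f"Median experience required: {median_years} years")
print(f"Share asking for two or fewer years: {share_two_or_fewer:.0%}")
```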

Most common regulatory / compliance keywords
Not every posting specifically cited regulatory requirements or security framework experience. But for those that did, the following are the most commonly listed:

Regulatory or governance requirement                                          Number of postings
Federal Information Security Management Act (FISMA)                           14
Code of practice for information security management (ISO 17799/27001/27002)  12
Sarbanes-Oxley (SOX 404)                                                      12
Payment Card Industry Data Security Standard (PCI DSS)                        12
Health Insurance Portability and Accountability Act (HIPAA)                   7
Gramm-Leach-Bliley Act (GLBA)                                                 3

Most common certifications
As of early 2009, candidates with a security certification have an edge over non-certified candidates, but certification is not usually a make-or-break requirement. Fewer than half (47%) of all security job postings examined listed certification as a requirement; around 20% described certification as “required” or “highly desirable.”

CISSP is the most commonly listed credential, although it often is provided as one of several examples e.g. “Professional security certification such as CISSP, CISM, GIAC, CCNA, CCSP, CCNP, MCSE, Security+, Network+.”

Security Certification (n=94)                                 Number of postings    Percent
Certified Information Systems Security Professional (CISSP)   48                    52.7%
Other (Cisco, etc.)                                           12                    13.2%
Certified Information Security Manager (CISM)                 11                    12.1%
Certified Information Systems Auditor (CISA)                  10                    11.0%
SANS Global Information Assurance Certification (GIAC)        10                    11.0%

Most active hiring locations
Finally, the top ten states (and Washington D.C.) listed by frequency of job posting:

State              Number of postings (n=200)
Washington D.C.    17
New York           8
New Jersey         6

So if you're a Security Engineer with a CISSP and five or more years' experience in your current role, with a strong background in FISMA, SOX and ISO 17799 who lives in the Washington D.C. area ... relax ... even in the midst of this economic mess, it looks like the world is still beating a path to your door. For the rest of us, though, we probably have some work to do.

Best of luck to everyone trying to improve their skills and find the right organizational fit in 2009. I hope this was helpful; if you have questions about specific skills, opportunities or regions not listed in this overview that you haven't been able to ferret out using the job search engines - let me know and I'll help if I can.

Thursday, January 01, 2009

Twitter Security

A few weeks ago I decided to give Twitter a try, following some friends and colleagues scattered throughout the Midwest. Like sets of data points on a time-series plot, it’s amazing to see patterns develop 140 characters at a time.

As with most things that are new, cool or interesting, I wondered if there was a practical way to translate the things that make Twitter ‘work’ into something useful at the office.

A few months ago I put together a one page summary of key metrics my project team had gathered and sent it to a number of stakeholders throughout the organization. The response was decent, but not as strong as I’d hoped. As nice as it would be for facts to flow like electrical current throughout an organization, powering change, I needed to put a lot of follow-on effort into making sure the themes of the report registered with decision makers.

As an experiment in communications, I wanted to see if the size and frequency of the message could make the change process any easier. I decided to “twitter” a single metric from a follow-on project to see if I could make a bigger impact by dialing down the content but increasing the frequency. To start, I sent a four-line email that put the metric in context along with a recommended organizational response. So far, the hit rate is up.

Not every security metric or message reduces down to one or two sentences. But for those that do, sharing status, concerns and recommendations in a “blackberry friendly” format seems to increase the likelihood that it’ll get read, and re-sent, gaining momentum throughout the organization.