
Proving reachability: Static analysis vs Runtime detection

Prabhu Subramanian


February 14, 2024

Static program analysis (static analysis) is a technique for analyzing software source code, configuration, and its Bill of Materials without executing or deploying the application or service. In contrast, runtime detection traces and monitors the behavior of an application or service after it has been built, deployed, and executed in an appropriate environment.

Software Composition Analysis (SCA) tools and security platforms that employ reachability analysis can help reduce noise and identify prioritized, actionable security vulnerabilities with mitigations readily available. We rightly see multiple platforms offering this capability. The big questions for prospective buyers and users are: “Should we go static, runtime, or both?”, “How do we know which of these platforms are good?”, and most importantly, “What does good even mean?”

In this blog, we dive deeper into each principle of software security analysis, focusing specifically on proving reachability, and critically review both static program analysis and runtime detection. For each principle, we grade the two techniques and determine which outperforms the other, providing a comprehensive evaluation of their effectiveness in identifying and mitigating security vulnerabilities within applications. Through this comparative analysis, we hope to offer insights into the strengths and weaknesses of each method, allowing for informed decisions when choosing security solutions.

Soundness

Any analysis and detection technique, whether static or runtime, must be “sound” by design. A reachability analysis technique must clearly show the flow of adversary-controlled data or calls through the software across all its layers and modules (inter-procedural and inter-package). Typically, grepping techniques fail the soundness test, since their results amount to luck based solely on the presence of certain words and combinations of them. Static analysis tools that rely on simpler data structures such as an Abstract Syntax Tree (AST) or a Code Property Graph (CPG) can yield sound results, but may not work well on real-world software that uses techniques such as dependency injection, dynamic loading, and event-driven programming. These misses are called false negatives, a problem that keeps developers at security companies awake at night. Runtime detection also suffers from false negatives, since the behavior of software, especially in dynamic languages, can differ based on the deployed environment and the usage patterns.
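To make the false-negative concern concrete, here is a minimal Python sketch (the module and function names are hypothetical) of how dynamic loading hides a call edge from a simplistic AST-based analysis:

```python
import importlib

# A naive AST walk looks for a literal call such as pickle.loads(...) and
# finds none here: both the module and the function are resolved at runtime.
def load_handler(module_name: str, func_name: str):
    module = importlib.import_module(module_name)  # dynamic loading
    return getattr(module, func_name)              # dynamic dispatch

def process(payload: bytes, module_name: str, func_name: str):
    handler = load_handler(module_name, func_name)
    # If module_name and func_name come from configuration an adversary can
    # influence, this may invoke pickle.loads on untrusted data -- a flow a
    # simplistic static analysis silently misses (a false negative).
    return handler(payload)

# e.g., process(untrusted_bytes, "pickle", "loads")
```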

Measuring soundness

The good news is that the accuracy of reachability analysis implementations can be measured using precision and recall. While this deserves a blog of its own, precision answers, “What proportion of positive identifications was actually correct?” while recall answers, “What proportion of actual positives was identified correctly?” A platform can be benchmarked against a series of test applications that mimic a range of scenarios, with precision and recall computed both for each individual application and for the entire category.
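As a rough sketch, here is how precision and recall could be computed for a single benchmark application in Python; the flow identifiers are hypothetical:

```python
# expected: ground-truth reachable flows in the test application
# reported: flows the platform under evaluation actually reported
def precision_recall(expected: set, reported: set):
    true_positives = len(expected & reported)
    precision = true_positives / len(reported) if reported else 1.0
    recall = true_positives / len(expected) if expected else 1.0
    return precision, recall

expected = {"req.param->db.query", "req.header->os.exec"}
reported = {"req.param->db.query", "env.var->log.write"}
print(precision_recall(expected, reported))  # (0.5, 0.5): one hit, one miss, one false alarm
```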

(Figure: precision-recall illustration. Credits: https://www.thaipng.com/png-0b98p9/)

Winner: Static analysis

Explainability

“Explain how you found this”

Explainability is a critical metric when evaluating any analysis technique. The results must be “explainable” to a human using parameters and evidence that can prove the finding beyond doubt.

Let us look at some examples where the finding could be valid but not explainable.

Example 1: A grep tool showing SQL Injection vulnerability

It is possible to grep for a word such as “query,” look for the “+” operator, and report the finding as a vulnerability. When we push the tool to explain how it knows that the “query” method belongs to a SQL library, and that the input to it could be controlled by an external adversary, such tools often fail to prove the finding beyond doubt.
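A minimal sketch of such a grep-style heuristic (the pattern and the sample code are illustrative) shows why it cannot defend its findings:

```python
import re

# Flag any line that calls something named "query" with "+" in the arguments.
# The heuristic knows nothing about SQL libraries or adversary-controlled input.
PATTERN = re.compile(r"query\s*\(.*\+.*\)")

def naive_sqli_scan(source: str) -> list:
    return [line.strip() for line in source.splitlines() if PATTERN.search(line)]

code = '''
results = db.query("SELECT * FROM users WHERE id = " + user_id)
log.query("cache-" + key)
'''
# Both lines are flagged, yet only the first could be SQL injection, and even
# that is unproven: nothing shows user_id is attacker-controlled.
print(naive_sqli_scan(code))
```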

Example 2: A runtime tracing tool showing a call stack

By utilizing techniques such as tracing, runtime detection tools can observe call stacks and thread dumps. However, when we try to reason with the tool and ask it to explain why a given call stack was produced, or why a given call was absent from a call stack, these tools fall short.
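A minimal Python sketch (function names are hypothetical) of what such a tool observes:

```python
import traceback

def sensitive_sink(data):
    # Record the call stack whenever the sink executes.
    for frame in traceback.extract_stack()[:-1]:
        print(f"  called from {frame.name} at line {frame.lineno}")

def request_handler(data):
    sensitive_sink(data)  # this execution path is observed

request_handler("payload")
# The trace proves this path WAS taken. It cannot explain why a sanitizer
# call is absent: perhaps no sanitized path exists, or perhaps this
# particular execution simply never exercised it.
```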

Example 3: Static analysis tool showing a reachable flow

Static analysis can be used to demonstrate a flow that originates from a source and reaches a “sink” that could belong to third-party libraries with or without known vulnerabilities. The same technique can be used to explain why a flow does not exist between an arbitrary source and a sink. Explanations can also be iterative, as in the prompts below; a sketch of the underlying path-finding follows them.

“Given this flow is valid, identify other similar flows that are also valid.”

“Given this flow is sanitized and mitigated, identify other similar flows that are also sanitized.”
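Here is a minimal Python sketch of that path-finding, assuming a hypothetical call graph: the returned path is the evidence for a finding, and a None result explains why no flow exists.

```python
from collections import deque

CALL_GRAPH = {
    "http.handler": ["parse_input", "audit.log"],
    "parse_input": ["build_query"],
    "build_query": ["db.execute"],  # sink inside a third-party library
    "audit.log": [],
}

def find_flow(graph: dict, source: str, sink: str):
    queue, seen = deque([[source]]), {source}
    while queue:
        path = queue.popleft()
        if path[-1] == sink:
            return path  # the evidence: an exact chain of calls
        for callee in graph.get(path[-1], []):
            if callee not in seen:
                seen.add(callee)
                queue.append(path + [callee])
    return None  # provably no path in this graph

print(find_flow(CALL_GRAPH, "http.handler", "db.execute"))
# ['http.handler', 'parse_input', 'build_query', 'db.execute']
print(find_flow(CALL_GRAPH, "audit.log", "db.execute"))  # None: no such flow
```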

Explainability is a criterion where static analysis is superior and outperforms runtime detection. Research is underway to improve runtime detection with type inference and grammars, but at this point, static analysis has a clear edge.

Winner: Static analysis by a huge margin

Business logic flows

Closely related to explainability is semantically understanding business logic flows. For example, consider the pseudocode below.

When the user enters a coupon code, make an API call to the rewards service using open-source Package A.

When the user clicks the pay button, make an API call to the Stripe service using open-source Package B.

With static analysis, we can effortlessly compute all combinations of flows and easily arrive at the decision that both Package A and Package B are utilized and reachable.

With runtime detection, this is entirely dependent on human QA and automation tests. For example, if both the human and the tests miss the “enter a valid coupon code” scenario, then Package A might be incorrectly treated as not reachable, resulting in a false negative.
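A minimal Python sketch of the pseudocode above (the package classes are stand-ins for the real open-source packages) illustrates the difference:

```python
class package_a:  # stand-in for open-source Package A (rewards client)
    @staticmethod
    def redeem(code):
        print(f"rewards API called with {code}")

class package_b:  # stand-in for open-source Package B (Stripe client)
    @staticmethod
    def charge():
        print("Stripe API called")

def checkout(coupon_code, pay):
    if coupon_code:                    # static analysis sees this branch...
        package_a.redeem(coupon_code)
    if pay:                            # ...and this one, without running either
        package_b.charge()

# Runtime detection only sees exercised paths. This test skips the coupon
# branch, so package_a would appear unreachable: a false negative.
checkout(coupon_code=None, pay=True)
```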

Winner: Static analysis by a huge margin

Depth of flows

“How deep into the call or dependency tree can you go?”

Let us begin by stating that runtime detection techniques have the advantage of access to the full call stack and thread dumps. With static analysis, tools must go beyond ASTs and control flows to compute the program dependency graph (PDG) and the data dependency graph (DDG) in order to improve call-tree depth. Depending on the complexity of the application, even these data structures might not be enough to compute the full depth of calls and data flows.
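As a rough sketch of the depth problem, here is the maximum call depth over a hypothetical static call graph; reaching this depth is the easy part, while keeping data flows precise along it is what requires PDG/DDG-level information:

```python
def max_call_depth(graph: dict, node: str, visited=frozenset()) -> int:
    if node in visited:  # guard against recursive cycles
        return 0
    callees = graph.get(node, [])
    if not callees:
        return 1
    return 1 + max(max_call_depth(graph, c, visited | {node}) for c in callees)

graph = {
    "main": ["service.handle"],
    "service.handle": ["libA.parse"],   # direct dependency
    "libA.parse": ["libB.decode"],      # transitive dependency
    "libB.decode": [],
}
print(max_call_depth(graph, "main"))    # 4: main -> ... -> libB.decode
```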

There is ongoing research into improving static analysis by using promising intermediate representations and static rewriting.

Is depth important for reachability analysis?

The answer is yes and no! Some commercial platforms, such as Backslash, have addressed the depth problem by tracking calls to transitive dependencies.

Continuing to wear an open-source developer hat: first, the dependency tree is already present in the SBOM (Software Bill of Materials), so the information is available. Then, from a prioritization perspective, it is best to start with actionable direct results first. Often, updating a direct dependency will automatically update and mitigate the issues reported in a transitive dependency. However, there could be situations where this is not feasible or applicable.
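As a minimal sketch, the direct-versus-transitive split mentioned above can be read straight out of a CycloneDX SBOM's dependencies section (the document below is a hypothetical, trimmed example):

```python
import json

sbom = json.loads("""{
  "metadata": {"component": {"bom-ref": "pkg:npm/my-app@1.0.0"}},
  "dependencies": [
    {"ref": "pkg:npm/my-app@1.0.0", "dependsOn": ["pkg:npm/direct-a@2.0.0"]},
    {"ref": "pkg:npm/direct-a@2.0.0", "dependsOn": ["pkg:npm/transitive-b@3.1.0"]},
    {"ref": "pkg:npm/transitive-b@3.1.0", "dependsOn": []}
  ]
}""")

root = sbom["metadata"]["component"]["bom-ref"]
edges = {d["ref"]: d.get("dependsOn", []) for d in sbom["dependencies"]}
direct = set(edges.get(root, []))
transitive = (set(edges) - {root}) - direct
print("direct:", direct)          # prioritize these: updating them often
print("transitive:", transitive)  # mitigates transitive findings too
```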

Winner: Runtime detection, but the gap is closing

Return-on-Investment (ROI)

Multiple research projects have demonstrated the ROI and productivity benefits of fixing software bugs early in the lifecycle. With static reachability analysis, vulnerabilities that require real attention, along with suggested mitigations, can be presented to dev teams for early remediation. This approach beats runtime detection, which mandates deployment, tracing, and real execution of the software, all of which is slow and costly.

When it comes to static analysis, the age-old question about “false positives” persists. While not all false positives are bad (a separate blog topic), reachability analysis focuses on specific adversary-reachable flows that can reach and exploit the given SCA vulnerabilities. Such analysis has higher accuracy (and fewer false positives) when compared to generic SAST approaches that aim to identify as many software vulnerabilities as possible across categories.

Winner: Static analysis

Closing thoughts

As proponents of static analysis, we are a bit biased toward our technique of choice. However, we have attempted to keep bias out of this blog and to explain the criteria that can be utilized to understand and measure the relative strengths and weaknesses of static analysis and runtime detection.

About the author

Prabhu Subramanian is an expert in application security and is the creator of open-source projects such as CycloneDX Generator and OWASP depscan. He is focused on improving the state of supply chain security across open-source software.