I Introduction
Sidechannel attacks are practical and widespread [1, 2, 3]. A side channel arises if the attacker can infer the value of secret inputs (or some of their properties) based on public inputs, runtime observations, and the source code of the program. A typical example is an online pharmacy, where based on the number of rounds of interaction and the processing time, an eavesdropper can infer a user’s prescriptions on file.
We consider the setting where the secret input stays fixed across a number of interactions. This gives rise to functional observations: for a secret input, we observe the program on a number of public inputs. For a secret input , we thus obtain a partial function from public inputs to runtime observations. In this paper, we focus on timing side channels, where the attacker observes the running time of the program. However, our methods apply to other types of side channels equally well.
Functional side channels. We adapt the classical definition of noninterference to functional side channels, where two secret inputs and are indistinguishable for the attacker if the functions and are equal. However, in the presence of noise (a common situation for timing measurements), we cannot require an exact equality of functions. Instead, we define two functional observations to be indistinguishable when the distance between them is small. We show on examples that it is critical to choose the distance that corresponds to the characteristics of the environment noise, otherwise side channels might be undetected.
Problem. We focus on automatically discovering functional timing side channels in programs, and on pinpointing code regions that contribute to creating the side channel.
Algorithms. As functional timing side channels are hard to detect statically, and are clearly beyond the scope of current program analysis tools, we turn to dynamic analysis methods. We build on the results and algorithms from the theory of functional data analysis. We use functional data clustering
to find timing side channels and estimate their strength. It allows us to compute an equivalence relation on secret inputs that models the distinguishing power of the attacker. If this relation has multiple equivalence classes, there is an information leak. In order to find what parts of the code caused the leak, we identify what properties are common for secrets in the same cluster (equivalence class), and what separates the clusters. We consider the properties to be program internals such as methods called or basic blocks executed for a given secret value. We present a functional extension of
decision tree inference techniques to identify code regions that explain differences among clusters. These code regions are thus suspect of being a root cause of the functional side channel.Experiments. We evaluate our techniques on microbenchmarks and on seven larger case studies. The case studies serve to evaluate scalability and performance on realworld applications. These programs have up to thousands of methods, and we show that our tool is able to find side channels, and pinpoint their cause in the code in under minutes.
Contributions. Our main contributions are:

Definition of functional noninterference in the presence of noisy observations. We demonstrate functional side channels in programs that would be deemed informationleakfree using the standard (nonfunctional) definition.

Algorithms: We adapt existing theory and algorithms for functional data clustering to discover the existence of side channels. We develop a functional extension of decision tree learning to locate the code regions causing a side channel if there is one.

Evaluation: we show on microbenchmarks and larger case studies that Fuschia (FUnctional Side CHannel Investgator and Analyzer) is able to scalably discover (and locate in the code) functional side channels, including one that was since fixed by the original developers.
Ii Overview
First, we illustrate what an attacker can infer based on functional observations, even in the presence of noise. Second, we show how our tool Fuschia can help in detecting such functional side channels.
Iia Illustrating functional side channels
We consider the setting of server programs with public and secret variables. We focus on the (very common) situation where secret values stay constant for some amount of time (e.g., number of friends a person has on a social network). The eavesdropper can try to infer the secret value by observing public parts of a number of client requests and server responses (for instance, number of packets sent, and their size, can be publicly observable). Note that in our setting, the eavesdropper does not choose public values—these are chosen by the legitimate users that the eavesdropper observes.
Let us consider the classical definition of confidentiality:– noninterference. A program is unsafe iff for all pairs of secret values and , there exists a public value such that the behavior of the program on is observably different than on . If our observable is the running time , then: .
In our setting, however, we have functional data. For each secret value, we observe the running time of a program on a number of public values, and the noninterference then becomes: . In other words, the program is unsafe if the two secret values do not correspond to the same (partial) function of public inputs.
Side channels in the presence of noise. Quantitative observations of a program’s runtime behavior are often noisy. For instance, running the same program twice on the same machine results in different measurements of running time. Observing the program remotely adds further level of noise. Classical definitions of confidentiality properties therefore need to be adapted to noisy environments. In the noisy environment, no two observations are equal, and our definition needs to include tolerance: . In this definition, is a distance between two functions. The distance is suitably chosen, typically based on the noise expected for a particular use case. For instance, to recover (the spirit of) the classical definition, we can use the distance:
However, we now demonstrate that the distance is not the only option, and that depending on the type of noise, different distances are needed. In particular, we show that if we use distance , we could certify a program to be safe even though some secret information leaks.
Gaussian noise (pointwise independent, mean 0)
Consider the two functional observations (red and black) of a program in Figure 1. On the axis, we have public values, and on axis, there is the running time. The red function corresponds to secret value and the black function corresponds to secret value . The eavesdropper can produce this graph easily by trying all possible inputs on their machine before hand. At runtime, the eavesdropper collects the public inputs and the running time, and tries to learn the secret by matching the observed data to the red or black functions.
In this example, we assume that the noise for each pair of publicsecret inputs is independent and identically distributed to the noise for other inputs, and furthermore we assume that it is distributed according to the Gaussian distribution with mean
. Let us consider of ms, and then apply our definition with distance . We see that that the two functional observations are close for this distance, so the attacker cannot infer the secret value ( or ).However, the functional observations are clearly very different, and an eavesdropper can reliably learn the secret if there are many observations. Theoretically, this can be captured for instance by considering the distance to be the norm between the two functions, that is, the sum (integral) of the absolute value of the difference of the two functions over an interval of interest. Practically, this example shows that considering the supremum of the pointwise distance () is insufficient to detect all side channels, as the eavesdropper learns from all inputs they see, not just one of them.
Gaussian noise (pointwise independent, mean C)
Let us consider the case where the noise is again Gaussian, but with a nonzero mean. The threat model we consider is that the mean is fixed but unknown to the attacker. This case arises for instance if the eavesdropper is remote, and cannot determine reliably the delay introduced by the network and separate it from the noise caused by the remote machine.
Consider a program with two possible functional behaviors (red and black) pictured in Figure 2 (a), where the red behavior corresponds to secret value and the black behavior to secret value . Note that the attacker can obtain this graph by running the program for many inputs on their own machine.
At runtime, the attacker interacts with the remote server running the same instance of the application with a fixed secret value. The green timing function in Figure 2 (b) shows different values of execution time for different public input values obtained from the interaction with the remote server. Green function looks far apart from both local observations (black and red functions in Figure 2 (a)). However, due to the effect of remote observations, the attacker knows that the observation of running time is off by an unknown constant.
Therefore, the attacker is in effect observing only the shape of the function, i.e., its first derivative. The appropriate distance is therefore over the derivatives of the functions, for instance the norm of the difference of the two functions, as used in the literature [4, 5] to calculate the distance between green function with red and black functions. The eavesdropper can use this distance to calculate that the green function is closer to the black function than the red function (and if the difference is greater than , the value of the secret leaks).
We have shown situations where the distance is the norm, or the norm over the first derivative of the functions. In general, in this paper we will consider distances that are given by norms over th derivatives of functions.
IiB Debugging functional side channels with Fuschia
IiB1 Using Fuschia to debug Eclipse Jetty
We illustrate how our tool Fuschia can be used for discovering and explaining information leaks arising due to functional side channels. We analyze the Jetty.util.security package of Eclipse Jetty web server. The package has Credential class which had a timing side channel. This vulnerability was analyzed in [6] and fixed initially in [7]. Then, the developers noticed that the implementation in [7] can leak still leak information and fixed this issue with a new implementation [8]. We consider this new implementation and apply Fuschia to check its security. The final fix was done few months later [9] (but before we reported our finding to the developers).
Problem. The secret input is the password stored at the server, the public input is the guess by the user. The goal of the defender is to determine whether an attacker can infer properties of the (unchanging) secret if the attacker can observe the timing for many public inputs. Fuschia helps with this, and helps the user pinpoint the problematic code.
Side channel discovery. The defender starts by choosing a finite set of secret and public values. For this example, the defender uses libFuzzer [10], a popular evolutionary fuzzer. The defender chooses 800 different secret passwords and 800 different guesses (ordered by length). The lengths of passwords are at most 20 characters. For each secret value, Fuschia varies 800 different guesses and measures the execution time of the Jetty. Figure 4 (a) shows 800 different execution time functions (one for each secret value).
The defender provides the notion of a distance and the bound . In this case, we consider between the first derivatives, and the bound . Fuschia discovers 20 classes of observations. Figure 4 (b) shows 20 clusters detected by Fuschia. Since every cluster corresponds to a distinct class of observation, the defender concludes that there are classes of observations in Jetty util.security package that leak information about the secret via side channels.
Side channel explanation. Now, the defender wants to know what properties of program internal leak through the timing side channel and use this information for further analysis such as elimination of the leaks. Fuschia helps the defender with inferring what properties are common for secrets in the same cluster, and what separates the clusters. We look at program internals such as methods called or basic blocks executed during the executions with a given secret value. This part is done by extending techniques from [11, 12] to functional setting. It produces a decision tree whose nodes are labeled by program internal features, and whose leaves represent sets of secret values inside a cluster. Figure 4 (c) shows the decision tree model learned for Jetty. Using this model, the defender realizes that the executions of stringEquals_bblock_118 is what distinguishes the clusters from each other. This basic block represents the loop body of the for loop in the method shown in Figure 3. For instance, the green cluster (third from the bottom of the center diagram, bottom of the right diagram) corresponds to the case where stringEquals_bblock_118 is executed 3 times. More specifically, the cluster corresponds to the case where if the length of public input (userprovided password) is less than 3, stringEquals_bblock_118 is the length of public input, and if the length is greater than or equal to 3, stringEquals_bblock_118 is 3. Since the functions labeled with are obtained for the secret values with the length of 3, the defender realizes that the minimum of the lengths of the secret password and the userprovided password is correlated with the number of execution of the loop (or the basic block) inside stringEquals, and the minimum of the lengths is leaking through this basic block. Note that defender could establish this as Fuschia pinpointed the right function.
Remote observation. Now assume that the attacker has the same version of Jetty application and performs analysis similar to the one performed by the defender on his local machine. Consequently, the attacker has obtained the same Figures in 4. Now, on the remote machine that runs the same instance of jetty, the attacker wants to guess the length of the (fixed) password. The attacker sends the same guesses as his local analysis over the network and observes the execution time. The new remote observation is shown with green function in Figure 4 (a). Now, to get rid of additive noise effects, the attacker considers the first derivative of timing models and finds the closet cluster to green function among the discovered clusters. Figure 4 (b) shows the result of this analysis where it assigns the remote timing observations to pink cluster. Since this cluster corresponds to passwords with the length 10, the attacker correctly discovers the length of password running on the remote machine is 10.
IiB2 Inside Fuschia
We describe how Fuschia produces the diagrams. To produce the middle diagram of Figure 4, Fuschia converts the execution time of secret values over public guesses to timing functions in the domain of the public guesses for each secret value using Bspline basis [13]. Then, it applies a functional data clustering algorithm to discover different classes of observations in time (described in Section IV). This clustering algorithm is nonparametric functional clustering [4] that works in two steps: first, it applies the norm distance function over every pair of functions to summarize the distance between them in a distance matrix, and then it uses constrained clustering algorithms to group similar functions together (described in VA).
For explaining the clusters and producing the decision tree in the right diagram of Figure 4, Fuschia executes an instrumented version of the program with Javassist [14] on the same inputs as before and obtains the basic blocks taken for each secret value over the public input values. Therefore, for each secret value, the evaluation of a basic block is a function in the domain of public input. We label the functions with categorical values and use offtheshelf decision tree learning algorithm. The tool learns a decision tree model that discriminates each cluster of timing functions based on the basic block calls (described in Section IV).
Iii Problem Statement
We develop a framework for detecting and explaining information leaks due to execution time of programs and we focus leaks due to functional observations by the attacker.
Iiia Threat Model
We assume that the attacker has access to the source code of the application and she can sample executiontimes for the application arbitrarily many times on her local machine with different combinations of secret and public values. As a result, she can infer an arbitrarily accurate model of the application’s true execution time. During an attack, the attacker intends to guess a fixed secret by observing the application on a remote machine. These remote observations, however, may not correspond to the timing model inferred by the attacker because of several factors, such as, i) network delays and noises, and ii) masking delays added by the administrator to every response (potentially as a function of public inputs) to mitigate the side channel. We assume that the attacker knows an upper bound on the mitigation model, but she does not know the specific model parameters.
IiiB Timing Model and Functional Observations
Let and be the set of reals and positive reals. Variables with unspecified types are assumed to be realvalued.
Definition III.1.
The timing model of a program is a tuple where:

is the set of secretinput variables,

is the set of publicinput variables,

is a finite set of secretinputs, and

is the executiontime function of the program as a function of secret and public inputs.
Functional observations
A functional observation of the program for a secret input is the function defined as . Let be the set of all functional observations. In order to characterize indistinguishability between two functional observations, we introduce a distance function on functional observations, for , defined as:
where represents th derivative (wrt ) of the function and th derivative is the function itself. The distance function corresponds to the norm distance between th derivatives of the functional observations. Given , we say that secrets and are indistinguishable (or indistinguishable when is clear from context) if .
Depending upon the context, as we argued in the previous section, different distance functions (and different bounds ) may be applicable. For instance, the distance between first derivatives may be applicable when the shape of the functional observation is leaking information about the secret (while observations themselves may be arbitrarily close) and the second derivatives may be applicable when the number of growth spurts in the observations is leaking information. Similarly, in the situations where the attacker knows the mitigation model—say temporal noises added to the signal are th order polynomials of the public inputs—two functional observations whose th derivatives are close in the norm sense may be indistinguishable to the attacker. Finally, depending upon the specific situation, an analyst may wish to use more nuanced notion of distance by taking a weighted combination [15] of various distance functions characterized by . To keep the technical discourse simple, we will not formally introduce such weighted combinations in this paper.
Noninterference
Noninterference is a wellestablished [16, 17, 18] criterion to guarantee absence of sidechannel vulnerabilities. A program is said to satisfy the noninterference property if:
(1) 
To account for the measurement noises in the observation of the executiontime, it is prudent (see, e.g., [6]) to relax the notion of noninterference from exact equality in timing observations to a parameterized neighborhood. For a given , a program satisfies approximate noninterference if:
(2) 
We adapt the notion of approximate noninterference in our setting of functional observations by generalizing previous notions of noninterference. We say that a program satisfies functional approximate noninterference if
(3) 
where is a distance function over functional observations defined earlier. For the rest of the paper we assume a fixed distance function over functional observations.
Quantifying Information Leakage
The notion of noninterference requires that the attacker should deduce nothing about the secret inputs from observing execution time for various public inputs. However, one can argue that achieving noninterference is neither possible nor desirable, because oftentimes programs need to reveal information that depends on the secret inputs. We therefore need a notion of information leakage. Shannon entropy, guessing entropy, and minentropy are three prevalent information metrics [19] to quantify information leakage in programs. Number of distinguishable observations provide a practical upperbound [20, 21] on the information leakage. In this paper, we use the number of distinguishable functional observation clusters as a yardstick for quantitative information leaks.
Iv DataDriven Discovery and Explanation
The space of program inputs are often too large (potentially infinite) to exhaustively explore even for mediumsized programs. This necessitates a datadriven approach for discovery and explanation of functional side channels. In the proposed approach, an analyst provides a set of interesting secret and public input pairs (using domain knowledge or fuzzing techniques), and our tool estimates the executiontime of the program. The tool exploits functional clustering approaches to discover functional side channels. To explain any discovered side channels, our tool instruments the program to print information about auxiliary variables (e.g., predicates on secrets and public variables, number of times a method called, value of some internal variables, number of execution of a block) for each pair of inputs. To summarize: given such set of program traces, the key computational problems are a) to cluster traces exhibiting distinguishable timing behaviors (discovery) and b) to explain these differences by exploiting richer information based on program internals.
Hypertrace Learning. Let be the set of auxiliary variables. An execution trace of a program is a tuple wherein is a value to the secret inputs, is a value to the public inputs, and and are the valuations to the auxiliary variables and estimated execution time, respectively, of the program for secret and public input pairs. We further assume that the valuations of the auxiliary variables deterministically depend only on the secret and public inputs. To keep executiontime unaffected from the instrumentation process, we assume that executiontime for traces is estimated for uninstrumented program. Let be a set of execution traces.
As our main objective is to explain the differences on functional observations due to differences on secret and auxiliary variables, we rearrange the raw execution traces to functional traces by combining traces with common values of secret inputs. Functional traces are hypertraces—as they summarize multiple program executions—that model auxiliary and timing values as a function of public inputs. A hypertrace is a tuple wherein is a value to the secret input, and are functions modeling values of auxiliary variables and execution times, respectively, as a function of public inputs for secret
. Computation of hypertraces from a set of rawtraces is achieved by turning the discrete vectors of observations (for auxiliary variables as well as execution time) into smooth functions represented as linear combinations of appropriate basis functions (e.g. Bspline basis system, Fourier basis functions, and polynomial bases)
[22]. In our tool, we primarily uses Bspline basis functions.SideChannel Discovery. Given a set of hypertraces, we use functional clustering over to detect different classes of functional observations such that hypertraces within a cluster are closer according to the distance function than hypertraces from different clusters. Functional clustering approaches [23]
can be broadly classified into nonparametric and modelbased approaches. Our tool uses nonparametric functional clustering and implements two algorithms to cluster indistinguishable observations. These algorithms—described in Section
VA—take the timing observations set , an upper bound on the number of clusters, a distance function , and the indistinguishability distance as inputs, and returns the “centroids” of observational functions for . Our algorithm guarantees that each centroid represents the timing functions for the set of secret values such that if and only if .SideChannel Explanation. A (hyper) trace discriminant is defined as a disjoint partitioning of the product of the secret variables and auxiliary variables functional spaces along with a functional observations for each partition that models the execution time as a function of public inputs. Formally, a trace discriminant is a set of functional observations —where each models the execution time as a function of the public input variables—and a partition where each is a predicate over secret inputs and functions of auxiliary variables. We define as the total number of functions in the discriminant .
Given a hypertrace and discriminant , we define the prediction error as where is the index of the unique value in such that i.e. the predicate evaluates to true for the valuation of secret value and the auxiliary function . Given a set of hypertraces , and a discriminant , we define the fitness of the discriminant as the mean of prediction errors: .
Definition IV.1 (Discriminant Learning Problem).
Given a set of hyper traces , a bound on the size of the discriminant , a bound on the error , the discriminant learning problem is to find a discriminant with and prediction error .
It follows from Theorem 1 in [24] that the discriminant learning problem is NPhard. For this reason we propose a practical solution to the discriminant learning problem by exploiting functional data clustering and decision tree learning.
For learning discriminant model, we adapt decision tree learning algorithm by converting various functional datavalues into categorical variables. For the auxiliary variable observations
of a secret , our algorithm clusters each auxiliary variable into groups by employing functional clustering [23]. Let shows secret value and categorical auxiliary variable for clustered in groups. Given the set of labeled traces with categorical auxiliary variables and the timing function label , the decision tree learning algorithms provide an efficient way to learn hypertrace discriminants.V Implementation Details
We refer to Algorithm 1 to illustrate the inputs, the output, and the different steps of Fuschia. Given the program , the procedure ExecTime extracts execution time over the public input for each secret input value. The procedure ExecAux produces the internal properties of (method calls and basic block executions) by executing the same input as ExecTime procedure using the instrumented version of the program (). Given the output of ExecTime, an upper bound on the number of clusters, and the distance picked by the user, FDClustering discovers classes of observations and returns the clusters . Each cluster includes a set of timing functions (corresponds to a set of secret values). The procedure DiscLearning learns a set of discriminant predicates .
Va Implementation
We describe the implementation of each component in Algorithm 1. All timing measurements in ExecTime of Algorithm 1 is conducted on an Intel NUC5i5RYH. All other components in Algorithm 1 are conducted on an Intel i52.7 GHz machine with 8 GB memory.
Overall Details. We use functional data analysis library [13] to create Bspline basis and fit functions to the vector of timing and auxiliary variable observations. We use Javassist [25] to produce and obtain a set of auxiliary variable vector for each secret. Given an upperbound on the number of clusters as well as an arbitrary distance function with the indistinguishably distances , we implement FDClustering to discover classes of observations (). This clustering is an instantiation of nonparametric functional clustering [4]. Using the auxiliary variables as features and the functional clusters as labels, we apply CART decision tree in scikitlearn library [26] to implement DiscLearning.
Preparation. We obtain timing functions from the discrete timing vector. Then, we use a given distance function to obtain the distance matrix that includes distance between any timing functions. We specify cannotlink constrains over the matrix . Cannotlink constraints disallow two functions that are more than far to be in the same cluster.
constrained Kmeans. Given the upper bound over the number of observational classes ( is less than number of secret values), constrained Kmeans algorithm [29] obtains clusters in each iteration ( = 1 in the first iteration). If the algorithm could not find clusters with the given constraints, it increases to and runs the algorithm again (if ). Otherwise, it returns cluster object . It has been known that constrained Kmeans with cannotlink constraints is computationally intractable [30]. Constrained Kmeans algorithm internally uses constraints to discover clusters.
Hierarchical clustering. The clustering algorithm with complete link method [31] obtains clusters in each iteration ( = 1 in the first iteration). In each iteration, after clustering, it checks that all pairs in cannotlink are in different clusters. If the condition is not satisfied, it increases to and run the algorithm again (if ). Otherwise, it returns
. Hierarchical clustering is agnostic to constraints and the constraints are verified after the clustering.
Nonfunctional clustering. We extend the wellestablish noninterference implementation [32] to our setting with quantified discovery of leaks. For every public input value, we form cannotlink constrains and apply one of the clustering algorithm with the infinity norm and an indistinguishably distance . Finally, we choose the largest number of clusters among all values of the public input.
DiscLearning. For each secret value, the evaluation of an auxiliary variable is a function in the domain of public input. We use Bspline functions to represent the auxiliary functions in general, but we also allow to fit simpler functions such polynomial functions. Then, we cluster the functions and use the decision tree model to learn discriminant formulas.
VB Microbenchmarks
We design synthetic microbenchmarks to compare the two clustering algorithms, evaluate the scalability of Fuschia, and compare results (in term of discovered clusters) for functional and nonfunctional clustering algorithms.
Programs. Two programs and are shown in Figure 5. Two versions of guess secret applications are considered. The applications [33] and [34] (shown in Figure 5) take the secret and guess as the inputs and execute different sleep commands depending on the values of secret and guess. Six versions of branch and loop are considered. One of these versions is shown in Figure 5. Depending on the values of the secret input, the program does different computations with different complexities. There are four loop complexities: O(log(N)), O(N), O(N.log(N)), and where N is the public input. Each microbenchmark Branch_and_Loop_i has all of these four loop complexities, and there are i types of each loop with different constant factors such as O(log(N)) and O(2.log(N)).
Clustering Parameters. We use both functional clustering (constrained Kmeans and hierarchical) as well as nonfunctional clustering from Section VA with different parameters to compare their results. For nonfunctional clustering, we show the indistinguishably distance with . We consider three types of instantiations for functional clustering: 1) Functional timing models with the infinity norm and the indistinguishably distance ; 2) The first derivative of the functional timing models with the infinity norm and the indistinguishably distance ; 3) The first derivative of the functional timing models with norm and the indistinguishably distance . We use the symbols and similar to the indistinguishably distance symbol to show the number of observations, and the computation time of the clustering algorithms for different distance models and norms.
Clustering Comparison. We compare two functional clustering algorithms proposed in Section VA. Figure 6 shows the comparison between hierarchical and constrained Kmeans algorithms for benchmarks. In particular, the comparison shows that constrained Kmeans is computationally expensive, while the hierarchical clustering is much more scalable. In addition, it shows that the constrained Kmeans discovers more classes of observations than hierarchical clustering. Note that the clusters discovered by both algorithms are valid, and we prefer the one with lower number of clusters.
Scalability. We examine the scalability of Fuschia for applications with different sizes of microbenchmark such as the number of recorded features and different sizes of experiments such as the number of secret and public values. To learn decision trees, we consider the largest number of observational classes among all the clustering parameters and evaluate the accuracy of tree models with 20fold crossvalidation procedure. Table I shows the result of evaluations for different microbenchmarks. We can see that Fuschia can handle programs with 100 classes of observations and complex decision tree models in a reasonable amount of time. In the worst case, it takes less than 30 seconds to cluster timing functions. In addition, it takes less than a second to learn the decision tree model. For the branch and loop applications, the computation time is growing in linear factor with respect to the size of applications and the size of experiments. For application, it takes less than 25 seconds to compute the clusters and learn decision tree models where we have 24,192 test cases with a decision tree model that has 48 unique leaves and the height of 92.
Clustering results. We evaluate the effects of the timing models and the distance norm over the number of discovered classes of observations. Table I shows that the nonfunctional clustering accepts the security of (found one cluster) while the first derivative functional model rejects the security of the application (found two clusters). In addition, in the case of , , and , the functional clustering discovers that there are more classes of observations than the ones discovered by nonfunctional clustering. We can also see how the number of observations are changing based on the timing functions and the distance norm. For example, there are classes of observations for using the timing functions, while there are clusters using the first derivate of timing functions.
Benchmark  # R  #S  #P  #K  T  #K  T  #K  T  #K  T  A  #H  #L  T  
Zigzag  13  100  20  0.001  1  0.2  0.001  1  0.1  0.001  2  0.1  0.001  2  0.2  100%  1  2  0.1 
processBid  3  100  100  0.001  2  1.6  0.001  100  0.4  0.001  5  0.1  0.001  100  0.7  100%  20  100  0.1 
Guess_Secret_1  10  500  100  0.001  2  64.0  0.001  100  1.9  0.001  100  3.4  0.001  100  7.2  100%  21  100  0.1 
Guess_Secret_2  5  100  400  0.001  21  9.8  0.001  100  0.8  0.001  100  2.3  0.001  100  2.6  97.2%  19  100  0.1 
Branch_and_Loop_1  4  36  21  0.1  4  0.2  0.1  4  0.1  0.1  3  0.1  0.1  3  0.1  100.0%  3  4  0.1 
Branch_and_Loop_2  8  72  21  0.1  8  0.4  0.1  8  0.1  0.1  6  0.1  0.1  7  0.2  98.7%  7  8  0.1 
Branch_and_Loop_3  16  144  21  0.1  12  1.7  0.1  14  0.2  0.1  7  0.2  0.1  12  0.4  99.0%  13  14  0.1 
Branch_and_Loop_4  32  288  21  0.2  22  7.9  0.2  22  0.7  0.2  13  0.7  0.2  13  1.3  100.0%  24  22  0.1 
Branch_and_Loop_5  64  576  21  0.2  22  31.4  0.2  22  2.3  0.2  12  2.0  0.2  17  5.7  100.0%  48  22  0.1 
Branch_and_Loop_6  128  1,152  21  0.2  23  207.7  0.2  23  9.2  0.2  48  12.7  0.2  45  24.5  100.0%  92  48  0.5 
Vi Case Studies
Table II summarizes seven realworld Java applications used as case studies in this paper. Table II is similar to Table I in Section VB. Here, we consider functional observations (with indistinguishability bound ) and the first derivative of functional timing models with norm distance (with indistinguishability bound ). The main research questions is the following: Do functional clustering and decision tree learning scale well and pinpoint code fragments related to leaks?
Benchmark  #M  #R  #S  #P  #K  T  #K  T  A  H  #L  T  
Jetty  63  38  800  800  0.002  20  54.7  0.001  20  74.6  100%  9  20  0.1 
GabFeed  573  43  1,105  65  0.1  34  56.7  0.01  32  40.8  99.6%  31  34  0.1 
SnapBuddy  3,071  65  477  14  0.5  20  2.8  0.5  24  3.0  96.2%  21  20  0.1 
ShareValue  13  7  164  41  0.01  29  0.5  0.005  13  0.5  99.3%  17  29  0.1 
PowerBroker  306  44  8  2,048  0.1  6  26.0  0.1  3  86.8  81%  2  3  0.1 
Collab  185  53  176  11  0.001  1  0.3  0.001  1  0.3  N/A  N/A  N/A  N/A 
Password Checker  6  3  10  21,123  0.001  4  0.1  0.001  4  0.1  99.9%  3  4  0.1 
A) GabFeed. Gabfeed is a Java web application with 573 methods implementing a chat server [6]. The server takes users’ public key (public input) and its own private key (secret input) to generate a common key.
Inputs. The defender chooses a finite set of secret values and public values. For this example, he randomly chooses 1,105 server’s private keys and 65 user public keys (uniformly spread through the space of keys). In total, he uses 71,825 test cases.
Leakages. For each private key, Fuschia varies public keys and measures the execution time of the server to generate the common key (Fig. 7 (a)). Next, Fuschia creates timing functions from the discrete timing observations and discovers classes of observations with (Fig. 7 (b)).
Explanation. On the instrumented GabFeed, for each secret value, Fuschia fits functions for each auxiliary variable. Figure 7 (c) shows the decision tree model learned for GabFeed. Using this model, he realizes that the number of basic block calls at line 18 of standardMultiply method explains different classes of observations. The basic block executes expensive shift left and add operations over BigIntegers. The split value in the decision tree model is linear function of the public input with different slopes. The functions are linear since the basic block is triggered as the number of set bits in the public key, and the slope of functions are determined by the number of calls to the standardMultiply method that depends on the number of set bits in the secret key minus one. This vulnerability is an instance of timing leaks in the squareandmultiply algorithm [1].
Mitigation. The clustering over the first derivative timing functions found 32 classes of observations with . This indicates the leaks over the shape of functions for remote attackers. Let’s assume the defender quantizes (mitigates) execution time [35] where the mitigator releases the events at certain times like {4.5,9,13.5,…}. Figure 7 (d) shows 27 classes of observations discovered after applying this mitigator.
Highlights. The decision tree model explains the calls to an expensive basic block is linear function of public input where the slope depends on the secret inputs. The clustering over mitigated observations shows a slight reduction in the strength of leaks. The overall algorithm takes less than 75 seconds.
B) SnapBuddy. SnapBuddy is a mock social network application where each user has their own pages with a photograph [36, 11]. In this application, the public profile is public input, and the identity of users is secret input.
Inputs. The defender considers 447 users in the system (with secret identity) and changes the size of profile images as the public inputs in the scale from 1 to 14.
Leakages. Figure 8(a) shows the download time functions of users versus the public profile sizes. Fuschia discovers 20 classes of observations (=0.5) shown in Figure 8(b).
Explanation. Figure 8 (c) shows (part of) the decision tree model that says users who do not apply any filter on their images follow black cluster (the bottom cluster in Figure 8 (b)), while those apply oilFilter on their images are assigned to red cluster (the top cluster in Figure 8 (b)).
Mitigation. Fuschia discovers 24 classes of observations with over the first derivative functions. The defender also uses the basic double scheme mitigator [35] that predicts the execution time at th slot of
th epoch with
where is the start time of the th epoch (). Figure 8 (d) shows 10 classes of observations discovered for a remote attacker after applying the double scheme mitigation with .Highlights. The defender finds out filters can leak the identity of users. In addition, the defender realizes that applying basic double scheme reduces the strength of leaks. The clustering and decision tree algorithms take less than 5 seconds.
C) Share Value. Share value application is the functional extension of classical share value program studied in [37, 38]. In this case, every user in the system has two types of shares: public and private shares.
Inputs. The program has 63 users each with maximum 400 private shares. The user can have 1 to 400 public shares.
Leakages. Fuschia discovers 29 classes of observations with as shown in Figure 9 (a). The clustering algorithm takes less than 1 second.
Explanation. Figure 9(b) shows decision tree model that says different intervals of calls to the remote data base cause the leaks. The decision tree learning takes less than 1 second.
Mitigation. The defender sets the values of to be 0.005, and Fuschia obtains 13 classes of observations over the first derivatives. The defender also mitigates the leaks with permitting the response times to occur at {1,2,3,etc}, and Fuschia discovers 12 classes of observations with .
D) PowerBroker. PowerBroker is a peertopeer program used by power suppliers for exchanging power [39]. During the connection setup, there is a step where the two peers exchange RSA public keys.
Inputs. The defender chooses 8 secret inputs (RSA Keys) and 2,048 public inputs (messages need to be decrypted) using AFL fuzzer [40] and the domain knowledges.
Leakages. Fuschia takes the indistinguishable distances and discovers three classes of observations over the first derivative of timing functions as shown in Figure 9 (c). The clustering algorithm takes less than 90 seconds.
Explanation. Figure 9(d) shows the decision tree model learned for PowerBroker that pinpoints the basic block at line 97 of montgomeryMultiply method discriminates green cluster from other two clusters. The decision tree learning takes less than 1 second. PowerBroker uses the Chinese Remainder Theorem based on the implementation in [41] that can leak secret keys using attacks such as [42].
E) Collab. Collab is a scheduling application that allows users to create new events and modify existing events [43]. An audit events is a secret, while other events are public.
Inputs. The defender considers 176 users each has either no or one audit event. The defender consider 11 operations (combination of add and commit) over public events.
Leakages. Fuschia discovers only one cluster for both timing functions and the first derivative of timing functions with and to be as small as 0.001. In this example, given the input traces (finite sets of public and secret values), the defender concludes that the program is not leaking any information. The clustering algorithm takes less than 1 second.
F) Password Checker. We consider a password checker that uses string equality checker from [44].
Inputs. The defender uses libFuzzer [10] that generates 21,123 guesses for 10 randomly selected password. We assume the length of password is at most 6 lowercase alphabet.
Leakages. Fuschia discovers 4 classes of observations for both timing functions and the first derivative of timing functions where and . The clustering algorithm takes less than 1 second.
Explanation. The number of basic block at the entry of the loop that compares the candidate and password strings explain different classes of observations. The decision tree learning takes less than 1 second.
Vii Related Work
Noninterference. Noninterference was first introduced by Goguen and Meseguer [16] and has been widely used to enforce confidentiality properties in various systems [17, 18, 45]. The work [6] defines bounded noninterference that requires the resource usage behavior of the program executed from the same public inputs differ at most . The functional noninterference allows us to consider various notion of distance norms on noise models and mitigation policies.
Detection of information leaks. Various techniques have been used to detect information leaks [46, 32, 6]. Molnar et al. [32] propose program transcript model to detect timing side channels. This model captures a class of sidechannel attacks in which the adversary can see the entire controlflow. Molnar et al. [32] do not quantify classes of observations.
Quantification of information leaks. The amount of information leakage can be estimated based on quantitative information flow techniques [19, 47, 46]. The works [20, 21] give an upper bound on the amount of information leakage based on sidechannel observations. In particular, Köpf and Dürmuth [20] prove that the amount of information leakage is bounded from above by bits, where is the number of distinct classes of observations, and is the number of measurements by the attacker. Our clustering algorithm uses this bound to show the strength of leaks.
Hardening against timing side channels. Previous works study methods to eliminate timing leaks [37, 48, 49] and to mitigate timing side channels [20, 35, 50]. The work [35] introduces different schemes to mitigate timing side channels. We use these schemes in our functional setting to show possible classes of observations after the mitigation.
Statistical Learning for program performance analysis.Machine learning techniques have been used for detecting and explaining performance bugs in software [11, 12, 51]. TizpazNiari et al. [12] consider performance issues in softwares. They also cluster execution times of programs and then explain what properties of program distinguish the different functional clusters. Their work is limited to linear functions as it needs to discover functions, while we support arbitrary functions . Further, they analyze performance, while we focus on confidentiality. Time series have been used for profiling and failure detection [52, 53, 54, 55]. In particular, Hauswirth et al. [53] group the traces of the same input together. If there is a pattern like sudden instructionpercycle changes, they align all executions using dynamic time warping (DTW) [56] and apply statistical correlation to find properties that are linked to the changes. In contrast, we use (different forms of) functional data analysis for confidentiality.
References
 [1] P. C. Kocher, “Timing attacks on implementations of diffiehellman, rsa, dss, and other systems,” in Annual International Cryptology Conference. Springer, 1996, pp. 104–113.
 [2] D. Brumley and D. Boneh, “Remote timing attacks are practical,” Computer Networks, vol. 48, no. 5, pp. 701–716, 2005.
 [3] S. Chen, R. Wang, X. Wang, and K. Zhang, “Sidechannel leaks in web applications: A reality today, a challenge tomorrow,” in 31st IEEE Symposium on Security and Privacy, S&P 2010, 1619 May 2010, Berleley/Oakland, California, USA, 2010, pp. 191–206.
 [4] F. Ferraty and P. Vieu, Nonparametric functional data analysis: theory and practice. Springer Science & Business Media, 2006.
 [5] M. O. de la Fuente and M. FebreroBande, “Utilities for statistical computing in functional data analysis: The package fda. usc,” 2011.
 [6] J. Chen, Y. Feng, and I. Dillig, “Precise detection of sidechannel vulnerabilities using quantitative cartesian hoare logic,” in Proceedings of the 2017 ACM SIGSAC Conference on Computer and Communications Security, CCS 2017, Dallas, TX, USA, October 30  November 03, 2017, B. M. Thuraisingham, D. Evans, T. Malkin, and D. Xu, Eds. ACM, 2017, pp. 875–890. [Online]. Available: http://doi.acm.org/10.1145/3133956.3134058
 [7] “Timing sidechannel on the password in eclipse jetty,” May 2017. [Online]. Available: https://github.com/eclipse/jetty.project/commit/f3751d70787fd8ab93932a51c60514c2eb37cb58
 [8] “Timing sidechannel on the length of password in eclipse jetty,” May 2017. [Online]. Available: https://github.com/eclipse/jetty.project/commit/2baa1abe4b1c380a30deacca1ed367466a1a62ea
 [9] “Fixed timing sidechannel on the length of password in eclipse jetty,” August 2017. [Online]. Available: https://github.com/eclipse/jetty.project/commit/a7e8b4220a410b85c843bffcd13f07d70f1b3fe8
 [10] “libfuzzer  a library for coverageguided fuzz testing (part of llvm 3.9),” 2016. [Online]. Available: http://llvm.org/docs/LibFuzzer.html
 [11] S. TizpazNiari, P. Černỳ, B.Y. E. Chang, S. Sankaranarayanan, and A. Trivedi, “Discriminating traces with time,” in International Conference on Tools and Algorithms for the Construction and Analysis of Systems. Springer, 2017, pp. 21–37.

[12]
S. TizpazNiari, P. Černý, B. E. Chang, and A. Trivedi, “Differential
performance debugging with discriminant regression trees,” in
32nd AAAI Conference on Artificial Intelligence (AAAI)
, 2018, pp. 2468–2475.  [13] J. Ramsay, G. Hooker, and S. Graves, Functional data analysis with R and MATLAB. Springer Science & Business Media, 2009.
 [14] S. Chiba, “Javassist  a reflectionbased programming wizard for java,” in Proceedings of OOPSLA’98 Workshop on Reflective Programming in C++ and Java, vol. 174, 1998.
 [15] T. Górecki and M. Łuczak, “First and second derivatives in time series classification using dtw,” Communications in StatisticsSimulation and Computation, vol. 43, no. 9, pp. 2081–2092, 2014.
 [16] J. A. Goguen and J. Meseguer, “Security policies and security models,” in Security and Privacy, 1982 IEEE Symposium on. IEEE, 1982, pp. 11–11.
 [17] A. Sabelfeld and A. C. Myers, “Languagebased informationflow security,” IEEE Journal on selected areas in communications, vol. 21, no. 1, pp. 5–19, 2003.
 [18] T. Terauchi and A. Aiken, “Secure information flow as a safety problem,” in International Static Analysis Symposium. Springer, 2005, pp. 352–367.
 [19] B. Köpf and D. Basin, “An informationtheoretic model for adaptive sidechannel attacks,” in Proceedings of the 14th ACM Conference on Computer and Communications Security, ser. CCS ’07. New York, NY, USA: ACM, 2007, pp. 286–296.
 [20] B. Köpf and M. Dürmuth, “A provably secure and efficient countermeasure against timing attacks,” in Computer Security Foundations Symposium, 2009. CSF’09. 22nd IEEE. IEEE, 2009, pp. 324–335.
 [21] B. Köpf and G. Smith, “Vulnerability bounds and leakage resilience of blinded cryptography under timing attacks,” in Computer Security Foundations Symposium (CSF), 2010 23rd IEEE. IEEE, 2010, pp. 44–56.
 [22] J. O. Ramsay, Functional data analysis. Wiley Online Library, 2006.
 [23] J. Jacques and C. Preda, “Functional data clustering: a survey,” Advances in Data Analysis and Classification, vol. 8, no. 3, pp. 231–255, 2014.
 [24] R. Alur and N. Singhania, “Precise piecewise affine models from inputoutput data,” in Proceedings of the 14th International Conference on Embedded Software, ser. EMSOFT. New York, NY, USA: ACM, 2014, pp. 3:1–3:10.
 [25] S. Chiba, “Loadtime structural reflection in java,” in European Conference on ObjectOriented Programming. Springer, 2000, pp. 313–336.
 [26] F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, J. Vanderplas, A. Passos, D. Cournapeau, M. Brucher, M. Perrot, and E. Duchesnay, “Scikitlearn: Machine learning in Python,” Journal of Machine Learning Research, vol. 12, pp. 2825–2830, 2011.
 [27] S. C. Johnson, “Hierarchical clustering schemes,” Psychometrika, vol. 32, no. 3, pp. 241–254, 1967.
 [28] K. Wagstaff, C. Cardie, S. Rogers, S. Schrödl et al., “Constrained kmeans clustering with background knowledge,” in ICML, vol. 1, 2001, pp. 577–584.
 [29] J. Song, H. Wang, and M. J. Song, “Package ‘ckmeans. 1d. dp’,” 2017.
 [30] I. Davidson and S. Ravi, “Clustering with constraints: Feasibility issues and the kmeans algorithm,” in Proceedings of the 2005 SIAM international conference on data mining. SIAM, 2005, pp. 138–149.
 [31] R Core Team, R: A Language and Environment for Statistical Computing, R Foundation for Statistical Computing, Vienna, Austria, 2013, ISBN 3900051070. [Online]. Available: http://www.Rproject.org/
 [32] D. Molnar, M. Piotrowski, D. Schultz, and D. A. Wagner, “The program counter security model: Automatic detection and removal of controlflow side channel attacks,” IACR Cryptology ePrint Archive, vol. 2005, p. 368, 2005. [Online]. Available: http://eprint.iacr.org/2005/368
 [33] Q.S. Phan, L. Bang, C. S. Pasareanu, P. Malacaria, and T. Bultan, “Synthesis of adaptive sidechannel attacks,” in Computer Security Foundations Symposium (CSF), 2017 IEEE 30th. IEEE, 2017, pp. 328–342.
 [34] “Guess secret version 2,” 2017. [Online]. Available: https://github.com/ApogeeResearch/STAC/blob/masterCanonical_Examples/Source/Category3_vulnerable.java
 [35] A. Askarov, D. Zhang, and A. C. Myers, “Predictive blackbox mitigation of timing channels,” in Proceedings of the 17th ACM conference on Computer and communications security. ACM, 2010, pp. 297–307.
 [36] “Snapbuddy application,” 2016. [Online]. Available: https://github.com/ApogeeResearch/STAC/tree/master/Engagement_Challenges/Engagement_2/snapbuddy_1
 [37] J. Agat, “Transforming out timing leaks,” in Proceedings of the 27th ACM SIGPLANSIGACT symposium on Principles of programming languages. ACM, 2000, pp. 40–53.
 [38] H. Mantel and A. Starostin, “Transforming out timing leaks, more or less,” in European Symposium on Research in Computer Security. Springer, 2015, pp. 447–467.
 [39] “Powerbroker application,” 2017. [Online]. Available: https://github.com/ApogeeResearch/STAC/tree/master/Engagement_Challenges/Engagement_4/powerbroker_4
 [40] “American fuzzy lop,” 2016. [Online]. Available: http://lcamtuf.coredump.cx/afl/
 [41] H. Warren, “Montgomery multiplication,” 2012. [Online]. Available: http://www.hackersdelight.org/MontgomeryMultiplication.pdf
 [42] W. Schindler, “A timing attack against rsa with the chinese remainder theorem,” in International Workshop on Cryptographic Hardware and Embedded Systems. Springer, 2000, pp. 109–124.
 [43] “Collab application,” 2017. [Online]. Available: https://github.com/ApogeeResearch/STAC/tree/master/Engagement_Challenges/Engagement_4/collab
 [44] “Timing attack in google keyczar library,” 2009. [Online]. Available: https://rdist.root.org/2009/05/28/timingattackingooglekeyczarlibrary/
 [45] J. B. Almeida, M. Barbosa, G. Barthe, F. Dupressoir, and M. Emmi, “Verifying constanttime implementations.” in USENIX Security Symposium, 2016, pp. 53–70.
 [46] M. Backes, B. Köpf, and A. Rybalchenko, “Automatic discovery and quantification of information leaks,” in Security and Privacy, 2009 30th IEEE Symposium on. IEEE, 2009, pp. 141–153.
 [47] G. Smith, “On the foundations of quantitative information flow,” in International Conference on Foundations of Software Science and Computational Structures. Springer, 2009, pp. 288–302.
 [48] D. Molnar, M. Piotrowski, D. Schultz, and D. Wagner, “The program counter security model: Automatic detection and removal of controlflow side channel attacks,” in International Conference on Information Security and Cryptology. Springer, 2005, pp. 156–168.
 [49] G. Barthe, T. Rezk, and M. Warnier, “Preventing timing leaks through transactional branching instructions,” Electronic Notes in Theoretical Computer Science, vol. 153, no. 2, pp. 33–55, 2006.
 [50] D. Zhang, A. Askarov, and A. C. Myers, “Predictive mitigation of timing channels in interactive systems,” in Proceedings of the 18th ACM conference on Computer and communications security. ACM, 2011, pp. 563–574.
 [51] L. Song and S. Lu, “Statistical debugging for realworld performance problems,” ACM SIGPLAN Notices, vol. 49, no. 10, pp. 561–578, 2014.
 [52] M. Hauswirth, P. F. Sweeney, A. Diwan, and M. Hind, “Vertical profiling: understanding the behavior of objectpriented applications,” ACM Sigplan Notices, vol. 39, no. 10, pp. 251–269, 2004.
 [53] M. Hauswirth, A. Diwan, P. F. Sweeney, and M. C. Mozer, “Automating vertical profiling,” in ACM SIGPLAN Notices, vol. 40, no. 10. ACM, 2005, pp. 281–296.
 [54] P. F. Sweeney, M. Hauswirth, B. Cahoon, P. Cheng, A. Diwan, D. Grove, and M. Hind, “Using hardware performance monitors to understand the behavior of java applications.” in Virtual Machine Research and Technology Symposium, 2004, pp. 57–72.
 [55] A. Adamoli and M. Hauswirth, “Trevis: A context tree visualization and analysis framework and its use for classifying performance failure reports,” in Proceedings of the 5th international symposium on Software visualization. ACM, 2010, pp. 73–82.
 [56] D. J. Berndt and J. Clifford, “Using dynamic time warping to find patterns in time series.” in KDD workshop, vol. 10, no. 16. Seattle, WA, 1994, pp. 359–370.
Comments
There are no comments yet.