Code Coverage is a technique for obtaining information about which parts of a binary's internal code are executed while it runs. In Fuzz Testing, we can receive this feedback from the Target on each execution, allowing us to better evaluate the quality of our test cases and determine which are the most interesting.
Normally, this technique is used in mutation-based Fuzz Testing that executes test cases at high speed.
This technique can be tuned to different granularities. We can receive feedback for each executed function (in Clang, via -fsanitize-coverage=func), allowing us to know which functions of the binary were executed by our test case. One of the most common methods, and the one we will use in these tests, is to receive feedback for each transition from one Basic Block to another, called an edge. This can be achieved by compiling the Target with -fsanitize-coverage=trace-pc-guard.
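As a reference for how trace-pc-guard works: Clang inserts a call on every edge to callbacks that we implement ourselves, as described in the SanitizerCoverage documentation. A minimal sketch follows; the coverage_map and its size are our assumption about what a Fuzzer would do with the signal, not part of the original post:

```cpp
#include <cstdint>

static constexpr uint32_t kMapSize = 1 << 16;  // hypothetical map size
static uint8_t coverage_map[kMapSize];         // hypothetical coverage map

// Called once per instrumented module at startup. Clang hands us the range
// of guard slots so we can give every edge a unique, non-zero index.
extern "C" void __sanitizer_cov_trace_pc_guard_init(uint32_t *start,
                                                    uint32_t *stop) {
  static uint32_t next_index = 0;
  if (start == stop || *start) return;  // empty range or already initialized
  for (uint32_t *guard = start; guard < stop; guard++)
    *guard = ++next_index;
}

// Called on every edge transition; *guard holds the index assigned above.
extern "C" void __sanitizer_cov_trace_pc_guard(uint32_t *guard) {
  if (!*guard) return;                  // index 0 means "do not report"
  coverage_map[*guard % kMapSize] = 1;  // 1-byte write per edge
}
```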
Code Coverage introduces a runtime performance penalty, so we need to plan ahead and decide whether we really need it.
To perform the tests, we used the V8 JavaScript engine, adding the flag sanitizer_coverage_flags="trace-pc-guard" to the args.gn file. We will use the “Custom” technique described in the previous post.
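For reference, the relevant fragment of args.gn would look something like this; only sanitizer_coverage_flags comes from our setup, while the other argument is a typical companion and merely an assumption:

```
# args.gn: enable SanitizerCoverage edge instrumentation for the V8 build.
is_debug = false                             # assumption: a release build
sanitizer_coverage_flags = "trace-pc-guard"
```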
The JavaScript test case used in all tests was the following:
We have performed metrics on our server and obtained the following results:
With Code Coverage disabled, we obtained a total of 17,670 executions every 300 seconds, while with Code Coverage enabled we obtained 9,760 executions in the same timeframe. As can be seen, Code Coverage reduces execution speed by roughly 45% ((17,670 - 9,760) / 17,670 ≈ 0.45).
These tests were performed with a 1-byte write to memory for each reported edge, and we iterated over a memory map on each execution to make the simulation more realistic.
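A sketch of that per-execution map scan (the function and parameter names are ours, not the post's): read which edges fired, then clear the map for the next run:

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical scan over the coverage map after each execution.
size_t CountAndResetEdges(uint8_t *coverage_map, size_t map_size) {
  size_t total_edges = 0;
  for (size_t i = 0; i < map_size; i++) {
    if (coverage_map[i]) {
      total_edges++;        // edge i was hit during this execution
      coverage_map[i] = 0;  // reset so the next test case starts clean
    }
  }
  return total_edges;
}
```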
Issues and Solutions
As mentioned, the first problem is the loss of performance on each execution. To reduce it, on one hand we should disable Code Coverage for the parts of the Target code that do not interest us and/or that are repeated on every execution; on the other, we should reduce and optimize the operations the Fuzzer performs on the reported Code Coverage.
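With Clang, one way to exclude uninteresting code from instrumentation (an option consistent with the approach above, though not benchmarked here; the patterns are hypothetical) is a SanitizerCoverage ignorelist passed via -fsanitize-coverage-ignorelist:

```
# coverage_ignorelist.txt (hypothetical patterns)
# Skip third-party code and logging helpers when instrumenting.
src:*/third_party/*
fun:*Logger*
```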
The second problem concerns the stability of the Code Coverage report. When Fuzz Testing a target as large as V8, whose latest version contains 12,244,947 lines of code (according to the cloc command), it is very common for the Code Coverage report to vary when the same test case is run several times. This is because V8 spawns threads and introduces random variables that generate “noise” (edges that are random or independent of the executed test case). Such instability can affect the quality of our Fuzzer, which will make decisions based on erroneous reports.
As a test, after running our test case 8 times, we obtained a variable number of total edges reported in each execution:
- Exec 1: Total Unique Edges reported: 5782
- Exec 2: Total Unique Edges reported: 5761
- Exec 3: Total Unique Edges reported: 5430
- Exec 4: Total Unique Edges reported: 5599
- Exec 5: Total Unique Edges reported: 5599
- Exec 6: Total Unique Edges reported: 5803
- Exec 7: Total Unique Edges reported: 5472
- Exec 8: Total Unique Edges reported: 5740
In these executions, we are collecting the Code Coverage reported by the Main Thread. As can be seen, each execution reports a different total number of edges.
Improving Stability
To reduce this noise, the first technique is to activate the Code Coverage report at the right moment. For example, in our Executor we could add a variable (is_code_coverage_report) that enables and disables the reporting mechanism.
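A minimal sketch of that guard, assuming the trace-pc-guard callback from earlier (only the name is_code_coverage_report comes from the post; run_testcase and coverage_map are hypothetical):

```cpp
#include <atomic>
#include <cstdint>

static std::atomic<bool> is_code_coverage_report{false};
static uint8_t *coverage_map;  // hypothetical shared coverage map

extern "C" void __sanitizer_cov_trace_pc_guard(uint32_t *guard) {
  // Ignore edges (engine startup, GC, shutdown) while reporting is off.
  if (!is_code_coverage_report.load(std::memory_order_relaxed)) return;
  coverage_map[*guard] = 1;
}

// In the Executor, enable reporting only around the test case itself:
//   is_code_coverage_report = true;
//   run_testcase(js);   // hypothetical helper
//   is_code_coverage_report = false;
```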
The second technique is to follow only the Code Coverage of the main thread. To do this, we can check that the thread reporting Code Coverage is the desired one, as in the sketch below.
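A sketch under the same assumptions (coverage_map is again hypothetical; the thread check itself is standard pthreads):

```cpp
#include <pthread.h>
#include <cstdint>

// Recorded once at startup, e.g. in main(): main_thread = pthread_self();
static pthread_t main_thread;
static uint8_t *coverage_map;  // hypothetical shared coverage map

extern "C" void __sanitizer_cov_trace_pc_guard(uint32_t *guard) {
  // Drop edges reported by worker threads; only the thread executing the
  // test case is of interest.
  if (!pthread_equal(pthread_self(), main_thread)) return;
  coverage_map[*guard] = 1;
}
```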
The third option we can explore is creating a blacklist containing all noise-type edges, which prevents us from counting these edges as false positives. One implementation: once we detect that a test case has generated new Code Coverage, we execute it N times, collecting the Code Coverage of each execution. At the end, we keep the edges that were reported in all N executions, considering these valid, and add the rest to the blacklist as noise.
By doing this repeatedly, we will see how the noise decreases, and we will know precisely when a test case produces new Code Coverage and when it does not.
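A sketch of that N-run voting step (the function and container names are ours): edges present in every run are stable; everything else goes to the blacklist:

```cpp
#include <cstdint>
#include <unordered_set>
#include <vector>

// Given the unique edges of each of the N runs, return the noisy ones:
// edges seen in at least one run but not in all of them.
std::unordered_set<uint32_t> FindNoisyEdges(
    const std::vector<std::unordered_set<uint32_t>> &runs) {
  std::unordered_set<uint32_t> noise;
  for (const auto &run : runs) {
    for (uint32_t edge : run) {
      for (const auto &other : runs) {
        if (!other.count(edge)) {  // missing from some run -> unstable
          noise.insert(edge);
          break;
        }
      }
    }
  }
  return noise;
}
```

Everything returned by FindNoisyEdges is added to the blacklist and ignored from then on when deciding whether a test case produced new Code Coverage.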
Conclusion
If we choose to use Code Coverage in our Fuzz Testing, we must consider the performance impact it may cause. We must also measure noise to detect whether the Target generates false positives that could reduce the quality of the test cases generated by our Fuzzer. After this analysis, we must apply the appropriate measures to improve our Fuzz Testing.