11-1 Goodness-of-Fit

Cybersecurity

Benford’s law is also being used to detect cyberattacks on computer systems. It does this by analyzing Internet traffic in real time, based on the time between the arrivals of consecutive data packets, or “interarrival time.” The basic idea is to detect anomalies in Internet traffic flow by analyzing the leading digits of the interarrival times. We then determine whether the distribution of those leading digits departs significantly from the distribution given by Benford’s law. Normal Internet traffic follows Benford’s law, so a significant departure from Benford’s law may indicate a cyberattack. (See “Benford’s Law Behavior of Internet Traffic,” by Arshadi and Jahangir, Journal of Network and Computer Applications, Vol. 40, 2014.) This approach has several major advantages. It is relatively simple, it doesn’t require difficult computations, it can be done in real time, and hackers would not be able to configure their malware to avoid detection.

EXAMPLE 2 Benford’s Law: Detecting Computer Intrusions

The bottom row of Table 11-4 lists a sample of 271 leading digits of interarrival times of Internet traffic flow. Do these 271 leading digits appear to provide a good fit with the distribution indicated by Benford’s law (as in the top two rows of Table 11-4)? What does the result suggest about a potential cyberattack?

TABLE 11-4 Leading Digits of Internet Traffic Interarrival Times

Leading Digit                                       1      2      3      4      5      6      7      8      9
Benford’s Law: Distribution of Leading Digits   30.1%  17.6%  12.5%   9.7%   7.9%   6.7%   5.8%   5.1%   4.6%
Sample 2 of Leading Digits                         69     40     42     26     25     16     16     17     20

SOLUTION

REQUIREMENT CHECK (1) The sample data are randomly selected from a larger population. (2) The sample data do consist of frequency counts. (3) Each expected frequency is at least 5. There are 271 leading digits and the lowest expected percentage is 4.6%, so the lowest expected frequency is $271 \times 0.046 = 12.466$. All of the requirements are satisfied.

Step 1: The original claim is that the leading digits fit the distribution given by Benford’s law. Using subscripts corresponding to the leading digits, we can express this claim as $p_1 = 0.301$ and $p_2 = 0.176$ and $p_3 = 0.125$ and . . . and $p_9 = 0.046$.

Step 2: If the original claim is false, then at least one of the proportions does not have the claimed value.

Step 3: The null hypothesis must contain the condition of equality, so we have

$H_0$: $p_1 = 0.301$ and $p_2 = 0.176$ and $p_3 = 0.125$ and . . . and $p_9 = 0.046$
$H_1$: At least one of the proportions is not equal to the given claimed value.

Step 4: The significance level is not specified, so we use the common choice of $\alpha = 0.05$.

Step 5: We are testing a claim that the distribution of leading digits fits the distribution given by Benford’s law. We therefore use the goodness-of-fit test described in this section. The $\chi^2$ distribution is used with the test statistic given in the preceding Key Elements box.
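The requirement check and the goodness-of-fit statistic $\chi^2 = \sum (O - E)^2 / E$ can both be computed directly from the counts and Benford percentages in Table 11-4. The Python sketch below is a minimal illustration of that arithmetic, not the textbook's own procedure. The helper names `expected_counts`, `chi_square_statistic`, and `chi_square_sf_even_df` are hypothetical and were chosen for this sketch. To stay within the standard library, the P-value comes from the closed-form chi-square tail area $e^{-x/2}\sum_{i=0}^{df/2-1} (x/2)^i / i!$. That formula holds only for an even number of degrees of freedom, as here with $k - 1 = 8$. For odd df you would need a statistics library or a $\chi^2$ table instead.

```python
import math

# Benford's law proportions for leading digits 1 through 9 (Table 11-4)
BENFORD = [0.301, 0.176, 0.125, 0.097, 0.079, 0.067, 0.058, 0.051, 0.046]
# Observed counts of leading digits 1 through 9 from the sample (Table 11-4)
OBSERVED = [69, 40, 42, 26, 25, 16, 16, 17, 20]


def expected_counts(observed, proportions):
    """Expected frequency E = n * p for each category, with n the sample size."""
    n = sum(observed)
    return [n * p for p in proportions]


def chi_square_statistic(observed, expected):
    """Goodness-of-fit test statistic: the sum of (O - E)^2 / E over all categories."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def chi_square_sf_even_df(x, df):
    """Right-tail area P(X > x) for a chi-square variable with EVEN degrees of freedom.

    Uses the closed form exp(-x/2) * sum_{i=0}^{df/2 - 1} (x/2)^i / i!,
    which does not hold for odd df (hypothetical helper for this sketch).
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form needs a positive, even number of degrees of freedom")
    half = x / 2
    return math.exp(-half) * sum(half ** i / math.factorial(i) for i in range(df // 2))


expected = expected_counts(OBSERVED, BENFORD)

# Requirement check: every expected frequency must be at least 5
if min(expected) < 5:
    raise ValueError("an expected frequency is below 5; the chi-square test is not appropriate")

stat = chi_square_statistic(OBSERVED, expected)
df = len(OBSERVED) - 1  # k - 1, with k = 9 leading-digit categories
p_value = chi_square_sf_even_df(stat, df)
alpha = 0.05  # significance level chosen in Step 4

print(f"n = {sum(OBSERVED)}, lowest expected frequency = {min(expected):.3f}")
print(f"chi-square = {stat:.3f}, df = {df}, P-value = {p_value:.3f}")
print("reject H0" if p_value <= alpha else "fail to reject H0")
```

For the Table 11-4 counts, the sketch should report a lowest expected frequency of 12.466, a test statistic of about 11.279 with 8 degrees of freedom, and a P-value of about 0.186. The same decision can be cross-checked against the $\chi^2$ critical value for 8 degrees of freedom and $\alpha = 0.05$, which is 15.507.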