There are only two widelyused searches for randomly accessible data: linear search and binary search. I developed and created a new search and tested it against binary search, linear search, and variations of them.
There are only two widelyused searches for randomly accessible data: linear search and binary search. I developed and created a new search and tested it against binary search, linear search, and variations of them. I hypothesized that my new search, which estimates the location of a value before placing probes to search for it, would be the most effective. I coded all searches in Java, tested for accuracy, and finally tested for efficiency. I tested the searches on consistent and random arrays from 100 elements to 100,000 elements. The data supported my hypothesis, as my search proved much more efficient in terms of the number of probes used than the others when tested on both consistent and random arrays of varying sizes. Furthermore, as the number of elements increased, the number of probes used in the search I developed stayed about constant, while binary and linear searches increased exponentially. The results are significant because it is possible that searching randomlyaccessible data in the future could be much faster.
Simple searching algorithms can be used on randomly accessible data, such as data stored in an array, to quickly locate a desired value (Cook, 2005, p. 340). The two methods that are typically used for searching are binary and linear, or sequential, search (Cook, 2005, p. 241). Linear search operates with an efficiency of O(n), and binary search with a higher efficiency of O(log(n)) (See Appendix A for more information on “Big O notation”). Binary search works by cutting the dataset in half until the desired value is found. For example, when asked to find a number between 1 and 100, a guesser would guess fifty, and then be told if the number being searched for is higher or lower than that, and the guesser would guess again. This continues until the desired value is found. Linear search works by checking every value from the beginning of the list to the end until the value is found. When asked to find a certain number between 1 and 100, the guesser would guess 1, then 2, then 3, then 4, etc, until the value is found. Linear search can be very slow when used on large data sets, which is why binary search paired with a search algorithm would typically be used. Binary search is very effective at quickly locating values, but could still take a lot of time when used on big data. Although binary search is sufficiently fast for retrieving data, I wanted to make a search that is more efficient. In school, I had learned about many different algorithms used for sorting, but only of two for searching: linear search and binary Search. This lead me to wonder if a more efficient search could be created. The reasoning I used to reach my hypothesis was that humans would not use linear search or binary search when searching in an ordered list. For example, if someone was looking for the word “yak” in a dictionary, they certainly would not read each word one at a time starting at the beginning of the dictionary (how linear search works), and they would not flip to a page in the middle of the dictionary either (such as how binary search works). Instead, they would flip to a page perhaps nine tenths of the way through the dictionary, because they know that “yak” will be towards the end. I used the idea of estimating before probing to create a new way of searching ordered, randomlyaccessible memory. For my experiment (testing my searches against the commonly used binary and linear searches), the searches used and the data sets they were used on were the independent variables, and the number of probes used to find a given value was the dependent variable. The purpose of this experiment was to determine which search is most effective. My hypothesis was that my search, “Estimate,” would be the most effective.
My experimental design was to take the average number of probes for 1000 runs of each search used on each dataset. The six searches tested were linear search, binary search, random to binary search (the first probe is randomly placed and all subsequent probes use binary search), random search (all probes are placed randomly), estimate to binary (first probe is placed by estimating the location, following probes are placed using binary search), and estimate (location is estimated before every probe). Each search was tested on eight data sets, four “consistent” data sets in which values matched element numbers, and four randomly generated data sets. Within the two types of arrays, there were arrays with one hundred, one thousand, ten thousand, and one hundred thousand elements. The range of numbers for each data set is 1 to the number of elements. For example, for each array with 100 elements, the range is 1 to 100. For each random data set, there were eight values being searched for: the smallest possible value, the highest possible value, a value estimated to be 5, 25, 50, 75, and 95 percent of the way into the array, and 0, which is not included in any of the arrays. For the consistent data sets, there were twelve values being searched for: the lowest possible value, the highest possible value, a value estimated to be 5, 25, 30, 40, 50, 60, 70, 75, and 95 percent of the way into the array, and 0, which is not included in the array. It is important to check for values that are not present in the array because in real life applications it is very possible that a desired value would not exist, and when that is the case, the algorithm needs to be able to quickly identify that the given value is not present. See Appendix B for tables of the datasets and search values. The consistent array datasets were tested on more values than the random arrays because if they were tested primarily on values that fall in intervals that are multiples of ½, ¼, and ⅛ of the way through the data, binary search’s efficiency would be inflated, because it is most efficient at those intervals. This was not necessary for the random arrays because a value estimated to be 50% into the data will typically not fall there, negating any possible advantage. All code was written in Java, and compiled using BlueJ version 3.0.8.
Results
The data for linear search is shown here on a separate graph because its range is much higher than that of the other searches.
In Average Probes (Random), the line for Binary is almost identical to the line for RanBin, and as such is not visible.
See Appendix C for tables of raw data.
The results of my experiment show that my search, Estimate, was the most efficient search in terms of the number of probes it used, for both consistent and random data. My hypothesis was supported by my data; the search that estimates a value’s location before probing is the most efficient.
It is important to note that Estimate was the most efficient in terms of the number of probes it required, not necessarily in the amount of time it took to find the value. For example, binary search used three lines of code to place a new probe, and Estimate used nine. This certainly does not translate directly into amount of time taken per probe, but must be considered.
If applied to realworld datasets, Estimate could increase search efficiency, but it is unclear by how much. It is very possible that, even though Estimate used only a fraction of the number of probes that binary search did, it would only increase efficiency by a few nanoseconds.
There are many ways that this study of search efficiency could be furthered: efficiency could be tested strictly on time, the searches could be tested on more varied data sets, on much larger datasets, and on real data, and Estimate search could be improved to place probes more accurately and efficiently.
The purpose of my study was to determine which search used the fewest probes to find a desired value. My hypothesis was strongly supported by my data; Estimate was the most efficient search.
I am grateful to have had Dr. David Brown as a teacher, as he allowed me to realize my passion for computer science, and taught me just about all that I know about it. Mr. Carr taught me the importance of experimental design, and how to properly conduct research. Thanks to Mr. Kramer for mentoring me and for continuing to extend my computer science knowledge.
Appendix A
Big O Notation
The “Big O notation” of a computational function describes how efficient it is in terms of how many operations are needed to reach the goal (sorting a list, finding a certain value, inserting a node, deleting a node, etc.).
(Rowell, E, n.d)
Appendix B
Table of all datasets
100 elements 
1,000 elements 
10,000 elements 
100,000 elements 

Consistent 
A 
B 
C 
D 
Random 
E 
F 
G 
H 
Table 1.
Table of All Datasets with Search Values
(Letters correspond to the datasets established in table 1)
Dataset 
0% 
5% 
25% 
30% 
40% 
50% 
60% 
70% 
75% 
95% 
100% 
0 
A 
A1 
A2 
A3 
A4 
A5 
A6 
A7 
A8 
A9 
A10 
A11 
A12 
B 
B1 
B2 
B3 
B4 
B5 
B6 
B7 
B8 
B9 
B10 
B11 
B12 
C 
C1 
C2 
C3 
C4 
C5 
C6 
C7 
C8 
C9 
C10 
C11 
C12 
D 
D1 
D2 
D3 
D4 
D5 
D6 
D7 
D8 
D9 
D10 
D11 
D12 
Table 2
Dataset 
0% 
5% 
25% 
50% 
75% 
95% 
100% 
0 
E 
A1 
A2 
A3 
A4 
A5 
A6 
A7 
A8 
F 
B1 
B2 
B3 
B4 
B5 
B6 
B7 
B8 
G 
C1 
C2 
C3 
C4 
C5 
C6 
C7 
C8 
H 
D1 
D2 
D3 
D4 
D5 
D6 
D7 
D8 
Table 3
Each letter paired with a number represents one condition that must be tested by each program.
Appendix C
Raw Data
Searched for value 

Search used 

Number of elements in array 

Average of row 
Key for raw data below
For all data from random arrays, each data point represents the average of 1000 runs using 1000 different random arrays. For each search value, every search used the same 1000 random arrays, but the 1000 arrays were different between search values. For RanBin and Random, there were 1000 runs on each of the 1000 arrays, and the average of all one million runs is shown.
100 
1 
5 
25 
50 
75 
95 
100 
0 
100 
Linear 
36.2 
40.921 
49.885 
67.036 
84.905 
97.42 
101 
101 
72.29588 
Binary 
6.265 
6.232 
6.019 
6.003 
6.195 
6.354 
8 
7 
6.5085 
RanBin (1000 avg) 
5.887728 
5.995563 
6.329254 
6.524723 
6.486642 
6.038911 
7.720701 
6.728629 
6.464019 
Random(1000 avg) 
4.999793 
6.527926 
7.798838 
8.093585 
7.938542 
6.510032 
7.185795 
5.711049 
6.845695 
EstBin 
1 
4.928 
5.828 
5.914 
5.781 
3.947 
1 
1 
3.67475 
Est 
1 
2.221 
2.897 
2.89 
2.888 
2.226 
1 
1 
2.01525 
1000: 11000 
1 
50 
250 
500 
750 
950 
1000 
0 
1,000 
Linear 
370 
403.779 
527.347 
685.394 
837.754 
968.305 
1001 
1001 
724.3224 
Binary 
9.3 
9.431 
9.282 
9.307 
9.262 
9.411 
11 
10 
9.624125 
RanBin (1000 avg) 
9.172718 
9.300086 
9.758573 
9.977423 
9.752822 
9.225938 
10.975175 
9.975595 
9.767291 
Random(1000 avg) 
7.30911 
11.130077 
12.446807 
12.759983 
12.439991 
11.045309 
9.485426 
8.051348 
10.58351 
EstBin 
1 
8.224 
9.127 
9.268 
9.096 
7.877 
1 
1 
5.824 
Est 
1 
3.254 
3.527 
3.604 
3.506 
3.169 
1 
1 
2.5075 
10,000 
1 
500 
2500 
5000 
7500 
9500 
10000 
0 
10,000 
Linear 
3691 
4005.238 
5177.492 
6866.562 
8458.405 
9702.34 
10001 
10001 
7237.88 
Binary 
13.095 
12.752 
12.783 
12.871 
12.746 
12.795 
15 
14 
13.25525 
RanBin (1000 avg) 
12.538485 
12.608171 
13.100695 
13.364917 
13.171166 
12.683484 
14.364464 
13.360973 
13.14904 
Random(1000 avg) 
9.611969 
15.723728 
17.058388 
17.423006 
17.131926 
15.793367 
11.788359 
10.358783 
14.36119 
EstBin 
1 
11.389 
12.528 
12.784 
12.531 
11.409 
1 
1 
7.955125 
Est 
1 
3.699 
4.054 
4.075 
3.946 
3.722 
1 
1 
2.812 
100,000 
1 
5000 
25000 
50000 
75000 
95000 
100000 
0 
100,000 
Linear 
36701 
40434.275 
53272.408 
68094.365 
84178.807 
96945.439 
100001 
100001 
72453.54 
Binary 
16.282 
16.14 
16.143 
16.124 
16.02 
16.166 
18 
17 
16.48438 
RanBin (1,000 avg) 
15.857128 
15.908077 
16.491563 
16.624851 
16.448232 
15.964562 
17.690557 
16.689483 
16.45931 
Random(1,000 avg) 
11.906298 
20.293071 
21.721988 
21.967444 
21.677897 
20.359816 
14.090079 
12.664054 
18.08508 
EstBin 
1 
14.787 
15.952 
16.08 
15.825 
14.889 
1 
1 
10.06663 
Est 
1 
4.1 
4.347 
4.306 
4.249 
4.163 
1 
1 
3.020625 
Consistent
For Linear, Binary, EstBin, and Est, only 1 run was needed, because the array is always the same, and the program would always yield the same result. For RanBin and Random, the average of 1000 runs was taken.
100 
1 
5 
25 
30 
40 
50 
60 
70 
75 
95 
100 
0 
100 
Linear 
1 
5 
25 
30 
40 
50 
60 
70 
75 
95 
100 
101 
54.33333333 
Binary 
6 
7 
2 
7 
5 
1 
6 
7 
2 
6 
7 
7 
5.25 
RanBin (1000 avg) 
5.709 
5.515 
6.25 
5.906 
6.192 
6.333 
6.074 
6.167 
6.289 
5.533 
6.712 
6.666 
6.112166667 
Random(1000 avg) 
4.844 
6.107 
7.553 
7.59 
7.681 
7.84 
7.62 
7.627 
7.579 
6.559 
6.254 
5.746 
6.916666667 
EstBin 
1 
1 
1 
1 
1 
1 
6 
1 
1 
1 
1 
1 
1.416666667 
Est 
1 
1 
1 
1 
1 
1 
2 
1 
1 
1 
1 
1 
1.083333333 
1000 
1 
50 
250 
300 
400 
500 
600 
700 
750 
950 
1000 
0 
1000 
Linear 
1 
50 
250 
300 
400 
500 
600 
700 
750 
950 
1000 
1001 
541.8333333 
Binary 
9 
8 
2 
8 
9 
1 
10 
9 
2 
10 
10 
10 
7.333333333 
RanBin (1000 avg) 
8.962 
8.936 
9.428 
9.445 
9.64 
9.524 
9.611 
9.461 
9.378 
8.77 
9.996 
9.997 
9.429 
Random(1000 avg) 
7.034 
10.66 
12.188 
12.163 
12.213 
12.315 
12.246 
12.237 
12.407 
10.793 
8.433 
8.082 
10.89758333 
EstBin 
1 
1 
1 
1 
1 
1 
1 
1 
1 
1 
1 
1 
1 
Est 
1 
1 
1 
1 
1 
1 
1 
1 
1 
1 
1 
1 
1 
10,000 
1 
500 
2500 
3000 
4000 
5000 
6000 
7000 
7500 
9500 
10000 
0 
10,000 
Linear 
1 
500 
2500 
3000 
4000 
5000 
6000 
7000 
7500 
9500 
10000 
10001 
5416.833333 
Binary 
13 
13 
2 
13 
12 
1 
11 
14 
2 
14 
14 
14 
10.25 
RanBin (1000 avg) 
12.377 
12.196 
12.652 
12.701 
12.936 
12.962 
12.862 
12.822 
12.6 
12.177 
13.286 
13.36 
12.74425 
Random(1000 avg) 
9.444 
15.094 
16.737 
16.765 
16.869 
16.96 
17.012 
16.951 
16.656 
15.351 
10.714 
10.474 
14.91891667 
EstBin 
1 
14 
1 
1 
1 
1 
1 
1 
1 
1 
1 
1 
2.083333333 
Est 
1 
2 
1 
1 
1 
1 
1 
1 
1 
1 
1 
1 
1.083333333 
100,000 
1 
5000 
25000 
30000 
40000 
50000 
60000 
70000 
75000 
95000 
100000 
0 
100,000 
Linear 
1 
5000 
25000 
30000 
40000 
50000 
60000 
70000 
75000 
95000 
100000 
100001 
54166.83333 
Binary 
16 
15 
2 
15 
17 
1 
16 
19 
2 
16 
17 
17 
12.75 
RanBin (1000 avg) 
15.754 
15.595 
16.042 
16.075 
16.135 
16.218 
16.254 
18.901 
16.015 
15.543 
16.679 
16.756 
16.33058333 
Random(1000 avg) 
11.721 
19.904 
21.504 
21.444 
21.72 
21.871 
21.88 
24.922 
21.217 
20.043 
13.041 
12.676 
19.32858333 
EstBin 
1 
1 
1 
1 
1 
1 
1 
1 
1 
1 
1 
1 
1 
Est 
1 
1 
1 
1 
1 
1 
1 
1 
1 
1 
1 
1 
1 
You must log in to like an article
There are only two widelyused searches for randomly accessible data: linear search and binary search. I developed and created a new search and tested it against binary search, linear search, and variations of them.