3.4. An Anagram Detection Example — Problem Solving with Algorithms and Data Structures (2024)

A good example problem for showing algorithms with different orders ofmagnitude is the classic anagram detection problem for strings. Onestring is an anagram of another if the second is simply a rearrangementof the first. For example, 'heart' and 'earth' are anagrams. Thestrings 'python' and 'typhon' are anagrams as well. For the sakeof simplicity, we will assume that the two strings in question are ofequal length and that they are made up of symbols from the set of 26lowercase alphabetic characters. Our goal is to write a boolean functionthat will take two strings and return whether they are anagrams.

3.4.1. Solution 1: Checking Off

Our first solution to the anagram problem will check the lengths of thestrings and then to see that each character in the first string actuallyoccurs in the second. If it is possible to “checkoff” each character, thenthe two strings must be anagrams. Checking off a character will beaccomplished by replacing it with the special Python value None.However, since strings in Python are immutable, the first step in theprocess will be to convert the second string to a list. Each characterfrom the first string can be checked against the characters in the listand if found, checked off by replacement. ActiveCode 1 shows this function.

To analyze this algorithm, we need to note that each of the ncharacters in s1 will cause an iteration through up to ncharacters in the list from s2. Each of the n positions in thelist will be visited once to match a character from s1. The numberof visits then becomes the sum of the integers from 1 to n. We statedearlier that this can be written as

\[\begin{split}\sum_{i=1}^{n} i &= \frac {n(n+1)}{2} \\ &= \frac {1}{2}n^{2} + \frac {1}{2}n\end{split}\]

As \(n\) gets large, the \(n^{2}\) term will dominate the\(n\) term and the \(\frac {1}{2}\) can be ignored.Therefore, this solution is \(O(n^{2})\).

3.4.2. Solution 2: Sort and Compare

Another solution to the anagram problem will make use of the fact thateven though s1 and s2 are different, they are anagrams only ifthey consist of exactly the same characters. So, if we begin by sortingeach string alphabetically, from a to z, we will end up with the samestring if the original two strings are anagrams. ActiveCode 2 showsthis solution. Again, in Python we can use the built-in sort methodon lists by simply converting each string to a list at the start.

At first glance you may be tempted to think that this algorithm is\(O(n)\), since there is one simple iteration to compare the ncharacters after the sorting process. However, the two calls to thePython sort method are not without their own cost. As we will see ina later chapter, sorting is typically either \(O(n^{2})\) or\(O(n\log n)\), so the sorting operations dominate the iteration.In the end, this algorithm will have the same order of magnitude as thatof the sorting process.

3.4.3. Solution 3: Brute Force

A brute force technique for solving a problem typically tries toexhaust all possibilities. For the anagram detection problem, we cansimply generate a list of all possible strings using the characters froms1 and then see if s2 occurs. However, there is a difficultywith this approach. When generating all possible strings from s1,there are n possible first characters, \(n-1\) possiblecharacters for the second position, \(n-2\) for the third, and soon. The total number of candidate strings is\(n*(n-1)*(n-2)*...*3*2*1\), which is \(n!\). Although someof the strings may be duplicates, the program cannot know this ahead oftime and so it will still generate \(n!\) different strings.

It turns out that \(n!\) grows even faster than \(2^{n}\) asn gets large. In fact, if s1 were 20 characters long, there wouldbe \(20!=2,432,902,008,176,640,000\) possible candidate strings.If we processed one possibility every second, it would still take us77,146,816,596 years to go through the entire list. This is probably notgoing to be a good solution.

3.4.4. Solution 4: Count and Compare

Our final solution to the anagram problem takes advantage of the factthat any two anagrams will have the same number of a’s, the same numberof b’s, the same number of c’s, and so on. In order to decide whethertwo strings are anagrams, we will first count the number of times eachcharacter occurs. Since there are 26 possible characters, we can use alist of 26 counters, one for each possible character. Each time we see aparticular character, we will increment the counter at that position. Inthe end, if the two lists of counters are identical, the strings must beanagrams. ActiveCode 3 shows this solution.

Again, the solution has a number of iterations. However, unlike thefirst solution, none of them are nested. The first two iterations usedto count the characters are both based on n. The third iteration,comparing the two lists of counts, always takes 26 steps since there are26 possible characters in the strings. Adding it all up gives us\(T(n)=2n+26\) steps. That is \(O(n)\). We have found alinear order of magnitude algorithm for solving this problem.

Before leaving this example, we need to say something about spacerequirements. Although the last solution was able to run in linear time,it could only do so by using additional storage to keep the two lists ofcharacter counts. In other words, this algorithm sacrificed space inorder to gain time.

This is a common occurrence. On many occasions you will need to makedecisions between time and space trade-offs. In this case, the amount ofextra space is not significant. However, if the underlying alphabet hadmillions of characters, there would be more concern. As a computerscientist, when given a choice of algorithms, it will be up to you todetermine the best use of computing resources given a particularproblem.

Self Check

    Q-4: Given the following code fragment, what is its Big-O running time?

    test = 0for i in range(n): for j in range(n): test = test + i * j
  • O(n)
  • In an example like this you want to count the nested loops. especially the loops that are dependent on the same variable, in this case, n.
  • O(n^2)
  • A singly nested loop like this is O(n^2)
  • O(log n)
  • log n typically is indicated when the problem is iteratvely made smaller
  • O(n^3)
  • In an example like this you want to count the nested loops. especially the loops that are dependent on the same variable, in this case, n.

    Q-5: Given the following code fragment what is its Big-O running time?

    test = 0for i in range(n): test = test + 1for j in range(n): test = test - 1
  • O(n)
  • Even though there are two loops they are not nested. You might think of this as O(2n) but we can ignore the constant 2.
  • O(n^2)
  • Be careful, in counting loops you want to make sure the loops are nested.
  • O(log n)
  • log n typically is indicated when the problem is iteratvely made smaller
  • O(n^3)
  • Be careful, in counting loops you want to make sure the loops are nested.

    Q-6: Given the following code fragment what is its Big-O running time?

    i = nwhile i > 0: k = 2 + 2 i = i // 2
  • O(n)
  • Look carefully at the loop variable i. Notice that the value of i is cut in half each time through the loop. This is a big hint that the performance is better than O(n)
  • O(n^2)
  • Check again, is this a nested loop?
  • O(log n)
  • The value of i is cut in half each time through the loop so it will only take log n iterations.
  • O(n^3)
  • Check again, is this a nested loop?

You have attempted of activities on this page

3.4. An Anagram Detection Example — Problem Solving with Algorithms and Data Structures (2024)
Top Articles
Latest Posts
Article information

Author: Aron Pacocha

Last Updated:

Views: 6321

Rating: 4.8 / 5 (48 voted)

Reviews: 95% of readers found this page helpful

Author information

Name: Aron Pacocha

Birthday: 1999-08-12

Address: 3808 Moen Corner, Gorczanyport, FL 67364-2074

Phone: +393457723392

Job: Retail Consultant

Hobby: Jewelry making, Cooking, Gaming, Reading, Juggling, Cabaret, Origami

Introduction: My name is Aron Pacocha, I am a happy, tasty, innocent, proud, talented, courageous, magnificent person who loves writing and wants to share my knowledge and understanding with you.