Xbox LIVE Indie Games
Sort Discussions: Previous Discussion Next Discussion
Page 1 of 1 (19 posts)

Fast Letter Permutations

Last post 12/15/2009 5:08 PM by jwatte. 18 replies.
  • 12/11/2009 3:38 PM

    Fast Letter Permutations

    I'm working on a game (a collection of smaller word games) that frequently prompts the user for letters, the user creates a word, and then that word is verified.  All is working good.

    There's an AI opponent, and the AI has to do this too.  For the AI to complete its selection, it finds every permutation and sub permutation of the letters, determines which of those permutations is actually a word, sorts that list by size, and then selects a word from the list.  Which word is selected depends largely on the type of game (some of the games want longer words, others want a lot of short words).

    Right now there's a noticeable jerk when finding those permutations of 9 letters and then doing the evaluation.  The evaluation is very fast (there should never be more checks than there are letters in the word), but the permutation comes out with a huge list and can take upward of half a second.  Not good when this has to be run frequently. ^^

    I have a 'solution' in place which caps the time allotted for permutation - once the permutation algorithm goes over that time, it gets cut off.  This works OK and is going to be make it into the final code, but I'd still like to improve the performance of the permutation checking so that I don't hinder the AI.

    Here's an outline of the algorithm I'm using:
    Given: 9 letters, the current permutation (which starts as ""), a Dictionary of found permutations, and the start time:
      If the time has exceeded the limit, cut out
      if the current permutation is already in the permutation dictionary, cut out, otherwise add it to the dictionary
      If there are no letters in the array, cut out

      For each letter in the array
        Create a new array of letters that doesn't contain that letter
        Create a new permutation that is the current permutation plus the letter
        Recursively call the function with the new array and new permutation

    There's a lot of array recreation and string resizing, but I can't find a good way around that that still gives each branch of the recursion its own individual data.  I considered going to StringBuilder instead of string, but I still need to create a new instance of the string each time, so that probably wouldn't help much.

    Does anyone happen to know a better algorithm or a good way to avoid all the dynamic memory I'm throwing all over the place?

    Cheers,
    --Brian
  • 12/11/2009 4:01 PM In reply to

    Re: Fast Letter Permutations

    Assuming you're working in English, and your maximum search space is 9 letters at a time, you've basically got 26^9 possible word combinations. That's roughly 5.4*10^12, which is a little bit more than 2^42.  That will easily fit in an Int64 with 20 bytes to spare.  You can precalculate the integer value for every word in your dictionary ahead of time, and do all your combination work purely in integer space.  Integers are a lot friendlier to memory than are strings.

    *edit* - It's actually a bit more than 26^9 - I forgot to take words shorter than 9 letters into account.  Still, it'll easily fit into an Int64.
  • 12/11/2009 4:42 PM In reply to

    Re: Fast Letter Permutations

    If you have 9 letters and all dont have to be used youre getting almost half a million permutations in worst case (no same letters). So depending on the size of your list for "acceptable" words it might be faster to loop through that list instead to find which words can be formed with given letters. Doesnt help if its a large dictionary though.

    Or (i dont know if youre doing this already) you could evaluate the new permutation at the start of recursive function and cut out that recursion branch if theres no acceptable words starting with that sequence.
  • 12/11/2009 4:47 PM In reply to

    Re: Fast Letter Permutations

    Sowaz:

    Right now there's a noticeable jerk when finding those permutations of 9 letters and then doing the evaluation.  The evaluation is very fast (there should never be more checks than there are letters in the word), but the permutation comes out with a huge list and can take upward of half a second.  Not good when this has to be run frequently. ^^


    How often is "frequently"?  If it's just every few seconds you should be able to run the algorithm in a different thread so it's not interrupting your drawing.  This will eliminate the jerkiness when the algorithm is running.  Another option if you want to avoid threading is to split the work up across multiple frames.

  • 12/11/2009 5:00 PM In reply to

    Re: Fast Letter Permutations

    You could probably also see a decent improvement by multithreading your permutation algorithm. Send each base letter to a new thread for searching (don't forget to exclude any base duplicates too).
  • 12/11/2009 5:04 PM In reply to

    Re: Fast Letter Permutations

    Dave Carlile:
    How often is "frequently"?  If it's just every few seconds you should be able to run the algorithm in a different thread so it's not interrupting your drawing.  This will eliminate the jerkiness when the algorithm is running.  Another option if you want to avoid threading is to split the work up across multiple frames.


    If the jerk is caused by garbage collection, which seems likely if he's running millions of string permutations, then threading may not help as much, depending on the platform.  The Xbox360 doesn't do generational garbage collection, and all the threads stop every time your allocations add up to another megabyte and trigger collection.  Less of an issue on the PC, of course, but still, allocating millions of strings is never going to be good for performance, regardless of how you structure it.
  • 12/11/2009 5:06 PM In reply to

    Re: Fast Letter Permutations

    I'm guessing its a combinaion of the string/array allocations and GC. Add Shawns GC counting code from his blog and watch. I suspect you will see more than you want. I know inthe bible app I generate a lot of strings as I split them for the line wrapping and when the search happens and that is one of the main speed differences between windows and xbox.

    You may consider writing yourself a sort-of-string class that uses a fixed lenght char aray to avoid the allocations.

  • 12/11/2009 5:17 PM In reply to

    Re: Fast Letter Permutations

    The ZMan:
    You may consider writing yourself a sort-of-string class that uses a fixed lenght char aray to avoid the allocations.

    You mean something like StringBuilder?
  • 12/11/2009 5:35 PM In reply to

    Re: Fast Letter Permutations

    9 letters have 9! permutations, which is 362,880.  That's a sufficiently large number that no matter how carefully you micro-optimize your computations, enumerating them all is never exactly going to be fast.

    But I don't think you need to enumerate every permutation to solve this problem. Instead of:

        for every permutation:
            check if it is a word

    How about:

        for every word:
            check if it is a permutation

    The number of valid words in your dictionary is most likely significantly lower than the total number of permutations, plus there are many optimizations you can use to prune the word set if you have your dictionary organized in a suitable hierarchical structure:

    • Only bother examining words that are the right length
    • Only bother examining words that start with one of the letters in your set of 9
    • Could further prune into buckets based on the second and third letters of the word, etc.
  • 12/11/2009 5:47 PM In reply to

    Re: Fast Letter Permutations

    What Shawn explained was what I tried to express earlier on with poor success. This approach however kinda shoots itself in the knee when the words are small, say 3 or 4 letters (the original method becomes significantly faster). So maybe have both algorithms and pick depending on the amount of letters?
  • 12/11/2009 5:59 PM In reply to

    Re: Fast Letter Permutations

    There are several extra techniques you can use if you want to get cute.

    The first is to calculate a hash value for each word in your dictionary, use an algorithm that generates a unique hash for each word.

    Use the same algo to generate a hash for the suspect word and then run through your dictionary comparing values.

    Speeds up the search.

    The second exploits the nature of the language the word is written in.

    You can get frequency tables for letters and pairs of letters as used in crypto.

    Run through the suspect word and compute a floating point value which is an expression of the probability of it being a word.

    If the probability is high then you can do a full search.

    Save on time spent searching for words like xxzzzzzzzzzzzzzzzzzzz

    Or you can store your dictionary as a tree, a root node for each letter, followed by a child node for each letter beneath.

    SO ..


    S - O - <end>
            - M - E - <end>
       - H - I -P -<end>
                -T - <end>
            - O - T -<end>


    If you can get anything out of that you have "So some ship *** shot"

    Speeds up searching and reduces storage.
  • 12/11/2009 6:07 PM In reply to

    Re: Fast Letter Permutations

    Whole lot of stuff here to choose from.


    Assuming you're working in English, and your maximum search space is 9 letters at a time, you've basically got 26^9 possible word combinations. That's roughly 5.4*10^12, which is a little bit more than 2^42.  That will easily fit in an Int64 with 20 bytes to spare.  You can precalculate the integer value for every word in your dictionary ahead of time, and do all your combination work purely in integer space.  Integers are a lot friendlier to memory than are strings.

    *edit* - It's actually a bit more than 26^9 - I forgot to take words shorter than 9 letters into account.  Still, it'll easily fit into an Int64.


    This has a pretty profound effect on my timing.  I go from about half a second in the worst case to under 80MS, averaging around 30 MS.


    If you have 9 letters and all dont have to be used youre getting almost half a million permutations in worst case (no same letters). So depending on the size of your list for "acceptable" words it might be faster to loop through that list instead to find which words can be formed with given letters. Doesnt help if its a large dictionary though.


    I'll give this a shot.  My word list has 80,000 words with <= 9 letters, so that could definitely give an improvement.



    How often is "frequently"?  If it's just every few seconds you should be able to run the algorithm in a different thread so it's not interrupting your drawing.  This will eliminate the jerkiness when the algorithm is running.  Another option if you want to avoid threading is to split the work up across multiple frames.


    Frequently depends on the game.  Usually it's just once at the beginning, but a few games require a new word list every time a word is completed.  Multithreading might help.  I haven't even been looking at GC slowdown, its all been slowdown by the algorithm itself.  I'll have to consider this.  I did notice a bunch of odd jerkiness when I tested on the 360 a couple months ago, so the GC might be to blame here.


    You mean something like StringBuilder?


    Switched over to StringBuilder where appropriate, but now that I'm doing mostly Int64 comparisons I don't think string construction is killing me too much.

    Thanks guys,
    --Brian
  • 12/11/2009 8:26 PM In reply to

    Re: Fast Letter Permutations

    OK, I think I've come up with an algorithm that's pretty close to optimal (short of micro-optimizations which may still need to happen).  I tossed the Int64 representation, as the new method does nothing but additions/subtractions and no string building/compares.  Here's the idea:

    I have two arrays: letterCounts[26] and used[9];
    for each word W in the dictionary:
      usedIdx = 0
     
      for each character c in W
        if letterCounts[c-'a'] < 0) letterCounts[c-'a'] = 0

        letterCounts[c - 'a'] ++
        used[usedIdx++] = c - 'a'

      for each character c in Letters
        letterCounts[c - 'a'] --

      while (usedIdx >= 0)
        if (letterCounts[used[usedIdx]] > 0) then not a word
        letterCounts[used[usedIdx--]] = 0

    The additions/subtractions are intended to cover what happens if a word has duplicate letters.  The "if letterCounts[c-'a'] <0" conditional is to make sure that letters in the Letters list which are not in the dictionary word do not cause letterCounts to start negative in a future word.  It's a slightly round-about way of avoiding checking and zeroing out the entire letterCounts array for each word.

    I'm getting about 25-30 MS constant, and I'm generating very little garbage.  I don't know that it'll get much better than this, but if you can see any early-outs that I'm not noticing or you think there's still a better way, I'm all ears.
  • 12/12/2009 2:46 PM In reply to

    Re: Fast Letter Permutations

    I managed to increase the performance considerably again (from about 30MS to 10 or less on average) by fundamentally changing the algorithm, so I thought I'd post my approach for those interested:

    First, for every word, keep an array of the characters sorted (ie: CAT => ACT)
    Also take the letters sent in and sort those.

    Now things progress similarly to a merge sort:

    Go through each word in the dictionary, and for each word loop:
    You have two indices (idxWord and idxLetters), one of which points into the sorted char's for the word and the other the sorted available letters
    If letters[idxLetters] == charSortedWord[idxWord]
      idxLetters++, idxWord++
      If idxWord is >= charSortedWord.Length, this is a word (every letter was found in 'letters' somewhere)

    Else
      if letters[idxLetters > charSortedWord[idxWord] this can never be a valid word, since the sorting prevents charSortedWord[idxWord] from showing up later in letters
      else idxLetters++

    if idxLetters >= letters.Length this can never be a valid word (no more letters to choose from)


    There is at least one early out that can be thrown in (if letters.Length - idxLetters < charSortedWord.Length - idxWord => there aren't enough letters to finish off the word), but I found that it didn't really help performance at all.  I don't see anything else, but at this point I'm pretty much satisfied with where things are.
  • 12/12/2009 7:44 PM In reply to

    Re: Fast Letter Permutations

    The problem is that your search space is *huge*. The number of combinations is a factorial, which grows faster than exponential!

    Instead, you should order your word dictionary as a tree. Then keep a counter for each of the 26 possible characters. Do a tree search, where you descend into a child node if you still have an appropriate letter. Your tree would look something like:


    public class Node {
      public char letter;
      public bool isWord;
      public Node parent;
      public Node[] children;
    };


    The search would look something like:


      void FindWords(int[] counts, Node root, List<Node> words) {
        if (root.isWord) {
          words.Add(root);
        }
        for (int i = 0; i < 26; ++i) {
          if (counts[i] > 0 && root.children[i] != null) {
            counts[i] -= 1;
            FindWords(counts, root.children[i], words);
            counts[i] += 1;
          }
        }
      }


    Re-constructing a word would happen by starting at the Node in the list, and walking backwards to concatenate characters upwards.

    Finally, constructing the search tree:


      void AddWord(string word, int offset, ref Node root) {
        if (root == null) {
          root = new Node;
          root.letter = word[offset];
        }
        offset += 1;
        if (offset == word.Length) {
          root.isWord = true;
        }
        else {
          if (root.children == null) {
            root.children = new Node[26];
          }
          AddWord(word, offset, ref root.children[word[offset] - 'a']);  // this assumes all words are lowercase English
        }
      }


    Just call AddWord() for each word in your dictionary. You might want to do it in the content pipeline, and serialize the entire dictionary/tree. And, yes, it will probably take like 100 MB.

    There are thinner ways of storing this information (in fact, you can get away with only storing the dictionary sorted plus, say, 26*26 "entry pointers") but those are more complex than you need for the stated problem.

    Note: no garbage is created either in dictionary building or search!
  • 12/13/2009 12:11 PM In reply to

    Re: Fast Letter Permutations

    Wow jwatte, that took my average performance from 10-15MS to under 1 MS.  Quite the difference.  I already had a word tree built in, but I wasn't using it properly for this problem.

    As an aside, there's one minor optimization you can make to the FindWords algorithm given, though I honestly don't know if it helps (don't have profiling setup for times this small).  You can add an int[] UsedNodes and int UsedNodeCount to your Node class which keeps track of which nodes in Node.Children are actually set to something non-null.  Then when you're running through FindAllWords, you don't have to test all 26 letters to see if a child exists.

    Thanks again!
  • 12/15/2009 4:37 AM In reply to

    Re: Fast Letter Permutations

    jwatte:
    The problem is that your search space is *huge*. The number of combinations is a factorial, which grows faster than exponential!

    Instead, you should order your word dictionary as a tree.


    Nice trie Jon!
  • 12/15/2009 4:43 PM In reply to

    Re: Fast Letter Permutations

    fr3shme4t:

    Nice trie Jon!


    Very good! :D
  • 12/15/2009 5:08 PM In reply to

    Re: Fast Letter Permutations

    fr3shme4t:
    Nice trie Jon!

    Lol!

Page 1 of 1 (19 posts) Previous Discussion Next Discussion
var gDomain='m.webtrends.com'; var gDcsId='dcschd84w10000w4lw9hcqmsz_8n3x'; var gTrackEvents=1; var gFpc='WT_FPC'; /*<\/scr"+"ipt>");} /*]]>*/
DCSIMG