Tuesday, October 6, 2015

Lloyd's Word Cloud: What a Difference "Repeat for Each" Can Make

As mentioned in my previous posting, I decided to update the first part of my word cloud analysis - the part that goes through a passage of text and computes a list of the unique words and their frequency of use - with the "repeat for each" method. I am very happy that I did.

I used a speech given by former President Jimmy Carter on July 15, 1979, titled "Crisis of Confidence" and often referred to as the "Malaise Speech." This passage of text has 3301 words in it. (Click here for a transcript of the speech and click here to watch a video of it.) I've had Jimmy Carter on my mind because of the recent news about his serious health condition. He's also from Georgia. And I've always really liked and respected him as a human being. Anyhow, I thought this speech would make another good example of a text passage to run through my word cloud app. Plus, it's fairly lengthy.

Using the "repeat for each" method, LiveCode needed 26.52 seconds to go through the passage, find the unique words, and then compute the frequency of each one. In comparison, my original code using the "repeat with" method took 1604 seconds (26.7 minutes), or about 60 times as long. To use a technical term, "Wow!"
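The word-frequency script itself isn't shown in this post. As a rough illustration only (not the word cloud app's actual code), here is one way the counting step might look with "repeat for each" and a LiveCode array. The function name wordFrequencies is hypothetical. Note that "word" keeps attached punctuation, so "confidence," and "confidence" would be counted separately; a tab is used between word and count so those commas don't get mistaken for item delimiters.

```livecode
-- Hypothetical sketch (not the app's actual code): count how often
-- each word appears, using "repeat for each" and an array.
function wordFrequencies pText
   local tCounts, tWord, tKey, tResult
   repeat for each word tWord in pText
      -- lowercase so "The" and "the" share one entry
      add 1 to tCounts[toLower(tWord)]
   end repeat
   -- build one line per unique word: word <tab> count
   repeat for each key tKey in tCounts
      put tKey & tab & tCounts[tKey] & return after tResult
   end repeat
   if tResult is not empty then delete the last char of tResult
   -- most frequent words first
   set the itemDelimiter to tab
   sort lines of tResult descending numeric by item 2 of each
   return tResult
end wordFrequencies
```

For example, put wordFrequencies(field "one") into field "two" would list the words of field "one" from most to least frequent. The array does the "unique words" bookkeeping for free: each new word simply becomes a new key.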

Here's the resulting word cloud of the speech using the top words:


So, What's the Deal with "Repeat for Each"?


Good question. First, here's a simple example comparing the "repeat for each" with the "repeat with" approaches.



Here's the code for the "repeat with" button:

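 on mouseUp
    put empty into field "two"
    put the number of words in field "one" into L
    -- i counts from 1 up to the number of words
    repeat with i = 1 to L
       -- fetch word number i of the field and append it to field "two"
       put word i of field "one" & return after field "two"
    end repeat
 end mouseUp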

Here's the code for the "repeat for each" button:

 on mouseUp
    put empty into field "two"
    -- varWord is set to each word of field "one" in turn
    repeat for each word varWord in field "one"
       put varWord & return after field "two"
    end repeat
 end mouseUp

Here are the key differences. First, the expression word i of field "one" in the top example plays the same role as the variable varWord in the bottom example. The line repeat for each word varWord in field "one" sets varWord to the next word in the field on each pass through the loop. In both cases, varWord and word i of field "one" hold each successive word of the paragraph on the left (field "one") during each cycle of the loop.

You might say that these two loop structures don't look that different, so why bother? Well, the "repeat with" loop runs much more slowly because, on each pass, word i of field "one" requires LiveCode to work its way through the field from the beginning, word by word, until it reaches word i. The later a word sits in the text, the longer it takes to fetch, so the total work grows much faster than the length of the text. I infer from this that "repeat for each" keeps track of where it is in the text and can go straight to the next word without first starting at the top and working its way down.
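If you want to see the difference on your own text, a small timing helper can run both loop styles over the same text and report how long each took. This is a sketch for experimenting, not the code behind the timings above; timeWordLoops is a hypothetical name, and it works on a variable rather than a field so that field updates don't dominate the measurement.

```livecode
-- Hypothetical helper: times both loop styles over the same text.
-- Returns "msForRepeatWith,msForRepeatForEach".
function timeWordLoops pText
   local tStart, tWithMs, tForEachMs, tOut, tWord, tCount, i
   -- "repeat with": each pass asks for word i of the text
   put the milliseconds into tStart
   put the number of words in pText into tCount
   repeat with i = 1 to tCount
      put word i of pText & return after tOut
   end repeat
   put the milliseconds - tStart into tWithMs
   -- "repeat for each": the loop hands over one word per pass
   put empty into tOut
   put the milliseconds into tStart
   repeat for each word tWord in pText
      put tWord & return after tOut
   end repeat
   put the milliseconds - tStart into tForEachMs
   return tWithMs & comma & tForEachMs
end timeWordLoops
```

From a button, answer timeWordLoops(field "one") shows both times; the gap should widen as the text gets longer.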

I'm now wondering if I will ever use the "repeat with" method again. However, "repeat with" has one advantage: it keeps track of the position of each item (the word number, in the example above) in a local variable, which I named "i". You can do the same with "repeat for each", but you have to set up the variable yourself right before the loop begins (put 0 into i) and then increment it at the start of each pass through the loop (add 1 to i). No big deal, but it is two extra lines of code.
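Here is what those two extra lines look like in practice: a sketch that numbers each word while still using "repeat for each". The function name numberedWords is hypothetical; it returns the text so you can put it wherever you like.

```livecode
-- Sketch: "repeat for each" with a hand-maintained counter.
-- numberedWords is a hypothetical name used for illustration.
function numberedWords pText
   local i, varWord, tOut
   put 0 into i -- extra line 1: set up the counter
   repeat for each word varWord in pText
      add 1 to i -- extra line 2: bump it on each pass
      put i & ":" && varWord & return after tOut
   end repeat
   return tOut
end numberedWords
```

For example, put numberedWords(field "one") into field "two" lists each word of field "one" with its position, such as "1: This".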

Special Thanks Again to Richard Gaskin


I conclude with yet another shout-out to Richard Gaskin, who kindly took the time to point me in this direction months ago. OK, so it took a while to "see the light," but I'm finally there.

Now I just have to hunker down and spend some time with Ali Lloyd's script for creating a visually sophisticated word cloud to learn exactly how he did it. LiveCode has a great community!






3 comments:

  1. Hi Lloyd,

    You'll get even more of a speed boost by manipulating a variable instead of a field.

    on mouseUp
       put empty into field 2
       repeat for each word varWord in field 1
          put varWord & return after temp
       end repeat
       put temp into field 2
    end mouseUp

  2. Then there's:

    put fld 1 into temp
    replace space with return in temp
    put temp into fld 2

  3. Now replaceText:

    on mouseUp
       put "" into fld 2
       put replaceText(fld 1, space, return) into fld 2
       answer the number of lines of fld 2
    end mouseUp

    You might also play around with the new 'trueword' vs. 'word':

    repeat for each trueword varWord in field 1

    Each one of these approaches will produce a different word count as the definition of a word can vary.
