- Someone
- Thursday, April 12th, 2012 at 12:59:09pm MDT
# Imports this fragment needs (Python 2, since it uses urllib2).
import os
import re
import urllib2
from math import log, sqrt

# base, blacklist and line are not defined in this fragment; they are assumed to be set earlier in the script.
# Take the last 20 lines of the '#asylum/out' file, URL-decode them, and split them into tokens.
with open(os.path.join(base, '#asylum/out')) as out_file:
    recent = out_file.read().split('\n')[-20:]
words = urllib2.unquote(' '.join(recent)).replace('/', ' ').replace('.', ' ').split(' ')
# Drop common words (blacklist was grabbed from a list of 500 common English words),
# along with tokens of 3 characters or fewer and tokens that are not purely alphabetic.
words = [w for w in words
         if len(w) > 3 and re.search(r"^[a-zA-Z]*$", w) and w not in blacklist]
# The user can supply optional terms to add to the search list; add those in as if they were the most recent tokens.
user_words = line[4:]
words += user_words
# If the user supplied terms, they must be important, so add them in twice.
# Reverse the list to (nearly) equalize the bias for each.
user_words.reverse()
words += user_words
# Score the tokens based on frequency and chronological order.
keys = {}
for ix, word in enumerate(words):
    keys[word] = keys.get(word, 0.0) + sqrt(log(ix + 1))
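The weighting in the loop above can be tried in isolation. What follows is a minimal sketch, not the author's full script: `score_tokens` and the sample word lists are hypothetical, introduced only for illustration, and the code is written to run under both Python 2 and Python 3. Each occurrence of a token adds sqrt(log(position + 1)) to its score, so repeated and more recent tokens rank higher, and the token at position 0 contributes nothing.

```python
from math import log, sqrt


def score_tokens(words, user_words=()):
    """Score tokens by frequency and position (hypothetical helper).

    User-supplied terms are appended twice, the second time reversed,
    mirroring the approach in the paste.
    """
    user_words = list(user_words)
    tokens = list(words) + user_words + user_words[::-1]
    scores = {}
    for ix, word in enumerate(tokens):
        # Position 0 adds sqrt(log(1)) == 0; later positions add more.
        scores[word] = scores.get(word, 0.0) + sqrt(log(ix + 1))
    return scores


# Sample data invented for illustration.
scores = score_tokens(["python", "parser", "python"], ["regex", "unicode"])
ranked = sorted(scores, key=scores.get, reverse=True)
print(ranked)
```

With these inputs the two user terms land on close but not identical scores, which is the "(nearly)" in the original comment: the reversed second pass gives the first user term the last position and the last user term the two middle positions.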