Go Back   London Fixed-gear and Single-speed > Forum Help & Testing Area


 
LinkBack Thread Tools
Old 29th April 2008   #1
crank
 
crank's Avatar
Searching the forum for short words

Is there any way that I can search the forum for the term 'on-one'.. I've tried all kinda tricks with wildcards, and can't make it work.

I've seen that a lot of sites don't let you search for two-letter terms. Is it to cut server-load?
  quote   reply
Registered users don't see adverts. Signup or signin to hide them.
Old 29th April 2008   #2
VelocityBoy
 
VelocityBoy's Avatar
It's about databases and search indexes.

The explanation: A search index is a record of every word and which rows in the database (posts) hold those words. Search indexes are highly efficient as instead of searching every row (post) for the word, the database will look at the word in the index (like a book index) and then see a list of
rows (posts) that hold the word. Indexes only work if they are smaller than the database table that they are the index for. So to keep the index small and fast, the database people choose to exclude common words, otherwise the index will contain an entry for "the" and it would point to tens of thousands of posts.

Both the words "on" and "one" are in the exclude list:
http://dev.mysql.com/doc/refman/5.0/...stopwords.html

So you can't search for them as they get skipped in the search as being too common.

Tags solve this. Tags aren't an index to a database table, they are their own table. You can have three letter words as tags and it should be fine.

So the key to finding things with very few letters or that are very common is to tag things with the words, so that later you can find stuff.
  quote   reply
Old 29th April 2008   #3
VelocityBoy
 
VelocityBoy's Avatar
http://www.londonfgss.com/tags/on-one.html

See?

Unfortunately only one thread has been tagged so far... but that's your answer.
  quote   reply
Old 29th April 2008   #4
mashton
 
mashton's Avatar
This might be a way of getting it to work, but some index size testing would be needed:

(from http://dev.mysql.com/doc/refman/5.0/...ne-tuning.html)

"If you want to change the set of characters that are considered word characters, you can do so in two ways. Suppose that you want to treat the hyphen character ('-') as a word character. Use either of these methods:
  • Modify the MySQL source: In myisam/ftdefs.h, see the true_word_char() and misc_word_char() macros. Add '-' to one of those macros and recompile MySQL.
  • Modify a character set file: This requires no recompilation. The true_word_char() macro uses a “character type” table to distinguish letters and numbers from other characters. . You can edit the <ctype><map> contents in one of the character set XML files to specify that '-' is a “letter.” Then use the given character set for your FULLTEXT indexes.

After making the modification, you must rebuild the indexes for each table that contains any FULLTEXT indexes."
  quote   reply
Old 29th April 2008   #5
mashton
 
mashton's Avatar
and i presume that an index rebuild requires a bit of downtime?
  quote   reply
Old 29th April 2008   #6
VelocityBoy
 
VelocityBoy's Avatar
If I kill the server, this is all your fault.
  quote   reply
Old 29th April 2008   #7
mashton
 
mashton's Avatar
seems fair :-)
  quote   reply
Old 29th April 2008   #8
VelocityBoy
 
VelocityBoy's Avatar
OK, Just looked... and I'm not doing that.

#1 requires re-compilation and for maintenance I'd rather stick with the CPU optimised build that is in the ubuntu repositories.

#2 doesn't require a re-compilation, but the Xml file you have to edit is scary. It basically defines the alphabet for that locale and the entire database.

The problem I have with #2 is that changing the alphabet may have consequences beyond the fulltext indexes. And there is no documentation to say what that may be. Would making '-' a letter impact in any way on the security given that '--' is a Sql comment and can be used in shaping a Sql injection query.

So... tags will solve this.
  quote   reply
Old 29th April 2008   #9
mashton
 
mashton's Avatar
Quote:
Originally Posted by VelocityBoy View Post
OK, Just looked... and I'm not doing that.

#1 requires re-compilation and for maintenance I'd rather stick with the CPU optimised build that is in the ubuntu repositories.

#2 doesn't require a re-compilation, but the Xml file you have to edit is scary. It basically defines the alphabet for that locale and the entire database.

The problem I have with #2 is that changing the alphabet may have consequences beyond the fulltext indexes. And there is no documentation to say what that may be. Would making '-' a letter impact in any way on the security given that '--' is a Sql comment and can be used in shaping a Sql injection query.

So... tags will solve this.
Wise you are, yes, yes.

I agree, particularly since I have already sold that On-One Pompino :-)
  quote   reply
Old 29th April 2008   #10
RPM
 
RPM's Avatar
why would anyone want to search for on-one anyway?

shit bikes made by a bloke with a bad attitude
  quote   reply
Old 29th April 2008   #11
Kelvin
 
Kelvin's Avatar
hohow would they know that with out searching and finding you think that :P
  quote   reply
Old 29th April 2008   #12
RPM
 
RPM's Avatar
shit you're right, better get t-shirts done
  quote   reply

Bookmarks Shortcuts
Posts Categories

Tags
on one, tags
Registered users don't see adverts. Signup or signin to hide them.


Currently Active Users Viewing This Thread: 1 (0 members and 1 guests)
 
Thread Tools


Similar Threads
Thread Thread Starter Forum Replies Last Post
Sweet Short Cuts lpg General 53 23rd July 2008 07:18
Short Stubby Stems.. Flatlander Bikes & Bits 31 30th April 2008 20:03
Difference between long stem/short top tube and short stem/long top tube rusty Mechanics & Fixin' 7 23rd April 2008 14:43
Sum Up Your Weekend in Two Words Build Miscellaneous & Meaningless 80 5th March 2008 09:01
Brake bolt too short :-( rideordie Mechanics & Fixin' 6 8th October 2007 21:39

Creative Commons License All times are GMT. The time now is 22:20.