Knowledgebase & FAQ

RecFind 6 Noise Word configuration

Full-text indexing systems generally let you configure "noise words" (also known as "stop words"). Noise words are commonly used terms that are excluded from the index, which keeps the index from becoming bloated. For example, in English, words such as "a", "and", "is" and "the" are in the English noise word file and are left out of the full-text index because they are considered useless to a search.

Default Noise Words

0 and can has k of so to which
1 another come have l on some too while
2 any could he like only still u who
3 are d her m or such under will
4 as did here make other t up with
5 at do him many our take use would
6 b does himself me out than v x
7 be e his might over that very y
8 because each how more p the w you
9 been else i most q their want your
$ before f if much r them was z
_ being for in must re then way  
a between from into my s there we  
about both g is n said these well  
after but get it never same they were  
all by got its no see this what  
also c h j now should those when  
an came had just o since through where  

Reconfiguring your RecFind 6 Noise Words (for SQL Server 2005)

RecFind 6 uses the Microsoft SQL Server full-text indexing capabilities, so configuring Noise Words is predominantly a SQL Server configuration task.

For SQL Server 2005, the noise word files are located in the directory:
$SQL_Server_Install_Path\Microsoft SQL Server\MSSQL.1\MSSQL\FTDATA\

This directory is created, and the noise-word files are installed, when you set up SQL Server with Full-Text Search support. Noise-word files are simple text files that can be edited using a text editor (e.g. Notepad).

To alter your noise word configuration:

  1. Using Notepad (or another text editor), open the NoiseNEU.txt file located in the $SQL_Server_Install_Path\Microsoft SQL Server\MSSQL.1\MSSQL\FTDATA\ folder of the SQL Server 2005 server.

  2. Make the necessary configuration changes (i.e. add or remove terms as required) and save the file.

  3. Using SQL Server Management Studio, open the NoiseWords table located in the RecFind 6 database and make the same changes there (i.e. add or remove the same terms).

  4. Repopulate the full-text indexes by logging in to the DRM and selecting the option to "Re-Index The Database".

Reconfiguring your RecFind 6 Noise Words (for SQL Server 2008)

RecFind 6 uses the Microsoft SQL Server full-text indexing capabilities, so configuring Noise Words is predominantly a SQL Server configuration task. In SQL Server 2008, noise words are called stop words and are maintained via a new feature called Stoplists.

Unlike SQL Server 2005, the stop words are stored within the database, and by default full-text indexes (including RecFind 6's) use the System Stoplist. The following procedures explain how to create your own Stoplist for RecFind 6, how to update the applicable RecFind 6 indexes to use this Stoplist, and how to reconfigure your Stoplist.

Note: Before commencing this process you will need to ensure that your RecFind 6 database is in SQL 2008 compatibility mode. To check your database, from SQL Server Management Studio view the Properties of your RecFind 6 database and on the Options page ensure that the compatibility level is set to "SQL Server 2008 (100)".
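The same check can also be made with a query. The following is a minimal sketch that assumes the RecFind 6 database is named RecFind6 (the name is an assumption; substitute your own).

```sql
-- Assumption: the RecFind 6 database is named RecFind6.
-- A compatibility_level of 100 corresponds to "SQL Server 2008 (100)".
SELECT name, compatibility_level
FROM sys.databases
WHERE name = N'RecFind6';

-- Only if the level reported above is lower than 100, and only after
-- confirming the change is appropriate for your environment:
-- ALTER DATABASE [RecFind6] SET COMPATIBILITY_LEVEL = 100;
```

The ALTER DATABASE statement is left commented out so that the script only reports the current level unless you deliberately enable the change.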

To create your own Stoplist:

  1. In Microsoft SQL Server Management Studio, expand the RecFind6 database, then expand the Storage folder.

  2. Right-click on the 'Full Text Stoplists' folder and select 'New Full-Text Stoplist...'.

  3. Enter a name for your Full-Text Stoplist (e.g. RecFind6Stoplist) and select the option "Create from the system stoplist".

  4. Click 'OK'.
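If you prefer a script to the dialog, the steps above can be sketched in T-SQL as follows. The database name RecFind6 and the stoplist name RecFind6Stoplist are taken from the examples in this article and may differ on your server.

```sql
-- Assumption: database RecFind6, stoplist name RecFind6Stoplist.
USE [RecFind6];
GO

-- Create a new stoplist as a copy of the system stoplist.
CREATE FULLTEXT STOPLIST RecFind6Stoplist FROM SYSTEM STOPLIST;
GO

-- Confirm that the stoplist now exists in this database.
SELECT stoplist_id, name
FROM sys.fulltext_stoplists;
```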

To link the full text indexes to your new Stoplist:

  1. From Microsoft SQL Server Management Studio, expand the RecFind6 Database, then expand the Tables folder.

  2. Right-click on the table EDOC and select 'Full-Text index > Properties'

  3. For the Full-Text Index Stoplist, change the value from "<System>" to the Stoplist you created earlier (i.e. RecFind6Stoplist).

  4. Click "Repopulate index" (leaving the "Full" option selected).

  5. Click 'OK'.

  6. Repeat steps 2 to 5 for the tables MetadataProfile, Space and Title, plus the view EDOCFieldView (found under the Views folder).
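The same linking can be sketched in T-SQL. This assumes the database is named RecFind6, the objects live in the dbo schema (an assumption, check your own database), and the stoplist is called RecFind6Stoplist. Unless WITH NO POPULATION is specified, changing the stoplist should cause SQL Server to repopulate the index.

```sql
-- Assumptions: database RecFind6, schema dbo, stoplist RecFind6Stoplist.
USE [RecFind6];
GO

-- Point each RecFind 6 full-text index at the custom stoplist.
ALTER FULLTEXT INDEX ON dbo.EDOC            SET STOPLIST RecFind6Stoplist;
ALTER FULLTEXT INDEX ON dbo.MetadataProfile SET STOPLIST RecFind6Stoplist;
ALTER FULLTEXT INDEX ON dbo.[Space]         SET STOPLIST RecFind6Stoplist;
ALTER FULLTEXT INDEX ON dbo.Title           SET STOPLIST RecFind6Stoplist;
ALTER FULLTEXT INDEX ON dbo.EDOCFieldView   SET STOPLIST RecFind6Stoplist;
GO

-- Verify which stoplist each full-text index uses.
-- A stoplist_id of 0 means the system stoplist; NULL means no stoplist.
SELECT OBJECT_NAME(i.object_id) AS indexed_object,
       i.stoplist_id,
       s.name AS stoplist_name
FROM sys.fulltext_indexes AS i
LEFT JOIN sys.fulltext_stoplists AS s
       ON s.stoplist_id = i.stoplist_id;
```

Space is bracketed because SPACE is also the name of a built-in T-SQL function.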

Now that you have your own custom Stoplist, alter your noise words as follows:

  1. In Microsoft SQL Server Management Studio, double-click on your Stoplist (found under RecFind6 > Storage > Full Text Stoplists).

  2. Select the applicable action (e.g. "Delete stop word"), enter the stop word and select the language "Neutral". Note: the stop word can be case sensitive. To see the list of current stop words, run the query "select * from sys.fulltext_stopwords where language = 'Neutral'".

  3. Open the NoiseWords table located in the RecFind 6 database (i.e. right-click on the table and select "Edit top 200 rows") and make the same changes there (i.e. add or remove the same terms). Note: to delete a term, highlight the entire row, right-click and then select "Delete".

  4. Repopulate the full-text indexes by logging in to the DRM and selecting the option to "Re-Index The Database".
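Step 2 can also be scripted. The sketch below assumes the database RecFind6 and the stoplist RecFind6Stoplist from the earlier examples; the word 'example' is a placeholder, and the DROP statement assumes 'the' is currently in the stoplist for the Neutral language. It only changes the SQL Server stoplist, so the NoiseWords table update and the re-index in steps 3 and 4 are still required.

```sql
-- Assumptions: database RecFind6, stoplist RecFind6Stoplist.
USE [RecFind6];
GO

-- Add a stop word for the Neutral language (LCID 0).
-- 'example' is a placeholder term, not a recommendation.
ALTER FULLTEXT STOPLIST RecFind6Stoplist ADD 'example' LANGUAGE 0;

-- Remove a stop word for the Neutral language.
-- This fails if the word is not currently in the stoplist.
ALTER FULLTEXT STOPLIST RecFind6Stoplist DROP 'the' LANGUAGE 0;
GO

-- List the Neutral stop words now held in the custom stoplist.
SELECT sw.stopword, sw.language
FROM sys.fulltext_stopwords AS sw
JOIN sys.fulltext_stoplists AS sl
  ON sl.stoplist_id = sw.stoplist_id
WHERE sl.name = N'RecFind6Stoplist'
  AND sw.language = N'Neutral'
ORDER BY sw.stopword;
```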

Special note about noise words

Please note that SQL Server still takes the position of a noise word within a phrase into consideration, so you can receive unexpected results. For example, searching for "walk the dog" ("the" is a noise word) will also locate records containing the phrases "walk a dog", "walk 1 dog", "walk over dog", "walk their dog", etc. In each of these, a noise word occupies the same position in the phrase, so each is treated as a possible match.
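On SQL Server 2008 you can inspect this behaviour yourself. The following sketch uses the sys.dm_fts_parser function to show how the phrase is broken into terms, then runs a phrase search against the EDOC table. The database name RecFind6 and the dbo schema are assumptions, sys.dm_fts_parser normally requires sysadmin rights, and the rows returned will depend on your own stoplist and data.

```sql
-- Assumptions: database RecFind6, schema dbo.
USE [RecFind6];
GO

-- Show how the phrase is parsed. Arguments: query string, LCID 0 (Neutral),
-- stoplist_id 0 (system stoplist), accent sensitivity off.
-- Terms treated as stop words are flagged in the special_term column,
-- and occurrence gives each term's position in the phrase.
SELECT display_term, special_term, occurrence
FROM sys.dm_fts_parser(N'"walk the dog"', 0, 0, 0);

-- Count the EDOC rows that match the phrase across all full-text indexed columns.
SELECT COUNT(*) AS matching_rows
FROM dbo.EDOC
WHERE CONTAINS(*, N'"walk the dog"');
```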

Further reading:

Noise Words @ Microsoft MSDN : http://msdn.microsoft.com/en-us/library/ms142551(SQL.90).aspx (SQL 2005) or http://msdn.microsoft.com/en-us/library/ms142551.aspx?ppud=4 (SQL 2008)
