Full-Text Searching in MS SQL Server 2005 Express
SQL Server has a very powerful Full Text indexing capability, which is even
better in SQL Server 2005. I use the free editions of Visual Web Developer Express
and SQL Server 2005 Express, but these still allow the full capability of full-text
searching to be performed. This means that you can have a large database of
long text fields and yet find all records containing certain words very quickly
just by executing a SQL query.
Note: If you are using SQL Server Express, make sure you have downloaded the
version with full-text indexing. There is a smaller version without it, but
you must have the full version. If you have already installed the smaller version,
you may need to uninstall that and install the full version.
Getting started
- Microsoft have a background document on Full
Text Search concepts and I suggest that you read that first.
- The fundamental fact of full-text indexing is that the database requires
extra files called the catalog. You still have the normal .mdb and .ldb files,
but you also have a folder called FTData which contains a folder for your
catalog, which in turn contains lots of files that make up the full text index.
These files are all made for you and kept up to date by SQL Server.
- How to make the Catalog
Now must execute some SQL commands to create the full-text catalog.
The catalog itself is a container to put one or more full-text indexes into.
So, if you want to be able to search on the text of three of your database
tables, you make a single catalog and create three indexes which go into that
catalog. Note that if your indexes are going to be really huge you should
make a separate catalog for each index, but to get started, stick with one
catalog.
WARNING: Some of the information of the Microsoft website relates to the full
version of SQL Server, such as the "Storage folder" that Express
doesn't have. Don't panic, you just need to create the catalog by hand, rather
than through the Management Interface. Also, you may find out-of-date information
about SQL Server 2000, telling you that a table can only have one column full-text
indexed. Ignore this, as it is not true under SQL Server 2005.
In this example I will make a catalog and put two indexes into it. Here is
how:
- Start SQL Server 2005 Management Studio Express, and connect to your database.
- Click the "New Query" icon so that you can start entering SQL
queries.
- Type:
use mydatabase
go
EXEC sp_fulltext_database 'enable'
go
CREATE FULLTEXT CATALOG mycatalog
go
- Click "Execute" to run this SQL statement. This will enable
fulltext searching on the database (it is not enabled by default), and then
create the catalog, which will be called mycatalog. The catalog
is just a folder called "mycatalog" in Microsoft SQL
Server\SQL.1\FTData. Note here again that if your data tables are huge,
this catalog could get huge too, so may be worth putting on a different
disk to the one with the database on it, for performance reasons.
- How to make the full-text index
The rules state that if you want to make a fulltext index on a table,
that table must have a "unique, single-column, non-nullable index".
Very often your primary key will already be just like this, because all primary
keys are unique and non-nullable. However, if your primary key is on more
than one column, you have a slight problem. You will need to add an extra
column to the table that is unique, and add a new index that is non-nullable
on this column. The standard way to do this is to create a new column called
"text_id" which is an integer column, and make it an identity with
a seed of 1 and an increment of 1. If you do this, and add a new index on
just that column, you will have a unique, single-column, non-nullable index
on your table. Ok, so it wastes a bit of space, but you have to obey this
rule or things don't work.
If you already have a single-column primary key but it is on a really
large field such as a GUID, it is suggested that you also add an int
field and use that instead as the key field. This is because the full-text
index will contain a copy of all the GUIDs, and so will take up a lot of space.
And now, how to create the fulltext index itself:
- If you needed to add a new field, make sure you have created an index
on it; otherwise just use the PK.
- Execute this SQL:
CREATE FULLTEXT INDEX ON mydatabase.dbo.mytable
(
column_to_be_indexed
Language 2057
)
KEY INDEX myindex ON mycatalog
WITH CHANGE_TRACKING AUTO
Here, mytable is the name of the table that we are building the
index on, and column_to_be_indexed is the name of the column of
text that you want to be able to search on. myindex is the name
of the index to use (which is most likely your Primary Key), and mycatalog
is the name of the catalog you previously created.
The "Language 2057" part tells SQL Server which language is used
in the texts, such as English or French. This is the really clever part,
because SQL Server knows that in English "mice" is the plural
of "mouse", so a search on "mouse" will return records
that contain "mice" too. In this case I used 2057 which is the
code for UK English. Microsoft have a table of different codes for different
languages, and you should pick the appropriate one. There is also a code
0X0 which means "no language specified".
- Now, to verify that things have worked as planed, start SQL Server Manager
interface and right-click on your table and select "Modify". Find
the column that you wanted to be indexed and right-click it, and choose
"Full text index...". Check that "Columns" is set to
the column name you want indexed and check whether "Active" is
set to "Yes". If it isn't then the index hasn't yet been built.
- You can find the status of your fulltext indexes like this:
exec sp_help_fulltext_columns mytable
This shows you which columns in given table are configured for full-text
indexing. Also:
select * from sys.fulltext_indexes
shows you the status of the current indexes and when they were last updated.
The key thing is to make sure that change_tracking_state_desc is
set to "AUTO" and that has_crawl_completed = "1".
You can also see the time of the last crawl, meaning the time that the index
was last updated.
- You will have to wait a while for the index to be created. If you have
a large table, the CPU usage on your PC will probably go up to 100% while
this happens. There is a trouble-shooting section further down this page
if you get problems.
- Performing a Find on your index
The SQL SELECT statement to find records is quite simple:
SELECT mycolumn
FROM mytable
WHERE CONTAINS(mycolumn, '"find this"')
(You must put the search words in double-quotes within the single-quotes needed
by SQL Server)
- Eliminating Noise Words
There is a noise file for each language, and this contains simple words like
"and" and "or" that are not worth searching for. If your
text fields are in HTML or XML, then you should add words like "BR"
and "P" to the noise words list.
My experiences with full-text indexes
When you first set up a full text index, SQL server takes a long time to build
the catalog, if the tables are large. This really slows down your PC, but full
text searching still works during this time, all be it very very slowly. After
a while the system sorts itself out and queries become very rapid, even instant.
Troubleshooting Full Text Searching
Finding errors in the Application Events log
On your desktop, right-click "My Computer" and choose "Manage"
then "Event Viewer" then "Application". This shows you the
Application Event Log, and in there you will see any error messages created
by the Full Text indexing service. Usually these errors will redirect you to
the server logs themselves for more information.
SQL Server Logfile & Fulltext Index Log-file
Have a look in the logs. I found mine in C:\Program Files\Microsoft SQL Server\MSSQL.1\MSSQL\LOG.
The fulltext logs are called SQLFTxxxxxx.LOG. They are pretty difficult to understand,
but the answer is in there somewhere!
Try ReBooting
You need to reboot after you install FullText Indexing, and a reboot is always
a good place to start when troubleshooting. I had the error message "Error
'0x80040e09' occurred during full-text index population" appearing
in my logfiles, and no fulltext indexes created at all until I figured out that
I needed to reboot.
Commands & Queries that you can type into SQL Server 2005:
How to check the status of the full-text indexes in a catalog:
USE mydatabase;
GO
EXEC sp_help_fulltext_tables mycatalog;
GO
How to enable fulltext search on a table:
USE mydatabase;
GO
ALTER FULLTEXT INDEX ON mytable ENABLE;
GO
How to check if full text indexing is enabled:
USE mydatabase;
GO
SELECT DATABASEPROPERTY('mydatabase', 'IsFullTextEnabled');
GO
Example SQL Commands
Here is an example sql statement, that sets up three columns of a table for
fulltext indexing, and sorts out the change tracking state, and starts a full
crawl:
use mydatabase
exec sp_fulltext_database 'enable'
go
exec sp_fulltext_catalog 'mycatalog', 'create'
go
exec sp_fulltext_table 'mytable', 'create', 'cmscatalog',
'PK_document'
go
exec sp_fulltext_column 'mytable', 'mycolumn', 'add', '2057'
exec sp_fulltext_column 'mytable', 'anothercolumn', 'add',
'2057'
exec sp_fulltext_column 'mytable', 'onemorecolumn', 'add',
'2057'
go
exec sp_fulltext_table 'mytable', 'activate'
go
EXEC sp_fulltext_table 'mytable', 'Start_background_updateindex';
go
exec sp_fulltext_table 'mytable', 'start_full'
go
Conclusion
SQL Server 2005 Express offers an excellent facility to allow searching for
text in your database. It is both fast and reliable, and uses some clever "language
aware" features that would be hard to implement any other way.
|
|