Full Text Search in Managed Database not Stemming Properly?

August 9, 2019 169 views
PostgreSQL Databases

In my ‘managed’ Postgres database, running that query:

SELECT to_tsvector('The quick brown fox jumped over the lazy dog.’);

Returns:

'brown’:3 'dog’:9 'fox’:4 'jumped’:5 'lazy’:8 'over’:6 'quick’:2 'the’:1,7

But, from tutorials and also from running that query on a Postgres database installed on Ubuntu, I get:

'brown’:3 'dog’:9 'fox’:4 'jump’:5 'lazi’:8 'quick’:2

It seems like the managed databases aren’t dropping the 'the’, not changing 'lazy’ into 'lazi’, etc.

I also have the behavior of Postgres not finding text with 'dogs’ if looking for 'dog’, for example, which worked with a self installed database I had on Ubuntu.

I’m wondering if there’s anything I can do on our side or is it a limitation of the managed databases?

Note: I checked the collation / charset of the database and it’s all UTF8 (same as my self installed DB that works)

Thank you!

1 Answer

From what I understand, English is by default on databases when installing Postgres (or at least it was on mine), so running:

SELECT to_tsvector('English', 'The quick brown fox jumped over the lazy dog.');  

will give the right result.

You can also set the default to english by running:

ALTER DATABASE dbname SET default_text_search_config = 'pg_catalog.english' 

Also, make sure the extension is installed by running:

CREATE EXTENSION pg_trgm;

To k now if it’s installed, run:

\dx
Have another answer? Share your knowledge.