Hi everyone,
Some users have asked me if there is an easy way to configure some Solr search exclusions. ATM, a standard Solr search query performs the following “exclusions”:
- access rights (search results don’t include pages that you can’t view)
- hidden pages (based on your preference)
- locale (only pages that match the current locale, or the default translation if there’s no translation for the current locale)
- wiki (search is limited to the current subwiki, and in the case of the main wiki there is a wikisSearchableFromMainWiki configuration)
But it seems this is not enough. Some users would like to be able to exclude specific “spaces” (subtrees of the page hierarchy) from the search. So my first question is: do you think it’s useful to add a configuration for this?
If you find this useful then the next question is: should this be an index-time filter or a query-time filter?
- Index-time:
  - Pro: Don’t index pages that are not meant to be searched.
  - Con: Are you sure you’ll never need to perform a custom search query (outside the standard search page / quick search) on those pages? Plus, other XWiki features may expect those pages to be indexed (we’re starting to rely on Solr more and more).
  - Con: We’ll have to define a syntax for specifying exclusions. We could use regular expressions, but it’s not trivial to express and implement filters like “pages with an object of this type” or “pages created more than 10 years ago”.
  - Con: When modifying the configuration we’ll have to update the index:
    - remove entries that match the new exclusion filter
    - index entries that matched the old filter but no longer match the new one (which doesn’t sound easy to do)
    And if you make a mistake expressing the exclusion filter, you can end up deleting the entire search index.
- Query-time:
  - Pro: we can use the Solr query syntax to express complex / advanced filters, e.g. the negative filter -space_prefix:A.B
  - Pro: changing the exclusion filter doesn’t require updating the search index. If you make a mistake, the search results are back as soon as you fix the filter.
  - Con: time and disk space are spent indexing pages that may never be queried.
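To make the index-time update cost more concrete, here is a minimal Java sketch. It assumes, hypothetically, that an exclusion is a regular expression matched against full page references such as A.B.WebHome; the class and method names are illustrative only, not an XWiki API. It shows the asymmetry: the entries to remove can be found by looking at what is already indexed, but the entries to index again are by definition absent from the index, so every page in the wiki has to be scanned.

```java
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

// Hypothetical sketch: index-time exclusions expressed as a regular expression
// matched against full page references such as "A.B.WebHome".
public class IndexTimeExclusionUpdate {

    // Entries to delete: anything still in the index that the new filter excludes.
    // This only needs to look at what is already indexed.
    static List<String> toRemove(List<String> indexedPages, Pattern newFilter) {
        return indexedPages.stream()
                .filter(page -> newFilter.matcher(page).matches())
                .collect(Collectors.toList());
    }

    // Entries to add: pages excluded by the old filter but not by the new one.
    // These are NOT in the index, so every page in the wiki (e.g. from the
    // database) has to be scanned to find them.
    static List<String> toIndex(List<String> allPages, Pattern oldFilter, Pattern newFilter) {
        return allPages.stream()
                .filter(page -> oldFilter.matcher(page).matches())
                .filter(page -> !newFilter.matcher(page).matches())
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> allPages = List.of("A.B.WebHome", "A.C.WebHome", "Main.WebHome");
        Pattern oldFilter = Pattern.compile("A\\.B\\..*"); // old config: exclude space A.B
        Pattern newFilter = Pattern.compile("A\\.C\\..*"); // new config: exclude space A.C
        // The pages currently indexed are those the old filter did not exclude.
        List<String> indexed = allPages.stream()
                .filter(page -> !oldFilter.matcher(page).matches())
                .collect(Collectors.toList());
        System.out.println("remove: " + toRemove(indexed, newFilter));
        System.out.println("index:  " + toIndex(allPages, oldFilter, newFilter));
    }
}
```

Note that a filter such as .* makes toRemove return every indexed page, which is exactly the “delete the entire search index” risk. The sketch also hints at why regular expressions fall short: a pattern over page references cannot express “pages with an object of this type”.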
I find the query-time approach better. On the UI side, I would add a new sub-section, named “Searching”, before the existing “Indexing” one in the Solr search administration section. It would have a single field, “Exclusions”: a text area where administrators can write one filter query per line, like:
-space_prefix:A.B
(We could also add the minus ourselves and let the user simply specify a positive filter.)
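For the query-time approach, here is a minimal Java sketch of how the “Exclusions” text area could be turned into Solr filter queries, one fq parameter per line, added next to the existing access rights, hidden, locale and wiki filters. The class and method names are hypothetical. It assumes that lines already starting with a minus are negative filters and are kept as they are, while positive lines get the minus added for them. Wrapping positive lines in parentheses is a design choice of this sketch, so that a multi-clause filter such as space_prefix:A.B OR space_prefix:C.D is negated as a whole rather than only its first clause.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: turn the "Exclusions" text area (one filter query per
// line) into negative Solr filter queries, each meant to be sent as its own fq.
public class SearchExclusions {

    static List<String> toFilterQueries(String textArea) {
        List<String> filterQueries = new ArrayList<>();
        for (String line : textArea.split("\\R")) {
            String filter = line.trim();
            if (filter.isEmpty()) {
                continue; // ignore blank lines
            }
            // Assumption: a line starting with "-" is already a negative filter;
            // otherwise it is a positive filter and the whole line is negated.
            filterQueries.add(filter.startsWith("-") ? filter : "-(" + filter + ")");
        }
        return filterQueries;
    }

    public static void main(String[] args) {
        String config = "space_prefix:A.B\n-space_prefix:Sandbox\n\n";
        // Each entry would be added as a separate fq parameter to the search query.
        System.out.println(toFilterQueries(config));
    }
}
```

Because the filters are applied when querying, editing the text area takes effect on the next search, without touching the index.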
Modifying the main search page to obey this configuration should be easy. For the quick search / search suggest (from the top bar), it’s a bit more complex because each search suggest source has its own configuration. We could decide to:
- apply the search exclusions all the time (i.e. modify the SuggestSolrMacros that are used by all sources)
- allow each search suggest source to indicate whether it follows the standard search exclusions or not
I’d keep it simple and apply the exclusion filters all the time.
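The “apply the exclusions all the time” option can be sketched as a single shared code path, playing the role of the SuggestSolrMacros, that every suggest source goes through. The SuggestSource record, its fields, and the type:DOCUMENT / type:ATTACHMENT filters are hypothetical placeholders for each source’s own configuration, not the actual XWiki data model.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the "apply the exclusions all the time" option:
// every suggest source builds its filter queries through one shared method,
// so the global exclusions are added regardless of the source's own settings.
public class SuggestExclusions {

    // Hypothetical model of a search suggest source: its own query and filters.
    record SuggestSource(String name, String query, List<String> filterQueries) {}

    // Shared code path: the source-specific filters first, then the global exclusions.
    static List<String> effectiveFilterQueries(SuggestSource source, List<String> exclusions) {
        List<String> result = new ArrayList<>(source.filterQueries());
        result.addAll(exclusions);
        return result;
    }

    public static void main(String[] args) {
        List<String> exclusions = List.of("-space_prefix:A.B");
        List<SuggestSource> sources = List.of(
                new SuggestSource("pages", "title:foo*", List.of("type:DOCUMENT")),
                new SuggestSource("attachments", "filename:foo*", List.of("type:ATTACHMENT")));
        for (SuggestSource source : sources) {
            System.out.println(source.name() + " -> " + effectiveFilterQueries(source, exclusions));
        }
    }
}
```

The alternative option would add a per-source flag (for instance a hypothetical boolean followsSearchExclusions on SuggestSource) and skip the addAll call when it is false; the sketch keeps the simpler behavior.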
WDYT?
Thanks,
Marius