Infuriating search not indexing all page string fields

Has anybody had any experience of the Episerver Search not indexing all the text fields in a page? I have a text field in my SitePageData which stubbornly refuses to show up in any searches. I have tried changing it to an XhtmlString to see if that makes any difference, but it does not. The field is updated by a scheduled task that collects all the text from any blocks within the page. I know the field is being updated because the function that populates the fields can also, with a different parameter, read them back out to a log.

The site is supposed to go live on 30th Dec, so I am a bit worried.

Thanks in advance,

Marshall

#173440
Dec 27, 2016 11:06
Hi Marshall,

So, I assume you are talking about the built-in Episerver search? It is annoying that it doesn't index everything, but I think that is on purpose because they want people to use Find. Anyway, here is what you do to index custom fields.

First, create an init module:

    // Namespaces as I would expect them for this CMS version; adjust if yours differ
    using System;
    using EPiServer.Core;
    using EPiServer.Framework;
    using EPiServer.Framework.Initialization;
    using EPiServer.Search.IndexingService;
    using Lucene.Net.Documents;

    [InitializableModule]
    [ModuleDependency(typeof(EPiServer.Web.InitializationModule))]
    public class SearchInitialization : IInitializableModule
    {
        public void Initialize(InitializationEngine context)
        {
            // Hook in just before a document is added to the index
            IndexingService.DocumentAdding += CustomizeIndexing;
        }

        public void Uninitialize(InitializationEngine context)
        {
            IndexingService.DocumentAdding -= CustomizeIndexing;
        }

        void CustomizeIndexing(object sender, EventArgs e)
        {
            var addUpdateEventArgs = e as AddUpdateEventArgs;

            if (addUpdateEventArgs == null)
            {
                return; // Document is not being added/updated
            }

            var document = addUpdateEventArgs.Document;

            var page = document.GetContent<IContent>() as PageData;

            // "as" yields null for any page that is not a SitePageData
            var examplePage = page as SitePageData;

            if (examplePage != null && examplePage.YourCustomField != null)
            {
                // Analyzed (tokenized) so it is searchable; the raw value is not stored
                document.Add(new Field("CUSTOM_FIELD", examplePage.YourCustomField, Field.Store.NO, Field.Index.ANALYZED));
            }
        }
    }

Here is the GetContent extension method that is used above to set the page variable:

    // Namespaces as I would expect them for this CMS version; adjust if yours differ
    using System;
    using EPiServer;
    using EPiServer.Core;
    using EPiServer.ServiceLocation;
    using Lucene.Net.Documents;

    public static class DocumentHelper
    {
        public static T GetContent<T>(this Document document) where T : IContent
        {
            const string fieldName = "EPISERVER_SEARCH_ID";

            var fieldValue = document.Get(fieldName);

            if (string.IsNullOrWhiteSpace(fieldValue))
            {
                throw new NotSupportedException(
                    string.Format("Specified document did not have a '{0}' field value", fieldName));
            }

            // The ID field is '|'-separated; the first fragment is the content GUID
            var fieldValueFragments = fieldValue.Split('|');

            Guid contentGuid;

            if (!Guid.TryParse(fieldValueFragments[0], out contentGuid))
            {
                throw new NotSupportedException(
                    "Expected first part of ID field to be valid GUID");
            }

            return ServiceLocator.Current.GetInstance<IContentLoader>().Get<T>(contentGuid);
        }
    }
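
One caveat with GetContent as written: it throws when the ID field is missing or does not start with a GUID, and because it is called from the DocumentAdding handler, that exception would surface while indexing. If that worries you, a non-throwing version of the parsing step could look roughly like the sketch below. SearchIdParser and TryParseContentGuid are hypothetical names of my own, not part of the Episerver API, and the snippet only uses System so it can be tried on its own. I am also assuming that whatever follows the '|' can simply be ignored here.

```csharp
using System;

public static class SearchIdParser
{
    // Hypothetical helper (not an Episerver API): extracts the content GUID
    // from a '|'-separated ID field value without throwing.
    // Returns false, and sets contentGuid to Guid.Empty, when the value is
    // empty or its first fragment is not a valid GUID.
    public static bool TryParseContentGuid(string fieldValue, out Guid contentGuid)
    {
        contentGuid = Guid.Empty;

        if (string.IsNullOrWhiteSpace(fieldValue))
        {
            return false;
        }

        // Only the first fragment is inspected; anything after '|' is ignored
        var fragments = fieldValue.Split('|');

        return Guid.TryParse(fragments[0], out contentGuid);
    }
}
```

In the handler you could then return early when TryParseContentGuid gives false, rather than letting the exception propagate.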

Here is an example of a search query using the custom field:

    public void Search(string q)
    {
        var culture = ContentLanguage.PreferredCulture.Name;
        SearchResult = new List<IndexResponseItem>();

        var query = new GroupQuery(LuceneOperator.AND);

        // Only search for pages
        query.QueryExpressions.Add(new ContentQuery<PageData>());

        // Search for keywords in any of the fields specified below (OR condition)
        var keywordsQuery = new GroupQuery(LuceneOperator.OR);

        // Search in default fields
        keywordsQuery.QueryExpressions.Add(new FieldQuery(q));

        // Search in the custom fields (CustomFieldQuery is not shown in this post)
        keywordsQuery.QueryExpressions.Add(new CustomFieldQuery(q, "CUSTOM_FIELD"));

        query.QueryExpressions.Add(keywordsQuery);

        // The access control list query will remove any pages the user doesn't have read access to
        var accessQuery = new AccessControlListQuery();
        accessQuery.AddAclForUser(PrincipalInfo.Current, HttpContext.Current);
        query.QueryExpressions.Add(accessQuery);

        var fieldQueryResult = SearchHandler.Instance.GetSearchResults(query, 1, 40)
            .IndexResponseItems
            //.Where(x =>
            //    (x.Culture.Equals(culture) || string.IsNullOrEmpty(x.Culture))
            //    )
            .ToList();

        SearchResult.AddRange(fieldQueryResult);
    }

Lastly, when you update the field in your scheduled job, update the index:

    var contentSearchHandler = ServiceLocator.Current.GetInstance<ContentSearchHandler>();
    contentSearchHandler.UpdateItem(currentPage as IContent);

I hope this helps!

- John

#173537
Edited, Dec 30, 2016 20:20