private string siteId;

[Ignore]
public string SiteId
{
    get
    {
        if (string.IsNullOrWhiteSpace(this.siteId))
        {
            this.siteId = "5EBC1E97-CC5A-4251-A2F6-E04A05E5C4DC";
        }
        return this.siteId;
    }
    set { this.siteId = value; }
}
This is my current fix (adding it to Node, Product and Variation), but the Find scheduled job should still be fixed.
It's best that SiteId matches the actual site ID, so that old content is still deleted.
It deletes everything that has no SiteId AND a timestamp in the past (before the indexing started), in order to clear the index of items that have been removed.
I would focus on the 413 Request Entity Too Large error and dial the batch size down somewhat. Look at this:
http://antecknat.se/blog/2015/02/23/convention-for-episerver-find-to-ignore-large-files/
OK, let's say I add:
ContentIndexer.Instance.ContentBatchSize = 10;
Next question: why does the 'EPiServer Find Content Indexing Job' need to index everything twice (look at line 54 of the code), and how can I get rid of that, as it doubles everything?
// Decompiled with JetBrains decompiler
// Type: EPiServer.Find.Cms.Job.IndexingJob
// Assembly: EPiServer.Find.Cms, Version=11.1.2.4113, Culture=neutral, PublicKeyToken=8fe83dea738b45b7
// MVID: 97EFDC61-6868-439D-949B-8F9FA6949EAF
// Assembly location: C:\Projects\xxx\src\xxx\Bin\EPiServer.Find.Cms.dll

using EPiServer.BaseLibrary.Scheduling;
using EPiServer.Find.Cms;
using EPiServer.Find.Cms.BestBets;
using EPiServer.PlugIn;
using EPiServer.ServiceLocation;
using EPiServer.Web;
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Threading;

namespace EPiServer.Find.Cms.Job
{
  [ScheduledPlugIn(Description = "This indexing job is used to reindex all content. During normal operation changes to content are being indexed as they are made without rerunning or scheduling of this job.", DisplayName = "EPiServer Find Content Indexing Job", LanguagePath = "/EPiServer/Find/indexingJob", SortIndex = 10100)]
  public class IndexingJob : JobBase
  {
    private static readonly object jobLock = new object();
    private bool stop;

    public Injected<EPiServer.Web.SiteDefinitionRepository> SiteDefinitionRepository { get; set; }

    public IndexingJob()
    {
      this.IsStoppable = true;
    }

    public override string Execute()
    {
      SiteDefinition current = SiteDefinition.Current;
      try
      {
        Func<SiteDefinition, string> getNameOfDefinition = (Func<SiteDefinition, string>) (sd =>
        {
          if (SiteDefinition.Empty == sd)
            return "Global assets and other data";
          return sd.Name;
        });
        if (!Monitor.TryEnter(IndexingJob.jobLock))
          throw new ApplicationException("Indexing job is already running.");
        try
        {
          string str1 = string.Empty;
          if (Enumerable.Any<SiteDefinition>(this.SiteDefinitionRepository.Service.List()))
          {
            foreach (SiteDefinition siteDefinition in Enumerable.Concat<SiteDefinition>(this.SiteDefinitionRepository.Service.List(), (IEnumerable<SiteDefinition>) new SiteDefinition[1] { SiteDefinition.Empty }))
            {
              SiteDefinition.Current = siteDefinition;
              this.stop = false;
              StringBuilder statusReport = new StringBuilder();
              ContentIndexer.ReIndexResult reIndexResult = ContentIndexer.Instance.ReIndex((Action<ContentIndexer.ReIndexStatus>) (s =>
              {
                if (s.IsError)
                  statusReport.AppendLine(EPiServer.Find.Helpers.Text.StringExtensions.StripHtml(s.Message));
                this.OnStatusChanged("Indexing job [" + getNameOfDefinition(SiteDefinition.Current) + "] [content]: " + EPiServer.Find.Helpers.Text.StringExtensions.StripHtml(s.Message));
              }), new Func<bool>(this.IsStopped));
              str1 = str1 + "Indexing job [" + getNameOfDefinition(SiteDefinition.Current) + "] [content]: " + EPiServer.Find.Helpers.Text.StringExtensions.StripHtml(reIndexResult.PrintReport()).Replace("\n", "<br />") + "<br />";
              if (statusReport.Length > 0)
                str1 = str1 + statusReport.ToString().Replace("\n", "<br />") + "<br />";
              string str2 = ExternalUrlBestBetHandlers.ReindexExternalUrlBestBets();
              if (str2.Length > 0)
                str1 += str2;
            }
          }
          else
            str1 += "No sites have been configured. Please go to the 'Manage Websites' section to add a site configuration.";
          return str1;
        }
        finally
        {
          Monitor.Exit(IndexingJob.jobLock);
        }
      }
      finally
      {
        SiteDefinition.Current = current;
      }
    }

    public override void Stop()
    {
      this.stop = true;
    }

    public bool IsStopped()
    {
      return this.stop;
    }
  }
}
No, that's rushing the conclusion...
Each parallel thread will take ten batches (but no more than 1000 items), and on line 117:
foreach (IEnumerable<IContent> source in this.Batch<IContent>((IEnumerable<IContent>) list2, this.ContentBatchSize))
  this.IndexBatch((IEnumerable<IContent>) Enumerable.ToList<IContent>(source), statusAction, ref numberOfContentErrors, ref indexingCount);
Your second question relates to how it indexes site-specific data versus the global assets and other data. The items are provided by the IReIndexInformation implementing class(es). It will first index all items under a (specific) site root. Later it will index all items that live outside of a site root; these include global assets and other data. That's why it's named so.
It should not double anything. If you have duplicates in your index, please give examples. Thanks!
Yes, you are right, the use of ContentBatchSize is correct. I was rushing, sorry about that!
So it is true, as @ksjoberg says, that commerce data added using IReIndexInformation is indexed only in the 'site' reindex and not in the 'Global assets and other data' reindex.
The problem here was that, as a workaround to stop the 'Global assets and other data' reindex from deleting commerce data, I had added:
private Guid siteId;

[Ignore]
public Guid SiteId
{
    get
    {
        if (this.siteId == Guid.Empty)
        {
            this.siteId = Guid.Parse("5EBC1E97-CC5A-4251-A2F6-E04A05E5C4DC");
        }
        return this.siteId;
    }
    set { this.siteId = value; }
}
Commerce data was still deleted. What I should in fact have added is:
private string siteId;

[Ignore]
public string SiteId
{
    get
    {
        if (string.IsNullOrWhiteSpace(this.siteId))
        {
            this.siteId = "5EBC1E97-CC5A-4251-A2F6-E04A05E5C4DC";
        }
        return this.siteId;
    }
    set { this.siteId = value; }
}
The difference is between string and Guid.
If it is a Guid, then this request:
DELETE http://es-eu-dev-api01.episerver.net/xxxx/mysite/_query HTTP/1.1
Content-Type: application/json
User-Agent: EPiServer-Find-NET-API/11.1.2.4113
Host: es-eu-dev-api01.episerver.net
Content-Length: 396
Expect: 100-continue
Accept-Encoding: gzip, deflate

{
  "filtered": {
    "query": {
      "constant_score": {
        "filter": {
          "and": [
            {
              "range": {
                "GetTimestamp$$date": {
                  "from": "0001-01-01T00:00:00Z",
                  "to": "2016-03-18T11:29:19.5380677Z",
                  "include_lower": true,
                  "include_upper": false
                }
              }
            },
            {
              "or": [
                { "term": { "SiteId$$string": "00000000-0000-0000-0000-000000000000" } },
                { "not": { "filter": { "exists": { "field": "SiteId$$string" } } } }
              ]
            }
          ]
        }
      }
    },
    "filter": {
      "term": { "___types": "EPiServer.Core.IContent" }
    }
  }
}
deletes all commerce data. If it is a string, the commerce data is skipped, as it looks to:
Also, the original response has been modified to use string instead of Guid.
However, the question now remains:
Is it normal practice that I need to add a property (string SiteId) to entities that are added by IReIndexInformation, so that they are not deleted by the 'Global assets and other data' reindex job?
Hi,
Let me break the query down that you posted above:
{
  "filtered": {
    "query": {
      "constant_score": {
        "filter": {
          "and": [
            {
              "range": {
                "GetTimestamp$$date": {
                  "from": "0001-01-01T00:00:00Z",
                  "to": "2016-03-18T11:29:19.5380677Z",
                  "include_lower": true,
                  "include_upper": false
                }
              }
            },
            {
              "or": [
                { "term": { "SiteId$$string": "00000000-0000-0000-0000-000000000000" } },
                { "not": { "filter": { "exists": { "field": "SiteId$$string" } } } }
              ]
            }
          ]
        }
      }
    },
    "filter": {
      "term": { "___types": "EPiServer.Core.IContent" }
    }
  }
}
What it does: it deletes all IContent documents that were indexed before this indexing run started AND that either have the SiteId field set to Guid.Empty OR have no SiteId field at all.
So to answer your question: No, adding that property is not required. Performing that query will only delete items that were not indexed this time around, allowing you to perform a full index without emptying the index first.
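To make the filter easier to reason about, here is a minimal sketch that restates it as a plain C# predicate over a document's indexed fields. It is not Find code: the class and method names are made up for illustration, the `___types` restriction is left out (every document is assumed to be IContent), and the field name "SiteId$$other" used in the usage note is a hypothetical stand-in for whatever name a non-string SiteId property would be indexed under.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper, not part of EPiServer Find.
public static class DeleteFilterSketch
{
    const string SiteIdField = "SiteId$$string";
    const string EmptyGuid = "00000000-0000-0000-0000-000000000000";

    // Mirrors the posted query: the timestamp lies before the start of the run
    // (upper bound exclusive) AND the SiteId$$string field is either the empty
    // GUID or missing altogether.
    public static bool WouldBeDeleted(
        IDictionary<string, string> fields,
        DateTime timestampUtc,
        DateTime runStartedUtc)
    {
        bool indexedBeforeRun = timestampUtc < runStartedUtc;

        string siteId;
        bool hasSiteId = fields.TryGetValue(SiteIdField, out siteId);
        bool siteIdEmptyOrMissing = !hasSiteId || siteId == EmptyGuid;

        return indexedBeforeRun && siteIdEmptyOrMissing;
    }
}
```

Under these assumptions, a document indexed before the run with no "SiteId$$string" entry matches the filter, and so does one whose SiteId is stored only under another field name such as "SiteId$$other". A document carrying a non-empty "SiteId$$string" value does not match, and neither does any document whose timestamp is at or after the start of the run.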
The Find reindex job runs two sub-jobs:
1.) Indexing job [mysite] [content] (I will call it the first job)
2.) Indexing job [Global assets and other data] [content] (I will call it the second job)
1.)
The first job takes a timestamp when it starts and then reindexes all site content. That includes commerce data as well (which inherits from IContent), since it is manually added using IReIndexInformation.
Afterwards the first job runs the delete and correctly deletes only old data.
2.)
The second job takes a timestamp.
It looks at images, files and other data.
Afterwards it deletes everything that is IContent and has an old timestamp.
As commerce data is indexed in the first job, and by default does not have a SiteId property, all the commerce data is deleted,
unless that property is added to the variation, product and node entities.
Does this clarify it?
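The sequence above can be replayed with a small in-memory model. This is only a sketch of the behaviour described in this thread, not of the Find implementation: the types, names and timestamps are invented, a null SiteId stands for "no SiteId field in the index", and it is an assumption that CMS pages carry a site-specific SiteId.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical in-memory stand-in for an indexed document.
public sealed class IndexedDoc
{
    public string Name;
    public string SiteId;        // null models a document without a SiteId field
    public DateTime Timestamp;   // when the document was last indexed
}

public static class ReindexSketch
{
    const string EmptyGuid = "00000000-0000-0000-0000-000000000000";
    const string MySite = "5EBC1E97-CC5A-4251-A2F6-E04A05E5C4DC";

    // Cleanup at the end of the second job, as described in the thread:
    // remove documents older than the start of this job that have an
    // empty or missing SiteId.
    static void GlobalCleanup(List<IndexedDoc> index, DateTime jobStartedUtc)
    {
        index.RemoveAll(d => d.Timestamp < jobStartedUtc
                             && (d.SiteId == null || d.SiteId == EmptyGuid));
    }

    // Returns the names of the documents left in the index after both jobs.
    public static List<string> Run(bool commerceHasSiteId)
    {
        var index = new List<IndexedDoc>();
        var firstJobStart = new DateTime(2016, 3, 18, 11, 0, 0, DateTimeKind.Utc);

        // First job: CMS content and commerce content are indexed together.
        index.Add(new IndexedDoc { Name = "StartPage", SiteId = MySite, Timestamp = firstJobStart.AddMinutes(1) });
        index.Add(new IndexedDoc
        {
            Name = "Variation",
            SiteId = commerceHasSiteId ? MySite : null,
            Timestamp = firstJobStart.AddMinutes(2)
        });

        // Second job: starts later and indexes only global assets.
        var secondJobStart = firstJobStart.AddMinutes(10);
        index.Add(new IndexedDoc { Name = "GlobalImage", SiteId = null, Timestamp = secondJobStart.AddMinutes(1) });

        GlobalCleanup(index, secondJobStart);
        return index.Select(d => d.Name).ToList();
    }
}
```

In this model, `ReindexSketch.Run(false)` leaves only StartPage and GlobalImage in the index, because the variation was indexed before the second job started and has no SiteId. `ReindexSketch.Run(true)` keeps the variation as well, which corresponds to the workaround of adding a string SiteId property.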
Hi,
Thank you, I think I got it. We're looking into it. It is not expected that the first (site-specific) indexing job would add items to the index that are non-site-specific, but as you are saying, it is.
Just to verify. Going back to your last post; you're saying that the second job doesn't index commerce data (again)?
Correct, it does not, and that is what I expect, as it is done by the first job (so only once).
Using EPiServer.Find version 11.1.2.4113
This is what is logged in scheduled job history:
It consists of two parts:
Problem: while the job is running, commerce data is added to the index, and as a last step all commerce data is deleted, even though it was just added to the index.
This is how it looks in Fiddler:
For 'mysite' reindex:
a lot of:
requests that index CMS data and that also include commerce data
as a last step this is sent over the wire:
So far everything is fine; it appears that only old content is removed, which is correct.
Then the second part is run: the 'Global assets and other data' reindex
also a lot of:
requests, some of them fail with:
This should not be related, as only a few items are missing (and all the commerce data was indexed in the first part).
Then, as a last request, the following is sent:
And what that does is:
delete everything that is IContent and does not have the SiteId property in the index, so it deletes all of the commerce data (Nodes, Products and Variations, as all of them inherit from IContent)
Something similar is fixed in the current version: http://world.episerver.com/documentation/Release-Notes/ReleaseNote/?releaseNoteId=FIND-811
but a similar problem still exists.