Orphaned blobs slowly eating AppData disk space

Vote:
 

I recently took over the support of a fairly complex web site solution, currently running CMS 8.4. It's a single instance installation. Rather vanilla setup.

The site polls information from a web service and creates pages containing this information. The pages live for a while, and then they are deleted. The pages also have a "For this page" folder, in which a few image files are placed when the page is created. 

So - we have pages with attached ContentAssetFolders that are created and deleted automatically and continuosusly.

Now, the problem is that the blobs directory in my AppData folder contains about 30 000 blobs that are not referenced in the database (not that I can see, anyway). So it seems that sometimes when pages are deleted, their attached ContentAssetFolders are not.

The scheduled jobs "Remove abandoned blobs" and "Remove unrelated content assets" run on a daily basis and remove lots of items according to their logs.

I know that these items utilise EPiServer's Change Log. Now, the job that automatically truncates the log is indeed activated, however it looks like it hasn't worked since a long time ago. Every run logs a timeout, which I guess is due to the fact that the tblChangeLog contains nearly 20M rows and uses 8 GB of disk space. I'd like to clear this out completely, but I dare not do it until I've solved the orphaned blob issue.

I used the following SQL query to find all references to blob directories in the database. I'm not sure this is an entirely safe way of determining all references to blob directories, but I believe so:

SELECT * FROM [MyClient].[dbo].[tblContent] c
inner join [MyClient].[dbo].[tblContentLanguage] cl
on c.pkID = cl.fkContentID
where c.fkContentTypeID in (select pkid from [MyClient].[dbo].[tblContentType])
and cl.BlobUri is not null

Then I matched all the BlobUris against the folders in AppData\blobs. All the BlobUris had a matching folder in that directory, but some 30k folders in the directory did not have a matching BlobUri and this makes me think that they are orphaned and can be deleted.

The questions I need to answer are the following:

1) Has anyone else experienced this problem and how did you go about solving it?

2) Can I use the strategy described above (compare my SQL query result to the blob directories present in appdata) to determine which blob directories are orphaned and can be safely deleted?

3) How can I stop blobs from being orphaned in the future (this is less important if I can get 2) answered, since I could then just create a scheduled job that does this every now and then).

Any thoughts are most appreciated!

#155137
Sep 09, 2016 10:54
Vote:
 

Interesting, I also have the same issue with one of my clients sites in Episerver cms 8. The scheduled jobs "Remove abandoned blobs" and "Remove unrelated content assets" are also set to run on a daily basis, so that shouldn't the issue. The blob folder also contains several thousands folders which don't have a match in regards to the BlobUris retrieved from the sql query. 

#162813
Oct 17, 2016 14:50
Vote:
 

I proposed a solution to the problem to EPiServer support, and they confirmed it would not be "dangerous", so if it is of any help to you - create a list of all the paths to all the directories in the blobs folder, send them into the following method, and you will get back a list of paths to the directories that may be safely removed.

        private IReadOnlyList<string> GetBlobsToRemove(IReadOnlyList<string> allBlobPaths)
        {
            var blobsToRemove = new List<string>();
            var mapper = ServiceLocator.Current.GetInstance<IPermanentLinkMapper>();

            foreach (var blobPath in allBlobPaths)
            {
                this.ThrowIfStopped();
                var guid = Guid.Parse(blobPath.Split('\\').Last());
                var map = mapper.Find(guid);
                if (null == map)
                {
                    blobsToRemove.Add(blobPath);
                }
            }

            return blobsToRemove;
        }
#162816
Oct 17, 2016 14:57
Vote:
 

Great Henrik, thanks a lot! Will definitely try it out then. 

/Christian

#162820
Oct 17, 2016 14:59
Vote:
 

Hi

We have a similar issue but are using EPiCode.SqlBlobProvider, which means, that all images are placed in DDS. Do you know how I can identify all orphan blobs in DDS, I tried with this, but I'm not 100% sure it is correct?

SELECT count(*)
  FROM [CustomerDb].[dbo].[tblBigTable] where Indexed_String01 like 'epi.fx.blob://%' and Indexed_String01 not in (
    SELECT cl.BlobUri FROM [CustomerDb].[dbo].[tblContent] c 
    inner join [CustomerDb].[dbo].[tblContentLanguage] cl 
    on c.pkID = cl.fkContentID 
    where c.fkContentTypeID in (select pkid from [CustomerDb].[dbo].[tblContentType]) 
    and cl.BlobUri is not null
  )

#282183
Jun 21, 2022 7:23
* You are NOT allowed to include any hyperlinks in the post because your account hasn't associated to your company. User profile should be updated.