Deane Barker
Sep 1, 2009

EPiServerSearchMeta

Over the last few years, I’ve done four implementations of the Google Mini search appliance. This is a piece of hardware (a 1U rack mount) that acts as a search crawler and engine.

It crawls your Web site (or whatever else you point it at) 24 hours a day. You can throw queries at it via a REST interface and get results back as XML. (You can also transform the XML on the device itself and use it to present results directly to the end user, but this is awkward and requires you to duplicate your site’s interface on another machine, which is never fun.)
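For anyone who hasn’t used the Mini’s query interface, here’s a rough Python sketch of the round trip. It assumes the parameter names from the Google Search Appliance search protocol (q, site, client, output, num), a hypothetical host name, the device’s default collection and front end names, and a trimmed, made-up XML response using the GSP/RES/R result elements. Check all of these against your own device before relying on them.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# Hypothetical appliance address; substitute your Mini's host name.
MINI_HOST = "http://mini.example.com"


def build_query_url(query, collection="default_collection",
                    frontend="default_frontend", num=10):
    """Build a search URL for the appliance's HTTP query interface."""
    params = {
        "q": query,
        "site": collection,      # collection to search (assumed name)
        "client": frontend,      # front end to use (assumed name)
        "output": "xml_no_dtd",  # raw XML results rather than a rendered page
        "num": num,              # maximum number of results to return
    }
    return MINI_HOST + "/search?" + urlencode(params)


def parse_results(xml_text):
    """Return (url, title) pairs from each <R> result element."""
    root = ET.fromstring(xml_text)
    return [(r.findtext("U"), r.findtext("T")) for r in root.iter("R")]


# A trimmed, made-up response in the appliance's XML result format.
SAMPLE_RESPONSE = """<GSP VER="3.2">
  <Q>deane</Q>
  <RES SN="1" EN="1">
    <M>1</M>
    <R N="1">
      <U>http://www.example.com/news/deane-saves-the-world/</U>
      <T>Deane Saves the World</T>
    </R>
  </RES>
</GSP>"""

print(build_query_url("deane saves the world"))
print(parse_results(SAMPLE_RESPONSE))
```

In a real integration you’d fetch the URL with your HTTP client of choice and hand the response body to parse_results, keeping all of the presentation on your own Web server.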

The device is quite good for text-heavy search, and retails for $2,995, making it a cheap solution for a lot of situations.

The Mini can do fairly granular searching of META tags (search protocol reference). Over the years, we’ve figured out that you should stack as many META tags as possible into your pages, because you never know what you’re going to want to search on. If, for instance, your client wants to limit a search to news articles, it helps to have a META tag holding the content type. (Alternatively, you could create a distinct collection on the device, but maintaining collections can be tedious.)
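To make the META filtering concrete: the search protocol’s requiredfields parameter restricts results to pages whose META tags match name:value pairs, with "." combining constraints as AND and "|" as OR. The sketch below assumes a hypothetical META tag named contenttype and builds only the query-string portion, which you would combine with the rest of your search parameters.

```python
from urllib.parse import urlencode


def required_fields(pairs, match_all=True):
    """Join (name, value) META constraints for the requiredfields parameter.

    In the search protocol, "." combines constraints with AND and
    "|" combines them with OR.
    """
    joiner = "." if match_all else "|"
    return joiner.join(f"{name}:{value}" for name, value in pairs)


def news_search_params(query):
    """Query-string parameters limiting a search to news pages.

    "contenttype" is a hypothetical META name; use whatever your
    pages actually emit.
    """
    return urlencode({
        "q": query,
        "requiredfields": required_fields([("contenttype", "news")]),
    })


print(news_search_params("budget"))  # q=budget&requiredfields=contenttype%3Anews
```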

For another CMS, we developed a control that dumped all sorts of META tags into the HEAD of the page. We refined it over the years to run only for the Mini: finding and returning all this information had become computationally expensive, and only the Mini needed it (public search engines, for instance, did not).

For our first EPiServer/Mini integration, we adapted the control a bit, but the functionality is roughly the same: it dumps all sorts of information to META tags, including any properties you specify.

Register it like this:

<%@ Register TagPrefix="Blend" Namespace="Blend.EPiServer.Controls" Assembly="[insert your assembly name here]" %>

Then put the control in the HEAD tag like this:

<Blend:EPiServerSearchMeta TagNameFormat="MySite.EPiServer.{0}" UserAgentString="gsa" QuerystringCode="OpenSesame" Properties="Title,Summary" runat="server" />

It will only run when the currently executing page is of type TemplatePage (so, only for EPiServer templates that have a content object attached).

The control outputs the following information:

  • The page ID
  • The page type ID
  • The page type name
  • The page name
  • The parent page ID
  • The parent page type ID
  • The parent page type name
  • Every page ID from the current page’s parent back to the start page (in multiple META tags)
  • The depth of the page (the start page is 0, top-level pages are 1, etc.)
  • Any properties listed in the Properties attribute (described below)

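The ancestor and depth values come from walking up the page tree. The real control does this against EPiServer’s page API; the sketch below is a language-neutral Python version in which a plain dictionary of child-to-parent IDs (a hypothetical stand-in for the CMS) reproduces the sample output that follows: page 9 sits under 8, which sits under 7, which sits under the start page, 3.

```python
def ancestors_and_depth(page_id, parent_of, start_page_id):
    """Walk parent links from page_id up to the start page.

    Returns (ancestor_ids, depth). ancestor_ids runs from the immediate
    parent back to the start page; depth counts levels below the start
    page (start page = 0, top-level pages = 1, ...). A page outside the
    start page's tree raises KeyError when the chain runs out.
    """
    ancestors = []
    current = page_id
    while current != start_page_id:
        current = parent_of[current]
        ancestors.append(current)
    return ancestors, len(ancestors)


# Hypothetical tree matching the sample output: 3 (start) -> 7 -> 8 -> 9
PARENTS = {9: 8, 8: 7, 7: 3}
print(ancestors_and_depth(9, PARENTS, 3))  # ([8, 7, 3], 3)
```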
It looks like this:

<meta name="MySite.EPiServer.PageID" content="9" />
<meta name="MySite.EPiServer.PageTypeID" content="7" />
<meta name="MySite.EPiServer.PageTypeName" content="NewsArticle" />
<meta name="MySite.EPiServer.PageName" content="Deane Saves the World" />
<meta name="MySite.EPiServer.ParentPageID" content="8" />
<meta name="MySite.EPiServer.ParentTypeID" content="5" />
<meta name="MySite.EPiServer.ParentTypeName" content="NewsArchive" />
<meta name="MySite.EPiServer.AncestorID" content="8" />
<meta name="MySite.EPiServer.AncestorID" content="7" />
<meta name="MySite.EPiServer.AncestorID" content="3" />
<meta name="MySite.EPiServer.PageDepth" content="3" />
<meta name="MySite.EPiServer.Category" content="7" />
<meta name="MySite.EPiServer.Category" content="9" />
<meta name="MySite.EPiServer.Category" content="13" />
<meta name="MySite.EPiServer.Category" content="15" />
<meta name="MySite.EPiServer.Category" content="16" />

The control takes a few attributes:

TagNameFormat is the format of the “name” attribute of the resulting META tag. So, in the above example, the page type ID is output as:

<meta name="MySite.EPiServer.PageTypeID" content="7" />
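TagNameFormat is a .NET composite format string, and Python’s str.format happens to accept the same {0} placeholder, so this sketch uses the attribute value unchanged. The function name is hypothetical, and the HTML escaping of attribute values is an addition in this sketch, not necessarily something the original control does.

```python
from html import escape


def meta_tag(name_format, name, value):
    """Render one META tag, substituting the tag name into name_format."""
    full_name = name_format.format(name)  # "MySite.EPiServer.{0}" -> "MySite.EPiServer.PageTypeID"
    # Escape both attribute values so quotes or ampersands can't break the markup.
    return '<meta name="{0}" content="{1}" />'.format(
        escape(full_name, quote=True), escape(str(value), quote=True))


print(meta_tag("MySite.EPiServer.{0}", "PageTypeID", 7))
# <meta name="MySite.EPiServer.PageTypeID" content="7" />
```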

Properties is a comma-delimited list of properties you want to dump to META. Choose these carefully: the entire text of the content object, for example, is unnecessary and potentially problematic. The control simply calls ToWebString() on each of them, so make sure that outputs what you want. Also, if a property is a Category selection, the control splits the category IDs into separate tags.
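Here’s a rough sketch of how the Properties list might be expanded. The page is a hypothetical dictionary standing in for the EPiServer page, list values stand in for Category selections, and str() stands in for ToWebString(). Skipping blank or missing property names is an assumption about behavior the post doesn’t describe.

```python
def property_meta(properties_attr, page):
    """Expand the comma-delimited Properties attribute into (name, value) pairs."""
    pairs = []
    for raw_name in properties_attr.split(","):
        name = raw_name.strip()
        if name not in page:
            continue  # assumption: blank or missing properties are skipped
        value = page[name]
        if isinstance(value, (list, tuple)):
            # Category selection: one pair (one META tag) per category ID.
            pairs.extend((name, str(item)) for item in value)
        else:
            pairs.append((name, str(value)))  # stands in for ToWebString()
    return pairs


page = {"Title": "Deane Saves the World", "Category": [7, 9, 13]}
print(property_meta("Title, Category, Summary", page))
```

Each pair would then be written out as its own META tag using TagNameFormat.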

UserAgentString is used to identify the crawler. Enter a value that appears only in your crawler’s user agent string; “gsa” works well for the Mini. If the control finds this value in the request’s user agent string, it executes; otherwise, it exits without doing anything.

QuerystringCode is a secret code you can use to debug the control. If a querystring argument called “show_meta” carries this value (for example, ?show_meta=OpenSesame with the settings above), the control always executes, regardless of the user agent string, so you can see the META it outputs.
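Together with the TemplatePage check, these two attributes decide whether the control runs at all. Here’s a hedged Python sketch of that decision. The function name and parameters are hypothetical, and the case-insensitive user agent match is an assumption; the original control may compare differently.

```python
from urllib.parse import parse_qs, urlsplit


def should_emit_meta(is_template_page, user_agent, url,
                     user_agent_string="gsa", querystring_code="OpenSesame"):
    """Decide whether to write the META tags for this request."""
    if not is_template_page:
        return False  # only pages with a content object attached
    query = parse_qs(urlsplit(url).query)
    if querystring_code in query.get("show_meta", []):
        return True  # debugging override: ignores the user agent
    # Assumption: case-insensitive substring match on the user agent.
    return user_agent_string.lower() in (user_agent or "").lower()
```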

Get the Code (.zip file, containing a single .cs file)


Comments

Sep 21, 2010 10:32 AM

Awesome stuff Deane. Nice to see you blogging
/ Jacob Khan

joel.williams@auros.co.uk Sep 21, 2010 10:32 AM

Nice article. Have you ever had the Google Mini indexing a document surfaced on the web via EPiServer SharePoint Connect?

Sep 21, 2010 10:32 AM

Joel: I have not, sorry.
/ Deane

joel.williams@auros.co.uk Sep 21, 2010 10:32 AM

no worries, thanks anyway.
