11 ways to make your website more ‘searchable’
We often get asked for hints and tips on how to make websites more ‘searchable’.
First, even less-than-perfect content can be managed well with a suite of good-quality tools such as analytics, results promotion, and strategic auto-completion. These methods are particularly helpful when your resources for correcting content and metadata are limited. But if perfection is your benchmark, read on for 11 ways to improve the quality of your content and encourage better search results.
The basics: for content creators
1. Split large documents into smaller documents
If some of your documents are very long, consider publishing them as separate chapters or sections. Imagine that your organization has an administrative procedures manual (APM) that is 3,000 pages long, and an HR employee enters the search query "long service leave". A PDF file of the whole APM wouldn't be a good answer to the query, even though it contains the best answer, because the HR employee would then need to search through the very large document for what they actually wanted. A far better answer would be a single HTML file containing "Section 13.4.5: Long Service Provisions".
2. Supply correct date metadata
You can achieve this by publishing page-level date metadata in a supported format, or by configuring your web server to send the correct document modified date in the HTTP headers (the Last-Modified header).
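As a rough illustration of both options, the Python sketch below formats one modification time as an HTTP date and as a page-level metadata value. The meta name "date" and the ISO 8601 format are assumptions; check which names and formats your search platform actually supports.

```python
from datetime import datetime, timezone
from email.utils import format_datetime


def http_last_modified(modified: datetime) -> str:
    """Format a timezone-aware datetime as an HTTP date (always expressed in GMT)."""
    return format_datetime(modified.astimezone(timezone.utc), usegmt=True)


def meta_date(modified: datetime) -> str:
    """Format the same instant as an ISO 8601 calendar date for page-level metadata."""
    return modified.astimezone(timezone.utc).date().isoformat()


# Hypothetical modification time for one page.
modified = datetime(2024, 3, 5, 14, 30, tzinfo=timezone.utc)
header_value = http_last_modified(modified)
meta_value = meta_date(modified)

print("Last-Modified:", header_value)
# The meta name "date" is an assumption, not a universal standard.
print(f'<meta name="date" content="{meta_value}">')
```

Deriving both values from a single source of truth keeps the header and the page metadata from drifting apart.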
3. Create concise page titles
Title tags are often used as search result titles, and aid in providing a strong information scent. Titles should aim to be unambiguous, and provide users with a clear indication of the result's content, purpose, and context.
4. Create good quality metadata
Search platforms can be configured to index metadata, and use metadata for display purposes. For example, a metadata abstract can be presented instead of the auto-generated snippet. Good metadata can also be used to provide faceted navigation. Bad metadata is worse than having no metadata at all.
5. Create descriptive link text
Link text is the visible, clickable wording of a hyperlink in your HTML. Avoid using link text like 'More...' or 'Click here…'. Instead, attach the link to descriptive text, for example, 'Read our blog: 5 ways to supercharge your site search'.
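If you want to audit existing pages for vague link text, a small script can help. The following is a minimal sketch using Python's standard html.parser module; the list of vague phrases and the class name LinkTextAuditor are illustrative assumptions, not part of any search product.

```python
from html.parser import HTMLParser

# Assumed list of phrases that carry no information about the link target.
VAGUE_LINK_TEXT = {"more", "click here", "here", "read more"}


class LinkTextAuditor(HTMLParser):
    """Collect (href, text) pairs for links whose text is too vague."""

    def __init__(self):
        super().__init__()
        self._href = None      # href of the link currently being read
        self._text = []        # text fragments inside that link
        self.vague_links = []  # results: list of (href, link text)

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            text = "".join(self._text).strip()
            # Ignore trailing dots and ellipses when comparing.
            if text.lower().strip(" .…") in VAGUE_LINK_TEXT:
                self.vague_links.append((self._href, text))
            self._href = None


auditor = LinkTextAuditor()
auditor.feed(
    '<a href="/blog">Click here…</a> '
    '<a href="/tips">5 ways to supercharge your site search</a>'
)
print(auditor.vague_links)
```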
The deeper end: for developers
6. Avoid excessive reliance on dynamically generated web pages
Search crawlers work by following links. With dynamically generated content, they can potentially miss important pages or clutter up indexes with rubbish. When you do generate pages dynamically, give each page a single, short, human-readable URL.
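One way to keep dynamically generated URLs short and readable is to derive them from the page title. The sketch below is one possible approach in Python; the function names slugify and canonical_url and the example.com address are hypothetical.

```python
import re
import unicodedata


def slugify(title: str) -> str:
    """Turn a page title into a lowercase, hyphen-separated URL segment."""
    # Decompose accented characters and drop anything that is not ASCII.
    ascii_text = (
        unicodedata.normalize("NFKD", title).encode("ascii", "ignore").decode("ascii")
    )
    # Collapse every run of non-alphanumeric characters into one hyphen.
    return re.sub(r"[^a-z0-9]+", "-", ascii_text.lower()).strip("-")


def canonical_url(base: str, section: str, title: str) -> str:
    """Build one stable URL per page, with no session IDs or query strings."""
    return f"{base.rstrip('/')}/{slugify(section)}/{slugify(title)}"


print(canonical_url("https://example.com/", "HR Policies", "Long Service Leave"))
```

Because the URL depends only on the section and title, the same page always gets the same address, which helps a crawler avoid indexing duplicates.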
7. Avoid excessive use of <frameset>s and <frame>s
Most search platforms index the frameset and its component pages separately. When a particular search result is returned, it may appear without the context that the surrounding frameset would have provided.
8. Exclude unsuitable material
Configure your collections (or use robots.txt files) to prevent the crawler from accessing material that isn't suitable for searching. You may wish to exclude mirror sites and directories of non-textual data. Excess material increases disk space usage and slows down crawling, indexing, and query processing. Focusing the material indexed may also improve the quality of results.
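Before deploying robots.txt rules, it can be worth checking that they exclude what you expect. The following sketch uses Python's standard urllib.robotparser module; the rules, the paths, and the crawler name "ExampleCrawler" are made up for illustration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules that exclude a mirror site and a directory of raw images.
ROBOTS_TXT = """\
User-agent: *
Disallow: /mirror/
Disallow: /images/raw/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def is_crawlable(url: str, user_agent: str = "ExampleCrawler") -> bool:
    """Return True if the rules above allow this user agent to fetch the URL."""
    return parser.can_fetch(user_agent, url)


for url in (
    "https://example.com/policies/leave.html",
    "https://example.com/mirror/page.html",
):
    print(url, "->", "crawl" if is_crawlable(url) else "skip")
```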
9. Prevent individual pages that do not contain useful information from being indexed
This may include pages that are useful in a browsing context but are less likely to be appropriate as search results. Examples include A-Z listing pages and mid- and low-level index pages. The robots metadata directive <meta name="robots" content="follow,noindex"/> is appropriate here: it asks the crawler to follow the links on the page without indexing the page itself.
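To confirm that a page carries the directives you intended, you can read its robots metadata back out. This is a minimal sketch with Python's standard html.parser module; the class name RobotsMetaParser and the sample page are illustrative.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collect the directives found in <meta name="robots"> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        # Self-closing <meta .../> tags are also routed here by HTMLParser.
        if tag != "meta":
            return
        attributes = dict(attrs)
        if (attributes.get("name") or "").lower() == "robots":
            content = attributes.get("content") or ""
            self.directives.update(
                part.strip().lower() for part in content.split(",") if part.strip()
            )


def robots_directives(html: str) -> set:
    """Return the set of robots directives declared in the page."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return parser.directives


page = '<html><head><meta name="robots" content="follow,noindex"/></head></html>'
directives = robots_directives(page)
print("index this page:", "noindex" not in directives)
print("follow its links:", "nofollow" not in directives)
```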
10. Exclude portions of a page (such as navigation content)
This might include navigational elements, headers, footers, etc. On some sites, the quality of query-biased result summaries suffers because the summaries include sentences extracted from the site navigation text instead of the main document content. One solution is to add directives to the web pages to indicate that certain sections should not be indexed. Where these pages cannot be modified at the source, the use of a NoIndexFilterInjector is recommended. Note that anchor text is indexed as part of the target document at all times to ensure that ranking quality is not affected.
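To picture the effect of such directives, the sketch below removes marked sections from a page before its text would be summarized. The comment markers <!--noindex--> and <!--endnoindex--> are assumed here for illustration; the exact syntax depends on your search platform, so check its documentation.

```python
import re

# Assumed marker syntax: everything between the two comments is skipped.
NOINDEX_BLOCK = re.compile(
    r"<!--\s*noindex\s*-->.*?<!--\s*endnoindex\s*-->",
    re.DOTALL | re.IGNORECASE,
)


def indexable_html(html: str) -> str:
    """Return the page with every marked no-index section removed."""
    return NOINDEX_BLOCK.sub("", html)


page = (
    "<!--noindex--><nav>Home | About | Contact</nav><!--endnoindex-->"
    "<p>Long service leave accrues after ten years.</p>"
)
print(indexable_html(page))
```

With the navigation block excluded, a result summary can only draw on the main content of the page.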
11. Ensure your webserver serves appropriate status codes
During crawling, any requested URL that returns a 200 OK status code is regarded as a valid page, even if the page itself contains a 'Broken / Not Found' message. Your web server should ensure that broken URLs return a 404 Not Found status code.
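The difference between a correct 404 and a "soft 404" can be sketched in a few lines of Python. The page table, the respond function, and the list of error phrases below are hypothetical; a real site would apply the same idea in its web server or application framework.

```python
from http import HTTPStatus

# Hypothetical site content, keyed by URL path.
PAGES = {"/": "<h1>Home</h1>", "/leave": "<h1>Long service leave</h1>"}


def respond(path: str):
    """Return (status, body) for a path, using 404 for unknown URLs."""
    body = PAGES.get(path)
    if body is None:
        return HTTPStatus.NOT_FOUND, "<h1>Page not found</h1>"
    return HTTPStatus.OK, body


def is_soft_404(status: int, body: str) -> bool:
    """Flag responses that claim success but whose body reads like an error page."""
    error_phrases = ("not found", "broken link")
    return status == HTTPStatus.OK and any(p in body.lower() for p in error_phrases)


status, body = respond("/missing")
print(int(status), status.phrase)
# A response like this one would be indexed as a valid page by a crawler.
print(is_soft_404(200, "<h1>Page Not Found</h1>"))
```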
Your result: Better search for greater customer satisfaction
These guidelines will help you build a site that is highly searchable. A searchable site means an enhanced search experience in Funnelback (and any other search product), plus greater visibility in global search engines such as Yahoo, Google, or Bing. This translates to efficiency gains for employees and easier access to information for customers and stakeholders.