SearchHippo Search Engine Optimization Hints

This section is provided especially for Search Engine Optimization (SEO) folks.

How do I get SearchHippo to spider my site?

SearchHippo currently uses a manually generated list of about 1M URLs, plus the sites listed in the Open Directory. Additional arrangements with some individuals provide extra indexing in special cases (such as .be sites for …). In the past I have also walked some of the sites listed in the Open Directory to add depth, and used "registered domain lists" to increase breadth.

Sites of users who have added a link and completed the email account verification are also spidered and indexed about once every three months. Please note: the "free keyword listing" applies to your specifically chosen keywords and becomes active the day after you verify your email address. These listings are shown only on the SearchHippo site itself. However, once the site has been spidered and indexed, it will also appear in the 'backfill' results, which are syndicated through the Search Web Services interface. That syndicated channel gets roughly 50 times the traffic that SearchHippo receives directly.

How often is the index updated?

SearchHippo's index has six different tiers. The top few tiers are updated each night. The second-to-last tier, consisting of the highest-HipRanked sites, is updated about once a week. The last tier is updated every few months. This means it could be several months before your site is indexed, even if it is listed in the Open Directory.
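As a rough illustration of what this schedule means for a new listing, the sketch below maps each tier to its refresh interval and reports the longest a change could wait. It assumes the "top few" tiers are tiers 1 to 4 (tiers 5 and 6 are the two described separately) and uses 90 days as a stand-in for "every few months"; both are assumptions, not published values.

```python
# Hypothetical model of the six-tier refresh schedule.
# Assumption: tiers 1-4 are the "top few" refreshed nightly, tier 5 weekly,
# and tier 6 "every few months" (90 days is an illustrative guess).
REFRESH_INTERVAL_DAYS = {1: 1, 2: 1, 3: 1, 4: 1, 5: 7, 6: 90}


def worst_case_wait_days(tier: int) -> int:
    """Longest a change could wait before the given tier is next rebuilt."""
    if tier not in REFRESH_INTERVAL_DAYS:
        raise ValueError(f"unknown tier: {tier}")
    return REFRESH_INTERVAL_DAYS[tier]


if __name__ == "__main__":
    for tier in sorted(REFRESH_INTERVAL_DAYS):
        print(f"tier {tier}: up to {worst_case_wait_days(tier)} day(s)")
```

A site that only makes it into the last tier therefore waits on the slowest cycle, which is where the "several months" figure comes from.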

SearchHippo Rank Inputs

A variety of factors with different weights go into SearchHippo's ranking mechanism, which is effectively a combination of HipRank (see below) and the query terms.

Query Terms:
Query term location and lexicon frequency have a significant impact on overall positioning. In order of weight (most heavily weighted first):

  • 'Word Frequency' (how common each word is in the lexicon: the less common the word, the more likely a page containing it is to appear. Very common words may be ignored.)
  • Site Title
  • 'Phrases' (see below)
  • URL (including domain and path)
  • Derived description (the top visible text of a page, but also including meta keywords and meta description as per below)
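One way to picture how word rarity and location combine is the sketch below. It is not SearchHippo's actual formula: the location weights are hypothetical numbers that only respect the order of the list above, rarity is modeled with an inverse-document-frequency style log factor (a swapped-in, standard technique), and the "very common word" cutoff of half of all pages is an assumption.

```python
import math

# Hypothetical location weights, ordered as in the list above
# (title > phrases > URL > derived description). Values are illustrative.
LOCATION_WEIGHTS = {"title": 4.0, "phrase": 3.0, "url": 2.0, "description": 1.0}

# Assumed cutoff: a word found on more than half of all pages is treated as
# "very common" and ignored.
COMMON_WORD_FRACTION = 0.5


def term_score(locations: set, doc_freq: int, total_docs: int) -> float:
    """Score one query word on one page.

    locations:  location names (keys of LOCATION_WEIGHTS) where the word occurs
    doc_freq:   number of indexed pages containing the word
    total_docs: total number of indexed pages
    """
    if doc_freq == 0 or doc_freq / total_docs > COMMON_WORD_FRACTION:
        return 0.0  # unknown or very common word contributes nothing
    rarity = math.log(total_docs / doc_freq)  # rarer words weigh more
    return rarity * sum(LOCATION_WEIGHTS[loc] for loc in locations)
```

For example, `term_score({"title", "url"}, 10, 1000)` scores a fairly rare word found in both the title and the URL well above the same word found only in the description.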

    Phrases: SearchHippo automatically generates "pseudo-phrases" from several locations in a page: anchor text, bold or italic text, H-sections (H1, H2, etc.), and IMG ALT tags. Pseudo-phrases are also generated from the meta keywords section, except that the keywords there MUST be separated by comma or semicolon delimiters. A pseudo-phrase is two non-stop words located right next to each other. The individual words are still indexed, but the word pair is given higher relevance if it is entered as part of the search query.
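A minimal sketch of pseudo-phrase generation follows. The stop-word list is a tiny illustrative stand-in (SearchHippo's list is not given here), "right next to each other" is read as adjacent in the original text (so a stop word between two words breaks the pair), and for meta keywords it assumes that no phrases are produced at all when no comma or semicolon appears.

```python
import re

# Tiny illustrative stop-word list; the real list is not published.
STOP_WORDS = {"a", "an", "and", "the", "of", "for", "to", "in", "on", "with"}


def pseudo_phrases(text: str) -> list:
    """Adjacent pairs of non-stop words from one text fragment
    (anchor text, bold/italic text, a heading, or an IMG ALT value)."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    pairs = []
    for first, second in zip(words, words[1:]):
        # Both words must be non-stop words and directly adjacent.
        if first not in STOP_WORDS and second not in STOP_WORDS:
            pairs.append((first, second))
    return pairs


def meta_keyword_phrases(content: str) -> list:
    """Pseudo-phrases from a meta keywords value, built only within each
    comma- or semicolon-delimited entry. Assumption: with no delimiters,
    no phrases are generated."""
    if "," not in content and ";" not in content:
        return []
    phrases = []
    for entry in re.split(r"[,;]", content):
        phrases.extend(pseudo_phrases(entry))
    return phrases
```

With this reading, `pseudo_phrases("Cheap Flights to Paris")` yields only `("cheap", "flights")`, because "to" separates "flights" from "paris".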

    Each site is given a "HipRank", a number indicating the site's overall prevalence. This ranking is key to determining whether or not a site will appear in the set of sites examined by the query-term heuristic described above. Several inputs go into calculating this number, in order of weight:

  • Number of unique user visits to a particular site by users of the SearchHippo Toolbar.
  • Number of click-throughs delivered by SearchHippo to the site in the past.
  • Number of external links pointing to a site.
  • Manual modification/prioritization by me.
  • Sites requiring frames are penalized (so many people who design sites with frames write me nasty emails when one of their frames is indexed on its own and the link doesn't bring up the navigation or some other piece of their site).
  • Sites linking to SearchHippo get a small increase.
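The inputs above can be pictured as a weighted sum. The sketch below is an assumption-laden illustration, not the real HipRank: the weights are hypothetical values that only follow the stated order, counts are damped with `log1p` so one huge input cannot swamp the others, and the sizes of the frames penalty and the link-back bonus are invented placeholders.

```python
import math
from dataclasses import dataclass


@dataclass
class SiteSignals:
    toolbar_visits: int                 # unique SearchHippo Toolbar user visits
    click_throughs: int                 # past click-throughs from SearchHippo
    external_links: int                 # external links pointing at the site
    manual_adjustment: float = 0.0      # hand-applied boost or penalty
    requires_frames: bool = False
    links_to_searchhippo: bool = False


# Hypothetical weights, decreasing in the order the inputs are listed.
W_VISITS, W_CLICKS, W_LINKS = 3.0, 2.0, 1.0
FRAMES_PENALTY = 2.0     # assumed size of the frames penalty
LINK_BACK_BONUS = 0.25   # assumed "small increase" for linking back


def hiprank(s: SiteSignals) -> float:
    """Illustrative HipRank-style score from the listed inputs."""
    score = (W_VISITS * math.log1p(s.toolbar_visits)
             + W_CLICKS * math.log1p(s.click_throughs)
             + W_LINKS * math.log1p(s.external_links)
             + s.manual_adjustment)
    if s.requires_frames:
        score -= FRAMES_PENALTY
    if s.links_to_searchhippo:
        score += LINK_BACK_BONUS
    return score
```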

    There is also a static adjustment depending on which index tier a result comes from (more frequently refreshed data is more likely to come up than older data).
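A minimal sketch of such a tier adjustment, assuming it is a fixed multiplier per tier: the multiplier values are hypothetical, tiers 1 to 4 share one value because they are all refreshed nightly, and the real adjustment could just as well be additive.

```python
# Hypothetical per-tier multipliers: more frequently refreshed tiers keep more
# of their score. Tiers 1-4 (nightly) share a value; 5 is weekly, 6 slowest.
TIER_MULTIPLIER = {1: 1.0, 2: 1.0, 3: 1.0, 4: 1.0, 5: 0.8, 6: 0.6}


def adjusted_score(score: float, tier: int) -> float:
    """Apply the static, tier-dependent adjustment to a result's score."""
    return score * TIER_MULTIPLIER[tier]
```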

    You can get an interpretation of a site's HipRank from the HipRank Web Service or from the SearchHippo Toolbar.

    SearchHippo Query Processor

    The SearchHippo Query Processor is responsible for mushing together the query terms and figuring out which sites should actually be displayed, and in which order. First, the user's query is broken down into individual words and phrases (using the phrase mechanism above). Each word or phrase is then looked up in the lexicon for each section of the index (as per the 'Query Terms' section above) to produce a (potentially very long) list of 'term locations'. A term location is simply a reference from one of these words or phrases to a particular web page. Term locations are retrieved in descending order of the referred-to pages' HipRank; that is, the highest-HipRanked sites containing a given term are located first and given higher weight. Note that there are several "cutoff points" here: very long lists may be ignored, and lists may be truncated once the HipRank of the referred-to site drops too low. Additionally, the more query terms entered, the less each individual word weighs. The effect is that sites with low HipRanks and mostly non-unique content are unlikely to be found.
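The retrieval step and its cutoffs can be sketched as follows. The index is modeled as a plain dict from term to a list of term locations; both cutoff values are hypothetical, the list is sorted here only for the sake of the example (a real index would store it pre-sorted), and splitting weight evenly as `1 / number_of_terms` is just one assumed way for each term to "weigh less".

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TermLocation:
    term: str        # word or pseudo-phrase from the query
    page: str        # the web page it refers to
    hiprank: float   # HipRank of that page
    section: str     # where on the page the term was found


# Hypothetical cutoff values; the real thresholds are not published.
MAX_LIST_LENGTH = 100_000   # longer lists are skipped as too common
MIN_HIPRANK = 1.0           # stop reading once HipRank falls below this


def retrieve(term, index, max_len=MAX_LIST_LENGTH, min_hiprank=MIN_HIPRANK):
    """Return term locations for one term, highest-HipRank pages first,
    applying the list-length and HipRank cutoffs."""
    postings = index.get(term, [])
    if len(postings) > max_len:
        return []  # very long list: ignored
    ordered = sorted(postings, key=lambda loc: loc.hiprank, reverse=True)
    kept = []
    for loc in ordered:
        if loc.hiprank < min_hiprank:
            break  # sorted descending, so every later entry is lower still
        kept.append(loc)
    return kept


def term_weight(num_query_terms: int) -> float:
    """The more query terms entered, the less each one weighs."""
    return 1.0 / num_query_terms
```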

    Next, match pairs are generated. A match pair consists of the intersection of two term locations as above. Match pairs from sections higher in the 'Query Terms' list are weighted higher still. This data is then merged, using the referred-to page as the merge key. Several counts are tallied at this point, including the number of match pairs referring to a particular page, the types of match pairs, the cumulative match pair weight, etc.

    Finally, the merged list is sorted to produce a list of sites with appropriate weights, and these results are shown to the user.
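The match-pair, merge and sort steps can be sketched together. Here a match pair is taken to be two locations of different query terms on the same page; grouping locations by page first finds exactly those intersections. The section weights are hypothetical values that follow the 'Query Terms' order, and sorting by cumulative weight and then pair count is an assumed tie-break, not the documented one.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical section weights following the 'Query Terms' order.
SECTION_WEIGHT = {"title": 4.0, "phrase": 3.0, "url": 2.0, "description": 1.0}


def merge_match_pairs(term_locations):
    """term_locations: iterable of (term, page, section) tuples.

    Builds match pairs (two different terms on the same page) and merges
    them by page, tallying pair count, pair types and cumulative weight.
    """
    by_page = defaultdict(list)
    for term, page, section in term_locations:
        by_page[page].append((term, section))

    tallies = {}
    for page, locs in by_page.items():
        count, weight, types = 0, 0.0, defaultdict(int)
        for (t1, s1), (t2, s2) in combinations(locs, 2):
            if t1 == t2:
                continue  # the same term twice is not a match pair
            count += 1
            weight += SECTION_WEIGHT[s1] + SECTION_WEIGHT[s2]
            types[tuple(sorted((s1, s2)))] += 1
        if count:
            tallies[page] = {"pairs": count, "weight": weight, "types": dict(types)}
    return tallies


def rank(tallies):
    """Sort pages by cumulative match-pair weight, then by pair count."""
    return sorted(tallies,
                  key=lambda p: (tallies[p]["weight"], tallies[p]["pairs"]),
                  reverse=True)
```

A page where both query words hit the title and URL will outrank one where they only appear in the derived description, while a page matching just one of the words produces no match pair at all.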

    Other info

    You may find the technology overview useful.
    Copyright (c) 2001-2008, (TM)