SearchHippo Search Engine Optimization Hints
This section is provided especially for Search Engine Optimization (SEO) folk.
How do I get SearchHippo to spider my site?
SearchHippo currently uses a manually generated list of about 1M URLs, plus
sites listed in the Open Directory.
There are additional arrangements with some individuals to provide extra
indexing in special cases (such as .be sites for
Zoek.be). In the past I have walked
some of the sites listed in the Open Directory to add depth, and have also used
"registered domain lists" to increase breadth.
Users who have added a link and completed the email
account verification are also spidered and indexed about once every three
months. Please note: the "free keyword listing" applies to
your specifically chosen keywords and becomes active the day after you verify
your email address. These listings are shown only on the searchhippo.com site.
However, once the site has been spidered and indexed, it will appear in
the 'backfill', which is syndicated through the
Search Web Services interface. This has roughly
50 times the traffic of searchhippo.com directly.
How often is the index updated?
SearchHippo's index has six different tiers. The top few tiers are updated
each night. The second-to-last tier (consisting of the highest HipRanked
sites) is updated about once a week. The last tier is updated every few months.
This means that it could be several months before your site is indexed, even
if it is listed in the Open Directory.
SearchHippo Rank Inputs
There are a variety of factors with different weights that go into
SearchHippo's ranking mechanism, effectively a combination of HipRank (see
below) and the Query Terms.
Query Terms:
Query term location and lexicon frequency have a significant impact on overall
positioning. In order of weight (most significantly weighted first):
'Word Frequency' (how common each word is in the lexicon: the less common the
word, the more likely a matching page is to appear. Very common words may be ignored.)
Site Title
'Phrases' (see below)
URL (including domain and path)
Derived description (the top visible text of a page, but also including
meta keywords and meta description as per below)
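To make the weighting order above more concrete, here is a minimal sketch assuming word frequency acts as an IDF-style rarity multiplier and each location carries a fixed weight. The IDF formula, the numeric weights, the `COMMON_WORD_FRACTION` cutoff and the function names are all hypothetical; only the ordering (title > phrases > URL > derived description, rare words count more, very common words are dropped) comes from the list above.

```python
import math

# Hypothetical location weights: only their ORDER comes from the list above
# (title > phrases > URL > derived description); the numbers are made up.
LOCATION_WEIGHTS = {
    "title": 4.0,
    "phrase": 3.0,
    "url": 2.0,
    "description": 1.0,
}

# Hypothetical cutoff: words found in more than half of all documents
# are treated as "very common" and ignored.
COMMON_WORD_FRACTION = 0.5


def rarity(doc_freq: int, total_docs: int) -> float:
    """IDF-style rarity: the less common a word is, the larger the value."""
    if doc_freq == 0 or doc_freq / total_docs > COMMON_WORD_FRACTION:
        return 0.0  # unknown or very common word: contributes nothing
    return math.log(total_docs / doc_freq)


def term_score(location: str, doc_freq: int, total_docs: int) -> float:
    """Score one occurrence of a query word in one location of a page."""
    return rarity(doc_freq, total_docs) * LOCATION_WEIGHTS[location]


print(term_score("title", doc_freq=10, total_docs=1_000_000))
print(term_score("description", doc_freq=10, total_docs=1_000_000))
print(term_score("title", doc_freq=900_000, total_docs=1_000_000))  # 0.0
```

A multiplicative model was chosen here for simplicity; SearchHippo may well combine these factors differently.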
Phrases: SearchHippo automatically generates "pseudo-phrases" from
various locations in pages: anchor text, bold or italic text, H-sections (H1, H2, etc.), and IMG ALT tags. Pseudo-phrases are also generated from the meta
keywords section, except that comma or semicolon delimiters MUST appear in the
meta keywords section. A pseudo-phrase is two non-stop words located right
next to each other. The individual words are still indexed, but the word pair
is given a higher relevancy if it is entered as part of the search query.
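A minimal sketch of pseudo-phrase generation, assuming "right next to each other" means adjacent in the original text (so a stop word in between breaks the pair). The stop-word list is hypothetical, and so is the reading of the meta keywords rule: here a keywords value with no comma or semicolon yields no phrases, and words are only paired within a single delimited keyword.

```python
import re

# Hypothetical stop-word list; the real list is not given in the text.
STOP_WORDS = {"a", "an", "and", "for", "in", "is", "of", "on", "the", "to"}


def pseudo_phrases(text: str) -> list[tuple[str, str]]:
    """Pairs of words that sit right next to each other and are not stop words."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    return [
        (first, second)
        for first, second in zip(words, words[1:])
        if first not in STOP_WORDS and second not in STOP_WORDS
    ]


def meta_keyword_phrases(content: str) -> list[tuple[str, str]]:
    """Pseudo-phrases from a meta keywords value.

    Assumption: without a comma or semicolon no phrases are produced, and
    with delimiters, words are only paired inside one keyword entry.
    """
    if not re.search(r"[,;]", content):
        return []
    phrases: list[tuple[str, str]] = []
    for keyword in re.split(r"[,;]", content):
        phrases.extend(pseudo_phrases(keyword))
    return phrases


print(pseudo_phrases("Search Engine Optimization for the web"))
# [('search', 'engine'), ('engine', 'optimization')]
print(meta_keyword_phrases("hippo search, web services; seo"))
# [('hippo', 'search'), ('web', 'services')]
print(meta_keyword_phrases("hippo search web services seo"))
# []
```

Note how the delimiters keep unrelated neighbours such as "search web" from becoming a phrase.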
HipRank:
Each site is given a "HipRank", a number indicating a site's overall
prevalence. This ranking is key to determining whether or not a site will
appear in the set of sites examined by the aforementioned heuristic. There
are several different inputs to calculating this number, in order of weight:
Number of unique user visits to a particular site by users of the SearchHippo Toolbar.
Number of click-throughs delivered by SearchHippo to the site in the past.
Number of external links pointing to a site.
Manual modification/prioritization by me.
Sites requiring frames are penalized (because so many of the people who design sites with frames write me nasty emails when a single frame is indexed and the link doesn't provide navigation or some other piece of their site).
Sites linking to SearchHippo get a small increase.
There is a static adjustment made depending on which index tier the result
comes from (more frequently refreshed data is more likely to come up than
older data).
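The inputs above can be pictured as a weighted combination. This is a minimal sketch only: the weights, the `log1p` damping, the frames penalty, the link-back bonus and the per-tier factors are all hypothetical, as is the assumption that tier 1 is the most frequently refreshed. Only the relative order of the inputs, the frames penalty, the small link-back increase and the existence of a static tier adjustment come from the text.

```python
import math
from dataclasses import dataclass


@dataclass
class SiteSignals:
    toolbar_visits: int           # unique visits by SearchHippo Toolbar users
    click_throughs: int           # past click-throughs from SearchHippo
    external_links: int           # external links pointing to the site
    manual_boost: float = 0.0     # manual prioritization (hypothetical scale)
    requires_frames: bool = False
    links_to_searchhippo: bool = False


# Hypothetical weights; only their order follows the list above.
W_VISITS, W_CLICKS, W_LINKS = 3.0, 2.0, 1.0
FRAMES_PENALTY = 0.5   # hypothetical multiplier for sites requiring frames
LINK_BACK_BONUS = 0.1  # hypothetical "small increase"

# Hypothetical static per-tier factor, assuming tier 1 is the most
# frequently refreshed and tier 6 the least.
TIER_FACTOR = {1: 1.0, 2: 0.95, 3: 0.9, 4: 0.85, 5: 0.8, 6: 0.7}


def hiprank(s: SiteSignals) -> float:
    """Combine the inputs; log1p damps very large raw counts (assumption)."""
    score = (
        W_VISITS * math.log1p(s.toolbar_visits)
        + W_CLICKS * math.log1p(s.click_throughs)
        + W_LINKS * math.log1p(s.external_links)
        + s.manual_boost
    )
    if s.links_to_searchhippo:
        score += LINK_BACK_BONUS
    if s.requires_frames:
        score *= FRAMES_PENALTY
    return score


def tier_adjusted(score: float, tier: int) -> float:
    """Apply the static adjustment for the index tier a result came from."""
    return score * TIER_FACTOR[tier]


site = SiteSignals(toolbar_visits=500, click_throughs=40, external_links=120)
framed = SiteSignals(500, 40, 120, requires_frames=True)
print(hiprank(site) > hiprank(framed))  # True
print(tier_adjusted(hiprank(site), 1) > tier_adjusted(hiprank(site), 6))  # True
```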
You can get an interpretation of the HipRank of a site from the
HipRank Web Service or from the SearchHippo Toolbar.
SearchHippo Query Processor
The SearchHippo Query Processor is responsible for mushing together the query
terms and figuring out which sites should actually be displayed in which order.
First, the user's query is broken down into individual words
and phrases (using the phrase mechanism above). Each word/phrase is then
looked up in the lexicon for the different sections of the index (as per the
'Query Terms' section above) to produce a (potentially very long) list of
'term locations'. A term location is simply a reference from one of these words
or phrases to a particular web page. Term locations are retrieved in
descending order of the referred-to web pages' HipRank. That is, sites
with the highest HipRank containing the same term are located first and given
higher weights. It is important to note that there are several "cutoff
points" here: very long lists may be ignored, and lists may be
truncated when the HipRank of the referred-to site drops too low.
Additionally, the more query terms entered, the less each individual word
weighs. The effect of this is that sites with low HipRanks and mostly
non-unique content are unlikely to be found.
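The retrieval step, with its two cutoff points and the per-term weight that shrinks as more terms are entered, might look roughly like this minimal sketch. The `TermLocation` type, both cutoff values and the choice of `hiprank / num_query_terms` as the weight are hypothetical; a real index would also store each list pre-sorted by HipRank rather than sorting at query time.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TermLocation:
    term: str
    page: str
    hiprank: float


# Hypothetical cutoffs; the real values are not published.
MAX_LIST_LENGTH = 100_000  # longer lists are ignored entirely
MIN_HIPRANK = 1.0          # stop reading once HipRank falls below this


def retrieve(
    term: str,
    postings: dict[str, list[TermLocation]],
    num_query_terms: int,
) -> list[tuple[TermLocation, float]]:
    """Return (term location, weight) pairs for one term, best HipRank first."""
    locations = postings.get(term, [])
    if len(locations) > MAX_LIST_LENGTH:
        return []  # very long list: ignored
    per_term = 1.0 / num_query_terms  # more query terms -> each weighs less
    result = []
    for loc in sorted(locations, key=lambda item: item.hiprank, reverse=True):
        if loc.hiprank < MIN_HIPRANK:
            break  # truncated: the remaining pages rank even lower
        result.append((loc, per_term * loc.hiprank))
    return result


postings = {
    "hippo": [
        TermLocation("hippo", "a.example", 5.0),
        TermLocation("hippo", "b.example", 0.5),
        TermLocation("hippo", "c.example", 3.0),
    ]
}
for loc, weight in retrieve("hippo", postings, num_query_terms=2):
    print(loc.page, weight)
# a.example 2.5
# c.example 1.5
```

Here `b.example` never makes it into the list because its HipRank falls below the cutoff, which illustrates why low-HipRank sites are hard to find.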
Next, match pairs are generated. A match pair consists of the intersection of
two term locations as above. Match pairs of higher precedence in the 'Query
Terms' section are weighted higher still. This data is then merged using the
"referred-to" page as the merge key. Several counts are tallied here,
including the number of match pairs referring to a particular page, the type
of match pairs, the cumulative match pair weight, etc.
Finally, this merged list is sorted to produce a list of sites with
appropriate weights, and these results are shown to the user.
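The match-pair, merge and sort steps can be sketched together. This minimal sketch assumes a match pair is two term locations for *different* query terms that refer to the same page, and that a pair's weight is the sum of its two location weights scaled by hypothetical section multipliers (ordered as in the 'Query Terms' list). The tally fields mirror the counts named above; the tuple layout, the multipliers and the sort key are illustrative assumptions, not SearchHippo's actual code.

```python
from collections import defaultdict, namedtuple
from itertools import combinations

# One term location: a query term found in one section of one page, with
# the weight it received during retrieval.
TermLocation = namedtuple("TermLocation", "term page section weight")

# Hypothetical section multipliers; only their order follows the
# "Query Terms" list (title > phrase > url > description).
SECTION_WEIGHT = {"title": 4.0, "phrase": 3.0, "url": 2.0, "description": 1.0}


def rank_pages(locations):
    """Build match pairs, merge them per page and sort pages by weight."""
    by_page = defaultdict(list)
    for loc in locations:
        by_page[loc.page].append(loc)  # merge key: the referred-to page

    tallies = []
    for page, locs in by_page.items():
        pair_count = 0
        pair_types = defaultdict(int)
        pair_weight = 0.0
        for a, b in combinations(locs, 2):
            if a.term == b.term:
                continue  # assumption: a pair joins two *different* terms
            pair_count += 1
            pair_types[tuple(sorted((a.section, b.section)))] += 1
            # Pairs from higher-precedence sections weigh more.
            pair_weight += (a.weight + b.weight) * (
                SECTION_WEIGHT[a.section] * SECTION_WEIGHT[b.section]
            )
        tallies.append((page, pair_weight, pair_count, dict(pair_types)))

    # Final step: sort by cumulative pair weight, then by number of pairs.
    tallies.sort(key=lambda t: (t[1], t[2]), reverse=True)
    return tallies


locs = [
    TermLocation("hippo", "a.example", "title", 1.0),
    TermLocation("search", "a.example", "title", 1.0),
    TermLocation("hippo", "b.example", "description", 1.0),
    TermLocation("search", "b.example", "url", 1.0),
    TermLocation("hippo", "c.example", "title", 1.0),
]
for page, weight, count, _types in rank_pages(locs):
    print(page, weight, count)
# a.example 32.0 1
# b.example 4.0 1
# c.example 0.0 0
```

Pages matching only one query term (like `c.example`) produce no match pairs in this sketch and sink to the bottom.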
Other info
You may find the technology overview useful.
Copyright (c) 2001-2008, SearchHippo.com (TM)