Learn
There is no question that website traffic from Search Engines
is an important marketing consideration. And links from other websites are
considered an integral part of a comprehensive Search Engine Optimization (SEO)
strategy.
Those who research how Search Engines rank website pages will
tell you that good SEO is a combination of 3 primary ingredients:
- On-page factors
- Off-page factors
- Page Rank (Google); Web Rank (Yahoo)
On-page factors are the attributes of a specific
website page. The most important is keyword density in the body, title and
description tags. Google claims to evaluate 130 attributes of every page, but
it is unlikely that all of these attributes influence its Search Engine Results
Pages (SERPs).
But all the important search engines have acknowledged that
webmasters can manipulate their on-page factors to an extent that makes this
ingredient a secondary factor in their ranking algorithm.
Off-page factors are the links that appear on other
website pages and are considered by many as the most important ingredient for
2 reasons.
- They are considered a 3rd party reference of your website.
- Links on someone else's website are difficult to manipulate.
Page Rank is a quantifiable method of determining the interconnection of any one website page to other website pages. The more connections or links, the higher the Page Rank score. This quantifiable measure
is patented by Google and named after one of the Google founders, Mr. Lawrence
Page.
Yahoo claims to have a similar measure called Web Rank available
on their toolbar, but at the time of this writing, it is not functional and
little is known about how the number is calculated.
The rest of this document is devoted to a discussion of Off-page
factors, their importance, construction, reputation and relevancy.
Link importance -
As discussed briefly above, links that reside on other websites are considered
a very important ingredient by all the major search engines. That is because
a lot of information is implied in a link, and links that reside on another
website are difficult to obtain and manipulate.
A link to your website is often described as a vote for your website. The
assumption is that other webmasters are very particular about whom they link to.
Webmasters prefer linking to websites that they believe their visitors would
find interesting and that they want to be associated with.
Since links are under the control of another webmaster, search engines consider
them less prone to manipulation and, compared to on-page factors, a better indicator
of what a website page is really about.
But this does not mean that off-page factors are completely free of manipulation. In fact, the necessity of links for search engine rankings has given birth to a cottage
industry. Link exchange businesses that will obtain links from other websites
on your behalf are now common. Some webmasters have created multiple websites
for the express purpose of linking these sites together, thus forming a link
ring. Others have created directory websites for exchanging links, known as
"link farms" or "free link communities".
But there is another problem that the search engines don't talk about too much. Sometimes a legitimate website might link to another website with or without
knowledge that the other has adult content. Search engines may rank the legitimate
website well for its keywords. But if the legitimate website has an outbound
link to an adult content site, visitors may find themselves on a website page
they did not bargain for. These searchers hold the search engine accountable
for ranking a website that is linked or 'associated with' an adult content site.
Search Engines know that the SEO community understands that links are important
and has developed creative ways to give the Search Engines what they want.
So it is not surprising that Search Engines have deployed and are continuing
to develop new methods of filtering out "junk links" (defined by GoogleGuy) and
sites containing links to adult content websites.
Link construction -
There are 2 attributes of every link.
- The visible portion: either a graphic or text.
- The href portion: HTML code containing a URL that points to another website page, also known as the target page.
Links that are used in advertisements are most often constructed
with affiliate URLs. These URLs commonly have tracking codes ('?' and/or '='
characters or long numeric strings) that Search Engines detect. (Example found
on MSN: http://g.msn.com/0AD0000Q/611135.1??PID=2124716&UIT=G&TargetID=1001195&AN=25713&PG=INVIHS). Since these types of links are not considered unbiased, Search Engines do
not count them.
Links that do not have tracking codes or variables in the URL
are referred to as 'static URLs'. Search engines count these links but the
value of the link depends on the visible portion.
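The distinction between affiliate and static URLs can be pictured with a small heuristic check. The Python sketch below is illustrative only: the function name `looks_like_tracking_url` and the six-digit threshold are assumptions made for this example, and it does not claim to reproduce how any search engine actually classifies URLs.

```python
import re
from urllib.parse import urlsplit


def looks_like_tracking_url(url: str) -> bool:
    """Heuristic guess at whether a URL carries tracking codes.

    Hypothetical rule set for illustration:
    - any query string ('?' with 'name=value' pairs) counts as tracking
    - any path segment with 6 or more consecutive digits counts as tracking
      (the threshold of 6 is an arbitrary assumption)
    """
    parts = urlsplit(url)
    if parts.query:
        return True
    return any(re.search(r"\d{6,}", segment) for segment in parts.path.split("/"))


if __name__ == "__main__":
    affiliate = ("http://g.msn.com/0AD0000Q/611135.1??PID=2124716&UIT=G"
                 "&TargetID=1001195&AN=25713&PG=INVIHS")
    static = "http://www.flower-baskets.com/roses.html"
    print(looks_like_tracking_url(affiliate))  # True
    print(looks_like_tracking_url(static))     # False
```

A real filter would need to be far more careful, since many ordinary pages also use query strings.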
Graphic links are pictures that may contain symbols or words,
but in either case, the search engines do not understand what the graphic may
be saying. They cannot read graphics. So there is no additional value in graphic
links.
| Visible portion | Href portion: affiliate URL | Href portion: static URL |
| Graphic link | No value | Some value, counts toward Page Rank |
| Text link (URL as the visible text) | No value | Better value |
| Text link (words as the visible text) | No value | Best value |
Text links give the search engines something to read. The
words that are clickable are directly associated with and anchored to the href
portion, thus the name 'anchor text'. Search engines even have a search command
dedicated to anchor text. In Google, typing the search command 'allinanchor:keyword'
(replace keyword with a specific keyword phrase; e.g. allinanchor:dog)
will return website pages that have links whose anchor-text matches the keyword
(presumably ordered from most to least matching). Yahoo's command is slightly different: 'inurl:keyword'.
Search engines associate the anchor-text of a link with the target page. As
the number of links with specific anchor-text increases, 'link reputation' is
formed.
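To make the idea of accumulating anchor-text concrete, here is a minimal Python sketch that uses the standard library's `html.parser` to pull the href and anchor-text out of each link and tally them per target page. The class name `AnchorCollector` and the function `link_reputation` are hypothetical names chosen for this example; the tally is only a rough stand-in for whatever the search engines actually compute.

```python
from collections import Counter, defaultdict
from html.parser import HTMLParser


class AnchorCollector(HTMLParser):
    """Collects (href, anchor_text) pairs from <a href="..."> elements."""

    def __init__(self):
        super().__init__()
        self.links = []      # list of (href, anchor_text)
        self._href = None    # href of the link currently being read, if any
        self._text = []      # text fragments seen inside the current link

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            # Collapse runs of whitespace so "flower  baskets" == "flower baskets"
            text = " ".join("".join(self._text).split())
            self.links.append((self._href, text))
            self._href = None


def link_reputation(pages):
    """Tally anchor-text per target URL across a list of HTML strings."""
    reputation = defaultdict(Counter)
    for html in pages:
        collector = AnchorCollector()
        collector.feed(html)
        collector.close()
        for href, text in collector.links:
            if text:  # graphic links contribute no readable anchor-text
                reputation[href][text.lower()] += 1
    return reputation


if __name__ == "__main__":
    pages = [
        '<a href="http://www.flower-baskets.com">Flower baskets</a>',
        '<a href="http://www.flower-baskets.com">flower  baskets</a> '
        '<a href="http://www.flower-baskets.com"><img src="logo.gif"></a>',
    ]
    for target, counts in link_reputation(pages).items():
        print(target, dict(counts))
```

In this toy data the graphic link is skipped because it has no text for the parser to read, which mirrors the point made above about graphic links.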
Link reputation -
Link reputation is defined by the anchor-text associated with links. The more
link reputation a website page has, the stronger its rankings in the SERPs (particularly
in Google).
It is this simple principle that has led many webmasters to choose domain
names that have keywords in the domain separated by "-" or "_" marks. If a
domain name (e.g. www.flower-baskets.com)
is used as the link anchor-text and the domain name has your primary keywords
in it, then your link reputation will increase. This only works if your domain
name contains your keywords. Note that keywords in your domain name must be
separated by "-" or "_" marks, otherwise, the search engines will not be able
to detect distinct words in the domain name. Likewise, if your primary keywords
are "Botanical buckets", a domain name like www.flower-baskets.com
will not contribute to your desired reputation.
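The word-splitting argument can be illustrated with a few lines of Python. This is only a sketch of the reasoning in the text: the helper names `domain_keywords` and `domain_supports_phrase` are invented for the example, and the rule of taking the first label after "www" as the domain name is a simplifying assumption.

```python
import re


def domain_keywords(hostname: str) -> list:
    """Split the name part of a hostname on '-' or '_' separators.

    Assumption for illustration: the name is the first label after an
    optional leading 'www'.
    """
    labels = hostname.lower().split(".")
    if labels and labels[0] == "www":
        labels = labels[1:]
    name = labels[0] if labels else ""
    return [word for word in re.split(r"[-_]", name) if word]


def domain_supports_phrase(hostname: str, phrase: str) -> bool:
    """True if every word of the keyword phrase appears as a separate
    word in the domain name."""
    words = set(domain_keywords(hostname))
    return all(word in words for word in phrase.lower().split())


if __name__ == "__main__":
    print(domain_keywords("www.flower-baskets.com"))   # ['flower', 'baskets']
    print(domain_keywords("www.flowerbaskets.com"))    # ['flowerbaskets']
    print(domain_supports_phrase("www.flower-baskets.com", "Flower baskets"))     # True
    print(domain_supports_phrase("www.flower-baskets.com", "Botanical buckets"))  # False
```

Without a separator, the name comes back as a single token, which is the case the text warns about.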
Link relevancy -
Search Engines assume that websites with the highest number of strong reputation
links from other related websites must be the 'authorities' and deserve better
rankings. Yet, Search Engines do not currently detect if a link is from a related
website.
Up until mid 2003, a link was a link - no matter where it was from. As long
as the search engines crawled and indexed a page with a link on it, and
the link had good reputation, it was a good link. So some webmasters exploited
this 'loophole' and got links from non-related websites. GoogleGuy calls these
types of links 'junk links'. Now, Search Engines are deploying and developing
methods to filter out 'junk links'.
Although still disputed, Google appears to be filtering links based on class
C IP address. The premise for this filter is based on Google's 2003 patent.
Just because something is mentioned in a patent does not mean the patent holder
has to employ the methods. But observational evidence does seem to support
the theory. It appears that Google may be either filtering or at least de-rating
links from the same class C IP address.
IP filtering does make sense. First, the method can be used to either filter
or de-rate the value of links on the same domain. Second, the method can be
used to detect multiple links from another website.
Lastly, links that reside on the same class C IP would necessarily be on the
same host and have a high probability of being a related website (mirror, clone,
or site authored by the same webmaster / company). This filter would thwart
efforts of webmasters that construct a number of websites, host all the sites
on the same host provider, then link them all together to boost their link reputation.
It would not affect links from websites that are hosted on different hosting
providers. But this latter approach is a lot more painful and may have technical
issues if each website is served from a common database.
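As a rough picture of what such a filter might look like, the Python sketch below groups linking sites by the first three octets of their IP address (treated here as a /24 block, which is how "class C" is commonly used in SEO discussions). The function names and the sample addresses are assumptions for the example, and whether Google does anything like this is, as noted above, still disputed.

```python
import ipaddress
from collections import defaultdict


def class_c_block(ip: str) -> ipaddress.IPv4Network:
    """Return the /24 network that contains the given IPv4 address."""
    return ipaddress.ip_network(f"{ip}/24", strict=False)


def group_by_class_c(linking_sites: dict) -> dict:
    """Group linking sites (site name -> IP address) by their /24 block."""
    blocks = defaultdict(list)
    for site, ip in linking_sites.items():
        blocks[class_c_block(ip)].append(site)
    return dict(blocks)


if __name__ == "__main__":
    # Hypothetical sites using documentation-only IP ranges
    linking_sites = {
        "site-one.example": "192.0.2.17",
        "site-two.example": "192.0.2.201",     # same /24 as site-one
        "elsewhere.example": "198.51.100.5",
    }
    for block, sites in group_by_class_c(linking_sites).items():
        # A filter could count each block once, or de-rate the extra links
        print(block, sites)
```

Under this assumption, the three links above would count as coming from only two independent sources.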
Google has also created a website 'black list'. This black list appears to
be made up of websites that have violated Google's website policies. The list
is not available to the public so it is hard to know who is 'naughty' and who
is 'nice'. Several webmasters have reported cases of website penalties
when a legitimate website has a link to a black listed site.
But in order to determine if a link is from a relevant website, search engines
need to do more. Google has already started to roll out Topic Sensitive Page
Rank (TSPR). Other search engines may use a form of website classifications.
Whether TSPR or website classifications, both share a common technical obstacle:
Search Engines must figure out a way to determine the meaning of a search phrase
and relate the phrase to some form of classification. For example, should a
link from the Java tourist bureau website count as a relevant link to a website
about Java scripting?
If you type "java" as a search term, should the search engine
return websites about:
-
the island named Java (a regional
category)?
-
coffee (a lifestyle category)?
-
a programming script (computer
category)?
Consider the search term "web ring". Are links from nature sites relevant? What about "closed cell"? Are links from websites about our judicial system relevant? Perhaps the user is referring to special research about living cells
– or a terrorist cell?
Many of the words we use have several meanings. It is context that determines
the meaning or 'sense' of specific words. Perhaps this is the reason 'personalized
search' is such a hot topic these days. For it is 'personalized search' that
relates keywords with website classifications, and website classifications allow
the search engines to qualify related links.
Google is already knee deep into TSPR. Google has launched 2 beta projects.
Each of these beta programs was introduced in the first half of 2004 and gives
us a glimpse of the power of TSPR. In order to understand how TSPR works, we
need a little background on PR (Page Rank).
The equation for PageRank is:
PR(A) = (1-d) + d(PR(t1)/C(t1) + ... + PR(tn)/C(tn))
Where:
't1 - tn' are the pages linking to 'A'
'C' is the number of outbound links on that page
'd' is a damping factor, typically set to 0.85.
In order to run the equation, you must know all the pages linking
to 'A' and the PageRank of each of these linking pages. The only way to perform
this calculation is to iterate: run the equation many times, typically 20-40
times. In order to determine the true PageRank of any page in Google's index,
you must be Google: have an index of how all website pages are connected to
each other.
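The iteration described above can be sketched in a few lines of Python. This is a toy version under stated assumptions: the link graph is a small hand-made dictionary, every page starts at a score of 1.0, and the function name `pagerank` is chosen for the example. It applies the published equation directly and says nothing about how Google implements it at scale.

```python
def pagerank(links, d=0.85, iterations=40):
    """Iterate PR(A) = (1-d) + d * sum(PR(t)/C(t)) over pages t linking to A.

    links: dict mapping each page to the list of pages it links to.
    """
    pages = set(links) | {t for targets in links.values() for t in targets}

    # C(t): number of distinct outbound links on each page
    out_count = {p: len(set(links.get(p, []))) for p in pages}

    # For each page, the pages that link to it
    inbound = {p: [] for p in pages}
    for source, targets in links.items():
        for target in set(targets):
            inbound[target].append(source)

    pr = {p: 1.0 for p in pages}  # starting guess (an assumption)
    for _ in range(iterations):
        pr = {
            p: (1 - d) + d * sum(pr[t] / out_count[t] for t in inbound[p])
            for p in pages
        }
    return pr


if __name__ == "__main__":
    # Hypothetical three-page web: A links to B and C, B links to C, C links to A
    toy_web = {"A": ["B", "C"], "B": ["C"], "C": ["A"]}
    for page, score in sorted(pagerank(toy_web).items()):
        print(page, round(score, 3))
```

In this toy graph, page C ends up with a higher score than page B, because C receives a share of A's vote plus all of B's vote, while B receives only half of A's.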
In layman's terms, PageRank is a measure of the interconnection
or popularity of any one website page as measured by links from other website
pages. It is commonly thought of as 'votes'. The more links or 'votes' from
other website pages, the higher the PageRank score. But there is one more
element to the equation. The value of a 'vote' from a website page is divided
by the number of outbound links (total 'votes'). So then links or 'votes'
from pages that have fewer outbound links cast more of their voting power
to a page.
The important thing to note is the use of the damping factor of 0.85. We don't
really know the exact value that Google uses, but we do know that there must
be a value here (perhaps +/- 0.05). If we assume that 0.85 is the actual value,
then we can say that Page Rank only accounts for 85% of the possible value. So what about the 15% remaining? It really does not matter much if all website
pages are calculated in the same manner. But what is important is that there
is a 15% component that can be added to each website page and this component
can be the TS (Topic Sensitive) portion of TSPR. Another way of saying it is
that 85% of a calculated value is PR, 15% is TS.
So how is TS figured out? You can read the original author's paper here: http://dbpubs.stanford.edu:8090/pub/2002-6,
or read on for a more simplified explanation.
The theory of Topic Sensitivity starts with the assumption that there are authority
websites for a specific subject or keyword phrase. These authority websites
link out to other websites and other websites link to more websites. Every
time there is an outbound link, there is a component of that link that had its
origins from the authority websites. The topic sensitivity that is passed through
the link is defined by where the upstream websites got their links.
It may be easier to think of Topic Sensitivity as a 'bloodline'. Some dogs
are pure breeds, but most dogs are mutts, a combination of bloodlines. Mutts
may have blood from a few or many different dog types just as websites have
links from different website categories. If a dog has a lot of bloodline from
golden retrievers, then presumably that dog would be better at hunting than
other dogs. Likewise, if links to a website are primarily health related, then
the website would rank better for health related keyword phrases.
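The 'bloodline' idea can be sketched by changing one term of the PageRank equation: instead of giving every page the same (1-d) share, only the authority pages for a topic receive it, and that share then flows downstream through their links. The Python below is a simplified illustration loosely in the spirit of the paper linked above; the function name `topic_pagerank`, the scaling of the bias term, and the toy link graph are all assumptions, and it is not a description of Google's actual algorithm.

```python
def topic_pagerank(links, topic_pages, d=0.85, iterations=40):
    """PageRank variant where the (1-d) share goes only to topic authorities.

    links: dict mapping each page to the list of pages it links to.
    topic_pages: non-empty collection of pages assumed to be authorities
                 for one topic.
    """
    topic_pages = set(topic_pages)
    pages = set(links) | {t for ts in links.values() for t in ts} | topic_pages

    out_count = {p: len(set(links.get(p, []))) for p in pages}
    inbound = {p: [] for p in pages}
    for source, targets in links.items():
        for target in set(targets):
            inbound[target].append(source)

    # Assumed scaling: the total bias equals the number of pages, split
    # evenly among the authority pages; all other pages get zero.
    share = len(pages) / len(topic_pages)
    bias = {p: (share if p in topic_pages else 0.0) for p in pages}

    score = dict(bias)
    for _ in range(iterations):
        score = {
            p: (1 - d) * bias[p]
            + d * sum(score[s] / out_count[s] for s in inbound[p])
            for p in pages
        }
    return score


if __name__ == "__main__":
    # Hypothetical web with two unrelated clusters
    toy_web = {
        "health_hub": ["clinic"],
        "clinic": ["health_hub"],
        "casino_hub": ["poker"],
        "poker": ["casino_hub"],
    }
    health = topic_pagerank(toy_web, topic_pages={"health_hub"})
    for page, value in sorted(health.items()):
        print(page, round(value, 3))
```

Pages reachable from the health authority pick up a health score, while the unrelated cluster gets none, which is the sense in which a link carries the 'bloodline' of the sites upstream of it.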
Therefore, prior to the mid 2003 era, Google rankings were largely dependent
on what others said you were (link reputation). In the new era, your
rankings will be dependent on who is saying what you are (link
relevancy and link reputation).
Yahoo and MSN do not have TSPR, but they can employ similar schemes. Their
strategy could involve the classification of websites and the correlation of
keyword phrases with each category. Then Yahoo and MSN could determine which
links to your website are relevant to your subject matter and filter out the
benefits of non-relevant links.
This may sound easy enough, but the implementation of TSPR or website classification
strategies is extremely complex. First consider that every keyword phrase
must be classified. Then every website page must be classified. Now do this
across 4+ BILLION website pages that are continually being modified, changed, added
and deleted, and deliver search engine result pages (SERPs) in less than 0.25
seconds. The classification task alone is enormous, requiring extensive computing
power and the most advanced algorithms in semantics (the study of words and their
meanings).
Google appears to be way ahead in this process. They have already created
beta programs that can be used to seed their keyword classification process. It is likely that Google is allowing the public to use these beta programs so
that they can classify keyword phrases in real time. If the data is statistically
significant, this process can be used to loop back into their Topic Sensitivity
algorithm. The more it is used, the better it gets.
Other search engines may need to wait for advances in computer technology and
semantics. The best semantics algorithms have an accuracy of about 40% compared
to the 'gold standard'. This is not a very encouraging value but it is far
better than results just 2 years ago; so much better that industry is starting
to apply significant resources.
Even with semantics advances, systems must be scalable. 64 bit microprocessors
are well suited to semantics calculations and will have a significant impact
on calculation speeds over current 32 bit microprocessors. In coming years,
Search Engines will be able to take advantage of lower cost 64 bit microprocessors
and semantic algorithm advances that will directly affect SERP quality.