Introduction
Banning spiders and agents
Search engine spider identification
Further learning resources:
The following is a basic listing of search engine spider names and their "owners". It is by no means complete, as there are many thousands of search engines on the Internet, but it covers the more common beneficial spiders. Look for these names in your traffic reports, or search your server logs for them to discover which pages they have been spidering. You'll find that many of the entries also carry version numbers or letters, e.g. Googlebot/2.1 or Slurp.so/1.0.
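As a rough illustration of searching server logs for spider names, here is a minimal Python sketch. It assumes each log line ends with the user agent in double quotes, as in the common "combined" log layout; the function name, the sample addresses, and the two-name list are made up for this example and are not part of any particular server or tool.

```python
import re
from collections import defaultdict

# Hypothetical short list of spider names; extend it with the names you care about.
SPIDER_NAMES = ["Googlebot", "Slurp"]

# Assumed log layout: a quoted request ("GET /page HTTP/1.0") somewhere in the
# line, and the user agent as the final quoted field.
LOG_LINE = re.compile(
    r'"(?:GET|HEAD|POST) (?P<path>\S+)[^"]*".*"(?P<agent>[^"]*)"\s*$'
)

def pages_by_spider(lines):
    """Return {spider name: [requested paths]} for the given log lines."""
    hits = defaultdict(list)
    for line in lines:
        match = LOG_LINE.search(line)
        if not match:
            continue  # skip lines that do not fit the assumed layout
        agent = match.group("agent").lower()
        for name in SPIDER_NAMES:
            if name.lower() in agent:
                hits[name].append(match.group("path"))
    return dict(hits)

# Made-up sample lines for demonstration only.
sample = [
    '192.0.2.1 - - [10/Oct/2005:13:55:36 -0700] "GET /index.html HTTP/1.0" 200 2326 "-" "Googlebot/2.1"',
    '192.0.2.2 - - [10/Oct/2005:13:56:01 -0700] "GET /about.htm HTTP/1.0" 200 512 "-" "Slurp.so/1.0"',
    '192.0.2.3 - - [10/Oct/2005:13:57:12 -0700] "GET /index.html HTTP/1.0" 200 2326 "-" "Mozilla/4.0"',
]
print(pages_by_spider(sample))
```

Matching is done on a lowercase substring so that version suffixes such as "/2.1" or ".so/1.0" do not get in the way.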
Introduction
Home-grown solutions
In the early days of the web, sites were usually built with primitive text editors, one page at a time, one painstaking HTML tag after another. Yeah, back then we also walked five miles to school, uphill both ways. And we liked it.
Current web building tools make it easier than ever for content managers to maintain a site's consistent look and feel -- every page with your logo at the top, a primary navigation bar in a prominent place, and a footer with your company's address and phone number, for example.
Not using a web authoring tool? Even if you're the programmer type with an unnatural affinity for Notepad you can use shared page elements. Essentially you'll create separate text files that contain the code for shared page elements, and then call upon these files as needed.
The most common way of doing this is with server-side includes (SSI). Create separate HTML files for each shared page element. For example, you might have an HTML file called "banner.htm" that's been saved in a directory called "includes".
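To make the idea concrete: on servers that support SSI (Apache's mod_include, for example), a page pulls in a shared file with a directive such as `<!--#include virtual="/includes/banner.htm" -->`, and the server swaps the directive for the file's contents before sending the page. The Python sketch below only imitates that substitution so you can see what happens; it is not how any real server implements it, and the function name and the `site_root` parameter are assumptions made for this example.

```python
import re
from pathlib import Path

# Matches directives such as <!--#include virtual="/includes/banner.htm" -->
INCLUDE = re.compile(r'<!--#include\s+(?:virtual|file)="([^"]+)"\s*-->')

def expand_includes(html, site_root):
    """Replace each include directive with the contents of the file it names."""
    def load(match):
        # Assumption of this sketch: every path is resolved relative to site_root.
        return (Path(site_root) / match.group(1).lstrip("/")).read_text()
    return INCLUDE.sub(load, html)
```

For instance, if `includes/banner.htm` under the site root contains `<h1>My Site</h1>`, then expanding a page whose body holds the directive above yields the same page with that heading in place of the directive. Change the banner file once and every page that includes it picks up the change.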
If you've been surfing search engine optimization web sites, you've no doubt come across the terms spider, robot or bot on many occasions.
These terms all describe basically the same thing, but in this article they'll be referred to collectively as spiders or "agents". A search engine spider is an automated software program that locates and collects data from web pages for inclusion in a search engine's database, and follows links to find new pages on the World Wide Web. The term "agent" is more commonly applied to web browsers and mirroring software.
I learned of SSI (Server Side Includes) long before I built my current site. Since it seemed the only way to go, I checked it out once again for the new site to be.
If you've been lurking in the various forums and newsgroups
devoted to webmastering, you could hardly fail to notice the
heated debate going on at this time. No, it's not which is the
best browser. This debate is about web servers. More precisely,
which one is better: Apache or Internet Information Server
(IIS).
The jargon of the web has added many new words and terms to the English language, way too many! It's hard to keep up with the terminology, definitions and explanations.
We've all hit those pages that give us mysterious error messages, so what do they mean? Below is a brief explanation of various HTTP error codes.
I'll bet one time or another you've surfed the web and suddenly
found a pop-up window in front of you, demanding your approval
for a security certificate. I occasionally see these on shopping
sites, usually the smaller, less-well-funded companies.