Search engines
You cannot find everything on the Internet by using search engines, and you might ask whether that is even desirable. The search engines of the future might help us make a good selection rather than retrieve as much as possible. There is also an important question to consider here: who decides what the search engines should find?
Search engines consist of three parts:
- Spider
- Index
- Search interface
The spider seeks out and collects web pages. It also checks whether a page has been visited before and whether its content has been updated (sometimes also how often the page is updated: the more updates, the more spider visits). The spider also puts the links it encounters in a queue. Those pages are sought out and collected later.
The search engines’ spider programs have certain limits to their collecting. Sometimes pages that lie too deep in the file structure are left out, and sometimes the spider’s collecting is limited when pages are too big (file size).
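The collecting loop described above can be sketched in a few lines. This is only an illustration under stated assumptions, not how any real spider is built: the "web" is a small hypothetical dictionary (FAKE_WEB), the limits max_depth and max_size are invented values, and depth is counted in link steps from the start page as a stand-in for depth in the file structure.

```python
from collections import deque

# Hypothetical in-memory "web": URL -> (page text, outgoing links).
FAKE_WEB = {
    "http://example.org/": ("home page", ["http://example.org/a", "http://example.org/b"]),
    "http://example.org/a": ("page a", ["http://example.org/a/deep", "http://example.org/"]),
    "http://example.org/b": ("b" * 5000, []),  # an oversized page
    "http://example.org/a/deep": ("deep page", ["http://example.org/a/deep/deeper"]),
    "http://example.org/a/deep/deeper": ("too deep", []),
}


def crawl(start_url, max_depth=2, max_size=1000):
    """Collect pages breadth-first, skipping visited, too-deep and too-big pages."""
    queue = deque([(start_url, 0)])  # links waiting to be collected
    visited = set()                  # pages the spider has already seen
    collected = {}                   # URL -> text of every collected page
    while queue:
        url, depth = queue.popleft()
        if url in visited or depth > max_depth:
            continue
        visited.add(url)
        if url not in FAKE_WEB:
            continue  # broken link
        text, links = FAKE_WEB[url]
        if len(text) > max_size:
            continue  # assumed size limit: the page is skipped entirely
        collected[url] = text
        for link in links:
            if link not in visited:
                queue.append((link, depth + 1))  # collected later
    return collected


pages = crawl("http://example.org/")
print(sorted(pages))
```

With these assumed limits the oversized page and the page three links away are left out, while the other three pages are collected.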
You can report a new web site to the search engines for collection if you do not want to wait for the spider to come. And if for some reason you do not want your site to appear in the search engine, you can specify this in a text file (robots.txt) on the web site.
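As an illustration of how a well-behaved spider might read such a file, the sketch below uses the robots.txt parser from the Python standard library. The file content, the spider name "ExampleSpider" and the example.org addresses are made up for the example.

```python
from urllib.robotparser import RobotFileParser

# A made-up robots.txt that asks all spiders to stay out of /private/.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# "ExampleSpider" is a hypothetical spider name.
allowed = parser.can_fetch("ExampleSpider", "http://example.org/index.html")
blocked = parser.can_fetch("ExampleSpider", "http://example.org/private/notes.html")
print(allowed, blocked)
```

Note that robots.txt is a request to the spider, not a technical barrier: it only works when the spider chooses to respect it.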
In the indexing process the web pages are analyzed to determine what information from each page should be indexed in the database. This can be, for example, words and phrases, metadata, file format, file size, collection date, and language. Information about which web sites the words occur on must also be indexed and, for advanced search, also where on the page the words are placed, e.g. title, body text, Internet address or link text.
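A minimal sketch of such an index, assuming a very simple page model: each page is a dictionary with a "title" and a "body" field, and the index stores for every word the page, the field and the position where it occurs. The function names (tokenize, build_index, search) and the sample pages are hypothetical.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Split text into lowercase words."""
    return re.findall(r"\w+", text.lower())


def build_index(pages):
    """pages: url -> {field: text}. Returns word -> list of (url, field, position)."""
    index = defaultdict(list)
    for url, fields in pages.items():
        for field, text in fields.items():
            for position, word in enumerate(tokenize(text)):
                index[word].append((url, field, position))
    return index


def search(index, word, field=None):
    """Return the pages containing the word, optionally only in one field."""
    hits = index.get(word.lower(), [])
    return sorted({url for url, f, _ in hits if field is None or f == field})


pages = {
    "http://example.org/a": {"title": "Library search",
                             "body": "How to search the library catalogue"},
    "http://example.org/b": {"title": "Opening hours",
                             "body": "The library is open daily"},
}
index = build_index(pages)
print(search(index, "library"))                 # anywhere on the page
print(search(index, "library", field="title"))  # only in the title
```

Storing the field alongside each occurrence is what makes a search restricted to, say, the title possible.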
The problem with information seeking on the Internet is that the information is not structured. If people had described their web pages in a more structured way from the start by using metadata, it would have facilitated information seeking on the Internet considerably. The title of the web site, author, date, subject words, description, file format and language are some examples of metadata that would have made searches much easier and more effective.
The search interface is what most often comes to mind when you speak of search engines. This is where you pose your question by entering a number of words in a search form. The question is sent to the search engine’s index, the enormous database of all the web pages the spider has collected. (Google has an index of more than 8 billion web pages.)
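How a question of several words might be matched against the index can be sketched as follows. The sketch assumes the simplest possible index (each word maps to the set of pages containing it) and assumes that all words must occur on a page; real search engines interpret queries in more elaborate ways. The function name match_all and the sample index are hypothetical.

```python
def match_all(index, query):
    """Return the pages that contain every word in the query."""
    words = query.lower().split()
    if not words:
        return set()
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())  # keep only pages that also have this word
    return result


# Hypothetical index: word -> set of pages containing it.
index = {
    "library": {"http://example.org/a", "http://example.org/b"},
    "search": {"http://example.org/a"},
    "hours": {"http://example.org/b"},
}
hits = match_all(index, "library search")
print(hits)
```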
The advanced search function offers a form that can simplify more complex searches. Take a closer look at that page, because it will reveal the possibilities of the search engine. Pay attention to what is hidden behind tabs, pull-down menus or links. These often lead to special search functions such as picture search, groups, categories, news etc.
The result list
The result of your search query is shown in a list of results where, among other things, you can see how many results you got, the title of each web page, and an excerpt from the page where you can see your search terms in their context. The presentation of the search result is important. If you get thousands of answers to your search query, in what order should the results be sorted? Should they be sorted alphabetically or by date, or are there other ways? Sorting the results by some sort of ranking is easier said than done: which factors should be weighed in, and how much weight should each of them carry? The occurrence and placement of the search terms and how many other (highly ranked) web sites link to the page are vital factors for the ranking algorithm. Many search tools try to find other ways to present the result list. Nowadays you can get the search result visualized or clustered, and at the same time get tips on how to improve your searches.
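To make the ranking factors concrete, here is a toy scoring sketch. It is not the algorithm of any real search engine: the weights (FIELD_WEIGHTS, LINK_WEIGHT), the sample pages and the link counts are all invented, and every inbound link counts the same, whereas the text above notes that links from highly ranked sites matter more.

```python
# Assumed weights: a term in the title counts three times as much as in the body.
FIELD_WEIGHTS = {"title": 3.0, "body": 1.0}
LINK_WEIGHT = 0.5  # assumed bonus per inbound link


def score(page, terms, inbound_links):
    """Score one page: weighted term occurrences plus a bonus for inbound links."""
    total = 0.0
    for field, text in page.items():
        words = text.lower().split()
        for term in terms:
            total += FIELD_WEIGHTS.get(field, 1.0) * words.count(term)
    return total + LINK_WEIGHT * inbound_links


def rank(pages, inbound, query):
    """Return the page names sorted from highest to lowest score."""
    terms = query.lower().split()
    scored = {name: score(page, terms, inbound.get(name, 0))
              for name, page in pages.items()}
    return sorted(scored, key=scored.get, reverse=True)


pages = {
    "a": {"title": "search engines", "body": "how search engines rank pages"},
    "b": {"title": "cooking", "body": "search for recipes"},
    "c": {"title": "engines", "body": "car engines and search tips"},
}
inbound = {"a": 1, "b": 4, "c": 0}  # invented inbound link counts

ranking = rank(pages, inbound, "search engines")
print(ranking)
```

Changing the weights changes the order of the list, which is exactly why the choice of factors and their importance is so hard to settle.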
Secret algorithms
The algorithm that decides how the search engines rank the web pages in the result list is a well-kept trade secret. Getting a high (the highest) ranking in the result list is important because it is known that many searchers choose one of the links at the top of the list. Unfortunately the ranking war does not lead to better descriptions of the web pages, which would make them more easily searchable, but rather the contrary. Misuse and sabotage increase the difficulty of finding the right information.
Criticism of the search engines and their position of power is increasing. Who controls the information and how it is searched, found and displayed? Should the algorithm be secret? Can you buy a better placement in the result list by advertising? Should certain search terms be censored? Are the “global search engines” really global? Can a search engine penalize certain pages? Should governments be allowed to decide which words their citizens can use as search terms?
– Discuss the search engines and the secret algorithm; who makes the decisions?
Tips and tricks
A list of different types of search engines is available on the library web site.
- Learn one search engine really well.
- Use and analyze the advanced search form and investigate what special functions are hidden behind tabs and links.
- Choose a number of other search engines with good special functions which you can use for a comparison of the results. There is a surprisingly small overlap between the different search tools’ coverage.
- Keep up-to-date by reading a news service on Internet search tools regularly.
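The overlap mentioned in the list above can be measured once you have saved the result lists from two search tools. The sketch below computes the share of addresses that both lists have in common; the function name overlap and the example.org addresses are invented for the illustration.

```python
def overlap(results_a, results_b):
    """Share of all distinct results that appear in both lists (0.0 to 1.0)."""
    a, b = set(results_a), set(results_b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


# Invented result lists from two hypothetical search tools.
engine_one = ["http://example.org/1", "http://example.org/2", "http://example.org/3"]
engine_two = ["http://example.org/3", "http://example.org/4", "http://example.org/5"]

shared = overlap(engine_one, engine_two)
print(shared)
```

Here only one of five distinct addresses appears in both lists, so the overlap is one fifth.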
Test this yourself by using the same search terms in the search engines below and compare the search results:
———————————–






About Googlebot, the spider of Google’s search engine
Eva Norling
2006-06-21









