Currently the Eggnchips search engine is one large application. However, whilst putting together the basic search engine, it has become apparent that the engine can be divided into a number of distinct components that combine to form the overall platform. These are:
- The component that submits web sites (I call it the Finder)
- The component that gets the web sites and stores them in the database (I call it the Getter)
- The component that handles the submitted search query and returns the results (I call it the Searcher)
A breakdown of the components follows:
Finder
The Finder receives work requests through a web-based form. Once submitted, this information (URL, title, description, keywords) is added to the work queue. The processor sub-component works through the queue, checks various parts of the submitted information, then passes the results to the Getter. The checks include whether the URL is well formed and valid, and whether manual authorisation to include the web site has been obtained.
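As a rough illustration of the queue processing described above, here is a minimal Python sketch. The names `Submission`, `is_well_formed` and `process_queue`, the `authorised` flag, and the choice to accept only http and https URLs are all assumptions made for the example, not part of the existing Eggnchips code.

```python
from collections import deque
from dataclasses import dataclass
from urllib.parse import urlsplit


@dataclass
class Submission:
    """One work request from the Finder's form (hypothetical structure)."""
    url: str
    title: str
    description: str
    keywords: list
    authorised: bool = False  # assumed flag, set once a person approves the site


def is_well_formed(url: str) -> bool:
    """Return True if the URL has an http(s) scheme and a host name."""
    try:
        parts = urlsplit(url)
    except ValueError:  # raised for some malformed URLs, e.g. a bad IPv6 host
        return False
    return parts.scheme in ("http", "https") and bool(parts.hostname)


def process_queue(work_queue: deque, getter_queue: deque) -> list:
    """Pass submissions that clear every check to the Getter's queue.

    Returns the submissions that were held back.
    """
    rejected = []
    while work_queue:
        item = work_queue.popleft()
        if is_well_formed(item.url) and item.authorised:
            getter_queue.append(item)
        else:
            rejected.append(item)
    return rejected
```

This only checks that a URL is well formed. Confirming that the site actually exists would need a network request, which is left out here.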
The form submission may benefit from an image-based input mechanism.
Getter
The Getter retrieves the information from its work queue, gets any other information that is required, then places it into the various database structures. The Getter's jobs would include retrieving counts of keywords, storing the date of the submission, and categorising the web site.
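The keyword counting and date storage could look something like the following Python sketch. The function names, the record layout, and the simple word-splitting rule are assumptions for illustration only. Categorisation is not attempted, and only single-word keywords are counted.

```python
import re
from collections import Counter
from datetime import date


def count_keywords(page_text: str, keywords: list) -> dict:
    """Count how often each submitted keyword appears in the page text.

    Matching is case-insensitive and works on whole words only, so a
    multi-word keyword will always count as zero in this sketch.
    """
    words = Counter(re.findall(r"[a-z0-9]+", page_text.lower()))
    return {kw: words[kw.lower()] for kw in keywords}


def build_record(url: str, page_text: str, keywords: list, submitted=None) -> dict:
    """Assemble the data the Getter would store (hypothetical layout)."""
    return {
        "url": url,
        "submitted": submitted or date.today(),
        "keyword_counts": count_keywords(page_text, keywords),
    }
```

For example, `count_keywords("Egg and chips. Chips, chips!", ["chips", "egg", "beans"])` gives a count of 3 for "chips", 1 for "egg" and 0 for "beans".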
The Getter also needs to account for spammers who may try to trick the Finder. One possibility is a spam list, so that known spam sites do not get through. In addition, manual authorisation is required before a site goes live. There are many different ways that spammers could try to add sites to the search engine, so this is likely to become a topic of its own.
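One way a spam list might work is a simple lookup on the site's host name, sketched below in Python. The `SPAM_HOSTS` entries and the `is_listed_spam` function are made up for the example, and it assumes the URL has already passed the Finder's well-formedness check.

```python
from urllib.parse import urlsplit

# Hypothetical spam list; in practice this would live in the database.
SPAM_HOSTS = {"spam.example", "junk.example"}


def is_listed_spam(url: str, spam_hosts=SPAM_HOSTS) -> bool:
    """Return True if the URL's host, or a parent domain of it, is listed."""
    host = (urlsplit(url).hostname or "").lower()
    # Match the listed host itself or any subdomain of it.
    return any(host == s or host.endswith("." + s) for s in spam_hosts)
```

A list like this only catches sites that are already known, which is one reason the manual authorisation step is still needed.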
Searcher
The Searcher is driven by a web-based form. Once the form is submitted, the Searcher checks the query keywords against those previously stored by the Getter and returns the matching results to the end user.
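A minimal Python sketch of that lookup follows. It assumes the Getter stored each site as a record with a `url` and a `keyword_counts` mapping, and it ranks results by total keyword count. Both the record layout and the ranking rule are assumptions for illustration, not a description of how Eggnchips orders its results.

```python
from collections import defaultdict


def build_index(records: list) -> dict:
    """Map each keyword to the sites that contain it, with their counts."""
    index = defaultdict(dict)
    for rec in records:
        for kw, count in rec["keyword_counts"].items():
            if count > 0:  # skip keywords that never appeared on the page
                index[kw.lower()][rec["url"]] = count
    return index


def search(index: dict, query: str) -> list:
    """Return URLs matching any query term, highest total count first."""
    scores = defaultdict(int)
    for term in query.lower().split():
        for url, count in index.get(term, {}).items():
            scores[url] += count
    # Sort by descending score, then by URL so ties come out in a stable order.
    return sorted(scores, key=lambda u: (-scores[u], u))
```

A query with no matching keywords simply returns an empty list, which the form can report as "no results".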