It makes efficient use of storage space to store the index.

thesis on Efficient Content-Based Image Retrieval was a seminal work that developed new indexing techniques for image databases using images.
Azzam, Ibrahim Ahmed Aref (2006) Implicit Concept-based Image Indexing and Retrieval for Visual Information Systems.

The indexing function is performed by the indexer and the sorter.

Finally, the major applications (crawling, indexing, and searching) will be examined in depth.

The indexer performs a number of functions.

This scheme requires slightly more storage because of duplicated docIDs, but the difference is very small for a reasonable number of buckets, and it saves considerable time and coding complexity in the final indexing phase done by the sorter.

The indexer performs another important function.

Anyone who has used a search engine recently can readily testify that the completeness of the index is not the only factor in the quality of search results.

It puts the anchor text into the forward index, associated with the docID that the anchor points to.
The sorter also produces a list of wordIDs and offsets into the inverted index.
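
As a rough illustration of how a sorter might turn a forward index into an inverted index with a wordID-to-offset list, here is a minimal sketch. The record layout `(doc_id, word_id, position)` and the function name `sort_forward_index` are assumptions made for this example; they are not taken from the system described above.

```python
def sort_forward_index(forward):
    """Re-sort forward-index hits by wordID (hypothetical record layout).

    forward: list of (doc_id, word_id, position) tuples, ordered by docID.
    Returns a flat postings list grouped by wordID, plus a mapping from
    each wordID to the offset of its first posting in that list.
    """
    # Sort by wordID first, then docID and position for a stable layout.
    hits = sorted(forward, key=lambda hit: (hit[1], hit[0], hit[2]))

    inverted = []  # flat list of (doc_id, position) postings
    offsets = {}   # word_id -> index of its first posting in `inverted`
    for doc_id, word_id, position in hits:
        if word_id not in offsets:
            offsets[word_id] = len(inverted)
        inverted.append((doc_id, position))
    return inverted, offsets


# Two small documents; wordIDs 10, 20 and 30 are made up for the example.
forward = [(1, 20, 0), (1, 10, 1), (2, 10, 0), (2, 30, 1)]
inverted, offsets = sort_forward_index(forward)
```

Looking up a word then amounts to reading `inverted` from `offsets[word_id]` up to the next word's offset.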

We ran the indexer and the crawler simultaneously.

This thesis presents research based on an integrated multi-modal approach to sports video indexing and retrieval. By combining specific features extractable from multiple (audio-visual) modalities, generic structure and specific events can be detected and classified. During browsing and retrieval, users benefit from the integration of high-level semantics and descriptive mid-level features such as a whistle or a close-up view of player(s). The main objective is to contribute to the three major components of sports video indexing systems. The first component is a set of powerful techniques to extract audio-visual features and semantic content automatically. Their main purposes are to reduce manual annotation and to summarize lengthy content into a compact, meaningful and more enjoyable presentation. The second component is an expressive and flexible indexing technique that supports gradual index construction. The indexing scheme is essential because it determines the methods by which users can access a video database. The third and last component is a query language that can generate dynamic video summaries for smart browsing and support user-oriented retrieval.

All major online databases and search engines index the Journal of Medical Thesis.

The indexer runs at roughly 54 pages per second.

If we assume that Moore's law holds for the future, we need only 10 more doublings, or 15 years, to reach our goal of indexing everything everyone in the US has written for a year for a price that a small company could afford.
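
The arithmetic behind this estimate can be checked with a short sketch. The 18-month doubling period is not stated in the text; it is only what the quoted figures (10 doublings in 15 years) imply.

```python
doublings = 10  # number of doublings quoted in the text
years = 15      # time span quoted in the text

# Doubling period implied by the two figures above, in months.
months_per_doubling = years * 12 / doublings

# Overall capacity-per-price improvement after that many doublings.
growth_factor = 2 ** doublings

print(months_per_doubling, growth_factor)
```

In other words, the estimate assumes roughly a thousandfold improvement in what a fixed budget can index.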

Furthermore, most queries can be answered using just the short inverted index.


In this thesis, we address the challenging problem of video content indexing and retrieval, and in particular automatic semantic video content indexing. Indexing is the operation that extracts a numerical or textual signature describing the content accurately and concisely. The objective is to allow efficient search in a database. A search is efficient if it meets the user's needs within a reasonable response time. The automatic aspect of indexing is important, given how difficult it is to annotate video shots manually in huge databases. Until now, systems have concentrated on the description and indexing of visual content, and search has relied mainly on the colors and textures of video shots. The new challenge is to automatically add a semantic description of the content to these signatures.

First, a range of indexing techniques is presented. We describe the generic structure of a content-based indexing and retrieval system and then introduce the major existing systems. Next, the structure of video is described, ending with a survey of the state of the art in color, texture, shape and motion indexing methods.

Second, we introduce a method to compute an accurate and compact signature from key-frame regions. This method is an adaptation of latent semantic indexing, originally used to index text documents. Our adaptation captures visual content at the granularity of regions. In contrast to most existing methods, it makes it possible to search on local areas of key-frames. It is then compared to another region-based method that uses the Earth mover's distance. Finally, we propose two methods to improve our signatures and make them more robust to intra-shot and inter-scale variability. Following the same logic, we study the effects of the relevance feedback loop on search results.

Third, we address the difficult task of semantic content retrieval. Experiments are conducted in the framework of TRECVID, which provides a large collection of videos and their labels. Annotated videos are used to train classifiers that estimate the semantic content of unannotated videos. We study three classifiers, each representing one of the major classifier families: probabilistic classifiers, memory-based classifiers and border-based classifiers.

Fourth, we continue the semantic classification task by studying fusion mechanisms. Fusion is necessary to combine efficiently the outputs of a number of classifiers trained on various features, such as color and texture. For this purpose, we propose to use simple operations, such as the sum or the product, combined by a binary tree. The binary tree structure can model all combinations of operations on two operands. Genetic algorithms are then used to determine the best structure and operators. A comparison with an SVM-based fusion system is provided to show the efficiency of the approach. New modalities (text and motion) are then introduced to improve classification performance. We then raise the desynchronization problem that exists between speech and visual content. Using the detection scores obtained, we compare different retrieval systems depending on the query type: by image example, by example and keywords, or by keywords only.

Finally, this thesis concludes with the introduction of a new active learning approach. Active learning is an iterative technique that aims to reduce the annotation effort. The system selects samples to be annotated by a user. The selection depends on the current knowledge of the system and the estimated usefulness of the samples. The system quickly increases its knowledge at each iteration and can therefore estimate the classes of the remaining unlabeled data. However, current systems have the drawback of allowing only one sample to be selected per iteration; otherwise their performance decreases. We propose a solution to this problem and study it in the case of one-label annotation and in the more challenging task of multi-label annotation.
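
The fusion step mentioned in the abstract above, in which classifier outputs are combined by sum and product operations arranged in a binary tree, can be sketched as follows. This is only an illustration under stated assumptions: the tree encoding (nested tuples), the function name `fuse`, and the example scores are hypothetical, and the genetic search for the best tree is not shown.

```python
import operator

# The two simple operations named in the abstract.
OPS = {"sum": operator.add, "prod": operator.mul}


def fuse(tree, scores):
    """Evaluate a binary fusion tree over per-classifier scores.

    tree: either a classifier name (leaf) or a tuple (op, left, right),
          where op is "sum" or "prod". This encoding is an assumption.
    scores: dict mapping classifier name -> detection score.
    """
    if isinstance(tree, str):
        return scores[tree]  # leaf: look up the classifier's score
    op, left, right = tree
    return OPS[op](fuse(left, scores), fuse(right, scores))


# Made-up scores from three hypothetical classifiers.
scores = {"color": 0.5, "texture": 0.25, "motion": 0.5}

# (color * texture) + motion
tree = ("sum", ("prod", "color", "texture"), "motion")
fused = fuse(tree, scores)
```

A search procedure such as a genetic algorithm would then vary both the shape of `tree` and the operator at each internal node, scoring each candidate on held-out data.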