
Introduction

What is Indexing?

Indexing is the process of systematically organizing and cataloging data to facilitate efficient and fast retrieval. For example, in the context of video data, indexing involves analyzing and capturing key information from video files, making it easier to search, categorize, and retrieve specific content.

Joyspace supports the text, audio, video, and image modalities for indexing and searching. You can index a single modality, or, if your data spans several modalities, index them together.

At Joyspace, we use machine learning to analyze data files and extract relevant information. This information is then stored in an in-memory database, allowing you to perform very fast searches and retrieve the relevant content.

Importance of Indexing

Efficient indexing is the backbone of a seamless search experience. By indexing data files, you enable quick and precise retrieval of relevant content. This not only enhances user experience but also optimizes resource utilization, ensuring that your applications can handle large volumes of data with ease.

Joyspace extracts thousands of signals for a single data point, be it text, audio, video, or image data, so that the details in the data are captured and indexed. These signals are then used to create a search index, which is a database of all the information extracted from the data files. Because this database is stored in memory, search results are returned extremely fast.

Joyspace provides a robust indexing system that is optimized for a wide range of multimodal data. The system is designed to handle large volumes of data and is capable of processing hundreds of millions of files per day.

Indexing

At a high level, Joyspace Search Engine indexing consists of three phases.

  1. Data Gathering: In this phase, we gather data either by crawling the web or from the data sources you provide. From your perspective, this is the phase where you provide us with the data using our index_data API. This is the only step that you need to perform.

  2. Signal Extraction: This is where the real magic happens. We extract signals from the data that you have provided, and these signals are then used to create the index. Signals are the features extracted from the data. For example, if you are indexing text, the signals could be entities, entity relationships, and so on. Joyspace supports indexing of text, audio, video, and images. Our machine learning techniques are designed to extract signals both accurately and quickly.

  3. Storage For Search: Once signals are extracted, they must be stored to facilitate fast and accurate retrieval at query time. Joyspace Search is an in-memory search engine. This means that the signals are stored in the memory of the machine that is running the search engine, which makes the search process extremely fast.
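The three phases above can be pictured with a small, self-contained sketch. This is only an illustration under loud assumptions: the function names (gather, extract_signals, build_index, search) are hypothetical, lowercase word tokens stand in for the far richer signals Joyspace extracts, and a plain Python dictionary stands in for the in-memory store. It is not the Joyspace implementation.

```python
import re
from collections import defaultdict


def gather(sources):
    """Phase 1 (assumed shape): collect raw documents as (doc_id, text) pairs."""
    return list(sources.items())


def extract_signals(text):
    """Phase 2 (toy): lowercase word tokens stand in for real signals."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def build_index(documents):
    """Phase 3 (toy): keep an inverted index (signal -> doc ids) in process memory."""
    index = defaultdict(set)
    for doc_id, text in documents:
        for signal in extract_signals(text):
            index[signal].add(doc_id)
    return index


def search(index, query):
    """Return the ids of documents that contain every signal in the query."""
    signals = extract_signals(query)
    if not signals:
        return set()
    matches = [index.get(signal, set()) for signal in signals]
    return set.intersection(*matches)


sources = {
    "doc-1": "Red running shoes for trail running",
    "doc-2": "Blue running jacket",
    "doc-3": "Red wool scarf",
}
index = build_index(gather(sources))
print(sorted(search(index, "red running")))  # prints ['doc-1']
```

Because the whole index lives in a dictionary in memory, a query only needs a few lookups and a set intersection, which is the intuition behind keeping the search index in memory.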

Indices

A group of documents, when indexed, creates indices. A single index is a small portion of the data that is available as a search result. Sometimes a single document creates a single index, so multiple documents create multiple indices. For example, an ecommerce store's product catalog usually produces a single index per product.

At other times, a single document creates multiple indices. This happens when a single document can be further divided into smaller segments and each segment can be searched independently. For example, a single document containing a book can be indexed into multiple indices, each of which can be searched independently.
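As a rough sketch of how one document can yield several independently searchable segments, the example below splits a text into fixed-size word windows. The helper names (split_into_segments, search_segments), the max_words parameter, and the "doc_id#n" id scheme are all assumptions made for illustration; how Joyspace actually segments documents is not described here.

```python
def split_into_segments(doc_id, text, max_words=50):
    """Split one document into word-bounded segments, each with its own id."""
    words = text.split()
    segments = []
    for start in range(0, len(words), max_words):
        # Hypothetical id scheme: "<doc_id>#<segment number>".
        segment_id = f"{doc_id}#{start // max_words}"
        segments.append((segment_id, " ".join(words[start:start + max_words])))
    return segments


def search_segments(segments, term):
    """Return the ids of segments whose text mentions the term."""
    term = term.lower()
    return [seg_id for seg_id, text in segments if term in text.lower()]


book = (
    "Chapter one covers sailing. "
    "Chapter two covers navigation. "
    "Chapter three covers weather."
)
segments = split_into_segments("book-1", book, max_words=4)
print(search_segments(segments, "navigation"))  # prints ['book-1#1']
```

A search for "navigation" returns only the second segment rather than the whole book, which is the practical benefit of indexing segments separately.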

For us, indices are a collection of indexed data that is made available for searching as search results.

A single group of files that you provide creates a single set of indices.

Note that you can group files at your end and create multiple sets of indices from multiple groups of files. For example, if you have two ecommerce stores, you should create two separate sets of indices by indexing each group of products in a different API call. You can find more details in the Indexing section.
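The client-side grouping described above might look like the following sketch. Note the assumptions: index_data_stub is a hypothetical local stand-in, not the real index_data API, whose request shape is not described in this section; the file records and store names are made up for the example.

```python
from collections import defaultdict


def index_data_stub(registry, group_name, files):
    """Hypothetical stand-in for one indexing call: all files in the call land in one group."""
    registry[group_name].extend(files)


# Made-up file records for two stores.
files = [
    {"store": "store-a", "name": "lamp.json"},
    {"store": "store-b", "name": "desk.json"},
    {"store": "store-a", "name": "chair.json"},
]

# Group the files on the client side before indexing.
groups = defaultdict(list)
for record in files:
    groups[record["store"]].append(record["name"])

# One call per group, so each store's products are indexed separately.
registry = defaultdict(list)
for group_name, names in groups.items():
    index_data_stub(registry, group_name, names)

print(dict(registry))
```

Keeping one call per group means that a search against one store never has to consider the other store's products.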