How to index AWS data in Joyspace Search

Introduction

This guide explains how to index data that is stored in AWS. It assumes that your data is in Amazon S3.

User and Access

1. Create a new IAM user

Sign in to your AWS account and go to IAM. Create a new IAM user.

Image create a new IAM user

2. Grant Read access

Joyspace needs only read access to your data; it never writes anything to your cloud. Grant read access to the IAM user created above.

Note

Some of the steps here are simplified. Depending on your access policies, you may need to carry out different steps.

Image grant read permission for the newly created user
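
For reference, a read-only grant for a single bucket could look like the policy built below. This is a minimal sketch, not a documented Joyspace requirement: the bucket name `my-video-bucket` is hypothetical, and the choice of `s3:ListBucket` plus `s3:GetObject` is an assumption about what "read access" involves. Your organization's access policies may call for something different.

```python
import json


def read_only_s3_policy(bucket_name):
    """Build an IAM policy document that allows reading one S3 bucket."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Listing applies to the bucket itself.
                "Effect": "Allow",
                "Action": ["s3:ListBucket"],
                "Resource": [f"arn:aws:s3:::{bucket_name}"],
            },
            {
                # Reading applies to the objects inside the bucket.
                "Effect": "Allow",
                "Action": ["s3:GetObject"],
                "Resource": [f"arn:aws:s3:::{bucket_name}/*"],
            },
        ],
    }


# "my-video-bucket" is a hypothetical bucket name used only for illustration.
policy = read_only_s3_policy("my-video-bucket")
policy_json = json.dumps(policy, indent=2)
print(policy_json)
```

The printed JSON can be pasted into the IAM policy editor. Neither statement contains a write action, which matches the read-only intent described above.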

3. Review and Create

After granting read access, review the IAM user and create it.

Image review and create the IAM user

Get API Key and Secret

Now that the user is created, you need an API key and secret for it.

1. Create Access Key

Click the newly created user, then click "Create access key".

Image create access key

2. Select Use Case

Select "Application running outside AWS" as your use case.

Image select use case

3. (Optional) Description Tag

Optionally, create a description tag for your access key.

Image select use case

4. Download Keys

Your "Access key" and "Secret access key" should now be shown. You may have to un-hide the "Secret access key". Download the keys by clicking the "Download .csv file" button. You will need these keys when making the indexing request.

Image download the keys
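
If you want to load the downloaded keys in a script instead of copying them by hand, something like the following may help. It is only a sketch: the column headers `Access key ID` and `Secret access key` are an assumption about the downloaded file, so check them against your own .csv. The sample values are placeholders, not real credentials.

```python
import csv
import io


def read_access_keys(csv_text):
    """Return (access_key_id, secret_access_key) from the downloaded CSV text."""
    # Strip a leading byte-order mark, which some downloads include.
    reader = csv.DictReader(io.StringIO(csv_text.lstrip("\ufeff")))
    row = next(reader)
    # Assumed header names; adjust them if your file differs.
    return row["Access key ID"].strip(), row["Secret access key"].strip()


# Placeholder content standing in for the downloaded file.
sample_csv = (
    "Access key ID,Secret access key\n"
    "EXAMPLE_ACCESS_KEY_ID,EXAMPLE_SECRET_ACCESS_KEY\n"
)
access_key_id, secret_access_key = read_access_keys(sample_csv)
print(access_key_id)
```

In practice you would pass the contents of the real file to `read_access_keys`, and keep that file out of version control.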

Arrange Your Data

Next, arrange your data. You can skip this step if your data already exists in S3 buckets.

1. Example of Data Arrangement

The arrangement shown below is just one way to organize your data. Your data does not need to exist in a single bucket; it may reside in multiple buckets. Your metadata does not need to be in the same bucket as your data either. It is likely that your data and metadata are spread across numerous buckets.

Note

The AWS web interface is probably not the best way to arrange your data. You may want to use a command-line tool such as the AWS CLI, or the AWS SDK in your programming language of choice.

Image example of data arrangement

2. Fully Qualified S3 URL

You can get the fully qualified S3 URL by clicking on the object in the bucket. You can also access these URLs programmatically by using the AWS SDK. When making the request, you will need to pass these URLs as parameters.

Image get fully qualified S3 URL
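
Since an S3 URL is just the bucket name followed by the object key, the URLs can also be assembled in code. The sketch below uses hypothetical helper names (`s3_url`, `split_s3_url`, `files_list_entry`) and assumes the naming pattern from the example request, where each video has a sibling file ending in `_metadata.json`. Your own layout may differ.

```python
from pathlib import PurePosixPath


def s3_url(bucket, key):
    """Join a bucket name and an object key into an s3:// URL."""
    return f"s3://{bucket}/{key.lstrip('/')}"


def split_s3_url(url):
    """Split an s3:// URL back into (bucket, key)."""
    prefix = "s3://"
    if not url.startswith(prefix):
        raise ValueError(f"not an S3 URL: {url!r}")
    bucket, _, key = url[len(prefix):].partition("/")
    if not bucket or not key:
        raise ValueError(f"missing bucket or key: {url!r}")
    return bucket, key


def files_list_entry(data_id, bucket, video_key):
    """Pair a video with a metadata file named <video stem>_metadata.json."""
    video = PurePosixPath(video_key)
    metadata_key = str(video.with_name(f"{video.stem}_metadata.json"))
    return {
        "data_id": str(data_id),
        "data_filepath": s3_url(bucket, video_key),
        "metadata_json_path": s3_url(bucket, metadata_key),
    }


# Bucket and key mirror the placeholders used in the example request.
entry = files_list_entry(1, "bucket_name", "my_videos_dir/video_1.mp4")
print(entry["data_filepath"])
print(entry["metadata_json_path"])
```

If your metadata lives in a different bucket, build the two URLs separately with `s3_url` rather than deriving one from the other.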

Example Request

curl -X 'PUT' \
  'https://sandbox.joyspace.ai/api/v1/index_data' \
  -H 'accept: application/json' \
  -H 'joyspace-api-key: <Your API Key>' \
  -H 'joyspace-account-id: <Your Account ID>' \
  -H 'Content-Type: application/json' \
  -d '{
    "input_source_type": "AMAZON_S3",
    "data_type": "VIDEO",
    "index_name": "example_index_name",
    "cloud_auth": {
      "aws_access_key_id": "<Your AWS Access Key>",
      "aws_secret_access_key": "<Your AWS Secret Access Key>",
      "aws_region": "<Your AWS Region>"
    },
    "files_list": [
      {
        "data_id": "1",
        "data_filepath": "s3://bucket_name/my_videos_dir/video_1.mp4",
        "metadata_json_path": "s3://bucket_name/my_videos_dir/video_1_metadata.json"
      },
      {
        "data_id": "2",
        "data_filepath": "s3://bucket_name/my_videos_dir/video_2.mp4",
        "metadata_json_path": "s3://bucket_name/my_videos_dir/video_2_metadata.json"
      }
    ]
  }'
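
The same request can be prepared from a script. The following is a rough sketch using only the Python standard library: the endpoint, header names, and body fields are copied from the curl example, while the function name `build_index_request`, the environment variable names, and the `us-east-1` region are assumptions made for illustration. The request is built but not sent.

```python
import json
import os
import urllib.request

# Endpoint taken from the curl example.
INDEX_URL = "https://sandbox.joyspace.ai/api/v1/index_data"


def build_index_request(api_key, account_id, aws_key_id, aws_secret,
                        aws_region, index_name, files_list):
    """Build (but do not send) the PUT request shown in the curl example."""
    payload = {
        "input_source_type": "AMAZON_S3",
        "data_type": "VIDEO",
        "index_name": index_name,
        "cloud_auth": {
            "aws_access_key_id": aws_key_id,
            "aws_secret_access_key": aws_secret,
            "aws_region": aws_region,
        },
        "files_list": files_list,
    }
    return urllib.request.Request(
        INDEX_URL,
        data=json.dumps(payload).encode("utf-8"),
        method="PUT",
        headers={
            "accept": "application/json",
            "joyspace-api-key": api_key,
            "joyspace-account-id": account_id,
            "Content-Type": "application/json",
        },
    )


# Hypothetical environment variable names; placeholders are used when unset.
request = build_index_request(
    api_key=os.environ.get("JOYSPACE_API_KEY", "<Your API Key>"),
    account_id=os.environ.get("JOYSPACE_ACCOUNT_ID", "<Your Account ID>"),
    aws_key_id=os.environ.get("AWS_ACCESS_KEY_ID", "<Your AWS Access Key>"),
    aws_secret=os.environ.get("AWS_SECRET_ACCESS_KEY", "<Your AWS Secret Access Key>"),
    aws_region="us-east-1",  # example region only; use your bucket's region
    index_name="example_index_name",
    files_list=[{
        "data_id": "1",
        "data_filepath": "s3://bucket_name/my_videos_dir/video_1.mp4",
        "metadata_json_path": "s3://bucket_name/my_videos_dir/video_1_metadata.json",
    }],
)
print(request.get_method(), request.full_url)

# To actually send the request, pass it to urllib.request.urlopen(request).
```

Building the body with `json.dumps` avoids the quoting and comma mistakes that are easy to make when writing JSON by hand inside a shell string.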