How to index AWS data in Joyspace Search
Introduction
This guide explains how to index data stored in AWS with Joyspace Search. It assumes that your data is in Amazon S3.
User and Access
1. Create a new IAM user
Sign in to your AWS account and go to IAM. Create a new IAM user.
2. Grant Read access
Joyspace only needs read access to your data; it never writes anything to your cloud. Grant read access to the IAM user from the previous step.
Note
Some of the steps here are simplified. Depending on your access policies, you may need to carry out different steps.
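As an illustration of what "read access" can look like, here is a minimal sketch that builds a read-only IAM policy document for a set of buckets. The function name `read_only_s3_policy` is hypothetical, and the choice of `s3:ListBucket` plus `s3:GetObject` is an assumption: this guide does not state the exact permissions Joyspace requires, so adjust the actions to your own access policies. Attaching the AWS managed policy `AmazonS3ReadOnlyAccess` is a broader alternative.

```python
import json


def read_only_s3_policy(bucket_names):
    """Build an IAM policy document allowing read-only access to the given buckets."""
    bucket_arns = [f"arn:aws:s3:::{name}" for name in bucket_names]
    object_arns = [f"{arn}/*" for arn in bucket_arns]
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Allows listing the object keys inside each bucket (assumed to be needed).
                "Effect": "Allow",
                "Action": ["s3:ListBucket"],
                "Resource": bucket_arns,
            },
            {
                # Allows downloading objects, with no write or delete permissions.
                "Effect": "Allow",
                "Action": ["s3:GetObject"],
                "Resource": object_arns,
            },
        ],
    }


if __name__ == "__main__":
    # "bucket_name" is a placeholder; replace it with your own bucket names.
    print(json.dumps(read_only_s3_policy(["bucket_name"]), indent=2))
```

The printed JSON can be pasted into the IAM policy editor when granting permissions to the user.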
3. Review and Create
After granting read access, review the IAM user and create it.
Get API Key and Secret
Now that the user is created, you need an access key and secret for it.
1. Create Access Key
Click the newly created user, then click "Create access key".
2. Select Use Case
Select "Application running outside AWS" as your use case.
3. (Optional) Description Tag
Optionally, create a description tag for your access key.
4. Download Keys
Your "Access key" and "Secret access key" are now shown to you. You may have to un-hide the "Secret access key". Download the keys by clicking the "Download .csv file" button. You will need these keys when making the request.
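If you prefer to load the keys from the downloaded file rather than copy them by hand, the following sketch reads them with Python's standard `csv` module. The column names `Access key ID` and `Secret access key` are an assumption about the layout of the downloaded file, so check them against your own .csv. The function name `read_access_key_csv` and the sample values are hypothetical.

```python
import csv
import io


def read_access_key_csv(csv_text):
    """Return (access_key_id, secret_access_key) from the text of the downloaded CSV."""
    # Drop a leading byte-order mark, which would otherwise corrupt the first column name.
    reader = csv.DictReader(io.StringIO(csv_text.lstrip("\ufeff")))
    row = next(reader, None)
    if row is None:
        raise ValueError("the CSV contains no key row")
    # Column names are assumed; adjust them if your file differs.
    return row["Access key ID"].strip(), row["Secret access key"].strip()


if __name__ == "__main__":
    # Stand-in for the contents of the downloaded file; the values are placeholders.
    sample = "Access key ID,Secret access key\nAKIAEXAMPLE,secretEXAMPLE\n"
    access_key_id, secret_access_key = read_access_key_csv(sample)
    print(access_key_id)
```

To use a real file, read it first, for example with `open(path, encoding="utf-8").read()`, and pass the text to the function.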
Arrange Your Data
Next, arrange your data. You may not need this step if your data already exists in S3 buckets.
1. Example of Data Arrangement
Demonstrated below is just one way to arrange your data. Your data does not need to exist in a single bucket; it may reside in multiple buckets. Your metadata also does not need to be in the same bucket as your data. It is likely that your data and metadata are spread across numerous buckets.
Note
The AWS web interface is likely not the best way to arrange your data. You may want to use a command-line tool such as the AWS CLI, or the AWS SDK in your programming language of choice.
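When arranging data by script, one possible convention is to store each metadata file next to its data file, as the example request at the end of this guide does (`video_1.mp4` alongside `video_1_metadata.json`). The sketch below derives the metadata object key from a data object key under that convention. The function name `metadata_key_for` is hypothetical, and the naming scheme is only an assumption taken from that example, not a Joyspace requirement.

```python
from pathlib import PurePosixPath


def metadata_key_for(data_key):
    """Derive the key of the metadata object stored next to a data object."""
    path = PurePosixPath(data_key)
    # Swap the file name for "<name without extension>_metadata.json" in the same directory.
    return str(path.with_name(f"{path.stem}_metadata.json"))


if __name__ == "__main__":
    print(metadata_key_for("my_videos_dir/video_1.mp4"))
```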
2. Fully Qualified S3 URL
You can get the fully qualified S3 URL by clicking the object in the bucket. You can also obtain these URLs programmatically using the AWS SDK. When making the request, you pass these URLs as parameters.
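Once you know the bucket and key of each object, the URLs and the `files_list` entries can be assembled with a few lines of code. The sketch below uses the field names from the example request (`data_id`, `data_filepath`, `metadata_json_path`). The helper names `s3_url`, `split_s3_url`, and `build_files_list` are hypothetical, and numbering `data_id` sequentially is an assumption based on that example.

```python
from urllib.parse import urlsplit


def s3_url(bucket, key):
    """Build a fully qualified S3 URL from a bucket name and an object key."""
    return f"s3://{bucket}/{key.lstrip('/')}"


def split_s3_url(url):
    """Split an s3:// URL into (bucket, key), rejecting anything malformed."""
    parts = urlsplit(url)
    key = parts.path.lstrip("/")
    if parts.scheme != "s3" or not parts.netloc or not key:
        raise ValueError(f"not a fully qualified S3 URL: {url!r}")
    return parts.netloc, key


def build_files_list(pairs):
    """Turn (data_url, metadata_url) pairs into entries for the request body."""
    files = []
    for index, (data_url, metadata_url) in enumerate(pairs, start=1):
        # Validate both URLs; data and metadata may live in different buckets.
        split_s3_url(data_url)
        split_s3_url(metadata_url)
        files.append(
            {
                "data_id": str(index),
                "data_filepath": data_url,
                "metadata_json_path": metadata_url,
            }
        )
    return files


if __name__ == "__main__":
    video = s3_url("bucket_name", "my_videos_dir/video_1.mp4")
    metadata = s3_url("bucket_name", "my_videos_dir/video_1_metadata.json")
    print(build_files_list([(video, metadata)]))
```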
Example Request
curl -X 'PUT' \
'https://sandbox.joyspace.ai/api/v1/index_data' \
-H 'accept: application/json' \
-H 'joyspace-api-key: <Your API Key>' \
-H 'joyspace-account-id: <Your Account ID>' \
-H 'Content-Type: application/json' \
-d '{
  "input_source_type": "AMAZON_S3",
  "data_type": "VIDEO",
  "index_name": "example_index_name",
  "cloud_auth": {
    "aws_access_key_id": "<Your AWS Access Key>",
    "aws_secret_access_key": "<Your AWS Secret Access Key>",
    "aws_region": "<Your AWS Region>"
  },
  "files_list": [
    {
      "data_id": "1",
      "data_filepath": "s3://bucket_name/my_videos_dir/video_1.mp4",
      "metadata_json_path": "s3://bucket_name/my_videos_dir/video_1_metadata.json"
    },
    {
      "data_id": "2",
      "data_filepath": "s3://bucket_name/my_videos_dir/video_2.mp4",
      "metadata_json_path": "s3://bucket_name/my_videos_dir/video_2_metadata.json"
    }
  ]
}'
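The same request can be prepared from Python using only the standard library. This is a sketch that mirrors the curl command above: the endpoint, headers, and body fields are copied from it, while the function name `build_index_request` is hypothetical. The call that actually sends the request is left as a comment, because the response format is not described in this guide.

```python
import json
import urllib.request

INDEX_URL = "https://sandbox.joyspace.ai/api/v1/index_data"


def build_index_request(api_key, account_id, payload):
    """Prepare (but do not send) the PUT request that submits data for indexing."""
    return urllib.request.Request(
        INDEX_URL,
        data=json.dumps(payload).encode("utf-8"),
        method="PUT",
        headers={
            "accept": "application/json",
            "joyspace-api-key": api_key,
            "joyspace-account-id": account_id,
            "Content-Type": "application/json",
        },
    )


# Placeholder values; replace them with your own keys, region, and S3 URLs.
payload = {
    "input_source_type": "AMAZON_S3",
    "data_type": "VIDEO",
    "index_name": "example_index_name",
    "cloud_auth": {
        "aws_access_key_id": "<Your AWS Access Key>",
        "aws_secret_access_key": "<Your AWS Secret Access Key>",
        "aws_region": "<Your AWS Region>",
    },
    "files_list": [
        {
            "data_id": "1",
            "data_filepath": "s3://bucket_name/my_videos_dir/video_1.mp4",
            "metadata_json_path": "s3://bucket_name/my_videos_dir/video_1_metadata.json",
        }
    ],
}

request = build_index_request("<Your API Key>", "<Your Account ID>", payload)

# To send the request, uncomment the lines below:
# with urllib.request.urlopen(request) as response:
#     print(response.status, response.read().decode("utf-8"))
```

Building the body with `json.dumps` avoids the quoting and comma mistakes that are easy to make when writing JSON by hand inside a shell command.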