JSON Parser Project
This is a guide on building a serverless JSON parser using AWS Lambda and DynamoDB. AWS Lambda lets you upload your code and create a Lambda function; AWS takes care of provisioning and managing the servers needed to run it, so you don't need to worry about the OS, patching, scaling, and so on. The architecture of our app will be:
- The user drops a JSON file into an S3 bucket.
- This triggers a Lambda function, which parses the file.
- The Lambda function then writes the data to DynamoDB.
Objective
Once set up, we should have a good understanding of how to build a JSON parser with Lambda and DynamoDB: we drop a JSON file into our S3 bucket, and then see and interact with the parsed data in a DynamoDB table.
Prerequisites
To follow this guide you should have a little experience with Lambda, DynamoDB, Python, and creating S3 buckets. You will need an IAM role that allows Lambda to access S3, DynamoDB, and CloudWatch for logs. See "IAM Basics Tutorial" for more details on creating roles. The Lambda role we created is named as follows:
IAM role = "json-parser-role"
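If you prefer to script this step instead of using the console, the sketch below shows one way to create such a role with boto3. It is a minimal example, not the exact policy from the IAM Basics Tutorial; the policy name and the broad actions are assumptions, and you may want to scope them down.

import json
import boto3

iam = boto3.client('iam')

# Trust policy that lets the Lambda service assume the role
trust_policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Principal": {"Service": "lambda.amazonaws.com"},
        "Action": "sts:AssumeRole"
    }]
}

iam.create_role(
    RoleName='json-parser-role',
    AssumeRolePolicyDocument=json.dumps(trust_policy)
)

# Inline permissions for S3 reads, DynamoDB writes, and CloudWatch Logs
# (policy name and actions are illustrative; tighten them for production)
permissions = {
    "Version": "2012-10-17",
    "Statement": [
        {"Effect": "Allow", "Action": ["s3:GetObject"],
         "Resource": "arn:aws:s3:::json-parser-bucket/*"},
        {"Effect": "Allow", "Action": ["dynamodb:PutItem"], "Resource": "*"},
        {"Effect": "Allow",
         "Action": ["logs:CreateLogGroup", "logs:CreateLogStream", "logs:PutLogEvents"],
         "Resource": "*"}
    ]
}

iam.put_role_policy(
    RoleName='json-parser-role',
    PolicyName='json-parser-permissions',
    PolicyDocument=json.dumps(permissions)
)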
Next, you will also need to create an S3 bucket and a DynamoDB table. To create an S3 bucket, go to S3, click "Create bucket", and give the bucket a name.
Bucket name = "json-parser-bucket"
To create a DynamoDB table:
- Go to DynamoDB in the console and click “Create Table”.
- Set your table name and primary key, and leave the rest as the defaults.
Table name = "parser_table"
Primary key = "user_id"
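The console steps above are all you need, but if you would rather script the bucket and table, here is a rough boto3 sketch. The on-demand billing mode and the us-east-1 region are assumptions; adjust them to your setup.

import boto3

s3 = boto3.client('s3')
dynamodb = boto3.client('dynamodb', region_name='us-east-1')

# Create the bucket (in us-east-1 no LocationConstraint is needed)
s3.create_bucket(Bucket='json-parser-bucket')

# Create the table with "user_id" as the partition (primary) key
dynamodb.create_table(
    TableName='parser_table',
    KeySchema=[{'AttributeName': 'user_id', 'KeyType': 'HASH'}],
    AttributeDefinitions=[{'AttributeName': 'user_id', 'AttributeType': 'S'}],
    BillingMode='PAY_PER_REQUEST'
)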
You will also need a JSON file. Make sure your JSON file has "user_id" as a key. You can use the JSON data below as an example. Save it as parser_data.json.
{"user_id":"j34kxen4dfh","first_name":"John","last_name":"Jones","Location":["USA"]}
Create Lambda Function
First, we will create a Lambda function.
- Go to Lambda in the AWS console and click "Create function".
- Select “Author from scratch”, give the function a name, and select the latest version of Python for the runtime.
Name = "json-parser-lambda"
Runtime = "Python 3.x"
- Select “Use an existing role”. Then select the role you created in the prerequisites and click “Create function”.
Existing role = "json-parser-role"
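The function can also be created from code rather than the console. The sketch below assumes your handler lives in a local file named lambda_function.py and that you substitute your own account ID in the role ARN; both are assumptions, not part of the original guide.

import io
import zipfile
import boto3

lambda_client = boto3.client('lambda', region_name='us-east-1')

# Zip the handler file in memory (Lambda expects a zip archive)
buf = io.BytesIO()
with zipfile.ZipFile(buf, 'w') as zf:
    zf.write('lambda_function.py')
buf.seek(0)

lambda_client.create_function(
    FunctionName='json-parser-lambda',
    Runtime='python3.12',  # any current Python runtime works
    Role='arn:aws:iam::123456789012:role/json-parser-role',  # replace the account ID
    Handler='lambda_function.lambda_handler',
    Code={'ZipFile': buf.read()},
)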
Add Trigger
Now we need to create a trigger to call our Lambda function. We will use S3 as the trigger: when a user drops a JSON file into the bucket, S3 will invoke the function we created.
- In your lambda function “Designer” click “Add trigger” and select “S3”.
- Under S3, configure the bucket to be your json-parser-bucket, set the event type to "All object create events", and add .json as the suffix. Whenever a .json file is uploaded to the bucket, this trigger will be activated.
Bucket = "json-parser-bucket"
Event type = "All object create events"
Suffix = ".json"
- Then click Add to add the trigger to your Lambda function.
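Under the hood, the console wires this up by granting S3 permission to invoke the function and adding a notification configuration to the bucket. If you ever need to do the same outside the console, a boto3 sketch might look like the following (the function ARN, account ID, and statement ID are placeholders):

import boto3

lambda_client = boto3.client('lambda', region_name='us-east-1')
s3 = boto3.client('s3')

function_arn = 'arn:aws:lambda:us-east-1:123456789012:function:json-parser-lambda'  # placeholder

# Allow the S3 bucket to invoke the Lambda function
lambda_client.add_permission(
    FunctionName='json-parser-lambda',
    StatementId='s3-invoke-json-parser',
    Action='lambda:InvokeFunction',
    Principal='s3.amazonaws.com',
    SourceArn='arn:aws:s3:::json-parser-bucket',
)

# Fire the function for every object created with a .json suffix
s3.put_bucket_notification_configuration(
    Bucket='json-parser-bucket',
    NotificationConfiguration={
        'LambdaFunctionConfigurations': [{
            'LambdaFunctionArn': function_arn,
            'Events': ['s3:ObjectCreated:*'],
            'Filter': {'Key': {'FilterRules': [{'Name': 'suffix', 'Value': '.json'}]}},
        }]
    },
)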
Testing the Trigger
Next, we can test the trigger to confirm it is working and to see what the event passed to the handler looks like.
- Scroll down to the inline code editor, paste the following code into the editor, and click "Save".
import json

def lambda_handler(event, context):
    print(str(event))
    return {'statusCode': 200, 'body': json.dumps('Hello from Lambda!')}
- Go to your json-parser-bucket and upload parser_data.json into the bucket. If your trigger worked, you should see the event logged in CloudWatch.
- Go to CloudWatch > Logs > /aws/lambda/json-parser-lambda to see the logs.
- Open up the latest entry and you should see a message in the form of JSON data that looks like the following:
{'Records': [{'event... 'bucket': {'name': 'json-parser-bucket', 'ownerIdentity': {'principalId'... 'object': {'key': 'parser_data.json'...
This is what the Lambda handler event looks like. Now we can use boto3 and Python dictionary access in our code to extract the key information that we need.
Write the Execution Code
Now we can write our execution code. Scroll down to the inline code editor, replace the test code from the previous step with the following, and click "Save".
import json
import boto3

s3 = boto3.client('s3')
dynamodb = boto3.resource('dynamodb', region_name='us-east-1')

def lambda_handler(event, context):
    # Get the bucket name and file name from the event
    bucket = event['Records'][0]['s3']['bucket']['name']
    file_name = event['Records'][0]['s3']['object']['key']

    # Get the object, read it, and convert it to JSON
    s3_object = s3.get_object(Bucket=bucket, Key=file_name)
    reader = s3_object['Body'].read()
    json_data = json.loads(reader)

    # Put the JSON data into the DynamoDB table
    table = dynamodb.Table('parser_table')

    try:
        table.put_item(Item=json_data)
        return {'statusCode': 200, 'body': json.dumps('Data inserted.')}
    except Exception as e:
        print('DynamoDB insert was unsuccessful:', e)
        return {'statusCode': 400, 'body': json.dumps('Error when trying to insert data.')}
Our code uses boto3 to interact with AWS resources. We use the S3 client to get the object that was dropped into our bucket, read it, and convert it into JSON. Then we use the DynamoDB resource to put the data into our table. Once the code is saved, the function is ready to go.
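To sanity-check the handler without waiting for an upload, you can call it locally with a hand-built event that mimics the S3 notification shape we saw in CloudWatch. This is a convenience sketch, not part of the original setup: it assumes the handler is saved locally as lambda_function.py, that parser_data.json already exists in the bucket, and that your local AWS credentials can reach the bucket and table.

# Run locally (e.g. python test_local.py) with AWS credentials configured
from lambda_function import lambda_handler  # assumes the code above is in lambda_function.py

fake_event = {
    'Records': [{
        's3': {
            'bucket': {'name': 'json-parser-bucket'},
            'object': {'key': 'parser_data.json'},
        }
    }]
}

print(lambda_handler(fake_event, None))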
Result
Now you should be able to drop a JSON file into your S3 bucket and have the data written to DynamoDB. Simply go to your S3 bucket and drop your JSON file, then go to your DynamoDB table and view the items. If everything is configured correctly, you should see the table populated with the JSON data.
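If you want to confirm the write from code instead of the console, a small boto3 check (using the sample record's user_id) might look like this:

import boto3

dynamodb = boto3.resource('dynamodb', region_name='us-east-1')
table = dynamodb.Table('parser_table')

# Look up the item we just inserted by its primary key
response = table.get_item(Key={'user_id': 'j34kxen4dfh'})
print(response.get('Item'))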