
JSON Parser Project

This is a guide on building a serverless JSON parser using AWS Lambda and DynamoDB. AWS Lambda lets you upload your code and create a Lambda function, and AWS takes care of provisioning and managing the servers needed to run it, so you don’t need to worry about the OS, patching, scaling, and so on. The architecture of our app will be:

  1. The user drops a JSON file into an S3 bucket.
  2. This triggers a Lambda function which parses the file.
  3. The Lambda function then writes the data to DynamoDB.

Objective

Once set up, we should have a good understanding of how to build a JSON parser with Lambda and DynamoDB. We should be able to drop a JSON file into our S3 bucket, then see and interact with the parsed data in a DynamoDB table.

Prerequisites

To follow this guide you should have a little experience with Lambda, DynamoDB, Python, and creating S3 buckets. You will need to create a role that gives Lambda access to S3, DynamoDB, and CloudWatch for logs. See “IAM Basics Tutorial” for more details on creating roles. The Lambda role we created is named as follows:

IAM role = "json-parser-role"
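
If you would rather attach the permissions from a script, here is a rough boto3 sketch of an inline policy covering the S3 read and DynamoDB write this guide needs. The policy name, account ID, and region are placeholders I chose, and the managed AWSLambdaBasicExecutionRole policy already covers the CloudWatch Logs side.

import json
import boto3

# Hypothetical inline policy for json-parser-role; the account ID and region
# in the DynamoDB ARN are placeholders, swap in your own.
policy = {
    "Version": "2012-10-17",
    "Statement": [
        {"Effect": "Allow",
         "Action": ["s3:GetObject"],
         "Resource": "arn:aws:s3:::json-parser-bucket/*"},
        {"Effect": "Allow",
         "Action": ["dynamodb:PutItem"],
         "Resource": "arn:aws:dynamodb:us-east-1:123456789012:table/parser_table"},
    ],
}

# Attach the policy to the role created for this guide
boto3.client('iam').put_role_policy(
    RoleName='json-parser-role',
    PolicyName='json-parser-permissions',
    PolicyDocument=json.dumps(policy),
)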

Next, you will also need to create an S3 bucket and a DynamoDB table. To create an S3 bucket, go to S3, click “Create bucket”, and give the bucket a name.

Bucket name = "json-parser-bucket"
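
If you prefer to script the bucket creation, a single boto3 call is enough. This is just a sketch: bucket names are globally unique, so this exact name may already be taken, and outside us-east-1 you would also need to pass a CreateBucketConfiguration with your region.

import boto3

# Hypothetical scripted equivalent of the console step above (us-east-1 assumed)
s3 = boto3.client('s3', region_name='us-east-1')
s3.create_bucket(Bucket='json-parser-bucket')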

To create a DynamoDB table:

  1. Go to DynamoDB in the console and click “Create Table”.
  2. Set your table name and primary key, and leave the rest as default. (A scripted equivalent is sketched after this list.)

    Table name = "parser_table"
    Primary key = "user_id"
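
If you would rather create the table from code, a minimal boto3 sketch looks like this. On-demand billing is my assumption here, not something the console steps above require.

import boto3

# Hypothetical scripted equivalent of the console steps above
dynamodb = boto3.client('dynamodb', region_name='us-east-1')
dynamodb.create_table(
    TableName='parser_table',
    AttributeDefinitions=[{'AttributeName': 'user_id', 'AttributeType': 'S'}],
    KeySchema=[{'AttributeName': 'user_id', 'KeyType': 'HASH'}],
    BillingMode='PAY_PER_REQUEST',  # assumption: on-demand capacity
)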

You will also need a JSON file. Make sure your JSON file includes “user_id” as a key, since that is the table’s primary key. You can use the JSON data below as an example. Save it as parser_data.json.

{"user_id":"j34kxen4dfh","first_name":"John","last_name":"Jones","Location":["USA"]}

Create Lambda Function

First, we will create a Lambda function.

  1. Go to Lambda in the AWS console and click “Create function”.
  2. Select “Author from scratch”, give the function a name, and select the latest version of Python for the runtime.

    Name = "json-parser-lambda"
    Runtime = "Python 3.x"

  3. Select “Use an existing role”, then select the role you created in the prerequisites and click “Create function”. (A scripted equivalent is sketched after this list.)

    Existing role = "json-parser-role"
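
For reference, the same step can be scripted. This is only a rough sketch: unlike the console’s “Author from scratch” flow, the API call needs a deployment package up front, so it assumes you have already saved the handler code from the later sections as lambda_function.py, and the role ARN uses a placeholder account ID.

import io
import zipfile
import boto3

# Placeholder role ARN; substitute your own account ID
ROLE_ARN = 'arn:aws:iam::123456789012:role/json-parser-role'

# Zip lambda_function.py in memory to use as the deployment package
buf = io.BytesIO()
with zipfile.ZipFile(buf, 'w') as z:
    z.write('lambda_function.py')

boto3.client('lambda', region_name='us-east-1').create_function(
    FunctionName='json-parser-lambda',
    Runtime='python3.12',  # assumption: current Python runtime; adjust as needed
    Role=ROLE_ARN,
    Handler='lambda_function.lambda_handler',
    Code={'ZipFile': buf.getvalue()},
)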

Add Trigger

Now we need to create a trigger to call our Lambda function. We will use S3 as the trigger: when a user drops a JSON file into the bucket, S3 will invoke the function we created.

  1. In your Lambda function “Designer”, click “Add trigger” and select “S3”.
  2. Under S3, set the bucket to your json-parser-bucket, the event type to “All object create events”, and add .json as the suffix. Basically, whenever a .json file is uploaded to the bucket, this trigger will be activated. (A scripted equivalent is sketched after this list.)

    Bucket = "json-parser-bucket"
    Event type = "All object create events"
    Suffix = ".json"

  3. Then click “Add” to add the trigger to your Lambda function.
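
The trigger can also be configured from code. The sketch below is a rough boto3 equivalent of the console steps; the function ARN and account ID are placeholders, and the add_permission call is only needed on this path because the console grants S3 permission to invoke the function for you.

import boto3

# Placeholder ARN; substitute your own account ID
FUNCTION_ARN = 'arn:aws:lambda:us-east-1:123456789012:function:json-parser-lambda'

# Allow S3 to invoke the function (the console does this automatically)
boto3.client('lambda', region_name='us-east-1').add_permission(
    FunctionName='json-parser-lambda',
    StatementId='s3-invoke-json-parser',
    Action='lambda:InvokeFunction',
    Principal='s3.amazonaws.com',
    SourceArn='arn:aws:s3:::json-parser-bucket',
)

# Fire the function for every object created with a .json suffix
boto3.client('s3', region_name='us-east-1').put_bucket_notification_configuration(
    Bucket='json-parser-bucket',
    NotificationConfiguration={
        'LambdaFunctionConfigurations': [{
            'LambdaFunctionArn': FUNCTION_ARN,
            'Events': ['s3:ObjectCreated:*'],
            'Filter': {'Key': {'FilterRules': [{'Name': 'suffix', 'Value': '.json'}]}},
        }]
    },
)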

Testing the Trigger

Next, we can test the trigger to confirm it is working and to see what the event it sends looks like.

  1. Scroll down to the Environment Editor, paste the following code into the editor, and click “Save”.

    import json

    def lambda_handler(event, context):
        # Log the raw event so we can inspect its structure in CloudWatch
        print(str(event))
        return {'statusCode': 200, 'body': json.dumps('Hello from Lambda!')}
  2. Go to your json-parser-bucket and upload your “parser_data.json” into the bucket. If your trigger worked then you should see the event logged in CloudWatch.
  3. Go to CloudWatch > Logs > /aws/lambda/json-parser-lambda to see the logs.
  4. Open up the latest entry and you should see the logged event, which looks like the following:
    {'Records': [{'event... 'bucket': {'name': 'json-parser-bucket', 'ownerIdentity': {'principalId'... 'object': {'key': 'parser_data.json'...
    

    This is what the Lambda handler event looks like. Now we can use boto3 and Python’s dictionary access in our code to extract the key information we need.

Write the Execution Code

Now we can write our execution code. Scroll down to the Environment Editor, paste the following code into the editor, and click “Save”.

import json
import boto3

s3 = boto3.client('s3')
dynamodb = boto3.resource('dynamodb', region_name='us-east-1')

def lambda_handler(event, context):
    # Get bucket name and file name from the S3 event
    bucket = event['Records'][0]['s3']['bucket']['name']
    file_name = event['Records'][0]['s3']['object']['key']

    # Get the object, read its body, and convert it to JSON
    s3_object = s3.get_object(Bucket=bucket, Key=file_name)
    reader = s3_object['Body'].read()
    json_data = json.loads(reader)

    # Put the JSON data into the DynamoDB table
    table = dynamodb.Table('parser_table')

    try:
        table.put_item(Item=json_data)
        return {'statusCode': 200, 'body': json.dumps('Data inserted.')}
    except Exception as e:
        print('DynamoDB insert was unsuccessful:', e)
        return {'statusCode': 400, 'body': json.dumps('Error when trying to insert data.')}

Our code uses boto3 to interact with AWS resources. We use the S3 client to get the object that was dropped into our bucket, read it, and convert it to JSON. Then we use the DynamoDB resource to put the data into our table. Once the code is saved, the function is ready to go.

Result

Now you should be able to drop a JSON file into your S3 bucket and have the data written to DynamoDB. Simply go to your S3 bucket and drop your JSON file, then go to your DynamoDB table and view your items. If everything is configured correctly, you should see your table populated with the JSON data.
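
If you would rather verify from code than in the console, a quick boto3 read works too. This sketch assumes your local AWS credentials can read the table and that you uploaded the sample parser_data.json from earlier.

import boto3

# Fetch the item the Lambda function should have written
table = boto3.resource('dynamodb', region_name='us-east-1').Table('parser_table')
response = table.get_item(Key={'user_id': 'j34kxen4dfh'})
print(response.get('Item'))  # should print the parsed record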

This post is licensed under CC BY 4.0 by the author.