
How to track, analyze, and visualize data using AWS — A Cloud Guru

source link: https://acloudguru.com/blog/engineering/how-to-track-analyze-and-visualize-user-group-data-using-aws
Banjo Obayomi
Nov 4, 2021 13 Minute Read

Using AWS to understand trends in user group data

AWS user groups are communities that meet regularly to share ideas, answer questions, and learn about new services and best practices. The COVID-19 pandemic has shifted these in-person community gatherings to virtual events. With this shift, we have noticed that groups have become less engaged over time.

For example, in the North American region, 60% of groups had at least one meetup in the past 12 months, while only 36% of groups have had one meetup in the past 3 months. This data indicates that “Zoom fatigue” is growing in AWS user groups as events have stayed virtual.
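Activity figures like these can be derived from each group's most recent event timestamp. A minimal sketch of that calculation (the `activity_share` helper and its inputs are illustrative, not part of the original pipeline):

```python
from datetime import datetime, timedelta

def activity_share(last_event_dates, months):
    """Fraction of groups whose most recent event falls within the past `months`."""
    cutoff = datetime.now() - timedelta(days=30 * months)
    active = sum(1 for ts in last_event_dates if ts >= cutoff)
    return active / len(last_event_dates)

# Two groups: one met 10 days ago, one 400 days ago.
events = [datetime.now() - timedelta(days=10),
          datetime.now() - timedelta(days=400)]
print(activity_share(events, 12))  # -> 0.5
print(activity_share(events, 3))   # -> 0.5
```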




To re-engage with the user groups, my team at AWS wanted a fast, effective, and reliable way to analyze user group data and visualize it in a dashboard. This solution has helped in multiple ways, including:

  • Tracking active meetups: The team now has visibility into which user groups have been active in the past 12, 6, or 3 months.
  • Visualizing where members are: The team now has a map showing where our groups are located and how big their footprint is.
  • Keeping up with events: The team now has a table, ordered by last event timestamp, showing which groups have recently had an event.

Overview of the solution

This post describes how I built the solution, from geocoding address data to the final visualization in Amazon QuickSight.

In this solution, I started by developing Python code to scrape user group data from Meetup.

In order to plot each meetup on a map, geolocation data was needed. I used Amazon Location Service to geocode each address into longitude and latitude coordinates.

The transformed data is then published to an Amazon Simple Storage Service (Amazon S3) bucket. 

I used Amazon EventBridge to set up a daily job that triggers a Lambda function to collect the user group data. The reporting and visualization layer is built with QuickSight. Finally, the entire pipeline is deployed using AWS SAM.

The following diagram illustrates this architecture.

[Architecture diagram: arch_diag.png]

Collecting user group data

The user groups use meetup.com to organize their events. My team is interested in the groups in Canada and the U.S. listed on the User Groups in the Americas page.

I used BeautifulSoup and the requests library to scrape the content from the AWS User Group website. 

The script first gets the meetup URL for each user group through the get_user_group_data function. Based on the presence of certain div attributes, it stores each relevant meetup URL and name in a list to be scraped.

Next, the get_meetup_info function iterates through the list and parses the information on each individual meetup page, such as the number of members and the meetup location. The raw data is saved as a CSV for further processing.

The solution in this post is for demonstration purposes only. We recommend running similar scripts only on your own websites after consulting with the team who manages them — or be sure to follow the terms of service for the website that you’re trying to scrape.

The following shows a sample of the script.

import requests
from bs4 import BeautifulSoup

def get_meetup_info(meetup_url: str):
    meetup_json = {}
    page = requests.get(meetup_url)
    soup = BeautifulSoup(page.text, "html.parser")

    # Meetup name
    meetup_name = soup.findAll("a", {"class": "groupHomeHeader-groupNameLink"})[0].text

    # Meetup location
    meetup_location = soup.findAll("a", {"class": "groupHomeHeaderInfo-cityLink"})[0].text

    # Number of members, e.g. "1,234 members" -> "1234"
    meetup_members = (
        soup.findAll("a", {"class": "groupHomeHeaderInfo-memberLink"})[0]
        .text.split(" ")[0]
        .replace(",", "")
    )

    # Number of past events, e.g. "Past events (42)" -> "42"
    past_events = (
        soup.findAll("h3", {"class": "text--sectionTitle text--bold padding--bottom"})[0]
        .text.split("Past events ")[1]
        .replace("(", "")
        .replace(")", "")
    )
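The get_user_group_data step mentioned earlier is not shown above. Here is a minimal sketch of how it could work; since the exact div attributes the original script keys on are not given, this version simply filters links by the meetup.com host, which is an assumption:

```python
from bs4 import BeautifulSoup

def get_user_group_data(listing_html: str) -> list:
    """Collect name/URL pairs for every group that links to meetup.com.

    Filtering on the link host is a stand-in for the original script's
    div-attribute checks.
    """
    soup = BeautifulSoup(listing_html, "html.parser")
    groups = []
    for link in soup.find_all("a", href=True):
        if "meetup.com" in link["href"]:
            groups.append({"name": link.text.strip(), "url": link["href"]})
    return groups
```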

Geocoding user groups

In order to plot each meetup group on a map, we need the longitude and latitude for each city in the meetup group. I used Amazon Location Service to geocode each city name into longitude and latitude coordinates using a place index. For more information about creating a place index, see the Amazon Location Service Developer Guide.

Here is an example Python code of using a place index for geocoding.

import boto3

def get_location_data(location: str):
    """
    Purpose:
        Get location data from a place name.
    Args:
        location - name of location
    Returns:
        lat, lng - latitude and longitude of location
    """
    client = boto3.client("location")
    response = client.search_place_index_for_text(
        IndexName="my_place_index", Text=location
    )

    # Example output for Arlington, VA:
    # 'Results': [{'Place': {'Country': 'USA',
    #     'Geometry': {'Point': [-77.08628999999996, 38.89050000000003]},
    #     'Label': 'Arlington, VA, USA', 'Municipality': 'Arlington',
    #     'Region': 'Virginia', 'SubRegion': 'Arlington County'}}]
    geo_data = response["Results"][0]["Place"]["Geometry"]["Point"]
    lat = geo_data[1]
    lng = geo_data[0]

    print(f"{lat},{lng}")

    return lat, lng
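With the geocoder in place, each scraped row can be enriched with coordinates before the CSV is published to S3. A sketch using pandas; the column names and the injected geocode callable are assumptions rather than the original code:

```python
import pandas as pd

def add_coordinates(df: pd.DataFrame, geocode) -> pd.DataFrame:
    """Append lat/lng columns by applying a geocoder (e.g. get_location_data)
    to each group's location string. Column names are assumed for illustration."""
    df = df.copy()
    df["lat"], df["lng"] = zip(*df["location"].map(geocode))
    return df

# Example with a stubbed geocoder; a real run would pass get_location_data.
stub = lambda loc: (38.8905, -77.0863)
groups = pd.DataFrame({"name": ["AWS DC"], "location": ["Arlington, VA"]})
print(add_coordinates(groups, stub))
```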

Using SAM to orchestrate deployment

After testing the script locally, the next step was to create a mechanism to run the script daily and store the results in S3. I used the AWS Serverless Application Model (SAM) to create a serverless application that does the following:

  1. Create an S3 bucket
  2. Create an EventBridge rule that triggers every 24 hours
  3. Deploy a Python Lambda function to run the data scraping code

Here is the outline I followed to deploy the serverless application, along with the sample code I used.

1. From a terminal window, initialize a new application:
sam init

2. Change directory:
cd ./sam-meetup

3. Update the dependencies in `my_app/requirements.txt`:

requests
pandas
bs4

4. Update the code
Add your code to `my_app/app.py`:

import json
import logging

import get_meetup_data


def lambda_handler(event, context):
    logging.info("Getting meetup data")

    try:
        get_meetup_data.main()
    except Exception as error:
        logging.error(error)
        raise error

    return {
        "statusCode": 200,
        "body": json.dumps(
            {
                "message": "meetup data collected",
            }
        ),
    }

5. Update `template.yml`:

Globals:
  Function:
    Timeout: 600
Resources:
  S3Bucket:
    Type: 'AWS::S3::Bucket'
    Properties:
      BucketName: MY_BUCKET_NAME
  GetMeetupDataFunction:
    Type: AWS::Serverless::Function
    Properties:
      CodeUri: my_app/
      Handler: app.lambda_handler
      Policies:
        - S3WritePolicy:
            BucketName: MY_BUCKET_NAME
      Runtime: python3.9
      Architectures:
        - x86_64
      Events:
        GetMeetupData:
          Type: Schedule
          Properties:
            Schedule: 'rate(24 hours)'
            Name: MeetupData
            Description: getMeetupData
            Enabled: True

6. Build the application:
sam build

7. Deploy the application to AWS:
sam deploy --guided

For more detailed information on developing SAM applications, check out Getting started with AWS SAM.




Visualizing data with QuickSight

To share the user group data, I chose QuickSight, with Amazon S3 as the data source.

QuickSight is a native AWS service that seamlessly integrates with other AWS services such as Amazon Redshift, Athena, and Amazon S3, as well as many other data sources.

As a fully managed service, QuickSight enabled the team to easily create and publish interactive dashboards. In addition to building powerful visualizations, QuickSight provides data preparation tools that make it easy to filter and transform the data into exactly the dataset needed. For more information about creating a dataset, see Creating a Dataset Using Amazon S3 Files.
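A QuickSight dataset backed by S3 is defined by a small JSON manifest that lists the file locations and format. A sketch of generating one; the bucket name and key are placeholders:

```python
import json

# Manifest telling QuickSight where to read the CSV from;
# the bucket name and object key below are placeholders.
manifest = {
    "fileLocations": [{"URIs": ["s3://MY_BUCKET_NAME/meetup_data.csv"]}],
    "globalUploadSettings": {"format": "CSV", "containsHeader": "true"},
}

with open("manifest.json", "w") as f:
    json.dump(manifest, f, indent=2)
```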

The following are example screenshots from the dashboard.

[Dashboard screenshots: quicksight_map.png, quicksight_table.png, quicksight_graph.png]



Conclusion

In this post, we discussed how to successfully achieve the following:

  • Geocode addresses using Amazon Location Service
  • Use Amazon EventBridge and AWS Lambda to transform and load the data daily to S3
  • Visualize and share the data stored using Amazon QuickSight
  • Automate and orchestrate the entire solution using SAM

The team at AWS uses this solution to plan how best to engage the user community. If you’re interested in participating in your local community, check out the user group page here: https://aws.amazon.com/developer/community/usergroups/

About the Author

Banjo is a Senior Developer Advocate at AWS, where he helps builders get excited about using AWS. Banjo is passionate about operationalizing data and has started a podcast, a meetup, and open-source projects around utilizing data. When not building the next big thing, Banjo likes to relax by playing video games, especially JRPGs, and exploring events happening around him.

