How to track, analyze, and visualize data using AWS — A Cloud Guru
source link: https://acloudguru.com/blog/engineering/how-to-track-analyze-and-visualize-user-group-data-using-aws
Using AWS to understand trends in user group data
AWS user groups are communities that meet regularly to share ideas, answer questions, and learn about new services and best practices. The COVID-19 pandemic has shifted these in-person community gatherings to virtual events. With this shift, we have noticed that groups have become less engaged over time.
For example, in the North American region, 60% of groups had at least one meetup in the past 12 months, while only 36% of groups have had one meetup in the past 3 months. This data indicates that “Zoom fatigue” is growing in AWS user groups as events have stayed virtual.
To re-engage with the user groups, my team at AWS wanted a fast, effective, and reliable way to analyze user group data and visualize it in a dashboard. This solution has helped in multiple ways, including:
- Tracking active meetups: The team now has visibility into which user groups have been active in the past 12, 6, or 3 months.
- Visualizing where members are: The team now has a map showing where our groups are located and how big their footprint is.
- Keeping up with events: The team now has a table ordered by last event timestamp to see which groups have recently held an event.
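The activity buckets above can be computed directly from the last-event timestamps. Here is a hedged sketch using pandas; the column names, sample rows, and the fixed "today" date are illustrative assumptions, not from the article's dataset.

```python
import pandas as pd

# Hypothetical sample of scraped data: group name plus last-event timestamp.
df = pd.DataFrame(
    {
        "group": ["AWS UG Toronto", "AWS UG Austin", "AWS UG Lima"],
        "last_event": pd.to_datetime(
            ["2022-01-10", "2021-06-01", "2022-03-20"], utc=True
        ),
    }
)

# Fixed "today" so the example is reproducible; in practice use pd.Timestamp.now("UTC").
now = pd.Timestamp("2022-04-01", tz="UTC")
for months in (12, 6, 3):
    cutoff = now - pd.DateOffset(months=months)
    active = (df["last_event"] >= cutoff).mean() * 100
    print(f"Active in past {months} months: {active:.0f}%")
```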
Overview of the solution
This post describes how I built the solution, from geocoding address data to the final visualization on Amazon QuickSight.
In this solution, I started by developing Python code to scrape user group data from Meetup.
In order to plot each meetup on a map, geolocation data was needed. I used Amazon Location Service to geocode the address into a longitude and latitude coordinate.
The transformed data is then published to an Amazon Simple Storage Service (Amazon S3) bucket.
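The publish step can be as simple as a boto3 upload. This is a minimal sketch; the bucket name, key layout, and the `object_key` helper are my own placeholders, not details from the article.

```python
from datetime import date


def object_key(run_date: date) -> str:
    # Dated key layout is an assumption for illustration, not from the article.
    return f"meetup-data/{run_date:%Y-%m-%d}/meetup_data.csv"


def publish_to_s3(csv_path: str, bucket: str) -> str:
    """Upload the transformed CSV to S3; the caller supplies the bucket name."""
    import boto3  # imported here so the key helper stays usable offline

    key = object_key(date.today())
    boto3.client("s3").upload_file(csv_path, bucket, key)
    return key
```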
I used Amazon EventBridge to set up a daily job that triggers an AWS Lambda function to collect the user group data. The reporting and visualization layer is built using QuickSight. Finally, the entire pipeline is deployed using the AWS Serverless Application Model (AWS SAM).
The following diagram illustrates this architecture.
Collecting user group data
The user groups use meetup.com to organize their events. My team is interested in the groups in Canada and the U.S. listed on the User Groups in the Americas page.
I used BeautifulSoup and the requests library to scrape the content from the AWS User Group website.
The script first gets the meetup URL for each user group through the get_user_group_data function. Based on the presence of certain div attributes, it stores each relevant meetup URL and name in a list to be scraped.
Next, the get_meetup_info function iterates through the list and parses the information on each individual meetup page such as number of members, and meetup location. The raw data is saved as a CSV for further processing.
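The listing-page step could look roughly like the sketch below. The function name `get_user_group_data` comes from the article, but the `usergroup-card` class and the link-extraction logic are assumptions; the real selectors depend on the page's current markup.

```python
import requests
from bs4 import BeautifulSoup


def parse_group_links(html: str) -> list:
    """Extract (name, meetup URL) pairs from the listing page HTML.
    The 'usergroup-card' class is a placeholder selector."""
    soup = BeautifulSoup(html, "html.parser")
    groups = []
    for card in soup.find_all("div", {"class": "usergroup-card"}):
        link = card.find("a")
        if link and "meetup.com" in link.get("href", ""):
            groups.append({"name": link.text.strip(), "url": link["href"]})
    return groups


def get_user_group_data(listing_url: str) -> list:
    """Fetch the listing page and return the groups to scrape."""
    return parse_group_links(requests.get(listing_url).text)
```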
The solution in this post is for demonstration purposes only. We recommend running similar scripts only on your own websites after consulting with the team who manages them — or be sure to follow the terms of service for the website that you’re trying to scrape.
The following shows a sample of the script.
```python
import requests
from bs4 import BeautifulSoup

meetup_json = {}
# meetup_url is the URL of an individual group's meetup.com page
page = requests.get(meetup_url)
usergroup_html = page.text
soup = BeautifulSoup(usergroup_html, "html.parser")

# Get meetup name
meetup_name = soup.findAll("a", {"class": "groupHomeHeader-groupNameLink"})[0].text

# Meetup location
meetup_location = soup.findAll("a", {"class": "groupHomeHeaderInfo-cityLink"})[0].text

# Number of members
meetup_members = (
    soup.findAll("a", {"class": "groupHomeHeaderInfo-memberLink"})[0]
    .text.split(" ")[0]
    .replace(",", "")
)

# Past events
past_events = (
    soup.findAll("h3", {"class": "text--sectionTitle text--bold padding--bottom"})[0]
    .text.split("Past events ")[1]
    .replace("(", "")
    .replace(")", "")
)
```
Geocoding user groups
In order to plot each meetup group on a map, we need the longitude and latitude for each city in the meetup group. I used Amazon Location Service to geocode each city name into longitude and latitude coordinates using a place index. For more information about creating a place index, see the Amazon Location Service Developer Guide.
Here is an example Python code of using a place index for geocoding.
```python
import boto3


def get_location_data(location: str):
    """
    Purpose:
        get location data from name
    Args:
        location - name of location
    Returns:
        lat, lng - latitude and longitude of location
    """
    client = boto3.client("location")
    response = client.search_place_index_for_text(
        IndexName="my_place_index", Text=location
    )
    print(response)
    geo_data = response["Results"][0]["Place"]["Geometry"]["Point"]
    # Example output for Arlington, VA:
    # 'Results': [{'Place': {'Country': 'USA', 'Geometry': {'Point':
    # [-77.08628999999996, 38.89050000000003]}, 'Label': 'Arlington, VA, USA',
    # 'Municipality': 'Arlington', 'Region': 'Virginia',
    # 'SubRegion': 'Arlington County'}}]
    lat = geo_data[1]
    lng = geo_data[0]
    print(f"{lat},{lng}")
    return lat, lng
```
Using SAM to orchestrate deployment
After testing the script locally, the next step was to create a mechanism to run the script daily and store the results in S3. I used the AWS Serverless Application Model (SAM) to create a serverless application that does the following.
- Create an S3 bucket
- Create a CloudWatch event to trigger every 24 hours
- Deploy a Python Lambda function to run the data-scraping code
Here is the outline I used to deploy the serverless application, with the sample code I used.
1. From a terminal window, initialize a new application: `sam init`
2. Change directory: `cd ./sam-meetup`
3. Update dependencies: add the following to `my_app/requirements.txt`

```
requests
pandas
bs4
```
4. Update the code: add your code to `my_app/app.py`

```python
import json
import logging

import get_meetup_data


def lambda_handler(event, context):
    logging.info("Getting meetup data")
    try:
        get_meetup_data.main()
    except Exception as error:
        logging.error(error)
        raise error
    return {
        "statusCode": 200,
        "body": json.dumps(
            {
                "message": "meetup data collected",
            }
        ),
    }
```
5. Update `template.yml`

```yaml
Globals:
  Function:
    Timeout: 600
Resources:
  S3Bucket:
    Type: 'AWS::S3::Bucket'
    Properties:
      BucketName: MY_BUCKET_NAME
  GetMeetupDataFunction:
    Type: AWS::Serverless::Function
    Properties:
      CodeUri: my_app/
      Handler: app.lambda_handler
      Policies:
        - S3WritePolicy:
            BucketName: MY_BUCKET_NAME
      Runtime: python3.9
      Architectures:
        - x86_64
      Events:
        GetMeetupData:
          Type: Schedule
          Properties:
            Schedule: 'rate(24 hours)'
            Name: MeetupData
            Description: getMeetupData
            Enabled: True
```
6. Run `sam build`
7. Deploy the application to AWS: `sam deploy --guided`
For more detailed information on developing SAM applications, check out Getting started with AWS SAM.
Visualizing data with QuickSight
To share the user group data, I chose QuickSight with Amazon S3 as the data source.
QuickSight is a native AWS service that seamlessly integrates with other AWS services such as Amazon Redshift, Athena, Amazon S3, and many other data sources.
As a fully managed service, QuickSight enabled the team to easily create and publish interactive dashboards. In addition to building powerful visualizations, QuickSight provides data preparation tools that make it easy to filter and transform the data into exactly the dataset needed. For more information about creating a dataset, see Creating a Dataset Using Amazon S3 Files.
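A QuickSight dataset backed by S3 is defined by a small JSON manifest that points at the files. The sketch below writes a minimal manifest; the bucket name and object key are placeholders, not values from the article.

```python
import json

# Minimal S3 manifest for a QuickSight dataset. The URI below is a
# placeholder; point it at the CSV your pipeline publishes.
manifest = {
    "fileLocations": [{"URIs": ["s3://MY_BUCKET_NAME/meetup_data.csv"]}],
    "globalUploadSettings": {"format": "CSV", "containsHeader": "true"},
}

with open("manifest.json", "w") as f:
    json.dump(manifest, f, indent=2)
```

When creating the dataset in the QuickSight console, you upload or link this manifest so QuickSight knows where the data lives and how to parse it.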
The following are example screenshots from the dashboard.
Conclusion
In this post, we discussed how to successfully achieve the following:
- Geocode addresses using Amazon Location Service
- Use Amazon EventBridge and AWS Lambda to transform and load the data daily to S3
- Visualize and share the data stored using Amazon QuickSight
- Automate and orchestrate the entire solution using SAM
The team at AWS uses this solution to plan how best to engage the user community. If you’re interested in participating in your local community, check out the user group page here: https://aws.amazon.com/developer/community/usergroups/
About the Author
Banjo is a Senior Developer Advocate at AWS, where he helps builders get excited about using AWS. Banjo is passionate about operationalizing data and has started a podcast, a meetup, and open-source projects around utilizing data. When not building the next big thing, Banjo likes to relax by playing video games especially JRPGs and exploring events happening around him.