Excel: Basic data analysis from a record of Steam games
Not too long ago, I tried browsing onlinejobsph after leaving the site since last year, because I don't think I'll get hired anytime soon on that platform.
This time, I saw a VA post that has something to do with data.
Here's the overview from their document:
"You are an analyst in a tech company specializing in digital games. Your boss has provided you with a large dataset of Steam games published between [Start Date] and [End Date]. He has asked you to organize this data and provide a summary of key trends.
Because this type of analysis will be performed regularly, you also need to document the steps you take to produce your output. This documentation will serve as a reference for future analysts."
After that, a link to a dataset of scraped Steam games list and a template to fill was provided. The CSV has file size of 76MB and total of 40,832 rows.
Columns are;
name,desc_snippet,recent_reviews,all_reviews,release_date,developer,publisher,popular_tags,game_details,languages,achievements,genre,game_description,mature_content,minimum_requirements,recommended_requirements,original_price,discount_price
Screenshot sample:
Here's our to-do;
1. For games released between 2012 and 2015, provide a count of those in the Indie, Action, and Adventure genres. If a game falls under multiple genres, count it in each applicable category.
2. List the top 5 popular tags of all time.
3. Generate a bar chart that visualizes the number of game releases for each year from 2015 to 2025.
Create an SOP, documenting the steps you took for steps 1 to 3. Keep in mind that another employee will be using it as reference in the future. You may create a new tab in your worksheet for this.
I tried to do the task on Google Sheets but it seems it can't handle the file size so the data cleaning and processing was done on Excel desktop app. I'll provide the steps I took for each item on the list.
1. First thing we have to do is load the CSV in Excel and apply a filter. We need to check the release_date column and make sure that every row has correct data format. We can see that most of the rows follow the format DD-MMM-YY
Checking on the filter, we have lots of dirty data that are not in the format we expect it to be.
Create a table from the data we have and create a helper column to get the release year for each game entry:
=YEAR([@[release_date]])
If we only want to count for rows with single genre, we can use the following:
=COUNTIFS(Table2[[genre]:[genre]],Y2,Table2[[release_year]:[release_year]],">=2012",Table2[[release_year]:[release_year]],"<=2015")
Where:
- Table2 is the table containing our data
- genre column contains the genre of the game
- Y2 is the string containing the genre (e.g. Indie)
- release_year is the helper column earlier showing the release year
We will have the following count:
But the requirement for us is, if a game falls under multiple genres, we need to count it in each applicable category. This means the count above is not correct for our goal.
I checked with AI and was advised to use Power Query. Steps taken below:
- Open the CSV in Excel and load in Power Query Editor. Remove everything else aside from popular_tag column
- Split columns based on 'comma' delimiter
- Unpivot the columns
- Group by tags and count, then sort descending to get the top tags
Here's our top 10 genre:
We have the count so we can proceed with the visualization as we were asked to create a graph.
- Use the cleaned dataset from before, then create a pivot table
- Add the release_year in columns and filter for the years we need to graph
- Drag the name column in values to get a count of names (games)
- Create a bar graph viz from pivot table and add data labels/modify as necessary.
Based on the records, Steam had the most number of games released last 2018. Although it's based on the scraped records so we can't rely on the accuracy (specially on later years.)
PS: I initially wrote the draft for this blog entry last Feb. 2025 (published September 2025) and submitted my response to the job listing back then, but I didn't hear anything from them so safe to say I did not make it.

Comments
Post a Comment