Posts

Excel, Python: Basic data scraping for NBL Australia basketball game stats

I recently went back to Onlinejobsph to look for quick gigs I could do on the side using my skills. This time, I found a post looking for someone who can scrape basketball data. I reached out and got a reply about the site they want to scrape and the data they need. I replied asking for more details so I could decide how much to charge and pin down the scope of the requirement, but after a few days I had received no response and the listing had already been removed. The post stated they're not interested in having it done by AI because they've already tried that. I mentioned to them that I do use AI, but I have enough working knowledge of Python to tell whether the code is garbage or not. Since I didn't want my initial effort to go to waste, I started building the script anyway as a way to practice. The site is NBL, a basketball league from Australia. Luckily, they don't have heavy anti-bot measures (or they probably do, but I have not triggered them in testing). The fi...

Excel: Basic data analysis from a record of Steam games

Not too long ago, I browsed onlinejobsph again. I had left the site last year because I didn't think I'd get hired anytime soon on that platform. This time, I saw a VA post that has something to do with data. Here's the overview from their document: "You are an analyst in a tech company specializing in digital games. Your boss has provided you with a large dataset of Steam games published between [Start Date] and [End Date]. He has asked you to organize this data and provide a summary of key trends. Because this type of analysis will be performed regularly, you also need to document the steps you take to produce your output. This documentation will serve as a reference for future analysts." After that, they provided a link to a dataset of scraped Steam games and a template to fill in. The CSV has a file size of 76MB and a total of 40,832 rows. The columns are: name,desc_snippet,recent_reviews,all_reviews,release_date,developer,publisher,popular_tags,game_deta...

DataSolve 2: How to compare retrieved values from a Python dictionary and print comparison results

So, this is not necessarily data related, but I'll file it under the DataSolve series anyway. This is the second installment. Basically, DataSolve will be anything I saw online and decided to answer. We are looking into a question posted by a member of a local Python group (PH) I'm part of. The question goes (translated from Tagalog): Hi, to our veterans out there. How do I compare my dictionary to check if A is greater than B? e.g. import random data = [ { 'team': 'example_user_1', 'follower_count': 346, }, { 'team': 'example_user_2', 'follower_count': 215, }, { 'team': 'example_user_3', 'follower_count': 500, }, { 'team': 'example_user_4', 'follower_count': 600, }, ] random_data = random.choice(data) # supposed to compare the data in the dictionary print(f" Compare A: {random_data['team']}") print(f" Compare B: {random_data['team']}") kasi an...
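Just to illustrate the kind of comparison being asked about, here's a minimal sketch. This is not necessarily the answer from the full post. Note that the quoted snippet prints the same `random_data` for both A and B, so this sketch assumes the goal is to pick two different entries and compare their `follower_count`. The helper name `compare_followers` is made up for this example.

```python
import random

data = [
    {"team": "example_user_1", "follower_count": 346},
    {"team": "example_user_2", "follower_count": 215},
    {"team": "example_user_3", "follower_count": 500},
    {"team": "example_user_4", "follower_count": 600},
]


def compare_followers(a, b):
    """Hypothetical helper: describe which entry has more followers."""
    if a["follower_count"] > b["follower_count"]:
        return f"{a['team']} has more followers than {b['team']}"
    if a["follower_count"] < b["follower_count"]:
        return f"{b['team']} has more followers than {a['team']}"
    return f"{a['team']} and {b['team']} have the same follower count"


# random.sample returns two different entries from the list,
# unlike calling random.choice once and reusing the result.
entry_a, entry_b = random.sample(data, 2)

print(f"Compare A: {entry_a['team']}")
print(f"Compare B: {entry_b['team']}")
print(compare_followers(entry_a, entry_b))
```

I went with `random.sample(data, 2)` instead of two `random.choice` calls because two separate calls could still land on the same entry.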

Whodunnit? A solution and walkthrough for SQL Murder Mystery

There are two ways to go through the game. Either you jump right in and figure it out yourself, or you go with the walkthrough. Since I wanted to challenge myself and see if my basic SQL knowledge is enough, I jumped straight into the game, with no help from AI. I'll walk you through how I solved the game and found the killer, and more. Please note that I will not treat you as a total beginner without SQL knowledge here.

DataSolve 1: How to combine columns from multiple Excel files using Python

In this new series of blog entries, I will be taking questions I see online and trying to provide an answer for them. This is learning by doing, and a way to familiarize myself with the tools and terms of the field. I started learning Python from online tutorials before considering data analytics. The reason is that I saw this programming language as something relatively easy to pick up, with good job potential if taken seriously, and a tool for automating tasks. I consider myself at a beginner-intermediate level since I haven't fully grasped the concept of classes. Python comes in the final part of the data analytics journey under Refocus. For this initial release, I stumbled upon a question in the Facebook group 'Data Engineering Pilipinas' and figured it's a good way to practice my Python for data-related stuff and create a blog entry here. You may need a pretty good understanding of Python to follow the items below. From the screenshot, here's what is required: "The obj...

Delivery data visualization via PowerBI

I mentioned previously that for each module we complete in Refocus, there's a final task that will be checked by the mentor/s. PowerBI is already under the advanced module, and I will be revisiting one of the assignments under it. This seems to have been the final task a few months ago, but they've since changed the modules, so it became one of our assignments that also needs to be submitted for mentor review. We have a sample dataset from a delivery company operating in India. The delivered items include sides, drinks, desserts, etc., and are usually delivered in boxes or bags. In general, deliveries are done by car, but in bigger cities where homes and restaurants are closer together, the company may opt to use bikes or motorized scooters. The requirements are the following: - Import and clean the data then build a dashboard for it. The dashboard should show the vehicle type, type of order, weather conditions, and road traffic. Should also show...

Revisiting Yellevate: Using Excel and SQL to generate business recommendation

I enrolled in a Data Analytics course from Refocus in January 2023 and completed the first project assignment around April 2023. The course consists of multiple modules highlighting different software/tools, and after each module there's a project where you work with other students for 2 weeks. Each module has assignments that test what you know, ultimately leading to the module project. This blog entry will serve as documentation of what we did and my thought process, so it will be longer and not in a presentation format. I will also not include the step-by-step process because this is not a tutorial. I'll simplify and elaborate on things where necessary. My assumption is that you have little to no knowledge of the tools and process used. If you're currently enrolled in Refocus and doing the same project assignment, it would be best to not read this blog yet so you get to rely only on what you've learned in the course so far. You don't want t...