Revisiting Yellevate: Using Excel and SQL to generate business recommendation

 


I enrolled on a Data Analytics course from Refocus since January 2023 and completed the first project assignment around April 2023. The course consists of multiple modules highlighting different software/tools and after each module, there will be a project where you'll have to work with other students for 2 weeks. Each module has assignments that tests what you know ultimately leading to the module project.


This blog entry will serve as a documentation of what we did and my thought process so this will be longer and not in a presentation format. I will also not include the step-by-step process because this is not a tutorial. I'll simplify and elaborate stuff if necessary. My assumption is that you only have little to no knowledge of the tools and process used.


If you're currently enrolled in Refocus and doing the same project assignment, it would be best to not read this blog yet so you get to rely only on what you've learned on the course so far. You don't want to cheat on your learning process. We also did the same--there are a few students before us who already posted the project online and we refrained from checking them until we have submitted our output.


Let's start.


Yellevate is the sample pseudo-company on this project. They provide marketing services to other mid-sized companies which includes email marketing, website development, and content creation among other services. They use analytics to keep track of how they perform. They track a client's advertising operations by checking impressions, traffic, and other KPIs (key performance indicators) applicable.


Yellevate has a serious problem: They've been struggling with client disputes for the past few years. They define disputes as dissatisfied clients refusing to pay for the service they received. Statistically, nearly 20% of the disputes raised against Yellevate resulted in a payment opt-out which led to approximately 5% annual loss in revenue (in USD).


The company's top executives told the team that the information below will be needed for them to assess their next steps regarding the dispute problem:

1. Processing time in which invoices are settled

2. Processing time for the company to settle disputes

3. Percentage of disputes received by the company that were lost

4. Percentage of revenue lost from disputes

5. Country where they reached the highest losses from lost disputes


Let's define what I listed above.

- Settled means the invoice was already paid.

- Processing time is the interval (days) between the invoice being generated/sent to the client and the invoice being paid.

- Dispute happens when a client refuses to pay the invoice.

- Lost disputes means the client did not pay us. Clients can dispute their invoice but the outcome is either they eventually pay or not.


Note: I may occasionally refer to 'we' as the Yellevate company since they hired us to do the analysis. On some parts of the blog, I may refer to 'we' as the data analytics team. The context of the surrounding sentences should be obvious.


We have limited information available and here's the assumptions that are always true, for the purpose of this project:

- All values/currency are in USD.

- All services that Yellevate did were fully completed.

- Quality of service is not the reason for the disputes.

- Yellevate believes that most disputes are either from contract technicalities or clients thinking they can get away with not paying for the service.


In real-life scenario/s, there are lots of reasons for client dissatisfaction but we're limiting the scope here.


We used a shared Google Docs file as a workspace where we can input information on our progress and task assignment. The modules recommended that we use pgAdmin as a graphical user interface (GUI) when interacting with the data.


Yellevate sent us a CSV file containing the following information:

- Customer country and ID

- Invoice number, date, amount

- Disputed

- Lost Dispute

- Settled date (number of days it took to pay from invoice date)

- Due date (clients need to pay within a month after the invoice was sent)

- Days late (if client paid after the due date)


We created a new database on pgAdmin and loaded the CSV file. The data types are also set. These are the column names from the CSV:



Data cleaning is the first part of the task where we check if the file received from Yellevate is good to use or not. We reviewed the data by running SQL queries. I will not put the actual codes used here.


- Check if there are null/empty values on each column and if the values in each row is correct.

- Check distinct values in country to make sure that no country name is misspelled. Check disputed and dispute_lost columns if distinct values are 0 and 1.

- Check the year for columns that have date entries to make sure there are no significant outliers.

- Invoice number is supposed to be unique whereas customer ID is not; so check distinct values for that column. It's possible for a client to use Yellevate's service multiple times.

- Check if the values in the days_late column is correct by subtracting the values on settled_date from the due_date. If the output is negative, it means the client paid before the due date. We don't want negative numbers so query is set to output '0' if paid on time or before due date, and output the actual number of delayed days if paid after due date.

- The days_to_settle column was checked by subtracting the settled_date from invoice date.


After running the queries, we found that the data is already clean. We created a new column named disputed_notdisputed with values 'Disputed' and 'Not Disputed' based on the disputed column for readability.


We then added the granted_notgranted column with values 'Granted' or 'Not Granted' that checks the disputed_notdisputed and dispute_lost column. If the item from the disputed_notdisputed column reads 'Disputed' and if the value from dispute_lost column is '1', then it will output 'Granted'. Note that the dispute_lost column only has two values which are 0 and 1. 0 means that the client dispute was not successful and we were able to charge them while 1 means that we lost the dispute and the client did not pay.


We checked if a transaction is disputed, and if we lose on that dispute because we don't expect that we will lose a dispute if the transaction is not disputed in the first place. The table in the database was updated and exported to CSV for visualization later in Excel.


Before we visualize the data in Excel, we can now answer the items that the company executives want from us based on what I listed earlier. We can do this by running queries in SQL. The values are rounded to either whole number or two decimal places, if applicable.


1. The average processing time in which invoices are settled is 26 days. We got this number from getting the average of the items in days_settled column.

2. The average processing time for Yellevate to settle disputes is 36 days. We got this number from getting the average of the items in days_settled column but only filtering the rows where disputed value on the row is 1. 1 means that a transaction was disputed, regardless if we got the money or not.

3. Percentage of disputes received by the company that were lost is 17.69% which is pretty close to the 20% approximate value mentioned by Yellevate. We got this by getting the sum for dispute_lost and disputed columns then dividing the sum of dispute_lost column against the sum of disputed column. The quotient is then multiplied to 100 to get the percentage.

Simpler explanation: We checked how many transactions are disputed then got the percentage for how much of the disputed transactions were lost.

4. Percentage of revenue lost from disputes is 4.67% which is also close to the 5% value Yellevate mentioned. We got this percentage by getting the sum for invoice_amount column where we lost a dispute then divide it against the sum of invoice_amount column (or the amount that we expect to get from our transactions). The quotient is multiplied by 100 and rounded off to two decimal places.

5. The country where we lost the most amount of money from lost disputes is France. We just simply summed up the values from invoice_amount column but only filtering for transactions with disputes lost. The values are grouped by country in descending order.


Now that we already answered Yellevate's non-negotiables, we can now proceed with visualizing and offering our recommendations.


We turn the raw information we got from the CSV to a table so we can use it and create pivot. One of my teammates was assigned for this part and he made multiple tables for possible chart/visuals creation. Here's some of those.

This pivot shows the average for days_late and days_to_settle. There's also pivot to count the sum of disputed transactions and lost disputes.


Despite not receiving the most number of disputes from France, Yellevate lost the most amount of money there from disputes.



Based on our findings, Yellevate can now do a deep-dive or root-cause analysis with other variables that may be affecting the numbers.

Here are few recommendations and questions that may be asked:

1. Yellevate may review the differences and/or similarities of contracts especially for those customers with large number of disputes granted. We will save time and manpower hours by minimizing/avoiding disputes in the first place.

2. They may review the dispute process and look into how disputes are being handled. What are they doing with disputes from clients in China that they often get paid for the disputes raised there? 

3. Do they charge different rates for services in different countries and we are just working with the converted USD amount? Does the cost of their service affect the client's decision to pay?

4. Should they be more careful with picking clients and do better in terms of working with the client's budget?


Ultimately, companies need to understand and find out the 'why' for the trends that the analysis uncovers. On some cases, the analysis and visuals may already be the explanation that we are looking for.


This group project, albeit simple, gave us a glimpse on how our future work may look like. I feel like I may have missed a few corners on this blog post but it doesn't look bad for a first entry.


Thanks for reading and I'll see you again. 

- Marc









Comments