Data analysis

Data can be analyzed in different ways, and no single way is the best. It depends on the client, the systems they’re using and the amount of data. Data analysis is much more than juggling some numbers. Below is an explanation of everything that’s involved.

Prep

A very important part of data analysis is cleaning and preparing data, something I definitely underestimated. Before you start cleaning, you need to prepare your data. The first step is to make a copy of the original data and work with the copy. If you do something wrong, you can always go back to the original. That way you never permanently delete or overwrite anything in the original source. Don’t forget to always, ALWAYS keep a raw (original) file.
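A minimal sketch of that first step in Python, using only the standard library. The function name `make_working_copy` and the folder layout are my own assumptions for illustration, not a fixed way of working:

```python
import shutil
from pathlib import Path


def make_working_copy(raw_path, work_dir):
    """Copy the raw file into work_dir and return the path of the copy.

    The raw file is only read, never modified.
    """
    raw_path = Path(raw_path)
    work_dir = Path(work_dir)
    work_dir.mkdir(parents=True, exist_ok=True)

    working_path = work_dir / raw_path.name
    # Refuse to "copy" the raw file onto itself.
    if working_path.resolve() == raw_path.resolve():
        raise ValueError("working copy must not overwrite the raw file")

    # copy2 also tries to preserve file metadata such as timestamps.
    shutil.copy2(raw_path, working_path)
    return working_path
```

You could call it as `make_working_copy("raw/customers.csv", "work")` (both paths are made up) and from then on only open the file in `work`.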

Make sure you have all the data you’re going to work with (or at least everything you know about at this stage). Make sure you have access to it, and store it where the client wants it (for security reasons, for example), whether that is on your computer, in their cloud or in a shared folder. It doesn’t really matter where, as long as it’s clear and both parties have agreed on it. This preparation saves you a ton of time later on, when someone suddenly wants to have a look and you find out you saved data in three different places and the client doesn’t have the access they want.

Cleaning

When the preparation is done, you can actually start working with the data. The first step is to clean it. So, what do you do when you clean your data? Most data files contain all sorts of incorrect data, outliers, missing data, typographical errors or a combination of these. It’s up to you to decide how to handle each of them before you start analyzing. Working with an uncleaned file can give you all sorts of wrong answers and a misleading picture of what’s actually going on. For example, missing data recorded as 'NaN' can get in the way of building charts, because it’s not a usable number and the chart can’t work with it. Outliers are values that lie far outside the rest of the data, and they can distort the mean (average), for example. If there are 5 people aged 20 and 1 aged 65, the mean is 27.5, which is not very representative of this group of 6 people. You might consider removing that 65-year-old from your data file. Another example is someone whose age is 0. It’s highly unlikely that this is correct, so removing this person or changing the age to the average age might be a good solution.
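The age example can be sketched in a few lines of Python. The data is made up to match the numbers in the text, and the helper names (`is_missing`, `clean_ages`) and the rule "an age below 1 is implausible" are assumptions for illustration. The median at the end is an alternative I’m adding that is not part of the example itself: it is less sensitive to a single outlier than the mean.

```python
import math
from statistics import mean, median

# Made-up ages: five people aged 20, one aged 65,
# plus two missing values and one implausible age of 0.
raw_ages = [20, 20, 20, 20, 20, 65, None, float("nan"), 0]


def is_missing(value):
    """True for None or a float NaN."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def clean_ages(ages, min_age=1):
    """Drop missing values and ages below min_age (an assumed rule)."""
    return [a for a in ages if not is_missing(a) and a >= min_age]


ages = clean_ages(raw_ages)

mean_with_outlier = mean(ages)      # (5 * 20 + 65) / 6 = 27.5
median_age = median(ages)           # middle of the sorted ages = 20

# Option discussed in the text: drop the 65-year-old before averaging.
ages_without_outlier = [a for a in ages if a != 65]
mean_without_outlier = mean(ages_without_outlier)   # 20
```

Whether you drop the outlier, keep it, or report the median instead is exactly the kind of decision to check with your client first.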

What’s described above is not a standard way of working. These are just some examples. It really depends on your client, your data and what you’re trying to achieve. But it’s important to look at these things, so you know what’s going on in your data.

Because there is no standard way of working, it’s important to keep communicating with your client/manager. They might have a strong opinion about it, have preferences or want to know more. If you start deleting or adjusting data and they have a very good reason to want something else, you’ve pretty much wasted your time and theirs, and you can start all over again. Taking them through these stages also helps them understand where you’re at, and why it might take longer than they expect before you can show your first results.

Get a feel

So when you’re done cleaning up your data, hopefully you have a better feel for it. Maybe it’s a personal preference, but I like to get a good feel for the data so that I better understand what I’m looking at. I’m likely to end up working on data where I don’t know every detail, and a better feel for the data makes it easier to work with. I know better whether I’m on the right track and whether my analysis makes sense, and if it’s more of an exploratory analysis, I know better what else I can check to see if one thing affects another.
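One quick way to start getting that feel is to look at a handful of summary numbers for a column. This is only a sketch with made-up ages, and the function name `summarize` is hypothetical. Which numbers are worth looking at depends on your data:

```python
from collections import Counter
from statistics import mean, median


def summarize(values):
    """Return a few basic summary numbers for a non-empty list of numbers."""
    return {
        "count": len(values),
        "min": min(values),
        "max": max(values),
        "mean": mean(values),
        "median": median(values),
        # (value, how often it occurs) for the most frequent value
        "most_common": Counter(values).most_common(1)[0],
    }


summary = summarize([20, 20, 20, 20, 20, 65])
for name, value in summary.items():
    print(f"{name}: {value}")
```

A large gap between the mean and the median, or a maximum far away from everything else, is a hint to go back and look at that column more closely.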

Start analyzing

While you’re analyzing, again, it’s important to keep up the communication with your manager/client. Make sure you share the first interesting, significant findings as soon as possible, share a graphic (with explanation!!), tell them your next step, and spar with them based on those first findings. It’s a thin line you’re on. They gave you the job so they don’t have to do it themselves, to save time or because it’s not their specialty. But they know everything about the business, and those insights can be very helpful to you if you ask questions and stay in touch. It helps you move in the right direction and focus on the stuff they want to know more about. But make sure you don’t overdo it.

Communicate

In my description above, it sounds like communicating takes a lot. Yes, it is important, but it doesn’t have to be very time-consuming. It can be a short call, a message on Slack or WhatsApp, or a quick e-mail that only takes a few seconds, just to keep them updated. And no, managers don’t need to know about every f*rt you make, but keep them in the loop. Nothing is more frustrating for a manager than not hearing anything at all, and then suddenly finding out you’ve gone down the wrong track, deleted stuff that shouldn’t have been deleted, or got stuck somewhere without them knowing, while the solution was dead easy because they look at it differently than you do. You end up starting over again, which is even more frustrating for them (and yourself!). As mentioned, it is a thin line, but try to find the important junctions where it makes sense to bring them up to date, send them a screenshot or ask them a question. It is gonna save you (and them) a lot of time.

Show your data

In general, the analysis itself is not what takes up most of the time. Once you’re done, make sure you show the data the way they want it. Presentation? Dashboard? Screenshots? I don’t know, but it is important that you do! Come up with (short) clear answers if possible, straight to the point, because they are looking for an answer. Click here to find out more about data visualization.