The Three Stages of Data Analysis: Evaluating Raw Data — Methodspace (2024)

Starting a large-scale research project? Head to SAGE Research Methods' Project Planner for more guidance!

The basics

A friend I haven’t seen in a while asked me what I do for a living, and I talked about SAGE Stats and the work that goes into maintaining and building the collection. Instead of his eyes glazing over (like most people’s would) he asked me, “Ok. Not to seem like an idiot, but what is data analysis? Like what does it cover?” If you’ve had similar thoughts, never fear! I think I can safely say I’ve received multiple variations of this question before. My typical answer: what doesn’t it cover?

Data analysis covers everything from reading the source methodology behind a data collection to creating a data visualization of the statistic you have extracted. All the steps in between include deciphering variable descriptions, performing data quality checks, correcting spelling irregularities, reformatting the file layout to fit your needs, figuring out which statistic is best to describe the data, and figuring out the best formulas and methods to calculate the statistic you want. Phew. Still with me?

These steps and many others fall into three stages of the data analysis process: evaluate, clean, and summarize.

Let’s take some time with Stage 1: Evaluate. We’ll get into Stages 2 and 3 in upcoming posts. Ready? Here we go…

The breakdown: Evaluate

Evaluating a data file is kind of like an episode of House Hunters: you need to explore a data file for structural or other flaws that would be a deal breaker for you. How old is this house? Is the construction structurally sound? Is there a blueprint that I can look at?

Similarly, when evaluating a raw data file you have collected, you should consider the following questions and tips:

  • Read through the data dictionary, codebook, or record layout, which should detail what each field represents. Try not to immediately start playing with the data until you know what you’re looking at. You wouldn’t start renovation in your new house without reading the blueprints, right? You gotta know if that wall is load-bearing!

  • What irregularities does the methodology documentation detail, and how may they have affected the data? What are the methodology notes that I should make transparent to the reader?

  • Is the raw data complete? That is, are there missing values for any records? (Missing values in the raw data can distort your calculations.)

  • What outliers exist in the data set? Do they make sense in the context of the data? For instance, a house price of $1.8 million in a neighborhood where houses don’t exceed $200K is probably a red flag.

  • Spot check the raw data. If the data set provides totals, then sum the values and check that they match. If they don’t, then does the documentation explain why they may not add up to the totals?
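The completeness and outlier questions in the list above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `records` list, the `price` field, and both helper functions are made up for the example, and the outlier rule (a modified z-score based on the median absolute deviation, with the commonly used cutoff of 3.5) is one rule of thumb among several, not a method the article prescribes.

```python
from statistics import median

# Hypothetical records standing in for a raw data file.
# None marks a missing value.
records = [
    {"address": "12 Oak St", "price": 185_000},
    {"address": "14 Oak St", "price": 172_500},
    {"address": "16 Oak St", "price": None},
    {"address": "18 Oak St", "price": 1_800_000},
    {"address": "20 Oak St", "price": 199_000},
    {"address": "22 Oak St", "price": 160_000},
]


def is_missing(value):
    """Treat None and empty strings as missing."""
    return value is None or value == ""


def missing_rows(rows, field):
    """Return the rows where `field` is absent or empty."""
    return [r for r in rows if is_missing(r.get(field))]


def flag_outliers(rows, field, threshold=3.5):
    """Return rows whose modified z-score for `field` exceeds `threshold`.

    The score is 0.6745 * |x - median| / MAD, where MAD is the
    median absolute deviation of the non-missing values.
    """
    present = [r for r in rows if not is_missing(r.get(field))]
    values = [r[field] for r in present]
    if not values:
        return []
    med = median(values)
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        return []  # no spread to measure against
    return [r for r in present
            if 0.6745 * abs(r[field] - med) / mad > threshold]


incomplete = missing_rows(records, "price")
suspicious = flag_outliers(records, "price")
print("Missing price:", [r["address"] for r in incomplete])
print("Possible outliers:", [r["address"] for r in suspicious])
```

A flagged row is a prompt to go back to the documentation, not proof of an error: the $1.8 million house may be a typo, or it may be the one mansion on the block.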

When spot checking, it’s good to check a data point that you are familiar with. For geographic data, for example, checking your home state and other states you know well will help you spot something weird faster than checking a random record.
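As a rough sketch of the spot check described above, the snippet below sums the parts, compares them with a published total, and pulls out the rows you know best. The state counts, the published total, and the function names are all hypothetical, chosen only to show the shape of the check.

```python
# Hypothetical state-level counts and the total published by the source.
state_counts = {"California": 120, "Oregon": 45, "Washington": 60}
published_total = 230


def reconcile_total(parts, reported, tolerance=0):
    """Compare the sum of `parts` with the `reported` total.

    Returns (difference, within_tolerance). A nonzero difference is a
    cue to check whether the documentation explains it, for example
    through rounding or suppressed values.
    """
    difference = sum(parts.values()) - reported
    return difference, abs(difference) <= tolerance


def familiar_rows(parts, names):
    """Return only the entries for places you know well, if present."""
    return {name: parts[name] for name in names if name in parts}


difference, matches = reconcile_total(state_counts, published_total)
print("Sum minus published total:", difference, "| matches:", matches)
print("Rows to eyeball:", familiar_rows(state_counts, ["Oregon", "Nevada"]))
```

The `tolerance` argument is there because some sources round each part before publishing, so small mismatches can be legitimate.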

So if the source is good, then the data must be good too. Right?

Author: Eusebia Nader
