Counting SNPs in a raw data file with Excel

At the DNA Sig meeting in May 2019, I showed an Excel file containing my MyHeritage Raw Data file with a table of quantities of SNPs per Chromosome.  Some people wanted to know how to do the same for their data.  I am finally getting around to telling how.

To do this requires a few steps.

  • Download your data file from your provider.
  • Uncompress the ZIP file.  Usually the raw file is a CSV file.
  • Import the CSV file into Excel
  • Delete the first few descriptor lines.
  • Put column headings into the file.
  • Save the file as an Excel file
  • Create a NAME Range over the Chromosome column.  In the snapshot below, I created a NAME called ALLDATA which covered all of my data. Do whatever is easier for you.
  • Manually create column F, called CHROM below, which contains just a number for each Chromosome.  You will need 1-22.
  • In column G, called COUNT below,, use the COUNTIF function over the NAME Range (ALLDATA) to count the associated value in column F. Copy that formula down through the 22 rows.
  • Add a SUM at the bottom of column G to verify totals

It is harder to write this how-to post than it is to do the actual tally.  If you have a spreadsheet program with a COUNTIF function, go ahead, do it for your data.  If you have tested at multiple companies, you could compare totals for each!  Enjoy…

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google photo

You are commenting using your Google account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s