When records wouldn’t tell me the identity of my great-great grandfather, I decided to analyze my DNA matches to find him.
This is the second in a series of blog posts about finding my great-great grandfather, the dad of my great-grandmother, Agnes Florence Thornton Saxton. These posts contain old images, maps, formulas, and charts. Although I’ve made an effort to optimize them for mobile devices, they are much easier to see in desktop mode.
The message on Ancestry is gone, but I’ll never forget how I felt when I read that note and found out my great-great grandfather was not who I thought he was. After taking Ancestry’s autosomal DNA test, I had sent a message to one of the known descendants of the Stevenson family. All my life, I had believed that Matthew Stevenson was my great-great grandfather. But I was looking at DNA evidence that made that impossible. So, I was grasping at straws, or clues, because I didn’t know what was going on. The woman who answered my questions was polite and even friendly. Yet, in her second reply, her words cut through me, even though I knew they had to be true.
“Well, you’re not a (DNA) match, so you can’t be one of us,” she wrote in the summer of 2021.
Nope, I was not a Stevenson. But, who was I? I started searching for an answer. Then, I stopped. The amount of information was overwhelming. My dad’s side of the family tree had more than 30,000 matches in the Ancestry DNA database. I also didn’t know how to analyze DNA matches, and that task seemed impossible. Instead, I researched my mom’s line and forgot about this puzzle until July 2023. That’s when I found a box of family papers that included a story written by my grandaunt, Dorothy Saxton Dusa, which is here. The story has the incorrect details about Matthew Stevenson, but it has the correct details about my great-great grandmother, Sarah “Sadie” Eccles Lewis. Suddenly, finding my biological great-great grandfather seemed possible. I realized that the solution was probably hiding inside my DNA matches.
Analyzing DNA Matches
I had to learn how to read the results of my DNA matches. Ancestry has a lot of information to help get started, which includes a detailed white paper about how its matching system works. It also discusses probability, which is the foundation of DNA research. Another online program helped me narrow my results further.
One of the first things I did was join the Facebook group called DNA Detectives. The group has a wealth of information in its files and many members who know what they are talking about. I learned about the Leeds Method in that group. It was developed by Dana Leeds as a visual approach to organizing and analyzing DNA matches. The Leeds Method calls for using a spreadsheet. But I was only interested in tracing one line, so I adopted a modified approach using the built-in tools on Ancestry DNA. My approach included:
- Assigning a group and color to each of the 16 surnames (last names) of my great-great grandparents with one of them being labeled “Unknown,”
- Finding common ancestors to sort matches into each group,
- Viewing public trees of matches in the unknown group,
- Determining the approximate relationship of those matches based on the amount of DNA that we share,
- Learning how shared matches are related to each other,
- Choosing a potential common ancestor from the public trees based on this research, and
- Building a tree to test my theory.
Looking for High cMs
Because I already had identified the other lines, sorting my matches into groups wasn’t that hard. It wasn’t long before I had several matches in the “Unknown” group that I couldn’t place anywhere else on the tree. In fact, I stopped once I realized that I had included all the matches that had to be closely related based on the amount of DNA I shared with them. I knew this because of the number of centimorgans that I shared with a match.
A centimorgan (cM) is a unit used to measure the length of DNA segments shared between two people. The higher the number of shared cMs, the closer the relationship is likely to be. However, it’s almost impossible to know the exact relationship based on cMs alone. That’s because there is a range of possible relationships for each cM value, and those ranges sometimes overlap. Parent-child relationships are the easiest to establish. Those typically share between 3,400 and 3,500 cMs. After that, the numbers fall off quickly. A first cousin could share as little as 500 cMs. Once you get to fifth cousins or more distant, it’s likely that you will share little to no DNA with them at all. While it might seem impossible to share no DNA with a cousin or ancestor, it does happen. In fact, it’s possible to share no DNA with a third cousin.
As I looked at the unknown matches, however, I realized that I had a problem. The amount of DNA that I shared with the matches in that group was low. They could have been any of several possible relationships, as you can see in great detail on The Shared cM Project tool v4 or more generally in the image above.
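To show why a single cM value can point to several relationships at once, here is a minimal Python sketch of a range lookup. The ranges in the table are illustrative placeholders that I made up for the example, not the published figures from The Shared cM Project, and the function name is hypothetical. Substitute the real ranges before drawing any conclusions.

```python
# Illustrative placeholder ranges in centimorgans (cM). These are NOT the
# published Shared cM Project figures; swap in the real values before use.
PLACEHOLDER_RANGES = {
    "first cousin": (500, 1300),
    "second cousin": (40, 600),
    "second cousin once removed": (15, 350),
    "half first cousin twice removed": (15, 350),
    "third cousin": (0, 230),
}


def candidate_relationships(shared_cm, ranges=PLACEHOLDER_RANGES):
    """Return every relationship whose cM range contains shared_cm."""
    return [
        name
        for name, (low, high) in ranges.items()
        if low <= shared_cm <= high
    ]


# A low value falls inside several overlapping ranges at once.
print(candidate_relationships(168))
```

With these placeholder ranges, a low value lands in more than one bucket, which is the overlap problem described above. The lookup narrows the field, but it cannot pick a single relationship without help from trees and shared matches.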
Going Pro
I must give a shout-out to Ancestry. When the company launched its Pro Tools feature earlier this year, I decided to give it a try. My main reason for adding the feature to my account was not for its DNA benefits, but for its ability to help clean duplicates and other issues within the tree. Nevertheless, the DNA benefits provided the missing piece of information that led me to a conclusion.
One of my first cousins on that side of the family also has tested. Because the Ancestry Pro Tools let me see how my matches were related to my cousin and me, I was able to see the cM values between her and the matches that we share. They included four matches from the unknown group. For those matches, her cM values were much higher than mine. That meant they had to be more closely related than I thought. Then, the moment came where it all fell together.
The shared match in that group who had the highest cM value was also the one who had the most detailed family tree. With a 168 cM match to my first cousin, I realized that he could be a half-first cousin 2x removed or a second cousin 1x removed based on data from The Shared cM Project. But in either case, I was able to figure out that this match’s great-grandparents had to be my great-great-great grandparents. Lucky for me, the match had all great-grandparents on the tree.
Feeling the Chills
It’s hard to describe the feeling that came over me. I had chills. At that moment, I realized that two of the eight people that I was staring at were the great-great-great grandparents that I had fought so hard to find. Victory was close. One of their children was going to be the father of my great-grandmother. Was he on the tree, too?
Soon, I was scouring for more information. I found the answer in my other shared matches. Because I could see how the shared matches in the unknown group matched with my other matches in that group, I was able to figure out how the matches were related. That led me to discover a common name: McCaskey. And sure enough, the match with 168 cM was descended from a McCaskey. My great-great-great grandparents were John W. McCaskey and Sarah Richardson.
John W. and Sarah Richardson had two sons. One of them was William, who was born Dec. 6, 1872. The other was John Gruard McCaskey, who was born July 3, 1874. Because the cM values with these matches fit several possible relationships, it was difficult to tell if the father was more likely John G. or his brother, William. For this, I turned to DNA Painter’s WATO (What Are The Odds) tree. It let me upload my existing trees, add cM values to known matches, and run hypotheses on relationship probabilities. It considered many possibilities. John G. was the highest on the list with an 84% probability. I wrote about my discovery, which you can read here.
Calculating the Probabilities
However, I wanted to see the calculations for myself. So, using Bayes’ Theorem, I set up the problem on my own. You can see it here. I found that John was my great-great grandfather with a 66% likelihood. The numbers are different because I ran fewer hypotheses, but they appear to be equally accurate and reliable.
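For readers who want to see the mechanics of Bayes’ Theorem with competing hypotheses, here is a minimal Python sketch. It is not my actual calculation: the priors and likelihoods below are made-up illustrative numbers, and the function name is hypothetical. Each hypothesis gets a prior probability and a likelihood (how probable the observed cM values would be if that hypothesis were true), and the posterior is the product of the two, rescaled so all hypotheses sum to 1.

```python
def posterior(priors, likelihoods):
    """Apply Bayes' Theorem across a set of competing hypotheses.

    priors and likelihoods are dicts keyed by hypothesis name.
    Returns a dict of posterior probabilities that sum to 1.
    """
    joint = {h: priors[h] * likelihoods[h] for h in priors}
    total = sum(joint.values())
    return {h: value / total for h, value in joint.items()}


# Illustrative inputs only (assumptions, not figures from my research):
# equal priors for the two brothers, and made-up likelihoods of the
# observed cM values under each hypothesis.
priors = {"John G.": 0.5, "William": 0.5}
likelihoods = {"John G.": 0.30, "William": 0.10}

result = posterior(priors, likelihoods)
for name, probability in result.items():
    print(f"{name}: {probability:.0%}")
```

Because the posterior is rescaled over whatever hypotheses are included, the percentage for any one candidate depends on which other hypotheses are on the list.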
It’s unlikely I will get new data for anything closer than what I have. The best match on the McCaskey line is older than 80, and these relationship values degrade through the generations. But I am confident that I have found my lost great-great grandfather, and I’m moving on to the next topic.
View Matt Saxton’s family tree here on Ancestry.