Fossil data and fuzzy sets
By Cathy Willermet
The fossil evidence available to answer
questions of human origins is fragmentary, patchily distributed over
large geographic areas, and often difficult to date accurately. There
is disagreement over both how many groups of humans are represented and
how to measure variability within and between these groups. Not surprisingly,
interpretations of this evidence vary among paleoanthropologists.
Living species of plants and animals can be
difficult to define, since individuals of "good" species
are sometimes observed to mate with individuals of other species. Fossil species are
even more difficult to define, since we have only parts of a skeleton and
cannot observe how the animal behaved when it was alive. Also, no two fossil individuals
are exactly alike, just as no two living individuals are exactly alike.
So it is often difficult to classify an individual specimen into a fossil
species.
What paleoanthropologists try to do is set up
a list of features that most of the individuals of a species share (for
instance, a large brow ridge or a sagittal keel). But any given specimen
may exhibit only some of those features. If the specimen has enough of the
features (maybe 70% or so) then it will be placed into that fossil species,
even if it does not show all of the features associated with that species.
We have to classify fossils this way because we often do not have complete
specimens that preserve all of the features in which we are interested.
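The checklist approach above can be sketched in a few lines of Python. This
is only an illustration: the placeholder traits, the function names, and the
0.7 cutoff (standing in for "maybe 70% or so") are assumptions made for the
example, not an actual diagnostic list for any species.

```python
def fraction_shared(observed, species_features):
    """Fraction of the species' feature list that the specimen shows."""
    return len(observed & species_features) / len(species_features)


def matches_species(observed, species_features, threshold=0.7):
    """True if the specimen shows at least `threshold` of the listed features."""
    return fraction_shared(observed, species_features) >= threshold


# Hypothetical feature list for an unnamed species; only the first two
# features come from the text, the rest are placeholders.
species_list = {"large brow ridge", "sagittal keel", "trait C", "trait D"}

# A fragmentary specimen that preserves three of the four listed features.
specimen = {"large brow ridge", "sagittal keel", "trait C"}

print(fraction_shared(specimen, species_list))   # 0.75
print(matches_species(specimen, species_list))   # True
```

With three of four listed features, the specimen clears the cutoff even though
it does not show every feature on the list.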
However, because a) no two individuals are alike,
and b) we have to construct the fossil species based on what we happen to
find, we're never really sure how good our
reconstruction of a fossil species is. Think about it: if aliens dug up the
human species 100,000 years from now, and found Lucy Lawless (aka Xena),
Carrie Fisher, Dan Majerle, and Billy Crystal, they might place them into
two groups (tall and short)!

Also, two species can overlap on some variables.
For example, Neandertals and modern humans overlap in brain size
(although the Neandertal average is bigger than ours). If we use just brain
size, we can't clearly separate these two groups. Lots of features overlap
this way. So there are two things we can do:
· Only look at the features that separate
out the groups. If we do this we must ignore the data that do not separate
the groups. This means that our selective choice of data can influence
the results significantly, because we first assume that there are two groups
to begin with.
· Try to figure out how much the groups overlap. If we do this we
can get a sense of the amount of variation that exists in our fossil populations,
and maybe we can tell if we have two groups, three groups, or one big very
variable group.
Of course the best thing is to do both.
What do you do when a fossil shows lots of features from more than one list?
Unfortunately this happens a lot. If a species is in a transition period
(linear speciation) or very recently split into two species (branching speciation)
then the two fossil species are going to be very similar because they share
a very recent common ancestor. Maybe the two groups were still interbreeding
sometimes. How can you tell them apart with just the bones?
If you set up your research to require an individual
specimen to belong only to one species, then you might have problems with
classification. Perhaps you should design your research to allow specimens
to belong to more than one species. At this point, you should be thinking,
"Wait a minute...I only belong to one species, Homo sapiens.
I'm not partially in another species!" Of course. But think about this...
Suppose you were a forensic anthropologist working
for the police. The police brought you a skull that was found in the desert.
They want to find out who this person was. It is your job to narrow
down their missing-person search. They want you to figure out, first, if
the skull is from a male or a female. Now of course the skull will only
belong to a male OR a female. But as you take measurements on the skull,
you find that some measurements fall into the male range, and some into
the female range. This is normal, as human males and females overlap for
many measurements. Let's say that the skull comes out male in 60% of the
measurements. What do you do? You could a) tell the police that the skull
is from a male, or b) tell the police that it is a male, but likely a small
male. Which conveys more information? B, of course.
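One way to picture option b) is to keep the proportion alongside the label.
A minimal sketch, assuming each measurement has already been scored as falling
in the male or the female range; the ten scores below are invented to match
the 60% in the example, and the function names are hypothetical.

```python
def male_share(calls):
    """Proportion of measurements scored as falling in the male range."""
    return sum(1 for call in calls if call == "male") / len(calls)


def report(calls):
    """Give the more likely sex together with how strongly the skull supports it."""
    share = male_share(calls)
    if share >= 0.5:
        return f"male ({share:.0%} of measurements in the male range)"
    return f"female ({1 - share:.0%} of measurements in the female range)"


# Invented scores: 6 of 10 measurements fall in the male range.
calls = ["male"] * 6 + ["female"] * 4

print(report(calls))  # male (60% of measurements in the male range)
```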
The same logic can be applied to fossil species.
If you have a fossil that shows features of more than one group, you could
say a) it's mostly like species A, therefore it IS species A, or b) it's
mostly like species A, but it has lots of species B features too, so maybe
we should look at these species a bit more carefully and study
the boundary between these two groups more closely.
Putting a specimen into more than one set requires a special kind of math,
called fuzzy logic. NOT fuzzy as in ill-thought-out. Fuzzy as opposed to
crisp.
Perhaps some definitions are in order. Crisp sets
are collections of discrete elements which could be conceptualized as forming
a distinct group. Crisp set membership is to a degree of 1 or 0, meaning
that an element is either in the set (1) or not (0). Set membership is exclusive,
meaning an element is in only one set, unless it is in the intersection
between two sets. Objects in this intersection belong 100% to both sets.
This is how we normally classify things. Thinking about sets this way is
fine for much of the data we encounter. However, interpretations of specimens
that fall in the intersection between two sets can be problematic.

For example, let's look at two sets of vehicles,
cars and trucks. There are many variables that one can use to classify vehicles,
including shape, size of engine, type of shock absorbers, and so on. Two
specimens, a sedan and a pickup truck, are easy to place crisply into their
respective sets. But what does one do when faced with a specimen like an
El Camino? Perhaps this is not too difficult; it shares features with elements
of both sets, so it is placed in the intersection of these sets. However,
it is the interpretation of this that then becomes problematic. While the
sedan is 100% a car and the pickup truck is 100% a truck, in a crisp set
the El Camino is both 100% a car and 100% a truck. This interpretation does
not reflect the blend of features presented by the El Camino. What we know
intuitively, and what we want to communicate, is that the El Camino is partly
a car and partly a truck.
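The crisp version of this example is easy to write down with Python's built-in
sets, which are crisp by construction. The set contents follow the text; the
helper name crisp_membership is an assumption for the sketch.

```python
cars = {"sedan", "El Camino"}
trucks = {"pickup truck", "El Camino"}


def crisp_membership(vehicle, group):
    """Crisp membership is all or nothing: 1 if in the set, otherwise 0."""
    return 1 if vehicle in group else 0


for vehicle in ["sedan", "pickup truck", "El Camino"]:
    print(vehicle, crisp_membership(vehicle, cars), crisp_membership(vehicle, trucks))
# sedan 1 0
# pickup truck 0 1
# El Camino 1 1

print(cars & trucks)  # {'El Camino'}
```

The El Camino's row, 1 and 1, is the "100% a car and 100% a truck" reading;
a crisp set has no way to record that it is only partly each.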

The concept of fuzzy sets originates from the
observation that the world is not really composed of crisp sets; it is not
black and white, but rather is composed of shades of grey. Many variables
overlap more than one group, and we want our classification to reflect this.
Fuzzy set membership of an element can be to a degree anywhere from 0 to
1. Specimens can be a member of many sets to a degree. Most importantly,
objects in an intersection between two sets can belong to both sets to a
degree.
Revisiting our example, we can set up our two
sets as fuzzy sets. Picture a graph in which the vertical axis represents
fuzzy membership in a set (to what degree the specimen belongs in that set)
and the horizontal axis
represents the sets. In this model, the sedan and the pickup truck still
sit comfortably within their respective sets. Now, however, the El Camino
is partially in both sets, and our interpretation can reflect that the El
Camino is partly a car and partly a truck. Depending upon the criteria one
uses to define these sets, the El Camino could be 50% a car, 72% a car,
and so on.
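A fuzzy version might look like the sketch below. The degrees are illustrative
only: the 0.5 for the El Camino as a car echoes the "50% a car" possibility
above, and its truck value is an assumption, since the real numbers depend on
the criteria chosen. The min and max rules for intersection and union are the
standard fuzzy set operators, used here as one common choice rather than as
the method of any particular study.

```python
# Degree of membership (0 to 1) of each vehicle in each fuzzy set.
# These values are illustrative assumptions, not measurements.
membership = {
    "sedan":        {"car": 1.0, "truck": 0.0},
    "pickup truck": {"car": 0.0, "truck": 1.0},
    "El Camino":    {"car": 0.5, "truck": 0.5},
}


def in_both(vehicle):
    """Fuzzy intersection: degree to which a vehicle is a car AND a truck."""
    degrees = membership[vehicle]
    return min(degrees["car"], degrees["truck"])


def in_either(vehicle):
    """Fuzzy union: degree to which a vehicle is a car OR a truck."""
    degrees = membership[vehicle]
    return max(degrees["car"], degrees["truck"])


for vehicle in membership:
    print(vehicle, in_both(vehicle), in_either(vehicle))
# sedan 0.0 1.0
# pickup truck 0.0 1.0
# El Camino 0.5 0.5
```

Unlike the crisp case, the El Camino now sits in the intersection only to a
degree of 0.5, while the sedan and the pickup truck are not in it at all.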
So fuzzy sets can apply to fossils too. Or males
and females. My dissertation research involves trying to figure out how
many groups of humans were living in the world around 100,000 years ago,
when we have Neandertals in Europe and the Near East, some other archaic-looking
humans in North Africa and Europe, and some more modern-looking humans in
the Near East and East/South Africa. One species? Two? Three? I can't wait
to find out...