**Abstract**

Dominance structure of animal societies is likely to be more complex than simple linearity, but few, if any, methods exist for quantifying nonlinear structure. Our R package “Perc” presented a new network-based method, called Percolation and Conductance, which permits nonlinear structure in dominance interactions. It uses information from both direct and indirect dominance pathways to calculate consistency in the direction of transitive dominance pathways from A to B (e.g. via pathway through C, D, and/or E). Greater consistency results in higher certainty that A outranks B. It creates a matrix of probabilities that the row individual outranks the column individual. It performs simulated annealing processes to explore possible rank orders and finds the best possible rank order. By applying simple R functions, researchers will be able to quantify rank order and dominance uncertainty with the help of the package.

Rank relationships amongst members of a social group are most commonly represented as an ordered list of individuals that are ranked from highest to lowest. Numerous methods exist to identify this rank order, and most methods fall under one of two approaches: (1) finding the appropriate ranking by reordering the rows and columns of a win/loss matrix or (2) calculating a cardinal dominance index which ranks individuals by the proportion of others dominated, both of which are based upon a data set of agonistic and/or submissive interactions. Such methods produce a linear rank order, whether or not the society is expected to have a linear hierarchy or whether the data fit the assumption of linearity (which is particularly problematic when calculating cardinal ranks (Shev et al 2012)). Most researchers agree that the dominance structure of animal societies is likely to be more complex than simple linearity, but few, if any, methods exist for quantifying nonlinear structure. We present a new network-based method, called Percolation and Conductance, that permits nonlinear structure to emerge via estimates of network-wide directional consistency in the flow of dominance interactions and detection of blocks of dominance ambiguity that are indicative of nonlinear segments of a hierarchy. An additional benefit of this method is the ability to quantify dyadic-level dominance certainty. Dyadic dominance potentials are calculated using multiple indirect dominance pathways in the network (via common third parties) to infer missing data in the win/loss matrix and enhance the calculation of dominance potentials from direct interactions. Greater consistency in the direction of transitive dominance pathways from A to B (e.g. via pathway through C, D, and/or E) result in higher certainty that A outranks B, whereas greater evidence of inconsistent direction (e.g. some directed pathways go from A to B whereas other directed pathways go from B to A) result in dominance ambiguity. Because the number of indirect pathways is typically exponentially larger than direct win/loss interactions, one can be relatively confident that dyads with ambiguous dominance potentials are truly ambiguous due to evidence of directional inconsistency in their network pathways rather than lack of data.

Rank order using the percolation-conductance method is based on a matrix that combines information from direct win/loss interactions with information from indirect pathways as described above to create a matrix of probabilities that the row individual outranks the column individual. Percolation-conductance finds these multi-step, directed pathways between all pairs of nodes by performing a series of random walks through the network. A simulated annealing process explores possible rank orders (from the space of all possible rank orders) – the more simulated annealing runs that are performed, a larger number of possible rank orders are explored, meaning that the best possible rank order is more likely to be found. The starting location of simulated annealing is determined by the total wins per individual in the matrix. Multiple simulated annealing steps are required to arrive at the optimal rank order, keeping in mind that multiple rank orders may be equally good if there are nonlinear segments in the hierarchy. The heat map function (plot.conf.mat) will plot the ranking and can allow for the detection of non-linear dominance structures. A block of individuals with probabilities near 0.5 indicates a subgroup of individuals whose relationships are not clearly defined. A second metric yielded by this method, dominance uncertainty, reflects the directional consistency of information flow through the direct and indirect pathways in a directed and weighted network.

The Percolation-Conductance method of quantifying rank and dominance uncertainty is computation complicated and requires researchers to have a sophisticated computational background to process the data. Our group, in collaboration with collaboraters in UC Davis Statistics Deparment, developed an R package that implements the percolation-conductance algorithm on directed dyadic interactions. By applying simple R functions, researchers will be able to quantify rank order and dominance uncertainty with the help of the package. The goal of this tutorial is to show how to quantify rank and dominance uncertainty from win-loss raw data.

The function `as.conflictmat`

is used to import raw edgelists or matrices for further analysis. Raw edgelists or matrices are records for directed dyadic win-loss interactions. The prefered R data types for `as.conflictmat`

are “`matrix`

” and “`data.frame`

”. Raw edgelists or matrices can be imported as R data type of either `data.frame`

or `matrix`

to be used in `as.conflictmat`

.

The most simple and commonly used raw data is a **edgelist of two columns** consisting all dyadic win-loss interactions, with the first columns being the winner and the second column being the loser. Duplicates are allowed in this two-column edgelist because multiple interactions between a pair are represented by multiple rows of the same winner and loser. Below is an example of a two-column edgelist.

```
library(Perc)
# displaying the first 5 rows of the example data.
head(sampleEdgelist, 5)
```

```
## Iname Rname
## 1 Kale Kibitz
## 2 Kale Kibitz
## 3 Kale Kibitz
## 4 Kale Kibitz
## 5 Kale Kolyma
```

**A weighted edgelist** is also allowed as raw edgelist data. It is a “`matrix`

” or “`data.frame`

” of three columns, with the first two columns being winner IDs and loser IDs respectively and the third column being the frequency of the win-loss interaction between the winner and the loser. An example of a weighted edgelsit is shown below:

```
# displaying the first 5 rows of the example data.
head(sampleWeightedEdgelist, 5)
```

```
## Initiator1 Recipient1 Freq
## 1 Angelina Alexis 1
## 2 Angelina Darcy 1
## 3 Angelina FirstTimeMom 1
## 4 Angelina Mak 1
## 5 Angelina Tabitha 3
```

In addition, **a matrix representing dyadic win-loss interaction** is also accepted as raw data. For example,

```
# displaying the first 5 rows and columns
sampleRawMatrix[1:5, 1:5]
```

```
## Alexis Angelina Aristotle BaldSpotMom C19
## Alexis 0 0 0 0 0
## Angelina 1 0 0 0 0
## Aristotle 0 0 0 0 0
## BaldSpotMom 6 0 0 0 0
## C19 0 0 0 0 0
```

The function `as.conflictmat`

is used to import raw data of any accepted format, transform raw data to a “`conflict matrix`

” for further analysis. “`conflict matrix`

” is very specifically refered to a class in R. If your raw data is an edgelist of either two- or three- column, it will transform the your raw edgelist to a matrix in which counts representing for how many times a row wins over a column. Running this on a win-loss matrix creates an identical conflict matrix. This is required for all data formats to create R object of class conflict matrix.

Below is an example on transforming a two-column edgelist:

```
# convert an two-column edgelist to conflict matrix
confmatrix <- as.conflictmat(sampleEdgelist)
# displaying the first 5 rows and columns of the converted matrix
confmatrix[1:5, 1:5]
```

```
## Kalani Kale Kalleigh Keira Kibitz
## Kalani 0 0 0 0 0
## Kale 0 0 0 0 4
## Kalleigh 0 1 0 0 0
## Keira 0 0 0 0 0
## Kibitz 0 6 0 0 0
```

Below is an example on transforming a raw win-loss matrix:

```
# convert a win-loss matrix to conflict matrix
confmatrix2 <- as.conflictmat(sampleRawMatrix)
# displaying the first 5 rows and columns of the converted matrix
confmatrix2[1]
```

```
## [1] 0
```

When importing weighted edgelist (three-column), you'll need to specify `weighted = TRUE`

if your raw data is a weighted edgelist. Below is an example:

```
confmatrix3 <- as.conflictmat(sampleWeightedEdgelist, weighted = TRUE)
```

**Directionality** is very importnat when using the function `as.conflictmat`

to import raw data. By default, winners (sources) are in the first column and loses (targets) are in the second column for edgelists. Row names are assumed to be the winner for a raw win-loss matrix. Values in a raw win-loss matrix describe that for how many times the corresponding row ID win over the corresponding column ID. If the directionality in your raw data happens to be opposite from the default, you could specify `swap.order = TRUE`

in `as.conflictmat`

. It will tell R that the directionality in your raw data is the opposite of default setting. Running `as.conflictmat`

on your raw data with `swap.order = TRUE`

will automatically correct the directionality when transforming your raw data to a `conflict matrix`

. Here is an example,

```
# displaying the first 5 rows and columns of the raw win-loss matrix
sampleRawMatrix[1:5, 1:5]
```

```
## Alexis Angelina Aristotle BaldSpotMom C19
## Alexis 0 0 0 0 0
## Angelina 1 0 0 0 0
## Aristotle 0 0 0 0 0
## BaldSpotMom 6 0 0 0 0
## C19 0 0 0 0 0
```

```
# use swap.order = TRUE when running as.conflictmat.
confmatrix4 <- as.conflictmat(sampleRawMatrix, swap.order = TRUE)
# displaying the first 5 rows and columns of the converted matrix
confmatrix4[1:5, 1:5]
```

```
## Alexis Angelina Aristotle BaldSpotMom C19
## Alexis 0 1 0 6 0
## Angelina 0 0 0 0 0
## Aristotle 0 0 0 0 0
## BaldSpotMom 0 0 0 0 0
## C19 0 0 0 0 0
```

Before computing the dominance probability and finding the best rank order, you want to know whether or not long pathways are present in your data, and if present, whether or not these long pathways can be useful for gathering additional information about rank relationships.

You could find pathways of **particular length** that **starting** at a particular node (ID). For example,

```
# Testing findIDpaths.R
pathKuai <- findIDpaths(confmatrix, "Kuai", len = 3)
# Displaying the first 5 rows of pathKuai
pathKuai[1:5, ]
```

```
## [,1] [,2] [,3] [,4]
## [1,] "Kuai" "Kalleigh" "Kale" "Kibitz"
## [2,] "Kuai" "Kalleigh" "Kale" "Kolyma"
## [3,] "Kuai" "Kalleigh" "Kimora" "Kibitz"
## [4,] "Kuai" "Kalleigh" "Kimora" "Kioga"
## [5,] "Kuai" "Kalleigh" "Kimora" "Koppy"
```

```
# When there's no pathway starting at a particular individual, you'll get an output like the example below:
pathKalani <- findIDpaths(confmatrix, ID = "Kalani", len = 2)
```

```
## no pathways of length 2 found starting at Kalani
```

You could also use `transitivity`

to find out information on transitivity, a measure of how consistent or inconsistent the flow of information is through the network as a whole. This procedure looks for the number of triangles (e.g. A__B CA) and determines whether or not they are transitive. __

```
conftrans <- transitivity(confmatrix)
conftrans$transitive # number of transitive triangles
```

```
## [1] 14
```

```
conftrans$intransitive # number of intransitive triangles
```

```
## [1] 4
```

```
conftrans$transitivity # transitivity
```

```
## [1] 0.7777778
```

```
conftrans$alpha # alpha
```

```
## [1] 6.468627
```

The function `transitivity`

allow estimation for `alpha`

, which is a value used in the conductance procedure to weight the information from the indirect pathways. The higher the transitivity of a network (the higher the alpha), the greater the weight the conductance procedure will give the indirect pathways.

After transforming your raw data in an R object of `conflict matrix`

class, we will use the function `conductance`

to find all indirect pathways of a particular length (defined by `maxLength`

) and update the conflict matrix. The maximum length of indirect pathways should be decided based on the data. We use a max length of 4 for our dominance rank. This yields the dominance probability matrix which represents the probability the row outranks the column. Values in the matrix are from 0 - 1, with 0.5 being the most uncertain. If a value is less than 0.5, the corresponding row ID is more likely to lose against the corresponding column ID; if a value is greater than 0.5, the corresponding row ID is more likely to win over the corresponding column ID.

```
DominanceProbability <- conductance(confmatrix, maxLength = 2)
```

`DominanceProbability`

is a list of 2 elements. The first element, named `imputed.conf`

is the updated conflict matrix which incorporates information from indirect pathways from the original conflict matrix. Comparing the original conflict matrix with the updated conflict matrix will show the information that is gained through the indirect pathways:

```
# displaying the first 5 rows and columns of the original conflict matrix
confmatrix[1:5, 1:5]
```

```
## Kalani Kale Kalleigh Keira Kibitz
## Kalani 0 0 0 0 0
## Kale 0 0 0 0 4
## Kalleigh 0 1 0 0 0
## Keira 0 0 0 0 0
## Kibitz 0 6 0 0 0
```

```
# displaying the first 5 rows and columns of the imputed conflict matrix
DominanceProbability$imputed.conf[1:5, 1:5]
```

```
## Kalani Kale Kalleigh Keira Kibitz
## Kalani 0.00000000 0.000000 0 0.00000000 0.0000000
## Kale 0.00000000 0.000000 0 0.00000000 4.0557534
## Kalleigh 0.05575338 1.000000 0 0.11150676 0.2230135
## Keira 0.00000000 0.000000 0 0.00000000 0.0000000
## Kibitz 0.00000000 6.055753 0 0.05575338 0.0000000
```

Examining information gained through indirect pathways will provide you information to decide what is the appropriate `maxLength`

for your data. You could examine the information gained through indirect pathways by substracting the original conflict matrix from imputed conflict matrix. The information gained could visually examined by generating a corresponding heatmap using the function `plot.conf.mat`

.

```
# substracting the original conflict matrix from imputed conflict matrix.
informationGain <- DominanceProbability$imputed.conf - confmatrix
# examine the first five rows and columns of the informationGain
informationGain[1:5, 1:5]
```

```
## Kalani Kale Kalleigh Keira Kibitz
## Kalani 0.00000000 0.00000000 0 0.00000000 0.00000000
## Kale 0.00000000 0.00000000 0 0.00000000 0.05575338
## Kalleigh 0.05575338 0.00000000 0 0.11150676 0.22301352
## Keira 0.00000000 0.00000000 0 0.00000000 0.00000000
## Kibitz 0.00000000 0.05575338 0 0.05575338 0.00000000
```

```
# generating a heatmap representing information gained by using informatio from indirect pathways.
plotConfmat(informationGain, ordering = NA, labels = TRUE)
```

As we could see from the heatmap, using `maxLength = 2`

does provide us more information than the original raw conflict matrix that we imported from the raw data. In real analysis, you may want to consider using `maxLength = 3`

or even more and repeat this diagnosis process to examine the information gain. This will allow you to have a better idea on what is the appropriate `maxLength`

for your data.

For example, if you want to examine whether or not `maxLength = 3`

provide more information than `maxLength = 2`

. Here is an example:

```
DominanceProbabilityLength3 <- conductance(confmatrix, maxLength = 3)
informationGain2 <- DominanceProbabilityLength3$imputed.conf - DominanceProbability$imputed.conf
plotConfmat(informationGain2, ordering = NA, labels = TRUE)
```

As we can see in this heatmap that using `maxLength = 3`

still gives you some more information. In real analysis, you may want to repeat the analysis and try `maxLength = 4`

. In addition, you may also want to consider your alpha value to decide how much you want to trust those long pathways.

The second element of `DominanceProbability`

the dominance probability matrix, named `p.hat`

. Here's how you could pull out the dominance probability matrix:

```
# displaying the first 5 rows and columns of the dominance probability matrix
DominanceProbability$p.hat[1:5, 1:5]
```

```
## Kalani Kale Kalleigh Keira Kibitz
## Kalani 0.0000000 0.5000000 0.4236125 0.5000000 0.5000000
## Kale 0.5000000 0.0000000 0.1180829 0.5000000 0.4040371
## Kalleigh 0.5763875 0.8819171 0.0000000 0.6325280 0.7095211
## Keira 0.5000000 0.5000000 0.3674720 0.0000000 0.4236125
## Kibitz 0.5000000 0.5959629 0.2904789 0.5763875 0.0000000
```

When you want to use dominance probability matrix for further analysis, often a long-format representation of the matrix is easier. The function `dyadicLongConverter`

convert win-loss uncertainty to long format.

```
# displaying the first 5 rows of the converted long format win-loss probability.
dyadicLongConverter(DominanceProbability$p.hat)[1:5, ]
```

```
## ID1 ID2 ID1 Win Probability ID2 Win Probability
## 12 Kalani Kale 0.5000000 0.5000000
## 23 Kalani Kalleigh 0.4236125 0.5763875
## 24 Kale Kalleigh 0.1180829 0.8819171
## 34 Kalani Keira 0.5000000 0.5000000
## 35 Kale Keira 0.5000000 0.5000000
## RankingCertainty
## 12 0.5000000
## 23 0.5763875
## 24 0.8819171
## 34 0.5000000
## 35 0.5000000
```

Simulated rank order is computed using a simulated annealing algorithm by running the function `simRankOrder`

on `DominanceProbability$p.hat`

. The argument `num`

is the number of simulated annealing runs to generate best rank order. The argument `kmax`

tells you how long (perhaps iterations) the simulated annealing function looks around for a local optimum to find the best rank order. By default, `kmax = 1000`

was used. But it takes a long time to run when using `kmax = 1000`

. The example below uses kmax = 10 just because it runs quickly and it is enough to illustrate the process, however this value is not recommended when processing real data. Here is how to use `simRankOrder`

to find the simulated rank order:

```
# find simRankOrder
s.rank <- simRankOrder(DominanceProbability$p.hat, num = 10, kmax = 10)
```

The function `s.rank`

returns a list containing simulated rank order and Costs for each simulated annealing run. The best rank order will be the one with the lowest cost.

```
# displaying the first 5 rows of the simulated rank order
s.rank$BestSimulatedRankOrder[1:5, ]
```

```
## ID ranking
## 1 Koppy 1
## 2 Kioga 2
## 3 Kuai 3
## 4 Kalleigh 4
## 5 Kyushu 5
```

```
# displaying cost for each simulated annealing run
s.rank$Costs
```

```
## simAnnealRun Cost
## 1 1 2.321503
## 2 2 2.392065
## 3 3 2.392065
## 4 4 2.392065
## 5 5 2.321503
## 6 6 2.392065
## 7 7 2.392065
## 8 8 2.321503
## 9 9 2.321503
## 10 10 2.392065
```

```
# rank orders generated by each simulated annealing run
s.rank$AllSimulatedRankOrder
```

```
## SimAnneal1 SimAnneal2 SimAnneal3 SimAnneal4 SimAnneal5 SimAnneal6
## 1 Koppy Koppy Koppy Koppy Koppy Koppy
## 2 Kioga Kioga Kioga Kioga Kioga Kioga
## 3 Kuai Kalleigh Kalleigh Kalleigh Kuai Kalleigh
## 4 Kalleigh Kyushu Kyushu Kyushu Kalleigh Kyushu
## 5 Kyushu Kimora Kimora Kimora Kyushu Kimora
## 6 Kimora Kuai Kuai Kuai Kimora Kuai
## 7 Kibitz Kibitz Kibitz Kibitz Kalani Kibitz
## 8 Keira Kolyma Kolyma Keira Kibitz Keira
## 9 Kalani Kale Kalani Kalani Kolyma Kalani
## 10 Kolyma Keira Kale Kolyma Kale Kolyma
## 11 Kale Kalani Keira Kale Keira Kale
## SimAnneal7 SimAnneal8 SimAnneal9 SimAnneal10
## 1 Koppy Koppy Koppy Koppy
## 2 Kioga Kioga Kioga Kioga
## 3 Kalleigh Kuai Kuai Kalleigh
## 4 Kyushu Kalleigh Kalleigh Kyushu
## 5 Kimora Kyushu Kyushu Kimora
## 6 Kuai Kimora Kimora Kuai
## 7 Kalani Kibitz Kibitz Kalani
## 8 Kibitz Kalani Kolyma Kibitz
## 9 Keira Kolyma Kalani Kolyma
## 10 Kolyma Kale Keira Keira
## 11 Kale Keira Kale Kale
```

After finding the simulated rank order, you can apply the rank order to your dominance probability matrix in order to generate a heatmap with its rows and columns ordered by rank. This heatmap will highlight areas of non-linear dominance rank (i.e. areas that are brown in the example below). A block of individuals with probabilities near 0.5 indicates a subgroup of individuals whose relationships are not clearly defined.

```
plotConfmat(DominanceProbability$p.hat, ordering = s.rank[[1]]$ID, labels = TRUE)
```

Besides ranking information, the undirected certainty information in dominance relationship is also important. The values in the dominance probability matrix range from 0-1 indicating both the direction of the relationship and the certainty of the relationship. The function `valueConverter`

converts all values to values between 0.5 - 1.0, which, in essence, removes directionality from the relationship leaving a single metric of dominance certainty (0.5 indicates total uncertainty and 1.0 indicates total certainty). For example,

```
# displaying the first 5 rows and columns of the converted matrix
valueConverter(DominanceProbability$p.hat)[1:5, 1:5]
```

```
## Kalani Kale Kalleigh Keira Kibitz
## Kalani 1.0000000 0.5000000 0.5763875 0.5000000 0.5000000
## Kale 0.5000000 1.0000000 0.8819171 0.5000000 0.5959629
## Kalleigh 0.5763875 0.8819171 1.0000000 0.6325280 0.7095211
## Keira 0.5000000 0.5000000 0.6325280 1.0000000 0.5763875
## Kibitz 0.5000000 0.5959629 0.7095211 0.5763875 1.0000000
```

When you want to understand dyadic level dominance uncertainty, you could get the information from `dyadicLongConverter`

. For example,

```
# displaying the first 5 rows of Ranking Certainty
dyadicLongConverter(DominanceProbability$p.hat)[1:5, c("ID1", "ID2", "RankingCertainty")]
```

```
## ID1 ID2 RankingCertainty
## 12 Kalani Kale 0.5000000
## 23 Kalani Kalleigh 0.5763875
## 24 Kale Kalleigh 0.8819171
## 34 Kalani Keira 0.5000000
## 35 Kale Keira 0.5000000
```

When individual-level win-loss relationship uncertainty is the focus of your question, the function `individualDomProb`

computes the mean and standard deviation for individual-level win-loss probability.

```
individualDomProb(DominanceProbability$p.hat)[1:10,]
```

```
## ID Mean SD
## 1 Kalani 0.6079515 0.1705907
## 2 Kale 0.6403923 0.1628646
## 3 Kalleigh 0.7350389 0.1343705
## 4 Keira 0.6606999 0.1782413
## 5 Kibitz 0.6974806 0.1677272
## 6 Kimora 0.7269768 0.1736934
## 7 Kioga 0.7037738 0.1644045
## 8 Kolyma 0.6072364 0.1559790
## 9 Koppy 0.7157361 0.1433639
## 10 Kuai 0.6873502 0.1703318
```