Supplemetary materials contains all results reported in the printed version of the study. Additional information include detailed descriptive statistics and values of all nodes and edges within both networks. Code can be displayed by click on the optional fields.
To analyze the network we used the packages qgraph (Epskamp et al. 2012), igraph (Csárdi and Nepusz 2006), bootnet (Epskamp and Fried 2018) and NetworkComparisonTest (Van Borkulo et al. 2017). For data preparation we used dplyr (Wickham and Henry 2017), tidyr (Wickham et al. 2017) and labelled (Larmarange 2019). For descriptive statistics, psych (Revelle 2018), lsr (Navarro 2015) and corrgram (Wright 2018) were used. The package DT (Xie, Cheng, and Tan 2019) provided download ready data tables while kableExtra (Zhu 2019) was used to render tables. Additional graphics were created with the packages ggplot2 (Wickham 2009) and cowplot (Wilke 2019).
Missing data
Information about frequency and pattern of missing values are displayed following the recommendations by (McKnight 2007).
rbind(.dataset1, .dataset2) %>%
mutate(sample = factor(c(rep("Dataset 1", nrow(dataset1)), rep("Dataset 2", nrow(dataset2))))) %>%
# calculate sum of missings for all 47 gender unspecific items (exclude id and sample variables)
mutate(missings = apply(.[,grepl("S_[[:digit:]]$|[[:digit:]][[:digit:]]$", names(.))], 1, function(x) {sum(is.na(x)) })) -> .missing
merge(.missing %>% group_by(sample) %>% summarise(N = length(sample), miss = sum(missings), Max = max(missings)),
.missing %>% group_by(sample) %>% filter(missings > 0) %>% summarise(n = length(missings)),
by = "sample") %>%
# average missings...
## ...total sample
mutate(amn = miss/N) %>%
## ...per participant
mutate(amp = miss/n) %>%
## ...per item
mutate(ami = miss/47) -> .descriptive_missing_table
# adding more infos
.descriptive_missing_table %>%
## % of participants with missings in the total sample
mutate(pn = round(n/N*100,2)) %>%
## % of cells with missing
mutate(pc = round(miss/(N*47)*100,2)) %>%
dplyr::select(sample, N, miss, pc, n, Max, pn, everything())-> .descriptive_missing_table
Dataset 1
.missing %>%
filter(sample == "Dataset 1") %>%
count(missings) %>%
mutate(p = round(n/sum(n)*100,2)) %>%
ggplot(., aes(x = factor(missings), y = n)) +
geom_bar(stat = "identity") +
geom_text(aes(label = paste(p, "%", sep = "")), hjust = -0.2) +
scale_y_continuous(limits = c(0, 250)) +
labs(x = "Frequency of missing values",
y = "Number of cases with missing values") +
theme_classic() +
theme(legend.position = "top",
axis.text = element_text(size = 12, color = "black"),
axis.title = element_text(size = 13, color = "black"),
strip.text = element_text(size = 14, color = "black"),
legend.text = element_text(size = 12, color = "black"),
panel.grid.major.x = element_line(color = "lightgrey", linetype = "dotted")) +
coord_flip()
Figure. Barplot for the frequency of missing values. On x-axis, the amount of missing values is displayed and on y-axis the number of cases with such missing values are plotted. Overall 47 missing values were possible for all gender-unspecific items.
Global Structure
In Dataset 1 out of 11938 possible values for all 47 gender unspecific items, 49 cells showed missing values (0.41%). The structure of missing values for each case is shown in the figure below.
.missing %>%
filter(sample == "Dataset 1") %>%
dplyr::select(Patienten_Code, S_01 : S_47) %>%
mutate_at(vars(-Patienten_Code), function(x) {ifelse(is.na(x), 1, 0)}) %>%
gather(., item, val, -Patienten_Code) %>%
ggplot(., aes(x = item, y = Patienten_Code, fill = factor(val))) +
geom_tile() +
scale_fill_manual(values = c("black", "grey90"),
labels = c("not missing", "missing")) +
labs(x = "Item",
y = "Case (not shown)",
fill = "") +
theme_classic() +
theme(axis.text.x = element_text(size = 12, color = "black", angle = 90),
axis.text.y = element_blank())
Figure. Dataset 1 Structure of missing values. Case = every single participants. S_1-S_47 = 47 gender-unspecific items from SOMS-7T.
# data preparation for pattern detection
.missing %>%
filter(sample == "Dataset 1") %>%
dplyr::select(S_01 : S_47) %>%
mutate_all(., function(x) { ifelse(is.na(x), 1, 0) }) %>%
data.frame(.) * 2^(mapply(rep, 1:47, nrow(.missing %>% filter( sample == "Dataset 1")))) %>%
as.data.frame() -> .missing_ds1
.missing_ds1 %>%
mutate(pattern = apply(., 1, sum)) %>%
dplyr::select(pattern) -> .pattern_ds1
Pattern
Following (McKnight 2007), we examined how “messy” data is. Missing values may not be random, if the pattern shows a structure, e.g. all participants show missing values on specific items. If there are many multiple different pattern of missing data, this could be an indication for “messy data”. If there are only very few or only one pattern, missing data is seen as “clean” Ratio of missing data pattern was used as an indicator for the messines. A coefficient of 1 indicates that every participant has an own pattern. High messininess indicates that the pattern of missing data is more likely to be completly at random (MCAR), while a clean structre is more likely if the data is not missing at random (MNAR). If there are no missing values, this coefficient can not be calculated. Excluding the patterns without missings, there were 29 different patterns of missing data in Dataset 1. The ratio of missing data patterns was 0.85. This indicates, that most of the participants with missing data had their own pattern. We conclude, that the structure of missing data can be viewed as MAR.
ggplot(.pattern_ds1 %>% filter(pattern > 0), aes(x = factor(pattern))) +
geom_bar() +
labs(x = "Pattern-ID (not shown)",
y = "Number of occurence") +
theme_classic() +
theme(axis.text = element_text(color = "black", size = 12),
axis.title = element_text(color = "black", size = 13),
axis.text.x = element_blank())
Figure. Occurence of all missing value patterns in Dataset 1. The IDs of the patterns has no further meaning. The plot includes all patterns with missing values.
Item
.missing %>%
filter(sample == "Dataset 1") %>%
dplyr::select(S_01 : S_47) %>%
mutate_all(., function(x) { ifelse(is.na(x), 1, 0) }) %>%
gather(., item, miss) %>%
group_by(item) %>%
summarise(n = sum(miss)) %>%
mutate(p = n / nrow(.dataset2)*100) -> .missing_ds1_itemdata
The mean frequency of missing values in Dataset 1 was 0.18% (SD = 0.23). 8 Items showed a higher frequency of missing values, 0 showed a lower frequency. The item with the highest amound of missing values on Dataset 1 was S_08 “Pain during sexual intercouse”).
The figure below shows the pattern of missing values for each item (left) and per item separated for gender (right). Overall the frequency of missing values is slighly higher for women. Again, S_08 was the item with the highest difference in missing values for both genders.
ggplot(.missing_ds1_itemdata, aes(x = item, y = p)) +
geom_point(size = 2) +
geom_rect(aes(ymin = 0.18 - 0.18,
ymax = 0.18 + 0.23,
xmin = 0, xmax = 48),
alpha = .5, fill = "lightgrey") +
geom_hline(aes(yintercept = mean(p)), color = "black") +
geom_point(size = 2) +
geom_segment(aes(x = item, xend = item,
y = mean(p), yend = p)) +
labs(x = "SOMS7-Item (only gender unspecific)",
y = "% of missing values",
title = "Total percentage of missing values per item") +
scale_y_continuous(limits = c(0, 5), breaks = seq(0,5,1)) +
theme_classic() +
theme(axis.text = element_text(color = "black", size = 12),
axis.title = element_text(color = "black", size = 13)) +
coord_flip() -> .missing_plot1_ds1
.missing %>%
filter(sample == "Dataset 1") %>%
dplyr::select(sex, S_01 : S_47) %>%
mutate_at(vars(-sex), function(x) { ifelse(is.na(x), 1, 0) }) %>%
# remove missings in gender variable
na.omit(.) %>%
gather(., item, miss, -sex) %>%
group_by(item, sex) %>%
summarise(n = sum(miss)) %>%
mutate( p = ifelse(sex == "female", n/nrow(.dataset1 %>% filter(sex == "female"))*100,
n/nrow(.dataset1 %>% filter(sex == "male"))*100)) %>%
ggplot(., aes(x = item, y = p, color = sex)) +
geom_point() +
geom_rect(aes(ymin = 0.18 - 0.18,
ymax = 0.18 + 0.23,
xmin = 0, xmax = 48),
alpha = .5, fill = "lightgrey", color = "lightgrey") +
geom_hline(aes(yintercept = mean(.missing_ds1_itemdata$p)), color = "black") +
geom_point(size = 2) +
geom_line(aes(group = sex)) +
labs(x = "SOMS7-Item (only gender unspecific)",
y = "% of missing values",
title = "Percentage of missing values for both genders per item") +
scale_y_continuous(limits = c(0, 5), breaks = seq(0,5,1)) +
theme_classic() +
theme(axis.text = element_text(color = "black", size = 12),
axis.title = element_text(color = "black", size = 13),
axis.text.y = element_blank(),
axis.title.y = element_blank(),
axis.ticks.y = element_blank(),
legend.position = "right")+
coord_flip() -> .missing_plot2_ds1
plot_grid(.missing_plot1_ds1, .missing_plot2_ds1)
Figure. Frequency of missing values for each item (total) and seperated for both genders. Vertical line marks the mean frequency. Grey area consists values between -1 und +1 SD around the mean.
Dataset 2
.missing %>%
filter(sample == "Dataset 2") %>%
count(missings) %>%
mutate(p = round(n/sum(n)*100,2)) %>%
ggplot(., aes(x = factor(missings), y = n)) +
geom_bar(stat = "identity") +
geom_text(aes(label = paste(p, "%", sep = "")), hjust = -0.2) +
scale_y_continuous(limits = c(0, 425)) +
labs(x = "Frequency of missing values",
y = "Number of cases with missing values") +
theme_classic() +
theme(legend.position = "top",
axis.text = element_text(size = 12, color = "black"),
axis.title = element_text(size = 13, color = "black"),
strip.text = element_text(size = 14, color = "black"),
legend.text = element_text(size = 12, color = "black"),
panel.grid.major.x = element_line(color = "lightgrey", linetype = "dotted")) +
coord_flip()
Figure. Barplot for the frequency of missing values. On x-axis, the amount of missing values is displayed and on y-axis the number of cases with such missing values are plotted. Overall 47 missing values were possible for all gender-unspecific items.
Global Structure
In Dataset 2 out of a total of 26978 possible values for all 47 gender unspecific items, 1375 cells showed missing values (5.1%). The structure of missing values for each case is shown in the figure below.
.missing %>%
filter(sample == "Dataset 2") %>%
dplyr::select(Patienten_Code, S_01 : S_47) %>%
mutate_at(vars(-Patienten_Code), function(x) {ifelse(is.na(x), 1, 0)}) %>%
gather(., item, val, -Patienten_Code) %>%
ggplot(., aes(x = item, y = Patienten_Code, fill = factor(val))) +
geom_tile() +
scale_fill_manual(values = c("black", "grey90"),
labels = c("not missing", "missing")) +
labs(x = "Item",
y = "Case (not shown)",
fill = "") +
theme_classic() +
theme(axis.text.x = element_text(size = 12, color = "black", angle = 90),
axis.text.y = element_blank())
Figure. Dataset 2 Structure of missing values. Case = every single participants. S_1-S_47 = 47 gender-unspecific items from SOMS-7T.
# data preparation for detecting patterns
.missing %>%
filter(sample == "Dataset 2") %>%
dplyr::select(S_01 : S_47) %>%
mutate_all(., function(x) { ifelse(is.na(x), 1, 0) }) %>%
data.frame(.) * 2^(mapply(rep, 1:47, nrow(.missing %>% filter( sample == "Dataset 2")))) %>%
as.data.frame() -> .missing_ds2
.missing_ds2 %>%
mutate(pattern = apply(., 1, sum)) %>%
dplyr::select(pattern) -> .pattern_ds2
Pattern
Excluding the patterns without missings, there were 132 different patterns of missing data in Dataset 2. The ratio of missing data patterns was 0.73. This indicates, that most of the individuals with missing data had their own pattern. We conclude, that the structure of missing data can be viewed as MAR.
ggplot(.pattern_ds2 %>% filter(pattern > 0), aes(x = factor(pattern))) +
geom_bar() +
labs(x = "Pattern-ID (not shown)",
y = "Number of occurence") +
theme_classic() +
theme(axis.text = element_text(color = "black", size = 12),
axis.title = element_text(color = "black", size = 13),
axis.text.x = element_blank())
Figure. Occurence of all missing value patterns in Dataset 2. The IDs of the patterns has no further meaning. The plot includes all patterns with missing values.
Item
.missing %>%
filter(sample == "Dataset 2") %>%
dplyr::select(S_01 : S_47) %>%
mutate_all(., function(x) { ifelse(is.na(x), 1, 0) }) %>%
gather(., item, miss) %>%
group_by(item) %>%
summarise(n = sum(miss)) %>%
mutate(p = n / nrow(.dataset2)*100) -> .missing_ds2_itemdata
The mean frequency of missing values in Dataset 2 was 5.1% (SD = 1.31). 5 items showed a higher frequency of missing values, 5 showed a lower frequency. The item with the highest amound of missing values in Dataset 2 was S_08.
The figure below shows the pattern of missing values for each item (left) and for each item separated for gender (right). Overall the frequency of missing values is slighly lower for males. Again, S_08 was the item with the highest difference in missing values of both genders..
ggplot(.missing_ds2_itemdata, aes(x = item, y = p)) +
geom_point(size = 2) +
geom_rect(aes(ymin = 5.10 - 1.31,
ymax = 5.10 + 1.31,
xmin = 0, xmax = 48),
alpha = .5, fill = "lightgrey") +
geom_hline(aes(yintercept = mean(p)), color = "black") +
geom_point(size = 2) +
geom_segment(aes(x = item, xend = item,
y = mean(p), yend = p)) +
labs(x = "SOMS7-Item (only gender unspecific)",
y = "% of missing values",
title = "Total percentage of missing values per item") +
scale_y_continuous(limits = c(0, 15), breaks = seq(0,15,1)) +
theme_classic() +
theme(axis.text = element_text(color = "black", size = 12),
axis.title = element_text(color = "black", size = 13)) +
coord_flip() -> .missing_plot1_ds2
.missing %>%
filter(sample == "Dataset 2") %>%
dplyr::select(sex, S_01 : S_47) %>%
mutate_at(vars(-sex), function(x) { ifelse(is.na(x), 1, 0) }) %>%
# remove missings in gender
na.omit(.) %>%
gather(., item, miss, -sex) %>%
group_by(item, sex) %>%
summarise(n = sum(miss)) %>%
mutate( p = ifelse(sex == "female", n/nrow(.dataset2 %>% filter(sex == "female"))*100,
n/nrow(.dataset2 %>% filter(sex == "male"))*100)) %>%
ggplot(., aes(x = item, y = p, color = sex)) +
geom_point() +
geom_rect(aes(ymin = 5.10 - 1.31,
ymax = 5.10 + 1.31,
xmin = 0, xmax = 48),
alpha = .5, fill = "lightgrey", color = "lightgrey") +
geom_hline(aes(yintercept = mean(.missing_ds2_itemdata$p)), color = "black") +
geom_point(size = 2) +
geom_line(aes(group = sex)) +
labs(x = "SOMS7-Item (only gender unspecific)",
y = "% of missing values",
title = "Percentage of missing values for both genders per item") +
scale_y_continuous(limits = c(0, 15), breaks = seq(0,15,1)) +
theme_classic() +
theme(axis.text = element_text(color = "black", size = 12),
axis.title = element_text(color = "black", size = 13),
axis.text.y = element_blank(),
axis.title.y = element_blank(),
axis.ticks.y = element_blank(),
legend.position = "right")+
coord_flip() -> .missing_plot2_ds2
plot_grid(.missing_plot1_ds2, .missing_plot2_ds2)
Figure. Frequency of missing values for each item (total) and seperated for both genders. Vertical line marks the mean frequency. Grey area consists values between -1 und +1 SD around the mean.
Descriptive statistics
Age & Gender
# table for age
rbind(
.dataset1 %>%
dplyr::select(Alter) %>%
na.omit() %>%
summarise(Missing = sum(is.na(Alter)),
M = mean(Alter, na.rm = T),
SD = sd(Alter, na.rm = T),
min = min(Alter, na.rm = T),
max = max(Alter, na.rm = T)),
.dataset2 %>%
dplyr::select(Alter) %>%
na.omit() %>%
summarise(Missing = sum(is.na(Alter)),
M = mean(Alter, na.rm = T),
SD = sd(Alter, na.rm = T),
min = min(Alter, na.rm = T),
max = max(Alter, na.rm = T))
) %>%
mutate(sample = c("Dataset 1", "Dataset 2")) %>%
dplyr::select(sample, everything()) %>%
kable(., "html",
digits = 2,
caption = "Table: Number of missings, mean, standard deviation, minimum and maximum for age") %>%
kable_styling("striped", full_width = F)
Table: Number of missings, mean, standard deviation, minimum and maximum for age
|
sample
|
Missing
|
M
|
SD
|
min
|
max
|
|
Dataset 1
|
0
|
48.03
|
12.86
|
22
|
75
|
|
Dataset 2
|
0
|
43.30
|
11.74
|
16
|
73
|
# table for gender
merge(
.dataset1 %>%
dplyr::select(sex) %>%
mutate(sex = forcats::fct_explicit_na(sex, na_level = "missing")) %>%
count(sex) %>%
mutate(p = n/nrow(dataset1)*100),
.dataset2 %>%
dplyr::select(sex) %>%
mutate(sex = forcats::fct_explicit_na(sex, na_level = "missing")) %>%
count(sex) %>%
mutate(p = n/nrow(dataset2)*100),
by = "sex", all.y = T
) %>%
kable(., "html",
digits = 2,
caption = "Table: Frequencies for gender",
col.names = c("gender","n", "%", "n", "%")) %>%
kable_styling(full_width = F) %>%
add_header_above(., c(" " = 1, "Dataset 1" = 2, "Dataset 2" = 2))
Table: Frequencies for gender
|
|
Dataset 1
|
Dataset 2
|
|
gender
|
n
|
%
|
n
|
%
|
|
female
|
163
|
64.17
|
371
|
64.63
|
|
male
|
91
|
35.83
|
202
|
35.19
|
|
missing
|
NA
|
NA
|
1
|
0.17
|
Itemstatistics
The following table shows the frequency of missing values, mean and standard deviation as well as skewness and kurtosis for each of the 47 gender-unspecific SOMS-7T items.
rbind(dataset1, dataset2) %>%
mutate(sample = c(rep("Dataset 1", nrow(dataset1)), rep("Dataset 2", nrow(dataset2)))) %>%
gather(., item, value, - sample) %>%
mutate(item = as.numeric(gsub("^S_", "", item))) %>%
group_by(item, sample) %>%
summarise(n = length(na.omit(value)),
miss = sum(is.na(value)),
M = mean(value, na.rm = T),
SD = sd(value, na.rm = T),
Min = min(value, na.rm = T),
Max = max(value, na.rm = T),
Skew = skew(value, na.rm = T),
Kurtosis = kurtosi(value, na.rm = T)) %>%
# kable(., "html",
# digits = 2, row.names = F,
# caption = "Table: Descriptive item statistics for both samples",
# col.names = c("Item-no.", "sample", "n", "missing", "M", "SD", "Min", "Max", "Skew", "Kurtosis")) %>%
# kable_styling("striped")
DT::datatable(., rownames = F,
filter= "top",
caption = "Table: Descriptive item statistics for both samples",
colnames = c("Item-no.", "sample", "n", "missing", "M", "SD", "Min", "Max", "Skew", "Kurtosis"),
extensions = "Buttons",
options = list(dom = "Bfrtip", pageLength = 20,
buttons = c("csv", "pdf"))) %>%
formatRound(c("M", "SD", "Min", "Max", "Skew", "Kurtosis"), 3)
Distributions
rbind(dataset1, dataset2) %>%
mutate(sample = c(rep("Dataset 1", nrow(dataset1)), rep("Dataset 2", nrow(dataset2)))) %>%
gather(., item, value, - sample) %>%
mutate(item = gsub("^S_", "S", item)) %>%
mutate(item = factor(item, c("S1","S2","S3","S4","S5","S6","S7","S8","S9","S10","S11","S12","S13","S14",
"S15","S16","S17","S18","S19","S20","S21","S22","S23","S24","S25","S26","S27",
"S28","S29","S30","S31","S32","S33","S34","S35","S36","S37","S38","S39","S40",
"S41","S42","S43","S44","S45","S46","S47"))) %>%
group_by(sample, item, value) %>%
summarise(n = length(value)) %>%
mutate(p = ifelse(sample == "Dataset 1", n/nrow(dataset1), n/nrow(dataset2))) %>%
na.omit() %>%
ggplot(., aes(x = factor(value), y = p, fill = sample)) +
facet_wrap(~ item, ncol = 4) +
geom_bar(stat = "identity", position = position_dodge()) +
scale_y_continuous(limits = c(0, 1), breaks = seq(0, 1, .25), labels = c("0%", "25%", "50%", "75%", "100%")) +
labs(y = "Total count of cases",
x = "Symptom severity",
fill = "Sample") +
theme_classic() +
theme(legend.position = "top",
axis.text = element_text(size = 12, color = "black"),
axis.title = element_text(size = 13, color = "black"),
strip.text = element_text(size = 14, color = "black"),
legend.text = element_text(size = 12, color = "black"),
legend.title = element_text(size = 13, color = "black"))
Warning: Factor `item` contains implicit NA, consider using
`forcats::fct_explicit_na`
Warning: Factor `item` contains implicit NA, consider using
`forcats::fct_explicit_na`

Sample comparison
rbind(dataset1, dataset2) %>%
mutate(MW = apply(., 1, mean, na.rm = T)) %>%
mutate(sample = factor(c(rep("Dataset 1", nrow(dataset1)), rep("Dataset 2", nrow(dataset2))))) %>%
dplyr::select(MW, sample) %>%
group_by(sample) %>%
summarise(M = mean(MW, na.rm = T),
SD = sd(MW, na.rm = T),
Min = min(MW, na.rm = T),
Max = max(MW, na.rm = T),
Schiefe = skew(MW, na.rm = T),
Kurtosis = kurtosi(MW, na.rm = T)) %>%
kable(., "html",
digits = 2,
caption = "Table: Key statistics for the global severity of symtoms for all 47 gender-unspecific SOMS-7 items for both samples") %>%
kable_styling("striped", full_width = F)
Table: Key statistics for the global severity of symtoms for all 47 gender-unspecific SOMS-7 items for both samples
|
sample
|
M
|
SD
|
Min
|
Max
|
Schiefe
|
Kurtosis
|
|
Dataset 1
|
0.83
|
0.45
|
0.15
|
3.02
|
1.25
|
2.19
|
|
Dataset 2
|
0.98
|
0.65
|
0.02
|
4.00
|
1.23
|
1.97
|
The mean value of severity of symptom was compared using the Welch-Test.
rbind(dataset1, dataset2) %>%
mutate(MW = apply(., 1, mean, na.rm = T)) %>%
mutate(sample = factor(c(rep("Dataset 1", nrow(dataset1)), rep("Dataset 2", nrow(dataset2))))) %>% dplyr::select(MW, sample) -> .comparison
t.test(MW ~ sample, data = .comparison) #t(678.41) = -3.91, p < .001
Welch Two Sample t-test
data: MW by sample
t = -3.9088, df = 678.41, p-value = 0.0001021
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
-0.23044444 -0.07634035
sample estimates:
mean in group Dataset 1 mean in group Dataset 2
0.8279417 0.9813341
psych::cohen.d(.comparison, "sample") # d = .27 [0.12, 0.41]
Call: psych::cohen.d(x = .comparison, group = "sample")
Cohen d statistic of difference between two means
lower effect upper
MW 0.11 0.26 0.41
Multivariate (Mahalanobis) distance between groups
[1] 0.26
r equivalent of difference between two means
MW
0.12
The standardized mean difference (Cohens d) was
lsr::cohensD(MW ~ sample, data = .comparison) #0.26
[1] 0.2570848
Correlations
# polychoric correlations
Dataset1_poly <- cor_auto(dataset1[,1:47])
Variables detected as ordinal: S_01; S_02; S_03; S_04; S_05; S_06; S_07; S_08; S_09; S_10; S_11; S_12; S_13; S_14; S_15; S_16; S_17; S_18; S_19; S_20; S_21; S_22; S_23; S_24; S_25; S_26; S_27; S_28; S_29; S_30; S_31; S_32; S_33; S_34; S_35; S_36; S_37; S_38; S_39; S_40; S_41; S_42; S_43; S_44; S_45; S_46; S_47
Dataset2_poly <- cor_auto(dataset2[,1:47])
Variables detected as ordinal: S_01; S_02; S_03; S_04; S_05; S_06; S_07; S_08; S_09; S_10; S_11; S_12; S_13; S_14; S_15; S_16; S_17; S_18; S_19; S_20; S_21; S_22; S_23; S_24; S_25; S_26; S_27; S_28; S_29; S_30; S_31; S_32; S_33; S_34; S_35; S_36; S_37; S_38; S_39; S_40; S_41; S_42; S_43; S_44; S_45; S_46; S_47
# Spearman correlation
dataset1_cor <- cor(dataset1[,1:47], use = "pairwise", method = "spearman")
dataset2_cor <- cor(dataset2[,1:47], use = "pairwise", method = "spearman")
# backup correlation data frame, because we modify them later
.dataset1_cor <- cor(dataset1[,1:47], use = "pairwise", method = "spearman")
.dataset2_cor <- cor(dataset2[,1:47], use = "pairwise", method = "spearman")
rbind(merge(rbind(dataset1_cor %>%
.[lower.tri(.)] %>%
abs() %>%
cut(., breaks = c(0, .10, .30, .50, 1), include.lowest = T) %>%
table(),
dataset1_cor %>%
.[lower.tri(.)] %>%
abs() %>%
cut(., breaks = c(0, .10, .30, .50, 1), include.lowest = T) %>%
table() %>%
prop.table(.)*100) %>%
as.data.frame() %>%
mutate(statistic = c("Frequency (n)", "Percentage (%)")),
rbind(Dataset1_poly %>%
.[upper.tri(.)] %>%
abs() %>%
cut(., breaks = c(0, .10, .30, .50, 1), include.lowest = T) %>%
table(),
Dataset1_poly %>%
.[upper.tri(.)] %>%
abs() %>%
cut(., breaks = c(0, .10, .30, .50, 1), include.lowest = T) %>%
table() %>%
prop.table(.)*100) %>%
as.data.frame() %>%
mutate(statistic = c("Frequency (n)", "Percentage (%)")),
by = "statistic"),
merge(rbind(dataset2_cor %>%
.[lower.tri(.)] %>%
abs() %>%
cut(., breaks = c(0, .10, .30, .50, 1), include.lowest = T) %>%
table(),
dataset2_cor %>%
.[lower.tri(.)] %>%
abs() %>%
cut(., breaks = c(0, .10, .30, .50, 1), include.lowest = T) %>%
table() %>%
prop.table(.)*100) %>%
as.data.frame() %>%
mutate(statistic = c("Frequency (n)", "Percentage (%)")),
rbind(Dataset2_poly %>%
.[upper.tri(.)] %>%
abs() %>%
cut(., breaks = c(0, .10, .30, .50, 1), include.lowest = T) %>%
table(),
Dataset2_poly %>%
.[upper.tri(.)] %>%
abs() %>%
cut(., breaks = c(0, .10, .30, .50, 1), include.lowest = T) %>%
table() %>%
prop.table(.)*100) %>%
as.data.frame() %>%
mutate(statistic = c("Frequency (n)", "Percentage (%)")),
by = "statistic")) %>%
kable(., "html",
digits = 2,
caption = "Table: Sum of correlation coefficients in both Datasets",
col.names = c("statistic",
"[< .1]", "[.1, .3]", "[.3, .5]", "[> .5]",
"[< .1]", "[.1, .3]", "[.3, .5]", "[> .5]")) %>%
kable_styling("striped", full_width = F) %>%
add_header_above(., c(" " = 1, "Spearman correlation" = 4, "Polychoric correlations (NPD)" = 4)) %>%
pack_rows("Dataset 1", 1, 2) %>%
pack_rows("Dataset 2", 3, 4)
Table: Sum of correlation coefficients in both Datasets
|
|
Spearman correlation
|
Polychoric correlations (NPD)
|
|
statistic
|
[< .1]
|
[.1, .3]
|
[.3, .5]
|
[> .5]
|
[< .1]
|
[.1, .3]
|
[.3, .5]
|
[> .5]
|
|
Dataset 1
|
|
Frequency (n)
|
309.00
|
688.00
|
76.00
|
8.00
|
152.00
|
577.00
|
301.00
|
51.00
|
|
Percentage (%)
|
28.58
|
63.64
|
7.03
|
0.74
|
14.06
|
53.38
|
27.84
|
4.72
|
|
Dataset 2
|
|
Frequency (n)
|
60.00
|
799.00
|
207.00
|
15.00
|
11.00
|
460.00
|
552.00
|
58.00
|
|
Percentage (%)
|
5.55
|
73.91
|
19.15
|
1.39
|
1.02
|
42.55
|
51.06
|
5.37
|
Eigenvalues
# calculate eigenvalues
.Dataset1_cor_eigen <- dataset1_cor %>% eigen(.) %>% .$values
.Dataset1_poly_eigen <- Dataset1_poly %>% eigen(.) %>% .$values
.Dataset2_cor_eigen <- dataset2_cor %>% eigen(.) %>% .$values
.Dataset2_poly_eigen <- Dataset2_poly %>% eigen(.) %>% .$values
# plotting eigenvalues
cbind(.Dataset1_cor_eigen, .Dataset1_poly_eigen, .Dataset2_cor_eigen, .Dataset2_poly_eigen) %>%
as.data.frame(.) %>%
mutate(item = as.numeric(rownames(.))) %>%
gather(., eigen, value, -item) %>%
mutate(sample = ifelse(grepl("Dataset2", eigen), "dataset2", "dataset1"),
type = ifelse(grepl("poly", eigen), "Polychoric (NPD)", "Spearman")) %>%
ggplot(., aes(y = value, x = item, color = type)) +
facet_wrap(~ sample, scales = "free") +
geom_point() +
geom_line() +
geom_hline(aes(yintercept = 0), linetype = "dashed", color = "grey50") +
scale_color_manual(values = c("orange", "blue")) +
labs(x = "SOMS7-Itemnumber",
y = "Eigenvalue",
color = "Correlation") +
theme_classic() +
theme(legend.position = "bottom",
axis.text = element_text(size = 12, color = "black"),
axis.title = element_text(size = 13, color = "black"),
strip.text = element_text(size = 15, color = "black"))
Figure. Distribution of the Eigenvalues for Spearman (blue) and polychoric (orange) correlations. Distributions are shown for both Datasets (left Dataset 1, right Dataset 2). The polochoric correlation matrix was not positive defined, so the nearest positive defined matrix (NPD) was used.
Comparison of correlation coeffcients
The correlation between the Spearman and the polychoric correlation matrices was very low in both datasets (both r < .10). The average for Spearman and polychoric correlation, was higher in Dataset 2. While in Datset 1 the mean correlation was low, in Dataset 2 the correlation coefficient based on polychoric correlation was medium sized.
data.frame(sample = c("Dataset 1", "Dataset 2"),
'mean r' = c(mean(dataset1_cor[lower.tri(dataset1_cor)]), mean(dataset2_cor[lower.tri(dataset2_cor)])),
'mean rp' = c(mean(Dataset1_poly[lower.tri(Dataset1_poly)]), mean(Dataset2_poly[lower.tri(Dataset2_poly)])),
r = c(cor(dataset1_cor %>% .[upper.tri(.)], Dataset1_poly[lower.tri(Dataset1_poly)]),
cor(dataset2_cor %>% .[upper.tri(.)], Dataset2_poly[lower.tri(Dataset2_poly)]))) %>%
kable(., "html",
digit = 3,
caption = "Table: Mean Spearman and polychoric correlation and correlation between both correlation matrices",
col.names = c("Sample", "Spearman", "Polychoric", "r")) %>%
kable_styling("striped", full_width = F) %>%
add_header_above(., c(" " = 1, "Mean correlation" = 2, " " = 1)) %>%
footnote(general = "r = correlation between Spearman and polychoric correlation matrices.",
footnote_as_chunk = T, general_title = "Note.")
Table: Mean Spearman and polychoric correlation and correlation between both correlation matrices
|
|
Mean correlation
|
|
|
Sample
|
Spearman
|
Polychoric
|
r
|
|
Dataset 1
|
0.158
|
0.247
|
-0.043
|
|
Dataset 2
|
0.229
|
0.321
|
-0.074
|
|
Note. r = correlation between Spearman and polychoric correlation matrices.
|
For both datasets, a heatmap is shown in the figures below. Results indicate, that correlations in Dataset 1 are lower and more negative compared to Dataset 2.
# overwrite upper diagonal with polychoric correlations
# Note: don't use this correlation data frame later on, because the two diagonals differ. Use backup instead!
dataset1_cor[upper.tri(dataset1_cor)] = Dataset1_poly[upper.tri(Dataset1_poly)]
# plot correlations
dataset1_cor %>%
psych::corPlot(., xlas = 3,
main = "Dataset 1:\nSpearman correlation (below diagonal) and\npolychoric correlation (NPD-matrix, above diagonal)")
Figure. Heatmap for Spearman (below diagonal) and polychoric correlation (upper diagonal) matrices for Dataset 1.
# overwrite upper diagonal with polychoric correlations
# Note: don't use this correlation data frame later on, because the two diagonals differ. Use backup instead!
dataset2_cor[upper.tri(dataset2_cor)] = Dataset2_poly[upper.tri(Dataset2_poly)]
# plot correlations
dataset2_cor %>%
psych::corPlot(., xlas = 3,
main ="Dataset 2:\nSpearman correlation (below diagonal) and\npolychoric correlation (NPD-matrix, above diagonal)")
Figure. Heatmap for Spearman (below diagonal) and polychoric correlation (upper diagonal) matrices for Dataset 2.
Cross sample network
We aggregated both data sets into one dataset and determined a cross sample network. Again, stability, the number of clusters and the most important nodes and edges were determined. The estimation of this cross-sample network is based on the assumption that both samples are comparable. In addition, such a network gives an idea of the average relationship between symptoms across all 828 patients. One strength is, that there were significantly more people to estimate the network parameters.
# aggregate data
both <- rbind(dataset1, dataset2)
# correlation matrix for cross-sample
.both_cor <- cor(both, use = "pairwise", method = "spearman")
# cross network estimate
net_both <- estimateNetwork(both,
default = "EBICglasso",
tuning = 0.50,
lambda.min.ratio = 0.001,
corMethod = "cor",
corArgs = list(method = "spearman"))
Estimating Network. Using package::function:
- qgraph::EBICglasso for EBIC model selection
- using glasso::glasso
# bootstrapping...
# set.seed(1701)
# boot1_both <- bootnet(net_both, nBoots = 5000, caseN = 50, nCores = 6)
# # Total time: 9:12 + 5:01 minutes
# set.seed(1701)
# boot2_both <-bootnet(net_both, nBoots = 5000, caseN = 50, nCores = 6, type = "case", statistics = c("edge", "strength", "closeness", "betweenness"))
# # Total time: 07:38 + 5:02 minutes
# # save it
# saveRDS(boot1_both, file = "Bootstrap_1 AllDatasets.RData")
# saveRDS(boot2_both, file = "Bootstrap_2 AllDatasets.RData")
# load bootstrap data
boot1_both <- readRDS("Bootstrap_1 AllDatasets.RData")
boot2_both <- readRDS("Bootstrap_2 AllDatasets.RData")
.net3 <- qgraph(.both_cor,
graph = "glasso",
sampleSize = 828,
label.cex = 2,
tuning = .50,
lambda.min.ratio = 0.001,
layout = "spring")
# as igraph
.net_both <- as.igraph(.net3)
# walktrap
.cluster_walk_both <- cluster_walktrap(.net_both, weights = E(.net_both)$weight)
.group_walk_both <- list("Gastrointestinal symptoms" = c(2, 7, 10:17, 20:23),
"Neurological symptoms" = c(1, 30,34,35,37,39,40:47),
"Cardiovascular symptoms" = c(6,18,19,24:29,31,32,36),
"Musculoskeletal problems" = c(3,4,5),
"Urogenital symptoms" = c(8,9,33,38))
Network with Cluster
qgraph(.both_cor,
graph = "glasso",
groups = .group_walk_both,
color = c("blue", "red", "yellow", "green", "orange"),
vsize = 7,
sampleSize = 828,
label.cex = 1,
tuning = 0.50,
lambda.min.ratio = 0.001,
layout = "spring",
legend = F,
mar = c(5,2,2,2))
# plot legend
legend(x = -1., y = -1.15,
legend = c("Neurological symptoms",
"Gastrointestinal symptoms",
"Musculoskeletal problems",
"Cardiovascular symptoms",
"Urogenital symptoms"),
ncol = 3,
bty = "n",
pt.cex = 2,
pch = 21,
pt.bg = c("red", "blue", "green", "yellow", "orange"),
col = "black")
Figure. Network structure for 47 symptoms from the SOMS-7T based on an aggregated cross-sample dataset (N = 828). Items with the same color belong to the same community.
Network Inference
.z <- list(c("Neurological symptoms",1, 30,34,35,37,39,40:47),
c("Gastrointestinal symptoms", 2, 7, 10:17, 20:23),
c("Musculoskeletal problems", 3:5),
c("Cardiovascular symptoms", 6,18,19,24:29,31,32,36),
c("Urogenital symptoms", 8,9,33,38)) %>%
lapply(., function(x){
as.data.frame(x) %>%
mutate(item = paste("S_", sep = "", x), c = x[[1]])
}) %>%
lapply(., function(x){
x[-1,]})
# merge list
.z <- plyr::ldply(.z, rbind)
# add cluster-id to centrality statistics
.z <- merge(.z, data.frame(item2 = names(both),
Degree = as.vector(centrality(net_both)$InDegree),
Betweenness = as.vector(centrality(net_both)$Betweenness),
Closeness = as.vector(centrality(net_both)$Closeness)) %>%
mutate(item = paste("S_", sep = "", row.names(.))), by = "item")
.z %>%
dplyr::select(item = item2, c, Degree : Closeness) %>%
.[order(.$item),] -> .z
.z$label <- .labels[1:47, "Englisch_Label"]
.z %>% dplyr::select(item, label, everything()) -> .z
Summary
.z %>%
dplyr::select("Node strength" = Degree, Betweenness, Closeness) %>%
psych::describe() %>% t() %>% .[-c(1:2),] %>%
kable(., "html",
caption = "Table: Central tendencies for the main centrality measures of the cross-sample network") %>%
kable_styling("striped", full_width = F)
Table: Central tendencies for the main centrality measures of the cross-sample network
|
|
Node strength
|
Betweenness
|
Closeness
|
|
mean
|
0.8791340
|
74.553191
|
0.0008931
|
|
sd
|
0.1754636
|
60.981397
|
0.0000780
|
|
median
|
0.8966017
|
60.000000
|
0.0008901
|
|
trimmed
|
0.8866111
|
66.820513
|
0.0008948
|
|
mad
|
0.2169545
|
41.512800
|
0.0000593
|
|
min
|
0.4087010
|
0.000000
|
0.0006773
|
|
max
|
1.1849385
|
286.000000
|
0.0010436
|
|
range
|
0.7762374
|
286.000000
|
0.0003662
|
|
skew
|
-0.3891301
|
1.422432
|
-0.2703184
|
|
kurtosis
|
-0.4948183
|
2.092059
|
0.5843139
|
|
se
|
0.0255940
|
8.895051
|
0.0000114
|
Key statistics for each node in the network
# table
DT::datatable(.z, rownames = F,
filter= "top",
caption = "Table: Centrality measures for each item in the cross-sample network",
colnames = c("Item", "Itemlabel", "Cluster", "Node strength", "Betweenness", "Closeness"),
extensions = "Buttons",
options = list(dom = "Bfrtip", pageLength = 7,
buttons = c("csv", "pdf"))) %>%
formatRound("Degree", 3) %>%
formatRound("Betweenness", 0) %>%
formatRound("Closeness", 5)
Centrality plot
.z %>%
dplyr::select(-c, -label) %>%
mutate_at(vars(Degree, Betweenness, Closeness), function(x) { scale(x, scale = T) }) %>%
gather(., centrality, measure, -item) %>%
mutate(centrality = recode_factor(centrality,
"Degree" = "Node strength",
"Betweenness" = "Betweenness",
"Closeness" = "Closeness")) %>%
ggplot(., aes(y = item, x = measure)) +
facet_wrap(~ centrality, scales = "free_x") +
geom_point() +
geom_line(group = "1") +
scale_x_continuous(limits = c(-3,3), breaks = seq(-3, 3, 1)) +
labs(x = "z-value",
y = "SOMS7 item",
title = "Dataset 1") +
theme_classic() +
theme(axis.text = element_text(size = 12, color = "black"),
axis.title = element_text(size = 13, color = "black"),
strip.text = element_text(size = 18, color = "black"))
Warning: attributes are not identical across measure variables;
they will be dropped
Warning: Removed 1 rows containing missing values (geom_point).
Figure. Standardized centrality indices (betweenness, closeness, node strength) for the cross-sample network (Dataset 1 and 2).
## Edge weights
merge(getWmat(net_both) %>%
as.data.frame() %>%
mutate(node1 = rownames(.)) %>%
gather(., node2, weight, -node1) %>%
.[order(.$weight, decreasing = T),] %>%
filter(row_number() %% 2 != 0) %>%
unite(., Edge, node1, node2, sep = " - "),
summary(boot1_both) %>%
dplyr::select(type, node1, node2, starts_with("CI")) %>%
unite(., Edge, node2, node1, sep = " - ") %>%
as.data.frame() %>%
dplyr::select(-type),
by = "Edge") %>%
.[order(.$weight, decreasing = T),] %>%
datatable(.,
rownames = F,
caption = "Table: Edge-weights for each edge within the cross-sample network with bootstrap confidence intervals",
colnames = c("Edge", "Weight", "lower 95% CI", "upper 95% CI"),
extensions = "Buttons",
options = list(dom = "Bfrtip", buttons = c("csv", "pdf"))) %>%
formatRound(c("weight", "CIlower", "CIupper"), 3)
Key nodes & edges
Nodes with highest centrality
cbind(
data.frame(head(sort(centrality(net_both)$OutDegree, decreasing = T))) %>%
mutate(Item = rownames(.)) %>% dplyr::select(Item),
# Betweenness
data.frame(head(sort(centrality(net_both)$Betweenness, decreasing = T))) %>%
mutate(Item = rownames(.)) %>% dplyr::select(Item),
# Closeness
data.frame(head(sort(centrality(net_both)$Closeness, decreasing = T))) %>%
mutate(Item = rownames(.)) %>% dplyr::select(Item)
) -> .mostcentralnodes_both
# coding dummy names...
names(.mostcentralnodes_both) <- c("a", "b", "c")
.mostcentralnodes_both %>%
mutate(Rang = 1:nrow(.)) %>%
dplyr::select(Rang, everything()) %>%
# top 5
.[-6,] %>%
kable(., "html",
caption = "Table: Top 5 of the most central nodes in the cross-sample Dataset",
col.names = c("Rang", "Node strength", "Betweenness", "Closeness"),
align = "c") %>%
kable_styling("striped", full_width = F) %>%
footnote(general = "The rang of the nodes indicates its position. A higher rang indicates, that this node is more central then other nodes.",
footnote_as_chunk = T, general_title = "Note.")
Table: Top 5 of the most central nodes in the cross-sample Dataset
|
Rang
|
Node strength
|
Betweenness
|
Closeness
|
|
1
|
S_26
|
S_10
|
S_02
|
|
2
|
S_12
|
S_02
|
S_19
|
|
3
|
S_05
|
S_34
|
S_10
|
|
4
|
S_25
|
S_22
|
S_18
|
|
5
|
S_28
|
S_35
|
S_34
|
|
Note. The rang of the nodes indicates its position. A higher rang indicates, that this node is more central then other nodes.
|
Nodes with lowest centrality
cbind(
data.frame(head(sort(centrality(net_both)$OutDegree, decreasing = F))) %>%
mutate(Item = rownames(.)) %>% dplyr::select(Item),
# Betweenness
data.frame(head(sort(centrality(net_both)$Betweenness, decreasing = F))) %>%
mutate(Item = rownames(.)) %>% dplyr::select(Item),
# Closeness
data.frame(head(sort(centrality(net_both)$Closeness, decreasing = F))) %>%
mutate(Item = rownames(.)) %>% dplyr::select(Item)
) -> .mostcentralnodes_both
# coding dummy names
names(.mostcentralnodes_both) <- c("a", "b", "c")
.mostcentralnodes_both %>%
mutate(Rang = 1:nrow(.)) %>%
dplyr::select(Rang, everything()) %>%
# top 5
.[-6,] %>%
kable(., "html",
caption = "Table: Top 5 of the less central nodes in the cross-sample Dataset",
col.names = c("Rang", "Node strength", "Betweenness", "Closeness"),
align = "c") %>%
kable_styling("striped", full_width = F) %>%
footnote(general = "The rang of the nodes indicates its position. A higher rang indicates, that this node is less central then other nodes.",
footnote_as_chunk = T, general_title = "Note.")
Table: Top 5 of the less central nodes in the cross-sample Dataset
|
Rang
|
Node strength
|
Betweenness
|
Closeness
|
|
1
|
S_43
|
S_16
|
S_43
|
|
2
|
S_17
|
S_31
|
S_31
|
|
3
|
S_13
|
S_42
|
S_21
|
|
4
|
S_21
|
S_43
|
S_16
|
|
5
|
S_31
|
S_21
|
S_45
|
|
Note. The rang of the nodes indicates its position. A higher rang indicates, that this node is less central then other nodes.
|
Strongest edge
merge(getWmat(net_both) %>%
as.data.frame() %>%
mutate(node1 = rownames(.)) %>%
gather(., node2, weight, -node1) %>%
.[order(.$weight, decreasing = T),] %>% head(10) %>%
# every second row is a dublicate -> removing those
filter(row_number() %% 2 != 0) %>%
unite(., Edge, node1, node2, sep = " - "),
summary(boot1_both) %>%
dplyr::select(type, node1, node2, starts_with("CI")) %>%
filter(node1 == "S_26" & node2 == "S_27" |
node1 == "S_04" & node2 == "S_05" |
node1 == "S_06" & node2 == "S_25" |
node1 == "S_20" & node2 == "S_23" |
node1 == "S_28" & node2 == "S_29") %>%
as.data.frame() %>%
unite(., Edge, node2, node1, sep = " - ") %>%
dplyr::select(-type),
by = "Edge") %>%
.[order(.$weight, decreasing = T),] %>%
kable(., "html",
digits = 3,
caption = "Table: Top 5 edges in the cross-sample dataset with weights and 95% confidence intervals",
row.names = F,
col.names = c("Edge", "Weight", "lower bound", "upper bound")) %>%
kable_styling("striped", full_width = F) %>%
add_header_above(., c(" " = 2, "95% CI" = 2)) %>%
footnote(general = "The confidence intervals were bootstrapped (B = 5000).",
footnote_as_chunk = T, general_title = "Note.")
Table: Top 5 edges in the cross-sample dataset with weights and 95% confidence intervals
|
|
95% CI
|
|
Edge
|
Weight
|
lower bound
|
upper bound
|
|
S_27 - S_26
|
0.498
|
0.441
|
0.555
|
|
S_05 - S_04
|
0.464
|
0.401
|
0.526
|
|
S_25 - S_06
|
0.381
|
0.320
|
0.442
|
|
S_23 - S_20
|
0.380
|
0.316
|
0.444
|
|
S_29 - S_28
|
0.368
|
0.299
|
0.438
|
|
Note. The confidence intervals were bootstrapped (B = 5000).
|
Stability & Accuracy
Stability Plot
plot(boot2_both, statistics = "all") + ggtitle("Cross-Sample Dataset")
Figure. Visualization of the correlation between centrality measures of the original sample (y-axis) and the case-dropping bootstrapped sample (x-axis). The percentage on the x-axis indicates the amount of cases which remain in the bootstrapped sample.
CS-Coefficient
corStability(boot2_both, cor = 0.7)
=== Correlation Stability Analysis ===
Sampling levels tested:
nPerson Drop% n
1 207 75.0 93
2 219 73.6 84
3 231 72.1 103
4 242 70.8 101
5 254 69.3 87
6 266 67.9 104
7 278 66.4 119
8 290 65.0 98
9 302 63.5 101
10 313 62.2 93
11 325 60.7 101
12 337 59.3 95
13 349 57.9 95
14 361 56.4 112
15 373 55.0 99
16 384 53.6 107
17 396 52.2 112
18 408 50.7 104
19 420 49.3 99
20 432 47.8 102
21 444 46.4 105
22 455 45.0 108
23 467 43.6 110
24 479 42.1 98
25 491 40.7 93
26 503 39.3 101
27 515 37.8 92
28 526 36.5 103
29 538 35.0 113
30 550 33.6 106
31 562 32.1 86
32 574 30.7 113
33 586 29.2 92
34 597 27.9 102
35 609 26.4 97
36 621 25.0 98
37 633 23.6 102
38 645 22.1 97
39 656 20.8 90
40 668 19.3 96
41 680 17.9 103
42 692 16.4 84
43 704 15.0 89
44 716 13.5 108
45 727 12.2 106
46 739 10.7 114
47 751 9.3 88
48 763 7.9 98
49 775 6.4 106
50 787 5.0 93
Maximum drop proportions to retain correlation of 0.7 in at least 95% of the samples:
betweenness: 0.393
- For more accuracy, run bootnet(..., caseMin = 0.378, caseMax = 0.407)
closeness: 0.393
- For more accuracy, run bootnet(..., caseMin = 0.378, caseMax = 0.407)
edge: 0.693
- For more accuracy, run bootnet(..., caseMin = 0.679, caseMax = 0.708)
strength: 0.55
- For more accuracy, run bootnet(..., caseMin = 0.536, caseMax = 0.564)
Accuracy can also be increased by increasing both 'nBoots' and 'caseN'.
Node-Strength Differences
plot(boot1_both, "strength", order = "sample", labels = T, plot = "difference") + ggtitle("Cross-Sample Dataset")
Expected significance level given number of bootstrap samples is approximately: 0.05
Figure. Significant differences in the node strength for Cross-sample Network. Black areas = significant differences between two nodes, grey = no significant differences. Diagonal = estimated mean node strength across all bootstrap samples, starting with the weakest node in the lower left to the strongest node in the upper right.
Edge-Weights Stability
plot(boot1_both, labels = F, order = "sample") + ggtitle("Cross-Sample Dataset")
Figure. Edges (y-axis) between two nodes and their weighting (x-axis) in the sense of a partial correlation. Red dots = Edge weights of the original network. Grey areas = 95% confidence interval around this weight.Black dots = mean values across all bootstraps.