Agreement assessment

  • Agreement: the degree of similarity between measurements.

  • Depending on the context, agreement assessment is called by different names:

    • Measurements taken with the same measurement method: reliability, repeatability.

    • Measurements taken with different measurement methods: inter-rater reliability, reproducibility, concordance analysis, comparison data analysis, inter-rater agreement.

  • The methodology also depends on the type of data (nominal, ordinal, continuous).

Concordance analysis

  • A (continuous) characteristic is measured by several measurement methods.

  • Aim: to estimate the degree of agreement between the methods.

  • Example: two devices (M1, M2) measuring systolic blood pressure.

  • Measurements taken at the same time → same “true value”.

Assessment of concordance

Conditions for perfect concordance:

1) Equality of means \(\mu_1=\mu_2\).

Example: the means are equal, but the standard deviations differ and the correlation is below 1, so concordance is not perfect.

m1     m2     s1     s2     r
120    120    24.4   31.6   0.83

Assessment of concordance

2) Perfect correlation \(\rho_{12}=1\).

Example: the correlation is perfect and the means are equal, but the standard deviations still differ.

m1     m2     s1     s2     r
120    120    24.4   29.5   1

Assessment of concordance

3) Equality of variances \(\sigma^{2}_{1}=\sigma^{2}_{2}\).

Example: all three conditions hold (equal means, perfect correlation, equal variances), so concordance is perfect.

m1     m2     s1     s2     r
120    120    24.4   24.4   1

Concordance coefficient

  • Concordance correlation coefficient (Lin, 1989); a minimal R sketch of its sample version follows this list:

\[\rho_{CCC}=\frac{2\cdot\rho_{12}\sigma_1\sigma_2}{\sigma^{2}_{1}+\sigma^{2}_{2}+(\mu_{1}-\mu_{2})^{2}}\]

  • Possible values between \(-1\) and \(1\).

  • Complete disagreement \(\rightarrow \rho_{CCC}=0\)

  • Perfect agreement \(\rightarrow \rho_{CCC}=1\)

  • Negative values may indicate:

    • Independence between the methods (the true parameter value is near 0).
    • Errors in the data.
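
A minimal R sketch of the sample version of \(\rho_{CCC}\), assuming two numeric vectors of paired measurements. The data are simulated and the helper ccc_lin is hypothetical; the cccrm functions used later (e.g. ccc_vc) provide the actual estimators with inference.

# Sample analogue of Lin's CCC for paired measurements x (method 1) and y (method 2)
ccc_lin <- function(x, y) {
  n <- length(x)
  mx <- mean(x); my <- mean(y)
  sx2 <- var(x) * (n - 1) / n       # 1/n ("biased") variances and covariance
  sy2 <- var(y) * (n - 1) / n
  sxy <- cov(x, y) * (n - 1) / n    # covariance = rho12 * sigma1 * sigma2
  2 * sxy / (sx2 + sy2 + (mx - my)^2)
}

set.seed(1)
m1 <- rnorm(100, mean = 120, sd = 24)   # simulated systolic BP readings, device M1
m2 <- m1 + rnorm(100, sd = 10)          # device M2: same truth plus extra error
ccc_lin(m1, m2)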

Interpretation of Concordance coefficient

  • The Landis and Koch (1977) criteria for Kappa also apply to the CCC:

CCC          Interpretation
< 0.2        Slight
[0.2, 0.4)   Fair
[0.4, 0.6)   Moderate
[0.6, 0.8)   Substantial
>= 0.8       Almost Perfect

  • Alternative criteria from Koo and Li (2016):

CCC           Interpretation
< 0.5         Poor
[0.5, 0.75)   Moderate
[0.75, 0.9)   Good
>= 0.9        Excellent

Inference

  • Confidence interval.

    • Based on Normal distribution.
    • Fisher’s Z transformation is commonly used (it matters most with small sample sizes); see the sketch after this list.

    \[Z=\frac{1}{2}\ln\left(\frac{1+\rho}{1-\rho}\right)\]

  • Hypothesis testing.

    • Is the CCC greater than a specific value? Commonly 0.8 or 0.9.
    • Are two CCCs different?
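
A short R sketch of both ideas, using a hypothetical CCC estimate and a hypothetical standard error of its Fisher Z transform; in practice both quantities come from the fitted estimator (e.g. the ccc_vc output shown later).

ccc_hat <- 0.87                         # hypothetical CCC estimate
se_z    <- 0.04                         # hypothetical SE of Z = atanh(CCC)

z_hat <- atanh(ccc_hat)                 # Fisher's Z transformation
z_ci  <- z_hat + c(-1, 1) * qnorm(0.975) * se_z
tanh(z_ci)                              # 95% CI back-transformed to the CCC scale

# One-sided test of H0: CCC <= 0.8 against H1: CCC > 0.8, on the Z scale
z0 <- atanh(0.8)
pnorm((z_hat - z0) / se_z, lower.tail = FALSE)   # p-value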

Concordance with repeated measurements

  • Every subject is assessed more than one time by each method: replicates.

  • Measurements taken at the same time → same “true value”.

  • Lin’s estimator cannot be applied to such a design.

Concordance with repeated measurements

  • Naive method: use the mean of the repeated measurements.

  • Only correct if taking the mean of the measurements is the actual measurement process.

  • Otherwise, the concordance is artificially inflated: averaging reduces the within-subject variability and the measurement error.

  • Alternative: Carrasco and Jover (2003) demonstrated the equivalence between the CCC and the Intraclass Correlation Coefficient (ICC).

Intraclass correlation coefficient

  • ICC is based on the decomposition of the (outcome) variance into between and within-subjects variance components.

  • Between-subjects variance: \(\sigma_{\alpha}^2\)

  • Within-subjects variance (discordance components):

    • Between-methods variability: \(\sigma_{\beta}^2\)
    • Random error variance: \(\sigma_{e}^2\)
    • Also possible to add subjects-methods interaction: \(\sigma_{\alpha\beta}^2\)
  • Variance components are estimated using a linear mixed effects model.

\[\rho_{ICC}=\frac{\sigma_{\alpha}^2}{\sigma_{\alpha}^2+\sigma_{\beta}^2+\sigma_{\alpha\beta}^2+\sigma_{e}^2}\]

  • In this context, this ICC is known as CCC estimated by variance components.
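
As a quick check of this formula, plugging in the variance components reported for the blood pressure example below (rounded) reproduces the reported CCC.

# Variance components copied from the ccc_vc() output shown below (rounded)
s_subj <- 380.1875     # sigma_alpha^2       (Subjects)
s_sm   <- 0.0000006    # sigma_alphabeta^2   (Subjects-Method)
s_met  <- 2.2953       # sigma_beta^2        (Method)
s_err  <- 52.8673      # sigma_e^2           (Error)

s_subj / (s_subj + s_sm + s_met + s_err)   # ~0.8733, matching the CCC reported below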

Blood Pressure example

  • Blood pressure (BP) measured twice on 384 subjects using two devices.

CCC estimate

  • cccrm R package (Carrasco et al., 2013).

  • First release (v1.0.0) in 2011. Authors: Josep L. Carrasco and Josep Puig.

  • Last release (v3.0.4) in February 2025. Authors: Josep L. Carrasco and Gonzalo Peón.

library(cccrm)

# ry = outcome (systolic BP), rind = subject identifier, rmet = measurement method;
# int = TRUE adds the subject-method interaction component
est<-ccc_vc(bpres,ry="SIS",rind="ID",rmet="METODE",int=TRUE)
summary(est)
         Subjects   Subjects-Method            Method             Error 
380.1874530093099   0.0000005623391   2.2953421378809  52.8673411494370 

CCC estimated by variance compoments 
       CCC  LL CI 95%  UL CI 95%     SE CCC 
0.87329122 0.85313329 0.89084538 0.00959477 

Longitudinal repeated measurements

  • Suppose the repeated measurements are not taken at the same time.

  • The subject’s “true value” may change between measurements.

  • There is no point in evaluating the concordance between measurements taken at different times (e.g. M11 vs M22, or M12 vs M21).

Longitudinal repeated measurements

  • More variance components needed.

  • Between-subjects variance:

    • Subjects: \(\sigma_{\alpha}^2\)
    • Subjects-Time: \(\sigma_{\alpha\gamma}^2\).
  • Within-subjects variance (discordance components):

    • Methods-time: \(\sigma_{\beta\gamma}^2\)
    • Subjects-Methods: \(\sigma_{\alpha\beta}^2\)
    • Random error: \(\sigma_{e}^2\)

\[\rho_{ICC}=\frac{\sigma_{\alpha}^2+\sigma_{\alpha\gamma}^2}{\sigma_{\alpha}^2+\sigma_{\alpha\gamma}^2+\sigma_{\beta\gamma}^2+\sigma_{\alpha\beta}^2+\sigma_{e}^2}\]
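
As before, plugging the variance components from the longitudinal blood pressure output below (rounded) into this formula reproduces the reported CCC.

# Variance components copied from the ccc_vc() output shown below (rounded)
s_subj <- 373.5473     # sigma_alpha^2       (Subjects)
s_st   <- 20.6959      # sigma_alphagamma^2  (Subjects-Time)
s_sm   <- 3.6918       # sigma_alphabeta^2   (Subjects-Method)
s_mt   <- 2.2794       # sigma_betagamma^2   (labelled "Method" in the output)
s_err  <- 30.6539      # sigma_e^2           (Error)

(s_subj + s_st) / (s_subj + s_st + s_sm + s_mt + s_err)   # ~0.915, as reported below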

CCC estimate

# rtime = time variable; adds the time-related variance components to the model
est<-ccc_vc(bpres,ry="SIS",rind="ID",rmet="METODE",rtime="NM")
summary(est)
       Subjects Subjects-Method   Subjects-Time          Method           Error 
      373.54730         3.69175        20.69585         2.27938        30.65388 

CCC estimated by variance compoments 
        CCC   LL CI 95%   UL CI 95%      SE CCC 
0.914997171 0.900184010 0.927695639 0.006993417 

CCC for longitudinal repeated measurements

  • Assumption: the same level of agreement at all times.

CCC estimates by time

# CCC estimated separately at each time point; plotit = TRUE returns a plot and
# test = TRUE runs the equality test across times
est_time<-ccc_est_by_time(bpres,ry="SIS",rind="ID",rmet="METODE",rtime="NM",
                plotit=TRUE,test=TRUE)
est_time$plot

est_time$ccc
  NM       CCC     LL95      UL95
1  1 0.9153748 0.897656 0.9301388
2  2 0.9145399 0.896657 0.9294434

CCC estimates by time: equality test

  • Test statistic: \(\theta=b'\Sigma^{-1} b\) (Vanbelle, 2017).
  • \(b\): vector of the \(t-1\) differences between the CCC estimates over time.
  • \(\Sigma\): variance-covariance matrix of these differences.
  • \(\Sigma\) is estimated by a non-parametric cluster bootstrap (500 resamples by default).
  • Under the null hypothesis of equality of the CCCs, \(\theta\) follows a Chi-square distribution with \(t-1\) degrees of freedom, where \(t\) is the number of time points.
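
A minimal sketch of this statistic with hypothetical numbers for t = 2 time points; the difference roughly matches the estimates shown above, but the bootstrap variance is made up for illustration. ccc_est_by_time() computes all of this internally, and its actual test output follows.

b     <- matrix(0.9154 - 0.9145)   # t - 1 = 1 difference between CCC estimates
Sigma <- matrix(2e-04)             # bootstrap variance of the difference (made up)

theta <- drop(t(b) %*% solve(Sigma) %*% b)
pchisq(theta, df = nrow(b), lower.tail = FALSE)   # p-value, df = t - 1
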
est_time$res_test
        Chi.Sq DF    Pvalue
1 0.0004148078  1 0.9837507

Body fat data

  • Percentage body fat obtained from skinfold calipers and DEXA on a cohort of 90 adolescent girls.

  • Measurements were taken at ages 12.5, 13, and 13.5 years.

CCC estimate

est<-ccc_vc(bfat,ry="BF",rind="SUBJECT",rmet="MET",rtime="VISITNO")
summary(est)
       Subjects Subjects-Method   Subjects-Time          Method           Error 
      8.5920144       2.1082341       0.9204080       5.1576360       0.7697708 

CCC estimated by variance compoments 
       CCC  LL CI 95%  UL CI 95%     SE CCC 
0.54207819 0.43613835 0.63319749 0.05031125 

CCC estimates by time

est_time<-ccc_est_by_time(bfat,ry="BF",rind="SUBJECT",rmet="MET",rtime="VISITNO",
                     plotit=TRUE,test=TRUE)
est_time$plot

est_time$ccc
  VISITNO       CCC      LL95      UL95
1    12.5 0.6693741 0.5520280 0.7607205
2      13 0.4837805 0.3698900 0.5833467
3    13.5 0.4886355 0.3737142 0.5887816

CCC estimates by time: test

est_time$res_test
    Chi.Sq DF         Pvalue
1 25.94093  2 0.000002328088
est_time$pair_comp
       Difs     Estimate         SE         Adj.P
1   12.5-13  0.185593602 0.04819816 0.00023562237
2 12.5-13.5  0.180738623 0.04122848 0.00003498342
3   13-13.5 -0.004854979 0.05254474 0.92638257992

References

Carrasco JL, Jover L. (2003). Estimating the generalized concordance correlation coefficient through variance components. Biometrics, 59, 849-858.

Carrasco JL, Phillips BR, Puig-Martinez J, King TS, Chinchilli V. (2013). Estimation of the concordance correlation coefficient for repeated measures using SAS and R. Computer Methods and Programs in Biomedicine, 109(3), 293-304.

Landis JR, Koch GG. (1977). The measurement of observer agreement for categorical data. Biometrics, 33(1), 159-174.

Koo TK, Li MY. (2016). A guideline of selecting and reporting intraclass correlation coefficients for reliability research. Journal of Chiropractic Medicine, 15(2), 155-163.

Lin LI. (1989). A concordance correlation coefficient to evaluate reproducibility. Biometrics, 45, 255-268.

Vanbelle S. (2017). Comparing dependent kappa coefficients obtained on multilevel data. Biometrical Journal, 59(5), 1016-1034.