u/Confident-Slide4553

ANCOVA correction for regression to the mean in a repeated-measures wellness monitoring system — is this sufficient?

I have a consumer health monitoring system where users take blood tests every 4-12 weeks and get health scores. Classic selection bias: users who start monitoring because they feel unwell have worse baselines. On retest, scores improve even without intervention (regression to the mean).
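The selection mechanism described above can be demonstrated with a small simulation — a sketch under assumed numbers (population mean 50, true-score SD 8, measurement-noise SD 6, enrollment cutoff 45; all hypothetical), with no intervention applied between baseline and retest:

```python
import random

random.seed(0)

# Hypothetical simulation of regression to the mean under baseline selection.
# Observed score = true score + independent measurement noise; users "enroll"
# only if their baseline falls below a cutoff. No intervention occurs.
POP_MEAN, TRUE_SD, NOISE_SD, CUTOFF = 50.0, 8.0, 6.0, 45.0

baselines, retests = [], []
for _ in range(100_000):
    true_score = random.gauss(POP_MEAN, TRUE_SD)
    baseline = true_score + random.gauss(0, NOISE_SD)
    if baseline < CUTOFF:  # self-selection: only "feeling unwell" users enroll
        retest = true_score + random.gauss(0, NOISE_SD)  # fresh noise draw, no treatment
        baselines.append(baseline)
        retests.append(retest)

mean_base = sum(baselines) / len(baselines)
mean_retest = sum(retests) / len(retests)
print(f"baseline mean: {mean_base:.1f}, retest mean: {mean_retest:.1f}")
```

The retest mean lands between the selected baseline mean and the population mean purely because the enrollment cutoff conditioned on a noisy measurement.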

My proposed correction is ANCOVA-based: Corrected_gain = Observed_gain - (1 - r_test_retest) × (Population_mean - Baseline)

Where r_test_retest is the ICC for each health domain score (estimated from pilot repeated-measures data).
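As a minimal sketch of the correction, with the RTM component written as (1 − r) × (Pop_mean − Baseline) so that gains from low-baseline users shrink toward zero (the ICC and population mean are hypothetical per-domain inputs estimated elsewhere):

```python
def corrected_gain(baseline: float, retest: float,
                   icc: float, pop_mean: float) -> float:
    """Remove the expected regression-to-the-mean component from the raw gain."""
    observed_gain = retest - baseline
    expected_rtm = (1.0 - icc) * (pop_mean - baseline)  # expected drift toward the mean
    return observed_gain - expected_rtm

# A user with a low baseline: a chunk of the raw +6 gain is expected RTM drift.
print(round(corrected_gain(baseline=40.0, retest=46.0, icc=0.7, pop_mean=50.0), 2))
```

With ICC = 0.7 and a baseline 10 points below the population mean, 3 of the 6 observed points are attributed to RTM.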

Questions:

  1. Is ANCOVA sufficient here, or does Lord's paradox apply? (The "treatment" isn't randomized — users self-select into a lifestyle program.)
  2. Should I use the population mean from my reference dataset (N=7,840 general population) or the mean of my user cohort (biased toward health-conscious)?
  3. In the user-facing UI: I plan to show the trend with a caveat ("Your improvement trend becomes more reliable after 2-3 test cycles") rather than suppressing it. Is this honest, or is it misleading for a consumer audience?
  4. After how many test cycles does the regression effect become negligible for practical purposes? My gut says 2-3, but I'd like a citation or formula.
u/Confident-Slide4553 — 4 days ago
▲ 4 r/biostatistics+1 crossposts

Combining wearable + blood biomarker data into composite health scores — seeking methodology critique

I'm building a composite health index that combines periodic blood biomarker data (every 4-12 weeks) with continuous wearable sensor data (daily) into domain-level health scores. After an external methodology review, I've resolved some initial issues but have new questions.

What I've settled:

  • Evidence weights from per-SD mortality hazard ratios (all HRs converted to per-SD scale before computing ln(HR))
  • Reliability weights from CCC/ICC (not MAPE — switched after review showed MAPE conflates systematic bias with random noise)
  • Geometric mean combination: √(We × Wr) — confirmed as defensible by reviewer
  • Four independent health domains (no composite average across domains)
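The settled weighting scheme above can be sketched as follows — the example hazard ratio and ICC are hypothetical, and the use of |ln(HR)| as the evidence weight is my reading of "evidence weights from per-SD mortality hazard ratios":

```python
import math

def combined_weight(hr_per_sd: float, icc: float) -> float:
    """Geometric-mean combination of evidence and reliability weights."""
    evidence = abs(math.log(hr_per_sd))       # We: ln(HR), HR already on the per-SD scale
    reliability = icc                         # Wr: test-retest reliability (CCC/ICC)
    return math.sqrt(evidence * reliability)  # √(We × Wr)

# Hypothetical marker: HR = 1.4 per SD, ICC = 0.85
print(round(combined_weight(hr_per_sd=1.4, icc=0.85), 3))
```

The geometric mean zeroes out any marker whose evidence or reliability weight is zero, which is one reason it is defensible over an arithmetic mean here.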

Where I need help:

  1. Blood-wearable signal non-independence. In my metabolic domain, blood HbA1c and wearable step counts both encode insulin sensitivity signal. Google's WEAR-ME study (Nature 2026) showed wearable features explain 43% of HOMA-IR variance. I blend blood and wearable into one domain score with time-decaying weights (blood dominant when fresh, wearable dominant when blood is stale). Should I apply a correlation discount when the two signals share latent variance? If r(blood_score, wearable_score) > 0.45, what's the principled adjustment — reduce effective contribution by r/2? Or is there a better approach from multivariate composite construction?
  2. Regression to the mean in a pre-post health monitoring system. Users who start monitoring because they feel unwell will have systematically worse baselines. Even without intervention, their scores will improve on retest. I'm planning ANCOVA correction (Corrected_gain = Observed_gain - (1 - r_test_retest) × (Pop_mean - Baseline)) for backend analytics. Is ANCOVA sufficient, or should I also use Lord's paradox–aware methods? And in the user-facing display: should I suppress trend interpretation for the first 2 test cycles, or show it with a caveat?
  3. Single-marker domain precision. One of my domains has only one blood marker (an inflammatory biomarker with intra-individual CV ≈ 44%, ICC ≈ 0.62). After log-transformation, effective ICC improves to ~0.70-0.75. I display a confidence band on this domain's score. Is there a minimum reliability threshold below which a single-marker domain score should not be shown at all? Or is the confidence band approach sufficient for a wellness (non-diagnostic) product?
  4. Collinearity within a domain. Two of three blood markers in my metabolic domain share variance by design (one is mathematically derived from the other). VIF analysis is planned. If VIF > 2.5, should I discount the derived marker's weight, or is the intentional emphasis on the shared signal (glycemic control) defensible if clinically motivated?
  5. Score normalization reference. I'm using a large US population survey (N=7,840) for age/sex-stratified z-scores. My target users are health-conscious Europeans aged 30-55 (BMI <27, no diabetes). What's the minimum overlap between reference and target population before normalization becomes misleading? Is sub-sampling the reference to match the target profile the right approach, or does that introduce selection bias?
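For question 1, the time-decaying blood/wearable blend can be sketched like this — the exponential form and the 28-day half-life are assumptions for illustration; the post only specifies "blood dominant when fresh, wearable dominant when blood is stale":

```python
# Hypothetical decay constant: blood weight halves every 28 days since the draw.
HALF_LIFE_DAYS = 28.0

def blend(blood_score: float, wearable_score: float, days_since_draw: float) -> float:
    """Blend a domain's blood and wearable sub-scores with time-decaying weights."""
    w_blood = 0.5 ** (days_since_draw / HALF_LIFE_DAYS)  # decays from 1 toward 0
    return w_blood * blood_score + (1.0 - w_blood) * wearable_score

print(blend(60.0, 50.0, 0.0))   # fresh draw: blood dominates
print(blend(60.0, 50.0, 84.0))  # three half-lives later: mostly wearable
```

A correlation discount, if added, would act on `w_blood` and `1 - w_blood` before normalization rather than on the scores themselves.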
u/Confident-Slide4553 — 4 days ago