Weighted Fusion vs Logistic Regression
Hi! For our thesis, we're building a technical debt heatmap tool that analyzes GitHub repos using two signals: commit message sentiment and code complexity. We're debating between two approaches for fusing these signals:
- Weighted fusion layer: I personally prefer this, but we have no ground truth to justify the weights.
- Logistic regression trained on ApacheJIT: but we're worried about distribution shift on non-Apache repos, especially small personal repos.
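For concreteness, here is a minimal sketch of what we mean by the two options. It assumes both signals are already scaled to [0, 1], with higher meaning more debt-prone. The function names, the 0.4/0.6 weights, and the six toy rows are all made up for illustration; the toy rows are not ApacheJIT data, and the small gradient-descent loop just stands in for whatever logistic regression implementation would actually be used.

```python
import math

def weighted_fusion(sentiment_risk, complexity, w_sentiment=0.4, w_complexity=0.6):
    """Option 1: hand-set weighted average (the weights here are arbitrary placeholders)."""
    total = w_sentiment + w_complexity
    return (w_sentiment * sentiment_risk + w_complexity * complexity) / total

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(rows, labels, lr=0.5, epochs=2000):
    """Option 2: learn the weights from labeled rows with batch gradient descent."""
    w = [0.0, 0.0]
    b = 0.0
    n = len(rows)
    for _ in range(epochs):
        grad_w = [0.0, 0.0]
        grad_b = 0.0
        for (s, c), y in zip(rows, labels):
            err = sigmoid(w[0] * s + w[1] * c + b) - y  # gradient of log loss w.r.t. z
            grad_w[0] += err * s
            grad_w[1] += err * c
            grad_b += err
        w = [w[0] - lr * grad_w[0] / n, w[1] - lr * grad_w[1] / n]
        b -= lr * grad_b / n
    return w, b

def logistic_fusion(sentiment_risk, complexity, w, b):
    """Score a file or commit with the learned weights."""
    return sigmoid(w[0] * sentiment_risk + w[1] * complexity + b)

# Made-up toy rows: (sentiment_risk, complexity) with a 0/1 "problematic" label.
toy_rows = [(0.1, 0.2), (0.2, 0.1), (0.8, 0.9), (0.9, 0.7), (0.3, 0.8), (0.7, 0.2)]
toy_labels = [0, 0, 1, 1, 1, 0]

w, b = fit_logistic(toy_rows, toy_labels)
print("hand-set score:", weighted_fusion(0.9, 0.9))
print("learned weights:", w, "bias:", b)
print("learned score:", logistic_fusion(0.9, 0.9, w, b))
```

Both options compute a weighted sum of the same two inputs. The only difference is where the weights come from, which is exactly the part we are unsure how to defend.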
Which of these two limitations is more defensible in a thesis context?
Is there a standard solution for justifying weights without labeled data?
Aside from those two, are there any better alternatives for doing this?