u/Adventurous-Row905

Weighted Fusion vs Logistic Regression

Hi! We're building a technical debt heatmap tool that analyzes GitHub repos using commit message sentiment and code complexity for our thesis. We're debating between two approaches for fusing these signals:

  1. Weighted layer approach: I personally prefer this, but we have no ground truth to justify the weights.
  2. Logistic Regression trained on ApacheJIT: we're worried about distribution shift on non-Apache repos, especially small personal repos.
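
For concreteness, here is roughly what the two options look like. This is only a minimal sketch: the function names, the 0.5/0.5 weights, and the coefficients are made-up placeholders, and it assumes both signals are already scaled to [0, 1] with higher meaning "more debt-like".

```python
import math

def weighted_fusion(sentiment_risk, complexity, w_sentiment=0.5, w_complexity=0.5):
    """Option 1: fixed-weight linear fusion (weights are hand-picked placeholders)."""
    total = w_sentiment + w_complexity
    # Normalize by the weight sum so the score stays in [0, 1].
    return (w_sentiment * sentiment_risk + w_complexity * complexity) / total

def logistic_fusion(sentiment_risk, complexity, coef_sentiment, coef_complexity, intercept):
    """Option 2: logistic regression form; coefficients would be learned from labeled data."""
    z = intercept + coef_sentiment * sentiment_risk + coef_complexity * complexity
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical inputs for one file; not real measurements.
score_weighted = weighted_fusion(0.8, 0.4)
score_logistic = logistic_fusion(0.8, 0.4, coef_sentiment=1.0, coef_complexity=1.0, intercept=-1.0)
```

Both produce a score between 0 and 1 from the same two inputs. The only real difference is where the numbers multiplying each signal come from: we pick them by hand in option 1, and they are fit on labels in option 2.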

Which of these weaknesses is easier to defend in a thesis context?

Is there a standard solution for justifying weights without labeled data?

Aside from those two, are there any better alternatives for doing this?

reddit.com
u/Adventurous-Row905 — 5 days ago
