Counterpoint: Investing in JD as a starter 📈

Love the Bball Paul energy, but the man's a bench-energy guy, not a core starting-5 piece.

Let Duren sink or swim and get him playoff reps for 2027 and beyond. JD is the future and gives the team its highest ceiling.

Hopefully he turns things around, but even if not, he's 22. Remember how 2023 10ppg playoff Mobley became 2025 DPOTY Mobley? Gen Z's gotta marinate before they can cook.

My bet: JB staying loyal now = a team-friendly JD extension this summer. A prove-it contract, something like 3 years / $85 mil total with the 3rd year as a team option, would be win/win in my book.

Better for Duren than even a 9-figure deal with a basement dweller.

u/benddit — 1 day ago

Pistons fans vs reality

You haven't had some brutally hard path so far. You're just ass and can't beat mid or even bad teams.

u/benddit — 1 day ago

Multi vs Gemini CLI head-to-head: Impressions from building a researcher agent

I've spent the past several hours building researcher agents that scrape the internet for experience reports, anecdotes, and n-of-1 studies to help with a complex medical case.

In the past, I've tried purpose-built researcher agents on the web, but found myself stymied. Even deep-research agents from the frontier model labs were overly constrained: incomplete, unconvincing results and too many annoying caveats (how many times can you say "this is not medical advice"?). At times they didn't even surface results from sources I explicitly specified, e.g. Reddit.

I decided to build my own researcher agents, with a backup plan in mind: even if I have to find the source data myself, I can use LLMs to organize and evaluate it.

My setup:

  • Model: Gemini 3
  • Harness 1: Gemini CLI
  • Harness 2: Multi 0.0.97

These are my impressions in near real time (to be posted in the comments section below).

u/benddit — 2 days ago

Paul Reed isn't getting enough credit for energy and intangibles. People say he's not as good as Stew at rim protection, that he wouldn't produce as well on offense if he got the attention and focus JD does. Kinda sloppy, can't shoot FTs. All true, but missing the point.

The dude is a human spark plug. I'd like to see a technical analysis of G6, but the team's performance and body language were troubling until Paul came in:

  • Passive on the boards, way too many 2nd and 3rd chances for the Magic.
  • Sloppy passing, constantly off the mark even when not turning it over. This went well into the 3rd quarter.
  • When Cade dribbled off his foot to start the 2nd half, I blamed the team vibes: immature, scared, out of sync.

Bball Paul played 11 minutes. When Paul leaped for some of those boards, you could feel him reaching up to heaven and grabbing some ammo. Very glad to see this subreddit shout him out despite a minimal box-score presence.

Seeing Daniss underperform all series, then bare his teeth in the 2nd half right before the Magic collapsed, was memorable.

JD steps out at 3:55, Paul comes in. TBH still kinda sloppy, but huge energy. His make at 2:17 of Q3 catalyzed the defining run at game's end. Check the gameflow image above.

Paul is contagious. A poor man's Michael Irvin.

u/benddit — 11 days ago

How do I get my local LLM to finish the job?

I'm trying to parse a VCF analysis file into an Excel or TSV file by running Multi with a Qwen3.5 2B local LLM.

GPT provided me with detailed Python and command-line instructions, but when I try to run those instructions locally via Multi, Qwen doesn't seem as resolute as Claude about finishing the job.

Even when I type "continue" or "retry", it will run for several turns and report "Finished" without actually finishing.

See the screenshot.

Any advice? I prefer not to switch to a cloud model.

Here is the prompt I am running:

convert the .VCF file into a new Excel or TSV file using the instructions below:

Yes. Make it **generic/schema-discovering**, but not a blind global split.

The rule should be:

| Field location                     | Parsing                                                       |
| ---------------------------------- | ------------------------------------------------------------- |
| Whole VCF row                      | split by **tab**                                              |
| `INFO` column                      | split by **semicolon** into `key=value`                       |
| `FORMAT` + sample columns          | split `FORMAT` by **colon**, then map sample values by colon  |
| Pipe fields like `ANN` / `CSQ`     | split into a separate table by **comma**, then **pipe**       |
| Other values containing `:` or `,` | preserve as values unless the field is known to be structured |
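
To make the rules concrete, here's a minimal sketch applying them to a single made-up VCF data row (the row and all values are invented for illustration; only the standard library is used):

```python
# Hypothetical VCF data row (tab-separated), invented for illustration.
row = "chr1\t12345\t.\tA\tG\t50\tPASS\tDP=100;DB;ANN=G|missense_variant|HIGH\tGT:DP\t0/1:98"

cols = row.split("\t")                        # whole row: split by tab
info = {}
for item in cols[7].split(";"):               # INFO column: split by semicolon
    key, _, value = item.partition("=")       # flag fields like DB have no '='
    info[key] = value if value else True

ann_records = [r.split("|") for r in info["ANN"].split(",")]  # ANN: comma, then pipe
sample = dict(zip(cols[8].split(":"), cols[9].split(":")))    # FORMAT keys -> sample values

print(info["DP"])         # 100
print(ann_records[0][1])  # missense_variant
print(sample["GT"])       # 0/1
```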

Here is a more generic script.

```python
import gzip
import argparse
import re
from pathlib import Path
import pandas as pd


BASE_COLUMNS = [
    "CHROM", "POS", "ID", "REF", "ALT", "QUAL", "FILTER"
]

DEFAULT_PIPE_FIELDS = {"ANN", "CSQ", "EFF"}


def open_text(path):
    path = str(path)
    if path.endswith(".gz"):
        return gzip.open(path, "rt")
    return open(path, "r")


def parse_info_header(line):
    """
    Example:
    ##INFO=<ID=ANN,Number=.,Type=String,Description="...">
    """
    m = re.match(r"##INFO=<(.+)>", line)
    if not m:
        return None

    body = m.group(1)
    parts = {}

    # Split on commas not inside quotation marks
    fields = re.split(r',(?=(?:[^"]*"[^"]*")*[^"]*$)', body)

    for field in fields:
        if "=" in field:
            k, v = field.split("=", 1)
            parts[k] = v.strip('"')

    return parts if "ID" in parts else None


def infer_pipe_subfields_from_description(description):
    """
    Tries to infer ANN/CSQ-style pipe subfields from header description.

    Handles common forms like:
    'Functional annotations: Allele | Annot | Annot_Impact | Gene_Name ...'
    'Format: Allele|Consequence|IMPACT|SYMBOL|Gene|Feature_type...'
    """
    if not description:
        return None

    desc = description.replace("\\\"", "\"")

    # Look after "Format:" when present
    lower = desc.lower()
    if "format:" in lower:
        start = lower.index("format:") + len("format:")
        candidate = desc[start:]
    else:
        candidate = desc

    if "|" not in candidate:
        return None

    # Remove quotes and trailing punctuation
    candidate = candidate.strip(" .'\"")

    fields = [x.strip(" .'\"") for x in candidate.split("|")]
    fields = [x for x in fields if x]

    # Avoid false positives
    if len(fields) < 3:
        return None

    # Normalize column names
    fields = [
        re.sub(r"[^A-Za-z0-9_]+", "_", x).strip("_") or f"field_{i+1}"
        for i, x in enumerate(fields)
    ]

    return fields


def parse_info(info_string):
    """
    INFO:
    AA=p.N998=;AC=2;DB;DP=1107;BIAS=2:2
    """
    out = {}

    if not info_string or info_string == ".":
        return out

    for item in info_string.split(";"):
        if not item:
            continue

        if "=" in item:
            key, value = item.split("=", 1)
            out[key] = None if value == "." else value
        else:
            # Flag field, e.g. DB
            out[item] = True

    return out


def parse_sample(format_string, sample_string):
    """
    FORMAT:
    GT:VP:VD:KD:AF:BD:ALD

    SAMPLE:
    1/1:1794:1883:2,1693:0.9928:1,1:927,961
    """
    if not format_string or format_string == ".":
        return {}

    keys = format_string.split(":")
    vals = sample_string.split(":")

    out = {}

    for i, key in enumerate(keys):
        out[key] = vals[i] if i < len(vals) and vals[i] != "." else None

    # Preserve extra sample values when there are more values than FORMAT keys
    if len(vals) > len(keys):
        for j, val in enumerate(vals[len(keys):], start=1):
            out[f"EXTRA_{j}"] = None if val == "." else val

    return out


def parse_pipe_records(value, variant_uid, field_name, subfields=None):
    """
    Parses ANN/CSQ/EFF-like fields.

    Multiple records are usually comma-separated:
    A|synonymous_variant|LOW|MTOR|...
    A|upstream_gene_variant|MODIFIER|RPL39P6|...
    """
    rows = []

    if not value or value == ".":
        return rows

    records = value.split(",")

    for record_index, record in enumerate(records, start=1):
        parts = record.split("|")

        row = {
            "variant_uid": variant_uid,
            "pipe_field": field_name,
            "record_index": record_index,
            "raw_record": record,
        }

        if subfields:
            for i, name in enumerate(subfields):
                row[name] = parts[i] if i < len(parts) and parts[i] != "" else None

            if len(parts) > len(subfields):
                for j, val in enumerate(parts[len(subfields):], start=1):
                    row[f"EXTRA_{j}"] = val if val != "" else None
        else:
            for i, val in enumerate(parts, start=1):
                row[f"{field_name}_{i}"] = val if val != "" else None

        rows.append(row)

    return rows


def maybe_numberize(df):
    # Convert a column to numeric only if the whole column parses cleanly.
    # (pd.to_numeric(errors="ignore") is deprecated in recent pandas.)
    for col in df.columns:
        try:
            df[col] = pd.to_numeric(df[col])
        except (ValueError, TypeError):
            pass
    return df


def parse_vcf_to_excel(vcf_path, xlsx_path, pipe_fields=None, wide_samples=False):
    pipe_fields = set(pipe_fields or DEFAULT_PIPE_FIELDS)

    info_headers = {}
    pipe_subfields = {}

    variants = []
    samples = []
    pipe_rows = []

    sample_names = []

    with open_text(vcf_path) as f:
        for line_number, line in enumerate(f, start=1):
            line = line.rstrip("\n")

            if not line:
                continue

            if line.startswith("##INFO="):
                parsed = parse_info_header(line)
                if parsed:
                    info_id = parsed["ID"]
                    info_headers[info_id] = parsed

                    inferred = infer_pipe_subfields_from_description(
                        parsed.get("Description", "")
                    )

                    if inferred:
                        pipe_subfields[info_id] = inferred
                        pipe_fields.add(info_id)

                continue

            if line.startswith("##"):
                continue

            if line.startswith("#CHROM"):
                header = line.lstrip("#").split("\t")
                sample_names = header[9:]
                continue

            parts = line.split("\t")

            if len(parts) < 8:
                print(f"Skipping malformed line {line_number}: fewer than 8 columns")
                continue

            chrom, pos, vid, ref, alt, qual, filt, info_string = parts[:8]

            variant_uid = f"{chrom}:{pos}:{ref}>{alt}:{line_number}"

            variant_row = {
                "variant_uid": variant_uid,
                "CHROM": chrom,
                "POS": pos,
                "ID": None if vid == "." else vid,
                "REF": ref,
                "ALT": alt,
                "QUAL": None if qual == "." else qual,
                "FILTER": None if filt == "." else filt,
            }

            info = parse_info(info_string)

            for key, value in info.items():
                if key in pipe_fields or "|" in str(value):
                    pipe_rows.extend(
                        parse_pipe_records(
                            value=value,
                            variant_uid=variant_uid,
                            field_name=key,
                            subfields=pipe_subfields.get(key),
                        )
                    )
                else:
                    variant_row[f"INFO_{key}"] = value

            variants.append(variant_row)

            # FORMAT + sample fields
            if len(parts) > 8:
                format_string = parts[8]
                sample_values = parts[9:]

                if wide_samples:
                    # One variant row with sample-prefixed columns
                    # Good only when sample count is small.
                    for sample_name, sample_string in zip(sample_names, sample_values):
                        parsed_sample = parse_sample(format_string, sample_string)
                        safe_sample = re.sub(r"[^A-Za-z0-9_]+", "_", sample_name)

                        for k, v in parsed_sample.items():
                            variant_row[f"SAMPLE_{safe_sample}_{k}"] = v
                else:
                    # Separate normalized sample table.
                    # Better for multiple samples.
                    for sample_name, sample_string in zip(sample_names, sample_values):
                        sample_row = {
                            "variant_uid": variant_uid,
                            "SAMPLE": sample_name,
                        }

                        sample_row.update(parse_sample(format_string, sample_string))
                        samples.append(sample_row)

    variants_df = pd.DataFrame(variants)
    samples_df = pd.DataFrame(samples)
    pipe_df = pd.DataFrame(pipe_rows)

    for df in [variants_df, samples_df, pipe_df]:
        if not df.empty:
            df.replace(".", pd.NA, inplace=True)
            maybe_numberize(df)

    with pd.ExcelWriter(xlsx_path, engine="openpyxl") as writer:
        variants_df.to_excel(writer, index=False, sheet_name="variants")

        if not samples_df.empty:
            samples_df.to_excel(writer, index=False, sheet_name="samples")

        if not pipe_df.empty:
            pipe_df.to_excel(writer, index=False, sheet_name="pipe_annotations")

        # Optional metadata sheet
        metadata_rows = []
        for info_id, meta in info_headers.items():
            metadata_rows.append({
                "INFO_ID": info_id,
                "Number": meta.get("Number"),
                "Type": meta.get("Type"),
                "Description": meta.get("Description"),
                "parsed_as_pipe_field": info_id in pipe_fields,
                "pipe_subfields": "|".join(pipe_subfields.get(info_id, [])),
            })

        if metadata_rows:
            pd.DataFrame(metadata_rows).to_excel(
                writer,
                index=False,
                sheet_name="info_metadata"
            )

    print(f"Wrote: {xlsx_path}")
    print(f"Variants: {len(variants_df):,}")
    print(f"Samples: {len(samples_df):,}")
    print(f"Pipe annotation rows: {len(pipe_df):,}")


def main():
    parser = argparse.ArgumentParser(
        description="Generic VCF parser to Excel with INFO, FORMAT/sample, and ANN/CSQ pipe-field parsing."
    )

    parser.add_argument("vcf", help="Input .vcf or .vcf.gz")
    parser.add_argument("xlsx", help="Output .xlsx")
    parser.add_argument(
        "--pipe-fields",
        default="ANN,CSQ,EFF",
        help="Comma-separated INFO fields to parse as pipe-delimited annotations. Default: ANN,CSQ,EFF",
    )
    parser.add_argument(
        "--wide-samples",
        action="store_true",
        help="Put sample FORMAT values into variants sheet instead of separate sheet."
    )

    args = parser.parse_args()

    pipe_fields = {
        x.strip()
        for x in args.pipe_fields.split(",")
        if x.strip()
    }

    parse_vcf_to_excel(
        vcf_path=args.vcf,
        xlsx_path=args.xlsx,
        pipe_fields=pipe_fields,
        wide_samples=args.wide_samples,
    )


if __name__ == "__main__":
    main()
```

Install:

```bash
pip install pandas openpyxl
```

Run:

```bash
python vcf_to_excel_generic.py input.vcf parsed_vcf.xlsx
```

For compressed VCF:

```bash
python vcf_to_excel_generic.py input.vcf.gz parsed_vcf.xlsx
```

If you want sample fields directly in the main variant table:

```bash
python vcf_to_excel_generic.py input.vcf parsed_vcf.xlsx --wide-samples
```

For other pipe-delimited INFO fields:

```bash
python vcf_to_excel_generic.py input.vcf parsed_vcf.xlsx --pipe-fields ANN,CSQ,EFF,MY_PIPE_FIELD
```

### What this script handles

| Problem                             | Handling                               |
| ----------------------------------- | -------------------------------------- |
| Different `INFO` fields across rows | dynamically creates columns            |
| Blank / missing values              | leaves blank cells                     |
| `DB`-style flag fields              | stores `True`                          |
| `BIAS=2:2`                          | preserves as value                     |
| `GT:DP:AD` sample fields            | expands correctly using `FORMAT`       |
| `ANN=A\|...\|...`                   | explodes into separate annotation rows |
| Unknown future `INFO` keys          | automatically included                 |

The important distinction is: **generic does not mean split every delimiter everywhere**. It means the parser discovers fields dynamically, but only applies `;`, `:`, and `|` in the VCF locations where they actually carry structure.
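
As a quick sanity check of that distinction, calling the script's `parse_info` on its own docstring example should leave `BIAS=2:2` intact, since INFO values are split only on `;` and the first `=` (expected output shown as a comment):

```python
print(parse_info("AA=p.N998=;AC=2;DB;DP=1107;BIAS=2:2"))
# {'AA': 'p.N998=', 'AC': '2', 'DB': True, 'DP': '1107', 'BIAS': '2:2'}
```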
u/benddit — 14 days ago

Gemma 4 Not Washing Down with Lemonade

My goal is to talk to local models to manage my dad's healthcare LLM wiki, and people I trust said to use Lemonade Server. However, I've been having a hell of a time getting Gemma 4 working reliably on Lemonade, and I'm looking for advice.

Either help getting the darn thing working or an easy-to-use alternative would be appreciated.

Here's what's happened so far:

At one point, everything worked. I downloaded Lemonade, loaded Gemma 4 E2B, and my friends walked me through updating to a compatible llama.cpp from GitHub using Terminal commands:

lemonade backends install llamacpp:metal --force

lemonade config set llamacpp.metal_bin="/Users/Myname/Downloads/llama-b8779/llama-server"

The server worked exactly one time: I could chat with Gemma 4 in Lemonade, I could query the server from my coding tool, and it was all performing OK. However, when I restarted my computer, everything stopped working:

Error preparing model: Failed to load model ‘Gemma-4-E2B-it-GGUF’: llama-server failed to start

Error preparing model: Failed to load model ‘Qwen3.5-2B-GGUF’: llama-server failed to start

I think I tried everything to get it working again, unsuccessfully:

  • Uninstalling and reinstalling Lemonade
  • Updating to a newer llama.cpp
  • Contacting the Lemonade team on Discord with my logs (they were responsive, but we couldn't resolve it)

Has anyone gotten Gemma 4 working on Lemonade? I'm taking one last shot at a fix before looking for an easy-to-use alternative.

u/benddit — 20 days ago