
r/multidotdev

Multi vs Gemini CLI head-to-head: Impressions from building a researcher agent
I've spent the past few hours building researcher agents that scrape the internet for experience reports, anecdotes, and n-of-1 studies to help with a complex medical case.
In the past, I've tried purpose-built researcher agents on the web, but found myself stymied. Even deep-research agents from the frontier model labs were overly constrained: incomplete, unconvincing results with too many annoying caveats (how many times can you say "this is not medical advice"?). At times they didn't even suss out results from sources I explicitly specified, e.g. Reddit.
I decided to build my own researcher agents, with a backup plan in mind: even if I have to find the source data myself, I can use LLMs to organize and evaluate it.
My setup:
- Model: Gemini 3
- Harness 1: Gemini CLI
- vs Harness 2: Multi 0.0.97
These are my impressions in near real time (to be posted in the comments section below).
Multi v0.0.97 is live: parallel execution 🚀
Folks, we just shipped parallel execution in v0.0.97.
Multi can now coordinate multiple calls and agents at once when the work is independent. Less waiting. More done.
In our testing, we found this especially powerful when the task touches several parts of a codebase.
If the work can run in parallel, Multi will do that by default. And if you want to push it harder, just ask Multi to parallelize the task.
Onward 🚀
We shipped Multi Agent support
A single agent works well for small tasks that fit inside the context window.
Large tasks are different. They overflow the context window or create enough context pressure that the agent starts losing the thread.
Before this, you had to manage that yourself: create the plan, run one step, fork or undo, trim context, repeat. It works, but it’s tedious.
That’s why we built Multi Agents.
Multi can now break large tasks into smaller steps and manage context more efficiently.
A supervisor agent keeps track of the overall direction, while subagents work on individual steps.
You still see what each agent is doing. You can inspect the steps. You stay in control.
That part matters a lot to us. More agents without visibility is just more chaos.
Now you can let Multi manage larger tasks with better planning, visible execution, and full control over each step.
Give it a try if you’re working on a larger task and let us know how it feels.
Happy shipping.
How do I get my local LLM to finish the job?
I'm trying to parse a VCF analysis file into an Excel or TSV file by running Multi using a Qwen3.5 2B local LLM.
GPT provided me with detailed Python and command-line instructions, but when I run them locally via Multi, Qwen doesn't seem as resolute as Claude about finishing the job.
Even when I type continue or retry, it runs for several turns, reports "Finished", but doesn't actually finish.
See the screenshot.
Any advice? I prefer not to switch to a cloud model.
Here is the prompt I am running:
convert the .VCF file into a new Excel or TSV file using the instructions below:
Yes. Make it **generic/schema-discovering**, but not a blind global split.
The rule should be:
| Field location | Parsing |
| ---------------------------------- | ------------------------------------------------------------- |
| Whole VCF row | split by **tab** |
| `INFO` column | split by **semicolon** into `key=value` |
| `FORMAT` + sample columns | split `FORMAT` by **colon**, then map sample values by colon |
| Pipe fields like `ANN` / `CSQ` | split into a separate table by **comma**, then **pipe** |
| Other values containing `:` or `,` | preserve as values unless the field is known to be structured |
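As a minimal illustration of these rules on a single toy VCF line (the values here are invented for the example, not taken from a real file):

```python
# Delimiter hierarchy on one toy VCF row: tab -> semicolon -> colon/pipe.
row = "chr1\t12345\trs1\tA\tG\t50\tPASS\tAC=2;DB;BIAS=2:2;ANN=G|missense|HIGH\tGT:DP\t0/1:30"

fields = row.split("\t")                      # whole row: split by tab
info = dict(
    item.split("=", 1) if "=" in item else (item, True)
    for item in fields[7].split(";")          # INFO column: split by semicolon
)
fmt_keys = fields[8].split(":")               # FORMAT: split by colon
sample = dict(zip(fmt_keys, fields[9].split(":")))
ann_records = [r.split("|") for r in info["ANN"].split(",")]  # pipe field

print(info["BIAS"])    # '2:2' stays intact -- colon only structures FORMAT
print(sample)          # {'GT': '0/1', 'DP': '30'}
print(ann_records[0])  # ['G', 'missense', 'HIGH']
```

Note how `BIAS=2:2` survives untouched: the colon is only a structural delimiter in the `FORMAT`/sample columns, which is exactly the distinction the table encodes.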
Here is a more generic script.
```python
import gzip
import argparse
import re
from pathlib import Path
import pandas as pd
BASE_COLUMNS = [
"CHROM", "POS", "ID", "REF", "ALT", "QUAL", "FILTER"
]
DEFAULT_PIPE_FIELDS = {"ANN", "CSQ", "EFF"}
def open_text(path):
path = str(path)
if path.endswith(".gz"):
return gzip.open(path, "rt")
return open(path, "r")
def parse_info_header(line):
"""
Example:
##INFO=<ID=ANN,Number=.,Type=String,Description="...">
"""
m = re.match(r"##INFO=<(.+)>", line)
if not m:
return None
body = m.group(1)
parts = {}
# Split on commas not inside quotation marks
fields = re.split(r',(?=(?:[^"]*"[^"]*")*[^"]*$)', body)
for field in fields:
if "=" in field:
k, v = field.split("=", 1)
parts[k] = v.strip('"')
return parts if "ID" in parts else None
def infer_pipe_subfields_from_description(description):
"""
Tries to infer ANN/CSQ-style pipe subfields from header description.
Handles common forms like:
'Functional annotations: Allele | Annot | Annot_Impact | Gene_Name ...'
'Format: Allele|Consequence|IMPACT|SYMBOL|Gene|Feature_type...'
"""
if not description:
return None
desc = description.replace("\\\"", "\"")
# Look after "Format:" when present
lower = desc.lower()
if "format:" in lower:
start = lower.index("format:") + len("format:")
candidate = desc[start:]
else:
candidate = desc
if "|" not in candidate:
return None
# Remove quotes and trailing punctuation
candidate = candidate.strip(" .'\"")
fields = [x.strip(" .'\"") for x in candidate.split("|")]
fields = [x for x in fields if x]
# Avoid false positives
if len(fields) < 3:
return None
# Normalize column names
fields = [
re.sub(r"[^A-Za-z0-9_]+", "_", x).strip("_") or f"field_{i+1}"
for i, x in enumerate(fields)
]
return fields
def parse_info(info_string):
"""
INFO:
AA=p.N998=;AC=2;DB;DP=1107;BIAS=2:2
"""
out = {}
if not info_string or info_string == ".":
return out
for item in info_string.split(";"):
if not item:
continue
if "=" in item:
key, value = item.split("=", 1)
out[key] = None if value == "." else value
else:
# Flag field, e.g. DB
out[item] = True
return out
def parse_sample(format_string, sample_string):
"""
FORMAT:
GT:VP:VD:KD:AF:BD:ALD
SAMPLE:
1/1:1794:1883:2,1693:0.9928:1,1:927,961
"""
if not format_string or format_string == ".":
return {}
keys = format_string.split(":")
vals = sample_string.split(":")
out = {}
for i, key in enumerate(keys):
out[key] = vals[i] if i < len(vals) and vals[i] != "." else None
# Preserve extra sample fields, if malformed or too long
if len(vals) > len(keys):
for j, val in enumerate(vals[len(keys):], start=1):
out[f"EXTRA_{j}"] = None if val == "." else val
return out
def parse_pipe_records(value, variant_uid, field_name, subfields=None):
"""
Parses ANN/CSQ/EFF-like fields.
Multiple records are usually comma-separated:
A|synonymous_variant|LOW|MTOR|...
A|upstream_gene_variant|MODIFIER|RPL39P6|...
"""
rows = []
if not value or value == ".":
return rows
records = value.split(",")
for record_index, record in enumerate(records, start=1):
parts = record.split("|")
row = {
"variant_uid": variant_uid,
"pipe_field": field_name,
"record_index": record_index,
"raw_record": record,
}
if subfields:
for i, name in enumerate(subfields):
row[name] = parts[i] if i < len(parts) and parts[i] != "" else None
if len(parts) > len(subfields):
for j, val in enumerate(parts[len(subfields):], start=1):
row[f"EXTRA_{j}"] = val if val != "" else None
else:
for i, val in enumerate(parts, start=1):
row[f"{field_name}_{i}"] = val if val != "" else None
rows.append(row)
return rows
def maybe_numberize(df):
    # Convert columns to numeric where possible. pd.to_numeric's
    # errors="ignore" is deprecated (pandas 2.2+), so try per column
    # and keep the original values on failure.
    for col in df.columns:
        try:
            df[col] = pd.to_numeric(df[col])
        except (ValueError, TypeError):
            pass
    return df
def parse_vcf_to_excel(vcf_path, xlsx_path, pipe_fields=None, wide_samples=False):
pipe_fields = set(pipe_fields or DEFAULT_PIPE_FIELDS)
info_headers = {}
pipe_subfields = {}
variants = []
samples = []
pipe_rows = []
sample_names = []
with open_text(vcf_path) as f:
for line_number, line in enumerate(f, start=1):
line = line.rstrip("\n")
if not line:
continue
if line.startswith("##INFO="):
parsed = parse_info_header(line)
if parsed:
info_id = parsed["ID"]
info_headers[info_id] = parsed
inferred = infer_pipe_subfields_from_description(
parsed.get("Description", "")
)
if inferred:
pipe_subfields[info_id] = inferred
pipe_fields.add(info_id)
continue
if line.startswith("##"):
continue
if line.startswith("#CHROM"):
header = line.lstrip("#").split("\t")
sample_names = header[9:]
continue
parts = line.split("\t")
if len(parts) < 8:
print(f"Skipping malformed line {line_number}: fewer than 8 columns")
continue
chrom, pos, vid, ref, alt, qual, filt, info_string = parts[:8]
variant_uid = f"{chrom}:{pos}:{ref}>{alt}:{line_number}"
variant_row = {
"variant_uid": variant_uid,
"CHROM": chrom,
"POS": pos,
"ID": None if vid == "." else vid,
"REF": ref,
"ALT": alt,
"QUAL": None if qual == "." else qual,
"FILTER": None if filt == "." else filt,
}
info = parse_info(info_string)
for key, value in info.items():
if key in pipe_fields or "|" in str(value):
pipe_rows.extend(
parse_pipe_records(
value=value,
variant_uid=variant_uid,
field_name=key,
subfields=pipe_subfields.get(key),
)
)
else:
variant_row[f"INFO_{key}"] = value
variants.append(variant_row)
# FORMAT + sample fields
if len(parts) > 8:
format_string = parts[8]
sample_values = parts[9:]
if wide_samples:
# One variant row with sample-prefixed columns
# Good only when sample count is small.
for sample_name, sample_string in zip(sample_names, sample_values):
parsed_sample = parse_sample(format_string, sample_string)
safe_sample = re.sub(r"[^A-Za-z0-9_]+", "_", sample_name)
for k, v in parsed_sample.items():
variant_row[f"SAMPLE_{safe_sample}_{k}"] = v
else:
# Separate normalized sample table.
# Better for multiple samples.
for sample_name, sample_string in zip(sample_names, sample_values):
sample_row = {
"variant_uid": variant_uid,
"SAMPLE": sample_name,
}
sample_row.update(parse_sample(format_string, sample_string))
samples.append(sample_row)
variants_df = pd.DataFrame(variants)
samples_df = pd.DataFrame(samples)
pipe_df = pd.DataFrame(pipe_rows)
for df in [variants_df, samples_df, pipe_df]:
if not df.empty:
df.replace(".", pd.NA, inplace=True)
maybe_numberize(df)
with pd.ExcelWriter(xlsx_path, engine="openpyxl") as writer:
variants_df.to_excel(writer, index=False, sheet_name="variants")
if not samples_df.empty:
samples_df.to_excel(writer, index=False, sheet_name="samples")
if not pipe_df.empty:
pipe_df.to_excel(writer, index=False, sheet_name="pipe_annotations")
# Optional metadata sheet
metadata_rows = []
for info_id, meta in info_headers.items():
metadata_rows.append({
"INFO_ID": info_id,
"Number": meta.get("Number"),
"Type": meta.get("Type"),
"Description": meta.get("Description"),
"parsed_as_pipe_field": info_id in pipe_fields,
"pipe_subfields": "|".join(pipe_subfields.get(info_id, [])),
})
if metadata_rows:
pd.DataFrame(metadata_rows).to_excel(
writer,
index=False,
sheet_name="info_metadata"
)
print(f"Wrote: {xlsx_path}")
print(f"Variants: {len(variants_df):,}")
print(f"Samples: {len(samples_df):,}")
print(f"Pipe annotation rows: {len(pipe_df):,}")
def main():
parser = argparse.ArgumentParser(
description="Generic VCF parser to Excel with INFO, FORMAT/sample, and ANN/CSQ pipe-field parsing."
)
parser.add_argument("vcf", help="Input .vcf or .vcf.gz")
parser.add_argument("xlsx", help="Output .xlsx")
parser.add_argument(
"--pipe-fields",
default="ANN,CSQ,EFF",
help="Comma-separated INFO fields to parse as pipe-delimited annotations. Default: ANN,CSQ,EFF",
)
parser.add_argument(
"--wide-samples",
action="store_true",
help="Put sample FORMAT values into variants sheet instead of separate sheet."
)
args = parser.parse_args()
pipe_fields = {
x.strip()
for x in args.pipe_fields.split(",")
if x.strip()
}
parse_vcf_to_excel(
vcf_path=args.vcf,
xlsx_path=args.xlsx,
pipe_fields=pipe_fields,
wide_samples=args.wide_samples,
)
if __name__ == "__main__":
main()
```
Install:
```bash
pip install pandas openpyxl
```
Run:
```bash
python vcf_to_excel_generic.py input.vcf parsed_vcf.xlsx
```
For compressed VCF:
```bash
python vcf_to_excel_generic.py input.vcf.gz parsed_vcf.xlsx
```
If you want sample fields directly in the main variant table:
```bash
python vcf_to_excel_generic.py input.vcf parsed_vcf.xlsx --wide-samples
```
For other pipe-delimited INFO fields:
```bash
python vcf_to_excel_generic.py input.vcf parsed_vcf.xlsx --pipe-fields ANN,CSQ,EFF,MY_PIPE_FIELD
```
### What this script handles
| Problem | Handling |
| ----------------------------------- | -------------------------------------- |
| Different `INFO` fields across rows | dynamically creates columns |
| Blank / missing values | leaves blank cells |
| `DB`-style flag fields | stores `True` |
| `BIAS=2:2` | preserves as value |
| `GT:DP:AD` sample fields | expands correctly using `FORMAT` |
| `ANN=A\|...\|...` | explodes into separate annotation rows |
| Unknown future `INFO` keys | automatically included |
The important distinction is: **generic does not mean split every delimiter everywhere**. It means the parser discovers fields dynamically, but only applies `;`, `:`, and `|` in the VCF locations where they actually carry structure.
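To make the pipe-field rule concrete, here is the comma-then-pipe explosion in isolation (toy `ANN` value and hypothetical subfield names; the script's `parse_pipe_records` layers bookkeeping columns like `variant_uid` and `record_index` on top of this):

```python
# Explode an ANN-style INFO value: comma separates records, pipe separates
# subfields within a record.
ann = "A|synonymous_variant|LOW|MTOR,A|upstream_gene_variant|MODIFIER|RPL39P6"
subfields = ["Allele", "Annotation", "Impact", "Gene_Name"]

rows = [
    dict(zip(subfields, record.split("|")))
    for record in ann.split(",")
]
for r in rows:
    print(r)
# {'Allele': 'A', 'Annotation': 'synonymous_variant', 'Impact': 'LOW', 'Gene_Name': 'MTOR'}
# {'Allele': 'A', 'Annotation': 'upstream_gene_variant', 'Impact': 'MODIFIER', 'Gene_Name': 'RPL39P6'}
```

Each annotation record becomes its own row, which is why the script writes them to a separate `pipe_annotations` sheet instead of cramming them into the variants table.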