u/Annual_Upstairs_3852

I built a tool to make SAM.gov usable without APIs or SaaS

I kept running into the same problem:
SAM.gov has great data, but the bulk CSV is painful to work with.

So I built a small CLI tool that:

  • loads everything into SQLite
  • lets you search/filter instantly
  • ranks opportunities against your business

Everything runs locally.

Optional: plug in Ollama for summaries.

Would love feedback — especially on usability.

Repo: https://github.com/frys3333/Arrow-contract-intelligence-organization

reddit.com
u/Annual_Upstairs_3852 — 4 days ago

I built a real-world data tool (CSV → SQLite + ranking) — looking for feedback on my approach

I’ve been learning backend/data-focused programming and wanted to build something practical instead of just tutorials, so I picked a messy real-world dataset: the SAM.gov Contract Opportunities bulk CSV.

The problem:
The dataset is huge and not very usable directly (especially in Excel), so I tried to turn it into something queryable.

What I built:

  • ingest large CSV → store in SQLite
  • basic indexing + search (title / notice ID)
  • simple ranking system based on a “company profile”
  • CLI interface for browsing + shortlisting
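For context, the ingest step is conceptually simple. Here's a stripped-down sketch (not the repo's actual code — the column names `NoticeId`, `Title`, `NaicsCode`, `PostedDate` are my guesses at the bulk-CSV headers, so check the real file):

```python
import csv
import sqlite3

def ingest(csv_path: str, db_path: str = "opportunities.db") -> int:
    """Load a bulk-CSV export into SQLite in a single transaction."""
    con = sqlite3.connect(db_path)
    con.execute(
        """CREATE TABLE IF NOT EXISTS opportunities (
               notice_id TEXT PRIMARY KEY,
               title     TEXT,
               naics     TEXT,
               posted    TEXT
           )"""
    )
    with open(csv_path, newline="", encoding="utf-8") as f:
        rows = (
            (r["NoticeId"], r["Title"], r["NaicsCode"], r["PostedDate"])
            for r in csv.DictReader(f)
        )
        # One big transaction via executemany: per-row commits are
        # orders of magnitude slower on multi-hundred-MB files.
        con.executemany(
            "INSERT OR REPLACE INTO opportunities VALUES (?, ?, ?, ?)", rows
        )
    con.commit()
    count = con.execute("SELECT COUNT(*) FROM opportunities").fetchone()[0]
    con.close()
    return count
```

The single transaction is the part that matters for performance; everything else is standard library plumbing.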

I also experimented with adding an optional local LLM (via Ollama) for summaries, but most of the system is just standard data handling + logic.

Repo: https://github.com/frys3333/Arrow-contract-intelligence-organization

What I’m trying to learn / improve:

  • better schema design for this kind of data
  • how to handle updates to large datasets efficiently
  • whether SQLite is the right choice vs something else
  • structuring projects like this in a clean way
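On the incremental-updates point: one common answer in SQLite is its native UPSERT (available since 3.24) — key rows on the notice ID and let a conflict update in place, so re-ingesting a fresh bulk file doesn't mean wiping the table. A sketch with made-up column names:

```python
import sqlite3

def upsert_rows(con: sqlite3.Connection, rows) -> None:
    """Merge a fresh bulk export into the existing table: new notice IDs
    are inserted, existing ones are updated in place (no table wipe)."""
    con.executemany(
        """INSERT INTO opportunities (notice_id, title, naics, posted)
           VALUES (?, ?, ?, ?)
           ON CONFLICT(notice_id) DO UPDATE SET
               title  = excluded.title,
               naics  = excluded.naics,
               posted = excluded.posted""",
        rows,
    )
    con.commit()
```

Not what the repo does today, just one direction for the "updates to large datasets" question.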

If anyone has feedback on:

  • code structure
  • data pipeline design
  • or things I’m doing “wrong”

I’d really appreciate it — trying to level up from small scripts to more real-world systems.

u/Annual_Upstairs_3852 — 4 days ago

Local-first pipeline for SAM.gov bulk data (CSV → SQLite + ranking)

Flow:

  • bulk CSV ingest
  • normalization into SQLite
  • deterministic ranking layer
  • optional local LLM summarization

No cloud infra, no APIs.

The main challenge was making a large flat CSV usable for real querying.

Repo: https://github.com/frys3333/Arrow-contract-intelligence-organization

I'm relatively new to programming, so I'd love feedback on:

  • schema design
  • indexing strategy
  • incremental updates
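To make the indexing question concrete, here's the kind of setup I mean — plain B-tree indexes for exact/range filters, plus SQLite's built-in FTS5 extension for title search. Table and column names are illustrative, not the real schema:

```python
import sqlite3

con = sqlite3.connect(":memory:")  # swap in the real DB file
con.execute(
    "CREATE TABLE IF NOT EXISTS opportunities "
    "(notice_id TEXT PRIMARY KEY, title TEXT, naics TEXT)"
)
con.execute(
    "INSERT INTO opportunities VALUES "
    "('A1', 'Cybersecurity support services', '541512')"
)

# Plain B-tree index: cheap, covers exact/range filters like NAICS code
con.execute("CREATE INDEX IF NOT EXISTS idx_naics ON opportunities(naics)")

# FTS5 virtual table mirroring the title column ("external content" mode,
# so the text isn't stored twice); FTS5 ships with stock Python builds
con.execute(
    "CREATE VIRTUAL TABLE IF NOT EXISTS opp_fts "
    "USING fts5(title, content='opportunities')"
)
con.execute("INSERT INTO opp_fts(opp_fts) VALUES('rebuild')")

hits = con.execute(
    "SELECT o.notice_id FROM opportunities o "
    "JOIN opp_fts ON opp_fts.rowid = o.rowid "
    "WHERE opp_fts MATCH ?",
    ("cybersecurity",),
).fetchall()
```

Mostly wondering whether full-text search like this is overkill versus `LIKE` queries at this data size.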

u/Annual_Upstairs_3852 — 4 days ago
▲ 5 r/govcon

Built a fully free tool to actually sort through SAM.gov opportunities (local, no API, ranks by fit)

I’ve been working on a free tool called Arrow to make SAM.gov a bit more usable.

The main issue I kept running into was how hard it is to actually triage opportunities. You can search, but figuring out what’s worth pursuing still ends up being very manual.

So I built something that:

  • pulls the full public SAM.gov opportunities dataset (no API needed)
  • stores everything locally so you can work with it fast
  • lets you search and rank contracts against your company profile (NAICS, mission, etc.)
  • highlights which opportunities are actually a good fit vs. just keyword matches
  • optionally explains why something fits, using a local AI model (runs on your machine, not in the cloud)
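To give a sense of what "rank by fit" means here — the scoring is deterministic, something along these lines (a toy version for illustration; the weights and fields are made up, not the actual implementation):

```python
def fit_score(opportunity: dict, profile: dict) -> float:
    """Toy deterministic fit score: an exact NAICS match dominates,
    keyword overlap with the title breaks ties."""
    score = 0.0
    if opportunity.get("naics") in profile.get("naics_codes", []):
        score += 10.0  # arbitrary weight: NAICS match matters most
    title_words = set(opportunity.get("title", "").lower().split())
    keywords = {k.lower() for k in profile.get("keywords", [])}
    score += len(title_words & keywords)  # +1 per matching profile keyword
    return score

profile = {"naics_codes": ["541511"], "keywords": ["software", "cloud"]}
opps = [
    {"notice_id": "A1", "naics": "541511", "title": "Custom software development"},
    {"notice_id": "B2", "naics": "236220", "title": "Roof repair"},
]
ranked = sorted(opps, key=lambda o: fit_score(o, profile), reverse=True)
```

The point is that even a crude weighted score beats eyeballing raw keyword search results.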

The goal isn’t to replace SAM.gov — just to make it easier to:

  • filter out noise
  • prioritize real opportunities
  • quickly scan large volumes of contracts

One thing I've noticed already:
most "search" tools surface a lot of irrelevant results, but when you rank based on fit + context, the top results become much more useful.

Still early, but I’ve been using it to scan thousands of opportunities much faster than manually browsing.

Curious how others here currently handle:

  • filtering opportunities
  • deciding what to pursue vs. ignore
  • dealing with SAM.gov data at scale

check it out!

https://github.com/frys3333/Arrow-contract-intelligence-organization

u/Annual_Upstairs_3852 — 5 days ago