
dog: A CLI tool for quickly inspecting parquet files.
I use parquet files at work on a nearly daily basis. What's annoying is that you need some kind of software to read and view these files. Most people's workflow is to open a REPL like the R console or IPython, import some packages, and read in the parquet file.
This is incredibly annoying if all you want to know is something simple like what columns exist. More often than not, these little things are all you need to know right then and opening a REPL is overkill.
That's what dog aims to solve. It has a simple suite of commands to quickly interrogate a parquet file, including getting the column names, schema, statistics, and a summary view of the data. A lot of this is powered by the polars crate, whose developers do an amazing job.
I'd love any constructive feedback. I'm not a professional developer, so I'm sure there are some obvious things I've overlooked or misunderstood. And if you use parquet files daily, you might find this tool useful. https://github.com/TrystanScottLambert/dog