
You can do basic web search with just two simple CLI tools
Hi! I was looking at the web search options available in the pi ecosystem, and most of them wrap some API or require config.
I just want my tool to be able to:
- Run a search query via a search provider
- Fetch pages preferably as markdown
For this I found two boring tools that work well together:
- The DuckDuckGo command-line tool, `ddgr`. This is just one `sudo apt install ddgr` away.
- The super weirdly named `trafilatura` tool. This is a Python tool that extracts text content from a URL. It has lots of options for presentation and for what to include or exclude. `pip install trafilatura`... I suppose? I use NixOS, so I dunno how to install this globally with Python. Python is hell.
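To make the search half concrete, here is a minimal sketch of calling `ddgr` from a script. It is not taken from the gist. It assumes `ddgr` is on the PATH, that `--json` and `--num` behave as described in `ddgr --help`, and that each JSON result carries `title`, `url` and `abstract` keys. The function names are made up for illustration.

```python
import json
import subprocess


def build_ddgr_command(query: str, max_results: int = 5) -> list[str]:
    """Build the argv for a non-interactive ddgr search (hypothetical helper)."""
    # --json prints results as JSON instead of opening the interactive prompt;
    # --num caps how many results are requested.
    return ["ddgr", "--json", "--num", str(max_results), query]


def parse_ddgr_json(raw: str) -> list[dict]:
    """Keep only the fields a search tool needs (assumed keys: title, url, abstract)."""
    results = json.loads(raw) if raw.strip() else []
    return [
        {
            "title": r.get("title", ""),
            "url": r.get("url", ""),
            "abstract": r.get("abstract", ""),
        }
        for r in results
    ]


def search(query: str, max_results: int = 5) -> list[dict]:
    """Run ddgr and return parsed results. Needs ddgr installed and network access."""
    proc = subprocess.run(
        build_ddgr_command(query, max_results),
        capture_output=True,
        text=True,
        check=True,
        timeout=30,
    )
    return parse_ddgr_json(proc.stdout)
```

Splitting the command builder and the parser out of `search` keeps the part that touches the network small, so the rest can be checked without `ddgr` installed.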
What is trafilatura?
It's a command-line tool that extracts the meaningful content from a web page. It's been actively maintained for over 9 years (probably longer?), and its primary use case is to help with academic research. I suppose it's useful for researchers who need to do scraping.
Anyway, it is feature-rich, mature, old, and just a CLI tool. It supports markdown output, plain-text output, a mode that shows very little content, and a mode that shows more content. You can choose to include or exclude links, etc.
If you wrap these two tools in a simple extension, you get search that runs 100% on local tools and works for the common use case of "just quickly look something up on a forum, documentation, Wikipedia or GitHub".
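The flow such a wrapper follows can be sketched in a few lines. This is an illustration only and not how the gist is structured: `lookup`, `search` and `fetch` are hypothetical names, and the two callables stand in for whatever actually runs `ddgr` and `trafilatura`.

```python
from typing import Callable


def lookup(
    query: str,
    search: Callable[[str], list[dict]],
    fetch: Callable[[str], str],
    max_pages: int = 2,
) -> str:
    """Search, then fetch the first few hits and join them into one text blob."""
    sections = []
    for result in search(query)[:max_pages]:
        try:
            body = fetch(result["url"])
        except Exception as exc:
            # One dead link should not sink the whole lookup.
            body = f"(could not fetch page: {exc})"
        sections.append(f"## {result['title']}\n{result['url']}\n\n{body.strip()}")
    return "\n\n".join(sections)
```

Passing the search and fetch steps in as functions keeps the sketch independent of either tool, and capping `max_pages` is one simple way to limit how many tokens a single lookup costs.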
I haven't looked into how to publish this as an extension, but if people like it I could package it up.
This is the extension as a gist if anyone wants to try it.
https://gist.github.com/Azeirah/9375fb67c5aee6ca1b7e046f8b7cf0cd
In the extension, trafilatura is configured to:
- Show links
- Output markdown
- Use the concise output rather than the verbose output, to save tokens
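Those three settings could map to a trafilatura command roughly like the one below. This is a guess at the flags and not copied from the gist: it assumes a recent trafilatura where `--output-format markdown`, `--links`, `--precision` and `-u` are available, and it assumes "concise" means `--precision`. Check `trafilatura --help` for your version. The function names are hypothetical.

```python
import subprocess


def build_trafilatura_command(url: str, concise: bool = True) -> list[str]:
    """Build the argv for fetching one page as markdown (hypothetical helper)."""
    cmd = ["trafilatura", "--output-format", "markdown", "--links"]
    if concise:
        # Assumption: --precision is the "concise" mode (less noise, possibly less text).
        cmd.append("--precision")
    cmd += ["-u", url]
    return cmd


def fetch_markdown(url: str) -> str:
    """Download a page and return its main content. Needs trafilatura and network access."""
    proc = subprocess.run(
        build_trafilatura_command(url),
        capture_output=True,
        text=True,
        check=True,
        timeout=60,
    )
    return proc.stdout
```

If a page comes back too thin in the concise mode, calling the builder with `concise=False` drops the `--precision` flag and falls back to trafilatura's default extraction.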