u/sholopinho

🔥 Hot ▲ 71 r/AskNetsec

Challenge: How to extract a 50k x 250 DataFrame from an air-gapped server using only screen output

Hi everyone. I'm a medical researcher working on an authorized project inside an air-gapped server (no internet, no USB, no file export allowed).

The constraints:

I can paste Python code into the server via terminal.

I cannot copy/paste text out of the server.

I can download new python libraries to this server.

My only way to extract data is by taking photos of the monitor with my phone or printscreen.

The data:

A Pandas DataFrame with 50,000 rows and 250 columns. Most of the columns (about 230) are sparse binary data (0/1 for medications/diagnoses). The rest are ages and IDs.

What I've tried:

Run-Length Encoding (RLE) / Sparse Matrix coordinates printed as text: Generates way too much text. OCR errors make it impossible to reconstruct reliably.

Generating QR codes / Data Matrices via Matplotlib: Using gzip and base64, the data is still tens of megabytes. Python says it will generate over 30,000 QR code images, which is impossible to photograph manually.

I need to run a script locally on my machine for specific machine learning tuning. Has anyone ever solved a similar "Optical Covert Channel" extraction for this size of data? Any insanely aggressive compression tricks for sparse binary matrices before turning them into QR codes? Or a completely different out-of-the-box idea?

Thanks!

reddit.com
u/sholopinho — 6 days ago