Challenge: How to extract a 50k x 250 DataFrame from an air-gapped server using only screen output
Hi everyone. I'm a medical researcher working on an authorized project inside an air-gapped server (no internet, no USB, no file export allowed).
The constraints:
I can paste Python code into the server via terminal.
I cannot copy/paste text out of the server.
I can install new Python libraries on the server.
My only way to get data out is by photographing the monitor with my phone or taking screenshots.
The data:
A pandas DataFrame with 50,000 rows and 250 columns. Most of the columns (about 230) are sparse binary flags (0/1 for medications/diagnoses); the rest are ages and IDs.
What I've tried:
Run-length encoding (RLE) / sparse-matrix coordinates printed as text: this generates far too much text, and OCR errors make reliable reconstruction impossible.
Generating QR codes / Data Matrix barcodes via Matplotlib: even after gzip and base64, the data is still tens of megabytes. My script estimates it would produce over 30,000 QR code images, which is impossible to photograph manually.
I need to run a script locally on my machine for specific machine-learning tuning. Has anyone solved a similar "optical covert channel" extraction at this scale? Are there aggressive compression tricks for sparse binary matrices before turning them into QR codes, or a completely different out-of-the-box idea?
Thanks!