u/Comfortable-Bar-9983

Unstructured Data in Medallion Architecture

Hi all, greetings for the day!

I work as an Azure data engineer and need some help. My work mainly revolves around batch processing of structured and semi-structured data.

Recently, in an interview, I was asked how I would design a data pipeline for unstructured data (images, PDFs, videos, etc.). I was unable to answer and was rejected. I know that images can be parsed into pixel values as 2D arrays, and that PDFs can be parsed with the pypdf library, but I haven't worked with them in practice. I want to understand how to process them in a medallion architecture setup: how to collect them, how to store them, and so on.

I am looking for guidance and would really appreciate it if someone could show me even one example.

Thanks & Best Regards

Edit: Thanks for the replies, everyone. My problem statement was to prepare unstructured data for the data science team to use downstream (for example, for model training) and to store it in a medallion architecture setup. Archival is included as well.
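A minimal sketch of one common way to lay out such a pipeline, assuming files arrive in a local landing folder (in Azure this would typically be an ADLS Gen2 container instead). Bronze keeps the raw bytes unchanged, which also covers the archival requirement; silver drops duplicate content and tags each file's modality; gold exposes a slim manifest the data science team can read. The folder names, the `BronzeRecord` schema, and the helper functions are hypothetical, and the actual parsing step (for example, extracting PDF text with pypdf or decoding images into arrays) is left out so the sketch runs on the standard library alone.

```python
import hashlib
import mimetypes
import tempfile
from dataclasses import asdict, dataclass
from datetime import datetime, timezone
from pathlib import Path


@dataclass
class BronzeRecord:
    """One raw file landed unchanged, plus ingestion metadata (hypothetical schema)."""
    source_path: str
    sha256: str
    size_bytes: int
    content_type: str
    ingested_at: str


def ingest_to_bronze(landing_dir: Path, bronze_dir: Path) -> list[BronzeRecord]:
    """Bronze: copy each file byte-for-byte into a content-addressed store."""
    bronze_dir.mkdir(parents=True, exist_ok=True)
    records = []
    for path in sorted(p for p in landing_dir.rglob("*") if p.is_file()):
        data = path.read_bytes()
        digest = hashlib.sha256(data).hexdigest()
        # Name the copy by its hash so identical uploads are stored only once.
        target = bronze_dir / f"{digest}{path.suffix.lower()}"
        if not target.exists():
            target.write_bytes(data)
        content_type, _ = mimetypes.guess_type(path.name)
        records.append(BronzeRecord(
            source_path=str(path),
            sha256=digest,
            size_bytes=len(data),
            content_type=content_type or "application/octet-stream",
            ingested_at=datetime.now(timezone.utc).isoformat(),
        ))
    return records


def classify(content_type: str) -> str:
    """Map a MIME type to a coarse modality used to route files to parsers."""
    if content_type == "application/pdf":
        return "document"
    major = content_type.split("/")[0]
    return major if major in ("image", "video") else "other"


def to_silver(records: list[BronzeRecord]) -> list[dict]:
    """Silver: drop duplicate content and tag each file with its modality.

    A real silver layer would also store parsed output here (PDF text,
    image arrays, video frames); that step is omitted in this sketch.
    """
    seen = set()
    silver = []
    for rec in records:
        if rec.sha256 in seen:
            continue
        seen.add(rec.sha256)
        silver.append({**asdict(rec), "modality": classify(rec.content_type)})
    return silver


def to_gold(silver: list[dict], modality: str) -> list[dict]:
    """Gold: a slim, model-ready manifest for one modality."""
    return [
        {"sha256": r["sha256"], "content_type": r["content_type"], "size_bytes": r["size_bytes"]}
        for r in silver
        if r["modality"] == modality
    ]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        landing, bronze = Path(tmp, "landing"), Path(tmp, "bronze")
        landing.mkdir()
        (landing / "cat.png").write_bytes(b"fake-png-bytes")
        (landing / "cat_copy.png").write_bytes(b"fake-png-bytes")  # duplicate upload
        (landing / "invoice.pdf").write_bytes(b"%PDF-1.4 fake")
        silver = to_silver(ingest_to_bronze(landing, bronze))
        print(len(silver), to_gold(silver, "image"))
```

Naming bronze files by their SHA-256 hash makes reruns idempotent and deduplicates repeated uploads. For archival, the untouched bronze files are the natural candidates to move to a cooler Azure Blob access tier (for example via a lifecycle management policy), while silver and gold stay in a hot tier for the data science team.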

u/Comfortable-Bar-9983 — 6 days ago

Hi Everyone

I am from Vasundhara and have a keen interest in sports. I accidentally bought the same Argentina away jersey twice.

I bought it for 600 rupees and am looking to sell it for 400 rupees (not negotiable below that).

To avoid scams, I would like to sell the jersey in person only.

Some details below:

  1. Price: 400 rupees (non-negotiable)
  2. Country: Argentina (away kit)
  3. Player: Messi
  4. Type: Embroidery and print (no stickers)
  5. Size: L

Adding some photos for your reference.

Please DM me if you are interested in buying it.

u/Comfortable-Bar-9983 — 15 days ago