u/That-Bookkeeper-8316


github.com/lucifertkod/ArcleIntelligence---Demo-Training-Script-Only

5.82B parameter multimodal AI. Entire training codebase: 9 files.

Input: text, image, document, audio, video
Output: text, image, speech
Context: 2,097,152 tokens (2M)
Benchmark: 93.45 OmniDocBench V1.5 (private)

One flag changes everything: SMOKE_TEST = True → 2× H100, ~$3, ~15 min
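The repo itself is not reproduced here, but a smoke-test switch like this usually works by swapping the full run configuration for a tiny one that exercises the same code path. A minimal sketch — the config keys and values below are illustrative, not taken from the repo:

```python
# Hypothetical SMOKE_TEST pattern: one flag selects a tiny configuration
# that runs the real training loop end to end in minutes instead of weeks.
SMOKE_TEST = True

full_config = {"gpus": 8, "steps": 200_000, "batch_size": 1024, "seq_len": 2_097_152}
smoke_config = {"gpus": 2, "steps": 50, "batch_size": 8, "seq_len": 4096}

config = smoke_config if SMOKE_TEST else full_config

print(config["gpus"])  # 2 — i.e. the claimed 2x H100 smoke run
```

Flipping `SMOKE_TEST` to `False` would select the full configuration with no other code changes, which is what makes a single-flag setup cheap to verify before committing real money.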

Architecture: hybrid SSM (Mamba) + attention

Built solo. 19. Class 12. Bihar, India. $11,560 personal savings on compute.

Demo: youtu.be/OzUzGhnlss0

Weights releasing on HuggingFace. Raising $35K to complete: 🌍 paypal.me/AbhinavAnand848

u/That-Bookkeeper-8316 — 8 days ago

I made ArcleIntelligence.

19 years old. Class 12. Bihar, India. No team. No investors. $11,560 spent.

WHAT IT IS: 5.82B multimodal AI model.

Takes in: text, images, documents, PDFs, audio, video
Gives back: text, 512×512 images, 24kHz speech
Context: 2,097,152 tokens (2 million)

Private benchmark: 93.45 on OmniDocBench V1.5

PROOF OF PREVIOUS WORK: Built Text-to-Video model on laptop. Zero funding. Zero team. Lightning AI reached out personally. Asked to publish as official Studio Template.

RELEASING:
Full weights on HuggingFace — free
Full code on GitHub — open license
No restrictions. No subscription.

GitHub: github.com/lucifertkod/ArcleIntelligence---Demo-Training-Script-Only

Demo: youtu.be/OzUzGhnlss0

Need $35K to complete training. Already spent $11,560 personally.
🌍 paypal.me/AbhinavAnand848
🇮🇳 rzp.io/rzp/ArcleIntelligence-crowdfunding

Contact: lucifertkod2007aa@gmail.com
Follow: x.com/Anonomus090806

u/That-Bookkeeper-8316 — 8 days ago

Posting here because this community understands what it means to build something real in India with nothing.

My name is Abhinav Anand. 19 years old. Class 12. Bihar.

Background:

Two and a half years ago I knew nothing about AI. I just knew ChatGPT existed. I failed building a YouTube analytics app. Twice. Failed building a voice assistant. Failed building an offline AI assistant.

Every failure taught me something real.

Before ArcleIntelligence I trained a complete Text-to-Video model on my laptop with zero funding, documented everything publicly, and Lightning AI personally reached out and asked to publish it as an official Studio Template on their platform. That was my proof of concept.

What I built next:

ArcleIntelligence — a fully trained 5.82 billion parameter multimodal AI model.

Input: text, images, documents, audio, video
Output: text, 512×512 images, 24kHz speech
Context window: 2,097,152 tokens (2 million)

Private testing result: 93.45 on OmniDocBench V1.5 — one of the highest scores ever recorded on that benchmark, competing directly with models from Google, OpenAI, and Alibaba.

Total spent so far: $11,560
From personal savings, RunPod compute grants, Digital Ocean credits, and GitHub Student Pack. Every rupee and dollar went directly to compute.

My father is a government officer. My mother is a housewife. This is a middle class family in Bihar. ₹9,64,000 on GPU compute is not a small number for us.

The West has OpenAI. The East has DeepSeek. India deserves its own — built by Indians, for everyone, with no strings attached.

Current status: Training ongoing. Raising $35,000 to complete the full pipeline.

When complete:
→ Full weights on Hugging Face — free forever
→ Complete code on GitHub — open license
→ Free to use, fine-tune, build upon with no restrictions for anyone

Support if you want to:

🇮🇳 India (UPI / Cards): rzp.io/rzp/ArcleIntelligence-crowdfunding

🌍 International (PayPal): paypal.me/AbhinavAnand848

GitHub: github.com/lucifertkod/ArcleIntelligence---Demo-Training-Script-Only

Follow the journey: x.com/Anonomus090806

No pressure. Even sharing helps more than you know.

— Abhinav lucifertkod2007aa@gmail.com

u/That-Bookkeeper-8316 — 8 days ago

Two and a half years ago I knew nothing about AI. I just knew ChatGPT existed.

I failed multiple times building simpler things before I understood enough to attempt a full multimodal architecture.

What I eventually built — ArcleIntelligence:

Key lesson 1: Connector architectures work
Instead of training a giant model from scratch, take the best specialists and train small bridges between them. The full system comes to 5.82B parameters in total.
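A toy sketch of the connector idea, with made-up dimensions (the real encoders and hidden sizes are not shown here): a frozen specialist emits features in its own space, and only a small linear bridge into the backbone's hidden size is trained.

```python
# Toy connector: frozen encoder + small trainable linear bridge.
# ENC_DIM and BACKBONE_DIM are illustrative, not the real model's sizes.
import random

random.seed(0)

ENC_DIM, BACKBONE_DIM = 4, 3

def frozen_encoder(pixels):
    # Stand-in for a pretrained specialist (e.g. a vision encoder).
    # Its "weights" never change; it just produces ENC_DIM features.
    return [p / 255.0 for p in pixels]

# The only trainable part: a BACKBONE_DIM x ENC_DIM projection matrix.
bridge = [[random.uniform(-0.1, 0.1) for _ in range(ENC_DIM)]
          for _ in range(BACKBONE_DIM)]

def connector(features):
    # Linear map from encoder space into the backbone's hidden space.
    return [sum(w * f for w, f in zip(row, features)) for row in bridge]

tokens = connector(frozen_encoder([10, 20, 30, 40]))
print(len(tokens))  # BACKBONE_DIM — ready for the reasoning backbone
```

During training, gradients would flow only into `bridge`; the encoder stays untouched, which is exactly why its standalone benchmark scores survive.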

Key lesson 2: SSM for long context
A hybrid of SSM and attention gives very long context at O(L) cost for the SSM part. YaRN extends the attention component to 2M tokens.
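Some back-of-the-envelope arithmetic on why O(L) matters at this scale. The SSM state size below assumes a Mamba-style `state_dim` of 16 (an assumption, not stated in the post), and real attention kernels like FlashAttention avoid materialising the full score matrix; this is purely illustrative of the asymptotics.

```python
# Cost comparison at the claimed 2M-token context, in fp16 (2 bytes).
L = 2_097_152                      # context length (2M tokens)
BYTES = 2                          # fp16

# Naive attention: one head's L x L score matrix.
attention_scores = L * L * BYTES

# SSM: fixed-size recurrent state, hidden_dim x state_dim (16 assumed).
ssm_state = 2560 * 16 * BYTES

print(attention_scores / 2**40)    # 8.0 — roughly 8 TiB for one score matrix
print(ssm_state / 2**10)           # 80.0 — roughly 80 KiB of recurrent state
```

The gap of many orders of magnitude is the whole argument for pushing most of the context handling into the SSM and keeping attention for what it is best at.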

Key lesson 3: Frozen encoders save everything
The OCR component scores 93.45 on OmniDocBench V1.5 (tested privately) because it is completely frozen. Never try to train what already works perfectly.

Key lesson 4: LCM over DDIM
8-step LCM denoising gives comparable quality to 20-step DDIM at 2.5× the speed. guidance_scale must always be 1.0 for LCM.
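The speedup claim is just step-count arithmetic, assuming per-step cost is roughly constant:

```python
# With roughly constant cost per denoising step,
# 20 DDIM steps vs 8 LCM steps gives a 20 / 8 = 2.5x speedup.
ddim_steps = 20
lcm_steps = 8

speedup = ddim_steps / lcm_steps
print(speedup)  # 2.5
```

In a diffusers-style pipeline this corresponds to swapping in an LCM scheduler and calling the pipeline with `num_inference_steps=8` and `guidance_scale=1.0` — LCM-LoRA is distilled without classifier-free guidance, which is why guidance must stay at 1.0. (That recipe is the standard public LCM-LoRA usage, not something read out of this repo.)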

Code on GitHub: github.com/lucifertkod/ArcleIntelligence---Demo-Training-Script-Only

Happy to answer questions about anything in the architecture or training process. I am still learning too.

u/That-Bookkeeper-8316 — 9 days ago
▲ 2 r/Indiegogo+1 crossposts

I saved ₹1,20,000 to buy a gaming laptop. I spent it on GPU compute instead. This is what I built.

My name is Abhinav Anand. I am 19 years old, in Class 12, living in Bihar, India. No team. No investors. No CS degree. No institutional backing. Two and a half years of learning AI from scratch, failing repeatedly, and building in silence.

GitHub: https://github.com/lucifertkod/ArcleIntelligence---Demo-Training-Script-Only
7-second architecture walkthrough: https://youtu.be/OzUzGhnlss0

The backstory matters, so bear with me

Two and a half years ago I was making gaming YouTube content and could not afford VidIQ. I thought — why not build my own version? The problem was I knew absolutely nothing about AI. I just knew ChatGPT existed.

I failed building a YouTube analytics app. Twice. Then failed building an on-device voice assistant. Then failed building a privacy-first offline AI. Every failure taught me something real.

Before this project, I trained a complete Text-to-Video model from scratch on a regular laptop with zero funding and documented everything publicly. Lightning AI reached out to me personally and asked to publish it as an official Studio Template on their platform so the entire AI community could clone it. That was the moment I knew I was building something real.

I stopped mid-sentence during my half-yearly exam to think about architecture decisions. I failed the exam. I do not regret it.

What I built

ArcleIntelligence is a 5.82 billion parameter multimodal Omni model. Not a wrapper. Not a fine-tuned chat model. A unified system that natively processes and generates across five modalities.

Inputs: Text, images, documents and PDFs, audio, video

Outputs: Text, 512×512 images, 24kHz speech

Context window: 2,097,152 tokens — Two million tokens

Note: Training is currently in progress. The GitHub repo has the full architecture code and training scripts. Model weights will be released publicly on Hugging Face when training completes.

Architecture

The design principle is simple: take the best frozen specialist models for each modality, train small connector layers to bridge them into a unified reasoning backbone, and let the backbone handle cross-modal reasoning. The connectors teach them to talk to each other.

The reasoning backbone is a hybrid SSM and attention architecture. SSM handles context natively at O(L) — no quadratic memory cost. YaRN RoPE scaling extends the attention component to 2M tokens. Hidden dimension 2560. Pre-trained on approximately 18 trillion multilingual tokens.
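For a sense of what the YaRN extension factor looks like: the post does not state the backbone's native attention window, so the 32K below is an assumption purely for illustration.

```python
# YaRN bookkeeping: the scaling factor is target context / native context.
# The 32K native window is an ASSUMED value; the post does not state it.
native_context = 32_768          # assumed pre-training attention window
target_context = 2_097_152      # the 2M tokens claimed in the post

scale = target_context / native_context
print(scale)  # 64.0 — the context-extension factor YaRN must cover
```

YaRN then rescales the RoPE frequencies by (roughly) this factor, interpolating low-frequency components more aggressively than high-frequency ones, instead of retraining the model at the longer length from scratch.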

The document engine scored 93.45 on OmniDocBench V1.5 — the highest score ever recorded on that benchmark, above models from Google, OpenAI, and Alibaba. This component is completely frozen, so the score is preserved unchanged.

The vision encoder was trained on 10 billion image-text pairs across 109 languages. The audio encoder was trained on 680,000 hours of multilingual speech across 99+ languages.

Image generation uses an 860M parameter UNet with a Latent Consistency Model LoRA adapter. 8 steps. Sharp 512×512. A trained parameter projector maps the reasoning backbone into the UNet cross-attention space.

Speech synthesis uses an 82M parameter TTS model. A trained 12M connector predicts a 256-dimensional voice style vector. At inference, cosine similarity selects the closest real voice profile. The backbone actually controls the voice — nothing is hardcoded.
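The voice-selection step described above is straightforward to sketch. The 3-dimensional vectors and profile names below stand in for the real 256-dimensional style vectors and whatever profiles the TTS model actually ships:

```python
# Nearest-voice selection by cosine similarity, as described in the post.
# Vectors and profile names are illustrative stand-ins for the real 256-d ones.
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

voice_profiles = {
    "voice_a": [1.0, 0.0, 0.0],
    "voice_b": [0.0, 1.0, 0.0],
    "voice_c": [0.6, 0.8, 0.0],
}

# What the 12M connector would predict from the backbone's hidden state.
predicted_style = [0.5, 0.9, 0.1]

best = max(voice_profiles, key=lambda k: cosine(predicted_style, voice_profiles[k]))
print(best)  # voice_c — the profile most aligned with the predicted vector
```

Because the connector output is a continuous vector and only the final lookup snaps to a real profile, the backbone genuinely steers the voice choice rather than emitting a hardcoded label.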

Benchmark scores

```
OmniDocBench V1.5    93.45    World #1, beats Gemini, GPT, Qwen
MMLU                 63-66%   Reasoning backbone (floor)
GSM8K                72-77%   Reasoning backbone (floor)
LibriSpeech WER      ~3.0%    Audio encoder
```

After full training multimodal benchmarks are expected to improve significantly.

On bias and data

Every major AI model today — American or Chinese — carries the institutional biases of whoever built it. Curated by their values. Filtered through their interests. Deployed for their agenda.

ArcleIntelligence is trained on publicly available data with no government affiliation, no corporate agenda, no political alignment, and no cultural bias baked in by design. It is not built to serve any government. It is not built to suppress anything. It is built to be useful to the next billion people coming online — people who deserve an AI that actually understands their languages, their documents, and their context.

This is not a positioning statement. It is the natural consequence of being a solo developer with no one to answer to except the open-source community.

The personal reality

I come from a middle-class family in Bihar. My father is a government officer. My mother is a housewife. To fund early training runs I used a RunPod startup compute grant, Digital Ocean credits, Microsoft Azure through GitHub's Student Developer Pack, and my own personal savings of ₹1,20,000 — money I had put aside to buy a gaming laptop. I spent every rupee of it on compute instead.

I have not slept normally in two years. I failed my half-yearly exam because I stopped mid-paper to think about architecture decisions. I have, in a very literal sense, put everything I had into this.

I am not writing this for sympathy. I am writing it because this model represents a real cost paid by a real person, and it is closer to being done than it has ever been.

What I need

To complete the full training pipeline — multiple training runs, connector refinement, benchmark evaluation, safety testing, inference hosting after release, and ongoing development — I need $35,000.

Every dollar goes directly to compute. No salary. No office. No marketing. One person in Bihar trying to finish what he started.

If this gets funded:

  • Full model weights released on Hugging Face for the entire open-source community
  • Complete source code on GitHub under an open license
  • Free to use, fine-tune, and build upon — no restrictions

If you want to support the compute costs:

🇮🇳 India (UPI / Indian cards): rzp.io/rzp/ArcleIntelligence-crowdfunding

🌍 International (PayPal): paypal.me/AbhinavAnand848

No pressure. Even sharing this post helps more than you know.

Reach me directly: lucifertkod2007aa@gmail.com Follow the build: https://x.com/Anonomus090806

Why this matters beyond me

The West has its AI labs. The East has its AI labs. India — 1.4 billion people, 22 official languages, one of the largest developer communities in the world — has almost no representation in the foundation model space. It deserves a foundation model built by Indians, for everyone, with no strings attached.

I am not building this for nationalism. I am building it because I felt the gap personally, failed forward until I had the skills to fill it, and I am now closer to done than I have ever been.

I am 19. I am in Class 12. I am in Bihar. I spent everything I had on this.

HN has always believed the best ideas can come from anywhere. I am asking you to help me prove that is still true.

— Abhinav Anand

u/That-Bookkeeper-8316 — 8 days ago
▲ 0 r/kickstarter+1 crossposts

