r/PHP

Improved markdown quality, code intelligence for 248 languages, and more in Kreuzberg v4.7.0
🔥 Hot ▲ 51 r/LocalLLaMA+10 crossposts

Kreuzberg v4.7.0 is here. Kreuzberg is a Rust-core document intelligence library that works with Python, TypeScript/Node.js, Go, Ruby, Java, C#, PHP, Elixir, R, C, and WASM. 

We’ve added several features, integrated OpenWebUI, and significantly improved quality across all formats. There is also a new markdown rendering layer, and HTML output is now supported. You can find much more in our release notes.

The main highlight is code intelligence and extraction. Kreuzberg now supports 248 formats through our tree-sitter-language-pack library. This is a step toward making Kreuzberg an engine for agents too. You can efficiently parse code, allowing direct integration as a library for agents and via MCP. Agents work with code repositories, review pull requests, index codebases, and analyze source files. Kreuzberg now extracts functions, classes, imports, exports, symbols, and docstrings at the AST level, with code chunking that respects scope boundaries. 

Regarding markdown quality, poor document extraction can lead to further issues down the pipeline. We created a benchmark harness using Structural F1 (SF1) and Text F1 scoring across over 350 documents and 23 formats, then optimized based on that. LaTeX improved from 0% to 100% SF1. XLSX increased from 30% to 100%. PDF table SF1 went from 15.5% to 53.7%. All 23 formats are now at over 80% SF1. The output that downstream pipelines receive is now structurally correct by default.

Kreuzberg is now available as a document extraction backend for OpenWebUI (by popular request!), with options for docling-serve compatibility or direct connection.

In this release, we’ve added a unified architecture in which every extractor creates a standard typed document representation. We also added the TOON wire format, a compact document encoding that reduces LLM prompt token usage by 30 to 50%, along with semantic chunk labeling, JSON output, strict configuration validation, and improved security. GitHub: https://github.com/kreuzberg-dev/kreuzberg

Kreuzberg Cloud is coming soon. It will be the hosted version, for teams that want the same extraction quality without managing infrastructure. More here: https://kreuzberg.dev

Contributions are always very welcome.

u/Eastern-Surround7763 — 5 hours ago
Testo (a new testing framework) is now in beta
▲ 16 r/PHP

Hey r/php,

I've been working on Testo – a testing framework for PHP built from the ground up on a fully independent architecture of plugins, middleware, and events.

The philosophy is simple: give the developer full control without imposing anything. Everything unnecessary can be disabled, everything missing can be added. Unit tests, inline tests, benchmarks, code coverage, retries - these are all regular plugins built on the same mechanisms available to you.

The article goes into detail on features, code examples, and answers common questions.
If anything's not covered there, happy to answer in the comments.

php-testo.github.io
u/roxblnfk — 3 hours ago
Improved markdown quality, code intelligence for 248 formats, and more in Kreuzberg v4.7.0
▲ 2 r/PHP

Kreuzberg v4.7.0 is here. Kreuzberg is an open-source Rust-core document intelligence library with bindings for Python, TypeScript/Node.js, Go, Ruby, Java, C#, PHP, Elixir, R, C, and WASM. 

We’ve added several features, integrated OpenWebUI, and significantly improved quality across all formats. There is also a new markdown rendering layer, and HTML output is now supported. You can find many other fixes and features in our release notes.

The main highlight is code intelligence and extraction. Kreuzberg now supports 248 formats through our tree-sitter-language-pack library. This is a step toward making Kreuzberg an engine for agents. You can efficiently parse code, allowing direct integration as a library for agents and via MCP. AI agents work with code repositories, review pull requests, index codebases, and analyze source files. Kreuzberg now extracts functions, classes, imports, exports, symbols, and docstrings at the AST level, with code chunking that respects scope boundaries. 

Regarding markdown quality, poor document extraction can lead to further issues down the pipeline. We created a benchmark harness using Structural F1 (SF1) and Text F1 scoring across over 350 documents and 23 formats, then optimized based on that. LaTeX improved from 0% to 100% SF1. XLSX increased from 30% to 100%. PDF table SF1 went from 15.5% to 53.7%. All 23 formats are now at over 80% SF1. The output that downstream pipelines receive is now structurally correct by default.

Kreuzberg is now available as a document extraction backend for OpenWebUI, with options for docling-serve compatibility or direct connection. This was one of the most requested integrations, and it’s finally here. 

In this release, we’ve added a unified architecture in which every extractor creates a standard typed document representation. We also added the TOON wire format, a compact document encoding that reduces LLM prompt token usage by 30 to 50%, along with semantic chunk labeling, JSON output, strict configuration validation, and improved security. GitHub: https://github.com/kreuzberg-dev/kreuzberg

Contributions are always very welcome!

https://kreuzberg.dev/

u/Eastern-Surround7763 — 5 hours ago
Content negotiation in PHP: your website is already an API without knowing it (Symfony, Laravel and Temma examples)
▲ 5 r/PHP

I'm preparing a talk on APIs for AFUP Day, the French PHP conference. One of the topics I'll cover is content negotiation, sometimes called "dual-purpose endpoint" or "API mode switch."

The idea is simple: instead of building a separate API alongside your website, you make your website serve both HTML and JSON from the same endpoints. The client signals what it wants, and the server responds accordingly.

A concrete use case

You have a media site or an e-commerce platform. You also have a mobile app that needs the same content, but as JSON. Instead of duplicating your backend logic into a separate API, you expose the same URLs to both your browser and your mobile app. The browser gets HTML, the app gets JSON.

The client signals its preference via the Accept header: Accept: application/json for JSON, Accept: text/html for HTML. Other approaches exist (URL prefix, query parameter, file extension), but the Accept header is the standard HTTP way.
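To make the Accept mechanism concrete, here is a minimal framework-free sketch of how a server might pick a response type from that header. The function name `negotiateMediaType` and its parameters are hypothetical, chosen only for illustration. It is also simplified: it honours q-values and wildcards but ignores the specificity rules of the HTTP specification, so frameworks' built-in negotiation should be preferred in real code.

```php
<?php

declare(strict_types=1);

/**
 * Picks the best media type for a raw Accept header value.
 * Hypothetical helper for illustration only (requires PHP 8+).
 *
 * @param string   $accept    Raw header, e.g. "text/html, application/json;q=0.9"
 * @param string[] $supported Types the endpoint can produce, in server preference order
 */
function negotiateMediaType(string $accept, array $supported): ?string
{
    if (trim($accept) === '') {
        return $supported[0] ?? null; // no header: fall back to the server default
    }

    $best = null;
    $bestQ = 0.0;
    foreach (explode(',', $accept) as $part) {
        $params = array_map('trim', explode(';', $part));
        $range = strtolower(array_shift($params));

        // Default quality is 1.0 unless a "q=" parameter says otherwise.
        $q = 1.0;
        foreach ($params as $param) {
            if (stripos($param, 'q=') === 0) {
                $q = (float) substr($param, 2);
            }
        }

        foreach ($supported as $type) {
            $matches = $range === '*/*'
                || $range === $type
                || (str_ends_with($range, '/*')
                    && str_starts_with($type, substr($range, 0, -1)));
            // Strict ">" keeps the earlier (server-preferred) type on ties
            // and rejects anything the client marked with q=0.
            if ($matches && $q > $bestQ) {
                $best = $type;
                $bestQ = $q;
            }
        }
    }

    return $best; // null means nothing acceptable (a 406 response would fit)
}
```

In plain PHP the header is available as `$_SERVER['HTTP_ACCEPT'] ?? ''`, so a call could look like `negotiateMediaType($_SERVER['HTTP_ACCEPT'] ?? '', ['text/html', 'application/json'])`.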

The same endpoint in three frameworks

Symfony

<?php

namespace App\Controller;

use Symfony\Bundle\FrameworkBundle\Controller\AbstractController;
use Symfony\Component\HttpFoundation\JsonResponse;
use Symfony\Component\HttpFoundation\Request;
use Symfony\Component\HttpFoundation\Response;
use Symfony\Component\Routing\Attribute\Route;

class ArticleController extends AbstractController
{
    #[Route('/articles', requirements: ['_format' => 'html|json'])]
    public function list(Request $request): Response
    {
        // Prepare the data once, whatever the output format.
        $data = ['message' => 'Hello World'];
        // getPreferredFormat() uses the request format if one is set, otherwise the Accept header.
        if ($request->getPreferredFormat() === 'json') {
            return new JsonResponse($data);
        }
        return $this->render('articles/list.html.twig', $data);
    }
}

In Symfony, the route attribute declares which formats the action accepts. The data is prepared once, then either passed to a Twig template for HTML rendering, or serialized as JSON using JsonResponse depending on what the client requested.

Laravel

Laravel has no declarative format constraint at the route level. The detection happens in the controller.

routes/web.php

<?php

use App\Http\Controllers\ArticleController;
use Illuminate\Support\Facades\Route;

Route::get('/articles', [ArticleController::class, 'list']);

The route only maps the URL to the action. The format check itself is the expectsJson() call in the controller.

app/Http/Controllers/ArticleController.php

<?php

namespace App\Http\Controllers;

use Illuminate\Http\Request;
use Illuminate\Routing\Controller;

class ArticleController extends Controller
{
    public function list(Request $request)
    {
        // Prepare the data once, whatever the output format.
        $data = ['message' => 'Hello World'];
        // expectsJson() is true when the client asks for JSON (it also covers typical AJAX requests).
        if ($request->expectsJson()) {
            return response()->json($data);
        }
        return view('articles.list', $data);
    }
}

The data is prepared once, then either serialized as JSON via response()->json(), or passed to a Blade template for HTML rendering.

Temma

controllers/Article.php

<?php

use \Temma\Attributes\View as TµView;

class Article extends \Temma\Web\Controller {
    #[TµView(negotiation: 'html, json')]
    public function list() {
        $this['message'] = 'Hello World';
    }
}

In Temma, the approach is different from Symfony and Laravel: the action doesn't have to check what format the client is asking for. Its code is always the same, regardless of whether the client wants HTML or JSON. A view attribute handles the format selection automatically, based on the Accept header sent by the client.

Here, the attribute is placed on the action, but it could be placed on the controller instead, in which case it would apply to all actions.
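To illustrate the general idea of attribute-driven negotiation, here is a rough, framework-free sketch. It is not Temma's implementation: the `Negotiate` attribute, the `ArticleActions` class, and the `dispatch()` function are all hypothetical names invented for this example, and the Accept matching is deliberately naive (substring match, no q-values). The point is only that the action returns data and a dispatcher reads the attribute to decide how to render it.

```php
<?php

declare(strict_types=1);

// Hypothetical attribute listing the formats an action may be rendered in.
#[Attribute(Attribute::TARGET_METHOD)]
final class Negotiate
{
    /** @var string[] */
    public array $formats;

    public function __construct(string ...$formats)
    {
        $this->formats = $formats;
    }
}

final class ArticleActions
{
    // The action only prepares data; it never inspects the request.
    #[Negotiate('html', 'json')]
    public function list(): array
    {
        return ['message' => 'Hello World'];
    }
}

/**
 * Calls the action, then renders its data in the first declared format
 * whose media type appears in the Accept header. Falls back to the first
 * declared format when nothing matches.
 */
function dispatch(object $controller, string $action, string $accept): string
{
    $mediaTypes = ['html' => 'text/html', 'json' => 'application/json'];

    // Read the declared formats from the attribute via reflection.
    $method = new ReflectionMethod($controller, $action);
    $attributes = $method->getAttributes(Negotiate::class);
    $formats = $attributes ? $attributes[0]->newInstance()->formats : ['html'];

    $chosen = $formats[0];
    foreach ($formats as $format) {
        $type = $mediaTypes[$format] ?? null;
        if ($type !== null && stripos($accept, $type) !== false) {
            $chosen = $format;
            break;
        }
    }

    $data = $method->invoke($controller);

    // Stand-in renderers: a real framework would call its template engine here.
    return $chosen === 'json'
        ? json_encode($data, JSON_THROW_ON_ERROR)
        : '<p>' . htmlspecialchars((string) $data['message']) . '</p>';
}
```

With this sketch, `dispatch(new ArticleActions(), 'list', 'application/json')` yields the JSON string, while the same call with `'text/html'` yields the HTML fragment, and the action code is identical in both cases.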

u/amaurybouchard — 22 hours ago
▲ 1 r/PHP+1 crossposts

My XAMPP MySQL keeps crashing

MySQL keeps crashing, and each time I have to restore the files in the data folder from a backup to get it running again on the default port, 3306. It crashed suddenly and I can't figure out why.

reddit.com
u/AccomplishedPath7634 — 21 hours ago