u/Statute_of_Anne

Image 1 — "Mossy path" - revisited by monocular stereoscopy
Image 2 — "Mossy path" - revisited by monocular stereoscopy
▲ 30 r/ParallelView+1 crossposts

"Mossy path" - revisited by monocular stereoscopy

The first displayed image is one half of a stereoscopic pair published by stubeans a few hours before this post.

The second image is a side-by-side stereo pair derived from that single image using the 3D_SBS Python tool in a ComfyUI workflow (all the software is open-source and freely available for offline use).

The point of this exercise is to demonstrate that an ordinary 2D photograph of a 3D scene contains the information the brain needs to construct a 3D view. The visual cortex receives the necessary prompting once a depth/perspective analysis of the image has been used to split it into the two required views.
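For readers curious how one image plus a depth estimate becomes two views, here is a minimal sketch of generic depth-image-based rendering (DIBR). It is not the 3D_SBS tool's actual code: it assumes a depth map is already available (0.0 = far, 1.0 = near, e.g. from a monocular depth-estimation model), works on tiny grayscale lists, and uses a naive hole fill. Each pixel is shifted horizontally in proportion to its nearness, in opposite directions for the left and right eyes, and the two views are placed side by side for parallel viewing.

```python
from typing import List

Grid = List[List[int]]      # grayscale image, rows of pixel values
Depth = List[List[float]]   # assumed convention: 0.0 = far, 1.0 = near


def shift_view(image: Grid, depth: Depth, max_shift: int, direction: int) -> Grid:
    """Forward-warp one eye's view; direction=+1 for the left eye, -1 for the right eye."""
    h, w = len(image), len(image[0])
    out = [[None] * w for _ in range(h)]
    zbuf = [[-1.0] * w for _ in range(h)]  # nearest depth written so far at each target pixel
    for y in range(h):
        for x in range(w):
            d = depth[y][x]
            # Half the disparity per eye, so the two views differ by about max_shift * d.
            nx = x + direction * round(max_shift * d / 2)
            if 0 <= nx < w and d > zbuf[y][nx]:  # nearer pixels occlude farther ones
                out[y][nx] = image[y][x]
                zbuf[y][nx] = d
        # Naive hole fill: disoccluded gaps copy the neighbour on the background side.
        order = range(w) if direction > 0 else range(w - 1, -1, -1)
        for x in order:
            if out[y][x] is None:
                nb = x - direction
                out[y][x] = out[y][nb] if 0 <= nb < w else image[y][x]
    return out


def make_sbs(image: Grid, depth: Depth, max_shift: int) -> Grid:
    """Left view on the left, right view on the right (parallel-view layout)."""
    left = shift_view(image, depth, max_shift, +1)
    right = shift_view(image, depth, max_shift, -1)
    return [l_row + r_row for l_row, r_row in zip(left, right)]


# A bright "near" pixel on a flat, far background.
print(make_sbs([[0, 0, 9, 0, 0]], [[0, 0, 1, 0, 0]], max_shift=2))
# → [[0, 0, 0, 9, 0, 0, 9, 0, 0, 0]]
```

The near pixel lands one place right of centre in the left view and one place left in the right view; that horizontal offset is the cue the brain reads as depth.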

There will be subtle differences between viewing the true stereo pair and the ersatz pair; in this instance they seem absent or minor. Sometimes, however, the calculations behind a constructed stereo pair go a little astray, and anomalies become visible when the two images are fused.

The construction algorithm has several parameters that allow the result to be tweaked, altering the impression of depth and focus.
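The parameter names below (`divergence_pct`, `convergence`) are hypothetical and may not match the 3D_SBS tool's own settings; the sketch only illustrates the usual idea behind two such knobs. One scales the overall disparity (how strong the depth looks); the other picks which depth sits on the screen plane, which is plausibly what a "focus" setting adjusts. Under this sketch's sign convention, positive disparity appears in front of the screen and negative behind it.

```python
def disparity_px(depth: float, image_width: int, divergence_pct: float, convergence: float) -> float:
    """Signed left-minus-right horizontal offset, in pixels, for one pixel.

    depth: 0.0 (far) .. 1.0 (near), as in a normalised depth map.
    divergence_pct: maximum disparity as a percentage of image width (hypothetical knob).
    convergence: depth value placed on the screen plane, i.e. zero disparity (hypothetical knob).
    """
    return (depth - convergence) * (divergence_pct / 100.0) * image_width


for conv in (0.0, 0.5):
    near = disparity_px(1.0, 1000, 2.0, conv)
    far = disparity_px(0.0, 1000, 2.0, conv)
    print(f"convergence={conv}: near {near:+.1f}px, far {far:+.1f}px")
# convergence=0.0: near +20.0px, far +0.0px
# convergence=0.5: near +10.0px, far -10.0px
```

Raising the divergence exaggerates depth everywhere; moving the convergence plane changes which part of the scene appears to sit at the picture surface.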

Whilst dual-lens recording equipment can give optimum results, the value of enhancing images taken with lesser apparatus should not be gainsaid. Moreover, 2D pictures of scenes made by artists take on new interest when rendered into 3D; arguably they then more closely represent what the artist had in mind but could not fully realise within the limits of the medium. Enhancements of this kind don't replace the original works, yet they might attune the minds of artists and the viewers of their works more closely.

I intend to present examples of paintings and drawings revisited in stereoscopy.

u/Statute_of_Anne — 13 hours ago
▲ 1 r/ZImageAI+1 crossposts

Request to workflow publishers regarding Subgraphs

The Subgraph facility offers uncluttered workspaces, and that is most welcome.

One gripe arises: sometimes user-alterable parameters (e.g. steps and cfg) are hidden along with the spaghetti. That's understandable when the workflow designer is convinced their chosen parameters are optimal and doesn't wish to confuse neophyte users.

However, when one wishes to explore the effects of altering parameters, searching for the relevant nodes within a massive tangle can be tiresome.

A reasonable middle ground would be to bring commonly altered parameters out onto the open display. These could be gathered into a single node dedicated to that purpose, instead of exposing the individual sampling nodes, etc.
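As a sketch of what such a dedicated node might look like, here is a minimal ComfyUI custom node that simply re-emits a handful of commonly tweaked values so they can be wired into the sampler(s) inside a subgraph. It assumes the usual custom-node conventions (`INPUT_TYPES`, `RETURN_TYPES`, `FUNCTION`, `NODE_CLASS_MAPPINGS`); the class name, defaults and ranges are illustrative, not taken from any published workflow.

```python
class SharedSamplingParams:
    """Hypothetical 'control panel' node: commonly altered values gathered in one place."""

    @classmethod
    def INPUT_TYPES(cls):
        return {
            "required": {
                "seed": ("INT", {"default": 0, "min": 0, "max": 0xFFFFFFFFFFFFFFFF}),
                "steps": ("INT", {"default": 20, "min": 1, "max": 10000}),
                "cfg": ("FLOAT", {"default": 4.0, "min": 0.0, "max": 100.0, "step": 0.1}),
                "denoise": ("FLOAT", {"default": 1.0, "min": 0.0, "max": 1.0, "step": 0.01}),
            }
        }

    RETURN_TYPES = ("INT", "INT", "FLOAT", "FLOAT")
    RETURN_NAMES = ("seed", "steps", "cfg", "denoise")
    FUNCTION = "emit"          # name of the method ComfyUI calls
    CATEGORY = "utils"

    def emit(self, seed, steps, cfg, denoise):
        # Pass the values straight through; the node exists only to surface them.
        return (seed, steps, cfg, denoise)


NODE_CLASS_MAPPINGS = {"SharedSamplingParams": SharedSamplingParams}
NODE_DISPLAY_NAME_MAPPINGS = {"SharedSamplingParams": "Sampling Parameters"}
```

Saved as a `.py` file under ComfyUI's `custom_nodes` folder, it would appear as one node whose outputs feed the hidden samplers. Depending on the ComfyUI version, a sampler's widgets may first need converting to inputs before they accept such links.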

u/Statute_of_Anne — 11 days ago

At present, local-use ComfyUI offers only two Chroma-variant workflow templates:

"Chroma1 Radiance Text to Image" and "Chroma: Text to image"

Each works well.

I've looked elsewhere and come across only one Image→Image workflow. It was overly elaborate and relied on a nightmare set of custom nodes; I couldn't work out how to reduce it to something simple.

Can anyone suggest simple modifications to the template examples? Would that also involve a different Chroma variant? Alternatively, can an Image→Text LLM be inserted into the flow?
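For orientation, the usual way to turn a text-to-image graph into image-to-image in ComfyUI is to replace the empty latent feeding the sampler with Load Image → VAE Encode, and to lower the sampler's denoise so part of the source survives. The sketch below applies that change to a workflow exported in ComfyUI's API (JSON) format and loaded as a Python dict. It assumes the Chroma template samples with a `KSampler` that has a `latent_image` input and gets its VAE from a separate node; node ids, the file name and the denoise value are placeholders.

```python
import copy


def to_img2img(workflow: dict, sampler_id: str, vae_link: list, image_file: str,
               denoise: float = 0.6) -> dict:
    """Return a copy of an API-format workflow rewired for image-to-image.

    vae_link is a [node_id, output_index] pair pointing at the VAE output,
    e.g. ["10", 0] for a VAELoader or ["4", 2] for CheckpointLoaderSimple.
    """
    wf = copy.deepcopy(workflow)
    next_id = max(int(k) for k in wf) + 1  # assumes numeric node ids, as in exported files
    load_id, enc_id = str(next_id), str(next_id + 1)
    wf[load_id] = {"class_type": "LoadImage", "inputs": {"image": image_file}}
    wf[enc_id] = {"class_type": "VAEEncode",
                  "inputs": {"pixels": [load_id, 0], "vae": list(vae_link)}}
    # The sampler now starts from the encoded photo instead of an empty latent.
    wf[sampler_id]["inputs"]["latent_image"] = [enc_id, 0]
    # Below 1.0, part of the source image's composition is kept.
    wf[sampler_id]["inputs"]["denoise"] = denoise
    return wf
```

The old empty-latent node is left in place but disconnected; ComfyUI only executes nodes that an output depends on, so it is simply skipped. Lower denoise values stay closer to the source photo.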

Any guidance would be appreciated.

u/Statute_of_Anne — 19 days ago

At present, local-use ComfyUI offers only two Chroma workflow templates:

"Chroma1 Radiance Text to Image" and "Chroma: Text to image"

Each works well.

I've looked elsewhere and come across only one Image→Image workflow. It was overly elaborate and relied on a nightmare set of custom nodes; I couldn't work out how to reduce it to something simple.

Can anyone suggest simple modifications to the template examples? Would that also involve a different Chroma variant? Alternatively, can an Image→Text LLM be inserted to replace the current text input box?
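On the second question: in ComfyUI's API format, a node's text widget can be replaced by a link to another node's output. The sketch below does that for a `CLIPTextEncode` prompt node. The captioning node's class name (`HypotheticalImageCaptioner`) and its `image` input are placeholders for whatever Image→Text custom node is installed; real captioning nodes will have their own names and settings.

```python
import copy


def caption_as_prompt(workflow: dict, text_encode_id: str, image_link: list,
                      captioner_class: str = "HypotheticalImageCaptioner") -> dict:
    """Return a copy where the prompt text comes from a captioning node, not a typed string."""
    wf = copy.deepcopy(workflow)
    cap_id = str(max(int(k) for k in wf) + 1)  # assumes numeric node ids
    # Placeholder node: assumed to take an IMAGE and return the caption as output 0.
    wf[cap_id] = {"class_type": captioner_class, "inputs": {"image": list(image_link)}}
    # A [node_id, output_index] pair in place of a literal string turns the widget into a link.
    wf[text_encode_id]["inputs"]["text"] = [cap_id, 0]
    return wf
```

The text encoder's other inputs (such as its `clip` link) are left untouched, so the rest of the template keeps working as before.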

Any guidance would be appreciated.

u/Statute_of_Anne — 19 days ago
▲ 4 r/comfyui+1 crossposts

Multimodal embedding models supplement existing AI base models and distilled/refined models. They are used for extending the scope (knowledge-base and internal reasoning) of extant models.

Apparently, embedding models appeal to some business/institutional users as the next best thing to horrendously expensive ab initio AI model construction and the still very costly distillation/refinement of pre-existing models. The approach lets detailed local, perhaps proprietary, information be used by models that were initially trained indiscriminately on anything their makers could get their hands on. The pharmaceutical industry is a big player in this sphere.
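In the commonest setup (retrieval-augmented generation), an embedding model does not alter the generating model's weights. It turns documents or images into vectors; at query time, the stored items nearest to the query's vector are retrieved and handed to the generating model as extra context. The ranking step is sketched below with cosine similarity on made-up three-dimensional vectors standing in for real embeddings (which have hundreds or thousands of dimensions); the document names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity: 1.0 = same direction, 0.0 = unrelated (orthogonal)."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def top_k(query: Sequence[float], index: List[Tuple[str, List[float]]], k: int = 2) -> List[str]:
    """Return the ids of the k stored items most similar to the query vector."""
    ranked = sorted(index, key=lambda item: cosine(query, item[1]), reverse=True)
    return [doc_id for doc_id, _ in ranked[:k]]


# Made-up vectors; a real system would obtain these from the embedding model.
index = [
    ("assay_report", [0.9, 0.1, 0.0]),
    ("holiday_photo", [0.0, 0.2, 0.9]),
    ("trial_protocol", [0.8, 0.3, 0.1]),
]
print(top_k([1.0, 0.2, 0.0], index))  # → ['assay_report', 'trial_protocol']
```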

An open-source example of this genre is Nomic Embed Multimodal 7B. It and similar models are said to run on mid-range domestic hardware with 16+ GB VRAM and, say, 64 GB RAM (maybe less).
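A rough check on the hardware claim: the memory needed just to hold a model's weights is parameters × bits per parameter ÷ 8 bytes. The sketch assumes the nominal 7 billion parameters implied by the model's name (the exact count may differ) and ignores activations, caches and framework overhead, which come on top.

```python
def weight_gib(n_params: float, bits_per_param: int) -> float:
    """Memory for the weights alone, in GiB (2**30 bytes)."""
    return n_params * bits_per_param / 8 / 2**30


for bits in (16, 8, 4):  # fp16/bf16, 8-bit and 4-bit quantisation
    print(f"{bits:>2}-bit: {weight_gib(7e9, bits):.1f} GiB")
# 16-bit: 13.0 GiB
#  8-bit: 6.5 GiB
#  4-bit: 3.3 GiB
```

At 16-bit precision the weights alone fit on a 16 GB card with only a modest margin, which is consistent with the quoted 16+ GB figure; quantised versions leave much more headroom.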

How does this type of tool compare, in capability and ease of use, with other low-cost ways (e.g. LoRAs) of beefing up local AI use?

u/Statute_of_Anne — 23 days ago
▲ 0 r/ZImageAI+1 crossposts

Multimodal embedding models supplement existing AI base models and distilled/refined models. They are a means of extending the scope (knowledge base and internal reasoning) of extant models.

Apparently, embedding models appeal to some business/institutional users as the next best thing to horrendously expensive ab initio AI model construction and the still very costly distillation/refinement of pre-existing models. The approach lets detailed local, perhaps proprietary, information be used by models that were initially trained indiscriminately on anything their makers could get their hands on. The pharmaceutical industry is a big player in this sphere.

Multimodal embedding may encompass text, images, and data in other formats. It is somewhat similar to using LoRAs to direct AI attention along specified lines.
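The comparison with LoRAs can be made concrete. A LoRA leaves a base weight matrix W frozen and learns two small matrices, B (out × r) and A (r × in), whose product is added as W' = W + (alpha / r) · B A, with the rank r much smaller than the matrix dimensions. Unlike embedding-based retrieval, which leaves the model's weights untouched, this changes what the model itself computes. The toy sketch below merges a rank-1 update into a 2 × 2 matrix; the values are arbitrary.

```python
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain matrix product of a (n x m) and b (m x p)."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def merge_lora(W: Matrix, A: Matrix, B: Matrix, alpha: float) -> Matrix:
    """W: out x in (frozen), A: r x in, B: out x r. Returns W + (alpha / r) * B @ A."""
    r = len(A)
    delta = matmul(B, A)
    scale = alpha / r
    return [[W[i][j] + scale * delta[i][j] for j in range(len(W[0]))] for i in range(len(W))]


W = [[1.0, 0.0], [0.0, 1.0]]
A = [[1.0, 2.0]]      # r = 1
B = [[1.0], [0.0]]
print(merge_lora(W, A, B, alpha=1.0))  # → [[2.0, 2.0], [0.0, 1.0]]
```

The saving is in trainable values: for a 4096 × 4096 layer at r = 16, the update needs 16 × (4096 + 4096) = 131,072 values instead of about 16.8 million.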

From a 'conversation' with the Perplexity AI, I am led to believe that suitable free software for offline use with tools like ComfyUI exists and interdigitates easily with familiar open-source models (base and distilled). It is said to be compatible with higher-end laptop specifications such as 16+ GB VRAM and 64 GB RAM.

With respect to image generation/processing, does embedding offer advantages over LoRA creation? I mean in terms of creation/set-up time, usable extension of AI versatility, and help with keeping generated characters/scenery persistent. Does it extend to local AI video generation?

u/Statute_of_Anne — 23 days ago