06 Apr 2026 - tsp
Last update 06 Apr 2026
22 mins
In recent months a number of services emerged that allow generating 3D assets from either text prompts or reference images. While the web interfaces of these platforms are often polished and interactive, they are typically optimized for manual workflows and subscription-based usage. For engineering pipelines, reproducibility, and automation, however, what we really want is API access.
Beyond the purely technical perspective, this is also a rather fascinating shift: these systems dramatically lower the barrier for creating artistic 3D content. They do not replace skilled artists - and realistically they will not in the foreseeable future - but they act as powerful amplifiers. For experienced artists they can accelerate iteration and ideation, while at the same time enabling people without strong artistic 3D skills to finally materialize their ideas, worlds, and characters.
Things get even more interesting when combined with modern image generation systems (like Stable Diffusion and similar approaches). With reasonably consistent prompt engineering, one can first generate coherent visual concepts and then lift them into 3D space. This opens the door to semi-automatically building consistent 3D scenes, asset libraries, or even entire worlds that share a unified style.
In my own workflow I primarily use these systems as a complement to traditional CAD. Most of my classical modeling work is engineering-focused (mechanical parts, devices, tooling), where parametric CAD (mostly in FreeCAD) is still the right tool; generative systems like the one presented here are not capable of proper technical design at this stage of development. However, for non-technical, artistic, or decorative objects - especially for 3D printing - these generative approaches are extremely valuable. They allow me to produce shapes and aesthetics that I would otherwise struggle to model manually.
In this article I will walk through a Python-based pipeline (the whole script is provided at the end of the article) that uses the Tripo3D API to generate, process, and export 3D models in a fully automated fashion. The focus is not just on getting a model, but on building a structured pipeline that produces pseudo-deterministic outputs, metadata, and multiple export formats suitable for further processing (e.g., CNC, 3D printing, simulation, or game engines).
The web frontend is excellent for exploration, iteration, and interactive refinement, as known from typical artistic workflows. In technical environments, however, it has a few typical limitations: the lack of reproducible batch processing, limited control over export formats and intermediate steps, missing structured metadata capture, and poor integration into existing toolchains.
The script presented here addresses these issues by treating every operation as a task with persistent metadata, storing all intermediate results, supporting both text-to-model and image-to-model workflows, enabling per-part processing and exports, and enforcing deterministic naming and file organization. The key gain of this pipeline approach is automation and therefore scalability - there is no meaningful scaling without automation. Once the process is expressed as a pipeline, generating tens, hundreds, or thousands of assets becomes a straightforward extension rather than a manual effort. Additionally, using the API directly typically means usage-based billing instead of subscription-based time periods, which aligns much better with batch workloads and sporadic large-scale generation runs (taking on the order of ten minutes for a single high quality model without texture and rigging).
The pipeline is structured into several stages: base model generation, optional texturing (full model or per part), optional rigging, and export into multiple formats.
Each stage is implemented as an asynchronous task and stored together with its metadata.
Conceptually, the pipeline looks like this (violet is your data, green the mandatory step, and yellow the optional steps):

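The staged structure can be sketched independently of the SDK. The following self-contained model of the idea uses hypothetical placeholder stages (the real ones wrap Tripo API calls) and records each stage's result as metadata:

```python
import asyncio
from typing import Any, Awaitable, Callable, Dict, List

# Each stage receives the previous stage's result and returns its own.
Stage = Callable[[Any], Awaitable[Any]]

async def run_pipeline(stages: List[Stage], initial: Any) -> Dict[str, Any]:
    """Run stages sequentially and record each result under the stage name."""
    results: Dict[str, Any] = {}
    current = initial
    for stage in stages:
        current = await stage(current)
        results[stage.__name__] = current
    return results

# Hypothetical placeholder stages; in the real script these call the Tripo API.
async def generate(prompt: str) -> str:
    return f"mesh({prompt})"

async def texture(mesh: str) -> str:
    return f"textured({mesh})"

async def export(model: str) -> str:
    return f"exported({model})"

if __name__ == "__main__":
    meta = asyncio.run(run_pipeline([generate, texture, export], "robot"))
    print(meta["export"])  # exported(textured(mesh(robot)))
```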
The Tripo API internally works with tasks. Instead of hiding this abstraction, the script embraces it.
Every step yields a task_id, which is polled until the task reaches a terminal state. This is implemented via:
async def wait_success(client: TripoClient, task_id: str, label: str):
    task = await client.wait_for_task(task_id, polling_interval=2.0, timeout=None, verbose=True)
    status = str(getattr(task, "status", "")).lower()
    if "success" not in status:
        raise RuntimeError(...)
    return task
This pattern ensures robust error handling, since failures are detected explicitly and surfaced immediately, while also providing full traceability because every step is captured as a task with associated metadata. At the same time, it enables straightforward debugging and replay, as individual steps can be inspected, reproduced, or rerun without having to reconstruct the entire pipeline.
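The same wait-and-check pattern can be expressed generically, decoupled from the SDK. The sketch below assumes only an async callable that fetches the current status; `fetch_status` is a hypothetical stand-in for the SDK's polling, and the terminal status names are my assumption:

```python
import asyncio
from typing import Awaitable, Callable, Optional

async def wait_for_status(
    fetch_status: Callable[[], Awaitable[str]],
    polling_interval: float = 2.0,
    timeout: Optional[float] = None,
) -> str:
    """Poll until the status is terminal; raise on failure or timeout."""
    elapsed = 0.0
    while True:
        status = (await fetch_status()).lower()
        if "success" in status:
            return status
        if status in ("failed", "banned", "cancelled"):
            raise RuntimeError(f"task ended with status {status!r}")
        if timeout is not None and elapsed >= timeout:
            raise TimeoutError(f"no terminal status after {elapsed:.0f}s")
        await asyncio.sleep(polling_interval)
        elapsed += polling_interval
```

The explicit failure branch is what makes errors surface immediately instead of silently looping forever.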
Instead of only saving meshes, the pipeline stores everything about each task:
async def save_task_metadata(task: Any, out_dir: Path, stem: str) -> Path:
    meta = task_to_dict(task)
    path = out_dir / f"{stem}.task.json"
    path.write_text(json.dumps(meta, indent=2, ensure_ascii=False))
    return path
This is extremely useful when comparing different parameter settings, debugging failed generations, and building higher-level automation on top of the pipeline. In addition, it makes it possible to resume the pipeline from a model at any intermediate step.
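The SDK's task objects are not plain dictionaries, so `task_to_dict` has to normalize them before JSON serialization. A defensive, self-contained sketch (the recursion rules here are my assumption, not the SDK's contract) could be:

```python
import json
from typing import Any

def task_to_dict(obj: Any, depth: int = 0, max_depth: int = 6) -> Any:
    """Best-effort conversion of an arbitrary task object into JSON-safe data."""
    if depth > max_depth:
        return repr(obj)
    if obj is None or isinstance(obj, (bool, int, float, str)):
        return obj
    if isinstance(obj, (list, tuple)):
        return [task_to_dict(v, depth + 1, max_depth) for v in obj]
    if isinstance(obj, dict):
        return {str(k): task_to_dict(v, depth + 1, max_depth) for k, v in obj.items()}
    if hasattr(obj, "__dict__"):
        return {k: task_to_dict(v, depth + 1, max_depth)
                for k, v in vars(obj).items() if not k.startswith("_")}
    return repr(obj)  # fallback: at least keep a string representation
```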
Generated assets are renamed into a consistent scheme:
01_base.glb
02_textured_full.glb
03_textured_part__001__wheel.glb
06_export_full_stl.stl
08_export_part_stl__003__handle.stl
This is handled via:
dst = out_dir / f"{stem}.{model_type}{src.suffix}"
Together with sanitize_name() this guarantees filesystem-safe naming.
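The exact implementation of `sanitize_name()` is in the full script; a minimal equivalent that keeps names filesystem-safe could look like this:

```python
import re

def sanitize_name(name: str, max_len: int = 64) -> str:
    """Replace anything outside [A-Za-z0-9._-] and trim to a safe length."""
    cleaned = re.sub(r"[^A-Za-z0-9._-]+", "_", name.strip())
    cleaned = cleaned.strip("._") or "unnamed"
    return cleaned[:max_len]
```

For example, `sanitize_name("front wheel (left)")` yields `front_wheel_left`, which can safely be embedded in the numbered file stems shown above.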
The entry point is either text_to_model() or image_to_model(). Example:
base_task_id = await client.text_to_model(
    prompt=args.prompt,
    texture=bool(args.texture),
    face_limit=args.face_limit,
    generate_parts=args.generate_parts,
)
Important parameters:
texture: Generate textures directly if set to true.
face_limit: Control mesh complexity by supplying the absolute face limit. Note that cost typically scales with this limit.
generate_parts: Ask the model to segment the object.
smart_low_poly: Useful for real-time applications - first generate a low-polygon representation and later expand to a high polygon count.
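In the script these parameters are exposed as CLI flags. A trimmed-down argparse definition could look like the sketch below; the flag names follow the example invocation later in the article, while the defaults and help texts are my assumptions:

```python
import argparse

def build_parser() -> argparse.ArgumentParser:
    p = argparse.ArgumentParser(description="Tripo3D generation pipeline")
    p.add_argument("--mode", choices=["text", "image"], default="text")
    p.add_argument("--prompt", help="text prompt for text-to-model mode")
    p.add_argument("--texture", action="store_true",
                   help="generate textures directly")
    p.add_argument("--face-limit", type=int, default=None,
                   help="absolute face limit (cost typically scales with this)")
    p.add_argument("--generate-parts", action="store_true",
                   help="ask the model to segment the object")
    p.add_argument("--smart-low-poly", action="store_true",
                   help="generate a low-poly representation first")
    p.add_argument("--out", default="./output")
    return p
```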
One particularly interesting feature is automatic part discovery. Conceptually, you can think of the generative model not only producing a single mesh, but internally reasoning about the object as a composition of semantic substructures—wheels, handles, bodies, limbs, or decorative elements—very much like how a human would describe or sketch it. While the API does not expose this internal representation directly, traces of it appear in the task output, where parts may be listed explicitly or implicitly. By probing these structures defensively, the pipeline reconstructs a usable set of part identifiers.
This is powerful because it turns a monolithic generated mesh into something closer to a structured assembly. Once parts are identifiable, they can be processed independently: textured differently, exported separately, simplified or refined with different parameters, or even replaced downstream. In practical terms, this enables workflows that resemble classical CAD assemblies or game asset pipelines, but starting from a generative model rather than manual modeling.
What makes this particularly compelling is that it bridges a gap between purely artistic generation and engineering-style decomposition. Instead of treating the generated object as a static artifact, it becomes a manipulable system. For example, you can generate a complex object once, then iterate only on a specific component (e.g., retexturing just the “handle” or exporting only the “base” for printing). This selective control is where generative models begin to feel less like black boxes and more like cooperative tools that expose structure—imperfectly, but often sufficiently—to be integrated into real workflows.
Since the SDK does not clearly document where part names are stored, my script uses a defensive extraction strategy:
def discover_part_names(task: Any) -> List[str]:
    candidates = [task, getattr(task, "output", None)]
    ...
This scans both the task object itself and its output structure for part-related fields. The result is a list of part names such as:
["body", "wheel", "handle", "base"]
Note that this approach may break at any time: it worked while I wrote the script, but it relies on undocumented behaviour.
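A self-contained version of this defensive scan might look like the following sketch. The attribute and key names probed here (`parts`, `part_names`, `name`) reflect what I observed in task outputs and are not guaranteed by the SDK:

```python
from typing import Any, List

def discover_part_names(task: Any) -> List[str]:
    """Defensively collect part names from a task object of unknown shape."""
    names: List[str] = []

    def visit(node: Any, depth: int = 0) -> None:
        if node is None or depth > 5:
            return
        if isinstance(node, dict):
            # Parts may be plain strings or dicts carrying a "name" field.
            for key in ("parts", "part_names"):
                value = node.get(key)
                if isinstance(value, list):
                    for item in value:
                        if isinstance(item, str):
                            names.append(item)
                        elif isinstance(item, dict) and isinstance(item.get("name"), str):
                            names.append(item["name"])
            for value in node.values():
                visit(value, depth + 1)
        elif hasattr(node, "__dict__"):
            visit(vars(node), depth + 1)

    visit(task)
    visit(getattr(task, "output", None))
    # de-duplicate while preserving order
    return list(dict.fromkeys(names))
```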
Before looking at the concrete approaches, it is useful to briefly clarify what “texturing” actually means in this context. A generated 3D model typically consists of geometry (vertices, edges, faces) that define the shape, and separate surface information that defines how it looks. Texturing is the process of assigning image-based or procedurally generated information onto the surface of that geometry via UV mappings, effectively telling the renderer or downstream tool what color, roughness, metallic properties, and fine visual details each point on the surface should have. Without textures, most models look like uniform gray meshes; with textures, they become visually rich objects with materials such as wood, metal, fabric, or painted surfaces. In many pipelines this also includes PBR (physically based rendering) parameters, which control how light interacts with the surface. For purely functional workflows such as single-color 3D printing, however, textures are typically not required at all - the geometry alone is sufficient, and formats like STL intentionally ignore any surface appearance information.
There are two approaches implemented: texturing the full model at once, and texturing individual parts.
texture_task_id = await client.texture_model(
    original_model_task_id=full_mesh_task_id,
    texture=True,
    pbr=True,
    text_prompt=args.texture_prompt,
)
This produces a single coherent material.
part_tex_task_id = await client.texture_model(
    part_names=[part_name],
    ...
)
This enables per-part materials and independent iteration on individual components.
The export stage is surprisingly powerful and still very simple.
async def export_one_format(...):
    export_task_id = await client.convert_model(
        format=fmt,
        flatten_bottom=flatten_bottom,
        pivot_to_center_bottom=pivot_to_center_bottom,
        pack_uv=pack_uv,
    )
The flatten_bottom option modifies the geometry such that the lowest region of the model is projected onto a plane, effectively creating a flat base. This is particularly useful for 3D printing because many printers require stable contact with the build plate. Without a flat surface, models may require support structures, which increase print time, material usage, and post-processing effort. By flattening the bottom, the model can often be printed directly, improving adhesion and reliability.
The pivot_to_center_bottom parameter adjusts the coordinate system of the model such that its origin is moved to the center of the base. This is not just a convenience for slicers, but fundamentally changes how the model is positioned and manipulated in downstream tools. With this pivot, rotations occur around a physically meaningful point (the contact surface), and placement into scenes or assemblies becomes more predictable. For printing workflows, this often means the object appears correctly aligned on the build plate without additional transformations.
The pack_uv parameter operates on the texture coordinate space rather than geometry. It reorganizes the UV layout to make more efficient use of available texture space. This reduces wasted texture area, improves resolution of surface details, and is especially relevant when exporting to formats used in rendering or game engines where texture memory and quality are important.
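Because these flags matter for different targets, it is natural to decide per format which ones to pass. A small helper that assembles the keyword arguments for `convert_model` could look like this; the per-format policy is my own heuristic, not part of the API:

```python
from typing import Any, Dict

PRINT_FORMATS = {"stl", "3mf"}            # geometry-centric targets
ENGINE_FORMATS = {"gltf", "fbx", "usdz"}  # texture-centric targets

def build_export_kwargs(fmt: str, for_printing: bool) -> Dict[str, Any]:
    """Assemble convert_model kwargs; printing targets get a flat, centered base."""
    fmt = fmt.lower()
    kwargs: Dict[str, Any] = {"format": fmt}
    if for_printing and fmt in PRINT_FORMATS:
        kwargs["flatten_bottom"] = True
        kwargs["pivot_to_center_bottom"] = True
    if fmt in ENGINE_FORMATS:
        kwargs["pack_uv"] = True  # make better use of texture space
    return kwargs
```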
STL is the most basic and widely supported format for 3D printing. It encodes only the surface geometry as a triangle mesh and intentionally contains no information about colors, materials, or textures. This simplicity makes it robust and universally compatible, but also limits it to purely geometric workflows. The major advantage is that STL is extremely simple to implement in comparison to the other alternatives.
3MF can be seen as a modern replacement for STL. It supports not only geometry but also metadata such as colors, materials, multiple objects in a single file, and even printer-specific settings. For advanced printing workflows, especially with multi-material or color printers, 3MF is often the better choice.
GLTF and FBX are formats primarily used in rendering, simulation, and game engines. They support hierarchical scene structures, materials, textures, animations, and sometimes skeletal rigs. GLTF is designed as a modern, efficient, and open standard (often described as the JPEG of 3D), while FBX is older, widely supported, and deeply integrated into many commercial tools.
USDZ is a format designed for augmented reality ecosystems, particularly in environments like mobile devices. It supports compact packaging of geometry, materials, and animations in a way that is optimized for real-time rendering and distribution, making it suitable for AR previews or interactive product visualization.
For manufacturing workflows, splitting models is often essential.
async def export_parts_for_format(...):
    for idx, part_name in enumerate(part_names, start=1):
        stem = f"{base_stem}__{idx:03d}__{sanitize_name(part_name)}"
This results in:
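With the part names from the earlier example, the stem construction is plain string formatting. A self-contained check (using a simplified stand-in for the script's `sanitize_name`) shows the resulting names:

```python
import re
from typing import List

def sanitize_name(name: str) -> str:
    # simplified stand-in for the script's sanitizer
    return re.sub(r"[^A-Za-z0-9._-]+", "_", name.strip())

def part_stems(base_stem: str, part_names: List[str]) -> List[str]:
    return [f"{base_stem}__{idx:03d}__{sanitize_name(name)}"
            for idx, name in enumerate(part_names, start=1)]

print(part_stems("08_export_part_stl", ["body", "wheel", "handle", "base"]))
```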
The pipeline also supports automatic rigging. In the context of 3D graphics, rigging refers to adding an internal skeleton (a hierarchy of bones or joints) to a static mesh, together with weights that define how each part of the surface deforms when those bones move. You can think of it as turning a rigid statue into something that can be posed or animated: bending an arm, rotating a head, or walking becomes possible because the mesh is now bound to this underlying structure. In practice, rigging also defines constraints, joint limits, and sometimes control handles that make animation easier to author. For many readers coming from CAD or printing workflows this concept may be unfamiliar, since purely geometric models are typically static; however, for game engines, simulation, or character animation, rigging is the essential step that converts geometry into something dynamic and controllable.
rig_task_id = await client.rig_model(
    rig_type=normalize_rig_type(args.rig_type),
    spec=normalize_rig_spec(args.rig_spec),
)
Supported rig types correspond to different anatomical or kinematic archetypes. A biped rig assumes two legs and typically two arms arranged around a vertical spine; it is the standard for humanoid characters and benefits from well-established conventions (for example Mixamo-style skeletons), making retargeting animations straightforward. A quadruped rig is optimized for four-legged locomotion with a horizontal spine and coordinated gait cycles; it better captures weight distribution and natural motion for animals like dogs or horses, but requires different animation clips and controllers than bipeds. An avian rig introduces wings and often tail articulation, with joints arranged to support flapping, folding, and gliding; it is useful for birds or winged creatures but can be more complex due to additional degrees of freedom and coupled motions. A serpentine rig represents elongated bodies composed of many segments; instead of discrete limbs, motion is produced by propagating waves along the body, which is ideal for snakes or tentacle-like structures but requires spline- or chain-based control schemes.
Each choice encodes assumptions about joint hierarchy, constraints, and typical motion patterns. The advantage is that the resulting skeleton is immediately compatible with common animation tools and libraries for that class, enabling reuse of existing animation data (retargeting) and predictable behavior in physics or IK solvers. The downside is that mismatching the rig type to the geometry can produce unnatural deformation or require additional cleanup.
In animation and game pipelines this is extremely valuable because it converts a static mesh into a controllable asset that can be posed, animated, and simulated in real time. Engines rely on skeletal animation for efficiency (skinning on the GPU), blending between clips (like idle, walk, run), inverse kinematics for interactions (feet on ground, hands on objects), and physics-driven secondary motion. With a suitable rig, the same model can be reused across scenes and behaviors, integrated into state machines, and driven by gameplay logic, turning a generated object into a fully interactive entity rather than a fixed piece of geometry, with minimal or no additional manual work.
My script exposes the described features, making it easy to integrate into other tools:
Example usage:
./tripo.py \
    --mode text \
    --prompt "small steampunk robot with tracks" \
    --texture \
    --generate-parts \
    --export-stl \
    --out ./output
A few interesting takeaways from using this setup:
Most importantly: once this pipeline exists, it becomes trivial to generate hundreds of assets in a reproducible way.
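The batch case then reduces to fanning out the single-asset pipeline with bounded concurrency. The sketch below uses a placeholder `generate_one` coroutine in place of the real pipeline, and a semaphore to stay within API rate limits (the limit of 4 is an arbitrary assumption):

```python
import asyncio
from typing import List

async def generate_one(prompt: str) -> str:
    """Placeholder for the full single-asset pipeline."""
    await asyncio.sleep(0)  # real version: generate, texture, rig, export ...
    return f"output for {prompt!r}"

async def generate_batch(prompts: List[str], max_concurrent: int = 4) -> List[str]:
    """Run many generations concurrently, but never more than max_concurrent."""
    sem = asyncio.Semaphore(max_concurrent)

    async def bounded(prompt: str) -> str:
        async with sem:
            return await generate_one(prompt)

    return await asyncio.gather(*(bounded(p) for p in prompts))

if __name__ == "__main__":
    prompts = [f"steampunk robot variant {i}" for i in range(10)]
    results = asyncio.run(generate_batch(prompts))
    print(len(results))  # 10
```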
If you want to move from a virtual 3D object to a physical one in a very easy way, just use the Cura slicer and a simple 3D printer like the Creality Ender3 V3 SE, the newer Creality Ender3 V3 KE featuring a ceramic heater, or a more costly but capable multi-filament printer like the Creality K2 Plus for multi-color prints. Most models created for decorative purposes are directly printable without further modifications. If you need to perform further modifications, use either the slicer's limited editing capabilities or move on to Blender, which is an amazing tool with a steep but rewarding learning curve, before entering the CAD/CAM pipeline.
There are several obvious extensions: orchestration via workflow automation tools (such as n8n), combining idea and description generation, consistent image generation via diffusion models, 3D asset creation, assembly into animated scenes, and optionally fully automated video production workflows.
From a systems perspective, this is where things get interesting: the moment generative models become just another pseudo-deterministic component in an engineering pipeline.
Using the Tripo3D API directly allows transforming 3D generation from an interactive tool into a programmable system component. By structuring the workflow into tasks, capturing metadata, and enforcing pseudo-deterministic outputs, the script provides a solid foundation for integrating generative 3D models into real engineering processes. What makes this particularly compelling is not just the ability to generate individual assets, but to embed generation into larger systems: once automated, the process scales naturally, turning what would otherwise be manual, creative effort into a reproducible and extensible pipeline.
At the same time, this does not replace traditional modeling or artistic workflows, but complements them. Artists will continue to achieve higher-quality and more refined results using interactive tools, while API-driven approaches enable entirely different use cases: batch generation, rapid prototyping, dataset creation, and integration into simulation, robotics, or manufacturing pipelines. In this sense, generative 3D becomes less of a standalone tool and more of a building block that can be composed with other systems.
For practical applications—whether generating printable objects, populating virtual environments, or building automated content pipelines—the combination of automation, usage-based cost models, and flexible export capabilities makes this approach particularly attractive. It allows engineers and technically inclined users to access domains that previously required significant artistic effort, while still leaving room for manual refinement where needed.
If you are already working with CNC, simulation, robotics, or content pipelines, this is where the real shift begins: the moment 3D asset generation becomes just another programmable step in a larger system, rather than a separate, manual process.
The complete script is available as a GitHub Gist:
Dipl.-Ing. Thomas Spielauer, Wien (webcomplainsQu98equt9ewh@tspi.at)
This webpage is also available via TOR at http://rh6v563nt2dnxd5h2vhhqkudmyvjaevgiv77c62xflas52d5omtkxuid.onion/