# libtpu, JAX and torch_xla in One Container: Device Init Races and Env-Var Landmines
What it looks like when a single TPU host runs JAX and torch_xla side by side: libtpu initialization races, VFIO contention, PJRT platform resolution, and the env-var surface you only learn about when it breaks.

The nanochat POC that feeds the MegaCpp SLM ensemble ships one container per
TPU host that contains both torch_xla (for training) and jax (for
tokenizer utilities, numerical reference tests, a few SPMD helpers, and
upstream Pallas kernels we wrap). On a GPU box that would be unremarkable.
On a TPU v6e host it is a minefield, because torch_xla and jax both
link the same libtpu and both try to own the physical device.
This post documents what actually breaks when those two runtimes share a host, the env-var surface that controls the fight, and the debug patterns that finally made it reproducible.
## The shared object underneath
Both stacks route through PJRT and both ultimately call into libtpu. On
our pinned v6e-8 hosts:
- `torch` 2.9.0a0+git21fec65 or 2.11.0a0+git7afdbae
- `torch_xla` 2.9.0+gitc04e61c or 2.11.0+gitc04e61c
- `jax` 0.9.0, `jaxlib` 0.9.0
- `libtpu` 0.0.36 (production) or 0.0.37.dev20260224 (nightly)
- PJRT API version: plugin 0.94, framework 0.91 (forward-compat OK)
libtpu is the single consumer of /dev/vfio/0. There is exactly one such
device node, and it cannot be held by two processes at once. The moment
two processes try to claim it at the same time, one wins and the other
receives an opaque initialization failure. That is the primitive under
everything that follows.
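When a claim fails, the first question is which PID already holds the device. `lsof` answers it, but a dependency-free sketch is handy inside a slim container (Linux-only, best-effort given /proc permissions; the helper name is ours):

```python
import os

def vfio_holders(dev="/dev/vfio/0"):
    """Return PIDs holding an open fd on the given device node (Linux /proc scan)."""
    holders = []
    if not os.path.isdir("/proc"):
        return holders  # non-Linux host: nothing to scan
    for pid in filter(str.isdigit, os.listdir("/proc")):
        fd_dir = os.path.join("/proc", pid, "fd")
        try:
            for fd in os.listdir(fd_dir):
                if os.path.realpath(os.path.join(fd_dir, fd)) == dev:
                    holders.append(int(pid))
                    break
        except OSError:
            continue  # process exited, or we lack permission to inspect it
    return holders
```

Run as root on the training host, this points straight at the process that won the race.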
## The race we kept losing
The symptom was that our test suite would pass cleanly on GPU, pass most
of the time on TPU, and fail intermittently on TPU when run against a
host that was already training. The failing tests were TestAutogradGuard
cases on clustered sparse attention. They used jax only as a numerical
reference - pure CPU math, no TPU intent. They were also the first tests
imported in their module, which is why they drew the short straw.
The actual failure was that jax imported, saw a TPU-capable libtpu on
the system, and tried to grab /dev/vfio/0 for TPU initialization.
torch_xla already owned it from the training process next door. jax
failed to init, and the test harness treated the failure as a crash. Worse,
on a quiet host the tests would pass because jax won the race, then
torch_xla would later fail to init during the next training launch.
The fix is to tell jax explicitly that it is CPU-only for these tests:
```python
import os

def _import_jax_cpu():
    os.environ.setdefault("JAX_PLATFORMS", "cpu")
    import jax
    return jax
```
Every numerical-reference test in the suite now goes through
_import_jax_cpu(). JAX_PLATFORMS=cpu has to be set before
import jax, and it has to be set in every subprocess that imports
jax - setdefault handles both cases without clobbering explicit
overrides. On test runs we set JAX_PLATFORMS=cpu at the pytest
invocation level too, as a belt-and-braces measure:
```shell
JAX_PLATFORMS=cpu python -m pytest tests/...
```
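Why `setdefault` rather than a plain assignment: an explicit override set earlier in the environment must survive. In isolation:

```python
import os

# Fresh subprocess: nothing set yet, so the CPU default lands.
os.environ.pop("JAX_PLATFORMS", None)
os.environ.setdefault("JAX_PLATFORMS", "cpu")
assert os.environ["JAX_PLATFORMS"] == "cpu"

# An explicit override (e.g. a deliberate JAX-on-TPU test) set before
# _import_jax_cpu() runs is left alone by setdefault.
os.environ["JAX_PLATFORMS"] = "tpu"
os.environ.setdefault("JAX_PLATFORMS", "cpu")
assert os.environ["JAX_PLATFORMS"] == "tpu"
```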
After that change the v6e-8 validate host ran 903 passed, 1 skipped, 0 failures on the cycle 5 suite. The previous run had the single JAX init failure we chased for most of a day.
## Env-var sensitivity, documented
Once we understood that env-vars were the real public API of libtpu,
we wrote them down. The ones that matter in practice:
- `PJRT_DEVICE=TPU`: the launch signal for `torch_xla`'s TPU path. Must be set before `import torch_xla`. Absence silently falls back to CPU.
- `JAX_PLATFORMS`: `cpu` for numerical reference tests; unset (or `tpu`) for the rare JAX-on-TPU helpers. Must be set before `import jax`.
- `XLA_NO_SPECIAL_SCALARS=1`: part of the TPU run contract. Under SPMD, not setting it produced subtle wrong-answer regressions on scalar broadcasts that only showed up as higher loss.
- `XLA_COMPILATION_CACHE_DIR`: the authoritative early-init knob for the persistent XLA cache in `scripts/base_train.py`. Default is `/data/.xla_cache` if `/data` exists, else `~/.cache/xla_compilation`. There is also a `--xla_cache_dir` CLI flag, but it does not override the first early cache init; if you care where the cache lives, set the env var.
- `NANOCHAT_NO_XLA_CACHE=1`: disables the persistent cache entirely.
- `LIBTPU_INIT_ARGS`: the catch-all for libtpu-level flags. We now prefer `--xla_flags_extra` for one-off overrides rather than copying a full `LIBTPU_INIT_ARGS` blob between run scripts, because the libtpu flag vocabulary changes between versions (see below).
- `TPU_LOG_DIR` / `TPU_STDERR_LOG_LEVEL`: off by default; bumped during device-init debugging. The log noise is high enough that you do not want it on by default on a training host.
Order matters. PJRT_DEVICE, JAX_PLATFORMS, and
XLA_COMPILATION_CACHE_DIR must be set before the first import of the
corresponding framework. In scripts/base_train.py the XLA cache init
happens very early in main() precisely because a later init does not
override the first one.
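The cache-dir precedence described above, as a sketch (the helper name is ours; the real logic is inlined early in `main()` in `scripts/base_train.py`):

```python
import os

def resolve_xla_cache_dir():
    """Mirror the documented precedence: env var, then /data, then ~/.cache."""
    env = os.environ.get("XLA_COMPILATION_CACHE_DIR")
    if env:
        return env  # env var wins; the CLI flag cannot override first init
    if os.path.isdir("/data"):
        return "/data/.xla_cache"
    return os.path.expanduser("~/.cache/xla_compilation")
```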
## The libtpu flag vocabulary drift
Between libtpu 0.0.36 and 0.0.37.dev20260224, a flag we relied on
quietly changed:
- `--xla_disable_hlo_passes=...`: valid on 0.0.36, unknown flag on 0.0.37.
- The replacement on 0.0.37 is `--xla_jf_vmem_memory_space_assignment=false`.
A run script that carried the old flag fails closed on 0.0.37 with
"unknown flag" and the run does not start. That is, fortunately, loud -
the failure is explicit at launch rather than silently ignored. The
contract we moved to: log the libtpu version in every run manifest, and
let nanochat/xla_flags.py pick the flag name based on the detected
libtpu version rather than hard-coding either one.
The broader lesson is that the LIBTPU_INIT_ARGS flag surface is not
stable across nightlies. Treat it as internal to each libtpu release, not
as an API.
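A sketch of that version switch (function name and the pass-list argument are illustrative, not the real `nanochat/xla_flags.py` code; the 0.0.36 pass list itself is elided here as it is in the run scripts):

```python
def msa_disable_flag(libtpu_version: str, hlo_passes: str) -> str:
    """Pick the MSA-workaround flag spelling for the detected libtpu version."""
    # "0.0.37.dev20260224" -> (0, 0, 37); the ".devNNN" suffix is dropped by [:3].
    parts = tuple(int(p) for p in libtpu_version.split(".")[:3])
    if parts >= (0, 0, 37):
        return "--xla_jf_vmem_memory_space_assignment=false"
    return f"--xla_disable_hlo_passes={hlo_passes}"
```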
## MSA and the NaN hunt
The most unpleasant libtpu interaction we hit was not a race; it was a
numerical regression. In early March, every NAM52 bare FSDP run on
torch 2.11 + libtpu 0.37 produced NaN at step 1. So did the same config
on torch 2.9 + libtpu 0.37. So did the same config with FSDP disabled.
So did the same config with the "disable MSA" flag removed.
The bisect matrix looked like this:
| Lane | torch | libtpu | Config | Result |
|---|---|---|---|---|
| T1 | 2.11 | 0.37 | NAM52 FSDP | NaN at step 1 |
| T2b | 2.9 | 0.37 | NAM52 FSDP | NaN at step 1 |
| T6 | 2.11 | 0.37 | NAM52 no-FSDP | NaN at step 1 |
| T7 | 2.11 | 0.37 | NAM52 FSDP, no MSA-disable | NaN at step 1 |
| T9 | 2.11 | 0.37 | d=12 FSDP + MSA-disable | libtpu 0.37 unknown flag |
| T11 | 2.9 | 0.36 | NAM52 FSDP | Compiling (no NaN) |
What T11 told us: the NaN was not exclusively a libtpu 0.37 regression,
but 0.36 was the only version where we could run the known-good MSA
workaround at all. The 0.37 flag rename
(--xla_disable_hlo_passes -> --xla_jf_vmem_memory_space_assignment=false)
meant a script mechanically ported from 0.36 could not even express the
workaround. We ended up with a rule: bare NAM52 FSDP trains on 0.36 until
0.37 has an equivalent receipt, and MoE configs can ride on 0.37
because they don't hit the same memory-space-assignment pass in the same
way.
That is not a satisfying answer. It is the honest one. libtpu nightlies
change HLO-pass behaviour between builds, and bisecting the combined
surface of torch / torch_xla / libtpu / model config is slow.
## Mode A, MoE, and the libtpu SIGKILL
Two TPU compile modes, as documented elsewhere: Mode B (per-micro-step
compile around fwd+bwd) and Mode A (whole-step compile through
torch_xla.compile()). On dense models libtpu 0.0.37 enables Mode A,
and it is 30-56% faster than Mode B. On MoE models Mode A consistently
SIGKILLs during compilation:
- Dense d12 Mode A: 482K tok/sec (works)
- Dense d12 + Mamba AAM Mode A: 280-634K tok/sec (works)
- NAM12 MoE(64) Mode A: SIGKILL during compile
- NAM12 MoE(8) Mode A: SIGKILL during compile
The crash is internal to libtpu on the MoE routing sub-graph
(gather/scatter/top-k). torch_xla hands the HLO down; libtpu dies.
From the Python side there is no useful traceback; the process is
terminated by the kernel with SIGKILL.
The operational contract we wrote: Mode A for dense / Mamba-only, Mode B for anything with MoE. If you need Mode A speed on MoE, you need to wait for a libtpu revision that does not kill on those routing graphs; there is no Python-side workaround.
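Encoded as a guard in the launch path, the contract is a few lines (the config field name `num_experts` is an assumption):

```python
def compile_mode(cfg) -> str:
    """Pick the TPU compile mode for a model config (field name assumed)."""
    if getattr(cfg, "num_experts", 0) > 1:
        return "B"  # MoE routing graphs SIGKILL libtpu 0.0.37 under Mode A
    return "A"      # dense / Mamba-only: whole-step compile, 30-56% faster
```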
## The XLA persistent cache is per-libtpu
One more interaction worth documenting. The patched torch_xla persistent
cache (covered in the companion PJRT post) stores XLAEXE executables on
disk. Those executables are produced by a specific libtpu build. Bumping
libtpu - even between 0.36 and 0.37 within the same torch_xla
build - should in principle invalidate the cache entry and force a
recompile.
In practice we have seen two cases worth knowing:
- libtpu 0.37 still returned `UNIMPLEMENTED` on cache deserialization against unpatched `torch_xla`. Our patch works against both libtpu versions, but the upstream path is broken across both.
- A libtpu bump while an old cache is on disk is not guaranteed to reject every stale entry. We have not seen a wrong-answer outcome from this, but we also do not rely on libtpu to garbage-collect our cache. Our discipline: when we bump libtpu on a host, we wipe `XLA_COMPILATION_CACHE_DIR` by hand and let the next run repopulate.
That is a small operational cost - a warm cache is a few GB and repopulates in one or two compile cycles - and it is much cheaper than debugging a stale-cache incident during a live training run.
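The manual wipe can be made mechanical with a version stamp inside the cache directory; a sketch (the stamp filename and helper are our invention, not part of torch_xla):

```python
import os
import shutil

def ensure_cache_matches_libtpu(cache_dir: str, libtpu_version: str):
    """Wipe the XLA persistent cache if it was produced by a different libtpu."""
    stamp = os.path.join(cache_dir, ".libtpu_version")
    if os.path.isdir(cache_dir):
        try:
            with open(stamp) as f:
                if f.read().strip() == libtpu_version:
                    return  # cache came from this libtpu build; keep it warm
        except FileNotFoundError:
            pass  # unstamped cache: treat as stale
        shutil.rmtree(cache_dir)  # stale or unknown provenance: start over
    os.makedirs(cache_dir, exist_ok=True)
    with open(stamp, "w") as f:
        f.write(libtpu_version)
```

Called once at launch, before the first framework import, this turns the "wipe by hand" rule into something nobody can forget under time pressure.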
## Debug patterns that paid off
Three patterns actually moved debugging forward when the runtime was opaque.
**`PT_XLA_DEBUG=1` on the first steps, then off.** Leaving it on destroys
throughput and floods the log. Running it for the first 10 steps of an
ablation, captured into a dedicated log, was enough to see which ops
caused host-device syncs and which metadata columns were triggering new
trace variants. We then disabled it for the steady-state run.
**Explicit `_xla_runtime_memory_info("TPU:N")` on each rank.** Our pybind
patch (see the companion torch_xla post) lets a running training process
ask libtpu for HBM usage per physical chip. We wire this into `/memory`
and poll from the host. Without it, you get OOM crashes with no forensic
data about which chip ran out first or how close you were. With it,
auto-fit can narrow dbs proactively and the retry loop has signal to
act on.
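The poll loop itself is trivial once the entry point exists; a sketch that treats the per-chip result as a `(bytes_used, bytes_total)` pair, which is an assumption about the patched return shape:

```python
def hbm_watermark(memory_info_fn, num_chips=8, warn_frac=0.9):
    """Poll per-chip HBM usage and return chips above the warning watermark.

    memory_info_fn: callable taking "TPU:N" and returning (used, total) bytes
    (shape assumed from our pybind patch, not an upstream torch_xla API).
    """
    hot = []
    for n in range(num_chips):
        used, total = memory_info_fn(f"TPU:{n}")
        frac = used / total
        if frac >= warn_frac:
            hot.append((n, frac))  # candidate for auto-fit narrowing / alerting
    return hot
```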
**Provenance JSON per run.** Every training launch writes a provenance record
with hostname, python, torch, torch_xla, jax, and libtpu
versions plus the set of backend modes exercised (xla_flash, splash,
softcap on/off). A tiny example from a March validate run:
```json
{
  "provenance": {
    "python": "3.13.12",
    "torch": "2.9.0a0+git21fec65",
    "torch_xla": "2.9.0+gitc04e61c",
    "jax": "0.9.0"
  },
  "tests": [
    { "mode": "xla_flash", "backend_used": "xla_flash_pallas", "status": "success" },
    { "mode": "splash", "backend_used": "xla_splash_via_trace_pallas", "status": "success" }
  ]
}
```
Three months of those records is what lets us say "this regression is new" rather than "this regression is vibes". It is also what lets us correlate a failure on one v6e-8 host with the same stack on another, which turns out to matter when four machines in a four-machine fleet each run a slightly different torch build.
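Gathering the version half of that record takes a few lines of stdlib; a sketch (looking `libtpu` up by its wheel name is an assumption, and missing packages degrade to `null` rather than crashing the launch):

```python
import json
import platform
from importlib import metadata

def provenance_record():
    """Collect the version fields of a run-provenance record."""
    def ver(pkg):
        try:
            return metadata.version(pkg)
        except metadata.PackageNotFoundError:
            return None  # absent on this host: record the absence, don't crash
    return {
        "provenance": {
            "hostname": platform.node(),
            "python": platform.python_version(),
            "torch": ver("torch"),
            "torch_xla": ver("torch_xla"),
            "jax": ver("jax"),
            "libtpu": ver("libtpu"),
        }
    }

if __name__ == "__main__":
    print(json.dumps(provenance_record(), indent=2))
```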
## Fleet discipline, not per-host tweaks
The eventual operational rule is dull and non-negotiable.
One pinned Python, one pinned torch, one pinned torch_xla, one pinned
libtpu, one pinned jax, per host. Recorded in a provenance JSON. Logged
in every training manifest. JAX_PLATFORMS=cpu unless a test explicitly
needs TPU JAX. PJRT_DEVICE=TPU and XLA_NO_SPECIAL_SCALARS=1 set in
the training environment before the first framework import. Persistent
cache wiped on libtpu bump. Mode A only on dense; Mode B on MoE.
Every time we relaxed one of those rules under time pressure, the cost
was higher than doing it right would have been. The interactions between
jax, torch_xla, and libtpu on a single TPU host are not
well-documented because most users run exactly one of the three. If you
run all three, the contract is yours to enforce.
## References
- TPU_SETUP.md
- CURRENT_STATE.md
- BACKEND_STOPLIGHT_MATRIX.md
- GCLOUD_AUTH_REFRESH.md
- TPU_BACKUPS.md
- CHANGELOG.md
- persistent_cache_fix.diff (README)
- xla_runtime_memory_info.diff (README)
- tpu_backend_provenance_v6e8_2026-03-16.json
- dist_optimizer_stress_tpu_v6e_2026-03-22.md
- review_gcp_tpu.md
- training_review.md