To Tool or Not To Tool?
When can AI “just do it,” and when do you need tooling to help with software development?
At CodeYam, we are building tools to help you understand what code does by generating simulations. A simulation, by our definition, is built from high-quality mock data designed to change the execution flow of your code, so you can see the results of a given function or method across a range of data or user scenarios.
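To make the definition concrete, here is a minimal sketch of the idea, using a hypothetical `shipping_cost` function and invented mock data (not CodeYam's actual implementation): each scenario is chosen to drive the function down a different execution path, and the results are collected for display.

```python
# Hypothetical sketch of a simulation: run one function across several
# mock-data scenarios and record how its result changes per scenario.

def shipping_cost(order):
    """Example function whose execution path depends on the input data."""
    if order["total"] >= 100:
        return 0.0   # free-shipping branch
    if order["express"]:
        return 25.0  # express branch
    return 5.0       # standard branch

# Mock scenarios designed to exercise each execution path.
scenarios = {
    "large order":    {"total": 150, "express": False},
    "express order":  {"total": 40,  "express": True},
    "standard order": {"total": 40,  "express": False},
}

results = {name: shipping_cost(data) for name, data in scenarios.items()}
# results maps each scenario name to the outcome of that execution path
```

The value for a reader is the mapping itself: one glance shows how the function behaves across the whole range of scenarios, without reading the branches.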
In creating this tooling, our goal is to help AI agents and human developers (and their less technical colleagues) collaborate more effectively to build software. As it happens, this is valuable for software development teams whether or not AI is involved. AI amplifies the communication challenges that already exist among human colleagues, and when AI is used in software development, the speed of change and the potential for miscommunication increase significantly. This makes tools such as CodeYam even more useful.
Why Can’t AI Just Do Everything Itself?
Given this, we are often asked why an AI can’t just do this all itself. If a powerful enough AI existed, why couldn’t it simulate the code it is writing and display it to the user?
We’ll put aside the other aspects of CodeYam that help with storing, organizing, and leveraging these artifacts for sharing, documentation, and testing and focus on AI tool use.
That is the primary question we want to explore here: should an AI use tools and, if so, when?
We think the answer to this is simply yes, AI should use tools. The primary logic is simple: intelligent beings use tools, so why wouldn’t an AI? In fact, as intelligence increases so does tool use. The more intelligent you are, the more often you use tools, and the more complex those tools can be.
A Framework for Thinking About Tool Use
There is more nuance to this question, though. It parallels the concept of System 1 and System 2 thinking popularized by Daniel Kahneman in Thinking, Fast and Slow. System 1 thinking is fast, automatic, and intuitive; System 2 thinking is slow, deliberate, and effortful.
To these, you might add a System 3: thinking that an intelligence cannot do on its own, but that instead requires the help of a tool to accomplish.
This three-system model is a useful framework for reasoning about tool usage for AI, regardless of how intelligent AI becomes.
System 1, System 2, and System 3 Tasks
Essentially, there are certain tasks that an intelligence can do almost without thinking. If you are fluent in English and reading this, it takes very little effort to do so. Even more obviously, if someone talks to you in your native language, you can’t stop yourself from understanding them. Similarly, if I were to ask you what 2 + 2 is, the answer arrives immediately with no real effort.
At the next level are tasks that an intelligence can do on its own, but that take effort, go more slowly, and are more error-prone. For humans, examples include trying to understand something in a language you’ve studied but are not fluent in, or calculating 7 * 17. It takes time and effort, and you’re more likely to get it wrong, but you can do it without the help of a tool.
Finally, there are tasks for which we need tools: understanding a language you’ve never studied that is not similar to any language you already know, or calculating the square root of 32,317. Unless you’re a math prodigy, you most likely can’t do this on your own. Even if it isn’t strictly impossible, the required effort and likelihood of error are high enough to make the attempt not worthwhile, especially when a tool (e.g. a calculator) can complete the task quickly and with a low likelihood of error.
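The arithmetic examples above illustrate the dividing line. With a tool, both calculations are instant and error-free; without one, the first is merely slow and the second is out of reach for most people:

```python
import math

# System 2 for a person: slow but doable by hand.
product = 7 * 17

# System 3 for most people: impractical without a tool such as a calculator.
root = math.sqrt(32317)
```

The point is not the answers themselves, but that the tool collapses both tasks into System 1 territory.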
Applying This Framework to AI
Now, the threshold for each of these levels changes depending on the intelligence in question. This becomes the primary question for AI: How intelligent can it become and, for any given level of intelligence, what tasks should it leverage tools for?
More specifically, for an AI, what tasks:
Are easy enough that the AI should do them directly (using a tool would be wasteful)?
Would benefit from tools to speed the AI up, allow it to save energy, and lower the risk of error?
Are so hard that a tool is required (these would not be achievable without one)?
At any level of intelligence, there are tasks in all three categories. Maybe there is some God-tier AI so capable that all tasks reside in the first bucket, but given the sheer complexity of the universe, that seems extremely unlikely. While AI will continue to advance, we’re nowhere near that level of omnipotence today. And even if it were achieved, tools would likely still offer opportunities to improve efficiency and performance.
For the near future, we’re left wondering what tasks will fit into which categories for a given AI.
Why Software Simulation Requires Tooling Today
For CodeYam specifically, our testing has shown that for most current AI models the software development tasks we are asking the AI to do fall into the third category: they cannot be done without help from tooling.
The task is to understand how a complex method or function behaves by passing data through it. To do this, we need a very accurate understanding of the data structures across the entire dependency tree of that method or function.
This may not sound difficult, and we originally underestimated it. But code becomes complex very quickly: once you consider all of the variations of data that can pass through it, the task explodes in complexity.
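A hypothetical back-of-envelope calculation (the field names and variant counts are invented for illustration) shows how quickly the scenario space grows: with just five input fields and three interesting values per field (typical, edge case, missing), a single function already has hundreds of mock-data combinations.

```python
from itertools import product

# Hypothetical input fields, each with three interesting variants.
field_variants = {
    "name":    ["Ada", "", None],
    "age":     [30, 0, None],
    "email":   ["a@b.co", "not-an-email", None],
    "country": ["US", "ZZ", None],
    "plan":    ["free", "pro", None],
}

# Every combination of variants is a distinct mock-data scenario.
scenarios = [dict(zip(field_variants, combo))
             for combo in product(*field_variants.values())]

# 3 variants across 5 fields -> 3**5 = 243 scenarios for one function.
```

Real functions have more fields, nested structures, and deeper dependency trees, so the exhaustive space grows far faster than this toy example suggests.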
As a side note, we did consider using database schemas and type systems for this information. They are helpful, but they describe types in their full structure and do not handle recursion well. Most functions and methods use only a fraction of the structure described in a schema or type system, and neither tells you how many layers of a given data structure are required to properly simulate the code with mock data. In the end, extensive static code analysis was necessary to compute the most accurate and relevant data structures.
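A small sketch of the gap between declared types and actual usage (the `User` type and `greeting` function are hypothetical, not CodeYam's code): the type system describes the full structure, but this particular function only ever reads one field, which is what static analysis of the body reveals and what a simulation actually needs to mock.

```python
from typing import List, TypedDict

class Address(TypedDict):
    street: str
    city: str
    country: str

class User(TypedDict):
    # The declared type describes the full structure...
    id: int
    name: str
    email: str
    addresses: List[Address]

def greeting(user: User) -> str:
    # ...but static analysis of this body shows only `name` is read,
    # so mocking a complete User (addresses and all) is wasted effort.
    return f"Hello, {user['name']}!"

# At runtime, a mock containing only the fields the function touches works:
message = greeting({"name": "Ada"})  # type checkers would flag this, Python runs it
```

This is why the schema alone overstates what a simulation needs: the relevant structure is the slice the function actually traverses, not the full declared type.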
Tool Use as a Measure of Intelligence
All of the AI models and agents we’ve tested fail completely at this task and therefore require a tool to accomplish it at all. There’s a reasonable chance that, in the near future, an AI will emerge for which this task sits in category two (possible without tooling, but faster and better with it) rather than category three (impossible without tooling). Given the complexity of code, though, we believe we are far from an AI that can do this task so easily and effectively that using a tool actually slows it down.
It’s worth asking this question for any task an AI is asked to perform, rather than assuming the task will be easy enough for the AI to achieve on its own.
Just because an AI can perform a task doesn’t mean it shouldn’t use a tool to perform that task more quickly and effectively. Ignoring helpful tools is a sign of lesser intelligence, not greater intelligence.
Thanks for reading. If you’re a developer interested in software simulation, join the CodeYam waitlist. You can also subscribe for future posts.