Over the last year, we have seen an explosion of interest in AI agents. Unlike copilots, AI agents are software, usually built on LLMs, that promise to take on tasks fully autonomously and open up new possibilities for productivity. At Pioneer Square Labs (PSL), we have been at the forefront of exploring these shifts, ideating, validating, and creating startups that leverage cutting-edge AI technologies and approaches ("PSL and Mayfield Form AI Co-Investing Partnership", "Not Just Another Coding Bot").[1,2] As a startup studio, our mission is to build the next generation of innovative companies with world-class founders, and at this point in time, all of that work centers on AI capabilities.
In my role as a Venture Lead at PSL, I am frequently asked about the best applications for AI agents and where to build. While that question doesn't live independently of the need finding, competitive landscape, and market sizing work required to evaluate any idea, AI agents represent something a little different from traditional SaaS. As Pete Flint and Anna Pinol, both investors at NFX, point out in their blog post "The AI Workforce is Here: The Rise of a New Labor Market", if software can both organize and execute tasks, the $230B B2B SaaS market and the $5.5T knowledge-work labor market will begin fusing into one massive market.[3]
Certainly the typical cost-quality-speed tradeoffs apply here, as with any service. However, when the quality is suitable, even with many layers of internal "thinking" or meta-cognitive strategies under the hood powered by the most expensive and capable LLMs, the cost and speed are still typically an order of magnitude or more better than the equivalent work done by people.
So, based on their capabilities, where can the quality of outputs from AI agents provide leverage today, and what future applications hold the most promise?
When considering the best applications for autonomous AI agents, it is essential to understand the concept of fault tolerance. Just like it sounds, high-fault-tolerance tasks are those where minor errors or imperfections are acceptable, such as basic content generation or rough preliminary data analysis. In contrast, low-fault-tolerance tasks require high accuracy and precision, such as medical diagnostics or financial auditing, and often touch on life safety.
While this categorization is a start, it does not capture a very important consideration: the perceived effort to complete a task or job.
In our line of work, we are always looking to differentiate between painkillers and vitamins, to identify when we are seeing real "product pull" versus when a solution feels like it is being pushed. While some customers are cold, hard financial buyers focused primarily on ROI and financial performance, many more are a mix of rational and emotional buyers who care about how solutions make them feel, and in most cases large premiums can be commanded as a result.
So what are some reasons why the perceived effort for a task might be high?
Understanding perceived effort involves investigating the cognitive processes engaged during various types of tasks. Task execution is typically dominated by one of two well-established brain networks: the Default Mode Network (DMN) or the Executive Control Network (ECN).[4]
The DMN is more strongly associated with spontaneous, intuitive thinking and creativity. During tasks such as idea generation and brainstorming, the DMN is predominantly active. These activities involve free-flowing, associative processes that can feel effortless to those with well-honed creative skills.[5] However, even within these generative tasks, some level of subconscious evaluation occurs, integrating aspects of the ECN. For example, a marketing professional developing a new campaign concept might engage the DMN extensively. Furthermore, since this is likely a major responsibility of their role, they have probably developed significant skill at it over the years.
Conversely, the ECN is responsible for deliberate, analytical thinking and is crucial for tasks requiring structured problem-solving and critical evaluation.[6] In the context of software engineering, debugging a complex piece of code heavily engages the ECN. This task demands rigorous analysis and systematic troubleshooting, which can be cognitively demanding, particularly if the engineer is tackling unfamiliar or intricate codebases.
Context switching, the cognitive process of shifting attention between different tasks or mental frameworks, adds another layer of complexity.[7] When individuals generate and evaluate their own ideas, less context loading is needed, as they are already familiar with the initial thought process. This reduces cognitive load and allows for a more fluid transition between modes. In contrast, evaluating work generated by others requires significant context loading to understand the original intent and framework, deeply engaging the ECN from the outset and increasing cognitive load.
What does this mean for AI agent applications?
Because of my engineering background, I have developed the habit of thinking in figures of merit and rough proportionality relationships when I need to understand the dynamics and features at play across large parameter spaces. I find this approach an outstanding way to surface insights that might otherwise be unexpected or counterintuitive.
Regarding our AI agent application space, I propose a figure of merit called Perceived Leverage. Perceived Leverage (PL) is defined as the practical advantage gained by deploying AI agents to perform a task traditionally executed by humans, balancing time efficiency with the perceived effort required to correct AI errors. The equation for Perceived Leverage can be written as follows:
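$$ PL = \frac{T_{person}}{T_{AI} + T_{person} \cdot \frac{D_{EC}}{D_{task}} \cdot \log_{10}\!\left(\frac{Err_{AI}}{Err_{req}}\right)} $$

(One consistent formulation: the denominator adds the AI's runtime to a perceived correction burden that scales with the person's time, the relative difficulty of correction, and the logarithm of the error-rate gap. The base-10 logarithm is a modeling assumption, not a fixed constant.)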
To understand Perceived Leverage, let's break down each component of the equation and see how they contribute to the overall measure.
Tperson represents the time required by an individual to complete the task. This is the baseline against which we measure the efficiency of AI. If a task takes an individual a long time to complete, the potential for AI to provide leverage is greater.
TAI is the time required by AI to complete the task (in the same units as Tperson). This variable highlights the speed at which AI can perform compared to individuals. A lower TAI indicates quicker performance by the AI, which increases the leverage.
Dtask denotes the perceived difficulty of the original task on a scale from 1 to 10, with 1 representing very easy tasks and 10 representing extremely difficult tasks. This factor captures how challenging the task is from the perspective of the specific individual. A high Dtask suggests a complex task, perhaps further outside the individual's core competencies, where AI can offer significant benefits if it can complete the task effectively.
DEC stands for the perceived difficulty of troubleshooting or correcting errors introduced by an AI agent, also on a scale from 1 to 10. For individuals who excel at tasks like troubleshooting, which tend to be more ECN-dominated, DEC may be lower. Conversely, if the task is more generative, engaging the DMN, and that is the individual's strength, Dtask might be lower. When DEC is high, fixing AI errors is challenging, which puts a large negative drag on the perceived leverage.
ErrAI refers to the AI error rate. It measures the frequency of errors made by the AI while performing the task. A high ErrAI means more errors, leading to increased correction effort.
Errreq is the required (acceptable) error rate. This sets the standard for acceptable performance. If the acceptable error rate is very low, it means there are higher standards for accuracy, making it more difficult for the AI to meet these requirements.
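To make this concrete, here is a minimal sketch of the formulation above in Python (the function name and the base-10 logarithm are illustrative choices, not a fixed standard):

```python
import math

def perceived_leverage(t_person, t_ai, d_task, d_ec, err_ai, err_req):
    """Perceived Leverage (PL) under the formulation sketched above.

    t_person: time for a person to complete the task
    t_ai:     time for the AI agent to complete it (same units)
    d_task:   perceived difficulty of the original task (1-10)
    d_ec:     perceived difficulty of correcting AI errors (1-10)
    err_ai:   AI error rate (e.g., 0.02 for 2%)
    err_req:  required/acceptable error rate (e.g., 0.01 for 1%)
    """
    aedr = err_ai / err_req
    if aedr < 1:
        # The figure of merit is only meant to hold when the AI's error
        # rate is at or above the required rate (AEDR >= 1).
        raise ValueError("model assumes err_ai >= err_req")
    # Perceived correction burden: scales with the person's time, the
    # relative difficulty of correction, and the log of the error gap.
    correction = t_person * (d_ec / d_task) * math.log10(aedr)
    return t_person / (t_ai + correction)
```

A useful sanity check: when the AI exactly meets the required error rate (AEDR = 1), the correction term vanishes and PL collapses to the raw time ratio, matching the intuition that output needing no correction delivers the full speed advantage.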
The Perceived Leverage formula offers a practical and nuanced way to evaluate the benefits of AI agents. Here’s why it’s a useful tool for understanding AI’s impact.
First, Perceived Leverage provides a comprehensive evaluation by considering the perceived difficulty of tasks. This includes both the original task and the challenge of error correction, ensuring a more accurate measure of AI's impact. Additionally, AI can transform creative (DMN) tasks into analytical (ECN) tasks. For those skilled in troubleshooting, this shift aligns tasks with their strengths, making complex creative tasks more manageable and efficient.
The formula also highlights the importance of error rates in assessing AI’s value. A significant gap between AI performance and the required error rate can greatly increase correction efforts. The logarithmic function is meant to capture how small increases in AI errors can significantly impact correction time. This effect is even more pronounced when the required tolerance is very low, making the system highly sensitive to deviations from the acceptable error rate. This heightened sensitivity helps identify when AI is genuinely beneficial versus when it might be less efficient than manual work.
Moreover, Perceived Leverage balances AI’s speed with the effort required for error correction. This balance ensures that the benefits of AI are realized only when errors are manageable, providing a more realistic view of AI’s efficiency.
By integrating these factors, Perceived Leverage offers a detailed assessment of AI's potential. The reason I like it is that it helps reveal not just where AI can save time, but also where it might introduce additional work. This insight can guide you toward more effective and efficient use of AI agents in your workflow.
To illustrate how Perceived Leverage can clarify the real benefits and limitations of AI, let's consider two scenarios. These examples will demonstrate how an application that initially seems promising can turn out to be less beneficial, and vice versa, helping to explain the widely divergent perspectives on AI's hype.
At first glance, AI in legal document review seems highly promising due to its potential to drastically reduce the time lawyers spend on this labor-intensive task. However, when we apply the Perceived Leverage formula, a different picture emerges.
Time Durations: A lawyer might take 240 minutes (4 hours) to review a set of legal documents, while an AI agent could do it in 2 minutes.
Task Difficulties: The perceived difficulty of the document review task for a lawyer is moderate, rated at 4 out of 10, given this is part of their core expertise. However, the difficulty of correcting errors that are made in a review completed by an AI agent is very high, rated at 9 out of 10, as they will still need to understand the context thoroughly to ensure no errors are missed.
Error Rates: The required error rate for legal documents is extremely low, at 0.01% (0.0001), while the AI error rate might be 0.1% (0.001).
Plugging these values into the formula:
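$$ PL = \frac{240}{2 + 240 \cdot \frac{9}{4} \cdot \log_{10}\!\left(\frac{0.001}{0.0001}\right)} = \frac{240}{2 + 540} \approx 0.44 $$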
Despite AI's speed, the moderate task difficulty for a lawyer and the high difficulty of error correction, combined with the gap between AI's good but insufficient accuracy and the extremely low required error rate, results in a Perceived Leverage of less than 1. This example shows that while AI appears promising, the effort needed to correct its errors can outweigh the benefits, highlighting why some professionals are skeptical about AI’s effectiveness in complex tasks like legal review.
Let’s look at AI for generating marketing content, which might not initially seem as impactful as legal document review. We'll consider two variations to highlight different perspectives on AI’s leverage.
Variation A: Precise Emulation of Writing Voice
In this scenario, the user needs AI to closely match their writing style and is very aware when it doesn’t, leading to a very low acceptable error rate.
Time Durations: A writer might take 120 minutes to create a draft, while AI can do it in 1 minute.
Task Difficulties: For the writer, drafting in their own voice is relatively straightforward, rated at 2 out of 10. However, since they aren't a professional editor and don't enjoy editing, identifying and correcting AI-generated content that doesn't match their voice is much harder, rated at 7 out of 10.
Error Rates: The acceptable error rate for matching the voice is very low, at 0.5% (0.005), while the AI error rate is 2% (0.02).
Using these values we get:
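$$ PL = \frac{120}{1 + 120 \cdot \frac{7}{2} \cdot \log_{10}(4)} \approx \frac{120}{1 + 252.9} \approx 0.47 $$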
Here, despite the AI's speed, the low difficulty for the writer of drafting in their own voice, the much higher difficulty of correcting AI-generated content to precisely match that voice, and the discrepancy between the AI's error rate and the very low acceptable error rate together result in a Perceived Leverage of less than 1. This highlights why professional writers might be critical of AI, as the effort required to refine AI output can outweigh its speed advantage, diminishing its overall benefit for these types of tasks.
Variation B: Technical Writing for Engineers
In contrast, let's consider a scenario where an engineer uses AI for technical documentation. Unlike the writer focused on stylistic precision, the engineer is less concerned with the nuances of tone and more with conveying the technical information accurately and efficiently.
Time Durations: An engineer might take 180 minutes to draft technical documentation, while AI can do it in 1 minute.
Task Difficulties: The perceived difficulty of technical writing for the engineer is high, rated at 7 out of 10. It is something they dislike doing. However, since the engineer enjoys troubleshooting, the difficulty of correcting AI-generated technical content is lower, rated at 4 out of 10.
Error Rates: The acceptable error rate for technical writing is higher, at 1% (0.01), while the AI error rate remains the same as the previous example at 2% (0.02).
Plugging these values into the formula:
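$$ PL = \frac{180}{1 + 180 \cdot \frac{4}{7} \cdot \log_{10}(2)} \approx \frac{180}{1 + 31.0} \approx 5.6 $$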
In this variation, the AI application demonstrates high perceived leverage. The lower difficulty of error correction, coupled with the higher acceptable error rate, makes AI in technical writing a valuable tool. This is especially true for engineers who prefer troubleshooting over writing, as it allows them to focus on their strengths and leverage AI's speed and efficiency.
These examples for AI in content generation help to illustrate how the same AI error rate can have vastly different impacts depending on the application and the user's specific needs.
To make the concept of Perceived Leverage more accessible and easier to apply, we can consolidate the equation into a few logical components. This simplification not only helps in understanding the critical factors at play but also makes the framework more practical for evaluating AI agent applications. To support this effort, we introduce three key concepts: Naive Leverage (NL), Task Difficulty Ratio (TDR), and AI Error Disparity Ratio (AEDR).
Naive Leverage (NL) measures the time savings offered by an AI agent without taking any other considerations into account. It is calculated as the ratio of the time a person would take to complete the task to the time the AI agent takes:
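$$ NL = \frac{T_{person}}{T_{AI}} $$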
This ratio highlights how much faster an AI agent can perform the task compared to a person, setting aside every other factor. A higher Naive Leverage means greater potential time savings, which is the fundamental starting point for evaluating AI's leverage. If your application has low Naive Leverage out of the gate, it's a non-starter.
AI Error Disparity Ratio (AEDR) assesses the AI’s performance quality relative to the required accuracy. It is defined as the ratio of the AI's error rate to the acceptable error rate:
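$$ AEDR = \frac{Err_{AI}}{Err_{req}} $$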
A lower AEDR indicates that the AI's performance is closer to (or even better than) the required accuracy, which enhances its reliability and effectiveness. It should be noted that our Perceived Leverage figure of merit only holds when this value is equal to or greater than 1.
Finally, Task Difficulty Ratio (TDR) compares the perceived difficulty of error correction to the original task, calculated as:
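$$ TDR = \frac{D_{EC}}{D_{task}} $$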
A TDR greater than 1 indicates that error correction is more difficult than the original task, while a TDR less than 1 indicates that it is easier. Given our range for task difficulty, this ratio can span values from 0.1 to 10.
By combining these simplified components, we can express Perceived Leverage in a more concise and intuitive form:
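$$ PL = \frac{NL}{1 + NL \cdot TDR \cdot \log_{10}(AEDR)} $$

(This is algebraically equivalent to the full formulation above and carries the same base-10 log assumption; when AEDR = 1, PL reduces to NL.)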
This equation captures the essence of time savings, accuracy, and task difficulty, providing a clear measure of AI's practical advantage. To further illustrate how these factors interact and impact Perceived Leverage, let’s look at some visual representations.
Naive Leverage vs. AI Error Disparity Ratio (Colored by Perceived Leverage)
This graph helps users quickly assess the potential leverage of AI applications by plotting Naive Leverage against AI Error Disparity Ratio. Color coding represents different ranges of Perceived Leverage, allowing for easy identification of high-leverage areas in green and low-leverage areas in orange and red. For simplicity, we assume a typical Task Difficulty Ratio (TDR) of 1.
The above graph highlights how improvements in AI accuracy can significantly enhance Perceived Leverage, especially when Naive Leverage is already high. The Perceived Leverage value of 1.5 is colored neutral here to indicate that values below this level may not be worth the overhead involved in acquiring customers and getting them set up to use the solution. The dark green area of the graph indicates where we want to be, where Perceived Leverage begins to balloon.
The following graph illustrates how task difficulty impacts Perceived Leverage by plotting Naive Leverage against Task Difficulty Ratio. Color coding indicates different ranges of Perceived Leverage, providing a clear comparison of scenarios where error correction is easier (TDR < 1), the same (TDR = 1), or harder (TDR > 1) than the original task. For this graph, we assume a typical AI Error Disparity Ratio (AEDR) of 2. This plot emphasizes the importance of matching AI applications to tasks where error correction isn’t perceived as an excessive burden by the user.
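For readers who want to explore the parameter space themselves, here is a rough sketch of how a plot like the first one can be regenerated, assuming the simplified formulation above (matplotlib and the specific axis ranges and contour levels are illustrative choices):

```python
import numpy as np
import matplotlib.pyplot as plt

def pl(nl, tdr, aedr):
    # Simplified Perceived Leverage (base-10 log assumed; valid for AEDR >= 1)
    return nl / (1 + nl * tdr * np.log10(aedr))

nl = np.linspace(1, 200, 400)        # Naive Leverage axis
aedr = np.linspace(1.01, 20, 400)    # AI Error Disparity Ratio axis
NL, AEDR = np.meshgrid(nl, aedr)

fig, ax = plt.subplots()
levels = [0, 1, 1.5, 3, 10, 50, 200]
cs = ax.contourf(NL, AEDR, pl(NL, 1.0, AEDR), levels=levels,
                 cmap="RdYlGn", extend="max")  # TDR fixed at 1, as in the first graph
fig.colorbar(cs, label="Perceived Leverage")
ax.set_xlabel("Naive Leverage (NL)")
ax.set_ylabel("AI Error Disparity Ratio (AEDR)")
ax.set_title("NL vs. AEDR, colored by Perceived Leverage")
plt.show()
```

Swapping the AEDR axis for a TDR axis (and fixing AEDR at 2) reproduces the second graph in the same way.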
Future enhancements to the Perceived Leverage model could include accounting for task size and complexity, the synergies of human-AI collaboration, and dynamically improving error rates that reflect AI agents getting better over time with human feedback. Additionally, while the model is meant to be directionally accurate, experimental data would need to be collected to calibrate it for more precise outputs. This might involve adding a fitting term to the log relationship or modifying that relationship more significantly. These enhancements would make the model more robust and reflective of real-world applications.
However, as formulated, the Perceived Leverage model is designed to be a practical guide for evaluating the effectiveness of AI agent applications, focusing on time savings, accuracy, and task difficulty. While it has limitations, my aim was to create a lightweight framework that captures core dynamics without being overly complex.
The goal of the Perceived Leverage model is to offer a clear, quantifiable measure of AI's practical benefits, helping you cut through the hype to identify where AI agents can provide true, immediate value that people are willing to pay for today. By leveraging this model, you'll be empowered to pinpoint opportunities where AI can make a significant impact now and foresee future applications with promising benefits.
At Pioneer Square Labs, we are committed to supporting world-class founders who are eager to explore and harness the potential of AI. If you have a vision for transforming productivity with AI, we want to hear from you. Together, we can navigate the rapidly evolving AI landscape, unlock its full potential, and build the next generation of groundbreaking startups.
[1] Soper T. Seattle’s Pioneer Square Labs and Silicon Valley stalwart Mayfield form AI co-investing partnership. GeekWire. 2024. Available from: https://www.geekwire.com/2024/seattles-pioneer-square-labs-and-silicon-valley-stalwart-mayfield-form-ai-co-investing-partnership/
[2] Bishop T. Not just another coding bot: Pioneer Square Labs releases 'Jacob' AI agent as open-source project. GeekWire. 2024. Available from: https://www.geekwire.com/2024/not-just-another-coding-bot-pioneer-square-labs-releases-jacob-ai-agent-as-open-source-project/
[3] Flint P, Pinol A. The AI Workforce is Here: The Rise of a New Labor Market. NFX. 2024. Available from: https://www.nfx.com/post/ai-workforce-is-here
[4] Dixon ML, Andrews-Hanna JR, Spreng RN, Irving ZC, Mills C, Girn M, Christoff K. Interactions between the default mode network and dorsal attention network vary across default subsystems, time, and cognitive states. Cereb Cortex. 2016;26(4):1501-12. Available from: https://doi.org/10.1093/cercor/bhu198
[5] Raichle ME. The brain's default mode network. Annu Rev Neurosci. 2015;38:433-47. Available from: https://doi.org/10.1146/annurev-neuro-071013-014030
[6] Seeley WW, Menon V, Schatzberg AF, Keller J, Glover GH, Kenna H, et al. Dissociable intrinsic connectivity networks for salience processing and executive control. J Neurosci. 2007;27(9):2349-56. Available from: https://doi.org/10.1523/JNEUROSCI.5587-06.2007
[7] Monsell S. Task switching. Trends Cogn Sci. 2003;7(3):134-40. Available from: https://doi.org/10.1016/S1364-6613(03)00028-7
Pioneer Square Labs (PSL) is a Seattle-based startup studio and venture capital fund. We partner with exceptional founders to build the next generation of world-changing companies, combining innovative ideas, expert guidance, and investment capital. PSL operates through two primary arms: PSL Studio, which focuses on creating new startups from scratch, and PSL Ventures, which invests in early-stage companies. Our mission is to drive innovation and growth by providing the necessary resources and support to turn big ideas into successful, impactful businesses. If you have a groundbreaking vision, connect with us at hello@psl.com, and let's build something extraordinary.