Latest Signals
Could small custom-built or fine-tuned large language models be better suited to business use cases than the generic models from large vendors? Infosys' chairman seems to think so:
Indian technology grandee Nandan Nilekani expects companies around the world will increasingly build their own smaller-scale artificial intelligence models to streamline operations and boost productivity, dampening hope of a substantial enterprise payday for more powerful generative products.
The chair of IT services major Infosys told the Financial Times he was “not so sure” companies would want to shoulder the high costs and the potential “black box” of data and copyright liabilities associated with large language models behind popular applications, such as OpenAI’s ChatGPT.
There may be something in this, especially if the reasoning models don’t quite meet expectations.
Researchers are sharing encouraging early reports about o1-preview as an aid for tackling complex scientific challenges; one such account comes from Lawrence Livermore National Laboratory:
Researchers at the national lab have also been surprised by o1-preview’s ability to recognize when it doesn’t have all the necessary information to answer a question and make reasonable assumptions for variables it might be missing, the person said.
The Lawrence Livermore example is similar to the positive reaction Australian-American mathematician Terence Tao shared after the initial release of o1-preview and o1-mini. Tao used the models to solve math problems and write proofs—something that a typical ChatGPT user probably wouldn’t do.
“It may only take one or two further iterations of improved capability” until such a reasoning model becomes a “competent graduate student…at which point I could see this tool being of significant use in research level tasks,” he said.
This dilemma mirrors one potentially faced by junior lawyers: if AI handles graduate-level research tasks, how will the next generation of researchers develop their skills?
An example of a fine-tuned LLM getting good results versus a human:
LLMs trained on the vast scientific literature could potentially integrate noisy yet interrelated findings to forecast novel results better than human experts. Here, to evaluate this possibility, we created BrainBench, a forward-looking benchmark for predicting neuroscience results. We find that LLMs surpass experts in predicting experimental outcomes. BrainGPT, an LLM we tuned on the neuroscience literature, performed better yet. Like human experts, when LLMs indicated high confidence in their predictions, their responses were more likely to be correct, which presages a future where LLMs assist humans in making discoveries. Our approach is not neuroscience specific and is transferable to other knowledge-intensive endeavours.
They used LoRA to finetune Mistral-7B-v0.1 on neuroscience literature.
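To make the LoRA approach concrete, here is a minimal NumPy sketch of the core idea: the pretrained weight matrix stays frozen, and training only touches two small low-rank matrices whose product is added to it. The dimensions, scaling factor, and variable names below are illustrative, not taken from the BrainGPT paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# LoRA: keep the pretrained weight W (d x k) frozen; train only
# A (r x k) and B (d x r) with rank r << min(d, k). The effective
# weight is W + (alpha / r) * B @ A.
d, k, r, alpha = 64, 64, 8, 16

W = rng.standard_normal((d, k))          # frozen pretrained weight
A = rng.standard_normal((r, k)) * 0.01   # trainable, small random init
B = np.zeros((d, r))                     # trainable, zero init

def lora_forward(x, W, A, B, alpha, r):
    """Forward pass with the low-rank update applied to the frozen weight."""
    return x @ (W + (alpha / r) * B @ A).T

x = rng.standard_normal((2, k))

# With B initialised to zero, the adapted model starts out identical
# to the base model; fine-tuning then moves only A and B.
base = x @ W.T
adapted = lora_forward(x, W, A, B, alpha, r)
print(np.allclose(base, adapted))  # True

# Parameter savings: r * (d + k) trainable values instead of d * k.
print(r * (d + k), "trainable vs", d * k, "full")
```

This is why LoRA makes tuning a 7B-parameter model like Mistral-7B-v0.1 on domain literature tractable: only the small adapter matrices are trained and stored.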
Generative AI is increasingly able to perform entry-level work in white-collar jobs, which will impact workforces far into the future.
Across the white-collar economy, entry-level jobs are suddenly vulnerable to automation because they involve low-stakes assignments of the sort that generative AI is best at. AI could therefore sever the career ladder of industries like finance and law, forcing many would-be bankers and lawyers to look elsewhere for work.
Having used some of the more recent AI meeting tools as a meeting scribe, I can attest that they increasingly perform routine tasks with a good degree of accuracy. Compared to the early days, when we’d have a good laugh at the software’s attempts to summarise, it’s now a useful tool for taking notes and extracting action items.
Consider the legal field. Law is among the industries most exposed to generative AI’s capabilities because of its orientation toward language. Traditionally, the first few years of a newly accredited lawyer’s career are spent working under the tutelage of more senior lawyers on routine tasks—document review, basic research, drafting client communications, taking notes, and preparing briefs and other legal documents. Advances in AI-powered legal software have the potential to create vast efficiencies in these tasks, enabling their completion in a fraction of the time—and a fraction of the billable hours—that it has historically taken junior lawyers and paralegals to complete them.
If we don’t need to train up junior lawyers, how do we grow the legal workforce? Or do we need to rethink the role of a lawyer?
Stripe APIs are adding payments and metering capabilities to LLM agentic workflows:
If you want an agent to perform purchases:
Agentic workflows need not have exclusively virtual outcomes. Imagine a travel agent that can book flights for your company. Using LLMs and function calling we can assemble a set of agents that can search for flights online, return options, and ultimately identify a booking URL. With Stripe, you can embed financial services and enable the automation of the purchase flow as well. Using Stripe Issuing, you can generate single-use virtual cards that agents can use for business purchases. This enables your agents to spend funds. The Issuing APIs allow you to approve or decline authorizations programmatically, ensuring your purchase intent matches the authorization. Spending controls allow you to set budgets and limit spending for your agents.
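The programmatic approve/decline step might look something like the sketch below. This is not the real Stripe SDK or webhook payload—the field names (`amount`, `merchant_category`) and the allowed-category policy are assumptions for illustration—but it shows the shape of matching an authorization against the purchase intent and a budget.

```python
# Assumed policy: the agent is only meant to buy travel, within budget.
ALLOWED_CATEGORIES = {"airlines", "lodging", "car_rental"}

def approve_authorization(auth: dict, budget_remaining: int) -> bool:
    """Approve only if the purchase matches the stated intent:
    an allowed merchant category and within the remaining budget (cents)."""
    if auth["merchant_category"] not in ALLOWED_CATEGORIES:
        return False
    return auth["amount"] <= budget_remaining

# An agent booking a $450 flight against a $500 single-use card budget:
flight = {"amount": 45_000, "merchant_category": "airlines"}
# ...versus an agent that has wandered off to buy park tickets:
park_tickets = {"amount": 3_000, "merchant_category": "amusement_parks"}

print(approve_authorization(flight, budget_remaining=50_000))        # True
print(approve_authorization(park_tickets, budget_remaining=50_000))  # False
```

In a real integration this logic would run in your handler for Issuing authorization requests, returning the approve/decline decision to Stripe.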
Additionally, it can be used for metering and billing:
Conducting agentic workflows has a material cost – typically measured by token use or time. With usage-based billing, you can charge based on a customer’s usage of your product. The toolkit provides middleware to easily track prompt and completion token counts and send billing events for that customer.
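A minimal sketch of what such metering middleware does, assuming you accumulate prompt and completion token counts per customer and periodically forward the total as a billing event. The function names here (`track_tokens`, `flush`, `record_billing_event`) are hypothetical stand-ins, not the toolkit's actual API.

```python
from collections import defaultdict

usage = defaultdict(int)  # customer_id -> accumulated token count

def record_billing_event(customer_id: str, tokens: int) -> None:
    # Placeholder: in production this would call your billing provider's API.
    print(f"billing event: customer={customer_id} tokens={tokens}")

def track_tokens(customer_id: str, prompt_tokens: int, completion_tokens: int) -> None:
    """Middleware hook: accumulate token counts for one LLM call."""
    usage[customer_id] += prompt_tokens + completion_tokens

def flush(customer_id: str) -> int:
    """Send one billing event for the accumulated usage, then reset it."""
    tokens = usage.pop(customer_id, 0)
    if tokens:
        record_billing_event(customer_id, tokens)
    return tokens

# Two agent LLM calls for one customer, then a billing flush:
track_tokens("cus_123", prompt_tokens=820, completion_tokens=310)
track_tokens("cus_123", prompt_tokens=640, completion_tokens=95)
print(flush("cus_123"))  # 1865
```

The point of wrapping this as middleware is that every LLM call made by an agent is metered automatically, rather than relying on each agent implementation to remember to report its usage.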
The Issuing API sounds particularly useful for stopping an LLM agent from buying travel tickets to Yellowstone National Park, or worse. From the Claude announcement on computer use:
In one, Claude accidentally clicked to stop a long-running screen recording, causing all footage to be lost. In another, Claude suddenly took a break from our coding demo and began to peruse photos of Yellowstone National Park.