UK-based startup Cosine has unveiled Genie, an AI model billed as the "world's best AI software engineer." The announcement follows the company's $2.5 million seed funding round, led by SOMA and Uphonest Capital, with additional support from Lakestar and Focal.
Genie scored 30.08% on SWE-Bench, an industry-standard benchmark for evaluating AI models' software engineering skills. That is well ahead of the previous best of 19.27%, held by Factory's Code Droid, and far above other well-known AI models such as OpenAI's GPT-4, which scored just 1.31% on the same benchmark.
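To put those figures in perspective, the short Python sketch below works out the gap between the reported scores, both in percentage points and as a ratio. The three scores are the ones quoted above; the helper names `point_gap` and `ratio` are illustrative and not part of any SWE-Bench tooling.

```python
# SWE-Bench scores (in percent) as reported in this article.
scores = {
    "Genie": 30.08,
    "Factory Code Droid": 19.27,
    "GPT-4": 1.31,
}


def point_gap(a: str, b: str) -> float:
    """Difference between two scores, in percentage points."""
    return round(scores[a] - scores[b], 2)


def ratio(a: str, b: str) -> float:
    """How many times larger score `a` is than score `b`."""
    return round(scores[a] / scores[b], 2)


print(point_gap("Genie", "Factory Code Droid"))  # 10.81
print(ratio("Genie", "Factory Code Droid"))      # 1.56
print(ratio("Genie", "GPT-4"))                   # 22.96
```

On these numbers, Genie leads the previous best by about 10.8 percentage points, or roughly 1.56 times its score, and scores about 23 times higher than GPT-4.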
Cosine's approach to developing Genie focuses on emulating human reasoning in software engineering. Unlike other AI models that rely on prompting base models, Genie has been trained on a proprietary dataset that codifies human problem-solving processes. This dataset is derived from real-world examples of software engineers at work, allowing Genie to tackle problems like a human engineer, rather than generating random code until something works.
The training process involves a data pipeline that combines artifacts, static analysis, self-play, step-by-step verification, and fine-tuned AI models trained on a large amount of labeled data. According to the company, this approach lets Genie fix bugs, build features, refactor code, and perform a wide range of other coding tasks, either autonomously or in collaboration with human developers.
Cosine's vision is rooted in the belief that by codifying human reasoning, AI models can be trained to perform complex tasks across various domains, transforming how software is developed and how developers work. The founders first recognized the potential of large language models to imitate human software developers in early 2022 and have been working toward that goal ever since.
The implications of Genie's capabilities could be far-reaching. If it can perform end-to-end programming tasks autonomously and with a high degree of reliability, it has the potential to reshape software development so that engineering capacity is no longer a constraint for tech teams.
Cosine plans to expand Genie's capabilities to cover more programming languages and frameworks, exploring both smaller models for simpler tasks and larger models for complex challenges. This expansion is part of the company's broader strategy to create a family of models that can be ported to any state-of-the-art foundation model, allowing Cosine to build on the most capable base model available at any given time.
Despite Genie's impressive performance, there are still challenges to overcome. SWE-Bench has recently changed its submission requirements and now asks for the full working process of AI models in addition to the final results. This poses a significant challenge for Cosine, as publicly sharing this information would essentially open-source its approach, undermining the competitive advantage the company has worked hard to develop.
Cosine's Genie represents a significant step forward in AI software engineering. With its ability to emulate human reasoning and perform complex coding tasks autonomously, Genie offers a glimpse of a future in which AI and human developers work together to solve the most challenging problems.