• Apple started with almost no Swift examples and achieved surprising results
  • StarChat-Beta improved by generating, filtering, and retraining on its own outputs
  • Nearly one million working SwiftUI programs emerged after repeated iterations

Apple researchers recently revealed an experiment in which an AI model was trained to generate user interface code in SwiftUI, even though almost no SwiftUI examples were present in its original training data.

The study began with StarChat-Beta, an open source model designed for coding. Its training sources, including The Stack and other collections, contained almost no Swift code.

This absence meant the model had no existing examples to guide its responses, which makes it all the more surprising that a stronger system eventually emerged.

Creating a loop of self-improvement

The team’s solution was to create a feedback cycle. They gave StarChat-Beta a set of interface descriptions and asked it to generate SwiftUI programs from those prompts.

Each generated program was compiled to ensure it actually ran. Interfaces that worked were then compared with the original descriptions using another model, GPT-4V, which judged whether the output matched the request.

Only programs that passed both checks remained in the dataset. The cycle was repeated five times, and with each round the cleaner dataset was used to fine-tune the next version of the model.
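To make the cycle concrete, here is a minimal sketch of that loop in Python. It is not code from the paper: the helper functions generate_program, compiles, matches_description, and fine_tune, along with the example prompts, are hypothetical placeholders standing in for the model call, the Swift compiler check, the GPT-4V comparison, and the retraining step.

# Minimal sketch of the generate -> compile -> judge -> retrain loop.
# All helpers below are placeholders, not Apple's actual pipeline code.

UI_DESCRIPTIONS = [
    "A login screen with username and password fields",
    "A settings page with a dark mode toggle",
]

def generate_program(model, description):
    # Placeholder: ask the current model for a SwiftUI program.
    return f"// SwiftUI code for: {description} (model: {model})"

def compiles(program):
    # Placeholder: the real pipeline invokes the Swift compiler and keeps
    # only programs that actually build and run.
    return True

def matches_description(program, description):
    # Placeholder: the real pipeline uses GPT-4V to judge whether the
    # rendered interface matches the original description.
    return True

def fine_tune(model, dataset):
    # Placeholder: retrain the model on the filtered dataset.
    return f"{model}-retrained"

model = "starchat-beta"
for round_number in range(5):  # the study reports five rounds
    dataset = []
    for description in UI_DESCRIPTIONS:
        program = generate_program(model, description)
        if compiles(program) and matches_description(program, description):
            dataset.append((description, program))  # keep only passing samples
    model = fine_tune(model, dataset)  # the cleaner dataset trains the next model

The two filters act as an automatic quality gate, so each round trains only on samples that both compile and match their prompts.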

By the end of the process, the researchers had nearly one million working SwiftUI samples and a model they called UICoder.

The model was then evaluated with both automated tests and human reviewers; the results showed it not only performed better than its base model but also achieved a higher compilation success rate than GPT-4.

One of the striking aspects of the study is that Swift code had been almost entirely excluded from the initial training data.

According to the team, this happened by accident when The Stack dataset was assembled, leaving only scattered Swift examples found on web pages.

This oversight rules out the idea that UICoder merely recycled code it had already seen – instead, its improvement came from the iterative cycle of generating, filtering, and retraining on its own outputs.

While the results centered on SwiftUI, the researchers suggested the approach “would likely generalize to other languages and UI toolkits.”

If so, this could open paths for more models to be trained in specialized domains where training data is limited.

The prospect raises questions about reliability, sustainability, and whether synthetic datasets can continue to scale without introducing hidden flaws.

UICoder was also trained under carefully controlled conditions, and its success in wider settings is not guaranteed.

Via 9to5mac
