8. Further Work

We believe there is still room to improve RAG performance. Cleansing and filtering the input data are ongoing tasks. During preliminary testing, we kept discovering obsolete examples, or examples containing mistakes, as they were retrieved from the vector store. Not only did these introduce errors and deprecated syntax into the generated code, they sometimes prevented more up-to-date examples from being retrieved. Periodically rebuilding the vector store from carefully curated input data should become routine practice.

We also ran a preliminary test incorporating a larger corpus, including the FABlib API Documentation [fabrictestbed-extensions documentation — fabric-fablib.readthedocs.io, n.d.] and public user forum comments [Forums – FABRIC Knowledge Base — learn.fabric-testbed.net, n.d.]. The results, however, were poor: without further filtering of the added data, the vector store search returned too much material that was not useful for generating the requested code. This degrades performance, risks overflowing the context window, and requires more computational resources. That said, a more elaborate RAG architecture with additional filtering mechanisms may be able to improve overall performance in this area.
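As one illustration of such a filtering mechanism, the following minimal Python sketch keeps only retrieved chunks whose similarity score clears a threshold, and stops adding chunks once an approximate token budget is reached. It is not part of our current pipeline. The `Retrieved` structure, the `min_score` and `max_tokens` parameters, and the word-count token estimate are hypothetical assumptions; a real system would use the vector store's own relevance scores and the model's tokenizer.

```python
from dataclasses import dataclass


@dataclass
class Retrieved:
    """A chunk returned by the vector store (hypothetical structure)."""
    text: str
    score: float  # assumed: similarity score, higher means more relevant
    source: str   # e.g. "examples", "api_docs", "forum"


def approx_tokens(text: str) -> int:
    # Crude stand-in for a real tokenizer: count whitespace-separated words.
    return len(text.split())


def filter_retrieved(chunks: list[Retrieved],
                     min_score: float = 0.75,
                     max_tokens: int = 2000) -> list[Retrieved]:
    """Keep the most relevant chunks that clear a score threshold,
    without letting the combined context exceed a token budget."""
    kept: list[Retrieved] = []
    used = 0
    for chunk in sorted(chunks, key=lambda c: c.score, reverse=True):
        if chunk.score < min_score:
            break  # sorted descending, so every remaining chunk scores lower
        cost = approx_tokens(chunk.text)
        if used + cost > max_tokens:
            continue  # skip this chunk; a smaller, lower-scored one may still fit
        kept.append(chunk)
        used += cost
    return kept


if __name__ == "__main__":
    candidates = [
        Retrieved("Example notebook: create a slice with two nodes", 0.91, "examples"),
        Retrieved("Forum thread about an unrelated login problem", 0.42, "forum"),
        Retrieved("API reference entry for adding a network interface", 0.80, "api_docs"),
    ]
    for chunk in filter_retrieved(candidates, min_score=0.75, max_tokens=50):
        print(chunk.source, chunk.score)
```

A relevance threshold addresses the "too much data that was not useful" problem, while the token budget directly bounds the risk of context window overflow; both values would need tuning against real retrieval scores.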

Similarly, some RAG systems use execution feedback, in which the error messages produced by running the initially generated code are sent back to the LLM as additional context [Su et al., 2024]. This approach is promising, but building such a pipeline for FABRIC is difficult: typical FABRIC code reserves virtual resources on real-world servers rather than performing computational tasks, and resource reservation can fail in many different ways.
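A generic version of such a feedback loop can be sketched in Python as below. This is a minimal sketch under our own assumptions, not the implementation of Su et al. [2024]. The LLM call and the code runner are passed in as plain functions (the `Generate` and `Execute` signatures are hypothetical), and the demo substitutes a toy generator and a syntax-only check using Python's built-in `compile` for a real model and a real FABRIC run, since actually executing FABRIC code would reserve real resources.

```python
from typing import Callable, Optional

# Hypothetical interfaces: an LLM call mapping a prompt to code, and a runner
# returning None on success or an error message on failure.
Generate = Callable[[str], str]
Execute = Callable[[str], Optional[str]]


def generate_with_feedback(task: str, generate: Generate, execute: Execute,
                           max_rounds: int = 3) -> tuple[str, bool, list[str]]:
    """Generate code, run it, and feed any error back to the model.

    Returns the last generated code, whether it ran cleanly, and the
    error messages collected along the way.
    """
    if max_rounds < 1:
        raise ValueError("max_rounds must be at least 1")
    prompt = task
    errors: list[str] = []
    code = ""
    for _ in range(max_rounds):
        code = generate(prompt)
        error = execute(code)
        if error is None:
            return code, True, errors
        errors.append(error)
        # Append the failing code and its error message to the original task.
        prompt = (f"{task}\n\nPrevious attempt:\n{code}\n\n"
                  f"It failed with:\n{error}\n\nPlease return corrected code.")
    return code, False, errors


def syntax_check(code: str) -> Optional[str]:
    """Stand-in runner: only checks that the code parses; nothing is executed."""
    try:
        compile(code, "<generated>", "exec")
    except SyntaxError as exc:
        return f"SyntaxError: {exc.msg} (line {exc.lineno})"
    return None


def toy_llm(prompt: str) -> str:
    """Stand-in model: returns broken code until it sees an error message."""
    return "print('ok')" if "SyntaxError" in prompt else "print('ok'"


if __name__ == "__main__":
    code, ok, errors = generate_with_feedback("Print ok.", toy_llm, syntax_check)
    print(ok, errors)
    print(code)
```

The `max_rounds` cap matters for FABRIC in particular: if failures stem from resource availability rather than from the code, repeated regeneration would not help, so the loop must terminate and report the collected errors.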

Perhaps the most interesting possibility, alongside a larger input corpus, is a multistep design for longer and more complex tasks in which a series of small actions must be executed in the correct order. Rather than attempting to generate a long script all at once, such a system would first determine the sequence of steps and then generate a code block for each of them.
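One way to realize this, sketched below under our own assumptions, is to have a planning pass return each step together with its prerequisites, derive an execution order with a topological sort (Python's standard `graphlib`), and then generate one code block per step while passing the earlier blocks as context. The plan format, the step names, and the `generate_step` callback are hypothetical illustrations rather than FABlib calls, and the toy generator merely emits a placeholder comment per step.

```python
from graphlib import TopologicalSorter
from typing import Callable

# Hypothetical plan format: each step maps to the set of steps that must come
# before it. In a full system, an LLM planning pass would produce this.
Plan = dict[str, set[str]]
StepGenerator = Callable[[str, str], str]  # (step, code so far) -> code block


def order_steps(plan: Plan) -> list[str]:
    """Return the steps in an order that satisfies every prerequisite.

    graphlib raises CycleError if the plan contains circular dependencies.
    """
    return list(TopologicalSorter(plan).static_order())


def generate_script(plan: Plan, generate_step: StepGenerator) -> str:
    """Generate one code block per step, giving each call the earlier blocks."""
    blocks: list[str] = []
    for step in order_steps(plan):
        context = "\n\n".join(blocks)
        blocks.append(generate_step(step, context))
    return "\n\n".join(blocks)


def toy_step_generator(step: str, context: str) -> str:
    """Stand-in for an LLM call: emits a placeholder comment for the step."""
    return f"# step: {step}"


if __name__ == "__main__":
    # Steps are deliberately listed out of order; the sort restores it.
    plan: Plan = {
        "run experiment": {"submit slice"},
        "submit slice": {"add network"},
        "add network": {"add nodes"},
        "add nodes": {"create slice"},
        "create slice": set(),
    }
    print(generate_script(plan, toy_step_generator))
```

Separating ordering from generation keeps each LLM call small, and an inconsistent plan (a cycle) is caught before any code is generated.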