Alongside my professional work as a scientist, I work on personal projects related to Generative AI and Large Language Models. I have also participated in some competitions and hackathons. Overall, I try to build a small solution or two that can help make our lives better.
How does one measure the impact of GenAI? CodePrompt is a small, hand-crafted dataset built to assess the efficiency of code generation using the PaLM 2 LLM. CodePrompt consists of 30 coding problems in Python. These range from code generation to completion and troubleshooting errors.
(Banner generated using Bannerbear)
We often spend a lot of time creating presentation slide decks. With SlideDeck AI, users and generative AI co-create a presentation slide deck in a few steps using the instruction-tuned Mistral 7B model. An earlier version used Llama 2 and won 3rd place in the Llama 2 Hackathon with Clarifai.
Gemini Senpai is a small, experimental AI assistant prototype built using Gemini's function calling. Currently, Gemini Senpai allows users to generate Python code and small Python applications spanning multiple modules. Build software with an AI assistant, collaboratively.
As scientists and engineers, we often draw diagrams depicting systems, for example, architecture, state machine, and flow diagrams. Writing their descriptions can be tedious, yet without them system documentation remains incomplete. With Sys2Doc, one can generate system documentation based on a given diagram of any system. Sys2Doc is powered by Gemini Pro Vision.
Building a RAG app is easy. Optimizing it, however, is not necessarily so. With RAG2Rich, one can identify the optimal parameters/configurations using an answer "richness" score, which is evaluated based on the context relevance, answer relevance, and groundedness measures computed by TruLens. In other words, RAG2Rich offers a scientific approach toward optimizing a Retrieval-Augmented Generation (RAG) system. RAG2Rich is powered by Vertex AI and PaLM 2, among others.
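To make the idea of a "richness" score concrete, here is a minimal sketch in Python. It is not RAG2Rich's actual implementation: how the three measures are combined is not stated above, so the unweighted mean used here is an assumption, as are the names `RagTriad`, `richness`, and `best_config`, the `(chunk_size, top_k)` configuration keys, and the made-up scores. The three measures are assumed to have been computed already (for example, by TruLens) and scaled to the range 0 to 1.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class RagTriad:
    """The three measures for one RAG configuration, each assumed to lie in [0, 1]."""
    context_relevance: float
    answer_relevance: float
    groundedness: float


def richness(triad: RagTriad) -> float:
    """Combine the three measures into one score.

    Assumption: an unweighted mean. The real combination may differ.
    """
    return mean([triad.context_relevance, triad.answer_relevance, triad.groundedness])


def best_config(results: dict[tuple[int, int], RagTriad]) -> tuple[int, int]:
    """Return the configuration key with the highest richness score."""
    if not results:
        raise ValueError("no configurations to compare")
    return max(results, key=lambda config: richness(results[config]))


# Illustrative, made-up scores keyed by a hypothetical (chunk_size, top_k) pair.
example_results = {
    (256, 3): RagTriad(0.70, 0.80, 0.60),
    (512, 3): RagTriad(0.80, 0.85, 0.75),
    (512, 5): RagTriad(0.75, 0.90, 0.60),
}

if __name__ == "__main__":
    winner = best_config(example_results)
    print(winner, round(richness(example_results[winner]), 3))
```

A single scalar makes configurations easy to rank, but it hides trade-offs: a configuration with high answer relevance and low groundedness can score the same as a balanced one, so inspecting the individual measures remains worthwhile.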
Poetry is food for the soul. On the other hand, an image is worth a thousand words. With Poem2Pic, one blends poetry with art. Poem2Pic enables the generation of an image based on a poem. In particular, Flan-T5, a large language model (LLM), is used to generate a very short summary of an input poem. The summary is then fed to Stable Diffusion in order to generate an image. The final image is displayed to the user.
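The two-stage flow described above can be sketched as a small Python function. This is only an outline of the composition, not Poem2Pic's actual code: the summarizer and the image generator are passed in as plain callables, and the stand-ins `first_line_summary` and `fake_image` are hypothetical placeholders for where a Flan-T5 summarization call and a Stable Diffusion call would go.

```python
from typing import Callable, Tuple


def poem_to_image(
    poem: str,
    summarize: Callable[[str], str],
    generate_image: Callable[[str], bytes],
) -> Tuple[str, bytes]:
    """Summarize the poem, then use the summary as the image prompt.

    Returns the prompt together with the generated image data.
    """
    prompt = summarize(poem).strip()
    if not prompt:
        raise ValueError("the summarizer returned an empty summary")
    return prompt, generate_image(prompt)


def first_line_summary(poem: str) -> str:
    """Hypothetical stand-in for the LLM step: use the first non-empty line."""
    for line in poem.splitlines():
        if line.strip():
            return line.strip()
    return ""


def fake_image(prompt: str) -> bytes:
    """Hypothetical stand-in for the image model: return the prompt as bytes."""
    return prompt.encode("utf-8")


if __name__ == "__main__":
    prompt, image = poem_to_image(
        "The moon rises\nover quiet hills\n", first_line_summary, fake_image
    )
    print(prompt, len(image))
```

Keeping the two models behind plain callables means either stage can be swapped or tested in isolation without loading any model weights.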
I have also created or curated a few datasets.
A synthetically generated dataset to train a Natural Language Understanding (NLU) model for developing Intent-based Networks (IBNs). A set of sample intents that capture some common network operations, such as the creation of a flow between two endpoints, is provided. This dataset is primarily intended to be used with Rasa but can be used with any other framework as well.
A non-exhaustive collection of Bengali literary works (poems, stories, novels, songs, essays, letters) by more than 35 authors. These works span the 8th to the 20th centuries. All these works are in the public domain.
A tiny question-answer dataset in Bengali, created from scratch. Bangla Nirdeshabali covers different topics, such as literature, culture, and geography.
A subset of the Bangla Sahitya dataset. It contains Bengali poems (public domain).