Intro
In this guide, I will explain how to set up a connection to the OpenAI API in an Android project. To be precise, I will focus on text completion. The OpenAI API offers more features – see the API overview.
The idea here is to create a Quiz Application. Based on a given subject, the API will generate 10 questions. Each question will have 4 possible answers, and the correct answer will be marked, too. With a sprinkle of magic, GPT will act as a backend for our application.
Architecture
I will use an MVVM architecture with StateFlow and Compose for the UI. When it comes to the OpenAI connection, there is no official library for Android as of today. Because of that, I am using an unofficial library, which also works with coroutines and Kotlin Multiplatform.
Of course, you can also connect to the API manually using any HTTP client, such as Retrofit, Ktor, or OkHttp. If you want to go down this road, here's the API reference.
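For reference, the dependency setup with the unofficial openai-kotlin client by Aallam looks roughly like this (the version numbers below are illustrative):

```kotlin
// build.gradle.kts (module) – version numbers are illustrative
dependencies {
    // Unofficial OpenAI client (coroutines + Kotlin Multiplatform)
    implementation("com.aallam.openai:openai-client:3.7.0")
    // The client needs a Ktor engine; OkHttp works well on Android
    implementation("io.ktor:ktor-client-okhttp:2.3.7")
}
```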
I will include the logic in a single ViewModel, which might not be an ideal solution, but it is good enough for a project of this size. The ViewModel will receive an event and update the state via StateFlow. The Activity will collect this StateFlow and render the appropriate UI.
Let's break it down
The general flow looks like this:
- The user enters the subject
- The app connects to the backend and receives a set of questions and answers
- Questions get displayed one after another
- When questions are answered, a summary screen is shown
So, we need 3 screens (plus a simple loading screen while the questions are generated):
- InputSubjectScreen, where the user will enter the subject
- QuestionScreen, where the question will be displayed
- SummaryScreen, where the score will be displayed
Implementing logic and model
State
The screens listed above can be translated into possible UI states, plus a Loading state shown while the questions are being fetched. The state is represented by a sealed class.
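A minimal sketch of what that sealed class might look like (the names are mine, and the Question class is defined later in the Model section):

```kotlin
// One state per screen, plus Loading while questions are being fetched
sealed class QuizState {
    object InputSubject : QuizState()
    object Loading : QuizState()
    data class DisplayQuestion(val question: Question) : QuizState()
    data class Summary(val score: Int, val total: Int) : QuizState()
}
```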
Event
We can create another sealed class for events that users might trigger:
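Something along these lines, with one event per user action (the names are my own):

```kotlin
// One event per user action
sealed class QuizEvent {
    data class SubjectEntered(val subject: String) : QuizEvent()
    data class QuestionAnswered(val answerId: Int) : QuizEvent()
    object QuizRestarted : QuizEvent()
}
```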
ViewModel
The base ViewModel structure contains a StateFlow and an onEvent() function. Here, we will also hold the generated questions and the question index, which tells which question is currently displayed. The Question class is not defined yet – we will get there in a second!
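A sketch of that base structure, using the state and event classes sketched above:

```kotlin
import androidx.lifecycle.ViewModel
import kotlinx.coroutines.flow.MutableStateFlow
import kotlinx.coroutines.flow.StateFlow
import kotlinx.coroutines.flow.asStateFlow

class MainViewModel : ViewModel() {

    private val _state = MutableStateFlow<QuizState>(QuizState.InputSubject)
    val state: StateFlow<QuizState> = _state.asStateFlow()

    // Generated questions and the index of the currently displayed one
    private var questions: List<Question> = emptyList()
    private var questionIndex = 0

    fun onEvent(event: QuizEvent) {
        when (event) {
            is QuizEvent.SubjectEntered -> loadQuestions(event.subject)
            is QuizEvent.QuestionAnswered -> onAnswer(event.answerId)
            QuizEvent.QuizRestarted -> restart()
        }
    }

    // loadQuestions(), onAnswer() and restart() are sketched in the
    // following sections
}
```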
Getting questions from the API
When a subject is entered, we update the state to display the loading screen and then connect to the OpenAI API to get the questions. With this prompt, we should get a JSON response. I am using the gpt-3.5-turbo model because of the response time, but you can use gpt-4. It might take more time to complete, but the quality of the response is a lot better.
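Here is a rough sketch of that call, assuming the unofficial openai-kotlin client and Gson for parsing. The prompt wording is illustrative, and the exact client API shapes vary slightly between library versions:

```kotlin
// Inside MainViewModel – a sketch, not the definitive implementation
import androidx.lifecycle.viewModelScope
import com.aallam.openai.api.chat.ChatCompletionRequest
import com.aallam.openai.api.chat.ChatMessage
import com.aallam.openai.api.chat.ChatRole
import com.aallam.openai.api.model.ModelId
import com.aallam.openai.client.OpenAI
import com.google.gson.Gson
import kotlinx.coroutines.launch

private val openAI = OpenAI(token = API_KEY) // provide your own key

private fun loadQuestions(subject: String) {
    _state.value = QuizState.Loading
    viewModelScope.launch {
        // Illustrative prompt – the wording is mine
        val prompt = """
            Generate 10 quiz questions about "$subject".
            Each question must have 4 answers with ids 0..3 and exactly one
            correct answer. Return JSON and nothing else, in the format:
            {"questions":[{"question":"...","answers":[{"id":0,"text":"..."}],
            "correctAnswerId":0}]}
        """.trimIndent()

        val completion = openAI.chatCompletion(
            ChatCompletionRequest(
                model = ModelId("gpt-3.5-turbo"),
                messages = listOf(ChatMessage(role = ChatRole.User, content = prompt))
            )
        )
        val json = completion.choices.first().message.content.orEmpty()
        questions = Gson().fromJson(json, QuestionsResponse::class.java).questions
        questionIndex = 0
        _state.value = QuizState.DisplayQuestion(questions.first())
    }
}
```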
You need to provide your own API key (marked in the code as API_KEY). If you are not sure how to do it, visit this link.
A word of warning!
There's no guarantee that the model will always return the correct JSON structure, even with the "Return JSON and nothing else" phrase in the prompt. I never had this issue with gpt-4, but it happened a few times with gpt-3.5-turbo. Remember to handle this case if you want to use this API connection in a production environment.
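One defensive option (my own sketch, not part of the original code) is to wrap parsing in runCatching and fall back to a hypothetical error state:

```kotlin
// A defensive parse – falls back instead of crashing on malformed JSON
val parsed = runCatching {
    Gson().fromJson(json, QuestionsResponse::class.java)
}.getOrNull()

if (parsed == null || parsed.questions.isEmpty()) {
    _state.value = QuizState.Error // hypothetical extra state for failures
} else {
    questions = parsed.questions
    questionIndex = 0
    _state.value = QuizState.DisplayQuestion(questions.first())
}
```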
Let’s go through the remaining events
When a question is answered, the answer is saved in the Question model. Then, if there are any questions left, the next question is shown. If not, the number of correct answers is summed up and a summary screen is shown.
The easiest event to handle is when the quiz is restarted. The question index is restarted back to 0, and the state is updated, so the user should see the first screen again.
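Sketched out, the two handlers might look like this, building on the ViewModel fields above:

```kotlin
// Inside MainViewModel
private fun onAnswer(answerId: Int) {
    // Save the user's pick on the current question
    questions[questionIndex].userAnswerId = answerId

    if (questionIndex < questions.lastIndex) {
        questionIndex++
        _state.value = QuizState.DisplayQuestion(questions[questionIndex])
    } else {
        // Quiz finished – count correct answers and show the summary
        val score = questions.count { it.userAnswerId == it.correctAnswerId }
        _state.value = QuizState.Summary(score = score, total = questions.size)
    }
}

private fun restart() {
    questionIndex = 0
    _state.value = QuizState.InputSubject
}
```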
Model
Time for the missing part – we need classes that will represent the Question and the Answer. I defined the structure in the prompt above. Here is the relevant part:
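With the illustrative prompt sketched earlier, the expected structure would be something like:

```json
{
  "questions": [
    {
      "question": "...",
      "answers": [
        { "id": 0, "text": "..." }
      ],
      "correctAnswerId": 0
    }
  ]
}
```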
Based on this, I created the corresponding model classes:
userAnswerId is not a part of the API response – it is used to save the answer picked by the user. That is why it does not have a @SerializedName annotation.
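Putting that together, the model classes might look like this (the field names follow the illustrative JSON above, and Gson is assumed for parsing):

```kotlin
import com.google.gson.annotations.SerializedName

data class QuestionsResponse(
    @SerializedName("questions") val questions: List<Question>
)

data class Question(
    @SerializedName("question") val question: String,
    @SerializedName("answers") val answers: List<Answer>,
    @SerializedName("correctAnswerId") val correctAnswerId: Int,
    var userAnswerId: Int? = null // not part of the API response
)

data class Answer(
    @SerializedName("id") val id: Int,
    @SerializedName("text") val text: String
)
```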
The harder part is done! Now we can move to the visual part of the application.
Implementing User Interface
Activity and navigation
The Activity serves as an entry point and renders the screen based on the collected state from our ViewModel.
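A sketch of that entry point, mapping each state to a screen (composable names as listed earlier):

```kotlin
import android.os.Bundle
import androidx.activity.ComponentActivity
import androidx.activity.compose.setContent
import androidx.activity.viewModels
import androidx.compose.runtime.collectAsState
import androidx.compose.runtime.getValue

class MainActivity : ComponentActivity() {

    private val viewModel: MainViewModel by viewModels()

    override fun onCreate(savedInstanceState: Bundle?) {
        super.onCreate(savedInstanceState)
        setContent {
            val state by viewModel.state.collectAsState()
            when (val s = state) {
                QuizState.InputSubject -> InputSubjectScreen(
                    onSubjectEntered = { viewModel.onEvent(QuizEvent.SubjectEntered(it)) }
                )
                QuizState.Loading -> LoadingScreen()
                is QuizState.DisplayQuestion -> QuestionScreen(
                    question = s.question,
                    onAnswer = { viewModel.onEvent(QuizEvent.QuestionAnswered(it)) }
                )
                is QuizState.Summary -> SummaryScreen(
                    score = s.score,
                    total = s.total,
                    onRestart = { viewModel.onEvent(QuizEvent.QuizRestarted) }
                )
            }
        }
    }
}
```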
Input Subject Screen
This is the first screen that the user sees. It accepts a function to run when the subject is entered, which simply passes an event to the ViewModel. The screen contains a TextField and a Button.
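A minimal version might look like this (labels are my own):

```kotlin
import androidx.compose.foundation.layout.*
import androidx.compose.material3.*
import androidx.compose.runtime.*
import androidx.compose.ui.Alignment
import androidx.compose.ui.Modifier
import androidx.compose.ui.unit.dp

@Composable
fun InputSubjectScreen(onSubjectEntered: (String) -> Unit) {
    var subject by remember { mutableStateOf("") }
    Column(
        modifier = Modifier.fillMaxSize().padding(16.dp),
        verticalArrangement = Arrangement.Center,
        horizontalAlignment = Alignment.CenterHorizontally
    ) {
        TextField(
            value = subject,
            onValueChange = { subject = it },
            label = { Text("Quiz subject") }
        )
        Button(onClick = { onSubjectEntered(subject) }) {
            Text("Start quiz")
        }
    }
}
```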
Loading Screen
The loading screen is a stateless screen with Text and an Indicator. Easy one.
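For completeness, a sketch:

```kotlin
// Imports as in InputSubjectScreen
@Composable
fun LoadingScreen() {
    Column(
        modifier = Modifier.fillMaxSize(),
        verticalArrangement = Arrangement.Center,
        horizontalAlignment = Alignment.CenterHorizontally
    ) {
        Text("Generating questions…")
        CircularProgressIndicator()
    }
}
```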
Question Screen
The question screen is responsible for showing the question and the possible answers. It accepts the Question and the action to execute when the question is answered.
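A bare-bones sketch:

```kotlin
// Imports as in InputSubjectScreen
@Composable
fun QuestionScreen(question: Question, onAnswer: (Int) -> Unit) {
    Column(modifier = Modifier.fillMaxSize().padding(16.dp)) {
        Text(text = question.question)
        question.answers.forEach { answer ->
            Button(onClick = { onAnswer(answer.id) }) {
                Text(answer.text)
            }
        }
    }
}
```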
Summary Screen
And finally, the summary screen shows the final score and contains a button that allows the user to restart the quiz. This is done, like above, by sending an event to the ViewModel.
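And a matching sketch:

```kotlin
// Imports as in InputSubjectScreen
@Composable
fun SummaryScreen(score: Int, total: Int, onRestart: () -> Unit) {
    Column(
        modifier = Modifier.fillMaxSize(),
        verticalArrangement = Arrangement.Center,
        horizontalAlignment = Alignment.CenterHorizontally
    ) {
        Text("Your score: $score / $total")
        Button(onClick = onRestart) {
            Text("Restart quiz")
        }
    }
}
```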
Ideas for improvement
I tried to keep the code as simple as possible for learning purposes, but obviously, there is plenty of room for improvement. Here are some ideas:
Handle parsing error
As I mentioned, it is not always guaranteed that we receive a correct JSON structure from the API. It would be wise to handle this case with an elegant error screen. Also, it is possible that the response will be correct but have an additional sentence at the end, especially with gpt-3.5-turbo and older models. Because of that, you can try to search for the JSON in the response, extract it, and ignore the rest.
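A simple way to do that (my own sketch) is to cut out the outermost braces and ignore everything around them:

```kotlin
// Cut out the outermost JSON object and ignore any text around it
fun extractJson(response: String): String? {
    val start = response.indexOf('{')
    val end = response.lastIndexOf('}')
    return if (start in 0 until end) response.substring(start, end + 1) else null
}
```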
Handle timeouts
Another idea is to handle the timeout error. Sometimes, getting the response will take some time, especially when the API is overloaded. gpt-3.5-turbo is visibly faster than gpt-4. The timeout value can be set in MainViewModel:62:
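With the openai-kotlin client, the timeout can be passed via OpenAIConfig; a sketch (the 60-second value is arbitrary):

```kotlin
import com.aallam.openai.api.http.Timeout
import com.aallam.openai.client.OpenAI
import com.aallam.openai.client.OpenAIConfig
import kotlin.time.Duration.Companion.seconds

// Raise the socket timeout so slow completions don't fail prematurely
private val openAI = OpenAI(
    OpenAIConfig(
        token = API_KEY,
        timeout = Timeout(socket = 60.seconds)
    )
)
```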
Instantly Marking the Correct Answer After Selection
Right now, the result is visible at the end of the quiz, on the Summary Screen. Marking the correct answer instantly might improve the user experience, providing faster feedback.
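One possible approach is a variant of QuestionScreen that keeps the picked answer in Compose state, so the buttons recompose with result colors as soon as it is selected. This is a sketch only: you would also need to delay advancing to the next question, e.g. with a short delay or a "Next" button.

```kotlin
// Imports as in InputSubjectScreen, plus androidx.compose.ui.graphics.Color
@Composable
fun QuestionScreen(question: Question, onAnswer: (Int) -> Unit) {
    var picked by remember(question) { mutableStateOf<Int?>(null) }
    Column(modifier = Modifier.fillMaxSize().padding(16.dp)) {
        Text(text = question.question)
        question.answers.forEach { answer ->
            val color = when {
                picked == null -> MaterialTheme.colorScheme.primary
                answer.id == question.correctAnswerId -> Color(0xFF2E7D32) // green
                answer.id == picked -> Color(0xFFC62828) // red
                else -> MaterialTheme.colorScheme.primary
            }
            Button(
                onClick = {
                    if (picked == null) {
                        picked = answer.id
                        onAnswer(answer.id)
                    }
                },
                colors = ButtonDefaults.buttonColors(containerColor = color)
            ) {
                Text(answer.text)
            }
        }
    }
}
```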
Outro
Thank you for exploring this guide! If you found it valuable, please consider subscribing. Your support drives me to share more quality content. Happy coding!