Hints
Reach out to us if you want us to help you out with this exercise. We're more than happy to do so!
The first part of this exercise is about creating our prompt and authentication token (see the introduction presentation for more on this). The prompt is basically what we ask the AI to do - exactly the same as using ChatGPT.
We add two new columns with two Text Column tools. We call the first column Prompt and the second Token.
In the prompt, we input the text below. You can and should experiment with this to get the results you want. This is known as prompt engineering (a fancy word for something straightforward...).
"You are to evaluate the product reviews data on a scale from 1-10 where 10 is an absolutely fantastic review and 1 is a horrible review. The product is a bluetooth speaker called Amazon Tap. ONLY OUTPUT AN INTEGER NUMBER BETWEEN 1 AND 10 (inclusive)."
In the Token column, we input our OpenAI API Key. Follow the steps in the Introduction Presentation to generate your own.
The second part is where the magic happens 🪄. We use an API tool to call OpenAI and generate our product review scores.
The configuration of the API tools is shown below.
In #1, we use a POST call
In #2, we input the right URL (this is the address telling OpenAI what service we're looking for). This is the URL: https://api.openai.com/v1/chat/completions
In #3, we add our API headers. We add Content-Type = application/json to tell OpenAI that we want to receive a response in JSON. We add Authorization = Bearer [Token] to authenticate our call - basically an ID card telling OpenAI who we are. The [Token] is a reference to the Token column we created earlier, so [Token] looks up the value in that column. You can also statically input "Bearer sk_QhX...".
In #5, we input our body. This is the specification of what we want OpenAI to do. You really only have to pay attention to the messages object. In the content, we again reference two columns. First we give a reference to the Prompt column that we created earlier. Then we give a reference to the reviews_text column, which contains the actual product review. So in essence we're giving the AI model the instruction first and then giving it the text it should evaluate.
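If you prefer to see the same call as plain code, here is a minimal Python sketch of the pieces above. The values standing in for the [Token], [Prompt], and [reviews_text] column references are hypothetical placeholders, the model name is an assumption, and no request is actually sent here - the last line only shows where the POST would happen.

```python
import json

# Hypothetical per-row values standing in for the [Token], [Prompt]
# and [reviews_text] column references in the tool.
token = "sk_..."  # your OpenAI API key (placeholder)
prompt = "You are to evaluate the product reviews data on a scale from 1-10..."
review_text = "Great little speaker, sounds amazing for the price!"

# #2: the URL telling OpenAI which service we want
url = "https://api.openai.com/v1/chat/completions"

# #3: headers - JSON content type plus the bearer token for authentication
headers = {
    "Content-Type": "application/json",
    "Authorization": f"Bearer {token}",
}

# #5: the body - the instruction (prompt) first, then the review to evaluate
body = {
    "model": "gpt-3.5-turbo",  # assumed model; use whichever model you configured
    "messages": [
        {"role": "user", "content": f"{prompt} {review_text}"},
    ],
}

payload = json.dumps(body)
# Sending it would look like: requests.post(url, headers=headers, data=payload)
```

This mirrors what the API tool does for every row of the dataset: one POST per review, with the prompt and review text concatenated into a single user message.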
The API action can take a while to run, so be patient! In our case, you can see below that it takes around 50 seconds.
After calling the API we get a couple of new columns - some of which are structured as JSON. JSON looks a bit unfamiliar but we can make it familiar with the Parse JSON tool.
Before doing that, we remove all rows where Response Status is not equal to 200. Response Status == 200 means the call was successful, and we only want those rows.
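In code terms, this filter is just keeping the rows whose status is 200. A tiny sketch with made-up rows:

```python
# Hypothetical API result rows, one per review
rows = [
    {"review_id": 1, "Response Status": 200, "Response Result": "{...}"},
    {"review_id": 2, "Response Status": 429, "Response Result": ""},  # rate-limited, no result
    {"review_id": 3, "Response Status": 200, "Response Result": "{...}"},
]

# Keep only the successful API calls
successful = [r for r in rows if r["Response Status"] == 200]
print(len(successful))  # → 2
```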
Then we go into the Parse JSON tool, where we essentially parse through the nested layers/levels in the Response Result column. We give the Parse JSON tool an ID Column in #1 (use the review_id), then in #2 we tell it which column we want to parse, and in #3 we give it the number of levels we want to parse. JSON data often consists of levels of data that are nested. You can read more about the JSON format here. For now, go with 4 and try changing it to see how your data changes.
In the final part, we remove all the columns except the review_id and the content column, which we do in Tool ID 6. In the Combine, we merge in our original dataset using the review_id column, so we have all our original data and our newly created AI-generated review score. In Tool ID 10, we rename our content column to AI_review_score, and we are done!
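The Combine and rename steps amount to a join on review_id. A minimal Python sketch with hypothetical rows:

```python
# Original dataset and the parsed AI scores (hypothetical rows)
original = [
    {"review_id": 1, "reviews_text": "Love it!"},
    {"review_id": 2, "reviews_text": "Broke after a week."},
]
scores = [
    {"review_id": 1, "content": "9"},
    {"review_id": 2, "content": "2"},
]

# Merge on review_id and rename content -> AI_review_score
score_by_id = {s["review_id"]: s["content"] for s in scores}
combined = [
    {**row, "AI_review_score": score_by_id.get(row["review_id"])}
    for row in original
]
print(combined[0])
# → {'review_id': 1, 'reviews_text': 'Love it!', 'AI_review_score': '9'}
```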
You can very easily calculate the cost of your API call by doing the below if you want.
This is the formula in the Calculate tool:
(([sum_prompt_tokens]/1000)*0.0005)+(([sum_completion_tokens]/1000)*0.0015)
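The same formula as a small function, using the per-1,000-token rates from the formula above (check OpenAI's current pricing page before relying on these numbers, as rates change):

```python
def api_cost(sum_prompt_tokens: int, sum_completion_tokens: int) -> float:
    """Estimated cost in USD: prompt tokens at $0.0005/1K, completion tokens at $0.0015/1K."""
    return (sum_prompt_tokens / 1000) * 0.0005 + (sum_completion_tokens / 1000) * 0.0015

# e.g. 10,000 prompt tokens and 2,000 completion tokens
print(api_cost(10_000, 2_000))
```

Since our prompt asks for a single integer, completion tokens stay tiny and the prompt side dominates the cost.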