To develop an effective Conversation Intelligence module for Superlayer, our team thoroughly analyzed the available technical solutions. We sought the combination of technologies that would enable seamless meeting recording, accurate transcription, and insightful analytics, all while remaining user-friendly and reliable.
We assessed various AI models for speech-to-text. While many excelled at English, we found that their performance diminished for other languages. Once the transcripts were obtained, we explored diverse approaches for generating valuable summaries, identifying crucial topics, and extracting key insights from each call.
When a sales representative joins a scheduled call with a client, the entire session is recorded. Asynchronously, the call is permanently stored and a transcript is generated. Once the raw transcript is ready, a customized data transformation is performed to provide advanced features such as the timeline and the diarized text. Intermediate results of the overall process become available as soon as the corresponding sub-task completes, for a prompt and seamless user experience.
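The staged pipeline above can be sketched as an async flow in which each stage publishes its result the moment it finishes, so the UI can surface intermediate artifacts without waiting for the full analysis. The function names, storage URI, and event kinds below are illustrative assumptions, not Superlayer's actual API.

```python
import asyncio

async def store_recording(call_id: str) -> str:
    # Stand-in for uploading the recording to permanent storage.
    await asyncio.sleep(0)
    return f"s3://recordings/{call_id}.wav"

async def transcribe(audio_uri: str) -> str:
    # Stand-in for the speech-to-text service call.
    await asyncio.sleep(0)
    return "raw transcript"

def transform(raw_transcript: str) -> dict:
    # Customized transformation producing advanced views of the transcript.
    return {"timeline": [], "diarized_text": raw_transcript}

async def process_call(call_id: str, publish) -> None:
    audio_uri = await store_recording(call_id)
    publish("recording_stored", audio_uri)      # intermediate result
    raw = await transcribe(audio_uri)
    publish("transcript_ready", raw)            # intermediate result
    publish("analysis_ready", transform(raw))   # final result

events = []
asyncio.run(process_call("call-42", lambda kind, data: events.append(kind)))
```

Because each `publish` fires as soon as its sub-task completes, a consumer subscribed to these events sees the transcript before the full analysis is done.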
The feature’s primary focus is to extract valuable observations indicating whether a client shows a solid inclination to purchase or try out a product, or whether there is any indication to the contrary.
To accomplish this, we employed specific Prompt Engineering strategies, including:
- Clearly defining the call transcript’s domain: Outlining who is involved in the professional conversation.
- Structuring the task: Formally and meticulously describing the interpretation task. This involves explaining the corresponding emotions and tone of voice that would classify insights as positive or negative, apart from their objective content.
- Preventing false information: To avoid generating incorrect insights, the output is limited, and the model is made to cross-examine its results by providing exact transcript references for each retrieved insight.
- Templating prompt and output: Making sure the input prompt and the produced output are properly structured. Each component (like a transcript, insight description, title, reference, or categorization) is denoted using specific delimiters so that both the prompt and output can be programmatically processed.
Given that the LLM's input and output do not fit within the model's maximum token limit, we adopted a token-window (segmentation) approach. Although it comes with the drawback of additional API calls, we found that this strategy best meets the feature requirements.
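A minimal sketch of such a token-window split is shown below, assuming the transcript has already been tokenised into a list. Overlapping windows (a common choice, though the original does not specify one) help insights that straddle a boundary survive the split; each extra window is one extra API call.

```python
def window_segments(tokens: list[str], max_tokens: int, overlap: int) -> list[list[str]]:
    """Split a tokenised transcript into overlapping windows, each small
    enough to fit the model's context limit alongside the prompt."""
    step = max_tokens - overlap
    # Stop once the remaining tail is fully covered by the last window.
    return [tokens[i:i + max_tokens]
            for i in range(0, max(len(tokens) - overlap, 1), step)]
```

Each window is then sent through the same prompt independently, and the per-window insights are merged afterwards; that merge step is where the extra API calls accumulate.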