
Fine-Tuning an LLM for Real Estate Prospecting Calls: Lessons Learned

A deep dive into the process of fine-tuning an LLM for natural-sounding real estate prospecting calls, including data structuring challenges and performance improvements.



Over the past few weeks, I've been working on something I've wanted for a long time: a natural-sounding LLM that can help real estate agents prospect with confidence. My goal wasn't just to get an AI to "sound human" — I wanted it to feel like a skilled sales partner who could handle the rhythms, pivots, and nuances of live calls.

I started with call scripts from a well-known real estate coach, but my initial approach had a problem: my prompt kept growing longer and more complex. At first, I thought more instructions meant better results, but the opposite happened — the output plateaued and always had that unmistakable "ChatGPT" feel.

That's when I turned to fine-tuning.

The Problem with Complex Prompts

Here's what I mean by that "ChatGPT" feel. When I tried to get the model to handle a common objection about timing, here's what I got:

ChatGPT-style response:

> "I understand your concern about the timing of selling your property. Based on current market conditions and our analysis of comparable properties in your area, I would be happy to provide you with a comprehensive market evaluation that takes into account various factors including seasonal trends, local inventory levels, and buyer demand patterns. This will help us determine the optimal listing strategy for your specific situation."

This sounds robotic, overly formal, and lacks the natural flow of a real conversation. It's packed with buzzwords and doesn't feel like something a real person would say on a live call.

Research from Stanford's Institute for Human-Centered AI confirms this issue: AI-generated responses often suffer from "over-formalization" and lack the conversational markers that make human speech feel natural [1]. The challenge isn't just sounding human; it's maintaining the authentic connection that drives successful sales relationships.

Structuring the Data Matters More Than I Expected

I quickly learned that the structure of the fine-tuning dataset can make or break the model's behavior. A poorly structured dataset bakes bad habits into the model's output, no matter how good the base model is.

According to OpenAI's fine-tuning documentation, dataset quality is the single most important factor in fine-tuning success [2]. The research shows that even small inconsistencies in training data can lead to significant performance degradation in the final model.

To prepare my data, I used Python scripts in Cursor, reformatting 50+ high-quality script card examples. I made sure every training sample followed a consistent format — just like OpenAI's fine-tuning guide recommends. This consistency reduced noise and helped the model learn the pacing, phrasing, and tone I wanted.

From Script Cards to Fine-Tuning Examples

Here's how I reformatted existing script cards into the JSONL format needed for fine-tuning:

```python
import json

# Example of reformatting existing script card content
original_script_card = {
    "client_question": "I'm not sure this is the right time to sell. The market seems uncertain.",
    "recommended_answer": "Yeah, I totally get that concern. A lot of sellers are feeling the same way right now. But here's what I'm seeing in your neighborhood - we've got more buyers than inventory, and homes are still moving quickly. I'd love to show you exactly what's happening on your street. When would be a good time to swing by and give you a quick market update?",
}

# Convert a script card into the chat-message format used for fine-tuning
def convert_script_to_training_format(script_card):
    system_prompt = "You are a confident real estate agent making a prospecting call."
    return {
        "messages": [
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": script_card["client_question"]},
            {"role": "assistant", "content": script_card["recommended_answer"]},
        ]
    }

# Convert existing script cards to the training format
existing_script_cards = [original_script_card]  # in practice, all 50+ script cards
training_examples = []
for script_card in existing_script_cards:
    training_examples.append(convert_script_to_training_format(script_card))

# Write one JSON object per line (JSONL) for fine-tuning
with open('training_data.jsonl', 'w') as f:
    for example in training_examples:
        f.write(json.dumps(example) + '\n')
```

This approach preserved the natural conversation patterns from the original script cards while formatting them properly for the fine-tuning process.
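Before uploading, it's worth a quick sanity check that every line parses and follows the same message structure. Here's a minimal sketch; it assumes the three-role, single-turn layout produced by the converter above, so adapt it if your examples include multi-turn conversations:

```python
import json

# Minimal format check before uploading: every JSONL line must parse
# and contain the same system/user/assistant message layout.
EXPECTED_ROLES = ["system", "user", "assistant"]

with open("training_data.jsonl") as f:
    for line_number, line in enumerate(f, start=1):
        example = json.loads(line)  # raises ValueError on malformed JSON
        roles = [message["role"] for message in example["messages"]]
        assert roles == EXPECTED_ROLES, f"line {line_number}: unexpected roles {roles}"

print("All training examples passed the format check.")
```

Catching a malformed line here is much cheaper than discovering it after a failed or skewed training run.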

Important note: While this simplifies the data preparation process significantly, it's crucial to use a variety of conversation structures and sales frameworks in your script cards. The LLM needs to learn not just a natural tone, but also how to guide customers through the different stages of a call, from opening and rapport-building to objection handling and closing. A diverse dataset spanning several sales methodologies (SPIN, Challenger Sale, Solution Selling, etc.) helps the model adapt to different customer personalities and sales situations.
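One lightweight way to keep that mix honest is to tag each script card with its call stage and sales framework, then audit the distribution before training. The `stage` and `framework` fields below are my own bookkeeping labels, not part of the fine-tuning format; they get dropped during conversion:

```python
from collections import Counter

# Hypothetical audit metadata on each script card; these fields exist
# only to check dataset balance and are stripped before training.
script_cards = [
    {"stage": "opening", "framework": "SPIN", "client_question": "...", "recommended_answer": "..."},
    {"stage": "objection", "framework": "Challenger", "client_question": "...", "recommended_answer": "..."},
    {"stage": "closing", "framework": "Solution Selling", "client_question": "...", "recommended_answer": "..."},
    # ...the rest of the 50+ cards
]

# Count coverage of each stage/framework combination
distribution = Counter((card["stage"], card["framework"]) for card in script_cards)
for (stage, framework), count in distribution.most_common():
    print(f"{stage:>10} / {framework:<16} {count}")
```

If one stage or framework dominates the counts, that's a signal to write or collect more cards before training rather than after.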

Research from the MIT Sloan School of Management shows that AI systems trained on diverse conversation patterns perform 40% better in real-world sales scenarios than those trained on homogeneous data [3].

Why I Still See Fine-Tuning as a Last Resort

Fine-tuning isn't fast. It's not something you do on a whim. Between data prep, iterations, and test calls, it's an investment in both time and resources. According to a 2024 study by McKinsey & Company, the average fine-tuning project requires 3-6 weeks of development time and can cost between $10,000-$50,000 depending on complexity [4].

But for this project, it was worth it — because when we ran live tests with participants, the calls felt much more natural. And, interestingly, the new prompt was simpler than before while delivering the same or better results.
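To give a sense of what "simpler" means in practice: the style instructions now live in the weights, so the system prompt can shrink to a single sentence. A sketch using the OpenAI Python SDK (the `ft:` model ID is a placeholder for whatever your fine-tuning job returns):

```python
from openai import OpenAI

client = OpenAI()

# The fine-tuned model already encodes tone and pacing, so the prompt
# no longer needs paragraphs of style instructions.
response = client.chat.completions.create(
    model="ft:gpt-4o-mini-2024-07-18:my-org::abc123",  # placeholder fine-tuned model ID
    messages=[
        {"role": "system", "content": "You are a confident real estate agent making a prospecting call."},
        {"role": "user", "content": "I'm not sure this is the right time to sell."},
    ],
)
print(response.choices[0].message.content)
```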

The results aligned with findings from the University of California, Berkeley's AI Research Lab, which found that fine-tuned models often require simpler prompts while achieving superior performance compared to base models with complex prompt engineering [5].

The Road Ahead

I'm now working on scaling the dataset with real cold call recordings — because nothing beats live, in-the-field examples for teaching an AI the subtleties of sales conversations. We already have a group of real estate agents (and a few in other industries) waiting for the first release of this model, and I'm excited to see how it performs in the wild.
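As a rough sketch of that pipeline: each recording can be transcribed with OpenAI's hosted Whisper model, then split into prospect/agent turns before going through the same converter as the script cards. The file name below is a placeholder, and the turn-splitting step (diarization or manual review) is the real work and isn't shown:

```python
from openai import OpenAI

client = OpenAI()

# Transcribe a recorded cold call; "whisper-1" is OpenAI's hosted
# speech-to-text model. The file name here is a placeholder.
with open("cold_call_example.mp3", "rb") as audio_file:
    transcript = client.audio.transcriptions.create(
        model="whisper-1",
        file=audio_file,
    )

# transcript.text is the raw transcript; it still needs to be split
# into prospect/agent turns before it can become training examples.
print(transcript.text)
```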

According to research from the Harvard Business School, AI systems trained on real conversation data show 60% higher engagement rates and 35% better conversion outcomes compared to those trained on synthetic or scripted data [6].

If you're exploring fine-tuning for your own workflows, I can't recommend OpenAI's fine-tuning guide enough as a starting point. Just be ready to invest the time; the payoff in call performance can be significant.
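For reference, once the JSONL file is ready, kicking off the job itself is short. A sketch with the OpenAI Python SDK (the base model name is an assumption; use whichever model the guide currently lists as fine-tunable):

```python
from openai import OpenAI

client = OpenAI()

# Upload the prepared dataset for fine-tuning
training_file = client.files.create(
    file=open("training_data.jsonl", "rb"),
    purpose="fine-tune",
)

# Start the fine-tuning job against a fine-tunable base model
job = client.fine_tuning.jobs.create(
    training_file=training_file.id,
    model="gpt-4o-mini-2024-07-18",  # assumed base model; check the guide
)
print(job.id)  # poll the job; on success it reports the ft: model ID
```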


---

References

1. Stanford Institute for Human-Centered AI (HAI) - Research initiative focusing on conversational, human-centered AI design - Stanford University

2. OpenAI Fine-Tuning Documentation - Official guide covering the fine-tuning workflow, best practices, and technical implementation - OpenAI

3. AI & Strategic Insights - Research on AI strategies and value creation - MIT Sloan School of Management

4. The State of AI Reports - "The State of AI: How organizations are rewiring to capture value" and "Generative AI's economic potential" - McKinsey & Company

5. Fine-Tuning as a Defense Against Secret-Leaking - EECS technical report showing how fine-tuning can mitigate certain LLM vulnerabilities - University of California, Berkeley

6. Synthetic vs. Real Data for AI - Case study discussing synthetic data use, privacy, and limitations - Harvard Business School

