Data Source Transparency: How UAE Property Data Flows from DLD to BigQuery and How Claude Sonnet Analyzes It
UAE Property AI Bot delivers project cards, rental yields, and investment-style analysis in seconds. But speed means nothing without trust. This article lays out exactly how that analysis is produced — from the official government source, through every technical step, to the AI model that writes the final output. No black box. No invented numbers.
Published: January 28, 2026
The source: Dubai Land Department
Every property and rental figure in UAE Property AI Bot traces back to one place: the Dubai Land Department, or DLD. The DLD is the government body responsible for registering real estate transactions, tenancy contracts, and all related data in Dubai. Its records are the official basis for sale prices, rent levels, and transaction volumes.
We do not scrape third-party portals. We do not mix in unverified listings. The entire pipeline is built on DLD-sourced datasets — transaction records and registered rent contracts. That keeps UAE Property AI Bot's numbers aligned with what the regulator publishes and what professionals rely on for due diligence.
From DLD to the cloud: Google Cloud Storage
DLD data enters our infrastructure as structured files — typically CSV. These files are stored in a dedicated bucket in Google Cloud Storage. The contents fall into two main categories.
Transactions cover sale and purchase records: prices, dates, project and area identifiers, and related fields. Rent Contracts cover registered tenancy data: rent amounts, contract dates, and property references.
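The two record categories can be pictured as simple record types. This is an illustrative sketch only — the field names below are assumptions for clarity, not the actual DLD export schema:

```python
from dataclasses import dataclass
from datetime import date

# Hypothetical record shapes for the two dataset categories.
# Field names are illustrative, not the real DLD column names.

@dataclass
class Transaction:
    project_id: str          # project identifier
    area_id: str             # area identifier
    price_aed: float         # registered sale price
    transaction_date: date

@dataclass
class RentContract:
    property_ref: str        # property reference
    annual_rent_aed: float   # registered annual rent
    contract_start: date
    contract_end: date
```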
The upload and refresh frequency depends on how often new DLD exports are produced and loaded into the bucket. Importantly, the files that sit in Cloud Storage are the exact files that feed BigQuery. There are no hidden intermediate sources and no manual edits to the numbers in between.
From Cloud Storage to BigQuery: building the tables
The CSV files in Cloud Storage are loaded into Google BigQuery using Python scripts. Each script reads a specific file — Transactions or Rent Contracts — and loads it into a dedicated BigQuery dataset. The load process uses a truncate-and-replace approach, meaning each run replaces the table with the latest file contents. Schemas are applied consistently so that columns like prices, dates, and area identifiers are typed correctly and ready for querying.
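In BigQuery terms, truncate-and-replace corresponds to the `WRITE_TRUNCATE` write disposition. The sketch below describes such a load configuration as plain data; the column names are illustrative assumptions, and the real scripts may differ:

```python
# Conceptual sketch of a truncate-and-replace CSV load.
# Column names are illustrative assumptions, not the real schema.

SCHEMA = {  # CSV column -> BigQuery type
    "transaction_id": "STRING",
    "project_id": "STRING",
    "price_aed": "FLOAT64",
    "transaction_date": "DATE",
}

def load_job_config(schema: dict) -> dict:
    """Describe a load that replaces the table with the latest file."""
    return {
        "source_format": "CSV",
        "skip_leading_rows": 1,                 # skip the CSV header row
        "write_disposition": "WRITE_TRUNCATE",  # replace, don't append
        "schema": [{"name": k, "type": v} for k, v in schema.items()],
    }
```

With the `google-cloud-bigquery` client, the same settings would be passed via `bigquery.LoadJobConfig` to `client.load_table_from_uri`; each run then overwrites the table with the latest file contents, exactly as described above.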
The result is simple: the same DLD-derived files become a single source of truth inside BigQuery. No numbers are changed or reinterpreted during this step. It is a direct transfer from file to table.
How the bot queries the data
When you search for a project, a master community, or a developer, the bot runs structured SQL queries against the BigQuery tables. These queries are defined in the codebase and return a fixed set of metrics.
At the project level, you get transaction counts, typical prices per square meter, completion status, and unit counts. Where available, you also get rental metrics: the number of registered rent contracts, current rent levels, and how rents have moved over time. Developer and master community summaries are rolled up where relevant.
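A project-level query of this kind can be sketched as a parameterized SQL statement built in Python. The table and column names here are assumptions for illustration, not the production schema:

```python
# Illustrative sketch of a project-level BigQuery query.
# `dataset.transactions`, `price_per_sqm`, and `project_id` are
# assumed names, not the real ones.

def project_summary_sql(project_id_param: str = "@project_id") -> str:
    """Build a parameterized query for one project's transaction metrics."""
    return f"""
    SELECT
      COUNT(*) AS transaction_count,
      APPROX_QUANTILES(price_per_sqm, 2)[OFFSET(1)] AS median_price_per_sqm
    FROM `dataset.transactions`
    WHERE project_id = {project_id_param}
    """
```

Using a named query parameter like `@project_id` (rather than string interpolation of user input) is the standard way to keep BigQuery queries safe and cacheable.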
All of this is assembled into a structured payload — a clean summary of what BigQuery returned. That payload is what gets sent to the AI. The model does not receive raw CSV files or unstructured text. It receives only the specific data that the queries pulled from DLD-sourced tables.
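Assembling that payload can be sketched as a small function that keeps only the fields the queries returned. The structure and field names below are illustrative assumptions:

```python
# Hypothetical payload assembly: query rows in, clean summary out.
# All field names are illustrative assumptions.

def build_payload(project_row: dict, rent_rows: list[dict]) -> dict:
    """Assemble the structured summary sent to the model."""
    return {
        "project": {
            "name": project_row.get("project_name"),
            "transaction_count": project_row.get("transaction_count", 0),
            "median_price_per_sqm": project_row.get("median_price_per_sqm"),
        },
        "rentals": {
            "contract_count": len(rent_rows),
            "annual_rents_aed": [r["annual_rent_aed"] for r in rent_rows],
        },
    }
```

The point of the intermediate structure is exactly what the text describes: the model never sees raw CSV, only this curated summary of what BigQuery returned.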
How Claude Sonnet turns data into analysis
Claude Sonnet, accessed via OpenRouter, takes the structured data from BigQuery and turns it into readable analysis. The process has clear boundaries.
The model receives two things: the structured data payload returned by BigQuery, and a fixed system prompt. That system prompt describes relevant Dubai market context — regulatory frameworks, known supply dynamics, typical yield ranges — and sets strict rules for how the model must behave.
Those rules are direct. Claude is told to base its analysis — advantages, risks, market context, and summary — only on the data it has been given and the context defined in the prompt. It must not invent figures. It must correctly read the direction of trends using the actual data fields it receives. And if the data is insufficient to make a claim, it must say so, rather than filling the gap with generic marketing language.
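Combining the fixed system prompt with the data payload follows the OpenAI-style chat format that OpenRouter accepts. The prompt text and model identifier below are condensed placeholders, not the production values:

```python
import json

# Condensed stand-in for the full system prompt described above.
SYSTEM_PROMPT = (
    "Base all analysis only on the data provided and the context in "
    "this prompt. Do not invent figures. Read trend direction from the "
    "actual data fields. If the data is insufficient, say so."
)

def build_request(payload: dict,
                  model: str = "anthropic/claude-sonnet") -> dict:
    """Build an OpenAI-style chat request for OpenRouter.

    The model slug is a placeholder assumption, not the exact one used.
    """
    return {
        "model": model,
        "messages": [
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": json.dumps(payload)},
        ],
    }
```

Because the user message is just the serialized payload, everything the model can cite is visible in one place, which is what makes the chain auditable.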
The output is formatted analysis: sections covering advantages, risks and limitations, market context, and an overall summary.
Claude Sonnet is an interpreter of the data we query, not a source of new information. It has no direct access to DLD, no access to the internet during a request, and no ability to pull in outside data. It works only with what we send. That keeps the entire chain auditable from start to finish.
What transparency means in practice
Transparency here is not a marketing claim. It is a description of how the system is actually built.
The source is clear. All numbers trace back to DLD-sourced files stored in Cloud Storage and loaded into BigQuery. No third-party or unverified data is blended into the core metrics.
The method is documented. The pipeline — the scripts that load data, the SQL that queries it, the structure of the tables — is all recorded and reviewable. If you want to understand where a specific number comes from, the path is there to follow.
The AI's role is defined. Claude Sonnet does not generate or hallucinate data. It is constrained by its instructions to use only what it is given and to flag gaps when they exist.
None of this means the underlying DLD data is perfect. Like any government registry, it has gaps and lags. What it means is that UAE Property AI Bot is explicit about the origin of every number and every step it takes to get from that origin to the analysis you see. The chain is: DLD → Cloud Storage → BigQuery → structured input → Claude Sonnet → analysis. Every link in that chain is documented and traceable.
Summary
UAE Property AI Bot is built on a single, clearly defined data chain. Dubai Land Department data is stored in Google Cloud Storage, loaded into BigQuery, and queried by the bot and website. The results are passed as structured input to Claude Sonnet, which produces analysis under strict instructions not to invent figures and to rely only on what it has been given. The outcome is a system where every number can be traced back to an official source, and every step along the way is documented — no black box, no hidden data sources, no guesswork.