
Abstract
Zefir is a French proptech company that makes instant cash offers on homes through an online process. To value homes, its team of pricing analysts relies on a database of past real estate ads and transactions. New ads are ingested daily and parsed to extract relevant home attributes. Some of these attributes, such as the floor a home is on, are strong determinants of its price. The Zefir product team had reported that floor information was often missing from the listed home attributes despite being clearly stated in the ad description. We devised a data enrichment strategy to extract floor information from the ad description. A combination of regular expressions was used to parse ingested ads and to populate or correct the home floor field in a way that allows every change to be traced back. A database migration was carried out to enrich previously ingested ads. This feature corrected ~20% and enriched ~30% of the real estate ads ingested on the Zefir Data server.
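
As an illustration of the approach summarized above, the following is a minimal sketch of regex-based floor extraction from a French ad description. It is not the production implementation: the patterns, the function names (`extract_floor`, `enrich_floor`), and the record fields (`description`, `floor`, `floor_original`, `floor_source`) are all assumptions made for this example. The original value is kept next to the new one so that each change can be traced back.

```python
import re
from typing import Optional

# Hypothetical patterns for French ads; the real pattern set is not given in the text.
# "3ème étage", "2e etage", "1er étage". The trailing \b rejects "5 étages",
# which usually states the number of storeys of the building, not the home's floor.
NUMERIC_FLOOR = re.compile(
    r"\b(\d{1,2})\s*(?:er|e|ème|eme|è|nd)?\s+[ée]tage\b", re.IGNORECASE
)
# Spelled-out ordinals, e.g. "premier étage".
ORDINAL_WORDS = {
    "premier": 1, "deuxième": 2, "second": 2,
    "troisième": 3, "quatrième": 4, "cinquième": 5,
}
WORD_FLOOR = re.compile(
    r"\b(" + "|".join(ORDINAL_WORDS) + r")\s+[ée]tage\b", re.IGNORECASE
)
# Ground floor: "rez-de-chaussée", "rez de chaussee", "RDC".
GROUND_FLOOR = re.compile(r"\b(?:rez[- ]de[- ]chauss[ée]e|rdc)\b", re.IGNORECASE)


def extract_floor(description: str) -> Optional[int]:
    """Return the floor found in the description, or None if no pattern matches."""
    match = NUMERIC_FLOOR.search(description)
    if match:
        return int(match.group(1))
    match = WORD_FLOOR.search(description)
    if match:
        return ORDINAL_WORDS[match.group(1).lower()]
    if GROUND_FLOOR.search(description):
        return 0
    return None


def enrich_floor(ad: dict) -> dict:
    """Return a copy of the ad with the floor populated or corrected.

    The previous value and the source of the new one are stored alongside it,
    so the change can be traced back and undone if needed.
    """
    enriched = dict(ad)
    extracted = extract_floor(ad.get("description", ""))
    listed = ad.get("floor")
    if extracted is None or extracted == listed:
        return enriched  # nothing to add or correct
    enriched["floor"] = extracted
    enriched["floor_original"] = listed          # None means the field was missing
    enriched["floor_source"] = "description_regex"
    return enriched
```

With this sketch, an ad whose `floor` field is empty gets populated (the "enrich" case), while an ad whose listed floor disagrees with the description gets overwritten (the "correct" case); in both cases `floor_original` records what was there before. The same function could be applied to each newly ingested ad and, in a one-off pass, to historical records.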