Data Projects: Do These Situations Sound Familiar? Get in Touch!
Eight Months, One Billion Parameters, No Business Value
Eight months after the project’s inception, the first results of a customised LLM were finally presented to senior management. The room was buzzing with excitement, but as the team unveiled their findings, it became clear that the real question had been overlooked: “Who will use this algorithm, and how?”
Complexity is not synonymous with impact. In data projects, success depends on understanding the problem, setting clear goals, and striving to deliver measurable value from day one. A model in production within the first few weeks is the true indicator of progress.
In the end, technology doesn’t matter. What truly counts is the value the solution provides to the end user. If it doesn’t address their problem, it’s just code on a screen.
One GPU for Everyone
One of our clients used the same GPU to manually train multiple independent models. The result? Frequent bugs, memory conflicts, and endless debugging that led to significant delays.
The issue is a familiar one: Data Scientists excel at building AI models but are rarely equipped to design and deploy the infrastructure needed to run them. Asking them to handle both is like expecting a surgeon to also administer anaesthesia: both roles are essential, but each requires distinct expertise.
Machine Learning Engineers and Data Engineers are Software Engineers. The smallest unit of work is the team, which must be multidisciplinary and weighted towards Software Engineers, even in R&D contexts.
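To make the shared-GPU problem concrete, here is a minimal Python sketch of the kind of plumbing that is often missing: a queue that runs training jobs one at a time, so two jobs never compete for the same device memory. The class name SingleGpuQueue and its methods are hypothetical, the training functions are placeholders, and a real setup would more likely rely on a proper job scheduler than on a single worker thread.

```python
import queue
import threading


class SingleGpuQueue:
    """Hypothetical helper: runs submitted jobs strictly one after another."""

    def __init__(self):
        self._jobs = queue.Queue()
        self.completed = []   # names of jobs that finished, in order
        self.failed = {}      # job name -> exception raised by that job
        self._worker = threading.Thread(target=self._run, daemon=True)
        self._worker.start()

    def submit(self, name, train_fn):
        """Queue a training job; it starts only when earlier jobs are done."""
        self._jobs.put((name, train_fn))

    def _run(self):
        while True:
            item = self._jobs.get()
            try:
                if item is None:      # sentinel sent by close()
                    return
                name, train_fn = item
                try:
                    train_fn()
                    self.completed.append(name)
                except Exception as exc:
                    # One failing job must not block the jobs queued behind it.
                    self.failed[name] = exc
            finally:
                self._jobs.task_done()

    def close(self):
        """Wait for all queued jobs to finish, then stop the worker."""
        self._jobs.put(None)
        self._worker.join()


if __name__ == "__main__":
    gpu = SingleGpuQueue()
    gpu.submit("churn-model", lambda: None)      # placeholder training function
    gpu.submit("forecast-model", lambda: None)   # placeholder training function
    gpu.close()
    print(gpu.completed, gpu.failed)
```

The point of the sketch is organisational as much as technical: someone on the team has to own this layer, and it is software engineering work rather than modelling work.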
The Throw-It-All-In Data Lake
“Just add it to the Data Lake, and the Data team will handle it later.”
How many times have we heard this mantra over the years? While it seems convenient, it’s akin to a plumber tossing every spare pipe into his truck. Chaos does not create effective systems, any more than blindly accumulating data does.
Collecting data without a specific purpose is wasteful; it adds unnecessary costs and clutters storage resources. Data collection must be deliberate. Every piece of data should be meticulously labelled with its source, assessed for its quality, and guaranteed to have a clear use.
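One way to enforce this rule is to refuse any dataset that arrives without its label. The Python sketch below is an illustration only: the DatasetRecord fields, the 0-to-1 quality score, and the 0.8 threshold are assumptions chosen for the example, not a standard.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class DatasetRecord:
    """Hypothetical label that must accompany every dataset."""
    name: str
    source: str            # where the data comes from
    quality_score: float   # assumed scale: 0.0 (unusable) to 1.0 (clean)
    intended_use: str      # the concrete purpose the data serves


def validation_errors(record: DatasetRecord, min_quality: float = 0.8) -> List[str]:
    """Return the reasons a record should be rejected (empty list if none)."""
    errors = []
    if not record.source.strip():
        errors.append("missing source")
    if not 0.0 <= record.quality_score <= 1.0:
        errors.append("quality score must be between 0 and 1")
    elif record.quality_score < min_quality:
        errors.append("quality score below threshold")
    if not record.intended_use.strip():
        errors.append("missing intended use")
    return errors


def admit(record: DatasetRecord, catalogue: Dict[str, DatasetRecord],
          min_quality: float = 0.8) -> None:
    """Add the record to the catalogue, or raise if it is not properly labelled."""
    errors = validation_errors(record, min_quality)
    if errors:
        raise ValueError(f"{record.name}: " + "; ".join(errors))
    catalogue[record.name] = record
```

With a gate like this in place, "the Data team will handle it later" stops being an option: data with no source, no quality assessment, or no stated use never reaches the lake.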
Effective data practices not only save resources but also preserve the integrity and value of your entire system.