Strategic imperatives for gen-AI builders
In the meantime, companies building AI businesses or features on scraped training data cannot sit on their hands and wait.
So, what can innovative businesses do now, amidst this uncertainty? A proactive, risk-aware strategy is essential:
1. Acknowledge and continuously assess risk
The first step is a candid internal acknowledgment of the legal risks associated with your current and planned data sourcing practices for LLM training. Building a commercially viable generative AI product more than likely requires scraped web data, and developers need to be comfortable with that fact.
2. Start an audit trail for data inputs
Robust data governance is no longer a “nice-to-have.” Before the laws get codified, start documenting the provenance of all training data. Where did it come from? How was it sourced? While this won’t cure an underlying infringement, it’s crucial for due diligence, responding to inquiries, and potentially for negotiating licenses.
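As a starting point, an audit trail can be as simple as one log entry per collected document. The Python sketch below is illustrative only: the field names, the `make_provenance_record` and `append_record` helpers, and the JSON Lines log format are assumptions, not an established standard. Adapt them to whatever your counsel and data team need to answer "where did it come from?" and "how was it sourced?".

```python
import hashlib
import json
from datetime import datetime, timezone


def make_provenance_record(url, content, method, license_note=None):
    """Build one audit-trail entry for a collected document.

    All field names here are hypothetical; pick ones that suit your pipeline.
    """
    return {
        "source_url": url,                                   # where it came from
        "retrieved_at": datetime.now(timezone.utc).isoformat(),
        "collection_method": method,                         # how it was sourced
        "license_note": license_note,                        # any stated terms
        "sha256": hashlib.sha256(content).hexdigest(),       # ties the entry to exact bytes
        "size_bytes": len(content),
    }


def append_record(path, record):
    """Append the record as one JSON line to an append-only log file."""
    with open(path, "a", encoding="utf-8") as f:
        f.write(json.dumps(record, sort_keys=True) + "\n")
```

Storing a content hash alongside the URL and timestamp means you can later show exactly which version of a page entered the dataset, even if the page has since changed.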
3. Respect emerging norms and explicit creator signals
Cultivate an organizational culture that respects the intent of creators, even when the legal lines are still being drawn. For example, a growing band of creators is beginning to inject "no-AI" flags into HTML and image metadata, while Creative Commons is being pushed to adopt a noAI license. Consider taking these no-AI markers into account when building out your training datasets, particularly if they are made explicit and upfront by the website or creator.
While the precise legal enforceability of these signals is still being debated, at least outside of Europe, ignoring such explicit preferences where they are made raises ethical concerns and is likely to be viewed unfavorably by courts and regulators.
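To make this concrete, here is a minimal Python sketch of how a pipeline might check a page for no-AI flags before adding it to a training set. It assumes the signal arrives as `noai` or `noimageai` tokens in a robots meta tag, which is one informal convention some sites use and not a ratified standard. The `find_no_ai_signals` helper and the token list are hypothetical names chosen for illustration.

```python
from html.parser import HTMLParser

# Assumed token set; this convention is informal and may change.
NO_AI_TOKENS = {"noai", "noimageai"}


class _RobotsMetaParser(HTMLParser):
    """Collects directives from <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = {name: (value or "") for name, value in attrs}
        if attr.get("name", "").lower() == "robots":
            for token in attr.get("content", "").split(","):
                self.directives.add(token.strip().lower())


def find_no_ai_signals(html):
    """Return the set of no-AI tokens found in the page's robots meta tags."""
    parser = _RobotsMetaParser()
    parser.feed(html)
    return parser.directives & NO_AI_TOKENS


page = '<html><head><meta name="robots" content="noai, noimageai"></head></html>'
if find_no_ai_signals(page):
    print("Creator opted out of AI training; skip or flag for review.")
```

The same tokens may also appear in an `X-Robots-Tag` HTTP response header or in image metadata, which this sketch does not inspect. A fuller check would cover whichever channels your sources actually use.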
4. Stay agile, stay informed, seek expert counsel
The legal and regulatory landscape for AI is dynamic. Invest in staying abreast of developments through industry associations, legal updates, and expert consultations. At Zyte, I regularly talk with customers navigating these same issues. Don’t hesitate to seek specialized legal advice tailored to your specific use cases and risk profile.