‘You might think of the goal of semantic technologies as separating the signal from the noise.’
Around 80% of publicly exchangeable data from chatbots, social media, customer surveys, online reviews and uploaded documents remains unstructured; this represents the ‘noise’. So, data engineers and data scientists turn to semantic technologies to convert this amorphous data into more valuable information and more trustworthy knowledge, which in turn supports generative AI output and the decisions based on it.
The neat definition above, from Cambridge Semantics Inc., spans the spectrum of semantic technologies: data mining, category tagging, AI, natural language processing (NLP) and semantic search. You can find the full Cambridge Semantics article on their Semantic University webpage.
A Linguist Looks at Semantic Tech
As the etymology of the word ‘semantic’ suggests, human languages present a challenge to machines that are programmed to mine words: our language is highly nuanced and filled with ambiguities and double meanings. Humans inject confirmation bias and heuristic shortcuts into what they say, often imprecisely, and language itself keeps evolving.
Enabling computers to understand and communicate using human language for the purpose of extracting, mimicking, and manipulating data is known as natural language processing (NLP). In the process, we establish ontologies to define the interrelationships of words, and inference rules to create ‘value chains’ that improve the accuracy of classifications and of the data-linked output that depends on them.
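To make the idea of an ontology plus an inference rule a little more concrete, here is a minimal Python sketch. The terms, the `IS_A` table and the function names are all invented for illustration, and real ontologies are far richer than a single ‘is a’ relation; the sketch also assumes the hierarchy contains no loops.

```python
# Hypothetical mini-ontology: each term points to its direct parent ("is a").
IS_A = {
    "router": "network device",
    "network device": "hardware",
    "firmware": "software",
}


def ancestors(term, is_a=IS_A):
    """Apply the rule 'if A is a B and B is a C, then A is a C' repeatedly."""
    found = []
    while term in is_a:  # assumes the hierarchy has no cycles
        term = is_a[term]
        found.append(term)
    return found


def is_kind_of(term, category, is_a=IS_A):
    """True if the category can be inferred for the term."""
    return category in ancestors(term, is_a)


print(ancestors("router"))
print(is_kind_of("router", "hardware"))
```

Nobody told the machine directly that a router is hardware; it inferred that from two smaller facts, which is the kind of ‘value chain’ described above.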
Beyond this, however, there is still the need for additional text analysis strategies (sentiment analysis, for example) to establish whether the feelings and opinions expressed in written responses are positive or negative.
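As a rough illustration of sentiment analysis, the Python sketch below scores a sentence against two tiny word lists. The word lists, the negation rule and the function names are assumptions made up for this example; production systems typically rely on trained models rather than hand-built lexicons.

```python
import re

# Hypothetical, deliberately tiny sentiment lexicons.
POSITIVE = {"clear", "helpful", "fast", "great"}
NEGATIVE = {"confusing", "slow", "broken", "poor"}
NEGATORS = {"not", "never", "no"}


def sentiment_score(text):
    """Add 1 per positive word, subtract 1 per negative word.

    A word directly preceded by a negator has its polarity flipped.
    """
    tokens = re.findall(r"[a-z']+", text.lower())
    score = 0
    for i, token in enumerate(tokens):
        polarity = (token in POSITIVE) - (token in NEGATIVE)
        if polarity and i > 0 and tokens[i - 1] in NEGATORS:
            polarity = -polarity
        score += polarity
    return score


def label(text):
    """Turn the numeric score into a coarse label."""
    score = sentiment_score(text)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


print(label("The manual is clear and helpful"))
print(label("Setup was not fast and the guide is confusing"))
```

Even this toy version shows why nuance matters: without the negation rule, ‘not fast’ would be counted as praise.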
‘Semantic technologies are automation tools developed so computers can understand human language more quickly, accurately and at scale.’
This enlightening statement comes from an article titled ‘EX MACHINA: Humans, Machines, Language and Semantic Technologies’.
The overarching aim of semantic technologies is to enable machines to decipher and make sense of complex data hoards without any prior knowledge of the data sets. Data management and enterprise analytics companies increasingly rely on this more sophisticated data analysis to create and manage links between machine-readable, interrelated data on the Web. Refined through semantic querying, this linked data is far more useful than siloed or raw data.
Semantic technologies will help businesses process enormous vaults of unrefined data faster and more forensically, enhancing knowledge discovery and the accuracy of AI-driven answers and insights, so they are likely to penetrate every industry sector. This proliferation will speed up as proprietary AI platforms cooperate and collaborate; the resulting cross-fertilisation of linked data models will further refine the automated mining of inefficient human languages into the precious ‘gold’ of semantic metadata.
As AI systems collect more and more data, people expect them to quickly understand and explain information. The meaning attached to this data will help check if the information is correct and makes sense.
An Ontotext study on the fundamentals of semantic technology provides an expanded definition of how AI will harness the results:
‘Semantic technology defines and links data… by developing languages to express rich, self-describing interrelations of data in a form that machines can process… helping AI-driven systems store, manage and retrieve information based on meaning and logical relationships.’
The architecture behind data linking is commonly represented by knowledge graphs, which can provide a compelling visual aid to the complex routing behind content management systems and how data unification was achieved.
To learn more, you can watch a webinar titled ‘Knowledge Graphs Maps’ featuring Ontotext CEO Atanas Kiryakov.
‘Semantic metadata is the new gold… as important for the data as packaging is for goods.’
Semantic technology goes beyond the goals of greater context accuracy, avoiding hallucinations, and cross-referencing linked data to validate the output of generative AI. The large language models (LLMs) that underpin platforms like ChatGPT, those powering the AI on smartphones, like Google’s Gemini, and the new generation of small language models (SLMs, like Meta’s Llama 3 and Microsoft’s Phi-3… and let’s not forget Apple’s Siri) are all battling with safety concerns.
Gen AI tools are interfacing with search engines on the public internet and, inevitably, with the dangerous content, misinformation, disinformation, and conspiracy theories embedded there. So, semantic technologies are becoming embedded in safety protocols and will be critical in identifying prompt injections that maliciously attempt to trick generative AI.
Read the Raconteur ‘Three Minute Explainer: What Prompt Injections?’
Semantics adds another layer to the Web and can show related facts instead of just matching words. You may have read about the aspirations for the Semantic Web, where tagged data descriptors provide further meaning to the metadata on the existing Web to establish how it fits into the architecture of content management structures.
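The difference between matching words and matching meaning can be sketched in a few lines of Python. Everything here is hypothetical: the sample documents, the `CONCEPT_OF` lookup table and the function names are invented for illustration, and real semantic search draws on much larger vocabularies or on learned representations.

```python
import re

# Hypothetical lookup that tags different words with one shared concept.
CONCEPT_OF = {
    "laptop": "portable-computer",
    "notebook": "portable-computer",
    "ultrabook": "portable-computer",
}

# Hypothetical document titles.
DOCS = {
    "doc1": "How to reset your laptop",
    "doc2": "Notebook battery care",
    "doc3": "Printer setup guide",
}


def words(text):
    """Lower-case the text and split it into plain words."""
    return re.findall(r"[a-z]+", text.lower())


def keyword_search(query, docs=DOCS):
    """Return documents that contain the literal query word."""
    query = query.lower()
    return sorted(d for d, text in docs.items() if query in words(text))


def concept_search(query, docs=DOCS, concept_of=CONCEPT_OF):
    """Return documents containing any word tagged with the query's concept."""
    target = concept_of.get(query.lower(), query.lower())
    return sorted(
        d for d, text in docs.items()
        if any(concept_of.get(w, w) == target for w in words(text))
    )


print(keyword_search("laptop"))
print(concept_search("laptop"))
```

The keyword search finds only the page that uses the word ‘laptop’, while the concept search also surfaces the ‘notebook’ page because both words carry the same tag.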
Tim Berners-Lee, inventor of the World Wide Web, has co-written a paper on the Semantic Web which provides his insights and explanations.
‘The Semantic Web will bring structure to the meaningful content of Web pages…’
What Does This Mean for Technical Communicators?
If you are a technical communicator, you will soon need to be proficient in Semantic Knowledge Management and the associated languages that semantic technologies are built on: RDF (Resource Description Framework, the data model that represents data on the Semantic Web as subject-predicate-object statements), SPARQL (the query language for RDF data) and OWL (Web Ontology Language, which defines the schema behind the hierarchies and interrelationships of data).
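If these languages are new to you, the Python sketch below may help you picture how they fit together. It stores a handful of RDF-style statements as plain tuples and answers a question that SPARQL would express as a pattern. The `ex:` names and the `match` helper are made up for this example; this is not a real RDF library, just an illustration of the idea.

```python
# Hypothetical RDF-style statements: (subject, predicate, object).
TRIPLES = {
    ("ex:UserGuide", "rdf:type", "ex:Document"),
    ("ex:UserGuide", "ex:describes", "ex:Router"),
    ("ex:QuickStart", "ex:describes", "ex:Router"),
    ("ex:Router", "rdf:type", "ex:Product"),
}


def match(pattern, triples=TRIPLES):
    """Return the statements that fit a pattern.

    None works as a wildcard, much like a ?variable in SPARQL.
    """
    return sorted(
        t for t in triples
        if all(p is None or p == v for p, v in zip(pattern, t))
    )


# Roughly what SPARQL would write as:
#   SELECT ?doc WHERE { ?doc ex:describes ex:Router }
docs = [subject for subject, _, _ in match((None, "ex:describes", "ex:Router"))]
print(docs)
```

An OWL schema would sit on top of statements like these and declare, for instance, which classes exist and how they relate; that layer is left out here to keep the sketch short.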
Tech departments will also want you, the technical communicator, to show familiarity with knowledge graphs and the related ontologies and taxonomies (see our August Newsletter) for their own data mining and NLP technologies.
You will need a deep appreciation of how and why unrefined and unstructured data must be expressed in plain language that deciphers the context and leaves as little as possible to the imagination (or AI ‘hallucination’). This refinement of clarity is often expressed as the ‘semantic ladder’, which transforms loose and imprecise descriptions (Level 1 of the ladder) into exact and unambiguous elucidations (Level 5).
As a technical communicator, you may find yourself on the team that chooses and defines the key and critical terminology and the relationships between words and their impact. You’ll be at the very heart of semantic annotation or the data mining of AI-driven machines and translation engines in your ecosystem.
You may just play a part in turning the adage that ‘a picture paints a thousand words’ on its head: without climbing the semantic ladder to refine, cross-check, and vet the syntax and sentiment, a single word could equally paint a thousand outputs in the algorithms of generative AI.
If you wish to further your career in the fast-moving, ever-changing world of digital technical communication and receive our regular newsletter, sign up to the Firehead Training Academy and follow our antics.