Vocabulary Control
Overview
Vocabulary control is the process of developing, maintaining, and using a controlled vocabulary: the set of terms that must be used to index documents and to search for them in a given system. A controlled vocabulary can be defined as a list of terms that shows their relationships and represents the specific subjects of documents.
Arpita Sarkar
Student of Jadavpur University, M.LIS
Q1. What is vocabulary control?
Q2. Need of controlled vocabulary in information retrieval.
Q3. What are the tools of vocabulary control?
Q4. What is classaurus?
Q5. What is the difference between Natural language and Artificial language?
Q1. What is vocabulary control?
Vocabulary control refers to the method of creating, maintaining, and using a controlled vocabulary, in which a limited set of terms must be used to index documents and to search for these documents in a specific system. A controlled vocabulary can be defined as a list of terms that shows the connections between them and is used to represent the particular subject of a document.
A natural language has many synonyms, quasi-synonyms, homonyms, acronyms, ambiguous terms, etc. If natural language is used for subject indexing, the subject matter may be described by any word or phrase without limitation, such as those found in the documents themselves. When no control is placed on the vocabulary, searching becomes unreliable: the same concept may be indexed under different terms, and one term may stand for several different concepts. In summary, vocabulary control helps solve the problems that arise from natural language.
Q2. Need of controlled vocabulary in information retrieval.
- Controlled vocabularies aim to structure information and provide a common terminology for cataloguing and retrieving it.
- Controlled vocabularies also ensure that preferred terms are used consistently and that similar content is assigned the same terms.
- They are essential during the indexing phase because, without them, catalogers may not use the same term consistently to denote the same person, place, or thing. In the retrieval phase, different users might use various synonyms or broader terms for a specific concept.
- A controlled vocabulary helps users retrieve the documents relevant to their needs.
- By using a controlled vocabulary, users can find relevant information in a shorter time.
- A controlled vocabulary is very helpful in information retrieval (IR) because its consistent terms make it easier to match searches to the interests of individual users.
- Because different users may search for a given concept with different synonyms or more generic terms, vocabulary control maps these variants to one preferred term so that they all retrieve the same documents.
- It improves indexing, so that documents relevant to the user's query can be searched for and retrieved.
- Users may not know the specialist terminology of a field when searching for information. A controlled vocabulary, by leading them from familiar terms to the preferred terms, makes that information accessible to them.
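The indexing and retrieval benefits listed above can be illustrated with a minimal Python sketch. It assumes a hypothetical entry vocabulary (the `USE` dictionary) that maps variant terms to one preferred term, and three made-up documents whose indexers each chose a different word for the same concept; the function names are illustrative, not part of any real system.

```python
# Minimal sketch of synonym control at indexing and search time.
# USE is a hypothetical entry vocabulary: variant term -> preferred term.
USE = {
    "saltiness": "salinity",
    "salt content": "salinity",
}


def preferred(term: str) -> str:
    """Normalise case/whitespace and replace a known variant with its preferred term."""
    t = term.strip().lower()
    return USE.get(t, t)


def build_index(docs: dict[str, list[str]]) -> dict[str, set[str]]:
    """Index every document under the preferred form of each term assigned to it."""
    index: dict[str, set[str]] = {}
    for doc_id, terms in docs.items():
        for term in terms:
            index.setdefault(preferred(term), set()).add(doc_id)
    return index


def search(index: dict[str, set[str]], query: str) -> set[str]:
    """Controlled search: the query is mapped to its preferred term first."""
    return index.get(preferred(query), set())


def search_uncontrolled(docs: dict[str, list[str]], query: str) -> set[str]:
    """Uncontrolled search: only exact (case-insensitive) word matches count."""
    q = query.strip().lower()
    return {d for d, terms in docs.items() if q in (t.lower() for t in terms)}


# Three made-up documents whose indexers chose different words for one concept.
docs = {
    "D1": ["Salinity", "Estuaries"],
    "D2": ["Saltiness", "Soil"],
    "D3": ["Salt content", "Groundwater"],
}
index = build_index(docs)

print(sorted(search(index, "saltiness")))              # ['D1', 'D2', 'D3']
print(sorted(search_uncontrolled(docs, "saltiness")))  # ['D2']
```

Without the mapping, a search for "saltiness" finds only D2; with it, any of the three variant words retrieves all three documents.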
The necessity for vocabulary control stems from two fundamental aspects of natural language, which are:
- Two or more words or terms can describe a single concept
Example:
salinity/saltiness
VHF/Very High Frequency
- Two or more words that have the same spelling can represent different concepts
Example:
Mercury (planet)
Mercury (metal)
Mercury (automobile)
Mercury (mythical being)
Vocabulary control is needed to provide a list of terms showing their relationships and to deal with imprecisely defined words, rapidly changing terminology, and terms that have numerous synonyms. For synonyms, the controlled vocabulary identifies the synonymous terms and selects one of them as the preferred term.
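Both problems, synonyms and homographs, can be sketched in a few lines of Python. The heading set below reuses the Mercury example with parenthetical qualifiers, a common way subject heading lists separate homographs; the synonym entries ("quicksilver", "hg") and the function name `candidate_headings` are illustrative assumptions, not taken from any real heading list.

```python
# Sketch: homographs become separate qualified headings; synonyms are redirected.
HEADINGS = {
    "Mercury (planet)",
    "Mercury (metal)",
    "Mercury (automobile)",
    "Mercury (mythical being)",
}

# Hypothetical lead-in (non-preferred) terms pointing to one qualified heading.
USE = {"quicksilver": "Mercury (metal)", "hg": "Mercury (metal)"}


def candidate_headings(term: str) -> list[str]:
    """Return the controlled heading(s) a searcher's word could correspond to."""
    t = term.strip().lower()
    if t in USE:  # synonym: exactly one preferred heading
        return [USE[t]]
    # Homograph: every heading whose base word (before the qualifier) matches.
    return sorted(h for h in HEADINGS if h.split(" (")[0].lower() == t)


print(candidate_headings("mercury"))
# ['Mercury (automobile)', 'Mercury (metal)', 'Mercury (mythical being)', 'Mercury (planet)']
print(candidate_headings("Quicksilver"))  # ['Mercury (metal)']
```

A searcher who types "mercury" is shown the four qualified headings and can pick the intended sense, whereas an uncontrolled index would mix all four senses in one result set.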
Q3. What are the tools of vocabulary control?
- Classification schemes, e.g. CC, DDC, UDC
- Subject heading list
A subject heading list is a printed or published list of subject headings, which may be produced from the subject authority file maintained by an organization or individual.
A subject heading list contains the preferred subject access terms (controlled vocabulary) that are assigned as added entries in a bibliographic record. Each assigned heading works as an access point, enabling the work to be searched and retrieved by subject from the library catalogue database.
Ex. Library of Congress Subject Headings (LCSH), Sears List of Subject Headings, Medical Subject Headings (MeSH)
- Thesaurus
A thesaurus is a kind of dictionary that represents all the concepts of a specific domain in a consistent manner and labels each concept with a preferred term. Like subject heading lists, thesauri contain preferred terms, variant terms, and broader and narrower terms. In addition, a thesaurus includes related terms, which may or may not be part of the same hierarchy as the term. A commonly used thesaurus for describing art, architecture, and material culture objects is the Getty Art & Architecture Thesaurus.
Ex. Getty Art & Architecture Thesaurus; Roget's Thesaurus (a general-language thesaurus of synonyms)
- Taxonomy
- Folksonomy
- Ontology
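The thesaurus structure described above (preferred terms, variant terms, broader, narrower and related terms) maps naturally onto a small data structure. The Python sketch below uses the conventional thesaurus abbreviations UF (Used For), BT, NT and RT; the class names and the sample terms ("Automobiles", "Cars", "Vehicles", and so on) are hypothetical, chosen only to show how reciprocal relationships and search expansion might work.

```python
from dataclasses import dataclass, field


@dataclass
class Concept:
    """One concept in a hypothetical thesaurus, labelled by its preferred term."""
    preferred: str
    uf: set[str] = field(default_factory=set)  # "Used For": variant, non-preferred terms
    bt: set[str] = field(default_factory=set)  # broader terms
    nt: set[str] = field(default_factory=set)  # narrower terms
    rt: set[str] = field(default_factory=set)  # related (associative) terms


class Thesaurus:
    def __init__(self) -> None:
        self.concepts: dict[str, Concept] = {}
        self.use: dict[str, str] = {}  # variant term -> preferred term

    def add(self, preferred: str, uf=()) -> Concept:
        """Create (or fetch) a concept and register its variant terms."""
        c = self.concepts.setdefault(preferred, Concept(preferred))
        for v in uf:
            c.uf.add(v)
            self.use[v] = preferred
        return c

    def broader(self, narrow: str, broad: str) -> None:
        # BT and NT are reciprocal, so both directions are recorded together.
        self.add(narrow).bt.add(broad)
        self.add(broad).nt.add(narrow)

    def related(self, a: str, b: str) -> None:
        # RT is symmetric.
        self.add(a).rt.add(b)
        self.add(b).rt.add(a)

    def lookup(self, term: str) -> Concept:
        """Follow a variant to its preferred term (raises KeyError if unknown)."""
        return self.concepts[self.use.get(term, term)]

    def narrower_closure(self, term: str) -> set[str]:
        """The preferred term plus all narrower terms at every level, for search expansion."""
        seen: set[str] = set()
        stack = [self.lookup(term).preferred]
        while stack:
            t = stack.pop()
            if t not in seen:
                seen.add(t)
                stack.extend(self.concepts[t].nt)
        return seen


th = Thesaurus()
th.add("Automobiles", uf=["Cars", "Motor cars"])
th.broader("Automobiles", "Vehicles")
th.broader("Electric automobiles", "Automobiles")
th.related("Automobiles", "Roads")

print(th.lookup("Cars").preferred)              # Automobiles
print(sorted(th.narrower_closure("Vehicles")))  # ['Automobiles', 'Electric automobiles', 'Vehicles']
```

Recording BT/NT and RT in both directions at once keeps the thesaurus consistent, which is one of the properties that distinguishes it from a flat list of synonyms.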
Q4. What is classaurus?
- Classaurus is a vocabulary control tool.
- Developed by Ganesh Bhattacharyya for use with POPSI (Postulate-based Permuted Subject Indexing), a pre-coordinate indexing system.
- It is a faceted systematic scheme of hierarchical classification that incorporates all the essential features of a conventional retrieval thesaurus, i.e., control of synonyms, quasi-synonyms and antonyms in an extended sense.
- As in faceted classification schemes, there are separate schedules for each of the elementary categories (entity, property and action) and for the common modifiers (form, time, place and environment).
- A classaurus can be designed either before starting the indexing work or along with indexing work.
- In either case, its design requires both an a priori and a pragmatic approach.
A classaurus thus combines the features of a faceted classification scheme with those of a conventional alphabetical thesaurus: it is an elementary category-based (faceted) systematic scheme of hierarchical classification in the verbal plane, incorporating all the necessary and sufficient features of a conventional information retrieval thesaurus.
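To make the combination of facets, hierarchy and synonym control more concrete, here is a minimal Python sketch of a classaurus-like structure. It is an assumption-laden illustration, not Bhattacharyya's notation or schedules: the terms (Rice, Paddy, West Bengal, etc.), the choice of categories shown, and the functions `build_lookup`, `hierarchy` and `resolve` are all hypothetical. Each schedule belongs to one elementary category or modifier, each term records its broader term in the verbal plane, and synonyms are redirected to the preferred term.

```python
# Hypothetical classaurus fragment: one schedule per elementary category / modifier.
# Each preferred term maps to (broader term or None, list of synonyms).
SCHEDULES = {
    "Entity": {
        "Plants": (None, []),
        "Crop plants": ("Plants", []),
        "Rice": ("Crop plants", ["Paddy"]),
    },
    "Property": {
        "Diseases": (None, ["Disorders"]),
    },
    "Action": {
        "Control": (None, []),
    },
    "Place": {
        "India": (None, []),
        "West Bengal": ("India", []),
    },
}


def build_lookup(schedules):
    """Map every preferred term and synonym (lower-cased) to (category, preferred term)."""
    lookup = {}
    for category, schedule in schedules.items():
        for term, (_broader, synonyms) in schedule.items():
            for name in [term, *synonyms]:
                lookup[name.lower()] = (category, term)
    return lookup


def hierarchy(schedules, category, term):
    """Return the chain of terms from the top of the hierarchy down to `term`."""
    chain = []
    while term is not None:
        chain.append(term)
        term = schedules[category][term][0]
    return list(reversed(chain))


LOOKUP = build_lookup(SCHEDULES)


def resolve(word):
    """Resolve a word to its facet (category) and its hierarchical chain."""
    category, term = LOOKUP[word.lower()]
    return category, hierarchy(SCHEDULES, category, term)


print(resolve("Paddy"))        # ('Entity', ['Plants', 'Crop plants', 'Rice'])
print(resolve("west bengal"))  # ('Place', ['India', 'West Bengal'])
```

The facet tells an indexer where a term belongs in a subject string, while the hierarchy and synonym entries provide the thesaurus-like control; a real classaurus would be far richer than this fragment.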
Q5. What is the difference between Natural language and Artificial language?
| Aspect | Natural Language | Artificial Language |
|---|---|---|
| Origin | Develops naturally through social interaction. | Intentionally created by individuals or groups. |
| Flexibility | Evolves over time, adapting to cultural changes. | Remains static unless modified by its creators. |
| Complexity | Inherently complex, with idioms, regional dialects and cultural references. | Typically more straightforward, with defined syntax and semantics and a fixed grammar and vocabulary. |
| Purpose | Used by humans to interact with one another. | Constructed for specific purposes, such as communication in fiction and science fiction, or experimentation in linguistics and logic. |
| Ambiguity | Can be ambiguous. | Each term has a single meaning. |
| Understandability | Human-based, so user friendly. | Often designed for machine processing or specialist use, so less user friendly. |
| Example | English, Spanish, Mandarin. | Programming languages (e.g., Python, Java), constructed languages (e.g., Esperanto, Klingon). |