The word “data” is derived from the Latin word “dare”, which
means “to give”. Data is the foundation on which information and knowledge are
built. Data can be categorized, measured and represented. Data is usually
representative in nature. Data can take various forms, such as numbers,
characters, symbols, sounds, images and waves of different frequencies. From
raw data, information is inferred and knowledge is derived. Data can be recorded
and stored in either analogue or digital form. The quality of data improves
when the data are individually separate and distinct in nature, can be
aggregated, are diverse in their characteristics, and are easy to understand
and comprehensive.
Data fall into two broad classes: qualitative and
quantitative. Quantitative data consist of numeric records. They are generally
physical properties (height, weight, length, width, area). These data are
analyzed through visualizations and descriptive and inferential statistics.
Qualitative data are non-numeric. They include texts, pictures, sounds and
videos. They are analyzed through machine learning and data mining techniques.
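The difference between the two classes can be illustrated with a minimal Python sketch. The sample values and variable names below are invented for illustration and do not come from any real dataset. The point is only that numeric data support arithmetic summaries, while non-numeric data are first counted or categorized.

```python
from collections import Counter
from statistics import mean, stdev

# Hypothetical sample values, invented for illustration only.
heights_cm = [150.0, 160.0, 170.0, 180.0]          # quantitative: numeric measurements
feedback = ["good", "poor", "good", "excellent"]   # qualitative: non-numeric labels

# Quantitative data support arithmetic, so descriptive statistics apply.
mean_height = mean(heights_cm)
spread = stdev(heights_cm)  # sample standard deviation

# Qualitative data have no meaningful average; counting categories is a
# common first step before any data mining or machine learning.
label_counts = Counter(feedback)

print("mean height:", mean_height)
print("most common label:", label_counts.most_common(1))
```

Taking the mean of the feedback labels would raise an error, which is one practical sign that a variable is qualitative.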
Based on structure, data are categorized as structured,
semi-structured and unstructured. Structured data are those which can be easily
organized, stored and transferred into various data models. Semi-structured
data are loosely structured data which do not have a predefined model or
schema. They are irregular and often nested hierarchically, but have reasonably
consistent fields and provide self-describing metadata and a means to
structure the data. Unstructured data do not have a data model or structure. They
cannot be easily combined or computed.
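As a rough illustration of the three categories, the sketch below holds the same kind of information as CSV (structured), JSON (semi-structured) and free text (unstructured). The names, values and email address are hypothetical, and the choice of CSV and JSON as representative formats is an assumption made for this example.

```python
import csv
import io
import json

# Structured: every row follows the same fixed schema (name, height_cm).
csv_text = "name,height_cm\nAsha,162\nRavi,175\n"
rows = list(csv.DictReader(io.StringIO(csv_text)))

# Semi-structured: keys describe their own content, records can be nested,
# and the fields may differ from one record to the next.
json_text = (
    '[{"name": "Asha", "contact": {"email": "asha@example.com"}},'
    ' {"name": "Ravi", "tags": ["new"]}]'
)
records = json.loads(json_text)

# Unstructured: free text with no data model at all.
note = "Ravi called on Monday and asked about his order."

# Structured data can be computed on directly once the column is known.
heights = [int(row["height_cm"]) for row in rows]

# Semi-structured data need defensive access because fields may be missing.
emails = [rec.get("contact", {}).get("email") for rec in records]

# Unstructured data only allow crude operations without further processing.
word_count = len(note.split())

print(heights, emails, word_count)
```

The structured rows can be summed or averaged immediately, the JSON records need a check for missing fields, and the free-text note yields little beyond a word count until it is processed further.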
By origin, data can also be classified as: Captured
(data collected directly through a survey or experiment); Exhaust (data
produced through a device or system); Transient (data which are never
processed or examined); Derived (data generated by processing other data).
Data can be classified in other ways as well. Data generated first-hand by
the investigator are called primary data, whereas data made available by
others for analysis are called secondary data.
Derived data are called tertiary data. Data which act as unique identifiers are
called indexical data. Data representing a phenomenon are called attribute data,
and data about data are called metadata.
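The roles of indexical data, attribute data, metadata and derived data can be sketched with a small record type. The `Observation` class, its field names and the readings are hypothetical, chosen only to make the distinctions concrete.

```python
from dataclasses import dataclass, field


@dataclass
class Observation:
    station_id: str        # indexical data: identifies which station reported
    temperature_c: float   # attribute data: represents the phenomenon itself
    metadata: dict = field(default_factory=dict)  # data about the data


# Hypothetical readings, invented for illustration only.
readings = [
    Observation("ST-001", 21.5, {"unit": "degrees Celsius"}),
    Observation("ST-002", 19.0),
    Observation("ST-001", 23.5),
]

# Derived (tertiary) data: a per-station mean computed from the readings.
by_station = {}
for reading in readings:
    by_station.setdefault(reading.station_id, []).append(reading.temperature_c)

station_means = {
    station: sum(values) / len(values) for station, values in by_station.items()
}

print(station_means)
```

The identifier makes it possible to link and group records, while the metadata record how the attribute values should be interpreted.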