Iterative Data Term Library: For Quick Reference During Study and Research

Shikha Saxena · Oct 5 · 8 min read

Handy Data Dictionary

Photo by Artiom Vallat on Unsplash

Since the advent of the Internet and the advances in data science and technology, we come across many related terms every day. We need quick answers on the go so that we can keep reading a paper, publication or article without breaking our flow. This is a quick-reference data term dictionary, in alphabetical order, and I will keep updating it from time to time. I hope it is helpful. Constructive feedback is highly appreciated.

A

Algorithms: Step-by-step sets of instructions, often built on mathematical formulas, that a software program follows to perform calculations such as analyzing data.

Artificial Intelligence: Intelligence shown by machines. In data-driven AI, a machine is trained on data that is fed to it continuously and analyzed. Over time it learns to take actions by itself, without being explicitly programmed for each one.

Artificial Neural Network: A machine learning model inspired by the network of neurons in the human brain. An ANN models a network of artificial neurons, loosely analogous to the neurons in a brain.
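
The basic building block of an ANN is a single artificial neuron. The sketch below makes some assumptions. It uses the common textbook formulation: a weighted sum of the inputs plus a bias, passed through an activation function. The sigmoid activation and the weight, bias and input values are illustrative choices, not taken from any particular network.

```python
import math

def sigmoid(z: float) -> float:
    """Squash any real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def neuron(inputs: list[float], weights: list[float], bias: float) -> float:
    """One artificial neuron: weighted sum of inputs plus bias, then an activation."""
    z = sum(x * w for x, w in zip(inputs, weights)) + bias
    return sigmoid(z)

# Hypothetical values; in a trained network the weights and bias are learned from data.
output = neuron([1.0, 0.5], weights=[0.4, -0.2], bias=0.1)
print(round(output, 4))
```

A full network connects many such neurons in layers. The outputs of one layer become the inputs of the next.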

B

Big Data: Vast and ever-growing amounts of data, characterized by volume, variety, velocity, veracity and value.

Big Data Scientist: A scientist who uses algorithms to analyze Big Data and draw inferences from it.

Big Data Tool: Software used to collect, store and process big data.

Bitcoin: A decentralized digital (virtual) currency. It operates without a central bank or administrator and is exchanged directly between users without intermediaries.

Business Intelligence (BI): Technologies and strategies used to make use of business information through data analysis and interpretation.

C

Cryptocurrency: A digital currency secured by cryptography. It can be used to buy goods or services. Holdings are recorded as coins or tokens (for example Bitcoin, Ethereum, Tether) in a computerized ledger.
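
The "computerized ledger" behind many cryptocurrencies is a chain of blocks. Each block stores a cryptographic hash of the block before it, so rewriting history breaks the chain. The sketch below is deliberately simplified and shows only that hash-linking idea. The block layout and helper names are hypothetical. Real systems also use digital signatures, consensus rules and mining, none of which appear here.

```python
import hashlib
import json

def block_hash(block: dict) -> str:
    """SHA-256 hash of a block's contents (keys sorted so the hash is deterministic)."""
    encoded = json.dumps(block, sort_keys=True).encode("utf-8")
    return hashlib.sha256(encoded).hexdigest()

def add_block(chain: list[dict], transactions: list[str]) -> None:
    """Append a block that records the hash of the previous block (hypothetical layout)."""
    prev = block_hash(chain[-1]) if chain else "0" * 64
    chain.append({"index": len(chain), "prev_hash": prev, "transactions": transactions})

def is_valid(chain: list[dict]) -> bool:
    """Check that every block still points at the hash of the block before it."""
    return all(chain[i]["prev_hash"] == block_hash(chain[i - 1]) for i in range(1, len(chain)))

ledger: list[dict] = []
add_block(ledger, ["alice pays bob 5"])
add_block(ledger, ["bob pays carol 2"])
print(is_valid(ledger))                              # True
ledger[0]["transactions"] = ["alice pays bob 500"]   # tamper with history
print(is_valid(ledger))                              # False
```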

Cyber: Relating to computers, the virtual and digital world, and information technology.

Cyber Security: Protection of computer systems, networks, hardware, software and electronic data from theft or damage by national or international groups, individuals or bots.

Cyber Crime: Harm done to computer systems, hardware, software or data by groups or individuals, often through computer networks.

D

Data: Information in the form of text, images, files or links that can be analyzed in some way. It may be numeric, alphabetic or visual.

Data Analyst: A person who is in charge of data and analyzes it to draw important inferences useful for business.

Data Analytics: The study, analysis, interpretation and communication of patterns in data to draw conclusions and useful interpretations.

Data Architect: A specialist who designs, maintains and manages data systems. A data architect sets policies for storing, accessing and retrieving data, and organizes and integrates new technologies into the data system infrastructure.

Database: A collection of organized information, such as numeric, text or image data, stored in a structured manner on a computer system.

Database Manager: A person who supervises data systems and moves and manages data so that business data is easy to maintain and safeguard.

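Data Engineer: A specialist who builds and maintains the systems that collect, store and prepare data. Data engineers are typically graduates or postgraduates with degrees in mathematics, science, computer engineering or statistics. They have a command of computing and of programming languages such as SQL, Python, R, Adept and so on.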

Data Governance: A data management concept made up of methodologies and strategies for managing enterprise data.

Data Security: Protection of data or information from cyber theft by applying software tools and technologies across data networks.

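Data Science: An interdisciplinary field that combines statistics, computing and domain knowledge to extract knowledge and insights from data.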

Data Library: A collection of numeric or geospatial data held in databases by institutions or organizations for later use in study or research.

Data Mining: The process of extracting information and data patterns using machine learning, statistics and data science techniques. Typical tasks are finding correlations in large data sets and predicting useful outcomes.
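
One of the simplest patterns data mining looks for is correlation between two variables. Below is a minimal sketch of the Pearson correlation coefficient, a common measure of linear correlation. The records are hypothetical numbers. Real data-mining tools compute this, and far more, over much larger data sets.

```python
import math

def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation: +1 perfect positive, -1 perfect negative, 0 no linear relation."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sd_x * sd_y)

# Hypothetical records: advertising spend vs. sales
ad_spend = [10, 20, 30, 40, 50]
sales = [12, 24, 33, 41, 55]
print(round(pearson(ad_spend, sales), 3))
```

A value close to +1 suggests the two variables rise together. Correlation alone does not show that one causes the other.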

Data Set: A collection of data, often in tabular form, where each column represents a variable and each row corresponds to one record. A data set may correspond to one or more database tables.
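
The column/row layout described above can be shown with a small hypothetical data set in CSV form, read with Python's standard `csv` module. The names, ages and cities are made up for illustration.

```python
import csv
import io

# A small hypothetical data set: each column is a variable, each row a record.
raw = """name,age,city
Asha,34,Delhi
Ben,28,London
Chen,45,Shanghai
"""

rows = list(csv.DictReader(io.StringIO(raw)))
columns = list(rows[0].keys())   # the variables
print(columns)                   # ['name', 'age', 'city']
print(len(rows))                 # 3 records

# Reading one variable (column) across all records
ages = [int(row["age"]) for row in rows]
print(ages)                      # [34, 28, 45]
```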

Data Visualization: Graphical representation of information or data using charts, graphs, tables and so on. It makes data easier to interpret and share, including in real time.

Data Warehouse: A data management system that serves as a central, integrated data repository. Data can be retrieved from it, stored in it and analyzed for business intelligence and important insights.

Digital Image: A real image or picture made of picture elements (pixels). Each pixel is stored electronically as a number describing its intensity or brightness, so the image can be stored and handled on a computer.
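
The sketch below shows that a digital image is "just numbers". It uses a tiny hypothetical grayscale image stored as a grid of pixel brightness values, assuming the common 8-bit range of 0 (black) to 255 (white). Color images usually store several numbers per pixel, for example red, green and blue.

```python
# A hypothetical 3x4 grayscale image: each number is one pixel's brightness (0-255).
image = [
    [0,   64, 128, 255],
    [32,  96, 160, 224],
    [255, 200, 100, 0],
]

height, width = len(image), len(image[0])
mean_brightness = sum(sum(row) for row in image) / (height * width)

# Brightening the image is plain arithmetic on the stored numbers,
# clamped so every value stays inside the 0-255 range.
brighter = [[min(255, p + 50) for p in row] for row in image]

print(height, width)               # 3 4
print(round(mean_brightness, 2))
print(brighter[0])                 # [50, 114, 178, 255]
```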

E

EDC (Electronic Data Capture): Electronic collection of data by a computerized system, especially clinical data, for later use.

F

Firewall: A network security device that protects data from theft. It monitors traffic to and from a network and filters it according to defined data security rules.

G

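Gigabyte: A unit of data storage, abbreviated GB. In SI (decimal) usage 1 GB = 1,000,000,000 bytes. Traditionally the term was also used for 1,073,741,824 bytes (1024 × 1024 × 1024).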

GUI (Graphical User Interface): The interactive visual components of computer software, such as windows, icons and buttons, through which the user interacts with it.

H

Hypertext: Text that is linked to other text, on the same or a separate webpage or website, which the user can access immediately.

Hyperlink: Clickable text on a computer or smart device that takes the user to another site or webpage with related information.

I

Image: A picture or visual representation of a device or entity.

Internet: A global network of interconnected computers that communicate using standard protocols.

Internet of Things (IoT): Interconnected devices and objects with embedded sensors, software and other technologies that connect and exchange data through the Internet.

Image Recognition: A set of methodologies, algorithms and technologies that recognize and analyze an image and learn its underlying representation. They produce outcomes such as classifying or categorizing the image.

In-Memory Database: A database management system that relies mainly on the computer's main memory (RAM), rather than disk, for data storage.
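
Python's standard library offers a quick way to try the idea. SQLite is an embedded database rather than a dedicated in-memory system, but when it is opened with the special name `":memory:"` the whole database lives in RAM and disappears when the connection closes. The table and sensor readings below are hypothetical.

```python
import sqlite3

# ":memory:" keeps the entire database in RAM; nothing is written to disk.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE readings (sensor TEXT, value REAL)")
conn.executemany(
    "INSERT INTO readings VALUES (?, ?)",
    [("s1", 20.5), ("s1", 21.0), ("s2", 19.0)],
)
rows = conn.execute(
    "SELECT sensor, AVG(value) FROM readings GROUP BY sensor ORDER BY sensor"
).fetchall()
print(rows)  # [('s1', 20.75), ('s2', 19.0)]
conn.close()
```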

J

JavaScript: A text-based programming language used to make web pages interactive.

Jump Drive: Also called a flash drive; the two terms are used interchangeably. It is a portable data storage device, available in different sizes, shapes, colors and functions. It connects to a computer through a USB port to store and exchange information.

Joystick: An input device made of a stick that pivots on a base. The stick reports its angle and direction to the device it is connected to, as in spacecraft controls.

K

Kilobyte: A unit of data or information, abbreviated KB. Traditionally 1 KB = 1024 bytes. In SI usage 1 kB = 1000 bytes, and 1024 bytes is called a kibibyte (KiB).
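
Because two conventions coexist, the same number of bytes can be reported with different figures. The minimal sketch below converts a hypothetical drive size both ways. The function names are illustrative only.

```python
def to_binary_units(num_bytes: int) -> dict[str, float]:
    """Traditional binary convention: 1 KB = 1024 bytes."""
    return {
        "KB": num_bytes / 1024,
        "MB": num_bytes / 1024 ** 2,
        "GB": num_bytes / 1024 ** 3,
    }

def to_decimal_units(num_bytes: int) -> dict[str, float]:
    """SI (decimal) convention: 1 kB = 1000 bytes."""
    return {
        "kB": num_bytes / 1000,
        "MB": num_bytes / 1000 ** 2,
        "GB": num_bytes / 1000 ** 3,
    }

size = 5_000_000_000  # a hypothetical drive sold as "5 GB" in decimal units
print(to_decimal_units(size)["GB"])           # 5.0
print(round(to_binary_units(size)["GB"], 2))  # 4.66
```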

Knowledge Graphs: A knowledge base that uses a semantic network to connect and integrate data. It is a network of real-world entities, such as objects, people, places and concepts, connected to each other through interlinked descriptions.
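
Knowledge graphs are commonly stored as subject-relation-object triples; standards such as RDF use this form. The sketch below is a minimal illustration using a plain Python list. The relation names and the `objects` helper are hypothetical.

```python
# A tiny hypothetical knowledge graph stored as (subject, relation, object) triples.
triples = [
    ("Marie Curie", "born_in", "Warsaw"),
    ("Marie Curie", "field", "Physics"),
    ("Warsaw", "capital_of", "Poland"),
    ("Pierre Curie", "spouse_of", "Marie Curie"),
]

def objects(subject: str, relation: str) -> list[str]:
    """Return every object linked to `subject` by `relation`."""
    return [o for s, r, o in triples if s == subject and r == relation]

# Following two links answers a question that no single triple states directly:
# "Which country is Marie Curie's birthplace the capital of?"
birthplace = objects("Marie Curie", "born_in")[0]
print(objects(birthplace, "capital_of"))  # ['Poland']
```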

L

Learning: The process of acquiring knowledge, information, skills or behavior over a span of time. The learner can be a human or animal brain or a machine such as a neural network.

LAN (Local Area Network): A computer network that connects computers within a limited area such as a residence, school, hospital or university.

M

Machine Learning: A branch of Artificial Intelligence in which machines learn from the data fed to them, analyze patterns and make decisions, much as humans do, with little human intervention.

N

Neural Network: A subset of machine learning and the heart of deep learning algorithms. It is a network of artificial neurons inspired by the neurons of the human brain, and it mimics the way they transfer information in order to solve AI problems. It is a series of algorithms that recognizes relationships in underlying data, in a way loosely similar to the human brain.

O

Operating System (OS): The software that manages a computer system's resources and communicates with its hardware.

Optical Media or Optical Storage: Data storage media, such as CDs and DVDs, on which data is written and read optically for storage and retrieval.

Olfactory Data: Information about smell, also called sense-of-smell data. It can be differentiated and categorized into different smells and odors using a computerized system equipped with chemical sensors and smell detectors called olfactometers.

P

Pixel Data: A tiny piece of code, also called a tracking pixel, embedded in a website or email to detect visitors and email openers. It lets website operators learn about the audience visiting their site.

Packet Filters: A firewall technique that controls network access by monitoring incoming and outgoing packets. Each packet is allowed or blocked according to defined rules based on attributes such as addresses, ports and protocols.
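
The rule-matching idea behind a packet filter can be sketched in a few lines. This is a simplified model, not a real firewall: the packet fields and rule format are hypothetical. The design assumes rules are checked top to bottom with the first match winning, and that anything not explicitly allowed is denied. Many firewalls follow that pattern, but the details vary by product. The IP addresses come from ranges reserved for documentation.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Packet:
    src_ip: str
    dst_port: int
    protocol: str  # e.g. "tcp" or "udp"

@dataclass
class Rule:
    action: str                     # "allow" or "deny"
    protocol: Optional[str] = None  # None matches any protocol
    dst_port: Optional[int] = None  # None matches any port
    src_ip: Optional[str] = None    # None matches any source address

    def matches(self, p: Packet) -> bool:
        return ((self.protocol is None or self.protocol == p.protocol)
                and (self.dst_port is None or self.dst_port == p.dst_port)
                and (self.src_ip is None or self.src_ip == p.src_ip))

# Hypothetical rule set, checked top to bottom; the first matching rule wins.
RULES = [
    Rule("deny", src_ip="203.0.113.9"),           # block one known-bad address
    Rule("allow", protocol="tcp", dst_port=443),  # allow HTTPS
    Rule("allow", protocol="tcp", dst_port=22),   # allow SSH
]
DEFAULT_ACTION = "deny"  # anything not explicitly allowed is dropped

def filter_packet(p: Packet) -> str:
    for rule in RULES:
        if rule.matches(p):
            return rule.action
    return DEFAULT_ACTION

print(filter_packet(Packet("198.51.100.7", 443, "tcp")))  # allow
print(filter_packet(Packet("203.0.113.9", 443, "tcp")))   # deny (blocked source)
print(filter_packet(Packet("198.51.100.7", 23, "tcp")))   # deny (no rule matches)
```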

Proxy Server Firewalls: Often regarded as the most secure form of firewall. They filter messages at the application level to protect the network.

Q

Queue: A linear data structure in which operations are performed in a fixed order, typically first in, first out (FIFO).
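
In Python a FIFO queue is commonly built on `collections.deque`, which supports fast additions and removals at both ends. The job names below are hypothetical.

```python
from collections import deque

queue = deque()
queue.append("job-1")    # enqueue at the back
queue.append("job-2")
queue.append("job-3")

first = queue.popleft()  # dequeue from the front: the oldest item leaves first
print(first)             # job-1
print(list(queue))       # ['job-2', 'job-3']
```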

R

Reinforcement Learning (RL): A branch of Machine Learning in which an intelligent agent learns by interacting with its environment through trial and error, aiming to maximize cumulative reward.
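
The trial-and-error loop can be illustrated with the simplest RL setting, the multi-armed bandit. An agent repeatedly picks one of several slot-machine "arms" and learns from the rewards which one pays best. The sketch below uses the epsilon-greedy strategy. Most of the time the agent exploits the arm that looks best so far; occasionally it explores a random one. The payout probabilities, step count and epsilon value are hypothetical. Full RL problems add states and delayed rewards, which this sketch leaves out.

```python
import random

# Hypothetical environment: three arms whose payout probabilities are hidden from the agent.
TRUE_PAYOUT = [0.2, 0.5, 0.8]

def pull(arm: int, rng: random.Random) -> float:
    """Environment: reward 1 with the arm's hidden probability, else 0."""
    return 1.0 if rng.random() < TRUE_PAYOUT[arm] else 0.0

def run_bandit(steps: int = 5000, epsilon: float = 0.1, seed: int = 0):
    rng = random.Random(seed)               # seeded so runs are repeatable
    n_arms = len(TRUE_PAYOUT)
    estimates = [0.0] * n_arms              # agent's running estimate of each arm's value
    counts = [0] * n_arms
    total_reward = 0.0
    for _ in range(steps):
        if rng.random() < epsilon:          # explore: try a random arm
            arm = rng.randrange(n_arms)
        else:                               # exploit: pick the arm that looks best so far
            arm = max(range(n_arms), key=lambda a: estimates[a])
        reward = pull(arm, rng)
        counts[arm] += 1
        # Incremental average: move the estimate toward the observed reward.
        estimates[arm] += (reward - estimates[arm]) / counts[arm]
        total_reward += reward              # cumulative reward the agent tries to maximize
    return estimates, counts, total_reward

estimates, counts, total = run_bandit()
best_arm = max(range(len(counts)), key=lambda a: counts[a])
print(best_arm, total)
```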

S

Search Engines: Software that searches websites and webpages in response to queries entered by users. Examples are Google, Yahoo, Bing, Yandex and so on.
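
A core data structure in search engines is the inverted index, which maps each word to the pages that contain it. The sketch below uses a tiny hypothetical set of pages and answers simple AND queries. Real search engines add crawling, ranking and much more.

```python
import re
from collections import defaultdict

# Hypothetical mini "web": page URL -> page text
pages = {
    "example.com/a": "Data science uses statistics and machine learning",
    "example.com/b": "Machine learning needs data",
    "example.com/c": "Statistics for beginners",
}

def tokenize(text: str) -> list[str]:
    return re.findall(r"[a-z]+", text.lower())

# Inverted index: each word maps to the set of pages containing it
index: dict[str, set[str]] = defaultdict(set)
for url, text in pages.items():
    for word in tokenize(text):
        index[word].add(url)

def search(query: str) -> list[str]:
    """Return pages that contain every word of the query (a simple AND search)."""
    words = tokenize(query)
    if not words:
        return []
    results = set.intersection(*(index.get(w, set()) for w in words))
    return sorted(results)

print(search("machine learning"))  # ['example.com/a', 'example.com/b']
print(search("statistics"))        # ['example.com/a', 'example.com/c']
```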

Structured Data: Data presented in a standardized format, residing in fixed fields within a web page, file or record.

Supervised Learning (SL): A machine learning task that learns to map an input to an output based on example input-output pairs. It uses labeled input and output data, and the algorithm generalizes from the training set to unseen cases in a reasonable, logical manner. An example is text classification, where a machine identifies the sentiment of a piece of text such as a tweet or a product review.
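
Here is a deliberately tiny supervised text classifier that shows the labeled input-output idea. It learns from a few hypothetical labeled reviews by counting how often each word appears under each label. Then it scores unseen text against those counts. This word-count scheme is chosen only for clarity. Practical sentiment classifiers use stronger models such as logistic regression or neural networks.

```python
from collections import Counter

# Hypothetical labeled training data: (input text, output label) pairs
train = [
    ("great product, works well", "positive"),
    ("love it, great value", "positive"),
    ("terrible, broke after a day", "negative"),
    ("waste of money, terrible", "negative"),
]

def words(text: str) -> list[str]:
    return [w.strip(",.!?") for w in text.lower().split()]

# Training: count how often each word appears under each label
counts = {"positive": Counter(), "negative": Counter()}
for text, label in train:
    counts[label].update(words(text))

def predict(text: str) -> str:
    """Score unseen text by which label's training words it shares more often."""
    score = sum(counts["positive"][w] - counts["negative"][w] for w in words(text))
    return "positive" if score > 0 else "negative"  # ties default to "negative"

print(predict("great phone"))            # positive
print(predict("terrible battery life"))  # negative
```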

Unsupervised Learning (UL): A model that does not use labeled data. Instead it works on unlabeled data to discover structure on its own. The algorithm is given no pre-assigned labels or scores for the training data; it learns through self-discovery and detects patterns in the training data sets. Examples include customer segmentation to build a marketing strategy and clustering DNA patterns to understand evolutionary patterns.
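
Clustering is a classic unsupervised technique. The sketch below runs plain k-means on one-dimensional data, grouping hypothetical customers by monthly spend without any labels. The spend values, the choice of k = 3 and the deterministic way the starting centers are picked are all illustrative assumptions. Library implementations usually use smarter initialization.

```python
def kmeans_1d(values: list[float], k: int, iterations: int = 20) -> tuple[list[float], list[int]]:
    """Plain k-means on one-dimensional data; no labels are used anywhere."""
    ordered = sorted(values)
    # Deterministic start: spread the initial centers across the sorted values.
    if k > 1:
        centers = [ordered[i * (len(ordered) - 1) // (k - 1)] for i in range(k)]
    else:
        centers = [ordered[0]]
    assignment = [0] * len(values)
    for _ in range(iterations):
        # Assignment step: each value joins its nearest center.
        assignment = [min(range(k), key=lambda c: abs(v - centers[c])) for v in values]
        # Update step: move each center to the mean of its members.
        for c in range(k):
            members = [v for v, a in zip(values, assignment) if a == c]
            if members:
                centers[c] = sum(members) / len(members)
    return centers, assignment

# Hypothetical monthly spend of eight customers
spend = [12, 15, 14, 80, 85, 90, 200, 210]
centers, groups = kmeans_1d(spend, k=3)
print(groups)  # [0, 0, 0, 1, 1, 1, 2, 2]
print([round(c, 2) for c in centers])
```

The algorithm was never told which customers belong together. It found the low, medium and high spending groups from the data alone.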

Semi-Supervised Learning (SSL): Learning that uses a small amount of labeled data together with a large amount of unlabeled data. An example is speech analysis.

T

TensorFlow: An open-source software library developed by the Google Brain team for machine learning. It can also be used for non-machine-learning tasks.

Torch: An open-source machine learning library that contains algorithms for deep learning.

Transfer Learning: Reusing a model trained on one data set to solve problems on a new data set. An existing model can be applied to a new data set that contains similar data.

U

Unsupervised Learning: Machine learning with no fixed target or outcome variable to predict or estimate. The goal is to model the underlying structure or distribution of the data in order to learn more about its patterns, based on the data's attributes.

Unstructured Data: Raw data that has no fixed structure or format, such as email messages, texts and images.

V

Voice Recognition: A subfield of computer science that captures voices and speech, recognizes them, and converts speech in different languages into text that computer systems can read. Different methodologies and technologies are involved.

Variance: A statistical measure of how spread out a set of numbers is. It is calculated as the average of the squared distances from the mean.
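
The definition above is the population variance: the sum of (x − mean)² over all values, divided by the number of values N. The sketch below computes it directly and compares it with Python's `statistics` module. Note that `statistics.variance` computes the sample variance, which divides by N − 1 instead of N. The data values are hypothetical.

```python
import statistics

def population_variance(xs: list[float]) -> float:
    """Average of squared distances from the mean."""
    mean = sum(xs) / len(xs)
    return sum((x - mean) ** 2 for x in xs) / len(xs)

data = [2, 4, 4, 4, 5, 5, 7, 9]    # mean is 5
print(population_variance(data))   # 4.0
print(statistics.pvariance(data))  # standard-library population variance
print(statistics.variance(data))   # sample variance: divides by n - 1 instead of n
```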

Visualization: Visual abstraction of data to communicate it more effectively.

W

Web: The World Wide Web, or simply the Web, is an Internet information system. Documents and other web resources are identified by Uniform Resource Locators (URLs), interlinked by hyperlinks, and accessed over the Internet through web browsers.

Wi-Fi: A wireless technology that connects computers, tablets, smartphones and other devices to the Internet. A wireless router sends radio signals to nearby devices, which convert them into data that one can see and use.

Widget: A small software application or component made for a particular software platform. The word is also used generically as a placeholder name.

X

XML Databases: Databases that store data in XML format. XML databases are closely related to document-oriented databases. Data stored in this format can be queried, exported or serialized into different formats.
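
The sketch below shows the two operations mentioned above, querying XML-formatted data and serializing it back out. It uses Python's built-in `xml.etree.ElementTree`, which is an XML parser, not an XML database. Real XML databases typically support richer query languages such as XPath and XQuery. The book records are hypothetical.

```python
import xml.etree.ElementTree as ET

# A hypothetical XML document holding a few records
xml_data = """
<library>
  <book year="2019"><title>Data Basics</title></book>
  <book year="2021"><title>Learning Python</title></book>
  <book year="2023"><title>Graphs in Practice</title></book>
</library>
"""

root = ET.fromstring(xml_data.strip())

# Query: titles of books published after 2020
recent = [b.findtext("title") for b in root.findall("book") if int(b.get("year")) > 2020]
print(recent)  # ['Learning Python', 'Graphs in Practice']

# Serialize the result back out as a new XML string
subset = ET.Element("recent")
for title in recent:
    ET.SubElement(subset, "title").text = title
print(ET.tostring(subset, encoding="unicode"))
```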

Y

Yahoo: A web service provider that connects millions of people online through mail, sports and a search engine.

Yandex: A search engine whose name is an abbreviation of "Yet Another iNDEXer".

Z

ZooKeeper: Originally a subproject of Hadoop, Apache ZooKeeper is open-source software. It provides centralized configuration and naming (name registration) services for distributed systems.

Z test: A statistical test based on the z-score, which measures how far a data point lies from the mean of a data set in units of standard deviation.
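
The z-score of a value x is (x − mean) / standard deviation. A one-sample z-test applies the same idea to a sample mean, dividing by the standard error, which is the population standard deviation divided by √n. It then converts the resulting z into a p-value using the standard normal distribution. The sketch below assumes the population standard deviation is known, as a z-test requires. All numbers are hypothetical.

```python
import math

def z_score(x: float, mean: float, sd: float) -> float:
    """How many standard deviations x lies from the mean."""
    return (x - mean) / sd

def one_sample_z(sample_mean: float, pop_mean: float, pop_sd: float, n: int) -> float:
    """Z statistic for a sample mean against a known population mean and SD."""
    return (sample_mean - pop_mean) / (pop_sd / math.sqrt(n))

def two_sided_p(z: float) -> float:
    """Two-sided p-value from the standard normal distribution: 2 * (1 - Phi(|z|))."""
    return math.erfc(abs(z) / math.sqrt(2))

print(z_score(130, mean=100, sd=15))  # 2.0
z = one_sample_z(sample_mean=104, pop_mean=100, pop_sd=15, n=36)
print(round(z, 2), round(two_sided_p(z), 4))
```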

Photo by Anne Nygård on Unsplash

Shikha Saxena

A technical writer, artist and blogger by choice. Passionate about reading, writing and editing. http://www.shikhasaxena.com and https://www.dnabox.co/
