
WWW.VIDYARTHIPLUS.COM

IT2052 MANAGEMENT INFORMATION SYSTEM

UNIT-3

SYSTEMS, INFORMATION AND DECISION THEORY

Measurement of information in messages

(Shannon & Weaver model of communication)

Information theory deals with the measurement and transmission of information through a channel. A fundamental work in this area is Shannon's information theory, which provides many useful tools based on measuring information in terms of bits or, more generally, in terms of the minimal complexity of the structures needed to encode a given piece of information.

Noise can be considered data without meaning; that is, data that is not being used to transmit a signal, but is simply produced as an unwanted by-product of other activities. Noise is still considered information, in the sense of Information Theory.

Shannon’s ideas

• Form the basis for the field of Information Theory

• Provide the yardsticks for measuring the efficiency of communication systems.

• Identified problems that had to be solved to get to what he described as ideal communications systems

Information:

In defining information, Shannon identified the critical relationships among the elements of a communication system: the power at the source of a signal; the bandwidth or frequency range of an information channel through which the signal travels; the noise of the channel, such as unpredictable static on a radio, which will alter the signal by the time it reaches the last element of the system; and the receiver, which must decode the signal.

To get a high-level understanding of his theory, a few basic points should be made. First, words are symbols that carry information between people. If one says to an American, “Let’s go!”, the command is immediately understood. But if we give the command in Russian, “Pustim v xod!”, we only get a quizzical look. Russian is the wrong code for an American.

Second, all communication involves three steps:

 Coding a message at its source

 Transmitting the message through a communications channel

 Decoding the message at its destination.

In the first step, the message has to be put into some kind of symbolic representation – words, musical notes, icons, mathematical equations, or bits. When we write “Hello,” we encode a greeting. When we write a musical score, it’s the same thing – only we’re encoding sounds. For any code to be useful it has to be transmitted to someone or, in a computer’s case, to something. Transmission can be by voice, a letter, a billboard, a telephone conversation, a radio or television broadcast. At the destination, someone or something has to receive the symbols, and then decode them by matching them against his or her own body of information to extract the data.

Third, there is a distinction between a communications channel’s designed symbol rate of so many bits per second and its actual information capacity. Shannon defined channel capacity as the number of bits per second of user information that can be transmitted over a noisy channel with an arbitrarily small error rate, which can be less than the channel’s “raw” symbol rate.

EXAMPLE:

Suppose we are watching cars going past on a highway. For simplicity, suppose 50% of the cars are black, 25% are white, 12.5% are red, and 12.5% are blue. Consider the flow of cars as an information source with four words: black, white, red, and blue. A simple way of encoding this source into binary symbols would be to associate each color with two bits, that is: black = 00, white = 01, red = 10, and blue = 11, an average of 2.00 bits per color.


A Better Code Using Information Theory

A better encoding can be constructed by allowing for the frequency of certain symbols, or words: black = 0, white = 10, red = 110, blue = 111.

How is this encoding better?

0.50 black × 1 bit = 0.500

0.25 white × 2 bits = 0.500

0.125 red × 3 bits = 0.375

0.125 blue × 3 bits = 0.375

Average = 1.750 bits per car
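The arithmetic above can be checked with a short script; a minimal sketch, where the dictionaries simply restate the code lengths from the two encodings:

```python
# Average code length of the fixed-length and variable-length car-color codes.
probs = {"black": 0.5, "white": 0.25, "red": 0.125, "blue": 0.125}

fixed_len = {c: 2 for c in probs}                             # black=00, white=01, red=10, blue=11
variable_len = {"black": 1, "white": 2, "red": 3, "blue": 3}  # 0, 10, 110, 111

avg_fixed = sum(probs[c] * fixed_len[c] for c in probs)
avg_variable = sum(probs[c] * variable_len[c] for c in probs)

print(avg_fixed)     # 2.0 bits per car
print(avg_variable)  # 1.75 bits per car
```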

ENTROPY:

A quantitative measure of the disorder of a system, inversely related to the amount of energy available to do work in an isolated system. The more dispersed the energy has become, the less work it can perform and the greater the entropy.

Furthermore, information theory tells us that the entropy of this information source is 1.75 bits per car, and thus no encoding scheme will do better than the scheme we just described. In general, an efficient code for a source will not represent single letters, as in our example, but will represent strings of letters or words. If we see three black cars, followed by a white car, a red car, and a blue car, the sequence would be encoded as 00010110111, and the original sequence of cars can readily be recovered from the encoded sequence.
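Because the code is prefix-free (no codeword is the beginning of another), the bit stream can be decoded left to right without separators. A minimal sketch of the round trip (the helper names are illustrative):

```python
# Encoding and decoding with the prefix-free code from the example.
code = {"black": "0", "white": "10", "red": "110", "blue": "111"}

def encode(cars):
    return "".join(code[c] for c in cars)

def decode(bits):
    inverse = {v: k for k, v in code.items()}
    cars, buf = [], ""
    for b in bits:
        buf += b
        if buf in inverse:          # a complete codeword has been read
            cars.append(inverse[buf])
            buf = ""
    return cars

cars = ["black", "black", "black", "white", "red", "blue"]
bits = encode(cars)
print(bits)                         # 00010110111
assert decode(bits) == cars
```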

SHANNON’S THEOREM

Shannon's theorem, proved by Claude Shannon in 1948, describes the maximum possible efficiency of error-correcting methods in the face of noise interference and data corruption.

The theory doesn't describe how to construct the error-correcting method; it only tells us how good the best possible method can be. Shannon's theorem has wide-ranging applications in both communications and data storage applications.


The theorem gives the channel capacity as

C = W log2 (1 + S/N)

where:

C is the post-correction effective channel capacity in bits per second;

W is the raw channel capacity in hertz (the bandwidth); and

S/N is the signal-to-noise ratio of the communication signal to the Gaussian noise interference, expressed as a straight power ratio (not as decibels).

Channel capacity, shown often as "C" in communication formulas, is the amount of discrete information bits that a defined area or segment in a communications medium can hold.

The phrase signal-to-noise ratio, often abbreviated SNR or S/N, is an engineering term for the ratio between the magnitude of a signal (meaningful information) and the magnitude of background noise. Because many signals have a very wide dynamic range, SNRs are often expressed in terms of the logarithmic decibel scale.

Example

If the SNR is 20 dB and the bandwidth available is 4 kHz, which is appropriate for telephone communications, then C = 4 log2(1 + 100) = 4 log2(101) = 26.63 kbit/s. Note that the value of 100 is appropriate for an SNR of 20 dB. If it is required to transmit at 50 kbit/s and a bandwidth of 1 MHz is used, then the minimum SNR required is given by 50 = 1000 log2(1 + S/N), so S/N = 2^(C/W) − 1 = 2^0.05 − 1 = 0.035, corresponding to an SNR of −14.5 dB. This shows that it is possible to transmit using signals which are actually much weaker than the background noise level.
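Both figures in the example can be reproduced directly from C = W log2(1 + S/N); a small sketch, assuming the SNR is given as a linear power ratio:

```python
import math

def capacity(bandwidth_hz, snr_linear):
    """Shannon-Hartley channel capacity in bits per second."""
    return bandwidth_hz * math.log2(1 + snr_linear)

# 4 kHz telephone channel at 20 dB SNR (linear power ratio 100):
c = capacity(4_000, 100)
print(round(c / 1000, 2))              # 26.63 kbit/s

# Minimum SNR to carry 50 kbit/s in a 1 MHz bandwidth:
snr = 2 ** (50_000 / 1_000_000) - 1
print(round(10 * math.log10(snr), 1))  # -14.5 dB
```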

SHANNON’S LAW

Shannon's law is any statement defining the theoretical maximum rate at which error-free digits can be transmitted over a bandwidth-limited channel in the presence of noise.

Core Assumptions and Statements


According to the theory, transmission of the message involves sending information through electronic signals. “Information” in the information-theory sense of the word should not be confused with ‘information’ as we commonly understand it. According to Shannon and Weaver, information is defined as “a measure of one’s freedom of choice when one selects a message”.

In information theory, information and uncertainty are closely related. Information refers to the degree of uncertainty present in a situation: the larger the uncertainty removed by a message, the more information is transmitted. Uncertainty also relates to the concept of predictability. When something is completely predictable, it is completely certain; therefore, it contains very little, if any, information.

A related term, entropy, is also important in information theory. Entropy refers to the degree of randomness, lack of organization, or disorder in a situation. Information theory measures the quantities of all kinds of information in terms of bits (binary digits).

Redundancy is another concept which information theory has brought to communication. Redundancy is the opposite of information: something that is redundant adds little, if any, information to a message. Redundancy is important because it helps combat noise in a communication system (e.g. by repeating the message). Noise is any factor in the process that works against the predictability of the outcome of the communication process. Information theory has contributed to the clarification of concepts such as noise, redundancy and entropy, which are inherently part of the communication process.

Shannon and Weaver broadly defined communication as “all of the procedures by which one mind may affect another”. Their communication model consisted of an information source (the source’s message), a transmitter, a signal, a receiver (the receiver’s message), and a destination. Eventually, the standard communication model featured the source or encoder, who encodes a message by translating an idea into a code in terms of bits. A code is a language or other set of symbols or signs that can be used to transmit a thought through one or more channels to elicit a response in a receiver or decoder. Shannon and Weaver also included the factor noise in the model. The study conducted by Shannon and Weaver was motivated by the desire to increase the efficiency and accuracy or fidelity of transmission and reception. Efficiency refers to the bits of information per second that can be sent and received. Accuracy is the extent to which signals of information can be understood. In this sense, accuracy refers more to clear reception than to the meaning of the message. This engineering model asks quite different questions than do other approaches to human communication research.

Conceptual Model

Mathematical (information) model of communication.

Scope and Application

Studies on the model of Shannon and Weaver take two major orientations. One stresses the engineering principles of transmission and perception (in the electronic sciences). The other considers how people are able or unable to communicate accurately because they have different experiences and attitudes (in the social sciences).

Application of information theory:

1) Constructing decision trees using information gain

2) Accounting applications

Information content



The term information content is used to refer to the meaning of information as opposed to the form or carrier of the information. For example, the meaning that is conveyed in an expression (which may be a proposition) or document, which can be distinguished from the sounds or symbols or codes and carrier that physically form the expression or document. An information content is composed of a propositional content and an illocutionary force.


Shannon’s theorem expresses the capacity of a channel: it defines the amount of information that can be sent down a noisy channel in terms of transmit power and bandwidth. He showed that engineers could choose to send a given amount of information using high power and low bandwidth, or high bandwidth and low power.

The traditional solution was to use narrow-band radios, which would focus all their power into a small range of frequencies. The problem was that, as the number of users increased, the available channels began to be used up. Additionally, such radios were highly susceptible to interference: so much power was confined to a small portion of the spectrum that a single interfering signal in the frequency range could disrupt communication.

Shannon offered a solution to this problem by redefining the relationship between information, noise and power. Shannon quantified the amount of information in a signal, stating that it is the amount of unexpected data the message contains. He called this information content of a message ‘entropy’. In digital communication, a stream of unexpected bits is just random noise. Shannon showed that the more a transmission resembles random noise, the more information it can hold, as long as it is modulated to an appropriate carrier: one needs a low-entropy carrier to carry a high-entropy message. Thus Shannon stated that an alternative to narrow-band radios was sending a message with low power, spread over a wide bandwidth.

Spread spectrum is one such technique: it takes a narrow-band signal and spreads its power over a wide band of frequencies. This makes the signal incredibly resistant to interference.


ENTROPY AS INFORMATION CONTENT:

Entropy is defined in the context of a probabilistic model. Independent fair coin flips have an entropy of 1 bit per flip. A source that always generates a long string of B’s has an entropy of 0, since the next character will always be a ‘B’.

The entropy rate of a data source is the average number of bits per symbol needed to encode it. Shannon’s experiments with human predictors show an information rate of between 0.6 and 1.3 bits per character, depending on the experimental setup; the PPM compression algorithm can compress English text to about 1.5 bits per character.

From the preceding example, note the following points:

1) The amount of entropy is not always an integer number of bits.

2) Many data bits may not convey information. For example, data structures often store information redundantly, or have identical sections regardless of the information in the data structure.

There are a number of entropy-related concepts that mathematically quantify information content in some way:

1) The self-information of an individual message or symbol taken from a given probability distribution

2) The entropy of a given probability distribution of messages or symbols

3) The entropy rate of a stochastic process
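These quantities can be computed directly from a probability distribution; a minimal sketch, reusing the car-color distribution from the earlier example:

```python
import math

def self_information(p):
    """Self-information (in bits) of an outcome with probability p."""
    return -math.log2(p)

def entropy(dist):
    """Entropy (bits per symbol) of a discrete probability distribution."""
    return sum(p * self_information(p) for p in dist.values() if p > 0)

cars = {"black": 0.5, "white": 0.25, "red": 0.125, "blue": 0.125}
print(self_information(0.125))   # 3.0 -- a rare color carries more information
print(entropy(cars))             # 1.75 -- matches the earlier coding example
print(entropy({"B": 1.0}) == 0)  # True -- a certain outcome carries no information
```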

Limitations of entropy as information content:

Although entropy is often used as a characterization of the information content of a data source, this information content is not absolute: it depends crucially on the probabilistic model. A source that always generates the same symbol has an entropy rate of 0, but the definition of what a symbol is depends on the alphabet.


Example:

Consider a source that produces the string ABABABABAB... in which A is always followed by B and vice versa. If the probabilistic model considers individual letters as independent, the entropy rate of the sequence is 1 bit per character. But if the sequence is considered as “AB AB AB AB AB...”, with symbols as two-character blocks, then the entropy rate is 0 bits per character.
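The model dependence can be demonstrated by estimating entropy from symbol frequencies under each model (`empirical_entropy` is an illustrative helper, not a standard library function):

```python
import math
from collections import Counter

def empirical_entropy(symbols):
    # Entropy estimate (bits per symbol) from observed symbol frequencies.
    counts = Counter(symbols)
    n = len(symbols)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

text = "AB" * 100

# Model 1: each letter is an independent symbol -> P(A) = P(B) = 0.5.
h_letters = empirical_entropy(text)

# Model 2: two-character blocks are the symbols -> every block is "AB".
blocks = [text[i:i + 2] for i in range(0, len(text), 2)]
h_blocks = empirical_entropy(blocks)

print(h_letters)      # 1.0 bit per character
print(abs(h_blocks))  # 0.0 -- the entropy-rate estimate depends on the model
```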

However, if we use very large blocks, the estimate of the per-character entropy rate may become artificially low. This is because in reality the probability distribution of the sequence is not knowable exactly; it is only an estimate. For example, suppose one considers the text of every book ever published as a sequence, with each symbol being the text of a complete book. If there are N published books, and each book is only published once, the estimate of the probability of each book is 1/N, and the entropy (in bits) is −log2(1/N) = log2(N). As a practical code, this corresponds to assigning each book a unique identifier and using it in place of the text of the book whenever one wants to refer to the book. This is enormously useful for talking about books, but it is not so useful for characterizing the information content of an individual book, or of language in general: it is not possible to reconstruct the book from its identifier without knowing the probability distribution, that is, the complete text of all the books. The key idea is that the complexity of the probabilistic model must be considered.

Kolmogorov complexity is a theoretical generalization of this idea that allows the consideration of the information content of a sequence independent of any particular probability model; it considers the shortest program for a universal computer that outputs the sequence. A code that achieves the entropy rate of a sequence for a given model, plus the codebook (i.e. the probabilistic model), is one such program, but it may not be the shortest.

For example, the Fibonacci sequence is 1, 1, 2, 3, 5, 8, 13, .... Treating the sequence as a message and each number as a symbol, there are almost as many symbols as there are characters in the message, giving an entropy of approximately log2(n). So the first 128 symbols of the Fibonacci sequence have an entropy of approximately 7 bits/symbol. However, the sequence can be expressed using a formula [F(n) = F(n−1) + F(n−2) for n = {3, 4, 5, ...}, F(1) = 1, F(2) = 1], and this formula has a much lower entropy and applies to any length of the Fibonacci sequence.

Redundancy

Redundancy in information theory is the number of bits used to transmit a message minus the number of bits of actual information in the message. Informally, it is the amount of wasted "space" used to transmit certain data. Data compression is a way to reduce or eliminate unwanted redundancy, while checksums are a way of adding desired redundancy for purposes of error detection when communicating over a noisy channel of limited capacity.

Quantitative definition

In describing the redundancy of raw data, recall that the rate of a source of information is the average entropy per symbol. For memoryless sources, this is merely the entropy of each symbol, while, in the most general case of a stochastic process, it is

r = lim (n→∞) H(M1, M2, ..., Mn) / n,

the limit, as n goes to infinity, of the joint entropy of the first n symbols divided by n. It is common in information theory to speak of the "rate" or "entropy" of a language. This is appropriate, for example, when the source of information is English prose. The rate of a memoryless source is simply H(M), since by definition there is no interdependence of the successive messages of a memoryless source.

The absolute rate of a language or source is simply

R = log |M|,

the logarithm of the cardinality of the message space, or alphabet. (This formula is sometimes called the Hartley function.) This is the maximum possible rate of information that can be transmitted with that alphabet. (The logarithm should be taken to a base appropriate for the unit of measurement in use.) The absolute rate is equal to the actual rate if the source is memoryless and has a uniform distribution.

UNIT 3 Page 11

VIDYARTHIPLUS.COM

www.vidyarthiplus.com

SYSTEMS, INFORMATION AND DECISION THEORY

The absolute redundancy can then be defined as

D = R − r,

the difference between the absolute rate and the rate.

The quantity D/R is called the relative redundancy and gives the maximum possible data compression ratio, when expressed as the percentage by which a file size can be decreased. (When expressed as a ratio of original file size to compressed file size, the quantity R : r gives the maximum compression ratio that can be achieved.) Complementary to the concept of relative redundancy is efficiency, defined as r/R, so that r/R = 1 − D/R. A memoryless source with a uniform distribution has zero redundancy (and thus 100% efficiency), and cannot be compressed.
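A short sketch of these definitions, using the four-symbol car-color source from earlier as an illustrative example:

```python
import math

def entropy(dist):
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)

# Illustrative four-symbol source (the car-color distribution from earlier).
dist = {"black": 0.5, "white": 0.25, "red": 0.125, "blue": 0.125}

rate = entropy(dist)                  # r, bits per symbol
absolute_rate = math.log2(len(dist))  # R = log2 of the alphabet size
absolute_redundancy = absolute_rate - rate                 # D = R - r
relative_redundancy = absolute_redundancy / absolute_rate  # D / R
efficiency = rate / absolute_rate                          # r / R = 1 - D/R

print(absolute_redundancy)  # 0.25
print(relative_redundancy)  # 0.125 -> file size can shrink by at most 12.5%
print(efficiency)           # 0.875
```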

Other notions of redundancy

A measure of redundancy between two variables is the mutual information or a normalized variant. A measure of redundancy among many variables is given by the total correlation.

Redundancy of compressed data refers to the difference between the expected compressed data length of n messages (or expected data rate) and the entropy (or entropy rate). (Here we assume the data is ergodic and stationary, e.g., a memoryless source.) Although the rate difference can be made arbitrarily small as n is increased, the actual difference cannot, although it can be theoretically upper-bounded by 1 in the case of finite-entropy memoryless sources.

CLASSIFICATION AND COMPRESSION

 Availability classification

 Sensitivity classification


o Concepts

o Class : Public / non classified information

o Class : Internal information

o Class : Confidential information

o Class : Secret information

Information of different types needs to be secured in different ways. Therefore a classification system is needed, whereby information is classified, a policy is laid down on how to handle information according to its class, and security mechanisms are enforced on systems handling information accordingly.

4.1 Availability classification

Here a classification system is proposed which has four availability classes. It is based on the author's experience, as no equivalent standards are available for reference.

To improve availability, preventative measures reduce the probability of downtime and recovery measures reduce the downtime after an incident.

Class                      1            2            3            4
Maximum allowed downtime,
per event                  1 week       1 day        1 hour       1 hour
On which days?             Mon-Fri      Mon-Fri      Mon-Fri      7 days
During what hours?         07:00-18:00  07:00-18:00  07:00-18:00  24h
Expected availability      80%          95%          99.5%        99.9%
Expected max. downtime     1 day/week   2 hours/week 20 min/week  12 min/month
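As a rough cross-check, the downtime row follows from the availability percentage and the length of the service window (a sketch; the table's own figures are rounded):

```python
# Cross-check of the downtime row: downtime = (1 - availability) * service window.
def weekly_downtime_hours(availability, service_hours_per_week):
    return (1 - availability) * service_hours_per_week

mon_fri_business = 5 * 11   # Mon-Fri, 07:00-18:00 -> 55 service hours per week

print(round(weekly_downtime_hours(0.80, mon_fri_business), 2))        # 11.0 h, about 1 day/week
print(round(weekly_downtime_hours(0.95, mon_fri_business), 2))        # 2.75 h, about 2 hours/week
print(round(60 * weekly_downtime_hours(0.995, mon_fri_business), 1))  # 16.5 min, about 20 min/week
```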


4.2 Sensitivity classification

A classification system is proposed which classes information / processes into four levels. The lowest is the least sensitive and the highest is for the most important information / processes.

4.2.1 Concepts

 All data has an owner.

 The data or process owner must classify the information into one of the security levels, depending on legal obligations, costs, corporate information policy and business needs.

 If the owner is not sure at what level data should be classified, use level .

 The owner must declare who is allowed access to the data.

 The owner is responsible for this data and must secure it or have it secured (e.g. via a security administrator) according to its classification.

 All documents should be classified and the classification level should be written on at least the title page.

Once the data on a system has been classified to one of the following levels, that system should be installed to conform to all directives for that class and the classes below it. Each level is a superset of the previous level: a system classified at a given class must also follow the directives of all lower classes.

If a system contains data of more than one sensitivity class, it must be classified according to that needed for the most confidential data on the system.

4.2.2 Class : Public / non classified information

Data on these systems could be made public without any implications for the company (i.e. the data is not confidential). Data integrity is not vital. Loss of service due to malicious attacks is an acceptable danger.


Examples: Test services without confidential data, certain public information services, product brochures widely distributed, data available in the public domain anyway.

4.2.3 Class : Internal information

External access to this data is to be prevented, but should this data become public, the consequences are not critical (e.g. the company may be publicly embarrassed). Internal access is selective. Data integrity is important but not vital.

Examples of this type of data are found in development groups (where no live data is present), certain production public services, certain customer data, "normal" working documents and project/meeting protocols, and telephone books.

4.2.4 Class : Confidential information

Data in this class is confidential within the company and protected from external access. If such data were to be accessed by unauthorised persons, it could influence the company's operational effectiveness, cause an important financial loss, provide a significant gain to a competitor or cause a major drop in customer confidence. Data integrity is vital.

Examples: Datacenters normally maintain this level of security. Salaries, personnel data, accounting data, passwords, information on corporate security weaknesses, very confidential customer data and confidential contracts.

4.2.5 Class : Secret information

Unauthorised external or internal access to this data would be critical to the company. Data integrity is vital. The number of people with access to this data should be very small. Very strict rules must be adhered to in the usage of this data.

Examples: Military data, secret contracts.

Data compression


In computer science and information theory, data compression, source coding,[1] or bit-rate reduction involves encoding information using fewer bits than the original representation.

Compression is useful because it helps reduce the consumption of resources such as storage space or transmission capacity. Because compressed data must be decompressed to be used, this extra processing imposes computational or other costs through decompression. For instance, a compression scheme for video may require expensive hardware for the video to be decompressed fast enough to be viewed as it is being decompressed, and the option to decompress the video in full before watching it may be inconvenient or require additional storage. The design of data compression schemes involves trade-offs among various factors, including the degree of compression, the amount of distortion introduced (e.g., when using lossy data compression), and the computational resources required to compress and uncompress the data. Compression was one of the main drivers for the growth of information during the past two decades.[2]

Lossless and lossy compression

Main articles: Lossless data compression and Lossy data compression

Lossless compression algorithms usually exploit statistical redundancy to represent data more concisely without losing information. Lossless compression is possible because most real-world data has statistical redundancy. For example, an image may have areas of colour that do not change over several pixels; instead of coding "red pixel, red pixel, ..." the data may be encoded as "279 red pixels". This is a simple example of run-length encoding; there are many schemes to reduce size by eliminating redundancy.

Lossless compression is contrasted with lossy data compression. In these schemes, some loss of information is acceptable. Depending upon the application, detail can be dropped from the data to save storage space. Generally, lossy data compression schemes are guided by research on how people perceive the data in question. For example, the human eye is more sensitive to subtle variations in luminance than it is to variations in color. JPEG works in part by "rounding off" less-important visual information. There is a corresponding trade-off between information lost and the size reduction. A number of popular compression formats exploit these perceptual differences, including those used in music files, images, and video.

Lossy

Lossy image compression is used in digital cameras to increase storage capacities with minimal degradation of picture quality. Similarly, DVDs use the lossy MPEG-2 video codec for video compression.

In lossy audio compression, methods of psychoacoustics are used to remove non-audible (or less audible) components of the signal. Compression of human speech is often performed with even more specialized techniques, so that "speech compression" or "voice coding" is sometimes distinguished as a separate discipline from "audio compression". Different audio and speech compression standards are listed under audio codecs. Voice compression is used in Internet telephony, for example, while audio compression is used for CD ripping and is decoded by audio players.

Lossless

The Lempel–Ziv (LZ) compression methods are among the most popular algorithms for lossless storage. DEFLATE is a variation on LZ which is optimized for decompression speed and compression ratio, but compression can be slow. DEFLATE is used in PKZIP, gzip and PNG. LZW (Lempel–Ziv–Welch) is used in GIF images. Also noteworthy are the LZR (LZ–Renau) methods, which serve as the basis of the Zip method. LZ methods utilize a table-based compression model where table entries are substituted for repeated strings of data. For most LZ methods, this table is generated dynamically from earlier data in the input. The table itself is often Huffman encoded (e.g. SHRI, LZX). A current LZ-based coding scheme that performs well is LZX, used in Microsoft's CAB format.
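A bare-bones sketch of the LZW idea described above: the string table starts with single characters and grows dynamically from earlier input. This toy version emits integer codes rather than packed bits, so it illustrates the table mechanism rather than the GIF wire format:

```python
# A toy LZW coder: the table starts with single characters and grows as the
# input is scanned; repeated strings are replaced by their table codes.
def lzw_compress(data):
    table = {chr(i): i for i in range(256)}
    w, out = "", []
    for ch in data:
        wc = w + ch
        if wc in table:
            w = wc                    # keep extending the current match
        else:
            out.append(table[w])      # emit the code for the longest match
            table[wc] = len(table)    # add the new string to the table
            w = ch
    if w:
        out.append(table[w])
    return out

def lzw_decompress(codes):
    table = {i: chr(i) for i in range(256)}
    w = table[codes[0]]
    out = [w]
    for code in codes[1:]:
        entry = table[code] if code in table else w + w[0]  # KwKwK special case
        out.append(entry)
        table[len(table)] = w + entry[0]   # rebuild the table on the fly
        w = entry
    return "".join(out)

text = "TOBEORNOTTOBEORTOBEORNOT"
codes = lzw_compress(text)
print(len(text), len(codes))   # 24 characters become 16 codes
assert lzw_decompress(codes) == text
```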

The very best modern lossless compressors use probabilistic models, such as prediction by partial matching. The Burrows–Wheeler transform can also be viewed as an indirect form of statistical modelling.


In a further refinement of these techniques, statistical predictions can be coupled to an algorithm called arithmetic coding. Arithmetic coding, invented by Jorma Rissanen and turned into a practical method by Witten, Neal, and Cleary, achieves superior compression to the better-known Huffman algorithm, and lends itself especially well to adaptive data compression tasks where the predictions are strongly context-dependent. Arithmetic coding is used in the bilevel image-compression standard JBIG and the document-compression standard DjVu. The text entry system Dasher is an inverse arithmetic coder.

Theory

The theoretical background of compression is provided by information theory (which is closely related to algorithmic information theory) for lossless compression, and by rate– distortion theory for lossy compression. These fields of study were essentially created by Claude Shannon, who published fundamental papers on the topic in the late 1940s and early 1950s. Coding theory is also related. The idea of data compression is deeply connected with statistical inference.

Machine learning

See also: Machine learning

There is a close connection between machine learning and compression: a system that predicts the posterior probabilities of a sequence given its entire history can be used for optimal data compression (by using arithmetic coding on the output distribution), while an optimal compressor can be used for prediction (by finding the symbol that compresses best, given the previous history). This equivalence has been used as justification for data compression as a benchmark for "general intelligence".[3]

Data differencing

Main article: Data differencing


Data compression can be viewed as a special case of data differencing:[4][5] data differencing consists of producing a difference given a source and a target, with patching producing a target given a source and a difference, while data compression consists of producing a compressed file given a target, and decompression consists of producing a target given only a compressed file. Thus, one can consider data compression as data differencing with empty source data, the compressed file corresponding to a "difference from nothing". This is the same as considering absolute entropy (corresponding to data compression) as a special case of relative entropy (corresponding to data differencing) with no initial data.

When one wishes to emphasize the connection, one may use the term differential compression to refer to data differencing.

Outlook and currently unused potential

It is estimated that the total amount of information stored on the world's storage devices could be further compressed with existing compression algorithms by a remaining average factor of 4.5:1. It is estimated that the combined technological capacity of the world to store information provided 1,300 exabytes of hardware digits in 2007, but when the corresponding content is optimally compressed, this only represents 295 exabytes of Shannon information.[2]

Uses

Audio

Audio data compression, as distinguished from dynamic range compression, reduces the transmission bandwidth and storage requirements of audio data. Audio compression algorithms are implemented in software as audio codecs. Lossy audio compression algorithms provide higher compression at the cost of fidelity and are used in numerous audio applications. These algorithms almost all rely on psychoacoustics to eliminate less audible or meaningful sounds, thereby reducing the space required to store or transmit them.


In both lossy and lossless compression, information redundancy is reduced, using methods such as coding, pattern recognition and linear prediction to reduce the amount of information used to represent the uncompressed data.

The acceptable trade-off between loss of audio quality and transmission or storage size depends upon the application. For example, one 640 MB compact disc (CD) holds approximately one hour of uncompressed high fidelity music, less than 2 hours of music compressed losslessly, or 7 hours of music compressed in the MP3 format at a medium bit rate. A digital sound recorder can typically store around 200 hours of clearly intelligible speech in 640 MB.[6]

Lossless audio compression produces a representation of digital data that decompresses to an exact digital duplicate of the original audio stream, unlike playback from lossy compression techniques such as Vorbis and MP3. Compression ratios are around 50–60% of original size,[7] similar to those for generic lossless data compression. Lossy compression ratios depend upon the quality required, but typically yield files of 5 to 20% of the size of the uncompressed original.[8] Lossless compression is unable to attain high compression ratios due to the complexity of waveforms and the rapid changes in sound forms. Codecs like FLAC, Shorten and TTA use linear prediction to estimate the spectrum of the signal. Many of these algorithms use convolution with the filter [-1 1] to slightly whiten or flatten the spectrum, thereby allowing traditional lossless compression to work more efficiently. The process is reversed upon decompression.
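The whitening effect of the [-1 1] filter can be seen with a short sketch (illustrative, not from the source): delta-coding a smooth signal concentrates the values near zero, lowering the zeroth-order entropy that a back-end entropy coder must pay.

```python
import math
from collections import Counter

def entropy_bits_per_symbol(xs):
    """Zeroth-order entropy: a lower bound on bits per symbol for
    any coder that treats samples as independent."""
    n = len(xs)
    return -sum(c / n * math.log2(c / n) for c in Counter(xs).values())

# A smooth, slowly varying "audio" signal of 8-bit samples.
signal = [round(128 + 100 * math.sin(i / 20)) for i in range(2000)]

# First-order prediction residual: convolution with the filter [-1 1],
# i.e. each sample minus its predecessor (first sample kept as-is).
residual = [signal[0]] + [b - a for a, b in zip(signal, signal[1:])]

print(entropy_bits_per_symbol(signal))    # many distinct levels: high
print(entropy_bits_per_symbol(residual))  # values clustered near 0: low
```

The residual stream carries the same information (the process is exactly reversible by a running sum) but is far cheaper to entropy-code.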

When audio files are to be processed, either by further compression or for editing, it is desirable to work from an unchanged original (uncompressed or losslessly compressed). Processing of a lossily compressed file for some purpose usually produces a final result inferior to creation of the same compressed file from an uncompressed original. In addition to sound editing or mixing, lossless audio compression is often used for archival storage, or as master copies.

A number of lossless audio compression formats exist. Shorten was an early lossless format. Newer ones include Free Lossless Audio Codec (FLAC), Apple's Apple Lossless, MPEG-4 ALS, Microsoft's Windows Media Audio 9 Lossless (WMA Lossless), Monkey's Audio, and TTA. See list of lossless codecs for a complete list.

Some audio formats feature a combination of a lossy format and a lossless correction; this allows stripping the correction to easily obtain a lossy file. Such formats include MPEG-4 SLS (Scalable to Lossless), WavPack, and OptimFROG DualStream.

Other formats are associated with a distinct system, such as:

 Direct Stream Transfer, used in Super Audio CD

 Meridian Lossless Packing, used in DVD-Audio, Dolby TrueHD, Blu-ray and HD DVD

Lossy audio compression

Comparison of acoustic spectrograms of a song in an uncompressed format and various lossy formats. The fact that the lossy spectrograms are different from the uncompressed one indicates that they are in fact lossy, but nothing can be assumed about the effect of the changes on perceived quality.


Lossy audio compression is used in a wide range of applications. In addition to the direct applications (mp3 players or computers), digitally compressed audio streams are used in most video DVDs; digital television; streaming media on the internet; satellite and cable radio; and increasingly in terrestrial radio broadcasts. Lossy compression typically achieves far greater compression than lossless compression (data of 5 percent to 20 percent of the original stream, rather than 50 percent to 60 percent), by discarding less-critical data.

The innovation of lossy audio compression was to use psychoacoustics to recognize that not all data in an audio stream can be perceived by the human auditory system. Most lossy compression reduces perceptual redundancy by first identifying sounds which are considered perceptually irrelevant, that is, sounds that are very hard to hear. Typical examples include high frequencies, or sounds that occur at the same time as louder sounds. Those sounds are coded with decreased accuracy or not coded at all.

Due to the nature of lossy algorithms, audio quality suffers when a file is decompressed and recompressed (digital generation loss). This makes lossy compression unsuitable for storing intermediate results in professional audio engineering applications, such as sound editing and multitrack recording. However, lossy formats are very popular with end users (particularly MP3), as a megabyte can store about a minute's worth of music at adequate quality.

Coding methods

In order to determine what information in an audio signal is perceptually irrelevant, most lossy compression algorithms use transforms such as the modified discrete cosine transform (MDCT) to convert time domain sampled waveforms into a transform domain. Once transformed, typically into the frequency domain, component frequencies can be allocated bits according to how audible they are. Audibility of spectral components is determined by first calculating a masking threshold, below which it is estimated that sounds will be beyond the limits of human perception.
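The transform-then-discard pipeline can be sketched in a few lines (illustrative only: a real codec uses the MDCT with overlapping windows and a per-band psychoacoustic threshold, whereas this toy uses a plain orthonormal DCT and one fixed threshold):

```python
import math

def dct(x):
    """Orthonormal DCT-II of one block of samples."""
    N = len(x)
    return [sum(x[n] * math.cos(math.pi / N * (n + 0.5) * k)
                for n in range(N))
            * (math.sqrt(1 / N) if k == 0 else math.sqrt(2 / N))
            for k in range(N)]

def idct(X):
    """Orthonormal DCT-III, the exact inverse of dct above."""
    N = len(X)
    return [sum(X[k] * (math.sqrt(1 / N) if k == 0 else math.sqrt(2 / N))
                * math.cos(math.pi / N * (n + 0.5) * k)
                for k in range(N))
            for n in range(N)]

# One 32-sample block of a pure tone.
block = [math.sin(2 * math.pi * 3 * n / 32) for n in range(32)]
coeffs = dct(block)

# Toy "masking threshold": zero out components below a fixed level,
# as if the psychoacoustic model had judged them inaudible.
threshold = 0.05
kept = [c if abs(c) > threshold else 0.0 for c in coeffs]
rebuilt = idct(kept)
```

Energy concentrates in a few coefficients near the tone's frequency, so discarding the small ones changes the reconstructed block only slightly; the bits saved on near-zero coefficients are where the compression comes from.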

The masking threshold is calculated using the absolute threshold of hearing and the principles of simultaneous masking—the phenomenon wherein a signal is masked by another signal separated by frequency—and, in some cases, temporal masking—where a signal is masked by another signal separated by time. Equal-loudness contours may also be used to weight the perceptual importance of different components. Models of the human ear-brain combination incorporating such effects are often called psychoacoustic models.

Other types of lossy compressors, such as linear predictive coding (LPC) used with speech, are source-based coders. These coders use a model of the sound's generator (such as the human vocal tract with LPC) to whiten the audio signal (i.e., flatten its spectrum) prior to quantization. LPC may also be thought of as a basic perceptual coding technique; reconstruction of an audio signal using a linear predictor shapes the coder's quantization noise into the spectrum of the target signal, partially masking it.

Lossy formats are often used for the distribution of streaming audio, or interactive applications (such as the coding of speech for digital transmission in cell phone networks). In such applications, the data must be decompressed as the data flows, rather than after the entire data stream has been transmitted. Not all audio codecs can be used for streaming applications, and for such applications a codec designed to stream data effectively will usually be chosen.

Latency results from the methods used to encode and decode the data. Some codecs will analyze a longer segment of the data to optimize efficiency, and then code it in a manner that requires a larger segment of data at one time in order to decode. (Often codecs create segments called a "frame" to create discrete data segments for encoding and decoding.) The inherent latency of the coding algorithm can be critical; for example, when there is two-way transmission of data, such as with a telephone conversation, significant delays may seriously degrade the perceived quality.

In contrast to the speed of compression, which is proportional to the number of operations required by the algorithm, latency here refers to the number of samples that must be analysed before a block of audio is processed. In the minimum case, latency is zero samples (e.g., if the coder/decoder simply reduces the number of bits used to quantize the signal). Time domain algorithms such as LPC also often have low latencies, hence their popularity in speech coding for telephony. In algorithms such as MP3, however, a large number of samples have to be analyzed in order to implement a psychoacoustic model in the frequency domain, and latency is on the order of 23 ms (46 ms for two-way communication).
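The figures above follow from simple arithmetic: if a codec must see a full analysis window before it can emit a block, the one-way algorithmic latency is the window length divided by the sample rate. The 1024-sample window below is an assumed value chosen to reproduce the roughly 23 ms figure at 44.1 kHz:

```python
sample_rate = 44_100   # CD-quality sample rate in Hz
window = 1024          # assumed analysis window, in samples

latency_ms = window / sample_rate * 1000   # one-way algorithmic latency
round_trip_ms = 2 * latency_ms             # two-way conversation

print(round(latency_ms, 1), round(round_trip_ms, 1))  # ~23.2 and ~46.4 ms
```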

Speech encoding

Speech encoding is an important category of audio data compression. The perceptual models used to estimate what a human ear can hear are generally somewhat different from those used for music. The range of frequencies needed to convey the sounds of a human voice are normally far narrower than that needed for music, and the sound is normally less complex. As a result, speech can be encoded at high quality using a relatively low bit rate.

This is accomplished, in general, by some combination of two approaches:

 Only encoding sounds that could be made by a single human voice.

 Throwing away more of the data in the signal—keeping just enough to reconstruct an "intelligible" voice rather than the full frequency range of human hearing.

Perhaps the earliest algorithms used in speech encoding (and audio data compression in general) were the A-law algorithm and the µ-law algorithm.
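The µ-law algorithm is simple enough to sketch directly (a minimal illustration; real telephony uses an 8-bit segmented approximation of these curves). Companding the sample before quantization spends the available levels logarithmically, so quiet speech is reproduced with far less relative error:

```python
import math

MU = 255  # value used in the North American/Japanese telephony standard

def mu_law_compress(x):
    """Map a sample in [-1, 1] to [-1, 1] with logarithmic spacing,
    so quiet sounds survive coarse quantization with less error."""
    return math.copysign(math.log1p(MU * abs(x)) / math.log1p(MU), x)

def mu_law_expand(y):
    """Exact inverse of mu_law_compress."""
    return math.copysign(math.expm1(abs(y) * math.log1p(MU)) / MU, y)

# Quantize the companded value to 8 bits instead of the raw sample.
sample = 0.01                                        # a quiet passage
coded = round(mu_law_compress(sample) * 127) / 127   # 8-bit code
decoded = mu_law_expand(coded)
```

Linearly quantizing 0.01 to 8 bits would leave an error of about 0.002; the companded path recovers it to within roughly 1e-5.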

History

Solidyne 922: The world's first commercial audio bit compression card for PC, 1990

A literature compendium for a large variety of audio coding systems was published in the IEEE Journal on Selected Areas in Communications (JSAC), February 1988. While there were some papers from before that time, this collection documented an entire variety of finished, working audio coders, nearly all of them using perceptual (i.e., masking) techniques and some kind of frequency analysis and back-end noiseless coding.[9] Several of these papers remarked on the difficulty of obtaining good, clean digital audio for research purposes. Most, if not all, of the authors in the JSAC edition were also active in the MPEG-1 Audio committee.

The world's first commercial broadcast automation audio compression system was developed by Oscar Bonello, an Engineering professor at the University of Buenos Aires.[10] In 1983, using the psychoacoustic principle of the masking of critical bands first published in 1967,[11] he started developing a practical application based on the recently developed IBM PC computer, and the broadcast automation system was launched in 1987 under the name Audicom. 20 years later, almost all the radio stations in the world were using similar technology, manufactured by a number of companies.

Video

Video compression uses modern coding techniques to reduce redundancy in video data. Most video compression algorithms and codecs combine spatial image compression and temporal motion compensation. Video compression is a practical implementation of source coding in information theory. In practice, most video codecs also use audio compression techniques in parallel to compress the separate, but combined, data streams.

The majority of video compression algorithms use lossy compression. Large amounts of data may be eliminated while remaining perceptually indistinguishable. As in all lossy compression, there is a trade-off between video quality, the cost of processing the compression and decompression, and system requirements. Highly compressed video may present visible or distracting artifacts.

Video compression typically operates on square-shaped groups of neighboring pixels, often called macroblocks. These pixel groups or blocks of pixels are compared from one frame to the next, and the video compression codec sends only the differences within those blocks. In areas of video with more motion, the compression must encode more data to keep up with the larger number of pixels that are changing. Commonly during explosions, flames, flocks of animals, and in some panning shots, the high-frequency detail leads to quality decreases or to increases in the bit rate.

Encoding theory

Video data may be represented as a series of still image frames. The sequence of frames contains spatial and temporal redundancy that video compression algorithms attempt to eliminate or code in a smaller size. Similarities can be encoded by only storing differences between frames, or by using perceptual features of human vision. For example, small differences in color are more difficult to perceive than are changes in brightness. Compression algorithms can average a color across these similar areas to reduce space, in a manner similar to those used in JPEG image compression.[12] Some of these methods are inherently lossy while others may preserve all relevant information from the original, uncompressed video.

One of the most powerful techniques for compressing video is interframe compression. Interframe compression uses one or more earlier or later frames in a sequence to compress the current frame, while intraframe compression uses only the current frame, effectively being image compression.

The most commonly used method works by comparing each frame in the video with the previous one. If the frame contains areas where nothing has moved, the system simply issues a short command that copies that part of the previous frame, bit-for-bit, into the next one. If sections of the frame move in a simple manner, the compressor emits a (slightly longer) command that tells the decompresser to shift, rotate, lighten, or darken the copy: a longer command, but still much shorter than intraframe compression. Interframe compression works well for programs that will simply be played back by the viewer, but can cause problems if the video sequence needs to be edited.
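The copy-unchanged-blocks idea above can be sketched in a few lines (illustrative only: frames are flat lists of pixel values and only exact-match "skip" ops are implemented, with no motion vectors or transforms):

```python
def encode_frame(prev, curr, block=4):
    """Toy interframe coder: blocks identical to the previous frame
    become a cheap 'skip' token; changed blocks are sent literally."""
    ops = []
    for i in range(0, len(curr), block):
        chunk = curr[i:i + block]
        if prev is not None and prev[i:i + block] == chunk:
            ops.append("skip")
        else:
            ops.append(chunk)
    return ops

def decode_frame(prev, ops, block=4):
    """Rebuild the current frame from the previous frame plus ops."""
    out = []
    for j, op in enumerate(ops):
        out.extend(prev[j * block:(j + 1) * block] if op == "skip" else op)
    return out

frame1 = [10] * 16
frame2 = [10] * 12 + [99, 99, 10, 10]   # only the last block changed
ops = encode_frame(frame1, frame2)       # three skips + one literal block
```

Note the editing problem the text describes: decoding `frame2` requires `frame1`, so cutting out the earlier frame breaks reconstruction.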

Because interframe compression copies data from one frame to another, if the original frame is simply cut out (or lost in transmission), the following frames cannot be reconstructed properly. Some video formats, such as DV, compress each frame independently using intraframe compression. Making 'cuts' in intraframe-compressed video is almost as easy as editing uncompressed video: one finds the beginning and ending of each frame, simply copies bit-for-bit each frame that one wants to keep, and discards the frames one doesn't want. Another difference between intraframe and interframe compression is that with intraframe systems, each frame uses a similar amount of data. In most interframe systems, certain frames (such as "I frames" in MPEG-2) aren't allowed to copy data from other frames, and so require much more data than other frames nearby.

It is possible to build a computer-based video editor that spots problems caused when I frames are edited out while other frames need them. This has allowed newer formats like HDV to be used for editing. However, this process demands a lot more computing power than editing intraframe compressed video with the same picture quality.

Today, nearly all commonly used video compression methods (e.g., those in standards approved by the ITU-T or ISO) apply a discrete cosine transform (DCT) for spatial redundancy reduction. Other methods, such as fractal compression, matching pursuit and the use of a discrete wavelet transform (DWT), have been the subject of some research, but are typically not used in practical products (except for the use of wavelet coding as still-image coders without motion compensation). Interest in fractal compression seems to be waning, due to recent theoretical analysis showing a comparative lack of effectiveness of such methods.


SUMMARIZING AND FILTERING

Information System with Summarization

Use

The Information System for investment programs consists of two parts. There are the drilldown reports for the "normal" database of your system or client. Along with these, there are also drilldown reports that access a summarization database. The summarization database makes it possible for you to update and report on a summarized dataset for an investment program and its dependent appropriation requests and measures. This offers several advantages:

 The drilldown reports that access the summarized dataset have better performance than unsummarized reports.

 Data from several local systems can be imported into the summarization database. These local systems can be SAP R/3 Systems, SAP R/2 Systems or non-SAP systems.

 You can manage different versions of the summarization database. You can thereby manage "snapshots" of the investment data of your enterprise. Using versions, you can create a sequence showing the data on the investments in your enterprise over time.

 You can still generate reports on data from the past, even if the objects to be reported on have themselves already been deleted from the system.

Features

The central part of summarized reporting is the summarization database. In the summarization database, you store summarized characteristics and key figures on investment programs and the appropriation requests and measures assigned to them. You store these on a periodic basis. The characteristics and key figures contain three types of data:

 The actual values


 The master data

 The investment program hierarchy

The summarization database is client-dependent. You can define only one summarization database in each client for investment programs. But you can still define any number of summarization versions in a given client.

Data Transfer

Depending on the source of the summarization data, there are different system functions for transferring the data to the summarization database (Investment Programs Periodic processing Summarization).

 The summarization data come from the same client or R/3 System as the client or system in which the summarization database is managed. In this case, you can select summarization data immediately (Copy program) and write it to the summarization database (Summarize values).

 The summarization database is supplied with values from other R/3 systems. Then you have to first export the data to a file (Output to file). Then you can import the data to the central system and summarize it there (Import from file).

When you export data to a file, you can select either already summarized data, or unsummarized data. Selecting summarized data makes sense when the system providing the data has its own summarization database that is supplied with data from other local systems (such as, other subsidiaries).

 The data comes from an SAP R/2 RK-P System. In this case, there is a data procurement program you can use to select the relevant projects. It then writes them to the summarization database in the R/3 System. You can also assign a project in SAP R/2 RK-P to an investment program position in the reporting system. You can either assign the project directly, or by means of the appropriation request.


 The data comes from an external, non-SAP system. There is an interface for updating the summarization database from external systems. For more information, see Special Considerations for Summarizing from Local Systems

Entities

Along with the investment program structure and the values, you can also store the following entities in the summarization database with their keys and short texts:

 Measures

 Appropriation requests

 Profit centers

 Cost centers

 Plants

 Functional locations

Storing the key and short text of the entities in the summarization database is necessary in order to be able to display the entities in reports. The same applies when the summarization takes place within one system or client. There is a function for this in the Investment Programs menu under Periodic processing.

Storing in the summarization database also has another advantage. The entities are still available, even if they are changed or reorganized in their original database.

Characteristics that Can Be Summarized

The system achieves the "summarization" by summarizing using specific characteristics, rather than storing all characteristics and all characteristic values of the investment program. For example, if you want to create reports using the summarization database that only have to do with organizational units, you do not have to store investment data at the object level. Instead, you summarize using the characteristics "appropriation request" and "measure." In this way, you can considerably reduce the number of data records in the summarization database.

In Customizing, you specify which characteristics the system should summarize, and which should not be summarized (Information System Using Summarization).

You cannot summarize using the characteristics "fiscal year," "plan version," or "value type." Otherwise the information about the type of key figure is lost. Summarizing using the characteristics "investment program" or "approval year" is also not possible, because the summarization data always has to be stored in relation to an investment program. Since most standard reports use investment program positions as drilldown characteristics, it generally does not make sense to use "investment program position" as a characteristic for summarization.
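Summarizing over a characteristic is essentially a group-by aggregation: the summarized characteristic is dropped from the record key and the key figures are totalled. The sketch below is illustrative only (the records and field names are hypothetical, not SAP data structures); note that fiscal year and value type stay in the key, in line with the restriction above.

```python
from collections import defaultdict

# Hypothetical unsummarized records, one per measure:
# (program position, fiscal year, value type, measure, amount)
records = [
    ("POS-1", 2023, "actual", "M-100", 400.0),
    ("POS-1", 2023, "actual", "M-101", 250.0),
    ("POS-1", 2023, "plan",   "M-100", 500.0),
    ("POS-2", 2023, "actual", "M-200", 120.0),
]

def summarize(records):
    """Summarize over the 'measure' characteristic: amounts are totalled
    per remaining characteristics. Fiscal year and value type stay in
    the key, since summarizing over them would lose the meaning of the
    key figures."""
    totals = defaultdict(float)
    for pos, year, vtype, _measure, amount in records:
        totals[(pos, year, vtype)] += amount   # 'measure' dropped from key
    return dict(totals)

summary = summarize(records)   # 4 detail records shrink to 3 summary rows
```

The reduction in record count is modest here, but on a large dataset dropping a fine-grained characteristic like "measure" is what gives the summarized reports their performance advantage.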

Summarization by Scale

If you manage the characteristics "appropriation request" or "measure" in the summarization database in detail display form (meaning that the data are not summarized), then you can specify in Customizing that objects that have a certain scale are summarized anyway. To do this, remove the "detail display" indicator from the definition of the scale you want to use (Appropriation Requests Master Data Define allowed values for certain master data fields Define scale).

Using this method, you can exclude measures and appropriation requests that have an insignificant value from summarization reporting. At the same time you benefit from a detail display of more important measures and appropriation requests.

You enter scales in the master data of investment program positions, appropriation requests and measures. For objects that are not managed in an SAP R/3 System, the system determines the scale from the appropriation request or investment program position to which they are assigned.


Summarization Versions

In order for you to have multiple "snapshots" of your investment programs at different times, the system saves the data in the summarization database in summarization versions. The summarization versions allow you to "preserve" the investment program in different phases. You have to enter a summarization version for the procurement of data for the summarization, as well as when you run summarization reports. If you request data collection for a summarization version that already has data in the summarization database, the system overwrites the existing data.

Summarization Versions

You can define any number of summarization versions in Customizing (Information System Information System Using Summarization). A summarization version consists solely of a key and a text.

Activities

You create a summarization database in the following steps (Periodic processing Summarization):

 Copy the investment program to the summarization database (Summarization In own client Copy program).

 Export the investment program values from the original system to a file (Summarization Output to file).

 Export the entities from the original system, each to its own file.

 Import the investment program values and entities (Summarization Import from file).

Exporting and importing are not necessary if the data does not come from other clients or systems (that is, all the data is in the same system or client as the summarization database). In that case, you have to use the functions Summarize values and Copy entities (under Summarization In own client).

Information filtering system

An information filtering system is a system that removes redundant or unwanted information from an information stream using (semi)automated or computerized methods prior to presentation to a human user. Its main goals are the management of information overload and an increase in the semantic signal-to-noise ratio. To do this, the user's profile is compared to some reference characteristics. These characteristics may originate from the information item (the content-based approach) or from the user's social environment (the collaborative filtering approach).

Whereas in information transmission signal processing filters are used against syntax-disrupting noise on the bit-level, the methods employed in information filtering act on the semantic level.

The range of machine methods employed builds on the same principles as those for information extraction. A notable application can be found in the field of email spam filters. Thus, it is not only the information explosion that necessitates some form of filters, but also inadvertently or maliciously introduced pseudo-information.

On the presentation level, information filtering takes the form of user-preferences-based newsfeeds, etc.

Recommender systems are active information filtering systems that attempt to present to the user information items (film, television, music, books, news, web pages) the user is interested in. These systems add information items to the information flowing towards the user, as opposed to removing information items from the information flow towards the user. Recommender systems typically use collaborative filtering approaches or a combination of the collaborative filtering and content-based filtering approaches, although content-based recommender systems do exist.
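A minimal content-based sketch (illustrative only; profile vectors, topics and item names are invented) scores each item by the cosine similarity between its feature vector and the user's profile, then recommends the closest match:

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

# Hypothetical feature weights per topic: [politics, sport, technology]
user_profile = [0.1, 0.0, 0.9]          # built from the user's reading history
items = {
    "chip-shortage-article": [0.2, 0.0, 0.8],
    "election-coverage":     [0.9, 0.1, 0.1],
    "match-report":          [0.0, 1.0, 0.0],
}

# Rank items by similarity to the profile; the best match is recommended.
ranked = sorted(items, key=lambda k: cosine(user_profile, items[k]),
                reverse=True)
print(ranked[0])
```

A collaborative filter would instead score items using the preferences of users with similar histories, with no item features at all.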

History


Before the advent of the Internet, there were already several methods of filtering information; for instance, when a government controls and restricts the flow of information, we speak of censorship, although in a democratic country some filtering is done instead to satisfy the needs of beneficiaries.

On the other hand, we speak of information filters when referring to newspaper editors and journalists, who provide a service that selects the most valuable information for their clients: readers of books, magazines and newspapers, radio listeners and TV viewers. This filtering operation is also present in schools and universities, where information is selected on academic criteria for the customers of this service, the students. With the advent of the Internet, the possibility increases that anyone can publish whatever they wish at low cost. In this way, the amount of less useful information grows considerably, and quality information becomes harder to find among what is disseminated. Faced with this problem, new filtering techniques began to be devised, with which the information required on each specific topic can be obtained easily and efficiently.

Operation

A filtering system of this style consists of several tools that help people find the most valuable information, so that the limited time one can dedicate to reading, listening or viewing is correctly directed to the most interesting and valuable documents, leaving aside the most inconsequential. These filters are also used to organize and structure information in a correct and understandable way, and to group the messages arriving in a mailbox. Such filters are indispensable for the results returned by search engines on the Internet. Filtering functions improve every day, making the downloading of Web documents and messages more efficient.

Criterion

One of the criteria used in this step is whether the knowledge is harmful or not, that is, whether the knowledge allows a better understanding with or without the concept. In this case, the task of information filtering is to reduce or eliminate harmful information using knowledge.

Learning System


A content learning system consists, as a general rule, of three basic stages:

1. First, a system that provides solutions to a defined set of tasks.

2. Subsequently, assessment criteria that measure the performance of the previous stage in relation to the solutions of problems.

3. An acquisition module whose output is knowledge used by the solver system of the first stage.

Future

Currently the problem is not finding the best way to filter information, but building systems that learn users' information needs independently: not only automating the filtering process, but also the construction and adaptation of the filter. Several related fields, such as statistics, machine learning, pattern recognition and data mining, are the basis for developing information filters that emerge and adapt based on experience. For the learning process to be carried out, part of the information has to be pre-filtered; that is, there are positive and negative examples, called training data, which can be generated by experts or via feedback from ordinary users.

Error

As data is entered, the system induces new rules; if we assume that these rules generalize the training data, then we have to evaluate the system and measure its ability to correctly predict the categories of new information. This step is simplified by setting aside part of the training data as a separate series called "test data" that we use to measure the error rate. As a general rule it is important to distinguish between types of errors (false positives and false negatives). For example, for a content aggregator aimed at children, letting through unsuitable information that shows violence or pornography is far graver than mistakenly discarding some appropriate information. To lower error rates and give these systems learning capabilities similar to humans', we require the development of systems that simulate human cognitive abilities, such as natural language understanding, capturing common meaning, and other forms of advanced processing that reach the semantics of information.
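The evaluation step described above can be sketched as a minimal example. The blocked-word filter, messages, and labels below are all invented for illustration; a real filter induces its rules from training data rather than from a fixed list.

```python
# Minimal sketch: measuring an information filter's error rate on test data,
# distinguishing false positives from false negatives.
BLOCKED_WORDS = {"violence", "gambling"}

def filter_allows(message):
    """Return True if the filter lets the message through."""
    return not (BLOCKED_WORDS & set(message.lower().split()))

# Test data: (message, should_be_allowed) pairs held out from training.
test_data = [
    ("cartoon about friendly animals", True),
    ("graphic violence in the news", False),
    ("online gambling promotion", False),
    ("science homework help", True),
    ("review of a violent film", False),   # the filter misses this one
]

false_positives = 0  # unsuitable content allowed through (the graver error here)
false_negatives = 0  # suitable content wrongly discarded
for message, should_allow in test_data:
    allowed = filter_allows(message)
    if allowed and not should_allow:
        false_positives += 1
    elif not allowed and should_allow:
        false_negatives += 1

error_rate = (false_positives + false_negatives) / len(test_data)
```

For a children's content aggregator, the single false positive counted here would weigh far more heavily than a false negative, which is why the two error types are tallied separately.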

Fields of use

Nowadays, there are numerous techniques for developing information filters, some of which reach error rates lower than 10% in various experiments. Among these techniques are decision trees, support vector machines, neural networks, Bayesian networks, linear discriminants, logistic regression, etc. At present, these techniques are used in different applications, not only in the web context but in areas as varied as voice recognition, classification of astronomical images and evaluation of financial risk.

INFERENCES AND UNCERTAINTY

Inference

Inference is the act or process of deriving logical conclusions from premises known or assumed to be true.[1] The conclusion drawn is also called an inference. The laws of valid inference are studied in the field of logic.

Human inference (i.e. how humans draw conclusions) is traditionally studied within the field of cognitive psychology; artificial intelligence researchers develop automated inference systems to emulate human inference. Statistical inference allows for inference from quantitative data.

Definition of inference

The process by which a conclusion is inferred from multiple observations is called inductive reasoning. The conclusion may be correct or incorrect, or correct to within a certain degree of accuracy, or correct in certain situations. Conclusions inferred from multiple observations may be tested by additional observations.

This definition is disputable due to its lack of clarity. Ref: Oxford English Dictionary: "induction ... 3. Logic the inference of a general law from particular instances." The definition given thus applies only when the "conclusion" is general.

1. A conclusion reached on the basis of evidence and reasoning. 2. The process of reaching such a conclusion: "order, health, and by inference cleanliness".

Examples of inference

Greek philosophers defined a number of syllogisms, correct three part inferences, that can be used as building blocks for more complex reasoning. We begin with the most famous of them all:

1. All men are mortal

2. Socrates is a man

3. Therefore, Socrates is mortal.

The reader can check that the premises and conclusion are true, but logic is concerned with inference: does the truth of the conclusion follow from that of the premises?

The validity of an inference depends on the form of the inference. That is, the word "valid" does not refer to the truth of the premises or the conclusion, but rather to the form of the inference. An inference can be valid even if the parts are false, and can be invalid even if the parts are true. But a valid form with true premises will always have a true conclusion.

For example, consider the form of the following argument:

1. All fruits are sweet.

2. A banana is a fruit.

3. Therefore, a banana is sweet.

For the conclusion to be necessarily true, the premises need to be true.

Now we turn to an invalid form.

1. All A are B.

2. C is a B.


3. Therefore, C is an A.

To show that this form is invalid, we demonstrate how it can lead from true premises to a false conclusion.

1. All apples are fruit. (Correct)

2. Bananas are fruit. (Correct)

3. Therefore, bananas are apples. (Wrong)

A valid argument with false premises may lead to a false conclusion:

1. All fat people are Greek.

2. John Lennon was fat.

3. Therefore, John Lennon was Greek.

When a valid argument is used to derive a false conclusion from false premises, the inference is valid because it follows the form of a correct inference.

A valid argument can also be used to derive a true conclusion from false premises:

1. All fat people are musicians

2. John Lennon was fat

3. Therefore, John Lennon was a musician

In this case we have two false premises that imply a true conclusion.

Incorrect inference

An incorrect inference is known as a fallacy. Philosophers who study informal logic have compiled large lists of them, and cognitive psychologists have documented many biases in human reasoning that favor incorrect reasoning.

Automatic logical inference

AI systems first provided automated logical inference, and these were once extremely popular research topics, leading to industrial applications in the form of expert systems and later business rule engines.


An inference system's job is to extend a knowledge base automatically. The knowledge base (KB) is a set of propositions that represent what the system knows about the world. Several techniques can be used by that system to extend KB by means of valid inferences. An additional requirement is that the conclusions the system arrives at are relevant to its task.

Example using Prolog

Prolog (for "Programming in Logic") is a programming language based on a subset of predicate calculus. Its main job is to check whether a certain proposition can be inferred from a KB (knowledge base) using an algorithm called backward chaining.

Let us return to our Socrates syllogism. We enter into our Knowledge Base the following piece of code:

mortal(X) :- man(X).
man(socrates).

(Here :- can be read as "if". Generally, if P → Q (if P then Q), then in Prolog we would code Q :- P (Q if P).) This states that all men are mortal and that Socrates is a man. Now we can ask the Prolog system about Socrates:

?- mortal(socrates).

(where ?- signifies a query: can mortal(socrates). be deduced from the KB using the rules?) gives the answer "Yes".

On the other hand, asking the Prolog system the following:

?- mortal(plato).

gives the answer "No".

This is because Prolog does not know anything about Plato, and hence defaults to any property about Plato being false (the so-called closed world assumption). Finally, ?- mortal(X) (Is anything mortal?) would result in "Yes" (and in some implementations: "Yes": X=socrates).
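The closed world behaviour just described can also be sketched outside Prolog. Below is a toy Python emulation of backward chaining over the same two-clause KB; it is an illustration of the idea, not real Prolog.

```python
# Toy backward chaining over the Socrates KB.
facts = {("man", "socrates")}
# Each rule pairs a head predicate with a body predicate: head(X) :- body(X).
rules = [("mortal", "man")]

def prove(goal):
    """Return True iff the goal can be derived from the KB; otherwise False
    (the closed world assumption: anything not derivable is answered "No")."""
    if goal in facts:
        return True
    pred, arg = goal
    for head, body in rules:
        # Unify the rule head with the goal, then try to prove the body.
        if head == pred and prove((body, arg)):
            return True
    return False
```

Here prove(("mortal", "socrates")) plays the role of ?- mortal(socrates). and succeeds, while prove(("mortal", "plato")) fails because nothing about Plato is in the KB.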


Prolog can be used for vastly more complicated inference tasks. See the corresponding article for further examples.

Use with the semantic web

Recently, automatic reasoners have found in the Semantic Web a new field of application. Being based upon first-order logic, knowledge expressed using one variant of OWL can be logically processed, i.e., inferences can be made upon it.

Bayesian statistics and probability logic

Philosophers and scientists who follow the Bayesian framework for inference use the mathematical rules of probability to find this best explanation. The Bayesian view has a number of desirable features—one of them is that it embeds deductive (certain) logic as a subset (this prompts some writers to call Bayesian probability "probability logic", following E. T. Jaynes).

Bayesians identify probabilities with degrees of beliefs, with certainly true propositions having probability 1, and certainly false propositions having probability 0. To say that "it's going to rain tomorrow" has a 0.9 probability is to say that you consider the possibility of rain tomorrow as extremely likely.

Through the rules of probability, the probability of a conclusion and of alternatives can be calculated. The best explanation is most often identified with the most probable (see Bayesian decision theory). A central rule of Bayesian inference is Bayes' theorem, which gave its name to the field.
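As a tiny worked example of Bayes' theorem (all numbers invented for illustration): start from a 0.10 prior degree of belief in rain tomorrow and update it on an observation of clouds.

```python
# Bayes' theorem: P(rain | clouds) = P(clouds | rain) * P(rain) / P(clouds)
p_rain = 0.10                 # prior degree of belief in rain tomorrow
p_clouds_given_rain = 0.80    # likelihood of clouds if it will rain
p_clouds_given_dry = 0.20     # likelihood of clouds if it stays dry

# Total probability of the evidence (clouds).
p_clouds = p_clouds_given_rain * p_rain + p_clouds_given_dry * (1 - p_rain)

# Posterior degree of belief after seeing clouds.
p_rain_given_clouds = p_clouds_given_rain * p_rain / p_clouds
```

The observation raises the degree of belief in rain from 0.10 to roughly 0.31; certainly true and certainly false propositions (probability 1 and 0) would be left unchanged by any evidence.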

See Bayesian inference for examples.

Nonmonotonic logic[2]

A relation of inference is monotonic if the addition of premises does not undermine previously reached conclusions; otherwise the relation is nonmonotonic. Deductive inference is monotonic: if a conclusion is reached on the basis of a certain set of premises, then that conclusion still holds if more premises are added.


By contrast, everyday reasoning is mostly nonmonotonic because it involves risk: we jump to conclusions from deductively insufficient premises. We know when it is worth or even necessary (e.g. in medical diagnosis) to take the risk. Yet we are also aware that such inference is defeasible—that new information may undermine old conclusions. Various kinds of defeasible but remarkably successful inference have traditionally captured the attention of philosophers (theories of induction, Peirce’s theory of abduction, inference to the best explanation, etc.). More recently logicians have begun to approach the phenomenon from a formal point of view. The result is a large body of theories at the interface of philosophy, logic and artificial intelligence.

Applications

Computer applications

Bayesian inference has applications in artificial intelligence and expert systems. Bayesian inference techniques have been a fundamental part of computerized pattern recognition techniques since the late 1950s. There is also an ever growing connection between Bayesian methods and simulation-based Monte Carlo techniques since complex models cannot be processed in closed form by a Bayesian analysis, while a graphical model structure may allow for efficient simulation algorithms like the Gibbs sampling and other Metropolis–Hastings algorithm schemes.[13] Recently Bayesian inference has gained popularity amongst the phylogenetics community for these reasons; a number of applications allow many demographic and evolutionary parameters to be estimated simultaneously. In the areas of population genetics and dynamical systems theory, approximate Bayesian computation (ABC) is also becoming increasingly popular.

As applied to statistical classification, Bayesian inference has been used in recent years to develop algorithms for identifying e-mail spam. Applications which make use of Bayesian inference for spam filtering include DSPAM, Bogofilter, SpamAssassin, SpamBayes, and Mozilla. Spam classification is treated in more detail in the article on the naive Bayes classifier.
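A toy naive Bayes score illustrates the idea behind such spam filters. The word counts below are invented; real filters such as SpamAssassin or SpamBayes combine many more signals and far larger corpora.

```python
# Toy naive Bayes spam scoring: sum of log likelihood ratios per word.
from math import log

# Invented training counts: how often each word appeared in spam vs. ham.
spam_counts = {"winner": 40, "free": 35, "meeting": 2}
ham_counts  = {"winner": 1,  "free": 5,  "meeting": 30}
spam_total, ham_total = 100, 100

def spam_log_odds(words):
    """Positive score suggests spam, negative suggests ham (equal priors)."""
    score = 0.0
    for w in words:
        # Add-one smoothing so unseen words do not zero out the product.
        p_w_spam = (spam_counts.get(w, 0) + 1) / (spam_total + 2)
        p_w_ham = (ham_counts.get(w, 0) + 1) / (ham_total + 2)
        score += log(p_w_spam / p_w_ham)
    return score
```

A message containing "winner" and "free" scores positive (spam-like), while one containing "meeting" scores negative (ham-like).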


Solomonoff's inductive inference is the theory of prediction based on observations; for example, predicting the next symbol based upon a given series of symbols. The only assumption is that the environment follows some unknown but computable probability distribution. It combines two well-studied principles of inductive inference: Bayesian statistics and Occam's Razor. Solomonoff's universal prior probability of any prefix p of a computable sequence x is the sum of the probabilities of all programs (for a universal computer) that compute something starting with p. Given some p and any computable but unknown probability distribution from which x is sampled, the universal prior and Bayes' theorem can be used to predict the yet unseen parts of x in optimal fashion.[14][15]

Rule of inference

In logic, a rule of inference, inference rule, or transformation rule is the act of drawing a conclusion based on the form of premises interpreted as a function which takes premises, analyses their syntax, and returns a conclusion (or conclusions). For example, the rule of inference modus ponens takes two premises, one in the form of "If p then q" and another in the form of "p" and returns the conclusion "q". The rule is valid with respect to the semantics of classical logic (as well as the semantics of many other non-classical logics), in the sense that if the premises are true (under an interpretation) then so is the conclusion.

Typically, a rule of inference preserves truth, a semantic property. In many-valued logic, it preserves a general designation. But a rule of inference's action is purely syntactic, and does not need to preserve any semantic property: any function from sets of formulae to formulae counts as a rule of inference. Usually only rules that are recursive are important; i.e. rules such that there is an effective procedure for determining whether any given formula is the conclusion of a given set of formulae according to the rule. An example of a rule that is not effective in this sense is the infinitary ω-rule.[1]

Popular rules of inference include modus ponens, modus tollens from propositional logic and contraposition. First-order predicate logic uses rules of inference to deal with logical quantifiers.


Overview

In formal logic (and many related areas), rules of inference are usually given in the following standard form:

Premise#1
Premise#2
...
Premise#n
----------
Conclusion

This expression states that whenever, in the course of some logical derivation, the given premises have been obtained, the specified conclusion can be taken for granted as well. The exact formal language that is used to describe both premises and conclusions depends on the actual context of the derivations. In a simple case, one may use logical formulae, such as in:

A → B
A
----------
∴ B

This is just the modus ponens rule of propositional logic. Rules of inference are usually formulated as rule schemata by the use of universal variables. In the rule (schema) above, A and B can be instantiated to any element of the universe (or sometimes, by convention, some restricted subset such as propositions) to form an infinite set of inference rules.
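The functional reading of a rule of inference can be sketched directly. In this hypothetical representation, a formula is a nested tuple and ("->", p, q) stands for p → q; the function takes premises and returns the conclusion, exactly as the modus ponens schema above does.

```python
# Modus ponens as a function from premises to a conclusion.
def modus_ponens(implication, premise):
    """Given ("->", p, q) and p, return q; otherwise the rule does not apply."""
    op, p, q = implication
    if op == "->" and p == premise:
        return q
    raise ValueError("rule does not apply to these premises")
```

Instantiating the schema with A and B: modus_ponens(("->", "A", "B"), "A") yields "B", and any mismatched premise is rejected rather than producing a conclusion.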

A proof system is formed from a set of rules chained together to form proofs, or derivations. Any derivation has only one final conclusion, which is the statement proved or derived. If premises are left unsatisfied in the derivation, then the derivation is a proof of a hypothetical statement: "if the premises hold, then the conclusion holds."


Admissibility and derivability

Main article: Admissible rule

In a set of rules, an inference rule could be redundant in the sense that it is admissible or derivable. A derivable rule is one whose conclusion can be derived from its premises using the other rules. An admissible rule is one whose conclusion holds whenever the premises hold. All derivable rules are admissible. To appreciate the difference, consider the following set of rules for defining the natural numbers (the judgment "n nat" asserts the fact that n is a natural number):

--------- (zero)
0 nat

n nat
--------- (succ)
s(n) nat

The first rule states that 0 is a natural number, and the second states that s(n) is a natural number if n is. In this proof system, the following rule, demonstrating that the second successor of a natural number is also a natural number, is derivable:

n nat
-----------
s(s(n)) nat

Its derivation is just the composition of two uses of the successor rule above. The following rule, for asserting the existence of a predecessor for any nonzero number, is merely admissible:

s(n) nat
---------
n nat

This is a true fact of natural numbers, as can be proven by induction. (To prove that this rule is admissible, assume a derivation of the premise s(n) nat and induct on it to produce a derivation of n nat.) However, it is not derivable, because it depends on the structure of the derivation of the premise. Because of this, derivability is stable under additions to the proof system, whereas admissibility is not. To see the difference, suppose the following nonsense rule were added to the proof system:

---------
s(-3) nat


In this new system, the double-successor rule is still derivable. However, the rule for finding the predecessor is no longer admissible, because there is no way to derive -3 nat. The brittleness of admissibility comes from the way it is proved: since the proof can induct on the structure of the derivations of the premises, extensions to the system add new cases to this proof, which may no longer hold.
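The derivability discussion can be sketched mechanically with a toy checker. Here terms are strings built from "0" and s(...); the checker succeeds exactly on the judgments that the zero and successor rules can derive, so the derivable double-successor rule is just two applications of the successor rule.

```python
# Toy derivation checker for the judgment "n nat".
def derive_nat(term):
    """Return True iff 'term nat' is derivable from the zero/successor rules."""
    if term == "0":
        return True                    # rule: 0 nat
    if term.startswith("s(") and term.endswith(")"):
        return derive_nat(term[2:-1])  # rule: n nat  =>  s(n) nat
    return False

# The derivable double-successor rule: compose two uses of the successor rule.
def double_succ(term):
    return "s(s(%s))" % term
```

Note that derive_nat("s(-3)") fails: the checker has no rule for "-3", which mirrors why adding an s(-3) axiom would break the admissibility of the predecessor rule.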

Admissible rules can be thought of as theorems of a proof system. For instance, in a sequent calculus where cut elimination holds, the cut rule is admissible.

Other considerations

Inference rules may also be stated in this form: (1) some (perhaps zero) premises, (2) a turnstile symbol ⊢, which means "infers", "proves" or "concludes", (3) a conclusion. This usually embodies the relational (as opposed to functional) view of a rule of inference, where the turnstile stands for a deducibility relation holding between premises and conclusion.

Rules of inference must be distinguished from axioms of a theory. In terms of semantics, axioms are valid assertions. Axioms are usually regarded as starting points for applying rules of inference and generating a set of conclusions. Or, in less technical terms:

Rules are statements ABOUT the system, axioms are statements IN the system. For example:

 The RULE that from ⊢ p you can infer ⊢ Prov(p) is a statement that says if you've proven p, then it is provable that p is provable. This holds in Peano arithmetic, for example.

 The AXIOM p → Prov(p) would mean that every true statement is provable. This does not hold in Peano arithmetic.

Rules of inference play a vital role in the specification of logical calculi as they are considered in proof theory, such as the sequent calculus and natural deduction.

Uncertainty


Uncertainty is a term used in subtly different ways in a number of fields, including physics, philosophy, statistics, economics, finance, insurance, psychology, sociology, engineering, and information science. It applies to predictions of future events, to physical measurements already made, or to the unknown.

Concepts

Although the terms are used in various ways among the general public, many specialists in decision theory, statistics and other quantitative fields have defined uncertainty, risk, and their measurement as:

1. Uncertainty: the lack of certainty; a state of having limited knowledge where it is impossible to exactly describe the existing state, a future outcome, or more than one possible outcome.

2. Measurement of Uncertainty: A set of possible states or outcomes where probabilities are assigned to each possible state or outcome – this also includes the application of a probability density function to continuous variables

3. Risk: A state of uncertainty where some possible outcomes have an undesired effect or significant loss.

4. Measurement of Risk: A set of measured uncertainties where some possible outcomes are losses, and the magnitudes of those losses – this also includes loss functions over continuous variables.[1]

Knightian uncertainty. In his seminal work Risk, Uncertainty, and Profit[2] University of Chicago economist Frank Knight (1921) established the important distinction between risk and uncertainty:

"Uncertainty must be taken in a sense radically distinct from the familiar notion of risk, from which it has never been properly separated.... The essential fact is that 'risk' means in some cases a quantity susceptible of measurement, while at other times it is something distinctly not of this character; and there are far-reaching and crucial differences in the bearings of the phenomena depending on which of the two is really present and operating.... It will appear that a measurable uncertainty, or 'risk' proper, as we shall use the term, is so far different from an unmeasurable one that it is not in effect an uncertainty at all."

There are other taxonomies of uncertainties and decisions that include a broader sense of uncertainty and how it should be approached from an ethics perspective:[3]

A taxonomy of uncertainty

There are some things that you know to be true, and others that you know to be false; yet, despite this extensive knowledge that you have, there remain many things whose truth or falsity is not known to you. We say that you are uncertain about them. You are uncertain, to varying degrees, about everything in the future; much of the past is hidden from you; and there is a lot of the present about which you do not have full information. Uncertainty is everywhere and you cannot escape from it.

Dennis Lindley, Understanding Uncertainty (2006)

For example, if you do not know whether it will rain tomorrow, then you have a state of uncertainty. If you apply probabilities to the possible outcomes using weather forecasts or even just a calibrated probability assessment, you have quantified the uncertainty. Suppose you quantify your uncertainty as a 90% chance of sunshine. If you are planning a major, costly, outdoor event for tomorrow then you have risk since there is a 10% chance of rain and rain would be undesirable. Furthermore, if this is a business event and you would lose $100,000 if it rains, then you have quantified the risk (a 10% chance of losing $100,000). These situations can be made even more realistic by quantifying light rain vs. heavy rain, the cost of delays vs. outright cancellation, etc.

Some may represent the risk in this example as the "expected opportunity loss" (EOL) or the chance of the loss multiplied by the amount of the loss (10% × $100,000 = $10,000). That is useful if the organizer of the event is "risk neutral," which most people are not. Most would be willing to pay a premium to avoid the loss. An insurance company, for example, would compute an EOL as a minimum for any insurance coverage, then add on to that other operating costs and profit. Since many people are willing to buy insurance for many reasons, then clearly the EOL alone is not the perceived value of avoiding the risk.
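The expected opportunity loss arithmetic from the example can be checked directly; this trivial sketch uses exactly the numbers given above.

```python
# Expected opportunity loss (EOL): chance of the loss times the amount of the loss.
p_rain = 0.10           # 10% chance of rain
loss_if_rain = 100_000  # $100,000 lost if the event is rained out

eol = p_rain * loss_if_rain
```

A risk-neutral organizer would treat the event's risk as worth exactly this EOL; a risk-averse one (like most insurance buyers) would pay a premium above it to avoid the loss.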

Quantitative uses of the terms uncertainty and risk are fairly consistent from fields such as probability theory, actuarial science, and information theory. Some also create new terms without substantially changing the definitions of uncertainty or risk. For example, surprisal is a variation on uncertainty sometimes used in information theory. But outside of the more mathematical uses of the term, usage may vary widely. In cognitive psychology, uncertainty can be real, or just a matter of perception, such as expectations, threats, etc.

Vagueness or ambiguity are sometimes described as "second order uncertainty," where there is uncertainty even about the definitions of uncertain states or outcomes. The difference here is that this uncertainty is about the human definitions and concepts, not an objective fact of nature. It has been argued that ambiguity, however, is always avoidable while uncertainty (of the "first order" kind) is not necessarily avoidable.

Uncertainty may be purely a consequence of a lack of knowledge of obtainable facts. That is, you may be uncertain about whether a new rocket design will work, but this uncertainty can be removed with further analysis and experimentation. At the subatomic level, however, uncertainty may be a fundamental and unavoidable property of the universe. In quantum mechanics, the Heisenberg Uncertainty Principle puts limits on how much an observer can ever know about the position and velocity of a particle. This may not just be ignorance of potentially obtainable facts but that there is no fact to be found. There is some controversy in physics as to whether such uncertainty is an irreducible property of nature or if there are "hidden variables" that would describe the state of a particle even more exactly than Heisenberg's uncertainty principle allows.

Measurements

Main article: Measurement uncertainty

In metrology, physics, and engineering, the uncertainty or margin of error of a measurement is stated by giving a range of values likely to enclose the true value. This may be denoted by error bars on a graph, or by the following notations:

 measured value ± uncertainty

 measured value +uncertainty −uncertainty

 measured value(uncertainty)

The middle notation is used when the error is not symmetrical about the value; this can occur when using a logarithmic scale, for example. The latter "concise notation" is used for example by IUPAC in stating the atomic mass of elements. There, the uncertainty given in parentheses applies to the least significant figure(s) of the number prior to the parenthesized value (i.e. counting from the rightmost digit to the left). For instance, 1.00794(7) stands for 1.00794±0.00007, while 1.00794(72) stands for 1.00794±0.00072.[4]
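The concise notation can be expanded mechanically. This small sketch assumes, as in the description above, that the parenthesized digits align with the last decimal places of the value.

```python
# Expand IUPAC-style concise notation, e.g. "1.00794(7)" -> (1.00794, 0.00007).
def expand_concise(text):
    value_part, unc_part = text.rstrip(")").split("(")
    decimals = len(value_part.split(".")[1])   # figures after the decimal point
    value = float(value_part)
    # The parenthesized integer applies to the least significant decimal places.
    uncertainty = int(unc_part) * 10.0 ** (-decimals)
    return value, uncertainty
```

So expand_concise("1.00794(72)") recovers ±0.00072, matching the worked example in the text.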

Often, the uncertainty of a measurement is found by repeating the measurement enough times to get a good estimate of the standard deviation of the values. Then, any single value has an uncertainty equal to the standard deviation. However, if the values are averaged, then the mean measurement value has a much smaller uncertainty, equal to the standard error of the mean, which is the standard deviation divided by the square root of the number of measurements.
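The procedure just described can be sketched with invented repeated readings: the sample standard deviation is the uncertainty of a single value, and dividing by the square root of the number of measurements gives the standard error of the mean.

```python
# Uncertainty from repeated measurements (readings are invented for illustration).
from math import sqrt

measurements = [9.8, 10.1, 9.9, 10.2, 10.0]
n = len(measurements)

mean = sum(measurements) / n
# Sample variance (n - 1 in the denominator).
variance = sum((x - mean) ** 2 for x in measurements) / (n - 1)
std_dev = sqrt(variance)        # uncertainty of any single measurement
std_error = std_dev / sqrt(n)   # uncertainty of the mean of all measurements
```

Averaging five readings shrinks the uncertainty by a factor of √5, which is why the mean is quoted with the standard error rather than the standard deviation.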


When the uncertainty represents the standard error of the measurement, then about 68.3% of the time, the true value of the measured quantity falls within the stated uncertainty range. For example, it is likely that for 31.7% of the atomic mass values given on the list of elements by atomic mass, the true value lies outside of the stated range. If the width of the interval is doubled, then probably only 4.6% of the true values lie outside the doubled interval, and if the width is tripled, probably only 0.3% lie outside. These values follow from the properties of the normal distribution, and they apply only if the measurement process produces normally distributed errors. In that case, the quoted standard errors are easily converted to 68.3% ("one sigma"), 95.4% ("two sigma"), or 99.7% ("three sigma") confidence intervals.
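The quoted sigma coverage percentages follow from the normal distribution and can be recovered with the error function; this is a quick numeric check, not part of the original text.

```python
# Fraction of normally distributed values within k standard deviations.
from math import erf, sqrt

def coverage(k_sigma):
    """P(|X - mu| <= k * sigma) for a normal distribution."""
    return erf(k_sigma / sqrt(2))
```

coverage(1), coverage(2) and coverage(3) reproduce the "one sigma" 68.3%, "two sigma" 95.4% and "three sigma" 99.7% figures above.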

In this context, uncertainty depends on both the accuracy and precision of the measurement instrument. The lower the accuracy and precision of an instrument, the larger the measurement uncertainty is. Notice that precision is often determined as the standard deviation of the repeated measures of a given value, namely using the same method described above to assess measurement uncertainty. However, this method is correct only when the instrument is accurate. When it is inaccurate, the uncertainty is larger than the standard deviation of the repeated measures, and it appears evident that the uncertainty does not depend only on instrumental precision.

Uncertainty and the media

Uncertainty in science, and science in general, is often interpreted very differently in the public sphere than in the scientific community.[5] This is due in part to the diversity of the public audience, and the tendency for scientists to misunderstand lay audiences and therefore not communicate ideas clearly and effectively.[5] One example is explained by the information deficit model. Also, in the public realm, there are often many scientific voices giving input on a single topic.[5] For example, depending on how an issue is reported in the public sphere, discrepancies between outcomes of multiple scientific studies due to methodological differences could be interpreted by the public as a lack of consensus in a situation where a consensus does in fact exist.[5] This interpretation may even be intentionally promoted, as scientific uncertainty may be managed to reach certain goals. For example, global warming skeptics took the advice of Frank Luntz to frame global warming as an issue of scientific uncertainty, which was a precursor to the conflict frame used by journalists when reporting the issue.[6]

"Indeterminacy can be loosely said to apply to situations in which not all the parameters of the system and their interactions are fully known, whereas ignorance refers to situations in which it is not known what is not known."[7] These unknowns, indeterminacy and ignorance, that exist in science are often "transformed" into uncertainty when reported to the public in order to make issues more manageable, since scientific indeterminacy and ignorance are difficult concepts for scientists to convey without losing credibility.[5] Conversely, uncertainty is often interpreted by the public as ignorance.[8] The transformation of indeterminacy and ignorance into uncertainty may be related to the public's misinterpretation of uncertainty as ignorance.

Journalists often either inflate uncertainty (making the science seem more uncertain than it really is) or downplay uncertainty (making the science seem more certain than it really is).[9] One way that journalists inflate uncertainty is by describing new research that contradicts past research without providing context for the change.[9] Other times, journalists give scientists with minority views equal weight as scientists with majority views, without adequately describing or explaining the state of scientific consensus on the issue.[9] In the same vein, journalists often give non-scientists the same amount of attention and importance as scientists.[9]

Journalists may downplay uncertainty by eliminating "scientists' carefully chosen tentative wording, and by losing these caveats the information is skewed and presented as more certain and conclusive than it really is".[9] Also, stories with a single source or without any context of previous research mean that the subject at hand is presented as more definitive and certain than it is in reality.[9] There is often a "product over process" approach to science journalism that aids, too, in the downplaying of uncertainty.[9] Finally, and most notably for this investigation, when science is framed by journalists as a triumphant quest, uncertainty is erroneously framed as "reducible and resolvable".[9]


Some media routines and organizational factors affect the overstatement of uncertainty; other media routines and organizational factors help inflate the certainty of an issue. Because the general public (in the United States) generally trusts scientists, when science stories are covered without alarm-raising cues from special interest organizations (religious groups, environmental organizations, political factions, etc.) they are often covered in a business-related sense, in an economic-development frame or a social progress frame.[10] The nature of these frames is to downplay or eliminate uncertainty, so when economic and scientific promise are focused on early in the issue cycle, as has happened with coverage of plant biotechnology and nanotechnology in the United States, the matter in question seems more definitive and certain.[10]

Sometimes, too, stockholders, owners, or advertising will pressure a media organization to promote the business aspects of a scientific issue, and therefore any uncertainty claims that may compromise the business interests are downplayed or eliminated [9].

Applications

 Investing in financial markets such as the stock market.

 Uncertainty or error is used in science and engineering notation. Numerical values should only be expressed to those digits that are physically meaningful, which are referred to as significant figures. Uncertainty is involved in every measurement, such as measuring a distance, a temperature, etc., the degree depending upon the instrument or technique used to make the measurement. Similarly, uncertainty is propagated through calculations so that the calculated value has some degree of uncertainty depending upon the uncertainties of the measured values and the equation used in the calculation.[11]

 Uncertainty is designed into games, most notably in gambling, where chance is central to play.


 In scientific modelling, in which the prediction of future events should be understood to have a range of expected values.

 In physics, the Heisenberg uncertainty principle forms the basis of modern quantum mechanics.

 In weather forecasting it is now commonplace to include data on the degree of uncertainty in a weather forecast.

 Uncertainty is often an important factor in economics. According to economist Frank Knight, it is different from risk, where a specific probability is assigned to each outcome (as when flipping a fair coin). Knightian uncertainty involves a situation with unknown probabilities, where the estimated probabilities of possible outcomes need not sum to unity.
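Knight's distinction between risk and uncertainty can be sketched in a few lines of Python; the payoffs and probability estimates below are invented purely for illustration:

```python
# Risk (Knight): every outcome has a known probability, so an expected
# value is well defined -- e.g., a fair-coin bet.
coin = {"heads": 0.5, "tails": 0.5}
payoff = {"heads": 10.0, "tails": -5.0}
expected = sum(p * payoff[outcome] for outcome, p in coin.items())

# Knightian uncertainty: only rough estimates exist, and they need not
# sum to unity, so no well-defined expected value can be computed.
estimates = {"boom": 0.4, "bust": 0.3}  # sums to 0.7, not 1.0
total_estimated = sum(estimates.values())

print(f"expected value under risk: {expected}")
print(f"estimated probabilities sum to {total_estimated:.1f}, not 1.0")
```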

 In entrepreneurship: New products, services, firms and even markets are often created in the absence of probability estimates. According to entrepreneurship research, expert entrepreneurs predominantly use experience based heuristics called effectuation (as opposed to causality) to overcome uncertainty.

 In risk assessment and risk management.[12]

 In metrology, measurement uncertainty is a central concept quantifying the dispersion one may reasonably attribute to a measurement result. Such an uncertainty can also be referred to as a measurement error. In daily life, measurement uncertainty is often implicit ("He is 6 feet tall" give or take a few inches), while for any serious use an explicit statement of the measurement uncertainty is necessary. The expected measurement uncertainty of many measuring instruments (scales, oscilloscopes, force gages, rulers, thermometers, etc.) is often stated in the manufacturer's specification.

The most commonly used procedure for calculating measurement uncertainty is described in the Guide to the Expression of Uncertainty in Measurement (often referred to as "the GUM") published by ISO. Derived works include the National Institute of Standards and Technology (NIST) publication NIST Technical Note 1297, "Guidelines for Evaluating and Expressing the Uncertainty of NIST Measurement Results", and the Eurachem/CITAC publication "Quantifying Uncertainty in Analytical Measurement" (available at the Eurachem homepage). The uncertainty of the result of a measurement generally consists of several components. The components are regarded as random variables and may be grouped into two categories according to the method used to estimate their numerical values:

o Type A, those evaluated by statistical methods

o Type B, those evaluated by other means, e.g., by assigning a probability distribution

By propagating the variances of the components through a function relating the components to the measurement result, the combined measurement uncertainty is given as the square root of the resulting variance. The simplest case is the standard deviation of repeated observations.
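The GUM's root-sum-square combination of Type A and Type B components can be sketched in Python. The readings and the tolerance figure below are illustrative values, not from the text:

```python
import math
import statistics

def type_a_uncertainty(readings):
    """Type A: standard uncertainty of the mean, evaluated statistically
    from repeated observations."""
    return statistics.stdev(readings) / math.sqrt(len(readings))

def combined_uncertainty(components):
    """Combined standard uncertainty: square root of the summed variances
    of independent components (assuming unit sensitivity coefficients)."""
    return math.sqrt(sum(c ** 2 for c in components))

# Illustrative data: five repeated readings of a nominal 100-unit quantity.
readings = [100.2, 99.8, 100.1, 99.9, 100.0]
u_a = type_a_uncertainty(readings)

# Type B: a +/-0.05 manufacturer tolerance, treated as a rectangular
# distribution (standard uncertainty = half-width / sqrt(3)).
u_b = 0.05 / math.sqrt(3)

u_c = combined_uncertainty([u_a, u_b])
print(f"u_A = {u_a:.4f}, u_B = {u_b:.4f}, combined u_c = {u_c:.4f}")
```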

 Uncertainty has been a common theme in art, both as a thematic device (see, for example, the indecision of Hamlet), and as a quandary for the artist (such as Martin Creed's difficulty with deciding what artworks to make).


INFORMATION NEEDED TO SUPPORT DECISION:

Information requirements of key decision-making groups

Various levels of management in the firm have differing information requirements for decision support because of their different job responsibilities and the nature of the decisions made at each level.

Senior management. Senior management is concerned with general yet timely information on changes in the industry and society at large that may affect both the long-term and near-term future of the firm, the firm's strategic goals, short-term and future performance, specific bottlenecks and trouble affecting operational capabilities, and the overall ability of the firm to achieve its objectives.


Middle management and project teams. Middle management is concerned with specific, timely information about firm performance, including revenue and cost reduction targets, and with developing plans and budgets to meet strategic goals established by senior management. This group needs to make important decisions about allocating resources, developing short-range plans, and monitoring the performance of departments, task forces, teams, and special project groups. Often the work of middle managers is accomplished in teams or small groups of managers working on a task.

Operational management and project teams. Operational management monitors the performance of each subunit of the firm and manages individual employees. Operational managers are in charge of specific projects and allocate resources within the project budget, establish schedules, and make personnel decisions. Operational work may also be accomplished through teams.

Individual employees. Employees try to fulfill the objectives of managers above them, following established rules and procedures for their routine activities. Increasingly, however, employees are granted much broader responsibilities and decision-making authority based on their own best judgment and information in corporate systems. Employees may be making decisions about specific vendors, customers, and other employees. Because employees interact directly with the public, how well they make their decisions can directly impact the firm’s revenue streams.

PROBLEM CHARACTERISTICS AND IS CAPABILITIES IN DECISION MAKING

FUNCTION VIEW

The Challenges in IT Exploitation

1. Business and IT vision: The challenges of using IS services for business-IT (or business-technology) alignment.

2. Delivery of IS services: The challenges of delivering high quality IS services cheaper.


3. Design of IT architecture: The challenges of designing and implementing an IT architecture that is interoperable both with existing and future intra-enterprise systems within the company and with inter-enterprise systems running on collaborators' platforms.

The Core IS Capabilities

In order to face these three challenges, Feeny and Willcocks (1998) identify a set of nine core IS capabilities:

1. Leadership: The capability of integrating IT efforts with business purposes and activities. This capability is about managing organizational arrangements, structures, processes, and staffing, and tackling any challenges in these arrangements.

2. Business Systems Thinking: The capability of ensuring optimal business-IT arrangement; this capability is about business problem solving with IT perspective, process reengineering, and strategic development.

3. Relationship Building: The capability of facilitating wider dialogs between business and IS communities.

4. Architectural Planning: The capability of creating IT architecture that can respond to present and future business needs, and allow future growth of the business.

5. Making Technology Work: The capability of rapidly achieving technical progress, making the company a forerunner (leader) or a quick adapter (follower).

6. Informed Buying: The capability of analyzing the external market for IT services, assessing their suitability for specific business opportunities, and selecting a sourcing strategy that meets business needs with technology solutions.

7. Contract Facilitation: The capability of ensuring the success of existing contracts for IT services; this capability is about facilitating contracts so that problems and conflicts are resolved fairly. It is basically about forming a single point of contact for the customer relationship.

8. Contract Monitoring: The capability of measuring performance of suppliers and managing suppliers.


9. Vendor Development: The capability of identifying the potential added value of IT business service suppliers. This capability is about creating win-win solutions with collaborating partners and forming long-term relationships, which may, among other benefits, avoid switching costs.

Cultivating Core IS Capabilities

1. Technical skills,

2. Business skills,

3. Interpersonal skills,

4. Time horizons (balancing short-term and long-term interests), and

5. Motivating values (multi-talented work force).

PROCESS VIEW

The Core IS Capabilities

Marchand et al. (2000) found that there were three core IS capabilities:

• Information Technology Practices (ITP): The capability of a company to effectively manage appropriate IT applications and infrastructures in support of operational decision-making and communication processes.

• Information Management Practices (IMP): The capability of a company to manage information effectively over its life cycle.

• Information Behaviors and Values (IBV): The capability of a company to install and promote behaviors and values in its people for effective use of information.

The three IS capabilities are further divided into 15 competencies:

• ITP competencies:


1. IT for operational support

2. IT for business process support

3. IT for innovation support

4. IT for management support

• IMP competencies:

5. Sensing information

6. Collecting information

7. Organizing information

8. Processing information

9. Maintaining information

• IBV competencies:

10. Integrity: effective sharing of sensitive information

11. Formality: usage and trust of formal sources of information

12. Control: flow of information about business performance

13. Sharing: free exchange of non-sensitive (and sensitive) information

14. Transparency: openness about failures and mistakes

15. Proactiveness: anticipating and responding to changes in the competitive environment

HIERARCHICAL VIEW

The resource level denotes the resource components that are the key ingredients of the IS competencies, such as skills (e.g., business skills, technical skills), knowledge and experience, and behavior and attitudes.

• The organizing level is concerned with how these resources are utilized, via structures, processes and roles, to create IS competencies.


• The enterprise level is where the IS capability is visible and is recognized in the performance of the organization.

COMPARING 3 VIEWS ON IDENTIFYING CORE IS CAPABILITIES

1. A functional view based on the challenges in IT exploitation: Feeny and Willcocks (1998) identify nine core IS capabilities within three overlapping functions,

2. A process view based on measurement of effective information use: Marchand et al. (2000) identify core IS capabilities as three independent pillars, based on supporting processes, and

3. A hierarchical view based on resource utilization: Peppard and Ward (2004) identify core IS capabilities based on resource utilization.
