RESEARCH MEANING
Research is a serious academic activity with a set of objectives: to explain, analyse or understand a problem, or to find solution(s) for the problem(s), by adopting a systematic approach to collecting, organizing and analyzing the information relating to the problem.
Research – Definition
“Research may be defined as the systematic and objective analysis and recording of controlled observations that may lead to the development of generalizations, principles or theories, resulting in prediction and possibly ultimate control of events”.
Sometimes research is defined as a movement, a movement from the known to the unknown. It is an effort to discover something. Some people say that research is an effort to know “more and more about less and less”.
According to CLIFFORD WOODY, research comprises defining and redefining problems; formulating hypotheses or suggested solutions; collecting, organizing and evaluating data; making deductions and reaching conclusions; and, at last, carefully testing the conclusions to determine whether they fit the formulated hypothesis.
Research may also be defined as “any organized enquiry designed and carried out to provide information for solving a problem”.
OBJECTIVES OF RESEARCH:
Research is a conscious approach to finding out the truth which is hidden and which has not yet been discovered, by applying scientific procedures. Therefore each research study has its own focus, which is stated in terms of the objectives (or) purposes of conducting the research. Objectives are like guide points in research, ensuring that the researcher does not lose his focus. It is also believed that the objectives determine the nature of the data to be compiled, the scope of collection, the target group, the sample size and several other crucial aspects which ultimately decide the success or failure, and the adequacy or inadequacy, of the research. The objectives of research are explained below:
It develops focus: The research may aim to understand or become familiar with some phenomenon, or to know it in more depth. For example, since the days of the steam engine, research has continued to come up with more powerful locomotives which could be operated with alternative sources of energy such as diesel and electricity.
It reveals characteristics: Clearly revealing the characteristics of an individual, a situation or a group such as a society is another type of research objective. For example, these days, before a criminal is sentenced, efforts are taken to study why he turned to crime. This helps to develop an approach that creates opportunities for criminals to change themselves and rejoin the mainstream of life.
It determines frequency of occurrence: To determine the frequency with which something occurs or with which it is associated with something else. In social research, one of the major areas of repeated and continuous research is the analysis of poverty and unemployment.
It tests hypotheses: To test a hypothesis about the causal relationship between the variables being studied. This type of research mainly determines the relationship between various factors so that the necessary policy options can be framed. For example, the reasons for several malpractices adopted in public distribution outlets include low salaries and the absence of regularization of the services of the staff in such outlets. This in turn makes them feel insecure, and they resort to malpractices. Having found this, the Government adopted a policy to improve the salary structure of these staff and regularize their services. Hence the study of causal relationships can help in the formulation of policies.
Criteria of Good Research (Characteristics)
* Research is half complete when its objectives or purposes are clearly spelt out.
* Every step followed in the research process should be explained fully, so that any other person who wants to repeat the work, either to achieve further improvement or to test the validity of the research, is able to do it.
* The research design adopted for the study should be clear and should match the objectives.
* The researcher should be honest in reporting the facts and revealing the flaws in the work.
* Every research work should be based on carefully selected analytical tools.
* The research work is incomplete without acknowledging the sources of the various data (or) facts used.
* Limitations should be frankly revealed.
CLASSIFICATION OF RESEARCH
FUNDAMENTAL (OR) BASIC RESEARCH:
Pure or basic research is a search for broad principles and synthesis without any immediate utilization objective. It is not concerned with solving practical problems of policy but with designing and sharpening tools of analysis and with discovering underlying and, if possible, universal laws and theories.
Eg. Joan Robinson's imperfect competition and Chamberlin's monopolistic competition.
Applied (or) Action Research:
Applied research, also known as action research, is associated with a particular project and problem. Such research, being of practical value, may relate to a current activity (or) an immediate practical situation; it aims at finding a solution for an immediate problem facing a society. Practically all social science research undertaken in India is of the applied variety, and more particularly of the type which helps the formulation of policy.
Descriptive Research:
It is designed to describe something, such as the demographic characteristics of consumers who use a product. It deals with determining the frequency with which something occurs or how two variables vary together. This study is also guided by an initial hypothesis. For example, an investigation of the trends in consumption of soft drinks in relation to socio-economic characteristics such as age, sex, ethnic group, family income, education level, geographic location and so on would be a descriptive study.
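The two measurements named above, how often something occurs and how two variables vary together, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the survey records are invented, the field layout (age group, income, bottles per week) is hypothetical, and Pearson's correlation coefficient is used as one common measure of joint variation.

```python
from collections import Counter
from math import sqrt

# Hypothetical survey records (invented for illustration):
# (age group, monthly family income in thousands, soft-drink bottles bought per week)
records = [
    ("18-25", 20, 6),
    ("18-25", 25, 7),
    ("26-40", 40, 4),
    ("26-40", 45, 3),
    ("41-60", 60, 2),
    ("41-60", 70, 1),
]

def frequency(values):
    """How often each distinct value occurs."""
    return Counter(values)

def pearson(xs, ys):
    """Pearson correlation coefficient: how two variables vary together (-1 to +1)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

age_counts = frequency(age for age, _, _ in records)
r = pearson([inc for _, inc, _ in records], [qty for _, _, qty in records])
print(age_counts)
print(f"correlation between income and consumption: {r:.3f}")
```

A negative coefficient here would only describe the invented sample; a descriptive study reports such patterns without claiming that income causes the change in consumption.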
Merits:
* This approach helps to test the conclusions and findings arrived at on the basis of laboratory studies. By using this approach, it is possible to substantiate existing theories and conclusions or to modify them.
* Direct contact between the researcher and the respondent is brought about in this approach. This is very significant because the researcher is able to understand the problem to be studied clearly for himself.
* With the possibility of direct contact with the respondent, the researcher is able to elicit all the relevant information and eliminate irrelevant facts.
Limitations:
* Unless the researcher is experienced, there is every possibility of the approach being misused. Hurried conclusions and generalizations may be formed based on inaccurate field data.
* As this approach involves collection of field data, enormous time and effort are required to plan and execute the field survey.
* This approach also involves incurring heavy costs on data collection.
* Unless the respondents are co-operative, it is not possible to collect data through this approach.
HISTORICAL RESEARCH:
As the name suggests, in this approach historical data is given importance to undertake analysis and interpret the results. Following this approach, a researcher collects past data for his research. A scholar using this approach has to depend on libraries, referring to magazines or periodicals for collecting data.
Merits:
• This approach alone is relevant in certain types of research work. For example, to understand the trend in India's exports, one has to collect the export data for a period of, say, 20 years and then analyze it. Similarly, to study the impact of the liberalization policy, one has to collect information from 1991 till date.
• The historical approach makes research possible because it is firmly believed that once we understand the past, our understanding of the present improves and expectations of the future can be predicted to some extent. Hence historical research provides insight into the past and facilitates looking into the future.
Limitations:
• The personal bias of the people who had written about historical events or incidents cannot be ruled out, and it may mislead the researcher.
• Researchers tend to over-generalize their results using the historical approach.
• Persons using this approach should be conscious of the fact that historical data can only give an indication about the past; formulating solutions on that basis and applying them in the current period is not correct.
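As an illustration of what "analyzing" a historical series such as exports might involve, here is a minimal Python sketch computing the compound annual growth rate (CAGR) and year-on-year growth. The export figures are invented and are not actual Indian statistics; CAGR is one common trend measure among several, chosen here only as an example.

```python
# Hypothetical annual export values (arbitrary currency units).
# Invented for illustration; they are not actual Indian export statistics.
exports = {2001: 100.0, 2002: 108.0, 2003: 115.0, 2004: 127.0, 2005: 140.0}

def cagr(series):
    """Compound annual growth rate between the first and the last year of the series."""
    years = sorted(series)
    first, last = years[0], years[-1]
    return (series[last] / series[first]) ** (1 / (last - first)) - 1

def yearly_growth(series):
    """Year-on-year growth in per cent, useful for spotting breaks in the trend."""
    years = sorted(series)
    return {year: 100 * (series[year] / series[prev] - 1)
            for prev, year in zip(years, years[1:])}

print(f"CAGR 2001-2005: {cagr(exports):.2%}")
for year, growth in yearly_growth(exports).items():
    print(year, f"{growth:.1f}%")
```

To compare growth before and after a policy change such as liberalization, the same function could be applied separately to the sub-periods on either side of the policy year.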
EXPLORATORY RESEARCH:
Most marketing research projects begin with exploratory research. It is conducted to explore the possibilities of doing a particular project. The major emphasis is on the discovery of ideas and insights. For example, a soft drinks firm might conduct an exploratory study to generate possible explanations for a problem it faces. The exploratory study is used to split a broad and vague problem into smaller, more precise sub-problem statements, in the form of specific hypotheses. An exploratory study is conducted in the following situations:
* To define a problem for investigation and to formulate hypotheses.
* To determine the priorities for further research.
* To gather data about the practical problems of carrying out research on particular conjectural statements.
* To increase the analyst's interest in the problem; and
* To explain the basic concepts.
Exploratory study is highly flexible and informal; there is no formal approach in exploratory studies. Exploratory studies do not employ detailed questionnaires, and they do not involve probability sampling plans. The following are the usual methods of conducting exploratory research:
* Literature Survey
* Experience Survey and
* Analysis of insight stimulating cases.
LITERATURE SURVEY:
The literature search is a fast and economical way for researchers to develop a better understanding of a problem area in which they have limited experience. In this regard, a large volume of published and unpublished data is collected and scanned in a relatively short period of time. Sources generally include books, newspapers, Government documents, trade journals, professional journals and so on, which are available in libraries; company records, such as those kept for accounting and sales analysis purposes; and reports of previous research projects, which may have covered the problem incompletely but will be of great help in providing direction for further research.
EXPERIENCE SURVEYS:
In this method, persons who have expert knowledge and ideas about the research subject may be questioned. Generally, company executives, sales managers and other relevant people of the company, salesmen, wholesalers and retailers who handle the product or related products, and consumers are contacted. It does not involve a scientifically conducted statistical survey; rather, it reflects an attempt to get available information from people who have some particular knowledge of the subject under investigation.
ANALYSIS OF INSIGHT STIMULATING CASES (Case Study Approach):
The case study approach to research is a recent development. In this approach the focus is on a single organization, unit, institution, district or community. As the focus is on a single unit, it is possible to undertake an in-depth analysis of that unit. It is basically a problem-solving approach. The following are the characteristics of the case study method.
Intensive study: It aims at a deep and thorough study of a unit. It deals with every aspect of the unit and studies it intensively.
The following methods are undertaken in a case study:
* Determination of factors: First of all, the collection of materials about each of the units or aspects is essential. The factors determined may be of two types: particular factors and general factors.
* Statement of the problem: In this process the defined problem is studied intensively and the data are classified into various classes.
* Analysis and conclusion: After classifying and studying the factors, an analysis is made and conclusions are drawn.
Advantages:
• As this approach involves a focused study, there is a lot of scope for generating new ideas and suggestions.
• It may provide the basis for developing sound hypotheses.
• As the researcher studies the problem from his own point of view, very useful and reliable findings may be obtained.
Limitations:
• A significant limitation of this approach is that, unless the researcher is experienced, he might ignore very important aspects.
• This approach also depends on the information furnished by the respondents; unless the information is accurate, the conclusions are bound to be irrelevant.
• It is often said that case studies are based on the observations of the researcher and may therefore be subjective.
EXPERIMENTAL RESEARCH:
This is a very scientific approach. In this approach the researcher first determines the problem to be studied. Then he identifies the factors that cause the problem. The problem to be probed is quantified and taken as the dependent variable. The factors causing the problem are taken as independent variables. Then the researcher studies the causal relationship between the dependent and independent variables. He is also able to specify to what extent the dependent variable is influenced by each independent variable.
For example, suppose food production is taken as the problem for a research study. Then the scholar would determine the factors that affect food production, viz. size of the land cultivated (x), rainfall (y), quantity of fertilizer applied (z), etc. These factors x, y and z are called independent variables. Food production (A) is called the dependent variable. Then, by collecting data regarding all four (A, x, y and z), the researcher is able to state what percentage of the change in food production (A) is explained by x, y and z. The effect of x on A, y on A and z on A is also studied. In this manner the researcher is able to indicate to what extent the various factors included in the study are important.
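The "percentage of the change explained" in the food production example corresponds to the coefficient of determination (R²) of a multiple linear regression of A on x, y and z. The pure-Python sketch below assumes a linear relationship and uses invented figures; it fits ordinary least squares through the normal equations (solved by Gaussian elimination) and reports the intercept, the effect of each factor, and R². In practice a statistics package would normally be used instead.

```python
# Hypothetical observations (invented): land cultivated x (hectares), rainfall y (cm),
# fertilizer applied z (tonnes), and food production A (tonnes).
rows = [(10, 80, 2), (12, 95, 3), (15, 70, 4), (18, 100, 5),
        (20, 85, 3), (22, 110, 6), (25, 90, 5), (28, 120, 7)]
food = [30, 38, 41, 55, 50, 66, 62, 78]

def solve(matrix, rhs):
    """Solve a square linear system by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [list(map(float, row)) + [float(rhs[i])] for i, row in enumerate(matrix)]
    for col in range(n):
        # Bring the row with the largest pivot into place for numerical stability.
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    # Back-substitution.
    solution = [0.0] * n
    for i in range(n - 1, -1, -1):
        acc = sum(aug[i][j] * solution[j] for j in range(i + 1, n))
        solution[i] = (aug[i][n] - acc) / aug[i][i]
    return solution

def fit_ols(rows, targets):
    """Ordinary least squares: returns [intercept, b_x, b_y, b_z] from the normal equations."""
    X = [[1.0] + list(r) for r in rows]  # leading column of ones gives the intercept
    k = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * t for row, t in zip(X, targets)) for i in range(k)]
    return solve(xtx, xty)

def r_squared(rows, targets, coefs):
    """Share of the variation in the dependent variable explained by the regressors."""
    predicted = [coefs[0] + sum(b * v for b, v in zip(coefs[1:], r)) for r in rows]
    mean_t = sum(targets) / len(targets)
    ss_res = sum((t - p) ** 2 for t, p in zip(targets, predicted))
    ss_tot = sum((t - mean_t) ** 2 for t in targets)
    return 1 - ss_res / ss_tot

coefs = fit_ols(rows, food)
print("intercept, effect of x, y, z:", [round(c, 3) for c in coefs])
print(f"share of variation in A explained by x, y, z (R^2): {r_squared(rows, food, coefs):.3f}")
```

Each coefficient estimates the change in A for a one-unit change in that factor with the others held constant, which is one way of stating "the effect of x on A, y on A and z on A".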
Merits of Experimental Approach (Research)
• This approach provides social scientists with a reliable method of observing under given conditions and evaluating various social programmes.
• This is one of the best methods of measuring the relationship between variables.
• This approach is more logical and consistent, so the conclusions drawn from research based on it are well received.
• It helps to determine the cause-and-effect relationship very precisely and clearly.
• Following this approach, researchers can clearly indicate the areas of future research.
Limitations of Experimental Approach (Research)
• Unless a researcher is well experienced and trained in model building, this approach cannot be easily followed.
• By relying more on models, this approach may not add anything significant to knowledge.
• A serious limitation of this approach is that it relies on sampling and collection of data. Unless these are properly planned and executed, the outcome of the analysis will not be accurate.
DIAGNOSTIC STUDY:
This is similar to descriptive study but with a different focus. It is directed towards discovering what is happening, why it is happening and what can be done about it. It aims at identifying the causes of a problem and the possible solutions for it.
A diagnostic study may also be concerned with discovering and testing whether certain variables are associated. E.g., are persons from rural areas more suitable for manning rural branches of banks? Do more villagers than city voters vote for a particular party?
EVALUATION STUDIES:
An evaluation study is one type of applied research. It is made for assessing the effectiveness of social or economic programmes implemented (e.g., a family planning scheme) or for assessing the impact of developmental projects (e.g., an irrigation project) on the development of the area. An evaluation study may be defined as the “determination of the results attained by some activity (whether a programme, a drug, a therapy or an approach) designed to accomplish some valued goal or objective”.
ANALYTICAL STUDY:
Analytical study is a system of procedures and techniques of analysis applied to quantitative data. It may consist of a system of mathematical models (or) statistical techniques applicable to numerical data. Hence it is also known as the statistical method.
This method is extensively used in business and other fields in which quantitative numerical data are generated. It is used for measuring variables, comparing groups and examining associations between factors. Data may be collected from either primary or secondary sources.
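"Comparing groups" in an analytical study often means asking whether two group means differ by more than chance would suggest. The sketch below uses only Python's standard `statistics` module and computes Welch's t statistic, which does not assume equal variances. The sales figures are invented; the resulting t would then be compared with a critical value from a t table, since the standard library has no t distribution.

```python
from math import sqrt
from statistics import mean, variance

# Hypothetical monthly sales (units) for two groups of outlets; invented for illustration.
urban = [52, 48, 55, 60, 58, 49]
rural = [41, 45, 39, 47, 44, 42]

def welch_t(a, b):
    """Compare two group means with Welch's t statistic (equal variances not assumed).
    Returns (t, approximate degrees of freedom via the Welch-Satterthwaite formula)."""
    var_mean_a = variance(a) / len(a)  # variance() is the sample variance (n - 1 divisor)
    var_mean_b = variance(b) / len(b)
    t = (mean(a) - mean(b)) / sqrt(var_mean_a + var_mean_b)
    df = (var_mean_a + var_mean_b) ** 2 / (
        var_mean_a ** 2 / (len(a) - 1) + var_mean_b ** 2 / (len(b) - 1)
    )
    return t, df

print("group means:", mean(urban), mean(rural))
t, df = welch_t(urban, rural)
print(f"t = {t:.2f} with about {df:.1f} degrees of freedom")
```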
SURVEY RESEARCH:
A survey is a fact-finding study. It is a method of research involving collection of data directly from a population, or a sample thereof, at a particular time. It must not be confused with the mere clerical routine of gathering and tabulating figures. It requires expertise and careful analytical knowledge. The analysis of data may be made by using simple or complex statistical techniques depending upon the objectives of the study.
This type of research has the advantage of greater scope, in the sense that a larger volume of information can be collected from a very large population.
OTHER TYPES
Ex-post Facto Research:
Ex-post facto research is based on observations made by inquiry in which the researcher does not have direct control of the independent variables, because their manifestations have already occurred. This kind of research is based on a scientific and analytical examination of dependent and independent variables. The findings of ex-post facto research carry the risk of improper interpretation.
Panel Research:
Generally, survey research is valid for one time period, which is known as the “study period”, and it does not reflect changes occurring over time. Consumer attitudes towards purchasing a particular product are not static but changing. For example, a one-time survey cannot study the changes occurring in these attitudes over a period in response to changes in the particular product's marketing mix. Measuring change over time is known as longitudinal analysis, which is done by the use of panels. This method is generally used in sales forecasting, in measuring consumer preferences for various products, in measuring audience size and characteristics for media programmes, and in testing new products.
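Because a panel questions the same respondents in each wave, it can show not only the net change in a brand's share but also which individuals switched, something two separate one-time surveys cannot reveal. A minimal Python sketch with an invented two-wave panel (respondent IDs and brands are hypothetical):

```python
# Hypothetical two-wave panel: the same respondents report their preferred brand
# before (wave 0) and after (wave 1) a change in the marketing mix.
panel = {
    "R1": ("A", "A"),
    "R2": ("A", "B"),
    "R3": ("B", "A"),
    "R4": ("B", "B"),
    "R5": ("A", "A"),
    "R6": ("B", "A"),
}

def share(panel, brand, wave):
    """Proportion of respondents preferring `brand` in the given wave (0 or 1)."""
    return sum(1 for answers in panel.values() if answers[wave] == brand) / len(panel)

def switchers(panel):
    """Respondents whose preference changed between waves; only a panel can show this."""
    return sorted(r for r, (before, after) in panel.items() if before != after)

net_change_A = share(panel, "A", 1) - share(panel, "A", 0)
print(f"Net change in brand A share: {net_change_A:+.2f}")
print("Switched brands:", switchers(panel))
```

Here the net gain for brand A hides the fact that three of the six respondents changed their preference, with movement in both directions.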
Advantages:
· It considers changes over time.
· It provides more control.
· It secures greater co-operation from respondents.
· It offers more analytical data from respondents.
(Figure: Types of Research)
RESEARCH PROCESS
Research is a process. A process is a set of activities that are performed to achieve a targeted outcome; that is, a process involves a number of activities which are carried out either sequentially or simultaneously. So the research process refers to the various steps and stages involved in research activity. The various stages are listed below:
* Formulating the Research problem
* Extensive literature survey
* Developing the hypothesis
* Preparing the research design
* Determining the sample design
* Collecting the data
* Analysis of data
* Hypothesis testing and
* Preparation of report
Formulating the Research Problem:
In the research process, the first and foremost step is selecting and defining a research problem. A researcher should first find the problem. Then he should formulate it so that it becomes susceptible to research. To define a problem correctly, a researcher must know what a problem is. What is a research problem? A problem can be called a research problem if it satisfies the following conditions:
• It must be worth studying.
• The study of the problem must be socially useful.
• It should be a problem untouched by other researchers or, even if touched, it must offer possibilities for further research.
• A research problem should lead to solutions to the issue.
• It should be up to date and relevant to current social happenings.
• All the special terms that are used in the statement of the problem should be clearly defined.
In selecting the problem, the researcher should take the following factors into consideration:
* Researcher's interest
* Topic of significance
* Researcher's resources
* Time availability
* Availability of data
* Feasibility of the study
* Benefits of the research
Review of Literature:
After defining the problem, the researcher should undertake an extensive literature survey connected with the problem. In this context he can refer to previous studies, magazines, journals, published dissertations, academic journals, etc. In this process, it should be remembered that one source will lead to another. The earlier studies, if any, which are similar to the study in hand should be carefully studied.
Developing the Hypothesis:
This is the stage next to the review. Here the researcher should state the hypothesis in clear terms. A hypothesis is an assumption to be proved or disproved. A research hypothesis is a predictive statement, capable of being tested by scientific methods, that relates an independent variable to some dependent variable.
Features:
* It should be clear and precise
* It should be capable of being tested
* It should state the relation between
variables
* It should be limited in scope and must
be specific
* It should be stated in simple terms
Normally a hypothesis will be developed in the following ways:
* The researcher has to consult and
deliberate with colleagues and experts about the problem.
* He has to examine the existing data,
concerning the problem for possible trends and clues and
* He has to review studies on similar
problems
Preparing the Research Design:
After developing the hypothesis, the researcher has to prepare a research design. A research design could be defined as the blueprint specifying every stage of action in the course of research. Such a design would indicate whether the course of action planned will minimize the use of resources and maximize the outcome. Research design is the arrangement of conditions for collection and analysis of data in a manner that aims to combine relevance to the research purpose with economy in procedure.
Research design would answer the following questions:
* What is the study about?
* Why is the study being made?
* Where will the study be carried out?
* What type of data is required and where will it be collected?
* What is the period of study?
* Will any sample be used, and if so, what type of sample will be used?
* What type of tools are to be used?
A good research design should possess the following features, though the qualities of a good research design would differ from study to study:
* It should be flexible.
* It should help to minimize bias at every stage.
* It should facilitate collection and analysis of data.
* It should be closely linked with the objectives of the study.
* It is a plan that specifies the sources and types of information relevant to the research problem.
* It should specifically mention the type of approach to the study.
* It should also include the time and cost budget, since most studies are constrained by these two factors.
Broadly, there could be four different types of research design, viz. (Contents of Research Design):
TYPES OF RESEARCH DESIGN
EXPLORATORY RESEARCH DESIGN:
This is also called formulative research design. It aims at formulating a problem for more precise investigation or at developing hypotheses. Based on this, the subsequent stages of research can be planned. As this design is only formulative, it should be highly flexible. While applying this design, three different methods are followed:
Survey of related literature – by intensively studying past studies and contributions relating to the field of study, the research problem can be easily formulated.
Conducting experience survey – this refers to collecting details from, and holding discussions with, experienced people in the chosen field of research. This helps the researcher to determine the extent to which his work is original and to avoid duplication.
Analysis of insight-stimulating examples is yet another method, used depending upon the study on hand. In this method, the experience of people is used as a guide to develop or formulate a hypothesis.
DESCRIPTIVE AND DIAGNOSTIC RESEARCH DESIGN:
Descriptive research design is concerned with research studies that focus on portraying the characteristics of a group, an individual or a situation. The main objective of such studies is to acquire knowledge. For example, to identify the use of a product by various groups, a research study may be undertaken to question whether the use varies with income, age, sex or any other characteristic of the population.
On the other hand, diagnostic studies aim at identifying the causes and relationships underlying an existing problem. Based on the diagnosis, they also help to suggest methods to solve the problem. In this process they may also evaluate the effectiveness of suggestions already implemented.
EXPERIMENTAL RESEARCH DESIGN:
Experimental research studies are mainly focused on finding out the cause-and-effect relationship of the problem under study. When observation is arranged and controlled, it becomes an experimental study. An experiment is a test, trial, act or operation for the purpose of discovering something unknown or of testing a principle, supposition, etc. It is a process in which one or more variables are manipulated under conditions that permit the collection of data showing the effects of such variables in an unconfused fashion.
The experimental design is broadly classified as (a) informal experimental design and (b) formal experimental design. The informal design includes the after-only design, after-only with control design, before-and-after without control design, before-and-after with control design and ex-post facto design. The formal experimental design includes the completely randomized design, randomized block design, Latin square design and factorial design.
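The difference between two of the informal designs listed above can be shown with simple arithmetic. In the before-and-after with control design, the treatment effect is taken as the change in the test group minus the change in the control group, whereas the design without control attributes the entire change to the treatment. A minimal Python sketch with invented store sales figures:

```python
def before_after_with_control(test_before, test_after, control_before, control_after):
    """Treatment effect in a before-and-after with control design: the change in the
    test group minus the change in the control group, so that influences common to
    both groups (such as a seasonal rise in demand) cancel out."""
    return (test_after - test_before) - (control_after - control_before)

def before_after_without_control(test_before, test_after):
    """Before-and-after without control: the whole change is attributed to the treatment."""
    return test_after - test_before

# Hypothetical average weekly sales per store (units); the promotion ran only in test stores.
naive = before_after_without_control(200, 260)
effect = before_after_with_control(200, 260, control_before=195, control_after=215)
print("without control:", naive)
print("with control:", effect)
```

With these invented numbers, part of the rise also occurred in the control stores, so the controlled estimate of the promotion's effect is smaller than the uncontrolled one.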
Sampling design: All the details connected with the sampling process, from the determination of sample size down to the collection of data, would be spelt out.
Observational design: If the study makes use of observational techniques, the type of observation technique to be used and the conditions under which the observations will be made would be indicated.
Statistical design: This part of the research design would spell out the type of analysis that would be carried out.
Operational design: This design would lay down the steps that would be taken at each stage as the design is executed.
Determining the Sample Design:
A sample, as the name implies, is a smaller representation of a large whole. Simply speaking, the method of selecting a portion of the universe for study, with a view to drawing conclusions about the universe, is known as sampling.
The researcher must decide the way of selecting a sample, popularly known as the sample design. In other words, a sample design is a definite plan, determined before any data are actually collected, for obtaining a sample from a given population. Samples can be either probability samples or non-probability samples.
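To make the distinction between probability and non-probability samples concrete, the Python sketch below draws a simple random sample (every unit has the same chance of selection) and a convenience sample (just the first units that are easy to reach) from a hypothetical sampling frame of household IDs. It uses the standard library's `random.Random.sample`; the frame and the seed are assumptions for illustration.

```python
import random

# Hypothetical sampling frame: IDs of 1,000 households in the universe.
population = [f"HH{i:04d}" for i in range(1, 1001)]

def simple_random_sample(frame, n, seed=None):
    """Probability sample: every unit has the same chance of selection (without replacement)."""
    rng = random.Random(seed)  # a seed makes the draw reproducible for checking
    return rng.sample(frame, n)

def convenience_sample(frame, n):
    """Non-probability sample: simply takes the first n units that are easy to reach,
    so units later in the frame have no chance of being selected."""
    return frame[:n]

srs = simple_random_sample(population, 10, seed=42)
conv = convenience_sample(population, 10)
print("simple random sample:", srs)
print("convenience sample:  ", conv)
```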
Collecting the Data:
Collection of data is an important stage in research. In fact, the quality of the data collected determines the quality of the research. A researcher has several ways of collecting the appropriate data, which differ considerably in terms of money, time and other resources. According to their sources, data may be classified as primary data and secondary data. Primary data are data collected for the first time through a field survey. Such data are collected with a specific set of objectives to assess the current status of any variables studied. In survey methods, data can be collected in any one or more of the following ways:
* Observation Method
* Personal Interviews
* Telephone survey
* Questionnaires
* Schedules
Secondary data refers to information or facts already collected. Such data are collected with the objective of understanding the past status of any variable.
Processing and Analysis of Data:
Processing refers to subjecting the data collected to a process in which the accuracy, completeness, uniformity of entries and consistency of the information gathered are examined. Most commonly, processing is understood as editing, coding, classification and tabulation of the data collected. After processing, the scholar explains the tools that he has adopted for analyzing the data. The scholar should select the tools of analysis by considering the objectives set for the study. He should examine the type of analysis required for accomplishing each objective. Based on this, he must explain the features of each tool and how it is applied.
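The processing steps just listed (editing, coding, classification and tabulation) can be sketched as a small Python pipeline over invented questionnaire returns. The coding scheme and field names are hypothetical; the editing rule used here, dropping incomplete returns and making entries uniform, is one simple choice, and in a real study incomplete returns might instead be followed up.

```python
from collections import Counter

# Hypothetical raw questionnaire returns; None marks an unanswered item.
raw = [
    {"id": 1, "gender": "Male", "satisfaction": "Satisfied"},
    {"id": 2, "gender": "female", "satisfaction": "Very satisfied"},
    {"id": 3, "gender": "Female", "satisfaction": None},
    {"id": 4, "gender": "male ", "satisfaction": "Dissatisfied"},
    {"id": 5, "gender": "Female", "satisfaction": "Satisfied"},
]

# Hypothetical coding scheme: a numeric code for each verbal answer.
SATISFACTION_CODES = {"Dissatisfied": 1, "Satisfied": 2, "Very satisfied": 3}

def edit(records):
    """Editing: drop incomplete returns and make entries uniform (trim, capitalize)."""
    cleaned = []
    for r in records:
        if any(v is None for v in r.values()):
            continue  # incomplete return excluded from analysis
        cleaned.append({**r, "gender": r["gender"].strip().capitalize()})
    return cleaned

def code(records):
    """Coding: replace verbal answers with numeric codes."""
    return [{**r, "satisfaction": SATISFACTION_CODES[r["satisfaction"]]} for r in records]

def tabulate(records):
    """Classification and tabulation: count respondents by (gender, satisfaction code)."""
    return Counter((r["gender"], r["satisfaction"]) for r in records)

table = tabulate(code(edit(raw)))
for (gender, level), count in sorted(table.items()):
    print(gender, level, count)
```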
Testing the Hypothesis:
After analyzing the data, the researcher tests the hypothesis. While testing the hypothesis, various tests such as the chi-square test, t-test and F-test are used, depending upon the nature and object of the research. Hypothesis testing results in either accepting the hypothesis or rejecting it.
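As one example of the tests mentioned above, the Python sketch below applies the chi-square test of independence to the diagnostic question "Do more villagers than city voters vote for a particular party?". The counts are invented, the hypothesis tested is that place of residence and voting are not associated, no Yates continuity correction is applied, and 3.841 is the tabulated chi-square value for 1 degree of freedom at the 5% level.

```python
def chi_square_independence(table):
    """Pearson chi-square statistic and degrees of freedom for an r x c contingency table
    (no Yates continuity correction)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / grand  # count expected if independent
            stat += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df

# Hypothetical counts: rows = village / city voters, columns = voted for the party / did not.
observed = [[60, 40],
            [45, 55]]

CRITICAL_5PCT_DF1 = 3.841  # tabulated chi-square value for df = 1 at the 5% level

stat, df = chi_square_independence(observed)
decision = "reject" if stat > CRITICAL_5PCT_DF1 else "do not reject"
print(f"chi-square = {stat:.3f}, df = {df}: {decision} the hypothesis of no association")
```

With these invented counts the statistic exceeds the critical value, so the hypothesis of no association would be rejected at the 5% level.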
Preparation of the Report:
After the analysis and interpretation are over, the researcher has to prepare the report. The body of the report includes the introduction, review of literature, methodology, results and discussion, and summary and conclusions.
Relevance of Research in Decision Making in Various Functional Areas of Management
Generally, a manager has to take the course of action which is most effective in attaining the goals of the organization. Research provides facts and figures in support of such business decisions. It helps the manager to choose a measuring rod to judge the effectiveness of each decision. This may be the reason why executives and business professionals consider research and research findings a boon in their problem-solving process.
Any research on management will have the following general objectives:
• The objective of decision making
• The objective of controlling the managerial activities
• The objective of studying the economic and business environment
• The objective of studying the market
• The objective of studying new product development
• The objective of studying innovation
• The objective of studying customer satisfaction
For management, research helps in the following ways:
* Research provides “decision alternatives” in decision making.
* Research stimulates thinking, analysis, evaluation and interpretation of the business environment.
* Research leads to innovation.
* Research facilitates the development of new products and the modification of existing products.
* Research easily locates the problem areas.
* Research establishes the relationship not only between variables in each functional area, but also between the various functional areas.
* Research facilitates business forecasting.
* Market and marketing analysis may be based on research.
* Research is an aid to the management information system; and
* Research helps to re-design corporate policy and strategy.
Functional areas of any business cover production, personnel, marketing, finance and organization. The scope of research in these areas is listed below:
Research for Marketing decisions: New product development research – Research on brand equity and preference – Research on pricing strategies – Research on distribution channels – Research on salesman qualities and effectiveness – Research on media effectiveness – Research on marketing information system etc.
Research for Personnel decisions: Research on effectiveness of different sources of recruitment and training – Research on leadership style and effectiveness – Research on personnel information system etc.
Research for organizational decisions: Research on issues like climate, culture, creativity, change, design etc.
Research
for Financial decisions: Research on cost of capital and capital structure –
Research on working capital management research on inventory management – etc.
Research
on Business Strategies: Strategic alliances and divorces – Mergers and
acquisitions – Disinvestment –Reorganizations – Reengineering etc.
To sum up, research is an ingredient in all the functional areas of commerce and economics; production and materials management make extensive use of research. However, only a close observation of management practices in India would determine whether research receives its due importance.
SAMPLING
Meaning of Sample:
A sample, as the name implies, is a smaller representation of a large whole. Simply speaking, the method of selecting a small portion of the universe (total population) is known as sampling.
Sampling is not something followed only in statistics; it is used in everyday life. When rice is purchased in a provision store, a small quantity is initially purchased and tested; sometimes the small quantity is cooked, and if it is found good, the bulk is purchased. Similarly, when a patient has to undergo a blood test, the clinical laboratory takes a few drops, tests them and then gives the report. Sampling as a method is also used in research. By analyzing the sample data, the researcher gets some findings which he uses for arriving at conclusions.
Essentials (features) of sampling:
Representativeness:
The sample selected should fully represent the population from which it is drawn. This means all the characteristics or features of the population should be reflected by the sample.
Adequacy:
The size of the sample should be large enough to provide accurate results. Though it is difficult to state the ideal size of a sample, it can be determined statistically.
Randomness:
Samples should be selected at random. That is, there should be no bias in the selection of sample elements, and each item in the population should have an equal chance of being selected.
Homogeneity:
Any number of samples could be drawn from a population, but all these samples should be similar in every respect. For example, suppose a researcher selects 500 people from Chennai city as a sample to study the consumer behavior of its people; then the sample elements should all be people living in Chennai city. It should not include people who have come to Chennai city as tourists.
Merits of Sampling:
* The sampling method requires less time, as only a part of the universe is included for data collection.
* Since only a part of the universe is included for data collection, the cost incurred will also be less.
* By adopting a suitable method of sample selection, the results could be more reliable.
* The sampling method is frequently used for testing the accuracy of information collected through the census method.
Limitations of Sampling:
* Unless the sampling method is carefully applied, it may result in misleading findings.
* Use of sampling requires the services of experts and specialists. This in turn will add to costs.
* Sometimes, when the sample size itself is very large, the sampling method would also be time consuming and costly.
* Apart from a detailed process to be followed, sampling also calls for the application of a number of tests to verify the findings and results. This makes the method more complex.
* While using sampling, the investigators have to be fully trained. This will add to the cost.
METHODS OF SAMPLING
Sampling methods can be broadly classified as 1. random or probability sampling and 2. non-random or non-probability sampling. Under the former, every element of the population enjoys an equal chance of being selected. Under the latter, the elements constituting the sample are selected on some predetermined basis.
For example, suppose 200 out of 2000 students in a college are selected at random; then every one of these 2000 students has an equal chance of getting selected. On the other hand, in the case of non-random sampling, 200 students out of 2000 may be selected in such a way that there are 50 pure science students. In this case the sample is purposively selected, so it is not a random sample.
Sampling Methods
1. Types of Random (or) Probability Sampling
(a) Simple (or) unrestricted sampling
i) Lottery Method: In this method all the items in the population are given numbers, and these numbers are written on chits of uniform size. Then these chits are placed in a bowl or a bag and the required number of chits are drawn.
ii) Table of Random Numbers: In this method, the size of the sample is determined first. Then, using a random number table, the required number of items is selected to form the sample.
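The two selection procedures above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not a prescribed procedure: `lottery_sample` stands in for drawing numbered chits without replacement, and `random_number_table_sample` reads a random-number table supplied as a string of digits in fixed-width groups, skipping numbers outside the population and repeats. Both function names are hypothetical, and the digit string in the usage lines is made up for illustration rather than taken from a published table.

```python
import random


def lottery_sample(population, n, seed=None):
    """Lottery method: give each item a numbered chit and draw n chits without replacement."""
    rng = random.Random(seed)
    chits = range(1, len(population) + 1)  # chit i stands for population[i - 1]
    drawn = rng.sample(chits, n)           # drawn chits are not put back in the bowl
    return [population[chit - 1] for chit in drawn]


def random_number_table_sample(population_size, n, table_digits):
    """Random-number-table method: read the digits in groups as wide as the largest
    serial number, keep numbers within 1..population_size, and skip repeats.
    Returns fewer than n serials if the digits run out first."""
    width = len(str(population_size))
    selected = []
    for start in range(0, len(table_digits) - width + 1, width):
        serial = int(table_digits[start:start + width])
        if 1 <= serial <= population_size and serial not in selected:
            selected.append(serial)
            if len(selected) == n:
                break
    return selected


students = [f"S{i:04d}" for i in range(1, 2001)]  # 2000 hypothetical student IDs
print(len(lottery_sample(students, 200, seed=1)))  # 200
# Made-up digits: 1234 kept, 5678 > 2000 skipped, 0015 kept, 2999 skipped,
# 0015 repeated and skipped, 1005 kept.
print(random_number_table_sample(2000, 3, "1234567800152999001510050042"))  # [1234, 15, 1005]
```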
(b) Restricted Random Sampling:
(i) Stratified Random Sampling: Stratum means a layer. The population from which samples are to be selected may contain a number of layers, and from each layer a few samples are selected. Suppose that for a research work on the literacy level in Tamil Nadu, data is to be collected from all places in Tamil Nadu. Adopting stratified random sampling, first the state is divided into different districts and a few districts are selected at random. Then those districts are divided into Panchayat Unions, and from this second stratum a few Panchayat Unions are selected. Each Panchayat Union is divided into Panchayats, and a few Panchayats are selected at random. Then, since each Panchayat contains a number of villages, a few villages are selected at random.
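The general idea defined above (split the population into layers and draw a few units at random from every layer) can be sketched as follows. This minimal version assumes the strata are already known and draws the same number from each, as the text describes; `stratified_sample` and the example strata are hypothetical. Many surveys instead allocate the sample in proportion to stratum size, which would only change how the per-stratum count is computed.

```python
import random


def stratified_sample(strata, per_stratum, seed=None):
    """Draw `per_stratum` units at random from every stratum (layer).
    A stratum smaller than `per_stratum` contributes all of its units."""
    rng = random.Random(seed)
    return {
        name: rng.sample(units, min(per_stratum, len(units)))
        for name, units in strata.items()
    }


# Hypothetical strata: households grouped by district
strata = {
    "District A": [f"A{i}" for i in range(1, 51)],
    "District B": [f"B{i}" for i in range(1, 31)],
    "District C": ["C1", "C2"],
}
for district, chosen in stratified_sample(strata, 3, seed=42).items():
    print(district, chosen)
```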
Merits:
* It has better representativeness.
* It also gives more accurate information, and there would be better coverage of the population.
Limitations:
* Requires a lot of care and pre-planning.
* Prior knowledge of the composition of the population is required.
* The method is very expensive in terms of time and money.
* Any bias in selection from each stratum will affect the accuracy of results.
(ii) Systematic Random Sampling: In this method the sample is formed by selecting the first unit at random and then selecting the remaining items at evenly spaced intervals.
For example, suppose we have to select a sample of 50 students from 2000 college students. First we determine the sampling interval (k), obtained by dividing the size of the population by the sample size: k = 2000 / 50 = 40. Then from serial numbers 0001 to 0040 we select a serial number at random. Suppose we have selected serial number 15; we add 40 to it to get the next sample item, so the sample will be 15, 55, 95, … and so on.
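The arithmetic of this example can be checked with a short sketch. `systematic_sample` is a hypothetical helper: it computes k = N / n (using whole-number division when N is not an exact multiple of n), takes a random start between 1 and k unless one is supplied, and then steps through the serial numbers in jumps of k.

```python
import random


def systematic_sample(population_size, n, start=None, seed=None):
    """Return n serial numbers: a random start in 1..k, then every k-th serial."""
    k = population_size // n  # sampling interval, e.g. 2000 // 50 = 40
    if start is None:
        start = random.Random(seed).randint(1, k)  # first unit chosen at random
    return list(range(start, start + n * k, k))  # start, start + k, start + 2k, ...


sample = systematic_sample(2000, 50, start=15)
print(sample[:4], sample[-1], len(sample))  # [15, 55, 95, 135] 1975 50
```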
Merits:
* It is very simple to adopt.
* The time and cost involved are relatively less.
* With a large population, this method is easy to use.
* Random selection of items is ensured once the sampling interval is determined.
Limitations:
It is less representative: once the first item is selected at random, subsequent items all lie at uniform intervals, so the selected items may lack representativeness. The first item should be selected strictly at random; if there is bias at this first stage, it will influence the items selected at subsequent stages.
Multistage or Cluster Sampling: As the name suggests, in this method the samples are selected at different stages. The population is first divided into different stages, and samples are selected at random at each stage. The units selected at the different stages will possess common characteristics or will be homogeneous on some basis.
Merits:
* It is highly flexible.
* It ensures better representativeness.
* This type of sampling is very useful either for formulating policy or for evaluating an implemented policy.
* Easy to compute.
Limitations:
* In practice this method is found to be less accurate compared to other methods, because bias at any stage will get accumulated.
* Unless a person is fully aware of the various stages into which the population can be divided, he cannot be effective in selecting the required number of samples.
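A minimal sketch of stage-by-stage selection, assuming the population is stored as a nested structure (first-stage units containing second-stage units, and so on, down to lists of final units) and that a fixed number of units is picked at random at every stage. The function name and the district, union and village labels are hypothetical; the same pattern applies to the district, Panchayat Union, Panchayat and village hierarchy described earlier.

```python
import random


def multistage_sample(tree, picks_per_stage, rng):
    """Select units stage by stage.

    tree: a dict mapping each unit to its sub-units, nested to any depth,
          with lists of final (elementary) units at the bottom.
    picks_per_stage: how many units to choose at random at each stage,
          one entry per level of nesting.
    """
    k = picks_per_stage[0]
    if isinstance(tree, dict):
        chosen = rng.sample(sorted(tree), min(k, len(tree)))  # units at this stage
        result = []
        for unit in chosen:
            result.extend(multistage_sample(tree[unit], picks_per_stage[1:], rng))
        return result
    return rng.sample(tree, min(k, len(tree)))  # last stage: pick final units


# Hypothetical three-stage hierarchy: district -> union -> villages
region = {
    "District 1": {"Union 1a": ["V1", "V2", "V3"], "Union 1b": ["V4", "V5"]},
    "District 2": {"Union 2a": ["V6", "V7"], "Union 2b": ["V8", "V9"]},
    "District 3": {"Union 3a": ["V10"], "Union 3b": ["V11", "V12"]},
}
# 2 districts, then 1 union in each, then up to 2 villages in each union
print(multistage_sample(region, [2, 1, 2], random.Random(7)))
```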
NON-RANDOM SAMPLING OR NON-PROBABILITY SAMPLING
Non-random sampling or non-probability sampling refers to the sampling process in which the samples are selected for a specific purpose with a pre-determined basis of selection. This type of sampling is also required at times when random selection may not be possible.
(a) Judgment Sampling: In this method the sample selection is purely based on the judgment of the researcher. This is because the researcher may lack information regarding the population from which he has to collect the sample. When population characteristics are not known, the researcher can use this method. Once the sample size is determined, the investigator is free to select any item in the field.
For example, suppose 100 boys are to be selected from a college with 1000 boys. If nothing is known about the students in this college, then the investigator may visit the college and choose the first 100 boys he meets, or he may select 100 boys all belonging to the III year, or he might select 50 boys from commerce and 50 from science.
(b) Convenience Sampling: This method of sampling involves selecting the sample elements using some convenient method, without going through the rigour of a formal sampling method.
For example, suppose 100 car owners are to be selected. Then we may collect the list of car owners from the RTO's office and select 100 from that list to form the sample.
(c) Quota Sampling: In this method the sample size is determined first, and then a quota is fixed for various categories of the population, which is followed while selecting the sample. Suppose we want to select 100 students, and the selection of the sample is to follow the quota given below:
Boys 50% and girls 50%; then, among the boys, 60% college students and 40% plus-two students. A different or the same quota may be fixed for girls.
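The quota arithmetic in this example, and the way quotas are then filled, can be sketched as follows. Two assumptions are labelled in the code: girls get the same 60/40 split as boys (the text allows either the same or a different quota), and respondents are accepted in whatever order the investigator happens to meet them, which is why quota sampling is non-random. `allocate` and `fill_quotas` are hypothetical helper names.

```python
def allocate(total, shares):
    """Split `total` across categories in proportion to `shares` (fractions summing to 1).
    With awkward fractions, rounding can make the parts differ slightly from `total`."""
    return {category: round(total * share) for category, share in shares.items()}


def fill_quotas(respondents, quotas, category_of):
    """Accept respondents in the order they are met until every quota is full.
    Who is met is left to the investigator, so the selection is NOT random."""
    counts = {category: 0 for category in quotas}
    chosen = []
    for person in respondents:
        category = category_of(person)
        if category in counts and counts[category] < quotas[category]:
            chosen.append(person)
            counts[category] += 1
            if counts == quotas:
                break
    return chosen


by_gender = allocate(100, {"boy": 0.5, "girl": 0.5})
quotas = {}
for gender, count in by_gender.items():
    # Assumption: the same 60/40 college / plus-two split is used for girls as for boys
    for level, n in allocate(count, {"college": 0.6, "plus two": 0.4}).items():
        quotas[(gender, level)] = n
print(quotas)
```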
SAMPLING ERRORS
While using sampling, errors are committed. These errors are broadly classified as sampling errors and non-sampling errors.
(1) Biased Errors:
Biased errors are understood as the influence of the investigator's likes and dislikes in the process of sampling. For example, an investigator may collect data only from a specific group. This may be because of the investigator's urge to complete the work early or failure to understand the purpose of the survey. Such a mistake may result in the collection of wrong data, which eventually will result only in wrong conclusions or inferences about the population. The following are the reasons for biased errors.
Faulty process of selection: This refers to a situation when the investigator does not apply randomness in his choice or selection of the sample elements from the population.
Faulty collection of information: Adoption of a faulty method of collecting information may cause errors. This will happen if the scope is not clear.
Faulty method of analysis: This will happen when the researcher does not have knowledge about the usage of tools.
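The effect of a faulty selection process can be illustrated with a small simulation. The population below is entirely made up (spending figures for 1000 urban and 1000 rural households drawn from normal distributions with assumed means); a random sample is compared with a sample that, like a hurried investigator, draws only from the easy-to-reach urban group.

```python
import random
from statistics import mean

rng = random.Random(42)

# Made-up population: monthly spending of 1000 urban and 1000 rural households
urban = [rng.gauss(5000, 800) for _ in range(1000)]
rural = [rng.gauss(3000, 600) for _ in range(1000)]
population = urban + rural

random_sample = rng.sample(population, 200)  # every household has the same chance
biased_sample = rng.sample(urban, 200)       # only the convenient group is contacted

print(f"population mean: {mean(population):8.1f}")
print(f"random sample:   {mean(random_sample):8.1f}")
print(f"biased sample:   {mean(biased_sample):8.1f}")
```

The biased estimate stays near the urban mean however large the sample is made, because the error comes from who is selected rather than how many; increasing the sample size does not remove it.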
(2) Unbiased Errors:
Non-sampling errors are those errors which are not due to any sampling process; they are due to several other causes. Such errors are mostly due to the following reasons:
* Investigators may collect data without using complete schedules or proper measurement. As a result, the data collected may not be relevant at all.
* Faulty methods of interview or observation may also contribute to non-sampling errors.
* Use of untrained and unskilled investigators.
SAMPLE SIZE AND ITS DETERMINATION
What is the size of the sample? How large should n be? When the size (n) is very small, the researcher may not achieve the objectives, and if it is too large, he may incur huge cost and waste resources. Generally, a sample must be of an optimum size, i.e., it should be neither too large nor too small. Normally the size should be large enough to give a confidence interval of the desired width, and as such the size of the sample must be chosen by some logical process. However, the researcher has to keep the following points in mind while deciding the size of the sample.
Nature of the Universe:
When the items of the universe are homogeneous, a small sample can serve the purpose; if they are heterogeneous, a large sample would be required.
Number of groups:
When a researcher forms class-groups, a large sample is necessary, as a small sample might not be able to give a reasonable number of items in each class-group.
Nature of study:
When the researcher examines the items very intensively and continuously, the sample should be small. For a general survey the size of the sample may be large, but a small sample is considered appropriate in technical surveys.
Sample Technique:
The researcher has to decide the sampling technique while determining the size of the sample. A small random sample is better than a larger but badly selected sample.
Accuracy and confidence level:
A researcher requires a large sample when the accuracy or the level of precision is to be kept high. To double the accuracy (that is, to halve the permissible error) at a fixed significance level, the sample size has to be increased fourfold.
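For the common case of estimating a population mean, the required sample size is n = (Zσ/d)², where Z reflects the confidence level, σ is the population standard deviation and d is the permissible error. Under that assumption the fourfold rule follows directly: halving d, with Z and σ unchanged, gives

```latex
n' = \left(\frac{Z\sigma}{d/2}\right)^2 = 4\left(\frac{Z\sigma}{d}\right)^2 = 4n
```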
Resources available:
The amount of time and financial resources available to the researcher will determine the size of the sample. With sufficient time and a large volume of funds available, the sample size could be large; otherwise it should be small.
Miscellaneous factors:
In addition to the above considerations, the following points are to be considered by a researcher: the nature of units, the size of the population, the size of the questionnaire, the availability of trained investigators, the conditions under which the sample survey is being conducted, and the time available for completion of the study.
Sometimes a mathematical formula is used to determine the sample size. The formula is given below:
$$n = \left( \frac{Z \sigma}{d} \right)^2$$
where n is the sample size, Z is the value corresponding to the degree of accuracy desired (specified level of confidence), σ is the standard deviation of the population and d is the permissible difference between the population mean and the sample mean.
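A minimal sketch of the formula, assuming a two-sided confidence level and a known (or pilot-estimated) σ. Python's standard `statistics.NormalDist` supplies Z, and the result is rounded up because a sample cannot contain a fraction of a unit. `sample_size_for_mean` is a hypothetical name.

```python
import math
from statistics import NormalDist


def sample_size_for_mean(confidence, sigma, d):
    """n = (Z * sigma / d) ** 2, rounded up to the next whole unit.

    confidence: two-sided confidence level, e.g. 0.95
    sigma:      population standard deviation (known, or estimated e.g. from a pilot study)
    d:          permissible difference between sample mean and population mean
    """
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # about 1.96 for 95%
    return math.ceil((z * sigma / d) ** 2)


print(sample_size_for_mean(0.95, sigma=10, d=2))  # 97
print(sample_size_for_mean(0.95, sigma=10, d=1))  # 385: halving d roughly quadruples n
```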
COLLECTION OF DATA
Data refers to information or facts. Researchers often understand data to mean only numerical figures, but data also include non-numerical facts, that is, both qualitative and quantitative information. In a research study, once the data are available, the research is half complete. Data could be broadly classified as primary data and secondary data; these are also referred to as sources of data.
Primary Data:
Primary data are the data collected for the first time through a field survey. Such data are collected with a specific set of objectives to assess the current status of any variable studied. By survey methods, the data can be collected in any one (or) more of the following ways.
Questionnaire (or) Schedule:
In this method a pre-printed list of questions arranged in sequence is used to elicit responses from the respondent.
Interview:
This is a method in which the researcher and the respondent meet, and the questions raised are answered and recorded. This method is adopted when personal opinions or viewpoints are to be gathered as a part of the data.
Observation:
In this method the observer applies his sense organs to note down whatever he observes in the field and relates these data to explain some phenomena.
Feedback Form:
In the case of consumer goods, the supplier or the manufacturer sends the product along with a pre-paid reply cover in which questions on the product and its usage are raised, and the customer is requested to fill it up and send it back. In this way, first-hand information about the product is obtained from the consuming public.
Sales Force Opinion:
On several occasions the manufacturers or distributors collect information about the movement of the product, market size, market share etc. through the sales force in the field.
The salesmen visit retailers' shops to note down the details of stock movement, availability of items etc., which give valuable information.
Projective Techniques:
This technique is adopted to study consumers through methods like recall of advertisements, story completion tests etc. Through this technique it is possible to compile information to be used as the basis for projecting the demand for the product at different points of time.
Collection through Mechanical Devices:
There are several shopping establishments where hidden video cameras are positioned at vantage points; these are used for observing the public inside the shop. Apart from helping to eliminate pilferage and theft, they provide very useful information on consumers and their product preferences.
CLASSIFICATION OF DATA
PRIMARY (DATA) SOURCES
1. OBSERVATION:
Observation as a method of data collection is used very frequently whenever collection of data through other methods is difficult. For example, it is not always possible to conduct interviews with every person to collect the required information. There are occasions when no other method can be adopted for data collection. For instance, suppose a scholar wants to study the lifestyle of a hill tribe. It is certainly not possible to use a questionnaire, schedule or interview; the only alternative available is observation, as the respondents would not reply to any question, orally or in writing.
Observation may be defined as "sensible application of sense organs in understanding less explained or unexplained phenomena". Whenever a researcher is unable to compile information through any other method, he has to effectively apply his sense organs to observe and explain. So it may be said that observation involves recording information by applying visual understanding backed by alert sense organs.
Types of Observation:
Structured Observation:
When observation takes place strictly in accordance with a plan or design prepared in advance, it is called structured observation. In such a type, the observer decides in advance what to observe, what to focus on, what type of activity should be given importance, who is to be observed, etc.
Unstructured Observation:
In this type of observation there is no advance planning of the what, how, when, who etc. of observation. The observer is given the freedom to decide on the spot and to observe everything that is relevant.
Participant Observation:
In this method the observer is very much present in the midst of what is observed. For example, suppose a researcher is studying the lifestyle of a hill tribe; then he might understand the lifestyle of the tribe better only when he stays with them. He is a participant in the sense that he is physically present on the spot to observe, while not influencing the activities.
Non-participant Observation:
This is a method in which the observer remains detached from whatever is happening around him and does not involve himself in any activities taking place. He is present only to observe and not to take part in the activities; the target audience does not know of his presence at all. For example, policemen not in uniform are deputed on observation duty whenever a procession takes place.
Controlled Observation:
In this method the observer performs his work in an environment or situation which is very much planned (or) set. For example, sometimes, to test the effectiveness and alertness of an airport security system, a mock event (like a fire accident) is carried out. Then how the security staff react to such a mock event is observed. Based on this, the weaknesses in the system are noticed and steps are taken to eliminate them.
Merits of Observation Method of Data Collection:
* If observation is done correctly, the scope for bias is very much minimized.
* Through observation, the current scenario in which anything is happening is noticed and explained; there is no interpretation of how things would have happened in the past or will happen in the future.
* As there is no need to get any reply or details from the respondents, observation does not require any co-operation from the respondents.
* This is a fairly reliable method, provided the observer is well experienced, trained and sincere.
* Whenever respondents are illiterate and incapable of answering any question (due to a language barrier (or) cultural background etc.), observation is the only method of data collection available.
Limitations of Observation:
* This is a relatively costly method of data collection.
* What is observed may bring out only part of the facts, while data collected through a questionnaire or interview ensure better coverage.
* There is a lot of scope for the observer to get distracted or influenced by unexpected factors, which would affect the accuracy of the information collected.
How to make observation successful:
* First, the researcher should have a clear grasp of what he should observe and its purpose.
* The person should be trained in adopting observation.
* The person should avoid his personal likes and dislikes.
* He should be alert and intelligent.
* He should be able to connect all the things observed.
2. INTERVIEW
One of the very old methods of collecting data is the interview method. The interview method involves a direct or indirect meeting of the respondents by the researcher. The researcher determines the questions to be raised at the time of the interview and elicits the responses to them. The replies given are either written down in a notebook or recorded in audio or video cassettes. This method has to be necessarily adopted whenever details regarding any confidential matter are to be collected or the research requires data collection directly from the respondents.
Interviews may be broadly classified as 1. Direct interview and 2. Indirect interview.
Direct Interview:
In this type of interview, the interviewer and the interviewee meet personally, either with or without a prior appointment. Usually when this technique is adopted, the interviewer may brief the respondent about the purpose of the interview and its scope in advance. This enables the respondent to be ready with the necessary details (or) data. This type of interview may be classified as structured interview, unstructured interview, focused interview, clinical interview and non-directive interview.
(A) Structured Interview:
In this type of interview the person collecting information decides in advance the nature, scope, questions to be asked, the persons to be contacted, etc. At the time of the interview no deviation is made from the questions to be asked. For example, it is usual for journalists to interview the Finance Minister after the presentation of the Budget. On such occasions, the journalists should be well prepared and decide in advance the questions to be asked, etc. Sometimes even the questions to be asked and other details are to be submitted to the authorities concerned before conducting the interview. The most important advantages of such an interview are given below.
* The interview is well prepared, so it is conducted in a focused manner.
* The time of both the interviewer and the respondents could be saved.
* There is no scope for irrelevant matter to find a place in the course of the interview.
* If the respondent is informed in advance, he could prepare the necessary details, so that the outcome is reliable.
But this method of interview has the following limitations:
* Since the subject matter is decided in advance, there is no scope for extending the interview even in case of need.
* If the questions are submitted in advance, that may lead the respondent to give wrong information.
* There is a need for the interviewer to plan the interview properly, so if the plan is not perfect, the interview findings may not be complete.
(B) Unstructured Interview:
In this type of interview, the interview is conducted on the spot without any preparation (or) advance information to the respondent. For example, suppose an organization producing a new health drink wants to know the opinion of the people directly. Then it might send trained field investigators who meet people at random and offer them a cup of that new drink. After they drink, their opinion is asked and the responses are noted down or recorded. Such interviews are also conducted when an opinion poll is conducted. For example, during election time, TV channels would meet people moving around and ask them about their opinion of different parties and the one for which they would vote.
(C) Focused Interview:
In this type of interview the object of the interviewer is to focus the attention of the respondent on a specific issue (or) point. For example, suppose a detective is questioning a person regarding a crime committed in an area. The detective has to be very experienced to make the person interviewed answer only about the crime and nothing else.
(D) Non-Directive Interview:
In this type, the interviewer encourages the respondent to say whatever he likes and feels on a subject matter. There may not be much questioning taking place. The respondent is free to express his views or opinions without any direction from the interviewer. For example, suppose in a college strike, an interviewer encourages the students to say whatever they feel about the reasons for the strike.
(E) Telephone Interview:
This is basically a type of direct interview, but there is no scope for the physical presence of both parties to the interview. This method will be useful in the following situations:
* When the informant and interviewer are geographically separated.
* When the study requires responses to five (or) six simple questions, e.g. radio and TV programme surveys.
* When the survey must be conducted in a very short period of time, provided the units are listed in the telephone directory.
This method of interview provides the following advantages:
- More flexible
- It is the quickest way of obtaining information
- Less cost
- Recall is easy
- The rate of response is higher than in the mailing method
- Replies can be recorded
- It does not require any field staff
This method suffers from the following limitations:
- The respondents' characteristics and environment cannot be observed
- It is not suitable for intensive surveys where comprehensive answers are required
- This method leaves out respondents who do not have telephone facilities
- This method does not provide sufficient time for the respondents to respond
3. Questionnaire Method:
A questionnaire is a sheet (or sheets) of paper containing questions relating to certain specific aspects regarding which the researcher collects the data. The questionnaire is given to the informant or respondent to be filled up. Sometimes a questionnaire is also in the form of files generated through a computer; this is usually called a soft copy of the questionnaire. Generally, to test the reliability of the questionnaire, it should be tested on a limited scale; this is technically known as a pilot survey. The objective of a pilot survey is to filter out unnecessary questions and questions which are difficult to answer.
Mechanics of Questionnaire Construction / Designing a Questionnaire / Features of a Good Questionnaire
The following are the points to be given importance while designing a questionnaire:
* The questionnaire should be printed / cyclostyled / Xeroxed.
* The first part of the questionnaire should specify the object of the study.
* The questionnaire should be constructed using simple language; technical terms and concepts should be avoided.
* Each question should be specific and clear.
* Personal questions on wealth, habits etc. should be avoided.
* Questions needing computation / calculation / consultation should be avoided.
* Questions on sentiments / beliefs / faith should be avoided.
* Repetition of questions should be eliminated.
* Sufficient space should be given for answering questions.
* If any diagram or map is used, it should be printed clearly.
* Instructions regarding how to return the filled questionnaire must be given. It is desirable that a self-addressed, sufficiently stamped envelope is sent along with the questionnaire to enable the respondents to send back the filled-up questionnaire.
TYPES OF QUESTIONS TO BE INCLUDED
Open-ended questions:
In these questions the respondents are given the freedom to express their views, as there is a wide range of choice. E.g.
“How would you describe the use of this soap?”
Closed questions:
These types of questions do not allow the respondents to give answers freely. E.g.
“Would you describe the smell of this soap as attractive?”
Yes / No
Pictorial Questions:
In this type of question, pictures are drawn and the respondent indicates the answer by selecting the picture he prefers.
Dichotomous questions:
In these questions two alternatives are given, a positive one and a negative one. E.g. “Do you own a watch?”
Yes
No
Multiple choice questions:
These questions contain more than two alternatives, e.g. “Why have you preferred this brand of two-wheeler?”
-Price
-Fuel efficiency
-Comfort
-Others (please specify)
Types of questions to be avoided:
(a) Leading questions:
A leading question is one which makes it easier for the respondent to react in a certain way and is not natural. Examples of leading questions are:
“Are you against giving too much power to the trade unions?” “Don't you think that yesterday's TV drama was thrilling?”
(b) Loaded Questions:
Loading means attaching emotional feelings to particular words or concepts, which tends to produce automatic approval or disapproval. Here the respondent would react to the word rather than the question. Example:
“Have you tried to get special favours from a business establishment by pressuring them?”
Yes or No
(c) Ambiguous questions:
An ambiguous question is one that does not have a clear meaning. It may mean different things to different people. Example:
“Are you interested in a small house?”
What does the word “interested” mean: to own or to hire? What does the word “small” mean?
QUESTIONNAIRE CONSTRUCTION PROCEDURE
* Decide what information is needed.
* Determine the method of collecting data.
* Determine the content of individual questions:
-Is the question necessary?
-Does the respondent have the information?
-Will the respondent remember it?
-Are several questions needed instead of one?
* Determine the type of questions:
-open-ended
-closed
-dichotomous
-pictorial
-multiple choice
* Decide on the wording of questions.
* Decide the question sequence:
-physical appearance
-easy to access
-easy to understand
-motivating
* Preliminary draft
* Revision and final draft
(4) SCHEDULES
Schedules (which contain a set of questions) are filled in by enumerators who are specially appointed for the purpose. Enumerators go to the respondents, ask them the questions from the proforma in the same order in which the questions are listed, and record the replies in the space given.
Enumerators should be trained. Example: population census.
DIFFERENCE BETWEEN
QUESTIONNAIRE AND SCHEDULE
SECONDARY DATA
Secondary data are those which have already been collected by some other agency and which have already been processed. Generally speaking, secondary data is collected by some organization to satisfy its own need, but it is used by various departments for different reasons. For example, census figures are used by social scientists (economists) for social planning and research.
SOURCES OF SECONDARY DATA:
Doing research with secondary data is called desk research. The sources for secondary data, or the sources for doing desk research, are the following:
Internal Sources: Registers, documents, annual reports, sales reports, previous research papers, sales records, invoices etc.
External Sources: Journals and magazines, newspapers, public speeches, reports of state and central government departments etc.
When using information from any published document, the researcher should consider the following points:
* Exactly what products are included in the statistical classification.
* Who originally collected the data, for what purpose, and whether there might be any motive for misrepresentation.
* From whom the data were collected and how reliable the methodology might have been.
* How consistent the data are with other local or international statistics.
Choice between primary and secondary data:
The researcher must decide whether he will use primary data or secondary data in a research process. The choice between the two depends on:
* Nature and scope of research
* Availability of financial resources
* Availability of time
* Degree of accuracy desired
* Status of the researcher (individual, government, corporation, etc.)
Pilot
Study
It is difficult to plan a major study or project without adequate knowledge of its subject matter, the population it is to cover, their level of knowledge and understanding, and the like. What are the issues involved? What are the concepts associated with the subject matter? How can they be operationalized? What method of study is appropriate? How much money will it cost? To answer such questions, a small preliminary study is carried out before the main investigation is conducted. This is called a pilot study. The size, scope and design of the pilot study are a matter of convenience, time and money. It should be large enough to fulfil the following functions.
Functions
of Pilot Study:
* It provides better knowledge of the problem.
* It helps in the identification and operationalization of concepts relating to the study.
* It assists in discovering the nature of relationships between variables and in formulating hypotheses.
* It shows the nature of the population to be surveyed and the variability within it.
* It shows the adequacy of the tool for data collection.
* It provides information for structuring questions with alternative answers.
* It helps the researcher to develop an appropriate plan of analysis.
* It provides information for estimating the probable cost and duration of the main study and of its various stages.
Pre-Test
A pre-test is a trial test of a specific aspect of the study, such as the method of data collection or the data collection instrument (interview schedule, mailed questionnaire or measurement scale). Pre-testing has several purposes. They are:
* To test whether the instrument would elicit the responses required to achieve the research objectives
* To test whether the content of the instrument is relevant and adequate
* To test whether the wording of the questions is clear and suited to the understanding of the respondents under field conditions
HYPOTHESIS TESTING
Hypothesis is an assumption or supposition to be proved or disproved. A research hypothesis is a predictive statement, capable of being tested by scientific methods, that relates an independent variable to some dependent variable. A hypothesis is usually considered the principal instrument in research. Its main function is to suggest new experiments and observations.
Definition
of Hypothesis:
A research hypothesis is a predictive statement capable of being tested by scientific methods, that relates an independent variable to some dependent variable. The features of a hypothesis statement are as follows:
* It should be clear and precise.
* It should be capable of being tested.
* It should state the relationship between variables.
* It should be limited in scope and must be specific.
* It should be stated in simple terms.
Basic
Concepts:
Null Hypothesis: The random selection of samples from the given population makes the tests of significance valid for us. For applying any test of significance we first set up a hypothesis. Such a statistical hypothesis, which is under test, is usually a hypothesis of no difference and hence is called the null hypothesis. It is usually denoted by Ho.
Alternate Hypothesis: Any hypothesis which is complementary to the null hypothesis is called an alternate hypothesis. It is usually denoted by Ha. For example, if the null hypothesis that there is no relationship between the eye colour of husbands and wives is rejected, then the alternate hypothesis that there is a relationship between the eye colour of husbands and wives is automatically accepted.
TYPE I ERROR AND TYPE
II ERROR:
In the process of testing a hypothesis, a researcher may commit two types of errors, namely Type I error and Type II error.
Type I error: We commit this error when we reject a null hypothesis which is true.
Type II error: This error is committed when we accept a null hypothesis which is false. The two errors can be summarized as follows:
                 Accept Ho          Reject Ho
Ho is true       Correct decision   Type I error
Ho is false      Type II error      Correct decision
For example, suppose we want to test the relationship between rainfall and food production. Suppose we set up the null hypothesis that rainfall does not affect food production. From experience and past research findings it is well known that rainfall certainly affects food production. Hence the null hypothesis should be rejected; if instead we accept it, we commit a Type II error.
PROCEDURE FOR
HYPOTHESIS TESTING
Making
a formal statement:
It
consists of making a formal statement of the null hypothesis Ho and also of the
alternative hypothesis Ha
Selecting
a significance level:
The hypothesis is tested at a pre-determined level of significance, which should therefore be specified in advance. In practice, either the 5% level or the 1% level is generally adopted for the purpose.
Deciding
the distribution to use:
After
deciding the level of significance the researcher has to determine the
appropriate sampling distribution.
Selecting
a random sample and computing an appropriate value:
The researcher has to select one or more random samples and compute an appropriate value (the test statistic) from the sample data.
Calculation
of the probability
The researcher has to calculate the probability that the sample result would diverge as widely as it has from expectations if the null hypothesis were true.
Comparing
the probability
Afterwards, the researcher has to compare the probability thus calculated with the specified significance level. If the probability is smaller than the significance level, Ho is rejected; otherwise it is not rejected.
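The six steps above can be sketched in Python for one common case: a two-tailed one-sample z-test of a mean when the population standard deviation is known. This is a minimal illustration under stated assumptions. The function name z_test_mean, the sample values, the hypothesised mean of 50 and the known standard deviation of 10 are all hypothetical and are not taken from the text.

```python
from statistics import NormalDist, fmean


def z_test_mean(sample, mu0, sigma, alpha=0.05):
    """Two-tailed one-sample z-test (hypothetical helper).

    Assumes the population standard deviation `sigma` is known.
    """
    # Step 1 - formal statement: Ho: mean == mu0, Ha: mean != mu0
    # Step 2 - significance level: alpha (5% unless stated otherwise)
    # Step 3 - distribution: standard normal, because sigma is known
    n = len(sample)
    # Step 4 - compute the test statistic from the random sample
    z = (fmean(sample) - mu0) / (sigma / n ** 0.5)
    # Step 5 - probability of a result at least this far from expectation under Ho
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    # Step 6 - compare that probability with the significance level
    decision = "reject Ho" if p_value < alpha else "do not reject Ho"
    return z, p_value, decision


# Hypothetical sample of 25 observations whose mean is 54
sample = [48, 60] * 12 + [54]
z, p, decision = z_test_mean(sample, mu0=50, sigma=10)
print(f"z = {z:.2f}, p = {p:.4f}, {decision}")
```

Here z = 2.00 and the probability is about 0.046. That is below 0.05, so Ho is rejected at the 5% level, but it is above 0.01, so Ho would not be rejected at the 1% level.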
USEFULNESS
OF STATISTICAL TOOLS (ANALYSIS)
* Statistical analysis of data serves several major purposes. First, it summarizes a large mass of data into an understandable and meaningful form. This reduction of data facilitates further analysis.
* Second, statistics makes exact descriptions possible. For example, the statement that the educational level of people in X district is very high is not specific. However, when statistical measures are available, such as the percentage of literates among males and females and the percentage of degree holders among males and females, the description becomes exact.
* Third, statistical analysis facilitates identification of the causal factors underlying complex phenomena. What are the factors which determine a variable like labour productivity or the academic performance of students? What are the relative contributions of the causative factors? Answers to such questions can be obtained from statistical multivariate analysis.
* Fourth, statistical analysis aids the drawing of reliable inferences from observational data.
* Last, statistical analysis is useful for assessing the significance of specific sample results under assumed population conditions. This type of analysis is called hypothesis testing.
PARAMETRIC
AND NONPARAMETRIC TESTS
Parametric
Tests:
The tests of significance used for hypothesis testing are of two types: parametric and non-parametric tests. The parametric tests are more powerful, but they depend on the parameters or characteristics of the population. They are based on the following assumptions:
* The observations or values must be
independent
* The samples are drawn on a random
basis.
* The populations should have equal
variances
* The data should be at least at
interval level so that arithmetic operations can be used.
The important parametric tests are the z-test, the t-test and the F-test. They are explained below:
The
Z-test:
It is based on the normal distribution and is widely used for testing the significance of several statistics such as the mean, median, mode, coefficient of correlation and others. The relevant test statistic, z, is calculated and compared with its critical value (read from the normal distribution table) at a specified level of significance for judging the significance of the measure concerned.
The
t-test:
It is suitable for testing the significance of a sample mean or for judging the significance of the difference between the means of two samples. The t-test can also be used for testing the significance of the coefficients of simple and partial correlation. The relevant test statistic, t, is calculated from the sample data and compared with its corresponding critical value in the t-distribution table for rejecting or accepting the null hypothesis.
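As a rough illustration of the two-sample case, the sketch below computes Student's t for the difference between two independent sample means. It assumes equal population variances, so a pooled variance is used. The function name pooled_t and the data are hypothetical. Python's standard library has no t-distribution, so the critical value 2.306 (8 degrees of freedom, 5% level, two-tailed) is copied from a standard t table rather than computed.

```python
from statistics import fmean, variance


def pooled_t(x, y):
    """Student's t for two independent samples, assuming equal population variances."""
    n1, n2 = len(x), len(y)
    df = n1 + n2 - 2
    # Pooled estimate of the common variance (sample variances use n - 1)
    sp2 = ((n1 - 1) * variance(x) + (n2 - 1) * variance(y)) / df
    standard_error = (sp2 * (1 / n1 + 1 / n2)) ** 0.5
    t = (fmean(x) - fmean(y)) / standard_error
    return t, df


# Hypothetical scores of two independent groups
group_a = [12, 14, 16, 18, 20]
group_b = [10, 11, 12, 13, 14]
t, df = pooled_t(group_a, group_b)

T_CRITICAL = 2.306  # from a t table: df = 8, 5% level, two-tailed
print(f"t = {t:.3f} with {df} df")
print("reject Ho" if abs(t) > T_CRITICAL else "do not reject Ho")
```

With these numbers t is about 2.53, which exceeds 2.306. The null hypothesis of equal means is therefore rejected at the 5% level.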
The
F-test:
The F-test is used to compare the variances of two independent samples. It is also used in analysis of variance (ANOVA) for testing the significance of the differences among more than two sample means at a time. It is also used for judging the significance of multiple correlation coefficients.
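The analysis-of-variance use of the F-test can be sketched as follows. F is the ratio of the between-group mean square to the within-group mean square. The three groups and the function name one_way_anova_f are hypothetical. The 5% critical value for (2, 12) degrees of freedom, about 3.89, is taken from a standard F table, since the standard library does not provide the F-distribution.

```python
from statistics import fmean


def one_way_anova_f(*groups):
    """F ratio for a one-way analysis of variance across two or more groups."""
    values = [v for g in groups for v in g]
    k, n = len(groups), len(values)
    grand_mean = fmean(values)
    # Variation of the group means around the grand mean
    ss_between = sum(len(g) * (fmean(g) - grand_mean) ** 2 for g in groups)
    # Variation of the observations around their own group mean
    ss_within = sum((v - fmean(g)) ** 2 for g in groups for v in g)
    f_ratio = (ss_between / (k - 1)) / (ss_within / (n - k))
    return f_ratio, (k - 1, n - k)


# Hypothetical output of three machines, five observations each
a = [10, 12, 11, 9, 13]
b = [14, 15, 13, 16, 12]
c = [18, 17, 19, 16, 20]
f_ratio, (df1, df2) = one_way_anova_f(a, b, c)

F_CRITICAL = 3.89  # from an F table: df = (2, 12), 5% level
print(f"F = {f_ratio:.2f} with ({df1}, {df2}) df")
print("reject Ho" if f_ratio > F_CRITICAL else "do not reject Ho")
```

Here F is about 24.67, far above 3.89. The hypothesis that the three population means are equal is therefore rejected.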
NON
PARAMETRIC TESTS:
Most statistical tests require an important assumption to be met if they are to be correctly applied. This assumption is that the population of data from which samples are drawn is normally distributed. But there are some situations when the researcher cannot or does not want to make such an assumption. In such situations we use statistical methods for testing hypotheses which are called non-parametric tests, because such tests do not depend on any assumption about the parameters of the parent population.
ADVANTAGES:
* They do not require any assumption to be made about the population following the normal or any other distribution.
* They are simple to understand and easy to apply when the sample sizes are small.
* Most non-parametric tests do not require lengthy computations.
* They are less time-consuming.
* Non-parametric tests are applicable to all types of data.
* They make it possible to work with very small samples.
DISADVANTAGES:
* They ignore a certain amount of information.
* They are not considered as efficient as parametric tests.
The important non-parametric tests are the chi-square test, the median test, the Mann-Whitney U test, the sign test, the Wilcoxon matched-pairs test and the Kolmogorov-Smirnov test.
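Of the non-parametric tests listed, the chi-square test of independence is the simplest to sketch. The code computes Pearson's chi-square statistic for a contingency table of observed counts. The table and the function name chi_square_independence are hypothetical, and no continuity (Yates) correction is applied. The 5% critical value 3.841 for 1 degree of freedom is taken from a chi-square table.

```python
def chi_square_independence(observed):
    """Pearson chi-square statistic and degrees of freedom for an r x c table."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(observed):
        for j, count in enumerate(row):
            # Expected count if the two attributes were independent
            expected = row_totals[i] * col_totals[j] / grand_total
            chi2 += (count - expected) ** 2 / expected
    df = (len(observed) - 1) * (len(observed[0]) - 1)
    return chi2, df


# Hypothetical replies: rows = male, female; columns = yes, no
table = [[30, 20],
         [20, 30]]
chi2, df = chi_square_independence(table)

CHI2_CRITICAL = 3.841  # from a chi-square table: df = 1, 5% level
print(f"chi-square = {chi2:.2f} with {df} df")
print("reject Ho" if chi2 > CHI2_CRITICAL else "do not reject Ho")
```

Every expected count here is 25, so the statistic is 4.00. That is just above 3.841, so independence of sex and reply would be rejected at the 5% level.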
MEASUREMENT
AND SCALING TECHNIQUES
MEASUREMENT:
Measurement may be defined as the assignment of numerals to characteristics of objects, persons or events according to rules.
SCALES:
The instrument with the help of which a concept is measured is called a scale. Scales have a wide range of applications in social science research. They are used in all types of data collection techniques such as observation, interview, projective techniques, etc.
Scaling provides the procedures for assigning numbers to various degrees of opinion, attitude and other concepts. Normally this takes place in two ways:
* Making a judgement about some characteristic of an individual and then placing him directly on a scale.
* Constructing a questionnaire in such a way that the score of the individual's responses assigns him a place on the scale.
IMPORTANT SCALING TECHNIQUES
RATING METHOD:
In a rating scale, the rater makes a judgement about some characteristic of a subject and places him directly at some point on the scale. These scales can be either discrete or continuous.
(a) Discrete Scales:
These scales are used for obtaining ordinal data about an object. In these scales two or more categories are provided, representing discrete amounts of some characteristic. The rater ticks the category which he feels best describes the person or object being rated. Thus, for example, the characteristic "job knowledge" may be divided into five categories on a discrete scale:
Exceptionally good | Above average | Average | Below average | Poor
(b) Continuous graphic scales:
These scales are used for obtaining interval data about an object. In these scales, an uninterrupted line is provided just above the category notation. The rater can tick anywhere along its length, as shown below.
Both these types of rating scales can use three kinds of standards for measuring a characteristic: numerical or alphabetical, descriptive, and behavioural.
ATTITUDE
SCALE:
Attitude scales are carefully constructed sets of rating scales designed to measure one or more aspects of an individual's or group's attitude towards some object. The individual's responses to the various scales may be aggregated or summed to provide a single attitude score for the individual. The following are the four types of attitude scales.
LIKERT SUMMATED SCALE:
Summated scales consist of a number of statements which express either a favourable or an unfavourable attitude towards the given object, to which the respondent is asked to react. The respondent indicates his opinion, favourable or unfavourable, on each statement in the instrument. Each response is given a numerical score indicating its favourableness or unfavourableness, and the scores are totalled to measure the respondent's attitude. In other words, the overall score represents the respondent's position.
In a Likert scale, a respondent is normally asked to respond to each of the statements in terms of several degrees of agreement or disagreement. Usually five degrees are used, though 3 to 7 may be used at times. Suppose a researcher wants to examine whether one considers his job quite pleasant. The respondent may respond in any of the following ways:
strongly agree – agree – undecided – disagree – strongly disagree.
In the above scale, each point carries a score, so the responses are given weights or scores. The lowest score is given to the least favourable degree of job satisfaction and the highest score to the most favourable.
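The scoring just described can be sketched as below. It assumes the five-point weights run from 5 (strongly agree) to 1 (strongly disagree). For statements worded unfavourably, it reverses the weights (6 minus the score) so that a higher total always means a more favourable attitude. The statements and the helper name likert_total are hypothetical.

```python
WEIGHTS = {"strongly agree": 5, "agree": 4, "undecided": 3,
           "disagree": 2, "strongly disagree": 1}


def likert_total(responses, unfavourable=frozenset()):
    """Summated (Likert) score: add the weight of each response.

    Statements listed in `unfavourable` are reverse-scored (6 - weight).
    """
    total = 0
    for statement, answer in responses.items():
        weight = WEIGHTS[answer]
        if statement in unfavourable:
            weight = 6 - weight
        total += weight
    return total


# Hypothetical replies of one respondent to three statements about his job
replies = {
    "My job is pleasant": "agree",
    "My job is boring": "disagree",
    "I would recommend my job to others": "strongly agree",
}
score = likert_total(replies, unfavourable={"My job is boring"})
print(score, "out of", 5 * len(replies))
```

The respondent scores 4 + 4 + 5 = 13 out of a possible 15. Disagreeing with the unfavourable statement counts as a favourable reply.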
Advantage:
* The Likert-type scale is easier to develop than the Thurstone-type scale, since it can be constructed without a panel of judges.
* It is more reliable because, under it, respondents answer each statement included in the instrument.
* The Likert-type scale permits the use of statements that are not manifestly related to the attitude being studied.
* It can be used in both respondent-centred and stimulus-centred studies, i.e., it shows how responses differ between people and also between stimuli.
* As it requires less time to construct, it is frequently used by students of opinion research.
Limitations:
* These scales indicate whether respondents are more or less favourable to a topic, but they cannot tell how much more or less favourable they are.
* The interval between strongly agree
and agree may not be equal to the interval between agree and undecided.
Thurstone
Type Scale (differential scales)
Here, the selection of items is made by a panel of judges who evaluate the items in terms of whether they are relevant to the topic area and unambiguous in application. The researcher adopts the following procedure:
* The researcher collects a number of statements, usually 20 or more, that express various points of view towards a group, institution, idea or practice.
* A panel of judges arranges them in 11 groups or piles, ranging from one extreme position to the other. Each judge places in the first pile the statements which he thinks are most unfavourable to the issue, and in the second pile those which he thinks are next most unfavourable. He continues in this manner until, in the eleventh pile, he puts the statements which he considers to be the most favourable.
* The judges sort out the items, and when there is disagreement between the judges in assigning a position to an item, that item is left out.
* For each remaining item, the median scale value between one and eleven is established.
Then the researcher makes a final selection of statements: a sample of statements whose median scores are spread evenly from one extreme to the other is taken. The statements so selected constitute the final scale to be administered to respondents.
The respondents are asked to check the statements with which they agree. The median scale value of the statements a respondent agrees with is worked out, and this establishes his score or quantifies his opinion. It may be noted that in the actual instrument the statements are arranged in random order of scale value.
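The scale-value and scoring steps can be sketched with the median, as the text describes. The statements, pile placements, respondent's choices and function names below are hypothetical. The step of discarding statements on which the judges disagree is left out for brevity.

```python
from statistics import median


def scale_values(piles_by_statement):
    """Scale value of each statement = median of the piles (1-11) the judges chose."""
    return {s: median(piles) for s, piles in piles_by_statement.items()}


def thurstone_score(agreed_statements, values):
    """Respondent's score = median scale value of the statements he agrees with."""
    return median(values[s] for s in agreed_statements)


# Hypothetical placements by five judges (1 = most unfavourable, 11 = most favourable)
piles = {
    "S1": [1, 2, 2, 3, 2],
    "S2": [4, 4, 5, 4, 3],
    "S3": [5, 6, 6, 7, 6],
    "S4": [9, 10, 10, 11, 10],
}
values = scale_values(piles)
print(values)                                  # {'S1': 2, 'S2': 4, 'S3': 6, 'S4': 10}
print(thurstone_score(["S3", "S4"], values))   # 8.0
```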
CUMULATIVE
SCALES:
* It consists of a series of statements to which a respondent expresses agreement or disagreement.
* The statements are related to one another in such a way that an individual who replies favourably to item no. 3 also replies favourably to items no. 1 and 2.
* The individual's score is worked out by counting the number of statements he answers favourably.
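A minimal sketch of scoring a cumulative scale follows. It assumes the statements are ordered from the one most easily endorsed to the one least easily endorsed, and that each reply is coded 1 (favourable) or 0. Besides the score, it counts departures from the ideal cumulative pattern, which is a common check of how well the statements form a cumulative scale. That check and the function name cumulative_score are illustrative additions, not part of the text.

```python
def cumulative_score(replies):
    """Return (score, errors) for 0/1 replies ordered from mildest to strongest statement."""
    score = sum(replies)
    # Ideal pattern for this score: favourable to the first `score` items only
    ideal = [1] * score + [0] * (len(replies) - score)
    errors = sum(r != i for r, i in zip(replies, ideal))
    return score, errors


print(cumulative_score([1, 1, 1, 0]))  # (3, 0): a perfect cumulative pattern
print(cumulative_score([1, 0, 1, 0]))  # (2, 2): departs from the pattern
```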
SEMANTIC DIFFERENTIAL SCALES:
It is an attempt to measure the psychological meaning of an object to an individual.
It consists of a set of bipolar rating scales, usually of 7 points, by which one or more respondents rate one or more concepts on each scale item.
DATA
ANALYSIS: EDITING AND CODING OF DATA
EDITING
Once the data collection is complete, the data are examined carefully to eliminate any errors or mistakes. For that purpose, editing of data becomes mandatory. Editing means to rectify, to set in order, to correct or to establish sequence. Persons with editing responsibility should be trained and experienced in this job. Editing is performed at two stages, and accordingly it is of two types: field editing and centralized editing.
Field Editing: Field editing refers to editing performed immediately in the field where the data are collected. For example, suppose the data are collected through a questionnaire or schedule. Then the editor should check, right after collecting the questionnaire from the respondent and in the field itself, whether all the questions have been answered, whether the writing is legible, etc.
Centralized Editing: In this type of editing, editing is done by a person or a team after all the recorded questionnaires or schedules have been collected. So clearly it is not carried out in the field itself or immediately after the data are collected. In such editing, the instructions regarding editing are normally printed and circulated to the person or team doing the editing, to ensure uniformity in editing.
CODING
Coding is a practice which simplifies the recording of answers. When standard answers to a question can be specified, each answer is assigned a code. So instead of writing the answers in full, the investigator simply writes the code. This not only saves time but also avoids confusing answers.
CLASSIFICATION
Classification of data means grouping the data on the basis of some common characteristics. Classified data can be used for specified purposes with ease. Further, classification adds to clarity and helps to maintain consistency. Classification can be made on the following bases:
a) common characteristics like sex, literacy, colour, height and weight;
b) geographical regions like north, south, east and west;
c) time, as in monthly, weekly or yearly data;
d) value, in which the collected data are grouped according to their values;
e) replies, such as the number of people who answered yes or no to a question.
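Classification by a common characteristic, and reply-based classification, amount to grouping and counting records. A minimal sketch with hypothetical respondent records follows. The field names and the helper classify are illustrative.

```python
from collections import Counter, defaultdict

# Hypothetical respondent records
records = [
    {"sex": "male", "region": "north", "reply": "yes"},
    {"sex": "female", "region": "south", "reply": "no"},
    {"sex": "female", "region": "north", "reply": "yes"},
    {"sex": "male", "region": "east", "reply": "yes"},
    {"sex": "female", "region": "south", "reply": "yes"},
]


def classify(rows, characteristic):
    """Group rows by the value of one characteristic (e.g. sex or region)."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[characteristic]].append(row)
    return dict(groups)


by_sex = classify(records, "sex")
print({k: len(v) for k, v in by_sex.items()})   # {'male': 2, 'female': 3}

# Reply-based classification: number of people giving each answer
reply_counts = Counter(r["reply"] for r in records)
print(reply_counts)
```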
TABULATION
Tabulation is the arrangement of classified data in an orderly manner. In other words, it is a method of presenting summarized data. Tabulation is very important because:
* It conserves space.
* It avoids the need for explanation.
* Computation of data is made easier.
* Comparison of data becomes very simple.
* Adequacy or inadequacy of the data is clearly visible.
A table contains columns and rows, and these columns and rows create small boxes which are called cells. Tabulation has several rules, and the most important ones are listed below:
• Every table should be numbered; numbering could be alphabetical, Arabic or Roman.
• Each table should have a distinct title.
• The unit of measurement of the values in the table must be specified, e.g. Rs. crores, tonnes, etc.
• Each column should be titled.
• Each row must be titled
• Rows and columns are to be numbered
• Footnotes of the table should
indicate the explanatory notes on the data in the table and the footnotes must
be positioned below the table
• Data to be compared must be placed in
adjacent columns
SIGNIFICANCE
OF TABLES
It reduces the complexity of data and provides simplicity of presentation:
Generally, a table removes unnecessary details and repetitions. Tables present data systematically in columns and rows and give a very clear idea of what the data represent. A table provides a considerable saving in the time taken to understand what is represented by the data, and hence confusion is avoided.
It
facilitates comparison:
Tables facilitate comparison. Generally a table is divided into various parts, with totals and subtotals for each part. The relationship between different parts of the data can therefore be studied much more easily with the help of a table than without it.
It
gives identity to the data:
When
the data are arranged in a table with a title and number they can be distinctly
identified and can be used as a source reference in the interpretation of a
problem.
It
provides patterns:
Tabulation reveals patterns within the figures which cannot be seen in narrative form. It also facilitates the summation of the figures if the reader wishes to check the totals.
Part
of a table
TYPES
OF TABLES:
Tables can be broadly classified into two categories:
1. Simple and complex frequency tables
2. General purpose and special purpose frequency tables.
SIMPLE AND COMPLEX FREQUENCY TABLES
SIMPLE OR ONE WAY TABLE:
Here only one characteristic is shown; this is the simplest type of table. The following is an illustration of such a table.
TWO
– WAY TABLE:
It shows two characteristics and is formed when either the stub or the caption is divided into two coordinate parts. The following example illustrates the nature of such a table:
Number
of employees in a Bank at Different age-groups according to sex
GENERAL
PURPOSE AND SPECIAL PURPOSE FREQUENCY TABLES
General purpose tables are also called reference tables. They provide information for general use or reference. They usually contain detailed information and are not constructed for any specific discussion.
Number
of Employees of a Bank according to Age-Groups, Sex and Ranks
TYPES OF DIAGRAMS USED IN RESEARCH REPORTS
Generally, statistical results are presented through diagrams and graphs. We can see them in newspapers, magazines, journals, advertisements, etc. Statistical data may be displayed pictorially through different types of diagrams, graphs and maps.
Significance of Diagrams and Graphs:
1. They provide a bird's-eye view of the entire data.
2. They are attractive.
3. They have a memorizing effect.
4. They facilitate comparison of data.
CHOICE OF SUITABLE DIAGRAM:
Several factors determine the selection of the diagram to be drawn. They are:
1. The nature of the data
2. The target audience for whom the diagram is drawn
3. The volume of communication to be given
4. The facilities available to draw the diagram
5. The purpose of the representation
6. The size of the paper or the sanctioned size for the diagram, etc.
Based on these factors, the right type of diagram is selected.
Types
of Diagram:
a. One dimensional diagrams e.g. bar
diagrams
b. Two dimensional diagrams e.g. rectangles, squares, circles and pie diagrams
c. Three dimensional diagrams
(A) One Dimensional Diagrams or Bar Diagrams
A bar diagram is a thick line whose width is shown merely for attention; only its length represents the value. The merits of such diagrams are as follows:
1. A reader can easily understand the subject matter.
2. They are the simplest and the easiest to make.
3. For comparison of a large number of items, they are the only form that can be used effectively.
Example
for simple bar diagram:
A simple bar diagram is the simplest of the bar diagrams and is used frequently in practice for the comparative study of two or more items or values of a single variable, or of a single classification or category of data.
Suppose
a simple diagram is to be drawn for the following data:
Country:                     A    B    C    D    E    F    G
Population (in millions):   20   50   68   43   65   25   40
Examples
for multiple bar diagram:
If two or more sets of interrelated variables are to be presented graphically, multiple bar diagrams are used. The technique of drawing a multiple bar diagram is basically the same as that of drawing a simple bar diagram. In this type of diagram, the data given for each year are drawn together, so that for each year there are a number of bars placed adjacent to each other.
Percentage
bars:
In this type of diagram, all the given data for each year are converted into percentages. Then for each year one bar representing 100% is drawn and divided according to the component percentages. This can be understood from the example given below.
Deviation
bars:
Deviation bars are especially useful for the graphical presentation of net quantities, i.e. surplus or deficit, such as net profit or net loss, or the net of imports and exports, which have positive and negative values. This could be explained with the following example.
Rectangles:
A rectangle is a two-dimensional diagram because it is based on the area principle. Just like bars, the rectangles are placed side by side, with proper and equal spacing between them. In fact, rectangle diagrams are a modified form of bar diagrams and give more detailed information than is furnished by bar diagrams.
Square
Diagrams:
Among the two dimensional diagrams, squares are especially useful if it is desired to compare graphically values or quantities which differ widely from one another.
Circles:
Circle diagrams are an alternative to square diagrams and are used for the same purpose.
Pie
diagram:
A pie diagram can show how the expenditure of the government is distributed over different heads like agriculture, irrigation, industry, transport, etc. Similarly, it can show how the expenditure incurred by an industry is distributed under different heads like raw materials, wages and salaries, selling and distribution expenses, etc. Pie diagrams are used when making comparisons on a percentage basis and not on an absolute basis. When pie diagrams are constructed on a percentage basis, the percentages can be presented by circles of equal size.
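The arithmetic behind a pie diagram is simple. Each component's share of the total, as a percentage, is converted into a sector angle: the percentage multiplied by 3.6 degrees, since the full circle is 360 degrees. The expenditure figures and the helper name pie_sectors below are hypothetical.

```python
def pie_sectors(amounts):
    """Map each head to (percentage of total, sector angle in degrees)."""
    total = sum(amounts.values())
    return {head: (100 * a / total, 360 * a / total) for head, a in amounts.items()}


# Hypothetical government expenditure (Rs. crores) by head
expenditure = {"agriculture": 30, "irrigation": 15, "industry": 40, "transport": 15}
sectors = pie_sectors(expenditure)
for head, (pct, angle) in sectors.items():
    print(f"{head:12} {pct:5.1f}%  {angle:6.1f} degrees")
```

Industry, with 40% of the total, gets a sector of 144 degrees, and the four angles together make up the full 360 degrees.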
(B) TWO
DIMENSIONAL DIAGRAMS:
In one dimensional diagrams only the length of the bar is taken into account, whereas in two dimensional diagrams the length as well as the width of the bar is considered; thus the area of the bar represents the given data.
(A) Rectangles
(B) Square Diagrams
(C) Three Dimensional Diagrams: Pictographs and Cartograms
Pictographs
are not abstract presentation such as lines or bars but really depict the kind
of data we are dealing with. Pictures are attractive and easy to comprehend and
as such this method is particularly useful in presenting statistics to the
layman.
Cartograms or statistical maps are used to give quantitative information on a geographical basis. They represent spatial distribution. The quantities on the map can be shown in many ways: by shading or colouring, by dots, by placing pictograms in each geographical unit, or by placing the appropriate numerical figure in each geographical unit.
GRAPHICAL
REPRESENTATION OF DATA
Diagrams furnish only approximate information and are not of much utility to a statistician from the analysis point of view. On the other hand, graphs are more obvious, precise and accurate than diagrams and can be effectively used for further statistical analysis. They can be broadly classified under the following two heads:
i. Graphs of frequency distributions
ii. Graphs of Time series
GRAPHS
OF FREQUENCY DISTRIBUTIONS:
Frequency graphs are designed to reveal clearly the characteristic features of frequency data. Such graphs are more appealing to the eye than tabulated data and are readily perceptible to the mind. They facilitate comparative study of two or more frequency distributions regarding their shape and pattern. The most commonly used graphs for charting a frequency distribution, for a general understanding of the details of the data, are:
HISTOGRAM:
It
is one of the most popular and commonly used devices for charting continuous
frequency distributions, no matter whether the variable under study is discrete
or continuous.
FREQUENCY
POLYGON:
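It is another device for the graphic presentation of a frequency distribution, and it facilitates comparison of frequency distributions. A frequency polygon may be drawn from the histogram, by joining the mid-points of the tops of its rectangles. It may also be drawn without a histogram, by plotting the frequencies against the class mid-points.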
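Both graphs start from the same grouped frequency table. The sketch below uses hypothetical marks and class intervals of width 10, and the helper names are illustrative. It builds that table, whose frequencies give the histogram's bar heights when class widths are equal. It also builds the points of the frequency polygon: class mid-points against frequencies, closed at zero on the mid-points of the empty classes on either side. Plotting itself is left out.

```python
def frequency_table(data, start, width, n_classes):
    """Counts per class interval [lower, upper); with equal widths these are the bar heights."""
    table = []
    for i in range(n_classes):
        lower = start + i * width
        upper = lower + width
        table.append((lower, upper, sum(lower <= x < upper for x in data)))
    return table


def polygon_points(table):
    """(mid-point, frequency) pairs, closed at zero one class width beyond each end."""
    width = table[0][1] - table[0][0]
    points = [((lower + upper) / 2, freq) for lower, upper, freq in table]
    return [(points[0][0] - width, 0)] + points + [(points[-1][0] + width, 0)]


# Hypothetical marks of 11 students
marks = [12, 15, 22, 25, 27, 31, 33, 35, 38, 44, 47]
table = frequency_table(marks, start=10, width=10, n_classes=4)
print(table)                  # [(10, 20, 2), (20, 30, 3), (30, 40, 4), (40, 50, 2)]
print(polygon_points(table))  # [(5.0, 0), (15.0, 2), (25.0, 3), (35.0, 4), (45.0, 2), (55.0, 0)]
```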
DATA
ANALYSIS
Analysis of data is considered to be a highly skilled and technical job which should be carried out only by the researcher himself or under his close supervision. Analysis of data means critical examination of the data for studying the characteristics of the object under study and for determining the patterns of relationship among the variables relating to it, using both quantitative and qualitative methods.
Purpose of Analysis
Statistical analysis of data serves several major purposes.
1. It summarizes a large mass of data into an understandable and meaningful form.
2. It makes descriptions exact.
3. It aids the drawing of reliable inferences from observational data.
4. It facilitates identification of the causal factors underlying complex phenomena.
5. It helps in making estimations or generalizations from the results of sample surveys.
6. Inferential analysis is useful for assessing the significance of specific sample results under assumed population conditions.
Steps in Analysis
The steps in research analysis are as follows.
1. The first step involves construction of statistical distributions and calculation of simple measures like averages, percentages, etc.
2. The second step is to compare two or more distributions, or two or more subgroups within a distribution.
3. The third step is to study the nature of relationships among variables.
4. The next step is to find out the factors which affect the relationship between a set of variables.
5. The last step is testing the validity of inferences drawn from a sample survey by using parametric tests of significance.
Types of Analysis
Statistical analysis may be broadly classified as descriptive analysis and inferential analysis.
Descriptive Analysis
Descriptive statistics are used to describe the basic features of the data in a study. They provide simple summaries about the sample and the measures. Descriptive statistics is the discipline of quantitatively describing the main features of a collection of data, or the quantitative description itself. Such analysis may be univariate, bivariate or multivariate.
• Univariate analysis
Univariate analysis involves
describing the distribution of a single variable, including its central
tendency (including the mean, median, and mode) and dispersion (including the
range and quartiles of the data-set, and measures of spread such as the
variance and standard deviation). The shape of the distribution may also be
described via indices such as skewness and kurtosis. Characteristics of a
variable's distribution may also be depicted in graphical or tabular format,
including histograms and stem-and-leaf display.
• Bivariate analysis
Bivariate analysis is one of the simplest forms of quantitative (statistical) analysis. It involves the analysis of two variables (often denoted as X, Y) for the purpose of determining the empirical relationship between them. Common forms of bivariate analysis involve creating a percentage table or a scatter plot and computing a simple correlation coefficient.
• Multivariate analysis
In multivariate analysis, multiple relations between multiple variables are examined simultaneously. Multivariate analysis (MVA) is based on the statistical principle of multivariate statistics, which involves observation and analysis of more than one statistical outcome variable at a time. In design and analysis, the technique is used to perform trade studies across multiple dimensions while taking into account the effects of all variables on the responses of interest.
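The univariate measures described above are available in Python's standard statistics module. A minimal sketch on hypothetical data follows; the helper name describe is illustrative. Here pvariance and pstdev treat the values as a whole population (dividing by n), whereas variance and stdev would give the sample versions (dividing by n - 1). Quartiles follow the module's default "exclusive" method. Other textbook quartile rules can give slightly different values.

```python
from statistics import fmean, median, mode, pstdev, pvariance, quantiles


def describe(values):
    """Central tendency and dispersion of a single variable."""
    q1, q2, q3 = quantiles(values, n=4)
    return {
        "mean": fmean(values),
        "median": median(values),
        "mode": mode(values),
        "range": max(values) - min(values),
        "quartiles": (q1, q2, q3),
        "variance": pvariance(values),
        "std_dev": pstdev(values),
    }


data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical observations
summary = describe(data)
for measure, value in summary.items():
    print(f"{measure:10} {value}")
```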
Inferential Analysis
Inferential statistics is concerned with making predictions or inferences about a population from observations and analyses of a sample. That is, we can take the results of an analysis using a sample and generalize them to the larger population that the sample represents. There are two areas of statistical inference: (a) statistical estimation and (b) the testing of hypotheses.
Tools and Statistical Methods For
Analysis
The tools and techniques of statistics can be studied under two divisions of statistics.
(A) Descriptive Statistics
In descriptive statistics we develop certain indices and measures of raw data. They are:
1. Measures of Central Tendency
2. Measures of Dispersion
3. Measures of Skewness and Kurtosis
4. Measures of Correlation
5. Regression analysis
6. Index numbers
7. Time series analysis
8. Coefficient of association
1. Measures of Central Tendency
The central tendency of a distribution is an estimate of the "center" of a distribution of values. There are different types of estimates of central tendency, such as the mean, median, mode, geometric mean and harmonic mean.
2. Measures of Dispersion
Dispersion refers to the spread of the values around the central tendency. The two most common measures of dispersion are the range and the standard deviation. Measures of dispersion can be used to compare the variability of two statistical series.
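As a minimal sketch (function names are illustrative and the data set is hypothetical), the range and the standard deviation can be computed in Python as below. The population formula divides the squared deviations by n; the `sample=True` option divides by n − 1, which is the usual estimate when the data are a sample.

```python
import math

def value_range(values):
    """Range: largest value minus smallest value."""
    return max(values) - min(values)

def std_dev(values, sample=False):
    """Standard deviation around the arithmetic mean.

    sample=False divides by n (population formula);
    sample=True divides by n - 1 (sample estimate).
    """
    n = len(values)
    mean = sum(values) / n
    squared_deviations = sum((x - mean) ** 2 for x in values)
    divisor = n - 1 if sample else n
    return math.sqrt(squared_deviations / divisor)

data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical values
print(value_range(data))                  # 7
print(std_dev(data))                      # 2.0
print(round(std_dev(data, sample=True), 3))  # 2.138
```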
3. Measures of Skewness and Kurtosis
A fundamental task in many statistical analyses is to characterize the location and variability of a data set. A further characterization of the data includes skewness and kurtosis. Skewness is a measure of symmetry, or more precisely, the lack of symmetry. A distribution, or data set, is symmetric if it looks the same to the left and right of the center point. Kurtosis is a measure of whether the data are peaked or flat relative to a normal distribution. That is, data sets with high kurtosis tend to have a distinct peak near the mean, decline rather rapidly, and have heavy tails.
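One common way to put numbers on these ideas is the moment coefficients: skewness = m3 / m2^(3/2) and kurtosis = m4 / m2^2, where mk is the k-th central moment. The sketch below assumes these population-moment definitions (statistical software often applies small-sample corrections, so its values can differ slightly); on this scale a normal distribution has skewness 0 and kurtosis 3. Function names and data are illustrative.

```python
def central_moment(values, k):
    """k-th central moment: average of (x - mean) ** k."""
    n = len(values)
    mean = sum(values) / n
    return sum((x - mean) ** k for x in values) / n

def skewness(values):
    """Moment coefficient of skewness; 0 for a symmetric data set."""
    m2 = central_moment(values, 2)
    return central_moment(values, 3) / m2 ** 1.5

def kurtosis(values):
    """Moment coefficient of kurtosis (a normal distribution gives 3)."""
    m2 = central_moment(values, 2)
    return central_moment(values, 4) / m2 ** 2

symmetric = [1, 2, 3, 4, 5]          # hypothetical, symmetric about 3
right_skewed = [1, 1, 2, 2, 3, 10]   # hypothetical, long right tail
print(skewness(symmetric), kurtosis(symmetric))  # 0.0 1.7
print(skewness(right_skewed) > 0)                # True
```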
4. Measures of Correlation
Correlation refers to any of a broad class of statistical relationships involving dependence. When there are two variables, the correlation between them is called simple correlation. When there are more than two variables and we study the relation between only two of them, treating the others as constant, the relation is called partial correlation. When there are more than two variables and we study the relation of one variable with all the other variables taken together, the relation is called multiple correlation.
5. Regression Analysis
Regression analysis is a statistical process for estimating the relationships among variables. It includes many techniques for modeling and analyzing several variables when the focus is on the relationship between a dependent variable and one or more independent variables.
6. Index Numbers
An index is a statistical measure of changes in a representative group of individual data points. Index numbers are designed to measure the magnitude of economic changes over time. Because they work in a similar way to percentages, they make such changes easier to compare.
7. Time Series Analysis
A time series is a sequence of data points, typically measured at successive points in time spaced at uniform intervals. Time series analysis comprises methods for analyzing time series data in order to extract meaningful statistics and other characteristics of the data.
8. Coefficient of Association
A coefficient of association, such as Yule's coefficient, measures the extent of association between two attributes.
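For two attributes A and B arranged in a 2 × 2 table, Yule's coefficient of association is Q = [(AB)(ab) − (Ab)(aB)] / [(AB)(ab) + (Ab)(aB)], where (AB) is the number of items having both attributes, (ab) the number having neither, and so on. Q ranges from −1 (perfect negative association) through 0 (no association) to +1 (perfect positive association). A minimal Python sketch; the function name and the cell counts are hypothetical.

```python
def yules_q(ab, a_not_b, not_a_b, not_a_not_b):
    """Yule's coefficient of association for a 2 x 2 table.

    ab          = frequency with both attributes A and B
    a_not_b     = frequency with A but not B
    not_a_b     = frequency with B but not A
    not_a_not_b = frequency with neither A nor B
    """
    concordant = ab * not_a_not_b
    discordant = a_not_b * not_a_b
    return (concordant - discordant) / (concordant + discordant)

# Hypothetical counts: attribute A = vaccinated, attribute B = not attacked
q = yules_q(ab=60, a_not_b=10, not_a_b=20, not_a_not_b=30)
print(round(q, 3))  # 0.8
```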
(B) Inferential Statistics
Inferential statistics deals with forecasting, estimating or judging some results about the universe (population) based on some units selected from it. The process of selecting these units is called sampling. Inferential statistics facilitates the estimation of population values known as parameters. It also deals with the testing of hypotheses to determine how valid the conclusions drawn are.
Ratios, percentages and averages
In statistical analysis, ratios, percentages and weighted averages play a very important role. Ratios show the relation of one figure to another. For example, if the total number of students in a school is 2000 and the total number of teachers is 250, then the ratio of teachers to students is 250:2000, or 1:8. To express it as a percentage, divide and multiply by 100: (250 ÷ 2000) × 100 = 12.5%.
Measures of central tendency (averages)
An average is a single significant figure which sums up the characteristics of a group of figures. The various measures of central tendency are:
(1) Arithmetic mean
(2) Median
(3) Mode
(4) Geometric mean
(5) Harmonic mean
Arithmetic Mean
The mean, or average, is probably the most commonly used method of describing central tendency. To compute the mean, add up all the values and divide by the number of values.
Arithmetic mean, $\bar{x} = \dfrac{\sum x}{n}$
where x stands for an observed value, n stands for the number of observations in the data set, and ∑x stands for the sum of all observed x values.
For example, consider the test score values:
15, 20, 21, 20, 36, 15, 25, 15
The sum of these 8 values is 167, so the mean is 167/8 = 20.875.
Ex. 1 Calculate the mean from the following data
Value (x) | 5 | 15 | 25 | 35 | 45 | 55 | 65 | 75
Frequency (f) | 15 | 20 | 25 | 24 | 12 | 31 | 71 | 52
Solution:
Value (x) | Frequency (f) | fx
5 | 15 | 75
15 | 20 | 300
25 | 25 | 625
35 | 24 | 840
45 | 12 | 540
55 | 31 | 1705
65 | 71 | 4615
75 | 52 | 3900
Total | ∑f = 250 | ∑fx = 12600
Arithmetic mean, $\bar{x} = \dfrac{\sum fx}{\sum f} = \dfrac{12600}{250} = 50.4$
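The ∑fx / ∑f computation in Ex. 1 can be checked with a short Python sketch; `mean_discrete` is an illustrative name, and the frequencies are those used in the solution table.

```python
def mean_discrete(values, freqs):
    """Arithmetic mean of a discrete frequency distribution: sum(f*x) / sum(f)."""
    total_f = sum(freqs)
    total_fx = sum(f * x for x, f in zip(values, freqs))
    return total_fx / total_f

values = [5, 15, 25, 35, 45, 55, 65, 75]
freqs = [15, 20, 25, 24, 12, 31, 71, 52]
print(mean_discrete(values, freqs))  # 50.4
```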
Ex. 2 Calculate the mean from the following data
Age (years) | 0-10 | 10-20 | 20-30 | 30-40 | 40-50 | 50-60 | 60-70 | 70-80
No. of persons dying | 15 | 30 | 53 | 75 | 100 | 110 | 115 | 125
Solution:
Age | f | Mid-value (x) | fx
0-10 | 15 | 5 | 75
10-20 | 30 | 15 | 450
20-30 | 53 | 25 | 1325
30-40 | 75 | 35 | 2625
40-50 | 100 | 45 | 4500
50-60 | 110 | 55 | 6050
60-70 | 115 | 65 | 7475
70-80 | 125 | 75 | 9375
Total | 623 | | 31875
Arithmetic mean, $\bar{x} = \dfrac{\sum fx}{\sum f} = \dfrac{31875}{623} = 51.16$ years
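For a continuous series, the same calculation is applied to the class mid-values, which assumes that the items in each class are concentrated at its mid-point. A Python sketch using the data of Ex. 2 (the function name `mean_grouped` is illustrative):

```python
def mean_grouped(classes, freqs):
    """Arithmetic mean of a continuous series using class mid-values.

    classes: list of (lower, upper) class limits
    freqs:   frequency of each class
    """
    mids = [(lower + upper) / 2 for lower, upper in classes]
    return sum(f * m for m, f in zip(mids, freqs)) / sum(freqs)

ages = [(0, 10), (10, 20), (20, 30), (30, 40), (40, 50), (50, 60), (60, 70), (70, 80)]
deaths = [15, 30, 53, 75, 100, 110, 115, 125]
print(round(mean_grouped(ages, deaths), 2))  # 51.16
```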
Median
The median is the score found at the exact middle of the set of values. One way to compute the median is to list all scores in numerical order and then locate the score in the center of the sample.
For example, if there are 499 scores in the list, score #250 would be the median; if there are 500 scores, the median is the average of scores #250 and #251. In general, the median is the ((n + 1) ÷ 2)th value, where n is the number of values in a set of data.
Example
Imagine that a top running athlete in a typical 200-metre training session runs the following times:
26.1, 25.6, 25.7, 25.2 and 25.0 seconds.
First, the values are put in ascending order: 25.0, 25.2, 25.6, 25.7 and 26.1. Then, using the following formula, figure out which value is the middle value. Remember that n represents the number of values in the data set.
Median = ((n + 1) ÷ 2)th value = ((5 + 1) ÷ 2)th value = 3rd value
The third value in the data set will be the median. Since 25.6 is the third value, 25.6 seconds would be the median time.
Median = 25.6 seconds
Example
Now, if the runner sprints a sixth 200-metre race in 24.7 seconds, what is the median value now?
Again, you first put the data in ascending order: 24.7, 25.0, 25.2, 25.6, 25.7, 26.1. Then, you use the same formula to calculate the median time.
Median = ((n + 1) ÷ 2)th value = ((6 + 1) ÷ 2)th value = (7 ÷ 2)th value = 3.5th value
Since there is an even number of observations in this data set, there is no longer a distinct middle value. The median is the 3.5th value in the data set, meaning that it lies between the third and fourth values. Thus, the median is calculated by averaging the two middle values, 25.2 and 25.6. Use the formula below to get the average value.
Average = (value below median + value above median) ÷ 2
= (third value + fourth value) ÷ 2
= (25.2 + 25.6) ÷ 2
= 50.8 ÷ 2
= 25.4
The value 25.4 falls directly between the third and fourth values in this data set, so 25.4 seconds would be the median.
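The ((n + 1) ÷ 2)th-value rule used in these two examples can be sketched in Python as follows (the function name is illustrative). When the position ends in .5, the two neighbouring values are averaged, exactly as in the six-race example.

```python
def median(values):
    """Median via the ((n + 1) / 2)th-value rule on the sorted data."""
    ordered = sorted(values)
    n = len(ordered)
    position = (n + 1) / 2          # 1-based position, may end in .5
    lower = ordered[int(position) - 1]
    if position.is_integer():
        return lower
    upper = ordered[int(position)]  # the next value up
    return (lower + upper) / 2

print(median([26.1, 25.6, 25.7, 25.2, 25.0]))        # 25.6
print(median([26.1, 25.6, 25.7, 25.2, 25.0, 24.7]))  # 25.4
```

Python's standard library function `statistics.median` follows the same rule and can be used to cross-check the result.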
The various steps in the computation of the median in a discrete series are as follows:
(i) Arrange the values in ascending or descending order of magnitude.
(ii) Find the cumulative frequencies.
(iii) Find the middle item by the formula (N + 1)/2.
(iv) Find the value of the ((N + 1)/2)th item. It can be found by first locating the cumulative frequency which is equal to (N + 1)/2 or next higher to it, and then determining the value corresponding to it. This will be the value of the median.
Finding the Value of Median
Find the value of the median from the following data
Daily wages | 10 | 5 | 7 | 11 | 8
Number of workers | 15 | 20 | 15 | 18 | 12
Solution: Calculation of median
Wages in ascending order | Number of persons (f) | Cumulative frequency (c.f.)
5 | 20 | 20
7 | 15 | 35
8 | 12 | 47
10 | 15 | 62
11 | 18 | 80
Median is the value of the ((N + 1)/2)th item = ((80 + 1)/2)th item = 40.5th item.
Items 36 to 47 all have a value of 8, and the 40.5th item falls within this range. Thus the median value is 8.
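A sketch of steps (i) to (iv) in Python, using the wage data above. The function name is illustrative. Following the rule stated in step (iv), it returns the first value whose cumulative frequency is equal to or higher than (N + 1)/2; it does not average two values when that position falls exactly on the boundary between them.

```python
def median_discrete(values, freqs):
    """Median of a discrete series using cumulative frequencies.

    Locates the ((N + 1) / 2)th item: the first value whose cumulative
    frequency is equal to or next higher than that position.
    """
    pairs = sorted(zip(values, freqs))       # step (i): ascending order
    n_total = sum(f for _, f in pairs)
    position = (n_total + 1) / 2             # step (iii)
    cumulative = 0
    for value, f in pairs:                   # steps (ii) and (iv)
        cumulative += f
        if cumulative >= position:
            return value
    raise ValueError("empty distribution")

wages = [10, 5, 7, 11, 8]
workers = [15, 20, 15, 18, 12]
print(median_discrete(wages, workers))  # 8
```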
In the case of a continuous frequency distribution, the median class is the class whose cumulative frequency first includes N/2. After identifying the median class, find the median using the following interpolation formula.
Median, $m = L_1 + \dfrac{N/2 - CF}{f} \times C$
where
L1 is the lower boundary of the median class
N is the sum of the frequencies
CF is the cumulative frequency of the class just before the median class
f is the frequency of the median class
C is the size (width) of the median class
Find the value of the median from the following data
Class | 0-10 | 10-20 | 20-30 | 30-40 | 40-50 | 50-60 | 60-70
Frequency | 8 | 12 | 20 | 23 | 18 | 7 | 2
Solution:
Class | Frequency | Cumulative frequency
0-10 | 8 | 8
10-20 | 12 | 20
20-30 | 20 | 40
30-40 | 23 | 63 (includes N/2 = 45)
40-50 | 18 | 81
50-60 | 7 | 88
60-70 | 2 | 90
Total | 90 |
Median = size of the (N/2)th item = size of the (90/2)th item = size of the 45th item.
The 45th item is included in the cumulative frequency 63, and the class having c.f. 63 is 30-40. Therefore 30-40 is the median class.
Applying the interpolation formula,
Median, $m = L_1 + \dfrac{N/2 - CF}{f} \times C$
Here L1 = 30, N/2 = 45, CF = 40, f = 23, C = 10
Median, $m = 30 + \dfrac{45 - 40}{23} \times 10 = 30 + \dfrac{50}{23} = 32.17$
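The interpolation formula can be sketched in Python as below, using the data of this example. It assumes the classes are written with continuous limits (as here), so that each lower limit is also the lower boundary; for classes such as 10-19, 20-29, the true boundaries (9.5, 19.5, ...) would have to be passed instead. The function name is illustrative.

```python
def median_grouped(classes, freqs):
    """Median of a continuous series by interpolation:
    m = L1 + (N/2 - CF) / f * C
    """
    half_n = sum(freqs) / 2
    cumulative = 0                     # CF: cumulative frequency before the class
    for (lower, upper), f in zip(classes, freqs):
        if cumulative + f >= half_n:   # first class whose c.f. includes N/2
            return lower + (half_n - cumulative) / f * (upper - lower)
        cumulative += f
    raise ValueError("empty distribution")

classes = [(0, 10), (10, 20), (20, 30), (30, 40), (40, 50), (50, 60), (60, 70)]
freqs = [8, 12, 20, 23, 18, 7, 2]
print(round(median_grouped(classes, freqs), 2))  # 32.17
```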
Mode
Mode is the value of the item of a series which occurs most frequently. According to Kenny, "the value of the variable which occurs most frequently in a distribution is called a mode".
In the case of an individual series, the value which occurs the greatest number of times is the mode. For example, a set of students of a class report the following number of video movies they see in a month.
No. of movies: 10, 15, 20, 15, 15, 8
The value 15 occurs most often (three times), so the students most commonly see 15 movies in a month. Therefore mode = 15.
When no item appears more often than the others, we say the mode is ill-defined. In that case, the mode is obtained from the empirical formula: Mode = 3 Median − 2 Mean.
Ex: Find the mode from the values 40, 25, 60, 35, 81, 75, 90, 10
Ans: Each value appears exactly once, so the mode is ill-defined.
Therefore, Mode = 3 Median − 2 Mean
Mean = 416/8 = 52
Arranged in ascending order: 10, 25, 35, 40, 60, 75, 81, 90
Median = ((n + 1) ÷ 2)th value = ((8 + 1) ÷ 2)th value = size of the 4.5th item = (40 + 60)/2 = 50
Mode = (3 × 50) − (2 × 52) = 150 − 104 = 46
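A minimal Python sketch of the two rules above, using the standard library's `collections.Counter` and `statistics` module (the function name is illustrative). Treating a tie for the highest count as an ill-defined mode, and so switching to the empirical formula, is an assumption of this sketch; the text only covers the case where no value repeats more than the others.

```python
from collections import Counter
from statistics import mean, median

def mode_or_empirical(values):
    """Return the most frequent value; if no single value occurs more often
    than the others (mode ill-defined), fall back to 3*median - 2*mean."""
    counts = Counter(values).most_common()   # [(value, count), ...] by count
    highest = counts[0][1]
    winners = [v for v, c in counts if c == highest]
    if len(winners) == 1 and highest > 1:
        return winners[0]
    return 3 * median(values) - 2 * mean(values)

print(mode_or_empirical([10, 15, 20, 15, 15, 8]))           # 15
print(mode_or_empirical([40, 25, 60, 35, 81, 75, 90, 10]))  # 46.0
```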
In the case of a discrete frequency distribution, the value having the highest frequency is taken as the mode. Ex: Find the mode
Size | 5 | 8 | 10 | 12 | 29 | 35 | 40 | 46
No. of items | 3 | 12 | 25 | 40 | 31 | 20 | 18 | 7
Ans: The size 12 has the highest frequency (40). Therefore 12 is the mode.
In the case of a continuous frequency distribution, the mode lies in the class having the highest frequency (the modal class). From the modal class, the mode is calculated using the interpolation formula:
Mode $= L_1 + \dfrac{f_1 - f_0}{2f_1 - f_0 - f_2} \times c$
where L1 is the lower limit of the modal class, f0 and f2 are respectively the frequencies of the classes just preceding and succeeding the modal class, f1 is the frequency of the modal class, and c is the class width.
Ex: Calculate the mode from the following data.
Size | 10-15 | 15-20 | 20-25 | 25-30 | 30-35 | 35-40 | 40-45 | 45-50
Freq | 4 | 8 | 18 | 30 | 20 | 10 | 5 | 2
Ans: The modal class is 25-30, since it has the highest frequency.
Here L1 = 25, f1 = 30, f0 = 18, f2 = 20, c = 5
Mode $= 25 + \dfrac{(30 - 18) \times 5}{2 \times 30 - 18 - 20} = 25 + \dfrac{60}{22} = 27.73$
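The modal-class interpolation can be sketched in Python with the data of this example. The function name is illustrative; if several classes share the highest frequency the sketch simply takes the first, and a missing neighbouring class (when the modal class is the first or last) is treated as having frequency 0. Both choices are assumptions of this sketch, not rules from the text.

```python
def mode_grouped(classes, freqs):
    """Mode of a continuous series by interpolation:
    Mode = L1 + (f1 - f0) / (2*f1 - f0 - f2) * c
    """
    i = max(range(len(freqs)), key=lambda k: freqs[k])  # modal class index
    lower, upper = classes[i]
    f1 = freqs[i]
    f0 = freqs[i - 1] if i > 0 else 0                   # preceding class
    f2 = freqs[i + 1] if i + 1 < len(freqs) else 0      # succeeding class
    return lower + (f1 - f0) / (2 * f1 - f0 - f2) * (upper - lower)

sizes = [(10, 15), (15, 20), (20, 25), (25, 30), (30, 35), (35, 40), (40, 45), (45, 50)]
freqs = [4, 8, 18, 30, 20, 10, 5, 2]
print(round(mode_grouped(sizes, freqs), 2))  # 27.73
```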
Index Numbers
Index numbers are designed to measure the magnitude of economic changes over time. An index number is a statistic which assigns a single number to several individual statistics in order to quantify trends. Index numbers are indicators of the various trends in an economy. Price index numbers indicate the position of prices: whether they are rising or falling, and at what rate. Similarly, index numbers of agricultural production indicate whether production is rising or falling, and at what rate, over a period of time. An index number is an economic data figure reflecting price or quantity compared with a standard or base value. The base usually equals 100, and the index number is usually expressed as 100 times the ratio to the base value. For example, if a commodity costs twice as much in 1970 as it did in 1960, its index number would be 200 relative to 1960. Index numbers are used especially to compare business activity, the cost of living and employment.
An index number is a specialized average. Index numbers may be simple or weighted, depending on whether we assign equal importance to every commodity or different importance to different commodities, according to the percentage of income spent on them or on the basis of some other criterion. In this chapter, we shall discuss both simple and weighted index numbers.
Simple and weighted index numbers
Simple index numbers are those in the calculation of which all the items are treated as equally important; no item is given any weight. Weighted index numbers are those in the calculation of which each item is assigned a particular weight.
Price Index Numbers
Price index
numbers measure changes in the price of a commodity for a given period in
comparison with another period.
Various methods used for the construction of price index numbers
1) Simple Aggregative Method
This is the simplest method. Only the prices for the base year and the current year are required. The aggregate (sum) of current-year prices is divided by the aggregate of base-year prices and multiplied by 100:
Simple index number $= \dfrac{\sum p_1}{\sum p_0} \times 100$
where ∑p1 is the aggregate of prices in the current year and ∑p0 is the aggregate of prices in the base year.
Ex: For the data given below, calculate the simple index number
Commodities | A | B | C | D | E
Price in 2008 | 5 | 8 | 12 | 25 | 3
Price in 2010 | 7 | 9 | 15 | 24 | 4
Ans: We take 2008 as the base year and 2010 as the current year, since 2008 is the earlier period.
Commodities | Price in 2008 (p0) | Price in 2010 (p1)
A | 5 | 7
B | 8 | 9
C | 12 | 15
D | 25 | 24
E | 3 | 4
Total | ∑p0 = 53 | ∑p1 = 59
Simple index number $= \dfrac{\sum p_1}{\sum p_0} \times 100 = \dfrac{59}{53} \times 100 = 111.3$
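A short Python check of the simple aggregative formula with the data above; the function name is illustrative.

```python
def simple_aggregative_index(base_prices, current_prices):
    """Simple aggregative price index: sum(p1) / sum(p0) * 100."""
    return sum(current_prices) / sum(base_prices) * 100

p0 = [5, 8, 12, 25, 3]   # 2008 (base year)
p1 = [7, 9, 15, 24, 4]   # 2010 (current year)
print(round(simple_aggregative_index(p0, p1), 1))  # 111.3
```

Because the method simply adds up prices, commodities quoted at high prices (or in large units) dominate the result; this is the main weakness of the simple aggregative method.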
2) Simple Average of Relatives Method
In this method, the price relative of each item is found first:
Price relative, $I = \dfrac{p_1}{p_0} \times 100$ (current-year price ÷ base-year price × 100)
The average of these relatives is then found, i.e. price index number $= \dfrac{\sum I}{n}$, where n is the number of items.
Ex: For the data given below, calculate the simple index number by the average of relatives method
Items | 1 | 2 | 3 | 4 | 5
Price in base year | 5 | 10 | 15 | 20 | 8
Price in current year | 7 | 12 | 25 | 18 | 9
Solution:
Items | Price in base year (p0) | Price in current year (p1) | I = (p1/p0) × 100
1 | 5 | 7 | 140.0
2 | 10 | 12 | 120.0
3 | 15 | 25 | 166.7
4 | 20 | 18 | 90.0
5 | 8 | 9 | 112.5
Total | | | ∑I = 629.2
Simple index number $= \dfrac{\sum I}{n} = \dfrac{629.2}{5} = 125.84$
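The same calculation in Python (the function name is illustrative). Note that the sketch averages the unrounded relatives, so it gives 125.83; the 125.84 above comes from first rounding each relative to one decimal place.

```python
def simple_average_of_relatives(base_prices, current_prices):
    """Price index as the arithmetic mean of price relatives (p1/p0 * 100)."""
    relatives = [p1 / p0 * 100 for p0, p1 in zip(base_prices, current_prices)]
    return sum(relatives) / len(relatives)

p0 = [5, 10, 15, 20, 8]
p1 = [7, 12, 25, 18, 9]
print(round(simple_average_of_relatives(p0, p1), 2))  # 125.83
```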
3) Weighted Aggregative Method
In this method, weights are assigned to each item. The two well-known methods used for assigning weights are Laspeyres' method and Paasche's method.
Laspeyres' method: the base-year quantity is taken as the weight.
Laspeyres' index number $= \dfrac{\sum p_1 q_0}{\sum p_0 q_0} \times 100$
Paasche's method: the current-year quantity is taken as the weight.
Paasche's index number $= \dfrac{\sum p_1 q_1}{\sum p_0 q_1} \times 100$
Prof. Irving Fisher suggested a formula for the construction of index numbers that takes the geometric mean of the two:
Fisher's index number $= \sqrt{\dfrac{\sum p_1 q_0}{\sum p_0 q_0} \times \dfrac{\sum p_1 q_1}{\sum p_0 q_1}} \times 100$
Ex: Calculate (i) Laspeyres', (ii) Paasche's and (iii) Fisher's index numbers from the following data.
Commodity | Price 2009 | Price 2010 | Quantity 2009 | Quantity 2010
A | 0.80 | 0.70 | 10 | 11.0
B | 0.85 | 0.90 | 8 | 9.0
C | 1.30 | 0.80 | 5 | 5.5
Solution: Taking 2009 as the base year,
Commodity | p0 | p1 | q0 | q1 | p1q1 | p0q1 | p1q0 | p0q0
A | 0.80 | 0.70 | 10 | 11 | 7.7 | 8.80 | 7.0 | 8.00
B | 0.85 | 0.90 | 8 | 9 | 8.1 | 7.65 | 7.2 | 6.80
C | 1.30 | 0.80 | 5 | 5.5 | 4.4 | 7.15 | 4.0 | 6.50
Total | | | | | 20.2 | 23.6 | 18.2 | 21.3
Laspeyres' index number $= \dfrac{\sum p_1 q_0}{\sum p_0 q_0} \times 100 = \dfrac{18.2}{21.3} \times 100 = 85.45$
Paasche's index number $= \dfrac{\sum p_1 q_1}{\sum p_0 q_1} \times 100 = \dfrac{20.2}{23.6} \times 100 = 85.59$
Fisher's index number $= \sqrt{\dfrac{18.2}{21.3} \times \dfrac{20.2}{23.6}} \times 100 = 85.52$
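The three weighted aggregative indices can be computed together with a short Python sketch using the data of this example; the function name is illustrative. Fisher's index is computed as the geometric mean of the Laspeyres and Paasche indices, which is the same as the square-root formula above.

```python
import math

def weighted_indices(p0, p1, q0, q1):
    """Return (Laspeyres, Paasche, Fisher) price index numbers."""
    def total(prices, quantities):
        return sum(p * q for p, q in zip(prices, quantities))

    laspeyres = total(p1, q0) / total(p0, q0) * 100   # base-year weights
    paasche = total(p1, q1) / total(p0, q1) * 100     # current-year weights
    fisher = math.sqrt(laspeyres * paasche)           # geometric mean of the two
    return laspeyres, paasche, fisher

p0 = [0.80, 0.85, 1.30]   # 2009 prices
p1 = [0.70, 0.90, 0.80]   # 2010 prices
q0 = [10, 8, 5]           # 2009 quantities
q1 = [11, 9, 5.5]         # 2010 quantities
L, P, F = weighted_indices(p0, p1, q0, q1)
print(round(L, 2), round(P, 2), round(F, 2))  # 85.45 85.59 85.52
```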
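4) Weighted Average of Price Relatives Method
In this method, each price relative is multiplied by a weight that reflects the relative importance of the commodity; the weights are chosen by the investigator.
The formula is $\dfrac{\sum IV}{\sum V}$, where V is the weight and $I = \dfrac{p_1}{p_0} \times 100$.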
Ex: Calculate the index number of prices for 2009 on the basis of 2008
Commodity | Weight | Price (2008) | Price (2009)
A | 40 | 16 | 20
B | 25 | 40 | 60
C | 5 | 2 | 2
D | 20 | 5 | 6
E | 10 | 2 | 1
Ans:
Commodity | V | P0 | P1 | I | IV
A | 40 | 16 | 20 | 125 | 5000
B | 25 | 40 | 60 | 150 | 3750
C | 5 | 2 | 2 | 100 | 500
D | 20 | 5 | 6 | 120 | 2400
E | 10 | 2 | 1 | 50 | 500
Total | ∑V = 100 | | | | ∑IV = 12150
Index number for 2009 $= \dfrac{\sum IV}{\sum V} = \dfrac{12150}{100} = 121.5$
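A Python sketch of the weighted average of price relatives, using the weights and prices of this example (the function name is illustrative).

```python
def weighted_average_of_relatives(weights, base_prices, current_prices):
    """Price index = sum(I * V) / sum(V), with I = p1 / p0 * 100."""
    relatives = [p1 / p0 * 100 for p0, p1 in zip(base_prices, current_prices)]
    weighted = sum(i * v for i, v in zip(relatives, weights))
    return weighted / sum(weights)

V = [40, 25, 5, 20, 10]
p0 = [16, 40, 2, 5, 2]    # 2008 prices
p1 = [20, 60, 2, 6, 1]    # 2009 prices
print(weighted_average_of_relatives(V, p0, p1))  # 121.5
```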
Interpretation
Interpretation refers to the technique of drawing inferences from the collected facts and explaining the significance of those inferences after an analytical and experimental study. It is a search for the broader and more abstract meaning of the research findings. If the interpretation is not done very carefully, misleading conclusions may be drawn. The interpreter must be creative with ideas and must be free from bias and prejudice.
STATISTICAL INFERENCES: TESTING OF HYPOTHESIS USING DIFFERENT STATISTICAL METHODS
What is a hypothesis? A hypothesis is an assertion or conjecture about the parameter(s) of population distribution(s).
Some basic concepts:
Null Hypothesis: The null hypothesis (H0) is the hypothesis which is tested for possible rejection under the assumption that it is true.
For example, in the case of a single statistic, H0 will be that the sample statistic does not differ significantly from the hypothetical parameter value, and in the case of two statistics, H0 will be that the sample statistics do not differ significantly from each other.
Type I and Type II Errors: After applying a test, a decision is taken about the acceptance or rejection of the null hypothesis vis-à-vis the alternative hypothesis. There is always some possibility of committing an error in taking a decision about the hypothesis. These errors can be of two types.
Type I Error: Rejecting the null hypothesis H0 when it is true.
Type II Error: Accepting the null hypothesis H0 when it is false.
These two types of error can be better understood with an example in which a patient is given a medicine to cure some disease and his condition is scrutinised for some time. Take the null hypothesis to be that the medicine has a positive effect. It is possible that the medicine has a positive effect but it is considered to have no effect or an adverse effect; this is the first kind of error, or Type I error. On the contrary, if the medicine has an adverse effect but is considered to have a positive effect, it is called the second kind of error, or Type II error.
Level of Significance: It is the amount of risk of a Type I error which we are ready to tolerate in making a decision about H0. In other words, it is the tolerable probability of a Type I error. The level of significance is denoted by α and is conventionally chosen as 0.05 or 0.01. The 0.01 level is used for high precision and the 0.05 level for moderate precision.
Test of significance for single mean:
This section shows how to test
the null
hypothesis that the population mean
is equal to some hypothesized value. For example, suppose an experimenter
wanted to know if people are influenced by a subliminal message and performed
the following experiment. Each of nine subjects is presented with a series of
100 pairs of pictures. As a pair of pictures is presented, a subliminal message
is presented suggesting the picture that the subject should choose. The
question is whether the (population) mean number of times the suggested picture
is chosen is equal to 50. In other words, the null hypothesis is that the
population mean (μ) is 50. The (hypothetical) data are shown in Table 1. The
data in Table 1 have a sample mean (M) of 51. Thus the sample mean differs from
the hypothesized population mean by 1.
Table 1. Distribution of scores.
Frequency | 45 | 48 | 49 | 49 | 51 | 52 | 53 | 55 | 57
The significance test consists of computing the probability of a sample mean differing from μ by one (the difference between the hypothesized population mean and the sample mean) or more. The first step is to determine the sampling distribution of the mean. As shown in a previous section, the mean and standard deviation of the sampling distribution of the mean are
$\mu_M = \mu$ and $\sigma_M = \dfrac{\sigma}{\sqrt{N}}$
respectively. It is clear that μM = 50. In order to compute the standard deviation of the sampling distribution of the mean, we have to know the population standard deviation (σ).
The current
example was constructed to be one of the few instances in which the standard
deviation is known. In practice, it is very unlikely that you would know σ and
therefore you would use s, the sample estimate of σ. However, it is instructive
to see how the probability is computed if σ is known before proceeding to see
how it is calculated when σ is estimated.
For the current example, if the null hypothesis is true, then based on the binomial distribution, one can compute that the variance of the number correct is
$\sigma^2 = N\pi(1 - \pi) = 100(0.5)(1 - 0.5) = 25$.
Therefore, σ = 5. For a σ of 5 and an N of 9, the standard deviation of the sampling distribution of the mean is 5/3 = 1.667. Recall that the standard deviation of a sampling distribution is called the standard error.
To recap,
we wish to know the probability of obtaining a sample mean of 51 or more when
the sampling distribution of the mean has a mean of 50 and a standard deviation
of 1.667. To compute this probability, we will make the assumption that the
sampling distribution of the mean is normally distributed. We can then use
the normal distribution calculator as shown in Figure 1.
Figure 1. Probability of a sample mean being 51 or greater.
Notice that the mean is set to
50, the standard deviation to 1.667, and the area above 51 is requested and
shown to be 0.274.
Therefore,
the probability of obtaining a sample mean of 51 or larger is 0.274. Since a
mean of 51 or higher is not unlikely under the assumption that the subliminal
message has no effect, the effect is not significant and the null hypothesis is
not rejected.
The test
conducted above was a one-tailed test because it computed the probability of a
sample mean being one or more points higher than the hypothesized mean of 50
and the area computed was the area above 51.
To test the two-tailed hypothesis, you would compute the probability of
a sample mean differing by one or more in either direction from the
hypothesized mean of 50. You would do so by computing the probability of a mean
being less than or equal to 49 or greater than or equal to 51.
Figure 2. Probability of a sample mean being less than or equal to 49 or greater than or equal to 51.
As you can see, the
probability is 0.548 which, as expected, is twice the probability of 0.274
shown in Figure 1.
Before normal calculators such as the one illustrated above were widely available, probability calculations were made based on the standard normal distribution. This was done by computing Z based on the formula
$Z = \dfrac{M - \mu}{\sigma_M}$
where Z is the value on the standard normal distribution, M is the sample mean, μ is the hypothesized value of the mean, and σM is the standard error of the mean. For this example, Z = (51 − 50)/1.667 = 0.60. Use the normal calculator, with a mean of 0 and a standard deviation of 1, as shown below.
Figure 3. Calculation using the standardized normal distribution.
Notice that the probability
(the shaded area) is the same as previously calculated (for the one-tailed
test).
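The one-tailed and two-tailed probabilities in this example can be reproduced without a calculator using Python's `math.erfc`, since P(Z ≥ z) = ½·erfc(z/√2) for a standard normal variable. The function names are illustrative; the sketch assumes, as the text does, that σ is known and that the sampling distribution of the mean is normal.

```python
import math

def normal_upper_tail(z):
    """P(Z >= z) for a standard normal variable, via the complementary error function."""
    return 0.5 * math.erfc(z / math.sqrt(2))

def z_test_mean(sample_mean, mu, sigma, n):
    """One-sample z test when the population SD (sigma) is known.

    Returns (z, one-tailed p for 'greater than', two-tailed p).
    """
    standard_error = sigma / math.sqrt(n)
    z = (sample_mean - mu) / standard_error
    one_tailed = normal_upper_tail(z)
    two_tailed = 2 * normal_upper_tail(abs(z))
    return z, one_tailed, two_tailed

# Subliminal-message example: M = 51, mu = 50, sigma = 5, N = 9
z, p_one, p_two = z_test_mean(51, 50, 5, 9)
print(round(z, 2), round(p_one, 4), round(p_two, 4))  # 0.6 0.2743 0.5485
```

The exact two-tailed probability is about 0.5485, which the text reports as 0.548.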
As noted,
in real-world data analyses it is very rare that you would know σ and wish to
estimate μ. Typically σ is not known and is estimated in a sample by s, and σM is
estimated by sM. For our next example, we will consider the data in
the "ADHD Treatment" case study. These data consist of the scores of 24 children with
ADHD on a delay of gratification (DOG) task. Each child was tested under four
dosage levels. Table 2 shows the data for the placebo (0 mg) and highest dosage
level (0.6 mg) of methylphenidate. Of particular interest here is the column
labeled "Diff" that shows the difference in performance between the
0.6 mg (D60) and the 0 mg (D0) conditions. These difference scores are positive
for children who performed better in the 0.6 mg condition than in the control
condition and negative for those who scored better in the control condition. If
methylphenidate has a positive effect, then the mean difference score in the
population will be positive. The null hypothesis is that the mean difference
score in the population is 0.
Table 2. DOG scores as a function of dosage.
D0 | D60 | Diff
57 | 62 | 5
27 | 49 | 22
32 | 30 | -2
31 | 34 | 3
34 | 38 | 4
38 | 36 | -2
71 | 77 | 6
33 | 51 | 18
34 | 45 | 11
53 | 42 | -11
36 | 43 | 7
42 | 57 | 15
26 | 36 | 10
52 | 58 | 6
36 | 35 | -1
55 | 60 | 5
36 | 33 | -3
42 | 49 | 7
36 | 33 | -3
54 | 59 | 5
34 | 35 | 1
29 | 37 | 8
33 | 45 | 12
33 | 29 | -4
To test this null hypothesis, we compute t using a special case of the following formula:
$t = \dfrac{\text{statistic} - \text{hypothesized value}}{\text{estimated standard error of the statistic}}$
The special case of this formula applicable to testing a single mean is
$t = \dfrac{M - \mu}{s_M}$
where t is the value we compute for the significance test, M is the sample mean, μ is the hypothesized value of the population mean, and sM is the estimated standard error of the mean. Notice the similarity of this formula to the formula for Z.
In the
previous example, we assumed that the scores were normally distributed. In this
case, it is the population of difference scores that we assume to be normally
distributed.
The mean (M) of the N = 24 difference scores is 4.958, the hypothesized value of μ is 0, and the standard deviation (s) is 7.538. The estimate of the standard error of the mean is computed as:
$s_M = \dfrac{s}{\sqrt{N}} = \dfrac{7.538}{\sqrt{24}} = 1.54$
Therefore, t = 4.96/1.54 = 3.22. The probability value for t depends on the degrees of freedom. The number of degrees of freedom is equal to N − 1 = 23. A t distribution calculator shows that the probability of a t less than −3.22 or greater than 3.22 is only 0.0038. Therefore, if the drug had no effect, the probability of finding a difference between means as large as or larger than the difference found (in either direction) is very low. Therefore the null hypothesis that the population mean difference score is zero can be rejected. The conclusion is that the population mean for the drug condition is higher than the population mean for the placebo condition.
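The t test on the difference scores can be sketched in Python with only the standard library. The sketch computes s, sM and t as in the text; for the p-value it integrates the t density numerically with Simpson's rule, which is an implementation choice of this sketch rather than how a t distribution calculator necessarily works. All function names are illustrative.

```python
import math

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)

def two_tailed_p(t, df, steps=2000):
    """P(|T| >= |t|), integrating the density from 0 to |t| with Simpson's rule."""
    a, b = 0.0, abs(t)
    h = (b - a) / steps                      # steps must be even
    total = t_pdf(a, df) + t_pdf(b, df)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * t_pdf(a + k * h, df)
    central = total * h / 3                  # P(0 <= T <= |t|)
    return 1 - 2 * central

def one_sample_t(values, mu=0.0):
    """Return (t, df, two-tailed p) for H0: population mean == mu."""
    n = len(values)
    m = sum(values) / n
    s = math.sqrt(sum((x - m) ** 2 for x in values) / (n - 1))
    s_m = s / math.sqrt(n)                   # estimated standard error
    t = (m - mu) / s_m
    return t, n - 1, two_tailed_p(t, n - 1)

diff = [5, 22, -2, 3, 4, -2, 6, 18, 11, -11, 7, 15,
        10, 6, -1, 5, -3, 7, -3, 5, 1, 8, 12, -4]
t, df, p = one_sample_t(diff)
print(round(t, 2), df)  # 3.22 23
print(p)                # two-tailed probability
```

If SciPy is available, `scipy.stats.ttest_1samp(diff, 0)` can be used as a cross-check.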
Correlation
Introduction
Modern business requires managers to make professional decisions every day, which should depend upon predictions of future events. To make better use of forecasts, they rely on relationships (intuitive and calculated) between related events. If decision makers can determine the strength of the relationship that exists between variables, it can aid the decision-making process considerably.
Examples:
The relationship between the age of husband and age of wife, price of a
commodity and the amount demanded, heights and weights of a group of persons,
income and expenditure of a group of persons etc.
Meaning: The term correlation indicates the relationship between two variables such that, with a change in the values of one variable, the values of the other variable also change. According to Croxton and Cowden, "when the relationship is of a quantitative nature, the appropriate statistical tool for discovering and expressing it in a brief formula is known as correlation".
Uses of Correlation: The study of correlation is useful in practical life for the following reasons. With the help of correlation analysis, one can measure in one figure the degree of relationship that exists between variables. Once we establish that two variables are related, we can estimate one variable from the other with the help of regression analysis. Correlation study helps us in identifying factors which can stabilize a disturbed economic situation. Studies of the interrelationships between different variables are helpful tools in promoting research.
Scatter Diagram Method: The scatter diagram is the simplest method of studying the relationship between two variables. The simplest device for ascertaining whether two variables are related is to prepare a dot chart, with the horizontal axis representing one variable and the vertical axis representing the other. The diagram of dots so obtained is known as a scatter diagram. From the scatter diagram we can form a fairly good, though rough, idea about the relationship between two variables. Different patterns of scattered dots depict different types of correlation.
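Correlation is usually summarized in one figure by Karl Pearson's coefficient, r = Σ(x − x̄)(y − ȳ) / √[Σ(x − x̄)² · Σ(y − ȳ)²], which lies between −1 (perfect negative correlation) and +1 (perfect positive correlation). A minimal Python sketch with hypothetical paired data; the function name is illustrative.

```python
import math

def pearson_r(xs, ys):
    """Karl Pearson's coefficient of correlation for paired data."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

x = [1, 2, 3, 4, 5]      # hypothetical values of one variable
y = [2, 4, 5, 4, 5]      # hypothetical paired values of the other
print(round(pearson_r(x, y), 4))  # 0.7746
```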
INTERPRETATION OF CORRELATION:
REGRESSION MODELLING:
Regression is a method to mathematically formulate the relationship between variables that can in due course be used to estimate, interpolate and extrapolate. Suppose we want to estimate the weight of individuals, which is influenced by height, diet, workout, etc. Here, weight is the predicted variable, and height, diet and workout are predictor variables.
The predicted variable is a dependent variable in the sense that it depends on the predictors. Predictors are also called independent variables. Regression reveals to what extent the predicted variable is affected by the predictors; in other words, how much variation in the predicted variable results from a given variation in the predictors. The predicted variable is mathematically represented as Y. The predictor variables are represented as X1, X2, X3, etc. This mathematical relationship is often called the regression model.
Regression
is a branch of statistics. There are many types of regression. Regression is
commonly used for prediction and forecasting.
Regression Equation
Now that we know how the relationship between two variables is measured, we can develop a regression equation to forecast or predict the variable we desire. Below is the formula for a simple linear regression. Here "y" is the value we are trying to forecast, "b" is the slope of the regression line, "x" is the value of our independent variable, and "a" represents the y-intercept. The regression equation simply describes the relationship between the dependent variable (y) and the independent variable (x).
y = bx + a
The intercept, or "a," is the value of y (dependent variable) when the value of x (independent variable) is zero, and so is sometimes simply referred to as the 'constant.' For example, if a company's sales (y) are regressed on the change in GDP (x), the intercept is the level of sales the company would still make if there were no change in GDP. Linear regression attempts to estimate the line that best fits the data (a line of best fit), and the equation of that line is the regression equation.
Linear Regression
It is
one of the most widely known modeling technique. Linear regression is usually
among the first few topics which people pick while learning predictive
modeling. In this technique, the dependent variable is continuous,
independent variable(s) can be continuous or discrete, and nature of regression line is linear.
Linear
Regression establishes a relationship between dependent
variable (Y) and one or more independent variables
(X) using a best fit straight line (also
known as regression line).
It is represented by the equation Y = a + b*X + e, where a is the intercept, b
is the slope of the line and e is the error term. This equation can be used to
predict the value of the target variable from the given value(s) of the
predictor variable(s).
The difference between simple linear regression and multiple linear regression
is that multiple linear regression has more than one independent variable,
whereas simple linear regression has only one. Now, the question is “How do we
obtain the best-fit line?”.
How to obtain the best-fit line (values of a and b)?
This task can be easily accomplished with the least squares method, the most
common method for fitting a regression line. It calculates the best-fit line
for the observed data by minimizing the sum of the squares of the vertical
deviations from each data point to the line. Because the deviations are squared
before they are added, positive and negative deviations do not cancel each
other out.
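For simple linear regression the least squares estimates have a well-known closed form: b is the sum of the cross-deviations of x and y divided by the sum of the squared deviations of x, and a = mean(y) - b * mean(x). The Python sketch below, using only the standard library, computes them for five hypothetical data points and also computes R-square as 1 - (residual sum of squares / total sum of squares). The function names are illustrative and not taken from any particular library.

```python
def fit_least_squares(xs, ys):
    """Return (a, b) for y = a + b*x, minimising the sum of squared vertical deviations."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with at least two points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-deviations and sum of squared x-deviations.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    b = sxy / sxx                 # slope, equivalently cov(x, y) / var(x)
    a = mean_y - b * mean_x       # the fitted line passes through (mean_x, mean_y)
    return a, b

def r_squared(xs, ys, a, b):
    """R-square = 1 - (residual sum of squares / total sum of squares)."""
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1 - ss_res / ss_tot

# Five hypothetical data points.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
a, b = fit_least_squares(xs, ys)
print(a, b)                      # roughly 2.2 and 0.6
print(r_squared(xs, ys, a, b))   # roughly 0.6
```

For these points the fitted line is approximately y = 2.2 + 0.6x, and about 60% of the variation in y is explained by x.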
We can evaluate the model's performance using the R-square metric, the
proportion of the variance in the dependent variable that the model explains.
Logistic Regression
Logistic regression is used to find the probability of event = Success and
event = Failure. We should use logistic regression when the dependent variable
is binary (0/1, True/False, Yes/No) in nature. Here the model's output, the
probability p, ranges from 0 to 1, and it can be represented by the following
equations.
odds = p / (1 - p) = probability of the event occurring / probability of the event not occurring
ln(odds) = ln(p / (1 - p))
logit(p) = ln(p / (1 - p)) = b0 + b1X1 + b2X2 + b3X3 + ... + bkXk
Above, p is the probability of the presence of the characteristic of interest.
A question you should ask here is “Why have we used a log in the equation?”.
Since the dependent variable here follows a binomial distribution, we need to
choose a link function that is best suited to this distribution, and that is
the logit function. In the equation above, the parameters are chosen to
maximize the likelihood of observing the sample values, rather than to minimize
the sum of squared errors (as in ordinary regression).
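To make the maximum likelihood idea concrete, here is a minimal Python sketch for a single predictor. It fits b0 and b1 in logit(p) = b0 + b1X1 by plain gradient ascent on the average log-likelihood. Gradient ascent is a simple stand-in chosen for readability; statistical software usually uses Newton-type methods such as iteratively reweighted least squares. The data (hours studied and pass/fail), the learning rate and the number of steps are all hypothetical.

```python
import math

def sigmoid(z):
    """Inverse of the logit: turns log-odds z back into a probability p."""
    return 1.0 / (1.0 + math.exp(-z))

def logit(p):
    """ln(p / (1 - p)), the log of the odds."""
    return math.log(p / (1.0 - p))

def log_likelihood(xs, ys, b0, b1):
    """Log-likelihood of the 0/1 outcomes ys when p = sigmoid(b0 + b1 * x)."""
    total = 0.0
    for x, y in zip(xs, ys):
        p = sigmoid(b0 + b1 * x)
        total += y * math.log(p) + (1 - y) * math.log(1 - p)
    return total

def fit_logistic(xs, ys, lr=0.1, steps=10000):
    """Pick b0, b1 that maximise the log-likelihood, by plain gradient ascent."""
    b0, b1 = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        # Gradient of the average log-likelihood with respect to b0 and b1.
        g0 = g1 = 0.0
        for x, y in zip(xs, ys):
            err = y - sigmoid(b0 + b1 * x)
            g0 += err
            g1 += err * x
        b0 += lr * g0 / n
        b1 += lr * g1 / n
    return b0, b1

# Hypothetical data: hours studied (x) and whether the student passed (1) or not (0).
hours = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0, 4.5, 5.0]
passed = [0, 0, 0, 1, 0, 1, 0, 1, 1, 1]

b0, b1 = fit_logistic(hours, passed)
p = sigmoid(b0 + b1 * 3.0)        # estimated probability of passing at x = 3.0
print(b0, b1)
print(p, p / (1 - p))             # probability and odds at x = 3.0
print(logit(p), b0 + b1 * 3.0)    # the logit of p recovers b0 + b1 * x
```

Because the fitted b1 is positive here, the estimated probability of success rises with x, and exp(b1) is the factor by which the odds change for each one-unit increase in x. Note that the outcomes overlap (some passes occur below some failures); on perfectly separated data the maximum likelihood estimates would not be finite.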
Important Points:
- Logistic regression is widely used for classification problems.
- Logistic regression doesn't require a linear relationship between the
dependent and independent variables. It can handle various types of
relationships because it applies a non-linear log transformation to the
predicted odds ratio. It does, however, assume a linear relationship between
the independent variables and the log odds.
- To avoid overfitting and underfitting, we should include all significant
variables. A good way to ensure this is to use a stepwise method to estimate
the logistic regression.
- It requires large sample sizes, because maximum likelihood estimates are less
powerful than ordinary least squares estimates at small sample sizes.
- The independent variables should not be correlated with each other, i.e.
there should be no multicollinearity. However, we have the option to include
interaction effects of categorical variables in the analysis and in the model.
- If the dependent variable is ordinal, the model is called ordinal logistic
regression.
- If the dependent variable has more than two classes, the model is known as
multinomial logistic regression.
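One simple, partial check for the multicollinearity point in the list above is to compute the Pearson correlation between every pair of independent variables and flag pairs with a large absolute value. The sketch below assumes a threshold of 0.8, which is only a common rule of thumb, and uses made-up predictor data; the function names are hypothetical. Pairwise correlations can miss collinearity that involves three or more variables together, which measures such as the variance inflation factor are designed to catch.

```python
import math
from itertools import combinations

def pearson(u, v):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    su = math.sqrt(sum((a - mu) ** 2 for a in u))
    sv = math.sqrt(sum((b - mv) ** 2 for b in v))
    return cov / (su * sv)

def flag_correlated(predictors, threshold=0.8):
    """Return (name1, name2, r) for every predictor pair with |r| above the threshold."""
    flagged = []
    for (n1, c1), (n2, c2) in combinations(predictors.items(), 2):
        r = pearson(c1, c2)
        if abs(r) > threshold:
            flagged.append((n1, n2, r))
    return flagged

# Made-up values for three hypothetical independent variables.
predictors = {
    "income":   [30, 45, 50, 62, 80],
    "spending": [28, 40, 47, 60, 75],
    "age":      [25, 52, 33, 41, 29],
}
for name1, name2, r in flag_correlated(predictors):
    print(f"{name1} and {name2} are highly correlated (r = {r:.3f})")
```

With this data only the income and spending pair is flagged, so one of those two variables would be a candidate to drop or combine before fitting the model.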