Rafee welcomes you: Research Methodology Notes

RESEARCH MEANING

Research is a serious academic activity with a set of objectives: to explain, analyse or understand a problem, or to find solution(s) to the problem(s), by adopting a systematic approach to collecting, organizing and analyzing the information relating to the problem.

Research – Definition

“Research may be defined as the systematic and objective analysis and recording of controlled observations that may lead to the development of generalizations, principles or theories, resulting in prediction and possibly ultimate control of events.”

Sometimes research is defined as a movement, a movement from the known to the unknown. It is an effort to discover something. Some people say that research is an effort to know “more and more about less and less”.

According to CLIFFORD WOODY, research comprises defining and redefining problems; formulating hypotheses or suggested solutions; collecting, organizing and evaluating data; making deductions and reaching conclusions; and at last carefully testing the conclusions to determine whether they fit the formulated hypothesis.

Research may also be defined as “any organized enquiry designed and carried out to provide information for solving a problem”.

OBJECTIVES OF RESEARCH:

Research is a conscious attempt, by applying scientific procedures, to find out the truth which is hidden and has not yet been discovered. Each research study therefore has its own focus, which is stated in terms of the objectives (or) purposes of conducting the research. Objectives act as guide points so that the researcher does not lose focus. It is also believed that the objectives determine the nature of the data to be compiled, the scope of collection, the target group, the sample size and several other crucial aspects which ultimately decide the success or failure, and the adequacy or inadequacy, of the research. The objectives of research are explained below:

It develops familiarity: Research may aim to become familiar with some phenomenon or to know it in more depth. For example, since the days of the steam engine, research has continued to come up with more powerful locomotives which can be operated with alternative sources of energy such as diesel and electricity.

It reveals characteristics: Another type of research objective is to reveal clearly the characteristics of an individual, a situation or a group such as a society. For example, nowadays, before a criminal is sentenced, efforts are taken to study why he turned criminal. This helps to develop an approach that creates opportunities for criminals to change themselves and join the mainstream of life.

It determines frequency of occurrence: To determine the frequency with which something occurs or with which it is associated with something else. In social research, one of the major areas of repeated and continuous research is the analysis of poverty and unemployment.

It tests hypothesis: To test a hypothesis about the causal relationship between the variables being studied. This type of research mainly determines the relationship between various factors so that necessary policy options can be framed. For example, the reasons for several malpractices adopted in public distribution outlets include the low salary of the staff in such outlets and the absence of regularization of their services. This in turn makes them feel insecure, and they resort to malpractices. Having found this, the Govt. adopted a policy to improve the salary structure of these staff and regularize their services. Hence the study of causal relationships may help in the formulation of policies.

Criteria of Good Research (characteristics)

* Research is half complete when its objectives or purposes are clearly spelt out.

* Every step followed in the process of research must be explained fully, so that any other person who wants to repeat the work, either to achieve further improvement or to test the validity of the research, is able to do so.

* The research design adopted for the study should be clear and should match the objectives.

* The researcher should be honest in reporting the facts and in revealing the flaws in the work.

* Every research work should be based on carefully selected analytical tools.

* The research work is incomplete without acknowledging the sources of the various data (or) facts used.

* Limitations should be frankly revealed

CLASSIFICATION OF RESEARCH

FUNDAMENTAL (OR) BASIC RESEARCH:

Pure or basic research is a search for broad principles and synthesis without any immediate utilization objective. It is not concerned with solving practical problems of policy, but with designing and fashioning tools of analysis and with discovering underlying and, if possible, universal laws and theories.

E.g., Joan Robinson's theory of imperfect competition and Chamberlin's theory of monopolistic competition.

Applied (or) Action Research:

Applied research, also known as action research, is associated with a particular project or problem. Being of practical value, such research may relate to a current activity or an immediate practical situation; it aims at finding a solution for an immediate problem facing a society. Practically all social science research undertaken in India is of the applied variety, and more particularly of the type which helps in the formulation of policy.

Descriptive Research:

It is designed to describe something, such as the demographic characteristics of consumers who use a product. It deals with determining the frequency with which something occurs or how two variables vary together. This study is also guided by an initial hypothesis. For example, an investigation of the trends in consumption of soft drinks in relation to socio-economic characteristics such as age, sex, ethnic group, family income, education level, geographic location and so on would be a descriptive study.
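The two tasks named above, counting how often something occurs and measuring how two variables vary together, can be sketched in a few lines of Python. This is a minimal illustration, assuming a handful of hypothetical survey responses (the figures are invented, not from any study); `pearson_r` is a hypothetical helper computing the Pearson correlation coefficient, and `Counter` from the standard library tabulates frequencies.

```python
from collections import Counter
from math import sqrt

# Hypothetical survey responses (invented for illustration):
# (age group, age in years, soft-drink servings per week)
responses = [
    ("18-25", 19, 9), ("18-25", 22, 7), ("18-25", 24, 8),
    ("26-40", 28, 5), ("26-40", 33, 4), ("26-40", 39, 4),
    ("41-60", 45, 2), ("41-60", 52, 3), ("41-60", 58, 1),
]


def pearson_r(xs, ys):
    """Pearson correlation coefficient: how two variables vary together (-1 to +1)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Frequency of occurrence: how many respondents fall in each age group
frequency = Counter(group for group, _, _ in responses)

# Co-variation: does consumption change as age changes?
ages = [age for _, age, _ in responses]
servings = [s for _, _, s in responses]
r = pearson_r(ages, servings)

print(dict(frequency))
print(f"r = {r:.2f}")
```

A value of `r` close to -1 would indicate that, in this made-up sample, consumption falls as age rises. Correlation alone describes co-variation; it does not establish cause.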

Merits:

* This approach helps to test the conclusions and findings arrived at on the basis of laboratory studies. By using this approach, it is possible to substantiate existing theories and conclusions or to modify them.

* Direct contact between the researcher and the respondent is brought about in this approach. This is very significant because the researcher is able to understand the problem to be studied clearly for himself.

* With the possibility of direct contact with the respondent, the researcher is able to elicit all the relevant information and eliminate irrelevant facts.

Limitations:

* Unless the researcher is experienced, there is every possibility of the approach being misused. Hurried conclusions and generalizations may be formed based on inaccurate field data.

* As this approach involves the collection of field data, enormous time and effort are required to plan and execute the field survey.

* This approach also involves incurring heavy costs on data collection.

* Unless the respondents are co-operative, it is not possible to collect data through this approach.

HISTORICAL RESEARCH:

As the name suggests, this approach gives importance to historical data for undertaking the analysis and interpreting the results. Following this approach, a researcher collects past data for his research. A scholar using this approach has to depend on libraries for referring to magazines or periodicals to collect data.

Merits:

• This approach alone is relevant in certain types of research work. For example, to understand the trend in India's exports, one has to collect the export data for a period of, say, 20 years and then analyze it. Similarly, to study the impact of the liberalization policy, one has to collect information from 1991 to date.

• The historical approach makes research possible, as it is firmly believed that once we understand the past, our understanding of the present improves and the future can be predicted to some extent. Hence historical research provides insight into the past and facilitates looking into the future.
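One simple way to summarise the kind of multi-year trend described above is the compound annual growth rate (CAGR), the constant yearly rate that links the first and last values of a series. The sketch below is an illustration only: the export figures are hypothetical round numbers in arbitrary units (not actual Indian data), and `cagr` is a hypothetical helper name. CAGR is only one possible trend measure; fitting a trend line is another.

```python
# Hypothetical export values (arbitrary units, invented for illustration),
# keyed by year, covering a 20-year period.
exports = {2000: 100.0, 2005: 130.0, 2010: 180.0, 2015: 210.0, 2020: 300.0}


def cagr(start_value, end_value, years):
    """Compound annual growth rate: the constant yearly rate linking two values."""
    return (end_value / start_value) ** (1 / years) - 1


first_year, last_year = min(exports), max(exports)
growth = cagr(exports[first_year], exports[last_year], last_year - first_year)
print(f"{first_year}-{last_year}: {growth:.2%} per year")
```

Because CAGR uses only the end points, it hides fluctuations in between, which is one reason the full historical series should still be examined.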

Limitations:

• The personal bias of the people who wrote about historical events or incidents may mislead the researcher.

• Researchers tend to overgeneralize their results when using the historical approach.

• Persons using this approach should be conscious of the fact that historical data can be taken to give an indication about the past, but formulating solutions on that basis and applying them in the current period is not correct.

EXPLORATORY RESEARCH:

Most marketing research projects begin with exploratory research. It is conducted to explore the possibilities of doing a particular project, and its major emphasis is on the discovery of ideas and insights. For example, a soft drinks firm might conduct an exploratory study to generate possible explanations. The exploratory study is used to split a broad and vague problem into smaller, more precise sub-problem statements in the form of specific hypotheses. An exploratory study is conducted in the following situations:

* To formulate a problem for investigation and to develop hypotheses.

* To determine the priorities for further research.

* To gather data about the practical problems for carrying out research on particular conjectural statements.

* To increase the interest of the analyst towards the problems and

* To explain the basic concepts.

Exploratory studies are flexible and highly informal; they follow no formal approach, do not employ detailed questionnaires and do not involve probability sampling plans. The usual methods of conducting exploratory research are:

* Literature Survey

* Experience Survey and

* Analysis of insight stimulating cases.

LITERATURE SURVEY:

The literature search is a fast and economical way for researchers to develop a better understanding of a problem area in which they have limited experience. A large volume of published and unpublished data is collected and scanned in a relatively short period of time. Sources generally include books, newspapers, Government documents, trade journals, professional journals and so on, available in libraries; company records, such as those kept for accounting and sales analysis purposes; and reports of previous research projects, which may have covered the problem incompletely but will be of great help in providing a direction for further research.

EXPERIENCE SURVEYS:

In this method, persons who have expert knowledge and ideas about the research subject are questioned. Generally the company executives, sales managers and other relevant people of the company, salesmen, wholesalers and retailers who handle the product or related products, and consumers are contacted. It does not involve a scientifically conducted statistical survey; rather, it reflects an attempt to get the available information from people who have some particular knowledge of the subject under investigation.

ANALYSIS OF INSIGHT-STIMULATING CASES (CASE STUDY APPROACH):

The case study approach to research is a recent development. In this approach the focus is on a single organization, unit, institution, district or community. As the focus is on a single unit, it is possible to undertake an in-depth analysis of that unit. It is basically a problem-solving approach. The following are the characteristics of the case study method.

Intensive study: It aims at a deep and thorough study of a unit. It deals with every aspect of the unit and studies it intensively.

The following steps are followed in a case study:

* Determination of Factors: First of all the collection of materials about each of the units or aspects is very essential. The determination of factors may be of two types, particular factors and General factors.

* Statement of the problem: In this process the defined problem is studied intensively and the data are classified into various classes.

* Analysis and conclusion: After classifying and studying the factors, an analysis is made and conclusions are drawn.

Advantages:

• As this approach involves a focused study, there is a lot of scope for generating new ideas and suggestions.

• It may provide the basis for developing sound hypotheses.

• As the researcher studies the problem from his own point of view, very useful and reliable findings may be obtained.

Limitations:

• A significant limitation of this approach is that unless the researcher is experienced he might ignore very important aspects.

• This approach also depends on the information furnished by the respondents; unless the information is accurate, the conclusions are bound to be unreliable.

• It is often said that case studies are based on the subjective observations of the researcher.

EXPERIMENTAL RESEARCH:

This is a very scientific approach. In this approach the researcher first determines the problem to be studied. Then he identifies the factors that cause the problem. The problem to be probed is quantified and taken as the dependent variable, and the factors causing the problem are taken as independent variables. The researcher then studies the causal relationship between the dependent and independent variables. He is also able to specify to what extent the dependent variable is influenced by each independent variable.

For example, suppose food production is taken as the problem for a research study. The scholar would then determine the factors that affect food production, viz. the size of the land cultivated (x), rainfall (y), the quantity of fertilizer applied (z), etc. These factors x, y and z are called independent variables, and food production (A) is called the dependent variable. By collecting data on all four (A, x, y and z), the researcher is able to state what percentage of the change in food production (A) is explained by x, y and z together. The effect of x on A, of y on A and of z on A is also studied. In this manner the researcher is able to indicate to what extent each of the factors included in the study is important.
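The procedure just described corresponds to a multiple linear regression. Below is a minimal sketch in plain Python, assuming a linear relationship A = b0 + b1·x + b2·y + b3·z. The data are synthetic, generated from known (assumed) coefficients plus a small fixed "noise" so that the fit can be checked, and `solve` and `ols` are hypothetical helper names. Each fitted coefficient estimates the effect of one factor on A, and R² estimates the share of the variation in A explained by x, y and z together.

```python
# Synthetic data for illustration only: x = land cultivated, y = rainfall,
# z = fertilizer applied. A is generated from an ASSUMED linear relationship
# (A = 5 + 2.0x + 0.1y + 0.5z) plus small fixed "noise", so the fit can be checked.
rows = [(1, 80, 20), (2, 95, 25), (3, 70, 30), (4, 110, 22), (5, 90, 40),
        (6, 120, 35), (7, 85, 45), (8, 100, 50), (9, 130, 38), (10, 105, 55)]
noise = [0.3, -0.2, 0.1, -0.4, 0.2, 0.0, -0.1, 0.3, -0.3, 0.1]
A = [5 + 2.0 * x + 0.1 * y + 0.5 * z + e for (x, y, z), e in zip(rows, noise)]


def solve(matrix, vector):
    """Solve matrix @ b = vector by Gaussian elimination with partial pivoting."""
    n = len(vector)
    aug = [list(row) + [v] for row, v in zip(matrix, vector)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    b = [0.0] * n
    for r in range(n - 1, -1, -1):  # back-substitution
        known = sum(aug[r][c] * b[c] for c in range(r + 1, n))
        b[r] = (aug[r][n] - known) / aug[r][r]
    return b


def ols(predictors, target):
    """Ordinary least squares via the normal equations; returns (coefficients, R^2)."""
    X = [[1.0] + list(p) for p in predictors]  # column of 1s for the intercept
    k = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * t for row, t in zip(X, target)) for i in range(k)]
    coef = solve(xtx, xty)
    fitted = [sum(c * v for c, v in zip(coef, row)) for row in X]
    mean_t = sum(target) / len(target)
    ss_res = sum((t - f) ** 2 for t, f in zip(target, fitted))
    ss_tot = sum((t - mean_t) ** 2 for t in target)
    return coef, 1 - ss_res / ss_tot


coef, r_squared = ols(rows, A)
b0, bx, by, bz = coef
print(f"A = {b0:.2f} + {bx:.2f}x + {by:.3f}y + {bz:.2f}z, R^2 = {r_squared:.3f}")
```

With real field data the relationship need not be linear, and a statistics package would normally be used instead of hand-written helpers. The normal-equation approach is shown here only because it keeps every step visible.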

Merits of Experimental Approach (Research)

• This approach provides social scientists with a reliable method of observation under given conditions for evaluating various social programmes.

• This is one of the best methods of measuring the relationship between variables.

• This approach is logical and consistent, so the conclusions drawn from research based on it are well received.

• It helps to determine the cause – effect relationship very precisely and clearly.

• Following this approach researchers could indicate clearly the areas of future research

Limitations of Experimental Approach (Research)

• Unless a researcher is well experienced and trained in model building this approach cannot be easily followed.

• By relying more on models this approach may not add anything significant to knowledge

• A serious limitation of this approach is that it relies on sampling and the collection of data. Unless these are properly planned and executed, the outcome of the analysis will not be accurate.

DIAGNOSTIC STUDY:

This is similar to a descriptive study but with a different focus. It is directed towards discovering what is happening, why it is happening and what can be done about it. It aims at identifying the causes of a problem and the possible solutions for it.

A diagnostic study may also be concerned with discovering and testing whether certain variables are associated. E.g., are persons hailing from rural areas more suitable for manning rural branches of banks? Do more village voters than city voters vote for a particular party?
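Whether two classifications are associated, as in the village-versus-city voting question above, is commonly checked with a chi-square test of independence. The sketch below assumes a hypothetical 2×2 table of counts (invented for illustration) and uses `chi_square`, a hypothetical helper that computes Pearson's statistic without the Yates continuity correction. The result is compared with 3.841, the chi-square critical value for 1 degree of freedom at the 5% level.

```python
# Hypothetical 2x2 table of counts (invented for illustration):
#                 votes for party P   does not
observed = [[60, 40],   # village voters
            [45, 55]]   # city voters


def chi_square(table):
    """Pearson chi-square statistic for a contingency table (no Yates correction)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, obs in enumerate(row):
            expected = row_totals[i] * col_totals[j] / grand  # count if independent
            stat += (obs - expected) ** 2 / expected
    return stat


stat = chi_square(observed)
CRITICAL_5PCT_DF1 = 3.841  # (rows-1)*(cols-1) = 1 degree of freedom, 5% level
associated = stat > CRITICAL_5PCT_DF1
print(f"chi-square = {stat:.3f}, association at 5% level: {associated}")
```

For these made-up counts the statistic is about 4.51, which exceeds 3.841, so the hypothesis of no association would be rejected at the 5% level.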

EVALUATION STUDIES:

An evaluation study is one type of applied research. It is made for assessing the effectiveness of social or economic programmes implemented (e.g., a family planning scheme) or for assessing the impact of developmental projects (e.g., an irrigation project) on the development of the area. An evaluation study may be defined as “determination of the results attained by some activity (whether a programme, a drug or a therapy or an approach) designed to accomplish some valued goal or objective”.

ANALYTICAL STUDY:

An analytical study is a system of procedures and techniques of analysis applied to quantitative data. It may consist of a system of mathematical models or statistical techniques applicable to numerical data. Hence it is also known as the statistical method.

This method is extensively used in business and other fields in which quantitative numerical data are generated. It is used for measuring variables, comparing groups and examining associations between factors. Data may be collected from either primary or secondary sources.

SURVEY RESEARCH:

A survey is a fact-finding study. It is a method of research involving the collection of data directly from a population, or a sample thereof, at a particular time. It must not be confused with the merely clerical routine of gathering and tabulating figures; it requires expertise and careful analytical knowledge. The data may be analysed using simple or complex statistical techniques, depending upon the objectives of the study.

This type of research has the advantage of greater scope, in the sense that a larger volume of information can be collected from a very large population.

OTHER TYPES

Ex-post Facto Research:

Ex-post facto research is based on observation made through an inquiry in which the researcher does not have direct control over the independent variables, because their outcome has already occurred. This kind of research is based on a scientific and analytical examination of dependent and independent variables. Ex-post facto research findings can be made risky by improper interpretation.

Panel Research:

Generally, survey research is valid for one time period, known as the "study period", and it does not reflect changes occurring over time. Consumer attitudes towards purchasing a particular product are not static but keep changing. For example, a one-time survey cannot study the changes occurring in these attitudes over a period in response to changes in the particular product's marketing mix. Measuring change over time is known as longitudinal analysis, which is done by the use of panels. This method is generally used in sales forecasting, in measuring consumer preferences for various products, in measuring audience size and characteristics for media programmes, and in testing new products.
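Because a panel questions the same respondents repeatedly, change can be measured at the level of each individual instead of only in totals. A common way to display this is a turnover table that cross-tabulates each respondent's answer in one wave against the next. The sketch below is a minimal illustration that assumes a hypothetical six-member panel asked for its preferred brand in two waves. The respondent IDs and brands are invented.

```python
from collections import Counter

# Hypothetical panel (invented for illustration): the same respondents
# report their preferred brand in wave 1 and wave 2.
panel = {
    "R1": ("A", "A"), "R2": ("A", "B"), "R3": ("B", "B"),
    "R4": ("B", "A"), "R5": ("A", "A"), "R6": ("C", "B"),
}

# Turnover table: count of each (wave-1 brand, wave-2 brand) combination
turnover = Counter(panel.values())

# Respondents whose preference changed between the two waves
switchers = sum(count for (before, after), count in turnover.items() if before != after)
share_switched = switchers / len(panel)

print(dict(turnover))
print(f"{share_switched:.0%} of panel members changed preference")
```

Two separate one-time surveys could show the same brand totals while hiding this switching, which is the advantage a panel offers for longitudinal analysis.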

Advantages:

· It captures changes over time.

· It provides more control.

· It obtains greater co-operation from respondents.

· It yields more data for analysis from each respondent.

TYPES OF RESEARCH

RESEARCH PROCESS

Research is a process. A process is a set of activities that are performed to achieve a targeted outcome; that is, a process involves a number of activities which are carried out either sequentially or simultaneously. So the research process refers to the various steps and stages involved in research activity. The various stages are listed below:

* Formulating the Research problem

* Extensive literature survey

* Developing the hypothesis

* Preparing the research design

* Determining the sample design

* Collecting the data

* Analysis of data

* Hypothesis testing and

* Preparation of report

Formulating the Research Problem:

In the research process, the first and foremost step is selecting and defining a research problem. A researcher should first find the problem and then formulate it so that it becomes susceptible to research. To define a problem correctly, a researcher must know what a problem is. What is a research problem? A problem can be called a research problem if it satisfies the following conditions:

• It must be worth studying

• The study of the problem must be socially useful

• It should be a problem untouched by other researchers or, if already touched, one that still needs further research.

• Research on the problem should be able to come up with solutions to the issue.

• It should be up to date and relevant to the current social happenings.

• All the special terms that are used in the statement of the problem should be clearly defined.

In selecting the problem, the researcher should take the following factors into consideration:

* Researcher's interest

* Topic of significance

* Researcher's resources

* Time availability

* Availability of data

* Feasibility of the study

* Benefits of the research

Review of Literature:

After defining the problem, the researcher should undertake an extensive literature survey connected with the problem. In this context he can refer to previous studies, magazines, journals, published dissertations, academic journals, etc. In this process, it should be remembered that one source will lead to another. The earlier studies, if any, which are similar to the study in hand should be carefully studied.

Developing the Hypothesis:

This is the stage that follows the literature review. Here the researcher should state the hypothesis in clear terms. A hypothesis is an assumption to be proved or disproved. A research hypothesis is a predictive statement, capable of being tested by scientific methods, that relates an independent variable to some dependent variable.

Features:

* It should be clear and precise

* It should be capable of being tested

* It should state the relation between variables

* It should be limited in scope and must be specific

* It should be stated in simple terms

Normally a hypothesis will be developed in the following ways:

* The researcher has to consult and deliberate with colleagues and experts about the problem.

* He has to examine the existing data, concerning the problem for possible trends and clues and

* He has to review studies on similar problems

Preparing the Research Design:

After developing the hypothesis, the researcher has to prepare a research design. A research design could be defined as the blueprint specifying every stage of action in the course of research. Such a design would indicate whether the course of action planned will minimize the use of resources and maximize the outcome. Research design is the arrangement of conditions for the collection and analysis of data in a manner that aims to combine relevance to the research purpose with economy in procedure.

Research design would answer the following questions.

* What is the study about?

* Why is the study being made?

* Where will the study be carried out?

* What type of data is required and where will it be collected?

* What is the period of study?

* Will a sample be used and, if so, what type of sample will be used?

* What tools will be used?

A good research design should possess the following features, although the qualities of a good research design differ from study to study:

* It should be flexible

* It should help to minimize bias at every stage

* It should facilitate the collection and analysis of data

* It should be closely linked with the objectives of the study

* It should be a plan that specifies the sources and type of information relevant to the research problem

* It should specifically mention the type of approach to the study

* It should also include the time and cost budget, since most studies are constrained by these two factors.

Broadly, there could be four different types of research design (contents of research design):

TYPES OF RESEARCH DESIGN


EXPLORATORY RESEARCH DESIGN:

This is also called formulative research design. It aims at formulating a problem for more precise investigation or at developing a hypothesis. Based on this, the subsequent stages of research can be planned. As this design is only formulative, it should be highly flexible. While applying this design, three different methods are followed:

Survey of related literature – by studying intensively the past studies and contributions relating to the field of study, the research problem could be easily formulated.

Conducting experience survey –this refers to undertaking collection of details and discussion with the experienced people in the chosen field of research. This would help the researcher to determine the extent to which he is original and can avoid duplication.

Analysis of insight-stimulating examples is yet another method, used depending upon the study on hand. In this method, the experience of people is used as a guide to develop or formulate a hypothesis.

DESCRIPTIVE AND DIAGNOSTIC RESEARCH DESIGN:

Descriptive research design is concerned with research studies that focus on portraying the characteristics of a group, an individual or a situation. The main objective of such studies is to acquire knowledge. For example, to identify the use of a product by various groups, a research study may be undertaken to find out whether the use varies with income, age, sex or any other characteristic of the population.

On the other hand, diagnostic studies aim at identifying the factors related to an existing problem. Based on the diagnosis, they also help to suggest methods to solve the problem. In this process they may also evaluate the effectiveness of the remedies already implemented.

EXPERIMENTAL RESEARCH DESIGN:

Experimental research studies are mainly focused on finding out the cause-and-effect relationship of the problem under study. When observation is arranged and controlled, it becomes an experimental study. An experiment is a test, trial, act or operation for the purpose of discovering something unknown or of testing a principle, supposition, etc. It is a process in which one or more variables are manipulated under conditions that permit the collection of data showing the effects of such variables in an unconfounded fashion.

Experimental design is broadly classified as a) informal experimental design and b) formal experimental design. The informal design includes the after-only design, the after-only with control design, the before-and-after without control design, the before-and-after with control design and the ex post facto design. The formal experimental design includes the completely randomized design, the randomized block design, the Latin square design and the factorial design.
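In the before-and-after with control design, the change observed in the control group is subtracted from the change in the experimental (test) group, so that influences other than the treatment, such as seasonal effects, are netted out. The arithmetic is sketched below, assuming hypothetical mean weekly sales in test and control markets (figures invented for illustration). `before_after_with_control` is a hypothetical helper name.

```python
# Hypothetical mean sales (units/week) in test and control markets, invented for illustration.
before_test, after_test = 200.0, 260.0        # markets exposed to the treatment
before_control, after_control = 210.0, 230.0  # comparable markets, no treatment


def before_after_with_control(bt, at, bc, ac):
    """Treatment effect = change in the test group minus change in the control group."""
    return (at - bt) - (ac - bc)


effect = before_after_with_control(before_test, after_test, before_control, after_control)
print(f"Estimated treatment effect: {effect:.1f} units/week")
```

Here the test markets rose by 60 units and the control markets by 20, so about 40 units per week is attributed to the treatment. This estimate relies on the control group being comparable to the test group.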

Sampling design: all the details connected with the sampling process from the determination of sample size down to the collection of data, would be spelt out.

Observational design: If the study makes use of an observational technique, the type of observation technique to be used and the conditions under which the observations will be made would be indicated.

Statistical design: This part of research design would spell out the type of analysis that would be carried out.

Operational design: This design would lay down the steps that would be taken at each stage as the design is executed.

Determining the Sample Design:

A sample, as the name implies, is a smaller representation of a larger whole. Simply speaking, the method of selecting a portion of the universe for study, with a view to drawing conclusions about the universe, is known as sampling.

The researcher must decide the way of selecting a sample, popularly known as the sample design. In other words, a sample design is a definite plan, determined before any data are actually collected, for obtaining a sample from a given population. Samples can be either probability samples or non-probability samples.
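The difference between probability and non-probability samples can be illustrated with Python's standard `random` module. This is a minimal sketch, assuming a hypothetical sampling frame of 500 households. The helper names are hypothetical. Simple random and systematic sampling give every unit a known chance of selection, whereas a convenience sample simply takes the units that are easiest to reach.

```python
import random

# Hypothetical sampling frame of 500 households (invented for illustration)
population = [f"household_{i:03d}" for i in range(1, 501)]


def simple_random_sample(frame, n, seed=None):
    """Probability sampling: every unit has the same chance of selection."""
    return random.Random(seed).sample(frame, n)


def systematic_sample(frame, n, seed=None):
    """Probability sampling: a random start, then every k-th unit of the frame."""
    k = len(frame) // n  # sampling interval
    start = random.Random(seed).randrange(k)
    return frame[start::k][:n]


def convenience_sample(frame, n):
    """Non-probability sampling: just the first n units that are easy to reach."""
    return frame[:n]


srs = simple_random_sample(population, 25, seed=1)
systematic = systematic_sample(population, 25, seed=1)
convenience = convenience_sample(population, 25)
print(srs[:3], systematic[:3], convenience[:3])
```

A fixed `seed` only makes the draw reproducible for checking; in practice the plan (frame, method, size) is fixed in advance, as the sample design requires.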

Collecting the Data:

Collection of data is an important stage in research; in fact, the quality of the data collected determines the quality of the research. A researcher has several ways of collecting the appropriate data, which differ considerably in terms of money, time and other resources. As per their sources, data may be classified as primary data and secondary data. Primary data are data collected for the first time through a field survey. Such data are collected with a specific set of objectives, to assess the current status of the variables studied. In the survey method, data can be collected in any one or more of the following ways:

* Observation Method

* Personal Interviews

* Telephone survey

* Questionnaires

* Schedules

Secondary data refers to information or facts that have already been collected. Such data are collected with the objective of understanding the past status of a variable.

Processing and analysis of Data:

Processing refers to subjecting the collected data to a process in which the accuracy, completeness, uniformity of entries and consistency of the information gathered are examined. Most commonly, processing is understood as the editing, coding, classification and tabulation of the data collected. After processing, the scholar explains the tools he has adopted for analyzing the data. He should select the tools of analysis by considering the objectives set for the study, examining the type of analysis required for accomplishing each objective. Based on this, he must explain the features of each tool and how it is applied.
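The four processing steps named above (editing, coding, classification and tabulation) can be sketched on a few questionnaire entries. This is a minimal illustration, assuming hypothetical raw responses and a hypothetical code book `CODES`. The field names, codes and age classes are invented.

```python
from collections import Counter

# Hypothetical raw questionnaire entries (invented for illustration)
raw = [
    {"id": 1, "age": 23, "satisfaction": "Satisfied"},
    {"id": 2, "age": None, "satisfaction": "satisfied "},
    {"id": 3, "age": 41, "satisfaction": "Dissatisfied"},
    {"id": 4, "age": 35, "satisfaction": " neutral"},
    {"id": 5, "age": 58, "satisfaction": "SATISFIED"},
]

CODES = {"satisfied": 1, "neutral": 2, "dissatisfied": 3}  # hypothetical code book

# Editing: drop incomplete entries and make the text of answers uniform
edited = [
    {**r, "satisfaction": r["satisfaction"].strip().lower()}
    for r in raw
    if r["age"] is not None and r["satisfaction"]
]

# Coding: replace each answer with its numeric code
coded = [{**r, "code": CODES[r["satisfaction"]]} for r in edited]


# Classification: assign each respondent to an age class
def age_class(age):
    return "below 40" if age < 40 else "40 and above"


# Tabulation: count respondents in each (age class, satisfaction code) cell
table = Counter((age_class(r["age"]), r["code"]) for r in coded)
print(dict(table))
```

Real editing rules (what counts as incomplete or inconsistent) depend on the questionnaire and should be fixed before the data are processed.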

Testing the Hypothesis:

After analyzing the data, the researcher tests the hypothesis. Various tests, such as the chi-square test, t-test and F-test, are used depending upon the nature and objective of the research. Hypothesis testing results in either rejecting the hypothesis or failing to reject (accepting) it.
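As an example of one of the tests named above, the sketch below computes a two-sample t-test with pooled variance, which assumes the two groups have equal variances. The scores are hypothetical, and `pooled_t` is a hypothetical helper built on the standard `statistics` module. With 10 observations per group there are 10 + 10 - 2 = 18 degrees of freedom, and the two-tailed 5% critical value of t is 2.101.

```python
from math import sqrt
from statistics import mean, variance

# Hypothetical scores (invented for illustration), e.g. output of workers
# trained by method 1 and by method 2.
group_1 = [52, 55, 48, 60, 57, 53, 58, 50, 56, 61]
group_2 = [47, 50, 45, 52, 49, 46, 51, 44, 48, 50]


def pooled_t(a, b):
    """Two-sample t statistic assuming equal variances (pooled variance)."""
    n1, n2 = len(a), len(b)
    sp2 = ((n1 - 1) * variance(a) + (n2 - 1) * variance(b)) / (n1 + n2 - 2)
    return (mean(a) - mean(b)) / sqrt(sp2 * (1 / n1 + 1 / n2))


t = pooled_t(group_1, group_2)
T_CRITICAL = 2.101  # two-tailed 5% critical value for 18 degrees of freedom
reject_null = abs(t) > T_CRITICAL
print(f"t = {t:.3f}; reject 'no difference in means' at 5%: {reject_null}")
```

For these made-up scores t is about 4.29, so the null hypothesis of equal means would be rejected. If the equal-variance assumption is doubtful, Welch's version of the t-test is the usual alternative.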

Preparation of the Report:

After the analysis and interpretation are over, the researcher has to prepare the report. The body of the report includes the introduction, review of literature, methodology, results and discussion, and summary and conclusions.

Relevance of Research in Decision Making in Various Functional Areas of Management

Generally, a manager has to take the course of action which is most effective in attaining the goals of the organization. Research provides facts and figures in support of such business decisions. It gives the manager a measuring rod to judge the effectiveness of each decision. This may be why executives and business professionals consider research and research findings a boon in their problem-solving process.

Any research on management will have the following general objectives:

• The objective of decision making

• The objective of controlling the managerial activities

• The objective of studying the economic and business environment

• The objective of studying the market

• The objective of studying new product development

• The objective of studying innovation

• The objective of studying customer satisfaction

Research helps management in the following ways:

* Research provides decision alternatives in decision making

* Research stimulates thinking, analysis, evaluation and interpretation of the business environment

* Research leads to innovation

* Research facilitates the development of new products and the modification of existing products

* Research easily locates the problem areas

* Research establishes relationships not only between variables within each functional area but also between the various functional areas

* Research facilitates business forecasting

* Market and marketing analysis may be based on research

* Research is an aid to the management information system, and

* Research helps to re-design corporate policy and strategy.

The functional areas of any business cover production, personnel, marketing, finance and organization. The scope of research in these areas is outlined below:

Research for Marketing decisions: New product development research – Research on brand equity and preference – Research on pricing strategies – Research on distribution channels – Research on salesman qualities and effectiveness – Research on media effectiveness – Research on marketing information system etc.

Research for Personnel decisions: Research on effectiveness of different sources of recruitment and training – Research on leadership style and effectiveness – Research on personnel information system etc.

Research for capital market decisions: Research on issues like climate, culture, creativity, change, design etc.

Research for Financial decisions: Research on cost of capital and capital structure – Research on working capital management – Research on inventory management etc.

Research on Business Strategies: Strategic alliances and divorces – Mergers and acquisitions – Disinvestment – Reorganizations – Reengineering etc.

To sum up research is an ingredient in all the functional areas of commerce and economics production and materials management extensively make use of research. However a close observation of management practices I India would determine whether research receives its due importance.

SAMPLING

Meaning of Sample:

A sample, as the name implies, is a smaller representation of a large whole. Simply speaking, the method of selecting a portion of the universe (total population) for study is known as sampling.

Sampling is not something that is followed only in statistics. It is used in everyday life. When rice is purchased at a provision store, a small quantity is initially purchased and tested; sometimes the small quantity is cooked, and if it is found good, the bulk is purchased. Similarly, when a patient has to undergo a blood test, the clinical laboratory takes a few drops, tests them and then gives the report. Sampling as a method is also used in research. By analyzing the sample data, the researcher gets some findings, which he uses for arriving at conclusions.

Essentials (features) of sampling:

Representativeness: The sample selected should fully represent the population from which it is drawn. This means all the characteristics or features of the population should be reflected by the sample.

Adequacy: The size of the sample should be large enough to provide accurate results. Though it is difficult to state the ideal size of a sample, it can be determined statistically.

Randomness: Samples should be selected at random. That is, there should be no bias in the selection of sample elements, and each item in the population should have an equal chance of being selected.

Homogeneity: Any number of samples could be drawn from a population, but all these samples should be similar in every respect. For example, suppose a researcher selects 500 people from Chennai city as a sample to study the consumer behaviour of its people; then all the sample elements should be people living in Chennai city. The sample should not include people who have come to Chennai city as tourists.

Merits of Sampling:

* The sampling method requires less time, as only a part of the universe is included for data collection.

* Since only a part of the universe is included for data collection, the cost incurred will also be less.

* By adopting a suitable method of sample selection, the results could be more reliable.

* The sampling method is frequently used for testing the accuracy of information collected through the census method.

Limitations of Sampling:

* Unless the sampling method is carefully applied, it may result in misleading findings.

* Use of sampling requires the services of experts and specialists. This in turn will add to costs.

* Sometimes, when the sample size itself is very large, the sampling method would also be time-consuming and costly.

* Apart from a detailed process to be followed, sampling also calls for the application of a number of tests to verify the findings and results. This makes the method more complex.

* While using sampling, the investigators have to be fully trained. This will add to the cost.

METHODS OF SAMPLING

Sampling methods can be broadly classified as 1. random or probability sampling and 2. non-random or non-probability sampling. Under the former, every element of the population enjoys an equal chance of being selected, while under the latter, the elements constituting the sample are selected on some predetermined basis.

For example, suppose that from 2000 students in a college, 200 are selected at random; then every one of these 2000 students has an equal chance of getting selected. On the other hand, in the case of non-random sampling, 200 students out of 2000 may be selected in such a way that 50 of them are pure science students. In this case the sample is purposively selected, so it is not a random sample.

Sampling Methods

1. Types of Random (or) Probability Sampling

(a) Simple (or) unrestricted sampling

i) Lottery Method: In this method all the items in the population are given numbers, and these are written on chits of uniform size. Then these chits are placed in a box or a bag, mixed well, and the required number of chits are drawn.

ii) Table of Random Numbers: In this method, first the size of the sample is determined. Then, using a random number table, the required number of items is selected to form the sample.

(b) Restricted Random Sampling:

(i) Stratified Random Sampling: Stratum means a layer. The population from which samples are to be selected may contain a number of layers (strata), and a few samples are selected from each layer. Suppose that, for a research work on the literacy level in Tamil Nadu, data is to be collected from all places in Tamil Nadu. Adopting stratified random sampling, first the state is divided into different districts, and a few districts are selected at random. Then those districts are divided into Panchayat Unions, and from this second stratum a few Panchayat Unions are selected. Each Panchayat Union is divided into Panchayats, and a few Panchayats are selected at random. Then, as each Panchayat contains a number of villages, a few villages are selected at random.
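The two simple random sampling methods can be sketched in Python as below. This is a minimal illustration, assuming Python's `random` module stands in for the physical chits and for a printed random number table; the roll-number format S0001 to S2000 and the seeds are hypothetical.

```python
import random

def lottery_sample(population, n, seed=None):
    """Lottery method: one chit per item, chits mixed, n chits drawn."""
    rng = random.Random(seed)
    chits = list(population)   # one chit of uniform size per item
    rng.shuffle(chits)         # mix the chits in the box
    return chits[:n]           # draw the required number of chits

def random_number_sample(population_size, n, seed=None):
    """Random-number method: n distinct serial numbers from 1..N, no repeats."""
    rng = random.Random(seed)
    return sorted(rng.sample(range(1, population_size + 1), n))

# Hypothetical roll numbers for a college of 2000 students
students = [f"S{i:04d}" for i in range(1, 2001)]
print(lottery_sample(students, 5, seed=1))
print(random_number_sample(2000, 5, seed=1))
```

Both functions sample without replacement, so no student can be picked twice, and every student has the same chance of selection.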
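A minimal Python sketch of stratified random sampling follows, assuming the population is a list of units each tagged with its stratum. It draws a fixed number of units at random from every layer, as the definition above says, so no stratum is left out. The district names, village labels and the choice of 3 units per stratum are hypothetical.

```python
import random
from collections import defaultdict

def stratified_sample(units, stratum_of, per_stratum, seed=None):
    """Draw `per_stratum` units at random from every stratum (layer).

    units       -- list of population units
    stratum_of  -- function returning the stratum of a unit
    per_stratum -- number of units to draw from each stratum
    """
    rng = random.Random(seed)
    strata = defaultdict(list)
    for u in units:
        strata[stratum_of(u)].append(u)        # group the population into layers
    sample = {}
    for name, members in strata.items():
        k = min(per_stratum, len(members))     # a small layer cannot give more than it has
        sample[name] = rng.sample(members, k)  # simple random sample within the layer
    return sample

# Hypothetical population: 30 villages spread over 3 districts
population = ([("District-A", f"V{i}") for i in range(1, 16)]
              + [("District-B", f"V{i}") for i in range(16, 26)]
              + [("District-C", f"V{i}") for i in range(26, 31)])
result = stratified_sample(population, stratum_of=lambda v: v[0], per_stratum=3, seed=7)
for district, chosen in result.items():
    print(district, [v[1] for v in chosen])
```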

Merits:

* It has better representativeness.

* It also gives more accurate information and there would be better coverage of the population.

Limitations:

* Requires lot of care and pre-planning

* A prior knowledge of the composition of the population is required.

* The method is very expensive in terms of time and money.

* Any bias in selection from each stratum will affect the accuracy of results.

(ii) Systematic Random Sampling: In this method the sample is formed by selecting the first unit at random and then selecting the remaining items at evenly spaced intervals.

For example, suppose we have to select a sample of 50 students from 2000 college students. First we determine the sampling interval (k), which is obtained by dividing the size of the population by the sample size: k = 2000/50 = 40. Then we select a serial number at random from 0001 to 0040. Suppose we have selected serial number 15; we then add 40 to it to get the next sample element, add 40 again for the one after that, and so on. So the sample will be 15, 55, 95, 135, and so on.
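The worked example above can be checked with a short Python sketch. It assumes integer division for the interval when N is not an exact multiple of n (a simplifying choice, not from the text); the function name and seed are hypothetical.

```python
import random

def systematic_sample(population_size, sample_size, start=None, seed=None):
    """Serial numbers chosen by systematic random sampling.

    k = N // n is the sampling interval; the first serial number is picked
    at random from 1..k and every k-th serial number after it is taken.
    """
    k = population_size // sample_size              # e.g. 2000 // 50 = 40
    if start is None:
        start = random.Random(seed).randint(1, k)   # random start between 1 and k
    return [start + i * k for i in range(sample_size)]

chosen = systematic_sample(2000, 50, start=15)
print(chosen[:4], "...", chosen[-1])   # [15, 55, 95, 135] ... 1975
```

Only the first serial number involves chance; all the others follow from it, which is why bias at the first step carries through the whole sample.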

Merits:

* It is very simple to adopt.

* The time and cost involved are relatively less

* With a large population, this method is easy to use

* Random selection of items is ensured once the sampling interval is determined.

Limitations:

It is less representative: once the first item is selected at random, the subsequent items all lie at uniform intervals, so the selected items may lack representativeness. The first item should be selected strictly at random; if there is bias at this first stage, it will influence the items selected at subsequent stages.

(iii) Multistage or Cluster Sampling: As the name suggests, in this method the samples are selected at different stages. Here the population is first divided into different stages, and samples are selected at random at each stage. The samples selected at the different stages will possess common characteristics or will be homogeneous on some basis.
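The stage-by-stage selection can be sketched in Python as below, assuming a hypothetical state stored as nested dictionaries (districts, then unions, then villages) and a chosen number of picks at each stage. It is an illustration only; practical multistage designs often weight units by size, which this sketch does not do.

```python
import random

def multistage_sample(tree, picks_per_stage, rng):
    """Select units stage by stage from a nested-dict hierarchy.

    tree            -- nested dicts; the innermost level is a list of final units
    picks_per_stage -- how many units to pick at random at each stage, outermost first
    """
    k = picks_per_stage[0]
    if isinstance(tree, list):                               # last stage: final units
        return rng.sample(tree, min(k, len(tree)))
    chosen = rng.sample(sorted(tree), min(k, len(tree)))     # pick a few groups at this stage
    sample = []
    for name in chosen:                                      # go one stage deeper in each chosen group
        sample.extend(multistage_sample(tree[name], picks_per_stage[1:], rng))
    return sample

# Hypothetical state: 4 districts, each with 3 unions of 5 villages
state = {
    f"District-{d}": {
        f"Union-{d}{u}": [f"Village-{d}{u}{v}" for v in range(1, 6)]
        for u in range(1, 4)
    }
    for d in "ABCD"
}
chosen_villages = multistage_sample(state, picks_per_stage=[2, 2, 3], rng=random.Random(11))
print(len(chosen_villages), chosen_villages[:3])
```

With 2 districts, 2 unions per district and 3 villages per union, the final sample always has 2 x 2 x 3 = 12 villages.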

Merits:

* It is highly flexible

* It ensures better representativeness

* This type of sampling is very useful either for formulating a policy or for evaluating an implemented policy.

* Easy to compute.

Limitations:

* In practice this method is found to be less accurate compared to other methods because bias at any stage will get accumulated.

* Unless a person is fully aware of the various stages into which the population can be divided, he cannot be effective in selecting the required number of samples.

NON-RANDOM SAMPLING OR NON-PROBABILITY SAMPLING

Non-random sampling or non-probability sampling refers to the sampling process in which the samples are selected for a specific purpose on a predetermined basis of selection. This type of sampling is also required at times when random selection may not be possible.

(a) Judgment Sampling: In this method the sample selection is purely based on the judgment of the researcher. This is because the researcher may lack information regarding the population from which he has to collect the sample. When the population characteristics are not known, the researcher can use this method. Once the sample size is determined, the investigator is free to select any item in the field.

For example, suppose 100 boys are to be selected from a college with 1000 boys. If nothing is known about the students in this college, the investigator may visit the college and choose the first 100 boys he meets, or he may select 100 boys all belonging to the III year, or he might select 50 boys from commerce and 50 from science.

(b) Convenience Sampling: This method of sampling involves selecting the sample elements using some convenient method, without going through the rigour of a formal sampling method.

For example, suppose 100 car owners are to be selected. We may collect the list of car owners from the RTO's office and then select 100 from that list to form the sample.

(c) Quota Sampling: In this method the sample size is determined first, and then a quota is fixed for the various categories of the population, which is followed while selecting the sample. Suppose we want to select 100 students and decide that the sample should be selected according to the quota given below.

Boys 50% and girls 50%; among the boys, 60% college students and 40% plus two students. A different or the same quota may be fixed for girls.
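The quota example can be sketched in Python as below. It assumes the same 60/40 split is used for girls (the text leaves this open) and uses a hypothetical stream of students in the order the investigator meets them. The sketch turns percentage quotas into head counts and then fills each cell with whoever turns up first, with no random draw, which is what makes quota sampling a non-probability method.

```python
from collections import Counter

def quota_targets(total, gender_pct, level_pct):
    """Convert percentage quotas into head counts for every (gender, level) cell."""
    return {
        (gender, level): total * g * l // 10000   # e.g. 100 * 50 * 60 // 10000 = 30
        for gender, g in gender_pct.items()
        for level, l in level_pct.items()
    }

def fill_quotas(stream, targets):
    """Accept people in the order they are met until every cell's quota is full."""
    filled, sample = Counter(), []
    for person in stream:
        cell = (person["gender"], person["level"])
        if filled[cell] < targets.get(cell, 0):   # this cell still has room
            filled[cell] += 1
            sample.append(person)
        if len(sample) == sum(targets.values()):  # all quotas met
            break
    return sample

# Quota from the text (same split assumed for girls)
targets = quota_targets(100, {"boy": 50, "girl": 50}, {"college": 60, "plus two": 40})

# Hypothetical stream of students met on campus, in arrival order
stream = [{"id": i,
           "gender": "boy" if i % 2 == 0 else "girl",
           "level": "college" if i % 3 else "plus two"} for i in range(400)]

sample = fill_quotas(stream, targets)
print(targets)       # {('boy', 'college'): 30, ('boy', 'plus two'): 20, ('girl', 'college'): 30, ('girl', 'plus two'): 20}
print(len(sample))   # 100
```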

SAMPLING ERRORS

Errors may be committed while using sampling. These errors are broadly classified as sampling errors and non-sampling errors.

(1) Biased Errors:

Biased errors arise from the influence of the investigator's likes and dislikes in the process of sampling. For example, an investigator may collect data only from a specific group of people. This may be because of the investigator's urge to complete the work early or a failure to understand the purpose of the survey. Such a mistake may result in the collection of wrong data, which eventually will result only in wrong conclusions or inferences about the population. The following are the reasons for biased errors.

Faulty process of selection: This refers to a situation where the investigator does not apply randomness in his choice or selection of the sample elements from the population.

Faulty collection of information: Adoption of a faulty method of collecting information may cause errors. This will happen if the scope of the study is not clear.

Faulty method of analysis: This will happen when the researcher does not know how to use the analytical tools properly.

(2) Unbiased Errors:

Non-sampling errors are those errors which are not due to the sampling process; they arise from several other causes. Such errors are mostly due to the following reasons:

* Investigators may collect data without using complete schedules or proper measurement. As a result, the data collected may not be relevant at all.

* A faulty method of interview or observation may also contribute to non-sampling errors.

* Use of untrained and unskilled investigators.

SAMPLE SIZE AND ITS DETERMINATION

What should the size of the sample be? How large should “n” be? When the size (n) is very small, the researcher may not achieve the objectives, and if it is too large, he may incur huge costs and waste resources. Generally, a sample must be of an optimum size, i.e., it should be neither too large nor too small. Normally the size should be large enough to give a confidence interval of the desired width, and as such the size of the sample must be chosen by some logical process. However, the researcher has to keep the following points in mind while deciding the size of the sample.

Nature of the Universe:

When the items of the universe are homogeneous, a small sample can serve the purpose; if they are heterogeneous, a large sample would be required.

Number of groups:

When a researcher forms class-groups, a large sample is necessary, as a small sample might not be able to give a reasonable number of items in each class-group.

Nature of study:

When the researcher examines the items very intensively and continuously, the sample should be small. A large sample may be preferred for a general survey, but a small sample is considered appropriate in technical surveys.

Sample Technique:

The researcher has to decide the sampling technique while determining the size of the sample. A small random sample is better than a larger but badly selected sample.

Accuracy and confidence level:

A researcher requires a large sample when the accuracy or the level of precision is to be kept high. For a fixed confidence level, doubling the precision (that is, halving the permissible error d in the formula below) requires the sample size to be increased fourfold.

Resources available:

The amount of time and financial resources available to the researcher will determine the size of the sample. With sufficient time and a large volume of funds, the sample size could be large; otherwise it should be small.

Miscellaneous factors:

In addition to the above considerations, the following points are to be considered by a researcher: the nature of units, the size of the population, the size of the questionnaire, the availability of trained investigators, the conditions under which the survey is being conducted, and the time available for completion of the study.

Sometimes a mathematical formula is used to determine the sample size. The formula is given below:

$$ n = \left( \frac{Z \sigma}{d} \right)^2 $$

where n is the sample size, Z is the standard normal value corresponding to the desired level of confidence (for example, 1.96 for 95% confidence), σ is the standard deviation of the population, and d is the permissible difference between the population mean and the sample mean (the margin of error).
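The formula can be evaluated with a small Python sketch. The figures (Z = 1.96, σ = 10, d = 2 and d = 1) are hypothetical, and rounding up to the next whole unit is an assumed convention so that the required precision is not undershot. The second call shows the fourfold rule mentioned earlier: halving d roughly quadruples n.

```python
import math

def sample_size(z, sigma, d):
    """Sample size n = (Z * sigma / d) ** 2, rounded up to a whole number of units."""
    return math.ceil((z * sigma / d) ** 2)

# Hypothetical figures: 95% confidence (Z = 1.96), population SD sigma = 10
n_wide = sample_size(1.96, 10, 2)     # margin of error d = 2 -> 96.04, rounded up
n_narrow = sample_size(1.96, 10, 1)   # d halved to 1 -> 384.16, rounded up
print(n_wide, n_narrow)               # 97 385
```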

COLLECTION OF DATA

Data refers to information or facts. Researchers often understand data to mean only numerical figures, but data also include non-numerical facts; that is, both qualitative and quantitative information. In a research study, once the data are available, the research is half complete. Data could be broadly classified as primary data and secondary data; these are also referred to as sources of data.

Primary Data:

Primary data are the data collected for the first time through a field survey. Such data are collected with a specific set of objectives to assess the current status of any variable studied. By survey methods, the data can be collected in any one or more of the following ways.

Questionnaire (or) Schedule:

In this method a pre-printed list of questions arranged in sequence is used to elicit responses from the respondent.

Interview:

This is a method in which the researcher and the respondent meet, and questions are raised, answered and recorded. This method is adopted when personal opinions or viewpoints are to be gathered as a part of the data.

Observation:

In this method the observer applies his sense organs to note down whatever he can observe in the field and relates these data to explain some phenomena.

Feed Back Form:

In the case of consumer goods, the supplier or the manufacturer sends the product along with a prepaid reply cover in which questions on the product and its usage are raised, and the customer is requested to fill it up and send it back. In this way, first-hand information about the product is obtained from the consuming public.

Sales Force opinion:

On several occasions the manufacturers or distributors collect information about the movement of the product, market size, market share, etc. through the sales force in the field.

Salesmen visit retailers' shops to note down the details of stock movement, availability of items, etc., which give valuable information.

Projective techniques:

This technique is adopted to study consumers through methods like advertisement recall tests, story completion tests, etc. Through this technique it is possible to compile information to be used as the basis for projecting the demand for the product at different points of time.

Collection through Mechanical Devices:

There are several shopping establishments where hidden video cameras are positioned at vantage points; these are used for observing the public inside the shop. Apart from helping to eliminate pilferage and theft, they provide very useful information on consumers and their product preferences.

CLASSIFICATION OF DATA

PRIMARY (DATA) SOURCES

1. OBSERVATION

Observation as a method of data collection is used very frequently whenever collection of data through other methods is difficult. For example, it is not always possible to conduct interviews with every person to collect the required information. There are occasions when no other method can be adopted for data collection. For instance, suppose a scholar wants to study the lifestyle of a hill tribe. It is certainly not possible to use a questionnaire, schedule or interview; the only alternative available is observation, as the respondents would not reply to any question, orally or in writing.

Observation may be defined as the “sensible application of sense organs in understanding less explained or unexplained phenomena”. Whenever a researcher is unable to compile information through any other method, he has to effectively apply his sense organs to observe and explain. So it may be said that observation involves recording information by applying visual understanding backed by alert sense organs.

Types of Observation:

Structured observation:

When observation takes place strictly in accordance with a plan or design prepared in advance, it is called structured observation. In this type, the observer decides in advance what to observe, what to focus on, what type of activity should be given importance, who is to be observed, etc.

Unstructured Observation:

In this type of observation there is no advance planning of the what, how, when, who, etc. of observation. The observer is given the freedom to decide on the spot and to observe everything that is relevant.

Participant Observation:

In this method the observer is present in the midst of what is observed. For example, suppose a researcher is studying the lifestyle of a hill tribe; he might understand the lifestyle of the tribe better only when he stays with them. He is a participant in the sense that he is physically present on the spot to observe, without influencing the activities.

Non-participant Observation:

This is a method in which the observer remains detached from whatever is happening around him and does not involve himself in any activities that take place. He is present only to observe and not to take part in the activities; the target audience is not aware of his presence at all. For example, policemen not in uniform are deputed on observation duty whenever a procession takes place.

Controlled Observation:

In this method the observer performs his work in an environment or situation which is planned (or) set up in advance. For example, sometimes, to test the effectiveness and alertness of an airport security system, a mock event (like a fire accident) is carried out. How the security staff react to such a mock event is then observed. Based on this, weaknesses in the system are noticed and steps are taken to eliminate them.

Merits of Observation Method of Data Collection:

* If observation is done correctly, the scope for bias is very much minimized.

* Through observation, the current scenario in which something is happening is noticed and explained; there is no interpretation of how things happened in the past or will happen in the future.

* As there is no need to get any reply or details from the respondents, observation does not require any co-operation from the respondents.

* This is a fairly reliable method, provided the observer is well experienced, trained and sincere.

* Whenever respondents are illiterate and incapable of answering any question (due to a language barrier, cultural background, etc.), observation is the only method of data collection available.

Limitations of Observation:

* This is a relatively costly method of data collection.

* What is observed may bring out only part of the facts, while data collected through a questionnaire or interview ensure better coverage.

* There is a lot of scope for the observer to get distracted or influenced by unexpected factors, which would affect the accuracy of the information collected.

How to make observation successful:

* First, the researcher should have a clear grasp of what he should observe and its purpose.

* The person should be trained in adopting observation.

* The person should avoid his personal likes and dislikes.

* He should be alert and intelligent.

* He should be able to connect all the things observed.

2. INTERVIEW

One of the oldest methods of collecting data is the interview method. The interview method involves direct or indirect meeting of the respondents by the researcher. The researcher determines the questions to be raised at the time of the interview and elicits responses to them. The replies given are either written down in a notebook or recorded on audio or video cassette. This method has to be adopted whenever details regarding any confidential matter are to be collected or the research requires data collection directly from the respondents.

Interviews may be broadly classified as 1. direct interviews and 2. indirect interviews.

Direct Interview:

In this type of interview, the interviewer and the interviewee meet personally, with or without a prior appointment. Usually, when this technique is adopted, the interviewer may brief the respondent about the purpose of the interview and its scope in advance. This enables the respondent to be ready with the necessary details (or) data. This type of interview may be classified as structured interviews, unstructured interviews, focused interviews, clinical interviews and non-directive interviews.

(A) Structured Interview:

In this type of interview the person collecting information decides in advance the nature, scope, questions to be asked, persons to be contacted, etc. At the time of the interview, no deviation is made from the questions decided upon. For example, it is usual for journalists to interview the Finance Minister after the presentation of the Budget. On such occasions, the journalist should be well prepared and decide in advance the questions to be asked, etc. Sometimes even the questions to be asked and other details have to be submitted to the authorities concerned before conducting the interview. The most important advantages of such interviews are given below.

* The interview is well prepared, and so it is conducted in a focused manner.

* Time of both the interviewer and respondents could be saved.

* There is no scope for irrelevant matter to find a place in the course of interview

* If the respondent is informed in advance, he can prepare the necessary details, so that the outcome is reliable.

But this method of interview has the following limitations:

* Since the subject matter is decided in advance there is no scope for extending the interview even in case of need.

* If the questions are submitted in advance, the respondent may be tempted to give wrong information.

* The interviewer needs to plan the interview properly; if the plan is not perfect, the interview findings may not be complete.

(B) Unstructured Interview:

In this type of interview, the interview is conducted on the spot without any preparation or advance information to the respondent. For example, suppose an organization producing a new health drink wants to know the opinion of the people directly. Then it might send trained field investigators who meet people at random and offer them a cup of the new drink. After they drink it, their opinion is asked and the responses are noted down or recorded. Such interviews are also conducted in opinion polls. For example, during election time, TV channels meet people moving around and ask them their opinion about different parties and the one for which they would vote.

(C) Focused Interview:

In this type of interview the object of the interviewer is to focus the attention of the respondent on a specific issue or point. For example, suppose a detective is questioning a person regarding a crime committed in an area. The detective has to be very experienced to make the person interviewed answer only about the crime and nothing else.

(D) Non-directive Interview:

In this type, the interviewer encourages the respondent to say whatever he likes and feels on a subject matter. There may not be much questioning taking place. The respondent is free to express his views or opinions without any direction from the interviewer. For example, during a college strike, an interviewer may encourage the students to say whatever they feel about the reasons for the strike.

(E) Telephone Interview:

This is basically a type of direct interview, but there is no physical presence of both parties to the interview. This method will be useful in the following situations:

* When the informant and interviewer are geographically separated.

* When the study requires responses to five or six simple questions, e.g. a radio or TV programme survey

* When the survey must be conducted in very short period of time, provided the units are listed in telephone directory.

This method of interview provides following advantages:

-More flexible

-It is the quickest way of obtaining information

-Lower cost

-Recall is easy

-The rate of response is higher than in the mailing method

-Replies can be recorded

-It does not require any field staff

This method suffers from the following limitations:

-The respondents' characteristics and environment cannot be observed

-It is not suitable for intensive surveys where comprehensive answers are required

-This method leaves out respondents who do not have telephone facilities

-This method does not give the respondents sufficient time to respond

3. QUESTIONNAIRE METHOD

A questionnaire consists of one or more sheets of paper containing questions relating to certain specific aspects regarding which the researcher collects data. The questionnaire is given to the informant or respondent to be filled up. Sometimes a questionnaire is also in the form of files generated through a computer; this is usually called a soft copy of the questionnaire. Generally, to test the reliability of the questionnaire, it should be tested on a limited scale; this is technically known as a pilot survey. The objective of a pilot survey is to filter out unnecessary questions and questions which are difficult to answer.

Mechanics of Questionnaire Construction / Designing a Questionnaire / Features of a Good Questionnaire

The following are the points to be given importance while designing a questionnaire:

* Questionnaire should be printed / Cyclostyled / Xeroxed

* The first part of the questionnaire should specify the object of the survey.

* The questionnaire should be constructed using simple language; technical terms and concepts should be avoided.

* Each question should be specific and clear.

* Personal questions on wealth, habits, etc. should be avoided

* Questions needing computation / calculation / consultation should be avoided

* Questions on sentiments / belief/ faith should be avoided

* Repetition of question should be eliminated

* Sufficient space should be given for answering questions

* If any diagram or map is used, it should be printed clearly

* Instructions regarding how to return the filled questionnaire must be given; it is desirable that a self-addressed, sufficiently stamped envelope is sent along with the questionnaire to enable the respondents to send back the filled-up questionnaire

TYPES OF QUESTIONS TO BE INCLUDED

Open-ended questions:

In these questions the respondents are given the freedom to express their views, as there is a wide range of choice. E.g.

“How would you describe the use of this soap?”

Closed questions:

These types of questions do not allow the respondents to give answers freely. E.g. “Would you describe the smell of this soap as attractive?”

Yes / No

Pictorial Questions:

In this type of question, pictures are drawn, and the respondent indicates the answer by selecting the picture he prefers.

Dichotomous questions:

In these questions two alternatives are given, a positive one and a negative one. E.g. “Do you own a watch?”

Yes No

Multiple choice questions:

These questions contain more than two alternatives, e.g. “Why have you preferred this brand of two-wheeler?”

-Price

-Fuel efficiency

-Comfort

-Others (please specify)

Types of questions to be avoided:

(a) Leading questions:

A leading question is one which makes it easier for the respondent to react in a certain way and is not natural. Examples of leading questions are:

“Are you against giving too much power to the trade unions?” “Don't you think that yesterday's T.V. drama was thrilling?”

(b) Loaded Questions:

Loading means attaching emotional feelings to particular words or concepts, which tends to produce automatic approval or disapproval. Here the respondent would react to the word rather than the question. Example:

“Have you tried to get special favours from a business establishment by pressuring them?” Yes or No

(c) Ambiguous questions:

An ambiguous question is one that does not have a clear meaning. It may mean different things to different people. For example:

“Are you interested in a small house?”

What does the word “interested” mean: to own or to hire? What does the word “small” mean?

QUESTIONNAIRE CONSTRUCTION PROCEDURE

* Decide what information is needed.

* Determine the method of collecting data

* Determine the content of individual questions:

-Is the question necessary?

-Does the respondent have the information?

-Will the respondent remember it?

-Are several questions needed instead of one?

* Determine the type of questions

-open ended

-closed

-dichotomous

-pictorial

-multiple choice

* Decide on wordings of questions

* Decide question sequence

-Physical appearance

-easy to access

-easy to understand

-motivate

* Preliminary Draft

* Revision and final draft

4. SCHEDULES

Schedules (which contain a set of questions) are filled in by enumerators who are specially appointed for the purpose. Enumerators go to the respondents, ask them the questions from the proforma in the same order in which they are listed, and record the replies in the space given.

Enumerators should be trained. Example: population census.

DIFFERENCE BETWEEN QUESTIONNAIRE AND SCHEDULE

SECONDARY DATA

Secondary data are those which have already been collected by some other agency and which have already been processed. Generally speaking, secondary data are collected by some organization to satisfy its own needs, but they are used by various departments for different purposes. For example, census figures are used by social scientists (economists) for social planning and research.

SOURCES OF SECONDARY DATA:

Doing research with secondary data is called desk research. The sources of secondary data, or the sources for doing desk research, are the following:

Internal Sources: Registers, documents, annual reports, sales reports, previous research papers, sales records, invoices, etc.

External Sources: Journals and magazines, newspapers, public speeches, state and central government departments, reports, etc.

When using information obtained from any published document, the researcher should consider the following points:

* Exactly what products are included in the statistical classification

* Who originally collected the data and for what purpose, and whether there might be any motive for misrepresentation

* From whom the data were collected and how reliable the methodology might have been and

* How consistent the data are with other local or international statistics.

Choice between primary and secondary data:

The researcher must decide whether to use primary data or secondary data in a research process. The choice between the two depends on:

* Nature and Scope of Research

* Availability of financial resources

* Availability of time

* Degree of accuracy desired

* Status of the researcher (individual, government, corporation, etc.)

Pilot Study

It is difficult to plan a major study or project without adequate knowledge of its subject matter, the population it is to cover, their level of knowledge and understanding and the like. What are the issues involved? What are the concepts associated with the subject matter? How can they be operationalized? What method of study is appropriate? How much money will it cost? To answer such questions, a small-scale preliminary investigation is conducted before the main study. This is called a pilot study. The size, scope and design of the pilot study are a matter of convenience, time and money, but it should be large enough to fulfil the following functions.

Functions of Pilot Study:

* It provides better knowledge of the problem.

* It helps in the identification and operationalization of concepts relating to the study.

* It assists in discovering the nature of relationships between variables and in formulating hypotheses.

* It shows the nature of the population to be surveyed and the variability within it

* It shows the adequacy of the tool for data collection.

* It provides information for structuring questions with alternative answers.

* It helps the researcher to develop an appropriate plan of analysis

* It provides information for estimating the probable cost and duration of the main study and of its various stages

Pre-Test

A pre-test is a trial test of a specific aspect of the study, such as the method of data collection or the data collection instrument (interview schedule, mailed questionnaire or measurement scale). Pre-testing has several purposes:

* To test whether the instrument would elicit responses required to achieve the research objectives

* To test whether the content of the instrument is relevant and adequate

* To test whether the wording of the questions is clear and suited to the respondents' level of understanding and to field conditions

HYPOTHESIS TESTING

A hypothesis is an assumption or supposition to be proved or disproved. A research hypothesis is a predictive statement, capable of being tested by scientific methods, that relates an independent variable to some dependent variable. A hypothesis is usually considered the principal instrument of research. Its main function is to suggest new experiments and observations.

Definition of Hypothesis:

A research hypothesis is a predictive statement capable of being tested by scientific methods, that relates an independent variable to some dependent variable. The features of a good hypothesis are as follows:

* It should be clear and precise

* It should be capable of being tested

* It should state the relationship between variables

* It should be limited in scope and must be specific

* It should be stated in simple terms

Basic Concepts:

Null Hypothesis: Random selection of samples from the given population makes tests of significance valid. To apply any test of significance, we first set up a hypothesis. Such a statistical hypothesis, which is under test, is usually a hypothesis of no difference and hence is called the null hypothesis. It is usually denoted by Ho.

Alternate Hypothesis: Any hypothesis which is complementary to the null hypothesis is called an alternate hypothesis. It is usually denoted by Ha. For example, if the null hypothesis that there is no relationship between the eye colour of husbands and wives is rejected, then the alternate hypothesis that there is a relationship between the eye colour of husbands and wives is automatically accepted.

TYPE I ERROR AND TYPE II ERROR:

In the process of testing a hypothesis, a researcher may commit two types of errors, namely Type I error and Type II error.

Type I error: We commit this error when we reject a null hypothesis which is true.

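Type II error: This error is committed when we accept a null hypothesis which is false. The possible outcomes can be stated as below:

Decision        Ho is true          Ho is false
Accept Ho       Correct decision    Type II error
Reject Ho       Type I error        Correct decision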

For example, suppose we want to test the relationship between rainfall and food production, and we set up the null hypothesis that rainfall does not affect food production. From experience and past research findings, it is well known that rainfall certainly affects food production. Hence the null hypothesis should be rejected; if instead we accept it, we commit a Type II error.

PROCEDURE FOR HYPOTHESIS TESTING

Making a formal statement:

It consists of making a formal statement of the null hypothesis Ho and also of the alternative hypothesis Ha

Selecting a significance level:

The hypothesis is tested at a predetermined level of significance, which should therefore be specified in advance. In practice, either the 5% level or the 1% level is generally adopted.
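The significance level is the probability of committing a Type I error. A small simulation shows this. It is a sketch built on assumptions, not part of the original notes. The population (mean 50, SD 10) is hypothetical, and Ho (mean = 50) is true by construction. The sample size, the number of trials and the seed are arbitrary choices. The z-test that is run on each sample is described later in these notes.

```python
import random
from math import sqrt
from statistics import NormalDist

def type_one_error_rate(alpha=0.05, n=30, trials=10_000, seed=1):
    """Share of samples in which a TRUE null hypothesis is rejected (hypothetical set-up)."""
    rng = random.Random(seed)
    mu0, sigma = 50.0, 10.0                        # hypothetical population where Ho is true
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)   # two-tailed critical value (about 1.96 at 5%)
    rejections = 0
    for _ in range(trials):
        sample = [rng.gauss(mu0, sigma) for _ in range(n)]
        z = (sum(sample) / n - mu0) / (sigma / sqrt(n))
        if abs(z) > z_crit:
            rejections += 1                        # rejecting a true Ho is a Type I error
    return rejections / trials

print(type_one_error_rate(alpha=0.05))   # close to 0.05
print(type_one_error_rate(alpha=0.01))   # close to 0.01
```

Moving from the 5% level to the 1% level lowers the risk of a Type I error. The price is a higher risk of a Type II error when Ho is actually false.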

Deciding the distribution to use:

After deciding the level of significance the researcher has to determine the appropriate sampling distribution.

Selecting a random sample and computing an appropriate value:

The researcher has to select a random sample(s) and compute an appropriate value from the sample data.

Calculation of the probability

The researcher has to calculate the probability that the sample result would diverge as widely as it has from expectations if the null hypothesis were actually true.

Comparing the probability

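Afterwards, the researcher has to compare the probability thus calculated with the specified value for the significance level. If the calculated probability is equal to or smaller than the significance level, the null hypothesis is rejected; otherwise it is accepted.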
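The six steps above can be sketched for one simple case: a two-tailed z-test for a mean when the population standard deviation is known. The figures are hypothetical (Ho: mean = 50, sigma = 10, a sample of 36 with mean 53.5), and the function name `z_test_for_mean` is illustrative only.

```python
from math import sqrt
from statistics import NormalDist

def z_test_for_mean(sample_mean, mu0, sigma, n, alpha=0.05):
    """Two-tailed one-sample z-test; assumes the population SD (sigma) is known."""
    # Step 1 (formal statement): Ho: mu = mu0, Ha: mu != mu0
    # Step 2 (significance level): alpha, e.g. 0.05 for the 5% level
    # Step 3 (distribution): standard normal, because sigma is known
    # Step 4 (value computed from the sample): the test statistic z
    z = (sample_mean - mu0) / (sigma / sqrt(n))
    # Step 5 (probability): chance of a divergence at least this wide if Ho is true
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    # Step 6 (comparison): reject Ho when the probability is at or below alpha
    decision = "reject Ho" if p_value <= alpha else "accept Ho"
    return z, p_value, decision

z, p, decision = z_test_for_mean(sample_mean=53.5, mu0=50, sigma=10, n=36)
print(f"z = {z:.2f}, p = {p:.4f}, {decision}")  # z = 2.10, p = 0.0357, reject Ho
```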

USEFULNESS OF STATISTICAL TOOLS (ANALYSIS)

* Statistical analysis of data serves several major purposes. First, it summarizes a large mass of data into an understandable and meaningful form. This reduction of data facilitates further analysis.

* Second, statistics makes exact descriptions possible. For example, when we say that the educational level of people in X district is very high, the description is not specific; but when statistical measures such as the percentage of literates among males and females and the percentage of degree holders among males and females are available, the description becomes exact.

* Third, statistical analysis facilitates identification of the causal factors underlying complex phenomena. What are the factors which determine a variable like labour productivity or the academic performance of students? What are the relative contributions of the causative factors? Answers to such questions can be obtained from statistical multivariate analysis.

* Fourth, statistical analysis aids the drawing of reliable inferences from observational data.

* Last, statistical analysis is useful for assessing the significance of specific sample results under assumed population conditions. This type of analysis is called hypothesis testing.

PARAMETRIC AND NONPARAMETRIC TESTS

Parametric Tests:

The tests of significance used for hypothesis testing are of two types: parametric and non-parametric tests. Parametric tests are more powerful, but they depend on the parameters or characteristics of the population. They are based on the following assumptions:

* The observations or values must be independent

* The samples are drawn on a random basis.

* The populations should be normally distributed

* The populations should have equal variances

* The data should be at least at interval level so that arithmetic operations can be used.

The important parametric tests are the z-test, the t-test and the F-test. They are explained below:

The Z-test:

It is based on the normal distribution and is widely used for testing the significance of several statistics such as the mean, median, mode, coefficient of correlation and others. The relevant test statistic, z, is calculated and compared with its critical value (read from the normal distribution table) at a specified level of significance to judge the significance of the measure concerned.

The t-test:

It is suitable for testing the significance of a sample mean or for judging the significance of the difference between the means of two samples. The t-test can also be used for testing the significance of the coefficients of simple and partial correlation. The relevant test statistic, t, is calculated from the sample data and compared with its corresponding critical value in the t-distribution table to reject or accept the null hypothesis.
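Here is a minimal sketch of the two-sample t-test for the difference between two means. It uses the pooled-variance form, which assumes equal population variances as listed above. The two samples are hypothetical. The critical value 2.306 is the t-table entry for 8 degrees of freedom at the 5% level (two-tailed), so it only applies to samples of this size.

```python
from math import sqrt
from statistics import mean, variance

def pooled_t(sample_a, sample_b):
    """Two-sample t statistic with a pooled variance estimate (assumes equal variances)."""
    n1, n2 = len(sample_a), len(sample_b)
    # Pooled estimate of the common population variance (sample variances use n - 1)
    sp2 = ((n1 - 1) * variance(sample_a) + (n2 - 1) * variance(sample_b)) / (n1 + n2 - 2)
    t = (mean(sample_a) - mean(sample_b)) / sqrt(sp2 * (1 / n1 + 1 / n2))
    return t, n1 + n2 - 2

a = [12, 14, 11, 15, 13]   # hypothetical scores, group A
b = [16, 18, 15, 17, 19]   # hypothetical scores, group B
t, df = pooled_t(a, b)
T_CRITICAL = 2.306         # t table: df = 8, 5% level, two-tailed (valid only for df = 8)
print(f"t = {t:.2f}, df = {df}")
print("reject Ho" if abs(t) > T_CRITICAL else "accept Ho")
```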

The F-test:

The F-test is used to compare the variances of two independent samples. It is also used in analysis of variance (ANOVA) for testing the significance of more than two sample means at a time, and for judging the significance of multiple correlation coefficients.
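Below is a sketch of how the F ratio is formed in a one-way ANOVA. The three groups are hypothetical. The critical value 5.14 is the F-table entry for (2, 6) degrees of freedom at the 5% level, so it is only valid for this layout.

```python
from statistics import mean

def one_way_anova_f(*groups):
    """F ratio for one-way ANOVA: between-group mean square / within-group mean square."""
    values = [x for g in groups for x in g]
    grand_mean = mean(values)
    k, n = len(groups), len(values)
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    f_ratio = (ss_between / (k - 1)) / (ss_within / (n - k))
    return f_ratio, (k - 1, n - k)

f_ratio, dfs = one_way_anova_f([4, 5, 6], [6, 7, 8], [8, 9, 10])   # hypothetical groups
F_CRITICAL = 5.14   # F table: (2, 6) df, 5% level (valid only for these df)
print(f"F = {f_ratio:.2f} with {dfs} df")
print("reject Ho" if f_ratio > F_CRITICAL else "accept Ho")
```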

NON PARAMETRIC TESTS:

Most statistical tests require an important assumption to be met if they are to be correctly applied: that the population from which the samples are drawn is normally distributed. But there are some situations when the researcher cannot or does not want to make such an assumption. In such situations we use statistical methods for testing hypotheses which are called non-parametric tests, because such tests do not depend on any assumption about the parameters of the parent population.

ADVANTAGES:

* They do not require any assumption that the population follows the normal or any other distribution.

* They are simple to understand and easy to apply when the sample sizes are small.

* Most non-parametric tests do not require lengthy computations.

* They are less time-consuming.

* Non-parametric tests can be applied to all types of data, including data measured only on nominal or ordinal scales.

* They make it possible to work with very small samples.

DISADVANTAGES:

* They ignore a certain amount of information.

* They are not considered as efficient as parametric tests.

The important non-parametric tests are the chi-square test, the median test, the Mann-Whitney U test, the sign test, the Wilcoxon matched-pairs test and the Kolmogorov-Smirnov test.

MEASUREMENT AND SCALING TECHNIQUES

MEASUREMENT:

Measurement may be defined as the assignment of numerals to characteristics of objects, persons or events according to rules.

SCALES:

The instrument with the help of which a concept is measured is called a scale. A scale has a wide range of applications in social science research. It is used in all types of data collection techniques such as observation, interview, projective techniques, etc.

Scaling provides the procedures for assigning numbers to various degrees of opinion, attitude and other concepts. Normally this takes place in two ways:

Making a judgment about some characteristic of an individual and then placing him directly on a scale.

Constructing a questionnaire in such a way that the score of an individual's responses assigns him a place on a scale.

IMPORTANT SCALING TECHNIQUES

RATING METHOD:

In rating scale, the rater makes a judgment about some characteristics of a subject and places him directly on some point on the scale. These scales can be either discrete or continuous.

(a) Discrete Scales:

These scales are used for obtaining ordinal data about an object. In these scales, two or more categories are provided, representing discrete amounts of some characteristic. The rater ticks the category which he feels best describes the person or object being rated. Thus, for example, the characteristic "job knowledge" may be divided into five categories on a discrete scale:

Exceptionally good – Above average – Average – Below average – Poor

(b) Continuous graphic scales:

These scales are used for obtaining interval data about an object. In these scales, an uninterrupted line is provided just above the category notation, and the rater can tick anywhere along its length.

Both these types of rating scale can use three kinds of standards for measuring a characteristic: alphabetical, descriptive and behavioural.

ATTITUDE SCALE:

Attitude scales are carefully constructed sets of rating scales designed to measure one or more aspects of an individual's or group's attitude towards some object. The individual's responses to the various scales may be aggregated or summed to provide a single attitude score for the individual. The following are the four types of attitude scales.

LIKERT SUMMATED SCALE:

Summated scales consist of a number of statements, each expressing either a favourable or an unfavourable attitude towards the given object, to which the respondent is asked to react. The respondent indicates his opinion on each statement in the instrument. Each response is given a numerical score indicating its favourableness or unfavourableness, and the scores are totalled to measure the respondent's attitude. In other words, the overall score represents the respondent's position.

In a Likert scale, a respondent is normally asked to respond to each statement in terms of several degrees of agreement or disagreement, usually five (though 3 to 7 degrees are sometimes used). Suppose a researcher wants to examine whether a respondent considers his job quite pleasant; the respondent may respond in any of the following ways:

strongly agree – agree – undecided – disagree – strongly disagree.

In the above scale, each point carries a score (weight). The lowest score is given to the least favourable degree of job satisfaction and the highest score to the most favourable degree.
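Here is a sketch of Likert scoring for the five degrees above, with weights 5 (strongly agree) down to 1 (strongly disagree). The statements and answers are hypothetical. The notes say only that statements may be favourable or unfavourable. For unfavourable statements this sketch reverses the weights, a common convention, so that a higher total always means a more favourable attitude.

```python
DEGREES = ["strongly agree", "agree", "undecided", "disagree", "strongly disagree"]

def item_score(response, favourable=True):
    """Weight 5..1 for a favourable statement; reversed (1..5) for an unfavourable one."""
    score = 5 - DEGREES.index(response)
    return score if favourable else 6 - score

# Hypothetical statements: (text, is the statement favourable?, respondent's answer)
answers = [
    ("My job is quite pleasant.", True, "agree"),
    ("I often feel bored at work.", False, "disagree"),
    ("I would recommend my job to a friend.", True, "strongly agree"),
]
total = sum(item_score(answer, fav) for _, fav, answer in answers)
print(total)   # 13 (possible range 3 to 15)
```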

Advantage:

* The Likert-type scale is easier to develop than the Thurstone-type scale because it can be constructed without a panel of judges.

* It is considered more reliable because respondents answer every statement included in the instrument.

* The Likert-type scale permits the use of statements that are not manifestly related to the attitude being studied.

* It can be used in both respondent-centred and stimulus-centred studies, i.e., it shows how responses differ between people and also between stimuli.

* Because it requires less time to construct, it is frequently used by students of opinion research.

Limitations:

* These scales indicate whether respondents are more or less favourable to a topic, but they cannot tell how much more or less favourable they are.

* The interval between strongly agree and agree may not be equal to the interval between agree and undecided.

Thurstone Type Scale (differential scales)

Here, the selection of items is made by a panel of judges who evaluate the items in terms of whether they are relevant to the topic area and unambiguous in application. The researcher adopts the following procedure:

* The researcher collects a large number of statements, usually 20 or more, that express various points of view toward a group, institution, idea or practice.

* A panel of judges arranges them in 11 groups or piles ranging from one extreme position to the other. Each judge is asked to place in the first pile the statements he thinks are most unfavourable to the issue, in the second pile those he thinks are next most unfavourable, and so on, until in the eleventh pile he puts the statements he considers the most favourable.

* The judges' sortings are compared; when there is marked disagreement between the judges in assigning a position to an item, that item is left out.

* For each remaining statement, the median of the pile numbers assigned by the judges is taken as its scale value, which lies between one and eleven.

Then the researcher makes a final selection of statements: a sample of statements whose median scale values are spread evenly from one extreme to the other is taken. The statements so selected constitute the final scale to be administered to respondents.

The respondents are asked to check the statements with which they agree. The median scale value of the statements a respondent agrees with is worked out, and this establishes his score or quantifies his opinion. It may be noted that in the actual instrument the statements are arranged in random order of scale value.
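The two medians in the Thurstone procedure can be sketched as follows. The judges' pile numbers and the respondent's choices are hypothetical. The step of discarding items on which judges disagree is not implemented here.

```python
from statistics import median

# Hypothetical pile numbers given by five judges (1 = most unfavourable, 11 = most favourable)
judge_piles = {
    "S1": [1, 2, 1, 2, 1],
    "S2": [4, 3, 4, 5, 4],
    "S3": [6, 6, 5, 6, 7],
    "S4": [8, 9, 8, 8, 9],
    "S5": [10, 11, 11, 10, 11],
}
# Scale value of each statement = median of the judges' placements
scale_values = {s: median(piles) for s, piles in judge_piles.items()}

def respondent_score(agreed_statements):
    """Median scale value of the statements the respondent agrees with."""
    return median([scale_values[s] for s in agreed_statements])

print(scale_values)                          # {'S1': 1, 'S2': 4, 'S3': 6, 'S4': 8, 'S5': 11}
print(respondent_score(["S3", "S4", "S5"]))  # 8
```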

CUMULATIVE SCALES:

* It consists of a series of statements to which a respondent expresses agreement or disagreement.

* The statements are related to one another in such a way that an individual who replies favourably to item no. 3 also replies favourably to items no. 1 and 2.

* The individual's score is worked out by counting the number of statements he answers favourably.
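This sketch shows cumulative scoring with a check on whether a response pattern is perfectly cumulative. It assumes the statements are ordered from the mildest to the most extreme. The function name and the response patterns are hypothetical.

```python
def cumulative_score(answers):
    """answers: True/False replies ordered from the mildest to the most extreme statement."""
    score = sum(answers)   # number of statements answered favourably
    # A perfect cumulative pattern has all favourable replies before any unfavourable one
    perfect = answers == [True] * score + [False] * (len(answers) - score)
    return score, perfect

print(cumulative_score([True, True, True, False]))   # (3, True)
print(cumulative_score([True, False, True, False]))  # (2, False)
```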

SEMANTIC DIFFERENTIAL SCALES:

It is an attempt to measure the psychological meaning of an object to an individual.

It consists of a set of bipolar rating scales, usually of 7 points, by which one or more respondents rate one or more concepts on each scale item.

DATA ANALYSIS: EDITING AND CODING OF DATA

EDITING

Once data collection is complete, the data are examined carefully to eliminate any errors or mistakes. For this purpose, editing of data is mandatory. Editing means to rectify, to set in order, to correct or to establish sequence. Persons with editing responsibility should be trained and experienced in this job. Editing is performed at two stages, and accordingly it is of two types: field editing and centralized editing.

Field Editing: Field editing refers to editing performed immediately in the field where the data are collected. For example, if the data are collected through a questionnaire or schedule, it should be checked in the field itself, right after collecting the questionnaire from the respondent, whether all the questions have been answered, whether the writing is legible, etc.

Centralized Editing: In this type of editing, editing is done by a person or a team after all the recorded questionnaires or schedules have been collected. Clearly, it is not carried out in the field or immediately after the data are collected. In such editing, the instructions regarding editing are normally printed and circulated to the person or team doing the editing, to ensure uniformity in editing.

CODING

Coding is a practice which simplifies the recording of answers. When standard answers to a question can be specified, each answer is assigned a code. So instead of writing the answers in full, the investigator simply writes the code. This not only saves time but also avoids confusing answers.
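Coding amounts to a lookup from each standard answer to its code. The codebook below is hypothetical, including the choice of 9 for "don't know". The answers are normalised for case and stray spaces first, which is also a small editing step.

```python
# Hypothetical codebook for the question "Do you own a two-wheeler?"
CODEBOOK = {"yes": 1, "no": 2, "don't know": 9}

raw_answers = ["Yes", "no ", "YES", "Don't know", "No"]
# Normalise case and stray spaces, then replace each answer by its code
coded = [CODEBOOK[answer.strip().lower()] for answer in raw_answers]
print(coded)   # [1, 2, 1, 9, 2]
```

An answer that is not in the codebook raises a `KeyError`. That is arguably the right behaviour, because such answers should go back to editing.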

CLASSIFICATION

Classification of data means grouping the data on the basis of some common characteristics. Classified data can be used for specified purposes with ease. Further, classification adds clarity and helps to maintain consistency. Classification can be made on the basis of a) common characteristics like sex, literacy, colour, height and weight; b) geographical regions like north, south, east and west; c) time, as in monthly, weekly or yearly data; d) value, in which the collected data are grouped into classes according to their magnitude; e) replies, like the number of people who answered yes to a question, no to a question, etc.

TABULATION

Tabulation is the arrangement of classified data in an orderly manner. In other words, it is a method of presenting summarized data. Tabulation is very important because:

* It conserves space.

* It reduces the need for lengthy explanation.

* Computation of data is made easier

* Comparison of data becomes very simple

* Adequacy or inadequacy of the data is clearly visible

A table contains columns and rows; these columns and rows create small boxes, which are called cells. Tabulation has several rules, the most important of which are listed below:

• Every table should be numbered; numbering could be in letters, Arabic numerals or Roman numerals

• Each table should have a distinct title

• The unit of measurement of the values in the table must be specified, e.g. Rs. crores, tonnes, etc.

• Each column should be titled.

• Each row must be titled

• Rows and columns are to be numbered

• Footnotes of the table should indicate the explanatory notes on the data in the table and the footnotes must be positioned below the table

• Data to be compared must be placed in adjacent columns

SIGNIFICANCE OF TABLES

It reduces the complexity of data and provides simplicity of presentation:

A table removes unnecessary details and repetitions. It presents data systematically in columns and rows and gives a very clear idea of what the data represent. A table considerably reduces the time needed to understand what the data represent, and hence confusion is avoided.

It facilitates comparison:

Tables facilitate comparison. A table is generally divided into various parts, with totals and subtotals for each part, so the relationship between different parts of the data can be studied much more easily with the help of a table than without it.

It gives identity to the data:

When the data are arranged in a table with a title and number they can be distinctly identified and can be used as a source reference in the interpretation of a problem.

It provides patterns:

Tabulation reveals patterns within the figures which cannot be seen in narrative form. It also facilitates the summation of the figures if the reader wishes to check the totals.

Part of a table

TYPES OF TABLES:

Tables can be broadly classified into two categories:

1. Simple and complex frequency tables

2. General purpose and special purpose frequency tables.

SIMPLE AND COMPLEX FREQUENCY TABLES

SIMPLE OR ONE-WAY TABLE:

Here only one characteristic is shown; this is the simplest type of table. The following is an illustration of such a table.

TWO – WAY TABLE:

It shows two characteristics and is formed when either the stub or the caption is divided into two coordinate parts. The following example illustrates the nature of such a table.

Number of employees in a Bank at Different age-groups according to sex
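A two-way table of this kind can be built by counting each (age group, sex) combination. In this sketch the employee records are hypothetical. The stub (rows) holds the age groups, the caption (columns) holds sex, and row and column totals are added.

```python
from collections import Counter

# Hypothetical records: (age group, sex) for each employee
employees = [
    ("20-30", "Male"), ("20-30", "Female"), ("30-40", "Male"),
    ("20-30", "Male"), ("40-50", "Female"), ("30-40", "Male"),
]

counts = Counter(employees)                  # frequency of each (age group, sex) cell
age_groups = sorted({age for age, _ in employees})
sexes = ["Male", "Female"]

# Stub = age groups (rows); caption = sex (columns), with row and column totals
print(f"{'Age group':<10}" + "".join(f"{s:>8}" for s in sexes) + f"{'Total':>8}")
for age in age_groups:
    row = [counts[(age, s)] for s in sexes]
    print(f"{age:<10}" + "".join(f"{c:>8}" for c in row) + f"{sum(row):>8}")
col_totals = [sum(counts[(age, s)] for age in age_groups) for s in sexes]
print(f"{'Total':<10}" + "".join(f"{c:>8}" for c in col_totals) + f"{len(employees):>8}")
```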

GENERAL PURPOSE AND SPECIAL PURPOSE FREQUENCY TABLES

General purpose tables are also called reference tables. They provide information for general use or reference. They usually contain detailed information and are not constructed for specific discussion.

Number of Employees of a Bank according to Age-Groups, Sex and Ranks

TYPES OF DIAGRAMS USED IN RESEARCH REPORT

Generally, statistical results are presented through diagrams and graphs, which we see in newspapers, magazines, journals, advertisements, etc. Statistical data may be displayed pictorially in the form of different types of diagrams, graphs and maps.

Significance of Diagrams and Graphs:

1. They provide a bird's eye view of the entire data.

2. They are attractive.

3. They have a memorizing effect.

4. They facilitate comparison of data.

CHOICE OF SUITABLE DIAGRAM:

Several factors determine the selection of the diagram to be drawn:

1. The nature of the data

2. The target audience for whom the diagram is drawn

3. The volume of communication to be given

4. The facilities available to draw the diagram

5. The purpose of the representation

6. The size of the paper or the sanctioned size for the diagram, etc.

Based on these factors, the right type of diagram is selected.

Types of Diagram:

a. One dimensional diagrams e.g. bar diagrams

b. Two dimensional diagrams e.g. rectangles, squares, circles and pie diagrams

c. Three dimensional diagrams

(A) One Dimensional Diagrams or Bar Diagrams

A bar diagram is a thick line whose width is shown merely to attract attention. The merits of such diagrams are as follows:

1. A reader can easily understand the subject matter

2. They are the simplest and the easiest to make

3. For comparison of a large number of items, they are the only form that can be used effectively

Example for simple bar diagram:

A simple bar diagram is the simplest form of bar diagram and is used frequently in practice for the comparative study of two or more items or values of a single variable, or of a single classification or category of data.

Suppose a simple bar diagram is to be drawn for the following data:

Country:                   A    B    C    D    E    F    G

Population (in million):   20   50   68   43   65   25   40
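Below is a text-only sketch of the simple bar diagram for the population data above. Each bar's length is proportional to its value. The scale of one `#` per 2 million is an assumption chosen to fit a line, and odd values are truncated to whole characters.

```python
population = {"A": 20, "B": 50, "C": 68, "D": 43, "E": 65, "F": 25, "G": 40}  # millions

def bar_lines(data, unit=2):
    """One '#' per `unit` millions; odd values are truncated to whole characters."""
    return [f"{name} | {'#' * (value // unit)} {value}" for name, value in data.items()]

for line in bar_lines(population):
    print(line)
```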

Examples for multiple bar diagram:

If two or more sets of interrelated variables are to be presented graphically, multiple bar diagrams are used. The technique of drawing a multiple bar diagram is basically the same as that of drawing a simple bar diagram. In this type of diagram, the data given for each year are drawn together. As a result, for each year there will be a number of bars drawn adjacent to each other.

Percentage bars:

In this type of diagram, all the given data for each year are converted into percentages. Then, for each year, one bar representing 100% is drawn and divided into segments according to these percentages. This can be understood from the example given below.
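The arithmetic behind a percentage bar is to divide each component by its year's total and multiply by 100. This sketch uses hypothetical cost figures for two years. The segment sizes it prints are what each year's 100% bar would be divided into.

```python
def to_percentages(components):
    """Convert one year's component values to percentages of that year's total."""
    total = sum(components.values())
    return {name: 100 * value / total for name, value in components.items()}

# Hypothetical cost components (Rs. lakhs) for two years
costs = {
    2021: {"Raw materials": 120, "Wages": 60, "Other": 20},
    2022: {"Raw materials": 75, "Wages": 60, "Other": 15},
}
for year, components in costs.items():
    shares = to_percentages(components)
    # Each year's bar represents 100% and is divided into these segments
    print(year, {name: f"{pct:.1f}%" for name, pct in shares.items()})
```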

Deviation bars:

Deviation bars are especially useful for graphical presentation of net quantities, i.e., surplus or deficit, e.g., net profit or net loss, or net of imports and exports, which have positive and negative values. This could be explained with the following example.

Rectangles:

A rectangle is a two-dimensional diagram because it is based on the area principle. Just like bars, the rectangles are placed side by side, with proper and equal spacing between the different rectangles. In fact, rectangle diagrams are a modified form of bar diagrams and give more detailed information than is furnished by bar diagrams.

Square Diagrams:

Among the two-dimensional diagrams, squares are especially useful if it is desired to compare graphically values or quantities which differ widely from one another.

Circles:

Circle diagrams are an alternative to square diagrams and are used for the same purpose.

Pie diagram:

A pie diagram can show how the expenditure of the government is distributed over different heads like agriculture, irrigation, industry, transport, etc. It can also show how the expenditure incurred by an industry is distributed over different heads like raw materials, wages and salaries, selling and distribution expenses, etc. Pie diagrams are used when making comparisons on a percentage basis and not on an absolute basis. When pie diagrams are constructed on a percentage basis, the percentages can be presented by circles of equal size.
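To draw a pie diagram, each item's share of the total is converted into degrees of the 360-degree circle. The expenditure shares below are hypothetical.

```python
def pie_angles(values):
    """Each item's share of the total, expressed in degrees of the full circle."""
    total = sum(values.values())
    return {name: value * 360 / total for name, value in values.items()}

# Hypothetical government expenditure (percent of total)
expenditure = {"Agriculture": 30, "Irrigation": 15, "Industry": 25, "Transport": 20, "Others": 10}
angles = pie_angles(expenditure)
print(angles)   # the sector angles add up to 360 degrees
```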

(B) TWO DIMENSIONAL DIAGRAMS:

In one-dimensional diagrams only the length of the bar is taken into account, whereas in two-dimensional diagrams both the length and the width of the bar are considered; thus the area of the bar represents the given data.

(A) Rectangles

(B) Square Diagrams

(C) Three Dimensional Diagrams: Pictographs and Cartograms:

Pictographs are not abstract presentations such as lines or bars; they actually depict the kind of data we are dealing with. Pictures are attractive and easy to comprehend, and as such this method is particularly useful in presenting statistics to the layman.

Cartograms, or statistical maps, are used to give quantitative information on a geographical basis. They represent spatial distribution. The quantities on the map can be shown in many ways, such as by shading or colour, by dots, by placing pictograms in each geographical unit, or by placing the appropriate numerical figure in each geographical unit.

GRAPHICAL REPRESENTATION OF DATA

Diagrams furnish only approximate information and are not of much utility to a statistician from the analysis point of view. Graphs, on the other hand, are more obvious, precise and accurate than diagrams and can be used effectively for further statistical analysis. They can be broadly classified under the following two heads:

i. Graphs of frequency distributions

ii. Graphs of Time series

GRAPHS OF FREQUENCY DISTRIBUTIONS:

Frequency graphs are designed to reveal clearly the characteristic features of frequency data. Such graphs are more appealing to the eye than tabulated data and are readily perceptible to the mind. They facilitate comparative study of two or more frequency distributions with regard to their shape and pattern. The most commonly used graphs for charting a frequency distribution for a general understanding of the details of the data are:

HISTOGRAM:

It is one of the most popular and commonly used devices for charting continuous frequency distributions, no matter whether the variable under study is discrete or continuous.

FREQUENCY POLYGON:

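It is another device for graphic presentation of a frequency distribution, and it facilitates comparison of frequency distributions. A frequency polygon may be drawn from a histogram, by joining the mid-points of the tops of adjacent rectangles, or without a histogram, by plotting the frequencies against the class mid-points and joining the points with straight lines.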
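The data behind both graphs can be prepared like this: class frequencies give the histogram heights, and (mid-point, frequency) pairs give the polygon vertices. The marks, the class width of 10 and the convention of closing the polygon with a zero-frequency class at each end are assumptions for illustration.

```python
def class_frequencies(values, lower, width, n_classes):
    """Count values in classes [lower, lower+width), [lower+width, lower+2*width), ..."""
    freqs = [0] * n_classes
    for v in values:
        idx = int((v - lower) // width)
        if 0 <= idx < n_classes:   # values outside the class range are ignored
            freqs[idx] += 1
    return freqs

def polygon_points(lower, width, freqs):
    """Class mid-points paired with frequencies, closed with empty classes at both ends."""
    mids = [lower + width * (i + 0.5) for i in range(len(freqs))]
    return [(mids[0] - width, 0)] + list(zip(mids, freqs)) + [(mids[-1] + width, 0)]

marks = [12, 25, 37, 41, 8, 19, 33, 27, 45, 22, 31, 38]   # hypothetical marks
freqs = class_frequencies(marks, lower=0, width=10, n_classes=5)
print(freqs)                          # histogram heights: [1, 2, 3, 4, 2]
print(polygon_points(0, 10, freqs))   # vertices of the frequency polygon
```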

DATA ANALYSIS

Analysis of data is considered to be a highly skilled and technical job which should be carried out only by the researcher himself or under his close supervision. Analysis of data means critical examination of the data for studying the characteristics of the object under study and for determining the patterns of relationship among the variables relating to it, using both quantitative and qualitative methods.

Purpose of Analysis

Statistical analysis of data serves several major purposes.

1. It summarizes large mass of data in to understandable and meaningful form.

2. It makes descriptions to be exact.

3. It aids the drawing of reliable inferences from observational data.

4. It facilitates identification of the causal factors underlying complex phenomena.

5. It helps in making estimations or generalizations from the results of sample surveys.

6. Inferential analysis is useful for assessing the significance of specific sample results under assumed population conditions.

Steps in Analysis

Different steps in research analysis consist of the following.

1. The first step involves construction of statistical distributions and calculation of simple measures like averages, percentages, etc.

2. The second step is to compare two or more distributions or two or more subgroups within a distribution.

3. Third step is to study the nature of relationships among variables.

4. Next step is to find out the factors which affect the relationship between a set of variables

5. Testing the validity of inferences drawn from sample survey by using parametric tests of significance.

Types of Analysis

Statistical analysis may be broadly classified as descriptive analysis and inferential analysis.

Descriptive Analysis

Descriptive statistics are used to describe the basic features of the data in a study. They provide simple summaries about the sample and the measures. Descriptive statistics is the discipline of quantitatively describing the main features of a collection of data, or the quantitative description itself. Such analysis may be univariate, bivariate or multivariate.

• Univariate analysis

Univariate analysis involves describing the distribution of a single variable, including its central tendency (including the mean, median, and mode) and dispersion (including the range and quartiles of the data set, and measures of spread such as the variance and standard deviation). The shape of the distribution may also be described via indices such as skewness and kurtosis. Characteristics of a variable's distribution may also be depicted in graphical or tabular format, including histograms and stem-and-leaf displays.
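A univariate summary of one variable can be produced with Python's standard `statistics` module. The observations are hypothetical. The variance and standard deviation here are sample measures with an n - 1 divisor. `pvariance`/`pstdev` would give the population versions.

```python
from statistics import mean, median, mode, stdev, variance

def univariate_summary(values):
    """Central tendency and dispersion of a single variable (sample variance/SD, n - 1 divisor)."""
    return {
        "mean": mean(values),
        "median": median(values),
        "mode": mode(values),
        "range": max(values) - min(values),
        "variance": variance(values),
        "std_dev": stdev(values),
    }

scores = [4, 8, 6, 5, 3, 8, 9, 5, 8]   # hypothetical observations
for name, value in univariate_summary(scores).items():
    print(f"{name:>8}: {value:.2f}")
```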

• Bivariate analysis

Bivariate analysis is one of the simplest forms of quantitative (statistical) analysis. It involves the analysis of two variables (often denoted as X, Y) for the purpose of determining the empirical relationship between them. Common forms of bivariate analysis involve creating a percentage table or a scatter plot and computing a simple correlation coefficient.

• Multivariate analysis

In multivariate analysis, multiple relations between multiple variables are examined simultaneously. Multivariate analysis (MVA) is based on the statistical principle of multivariate statistics, which involves observation and analysis of more than one statistical outcome variable at a time. In design and analysis, the technique is used to perform trade studies across multiple dimensions while taking into account the effects of all variables on the responses of interest.

Inferential Analysis

Inferential statistics is concerned with making predictions or inferences about a population from observations and analyses of a sample. That is, we can take the results of an analysis using a sample and generalize them to the larger population that the sample represents. There are two areas of statistical inference: (a) statistical estimation and (b) the testing of hypotheses.

Tools and Statistical Methods For Analysis

The tools and techniques of statistics can be studied under two divisions.

(A) Descriptive Statistics

In descriptive statistics we develop certain indices and measures from raw data. They are:

1. Measures of Central Tendency

2. Measures of Dispersion

3. Measures of skewness and kurtosis

4. Measures of correlation

5. Regression analysis

6. Index numbers

7. Time series analysis

8. Coefficient of association

1. Measures of Central Tendency

The central tendency of a distribution is an estimate of the "center" of a distribution of values. There are different types of estimates of central tendency such as mean, median, mode, geometric mean, and harmonic mean.

2. Measures of Dispersion

Dispersion refers to the spread of the values around the central tendency. The two most common measures of dispersion are the range and the standard deviation. Measures of dispersion can be used to compare the variability of two statistical series.
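The two measures named above can be sketched in Python as follows. This is a minimal illustration that reuses the test scores from the arithmetic mean example as sample data; the function name `dispersion` is hypothetical. It returns both the population standard deviation (dividing by n) and the sample standard deviation (dividing by n − 1), since texts differ on which one they mean.

```python
import math


def dispersion(values):
    """Return the range, population SD and sample SD of a list of numbers."""
    n = len(values)
    mean = sum(values) / n
    value_range = max(values) - min(values)
    # Sum of squared deviations from the mean
    ss = sum((x - mean) ** 2 for x in values)
    pop_sd = math.sqrt(ss / n)           # population formula: divide by n
    sample_sd = math.sqrt(ss / (n - 1))  # sample estimate: divide by n - 1
    return value_range, pop_sd, sample_sd


scores = [15, 20, 21, 20, 36, 15, 25, 15]
value_range, pop_sd, sample_sd = dispersion(scores)
print(value_range, round(pop_sd, 3), round(sample_sd, 3))
```

Python's standard `statistics.pstdev` and `statistics.stdev` compute the same two standard deviations.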

3. Measures of skewness and kurtosis

A fundamental task in many statistical analyses is to characterize the location and variability of a data set. A further characterization of the data includes skewness and kurtosis. Skewness is a measure of symmetry, or more precisely, the lack of symmetry. A distribution, or data set, is symmetric if it looks the same to the left and right of the center point. Kurtosis is a measure of whether the data are peaked or flat relative to a normal distribution. That is, data sets with high kurtosis tend to have a distinct peak near the mean, decline rather rapidly, and have heavy tails.
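One common way to put numbers on these ideas is through the moment coefficients: skewness = m₃ / m₂^(3/2) and kurtosis = m₄ / m₂², where m_k is the k-th central moment. The sketch below assumes that convention (several others exist, and some software reports excess kurtosis, i.e. kurtosis − 3); the function name and the sample data are hypothetical.

```python
def moment_skew_kurt(values):
    """Moment coefficients of skewness (m3 / m2**1.5) and kurtosis (m4 / m2**2)."""
    n = len(values)
    mean = sum(values) / n
    # k-th central moments about the mean
    m2 = sum((x - mean) ** 2 for x in values) / n
    m3 = sum((x - mean) ** 3 for x in values) / n
    m4 = sum((x - mean) ** 4 for x in values) / n
    return m3 / m2 ** 1.5, m4 / m2 ** 2


# Symmetric data: skewness 0 (a normal distribution would have kurtosis 3)
print(moment_skew_kurt([1, 2, 3, 4, 5]))
# A long right tail gives positive skewness
print(moment_skew_kurt([1, 1, 1, 10]))
```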

4. Measures of correlation

Correlation refers to any of a broad class of statistical relationships involving dependence. When there are two variables, the correlation between them is called simple correlation. When there are more than two variables and we want to study the relation between two of them only, treating the others as constant, the relation is called partial correlation. When there are more than two variables and we want to study the relation of one variable with all the other variables together, the relation is called multiple correlation.
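For three variables, the partial correlation between variables 1 and 2 with variable 3 held constant is usually computed from the three simple correlations with the standard first-order formula r₁₂.₃ = (r₁₂ − r₁₃r₂₃) / √((1 − r₁₃²)(1 − r₂₃²)). This sketch assumes that textbook formula, which is not stated in these notes, and the correlation values are hypothetical.

```python
import math


def partial_correlation(r12, r13, r23):
    """First-order partial correlation r12.3: variables 1 and 2, variable 3 held constant."""
    return (r12 - r13 * r23) / math.sqrt((1 - r13 ** 2) * (1 - r23 ** 2))


# Hypothetical simple correlations among three variables
print(round(partial_correlation(0.8, 0.6, 0.5), 4))
```

When variable 3 is uncorrelated with both of the others (r₁₃ = r₂₃ = 0), the partial correlation reduces to the simple correlation r₁₂.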

5. Regression analysis

Regression analysis is a statistical process for estimating the relationships among variables. It includes many techniques for modeling and analyzing several variables, when the focus is on the relationship between a dependent variable and one or more independent variables.

6. Index numbers

An index is a statistical measure of changes in a representative group of individual data points. Index numbers are designed to measure the magnitude of economic changes over time. Because they work in a similar way to percentages they make such changes easier to compare.

7. Time series analysis

A time series is a sequence of data points, measured typically at successive points in time spaced at uniform time intervals. Time series analysis comprises methods for analyzing time series data in order to extract meaningful statistics and other characteristics of the data.
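One simple method used in time series analysis is the moving average, which smooths short-term fluctuations so that the trend is easier to see. A minimal sketch, assuming a simple (unweighted) moving average over a hypothetical series; the function name is illustrative.

```python
def moving_average(series, window):
    """Simple moving average: mean of each run of `window` consecutive values."""
    if not 1 <= window <= len(series):
        raise ValueError("window must be between 1 and len(series)")
    return [sum(series[i:i + window]) / window
            for i in range(len(series) - window + 1)]


# Hypothetical yearly sales figures
sales = [10, 12, 14, 16, 18]
print(moving_average(sales, 3))
```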

8. Coefficient of association

A coefficient of association, such as Yule’s coefficient, measures the extent of association between two attributes.
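Yule’s coefficient of association is usually written Q = ((AB)(ab) − (Ab)(aB)) / ((AB)(ab) + (Ab)(aB)), where (AB) counts units having both attributes, (ab) units having neither, and (Ab), (aB) units having only one of them. The formula is the standard textbook one rather than something stated in these notes, and the counts below are hypothetical. Q ranges from −1 (perfect negative association) to +1 (perfect positive association), with 0 when the attributes are independent.

```python
def yules_q(both, a_only, b_only, neither):
    """Yule's Q from the four cells of a 2x2 table of attribute counts."""
    agree = both * neither       # (AB)(ab)
    disagree = a_only * b_only   # (Ab)(aB)
    return (agree - disagree) / (agree + disagree)


# Hypothetical counts for two attributes A and B
print(round(yules_q(both=30, a_only=10, b_only=20, neither=40), 3))
```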

(B) Inferential Statistics

Inferential statistics deals with forecasting, estimating or judging some results of the universe based on some units selected from the universe. This process is called Sampling. It facilitates estimation of some population values known as parameters. It also deals with testing of hypothesis to determine with what validity the conclusions are drawn.

Ratios, percentages and averages

In statistical analysis, ratios, percentages and weighted averages play a very important role. Ratios show the relation of one figure to another. For example, if the total number of students in a school is 2000 and the total number of teachers is 250, then the ratio of teachers to students is 250:2000, or 1:8. To express it as a percentage, divide and multiply by 100: 250 ÷ 2000 × 100 = 12.5%.
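The ratio and percentage in this example can be checked with a few lines of Python; reducing the ratio by the greatest common divisor is one convenient way to reach its simplest form.

```python
from math import gcd

teachers, students = 250, 2000
common = gcd(teachers, students)                  # greatest common divisor, 250
ratio = (teachers // common, students // common)  # simplest form of 250:2000
percentage = teachers / students * 100            # teachers as % of students
print(f"{ratio[0]}:{ratio[1]}", percentage)
```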

Measures of central tendency (averages)

An average is a single significant figure which sums up the characteristics of a group of figures. The various measures of central tendency are:

(1) Arithmetic mean

(2) Median

(3) Mode

(4) Geometric mean

(5) Harmonic mean

Arithmetic Mean

The mean or average is probably the most commonly used method of describing central tendency. To compute the mean, add up all the values and divide by the number of values.

Arithmetic mean, $\bar{x} = \frac{\sum x}{n}$

where x stands for an observed value,

n stands for the number of observations in the data set,

$\sum x$ stands for the sum of all observed x values, and

$\bar{x}$ stands for the mean value of x.

For example, consider the test score values:

15, 20, 21, 20, 36, 15, 25, 15

The sum of these 8 values is 167, so the mean is 167/8 = 20.875.
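The same calculation in Python, using the test scores above; `statistics.mean` from the standard library gives the same result as dividing the sum by the count.

```python
import statistics

scores = [15, 20, 21, 20, 36, 15, 25, 15]
mean = sum(scores) / len(scores)   # sum of values divided by number of values
print(sum(scores), mean, statistics.mean(scores))
```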

Ex. 1 Calculate the mean from the following data

Value:	5	15	25	35	45	55	65	75
Freq:	15	20	25	24	12	31	71	52

Value (x)	Frequency (f)	fx
5	15	75
15	20	300
25	25	625
35	24	840
45	12	540
55	31	1705
65	71	4615
75	52	3900
Total	250	12600

Arithmetic mean, $\bar{x} = \frac{\sum fx}{\sum f} = \frac{12600}{250} = 50.4$
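For a discrete frequency distribution each value is weighted by its frequency, so the mean is ∑fx / ∑f. A sketch reproducing Ex. 1, taking the frequency of the value 5 as 15, as in the solution table:

```python
values = [5, 15, 25, 35, 45, 55, 65, 75]
freqs = [15, 20, 25, 24, 12, 31, 71, 52]

sum_f = sum(freqs)                                  # total frequency
sum_fx = sum(x * f for x, f in zip(values, freqs))  # sum of value x frequency
mean = sum_fx / sum_f
print(sum_f, sum_fx, mean)
```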

Ex. 2 Calculate the mean from the following data

Age:	0-10	10-20	20-30	30-40	40-50	50-60	60-70	70-80
No. of persons dying:	15	30	53	75	100	110	115	125

Solution:

Age	f	Mid-value (x)	fx
0-10	15	5	75
10-20	30	15	450
20-30	53	25	1325
30-40	75	35	2625
40-50	100	45	4500
50-60	110	55	6050
60-70	115	65	7475
70-80	125	75	9375
Total	623		31875

Arithmetic mean, $\bar{x} = \frac{\sum fx}{\sum f} = \frac{31875}{623} = 51.16$ years
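For grouped (class-interval) data the usual assumption is that every observation in a class sits at the class mid-value, so x is replaced by the mid-point of each class. A sketch for Ex. 2 under that assumption:

```python
# Age classes 0-10, 10-20, ..., 70-80 and number of persons dying in each
classes = [(lower, lower + 10) for lower in range(0, 80, 10)]
freqs = [15, 30, 53, 75, 100, 110, 115, 125]

mids = [(lower + upper) / 2 for lower, upper in classes]  # mid-values x
sum_fx = sum(f * x for f, x in zip(freqs, mids))
mean = sum_fx / sum(freqs)
print(sum(freqs), sum_fx, round(mean, 2))
```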

Median

The Median is the score found at the exact middle of the set of values. One way to compute the median is to list all scores in numerical order, and then locate the score in the center of the sample.

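For example, if there are 501 scores in the list, score #251 would be the median; with an even number of scores, such as 500, the median lies midway between scores #250 and #251. The median is also called the {(n + 1) ÷ 2}th value, where n is the number of values in a set of data.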

Example

Imagine that a top running athlete in a typical 200-metre training session runs in the following times:

26.1, 25.6, 25.7, 25.2 and 25.0 seconds.

First, the values are put in ascending order: 25.0, 25.2, 25.6, 25.7, and 26.1. Then, using the following formula, figure out which value is the middle value. Remember that n represents the number of values in the data set.

Median = {(n + 1) ÷ 2}th value = (5 + 1) ÷ 2 = 3rd value

The third value in the data set will be the median. Since 25.6 is the third value, 25.6 seconds would be the median time.

Median = 25.6 seconds

Example

Now, if the runner sprints the sixth 200-metre race in 24.7 seconds, what is the median value now?

Again, you first put the data in ascending order: 24.7, 25.0, 25.2, 25.6, 25.7, 26.1. Then, you use the same formula to calculate the median time.

Median = {(n + 1) ÷ 2}th value

= (6 + 1) ÷ 2

= 7 ÷ 2

= 3.5

Since there is an even number of observations in this data set, there is no longer a distinct middle value. The median is the 3.5th value in the data set, meaning that it lies between the third and fourth values. Thus, the median is calculated by averaging the two middle values, 25.2 and 25.6. Use the formula below to get the average value.

Average = (value below median + value above median) ÷ 2

= (third value + fourth value) ÷ 2

= (25.2 + 25.6) ÷ 2

= 50.8 ÷ 2

= 25.4

The value 25.4 falls directly between the third and fourth values in this data set, so 25.4 seconds would be the median.
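The rule used in both examples (take the middle value when n is odd, the average of the two middle values when n is even) can be sketched as below; the standard library's `statistics.median` follows the same rule.

```python
import statistics


def median(values):
    """Middle value of the sorted data; mean of the two middle values if n is even."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


five_runs = [26.1, 25.6, 25.7, 25.2, 25.0]
six_runs = five_runs + [24.7]
print(median(five_runs), median(six_runs), statistics.median(six_runs))
```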

The various steps in the computations of median in a discrete series are as follows:

(i) Arrange the values in ascending or descending order of magnitude.

(ii) Find out the cumulative frequencies.

(iii) Find out the middle item by the formula (N + 1)/2.

(iv) Now find out the value of the {(N + 1)/2}th item. It can be found by first locating the cumulative frequency which is equal to (N + 1)/2 or next higher to it, and then determining the value corresponding to it. This will be the value of the median.

Finding the Value of Median

Find out the value of median from the following data

Daily wages:	10	5	7	11	8

Number of workers:	15	20	15	18	12

Solution: Calculation of median

Wages (ascending order)	Number of persons (f)	Cumulative frequency (c.f.)
5	20	20
7	15	35
8	12	47
10	15	62
11	18	80

Median is the value of the {(N + 1)/2}th item = {(80 + 1)/2}th item = 40.5th item.

The cumulative frequency 35 is less than 40.5 and the next cumulative frequency, 47, is higher, so items 36 to 47 all have the value 8. Thus the median value is 8.
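The steps for a discrete series translate directly into code: sort by value, accumulate frequencies, and return the first value whose cumulative frequency reaches (N + 1)/2. A sketch with a hypothetical function name, applied to the daily-wages example:

```python
def discrete_median(values, freqs):
    """Median of a discrete series via cumulative frequencies and the (N + 1)/2 rule."""
    pairs = sorted(zip(values, freqs))   # step (i): ascending order of values
    position = (sum(freqs) + 1) / 2      # step (iii): middle item
    cumulative = 0
    for value, f in pairs:
        cumulative += f                  # step (ii): cumulative frequency
        if cumulative >= position:       # step (iv): c.f. equal to or next higher
            return value
    raise ValueError("empty distribution")


wages = [10, 5, 7, 11, 8]
workers = [15, 20, 15, 18, 12]
print(discrete_median(wages, workers))
```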

In the case of a continuous frequency distribution, the median class is the class whose cumulative frequency first equals or exceeds N/2. After identifying the median class, find the median using the following interpolation formula.

$m = L_1 + \frac{N/2 - CF}{f} \times C$

L₁ means the lower boundary of the median class

N means the sum of frequencies

CF means the cumulative frequency of the class preceding the median class

f means the frequency of the median class

C means the size (width) of the median class

Find out the value of median from the following data

Class:	0-10	10-20	20-30	30-40	40-50	50-60	60-70
Frequency:	8	12	20	23	18	7	2

Class	Frequency	Cumulative frequency
0-10	8	8
10-20	12	20
20-30	20	40
30-40	23	63
40-50	18	81
50-60	7	88
60-70	2	90
Total	90

Median = size of the (N/2)th item = size of the (90/2)th item = size of the 45th item

45 is included in the cumulative frequency 63. The class having c.f. 63 is 30-40.

Therefore 30-40 is the median class.

Applying the interpolation formula

$m = L_1 + \frac{N/2 - CF}{f} \times C$

Here L₁ = 30, N/2 = 45, CF = 40, f = 23, C = 10

$m = 30 + \frac{45 - 40}{23} \times 10 = 30 + \frac{50}{23} = 32.17$
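The interpolation formula can be sketched as follows, assuming equal class widths and classes listed in ascending order; the function name is illustrative. It locates the median class as the first class whose cumulative frequency reaches N/2 and then applies m = L₁ + [(N/2 − CF)/f] × C.

```python
def grouped_median(lower_bounds, width, freqs):
    """Median of a continuous frequency distribution with equal class widths."""
    half = sum(freqs) / 2              # N/2
    cf_before = 0                      # CF: cumulative frequency before the class
    for lower, f in zip(lower_bounds, freqs):
        if cf_before + f >= half:      # this is the median class
            return lower + (half - cf_before) / f * width
        cf_before += f
    raise ValueError("empty distribution")


lower_bounds = [0, 10, 20, 30, 40, 50, 60]
freqs = [8, 12, 20, 23, 18, 7, 2]
print(round(grouped_median(lower_bounds, 10, freqs), 2))
```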

Mode

Mode is the value of the item of a series which occurs most frequently. According to Kenny, “the value of the variable which occurs most frequently in a distribution is called a mode”.

In the case of an individual series, the value which occurs the greatest number of times is the mode. For example, a set of students of a class report the following number of video movies they see in a month.

No of movies: 10,15,20,15,15,8

Most often, the students see 15 movies in a month. Therefore mode = 15.

When no item appears more often than the others, we say the mode is ill-defined. In that case, the mode is obtained by the empirical formula mode = 3 median - 2 mean.

Ex: Find the mode from the values 40, 25, 60, 35, 81, 75, 90, 10

Ans: Every value appears exactly once, so the mode is ill-defined.

Therefore, mode = 3 median - 2 mean

Mean = ∑x / n = 416/8 = 52

Median = {(n + 1) ÷ 2}th value = {(8 + 1) ÷ 2}th value = 4.5th value

Arranging the values in ascending order (10, 25, 35, 40, 60, 75, 81, 90), the 4.5th value lies midway between the 4th and 5th values: (40 + 60) / 2 = 50

Mode = (3 × 50) - (2 × 52) = 150 - 104 = 46

In the case of a discrete frequency distribution, the value having the highest frequency is taken as the mode. Ex: Find the mode.

Size:	5	8	10	12	29	35	40	46
No. of items:	3	12	25	40	31	20	18	7

Ans: The value 12 has the highest frequency (40). Therefore 12 is the mode.
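The three cases above (a clear mode, an ill-defined mode, and a discrete frequency distribution) can be checked with Python's standard `statistics` module. `statistics.multimode` returns every value tied for the highest count, so when it returns all the values the mode is ill-defined and the empirical formula 3 median − 2 mean is used instead. This is a sketch; the variable names are illustrative.

```python
import statistics

# Individual series with a clear mode
movies = [10, 15, 20, 15, 15, 8]
print(statistics.mode(movies))

# Every value occurs once, so the mode is ill-defined; use 3 median - 2 mean
values = [40, 25, 60, 35, 81, 75, 90, 10]
ill_defined = len(statistics.multimode(values)) == len(values)
empirical_mode = 3 * statistics.median(values) - 2 * statistics.mean(values)
print(ill_defined, empirical_mode)

# Discrete frequency distribution: the size with the highest frequency
sizes = [5, 8, 10, 12, 29, 35, 40, 46]
freqs = [3, 12, 25, 40, 31, 20, 18, 7]
discrete_mode = max(zip(sizes, freqs), key=lambda pair: pair[1])[0]
print(discrete_mode)
```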

In the case of a continuous frequency distribution, the mode lies in the class having the highest frequency (the modal class).

From the modal class, the mode is calculated using the interpolation formula:

$\text{Mode} = L_1 + \frac{f_1 - f_0}{2f_1 - f_0 - f_2} \times c$

where L₁ is the lower limit of the modal class, f₀ and f₂ are respectively the frequencies of the classes just preceding and succeeding the modal class, f₁ is the frequency of the modal class, and c is the class width.

Ex: Calculate the mode from the following data.
Size:	10-15	15-20	20-25	25-30	30-35	35-40	40-45	45-50
Freq:	4	8	18	30	20	10	5	2

Ans: The modal class is 25-30 since it has the highest frequency.

Here L₁ = 25, f₁ = 30, f₀ = 18, f₂ = 20, c = 5

$\text{Mode} = 25 + \frac{30 - 18}{2(30) - 18 - 20} \times 5 = 25 + \frac{60}{22} = 27.73$
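The interpolation formula for the mode can be sketched in the same way, assuming equal class widths. One assumption not covered in the notes: if the modal class is the first or last class, the missing neighbouring frequency is taken as 0. The function name is hypothetical.

```python
def grouped_mode(lower_bounds, width, freqs):
    """Mode of a continuous frequency distribution by interpolation in the modal class."""
    k = max(range(len(freqs)), key=lambda i: freqs[i])  # index of the modal class
    f1 = freqs[k]
    f0 = freqs[k - 1] if k > 0 else 0                  # class just preceding
    f2 = freqs[k + 1] if k + 1 < len(freqs) else 0     # class just succeeding
    return lower_bounds[k] + (f1 - f0) / (2 * f1 - f0 - f2) * width


lower_bounds = [10, 15, 20, 25, 30, 35, 40, 45]
freqs = [4, 8, 18, 30, 20, 10, 5, 2]
print(round(grouped_mode(lower_bounds, 5, freqs), 2))
```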

Index Numbers

Index numbers are designed to measure the magnitude of economic changes over time. An index number is a statistic which assigns a single number to several individual statistics in order to quantify trends. Index numbers are indicators of the various trends in an economy. Price index numbers indicate whether prices are rising or falling and at what rate. Similarly, index numbers of agricultural production indicate whether production is rising or falling and at what rate over a period of time. An index number is an economic data figure reflecting price or quantity compared with a standard or base value. The base usually equals 100 and the index number is usually expressed as 100 times the ratio to the base value. For example, if a commodity costs twice as much in 1970 as it did in 1960, its index number would be 200 relative to 1960. Index numbers are used especially to compare business activity, the cost of living, and employment.

An index number is a specialized average. Index numbers may be simple or weighted, depending on whether we assign equal importance to every commodity or different importance to different commodities, according to the percentage of income spent on them or on the basis of some other criteria. In this chapter, we shall discuss both simple and weighted index numbers.

Simple and weighted index numbers

Simple index numbers are those in the calculation of which all the items are treated as equally important. Here items are not given any weight. Weighted index numbers are those in the calculation of which each item is assigned a particular weight.

Price Index Numbers

Price index numbers measure changes in the price of a commodity for a given period in comparison with another period.

Various methods used for construction of Price index numbers

1) Simple Aggregative Method

This is the simplest method. Only the prices for the base year and the current year are required. The aggregate of current-year prices is divided by the aggregate of base-year prices and multiplied by 100.

i.e. $\frac{\sum p_1}{\sum p_0} \times 100$, where ∑p₁ is the aggregate of prices in the current year and ∑p₀ is the aggregate of prices in the base year.

Ex: for the data given below calculate simple index number

Commodities:	A	B	C	D	E
Price in 2008	5	8	12	25	3
Price in 2010	7	9	15	24	4

Ans: We take 2008 as the base year and 2010 as the current year, since 2008 is the earlier period.

Commodities	Price in 2008 (p₀)	Price in 2010 (p₁)
A	5	7
B	8	9
C	12	15
D	25	24
E	3	4
Total	53	59

Simple index number = $\frac{\sum p_1}{\sum p_0} \times 100 = \frac{59}{53} \times 100 = 111.3$
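The same computation in Python, with 2008 as the base year and 2010 as the current year, as in the example:

```python
p0 = [5, 8, 12, 25, 3]   # prices in 2008 (base year)
p1 = [7, 9, 15, 24, 4]   # prices in 2010 (current year)

# Aggregate of current prices over aggregate of base prices, times 100
index = sum(p1) / sum(p0) * 100
print(sum(p0), sum(p1), round(index, 1))
```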

2) Simple Average Relative Method

In this method, the price relative for each item is found first. The price relative is I = (current year price ÷ base year price) × 100.

The average of these relatives is then found, i.e. price index number = ∑I/n.

Ex: For the data given below, calculate the simple index number by the average relative method.

Items:	1	2	3	4	5
Price in base year:	5	10	15	20	8
Price in current year:	7	12	25	18	9

Items	Price in base year (p₀)	Price in current year (p₁)	I = (p₁/p₀) × 100
1	5	7	140.0
2	10	12	120.0
3	15	25	166.7
4	20	18	90.0
5	8	9	112.5
Total			629.2

Simple index number = ∑I/n = 629.2/5 = 125.84
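A sketch of the average-of-relatives method for the same data. Computed without intermediate rounding, the index comes to about 125.83; the figure 125.84 above arises from rounding each relative to one decimal place before averaging.

```python
base = [5, 10, 15, 20, 8]
current = [7, 12, 25, 18, 9]

# Price relative I = (p1 / p0) * 100 for each item
relatives = [p1 / p0 * 100 for p0, p1 in zip(base, current)]
index = sum(relatives) / len(relatives)   # simple average of the relatives
print([round(i, 1) for i in relatives], round(index, 2))
```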

3) Weighted Aggregative Method

In this method, weights are assigned to each item. The two well-known methods used for assigning weights are Laspeyres’ method and Paasche’s method.

Laspeyres’ method: base year quantity is taken as weight.

Laspeyres’ index number = $\frac{\sum p_1 q_0}{\sum p_0 q_0} \times 100$

Paasche’s method: current year quantity is taken as weight.

Paasche’s index number = $\frac{\sum p_1 q_1}{\sum p_0 q_1} \times 100$

Prof. Irving Fisher suggested a formula for the construction of index numbers which takes the geometric mean of the Laspeyres and Paasche ratios.

Fisher’s index number = $\sqrt{\frac{\sum p_1 q_0}{\sum p_0 q_0} \times \frac{\sum p_1 q_1}{\sum p_0 q_1}} \times 100$

Ex: Calculate (i) Laspeyres’ (ii) Paasche’s (iii) Fisher’s index numbers from the following data.

Commodity	Price 2009	Price 2010	Quantity consumed 2009	Quantity consumed 2010
A	0.80	0.70	10	11.0
B	0.85	0.90	8	9.0
C	1.30	0.80	5	5.5

Commodity	p₀	p₁	q₀	q₁	p₁q₁	p₀q₁	p₁q₀	p₀q₀
A	0.80	0.70	10	11	7.7	8.80	7.0	8.0
B	0.85	0.90	8	9	8.1	7.65	7.2	6.8
C	1.30	0.80	5	5.5	4.4	7.15	4.0	6.5
Total					20.2	23.6	18.2	21.3

Laspeyres’ index number = $\frac{\sum p_1 q_0}{\sum p_0 q_0} \times 100 = \frac{18.2}{21.3} \times 100 = 85.45$

Paasche’s index number = $\frac{\sum p_1 q_1}{\sum p_0 q_1} \times 100 = \frac{20.2}{23.6} \times 100 = 85.59$

Fisher’s index number = $\sqrt{\frac{18.2}{21.3} \times \frac{20.2}{23.6}} \times 100 = 85.52$
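The three weighted aggregative indices can be reproduced as below. The helper `aggregate` is a hypothetical name for ∑pq; Fisher's index is computed as the geometric mean of the Laspeyres and Paasche indices, which is what Fisher's formula expresses.

```python
import math

p0 = [0.80, 0.85, 1.30]   # prices, 2009 (base year)
p1 = [0.70, 0.90, 0.80]   # prices, 2010 (current year)
q0 = [10, 8, 5]           # quantities, 2009
q1 = [11, 9, 5.5]         # quantities, 2010


def aggregate(prices, quantities):
    """Sum of price x quantity over all commodities."""
    return sum(p * q for p, q in zip(prices, quantities))


laspeyres = aggregate(p1, q0) / aggregate(p0, q0) * 100  # base-year weights
paasche = aggregate(p1, q1) / aggregate(p0, q1) * 100    # current-year weights
fisher = math.sqrt(laspeyres * paasche)                  # geometric mean of the two
print(round(laspeyres, 2), round(paasche, 2), round(fisher, 2))
```

Because Fisher's index is a geometric mean, it always lies between the Laspeyres and Paasche values.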

4) Weighted Average Of Price Relative Method

In this method, we use some assigned (arbitrary) numbers as weights for the price relatives.

The formula is ∑IV/∑V, where V is the weight and I = (p₁/p₀) × 100.

Ex: Calculate the price index number for 2009 on the basis of 2008.

Commodity	Weight	Price (2008)	Price (2009)
A	40	16	20
B	25	40	60
C	5	2	2
D	20	5	6
E	10	2	1

Ans:

Commodity	V	p₀	p₁	I	IV
A	40	16	20	125	5000
B	25	40	60	150	3750
C	5	2	2	100	500
D	20	5	6	120	2400
E	10	2	1	50	500
Total	100				12150

Index number for 2009 = ∑IV/∑V = 12150/100 = 121.5
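The weighted average of price relatives for this example, as a short sketch (the variable names are illustrative):

```python
weights = [40, 25, 5, 20, 10]   # V
p0 = [16, 40, 2, 5, 2]          # prices in 2008 (base year)
p1 = [20, 60, 2, 6, 1]          # prices in 2009

# Price relatives I = (p1 / p0) * 100
relatives = [b / a * 100 for a, b in zip(p0, p1)]
# Weighted average of relatives: sum(IV) / sum(V)
index = sum(i * v for i, v in zip(relatives, weights)) / sum(weights)
print(index)
```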

Interpretation

Interpretation refers to the technique of drawing inferences from the collected facts and explaining the significance of those inferences after an analytical and experimental study. It is a search for the broader and more abstract meaning of the research findings. If the interpretation is not done very carefully, misleading conclusions may be drawn. The interpreter must be creative with ideas, and he should be free from bias and prejudice.

STATISTICAL INFERENCES

TESTING OF HYPOTHESIS USING DIFFERENT STATISTICAL METHODS

What is hypothesis: A hypothesis is an assertion or conjecture about the parameter(s) of population distribution(s).

Some basic concepts:

Null Hypothesis: Null hypothesis is the hypothesis which is tested for possible rejection under the assumption that it is true.

For example, in case of a single statistic, H₀ will be that the sample statistic does not differ significantly from the hypothetical parameter value and in the case of two statistics, H₀ will be that the sample statistics do not differ significantly.

Type I and Type II Errors: After applying a test, a decision is taken about the acceptance or rejection of the null hypothesis vis-à-vis the alternative hypothesis. There is always some possibility of committing an error in taking a decision about the hypothesis. These errors are of two types.

Type I Error: Reject null hypothesis H₀ when it is true.

Type II Error: Accept null hypothesis H₀ when it is false.

These two types of error can be better understood with an example where a patient is given a medicine to cure some disease and his condition is scrutinised for some time. Suppose the null hypothesis is that the medicine has a positive effect. It is just possible that the medicine has a positive effect but it is considered to have no effect or an adverse effect. This is the first kind of error, or type I error. On the contrary, if the medicine has an adverse effect but is considered to have a positive effect, it is called the second kind of error, or type II error.

Level of Significance: It is the amount of risk of type I error which we are ready to tolerate in making a decision about H₀. In other words, it is the tolerable probability of a type I error. The level of significance is denoted by α and is conventionally chosen as 0.05 or 0.01. The 0.01 level is used for high precision and the 0.05 level for moderate precision.

Test of significance for single mean:

This section shows how to test the null hypothesis that the population mean is equal to some hypothesized value. For example, suppose an experimenter wanted to know if people are influenced by a subliminal message and performed the following experiment. Each of nine subjects is presented with a series of 100 pairs of pictures. As a pair of pictures is presented, a subliminal message is presented suggesting the picture that the subject should choose. The question is whether the (population) mean number of times the suggested picture is chosen is equal to 50. In other words, the null hypothesis is that the population mean (μ) is 50. The (hypothetical) data are shown in Table 1. The data in Table 1 have a sample mean (M) of 51. Thus the sample mean differs from the hypothesized population mean by 1.

Table 1. Distribution of scores.

Frequency (number of times the suggested picture was chosen)
45
48
49
49
51
52
53
55
57

The significance test consists of computing the probability of a sample mean differing from μ by one (the difference between the hypothesized population mean and the sample mean) or more. The first step is to determine the sampling distribution of the mean. The mean and standard deviation of the sampling distribution of the mean are

$\mu_M = \mu$

and

$\sigma_M = \frac{\sigma}{\sqrt{N}}$

respectively. It is clear that μ_M = 50. In order to compute the standard deviation of the sampling distribution of the mean, we have to know the population standard deviation (σ).

The current example was constructed to be one of the few instances in which the standard deviation is known. In practice, it is very unlikely that you would know σ and therefore you would use s, the sample estimate of σ. However, it is instructive to see how the probability is computed if σ is known before proceeding to see how it is calculated when σ is estimated.

For the current example, if the null hypothesis is true, then based on the binomial distribution, one can compute that the variance of the number correct is

σ² = Nπ(1-π) = 100(0.5)(1-0.5) = 25.

Therefore, σ = 5. For a σ of 5 and an N of 9, the standard deviation of the sampling distribution of the mean is 5/3 = 1.667. Recall that the standard deviation of a sampling distribution is called the standard error.

To recap, we wish to know the probability of obtaining a sample mean of 51 or more when the sampling distribution of the mean has a mean of 50 and a standard deviation of 1.667. To compute this probability, we will make the assumption that the sampling distribution of the mean is normally distributed. We can then use the normal distribution calculator as shown in Figure 1.

Figure 1. Probability of a sample mean being 51 or greater.

Notice that the mean is set to 50, the standard deviation to 1.667, and the area above 51 is requested and shown to be 0.274.

Therefore, the probability of obtaining a sample mean of 51 or larger is 0.274. Since a mean of 51 or higher is not unlikely under the assumption that the subliminal message has no effect, the effect is not significant and the null hypothesis is not rejected.

The test conducted above was a one-tailed test because it computed the probability of a sample mean being one or more points higher than the hypothesized mean of 50 and the area computed was the area above 51. To test the two-tailed hypothesis, you would compute the probability of a sample mean differing by one or more in either direction from the hypothesized mean of 50. You would do so by computing the probability of a mean being less than or equal to 49 or greater than or equal to 51.

The results of the normal distribution calculator are shown in Figure 2.

Figure 2. Probability of a sample mean being less than or equal to 49 or greater than or equal to 51.

As you can see, the probability is 0.548 which, as expected, is twice the probability of 0.274 shown in Figure 1.

Before normal calculators such as the one illustrated above were widely available, probability calculations were made based on the standard normal distribution. This was done by computing Z based on the formula

$Z = \frac{M - \mu}{\sigma_M}$

where Z is the value on the standard normal distribution, M is the sample mean, μ is the hypothesized value of the mean, and σ_M is the standard error of the mean. For this example, Z = (51-50)/1.667 = 0.60. Use the normal calculator, with a mean of 0 and a standard deviation of 1, as shown below.

Figure 3. Calculation using the standardized normal distribution.

Notice that the probability (the shaded area) is the same as previously calculated (for the one-tailed test).
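Python's standard library provides `statistics.NormalDist`, which can stand in for the normal distribution calculator. The sketch below recomputes the standard error from the binomial variance and then the one- and two-tailed probabilities for a sample mean of 51. Like the text, it assumes the sampling distribution of the mean is normal; the variable names are illustrative, and the number of picture pairs and the number of subjects are given separate names to avoid confusing the two counts.

```python
import math
from statistics import NormalDist

n_pairs = 100        # picture pairs shown to each subject
pi = 0.5             # chance of choosing the suggested picture under H0
n_subjects = 9
mu = n_pairs * pi    # hypothesized population mean, 50
sigma = math.sqrt(n_pairs * pi * (1 - pi))   # binomial SD, 5
se = sigma / math.sqrt(n_subjects)           # standard error of the mean, 5/3
sample_mean = 51

z = (sample_mean - mu) / se
p_one_tailed = 1 - NormalDist().cdf(z)       # area above z
p_two_tailed = 2 * p_one_tailed              # both tails, by symmetry
print(round(z, 2), round(p_one_tailed, 3), round(p_two_tailed, 3))
```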

As noted, in real-world data analyses it is very rare that you would know σ and wish to estimate μ. Typically σ is not known and is estimated in a sample by s, and σ_M is estimated by s_M. For our next example, we will consider the data in the "ADHD Treatment" case study. These data consist of the scores of 24 children with ADHD on a delay of gratification (DOG) task. Each child was tested under four dosage levels. Table 2 shows the data for the placebo (0 mg) and highest dosage level (0.6 mg) of methylphenidate. Of particular interest here is the column labeled "Diff" that shows the difference in performance between the 0.6 mg (D60) and the 0 mg (D0) conditions. These difference scores are positive for children who performed better in the 0.6 mg condition than in the control condition and negative for those who scored better in the control condition. If methylphenidate has a positive effect, then the mean difference score in the population will be positive. The null hypothesis is that the mean difference score in the population is 0.

Table 2. DOG scores as a function of dosage.

D0	D60	Diff
57	62	5
27	49	22
32	30	-2
31	34	3
34	38	4
38	36	-2
71	77	6
33	51	18
34	45	11
53	42	-11
36	43	7
42	57	15
26	36	10
52	58	6
36	35	-1
55	60	5
36	33	-3
42	49	7
36	33	-3
54	59	5
34	35	1
29	37	8
33	45	12
33	29	-4

To test this null hypothesis, we compute t using a special case of the following formula:

$t = \frac{\text{statistic} - \text{hypothesized value}}{\text{estimated standard error of the statistic}}$

The special case of this formula applicable to testing a single mean is

$t = \frac{M - \mu}{s_M}$

where t is the value we compute for the significance test, M is the sample mean, μ is the hypothesized value of the population mean, and s_M is the estimated standard error of the mean. Notice the similarity of this formula to the formula for Z.

In the previous example, we assumed that the scores were normally distributed. In this case, it is the population of difference scores that we assume to be normally distributed.

The mean (M) of the N = 24 difference scores is 4.958, the hypothesized value of μ is 0, and the standard deviation (s) is 7.538. The estimate of the standard error of the mean is computed as:

$s_M = \frac{s}{\sqrt{N}} = \frac{7.538}{\sqrt{24}} = 1.54$

Therefore, t = 4.96/1.54 = 3.22. The probability value for t depends on the degrees of freedom. The number of degrees of freedom is equal to N - 1 = 23. A t distribution calculator finds that the probability of a t less than -3.22 or greater than 3.22 is only 0.0038. Therefore, if the drug had no effect, the probability of finding a difference between means as large as or larger (in either direction) than the difference found is very low. Therefore the null hypothesis that the population mean difference score is zero can be rejected. The conclusion is that the population mean for the drug condition is higher than the population mean for the placebo condition.
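The one-sample t test on the difference scores can be reproduced with the standard library alone. One assumption of this sketch: the two-tailed probability is obtained by numerically integrating the t density with Simpson's rule rather than from a calculator, and the helper names are hypothetical. If SciPy is installed, `scipy.stats.ttest_1samp(diffs, 0)` should give the same t and p.

```python
import math

diffs = [5, 22, -2, 3, 4, -2, 6, 18, 11, -11, 7, 15,
         10, 6, -1, 5, -3, 7, -3, 5, 1, 8, 12, -4]   # D60 - D0 from Table 2


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c) * (1 + x * x / df) ** (-(df + 1) / 2)


def two_tailed_p(t, df, steps=2000):
    """P(|T| >= |t|) using Simpson's rule for the area between 0 and |t|."""
    b = abs(t)
    h = b / steps                    # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(b, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3             # P(0 <= T <= |t|)
    return 2 * (0.5 - area)


n = len(diffs)
mean = sum(diffs) / n
s = math.sqrt(sum((d - mean) ** 2 for d in diffs) / (n - 1))  # sample SD
se = s / math.sqrt(n)                # estimated standard error s_M
t = (mean - 0) / se                  # hypothesized mean difference is 0
p = two_tailed_p(t, n - 1)
print(round(mean, 3), round(s, 3), round(t, 2), round(p, 4))
```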

Introduction to Correlation

Modern business requires managers to make professional decisions every day, and these decisions often depend upon predictions of future events. To make better use of forecasts, managers rely on relationships (intuitive and calculated) between related events. If decision makers can determine the strength of the relationship that exists between variables, it can aid the decision-making process considerably.

Examples:

The relationship between the age of husband and age of wife, price of a commodity and the amount demanded, heights and weights of a group of persons, income and expenditure of a group of persons etc.

Meaning:

The term correlation indicates the relationship between two variables such that, with a change in the values of one variable, the values of the other variable also change. According to Croxton and Cowden, "when the relationship is of a quantitative nature, the appropriate statistical tool for discovering and expressing it in a brief formula is known as correlation".

Uses of Correlation:

The study of correlation is useful in practical life for the following reasons. With the help of correlation analysis, one can measure in a single figure the degree of relationship that exists between variables. Once we establish that two variables are related, we can estimate one variable from the other with the help of regression analysis. Correlation study helps us in identifying factors which can stabilize a disturbed economic situation. Studies of the interrelationships between different variables are helpful tools in promoting research.

Scatter Diagram Method:

The scatter diagram is the simplest method of studying the relationship between two variables. The simplest device for ascertaining whether two variables are related is to prepare a dot chart, with the horizontal axis representing one variable and the vertical axis representing the other. The diagram of dots so obtained is known as a scatter diagram. From the scatter diagram we can form a fairly good, though rough, idea about the relationship between two variables. The following diagrams of scattered data depict different types of correlation.
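Besides the scatter diagram, the degree of relationship is usually summarized in a single figure by Karl Pearson's coefficient of correlation, r = ∑(x − x̄)(y − ȳ) / √(∑(x − x̄)² ∑(y − ȳ)²), which lies between −1 and +1. That formula is standard but is not stated in these notes, and the income and expenditure figures below are hypothetical.

```python
import math


def pearson_r(x, y):
    """Karl Pearson's product-moment correlation coefficient."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical monthly income and expenditure of five households
income = [20, 25, 30, 35, 40]
expenditure = [18, 20, 27, 29, 31]
print(round(pearson_r(income, expenditure), 3))
```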

INTERPRETATION OF CORRELATION:

REGRESSION MODELLING:

Regression is a method to mathematically formulate relationship between variables that in due course can be used to estimate, interpolate and extrapolate. Suppose we want to estimate the weight of individuals, which is influenced by height, diet, workout, etc. Here, Weight is the predicted variable. Height, Diet, Workout are predictor variables.

The predicted variable is a dependent variable in the sense that it depends on the predictors. Predictors are also called independent variables. Regression reveals to what extent the predicted variable is affected by the predictors; in other words, how much variation in the predicted variable results from variation in the predictors. The predicted variable is mathematically represented as Y. The predictor variables are represented as X1, X2, X3, etc. This mathematical relationship is often called the regression model.

Regression is a branch of statistics. There are many types of regression. Regression is commonly used for prediction and forecasting.

Regression Equation

Now that we know how the relationship between the two variables is calculated, we can develop a regression equation to forecast or predict the variable we desire. Below is the formula for a simple linear regression. The "y" is the value we are trying to forecast, the "b" is the slope of the regression line, the "x" is the value of our independent variable, and the "a" represents the y-intercept. The regression equation simply describes the relationship between the dependent variable (y) and the independent variable (x).

y=bx+a

The intercept, or "a," is the value of y (dependent variable) if the value of x (independent variable) is zero, and so is sometimes simply referred to as the 'constant.' For example, if a company's sales (y) are regressed on the change in GDP (x), then even if there were no change in GDP, the company would still make some sales; this value of sales, when the change in GDP is zero, is the intercept. Take a look at the graph below to see a graphical depiction of a regression equation. In this graph, there are only five data points, represented by the five dots. Linear regression attempts to estimate a line that best fits the data (a line of best fit), and the equation of that line results in the regression equation.

Linear Regression

It is one of the most widely known modeling techniques. Linear regression is usually among the first few topics which people pick up while learning predictive modeling. In this technique, the dependent variable is continuous, the independent variable(s) can be continuous or discrete, and the nature of the regression line is linear.

Linear Regression establishes a relationship between dependent variable (Y) and one or more independent variables (X) using a best fit straight line (also known as regression line).

It is represented by an equation Y=a+b*X + e, where a is intercept, b is slope of the line and e is error term. This equation can be used to predict the value of target variable based on given predictor variable(s).

The difference between simple linear regression and multiple linear regression is that multiple linear regression has more than one independent variable, whereas simple linear regression has only one independent variable. Now, the question is “How do we obtain the best fit line?”.

How to obtain best fit line (Value of a and b)?

This task can be easily accomplished by the least squares method. It is the most common method used for fitting a regression line. It calculates the best-fit line for the observed data by minimizing the sum of the squares of the vertical deviations from each data point to the line. Because the deviations are first squared, when added, there is no cancelling out between positive and negative values.
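For simple linear regression the least squares method has a closed-form solution: b = ∑(x − x̄)(y − ȳ) / ∑(x − x̄)² and a = ȳ − b x̄. The sketch below fits Y = a + bX this way and also reports R-square, computed as 1 − (residual sum of squares / total sum of squares), the metric commonly used to judge such a fit. The formulas are the standard ones, while the function name and data are hypothetical.

```python
def least_squares_fit(x, y):
    """Return intercept a, slope b and R-square of the least squares line y = a + b*x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    b = sxy / sxx              # slope
    a = mean_y - b * mean_x    # intercept: line passes through (mean_x, mean_y)
    ss_res = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return a, b, 1 - ss_res / ss_tot


# Hypothetical data
x = [1, 2, 3, 4, 5]
y = [1, 3, 2, 5, 4]
a, b, r_squared = least_squares_fit(x, y)
print(round(a, 2), round(b, 2), round(r_squared, 2))
```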

We can evaluate the model performance using the metric R-square.

Logistic Regression

Logistic regression is used to find the probability of event = Success and event = Failure. We should use logistic regression when the dependent variable is binary (0/1, True/False, Yes/No) in nature. Here the predicted probability ranges from 0 to 1, and the model can be represented by the following equations.

odds = p / (1 - p) = probability of the event occurring / probability of the event not occurring

ln(odds) = ln(p / (1 - p))

logit(p) = ln(p / (1 - p)) = b0 + b1X1 + b2X2 + b3X3 + ... + bkXk
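The three equations can be checked numerically with a short Python sketch: `odds` and `logit` follow the formulas directly, and `inverse_logit` runs the last equation backwards, turning a value of b0 + b1X1 + ... + bkXk into a probability. All function names and the coefficients in the example are hypothetical.

```python
import math


def odds(p: float) -> float:
    """odds = p / (1 - p), defined for 0 < p < 1."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    return p / (1.0 - p)


def logit(p: float) -> float:
    """logit(p) = ln(p / (1 - p)), i.e. the log of the odds."""
    return math.log(odds(p))


def inverse_logit(z: float) -> float:
    """Map z = b0 + b1*X1 + ... + bk*Xk back to a probability between 0 and 1."""
    # Split on the sign of z so math.exp never receives a large positive argument.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def predict_probability(coefs: list[float], xs: list[float]) -> float:
    """coefs = [b0, b1, ..., bk]; xs = [X1, ..., Xk]."""
    if len(coefs) != len(xs) + 1:
        raise ValueError("need one coefficient per predictor plus the intercept b0")
    z = coefs[0] + sum(b * x for b, x in zip(coefs[1:], xs))
    return inverse_logit(z)


print(odds(0.75))                              # → 3.0
print(round(inverse_logit(logit(0.75)), 6))    # → 0.75
print(predict_probability([-1.0, 0.5], [2.0])) # → 0.5  (hypothetical b0 = -1, b1 = 0.5, X1 = 2)
```

The round trip through `logit` and `inverse_logit` recovers the original probability, which is what allows the coefficients to be estimated on the log-odds scale and reported back as probabilities.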

Above, p is the probability that the characteristic of interest is present. A question that you should ask here is “why have we used the log in the equation?”.

Since the dependent variable follows a binomial distribution, we need to choose a link function that is best suited to this distribution, and that is the logit function. The logit maps a probability between 0 and 1 onto the whole real line, so it can be set equal to a linear combination of the independent variables, which can take any value. In the equation above, the parameters are chosen to maximize the likelihood of observing the sample values rather than to minimize the sum of squared errors (as in ordinary regression).
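Maximum likelihood estimation can be illustrated with plain gradient ascent on the log-likelihood for a single predictor. This is only a teaching sketch: statistical packages usually use Newton-Raphson or iteratively reweighted least squares instead, and the learning rate, iteration count, data and function names below are all assumptions made for illustration.

```python
import math


def sigmoid(z: float) -> float:
    """Inverse of the logit; split on sign to avoid overflow in math.exp."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def log_likelihood(b0: float, b1: float, xs: list[float], ys: list[int]) -> float:
    """Bernoulli log-likelihood: sum of y*ln(p) + (1 - y)*ln(1 - p)."""
    eps = 1e-12  # keep p away from exactly 0 or 1 so the log is defined
    total = 0.0
    for x, y in zip(xs, ys):
        p = min(max(sigmoid(b0 + b1 * x), eps), 1.0 - eps)
        total += y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total


def fit_logistic(xs: list[float], ys: list[int],
                 learning_rate: float = 0.1, iterations: int = 5000) -> tuple[float, float]:
    """Climb the log-likelihood; its gradient is sum(y - p) for b0 and sum((y - p) * x) for b1."""
    if len(xs) != len(ys) or not xs:
        raise ValueError("need paired, non-empty data")
    n = len(xs)
    b0 = b1 = 0.0
    for _ in range(iterations):
        g0 = g1 = 0.0
        for x, y in zip(xs, ys):
            err = y - sigmoid(b0 + b1 * x)
            g0 += err
            g1 += err * x
        b0 += learning_rate * g0 / n
        b1 += learning_rate * g1 / n
    return b0, b1


# Hypothetical data; the classes overlap, so a finite maximum likelihood estimate exists
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
ys = [0, 0, 1, 0, 1, 0, 1, 1]
b0, b1 = fit_logistic(xs, ys)
print(f"b0 = {b0:.3f}, b1 = {b1:.3f}, log-likelihood = {log_likelihood(b0, b1, xs, ys):.3f}")
```

Note that no squared errors appear anywhere: the coefficients move in whichever direction makes the observed 0/1 outcomes more probable.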

Important Points:

Logistic regression is widely used for classification problems
Logistic regression doesn’t require a linear relationship between the dependent and independent variables. It can handle various types of relationships because it applies a non-linear log transformation to the predicted odds (it does, however, assume a linear relationship between the independent variables and the log odds)
To avoid overfitting and underfitting, we should include all significant variables. A good approach to ensure this is to use a stepwise method to estimate the logistic regression
It requires large sample sizes because maximum likelihood estimates are less powerful than ordinary least squares at low sample sizes
The independent variables should not be correlated with each other, i.e. there should be no multicollinearity. However, we have the option to include interaction effects of categorical variables in the analysis and in the model.
If the values of the dependent variable are ordinal, it is called Ordinal Logistic Regression
If the dependent variable is multi-class, it is known as Multinomial Logistic Regression.
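A quick first check for the "no multicollinearity" point is to compute the Pearson correlation between every pair of independent variables and flag pairs that are strongly correlated. The sketch below does this in pure Python; the 0.8 cut-off is a common rule of thumb rather than a fixed standard, and the predictor names and values are hypothetical. Pairwise correlations only catch collinearity between two variables at a time, so variance inflation factors are a more thorough check.

```python
import math
from itertools import combinations


def pearson(u: list[float], v: list[float]) -> float:
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    su = math.sqrt(sum((a - mu) ** 2 for a in u))
    sv = math.sqrt(sum((b - mv) ** 2 for b in v))
    if su == 0 or sv == 0:
        raise ValueError("a variable with zero variance has no defined correlation")
    return cov / (su * sv)


def flag_collinear(columns: dict[str, list[float]],
                   threshold: float = 0.8) -> list[tuple[str, str, float]]:
    """Return (name1, name2, r) for every predictor pair with |r| >= threshold."""
    flagged = []
    for (n1, c1), (n2, c2) in combinations(columns.items(), 2):
        r = pearson(c1, c2)
        if abs(r) >= threshold:
            flagged.append((n1, n2, r))
    return flagged


# Hypothetical predictors: x2 is almost exactly 2 * x1, x3 is only weakly related
predictors = {
    "x1": [1.0, 2.0, 3.0, 4.0, 5.0],
    "x2": [2.0, 4.1, 6.0, 8.2, 10.0],
    "x3": [3.0, 1.0, 4.0, 1.0, 5.0],
}
for name1, name2, r in flag_collinear(predictors):
    print(f"{name1} and {name2}: r = {r:.3f}")
```

Only the pair x1 and x2 is flagged here; one of the two would normally be dropped or the two combined before fitting the model.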

Thursday, June 18, 2020
