Ahead of print publication  

Statistical methods used in medical research and cancer registries: A review

1 Department of Radiation Oncology, Sheri Kashmir Institute of Medical Sciences, SKIMS Soura, Srinagar, Jammu and Kashmir, India
2 Department of Internal Medicine, Sheri Kashmir Institute of Medical Sciences, SKIMS Soura, Srinagar, Jammu and Kashmir, India
3 Department of Medical Oncology, Sheri Kashmir Institute of Medical Sciences, SKIMS Soura, Srinagar, Jammu and Kashmir, India

Date of Submission: 14-Jun-2022
Date of Acceptance: 29-Aug-2022
Date of Web Publication: 02-Nov-2022

Correspondence Address:
Mushtaq Ahmad Sofi,
Department of Radiation Oncology, Sheri Kashmir Institute of Medical Sciences, SKIMS Soura, Srinagar, Jammu and Kashmir

Source of Support: None, Conflict of Interest: None

DOI: 10.4103/jrcr.jrcr_36_22


Medicine is an ever-changing science, and new knowledge is continually generated by research and clinical experience. Statistical methods play a vital role in medical research because they allow researchers to draw meaningful conclusions from their data. Analyzing data and interpreting results is the most exciting stage of research, but doing it correctly requires a deep knowledge of statistical methods and of when each one is applicable. The statistical methods commonly used in medical research are descriptive and inferential. Descriptive methods describe the data by organizing them into tables and diagrams and by summarizing them with measures of central tendency, dispersion, condensation, and correlation. Inferential methods are used to decide whether a treatment or procedure studied in medical research produces a real benefit. Good knowledge of, and skill in, these methods allows clinical researchers to draw accurate and reasonable conclusions. Statistics provides sound methods for collecting data on health-related events and for summarizing and analyzing the results, so that valid inferences can be drawn about the research hypothesis. In experimental studies, researchers use methods such as the independent (Student's) t-test and the Chi-square test to check whether there is a significant difference between treatments. The main role of a cancer registry is to capture a clear and complete picture of the cancer burden. Researchers use confidence intervals to express how precisely a result has been estimated.
For example, a 95% confidence interval is produced by a procedure that, over many repeated studies, would include the true population value about 95% of the time. The aim of this review is to make researchers aware of the importance and applicability of the statistical methods used in medical research and cancer registries.

Keywords: Cancer registries, descriptive statistics, inferential statistics, statistical methods

How to cite this URL:
Dar NA, Tali TA, Gani BA, Sofi MA, Sofi SR, Khan NA, Najmi AM, Fir A, Ahmad SN. Statistical methods used in medical research and cancer registries: A review. J Radiat Cancer Res [Epub ahead of print] [cited 2022 Dec 4]. Available from:

  Introduction

Anyone involved in medical research should always keep in mind that science is a search for the truth and that there is no room for bias or inaccuracy in statistical analysis or interpretation. Data analysis must therefore be undertaken carefully and thoughtfully by people who understand the nature of the data and how they should be interpreted. Any error in the statistical analysis may make the conclusions of a study incorrect. For this reason, many journals require reviewers to scrutinize the statistical aspects of submitted articles, and many research groups include statisticians who direct the data analyses. Analyzing data correctly and documenting the analysis in detail are established markers of scientific integrity, and they help other researchers reach the same conclusions.

John Snow, the father of epidemiology, studied the cholera epidemic in London in 1854 and demonstrated the value of combining epidemiological and statistical methods in medical research. Medical statistics gained popularity after Bradford Hill's lectures were published as a series of articles in The Lancet and then in book form as Principles of Medical Statistics.[1]

Biostatistics is the branch of statistics applied to biological areas. Biological laboratory experiments, medical research (including clinical research), and health services research all use statistical methods. There are several reasons to study biostatistics rather than general statistics:

  1. Some statistical methods are used more heavily in biostatistics than in other fields. For example, a general statistics textbook would not discuss the life table method of analyzing survival data, although this method is important in many biostatistical applications
  2. Examples are drawn from the biological, medical, and health-care areas, which provides references specific to the field and helps you understand how to apply statistical methods to research in the biological and health sciences
  3. A biostatistics text can be taught to an audience of health professionals. In this setting, the interaction between students and teachers, and especially among the students themselves, is of great value in learning and applying the subject matter.

The process of converting data into meaningful information requires a special approach called statistics. Statistics is the branch of science that provides methods for making wise decisions in the face of uncertainty; it can be defined as the collection, summarization, organization, analysis, and interpretation of numerical data. Biostatistics is the science that helps in managing medical uncertainties. It mainly consists of steps such as generating a hypothesis, collecting data, and applying statistical analysis. Adequate knowledge of biostatistics is important for research scholars, medical students, and nursing students so that they can design epidemiological studies accurately and draw meaningful conclusions; inadequate knowledge of biostatistics leads to biased results.

Statistical methods help us find answers to complex research questions and guide the collection of data. Biostatistics has developed enormously in recent years because of continuing advances in diverse biomedical fields. New problems in biomedical research have led to statistical methodologies that would not otherwise have arisen, and they have also encouraged ingenious adaptations of classical statistical techniques to new contexts of application.[2],[3]

The main roles of statistics in research are to design studies, analyze data, and draw meaningful conclusions, which requires choosing the proper statistical tests. Statistics also helps to reduce large volumes of raw data to a form that can be read easily and used for further analysis.[4]

The most important statistical methods used in basic research are summarized below:

  Descriptive Statistical Methods

Mean

The mean is the first and simplest measure of location, and it is the most frequently used one. It is defined as the sum of the observations divided by the number of observations. Its most important drawback is that it is affected by extreme values.[5]
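As a small illustration of this drawback, the sketch below uses Python's standard `statistics` module on hypothetical blood pressure readings (the numbers are invented for illustration). Adding a single extreme value pulls the mean up by 18 mmHg, while the median moves by only 1 mmHg.

```python
from statistics import mean, median

# Hypothetical systolic blood pressure readings (mmHg), illustrative only.
readings = [120, 122, 124, 126, 128]
with_outlier = readings + [232]  # the same readings plus one extreme value

print(mean(readings), median(readings))          # 124 124
print(mean(with_outlier), median(with_outlier))  # 142 125.0
```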

Median

The median is the middle value of the observations once they have been arranged in order. It divides the data into two equal parts: half of the values lie below the median and half lie above it. The median is not affected by extreme values. Because it depends only on the order of the observations, the median can also be used for ordinal (ranked) qualitative data, for which a mean is not meaningful. There are two cases, odd and even. When the number of observations is odd, arrange them in ascending (or descending) order; the middle observation is the median. When the number is even, arrange them in the same way; the median is the average of the two middle values.[6]
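The odd and even cases can be written out directly. `median_of` is a hypothetical helper name used only for this sketch; Python's standard `statistics.median` applies the same rule.

```python
def median_of(values):
    """Median using the odd/even rule: middle value, or mean of the two middle values."""
    ordered = sorted(values)          # arrange in ascending order
    n = len(ordered)
    if n == 0:
        raise ValueError("median of empty data")
    mid = n // 2
    if n % 2 == 1:                    # odd case: the single middle observation
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2  # even case: average of the two middle values

print(median_of([7, 3, 9, 1, 5]))      # 5
print(median_of([7, 3, 9, 1, 5, 11]))  # 6.0
```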

Mode

The mode is the most frequently occurring value in a set of data. It is particularly useful in the study of popular sizes, and it is the average to use when finding the ideal (most common) size in a series, for example, the shoe size sold most often.[7]
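Python's standard library provides `statistics.mode` and, from Python 3.8, `statistics.multimode`, which returns every value that ties for the highest frequency. The shoe sizes below are hypothetical.

```python
from statistics import mode, multimode

# Hypothetical shoe sizes sold in one day (illustrative data only).
sizes = [7, 8, 8, 9, 9, 9, 10, 10, 11]
print(mode(sizes))       # 9
print(multimode(sizes))  # [9]

# When two values tie for most frequent, multimode() reports both.
print(multimode([6, 6, 7, 8, 8]))  # [6, 8]
```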

Range

The range is the simplest measure of dispersion. It is defined as the difference between the two extreme values of a series. Its utility is that it gives a quick idea of variability.

Range = highest value of the series − lowest value of the series.
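The range formula amounts to a single subtraction; a minimal sketch with hypothetical patient ages:

```python
# Hypothetical ages (years) of patients seen in one clinic session, illustrative only.
ages = [34, 51, 29, 62, 45]
age_range = max(ages) - min(ages)   # highest value minus lowest value
print(age_range)  # 33
```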

Standard deviation

The standard deviation (SD) is the measure of dispersion most often used in research studies and is regarded as a very satisfactory one. It is defined as the positive square root of the mean of the squared deviations of the values from their mean. The SD describes how much individual measurements differ, on average, from the mean.
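The definition above (squared deviations averaged over all n values) gives the population SD. When the SD is estimated from a sample, the sum of squared deviations is usually divided by n − 1 instead; Python's `statistics.pstdev` and `statistics.stdev` implement the two versions. The values below are illustrative only.

```python
import math
from statistics import pstdev, stdev

values = [2, 4, 4, 4, 5, 5, 7, 9]               # illustrative data
m = sum(values) / len(values)                   # mean = 5.0
squared_devs = [(x - m) ** 2 for x in values]   # squared deviations from the mean
sd_by_definition = math.sqrt(sum(squared_devs) / len(values))  # divide by n

print(sd_by_definition, pstdev(values))  # 2.0 2.0
print(round(stdev(values), 3))           # 2.138 (sample SD, divisor n - 1)
```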

  Statistical Inferential Methods

Statistical inferential methods are used to draw meaningful inferences about the characteristics of a population from a sample, for example by using the independent t-test or the Chi-square test to determine whether different treatments have a significant effect in research subjects. Before applying these methods, the normality of the distribution must be checked. If the data follow a normal distribution, a parametric test (Student's t-test or analysis of variance [ANOVA]) is used; if not, a nonparametric test (Chi-square test, sign test, Wilcoxon signed-rank test, or Mann–Whitney U-test) is used. A rough rule of thumb based on the mean and SD applies to variables that cannot take negative values: if the SD is more than 50% of the mean (for example, a mean of 50 with an SD above 25), the distribution is probably skewed rather than normal; if the SD is less than 50% of the mean, normality is more plausible. The important parametric and nonparametric tests most often used in medical research are discussed below.
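The rule of thumb can be expressed as a one-line check. This is a minimal sketch: `likely_skewed` is a hypothetical helper, it applies only to variables that cannot be negative, and it is a rough screen rather than a formal test of normality.

```python
def likely_skewed(mean, sd):
    """Rough screen for non-negative variables: SD above half the mean suggests skew."""
    if mean <= 0:
        raise ValueError("the rule only applies to positive-valued variables")
    return sd > mean / 2

print(likely_skewed(50, 30))  # True: consider a nonparametric test
print(likely_skewed(50, 10))  # False: normality is more plausible
```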

  Student's t-test and Analysis of Variance

The t-test and ANOVA are both parametric tests and are used extensively in medical research to check the effectiveness of a treatment. Student's t-test compares the means of two groups and has two forms: (a) the paired t-test and (b) the unpaired t-test. ANOVA extends the t-test to more than two groups. Both methods are parametric because they assume that the data are normally distributed and (for independent groups) that variances are equal across the comparison groups. Both compare group means; when data are positively skewed, they are often log-transformed before analysis so that these assumptions are better met. A paired t-test (also known as a dependent or correlated t-test) compares two related sets of measurements by testing whether the mean of the within-pair differences differs significantly from zero. An unpaired t-test (also known as an independent t-test) compares the means of two independent or unrelated groups to determine whether there is a significant difference between them.[8],[9],[10],[11]

Paired t-tests are used when the same item or group is tested twice; this is also known as a repeated-measures t-test. Examples include (a) the effect of a pharmaceutical treatment measured before and after treatment in the same group of people, (b) body temperature measured with two different thermometers in the same group of participants, and (c) standardized test results of a group of students before and after a study course.
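A minimal sketch of the paired t statistic, computed as the mean of the within-pair differences divided by its standard error, with n − 1 degrees of freedom. The helper name and the before/after blood pressure values are hypothetical. The sketch stops at the t statistic; a p-value would come from the t distribution, for example via `scipy.stats.ttest_rel` if SciPy is available.

```python
import math
from statistics import mean, stdev

def paired_t(before, after):
    """Return (t, df) for a paired t-test on two related sets of measurements."""
    if len(before) != len(after) or len(before) < 2:
        raise ValueError("need at least two complete pairs")
    diffs = [a - b for b, a in zip(before, after)]   # within-pair differences
    n = len(diffs)
    se = stdev(diffs) / math.sqrt(n)                 # stdev uses the n - 1 divisor
    return mean(diffs) / se, n - 1

# Hypothetical systolic pressures (mmHg) in five patients before and after a drug.
before = [140, 150, 138, 162, 155]
after = [132, 144, 135, 150, 149]
t, df = paired_t(before, after)
print(round(t, 2), df)  # -4.72 4
```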

An unpaired t-test is used to compare the means of two independent groups. An example of its appropriate use is a study with two independent groups, such as women and men, that examines whether average bone density differs significantly between them[12],[13],[14] [Table 1].
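For the unpaired case, a minimal sketch of Student's t statistic with a pooled variance, which assumes equal variances in the two groups. The helper name and the small integer data (chosen so the arithmetic is easy to follow, not real bone density measurements) are hypothetical. SciPy's `scipy.stats.ttest_ind` also returns a p-value, and its `equal_var=False` option gives Welch's test when variances differ.

```python
import math
from statistics import mean, variance

def unpaired_t(x, y):
    """Return (t, df) for Student's two-sample t-test with pooled variance."""
    nx, ny = len(x), len(y)
    if nx < 2 or ny < 2:
        raise ValueError("each group needs at least two observations")
    # Pooled variance: weighted average of the two sample variances (n - 1 divisors)
    pooled = ((nx - 1) * variance(x) + (ny - 1) * variance(y)) / (nx + ny - 2)
    se = math.sqrt(pooled * (1 / nx + 1 / ny))
    return (mean(x) - mean(y)) / se, nx + ny - 2

group_a = [1, 2, 3, 4, 5]   # hypothetical scores, arbitrary units
group_b = [3, 4, 5, 6, 7]
t, df = unpaired_t(group_a, group_b)
print(t, df)  # -2.0 8
```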
Table 1: Paired versus unpaired t-test

ANOVA is also a parametric test; it compares the means of different groups by analyzing the variance between and within the groups. When data are normally distributed, Student's t-test can be used to assess the significance of the difference between two sample means. To compare three or more independent groups simultaneously, the parametric test called ANOVA is used instead. When there is only one qualitative variable that defines the groups, a one-way ANOVA is performed.

For example, to study the effectiveness of different diabetes medications, scientists may design an experiment to explore the relationship between the type of medicine and the resulting blood sugar level. The participants are divided into several groups, and each group receives a particular medicine for a trial period. At the end of the trial period, the blood sugar level of each participant is measured, and the mean blood sugar level of each group is calculated. ANOVA then tests whether these group means differ significantly.
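The comparison in the diabetes example can be sketched as a one-way ANOVA F statistic: the between-group mean square divided by the within-group mean square. The blood sugar values are hypothetical, and `one_way_anova_f` is an illustrative helper; `scipy.stats.f_oneway` computes the same statistic together with a p-value.

```python
from statistics import mean

def one_way_anova_f(*groups):
    """Return (F, df_between, df_within) for a one-way ANOVA."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    if k < 2 or n_total <= k:
        raise ValueError("need at least two groups and more observations than groups")
    grand_mean = mean([x for g in groups for x in g])
    # Spread of the group means around the grand mean (between groups)
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Spread of observations around their own group mean (within groups)
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    df_between, df_within = k - 1, n_total - k
    f = (ss_between / df_between) / (ss_within / df_within)
    return f, df_between, df_within

# Hypothetical fasting blood sugar (mg/dL) at the end of the trial, one list per medicine.
drug_a = [110, 115, 120]
drug_b = [130, 135, 140]
drug_c = [150, 155, 160]
f, df1, df2 = one_way_anova_f(drug_a, drug_b, drug_c)
print(f, df1, df2)  # 48.0 2 6
```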

Nonparametric tests do not assume that the data follow a normal distribution. Using a nonparametric test does not mean that nothing is known about the population; it usually means that the population data do not follow a normal distribution. The rule of thumb is (a) for data on a nominal or ordinal scale, use a nonparametric test, and (b) for data on an interval or ratio scale that are approximately normally distributed, use a parametric test.

Chi-square test

The Chi-square test is a nonparametric test used to find an association between variables. It is used for categorical variables and can also be applied to continuous variables after grouping them into intervals.
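A minimal sketch of the Pearson Chi-square statistic for a contingency table, where each expected count is (row total × column total) / grand total. The 2 × 2 counts and the helper name are hypothetical. Note that `scipy.stats.chi2_contingency` applies Yates' continuity correction to 2 × 2 tables by default, so its statistic would differ from this uncorrected version.

```python
def chi_square(table):
    """Return (statistic, df) for Pearson's Chi-square test, no continuity correction."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / total  # count expected under no association
            stat += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df

# Hypothetical counts: rows = exposed / unexposed, columns = disease / no disease.
table = [[30, 20],
         [20, 30]]
stat, df = chi_square(table)
print(stat, df)  # 4.0 1
```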

Sign test

The sign test is the simplest nonparametric test; it estimates the population median and compares it with a reference (target) value. Each observation is assigned a positive (+) or negative (−) sign: a plus sign when the observed value is greater than the reference value and a minus sign when it is less. Observations equal to the reference value are eliminated. Under the null hypothesis that the median equals the reference value, plus and minus signs are equally likely, so the number of plus signs follows a binomial distribution with probability 0.5.
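Because the number of plus signs follows a binomial distribution with probability 0.5 under the null hypothesis, an exact two-sided p-value can be computed with the standard library alone (`math.comb` requires Python 3.8 or later). This is a minimal sketch; `sign_test` and the data are hypothetical.

```python
from math import comb

def sign_test(observations, reference):
    """Exact two-sided sign test of H0: population median == reference."""
    plus = sum(1 for x in observations if x > reference)
    minus = sum(1 for x in observations if x < reference)
    n = plus + minus                  # observations equal to the reference are discarded
    k = min(plus, minus)
    # Under H0, plus ~ Binomial(n, 0.5); double the smaller tail for a two-sided p-value.
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return plus, minus, min(1.0, 2 * tail)

observations = [12, 15, 9, 18, 20, 14, 16, 10, 17, 19, 11]  # hypothetical data
plus, minus, p = sign_test(observations, 11)
print(plus, minus, p)  # 8 2 0.109375
```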

Wilcoxon signed-rank test

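The Wilcoxon signed-rank test also estimates the population median and compares it with a reference (target) value. Unlike the sign test, it uses the ranks of the absolute differences from the reference value, and it assumes that the data come from a symmetric distribution (such as the Cauchy or uniform distribution).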
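A minimal sketch of the signed-rank statistics W+ and W−: zero differences are dropped, the remaining absolute differences are ranked (tied values share the average rank), and the ranks are summed separately for positive and negative differences. The data and helper name are hypothetical; the sketch does not compute a p-value (`scipy.stats.wilcoxon` does).

```python
def wilcoxon_signed_rank(observations, reference):
    """Return (W_plus, W_minus, n) for H0: population median == reference."""
    diffs = [x - reference for x in observations if x != reference]  # drop zero differences
    diffs.sort(key=abs)                    # order by absolute difference
    ranks = [0.0] * len(diffs)
    i = 0
    while i < len(diffs):
        j = i
        # Extend j over a run of tied absolute differences.
        while j + 1 < len(diffs) and abs(diffs[j + 1]) == abs(diffs[i]):
            j += 1
        average_rank = (i + j) / 2 + 1     # ranks are 1-based
        for k in range(i, j + 1):
            ranks[k] = average_rank
        i = j + 1
    w_plus = sum(r for d, r in zip(diffs, ranks) if d > 0)
    w_minus = sum(r for d, r in zip(diffs, ranks) if d < 0)
    return w_plus, w_minus, len(diffs)

observations = [12, 15, 9, 18, 20, 14, 16, 10, 17, 19, 11]  # hypothetical data
w_plus, w_minus, n = wilcoxon_signed_rank(observations, 11)
print(w_plus, w_minus, n)  # 50.5 4.5 10
```

As a check, W+ and W− always add up to n(n + 1)/2, here 55.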

Mann–Whitney test

The Mann–Whitney test compares two independent groups when the dependent variable is either ordinal or continuous.
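A minimal sketch of the Mann–Whitney U statistic by direct pair counting: U for the first group is the number of pairs in which its value exceeds the other group's value, with ties counted as one half. The pain scores are hypothetical, and `mann_whitney_u` is an illustrative helper; `scipy.stats.mannwhitneyu` also reports a p-value.

```python
def mann_whitney_u(x, y):
    """Return (U_x, U_y) by counting pairs; ties contribute 0.5."""
    u_x = 0.0
    for xi in x:
        for yj in y:
            if xi > yj:
                u_x += 1.0
            elif xi == yj:
                u_x += 0.5
    u_y = len(x) * len(y) - u_x   # U_x + U_y always equals n1 * n2
    return u_x, u_y

# Hypothetical pain scores (0-10) in two independent treatment groups.
group_a = [3, 5, 7]
group_b = [1, 2, 5, 6]
u_a, u_b = mann_whitney_u(group_a, group_b)
print(u_a, u_b)  # 8.5 3.5
```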

  Cancer Registries

Projection of cancer incidence is essential for planning cancer control actions and health care and for allocating resources. Cancer incidence and its projections are derived from cancer registry data. A cancer registry is an organization for the systematic collection, storage, analysis, interpretation, and reporting of data on subjects with cancer. Two types of cancer registries are available: the population-based cancer registry (PBCR) and the hospital-based cancer registry (HBCR).

A PBCR systematically collects data on all new cases of cancer occurring in a well-defined population from multiple sources, such as government hospitals, private hospitals, nursing homes, clinics, diagnostic laboratories, imaging centers, hospices, and registrars of births and deaths. PBCRs cover about 10% of the population of India. The National Cancer Registry Program (NCRP) started with a network of three PBCRs, in Bangalore, Chennai, and Mumbai, and three HBCRs, in Chandigarh, Dibrugarh, and Thiruvananthapuram. The number of registries under the program has expanded greatly since its inception; at present, there are 36 PBCRs and 236 HBCRs registered under NCRP.

Because cancer is not a notifiable disease, cancer registration in India is active: the staff of all registries visit hospitals, pathology laboratories, and all other sources of registration of cancer cases on a routine basis. Death certificates are also scrutinized at local government units such as municipal corporations and Panchayati Raj Institutions, and information is collected on all cases in which cancer is mentioned as a cause of death on the death certificate.[15],[16],[17],[18],[19],[20],[21]

  Definitions, Statistical Terms, and Methods

Cancer registration

It is the continuing, systematic collection of data on the occurrence and characteristics of reportable neoplasms, with the purpose of helping to assess and control the impact of malignancies on the community.

Cancer case

All neoplasms with a behavior code of “3” as defined by the International Classification of Diseases for Oncology, Third edition are considered reportable.

Cancer registry

It is the office or institution that attempts to collect, store, analyze, and interpret data on persons with cancer.

Population-based cancer registries

These registries systematically collect information on reportable neoplasms, from multiple sources, in a geographically defined population that has been residing in the area for at least 1 year.

Hospital-based cancer registries

These registries are concerned with recording the information on the treatment, management, and outcome of cancer patients registered in a particular hospital.

Sources of registration

Hospitals and cancer centers are the main sources of registration for cancer registries.

Data processing

Data processing means checking the data for errors that may have been committed during data submission. These errors must be rectified during data processing before further data analysis is done.

The crude incidence rate is obtained by dividing the total number of new cancer cases in a particular year by the corresponding estimated midyear population and multiplying by 100,000.
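The crude rate calculation, sketched with hypothetical counts (the function name is illustrative):

```python
def crude_rate(new_cases, midyear_population, per=100_000):
    """New cases in one year per `per` persons of the estimated midyear population."""
    return new_cases * per / midyear_population

# Hypothetical registry area: 1,250 new cases, midyear population 2.5 million.
print(crude_rate(1_250, 2_500_000))  # 50.0
```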

The age-specific rate (ASpR) is obtained by dividing the number of cancer cases in a given age group (for a specified sex, site, geographic area, and time period) by the corresponding estimated population in that age group and multiplying by 100,000.
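Age-specific rates apply the same calculation within each age group. The age bands, case counts, populations, and function name below are hypothetical.

```python
def age_specific_rates(cases_by_age, population_by_age, per=100_000):
    """Rate per `per` persons for each age group."""
    return {age: cases_by_age[age] * per / population_by_age[age] for age in cases_by_age}

cases = {"0-39": 20, "40-59": 90, "60+": 240}
population = {"0-39": 400_000, "40-59": 150_000, "60+": 80_000}
print(age_specific_rates(cases, population))
# {'0-39': 5.0, '40-59': 60.0, '60+': 300.0}
```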

Age-adjusted rate (AAR) or age-standardized rate: cancer incidence increases with age, so the higher the proportion of older people in a population, the higher the number of cancers. Most developed and Western countries have a higher proportion of older people. Hence, to make cancer rates comparable between countries, a world standard population that takes this into account is used to arrive at the AAR or age-standardized rate. It is calculated by the direct method (Boyle and Parkin, 1991): each ASpR is multiplied by the standard population in the corresponding age group, and the sum of these products is divided by the total standard population, giving a weighted average of the ASpRs.[22],[23],[24],[25],[26]
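A minimal sketch of the direct method: the AAR is a weighted average of the ASpRs, with the standard population in each age group as the weight. The three age bands and standard weights below are hypothetical and are not the actual world standard population, which uses finer age groups.

```python
def age_standardized_rate(asprs, standard_population):
    """Direct method: weighted average of ASpRs, weighted by the standard population."""
    total_standard = sum(standard_population[age] for age in asprs)
    weighted_sum = sum(asprs[age] * standard_population[age] for age in asprs)
    return weighted_sum / total_standard   # same units as the ASpRs (per 100,000)

# Hypothetical ASpRs (per 100,000) and hypothetical standard population weights.
asprs = {"0-39": 5.0, "40-59": 60.0, "60+": 300.0}
standard = {"0-39": 60_000, "40-59": 25_000, "60+": 15_000}
print(age_standardized_rate(asprs, standard))  # 63.0
```

To compare two populations, the same `standard` weights would be applied to each population's own ASpRs, so that differences in age structure no longer affect the comparison.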

  Conclusion

The purpose of this article is to make scientists working in medical centers aware of the importance of the statistical methods used in medical research and cancer registries. Good knowledge of statistical methods enables adequate analysis of the data.

Financial support and sponsorship

Nil.

Conflicts of interest

There are no conflicts of interest.

  References

1. Hill AB. Principles. In: Hill AB, editor. Principles of Medical Statistics. 1st ed. London: Lancet, Oxford University Press; 1937. p. 189.
2. DeMets DL, Stormo G, Boehnke M, Louis TA, Taylor J, Dixon D. Training of the next generation of biostatisticians: A call to action in the U.S. Stat Med 2006;25:3415-29.
3. Zelen M. Biostatisticians, biostatistical science and the future. Stat Med 2006;25:3409-14.
4. Sprent P. Statistics in medical research. Swiss Med Wkly 2003;133:522-9.
5. Petrie A, Sabin C, editors. Describing data. In: Medical Statistics at a Glance. UK: Blackwell Science Ltd.; 2000. p. 16-9.
6. Kuzma JW, Bohnenblust SE, editors. Summarizing data. In: Basic Statistics for the Health Sciences. London: Mayfield Publishing Company; 2001. p. 44-54.
7. Manikandan S. Measures of central tendency: Median and mode. J Pharmacol Pharmacother 2011;2:214-5.
8. Bewick V, Cheek L, Ball J. Statistics review 10: Further non-parametric methods. Crit Care 2004;8:196-9.
9. Altman DG, Bland JM. Parametric v non-parametric methods for data analysis. BMJ 2009;338:a3167.
10. Kaur SP. Variables in research. Indian J Res Rep Med Sci 2013;4:36-8.
11. Ali Z, Bhaskar SB. Basic statistical tools in research and data analysis. Indian J Anaesth 2016;60:662-9.
12. Magnello ME. Karl Pearson and the origins of modern statistics: An elastician becomes a statistician. Rutherford J 2005-2006;1. Available from: [Last accessed on 2022 Jun 12].
13. Rana R, Singhal R. Chi-square test and its application in hypothesis testing. J Pract Cardiovasc Sci 2015;1:69-71.
14. Acheson ED. Medical Record Linkage. London: Oxford University Press; 1967.
15. American Cancer Society. Manual of Tumor Nomenclature and Coding. Washington, DC: American Cancer Society; 1951.
16. Baker RJ, Nelder JA. The GLIM System Release 3: Generalized Linear Interactive Modelling. Oxford: Numerical Algorithms Group; 1978.
17. Barclay TH. Canada, Saskatchewan. In: Waterhouse J, Muir CS, Correa P, Powell J, editors. Cancer Incidence in Five Continents. Vol. III (IARC Scientific Publications No. 15). Lyon: International Agency for Research on Cancer; 1976. p. 160-3.
18. Powell J, editors. Cancer Incidence in Five Continents. Vol. III (IARC Scientific Publications No. 15). Lyon: International Agency for Research on Cancer; 1976. p. 160-3.
19. Danish Cancer Registry. Cancer Incidence in Denmark 1981 and 1982. Copenhagen: Danish Cancer Society; 1985.
20. Danish Cancer Society, Danish Cancer Registry. Cancer Incidence in Denmark 1984. Copenhagen: Danish Cancer Society; 1987.
21. National Cancer Registry Programme (ICMR). Time Trends in Cancer Incidence Rates 1982-2010. Bangalore: National Cancer Registry Programme (ICMR); 2013.
22. National Cancer Registry Programme (ICMR). Three-Year Report of Hospital Based Cancer Registries 2007-2011. Bangalore: National Cancer Registry Programme (ICMR); 2013.
23. National Cancer Registry Programme (ICMR). Consolidated Report of Hospital Based Cancer Registries 2012-2014. Bangalore: National Cancer Registry Programme (ICMR); 2016.
24. National Cancer Registry Programme (ICMR). Three-Year Report of Population Based Cancer Registries 2012-2014. Bangalore: National Cancer Registry Programme (ICMR); 2016.
25. National Cancer Registry Programme (ICMR). A Report on Cancer Burden in North Eastern States of India 2012-2014. Bangalore: National Cancer Registry Programme (ICMR); 2017.
26. National Cancer Registry Programme (ICMR). A Report on Cancer Burden in North Eastern States of India 2012-2016. Bangalore: National Cancer Registry Programme (ICMR); 2020.