Friday, 21 November 2008

Comparing multiple proportions: The Marascuilo procedure





from NIST Handbook. includes syntax for calculation








BIOS nice overview of proportions / basics - equivalence of Pearson chi-square test and z-test





Macro from Minitab


This macro performs a multiple comparisons test on proportion data using Tukey's honest significant difference test.



statistics for lawyers

google books

http://books.google.com/books?id=_bR9Q-EERlIC&pg=PA200&lpg=PA200&dq=comparing+multiple+proportions&source=bl&ots=PTb87Wa-gr&sig=PuEEGFcVwPlqSnhLUF6zXoXJ5p0&hl=en&sa=X&oi=book_result&resnum=5&ct=result#PPA201,M1

Monday, 10 November 2008

lognormal distribution - on different methods of calculation of confidence intervals




Tuesday, 7 October 2008

SEQUENCING low price, individual genome

Low-Price Mapping SEQUENCING

companies developing/offering genome sequencing

- COMPLETE GENOMICS - 5000 $ /Reid, Radoje Drmanac
- applied biosystems
- Knome - ca. 20 customers
- cf Systems Biology Institute as customer for complete genomics

Saturday, 20 September 2008

datamining

datamining in genetic/genomic research

finland Helsinki
DM in LD mapping ... haplotype association etc. PDF

PUBLICATIONS BY HANNU TOIVONEN, UNI helsinki

Thursday, 18 September 2008

datamining intro

datamining course materials from the Australian National University (ANU),
- same as links below.
COURSE SLIDES -FROM slideshare.com

by prof Lanzi, some slides look identical to slides provided by ANU

- but generally looks good, COMPREHENSIBLE .....



DATAMINING, UNSUPERVISED RECORD LINKAGE.

Markus Hegland , australia

CHRISTEN
DATAMINING. CHALLENGES, MODELS, METHODS AND ALGORITHMS
year 2003
intro -- mainly from the standpoint of computation science.. algorithms
--
program - FEBRL year 2008
Febrl - A Freely Available Record Linkage System with a Graphical User Interface Peter Christen Proceedings of the Australasian Workshop on Health Data and Knowledge Management (HDKM), Wollongong, January 2008.

- DISTANCE - EUCLIDEAN, PYTHAGOREAN ETC WOLFRAM MATH



ASSOCIATION RULES

- Support, confidence

Support is the fraction of transactions in the dataset that contain a given set of items, while confidence measures the strength of a rule "A implies B". In other words, support is the probability of A and B occurring together, while confidence is the conditional probability of B given A. Association rules are evaluated on these two characteristics.
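A minimal sketch of the two measures in Python, assuming each transaction is stored as a set of items. The function names and the basket data are hypothetical and invented purely for illustration.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= set(t))
    return hits / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Estimate P(consequent | antecedent) as support(A and B) / support(A)."""
    both = set(antecedent) | set(consequent)
    return support(transactions, both) / support(transactions, antecedent)


# Hypothetical market-basket data, invented for illustration only.
baskets = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]

# Rule "bread -> milk": 2 of 4 baskets hold both items,
# and 2 of the 3 baskets with bread also hold milk.
rule_support = support(baskets, {"bread", "milk"})
rule_confidence = confidence(baskets, {"bread"}, {"milk"})
```

A rule is usually kept only if both values exceed thresholds chosen by the analyst; the thresholds themselves are not part of this sketch.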


Monday, 15 September 2008

c-statistic

I would like to know whether we can calculate the c-statistic using SPSS 13.

> If by the "c-statistic" you are referring to the measure of the discriminative power of the logistic equation, you can calculate it by saving the predicted probabilities from the logistic regression analysis and running a ROC curve with the predicted probabilities as the "test variable". The c-statistic is the area under the curve value.


HTH

In R/S-Plus you can just use the lrm function in the Design package or:

mean.rank <- mean(rank(x)[y == 1])
c.index <- (mean.rank - (n1 + 1)/2)/(n - n1)
(where n1 is the number of observations with y=1 and n is the total number of observations)
or use the somers2 function in the Hmisc package.

Frank Harrell
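The rank formula quoted above can be sketched in plain Python as well. This is only an illustration under stated assumptions: x holds the predicted probabilities, y holds the 0/1 outcomes, and the helper names average_ranks and c_index are made up for this note. Ties get averaged ranks, which matches the default of R's rank().

```python
def average_ranks(values):
    """Ranks starting at 1; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the current run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of positions i..j, made 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def c_index(x, y):
    """c-statistic (area under the ROC curve) from the mean rank of the y == 1 cases."""
    n = len(x)
    n1 = sum(1 for v in y if v == 1)
    ranks = average_ranks(x)
    mean_rank = sum(r for r, v in zip(ranks, y) if v == 1) / n1
    return (mean_rank - (n1 + 1) / 2) / (n - n1)


# Hypothetical predicted probabilities and outcomes, for illustration only.
probs = [0.1, 0.4, 0.35, 0.8]
outcome = [0, 0, 1, 1]
c = c_index(probs, outcome)
```

The result is the share of (event, non-event) pairs in which the event case received the higher predicted probability, with ties counted as one half.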

Tuesday, 9 September 2008

statistics general

Q&A by Seaman /course in psychology/


Newsom Portland UNI - handouts to course , SPSS examples


STATNOTES - WEBLECTURES. brief, explanatory, with formulas and examples of both manual calculation of basic tests and examples in SPSS.


squared semipartial correlation coefficient = R-squared change; semipartial correlation = part correlation






Dr Newsom has nice disclaimer on his page : "DISCLAIMER: I am not always right."






Sunday, 7 September 2008


by Venter


cf Deinococcus radiodurans -

Saturday, 6 September 2008

genetics statistics microarray spss

Genovese


also see the 1995 article by Benjamini and Hochberg, the authors of the FDR (false discovery rate) method for controlling the multiple comparisons problem




SPSS tutorials at texas ..

latex /superscripts/


SPSS macro basics at Raynald







Rodenburg

A framework to identify physiological responses in microarray-based gene expression studies: selection and interpretation of biologically relevant genes



A Bayesian Measure of the Probability of False Discovery in Genetic Epidemiology Studies
Jon Wakefield*



Comprehensive Analysis of Affymetrix Exon Arrays Using BioConductor
Michał J. Okoniewski, Crispin J. Miller *




Wednesday, 3 September 2008

SPSS statistics resources

flash/audiovisual tutorials with screenshots and narrated.

wide range of topics, including advanced ones like nonlinear, segmented, robust regression,

Box-Cox transformation, contour, surface plotting and much more....

*****************Uni TEXAS


Tuesday, 2 September 2008

enrichment analysis - combining different sources of information




cf SW package GSEAP

Sunday, 31 August 2008

problems SPSS/automation

using SPSS OMS in split file mode. how to add labels/subtypes to create file with tables from all subgroups of splitting variable.


macro - cf learn - check german site


Saturday, 30 August 2008

repeated measures ANOVA, SPSS

RM ANOVA in SPSS from UCLA site

CRP cutoffs

review CRP in clinical practice on Medscape by Musunuru 08/2008

Laaksonen 2005 - C-reactive protein in the prediction of cardiovascular and overall mortality in middle-aged men: a population-based cohort study

Wednesday, 27 August 2008

power and sample size programs

Biostatistics Uni UCSF
list of statistical programs for different types of analyses - extensive

on estimation of power/ss for logistic regression in stata from UCLA

site - but rather simple calculations //by eugene demidenko //cf author of recent book on Mixed models

site2 - logistic regression with covariate with measurement ERROR, generalized linear models etc - ALSO GIVES PLOTS OF POWER, OR, SAMPLE SIZE

logistic regression

dr Friel

Prof. Moosbrugger Uni Frankfurt /German
http://user.uni-frankfurt.de/~moosbrug/lehre/kap0506/Logistische_Regression.ppt

maths algebra BtB

logarithms

some basic algebra

Tuesday, 26 August 2008

reader genetics

Diabgen
Finland, Ashkenazi Jews, Pomerania Germans, English

rare SNP - value Gorlov 2007

power analysis for GWAS KLEIN 2008

exocrine pancreas reprogramming into betacells zhou 2008

ONLINE LEARNING

videolectures - biology, genetics, bioinformatics
eg.
http://videolectures.net/msht07_omont_gbba/

COMMENT: browsed videolectures.net yesterday and tried to watch about 3 lectures on bioinformatics, but the level of presentation was rather mediocre. The topics were interesting, but they were presented by some PhD students in an uninteresting, hard-to-follow way.
But there is much more stuff there; maybe I just didn't get a representative sample, so I hope to find something interesting.

OTHER SOURCE OF LECTURES/courses FROM MIT
open-source lectures from the Massachusetts Institute of Technology.

Monday, 18 August 2008

Quantile regression

QUANTILE REGRESSION
BY ROGER KOENKER - GOOGLE BOOKS - ADVANCED MATHS
http://books.google.com/books?id=Xi_dTAeAmGcC&pg=PA248&lpg=PA248&dq=quantile+regression+visualisation&source=web&ots=7JodXzWE8k&sig=0VWO5PTpFiAj7LaoARk3KkY8gok&hl=en&sa=X&oi=book_result&resnum=3&ct=result#PPA22,M1
REGRESSION ANOVA MODEL SELECTION

R_SQUARED preferably adjusted
F test
Mallows' Cp criterion

www-stat.stanford.edu/~jtaylo/courses/stats203/notes/selection.pdf by Jon Taylor

Larsen Stat course http://statmaster.sdu.dk/courses/st111/module08/index.html

Multiple regression -ppt. inc forward/backward/stepwise/best-subset regression, some maths

REGRESSION AND ANOVA TUTORIAL

NICE LIST OF LECTURES ON
REGRESSION AND ANOVA BY Dr Pia Veldt Larsen, Uni of Southern Denmark, Odense
- including regression model selection, residual analysis, factor analysis, logistic regression, and topics like transformation of variables etc.

Wednesday, 6 August 2008

statistics lectures, python, spss, R

a FANTASTIC COLLECTION of lectures on statistics - from basic concepts to advanced topics such as nonlinear regression. with examples.
- refers mainly to SPSS software, but this is marginal. MAINLY the mechanics behind the different tests is explained. VERY INSIGHTFUL.
dr FRIEL

******
seems like INTRO / TUTORIAL on the following topics.

ON PYTHON AND SPSS

on R - the open source statistical program "R"

Tuesday, 29 April 2008

weighted least square regression

A technique for correcting the problem of heteroskedasticity by log-likelihood estimation of a weight that adjusts the errors of prediction

used if some assumptions of ordinary least squares (OLS) regression do not hold:
.... in case of heteroscedasticity
.....unequal precision/reliability of datapoints
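
A minimal sketch of what the weighting does, for a single predictor and in plain Python. It assumes the weights are already known (for example, the inverse of each point's error variance); it does not cover the log-likelihood estimation of weights mentioned above, and the function name and data are hypothetical.

```python
def weighted_least_squares(x, y, w):
    """Fit y = intercept + slope * x by minimising sum(w * residual**2)."""
    sw = sum(w)
    x_bar = sum(wi * xi for wi, xi in zip(w, x)) / sw  # weighted mean of x
    y_bar = sum(wi * yi for wi, yi in zip(w, y)) / sw  # weighted mean of y
    sxy = sum(wi * (xi - x_bar) * (yi - y_bar) for wi, xi, yi in zip(w, x, y))
    sxx = sum(wi * (xi - x_bar) ** 2 for wi, xi in zip(w, x))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


# Hypothetical data: the third point is treated as completely unreliable
# (weight 0), so the fitted line is determined by the first two points only.
xs = [0.0, 1.0, 2.0]
ys = [0.0, 1.0, 4.0]
weights = [1.0, 1.0, 0.0]
intercept, slope = weighted_least_squares(xs, ys, weights)
```

With all weights equal, the same function reduces to ordinary least squares, which makes it easy to compare the two fits on the same data.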

1. Sam Houston State University - Weighted least squares regression XXX
http://www.shsu.edu/~icc_cmf/cj_789/weightedLeastSquares2.doc
- very good and detailed review of the technique, from the description of situations when to use WLS /ie if violation of homoscedasticity/ to detailed and very clear examples how to perform it /mainly applies to SPSS, the output is provided/.
- discusses both approaches on calculating weights 1/residualising the response variable 2/log-likelihood estimation of weights
- includes some basic mathematics, mostly well comprehensible college-level, formulas perfectly support the statements in text
Dr Charles M. Friel, the author of this text, provides lecture notes on a wide selection of statistical topics, comparable to Garson's Statnotes; on some topics I found Dr Friel's explanations more to the point.
http://www.shsu.edu/~icc_cmf/directory.htm

2. Statnotes - Garson - including examples based on SPSS output - insightful, but not much regarding the mechanics of weight estimation (but maybe not so important).

3. NIST Handbook - nice figures, comparison of WLS with alternative of nonlinear transformation of response and predictor variables.

Friday, 11 April 2008

relationship between

median, mean, mode

in skewed distributions

violations in discrete distributions and multimodal continuous ADVANCED, but accessible

by von Hippel 2005, Journal of Statistics Education

transformations

Transformations

Statnotes - transformations - as part of testing assumptions, by Garson
- very nice, some pictures


Dallal's example of transformation in linear REGRESSION
RULE of THUMB: first transform the response variable y to correct heteroscedasticity (heterogeneity of variance), then apply a transformation to the predictors to achieve a linear relationship (bivariate scatterplot)


similar recommendations by NIST
Transformations:

NIST handbook of statistics
http://www.itl.nist.gov/div898/handbook/index.htm

from STATNOTES: To correct left (negative) skew, first subtract all values from the highest value plus 1, then apply square root, inverse, or logarithmic transforms.
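
The Statnotes recipe can be sketched in Python as follows. This is an illustration only; the function names and the sample values are hypothetical. Note that the reflection reverses the order of the data, so large original values become small transformed values.

```python
import math


def reflect(values):
    """Subtract every value from (highest value + 1); the smallest result is 1."""
    pivot = max(values) + 1
    return [pivot - v for v in values]


def reflect_and_transform(values, kind="sqrt"):
    """Reflect a left-skewed variable, then apply a square root, log or inverse transform."""
    transforms = {
        "sqrt": math.sqrt,
        "log": math.log10,
        "inverse": lambda v: 1 / v,
    }
    f = transforms[kind]
    # Reflected values are all >= 1, so log and inverse are always defined.
    return [f(v) for v in reflect(values)]


# Hypothetical values, for illustration only.
scores = [1, 2, 10]
reflected = reflect(scores)
rooted = reflect_and_transform(scores, "sqrt")
logged = reflect_and_transform(scores, "log")
```

Adding 1 before subtracting is what keeps the reflected minimum at 1 rather than 0, which is why the logarithm and the inverse can be applied safely afterwards.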

Journal

chass

journal of statistical teaching and education ???

Saturday, 29 March 2008

2 factor anova

from Dallal's page, The Little Handbook of Statistical Practice

A dairy farmer wishes to determine which type of feed will produce the greatest yield of milk. From the research literature she is able to determine the mean milk output for each of the breeds she owns for each type of feed she is considering. As a practical matter, she can use only one type of feed for her herd.
Since she can use only one type of feed, she wants the one that will produce the greatest yield from her herd. She wants the feed type that produces the greatest yield when averaged over all breeds, even if it means using a feed that is not optimal for a particular breed. (In fact, it is easy to construct examples where the feed-type that is best on average is not the best for any breed!) The dairy farmer is interested in what the main effects have to say even in the presence of the interaction. She wants to compare
the marginal mean yields of the feed types, where the means are obtained by averaging over breed.
For the sake of rigor, it is worth remarking that this assumes the herd is composed of equal numbers of each breed. Otherwise, the feed-types would be compared through weighted averages with weights determined by the composition of the herd. For example, suppose feed A is splendid for Jerseys but mundane for Holsteins while feed B is splendid for Holsteins but mundane for Jerseys. Finally, let feed C be pretty good for both. In a mixed herd, feed C would be the feed of choice. If the composition of the herd were to become predominantly Jerseys, A might be the feed of choice with the gains in the Jerseys more than offsetting the losses in the Holsteins. A similar argument applies to feed B and a herd that is predominantly Holsteins.
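
The weighted-average argument can be made concrete with a small Python sketch. All yield figures below are hypothetical numbers invented to mirror the "splendid / mundane / pretty good" description; they do not come from Dallal's page, and the function names are made up for this note.

```python
# Hypothetical mean yields per feed and breed (arbitrary units).
YIELDS = {
    "A": {"Jersey": 10.0, "Holstein": 5.0},  # splendid for Jerseys, mundane for Holsteins
    "B": {"Jersey": 5.0, "Holstein": 10.0},  # splendid for Holsteins, mundane for Jerseys
    "C": {"Jersey": 8.0, "Holstein": 8.0},   # pretty good for both
}


def herd_average(feed, composition):
    """Average yield for one feed, weighted by each breed's share of the herd."""
    return sum(share * YIELDS[feed][breed] for breed, share in composition.items())


def best_feed(composition):
    """Feed with the highest weighted-average yield for the given herd composition."""
    return max(YIELDS, key=lambda feed: herd_average(feed, composition))


mixed = best_feed({"Jersey": 0.5, "Holstein": 0.5})
mostly_jersey = best_feed({"Jersey": 0.9, "Holstein": 0.1})
mostly_holstein = best_feed({"Jersey": 0.1, "Holstein": 0.9})
```

With these invented numbers feed C wins in the evenly mixed herd even though it is not the best feed for either breed, while A and B win once the herd is dominated by Jerseys or Holsteins respectively.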

Tuesday, 4 March 2008

Linear mixed models - resources

West, Galecki: Practical Approach

Linear Mixed Models: A Practical Guide Using Statistical Software

http://books.google.com/books?id=LSJ__7lDSdgC&printsec=frontcover&dq=west+galecki&ei=nPzOR7f0OJy8zATGpqmwBQ&sig=rI8DjvFAgy8W-5lcgbZX0WZTiXY
book-accompanying site with datasets, code, errata etc
http://www-personal.umich.edu/~bwest/almmussp.html


books google: Designing Experiments and Analyzing Data....
http://books.google.com/books?id=h-bMhmQMifsC&printsec=frontcover&dq=designing+experiments+and+analyzing+data&ei=pxfQR6mWHJPAzATw04iwBQ&sig=DlC1utBSuL1magJSjkdDxa-_-xw#PRA2-PA145,M1
- provides explanation of both ANOVA-based mixed models with univariate and multivariate approach, as well as linear mixed models in the frame of multilevel modeling (chapters ,,,,)
- uses some mathematics, but in an understandable form.

LIST of books from a site dedicated to HIERARCHICAL modeling
http://www.hlm-online.com/books/


multilevel data analysis - from basics - WEBCAST by Leroux
uwtv.org Multilevel data analysis


www.geocities.com/joophox/publist/whenwhy.pdf


Gelman

Data Analysis Using Regression and Multilevel/Hierarchical Models



Eugene Demidenko LMM: Theory and Application

MULTILEVEL MODELING - SOFTWARE
http://www.cmm.bristol.ac.uk/learning-training/multilevel-m-software/index.shtml

MM and R
http://www.cmm.bristol.ac.uk/learning-training/multilevel-m-software/r.shtml
function lme in package nlme (Pinheiro, Bates 2000)
more recent version - function lmer in package lme4 (2005)
www.r-project.org/doc/Rnews/Rnews_2005-1.pdf

for Splus, R by Fox (2002)
cran.r-project.org/doc/contrib/Fox-Companion/appendix-mixed-models.pdf

LINEAR MIXED MODELS
http://www2.chass.ncsu.edu/garson/pa765/multilevel.htm
no mathematical formulas
mainly refers to SPSS, but can be useful as an introduction for any platform

Hubbard Alan longitudinal data analysis
marginal models, generalized estimating equations GEE, with different link functions - Poisson regression etc.
provides lecture notes, and chapters from his textbook on the analysis of longitudinal data (quite readable, but at moments requires thinking about vectors, matrix algebra etc.; the last chapter on mixed models is not (yet?) available online), includes examples of code for STATA, SAS.



www.nyu.edu/its/socsci/Docs/SPSSMixed.ppt
comparison of mixed models with GLM - some formulas, many PRACTICAL points

www.spss.ch/upload/1107355943_LinearMixedEffectsModelling.pdf

www.spss.ch/upload/1126184451_Linear%20Mixed%20Effects%20Modeling%20in%20SPSS.pdf

gsic.syr.edu/manuals/spss/SPSS%20Advanced%20Models%2015.0.pdf

longitudinal data analysis - mixed models,

COURSES
ONLINE -
statistics.com - Galecki see above - also author of a textbook
www.statistics.com/ourcourses/mixedmodels/
also check -
http://www.hlm-online.com/books/

http://www.csc.fi/english/csc/courses/archive/GLM2008
september 2008 - helsinki - Mixed models in R

MAXIMUM LIKELIHOOD ESTIMATION
explained - BGIM - Purcell
http://statgen.iop.kcl.ac.uk/bgim/mle/sslike_1.html


http://tigger.uic.edu/~hedeker/ml.html
SAS, SPSS

PPT
blog.case.edu/jjw17/2006/04/27/ZyzanskiSeminar4bPowerpoint.ppt

SURVIVAL mixed models for survival analysis
doi.wiley.com/10.1002/bimj.4710390102
jas.fass.org/cgi/reprint/77/E-Suppl_2/147.pdf
http://www.ingentaconnect.com/content/bpl/biom/2001/00000057/00000001/art00012
http://www.sciencedirect.com/science?_ob=ArticleURL&_udi=B6V8V-4KFV26C-1&_user=949847&_rdoc=1&_fmt=&_orig=search&_sort=d&view=c&_acct=C000049130&_version=1&_urlVersion=0&_userid=949847&md5=8f89a4cc66b9fe62715703c0323305f0
http://www.wiwi.uni-bielefeld.de/~kauermann/survival/kauermann.pdf
http://links.jstor.org/sici?sici=0006-341X(200103)57%3A1%3C96%3AMMFSAW%3E2.0.CO%3B2-M

generalized linear model

online courses
XXXXXXX generalized - for quantitative, categorical, binary, count data XXXXXXXXXXXXX
http://genetics.agrsci.dk/statistics/courses/phd07/
in R

Monday, 4 February 2008

EXCEL MACRO resources

links to some fora dealing with macros/VBA script for different tasks

eg hiding row/column depending on value in a cell

solution from allexperts.com

thread from ozgrid.com

Friday, 1 February 2008

Friday, 25 January 2008

webinars

http://www.beamyourscreen.com/EU/Welcome.aspx
beamyourscreen

SPSS for dummies

http://media.wiley.com/product_data/excerpt/48/04701134/0470113448.pdf

SPSS for dummies - 1st chapter- sample
by griffith

includes chapters on command syntax, Python , and VBS scripting.

Thursday, 24 January 2008

python textbook

Learning to Program - covers PYTHON, but also VBScript and JavaScript, by Alan Gauld
Czech translation available!
looks interesting, readable.



update How to Think Like a (Python) Programmer 2007
http://www.greenteapress.com/thinkpython/thinkPP.pdf
the update of the following book. ... by downey
How to Think Like a Computer Scientist 2002
Learning with Python
Allen Downey
Jeffrey Elkner
Chris Meyers


baldwin HTML - Python