Bot4.usSome make their own luck. We use a robot. Predictions About Blog Contact Books FAQ Glossary Twitter Feed Site Map Terms of Service Disclaimer Leadership Team Current GMT Time Android Web App 11 Currency Pairs: AUD/JPY AUD/USD EUR/AUD EUR/CHF EUR/GBP EUR/JPY EUR/USD GBP/USD USD/CAD USD/CHF USD/JPY Some Big Caps: C XOM AAPL MSFT IBM T WMT JNJ PG ORCL GOOG PFE KO Some ETFs: SPY IWM QQQ GLD SLV XLE EFA FXI Some Options: C AAPL SPY GLD Backtested Longterm Predictions: Barrons |
Site Map> Blog
Blog2011-09-21 Predicting Gains From Abnormal Moving Average SlopesOn September 13, 2011 I gave a presentation at the Bay Area Trading Systems Interest Group:http://www.meetup.com/batsig An enhanced version of that presentation is displayed below. SummaryI start off with a discussion about downloading a copy of Oracle RDBMS software and installing it on a laptop.Next I pick a stock we love or hate and I show how to download many years of CSV data for it from Yahoo to a laptop. Then I demonstrate loading that data into an Oracle table using a utility called SQL-Loader. Next I decide on a holding-period which we want to simulate. Probably one week would be a good choice. Then I demonstrate how to create a table from the CSV data which contains useful columns:
Then I demonstrate how to visualize mvg_avg_4wk_slope as a distribution (hint: it will look like a bell curve). Next I demonstrate how to visualize mvg_avg_4wk_slope and gain1week as points in a scatter plot. Next I answer the question, when mvg_avg_4wk_slope was more than two standard deviations from the mean, how did gain1week usually react? (hint: we will use Pearson's Correlation here) Next I discuss the question, what besides mvg_avg_4wk_slope could we mix with gain1week in a scatter plot? My goal is to find some calculated attribute of stock price which has a large correlation with gain1week (and thus be predictive). Oracle RDBMS SoftwareHow to Download Oracle:
Pick a stock, SPY, download many years of CSV data for itThe URL for SPY:http://ichart.finance.yahoo.com/table.csv?s=SPY I Demo Linux-wget-shell-command to download CSV for SPY into SPY.csv: oracle@z2:/pt/s/rluck/svmd/bp/cf$ oracle@z2:/pt/s/rluck/svmd/bp/cf$ wget --output-document=SPY.csv http://ichart.finance.yahoo.com/table.csv?s=SPY --2011-09-20 23:05:46-- http://ichart.finance.yahoo.com/table.csv?s=SPY Resolving ichart.finance.yahoo.com... 67.195.146.181 Connecting to ichart.finance.yahoo.com|67.195.146.181|:80... connected. HTTP request sent, awaiting response... 200 OK Length: unspecified [text/csv] Saving to: `SPY.csv' [ <=> ] 246,771 308K/s in 0.8s 2011-09-20 23:05:47 (308 KB/s) - `SPY.csv' saved [246771] oracle@z2:/pt/s/rluck/svmd/bp/cf$ oracle@z2:/pt/s/rluck/svmd/bp/cf$ I Add a column to SPY.csv and then inspect it: oracle@z2:/pt/s/rluck/svmd/bp/cf$
oracle@z2:/pt/s/rluck/svmd/bp/cf$
oracle@z2:/pt/s/rluck/svmd/bp/cf$ cat SPY.csv | awk '{print "SPY,"$0}' | grep 0 > ystk_stage.csv
oracle@z2:/pt/s/rluck/svmd/bp/cf$
oracle@z2:/pt/s/rluck/svmd/bp/cf$
oracle@z2:/pt/s/rluck/svmd/bp/cf$ head ystk_stage.csv
SPY,2011-09-19,119.53,120.93,118.72,120.31,239978600,120.31
SPY,2011-09-16,121.29,121.97,120.32,121.52,284528300,121.52
SPY,2011-09-15,120.65,121.47,119.40,121.43,326777200,120.81
SPY,2011-09-14,118.34,120.80,116.72,119.37,319389500,118.76
SPY,2011-09-13,117.05,118.18,116.22,117.74,272514700,117.13
SPY,2011-09-12,114.47,116.76,114.05,116.67,305793500,116.07
SPY,2011-09-09,117.68,119.06,115.28,115.92,380195100,115.32
SPY,2011-09-08,119.57,120.94,118.77,119.04,250568200,118.43
SPY,2011-09-07,118.76,120.34,118.36,120.29,209803200,119.67
SPY,2011-09-06,114.39,117.16,114.38,116.99,285130500,116.39
oracle@z2:/pt/s/rluck/svmd/bp/cf$ tail ystk_stage.csv
SPY,1993-02-11,44.78,45.13,44.78,44.94,19500,32.21
SPY,1993-02-10,44.66,44.75,44.53,44.72,379600,32.05
SPY,1993-02-09,44.81,44.81,44.56,44.66,122100,32.01
SPY,1993-02-08,44.97,45.13,44.91,44.97,596100,32.23
SPY,1993-02-05,44.97,45.06,44.72,44.97,492100,32.23
SPY,1993-02-04,44.97,45.09,44.47,45.00,531500,32.25
SPY,1993-02-03,44.41,44.84,44.38,44.81,529400,32.11
SPY,1993-02-02,44.22,44.38,44.13,44.34,201300,31.78
SPY,1993-02-01,43.97,44.25,43.97,44.25,480500,31.71
SPY,1993-01-29,43.97,43.97,43.75,43.94,1003200,31.49
oracle@z2:/pt/s/rluck/svmd/bp/cf$
oracle@z2:/pt/s/rluck/svmd/bp/cf$
Load that data into an Oracle table using a utility called SQL-Loader.I Create a table:oracle@z2:/pt/s/rluck/svmd/bp$ oracle@z2:/pt/s/rluck/svmd/bp$ which sqt /pt/s/bin/sqt oracle@z2:/pt/s/rluck/svmd/bp$ cat /pt/s/bin/sqt #!/bin/bash . /pt/s/oracle/.orcl sqlplus trade/t $@ oracle@z2:/pt/s/rluck/svmd/bp$ oracle@z2:/pt/s/rluck/svmd/bp$ oracle@z2:/pt/s/rluck/svmd/bp$ oracle@z2:/pt/s/rluck/svmd/bp$ sqt @cr_ystk_stage.sql SQL*Plus: Release 11.2.0.1.0 Production on Tue Sep 20 23:23:09 2011 Copyright (c) 1982, 2009, Oracle. All rights reserved. Connected to: Oracle Database 11g Enterprise Edition Release 11.2.0.1.0 - 64bit Production With the Partitioning, OLAP, Data Mining and Real Application Testing options 23:23:09 SQL> ALTER SESSION SET NLS_DATE_FORMAT = 'YYYY-MM-DD'; Session altered. Elapsed: 00:00:00.01 23:23:09 SQL> 23:23:09 SQL> -- 23:23:09 SQL> -- cr_ystk_stage.sql 23:23:09 SQL> -- 23:23:09 SQL> 23:23:09 SQL> DROP TABLE ystk_stage; Table dropped. Elapsed: 00:00:00.22 23:23:09 SQL> 23:23:09 SQL> 23:23:09 SQL> CREATE TABLE ystk_stage( 23:23:09 2 tkr VARCHAR2(11) 23:23:09 3 ,ydate DATE 23:23:09 4 ,opn NUMBER 23:23:09 5 ,mx NUMBER 23:23:09 6 ,mn NUMBER 23:23:09 7 ,clse0 NUMBER 23:23:09 8 ,vol NUMBER 23:23:09 9 ,clse NUMBER) 23:23:09 10 / Table created. Elapsed: 00:00:00.06 23:23:09 SQL> exit Disconnected from Oracle Database 11g Enterprise Edition Release 11.2.0.1.0 - 64bit Production With the Partitioning, OLAP, Data Mining and Real Application Testing options oracle@z2:/pt/s/rluck/svmd/bp$ I Create a CTL file named ystk_stage.ctl (using editor): oracle@z2:/pt/s/rluck/svmd/bp$ oracle@z2:/pt/s/rluck/svmd/bp$ cat ystk_stage.ctl LOAD DATA INFILE 'ystk_stage.csv' replace INTO TABLE ystk_stage FIELDS TERMINATED BY ',' TRAILING NULLCOLS ( tkr CHAR NULLIF (tkr=BLANKS) ,ydate DATE "YYYY-MM-DD" NULLIF (ydate=BLANKS) ,opn DECIMAL EXTERNAL NULLIF (opn=BLANKS) ,mx DECIMAL EXTERNAL NULLIF (mx=BLANKS) ,mn DECIMAL EXTERNAL NULLIF (mn=BLANKS) ,clse0 DECIMAL EXTERNAL NULLIF (clse0=BLANKS) ,vol DECIMAL EXTERNAL NULLIF (vol=BLANKS) ,clse DECIMAL EXTERNAL NULLIF (clse=BLANKS) ) oracle@z2:/pt/s/rluck/svmd/bp$ oracle@z2:/pt/s/rluck/svmd/bp$ I Call SQL*Loader: oracle@z2:/pt/s/rluck/svmd/bp$ oracle@z2:/pt/s/rluck/svmd/bp$ oracle@z2:/pt/s/rluck/svmd/bp$ sqlldr trade/t bindsize=20971520 readsize=20971520 rows=123456 control=ystk_stage.ctl SQL*Loader: Release 11.2.0.1.0 - Production on Wed Sep 21 07:15:26 2011 Copyright (c) 1982, 2009, Oracle and/or its affiliates. All rights reserved. Commit point reached - logical record count 4696 oracle@z2:/pt/s/rluck/svmd/bp$ oracle@z2:/pt/s/rluck/svmd/bp$ oracle@z2:/pt/s/rluck/svmd/bp$ grep loaded ystk_stage.log Table YSTK_STAGE, loaded from every logical record. 4696 Rows successfully loaded. 0 Rows not loaded due to data errors. 0 Rows not loaded because all WHEN clauses were failed. 0 Rows not loaded because all fields were null. oracle@z2:/pt/s/rluck/svmd/bp$ oracle@z2:/pt/s/rluck/svmd/bp$ I Inspect the data: oracle@z2:/pt/s/rluck/svmd/bp$ oracle@z2:/pt/s/rluck/svmd/bp$ sqt SQL*Plus: Release 11.2.0.1.0 Production on Wed Sep 21 07:17:29 2011 Copyright (c) 1982, 2009, Oracle. All rights reserved. Connected to: Oracle Database 11g Enterprise Edition Release 11.2.0.1.0 - 64bit Production With the Partitioning, OLAP, Data Mining and Real Application Testing options 07:17:29 SQL> ALTER SESSION SET NLS_DATE_FORMAT = 'YYYY-MM-DD'; Session altered. Elapsed: 00:00:00.00 07:17:29 SQL> 07:17:29 SQL> SELECT tkr,ydate,clse FROM ystk_stage WHERE ydate > sysdate - 9 ORDER BY ydate; TKR YDATE CLSE ----------- ---------- ---------- SPY 2011-09-13 117.13 SPY 2011-09-14 118.76 SPY 2011-09-15 120.81 SPY 2011-09-16 121.52 SPY 2011-09-19 120.31 Elapsed: 00:00:00.00 07:17:42 SQL> SELECT tkr,MIN(ydate),MAX(ydate),MIN(clse),MAX(clse)FROM ystk_stage GROUP BY tkr; TKR MIN(YDATE) MAX(YDATE) MIN(CLSE) MAX(CLSE) ----------- ---------- ---------- ---------- ---------- SPY 1993-01-29 2011-09-19 31.11 144.2 Elapsed: 00:00:00.01 07:18:01 SQL> 07:18:02 SQL> exit Disconnected from Oracle Database 11g Enterprise Edition Release 11.2.0.1.0 - 64bit Production With the Partitioning, OLAP, Data Mining and Real Application Testing options oracle@z2:/pt/s/rluck/svmd/bp$ I Use SQL to create a CSV file for R to plot the data: --
-- qry_spy_clse.sql
--
SET PAGES 5555 COLSEP ","
SELECT rownum rwnum,clse FROM
(
SELECT clse FROM ystk_stage ORDER BY ydate
);
SELECT TO_CHAR(ydate,'YYYY')year,clse price FROM ystk_stage ORDER BY ydate;
I Use R to plot the data (using shell command: R -f qry_spy_clse.r): # qry_spy_clse.r
# I use this file to read prices of SPY from qry_spy_clse.csv
# Then, I plot them.
# I build qry_spy_clse.csv by hand from qry_spy_clse.sql output
qry_spy_clse = read.csv("qry_spy_clse.csv")
# Define the .png file which will hold the output:
png("qry_spy_clse.png",width = 800, height = 1800)
# Declare the output will be in 2 rows and 1 column:
par(mfrow = c(2,1))
# Show the data as a sequence of about 5,000 prices ordered by day number after Jan 1993:
day.vector = 1:length(qry_spy_clse$PRICE)
plot(day.vector
,qry_spy_clse$PRICE
,main="Price Of SPY ETF In Dollars Between 1993-01-29 and 2011-09-19"
,xlab="Number Of Trading Days Since 1993-01-29"
,ylab="Price Of SPY ETF In Dollars"
,type="h"
,col="#444444"
,xlim=c(0,5000))
grid(col="black")
# Show the data as a boxplot grouped by year:
boxplot (qry_spy_clse$PRICE ~ qry_spy_clse$YEAR
,pch=20
,main="Box Plot of Price Of SPY Between 1993-01-29 and 2011-09-19 (Black Dots are Outliers)"
,xlab="Year"
,ylab="Price Of SPY ETF In Dollars"
)
grid(col="black")
dev.off()
I Show the 2 plots of the SPY prices from Yahoo. The first plot is just a simple time series of 4,696 prices. The 2nd plot is a box plot of the data grouped by year. For each year the box plot shows min, max, median, quartiles and outliers. It's obvious that 2011 has the most outliers and they are negative. I would have expected 2008 to have the most outliers but I see none in the box plot.
More information about both R and box plots can be found in wikipedia: http://en.wikipedia.org/wiki/Box_plot http://en.wikipedia.org/wiki/R_(programming_language) I demonstrate how to create a table from the CSV data which contains useful columnsoracle@z2:/pt/s/rluck/svmd/bp$ oracle@z2:/pt/s/rluck/svmd/bp$ oracle@z2:/pt/s/rluck/svmd/bp$ sqt @cr_batsig.sql SQL*Plus: Release 11.2.0.1.0 Production on Wed Sep 21 08:16:11 2011 Copyright (c) 1982, 2009, Oracle. All rights reserved. Connected to: Oracle Database 11g Enterprise Edition Release 11.2.0.1.0 - 64bit Production With the Partitioning, OLAP, Data Mining and Real Application Testing options 08:16:11 SQL> ALTER SESSION SET NLS_DATE_FORMAT = 'YYYY-MM-DD'; Session altered. Elapsed: 00:00:00.01 08:16:11 SQL> 08:16:11 SQL> -- 08:16:11 SQL> -- cr_batsig.sql 08:16:11 SQL> -- 08:16:11 SQL> 08:16:11 SQL> -- Get prices from past and future in same row as price1: 08:16:11 SQL> CREATE OR REPLACE VIEW batsigv1 AS 08:16:11 2 SELECT 08:16:11 3 tkr 08:16:11 4 ,ydate 08:16:11 5 ,LEAD(ydate,5,NULL)OVER(PARTITION BY tkr ORDER BY ydate) date1week_later 08:16:11 6 ,LAG(clse,1,NULL)OVER(PARTITION BY tkr ORDER BY ydate) price0 08:16:11 7 ,clse price1 08:16:11 8 ,LEAD(clse,5,NULL)OVER(PARTITION BY tkr ORDER BY ydate) price2 08:16:11 9 FROM ystk_stage 08:16:11 10 -- Prevent divide by 0 later on: 08:16:11 11 WHERE clse>0 08:16:11 12 / View created. Elapsed: 00:00:00.05 08:16:11 SQL> 08:16:11 SQL> -- Get mvg avg for today and yesterday. Their Difference is slope. 08:16:11 SQL> CREATE OR REPLACE VIEW batsigv2 AS 08:16:11 2 SELECT 08:16:11 3 tkr 08:16:11 4 ,ydate 08:16:11 5 ,date1week_later 08:16:11 6 ,price1 08:16:11 7 ,price2 08:16:11 8 ,(price2 - price1)/price1 gain1week 08:16:11 9 ,AVG(price1)OVER(PARTITION BY tkr ORDER BY ydate ROWS BETWEEN 4*5 PRECEDING AND CURRENT ROW) 08:16:11 10 -AVG(price0)OVER(PARTITION BY tkr ORDER BY ydate ROWS BETWEEN 4*5 PRECEDING AND CURRENT ROW)mas1 08:16:11 11 FROM batsigv1 08:16:11 12 / View created. Elapsed: 00:00:00.03 08:16:11 SQL> 08:16:11 SQL> DROP TABLE batsig; Table dropped. Elapsed: 00:00:00.11 08:16:11 SQL> CREATE TABLE batsig COMPRESS AS 08:16:11 2 SELECT 08:16:11 3 tkr 08:16:11 4 ,ydate 08:16:11 5 ,date1week_later 08:16:11 6 ,price1 08:16:11 7 ,price2 08:16:11 8 ,gain1week 08:16:11 9 ,mas1/price1 mvg_avg_slope 08:16:11 10 FROM batsigv2 08:16:11 11 WHERE price2 > 0 08:16:11 12 AND mas1 IS NOT NULL 08:16:11 13 / Table created. Elapsed: 00:00:00.20 08:16:12 SQL> 08:16:12 SQL> SELECT 08:16:12 2 tkr 08:16:12 3 ,ydate 08:16:12 4 ,date1week_later 08:16:12 5 ,price1 08:16:12 6 ,price2 08:16:12 7 ,mvg_avg_slope 08:16:12 8 ,gain1week 08:16:12 9 FROM batsig 08:16:12 10 WHERE ydate > sysdate - 22 08:16:12 11 / TKR YDATE DATE1WEEK_ PRICE1 PRICE2 MVG_AVG_SLOPE GAIN1WEEK ----------- ---------- ---------- ---------- ---------- ------------- ---------- SPY 2011-08-31 2011-09-08 121.59 118.43 -.00127282 -.02598898 SPY 2011-09-01 2011-09-09 120.32 115.32 -.002058 -.04155585 SPY 2011-09-02 2011-09-12 117.24 116.07 -.0009748 -.00997953 SPY 2011-09-06 2011-09-13 116.39 117.13 -.00125604 .006357935 SPY 2011-09-07 2011-09-14 119.67 118.76 .003179378 -.00760425 SPY 2011-09-08 2011-09-15 118.43 120.81 .000623233 .020096259 SPY 2011-09-09 2011-09-16 115.32 121.52 .001490676 .053763441 SPY 2011-09-12 2011-09-19 116.07 120.31 -.00027077 .03652968 8 rows selected. Elapsed: 00:00:00.02 08:16:12 SQL> 08:16:12 SQL> exit Disconnected from Oracle Database 11g Enterprise Edition Release 11.2.0.1.0 - 64bit Production With the Partitioning, OLAP, Data Mining and Real Application Testing options oracle@z2:/pt/s/rluck/svmd/bp$ oracle@z2:/pt/s/rluck/svmd/bp$ Next I demonstrate how to visualize gain1week as a distributionoracle@z2:/pt/s/rluck/svmd/bp$
oracle@z2:/pt/s/rluck/svmd/bp$
oracle@z2:/pt/s/rluck/svmd/bp$ sqt @gdist
SQL*Plus: Release 11.2.0.1.0 Production on Wed Sep 21 08:20:48 2011
Copyright (c) 1982, 2009, Oracle. All rights reserved.
Connected to:
Oracle Database 11g Enterprise Edition Release 11.2.0.1.0 - 64bit Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options
08:20:48 SQL> ALTER SESSION SET NLS_DATE_FORMAT = 'YYYY-MM-DD';
Session altered.
Elapsed: 00:00:00.00
08:20:48 SQL>
08:20:48 SQL> --
08:20:48 SQL> -- gdist.sql
08:20:48 SQL> --
08:20:48 SQL>
08:20:48 SQL> -- Demonstrate how to visualize gain1week as a distribution.
08:20:48 SQL>
08:20:48 SQL> -- Model a Normal Distribution:
08:20:48 SQL>
08:20:48 SQL> SELECT
08:20:48 2 MIN(gain1week)
08:20:48 3 ,AVG(gain1week)
08:20:48 4 ,MAX(gain1week)
08:20:48 5 ,STDDEV(gain1week)
08:20:48 6 FROM batsig
08:20:48 7 /
MIN(GAIN1WEEK) AVG(GAIN1WEEK) MAX(GAIN1WEEK) STDDEV(GAIN1WEEK)
-------------- -------------- -------------- -----------------
-.19799692 .001725962 .194084507 .025151243
Elapsed: 00:00:00.07
08:20:48 SQL>
08:20:48 SQL> CREATE OR REPLACE VIEW batsigv10 AS
08:20:48 2 SELECT
08:20:48 3 tkr
08:20:48 4 ,ydate
08:20:48 5 ,price1
08:20:48 6 ,gain1week
08:20:48 7 ,ROUND(gain1week,1)g1wr1
08:20:48 8 ,ROUND(gain1week,2)g1wr2
08:20:48 9 FROM batsig
08:20:48 10 /
View created.
Elapsed: 00:00:00.05
08:20:48 SQL>
08:20:48 SQL> -- Gather some points for a plot.
08:20:48 SQL>
08:20:48 SQL> -- Start with a very rough histogram:
08:20:48 SQL>
08:20:48 SQL> SELECT
08:20:48 2 g1wr1
08:20:48 3 ,COUNT(g1wr1) gcount
08:20:48 4 FROM batsigv10
08:20:48 5 GROUP BY g1wr1
08:20:48 6 ORDER BY g1wr1
08:20:48 7 /
G1WR1 GCOUNT
---------- ----------
-.2 5
-.1 113
0 4466
.1 104
.2 2
Elapsed: 00:00:00.01
08:20:48 SQL>
08:20:48 SQL> -- Drill Down:
08:20:48 SQL>
08:20:48 SQL> SELECT
08:20:48 2 g1wr2
08:20:48 3 ,COUNT(g1wr2) gcount
08:20:48 4 FROM batsigv10
08:20:48 5 GROUP BY g1wr2
08:20:48 6 ORDER BY g1wr2
08:20:48 7 /
G1WR2 GCOUNT
---------- ----------
-.2 1
-.19 1
-.17 1
-.16 1
-.15 1
-.14 1
-.13 1
-.12 2
-.11 5
-.1 3
-.09 7
-.08 12
-.07 26
-.06 29
-.05 74
-.04 137
-.03 229
-.02 404
-.01 632
0 969
.01 938
.02 621
.03 296
.04 151
.05 71
.06 32
.07 16
.08 11
.09 6
.1 5
.11 4
.14 1
.16 1
.19 1
34 rows selected.
Elapsed: 00:00:00.01
08:20:48 SQL>
08:20:48 SQL>
08:20:48 SQL> exit
Disconnected from Oracle Database 11g Enterprise Edition Release 11.2.0.1.0 - 64bit Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options
oracle@z2:/pt/s/rluck/svmd/bp$
oracle@z2:/pt/s/rluck/svmd/bp$
I use R to visualize the data from the second query. I plot a simple gain-frequency-histogram with a normal distribution overlayed on top. I show two standard deviations from the mean as two blue vertical lines. # gdist.r
# I use this file to read gdist.csv which was created from gdist.sql
# The data is a frequency count which I intend to display as a histogram.
# I build gdist.csv by hand from gdist.sql output
gdist = read.csv("gdist.csv")
# Define the .png file which will hold the output:
png("gdist.png",width = 800, height = 800)
plot(gdist$G1WR2
,gdist$GCOUNT
,type="h"
,lwd=4
,col="black"
,xlim=c(-0.2,0.2)
,main="Frequency Counts of Rounded Normalized Weekly Gains For SPY ETF (0.1 = 10% gain)"
,xlab="Normal Distribution(Avg:0.0017, Std.Dev.:0.025) on top of Rounded Normalized 1 Week Gains"
,ylab="Frequency Count"
)
grid(col="black")
# Now overlay another plot:
# http://pj.freefaculty.org/R/Rtips.html#5.11
par(new = TRUE)
g1wmin = -0.198
g1wmax = 0.194
g1wsd = 0.02515
g1wmean = 0.00171
x=seq(g1wmin,g1wmax,length=200)
y=dnorm(x,mean=g1wmean,sd=g1wsd)
plot(x,y,type="l",lwd=1,col="blue",xaxt="n", yaxt="n",xlab=" ",ylab=" ",xlim=c(-0.2,0.2))
# Draw vertical lines at mean and 2 std. dev. from mean:
abline(v = g1wmean, col="blue",lwd=2)
abline(v = g1wmean-2*g1wsd, col="blue",lwd=2)
abline(v = g1wmean+2*g1wsd, col="blue",lwd=2)
dev.off()
Interpretation: I see a long tail of negative gains and bias towards positive gains near the mean. Next I plot data from the tails. # gdist2.r
# I use this file to read gdist2.csv which was created from gdist.csv
# The data is a frequency count which I intend to display as a histogram.
# I build gdist2.csv by hand from gdist.csv
gdist2 = read.csv("gdist2.csv")
# Define the .png file which will hold the output:
png("gdist2.png",width = 800, height = 800)
# Plot the R and L tails:
plot(gdist2$G1WR2
,gdist2$GCOUNT
,type="h"
,lwd=4
,col="black"
,xlim=c(-0.2,0.2)
,main=
"Tails of Frequency Counts of Rounded Normalized Weekly Gains For SPY ETF (0.1 = 10% gain)"
,xlab="Left and Right Tails of Rounded Normalized 1 Week Gains (Blue Lines are Mean and 2 Std. Dev. From Mean)"
,ylab="Frequency Count"
)
grid(col="black")
# I pulled these values from my Oracle rpt:
g1wmin = -0.198
g1wmax = 0.194
g1wsd = 0.02515
g1wmean = 0.00171
# Show Mean and 2 stddevs from mean as vertical lines:
abline(v = g1wmean, col="blue",lwd=2)
abline(v = g1wmean-2*g1wsd, col="blue",lwd=2)
abline(v = g1wmean+2*g1wsd, col="blue",lwd=2)
dev.off()
Again, I see a long tail of negative gains and bias towards positive gains near the mean. I show two standard deviations from the mean as two blue vertical lines. The blue vertical line in the middle is at the mean. Before I started this exercise I assumed that the histogram of gains would resemble a bell curve. I was wrong; instead it looks like a narrow triangle with a long tail on the left. Next, I demonstrate how to visualize mvg_avg_4wk_slope as a distributionAs before I collect slope frequencies using a SQL script.oracle@z2:/pt/s/rluck/svmd/bp$
oracle@z2:/pt/s/rluck/svmd/bp$ sqt @ma_gdist
SQL*Plus: Release 11.2.0.1.0 Production on Wed Sep 21 09:12:18 2011
Copyright (c) 1982, 2009, Oracle. All rights reserved.
Connected to:
Oracle Database 11g Enterprise Edition Release 11.2.0.1.0 - 64bit Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options
09:12:18 SQL> ALTER SESSION SET NLS_DATE_FORMAT = 'YYYY-MM-DD';
Session altered.
Elapsed: 00:00:00.01
09:12:18 SQL>
09:12:18 SQL> --
09:12:19 SQL> -- ma_gdist.sql
09:12:19 SQL> --
09:12:19 SQL>
09:12:19 SQL> -- Demonstrate how to visualize gain1week as a distribution.
09:12:19 SQL> -- Demonstrate how to visualize mvg_avg_slope as a distribution.
09:12:19 SQL>
09:12:19 SQL> -- Model a Normal Distribution:
09:12:19 SQL>
09:12:19 SQL> SELECT
09:12:19 2 -- MIN(gain1week)
09:12:19 3 -- ,AVG(gain1week)
09:12:19 4 -- ,MAX(gain1week)
09:12:19 5 -- ,STDDEV(gain1week)
09:12:19 6 -- ,MIN(mvg_avg_slope)
09:12:19 7 MIN(mvg_avg_slope)
09:12:19 8 ,AVG(mvg_avg_slope)
09:12:19 9 ,MAX(mvg_avg_slope)
09:12:19 10 ,STDDEV(mvg_avg_slope)
09:12:19 11 FROM batsig
09:12:19 12 /
MIN(MVG_AVG_SLOPE) AVG(MVG_AVG_SLOPE) MAX(MVG_AVG_SLOPE) STDDEV(MVG_AVG_SLOPE)
------------------ ------------------ ------------------ ---------------------
-.0209343 .000223191 .009013965 .002369312
Elapsed: 00:00:00.04
09:12:19 SQL>
09:12:19 SQL> CREATE OR REPLACE VIEW batsigv10 AS
09:12:19 2 SELECT
09:12:19 3 tkr
09:12:19 4 ,ydate
09:12:19 5 ,price1
09:12:19 6 ,gain1week
09:12:19 7 ,ROUND(gain1week,1)g1wr1
09:12:19 8 ,ROUND(gain1week,2)g1wr2
09:12:19 9 ,mvg_avg_slope
09:12:19 10 ,ROUND(mvg_avg_slope,1)masr1
09:12:19 11 ,ROUND(mvg_avg_slope,2)masr2
09:12:19 12 ,ROUND(mvg_avg_slope,3)masr3
09:12:19 13 FROM batsig
09:12:19 14 /
View created.
Elapsed: 00:00:00.14
09:12:19 SQL>
09:12:19 SQL> -- Visualize a histogram:
09:12:19 SQL>
09:12:19 SQL> SELECT
09:12:19 2 masr3
09:12:19 3 ,COUNT(masr3) gcount
09:12:19 4 FROM batsigv10
09:12:19 5 GROUP BY masr3
09:12:19 6 ORDER BY masr3
09:12:19 7 /
MASR3 GCOUNT
---------- ----------
-.021 1
-.02 1
-.018 1
-.017 2
-.015 1
-.014 1
-.013 2
-.012 6
-.011 3
-.01 7
-.009 13
-.008 12
-.007 19
-.006 40
-.005 57
-.004 103
-.003 206
-.002 366
-.001 546
0 881
.001 1121
.002 792
.003 331
.004 103
.005 45
.006 18
.007 7
.008 3
.009 2
29 rows selected.
Elapsed: 00:00:00.02
09:12:19 SQL>
09:12:19 SQL>
09:12:19 SQL> exit
Disconnected from Oracle Database 11g Enterprise Edition Release 11.2.0.1.0 - 64bit Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options
oracle@z2:/pt/s/rluck/svmd/bp$
As before, I see a long tail left of the mean. Unlike the gain frequencies, it seems that the mean is further to the right; it is more positive. I used R to plot the data: # masr3.r
# I use this file to read masr3.csv which was created from ma_gdist.sql
# The data is a frequency count which I intend to display as a histogram.
# I build masr3.csv by hand from preso.txt
masr3 = read.csv("masr3.csv")
# Define the .png file which will hold the output:
png("masr3.png",width = 800, height = 800)
# Plot the R and L tails:
plot(masr3$MASR3
,masr3$GCOUNT
,type="h"
,lwd=4
,col="black"
,xlim=c(-0.020,0.020)
,main="Frequency Counts of Rounded Normalized Moving Average Slopes of SPY ETF"
,xlab=
"Normal Distribution on top of Moving Average Slope Frequencies (Blue Lines are at Mean and 2 Std. Dev. From Mean)"
,ylab="Frequency Count")
grid(col="black")
# I pulled these values from my Oracle rpt:
masr3min = -0.0209
masr3max = 0.00901
masr3sd = 0.00237
masr3mean = 0.000223
# Show Mean and 2 stddevs from mean as vertical lines:
abline(v = masr3mean, col="blue",lwd=2)
abline(v = masr3mean-2*masr3sd, col="blue",lwd=2)
abline(v = masr3mean+2*masr3sd, col="blue",lwd=2)
# Now overlay another plot:
# http://pj.freefaculty.org/R/Rtips.html#5.11
par(new = TRUE)
x=seq(masr3min,masr3max,length=200)
y=dnorm(x,mean=masr3mean,sd=masr3sd)
plot(x,y,type="l",lwd=1,col="blue",xaxt="n", yaxt="n",xlab=" ",ylab=" ",xlim=c(-0.020,0.020))
dev.off()
This histogram resembles the histogram of gains. It looks like a narrow triangle with a long tail on the left. Next I answer the question, when mvg_avg_4wk_slope, was more than two standard deviations from the mean, how did gain1week usually react?SQL is well suited to answer this question as demonstrated by the script below:oracle@z2:/pt/s/rluck/svmd/bp$
oracle@z2:/pt/s/rluck/svmd/bp$ sqt @corr_spy.sql
SQL*Plus: Release 11.2.0.1.0 Production on Wed Sep 21 09:28:47 2011
Copyright (c) 1982, 2009, Oracle. All rights reserved.
Connected to:
Oracle Database 11g Enterprise Edition Release 11.2.0.1.0 - 64bit Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options
09:28:47 SQL> ALTER SESSION SET NLS_DATE_FORMAT = 'YYYY-MM-DD';
Session altered.
Elapsed: 00:00:00.00
09:28:47 SQL>
09:28:47 SQL> --
09:28:47 SQL> -- corr_spy.sql
09:28:47 SQL> --
09:28:47 SQL>
09:28:47 SQL> -- Start by showing a count of all the trading days:
09:28:47 SQL> SELECT COUNT(ydate)FROM batsig;
COUNT(YDATE)
------------
4690
Elapsed: 00:00:00.01
09:28:47 SQL>
09:28:47 SQL> -- Now go on a quest for correlation
09:28:47 SQL> -- between extreme 4 wk moving avg slopes and resulting 1 week gain:
09:28:47 SQL>
09:28:47 SQL> SELECT
09:28:47 2 CASE WHEN SIGN(mvg_avg_slope)>0 THEN'slope_up'ELSE'slope_down'END up_down
09:28:47 3 ,CORR(mvg_avg_slope,gain1week)corr_mas_g1w
09:28:47 4 ,COUNT(mvg_avg_slope) occurrences
09:28:47 5 ,AVG(gain1week) avg_gain1week
09:28:47 6 ,SUM(gain1week) sum_gain1week
09:28:47 7 FROM batsig
09:28:47 8 WHERE ABS(mvg_avg_slope)>= 0.004
09:28:47 9 GROUP BY SIGN(mvg_avg_slope)
09:28:47 10 /
UP_DOWN CORR_MAS_G1W OCCURRENCES AVG_GAIN1WEEK SUM_GAIN1WEEK
---------- ------------ ----------- ------------- -------------
slope_up .064292553 111 -.00269497 -.29914135
slope_down -.28823126 211 .006379264 1.3460247
Elapsed: 00:00:00.03
09:28:47 SQL>
09:28:47 SQL> SELECT
09:28:47 2 CASE WHEN SIGN(mvg_avg_slope)>0 THEN'slope_up'ELSE'slope_down'END up_down
09:28:47 3 ,CORR(mvg_avg_slope,gain1week)corr_mas_g1w
09:28:47 4 ,COUNT(mvg_avg_slope) occurrences
09:28:47 5 ,AVG(gain1week) avg_gain1week
09:28:47 6 ,SUM(gain1week) sum_gain1week
09:28:47 7 FROM batsig
09:28:47 8 WHERE ABS(mvg_avg_slope)>= 0.0045
09:28:47 9 GROUP BY SIGN(mvg_avg_slope)
09:28:47 10 /
UP_DOWN CORR_MAS_G1W OCCURRENCES AVG_GAIN1WEEK SUM_GAIN1WEEK
---------- ------------ ----------- ------------- -------------
slope_up .094249655 75 -.00220989 -.16574172
slope_down -.27570685 166 .010005402 1.66089671
Elapsed: 00:00:00.00
09:28:47 SQL>
09:28:47 SQL> SELECT
09:28:47 2 CASE WHEN SIGN(mvg_avg_slope)>0 THEN'slope_up'ELSE'slope_down'END up_down
09:28:47 3 ,CORR(mvg_avg_slope,gain1week)corr_mas_g1w
09:28:47 4 ,COUNT(mvg_avg_slope) occurrences
09:28:47 5 ,AVG(gain1week) avg_gain1week
09:28:47 6 ,SUM(gain1week) sum_gain1week
09:28:47 7 FROM batsig
09:28:47 8 WHERE ABS(mvg_avg_slope)>= 0.005
09:28:47 9 GROUP BY SIGN(mvg_avg_slope)
09:28:47 10 /
UP_DOWN CORR_MAS_G1W OCCURRENCES AVG_GAIN1WEEK SUM_GAIN1WEEK
---------- ------------ ----------- ------------- -------------
slope_up .201186317 45 -.00283419 -.12753875
slope_down -.26423013 134 .013981069 1.87346323
Elapsed: 00:00:00.02
09:28:47 SQL>
The last query shows a significant negative correlation between negative moving average slopes and resulting gains. In plain English it could be stated like this: When the slope of the moving average is abnormally negative (more than two standard deviations to the left of the mean), that is a bullish indicator for SPY. In this situation we should expect a return of 1.4% / week. Since 1998 this opportunity appeared 134 times and offered a return of 187% (assuming a constant sized bet). All 134 opportunities are listed below: 09:28:47 SQL> SELECT 09:28:47 2 tkr 09:28:47 3 ,ydate 09:28:47 4 ,price1 09:28:47 5 ,gain1week normalized_g1wk 09:28:47 6 FROM batsig 09:28:47 7 WHERE mvg_avg_slope <= -0.005 09:28:47 8 ORDER by ydate 09:28:47 9 / TKR YDATE PRICE1 NORMALIZED_G1WK ----------- ---------- ---------- --------------- SPY 1998-08-14 85.37 .022958885 SPY 1998-08-31 77.23 .072899132 SPY 1998-09-01 80.49 .004472605 SPY 1998-09-04 78.63 .058247488 SPY 2000-10-12 109.72 .046481954 SPY 2001-03-01 102.97 .020200058 SPY 2001-03-02 102.15 -.0020558 SPY 2001-03-05 103.08 -.05335662 SPY 2001-03-12 97.58 -.00348432 SPY 2001-03-14 97.22 -.04320099 SPY 2001-03-15 97.25 -.05316195 SPY 2001-03-16 95.3 -.004617 SPY 2001-03-19 97.24 -.01203209 SPY 2001-03-20 94.63 .035929409 SPY 2001-03-21 93.02 .024725865 SPY 2001-03-22 92.08 .039205039 SPY 2001-04-03 91.47 .056739915 SPY 2001-04-04 91.85 .05302123 SPY 2001-04-06 93.88 .03792075 SPY 2001-09-17 86.67 -.03092189 SPY 2001-09-18 86.46 -.01839001 SPY 2001-09-19 84.72 -.00177054 SPY 2001-09-20 82.02 .039990246 SPY 2001-09-21 81.14 .073576534 SPY 2001-09-24 83.99 .035480414 SPY 2001-09-25 84.87 .037586898 SPY 2001-09-26 84.57 .058767885 SPY 2001-09-27 85.3 .050644783 SPY 2001-09-28 87.11 .02674779 SPY 2001-10-01 86.97 .021731632 SPY 2001-10-02 88.06 .003633886 SPY 2002-06-25 81.89 -.02649896 SPY 2002-07-10 77.33 -.01500065 SPY 2002-07-12 77.1 -.07769131 SPY 2002-07-16 76.02 -.117206 SPY 2002-07-17 76.17 -.06629907 SPY 2002-07-18 73.7 -.04328358 SPY 2002-07-19 71.11 .010406413 SPY 2002-07-22 69 .092028986 SPY 2002-07-23 67.11 .13753539 SPY 2002-07-24 71.12 .075928009 SPY 2002-07-25 70.51 .056871366 SPY 2002-07-26 71.85 .013917884 SPY 2002-07-29 75.35 -.06675514 SPY 2002-08-05 70.32 .081769056 SPY 2002-08-06 72.68 .027517887 SPY 2002-09-19 71.1 .01673699 SPY 2002-09-20 71.12 -.018982 SPY 2002-09-23 70.54 -.02239864 SPY 2002-09-24 69.4 .041498559 SPY 2002-09-25 71.12 -.01420135 SPY 2002-09-27 69.77 -.0235058 SPY 2002-09-30 68.96 -.0324826 SPY 2002-10-07 66.72 .069544365 SPY 2002-10-08 67.77 .103585657 SPY 2002-10-09 65.85 .108276386 SPY 2002-10-10 67.99 .094719812 SPY 2003-02-11 70.69 .020936483 SPY 2003-02-12 69.56 .027170788 SPY 2003-02-13 69.78 .034250502 SPY 2008-01-22 121.1 .03971924 SPY 2008-01-25 123.25 .049168357 SPY 2008-01-28 125.29 .019075744 SPY 2008-07-07 116.97 -.01838078 SPY 2008-07-15 113.2 .053710247 SPY 2008-09-29 104.82 -.05981683 SPY 2008-10-02 105.26 -.18915067 SPY 2008-10-03 103.84 -.19799692 A loss of 19.8% ! SPY 2008-10-06 98.55 -.03216641 SPY 2008-10-07 94.13 -.00169978 SPY 2008-10-08 91.76 -.07683086 SPY 2008-10-09 85.35 .033860574 SPY 2008-10-10 83.28 .053314121 SPY 2008-10-13 95.38 -.02505766 SPY 2008-10-14 93.97 -.04001277 SPY 2008-10-15 84.71 .006964939 SPY 2008-10-16 88.24 -.02209882 SPY 2008-10-17 87.72 -.06623347 SPY 2008-10-20 92.99 -.15044628 SPY 2008-10-21 90.21 -.02194879 SPY 2008-10-22 85.3 .026846424 SPY 2008-10-23 86.29 .050179627 SPY 2008-10-24 81.91 .112440483 SPY 2008-10-27 79 .156835443 SPY 2008-10-28 88.23 .070950924 SPY 2008-10-29 87.59 .033451307 SPY 2008-10-30 90.62 -.05638932 SPY 2008-10-31 91.12 -.03061896 SPY 2008-11-03 91.39 -.04617573 SPY 2008-11-11 84.48 -.02994792 SPY 2008-11-12 80.76 -.05027241 SPY 2008-11-18 81.95 -.01635143 SPY 2008-11-19 76.7 .091655802 SPY 2008-11-20 71 .194084507 A gain of 19.4% ! SPY 2008-11-21 74.83 .032607243 SPY 2008-12-01 77.27 .10832147 SPY 2008-12-02 80.24 .049725823 SPY 2008-12-03 82.17 .032006815 SPY 2008-12-04 80.27 .031020306 SPY 2009-01-20 76.44 .049058085 SPY 2009-02-03 79.44 -.00742699 SPY 2009-02-04 79.06 .003162155 SPY 2009-02-05 80.23 -.01071918 SPY 2009-02-23 70.82 -.05422197 SPY 2009-02-26 71.74 -.09018679 SPY 2009-02-27 70.14 -.06786427 SPY 2009-03-02 66.98 -.0352344 SPY 2009-03-03 66.48 .029933815 SPY 2009-03-04 68.05 .012637766 SPY 2009-03-05 65.27 .097441397 SPY 2009-03-06 65.38 .104160294 SPY 2009-03-09 64.62 .113741876 SPY 2009-03-10 68.47 .083248138 SPY 2009-03-11 68.91 .100420839 SPY 2010-05-20 104.39 .02988792 SPY 2010-05-21 105.91 .002360495 SPY 2010-05-24 104.55 -.00162602 SPY 2010-05-25 104.66 .023218039 SPY 2010-05-26 104.03 .032971258 SPY 2010-05-28 106.16 -.03541824 SPY 2011-08-04 119.64 -.02432297 SPY 2011-08-05 119.46 -.01632346 SPY 2011-08-08 111.68 .074498567 SPY 2011-08-09 116.88 .017881588 SPY 2011-08-10 111.71 .065705845 SPY 2011-08-11 116.73 -.02407265 SPY 2011-08-12 117.51 -.04637903 SPY 2011-08-17 119.05 -.01327173 SPY 2011-08-18 113.92 .015449438 SPY 2011-08-19 112.06 .047296091 SPY 2011-08-22 112.15 .076593848 SPY 2011-08-23 115.84 .044975829 SPY 2011-08-24 117.47 .035072785 SPY 2011-08-25 115.68 .04011065 134 rows selected. Elapsed: 00:00:00.00 09:28:47 SQL> 09:28:47 SQL> 09:28:47 SQL> exit Disconnected from Oracle Database 11g Enterprise Edition Release 11.2.0.1.0 - 64bit Production With the Partitioning, OLAP, Data Mining and Real Application Testing options oracle@z2:/pt/s/rluck/svmd/bp$ oracle@z2:/pt/s/rluck/svmd/bp$ oracle@z2:/pt/s/rluck/svmd/bp$ Next, I show how to visualize the negative correlation between moving average slope and gain as a scatter plot. I use SQL and then R to create a plot: -- -- masg.sql -- -- I use this script to create 2 columns: -- Mvg Avg Slope (mas) -- Gain 1 week (g1w) SET COLSEP "," SELECT mvg_avg_slope mas ,gain1week g1w FROM batsig WHERE ABS(mvg_avg_slope)>= 0.004 ORDER BY mvg_avg_slope; exit # masg004.r
# I use this script to read masg004.csv which was created from masg.sql
masg004 = read.csv("masg004.csv")
# Define the .png file which will hold the output:
png("masg004.png",width = 800, height = 800)
plot(masg004$MAS
,masg004$G1W
,pch=20
,col="black"
,xlim=c(-0.025,0.025)
,main="Scatter Plot of SPY ETF Moving Average Slope vs. Normalized 1 Week Gain"
,xlab="Mvg. Avg. Slope (Values within 2 Std. Dev. From Mean Omitted)"
,ylab="Normalized 1 Week Gain (0.1 = 10% Gain)")
grid(col="black")
abline(v = 0.004, col="red",lwd=1)
abline(v = -0.004, col="red",lwd=1)
abline(v = 0.0045, col="green",lwd=1)
abline(v = -0.0045, col="green",lwd=1)
abline(v = 0.005, col="blue",lwd=1)
abline(v = -0.005, col="blue",lwd=1)
dev.off()
The red vertical line in the above plot corresponds to this predicate: SELECT data FROM batsig WHERE ABS(mvg_avg_slope)>= 0.004 The green vertical line corresponds to this predicate: SELECT data FROM batsig WHERE ABS(mvg_avg_slope)>= 0.0045 The blue vertical line corresponds to this predicate: SELECT data FROM batsig WHERE ABS(mvg_avg_slope)>= 0.005 Two standard deviations from the mean is 2 x 0.002369 which lies between the green line and the blue line. It is not obvious from the scatter plot but inspection of the SQL output from corr_spy.sql reveals that this predicate gives me the best results: SELECT data FROM batsig WHERE ABS(mvg_avg_slope)>= 0.005 Inspection of the plot, however, does reveal that if I draw a vertical line at -0.01 and slowly move it to the left, the proportion of dots above the 0.0 line increases. I interpret this to mean that as moving average slope becomes more negative, the negative correlation between that slope and gain becomes stronger. Next I discuss the question, what besides mvg_avg_4wk_slope could we mix with gain1week in a scatter plot?Answer: More tkrs!I obtained data for more tkrs by duplicating my efforts to fill ystk_stage with prices from SPY. I then filled a table named ystk and wrote a SQL script to query it: oracle@h2:/pt/s/rluck/svmd/bp$ sqt oracle@h2:/pt/s/rluck/svmd/bp$ sqt oracle@h2:/pt/s/rluck/svmd/bp$ sqt @corr_tkr SQL*Plus: Release 11.2.0.1.0 Production on Tue Sep 20 03:04:13 2011 Copyright (c) 1982, 2009, Oracle. All rights reserved. Connected to: Oracle Database 11g Enterprise Edition Release 11.2.0.1.0 - Production With the Partitioning, OLAP, Data Mining and Real Application Testing options 03:04:13 SQL> ALTER SESSION SET NLS_DATE_FORMAT = 'YYYY-MM-DD'; Session altered. Elapsed: 00:00:00.00 03:04:13 SQL> 03:04:13 SQL> -- 03:04:13 SQL> -- corr_tkr.sql 03:04:13 SQL> -- 03:04:13 SQL> 03:04:13 SQL> -- Helps me find tkrs where CORR-tween MAS and g1w is strong. 03:04:13 SQL> 03:04:13 SQL> -- Get prices from past and future in same row as price1: 03:04:13 SQL> CREATE OR REPLACE VIEW batsigv20 AS 03:04:13 2 SELECT 03:04:13 3 tkr 03:04:13 4 ,ydate 03:04:13 5 ,LAG(clse,1,NULL)OVER(PARTITION BY tkr ORDER BY ydate) price0 03:04:13 6 ,clse price1 03:04:13 7 ,LEAD(clse,5,NULL)OVER(PARTITION BY tkr ORDER BY ydate) price2 03:04:13 8 FROM ystk 03:04:13 9 -- Prevent divide by 0 later on: 03:04:13 10 WHERE clse>0 03:04:13 11 / View created. Elapsed: 00:00:00.06 03:04:13 SQL> 03:04:13 SQL> 03:04:13 SQL> -- Get mvg avg for today and yesterday. Their Difference is slope. 03:04:13 SQL> CREATE OR REPLACE VIEW batsigv22 AS 03:04:13 2 SELECT 03:04:13 3 tkr 03:04:13 4 ,ydate 03:04:13 5 ,price1 03:04:13 6 ,price2 03:04:13 7 ,(price2 - price1)/price1 normlzed_gain1wk 03:04:13 8 ,AVG(price1)OVER(PARTITION BY tkr ORDER BY ydate ROWS BETWEEN 4*5 PRECEDING AND CURRENT ROW) 03:04:13 9 -AVG(price0)OVER(PARTITION BY tkr ORDER BY ydate ROWS BETWEEN 4*5 PRECEDING AND CURRENT ROW)mas1 03:04:13 10 FROM batsigv20 03:04:13 11 / View created. Elapsed: 00:00:00.03 03:04:13 SQL> 03:04:13 SQL> 03:04:13 SQL> 03:04:13 SQL> CREATE OR REPLACE VIEW batsigv24 AS 03:04:13 2 SELECT 03:04:13 3 tkr 03:04:13 4 ,ydate 03:04:13 5 ,price1 03:04:13 6 ,price2 03:04:13 7 ,normlzed_gain1wk 03:04:13 8 ,mas1/price1 mvg_avg_slope 03:04:13 9 ,STDDEV(mas1/price1)OVER(PARTITION BY tkr)stddev4tkr 03:04:13 10 FROM batsigv22 03:04:13 11 / View created. Elapsed: 00:00:00.02 03:04:13 SQL> 03:04:13 SQL> DROP TABLE batsig2; Table dropped. Elapsed: 00:00:00.09 03:04:13 SQL> CREATE TABLE batsig2 COMPRESS AS 03:04:13 2 SELECT 03:04:13 3 tkr 03:04:13 4 ,ydate 03:04:13 5 ,price1 03:04:13 6 ,price2 03:04:13 7 ,normlzed_gain1wk 03:04:13 8 ,mvg_avg_slope 03:04:13 9 ,stddev4tkr 03:04:13 10 FROM batsigv24 03:04:13 11 / Table created. Elapsed: 00:00:24.07 03:04:37 SQL> 03:04:37 SQL> SELECT 03:04:37 2 tkr 03:04:37 3 ,CASE WHEN SIGN(mvg_avg_slope)>0 THEN'slope_up'ELSE'slope_down'END up_down 03:04:37 4 ,CORR(mvg_avg_slope,normlzed_gain1wk)corr_mas_g1w 03:04:37 5 ,COUNT(mvg_avg_slope) occurrences 03:04:37 6 ,MIN(ydate) min_date 03:04:37 7 ,MAX(ydate) max_date 03:04:37 8 ,AVG(normlzed_gain1wk) avg_n_gain 03:04:37 9 FROM batsig2 03:04:37 10 WHERE ABS(mvg_avg_slope)> 2*stddev4tkr 03:04:37 11 GROUP BY SIGN(mvg_avg_slope),tkr 03:04:37 12 HAVING COUNT(mvg_avg_slope)>10 03:04:37 13 -- I want negative correlation: 03:04:37 14 AND CORR(mvg_avg_slope,normlzed_gain1wk) < -0.3 03:04:37 15 ORDER BY SIGN(mvg_avg_slope),CORR(mvg_avg_slope,normlzed_gain1wk) 03:04:37 16 / TKR UP_DOWN CORR_MAS_G1W OCCURRENCES MIN_DATE MAX_DATE AVG_N_GAIN --------- ---------- ------------ ----------- ---------- ---------- ---------- LVS slope_down -.77302025 59 2008-10-03 2009-03-12 -.00643721 IAG slope_down -.57996787 37 2004-04-20 2009-12-31 .026073603 MVG slope_down -.55026234 48 2007-08-20 2011-05-26 .053888808 EGO slope_down -.54875409 35 2004-05-03 2008-11-06 .061388713 GOOG slope_down -.532438 51 2006-02-07 2011-08-22 .012257707 AXU slope_down -.52112439 46 2006-09-11 2008-12-11 .018281875 GDX slope_down -.49894063 28 2006-05-24 2008-11-12 .027681865 BIDU slope_down -.48231978 21 2010-05-12 2010-06-10 -.00013375 WYNN slope_down -.44769247 64 2005-05-03 2009-10-28 .003877144 SWC slope_down -.43825229 188 1998-08-31 2011-08-08 .039312815 IOC slope_down -.43813153 47 2005-04-15 2010-05-24 .00539572 V slope_down -.43481517 35 2008-09-29 2010-06-02 .004035193 MET slope_down -.43038829 57 2002-07-23 2011-08-26 .104785469 XLU slope_down -.42859917 108 2000-02-22 2011-08-08 -.0029094 C slope_down -.42316725 260 1987-03-13 2011-08-24 .029379151 PRU slope_down -.41812509 58 2008-02-11 2011-08-08 .041262988 PM slope_down -.41472197 47 2008-09-29 2010-05-25 -.00515463 DTV slope_down -.38713246 47 2003-12-24 2011-08-30 .028532859 GFI slope_down -.38076471 165 1992-10-05 2008-11-12 .043204377 FSLR slope_down -.37918789 40 2008-01-16 2011-08-22 .039759179 CELG slope_down -.37748948 137 1991-12-12 2006-03-27 .012393269 GS slope_down -.3749512 90 2000-04-14 2011-08-24 .017092532 CEO slope_down -.36387817 50 2004-05-10 2011-08-22 .0063181 BUCY slope_down -.36379256 67 2004-07-27 2010-05-24 -.00601248 SLW slope_down -.36000276 36 2006-05-18 2008-11-20 -.02423249 TM slope_down -.35810336 119 1993-04-14 2011-08-30 .025914938 AUY slope_down -.3408188 57 2004-01-14 2008-11-12 .00410878 PALL slope_down -.33766057 16 2010-01-22 2011-03-17 .028838285 SLV slope_down -.33150545 21 2008-07-24 2008-08-21 -.05697899 RIG slope_down -.32793158 169 1993-12-14 2010-06-16 .011860647 AU slope_down -.32093453 120 1998-08-13 2009-04-20 .038261332 MOS slope_down -.30870009 31 2008-08-11 2008-12-04 -.01070864 VECO slope_down -.30859293 182 1994-12-08 2010-08-24 .015516011 SINA slope_up -.85996624 12 2000-05-16 2003-07-29 -.13897002 AXP slope_up -.58643497 24 2009-04-03 2009-08-07 .081219741 DNDN slope_up -.56742989 38 2007-03-30 2009-05-11 .036226391 MT slope_up -.5026734 31 1999-04-29 2004-08-11 -.02947996 CREE slope_up -.48175097 15 1993-02-10 2000-02-16 .022740543 ARG slope_up -.44331489 12 1987-03-03 2010-03-08 -.05655323 UA slope_up -.44302496 11 2005-11-21 2009-05-01 -.04501422 CMG slope_up -.42715384 22 2006-01-30 2009-04-22 .012556741 EGO slope_up -.41285315 17 2005-11-21 2009-01-09 .013781502 STT slope_up -.41246182 15 2000-04-03 2009-04-21 -.02448819 BBT slope_up -.39829984 39 1990-03-27 2009-08-24 -.02747301 GLD slope_up -.39238453 50 2005-12-08 2011-09-06 -.01337806 AMGN slope_up -.36264683 15 1986-04-10 2000-01-19 -.03259565 QCOM slope_up -.36015301 12 1992-01-02 1999-04-26 .025506011 TKR slope_up -.32943567 35 1998-11-04 2010-08-04 .008300022 AMZN slope_up -.32781593 25 1997-09-19 2001-05-08 -.02851557 VLO slope_up -.32236941 28 1984-08-27 2009-01-06 -.02598296 HMY slope_up -.31562862 62 1998-02-02 2008-12-31 -.0138532 IYR slope_up -.31085718 11 2008-12-19 2009-08-10 -.01799323 CAT slope_up -.30785237 39 1974-10-31 2009-08-14 -.00702876 WDC slope_up -.30168735 29 1987-01-05 2002-08-22 -.03050444 54 rows selected. Elapsed: 00:00:01.51 03:04:39 SQL> 03:04:39 SQL> exit Disconnected from Oracle Database 11g Enterprise Edition Release 11.2.0.1.0 - Production With the Partitioning, OLAP, Data Mining and Real Application Testing options oracle@h2:/pt/s/rluck/svmd/bp$ The above query gives me a list of tkrs which exhibit negative correlation between moving average slope and 1 week gain. For example I see that GOOG, the tkr for Google, has a correlation of -0.532438 Between 2006-02-07 and 2011-08-22 I had 51 opportunities to trade GOOG on this signal. The average one week gain of these 51 opportunities was 1.2% Next, I use SQL and R to visualize the above report as a scatter plot. The SQL I use to generate the CSV file for R is similar to the SQL I used to create a CSV file full of SPY data. It is different though for two reasons. I want to generate points for many tkrs instead of just SPY. For all of the tkrs except EGO, I only want to see points which correspond to either negative moving average slopes or positive moving average slopes. The SQL is displayed below: -- -- masg_tkrs.sql -- -- I use this script to help build a scatter plot -- for tkrs which show a negative correlation between -- 4-week-moving-average-slope and 1-week-gain. -- This script depends on corr_tkr.sql: -- @corr_tkr.sql SET COLSEP "," ECHO OFF CREATE OR REPLACE VIEW masg_tkrsv1 AS SELECT tkr ,mvg_avg_slope ,normlzed_gain1wk FROM batsig2 WHERE ABS(mvg_avg_slope) > 2*stddev4tkr; SELECT mvg_avg_slope,normlzed_gain1wk FROM ( SELECT mvg_avg_slope,normlzed_gain1wk ,CORR(mvg_avg_slope,normlzed_gain1wk)OVER(PARTITION BY tkr)corr4tkr FROM masg_tkrsv1 WHERE mvg_avg_slope < 0 ) WHERE corr4tkr < -0.3; SELECT mvg_avg_slope,normlzed_gain1wk FROM ( SELECT mvg_avg_slope,normlzed_gain1wk ,CORR(mvg_avg_slope,normlzed_gain1wk)OVER(PARTITION BY tkr)corr4tkr FROM masg_tkrsv1 WHERE mvg_avg_slope > 0 ) WHERE corr4tkr < -0.3; exit Next, I show the R script to plot data from the above SQL script: # masg_tkrs.r
# I use this script to read masg_tkrs.csv which was created from masg_tkrs.sql
masg_tkrs = read.csv("masg_tkrs.csv")
# Define the .png file which will hold the output:
png("masg_tkrs.png",width = 800, height = 1700)
# Declare the output will be in 2 rows and 1 column:
par(mfrow = c(2,1))
plot(masg_tkrs$MVG_AVG_SLOPE,masg_tkrs$NORMLZED_GAIN1WK,pch=".",col="black",
xlim=c(-0.1, 0.1),
ylim=c(-0.7, 1.0),
xlab="Moving Average Slope At Least 2 Std. Dev. From Mean",
ylab="Normalized Gain Over 1 Week. (1.0 = 100% Gain)",
main="Scatter Plot of Negative Correlation Stocks (Zoom In with Light Points)")
grid(col="black")
plot(masg_tkrs$MVG_AVG_SLOPE,masg_tkrs$NORMLZED_GAIN1WK,pch=20,col="black",
xlim=c(-0.3, 0.2),
ylim=c(-0.7, 1.7),
xlab="Moving Average Slope At Least 2 Std. Dev. From Mean",
ylab="Normalized Gain Over 1 Week. (1.0 = 100% Gain)",
main="Scatter Plot of Negative Correlation Stocks (Zoom Out with Heavy Points)")
grid(col="black")
dev.off()
Output from the above R script is displayed below:
When I use my eyes to search for negative correlation in the above two charts, it seems more apparent in the bottom plot with heavy points. How do I trade using this idea?Using moving average slope to generate buy or sell signals depends on a simple idea.Every day I download tkr prices from Yahoo into a table. I use SQL to inspect the slope of the moving average of tkrs which exhibited negative correlation in the past. So that is it. In the evening after Yahoo has collected closing prices, I download prices and then I run a SQL script. I offer more discussion before I display the download script and the SQL script. If the slope of the moving average for any tkr is more than two standard deviations from the mean, I interpret that as a signal to buy or sell. One subtle point is that negative correlation usually occurs when moving average slope is negative. Using GOOG as an example again, it appears that it only exhibits strong negative correlation when it has an abnormally negative moving average slope. I see no evidence that GOOG exhibits strong negative correlation when it has an abnormally positive moving average slope. Some tkrs do, however, exhibit strong negative correlation when moving average slope is abnormally positive. The most commonly known tkr which exhibits this behavior is AXP, the tkr for American Express. In fact, I see only one tkr which exhibits strong negative correlation for both negative and positive slopes. That tkr is EGO which is the tkr for Eldorado Gold Corporation. With that discussion behind us, I now display the script to download prices for tkrs which exhibit strong negative correlation: #!/bin/bash
# getload_tkrs.bash
. /pt/s/rluck/svmd/.orcl
# Use wget to download csv files full of pricing data from finance.yahoo.com
set -x
# cd to the right place
cd /pt/s/rluck/svmd/bp/cf/
# avoid bumping into old data
rm -f *csv
# wget em
wget --output-document=ABX.csv http://ichart.finance.yahoo.com/table.csv?s=ABX
wget --output-document=AMGN.csv http://ichart.finance.yahoo.com/table.csv?s=AMGN
wget --output-document=AMZN.csv http://ichart.finance.yahoo.com/table.csv?s=AMZN
wget --output-document=APC.csv http://ichart.finance.yahoo.com/table.csv?s=APC
wget --output-document=APOL.csv http://ichart.finance.yahoo.com/table.csv?s=APOL
wget --output-document=ARG.csv http://ichart.finance.yahoo.com/table.csv?s=ARG
wget --output-document=AU.csv http://ichart.finance.yahoo.com/table.csv?s=AU
wget --output-document=AUY.csv http://ichart.finance.yahoo.com/table.csv?s=AUY
wget --output-document=AXP.csv http://ichart.finance.yahoo.com/table.csv?s=AXP
wget --output-document=AXU.csv http://ichart.finance.yahoo.com/table.csv?s=AXU
wget --output-document=BBT.csv http://ichart.finance.yahoo.com/table.csv?s=BBT
wget --output-document=BIDU.csv http://ichart.finance.yahoo.com/table.csv?s=BIDU
wget --output-document=C.csv http://ichart.finance.yahoo.com/table.csv?s=C
wget --output-document=CAT.csv http://ichart.finance.yahoo.com/table.csv?s=CAT
wget --output-document=CELG.csv http://ichart.finance.yahoo.com/table.csv?s=CELG
wget --output-document=CEO.csv http://ichart.finance.yahoo.com/table.csv?s=CEO
wget --output-document=CMG.csv http://ichart.finance.yahoo.com/table.csv?s=CMG
wget --output-document=CREE.csv http://ichart.finance.yahoo.com/table.csv?s=CREE
wget --output-document=DNDN.csv http://ichart.finance.yahoo.com/table.csv?s=DNDN
wget --output-document=DTV.csv http://ichart.finance.yahoo.com/table.csv?s=DTV
wget --output-document=EBAY.csv http://ichart.finance.yahoo.com/table.csv?s=EBAY
wget --output-document=EFA.csv http://ichart.finance.yahoo.com/table.csv?s=EFA
wget --output-document=EGO.csv http://ichart.finance.yahoo.com/table.csv?s=EGO
wget --output-document=EGO.csv http://ichart.finance.yahoo.com/table.csv?s=EGO
wget --output-document=FSLR.csv http://ichart.finance.yahoo.com/table.csv?s=FSLR
wget --output-document=FXI.csv http://ichart.finance.yahoo.com/table.csv?s=FXI
wget --output-document=GDX.csv http://ichart.finance.yahoo.com/table.csv?s=GDX
wget --output-document=GFI.csv http://ichart.finance.yahoo.com/table.csv?s=GFI
wget --output-document=GLD.csv http://ichart.finance.yahoo.com/table.csv?s=GLD
wget --output-document=GOOG.csv http://ichart.finance.yahoo.com/table.csv?s=GOOG
wget --output-document=GS.csv http://ichart.finance.yahoo.com/table.csv?s=GS
wget --output-document=HAL.csv http://ichart.finance.yahoo.com/table.csv?s=HAL
wget --output-document=HMY.csv http://ichart.finance.yahoo.com/table.csv?s=HMY
wget --output-document=IAG.csv http://ichart.finance.yahoo.com/table.csv?s=IAG
wget --output-document=IOC.csv http://ichart.finance.yahoo.com/table.csv?s=IOC
wget --output-document=IYR.csv http://ichart.finance.yahoo.com/table.csv?s=IYR
wget --output-document=LVS.csv http://ichart.finance.yahoo.com/table.csv?s=LVS
wget --output-document=MDT.csv http://ichart.finance.yahoo.com/table.csv?s=MDT
wget --output-document=MET.csv http://ichart.finance.yahoo.com/table.csv?s=MET
wget --output-document=MOS.csv http://ichart.finance.yahoo.com/table.csv?s=MOS
wget --output-document=MT.csv http://ichart.finance.yahoo.com/table.csv?s=MT
wget --output-document=MVG.csv http://ichart.finance.yahoo.com/table.csv?s=MVG
wget --output-document=PALL.csv http://ichart.finance.yahoo.com/table.csv?s=PALL
wget --output-document=PEP.csv http://ichart.finance.yahoo.com/table.csv?s=PEP
wget --output-document=PM.csv http://ichart.finance.yahoo.com/table.csv?s=PM
wget --output-document=PRU.csv http://ichart.finance.yahoo.com/table.csv?s=PRU
wget --output-document=QCOM.csv http://ichart.finance.yahoo.com/table.csv?s=QCOM
wget --output-document=QQQ.csv http://ichart.finance.yahoo.com/table.csv?s=QQQ
wget --output-document=RIG.csv http://ichart.finance.yahoo.com/table.csv?s=RIG
wget --output-document=SCCO.csv http://ichart.finance.yahoo.com/table.csv?s=SCCO
wget --output-document=SINA.csv http://ichart.finance.yahoo.com/table.csv?s=SINA
wget --output-document=SLV.csv http://ichart.finance.yahoo.com/table.csv?s=SLV
wget --output-document=SLW.csv http://ichart.finance.yahoo.com/table.csv?s=SLW
wget --output-document=STT.csv http://ichart.finance.yahoo.com/table.csv?s=STT
wget --output-document=SWC.csv http://ichart.finance.yahoo.com/table.csv?s=SWC
wget --output-document=TKR.csv http://ichart.finance.yahoo.com/table.csv?s=TKR
wget --output-document=TM.csv http://ichart.finance.yahoo.com/table.csv?s=TM
wget --output-document=UA.csv http://ichart.finance.yahoo.com/table.csv?s=UA
wget --output-document=V.csv http://ichart.finance.yahoo.com/table.csv?s=V
wget --output-document=VECO.csv http://ichart.finance.yahoo.com/table.csv?s=VECO
wget --output-document=VLO.csv http://ichart.finance.yahoo.com/table.csv?s=VLO
wget --output-document=VMW.csv http://ichart.finance.yahoo.com/table.csv?s=VMW
wget --output-document=WDC.csv http://ichart.finance.yahoo.com/table.csv?s=WDC
wget --output-document=WYNN.csv http://ichart.finance.yahoo.com/table.csv?s=WYNN
wget --output-document=XLU.csv http://ichart.finance.yahoo.com/table.csv?s=XLU
wget --output-document=XOM.csv http://ichart.finance.yahoo.com/table.csv?s=XOM
wget --output-document=YUM.csv http://ichart.finance.yahoo.com/table.csv?s=YUM
# clobber ystk_stage.csv:
cat ABX.csv | awk '{print "ABX,"$0}' | grep 0 >ystk_stage.csv
# Now append to it:
cat AMGN.csv | awk '{print "AMGN,"$0}'| grep 0 >>ystk_stage.csv
cat AMZN.csv | awk '{print "AMZN,"$0}' | grep 0 >>ystk_stage.csv
cat APC.csv | awk '{print "APC,"$0}' | grep 0 >>ystk_stage.csv
cat APOL.csv | awk '{print "APOL,"$0}' | grep 0 >>ystk_stage.csv
cat ARG.csv | awk '{print "ARG,"$0}' | grep 0 >>ystk_stage.csv
cat AU.csv | awk '{print "AU,"$0}' | grep 0 >>ystk_stage.csv
cat AUY.csv | awk '{print "AUY,"$0}' | grep 0 >>ystk_stage.csv
cat AXP.csv | awk '{print "AXP,"$0}' | grep 0 >>ystk_stage.csv
cat AXU.csv | awk '{print "AXU,"$0}' | grep 0 >>ystk_stage.csv
cat BBT.csv | awk '{print "BBT,"$0}' | grep 0 >>ystk_stage.csv
cat BIDU.csv | awk '{print "BIDU,"$0}' | grep 0 >>ystk_stage.csv
cat C.csv | awk '{print "C,"$0}' | grep 0 >>ystk_stage.csv
cat CAT.csv | awk '{print "CAT,"$0}' | grep 0 >>ystk_stage.csv
cat CELG.csv | awk '{print "CELG,"$0}' | grep 0 >>ystk_stage.csv
cat CEO.csv | awk '{print "CEO,"$0}' | grep 0 >>ystk_stage.csv
cat CMG.csv | awk '{print "CMG,"$0}' | grep 0 >>ystk_stage.csv
cat CREE.csv | awk '{print "CREE,"$0}' | grep 0 >>ystk_stage.csv
cat DNDN.csv | awk '{print "DNDN,"$0}' | grep 0 >>ystk_stage.csv
cat DTV.csv | awk '{print "DTV,"$0}' | grep 0 >>ystk_stage.csv
cat EBAY.csv | awk '{print "EBAY,"$0}' | grep 0 >>ystk_stage.csv
cat EFA.csv | awk '{print "EFA,"$0}' | grep 0 >>ystk_stage.csv
cat EGO.csv | awk '{print "EGO,"$0}' | grep 0 >>ystk_stage.csv
cat EGO.csv | awk '{print "EGO,"$0}' | grep 0 >>ystk_stage.csv
cat FSLR.csv | awk '{print "FSLR,"$0}' | grep 0 >>ystk_stage.csv
cat FXI.csv | awk '{print "FXI,"$0}' | grep 0 >>ystk_stage.csv
cat GDX.csv | awk '{print "GDX,"$0}' | grep 0 >>ystk_stage.csv
cat GFI.csv | awk '{print "GFI,"$0}' | grep 0 >>ystk_stage.csv
cat GLD.csv | awk '{print "GLD,"$0}' | grep 0 >>ystk_stage.csv
cat GOOG.csv | awk '{print "GOOG,"$0}' | grep 0 >>ystk_stage.csv
cat GS.csv | awk '{print "GS,"$0}' | grep 0 >>ystk_stage.csv
cat HAL.csv | awk '{print "HAL,"$0}' | grep 0 >>ystk_stage.csv
cat HMY.csv | awk '{print "HMY,"$0}' | grep 0 >>ystk_stage.csv
cat IAG.csv | awk '{print "IAG,"$0}' | grep 0 >>ystk_stage.csv
cat IOC.csv | awk '{print "IOC,"$0}' | grep 0 >>ystk_stage.csv
cat IYR.csv | awk '{print "IYR,"$0}' | grep 0 >>ystk_stage.csv
cat LVS.csv | awk '{print "LVS,"$0}' | grep 0 >>ystk_stage.csv
cat MDT.csv | awk '{print "MDT,"$0}' | grep 0 >>ystk_stage.csv
cat MET.csv | awk '{print "MET,"$0}' | grep 0 >>ystk_stage.csv
cat MOS.csv | awk '{print "MOS,"$0}' | grep 0 >>ystk_stage.csv
cat MT.csv | awk '{print "MT,"$0}' | grep 0 >>ystk_stage.csv
cat MVG.csv | awk '{print "MVG,"$0}' | grep 0 >>ystk_stage.csv
cat PALL.csv | awk '{print "PALL,"$0}' | grep 0 >>ystk_stage.csv
cat PEP.csv | awk '{print "PEP,"$0}' | grep 0 >>ystk_stage.csv
cat PM.csv | awk '{print "PM,"$0}' | grep 0 >>ystk_stage.csv
cat PRU.csv | awk '{print "PRU,"$0}' | grep 0 >>ystk_stage.csv
cat QCOM.csv | awk '{print "QCOM,"$0}' | grep 0 >>ystk_stage.csv
cat QQQ.csv | awk '{print "QQQ,"$0}' | grep 0 >>ystk_stage.csv
cat RIG.csv | awk '{print "RIG,"$0}' | grep 0 >>ystk_stage.csv
cat SCCO.csv | awk '{print "SCCO,"$0}' | grep 0 >>ystk_stage.csv
cat SINA.csv | awk '{print "SINA,"$0}' | grep 0 >>ystk_stage.csv
cat SLV.csv | awk '{print "SLV,"$0}' | grep 0 >>ystk_stage.csv
cat SLW.csv | awk '{print "SLW,"$0}' | grep 0 >>ystk_stage.csv
cat STT.csv | awk '{print "STT,"$0}' | grep 0 >>ystk_stage.csv
cat SWC.csv | awk '{print "SWC,"$0}' | grep 0 >>ystk_stage.csv
cat TKR.csv | awk '{print "TKR,"$0}' | grep 0 >>ystk_stage.csv
cat TM.csv | awk '{print "TM,"$0}' | grep 0 >>ystk_stage.csv
cat UA.csv | awk '{print "UA,"$0}' | grep 0 >>ystk_stage.csv
cat V.csv | awk '{print "V,"$0}' | grep 0 >>ystk_stage.csv
cat VECO.csv | awk '{print "VECO,"$0}' | grep 0 >>ystk_stage.csv
cat VLO.csv | awk '{print "VLO,"$0}' | grep 0 >>ystk_stage.csv
cat VMW.csv | awk '{print "VMW,"$0}' | grep 0 >>ystk_stage.csv
cat WDC.csv | awk '{print "WDC,"$0}' | grep 0 >>ystk_stage.csv
cat WYNN.csv | awk '{print "WYNN,"$0}' | grep 0 >>ystk_stage.csv
cat XLU.csv | awk '{print "XLU,"$0}' | grep 0 >>ystk_stage.csv
cat XOM.csv | awk '{print "XOM,"$0}' | grep 0 >>ystk_stage.csv
cat YUM.csv | awk '{print "YUM,"$0}' | grep 0 >>ystk_stage.csv
# Load this into Oracle table:
sqlldr trade/t bindsize=20971520 readsize=20971520 rows=123456 control=ystk_stage.ctl
# They there?
grep loaded ystk_stage.log
Next, I display the SQL script I use to look for abnormal moving average slopes which are recent: 04:08:14 SQL> @qry_batsig3 04:08:16 SQL> -- 04:08:16 SQL> -- qry_batsig3.sql 04:08:16 SQL> -- 04:08:16 SQL> 04:08:16 SQL> -- Helps me look at recent data for abnormally large MAS. 04:08:16 SQL> 04:08:16 SQL> select min(ydate),max(ydate)from ystk_stage; MIN(YDATE) MAX(YDATE) ---------- ---------- 1962-01-02 2011-09-19 Elapsed: 00:00:00.04 04:08:16 SQL> 04:08:16 SQL> -- Get prices from past and future in same row as price1: 04:08:16 SQL> CREATE OR REPLACE VIEW batsigv30 AS 04:08:16 2 SELECT 04:08:16 3 tkr 04:08:16 4 ,ydate 04:08:16 5 ,LAG(clse,1,NULL)OVER(PARTITION BY tkr ORDER BY ydate) price0 04:08:16 6 ,clse price1 04:08:16 7 ,LEAD(clse,5,NULL)OVER(PARTITION BY tkr ORDER BY ydate) price2 04:08:16 8 FROM ystk_stage 04:08:16 9 -- Prevent divide by 0 later on: 04:08:16 10 WHERE clse>0 04:08:16 11 / View created. Elapsed: 00:00:00.19 04:08:16 SQL> 04:08:16 SQL> CREATE OR REPLACE VIEW batsigv32 AS 04:08:16 2 SELECT 04:08:16 3 tkr 04:08:16 4 ,ydate 04:08:16 5 ,price1 04:08:16 6 ,price2 04:08:16 7 ,(price2 - price1)/price1 gain1week 04:08:16 8 ,AVG(price1)OVER(PARTITION BY tkr ORDER BY ydate ROWS BETWEEN 4*5 PRECEDING AND CURRENT ROW) 04:08:16 9 -AVG(price0)OVER(PARTITION BY tkr ORDER BY ydate ROWS BETWEEN 4*5 PRECEDING AND CURRENT ROW)mas1 04:08:16 10 FROM batsigv30 04:08:16 11 / View created. Elapsed: 00:00:00.02 04:08:16 SQL> 04:08:16 SQL> 04:08:16 SQL> CREATE OR REPLACE VIEW batsigv34 AS 04:08:16 2 SELECT 04:08:16 3 tkr 04:08:16 4 ,ydate 04:08:16 5 ,price1 04:08:16 6 ,price2 04:08:16 7 ,gain1week 04:08:16 8 ,mas1/price1 mvg_avg_slope 04:08:16 9 ,STDDEV(mas1/price1)OVER(PARTITION BY tkr)stddev4tkr 04:08:16 10 FROM batsigv32 04:08:16 11 / View created. Elapsed: 00:00:00.02 04:08:16 SQL> 04:08:16 SQL> PURGE RECYCLEBIN; Recyclebin purged. Elapsed: 00:00:00.09 04:08:16 SQL> DROP TABLE batsig3; Table dropped. Elapsed: 00:00:00.14 04:08:16 SQL> CREATE TABLE batsig3 COMPRESS AS 04:08:16 2 SELECT 04:08:16 3 tkr 04:08:16 4 ,ydate 04:08:16 5 ,price1 04:08:16 6 ,price2 04:08:16 7 ,gain1week 04:08:16 8 ,mvg_avg_slope 04:08:16 9 ,stddev4tkr 04:08:16 10 FROM batsigv34 04:08:16 11 / Table created. Elapsed: 00:00:02.36 04:08:19 SQL> 04:08:19 SQL> -- Look for recent abnormally large MAS: 04:08:19 SQL> 04:08:19 SQL> SELECT 04:08:19 2 ydate 04:08:19 3 ,tkr 04:08:19 4 ,CASE WHEN SIGN(mvg_avg_slope)>0 THEN'slope_up'ELSE'slope_down'END up_down 04:08:19 5 ,mvg_avg_slope 04:08:19 6 ,gain1week 04:08:19 7 FROM batsig3 04:08:19 8 WHERE ABS(mvg_avg_slope)> 2*stddev4tkr 04:08:19 9 -- recent 04:08:19 10 AND ydate > sysdate - 40 04:08:19 11 ORDER BY ydate,tkr 04:08:19 12 / YDATE TKR UP_DOWN MVG_AVG_SLOPE GAIN1WEEK ---------- ----------- ---------- ------------- ---------- 2011-08-12 BBT slope_down -.01108175 -.06183057 2011-08-12 C slope_down -.01461275 -.10318258 2011-08-12 CAT slope_down -.00918341 -.10956464 2011-08-12 CEO slope_down -.01059212 -.01888646 2011-08-12 DNDN slope_down -.13211186 .162970106 2011-08-12 DTV slope_down -.00966884 -.03253346 2011-08-12 STT slope_down -.0130719 -.07987313 2011-08-12 TKR slope_down -.0125103 -.13057158 2011-08-15 BBT slope_down -.00928111 -.09532798 2011-08-15 CAT slope_down -.00913606 -.12564299 2011-08-15 DNDN slope_down -.09843779 -.06007905 2011-08-15 DTV slope_down -.0090005 -.04787962 2011-08-15 STT slope_down -.01227027 -.08837877 2011-08-15 TKR slope_down -.01099275 -.15398366 2011-08-16 BBT slope_down -.00954212 -.04757328 2011-08-16 CAT slope_down -.00983292 -.07129267 2011-08-16 DNDN slope_down -.09163114 -.04361132 2011-08-16 DTV slope_down -.00840081 -.0300478 2011-08-16 STT slope_down -.01150897 -.04394334 2011-08-16 TKR slope_down -.01185786 -.08957184 2011-08-17 BBT slope_down -.00975986 -.02049571 2011-08-17 CAT slope_down -.01195366 -.02555911 2011-08-17 DNDN slope_down -.09296709 -.07040121 2011-08-17 DTV slope_down -.00911123 -.03249097 2011-08-17 MT slope_down -.02060543 -.06615315 2011-08-17 TKR slope_down -.01415185 -.037751 2011-08-17 UA slope_down -.01872876 .063916131 2011-08-18 BBT slope_down -.01405231 .027791814 2011-08-18 C slope_down -.01809115 .066118656 2011-08-18 CAT slope_down -.01541776 -.00096004 2011-08-18 CEO slope_down -.01073965 .01719929 2011-08-18 DNDN slope_down -.10597923 -.01503759 2011-08-18 DTV slope_down -.01089295 -.01919925 2011-08-18 GLD slope_up .00581439 -.0301598 2011-08-18 MET slope_down -.01347737 -.01066834 2011-08-18 MT slope_down -.02665913 -.01186944 2011-08-18 STT slope_down -.01447374 .028160392 2011-08-18 TKR slope_down -.02053852 .016509434 2011-08-18 TM slope_down -.00848986 -.0041917 2011-08-18 UA slope_down -.02163029 .067705535 2011-08-18 VLO slope_down -.01587302 .049325026 2011-08-19 BBT slope_down -.01734747 .07628438 2011-08-19 C slope_down -.02408524 .114680613 2011-08-19 CAT slope_down -.01883444 .064899337 2011-08-19 DNDN slope_down -.1068862 -.01575456 2011-08-19 DTV slope_down -.01144765 .013832578 2011-08-19 FSLR slope_down -.01961037 .135656292 2011-08-19 GLD slope_up .006647349 -.01378161 2011-08-19 GOOG slope_down -.01125874 .073209484 2011-08-19 IYR slope_down -.00859102 .031326665 2011-08-19 MET slope_down -.01668885 .010600707 2011-08-19 MT slope_down -.03240335 .031186094 2011-08-19 STT slope_down -.01720613 .066750235 2011-08-19 TKR slope_down -.02224424 .087454765 2011-08-19 TM slope_down -.00994814 .013437058 2011-08-19 UA slope_down -.0249373 .27889352 2011-08-19 VLO slope_down -.02026719 .108926674 2011-08-22 BBT slope_down -.01741312 .134063641 2011-08-22 C slope_down -.02592917 .200690714 2011-08-22 CAT slope_down -.01505642 .103392164 2011-08-22 CEO slope_down -.01138851 .088313451 2011-08-22 DNDN slope_down -.10897513 .030277544 2011-08-22 DTV slope_down -.01207581 .043342912 2011-08-22 FSLR slope_down -.01977521 .16659065 2011-08-22 GLD slope_up .007344462 -.0579663 2011-08-22 GOOG slope_down -.01147629 .082120561 2011-08-22 GS slope_down -.01295522 .089744797 2011-08-22 IYR slope_down -.00877217 .062158599 2011-08-22 MET slope_down -.01742084 .089986911 2011-08-22 MT slope_down -.03216845 .080859775 2011-08-22 RIG slope_down -.01325736 .110193392 2011-08-22 STT slope_down -.01626272 .112055398 2011-08-22 TKR slope_down -.02210628 .150618775 2011-08-22 TM slope_down -.00976818 .022115112 2011-08-22 UA slope_down -.02013965 .252187891 2011-08-22 VLO slope_down -.01938452 .145723337 2011-08-22 WDC slope_down -.02071096 .098076198 2011-08-23 BBT slope_down -.01508817 .095358224 2011-08-23 C slope_down -.02192707 .13250366 2011-08-23 CAT slope_down -.01301518 .082550012 2011-08-23 DNDN slope_down -.09832381 -.032 2011-08-23 DTV slope_down -.01011388 .027458343 2011-08-23 GLD slope_up .005448839 .008048629 2011-08-23 GS slope_down -.01336408 .081096302 2011-08-23 MET slope_down -.0145281 .042680999 2011-08-23 MT slope_down -.02934607 .066866267 2011-08-23 STT slope_down -.01376589 .061384941 2011-08-23 TKR slope_down -.01860248 .098095788 2011-08-23 TM slope_down -.00735064 -.0266191 2011-08-23 VLO slope_down -.01535259 .092896175 2011-08-24 BBT slope_down -.01337041 .084671533 2011-08-24 C slope_down -.01884677 .091388401 2011-08-24 CAT slope_down -.01104048 .06557377 2011-08-24 DNDN slope_down -.10020164 0 2011-08-24 DTV slope_down -.01019456 .025419776 2011-08-24 GS slope_down -.01178136 .056737589 2011-08-24 MET slope_down -.01241601 .030674847 2011-08-24 MT slope_down -.02596983 .065987385 2011-08-24 STT slope_down -.01165325 .042253521 2011-08-24 TKR slope_down -.01481305 .094880356 2011-08-24 TM slope_down -.00794091 -.00332963 2011-08-24 VLO slope_down -.01387143 .110459433 2011-08-25 BBT slope_down -.01247834 .064896755 2011-08-25 CAT slope_down -.01034749 .063663664 2011-08-25 DNDN slope_down -.10307363 .008481764 2011-08-25 DTV slope_down -.0108561 .035330628 2011-08-25 MT slope_down -.02707469 .065065065 2011-08-25 STT slope_down -.0113696 .036022626 2011-08-25 TKR slope_down -.0146807 .091647332 2011-08-25 TM slope_down -.00728283 -.00140311 2011-08-26 BBT slope_down -.01154888 .008196721 2011-08-26 DNDN slope_down -.10093473 -.07666386 2011-08-26 DTV slope_down -.00982402 -.00682192 2011-08-26 MET slope_down -.01263888 -.02193261 2011-08-26 MT slope_down -.02526147 -.01140307 2011-08-26 STT slope_down -.01079962 -.02085781 2011-08-29 BBT slope_down -.00863013 -.03909844 2011-08-29 DNDN slope_down -.09582119 -.11755102 2011-08-29 DTV slope_down -.00777075 -.03465687 2011-08-29 MT slope_down -.02196068 -.11695076 2011-08-30 BBT slope_down -.00855432 .024412713 2011-08-30 DNDN slope_down -.09543487 -.04049587 2011-08-30 DTV slope_down -.00725489 -.02512563 2011-08-30 GLD slope_up .005684507 -.01127862 2011-08-30 TM slope_down -.00726972 -.0153651 2011-08-31 DNDN slope_down -.08802544 -.11400651 2011-09-01 DNDN slope_down -.09591894 -.0412111 2011-09-02 GLD slope_up .00587312 -.03585462 2011-09-06 GLD slope_up .005506522 -.02383816 129 rows selected. Elapsed: 00:00:00.08 04:08:19 SQL> 04:08:19 SQL> SELECT 04:08:19 2 CASE WHEN SIGN(mvg_avg_slope)>0 THEN'slope_up'ELSE'slope_down'END up_down 04:08:19 3 ,AVG(gain1week) 04:08:19 4 ,SUM(gain1week) 04:08:19 5 ,COUNT(gain1week) 04:08:19 6 ,AVG(gain1week)/STDDEV(gain1week) sharpe_ratio 04:08:19 7 FROM batsig3 04:08:19 8 WHERE ABS(mvg_avg_slope)> 2*stddev4tkr 04:08:19 9 -- recent 04:08:19 10 AND ydate > sysdate - 40 04:08:19 11 GROUP BY SIGN(mvg_avg_slope) 04:08:19 12 ORDER BY SIGN(mvg_avg_slope) 04:08:19 13 / UP_DOWN AVG(GAIN1WEEK) SUM(GAIN1WEEK) COUNT(GAIN1WEEK) SHARPE_RATIO ---------- -------------- -------------- ---------------- ------------ slope_down .024641 3.00620196 122 .312304457 slope_up -.02354721 -.16483048 7 -1.1259565 Elapsed: 00:00:00.06 04:08:19 SQL> 04:08:19 SQL> exit Disconnected from Oracle Database 11g Enterprise Edition Release 11.2.0.1.0 - Production With the Partitioning, OLAP, Data Mining and Real Application Testing options oracle@h2:/pt/s/rluck/svmd/bp$ I see in the above output that August 2011 was lucrative but moving average slopes in September were never abnormally far away from the mean and thus no trading signals were generated. As I write this blog entry on September 21, I see that the US Stock Market lost almost 3% today. Perhaps some abnormally negative moving average slopes for some of the tkrs listed above will appear within the next week or two. 2011-08-14 Bob asks about INTCBob,thanks for checking in; I'm doin good. INTC started showing up in the predictions in July. I started collecting info on INTC at 2011-06-08 13:30:00 So, bot4.us now has over 3,600 prediction vectors on INTC. IBM, though, offers over 35,000 prediction vectors. bot4.us started collecting IBM vectors on 2009-10-28 18:10:00 So, I would trust bot4.us predictions on IBM more than I would for INTC. Anyway, I see INTC here: http://bot4.us/us_stk_past/us_stk_past_wk2011_07_18 http://bot4.us/us_stk_past/us_stk_past_wk2011_08_01 The second URL is the week before debt-downgrade-week. bot4.us did a good job during the week of 08_01 on both us stocks and forex. During 08_01, bot4.us issued 63 bearish predictions on INTC. The Sharpe ratio was -1.67 which is acceptable; anything below -3 is excellent (for bearish predictions). For each bearish prediction, the avg price decline for INTC was -$0.39 During 08_01, bot4.us issued no bullish predictions on INTC. During debt-downgrade-week (08_08), bot4.us issued only 1 high confidence prediction for INTC. It was a bearish prediction and it was wrong (a false positive). bot4.us did very poorly predicting stock prices during debt-downgrade-week: http://bot4.us/us_stk_past/us_stk_past_wk2011_08_08 For Forex prices it did ok: http://bot4.us/fx_past/fx_past_wk2011_08_07 Anyway back to INTC, personally, I would not trade INTC at all; it's price is too low. My broker charges me about $0.01 to trade 1 share (if I buy/sell 100 shares). So INTC is about 30 times more expensive to trade than GOOG. --Dan On 8/13/11, Bob wrote: > Hi Dan, > > How are you? It's been a couple of months since we last talked about your > investment app tracking INTC, how's it tracking so far? This is an > interesting period to gauge the accuracy especially given the recent wild > and huge market volatility. > > Bob 2011-08-14 Debt Downgrade Week confounded the DanBotVia e-mail today, A user asked me this:bot4.us is very cool. How is it working out in real world trading? My response: bot4.us did very poorly during debt-downgrade-week. Many us-stock predictions it issued were a "false-positive": http://bot4.us/us_stk_past/us_stk_past_wk2011_08_08 But... The week before it did an acceptable job of sensing the downturn: http://bot4.us/us_stk_past/us_stk_past_wk2011_08_01 And.. I am proud to say that during debt-downgrade-week, bot4.us did an excellent job of predicting Forex prices: http://bot4.us/fx_past/fx_past_wk2011_08_07 2011-07-13 Why use SVM?Finally, I have something to blog about.A user asked me this about my choice of SVM: The question I had for you, when you've got a few minutes, is whether the decision to use the SVM algorithm stemmed from research into the success of various machine learning techniques, or whether your research concluded that this would be an effective technique for predicition in financial markets. Thanks, and I'll be following this project! --JH Here is my response: JH, About my site http://bot4.us, my decision to use SVM is based on its effectiveness with historical data. I've looked at a number of methods/algorithms for generating predictions. For each one, I collected several thousands of predictions on various sets of historical data going as far back as the 80s. Then, I'd use Pearson's Correlation to see the correlation between predictions and gains. So far, SVM has done best. One idea which is worthy of a look is put to use at John F. Ehlers site: http://stockspotter.com A study of his books at Amazon.com suggests he is using signal processing software to consume a time-series of prices and extrapolating the signal to obtain predictions. I took 2 Electrical Engineering signal-processing-classes during my undergrad days at Caltech. I think I could get reacquainted with the math again and then start writing some software. The place for me to start would be here: http://en.wikipedia.org/wiki/Fourier_analysis http://en.wikipedia.org/wiki/Discrete_Fourier_transform http://en.wikipedia.org/wiki/Kalman_filter A side effect of this study might be that I start having dreams where I wake up and I am a Caltech undergrad again and all the final exams are way over my head. The image below shows that the Kalman Filter works better than smoothing for certain types of noisy signals. So it seems plausible to me that I could treat a time-series of prices as a noisy signal that can be filtered and then extrapolated. If this is a valid approach, the validity should be easy to prove or disprove using historical data and Pearson's Correlation calculation. ![]() 2011-06-28 What's in a name?My name is Dan.I am the Founder of Bot4.us Using Open Source software and significant amount of my time, I created this site from the ground up. Currently most of my time is spent on software development. Prior to this week, most of my software efforts have been focused on setting up and operating mechanisms which pull data in every 5 minutes and then transform that data into predictions. Starting this week I will spend more of my software efforts on the software which you, the end user, will interact with. Until I have a better idea of how this site will be useful to you, I aim to keep the User Interface (UI) as simple as possible. Currently for the UI I rely on Ruby on Rails and jQuery: http://www.google.com/search?q=Ruby+on+Rails http://www.google.com/search?q=jQuery If you have some ideas about how I can enhance, or fix, the site so it is more helpful to you, please send me an e-mail. Also I am curious about what you think of the name: "Bot4.us" Bot4.us uses machine learning technology (the SVM Algorithm) to issue short-term predictions for both the stock market and the Forex market. What should I call the site? Most of my friends dislike the name "Bot4.us" which suggests that I should pick a better name. I would like to read your opinion, so send me e-mail: Dan@bot4.usIf you add "not-spam" to the Subject, I will assume it is HumanMail, not BotMail. |