I developed a data collector for DX Cluster spots which shows both current and historical reports (going back over 4 years) in a variety of user-friendly graphics showing trends and global map locations for stations reporting and heard. A website is available so you can use the system in real time: http://dxdisplay.caps.ua.edu

(Note: if you can't see the attachments, they are also included as links.)

[Figure: Introductory Picture]

About the System

If you’re like me, you stay pretty busy most of the time, and you have to fit your ham radio activities into your schedule. Naturally, most of us would like to maximize our operating enjoyment, which means making QSOs rather than fruitlessly searching the bands and sending unanswered CQs. Several years ago, I realized that an aspect of my profession may be able to help me – and others – make the most of our on-air time: analytics. This is often defined as analysis of data with the goal of improved decision-making, through understanding, prediction, and optimization.

The question was, what data could I collect and use for ham radio analytics? It had to be something that could operate unattended, and be capable of generating a lot of data. The more data you have, the better your analytics, because inaccuracies tend to average out. Some research led to the idea of collecting QSO reports from the world-wide DXCluster network. A “spot” on the DXCluster tells you that one station has heard another one (whether they worked them or not), and by knowing their callsigns, you can tell, or at least approximate, their location. A program can pull the callsigns’ latitudes and longitudes from QRZ.com (or estimate these based on country if not in the QRZ.com database). DXCluster spots also include a frequency, so we know there was a signal path on that frequency between the stations’ locations, and by collecting the data in real time, we know the time of the spot.
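
To make the idea of a "signal path between the stations' locations" concrete, here is a minimal Python sketch of how a path length could be computed once both stations have a latitude and longitude. It is an illustration only, not the code used by the collector: the function names are hypothetical, the Earth is assumed to be a sphere of mean radius 6371 km, and the haversine formula is simply one common way to get a great-circle distance.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumption: spherical Earth, mean radius


def short_path_km(lat1, lon1, lat2, lon2):
    """Great-circle (short path) distance in km between two points
    given in decimal degrees, using the haversine formula."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    # Clamp to 1.0 to guard against tiny floating-point overshoot.
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))


def long_path_km(lat1, lon1, lat2, lon2):
    """Long path distance: the rest of the great circle, going the other way."""
    circumference = 2 * math.pi * EARTH_RADIUS_KM
    return circumference - short_path_km(lat1, lon1, lat2, lon2)


if __name__ == "__main__":
    # Illustrative coordinates only: two points on the equator, 90 degrees apart.
    print(round(short_path_km(0.0, 0.0, 0.0, 90.0)), "km short path")
    print(round(long_path_km(0.0, 0.0, 0.0, 90.0)), "km long path")
```

Two points on the equator 90 degrees apart are a quarter of the way around the globe, so the short path comes out at roughly 10,008 km. A spot by itself does not say which way around the signal actually travelled, so any distance derived this way rests on an assumption about short or long path.
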
I started collecting spots by checking one or two DXCluster nodes every ten seconds, eliminating duplicates, and collecting the solar flux from the NOAA website. This process is fully automated, and results are stored in a database. A website (http://dxdisplay.caps.ua.edu) lets us view, by band, recent spotting activity and trends, showing band activity as influenced by time of day, season, and solar activity:

[Figure: Band Activity Monitor]

[Figure: Band Trends]

We can search for recent spots of a specific call (a DXpedition, for example), and see on the map where the signal paths are (assuming short versus long path propagation):

[Figure: Global Map of QSOs]

The System

The present data collection system (below) is a Windows service that communicates with one or two DXCluster nodes (usually two, so that data collection can continue if one node is down) and stores its data in a database. Data collected can be viewed via the website, or analyzed via direct connection to the database.

[Figure: System Layout]

Data collection started in September of 2012, so today there are over four full years of data. Now we can use the history of more than 11 million DX spots for some serious analytics. The first three illustrations (above) show some quick analyses based on a couple of hours of data. (The website defaults to showing the last two hours of spots, but you can enter any dates back to the start of the data collection.) These simple analytics can help you answer questions such as “What band is open now?” and “What bands are opening and closing?” to help with your operating activities.

The website tools are designed for ease of use. But what if we want to look deeper? At 11 million records, this database is moving in the direction of what is called Big Data, which refers to data collections too large (often with high volumes and velocities of various types of input, i.e., the “Three Vs”) to be manipulated with traditional data processing techniques, such as databases.
If you use the website to look at trends going back to its start, it clearly takes more time to analyze and display the information. While companies that specialize in handling big data (Google, for example) have enormous “server farms” with thousands of computers among which they distribute their analyses to give you search results in seconds, we make do here with a relatively small computer and simply wait.

While this database has not yet reached “big data” status, that is likely to happen eventually as spots are continuously added, and especially with the addition of more data sources. Possible additions include all the stations heard in dstarusers.org (stations connecting into the worldwide D-STAR network), online propagation data (from beacons and experiments), all fundamental sunspot data available from NOAA, etc. Data collections tend to become more powerful for analytics as more data and types of data are added. While this is the stuff of future projects, I have been able to use the existing data collection to make several discoveries.

It is true that this data collection has some limitations: it is somewhat incomplete because not every ham reports every contact or station they work or hear, it is DX-centric, and limited internet access prevents entering spots from many locations. When determining station locations, the system sometimes gets these wrong because not every station reports its latitude and longitude to QRZ.com, so the system has to figure out the station’s country based on the call prefix and then guess the latitude and longitude based on the country’s location (inaccurate for geographically large countries).

With these and other limitations, is it possible to extract anything of use from the entire data collection? Absolutely! It is very common for data collections to have limitations like these, so part of the science of analytics is understanding your data well enough to know what it can and cannot tell you.
With a large data collection, trends and information tend to emerge with increasing visibility.

Analyzing Our Data

Before getting into details on applying this to operating decision-making in the next section, here are some introductory FAQs that show the kinds of questions we can investigate in the data. With a little patience in sifting through the data, it is possible to draw a few important conclusions. While an experienced operator has probably learned some of these things over time, it is instructive to see them borne out through the analysis of actual data. Several different data analysis tools have been used to illustrate different results. Some examples:

Question: How have operating activities been changing over the last 4 years?

Below, we see the trends (2012-2016) in Solar Flux, QSO distances, and QSOs made (in each case, the average for the month), with a trend line showing the overall direction of the data. This trend line is what is called a “linear regression,” which can give a general idea of the trend:

[Figure: Long Term Trends]

As we know, the Solar Flux is decreasing as Solar Cycle 24 winds down, but interestingly we can observe that while the average distance of QSOs is declining at a sharper rate than the solar flux, the number of QSOs being reported has declined only slightly. So if you are a DXer and you have noticed that you can contact distant stations less often than you could during the peak of Cycle 24, it is certainly not for lack of trying! Also (not shown), an analysis of the bands being used shows that we are shifting to the lower HF bands (longer wavelength), which makes sense because these bands become more useful as we approach solar minimum.

Question: Who is being reported as a station heard (or worked)?
Here is a list showing the 20 most active countries (DX entities, strictly speaking) reported as having been heard, and which countries have been most active in reporting to the DXCluster network:

[Figure: Active Countries]

Here we see that the U.S. and Russia are most active, both in stations worked and spots reported. We also see for “Others” (bottom line) that many more spots are reported as heard countries than countries reporting; this makes sense, because a lot of rare DX entities, DXpeditions, etc., report few or no spots to the DXCluster network, but tend to be reported by others when they are heard.

Question: How does solar flux affect HF versus VHF?

In the last set of charts, we saw the trends in HF, but what about VHF? An analysis shows that six meters is seasonal, with huge numbers of contacts reported around the summer solstice:

[Figure: VHF Trends]

Summer Sporadic E (Es) is a propagation mode of distinct origin and dominates six meter activity, versus F2 skywave, which is of interest to DXers. Spotting data includes both, but Es is likely dominant due to intense spurts of reporting during operating contests. Es appears to be largely dependent on seasonal effects on the ionosphere that are independent of solar flux, and Es levels have been observed to be essentially unchanged by solar cycle minimum conditions.

Another separate analysis (not shown here) shows no clear trend for average distances of 6 meter contacts. Propagation guru Carl Luetzelschwab, K9LA, suggested that we may need to “lower our expectations for both HF and VHF in future solar cycles” (from the August 2016 World Wide Radio Operators Foundation sponsored webinar “Solar Topics – Where We’re Headed,” as reported on arrl.org), but the four years of data in the DX analytics database apparently don’t cover a long enough time yet to see this trend for six meters.
Another separate analysis indicates that traffic on two meters (at least what is reported to the DXCluster) is nearly random, with large spikes whenever there is a VHF contest.

Question: What have been the most popular DXing bands from 2012 to 2016?

By displaying a count of contacts by band over the entire period, it is clear that 20 meters is the champion:

[Figure: Band Utilization 2012-2016]

Question: Within 20 meters, what frequencies have shown the most traffic?

On this most popular band, the most spots were reported at 14.070 MHz, with 14.020 MHz the heaviest for CW and 14.240 MHz the heaviest for phone. A chart like this is available for every band on the dxdisplay.caps.ua.edu website:

[Figure: 20 Meter Utilization]

Question: Which stations have been heard most often, and which stations have entered the most spots?

Hams love to look at tables of contest scores to see who made the most QSOs, points, and so on; here are a couple of tables showing “Top 20” lists of which stations have been reported as heard most often (“Station Heard”) and which stations have entered the most spots (“Station Reporting”) on the DXCluster (Sept. 2012 through Nov. 2016):

Station Heard      Number of Times Reported
W1AW/4             27,564
W1AW/7             26,716
W1AW/0             23,282
W1AW/5             20,016
W1AW/1             19,816
K1N                14,895
S01WS              14,511
CO8LY              14,004
FT5ZM              13,953
3B9FR              13,721
W1AW/3             13,507
W1AW/9             12,725
TR8CA              11,937
HC2AO              10,296
VP8LP              10,186
OD5ZZ              10,151
J28NC               9,997
W1AW/8              9,961
A45XR               9,593
D3AA                9,480

Station Reporting  Number of Spots Entered
W3LPL              153,500
HA6VH               50,353
DL4RCK              35,417
DC3RJ               16,813
VE2FK               14,930
N4VN                14,857
W4EE                14,710
NW0W                14,311
KE8M                14,070
R3RT                13,690
W4IMD               13,619
EA2DT               12,033
I5FLN               11,687
F8DGY               11,570
ZL2IFB              11,357
N7ELL               11,273
N6QQ                11,098
G1TDN               10,366
LY5W                 9,924
WJ2D                 9,820

The Analytics Part: Decision Making

So much for the statistics, which can cause your eyes to glaze over if you are not interested in the historical information; but how can we use this in our operating?
Analytics is concerned with actionable information, i.e., it helps us to make decisions. Some decisions we can make include:

· What bands are open now, and how are they trending? (http://dxdisplay.caps.ua.edu) This helps me decide what band to get on for DX hunting.
· What are the signal paths reported in recent DX spots? (http://dxdisplay.caps.ua.edu/Chart1) This helps me decide what parts of the world might be reachable, and perhaps the direction to point my beam.
· Has anybody heard DX station so-and-so, and what band are they on? (http://dxdisplay.caps.ua.edu/Search) By using the search feature, I can decide where to hunt for my favorite DXpedition or station.

Conclusions

Some folks will feel that using analytics and related tools for ham radio is somehow “cheating” (and it may indeed be, if you plan to submit your contest log in the “unassisted” category), and others believe amateur radio should be about getting on the air, not just thinking about it and researching it. These are both good points, but the idea of the project is not to substitute these techniques for good old listening on the air, but rather to apply data science to our hobby and educate our ham colleagues in this growing field of information processing (attention, young hams: data science is a hot field leading to many career opportunities). Hopefully, this tool will add more “fun factor” to your ham radio experience by showing where and when to look for QSOs. Ongoing projects include an application that communicates with the database and can notify you by text or email when a desired station comes on the air (as reported in DXCluster or dstarusers.org). Just remember to use what you learn as an aid to better operating and more successful DXing!

About the Author

Bill Engelke, AB4EJ, was first licensed in 1965. He earned a BS in Electrical Engineering from Virginia Tech and an MS in Industrial and Systems Engineering from the University of Alabama in Huntsville.
Bill is semi-retired and works part-time at the University of Alabama in the Center for Advanced Public Safety, a center within the College of Engineering. You can reach Bill at firstname.lastname@example.org. Thanks to Dr. Perry Wheless, K4CWW, for his assistance in editing and correcting this article.