Ham Radio Analytics

Discussion in 'Amateur Radio News' started by AB4EJ, Apr 6, 2017.

  1. AB4EJ

    AB4EJ XML Subscriber QRZ Page

    I developed a data collector for DX Cluster spots which shows both current and historical reports (going back over 4 years) in a variety of user-friendly graphics showing trends and global map locations for stations reporting and heard. A website is available so you can use the system in real time: http://dxdisplay.caps.ua.edu
    (Note: if you can't see the attachments, they are also included as links)
    FigureZero.png

    Introductory Picture

    About the System
    If you’re like me, you stay pretty busy most of the time, and you have to fit your ham radio activities into your schedule. Naturally, most of us would like to maximize our operating enjoyment, which means making QSOs rather than fruitlessly searching the bands and sending unanswered CQs. Several years ago, I realized that an aspect of my profession may be able to help me – and others – make the most of our on-air time: analytics. This is often defined as analysis of data with the goal of improved decision-making, through understanding, prediction, and optimization.

    The question was, what data could I collect and use for ham radio analytics? It had to be something that could operate unattended, and be capable of generating a lot of data. The more data you have, the better your analytics, because inaccuracies tend to average out. Some research led to the idea of collecting QSO reports from the world-wide DXCluster network. A “spot” on the DXCluster tells you that one station has heard another one (whether they worked them or not), and by knowing their callsigns, you can tell, or at least approximate, their location. A program can pull the callsigns’ latitudes and longitudes from QRZ.com (or estimate these based on country if not in the QRZ.com database). DXCluster spots also include a frequency, so we know there was a signal path on that frequency between the stations’ locations, and by collecting the data in real time, we know the time of the spot.
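
    The location step described above (use the coordinates on file for a callsign, otherwise estimate from the country) could be sketched roughly as follows. This is only an illustration, not the actual collector code: the `KNOWN_LOCATIONS`, `PREFIX_COUNTRY`, and `COUNTRY_CENTROID` tables and the `locate` function are hypothetical stand-ins, and the coordinates are rough values chosen for the example.

```python
from typing import Optional, Tuple

# Hypothetical stand-in for a per-callsign lookup (the article uses QRZ.com).
# Coordinates are approximate and for illustration only.
KNOWN_LOCATIONS = {"W1AW": (41.71, -72.73)}

# Hypothetical, heavily abbreviated prefix table; a real one has hundreds of entries.
PREFIX_COUNTRY = {
    "W": "United States", "K": "United States", "N": "United States",
    "JA": "Japan", "DL": "Germany",
}

# Rough country center points, used only when no exact location is known.
COUNTRY_CENTROID = {
    "United States": (39.8, -98.6),
    "Japan": (36.2, 138.3),
    "Germany": (51.2, 10.4),
}


def locate(callsign: str) -> Optional[Tuple[float, float, str]]:
    """Return (lat, lon, source) for a callsign, or None if nothing matches."""
    call = callsign.upper()
    if call in KNOWN_LOCATIONS:
        lat, lon = KNOWN_LOCATIONS[call]
        return lat, lon, "exact"
    # Fall back to the country: try the longest leading substring first.
    for length in range(len(call), 0, -1):
        country = PREFIX_COUNTRY.get(call[:length])
        if country is not None:
            lat, lon = COUNTRY_CENTROID[country]
            return lat, lon, "country estimate"
    return None
```

    Returning the source of the estimate alongside the coordinates lets later analysis treat country-level guesses with more caution than exact locations.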

    I started collecting spots by checking one or two DXCluster nodes every ten seconds, eliminating duplicates, and collecting the solar flux from the NOAA website. This process is fully automated, and results are stored in a database. A website (http://dxdisplay.caps.ua.edu) lets us view, by band, recent spotting activity and trends, showing band activity as influenced by time of day, season, and solar activity:

    Band Activity Monitor
    Band Trends
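
    For readers curious about what collecting a spot involves, here is a minimal sketch of turning one line of cluster output into fields. It assumes the common `DX de <spotter>: <frequency in kHz> <DX call> <comment> <hhmm>Z` layout; individual cluster nodes may format lines differently, and the `Spot` and `parse_spot` names are made up for this example rather than taken from the actual collector.

```python
import re
from typing import NamedTuple, Optional


class Spot(NamedTuple):
    spotter: str     # station that reported hearing the DX
    freq_khz: float  # reported frequency in kHz
    dx_call: str     # station that was heard
    comment: str     # free-text comment, may be empty
    time_utc: str    # four-digit UTC time as sent by the node


# Assumed line layout; adjust the pattern if your node formats spots differently.
SPOT_RE = re.compile(
    r"^DX de\s+(?P<spotter>[A-Z0-9/#-]+):\s+"
    r"(?P<freq>\d+(?:\.\d+)?)\s+"
    r"(?P<dx>[A-Z0-9/]+)\s+"
    r"(?P<comment>.*?)\s*"
    r"(?P<time>\d{4})Z\s*$"
)


def parse_spot(line: str) -> Optional[Spot]:
    """Parse one spot line, returning None for anything that is not a spot."""
    match = SPOT_RE.match(line.strip())
    if match is None:
        return None
    return Spot(
        spotter=match.group("spotter"),
        freq_khz=float(match.group("freq")),
        dx_call=match.group("dx"),
        comment=match.group("comment"),
        time_utc=match.group("time"),
    )


# Made-up sample line, for illustration only.
sample = "DX de W3LPL:     14025.0  K1N          CQ up 2                        1234Z"
spot = parse_spot(sample)
```

    Lines that do not match (login banners, announcements, and so on) simply return `None`, so the caller can skip them.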


    We can search for recent spots of a specific call (a DXpedition, for example), and see on the map where the signal paths are (assuming short versus long path propagation):

    Figure_3.png
    Global Map of QSOs
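
    The short-path versus long-path distinction can be made concrete with a little spherical geometry. The sketch below uses the standard haversine and initial-bearing formulas on a spherical Earth (an approximation); the function names are invented for this example, and this is not necessarily how the website draws its paths.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; a sphere is an approximation


def short_path(lat1, lon1, lat2, lon2):
    """Return (distance_km, initial_bearing_deg) along the short great-circle path."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dlon = math.radians(lon2 - lon1)

    # Haversine formula for the central angle between the two points.
    a = (math.sin((phi2 - phi1) / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlon / 2) ** 2)
    distance = 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))

    # Initial bearing from point 1 toward point 2, in degrees clockwise from north.
    y = math.sin(dlon) * math.cos(phi2)
    x = (math.cos(phi1) * math.sin(phi2)
         - math.sin(phi1) * math.cos(phi2) * math.cos(dlon))
    bearing = (math.degrees(math.atan2(y, x)) + 360.0) % 360.0
    return distance, bearing


def long_path(lat1, lon1, lat2, lon2):
    """Return (distance_km, initial_bearing_deg) going the long way around."""
    distance, bearing = short_path(lat1, lon1, lat2, lon2)
    circumference = 2 * math.pi * EARTH_RADIUS_KM
    return circumference - distance, (bearing + 180.0) % 360.0
```

    The long path is simply the rest of the same great circle, so its heading is the short-path heading turned around by 180 degrees.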

    The System

    The present data collection system (below) is a Windows service that communicates with one or two DXCluster nodes (usually two, so that data collection can continue if one node is down) and stores its data in a database. Data collected can be viewed via the website or analyzed via a direct connection to the database.

    Figure4.png
    System Layout
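
    Because the service usually listens to two nodes, the same spot can arrive twice, which is why duplicates have to be eliminated. A minimal sketch of one way to do that is shown here. The definition of a duplicate (same spotter, same DX call, same frequency to 0.1 kHz, within a ten-minute window) is an assumption made for this example, as is the `SpotDeduplicator` class; the actual service may use different rules.

```python
from typing import Dict, Tuple


class SpotDeduplicator:
    """Remember recently seen spots and flag repeats (hypothetical helper)."""

    def __init__(self, window_seconds: float = 600.0) -> None:
        # Assumed window: repeats inside it count as duplicates.
        self.window_seconds = window_seconds
        self._last_seen: Dict[Tuple[str, str, float], float] = {}

    def is_new(self, spotter: str, dx_call: str, freq_khz: float,
               timestamp: float) -> bool:
        """Return True if this spot should be stored, False if it is a repeat."""
        key = (spotter.upper(), dx_call.upper(), round(freq_khz, 1))
        previous = self._last_seen.get(key)
        self._last_seen[key] = timestamp
        return previous is None or (timestamp - previous) > self.window_seconds
```

    A long-running service would also need to discard old keys periodically so that the dictionary does not grow without bound; that housekeeping is left out here to keep the sketch short.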

    Data collection started in September of 2012, so today there are over four full years of data. Now we can use the history of more than 11 million DX spots for some serious analytics. The first three illustrations (above) show some quick analyses based on a couple of hours of data. (The website defaults to showing the last two hours of spots, but you can enter any dates back to the start of the data collection.) These simple analytics can help you answer questions such as “what band is open now?” and “what bands are opening and closing?” to help with your operating activities. The website tools are designed for ease of use. But what if we want to look deeper?

    At 11 million records, this database is moving in the direction of what is called Big Data, which refers to data collections too large (often with high volumes and velocities of various types of input, i.e., the “Three Vs”: volume, velocity, and variety) to be manipulated with traditional data processing techniques, such as databases. If you use the website to look at trends going back to its start, it clearly takes more time to analyze and display the information. Companies that specialize in handling big data (Google, for example) have enormous “server farms” with thousands of computers among which they distribute their analyses to give you search results in seconds; we make do here with a relatively small computer and simply wait.

    While this database has not yet reached “big data” status, it is likely to get there eventually as spots are continuously added, and especially as more data sources are added. Possible additions include all the stations heard in dstarusers.org (stations connecting into the worldwide D-STAR network), online propagation data (from beacons and experiments), all fundamental sunspot data available from NOAA, etc. Data collections tend to become more powerful for analytics as more data and types of data are added. While this is the stuff of future projects, I have been able to use the existing data collection to make several discoveries.

    This data collection does have some limitations: it is somewhat incomplete because not every ham reports every contact or station they work or hear, it is DX-centric, and limited internet access prevents entering spots from many locations. The system also sometimes gets station locations wrong, because not every station reports its latitude and longitude to QRZ.com; in that case the system has to figure out the station’s country from the call prefix and then guess the latitude and longitude from the country’s location (inaccurate for geographically large countries). With these and other limitations, is it possible to extract anything of use from the entire data collection?

    Absolutely! It is very common for data collections to have limitations like these, so part of the science of analytics is understanding your data well enough to know what it can and cannot tell you. With a large data collection, trends and information tend to emerge with increasing visibility.

    Analyzing Our Data

    Before getting into details on applying this to operating decision-making in the next section, here are some introductory FAQs that show the kinds of questions we can investigate in the data. Given that one is patient enough to sift through these data a bit, it is possible to draw a few important conclusions. While an experienced operator probably has learned some of these things over time, it is instructive to see them borne out through the analysis of actual data. Several different data analysis tools have been used to illustrate different results.


    Some examples:

    Question: How have operating activities been changing over time during the last 4 years? Below, we see the trends (2012-2016) in Solar Flux, QSO distances, and QSOs made (in each case, the average for the month), and a trend line shows the overall direction of the data. This trend line shows what is called a “linear regression,” which can give a general idea of the trend:

    Figure_5.png
    Long Term Trends
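
    For anyone who has not met linear regression before, the trend line is the straight line that minimizes the squared vertical distances to the data points. Here is a minimal sketch of the ordinary least-squares calculation for a series of monthly averages, with months numbered 0, 1, 2, and so on. The `linear_trend` function and the sample numbers are made up for illustration and are not taken from the charts.

```python
from typing import Sequence, Tuple


def linear_trend(values: Sequence[float]) -> Tuple[float, float]:
    """Fit y = slope * x + intercept by least squares, with x = 0, 1, 2, ..."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two points to fit a trend line")

    mean_x = (n - 1) / 2          # mean of 0, 1, ..., n-1
    mean_y = sum(values) / n

    # Sums of squares and cross-products around the means.
    sxx = sum((x - mean_x) ** 2 for x in range(n))
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(values))

    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Synthetic monthly averages, invented purely to exercise the function.
slope, intercept = linear_trend([10.0, 8.0, 6.0, 4.0])
```

    A negative slope means the quantity is falling from month to month on average; comparing slopes after scaling each series to its own starting level is one way to say that one series is declining more sharply than another.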

    As we know, the Solar Flux is decreasing as Solar Cycle 24 winds down, but interestingly we can observe that while the average distance of QSOs is declining at a sharper rate than the solar flux, the number of QSOs being reported has declined only slightly. So if you are a DXer and you have noticed that you can contact distant stations less often than you could during the peak of Cycle 24, it is certainly not for lack of trying! Also (not shown), an analysis of the bands being used shows that we are shifting to the lower HF bands (longer wavelength), which makes sense because these bands become more useful as we approach solar minimum.

    Question: Who is being reported as a station heard (or worked)? Here is a list showing the most active 20 countries (DX entities, strictly speaking) reported as having been heard, and which countries have been most active in reporting to the DXCluster network:

    Table1.png
    Active Countries

    Here we see that the U.S. and Russia are most active, both in stations worked and in spots reported. We also see that the “Others” row (bottom line) accounts for many more spots among countries heard than among countries reporting. This makes sense, because a lot of rare DX entities, DXpeditions, etc., report few or no spots to the DXCluster network but tend to be reported by others when they are heard.

    Question: How does solar flux affect HF versus VHF? In the last set of charts, we saw the trends in HF, but what about VHF? An analysis shows that six meters is seasonal, with huge numbers of contacts reported around summer solstice:

    Figure 6_six meters.png
    VHF Trends

    Summer sporadic E (Es) is a propagation mode of distinct origin that dominates six meter activity, in contrast to the F2 skywave propagation of interest to DXers. Spotting data includes both, but Es likely dominates because of intense bursts of reporting during operating contests. Es appears to depend largely on seasonal effects on the ionosphere that are independent of solar flux, and Es levels have been observed to be essentially unchanged by solar cycle minimum conditions. A separate analysis (not shown here) shows no clear trend in the average distance of six meter contacts. Propagation guru Carl Luetzelschwab, K9LA, suggested that we may need to “lower our expectations for both HF and VHF in future solar cycles” (from the August 2016 World Wide Radio Operators Foundation sponsored webinar “Solar Topics – Where We’re Headed,” as reported on arrl.org), but the four years of data in the DX analytics database apparently do not yet cover a long enough time to show this trend for six meters. Another analysis indicates that traffic on two meters (at least what is reported to the DXCluster) is nearly random, with large spikes whenever there is a VHF contest.

    Question: What have been the most popular DXing bands 2012 – 2016? By displaying a count of contacts by band over the entire period, it is clear that 20 meters is the champion:

    Figure7_band.png
    Band Utilization 2012-2016
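
    A band count like this boils down to mapping each spot frequency to a band and tallying. The sketch below shows the idea. The band edges are the U.S. amateur allocations in kHz (other regions differ slightly), and the `band_for` and `count_by_band` names and the sample frequencies are made up for this example.

```python
from collections import Counter
from typing import Iterable, Optional

# (band name, lower edge kHz, upper edge kHz), based on U.S. allocations.
BAND_EDGES_KHZ = [
    ("160m", 1800.0, 2000.0),
    ("80m", 3500.0, 4000.0),
    ("40m", 7000.0, 7300.0),
    ("30m", 10100.0, 10150.0),
    ("20m", 14000.0, 14350.0),
    ("17m", 18068.0, 18168.0),
    ("15m", 21000.0, 21450.0),
    ("12m", 24890.0, 24990.0),
    ("10m", 28000.0, 29700.0),
    ("6m", 50000.0, 54000.0),
    ("2m", 144000.0, 148000.0),
]


def band_for(freq_khz: float) -> Optional[str]:
    """Return the band name for a frequency, or None if it is outside all bands."""
    for name, low, high in BAND_EDGES_KHZ:
        if low <= freq_khz <= high:
            return name
    return None


def count_by_band(freqs_khz: Iterable[float]) -> Counter:
    """Tally spots per band, ignoring frequencies outside the listed bands."""
    bands = (band_for(f) for f in freqs_khz)
    return Counter(b for b in bands if b is not None)


# Invented sample frequencies, for illustration only.
counts = count_by_band([14070.0, 14020.0, 14240.0, 7010.0, 50125.0, 12345.0])
```

    Calling `counts.most_common()` lists the bands from busiest to quietest, which is the same ordering a bar chart of band utilization conveys.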

    Question: Within 20 meters, what frequencies have shown the most traffic? On this most popular band, the most spots were reported at 14.070 MHz, with 14.020 MHz the heaviest for CW and 14.240 MHz the heaviest for phone. A chart like this is available for every band on the dxdisplay.caps.ua.edu website:

    Figure_8_20_meter_usage_readable.png
    20 Meter Utilization

    Question: Which stations have been heard most often, and which stations have entered the most spots? Hams love to look at tables of contest scores to see who made the most QSOs, points, and so on; here are a couple of tables showing “Top 20” lists of which stations have been reported as heard most often (“Station Heard”) and which stations have entered the most spots (“Station Reporting”) on the DXCluster (Sept. 2012 through Nov. 2016):

    Station Heard and Number of Times Reported
    W1AW/4 27,564
    W1AW/7 26,716
    W1AW/0 23,282
    W1AW/5 20,016
    W1AW/1 19,816
    K1N 14,895
    S01WS 14,511
    CO8LY 14,004
    FT5ZM 13,953
    3B9FR 13,721
    W1AW/3 13,507
    W1AW/9 12,725
    TR8CA 11,937
    HC2AO 10,296
    VP8LP 10,186
    OD5ZZ 10,151
    J28NC 9,997
    W1AW/8 9,961
    A45XR 9,593
    D3AA 9,480

    Station Reporting and Number of Spots Entered
    W3LPL 153,500
    HA6VH 50,353
    DL4RCK 35,417
    DC3RJ 16,813
    VE2FK 14,930
    N4VN 14,857
    W4EE 14,710
    NW0W 14,311
    KE8M 14,070
    R3RT 13,690
    W4IMD 13,619
    EA2DT 12,033
    I5FLN 11,687
    F8DGY 11,570
    ZL2IFB 11,357
    N7ELL 11,273
    N6QQ 11,098
    G1TDN 10,366
    LY5W 9,924
    WJ2D 9,820


    The Analytics Part: Decision Making

    So much for the statistics, which can cause your eyes to glaze over if you are not interested in the historical information; but how can we use this in our operating? Analytics is concerned with actionable information, i.e., it helps us to make decisions. Some decisions we can make include:

    · What bands are open now, and how are they trending? (http://dxdisplay.caps.ua.edu) – helps me decide what band to get on for DX hunting.

    · What are the signal paths reported in recent DX spots? (http://dxdisplay.caps.ua.edu/Chart1) – helps me decide what parts of the world might be reachable, and perhaps the direction to point my beam.

    · Has anybody heard DX station so-and-so, and what band are they on? (http://dxdisplay.caps.ua.edu/Search) – by using the search feature, I can decide where to hunt for my favorite DXpedition or station.



    Conclusions

    Some folks will feel that using analytics and related tools for ham radio is somehow “cheating” (and it may indeed be, if you plan to submit your contest log in the “unassisted” category), or will argue that amateur radio should be about getting on the air, not just thinking about it and researching it. These are both good points, but the idea of the project is not to substitute these techniques for good old listening on the air, but rather to apply data science to our hobby and educate our ham colleagues in this growing field of information processing (attention, young hams: data science is a hot field leading to many career opportunities). Hopefully, this tool will add more “fun factor” to your ham radio experience by showing where and when to look for QSOs. Ongoing projects include an application that communicates with the database and can notify you by text or email when a desired station comes on the air (as reported in DXCluster or dstarusers.org). Just remember to use what you learn as an aid to better operating and more successful DXing!



    About the Author - Bill Engelke, AB4EJ, was first licensed in 1965. He earned a BS in Electrical Engineering from Virginia Tech and an MS in Industrial and Systems Engineering from the University of Alabama in Huntsville. Bill is semi-retired and works part-time at the University of Alabama in the Center for Advanced Public Safety, a center within the College of Engineering. You can reach Bill at engelke77@bellsouth.net.

    Thanks to Dr. Perry Wheless, K4CWW, for his assistance in editing and correcting this article.
     


    KM4LKC, N2NOV, W2NAF and 11 others like this.
  2. AH6BI

    AH6BI Ham Member QRZ Page

    Wow!
    Just... wow!
     
  3. WA7PRC

    WA7PRC Ham Member QRZ Page

    WAAAY TMI. All that matters is right now. Now, if you could predict what/when/where the DX is going to be, you'd REALLY have something. :p
     
  4. K6CLS

    K6CLS Ham Member QRZ Page

    This is great! I'm a numbers guy so had some fun poking around your website.
     
  5. WA6JRW

    WA6JRW Ham Member QRZ Page

    Absolutely fabulous! Thank you for sharing.
     
    KN0DE and KD0TXW like this.
  6. KK5H

    KK5H Ham Member QRZ Page

    Good job, Bill!
    I will treat you to a beer in Dayton!
    tim
     
  7. VE7DXW

    VE7DXW XML Subscriber QRZ Page

    HI Bill and Everybody;

    the data analysis is pretty compelling.

    To make the conclusion that high solar flux creates all the QSOs is a bit risky. There is also a link between people not turning on the radio when the solar flux is low. How is that human factor accounted for?

    All the best and keep up the great stuff you are doing;

    Alex
     
  8. VE7EDT

    VE7EDT Ham Member QRZ Page

    Well the technology is pretty impressive, but here is a radical thought. You could just go "old school", turn your computer off, turn your rig on, tune around and work what you hear.... :)
     
    W4QBQ, WA3VJB, KD8WBQ and 5 others like this.
  9. WB8LBZ

    WB8LBZ Premium Subscriber QRZ Page

    That sums it up...
    I think I need to make that 20 Meter antenna bigger.

    Larry WB8LBZ
    El Paso, TX
     
  10. W1YW

    W1YW Ham Member QRZ Page

    Excellent!

    Say hello to Perry for me too:)

    73
    Chip W1YW
     
