View Full Version : Phrases database for digital modes - open, publicly available
KE7HQY
05-31-2008, 10:36 PM
Just finished reading the announcement about WSPR (http://forums.qrz.com/showthread.php?t=164389) QSO software and its "short QSO" format. The Yahoo group announcement included this about the new mode:
The WSPR message "payload" is 50 information bits per transmission
A digital database of key phrases would make a LOT of sense with so little information space. Each "phrase" would have its own unique identifier code. You can fit 36 possibilities into a single character, with a-z and 0-9 available. That means you can have over 78 BILLION possible messages with just *7* characters. And the number of possible messages goes up exponentially with every character added (36^N, where N is the number of characters).
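To put numbers on that, here is a quick Python sketch of the arithmetic. It assumes, purely for illustration, that the whole 50-bit payload could be spent on phrase codes, which the announcement does not say; the function names are made up for this example.

```python
ALPHABET_SIZE = 36   # a-z plus 0-9, as described above
PAYLOAD_BITS = 50    # payload size quoted from the announcement

def message_count(n_chars: int) -> int:
    """Number of distinct codes that n_chars base-36 characters can express."""
    return ALPHABET_SIZE ** n_chars

def max_chars_in_bits(bits: int) -> int:
    """Largest number of base-36 characters whose code space fits in `bits` bits."""
    n = 0
    while ALPHABET_SIZE ** (n + 1) <= 2 ** bits:
        n += 1
    return n

print(message_count(7))                 # 78364164096, i.e. "over 78 billion"
print(max_chars_in_bits(PAYLOAD_BITS))  # 9
```

So under that assumption a single transmission tops out at nine base-36 characters, or three of the 3-character codes discussed here.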
This is basically just a simple compression scheme. However, it is powerful because most people use a LOT of repetition in everyday conversations.
"How you doing"
"whats your name"
"how's the YL doing"
"What kind of work do you do"
"Hows the traffic on highway 35"
You could capitalize on this repetition and compress common phrases down to a 2-3 character code.
A more advanced way of doing this would be to adapt to the grammar of the language, and have on-the-fly encoding for things like common verbs, pronouns, adjectives, etc. That way, you wouldn't have to hunt and peck for phrases but could simply type them out.
The database would also be free and openly available, so it would comply with part 97.309(a)(4) (http://www.arrl.org/FandES/field/regulations/news/part97/onepage.html#309)
Also, the database would have separate sections for languages other than English: Chinese, French, etc.
In that case, there would be a "language code" that would denote which language database was being used.
So, an example would be:
Como Estas? Estoy bien, gracias.
converts to
2x6
where 2 is the language code (Spanish), and the x6 is the phrase code. The language code would always be the first character, and could hold up to 36 languages.
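A rough sketch of that layout in Python, in case it helps. Only language code "2" and phrase code "x6" come from the example above; the table names and functions are hypothetical.

```python
# Hypothetical tables: only language "2" and phrase "x6" come from the post.
LANGUAGES = {"2": "Spanish"}
PHRASES = {
    "2": {"x6": "Como Estas? Estoy bien, gracias."},
}

def encode(language_code: str, phrase: str) -> str:
    """Return language code + phrase code for a phrase already in the database."""
    for phrase_code, text in PHRASES[language_code].items():
        if text == phrase:
            return language_code + phrase_code
    raise KeyError(f"phrase not in database: {phrase!r}")

def decode(message: str) -> tuple[str, str]:
    """Split off the first character as the language code, then look up the rest."""
    language_code, phrase_code = message[0], message[1:]
    return LANGUAGES[language_code], PHRASES[language_code][phrase_code]

print(encode("2", "Como Estas? Estoy bien, gracias."))  # 2x6
print(decode("2x6"))
```

Because the language is always the first character, the receiver can pick the right phrase table before it looks at anything else.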
My question is, has anything similar to this been done?
My question is, has anything similar to this been done?
Yep - Q signals, Prosigns - for formal messages, the ARRL has numbered radiograms.
73 de Joseph Durnal NE3R
KE7HQY
06-01-2008, 12:43 AM
Nothing beats Q-calls at what they're intended for. They're quick and (mostly) universally known.
I'm looking at topics they don't cover, not trying to replace Q-calls/prosigns. With a software database, this could be done for thousands of common phrases. Just look up the phrase and send off the code. Also, you could use spell-checking-type software to find matches to typos, similar meanings, etc.
EDIT:
An example could be:
"I am running a Kenwood TS-2000 with a 5 element yagi at 65ft with 125ft of RG-8X"
"I am running" is a common phrase ---> "1xb"
"Kenwood TS-2000" ---> "56f"
"5 element yagi" ---> "u8d"
"at 65ft" ---> "1hd"
"125ft" ---> "6jt"
"of RG-8X" ---> "jb4"
So rather than sending
I am running a Kenwood TS-2000 with a 5 element yagi at 65ft with 125ft of RG-8X
it would send
1xb 56f u8d 1hd 6jt jb4
Even if you try to condense it with abbreviations:
KNWD TS2000 5el yagi 65ft up 125ft RG8X
that is 32 characters, not counting spaces. The condensed code is 18 characters (also not counting spaces). If you had software doing the encoding/decoding, you could speed things up considerably.
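For anyone who wants to play with it, the example could be sketched in Python like this. The codebook just copies the made-up codes from this post; a real database would assign them some other way, and the helper names are hypothetical.

```python
# Illustrative codebook using the made-up codes from the example above.
CODEBOOK = {
    "I am running": "1xb",
    "Kenwood TS-2000": "56f",
    "5 element yagi": "u8d",
    "at 65ft": "1hd",
    "125ft": "6jt",
    "of RG-8X": "jb4",
}
REVERSE = {code: phrase for phrase, code in CODEBOOK.items()}

def encode(phrases):
    """Turn a list of known phrases into space-separated codes."""
    return " ".join(CODEBOOK[p] for p in phrases)

def decode(message):
    """Turn space-separated codes back into the list of phrases."""
    return [REVERSE[code] for code in message.split()]

def chars_without_spaces(text):
    return len(text.replace(" ", ""))

coded = encode(list(CODEBOOK))
abbreviated = "KNWD TS2000 5el yagi 65ft up 125ft RG8X"

print(coded)                              # 1xb 56f u8d 1hd 6jt jb4
print(chars_without_spaces(abbreviated))  # 32
print(chars_without_spaces(coded))        # 18
```

Note that the connecting words ("a", "with") have no codes in this example, so the decoder would either have to restore them or they would need entries of their own.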
KE7HQY
06-01-2008, 01:36 AM
While skimming the original message (http://groups.yahoo.com/group/wsjtgroup/message/4799), I somehow missed this:
Space has been reserved in the WSPR protocol for many more
"canned" or "partially canned" messages like those in the
final group of templates. I hereby solicit suggestions for
messages that might be included in this group. Note that
the variable information to be inserted in a given message
type should be no more than one, two, or possibly three
numbers or words. Please help me to populate the list
of message types in the most useful way.
I am glad to see this kind of thing is already underway.
KG4RUL
06-01-2008, 10:39 AM
There is nothing new under the sun.
The Circuit Mayflower (Shipboard) System is installed on a variety of U.S. Navy ships to provide a one-way ship-to-shore HF radio link. It consists of an AN/BRT-2 system which has a KY-766A Keyer, TS-3858 silent tuner, modified AN/URT-23, AN/UGC-136CX teleprinter, and AN/USM-488 oscilloscope.
The teleprinter is used to control the Keyer and to input various strings of coded messages from a classified codebook. There is a limitation on the number of codes allowed in each transmission so MUCH effort is put into finding suitable codes to convey a given message.
This was cumbersome at best with a limited codebook and, without major automation of the process, will quickly become impossible as the WSPR code book expands.
W3MIV
06-01-2008, 10:50 AM
The idea predates radio by a good bit. Flag hoists are one such example. Lord Nelson famously failed to read one such "canned message" by holding his spyglass to his blind eye during the Battle of Copenhagen.
KE7HQY
06-01-2008, 04:11 PM
This was cumbersome at best with a limited codebook and, without major automation of the process, will quickly become impossible as the WSPR code book expands.
This is where spell-checking-type software comes in. It can match approximate answers to a canned message, and as long as the operator sticks to a certain set of conditions (no fancy-dancy English, etc.), it should be able to recognize segments of a message and match them to a list of canned messages. It's already done on the fly in several word processors for sentence structure, grammar and spelling, and there's an open source project called "Link Grammar" that does exactly this (currently maintained by Abiword developers).
Another way to do this is what the WSPR developers are already starting to do, which is to make certain templates. That way, the person can pick a certain template for a certain subject and just fill in the different answers. Not as compressed as an "indexed canned message" like above, but still highly compressed.
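A template scheme could look something like the Python sketch below. The template ID, wording and "/" separator are all invented for illustration; they are not the actual WSPR templates.

```python
# Hypothetical template table; not the real WSPR message templates.
TEMPLATES = {
    "rig": "I am running a {0} with a {1} at {2}",
}

def pack(template_id, *values):
    """Send only the template ID plus the values that fill its slots."""
    return "/".join((template_id, *values))

def unpack(message):
    """Rebuild the full sentence from the template ID and slot values."""
    template_id, *values = message.split("/")
    return TEMPLATES[template_id].format(*values)

sent = pack("rig", "TS-2000", "5el yagi", "65ft")
print(sent)          # rig/TS-2000/5el yagi/65ft
print(unpack(sent))  # I am running a TS-2000 with a 5el yagi at 65ft
```

Only the slot values travel as plain text, so the fixed wording of the sentence costs nothing on the air.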
Nothing beats Q-calls at what they're intended for. They're quick and (mostly) universally known.
I'm looking at topics they don't cover, not trying to replace Q-calls/prosigns. With a software database, this could be done for thousands of common phrases. Just look up the phrase and send off the code. Also, you could use spell-checking-type software to find matches to typos, similar meanings, etc.
Wow, maybe we could condense it all down into long and short bits you could send with some kind of switch with a handle... Hmmmmm? :D
:confused:
WA6MHZ
06-01-2008, 05:14 PM
That's what all the Tweens and Teens are doing in Text Messaging. Need a glossary to figure out all the meanings of the letters, but the one I DO know is WTF? That is probably the best known, and it announces an expression of Disbelief, extreme curiosity and a stunning shocker when encountered!
Here is a sample glossary as found on the NET.
http://www.mantex.co.uk/samples/texting.htmL
Kids know them by heart!
WA9SVD
06-01-2008, 06:35 PM
But IS there any real advantage?
Which takes longer? Looking up a "code" that means "MY RADIO IS..." or just typing it?
As the database gets bigger and bigger, it will become more and more cumbersome, and less and less useful.
KE7HQY
06-01-2008, 07:47 PM
But IS there any real advantage?
Which takes longer? Looking up a "code" that means "MY RADIO IS..." or just typing it?
As the database gets bigger and bigger, it will become more and more cumbersome, and less and less useful.
Not with an autocomplete/spellchecking type setup. If you have a sufficiently large database (millions and millions of phrases, etc.), you could just use a search algorithm that searches for what you're typing, finds the best match and encodes that.
All in all, you'd just end up typing what you want, and the computer program would do the rest. When you're done composing your message, the computer would come back with what it thinks you mean, and it would give you a small dropdown box with a few potential options if it's incorrect. These kinds of algorithms are very advanced and are already built into open source projects (see Abiword as an example). The trick would just be customizing the algorithm for phrases rather than words as you have in spellcheckers.
You'd have to limit it to simple English (no antidisestablishmentarianism, etc.), but it should work just fine.
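As a very rough sketch of the "type it and let the computer find the phrase" idea, Python's standard difflib module can do approximate matching. This is not the Abiword or Link Grammar approach, just a simple stand-in, and the phrases and codes in the table are invented.

```python
import difflib

# Invented codebook entries, loosely based on the phrases earlier in the thread.
CODEBOOK = {
    "how are you doing": "a1",
    "what is your name": "a2",
    "what kind of work do you do": "a3",
    "how is the traffic on highway 35": "a4",
}

def suggest(typed, n=3, cutoff=0.6):
    """Return up to n (phrase, code) candidates, best match first."""
    matches = difflib.get_close_matches(
        typed.lower(), list(CODEBOOK), n=n, cutoff=cutoff
    )
    return [(phrase, CODEBOOK[phrase]) for phrase in matches]

# The first entry is the best guess; the rest could fill the dropdown box.
print(suggest("whats your name"))
```

With millions of phrases a linear scan like this would be too slow, so a real version would need some kind of index, but the operator's side of it would look the same: type, confirm the suggestion, send.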
AB1HH
06-01-2008, 08:41 PM
What if there were a few single bit errors?
So rather than sending
Quote:
I am running a Kenwood TS-2000 with a 5 element yagi at 65ft with 125ft of RG-8X
it would try to send
Quote:
1xb 56f u8d 1hd 6jt jb4
but this would be received as
1xc 55f 48d 6ju ib3
Which would translate to
I am running away with your wife, who says you smell worse than horses in heat whose stalls haven't been cleaned in 65 days.
KE7HQY
06-01-2008, 08:51 PM
If you're getting bit errors, you might get some other guy's callsign rather than the callsign of the guy you're talking to. :p
The entire QSO relies on the error correction built into the digital mode. The new WSPR mode (which this kind of indexing is really meant for) has a LOT of error correction going on. If you're just doing raw RTTY with no correction, you might have issues. In that case I just hope it's garbage that comes back and not some mixup like the one you described.
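If the mode itself had no error correction, one cheap way to make sure a damaged code comes back as garbage instead of a different valid phrase would be to tack a check character onto each code. This is only a sketch of that idea, not anything WSPR does; the scheme (sum of character values mod 36) and the function names are assumptions, and it costs one extra character per code.

```python
ALPHABET = "0123456789abcdefghijklmnopqrstuvwxyz"

def check_char(code):
    """Check character: sum of the base-36 values of the code, modulo 36."""
    return ALPHABET[sum(ALPHABET.index(c) for c in code) % 36]

def protect(code):
    """Append the check character before sending."""
    return code + check_char(code)

def verify(received):
    """Return the code if the check character matches, otherwise None."""
    body, check = received[:-1], received[-1]
    return body if check_char(body) == check else None

print(protect("1xb"))   # 1xb9
print(verify("1xb9"))   # 1xb
print(verify("1xc9"))   # None: one character was changed in transit
```

Any single wrong character changes the sum, so it gets caught; two errors that happen to cancel out would still slip through, which is why the real work belongs in the mode's own error correction.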