AbstractPhil posted an update Sep 11
Training and tuning a top 500k geometric vocabulary is doable, but scaling upward is highly impractical for me.

This one has many logistics issues. Primarily, I know of no precedent for literally training hundreds of millions of potential character combinations, each with prefabricated crystal variations that tune a specific series of trajectories in specific directions based on the input text, the other crystals it targets, the weights, and the batch. The dataset also needs to be properly prepared, and I can't find any prefabricated version of the data format the symbolic lexical engine needs in order to be robust.
There are a few possibilities here. Batching is an obvious one: take in a large influx of information, grab any matching words, characters, or other information, and update those entries using the formulas for topological tuning.
The main issue is that the language web is massive. BILLIONS of variations can crop up from a single document if you aren't hard-capping depth: traverse the whole tree and "the quick brown fox" becomes words, becomes definitions, becomes letters, and that's before multi-pass finetuning. This alone is a massive logistics nightmare to implement, but thankfully this is the modern era (a sketch of the capped traversal is below).
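Roughly, the hard-capped traversal I have in mind looks like this; `definitions_of` is just a stand-in lookup (WordNet glosses, a dictionary, whatever), and the caps here are illustrative, not the real budget.

```python
from collections import deque

def expand(text, definitions_of, max_depth=3, max_nodes=100_000):
    """Breadth-first expansion of a document into words, definitions, and letters,
    with hard caps on depth and total node count so the language web can't
    explode into billions of variations."""
    seen, out = set(), []
    queue = deque((w, 0) for w in text.lower().split())
    while queue and len(out) < max_nodes:
        node, depth = queue.popleft()
        if node in seen or depth > max_depth:
            continue
        seen.add(node)
        out.append(node)                          # this node gets a crystal / gets tuned
        if depth < max_depth:
            for gloss in definitions_of(node):    # stand-in definition lookup
                queue.extend((w, depth + 1) for w in gloss.lower().split())
            queue.extend((ch, depth + 1) for ch in node if ch.isalpha())
    return out
```

With max_depth=0 only the surface words survive; every extra level pulls in definitions and letters, which is exactly where the billions of variations come from.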

Simply put: if I hard cap to a 500k vocab with a depth of no more than 50,000 pentachora crystals each, it should be capable of housing an approximate word structure within a trajectory space.

I'd rather run it on a fleet of devices and feed it The Pile, BookCorpus, and everything else, so we can get some truly trajectory-related subsets of 500k+ crystals per token, upward of 100,000,000 or so combinations each. The crystals really aren't that big, and they house a massive amount of context (rough footprint estimate below).
Even so, there are many logistics nightmares here, but it's a viable option for training a legitimate similarity-fed BERT or LLaMA meant specifically to form linguistic responses using those crystals as tuning forks for solidity.
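For rough scale, my own back-of-envelope numbers: a pentachoron is five vertices, so storing one crystal as raw vertex vectors is tiny even at 512d (fp16 assumed here).

```python
def crystal_bytes(dim, vertices=5, bytes_per_value=2):
    """Rough storage for one pentachoron crystal stored as raw vertex vectors (fp16)."""
    return vertices * dim * bytes_per_value

print(crystal_bytes(256))   # 2,560 bytes, ~2.5 KB
print(crystal_bytes(512))   # 5,120 bytes, ~5 KB
print(f"{500_000 * crystal_bytes(512) / 2**30:.1f} GiB")  # one 512d crystal per token at 500k vocab: ~2.4 GiB
```

A single crystal per token at the full 500k vocab is comfortably small; it's the deep per-token stacks that push this toward a fleet of devices.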

There are some saving graces, though. You can probably house the entire purpose of a word in a 256d token, but you won't get all of the robust lexical and analytical behavioral responses required from the orthonormalized 5th, so it will likely be less accurate than a 512d token.
You can get some more utility from upscaling 256 to 512, and you gain some sparsity, which allows more growth; the downside is that the sparse regions carry no meaning, which tends to confuse things and build pockets of misrepresentation on projection.
Multiple overlapping projections are the most robust from what I've been observing: you take the same token and blow it up multiple times into multiple different projection sizes. This has proven invaluable; the behavioral response from geometries 4-5 with freeze/unfreeze has shown that all layers can complementarily improve performance, while the final version can be any one of them individually requested, since each is an expert on its own plane and the output does not require all of their outputs.
There are many potential variations of models from these geometries, including 200+ projections implemented on the same model using the same tokens (a minimal sketch of the multi-projection idea is below).
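A minimal sketch of what I mean by overlapping projections, assuming plain linear heads (the real geometry is more involved); each size is its own expert and can be frozen or queried on its own.

```python
import torch.nn as nn

class MultiProjection(nn.Module):
    """Project the same base token embedding into several sizes (e.g. 256/512/1024).
    Each head is its own expert: any one can be used alone at inference, and
    heads can be frozen/unfrozen independently during training."""
    def __init__(self, base_dim=256, sizes=(256, 512, 1024)):
        super().__init__()
        self.heads = nn.ModuleDict({str(s): nn.Linear(base_dim, s) for s in sizes})

    def forward(self, x, size=None):
        if size is not None:                                    # request a single expert
            return self.heads[str(size)](x)
        return {s: head(x) for s, head in self.heads.items()}  # all planes at once

    def freeze(self, size, frozen=True):
        for p in self.heads[str(size)].parameters():
            p.requires_grad = not frozen

# usage: train all heads jointly, then freeze 256/1024 and fine-tune only the 512 plane
# mp = MultiProjection(); mp.freeze(256); mp.freeze(1024)
```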
Pair, triplet, quin, and penta word + letter combinations remain uncrystallized and unexplored, but I plan to use the same system to run them.
I'll likely implement a SentencePiece-style translator that turns a SentencePiece vocabulary directly into weighted crystal variants for convenience, which will allow much more usable and easy-to-represent vocabularies for expanding current models (sketched below).
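A hedged sketch of that translator, using the standard SentencePiece Python API; `make_crystal` is a placeholder for whatever actually builds the pentachoron, and mapping the piece score to a weight is my assumption.

```python
import math
import sentencepiece as spm

def vocab_to_crystals(model_file, make_crystal):
    """Walk a trained SentencePiece vocabulary and emit one weighted crystal per piece.
    make_crystal(piece) is a placeholder for the actual crystal construction; the
    piece's score (log-prob for unigram models) is carried along as a tuning weight."""
    sp = spm.SentencePieceProcessor(model_file=model_file)
    crystals = {}
    for i in range(sp.get_piece_size()):
        piece = sp.id_to_piece(i)
        weight = math.exp(sp.get_score(i))   # log-prob -> probability-ish weight (assumption)
        crystals[piece] = (make_crystal(piece), weight)
    return crystals
```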
WordNet with hard-gated, non-fabricated tokens has proven the most valuable; however, those tokens are still shallow and require full solidification and robustness curation with additional definitions and datasets.
Research is ongoing and many mechanisms still need to be created.


Also, my apologies for not updating the lattice vocabulary; I've been very swept up in direct testing and implementing models. It's been really fun setting all this stuff up.

The more it works, the more I get excited that the formulas I'm manifesting are cohesive representations of purpose rather than simple random convergence. I've altered them hundreds of times, but the pipeline goal is still present. Unified geometric vocabulary WILL be a universal language, not simply a tinker toy but a full lexical representation of potential, with manifested trajectory and solidification of grammatical, lexical, symbolic, and representative substructure.

It's at the point where time will tell HOW this system is useful. Even if it can DO ALL THAT, large-scale adoption, or even minimal-scale adoption, comes down to how robustly useful it is and how many technically knowledgeable eyes end up on the topic. It's already well beyond the question of IF this system will be useful, which means I feel obligated to at least keep kicking my legs until I get access to a speedboat.

Simply put, I've built this system for the eyes of the technical, with some very direct and representative explanation available to the less technical as well.
