Is It Possible to Display the Most-Used Words?

For my blog, I want to determine which words are used most often, both over a given period of time and across all posts.
I searched for a plugin, but all I found were plugins for overall statistics.

1 Answer

This answer only sketches a rough roadmap. I think the question touches on some machine learning concepts, so you used the right “statistics” tag. You would need to build a dictionary plugin that learns from your new posts, probably along these lines:

  • Manually create a first JSON filter dataset of the most common words in your language (e.g. https://1000mostcommonwords.com/1000-most-common-english-words/). I didn’t find an API for it. This dataset filters out all the words you consider irrelevant (prepositions, pronouns, etc.).
  • Write a function that processes all existing posts and exports the content you are interested in (descriptions, titles, etc.) to a second JSON dataset. You already have the post_metas available as a database source. Remember to key each piece of content by its post_id, because you’ll need to handle updates and removals.
  • Create a function that updates that JSON whenever a post is published or updated.
  • Define a gate that compares the two previous JSON sources, filters out the unwanted words and generates a final JSON file to parse. You can take a declarative approach with the built-in array_map or array_filter functions.
  • Finally, build the logic to count the occurrences of each word, store the counts in a new database table and display them on a dashboard page.
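The pipeline above could look roughly like the following. This is a language-agnostic sketch written in Python rather than WordPress/PHP code: the in-memory STOPWORDS set and POSTS dict are hypothetical stand-ins for the two JSON datasets, and the tokenizing regex, the minimum word length of three letters and the function names are assumptions made for illustration.

```python
import re
from collections import Counter
from datetime import date

# Hypothetical stand-in for the first JSON dataset: common words to ignore.
STOPWORDS = {"the", "a", "an", "and", "of", "to", "in", "is", "it", "i", "you", "for", "on", "with"}

# Hypothetical stand-in for the second JSON dataset: content keyed by post_id.
POSTS = {
    101: {"date": date(2023, 1, 5), "title": "Caching in WordPress",
          "content": "Caching makes WordPress fast. Object caching and page caching."},
    102: {"date": date(2023, 2, 10), "title": "Image optimization",
          "content": "Optimization of images is the easiest win for speed."},
    103: {"date": date(2023, 3, 1), "title": "More caching tips",
          "content": "Page caching with a plugin is simple."},
}

# Assumption: a "word" is a run of lowercase letters or apostrophes.
WORD_RE = re.compile(r"[a-z']+")


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return WORD_RE.findall(text.lower())


def keep_word(word, stopwords):
    """The 'gate': drop stopwords and tokens shorter than three letters."""
    return len(word) > 2 and word not in stopwords


def most_used_words(posts, stopwords, start=None, end=None, top=10):
    """Count filtered words across posts dated within [start, end].

    Passing top=None returns every word, sorted by count.
    """
    counts = Counter()
    for post in posts.values():
        # Restrict to the requested period, if one was given.
        if start is not None and post["date"] < start:
            continue
        if end is not None and post["date"] > end:
            continue
        words = tokenize(post["title"] + " " + post["content"])
        counts.update(w for w in words if keep_word(w, stopwords))
    return counts.most_common(top)
```

For example, `most_used_words(POSTS, STOPWORDS, start=date(2023, 2, 1), end=date(2023, 3, 31))` restricts the count to posts from February and March. In a real plugin the resulting pairs would be written to the database table and read back by the dashboard page.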

I expect the parsing to become quite expensive after a while if the blog accumulates a lot of content and you update your filter dataset often. These libraries could also help:

  • Machine learning library for PHP https://php-ml.readthedocs.io/en/latest/
  • Event-driven, non-blocking I/O with PHP https://github.com/reactphp/react
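Independently of those libraries, one way to keep the parsing cost down is to store word counts per post_id, so that publishing, updating or deleting a single post only re-parses that post instead of the whole blog. The following minimal Python sketch illustrates the idea; WordIndex, count_words and the stopword set are hypothetical names, not part of WordPress or any plugin.

```python
import re
from collections import Counter

WORD_RE = re.compile(r"[a-z']+")
# Hypothetical filter set standing in for the common-words JSON dataset.
STOPWORDS = {"the", "a", "an", "and", "of", "to", "in", "is", "for", "with"}


def count_words(text, stopwords=STOPWORDS):
    """Tokenize one post, drop filtered words and return its per-word counts."""
    return Counter(w for w in WORD_RE.findall(text.lower())
                   if len(w) > 2 and w not in stopwords)


class WordIndex:
    """Keeps per-post counts so one post can be re-indexed without re-parsing everything."""

    def __init__(self):
        self.per_post = {}       # post_id -> Counter of that post's words
        self.totals = Counter()  # aggregated counts over all indexed posts

    def upsert(self, post_id, text):
        """Call when a post is published or updated."""
        self.remove(post_id)  # subtract the previous version first, if any
        counts = count_words(text)
        self.per_post[post_id] = counts
        self.totals.update(counts)

    def remove(self, post_id):
        """Call when a post is deleted."""
        old = self.per_post.pop(post_id, None)
        if old is not None:
            self.totals.subtract(old)
            # Unary + drops words whose count fell to zero.
            self.totals = +self.totals
```

In WordPress, the equivalent of upsert and remove would presumably be hooked to post save and delete actions (for example save_post and before_delete_post), with the totals persisted in the custom table rather than kept in memory.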

Have fun and please share the outcome.
