PROJECT

F.R.I.D.A.Y Obsidian Dictation

Overview

In an earlier project, I built my own voice-activated smart assistant powered by AI – F.R.I.D.A.Y. In another project, I built an Obsidian knowledgebase with cross-device syncing using Git and GitHub. It’s a natural progression to integrate my Obsidian knowledgebase with F.R.I.D.A.Y.

F.R.I.D.A.Y already has a dictate function that will use vocal recognition to create text from speech, as well as a reformat feature to turn that text into another format. In this project, I will be expanding the dictate function so that it properly formats the text into a nicely-written format and appends that to a select Obsidian markdown file. This is essentially: speaking into a document.

Recoding F.R.I.D.A.Y

Firstly, the automatic splitting of text when “and” is present into multiple commands has to be bypassed because “and” is a super natural thing to say in a normal sentence.  To do this, I altered the process_request code to the following:

Secondly, within the process_request function itself, I wanted to keep the original dictate feature but allow for this new full dictation feature. To do this, I added code to look for “fully” after dictate.

The new full_dictation() function

The new full_dictation() function is fairly light-weight. It simply takes the remainder of the text after “dictate fully”, uses the language tool to clean it up, then passes that to llama with the request to clean it up and format is in a very specific way:

– Replace written punctuation with symbols

– Splitting paragraphs into sentences

– Capitalise the start of sentences

– Don’t return anything except the formatted text

– Add a full stop at the end if there isn’t one

The result is a robust, multi-sentence reformatted dictation with punctuation that looks like it was typed out.

Main Obsidian

I then built into F.R.I.D.A.Y’s process_request() function a section on specific Obsidian commands.

The commands are:

Obsidian load – load an obsidian page into memory (just the file path)

Obsidian unload – unload the current page in memory if there is one

Obsidian append – triggers the full_dictation() function and appends the text to the current page

Obsidian Read – read the contents of the current page

Obsidian examine – proof-read the page and check if anything looks off or lacking context

Right at the top of the script outside of any function I’ve added a variable to hold the current page’s file path.

All following code blocks are nested under:

Obsidian Load & Unload

This option sets the obsidian_page variable to the file path of the markdown file you drag into the terminal. It could be a text file or anything simple, but as I’m using Obsidian it’s going to be markdown.

Obsidian unload simply resets the obsidian_page variable to None.

Obsidian Append

As the name suggests, this feature simply calls the full_dictation() function and opens the file saved in memory and appends the result of the full dictation function to the end of it.

Obsidian Read

Obsidian read is fairly simple in that it just reads the text from the page and vocalises it. The main issue is that the TTS model breaks down after a while so usage can’t be too long, so to get around this, F.R.I.D.A.Y first splits the text by paragraphs (\n\n) and then splits them by sentence (.) and reads the sentences one-by-one.

It also ignores ` and # starting strings in files to prevent it reading code block language and headers. I could change this to say, remove all # from the start of sentences and remove code blocks with additional logic, but the documents that I’ll be using this function on aren’t technical pages, rather more like scripts. The usage this function will be for doesn’t really call for anything more complex, but it’s something to keep in mind for later on, potentially.

Obsidian examine

Finally, the examine option will do what read does, in that it reads the file and splits it into paragraphs. The examine feature then takes paragraphs and sends the text to ollama with the prompt:

“Examine the following text and split the content to examine into sentences and number each sentence according to its place in the paragraph and feed back if any words seem irregular in context to the sentence, if there are any grammar mistakes, if large sentences could be split into multiple

and give a suggestion for each issue using the format: sentence number, sentence, issue, suggestion. Text to examine: “

It then returns in the terminal both the paragraph number and the response.

Summary

This functionality was born from a need, that I had half a book’s worth of quotes to type up and it was taking me ages, and I realised that it takes a lot less time and effort to simply read it out loud. This feature took me a little time to code, but saved me hours of typing.

This was the first expansion of F.R.I.D.A.Y since I finished the first build. I’m very happy with the result and I hope that I continue to build our F.R.I.D.A.Y this year to be even better.