Intro to Rust for JS developers - Part 3

Repo

Welcome back! In our journey to learn Rust, we've already created a program that can read user inputs, validate URL formats, and fetch resources. Now, as our project grows, we need to focus on clean and manageable code organization.

In this third article, we'll delve into Rust's structures like enums, structs, and modules to improve our codebase. An ideal structure would be:

  • A get_article function that takes a url and returns an Article if the url is valid, or a custom error otherwise. The signature could be &str -> Result<Article, ValidationError>
  • An Article type with url, title and content properties
  • An audiofy function that takes an Article and returns an Audio type

1. Custom validation errors: enum

Let’s start by creating our custom validation errors. An enum is a good fit here: it is a custom data type that you define by enumerating its possible variants. JavaScript has no built-in enum, but it’s common to use a plain object like this to achieve a similar goal:

const Sizes = {
  Small: 'small',
  Medium: 'medium',
  Large: 'large'
};

For our app we need an enum that handles four errors:

enum ValidationError {
    InvalidUrlFormat, // Not a valid URL
    UnreachableResource, // Error status from get request
    ArticleNotFound, // <article> not found
    TitleNotFound, // <h1> not found
}
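To see how such an enum is consumed, here is a minimal sketch using match. The error_message helper is hypothetical (it is not part of the project's code) and the messages simply reuse the comments above.

```rust
// The four variants listed above.
enum ValidationError {
    InvalidUrlFormat,
    UnreachableResource,
    ArticleNotFound,
    TitleNotFound,
}

// Hypothetical helper, for illustration only: maps each variant to a message.
// `match` has to cover every variant, otherwise the code does not compile.
fn error_message(error: &ValidationError) -> &'static str {
    match error {
        ValidationError::InvalidUrlFormat => "Not a valid URL",
        ValidationError::UnreachableResource => "Error status from get request",
        ValidationError::ArticleNotFound => "<article> not found",
        ValidationError::TitleNotFound => "<h1> not found",
    }
}

fn main() {
    let error = ValidationError::TitleNotFound;
    println!("{}", error_message(&error));
}
```

Compared to the JavaScript object, the compiler knows the full list of variants, so forgetting one in the match is caught at compile time.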

Option Enum

Rust has a built-in enum that is used extensively throughout the language: the Option enum. It’s an elegant solution for handling the presence or absence of a value, where JavaScript would use null or undefined.

Option encapsulates the idea of an optional value in a type-safe way. It has two variants:

  • Some(T) is used when a value is present, and it wraps a value of type T.
  • None represents the absence of a value.
pub enum Option<T> {
  Some(T),
  None,
}

We will use it as well to handle the absence of a value (the title or the content) in the http response.
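As a small illustration of Option in practice, here is a sketch with a hypothetical find_title helper. It uses a naive string search rather than a real HTML parser, so treat it as a toy example and not as the way the app extracts titles.

```rust
// Hypothetical helper, for illustration only: returns the text between
// the first <h1> and the following </h1>, or None if either tag is missing.
fn find_title(html: &str) -> Option<String> {
    let start = match html.find("<h1>") {
        Some(index) => index + "<h1>".len(),
        None => return None,
    };
    let end = match html[start..].find("</h1>") {
        Some(index) => start + index,
        None => return None,
    };
    Some(html[start..end].to_string())
}

fn main() {
    // The caller has to handle both variants explicitly.
    match find_title("<h1>Hello</h1>") {
        Some(title) => println!("Title: {}", title),
        None => println!("No title found"),
    }

    // unwrap_or supplies a fallback value when the Option is None.
    let title = find_title("<p>no heading</p>").unwrap_or(String::from("Untitled"));
    println!("Title: {}", title);
}
```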

2. Article type: struct

To build the Article type we will use a struct. A struct is a custom data type that groups related data together into a single, cohesive unit. In some ways, a struct in Rust is similar to a class in object-oriented programming languages, but there are significant differences in their features and how they are used.

  • Structs in Rust are primarily for data layout: the definition itself contains only fields, no behaviors (methods). You can still associate functions and methods with a struct through impl blocks.
struct Article {
  url: String,
  title: Option<String>,
  content: Option<String>,
}

impl Article {
    fn new(url: String) -> Article {
        Article {
            url: url,
            title: None,
            content: None,
        }
    }
}

Note that new is not a keyword: it is only the conventional name for a function that creates a new instance of a type.

  • Rust structs do not support inheritance. To share functionalities between structs, Rust uses traits. They are similar to interfaces in other languages.
trait Describable {
    fn describe(&self) -> String;
}

impl Describable for Article {
    fn describe(&self) -> String {
        match &self.title {
            Some(title) => {
                format!("{}", title)
            },
            None => {
                format!("No title found!")
            }
        }
    }
}
  • Last but not least, variables holding a struct are immutable by default. You have to explicitly use the mut keyword to make an instance mutable.
let mut article = Article {
    url: "https://www.sadry.dev/articles/intro-to-rust-for-js-devs-part-2".to_string(),
    title: None,
    content: None,
};

article.title = Some("title goes here".to_string());
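Mutability in Rust belongs to the variable binding, not to the struct type itself. The sketch below illustrates this with a trimmed-down Article and a hypothetical set_title helper (neither the helper nor the example.com url come from the project).

```rust
// A trimmed-down Article, just for this example.
struct Article {
    url: String,
    title: Option<String>,
}

// Hypothetical helper: takes a mutable reference so that the caller's
// value is updated in place.
fn set_title(article: &mut Article, title: &str) {
    article.title = Some(title.to_string());
}

fn main() {
    let frozen = Article {
        url: String::from("https://example.com"),
        title: None,
    };
    // frozen.title = Some(String::from("nope")); // would not compile: `frozen` is not `mut`

    // Moving the value into a `mut` binding makes the same data mutable.
    let mut article = frozen;
    set_title(&mut article, "title goes here");
    println!("{} -> {:?}", article.url, article.title);
}
```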

3. Refactoring: custom types

With these Article and ValidationError types, let’s create the get_article function. As a starting point, it should take a url passed as an argument and return an Article instance if the url is valid, or an InvalidUrlFormat error otherwise.

fn get_article(url: String) -> Result<Article,ValidationError> {
    if is_valid_url(&url) {
        let article = Article::new(url);
    
        Ok(article)
    } else {
       Err(ValidationError::InvalidUrlFormat)
    }
}

#[tokio::main]
async fn main() -> Result<(), reqwest::Error> {
    let args: Vec<String> = args().collect();
    println!("Audiofy: Transform Your Favorite Articles into a Podcast 🚀");

    for (index, arg) in args.iter().skip(1).enumerate() {
        match get_article(arg) {
            Ok(article) => {
                println!("Article created");
            },
            Err(e) => {
                print!("Error {:?}", e);
            }
        }
    }

    Ok(())
}

This code produces two compilation errors:

  • At the get_article(arg) call: expected String but found &String
  • At the print! call in the Err branch: ValidationError doesn't implement Debug

What does it mean?

For the first error, get_article is expecting a String but received a reference to a String, &String. Note that it’s &String, not &str. Both are references to string data, but they differ in their nature and usage:

  • &str (string slice): a borrowed view into string data, either a whole string or a part of one. It is typically used to pass strings around efficiently without copying. The reference stores a pointer and a length, so the length is known at runtime rather than at compile time, and the data cannot be modified through it. String literals are of type &str.
  • &String (reference to a String object): String is a growable, heap-allocated data structure whose content can be changed and whose size is not known at compile time. It can expand to hold new data as needed. When you have a &String, you essentially have a pointer to a String object but without ownership, and you cannot modify the content of the String through this shared reference. A &String lets a function read a String while avoiding ownership transfer and unnecessary data cloning.
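As an aside, the sketch below shows how the two reference types interact. The char_count helper is hypothetical and is not a change to get_article; it only illustrates that a function taking &str accepts string literals, slices, and &String alike.

```rust
// Hypothetical helper: a parameter of type &str accepts any borrowed string data.
fn char_count(text: &str) -> usize {
    text.chars().count()
}

fn main() {
    let owned: String = String::from("https://example.com");
    let borrowed: &String = &owned;

    // A string literal is already a &str.
    println!("{}", char_count("literal"));
    // A &String is automatically converted to &str (deref coercion).
    println!("{}", char_count(borrowed));
    // A slice of part of the String is also a &str.
    println!("{}", char_count(&owned[8..]));
}
```

The conversion does not work in the other direction: a function that expects an owned String cannot be given a &String without creating a new String first, which is the situation the first error describes.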

The Rust compiler is very handy. It suggests a solution: try using a conversion method: .to_string() :

let url = arg.to_string();
match get_article(url){
    ...
} 

The second error, ValidationError doesn't implement Debug, occurs when you try to use functionality that requires the Debug trait, but the type using it (ValidationError in our case) does not have it implemented.

The Debug trait in Rust is a part of the standard library, specifically designed for formatting a value in a way that is suitable for debugging purposes. It's often used for printing values to the console. The Debug trait is implicitly required in several situations. The most common one is when using the {:?} or {:#?} format specifiers in macros like println!, format!, etc.

If you try to print an instance of ValidationError using {:?} without having implemented the Debug trait for ValidationError, Rust will throw a compilation error. This is because Rust cannot infer how to format ValidationError for debugging.

To resolve this error, we need to implement the Debug trait for our ValidationError type. There are two common ways to do this:

  1. Adding #[derive(Debug)] above your enum or struct definition tells the Rust compiler to automatically generate an implementation of the Debug trait for your type. It compiles but does not give a user friendly custom error message:
#[derive(Debug)]
enum ValidationError {
    InvalidUrlFormat,
    UnreachableResource,
    ArticleNotFound,
    TitleNotFound,
}
  2. Manually implementing the Debug trait. This approach gives us the flexibility to define exactly how ValidationError should be formatted when printed for debugging purposes (doc):
use std::fmt;

impl fmt::Debug for ValidationError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        // Custom formatting logic here
        match self {
            ValidationError::InvalidUrlFormat => write!(f, "Invalid URL Format"),
            ValidationError::UnreachableResource => write!(f, "Unreachable Resource"),
            ValidationError::TitleNotFound => write!(f, "Title Not Found"),
            ValidationError::ArticleNotFound => write!(f, "Article Not Found"),
        }
    }
}

Let’s break this code down:

  • f in the signature of the fmt function, fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result, is a mutable reference to a Formatter object. This Formatter is a core component of Rust's formatting system, provided by the standard library, and it's used to manage the output of formatted text.
  • Type &mut fmt::Formatter<'_>:
    • &mut: This indicates that f is a mutable reference. You need a mutable reference because the Formatter is being modified during the formatting process (e.g., writing data into it).
    • fmt::Formatter: This is a struct provided by Rust's standard library (std::fmt) that implements functionality for formatting strings. It handles various formatting tasks like padding, alignment, number formatting, etc.
    • <'_>: This is a lifetime annotation. It signifies that the Formatter reference has a lifetime, but it's being determined implicitly by the Rust compiler. This is often used in trait implementations where lifetimes are involved.
  • The Formatter object, f, is what your implementation of fmt writes the formatted output to. When you use macros like println! or format!, they internally use this Formatter to construct the final string.
  • The fmt function writes to f using the write! macro or methods like write_str. These append text to the Formatter's output, effectively building the output string piece by piece.
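To compare the two approaches side by side, here is a minimal sketch. DerivedError and ManualError are hypothetical names used only so that both implementations can live in the same file; each has a single variant to keep the example short.

```rust
use std::fmt;

// Derived implementation: {:?} prints the variant name as written in the source.
#[derive(Debug)]
enum DerivedError {
    InvalidUrlFormat,
}

// Manual implementation: {:?} prints whatever text we decide to write.
enum ManualError {
    InvalidUrlFormat,
}

impl fmt::Debug for ManualError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            ManualError::InvalidUrlFormat => write!(f, "Invalid URL Format"),
        }
    }
}

fn main() {
    println!("{:?}", DerivedError::InvalidUrlFormat); // prints "InvalidUrlFormat"
    println!("{:?}", ManualError::InvalidUrlFormat); // prints "Invalid URL Format"
}
```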

Now the code compiles 🤘🏽.

4. Refactoring: modules

Just like in JavaScript, modules in Rust are used to organize code into logical units. In Rust, a module is declared using the mod keyword. Unlike JavaScript, where every file is implicitly a module and export exposes its functions, classes, or variables, Rust requires modules to be declared explicitly: mod defines a new module.

mod my_module {
    // Functions, structs, enums, etc., go here
} 

A module can be defined within a single file, or it can be split across multiple files. For instance, declaring mod my_module; tells Rust to look for either a my_module.rs file or a my_module/mod.rs file.

By default, all items (functions, structs, enums, etc.) in a Rust module are private. They can only be accessed by code within the same module or its child modules. To make an item public in Rust, you use the pub keyword.

To use a function or type from another module, you need to bring it into scope. This is akin to using import in JavaScript. In Rust, you use use to bring items into scope.

mod my_module {
    pub fn my_function() {
        // ...
    }
}

use my_module::my_function;

fn main() {
    my_function(); // Now accessible
}

Both use and mod are keywords used for managing modules, but they serve different purposes:

  • The mod keyword is used to declare a new module. This is akin to defining a new namespace or a container for code (like functions, structs, enums, etc.).
  • The use keyword is used to bring items (functions, structs, enums, etc.) into scope. This simplifies access to items from other modules or crates.
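The sketch below puts mod, pub and use together and shows the effect of privacy. The validation module here is an inline toy version written for this example, and its looks_like_url and has_scheme functions are hypothetical: the check is deliberately naive and is not the url validation used by the app.

```rust
mod validation {
    // Private: only visible inside `validation` and its child modules.
    fn has_scheme(url: &str) -> bool {
        url.starts_with("http://") || url.starts_with("https://")
    }

    // Public: callable from outside the module.
    pub fn looks_like_url(url: &str) -> bool {
        has_scheme(url) && !url.contains(' ')
    }
}

// Bring the public function into scope under its short name.
use validation::looks_like_url;

fn main() {
    println!("{}", looks_like_url("https://www.sadry.dev"));
    println!("{}", looks_like_url("foo"));
    // validation::has_scheme("foo"); // would not compile: `has_scheme` is private
}
```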

Article module

Let’s create a new file called article.rs and move all the code related to getting an article into it:

// article.rs

use scraper::{Html, Selector}; 
use crate::validation::{ValidationError, is_valid_url}; 

// Overkill but good for practice ^^
pub trait Describable {
    fn describe(&self) -> String;
}

fn parse_html(html: &str) -> (String, String) {
    // Parse the HTML
    let document = Html::parse_document(html);
    // Create a CSS selector
    let h1_selector = Selector::parse("h1").unwrap();
    // Get the h1 node content
    let h1 = document.select(&h1_selector).next().unwrap();
    let h1_content = h1.text().collect::<String>();

    // Get the article node content
    let article_selector = Selector::parse("article").unwrap();
    let article = document.select(&article_selector).next().unwrap();
    let article_content = article.text().collect::<String>();

   (
    h1_content,
    article_content
   )
}

async fn fetch_url(url: &str) -> Result<String, ValidationError> {
    println!("⏳ Fetching URL in progress: {}", url);
    
    match reqwest::get(url).await {
        Ok(response) => {
            match response.text().await {
                Ok(html) => {
                   Ok(html)
                }
                _ => {
                    Err(ValidationError::ArticleNotFound)
                }
            }
        }
        _ => {
            Err(ValidationError::UnreachableResource)
        }
    }
}

pub struct Article {
    url: String,
    title: Option<String>,
    content:  Option<String>,
}


impl Article {
    pub fn new(url: String) -> Article {
        Article {
            url: url,
            title: None,
            content: None,
        }
    }
}

impl Describable for Article {
    fn describe(&self) -> String {
        match &self.title {
            Some(title) => {
                format!("{}", title)
            },
            None => {
                format!("No title found!")
            }
        }
    }
}

pub fn get_article(url: String) -> Result<Article, ValidationError> {
    if is_valid_url(&url) {
        let article = Article::new(url);

        Ok(article)
    } else {
        Err(ValidationError::InvalidUrlFormat)
    }
}

Validation module

Let’s create another file for validation: validation.rs

// validation.rs

use std::fmt;
use url::Url;

// #[derive(Debug)]
pub enum ValidationError {
    InvalidUrlFormat,
    UnreachableResource,
    ArticleNotFound,
    TitleNotFound,
}

impl fmt::Debug for ValidationError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        // Custom formatting logic here
        match self {
            ValidationError::InvalidUrlFormat => write!(f, "Invalid URL Format"),
            ValidationError::UnreachableResource => write!(f, "Unreachable Resource"),
            ValidationError::ArticleNotFound => write!(f, "Article Not Found"),
            ValidationError::TitleNotFound => write!(f, "Title Not Found"),
        }
    }
}

pub fn is_valid_url(url: &str) -> bool {
    let result = Url::parse(url);
    result.is_ok() // Without ; at end, the value is returned automatically. It's a shorthand for return result.is_ok();
}
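The comment in is_valid_url touches on something that often surprises JavaScript developers: the last expression of a function, written without a semicolon, is its return value. A minimal sketch, with hypothetical function names chosen for this example:

```rust
// Tail expression: no semicolon, so the value is returned.
fn double(n: i32) -> i32 {
    n * 2
}

// Equivalent version with an explicit `return`.
fn double_explicit(n: i32) -> i32 {
    return n * 2;
}

// `if` is an expression too, so it can be the tail expression of a function.
fn sign(n: i32) -> &'static str {
    if n < 0 {
        "negative"
    } else {
        "non-negative"
    }
}

fn main() {
    println!("{} {} {}", double(21), double_explicit(21), sign(-1));
}
```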

Attention:

First, it's important to remember that everything is private by default. This means that when you move code to a different module, any structs, enums or functions used outside the module where they are defined need to be explicitly made public with the pub keyword.

Second, declaring modules with mod article; and mod validation; in the main file only makes the modules themselves available: their items still have to be referred to by their full path, such as article::get_article. To call them by their short name, you “import” them with the use keyword. The main file becomes:

// main.rs

mod article;
mod validation;

use article::{get_article, Describable};
use reqwest;
use std::env::args;
use tokio;

#[tokio::main]
async fn main() -> Result<(), reqwest::Error> {
    let args: Vec<String> = args().collect();
    println!("Audiofy: Transform Your Favorite Articles into a Podcast 🚀");

    for (index, arg) in args.iter().skip(1).enumerate() {
        let url = arg.to_string();
        match get_article(url) {
            Ok(article) => {
                println!("Article created");
            }
            Err(e) => {
                print!("Error {:?}", e);
            }
        }
    }

    Ok(())
}

Great! Let’s run cargo fmt to make sure our code is nicely formatted.

Fetching:

The new method in Article doesn’t do much at the moment: it creates an instance with a url property, and that is all. The goal here is to fetch the url and parse the response to get the title and the article’s content.

First we need to update fetch_url to return a Result<String, ValidationError> and remove the parsing from it:

async fn fetch_url(url: &str) -> Result<String, ValidationError>{
    println!("Fetching URL {} in progress...", url);
    let response = reqwest::get(url).await;

    match response {
        Ok(res) => {
            let url_body = res.text().await;
            match url_body {
                Ok(html) => {
                    println!("✅ Article fetched!");
                    Ok(html)
                }
                Err(_) => {
                    Err(ValidationError::UnreachableResource)
                }
            }
        }
        Err(_) => {
            Err(ValidationError::UnreachableResource)
        }
    }
}

Parsing

The parsing function should take the html returned from the http request and extract the title and the article’s content. If something goes wrong it should return an error, but this first version simply unwraps:

fn parse_html(html: &str) -> (String, String) {
   let document = Html::parse_document(html);

   let h1_selector = Selector::parse("h1").unwrap();
   let h1 = document.select(&h1_selector).next().unwrap();
   let h1_content = h1.text().collect::<String>();

   let article_selector = Selector::parse("article").unwrap();
   let article = document.select(&article_selector).next().unwrap();
   let article_content = article.text().collect::<String>();

  (h1_content, article_content)
}

Instantiating

Let’s update the new method to instantiate an article with the fetched data:

pub async fn new(url: String) -> Result<Article, ValidationError> {
    let mut article = Article {
        url,
        title: None,
        content: None,
    };

    let fetched_article = fetch_url(&article.url).await?;
    let (title, content) = parse_html(&fetched_article);
    article.title = Some(title);
    article.content = Some(content);

    Ok(article)
}

5. Parsing errors: combinators

Have you noticed that the parse_html method contains several unwrap calls? This approach is not ideal as it implies that potential errors in those lines of code are not being handled. Instead, they will lead to a panic in the program.

fn parse_html(html: &str) -> (String, String) {
    let document = Html::parse_document(html);

    let h1_selector = Selector::parse("h1").unwrap();
    let h1 = document.select(&h1_selector).next().unwrap();
    let h1_content = h1.text().collect::<String>();

    let article_selector = Selector::parse("article").unwrap();
    let article = document.select(&article_selector).next().unwrap();
    let article_content = article.text().collect::<String>();

   (h1_content, article_content)
}

To effectively handle these errors, we'll utilize two Rust helper functions known as combinators: .map_err and .ok_or. These functions provide a more graceful way to manage potential errors (doc). Both .map_err and .ok_or are methods used for error handling, but they serve different purposes and are used in different contexts:

  • .map_err is used with Result<T, E> types. It's a method that takes a closure and applies it to the error (E) part of a Result. If the Result is Ok, nothing happens. If it's Err, the closure is applied to transform the error into a different type.
  • .ok_or is used with Option<T> types. It's a method that enables you to convert an Option into a Result. You provide a default error value that is used if the Option is None.
fn parse_html(html: &str) -> Result<(String, String), ValidationError> {
    let document = Html::parse_document(html);

    let h1_selector = Selector::parse("h1").map_err(|_| ValidationError::TitleNotFound)?;
    let h1 = document.select(&h1_selector).next().ok_or(ValidationError::TitleNotFound)?;
    let h1_content = h1.text().collect::<String>();

    let article_selector = Selector::parse("article").map_err(|_| ValidationError::ArticleNotFound)?;
    let article = document.select(&article_selector).next().ok_or(ValidationError::ArticleNotFound)?;
    let article_content = article.text().collect::<String>();

    Ok((h1_content, article_content))
}
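To try the two combinators without the scraper crate, here is a standalone sketch using only the standard library. ConfigError and first_port are hypothetical names made up for this example; the pattern is the same as in parse_html.

```rust
#[derive(Debug, PartialEq)]
enum ConfigError {
    MissingPort,
    InvalidPort,
}

// Hypothetical helper: takes the first item of a list and parses it as a port number.
fn first_port(values: &[&str]) -> Result<u16, ConfigError> {
    // first() returns an Option; ok_or turns None into Err(MissingPort).
    let raw = values.first().ok_or(ConfigError::MissingPort)?;
    // parse() returns a Result with a ParseIntError; map_err replaces it with our own error.
    let port = raw.parse::<u16>().map_err(|_| ConfigError::InvalidPort)?;
    Ok(port)
}

fn main() {
    println!("{:?}", first_port(&["8080"]));
    println!("{:?}", first_port(&[]));
    println!("{:?}", first_port(&["not-a-port"]));
}
```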

In the new method of the Article struct, we modify the call to parse_html as follows:

let (title, content) = parse_html(&fetched_article)?;

The question mark (?) at the end of this line is crucial. It indicates that if parse_html returns an Ok value, then title and content will be assigned accordingly. However, if it results in an Err, the error (in this case, a ValidationError) will be automatically propagated to the caller.
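To make the behavior of ? more concrete, the sketch below writes the same function twice: once with ? and once with the match it roughly stands for. The function names are hypothetical, and the expansion is simplified (the real operator can also convert the error type).

```rust
use std::num::ParseIntError;

// With `?`: returns early with the error if parsing fails.
fn double_with_question_mark(input: &str) -> Result<i32, ParseIntError> {
    let n = input.parse::<i32>()?;
    Ok(n * 2)
}

// Roughly what `?` does here, written out by hand.
fn double_with_match(input: &str) -> Result<i32, ParseIntError> {
    let n = match input.parse::<i32>() {
        Ok(value) => value,
        Err(error) => return Err(error),
    };
    Ok(n * 2)
}

fn main() {
    println!("{:?}", double_with_question_mark("21"));
    println!("{:?}", double_with_match("21"));
    println!("{:?}", double_with_question_mark("abc").is_err());
}
```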

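The last point to address is the call to get_article in the main function. Because Article::new is now async, get_article has to become an async fn that awaits it, so calling get_article returns a Future that must itself be awaited. This ensures that the asynchronous operation completes before proceeding with the rest of the program.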

match get_article(url).await {
            Ok(article) => {
                println!("Article created");
            }
            Err(e) => {
                print!("Error {:?}", e);
            }
        }

Great! Let’s try this out:

// With invalid url
cargo run foo

// Output
Audiofy: Transform Your Favorite Articles into a Podcast 🚀
Error Invalid URL Format%

// With valid url
cargo run https://www.sadry.dev/articles/intro-to-rust-for-js-devs-part-2

// Output
Fetching URL https://www.sadry.dev/articles/intro-to-rust-for-js-devs-part-2 in progress...
✅ Article fetched!
Article created

6. Console messages: raw string

Let’s enhance our app by displaying some nice feedback messages to the user. First, let’s make the title stand out by using a raw string. A raw string is a type of string literal in which backslashes (and, with the r#"..."# form, double quotes) are taken literally, so characters that would normally need to be escaped in a regular string can be written as-is.

let app_title = r#"
     ************************************************************************
     *                                                                      *
     *      Audiofy: Transform Your Favorite Articles into a Podcast 🚀     *
     *                                                                      *
     ************************************************************************
    "#;
 println!("{}",app_title);
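The banner above contains nothing that needs escaping, so the benefit is easier to see with a text that does. A small sketch, with hypothetical function names and a made-up sample text, comparing a regular literal and a raw one:

```rust
// Regular string literal: backslashes and double quotes must be escaped.
fn regular_literal() -> &'static str {
    "He said \"hi\" \\o/"
}

// Raw string literal: the same text, written as-is between r#" and "#.
fn raw_literal() -> &'static str {
    r#"He said "hi" \o/"#
}

fn main() {
    println!("{}", regular_literal());
    println!("{}", raw_literal());
    // Both literals produce exactly the same string.
    println!("{}", regular_literal() == raw_literal());
}
```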

We should also inform the user if no argument was supplied:

let args: Vec<String> = env::args().collect();
if args.len() <= 1 { // 1 and not 0 because the first arg is the path to the program
  println!("🚫 No arguments were supplied!");
}
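One possible way to make this check easier to reuse is to separate the program path from the user-supplied arguments. The user_args helper below is hypothetical and is not part of the project; it is only a sketch of the idea.

```rust
use std::env;

// Hypothetical helper: drops the first element (the program path)
// and returns only the arguments typed by the user.
fn user_args(args: &[String]) -> &[String] {
    if args.is_empty() {
        args
    } else {
        &args[1..]
    }
}

fn main() {
    let args: Vec<String> = env::args().collect();
    let urls = user_args(&args);

    if urls.is_empty() {
        println!("🚫 No arguments were supplied!");
        return;
    }

    for url in urls {
        println!("{}", url);
    }
}
```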

Let’s remove the logs from fetch_url function and put them in the main call as follows:

mod article;
mod validation;

use article::{get_article, Describable};
use reqwest;
use std::env;
use tokio;

#[tokio::main]
async fn main() -> Result<(), reqwest::Error> {

    let app_title = r#"
    ************************************************************************
    *                                                                      *
    *      Audiofy: Transform Your Favorite Articles into a Podcast 🚀     *
    *                                                                      *
    ************************************************************************
   "#;
    println!("{}",app_title);


    let args: Vec<String> = env::args().collect();
    // 1 and not 0 because the first arg is the path to the program
    if args.len() <= 1 { 
        println!("🚫 No arguments were supplied!");
    }

    for (index, arg) in args.iter().skip(1).enumerate() {
        let url = arg.to_string();
        match get_article(url).await {
            Ok(article) => {
                println!("✅ Article fetched!");
                println!("⏩ Title: {}", article.describe());
                println!("🎤 Audiofy...");
            }
            Err(e) => {
                print!("❌ Failed to process argument at index {}: {:?}", index, e);
            }
        }
    }

    Ok(())
}

7. Let’s try this out

With a valid url

cargo run https://www.sadry.dev/articles/intro-to-rust-for-js-devs-part-2

audiofy: test with a valid URL

With a valid url that has no content

cargo run https://www.google.com

audiofy: test with a valid URL but without content

With invalid url

cargo run foo

audiofy: test with invalid URL

With no arguments

cargo run

audiofy: test without arguments

8. To be continued

Fantastic results! I'm very pleased with what we've achieved so far. We can now process URLs provided by users and successfully retrieve content. There's just one more step remaining: converting the article content into an audio file using the OpenAI API.