Intro to Rust for JS developers - Part 3
Welcome back! In our journey to learn Rust, we've already created a program that can read user inputs, validate URL formats, and fetch resources. Now, as our project grows, we need to focus on clean and manageable code organization.
In this third article, we'll delve into Rust constructs such as enums, structs, and modules to improve our codebase. An ideal structure would be:
- A get_article function that takes a URL and returns an Article if the URL is valid, or a custom error. The signature could be &str -> Result<Article, ValidationError>
- An Article type with url, title and content properties
- An audiofy function that takes an Article and returns an Audio type
1. Custom validation errors: enum
Let’s start by creating our custom validation errors. An enum is a good fit for this: it is a custom data type defined by enumerating its possible variants. JavaScript has no built-in enum, but it’s common to use an object like this to achieve a similar goal:
const Sizes = {
  Small: 'small',
  Medium: 'medium',
  Large: 'large'
};
For our app we need an enum that handles four errors:
enum ValidationError {
    InvalidUrlFormat,    // Not a valid URL
    UnreachableResource, // Error status from get request
    ArticleNotFound,     // <article> not found
    TitleNotFound,       // <h1> not found
}
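To get a feel for how such an enum is consumed, here is a minimal sketch. The user_message helper is hypothetical (it is not part of the project's code); it only illustrates that a match on an enum has to cover every variant.

```rust
// Same four variants as the ValidationError enum above.
enum ValidationError {
    InvalidUrlFormat,
    UnreachableResource,
    ArticleNotFound,
    TitleNotFound,
}

// Hypothetical helper (an assumption for illustration): maps each variant to a message.
fn user_message(error: &ValidationError) -> &'static str {
    // match must cover every variant, otherwise the code does not compile.
    match error {
        ValidationError::InvalidUrlFormat => "The URL is not valid",
        ValidationError::UnreachableResource => "The resource could not be reached",
        ValidationError::ArticleNotFound => "No <article> element was found",
        ValidationError::TitleNotFound => "No <h1> element was found",
    }
}

fn main() {
    let errors = [
        ValidationError::InvalidUrlFormat,
        ValidationError::UnreachableResource,
        ValidationError::ArticleNotFound,
        ValidationError::TitleNotFound,
    ];
    for error in &errors {
        println!("{}", user_message(error));
    }
}
```

Unlike the JavaScript object, removing one arm of the match makes the compiler complain, so a newly added variant cannot be silently forgotten.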
Option Enum
Rust has a built-in enum, Option, that is used extensively to represent the presence or absence of a value in a type-safe way. It has two variants:
- Some(T) is used when a value is present, and it wraps a value of type T.
- None represents the absence of a value.
pub enum Option<T> {
    Some(T),
    None,
}
We will use it as well to handle the absence of a value (content) in the HTTP response.
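As a small illustration of Option in practice, the sketch below uses a hypothetical first_word helper (not part of the project) and shows two common ways of reading the value: match and unwrap_or.

```rust
// Hypothetical helper: returns the first word of a text, if there is one.
fn first_word(text: &str) -> Option<&str> {
    text.split_whitespace().next()
}

fn main() {
    // match forces us to handle both the Some and the None case.
    match first_word("hello world") {
        Some(word) => println!("First word: {}", word),
        None => println!("The text is empty"),
    }

    // unwrap_or supplies a fallback when the value is absent.
    let fallback = first_word("   ").unwrap_or("(nothing)");
    println!("{}", fallback);
}
```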
2. Article type: struct
To build the Article type we will use a struct. A struct groups related data together into a single, cohesive custom data type. In some ways, a struct in Rust is similar to a class in object-oriented languages, but there are significant differences in features and usage.
- Structs in Rust are primarily for data layout. The definition itself doesn't include behavior (methods), although you can associate functions and methods with a struct in impl blocks.
struct Article {
    url: String,
    title: Option<String>,
    content: Option<String>,
}
impl Article {
    fn new(url: String) -> Article {
        Article {
            url: url,
            title: None,
            content: None,
        }
    }
}
new in this example is a conventional name for a function that creates a new instance of a type; it is not a keyword.
- Rust structs do not support inheritance. To share functionalities between structs, Rust uses traits. They are similar to interfaces in other languages.
trait Describable {
    fn describe(&self) -> String;
}
impl Describable for Article {
    fn describe(&self) -> String {
        match &self.title {
            Some(title) => {
                format!("{}", title)
            },
            None => {
                format!("No title found!")
            }
        }
    }
}
- Last but not least, variables holding a struct are immutable by default. You have to explicitly use the mut keyword to make an instance mutable.
let mut article = Article {
    // url is a String, so the string literal has to be converted
    url: "https://www.sadry.dev/articles/intro-to-rust-for-js-devs-part-2".to_string(),
    title: None,
    content: None,
};
article.title = Some("title goes here".to_string());
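The last two points (traits instead of inheritance, explicit mut) can be combined in one small sketch. Podcast, Book and print_description are hypothetical names used only for illustration; they are not part of the project.

```rust
// Hypothetical types, not part of the article's project.
trait Describable {
    fn describe(&self) -> String;
}

struct Podcast {
    name: String,
    episodes: u32,
}

struct Book {
    title: String,
}

impl Describable for Podcast {
    fn describe(&self) -> String {
        format!("{} ({} episodes)", self.name, self.episodes)
    }
}

impl Describable for Book {
    fn describe(&self) -> String {
        format!("Book: {}", self.title)
    }
}

// Works with any type that implements Describable.
fn print_description(item: &impl Describable) {
    println!("{}", item.describe());
}

fn main() {
    // mut is required because episodes is changed after creation.
    let mut podcast = Podcast { name: "Audiofy".to_string(), episodes: 1 };
    podcast.episodes += 1;

    let book = Book { title: "Rust for JS devs".to_string() };

    print_description(&podcast);
    print_description(&book);
}
```

The two structs share no parent type; the only thing they have in common is that both implement the same trait, which is enough for print_description to accept either.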
3. Refactoring: custom types
With these Article and ValidationError types, let’s create the get_article function. As a starting point, it should take a URL passed as an argument and return an Article instance if the URL is valid, or an InvalidUrlFormat error otherwise.
fn get_article(url: String) -> Result<Article, ValidationError> {
    if is_valid_url(&url) {
        let article = Article::new(url);
        Ok(article)
    } else {
        Err(ValidationError::InvalidUrlFormat)
    }
}
#[tokio::main]
async fn main() -> Result<(), reqwest::Error> {
    let args: Vec<String> = args().collect();
    println!("Audiofy: Transform Your Favorite Articles into a Podcast 🚀");
    for (index, arg) in args.iter().skip(1).enumerate() {
        match get_article(arg) {
            Ok(article) => {
                println!("Article created");
            },
            Err(e) => {
                print!("Error {:?}", e);
            }
        }
    }
    Ok(())
}
This code produces two compilation errors:
- At the get_article(arg) call
- In the print! call inside the Err branch: ValidationError doesn't implement Debug...
What does it mean?
For the first error, get_article expects a String but received a reference to a String, &String. Bear in mind that it’s &String, not &str. Both are references to string data, but they differ in nature and usage:
- &str (string slice): a borrowed view into string data, either a whole string or part of one. It is a pointer plus a length, it cannot grow, and the data behind it is read-only. String literals have this type.
- &String (reference to a String object): String itself is a growable, heap-allocated data structure that can expand to hold new data as needed. A &String is a pointer to a String without ownership, so you cannot modify the String's content through it. It is often seen in function arguments to avoid transferring ownership or cloning data, although &str is usually preferred there because a &String converts to &str automatically.
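A short sketch of the difference in practice. The count_chars function is a hypothetical example, not part of the project; it takes &str and is called with both a &String and a string literal.

```rust
// Accepting &str makes the function usable with both string literals and Strings.
fn count_chars(text: &str) -> usize {
    text.chars().count()
}

fn main() {
    let owned: String = String::from("audiofy");
    let literal: &str = "rust";

    // &owned is a &String; deref coercion turns it into &str at the call site.
    println!("{}", count_chars(&owned));
    println!("{}", count_chars(literal));

    // Going the other way needs an explicit conversion, which allocates a new String.
    let copy: String = literal.to_string();
    println!("{}", copy);
}
```

This asymmetry is why passing a &String to a function that expects an owned String fails: the compiler will not allocate a new String behind your back.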
The Rust compiler is very helpful here. It suggests a solution: try using a conversion method: .to_string():
let url = arg.to_string();
match get_article(url) {
    ...
}
The second error, ValidationError doesn't implement Debug, occurs when you try to use functionality that requires the Debug trait, but the type using it (ValidationError in our case) does not implement it.
The Debug trait is part of Rust's standard library and is designed for formatting a value in a way that is suitable for debugging.
It's often used for printing values to the console. The Debug trait is implicitly required in several situations. The most common one is when using the {:?} or {:#?} format specifiers in macros like println!, format!, etc.
If you try to print an instance of ValidationError using {:?} without having implemented the Debug trait for ValidationError, the compiler reports an error, because Rust does not know how to format ValidationError for debugging.
To resolve this error, we need to implement the Debug trait for our ValidationError type. There are two common ways to do this:
- Adding #[derive(Debug)] above your enum or struct definition tells the Rust compiler to automatically generate an implementation of the Debug trait for your type. It compiles, but the output is just the variant name rather than a user-friendly custom message:
#[derive(Debug)]
enum ValidationError {
    InvalidUrlFormat,
    UnreachableResource,
    ArticleNotFound,
    TitleNotFound,
}
- Manually implementing the Debug trait. This approach gives us the flexibility to define exactly how ValidationError should be formatted when printed for debugging purposes (doc):
use std::fmt;
impl fmt::Debug for ValidationError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        // Custom formatting logic here
        match self {
            ValidationError::InvalidUrlFormat => write!(f, "Invalid URL Format"),
            ValidationError::UnreachableResource => write!(f, "Unreachable Resource"),
            ValidationError::TitleNotFound => write!(f, "Title Not Found"),
            ValidationError::ArticleNotFound => write!(f, "Article Not Found"),
        }
    }
}
Let’s break this code down:
f in the signature of the fmt function, fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result, is a mutable reference to a Formatter object. This Formatter is a core component of Rust's formatting system, provided by the standard library, and it's used to manage the output of formatted text.
- Type &mut fmt::Formatter<'_>:
  - &mut: This indicates that f is a mutable reference. You need a mutable reference because the Formatter is being modified during the formatting process (e.g., writing data into it).
  - fmt::Formatter: This is a struct provided by Rust's standard library (std::fmt) that implements functionality for formatting strings. It handles various formatting tasks like padding, alignment, number formatting, etc.
  - <'_>: This is the anonymous lifetime. Formatter has a lifetime parameter, and '_ asks the compiler to infer it instead of naming it explicitly.
- The Formatter object, f, is what your implementation of fmt writes the formatted output to. When you use macros like println! or format!, they internally use a Formatter to construct the final output.
- The fmt function writes to f using the write! macro or methods like write_str. These append text to the Formatter's output, building it piece by piece.
Now the code compiles 🤘🏽.
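To compare the output of the two approaches side by side, here is a minimal sketch. DerivedError and ManualError are hypothetical single-variant enums used only so that both implementations can live in the same file.

```rust
use std::fmt;

// Derived implementation: prints the variant name as written in the source.
#[derive(Debug)]
enum DerivedError {
    InvalidUrlFormat,
}

// Manual implementation: prints whatever fmt writes.
enum ManualError {
    InvalidUrlFormat,
}

impl fmt::Debug for ManualError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            ManualError::InvalidUrlFormat => write!(f, "Invalid URL Format"),
        }
    }
}

fn main() {
    println!("{:?}", DerivedError::InvalidUrlFormat);
    println!("{:?}", ManualError::InvalidUrlFormat);
}
```

The standard library also has a Display trait (used by the {} specifier) that is meant for user-facing output; implementing Debug by hand, as done here, is a choice made for this tutorial rather than the only option.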
4. Refactoring: modules
Just like in JavaScript, modules in Rust are used to organize code into logical units. In Rust, a module is declared using the mod keyword. It should not be confused with export in JavaScript, which exposes functions, classes, or variables from an existing module: in Rust, mod is what defines a new module in the first place.
mod my_module {
    // Functions, structs, enums, etc., go here
}
A module can be defined within a single file, or it can be split across multiple files. For instance, declaring mod my_module; tells Rust to look for either a my_module.rs file or a my_module/mod.rs file.
By default, all items (functions, structs, enums, etc.) in a Rust module are private. They can only be accessed by code within the same module. To make an item public in Rust, you use the pub keyword.
To use a function or type from another module, you need to bring it into scope. This is akin to using import in JavaScript. In Rust, you use the use keyword to bring items into scope.
mod my_module {
    pub fn my_function() {
        // ...
    }
}
use my_module::my_function;
fn main() {
    my_function(); // Now accessible
}
Both use and mod are keywords used for managing modules, but they serve different purposes:
- The mod keyword is used to declare a new module. This is akin to defining a new namespace or a container for code (like functions, structs, enums, etc.).
- The use keyword is used to bring items (functions, structs, enums, etc.) into scope. This simplifies access to items from other modules or crates.
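The interplay between mod, pub and use can be sketched in a single file. The greetings module and its functions are hypothetical names chosen for illustration.

```rust
mod greetings {
    // Public: reachable from outside the module.
    pub fn hello(name: &str) -> String {
        format!("{}, {}!", prefix(), name)
    }

    // Private: only code inside greetings can call it.
    fn prefix() -> &'static str {
        "Hello"
    }
}

// Without this line, the function is still reachable as greetings::hello.
use greetings::hello;

fn main() {
    println!("{}", hello("Rust"));
    println!("{}", greetings::hello("JS"));
    // greetings::prefix(); // would not compile: prefix is private
}
```

Note that use only shortens the path. The module declaration is what makes the code part of the program, and pub is what decides whether an item can be reached from outside at all.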
Article module
Let’s create a new file called article.rs and move all the code related to getting an article into it:
// article.rs
use scraper::{Html, Selector};
use crate::validation::{ValidationError, is_valid_url};
// Overkill but good for practice ^^
pub trait Describable {
    fn describe(&self) -> String;
}
fn parse_html(html: &str) -> (String, String) {
    // Parse the HTML
    let document = Html::parse_document(html);
    // Create a CSS selector
    let h1_selector = Selector::parse("h1").unwrap();
    // Get the h1 node content
    let h1 = document.select(&h1_selector).next().unwrap();
    let h1_content = h1.text().collect::<String>();
    // Get the article node content
    let article_selector = Selector::parse("article").unwrap();
    let article = document.select(&article_selector).next().unwrap();
    let article_content = article.text().collect::<String>();
    (
        h1_content,
        article_content
    )
}
async fn fetch_url(url: &str) -> Result<String, ValidationError> {
    println!("⏳ Fetching URL in progress: {}", url);
    match reqwest::get(url).await {
        Ok(response) => {
            match response.text().await {
                Ok(html) => {
                    Ok(html)
                }
                _ => {
                    Err(ValidationError::ArticleNotFound)
                }
            }
        }
        _ => {
            Err(ValidationError::UnreachableResource)
        }
    }
}
pub struct Article {
    url: String,
    title: Option<String>,
    content: Option<String>,
}
impl Article {
    pub fn new(url: String) -> Article {
        Article {
            url: url,
            title: None,
            content: None,
        }
    }
}
impl Describable for Article {
    fn describe(&self) -> String {
        match &self.title {
            Some(title) => {
                format!("{}", title)
            },
            None => {
                format!("No title found!")
            }
        }
    }
}
pub fn get_article(url: String) -> Result<Article, ValidationError> {
    if is_valid_url(&url) {
        let article = Article::new(url);
        Ok(article)
    } else {
        Err(ValidationError::InvalidUrlFormat)
    }
}
Validation module
Let’s create another file for validation: validation.rs
// validation.rs
use std::fmt;
use url::Url;
// #[derive(Debug)]
pub enum ValidationError {
    InvalidUrlFormat,
    UnreachableResource,
    ArticleNotFound,
    TitleNotFound,
}
impl fmt::Debug for ValidationError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        // Custom formatting logic here
        match self {
            ValidationError::InvalidUrlFormat => write!(f, "Invalid URL Format"),
            ValidationError::UnreachableResource => write!(f, "Unreachable Resource"),
            ValidationError::ArticleNotFound => write!(f, "Article Not Found"),
            ValidationError::TitleNotFound => write!(f, "Title Not Found"),
        }
    }
}
pub fn is_valid_url(url: &str) -> bool {
    let result = Url::parse(url);
    // Without ; at the end, the value is returned automatically.
    // It's a shorthand for return result.is_ok();
    result.is_ok()
}
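The comment in is_valid_url touches on a Rust feature that has no direct JavaScript equivalent: the last expression of a function, written without a semicolon, is its return value. A minimal sketch with two hypothetical functions that behave identically:

```rust
// Explicit return with a trailing semicolon.
fn is_even_explicit(n: i32) -> bool {
    return n % 2 == 0;
}

// Implicit return: the last expression, without a semicolon, is the function's value.
fn is_even_implicit(n: i32) -> bool {
    n % 2 == 0
}

fn main() {
    println!("{}", is_even_explicit(4));
    println!("{}", is_even_implicit(7));

    // if/else is an expression too, so it can produce a value directly.
    let label = if is_even_implicit(10) { "even" } else { "odd" };
    println!("{}", label);
}
```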
Attention:
It's important to remember that everything is private by default. This means that when you move code to a different module, any structs, enums or functions used outside the module where they are defined need to be explicitly made public.
You can achieve this by using the pub keyword.
The second point: declaring modules with mod article; and mod validation; in the main file makes them part of the crate, but it does not bring their items into scope. You can either refer to items by their full path (for example article::get_article) or “import” them with the use keyword. The main file becomes:
// main.rs
mod article;
mod validation;
use article::{get_article, Describable};
use reqwest;
use std::env::args;
use tokio;
#[tokio::main]
async fn main() -> Result<(), reqwest::Error> {
    let args: Vec<String> = args().collect();
    println!("Audiofy: Transform Your Favorite Articles into a Podcast 🚀");
    for (index, arg) in args.iter().skip(1).enumerate() {
        let url = arg.to_string();
        match get_article(url) {
            Ok(article) => {
                println!("Article created");
            }
            Err(e) => {
                print!("Error {:?}", e);
            }
        }
    }
    Ok(())
}
Great! Let’s run cargo fmt to make sure our code is nicely formatted.
Fetching:
The new method in Article doesn’t do much at the moment. It creates an instance with a url property, and that is all. The goal here is to fetch the URL and parse the response to get the title and the article.
First we need to update fetch_url so that it returns a Result<String, ValidationError>, and remove the parsing from it:
async fn fetch_url(url: &str) -> Result<String, ValidationError> {
    println!("Fetching URL {} in progress...", url);
    let response = reqwest::get(url).await;
    match response {
        Ok(res) => {
            let url_body = res.text().await;
            match url_body {
                Ok(html) => {
                    println!("✅ Article fetched!");
                    Ok(html)
                }
                Err(_) => {
                    Err(ValidationError::UnreachableResource)
                }
            }
        }
        Err(_) => {
            Err(ValidationError::UnreachableResource)
        }
    }
}
Parsing
The parsing function should take the HTML returned from the HTTP request and extract the title and the article’s content. If something goes wrong it should return an error; for now it simply unwraps, which we will fix in section 5:
fn parse_html(html: &str) -> (String, String) {
    let document = Html::parse_document(html);
    let h1_selector = Selector::parse("h1").unwrap();
    let h1 = document.select(&h1_selector).next().unwrap();
    let h1_content = h1.text().collect::<String>();
    let article_selector = Selector::parse("article").unwrap();
    let article = document.select(&article_selector).next().unwrap();
    let article_content = article.text().collect::<String>();
    (h1_content, article_content)
}
Instantiating
Let’s update the new method to instantiate an article with fetched data:
async fn new(url: String) -> Result<Article, ValidationError> {
    let mut article = Article {
        url,
        title: None,
        content: None,
    };
    let fetched_article = fetch_url(&article.url).await?;
    let (title, content) = parse_html(&fetched_article);
    article.title = Some(title);
    article.content = Some(content);
    Ok(article)
}
5. Parsing errors: combinators
Have you noticed that the parse_html method contains several unwrap calls? This approach is not ideal as it implies that potential errors in those lines of code are not being handled. Instead, they will lead to a panic in the program.
fn parse_html(html: &str) -> (String, String) {
    let document = Html::parse_document(html);
    let h1_selector = Selector::parse("h1").unwrap();
    let h1 = document.select(&h1_selector).next().unwrap();
    let h1_content = h1.text().collect::<String>();
    let article_selector = Selector::parse("article").unwrap();
    let article = document.select(&article_selector).next().unwrap();
    let article_content = article.text().collect::<String>();
    (h1_content, article_content)
}
To handle these errors properly, we'll use two Rust helper methods known as combinators: .map_err and .ok_or. They provide a more graceful way to manage potential errors (doc).
Both .map_err and .ok_or are methods used for error handling, but they serve different purposes and are used in different contexts:
- .map_err is used with Result<T, E> types. It takes a closure and applies it to the error (E) part of a Result. If the Result is Ok, nothing happens. If it's Err, the closure is applied to transform the error into a different type.
- .ok_or is used with Option<T> types. It converts an Option into a Result. You provide the error value that is used if the Option is None.
fn parse_html(html: &str) -> Result<(String, String), ValidationError> {
    let document = Html::parse_document(html);
    let h1_selector = Selector::parse("h1").map_err(|_| ValidationError::TitleNotFound)?;
    let h1 = document.select(&h1_selector).next().ok_or(ValidationError::TitleNotFound)?;
    let h1_content = h1.text().collect::<String>();
    let article_selector = Selector::parse("article").map_err(|_| ValidationError::ArticleNotFound)?;
    let article = document.select(&article_selector).next().ok_or(ValidationError::ArticleNotFound)?;
    let article_content = article.text().collect::<String>();
    Ok((h1_content, article_content))
}
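The two combinators can also be tried in isolation, using only standard-library types. In this sketch, InputError, parse_port and first_arg are hypothetical names introduced for illustration.

```rust
// Hypothetical error type for this example.
#[derive(Debug, PartialEq)]
enum InputError {
    NotANumber,
    Empty,
}

// map_err: keep the Ok value, replace the ParseIntError with our own error.
fn parse_port(text: &str) -> Result<u16, InputError> {
    text.trim().parse::<u16>().map_err(|_| InputError::NotANumber)
}

// ok_or: turn the Option returned by first() into a Result.
fn first_arg(args: &[String]) -> Result<&String, InputError> {
    args.first().ok_or(InputError::Empty)
}

fn main() {
    println!("{:?}", parse_port("8080"));
    println!("{:?}", parse_port("abc"));

    let no_args: Vec<String> = Vec::new();
    println!("{:?}", first_arg(&no_args));
}
```

In both cases the result is a Result whose error type is our own enum, which is what allows the ? operator to be chained after them in parse_html.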
In the new method of the Article struct, we modify the call to parse_html as follows:
let (title, content) = parse_html(&fetched_article)?;
The question mark (?) at the end of this line is crucial. If parse_html returns an Ok value, title and content are assigned accordingly. However, if it returns an Err, the function returns early and the error (in this case, a ValidationError) is automatically propagated to the caller.
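To make the behavior of ? more concrete, here is a sketch of the same function written twice, once with ? and once with an explicit match. The double functions are hypothetical examples; the match version is a rough equivalent, since ? can additionally convert the error type when a suitable From implementation exists.

```rust
use std::num::ParseIntError;

// With ?: an Err returns early from the function, an Ok is unwrapped.
fn double(text: &str) -> Result<i32, ParseIntError> {
    let number = text.parse::<i32>()?;
    Ok(number * 2)
}

// Roughly the same logic written out with match.
fn double_with_match(text: &str) -> Result<i32, ParseIntError> {
    let number = match text.parse::<i32>() {
        Ok(value) => value,
        Err(error) => return Err(error),
    };
    Ok(number * 2)
}

fn main() {
    println!("{:?}", double("21"));
    println!("{:?}", double_with_match("21"));
    println!("{}", double("x").is_err());
}
```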
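The last point to address is the call to get_article in the main function. Because Article::new is now async, get_article has to become an async fn that awaits it, so calling get_article returns a Future that must be awaited in main. This ensures that the asynchronous operation completes before proceeding with the rest of the program.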
match get_article(url).await {
    Ok(article) => {
        println!("Article created");
    }
    Err(e) => {
        print!("Error {:?}", e);
    }
}
Great! Let’s try this out:
// With invalid url
cargo run foo
// Output
Audiofy: Transform your favorites articles to a podcast 🚀
Error Invalid URL Format%
// With valid url
cargo run https://www.sadry.dev/articles/intro-to-rust-for-js-devs-part-2
// Output
Fetching URL https://www.sadry.dev/articles/intro-to-rust-for-js-devs-part-2 in progress...
✅ Article fetched!
Article created
6. Console messages: raw string
Let’s enhance our app by displaying some nice feedback messages to the user. First, let’s make the title stand out by using a raw string. A raw string is a string literal in which escape sequences are not processed, so you can include characters such as backslashes and quotes that would normally need to be escaped in a regular string.
let app_title = r#"
************************************************************************
* *
* Audiofy: Transform Your Favorite Articles into a Podcast 🚀 *
* *
************************************************************************
"#;
println!("{}",app_title);
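The banner above contains nothing that needs escaping, so the benefit of a raw string is easier to see on a value that does. A minimal sketch, with hypothetical helper names, comparing the raw and the escaped spelling of the same text:

```rust
// Raw string: everything between r#" and "# is taken literally.
fn path_raw() -> &'static str {
    r#"path: "C:\temp""#
}

// Regular string: quotes and backslashes must be escaped.
fn path_escaped() -> &'static str {
    "path: \"C:\\temp\""
}

fn main() {
    // Both literals produce exactly the same text.
    println!("{}", path_raw());
    println!("{}", path_raw() == path_escaped());

    // Escape sequences are not interpreted inside a raw string:
    // this is a backslash followed by the letter n, not a newline.
    let not_a_newline = r"a\nb";
    println!("{}", not_a_newline.len());
}
```

The # characters are only needed when the text itself contains double quotes; r"..." is enough otherwise.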
And we should also inform the user if no argument was supplied:
let args: Vec<String> = env::args().collect();
// 1 and not 0 because the first argument is the path to the executable
if args.len() <= 1 {
    println!("🚫 No arguments were supplied!");
}
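One possible variation, sketched under the assumption that we want to stop early when there is nothing to process: a hypothetical user_args helper strips the program path once, so the rest of the code only deals with what the user actually typed.

```rust
use std::env;

// Hypothetical helper: returns the user-supplied arguments, skipping the program path.
fn user_args(args: &[String]) -> &[String] {
    if args.is_empty() {
        args
    } else {
        &args[1..]
    }
}

fn main() {
    let args: Vec<String> = env::args().collect();
    let urls = user_args(&args);

    if urls.is_empty() {
        println!("🚫 No arguments were supplied!");
        return; // nothing to process, so stop here
    }

    for (index, url) in urls.iter().enumerate() {
        println!("Argument {}: {}", index, url);
    }
}
```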
Let’s remove the logs from fetch_url function and put them in the main call as follows:
mod article;
mod validation;
use article::{get_article, Describable};
use reqwest;
use std::env;
use tokio;
#[tokio::main]
async fn main() -> Result<(), reqwest::Error> {
    let app_title = r#"
************************************************************************
* *
* Audiofy: Transform Your Favorite Articles into a Podcast 🚀 *
* *
************************************************************************
"#;
    println!("{}", app_title);
    let args: Vec<String> = env::args().collect();
    // 1 and not 0 because the first argument is the path to the executable
    if args.len() <= 1 {
        println!("🚫 No arguments were supplied!");
    }
    for (index, arg) in args.iter().skip(1).enumerate() {
        let url = arg.to_string();
        match get_article(url).await {
            Ok(article) => {
                println!("✅ Article fetched!");
                println!("⏩ Title: {}", article.describe());
                println!("🎤 Audiofy...");
            }
            Err(e) => {
                print!("❌ Failed to process argument at index {}: {:?}", index, e);
            }
        }
    }
    Ok(())
}
7. Let’s try this out
With a valid url
cargo run https://www.sadry.dev/articles/intro-to-rust-for-js-devs-part-2
With a valid url that has no content
cargo run https://www.google.com
With invalid url
cargo run foo
With no arguments
cargo run
8. To be continued
Fantastic results! I'm very pleased with what we've achieved so far. We can now process URLs provided by users and successfully retrieve content. There's just one more step remaining: converting the article content into an audio file using the OpenAI API.