Intro to Rust for JS developers - Part 1

Rust has become increasingly popular in the JavaScript ecosystem, where it is valued for its performance and memory safety. You can spot it in tools such as Turbo, Biome JS, Volta, and LightningCSS. There is even an initiative to rewrite TypeScript in Rust!

As a JavaScript developer, I decided it was time to give it a shot, and the best way to learn is by doing. So in this series of tutorials, we're going to build a fun project together: a CLI named audiofy that turns online articles into podcasts, using the text-to-speech capabilities of the OpenAI API. For instance, once the tool is built, the following command in the terminal would yield an .mp3 file of the article, narrated as if by a professional podcaster:

$ audiofy http://some-interesting-blog-post.com

Pretty cool, isn't it? Let’s get started!

1. Init the audiofy project: Cargo and Crate

Installing Rust is straightforward and takes just one command, as outlined in the official documentation at rust-lang.org/tools/install. There is not much to add beyond installing the appropriate extension for your IDE for a better DX (rust-analyzer for VSCode).

Once it's installed (running rustc --version prints the installed version), you can create a new project by running the cargo new command. It sets up a new directory with the necessary structure for a Rust project and initializes it as a new Git repository.

cargo new audiofy

Cargo is Rust’s build system and package manager. It does a lot: creating a new project, downloading dependencies, compiling your project and its dependencies, running your code, making distributable packages, and uploading them to crates.io, among other things.

Crate is the term used for a library or program written in Rust. It is made of a module or a group of modules (think of it as a JavaScript library). The package manager, cargo, downloads and compiles your crate's dependencies. Each package contains a Cargo.toml "manifest" file at the root, which describes it.

Crates.io is the public registry where you download (and share) crates. It functions as a centralized hub for publishing and sharing open-source Rust libraries.

The Cargo.toml file at the root of the directory is known as the "manifest". It contains metadata such as the package name and version, as well as the dependencies. It is the equivalent of the package.json file found in a JavaScript project, and it is written in TOML (Tom's Obvious, Minimal Language).

main.rs: Inside the src directory, a main.rs file is created with a simple "Hello, world!" program as a starting point. This file is the main entry point for your Rust application.

Run your code to check the output:

$ cargo run

2. Display a welcome message in our CLI: Print, Dbg, and Panic

Just like console.log in JavaScript, Rust uses print macros for outputting text to the console. The println!() macro adds a new line automatically.

fn main() {
  println!("Hello, world!");
}

A macro allows for writing code that writes other code, known as metaprogramming. Macros are used for a variety of purposes, such as code generation or reducing boilerplate. The exclamation mark ! following the macro's name, as in println!, distinguishes a macro call from a regular function call.
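To make the idea of "code that writes code" concrete, here is a minimal sketch of a declarative macro defined with macro_rules!. The square! macro is a hypothetical name invented for this illustration; it is not part of the standard library and is not used elsewhere in audiofy.

```rust
// `square!` is a hypothetical macro used only for illustration.
// At compile time, `square!(4)` is expanded into the expression `4 * 4`.
macro_rules! square {
    ($x:expr) => {
        $x * $x
    };
}

fn main() {
    let area = square!(4);
    println!("area: {}", area);
}
```

The call site looks like a function call, but the ! signals that the compiler generates the final code before type checking it.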

Another handy macro for quickly and easily printing the value of expressions is dbg!. It's particularly useful because it allows you to see the value of a variable or expression without requiring much setup or altering the flow of your code significantly. Let's say you have a variable x and you want to see its value at a certain point in your program. You can use dbg! like this:

fn main() {
  let x = 5;
  let y = dbg!(x * 2) + 1;

  println!("y: {}", y);
}
// Output:
// [src/main.rs:3] x * 2 = 10
// y: 11

While println! can be used for similar purposes, dbg! provides additional context like the file and line number, doesn't require a format string, and returns the value of the expression, so it can be used inline as shown above.

If you need your code to crash with an error message when something goes wrong, you can use panic!. The idea is similar to throwing an Error in JavaScript.

fn main() {
    let x = 5;
    let y = dbg!(x * 2 + 12) + 1;
    panic!("There's been an error!");
    println!("y: {}", y); // Displays a warning for unreachable statement
}

// Output: 
// [src/main.rs:3] x * 2 + 12 = 22
// thread 'main' panicked at src/main.rs:4:5:
// There's been an error!

Let's add a friendly message to our CLI:

fn main() {
    let msg = "Audiofy: turn your favorite articles into a podcast! 🚀";
    println!("{}", msg);
}

Single quotes, double quotes, String and &str (string slice)

Unlike JavaScript, Rust differentiates between single and double quotes. Single quotes are used for character literals, which represent a single Unicode character and have the type char. For example, 'a' or '∆'.
On the other hand, double quotes are used for string literals, which are collections of characters. They represent a &str (a string slice), such as "Hello, world!". This distinction is crucial in Rust as it enforces correct data types and usage at compile time, preventing errors that could arise from inadvertently confusing characters with strings or vice versa.

Be mindful that the type of "Hello, world!" is not String, but &str (a string slice).

In Rust, both &str and String are used to work with text, but they have distinct characteristics and are used in different ways:

  • String is a growable, heap-allocated data structure (we'll explore the heap in an upcoming article). It's the more complex of the two types, and it's typically used when you need to own or modify your string data, such as adding characters, concatenating strings, or changing the contents in any other way.
  • &str is a string slice. It is an immutable view into a string and is essentially a reference to some UTF-8 data. It doesn't own the data it points to. The actual data is stored elsewhere, like in a String or a static string in the program.

You can turn a string slice &str into a String using to_string() or String::from().
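The conversions between the two types can be sketched as follows. The add_excitement function is a hypothetical helper written for this illustration only.

```rust
// `add_excitement` is a hypothetical helper used only for illustration.
// It borrows a string slice and returns a new, owned String.
fn add_excitement(text: &str) -> String {
    let mut owned = String::from(text); // &str -> String (allocates on the heap)
    owned.push('!'); // a String can grow; note the single quotes for the char
    owned
}

fn main() {
    let slice: &str = "Hello, world"; // a string literal is a &str
    let owned: String = slice.to_string(); // another way to get a String
    let view: &str = &owned; // borrowing a String gives back a &str
    println!("{}", add_excitement(view));
}
```

Taking &str as the parameter type is a common choice because callers can pass either a literal or a borrowed String.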

Constants, Variables, and Statics

In Rust, there are let and const keywords, but they behave differently from their JavaScript counterparts.

let is used to declare a variable, but with a key difference from JavaScript: variables are immutable by default. This means that once a variable is assigned a value using let, it cannot be altered. This immutability is a core aspect of Rust's design, prioritizing safety and predictability in code.

// This will cause a compile-time error
let x = 5;
x = 6; 

// To change the value you must explicitly declare it as mutable 
// using the mut keyword
let mut x = 5;
x = 6; 

const declares a constant: a value that is not just immutable but must also be known at compile time. Constants can never be mutable, and they must always include a type annotation.

const MAX_POINTS: u32 = 100000;
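As a minimal sketch of how const and let mut fit together, the example below derives one constant from another and then accumulates into a mutable variable. The constant names and the hours_to_seconds function are hypothetical and exist only for this illustration.

```rust
// Hypothetical constants for illustration; they are not part of audiofy.
const SECONDS_PER_MINUTE: u32 = 60;
// A const may be built from other constants, as long as the whole
// expression can be evaluated at compile time.
const SECONDS_PER_HOUR: u32 = 60 * SECONDS_PER_MINUTE;

fn hours_to_seconds(hours: u32) -> u32 {
    hours * SECONDS_PER_HOUR
}

fn main() {
    let mut total = 0; // `mut` is required because we reassign below
    total += hours_to_seconds(2);
    println!("total seconds: {}", total);
}
```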

Static variables are variables that have a static lifetime, meaning they are present for the entire duration of a program's execution. This is similar to global variables in other languages but with some unique characteristics due to Rust's focus on safety and concurrency.

Static variables are initialized from a constant expression, exist for the whole run of the program, and can be accessed from anywhere in it. They have a fixed memory location throughout the program's lifetime.

Static variables can be mutable or immutable, but mutable static variables require the use of unsafe code due to the potential for data races.

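// Immutable static variable
static LANGUAGE: &str = "Rust";

// Mutable static variable (unsafe):
static mut COUNTER: i32 = 0;

fn main() {
    // Any access to a mutable static must happen inside an unsafe block
    unsafe {
        COUNTER += 1;
    }
    println!("{}", LANGUAGE);
}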

Static variables are used in the following cases:

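  1. Constants with Complex Initialization: A static gives a value a single, fixed address, which suits large or shared data. Note that a plain static, like a const, needs an initializer that can be evaluated at compile time; a value that requires runtime computation is usually wrapped in a lazy-initialization type such as std::sync::OnceLock.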
  2. Global State or Configuration: They can be used to maintain global state or configuration settings that need to be accessible across different parts of a program.
  3. Long-Lived Values: For values that are needed throughout the program's lifetime and don't fit the constant model, static variables are appropriate.
  4. Performance Optimization: Sometimes, using static variables can optimize performance by avoiding repeated initialization of data.
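A plain static still needs a constant initializer, so a global that must be computed at runtime is typically initialized lazily. Here is a minimal sketch of one way to do that, assuming std::sync::OnceLock from the standard library (available since Rust 1.70). The BANNER static and the banner function are hypothetical names used only for this illustration.

```rust
use std::sync::OnceLock;

// Hypothetical global used only for illustration.
// The initializer `OnceLock::new()` is a constant expression, so it is
// allowed in a static; the String itself is built later, at runtime.
static BANNER: OnceLock<String> = OnceLock::new();

fn banner() -> &'static str {
    // The closure runs at most once; later calls reuse the stored value.
    BANNER.get_or_init(|| format!("{} v{}", "audiofy", 1))
}

fn main() {
    println!("{}", banner());
    println!("{}", banner()); // same value, no re-initialization
}
```

This keeps the global immutable from the outside and avoids unsafe code, at the cost of a small check on each access.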

3. Read CLI arguments: The type system

To build our CLI, we need to handle command-line arguments to get the URL of the article we want to turn into a podcast. We can do so by collecting the arguments with std::env::args, a function provided by Rust's standard library.

use std::env;

fn main() {
    let msg = "Audiofy: turn your favorite articles into a podcast! 🚀";
    println!("{}", msg);

    let args: Vec<String> = env::args().collect();

    // args[0] is the path to the program
    // Further elements are the passed command-line arguments
    println!("Command-line arguments: {:?}", args);
}

Let's break down and analyze this code:

The use keyword imports items from modules or crates into the current scope. The statement use std::env brings the env module from the standard library (std) into scope. It’s also possible to avoid typing env at each call by importing the functions you need directly, like this:

use std::env::args;
// Or use std::env::{args, var} to import multiple functions (e.g. args and var)

fn main() {
    // Accessing command-line arguments
    let cli_args: Vec<String> = args().collect(); // Rust uses the snake_case convention for variable names
    println!("Command-line arguments: {:?}", cli_args);
}

These lines of code bring us to one of the main features of the language: its type system.

Rust features a rich and robust type system that gives the developer fine-grained control over how memory is used and managed. Its types are broadly divided into two categories: scalar and compound.

  • Scalar types represent a single value, crucial for basic data operations and control structures.
  • Compound types, on the other hand, can group multiple values into one type.
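As a rough sketch of the two categories, the example below declares a few scalar values (integer, float, boolean, character) and groups them into the two built-in compound types, tuples and arrays. The swap and total functions are hypothetical helpers written for this illustration.

```rust
// Hypothetical helpers for illustration only.

// Takes a tuple (compound) made of two scalars and swaps its fields.
fn swap(pair: (i32, char)) -> (char, i32) {
    (pair.1, pair.0)
}

// Takes a fixed-size array (compound) of unsigned integers (scalar).
fn total(scores: [u32; 3]) -> u32 {
    scores.iter().sum()
}

fn main() {
    // Scalar types: each holds a single value.
    let count: i32 = -42;
    let ratio: f64 = 0.5;
    let ready: bool = true;
    let letter: char = 'R';

    // Compound types: each groups several values.
    let pair: (i32, char) = (count, letter);
    let scores: [u32; 3] = [10, 20, 30];

    println!("{} {} {:?} {}", ratio, ready, swap(pair), total(scores));
}
```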

Here is a list of available types, though we won't delve into each one. Instead, we'll gradually explore some of these types as we progress through this project:

[Image: Rust types, scalar and compound]

In JavaScript, we are used to using arrays to hold a collection of elements. JS arrays are flexible in size and can hold elements of different types. Arrays in Rust, on the other hand, have a fixed size known at compile time and hold elements of a single type. For example, an array of five 32-bit signed integers: let my_array: [i32; 5] = [1, 2, 3, 4, 5];

Rust's vectors (Vec<T>), meanwhile, are dynamic, heap-allocated collections that can grow or shrink at runtime, similar to JavaScript arrays in terms of flexibility, but with a uniform type for all elements and more control over memory layout and performance.
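The difference between a fixed-size array and a growable vector can be sketched like this. The build_queue function and the article names are hypothetical and only serve the illustration.

```rust
// Hypothetical helper for illustration: builds a vector at runtime.
fn build_queue() -> Vec<String> {
    let mut queue: Vec<String> = Vec::new(); // starts empty
    queue.push(String::from("first-article"));
    queue.push(String::from("second-article"));
    queue.push(String::from("third-article"));
    queue.pop(); // removes and returns the last element
    queue
}

fn main() {
    let fixed: [i32; 3] = [1, 2, 3]; // the length is part of the type
    let queue = build_queue(); // the length is only known at runtime
    println!("{} items in the array, {} in the vector", fixed.len(), queue.len());
}
```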

Another concept in Rust that goes hand in hand with collections is the iterator. Iterator is not a specific type but rather a trait.

A trait in Rust is similar to an interface in other programming languages: it defines a set of methods that types must implement.
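Here is a minimal sketch of a trait and two types that implement it. The Summary trait and the Article and Podcast structs are hypothetical names invented for this illustration.

```rust
// Hypothetical trait and types, invented for this illustration.
trait Summary {
    fn summarize(&self) -> String;
}

struct Article {
    title: String,
}

struct Podcast {
    minutes: u32,
}

impl Summary for Article {
    fn summarize(&self) -> String {
        format!("Article: {}", self.title)
    }
}

impl Summary for Podcast {
    fn summarize(&self) -> String {
        format!("Podcast ({} min)", self.minutes)
    }
}

// Accepts any type that implements the Summary trait.
fn announce(item: &impl Summary) -> String {
    format!("New! {}", item.summarize())
}

fn main() {
    let article = Article { title: String::from("Intro to Rust") };
    let podcast = Podcast { minutes: 12 };
    println!("{}", announce(&article));
    println!("{}", announce(&podcast));
}
```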

Most collection types in Rust, like vectors and arrays, can be turned into iterators using methods like iter() for immutable access or iter_mut() for mutable access.

In our example, let args: Vec<String> = env::args().collect();, the env::args() function returns an iterator of type std::env::Args, which yields values of type String. We then call the iterator's .collect() method to gather those values into a vector of String (Vec<String>); the type annotation on args tells collect which collection to build.
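The same iterator-then-collect pattern works on any collection. A minimal sketch, using a hypothetical lengths function written for this illustration:

```rust
// Hypothetical helper for illustration.
// iter() borrows each element, map() transforms it lazily, and
// collect() builds the collection named by the return type.
fn lengths(words: &[&str]) -> Vec<usize> {
    words.iter().map(|word| word.len()).collect()
}

fn main() {
    let words = ["turn", "articles", "into", "podcasts"];
    let sizes = lengths(&words);
    println!("{:?}", sizes);
}
```

Nothing is computed until collect() is called, which is why iterator chains in Rust are described as lazy.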

Understanding {:?}

The {:?} in the println! macro is a format specifier that prints a value using the Debug trait. The Debug trait is part of Rust's standard library and is implemented for most standard types. It's intended to produce a developer-facing representation of a value, which is particularly useful for debugging. (The {:#?} variant pretty-prints the same output across multiple lines.)

When you use {:?}, you're asking println! to use the Debug implementation of the value's type to format it. This is different from {}, which uses the Display trait meant for user-friendly output. In this case, args is a vector, and Vec does not implement the Display trait. If you try to print it with {}, you get a compile-time error.

[Image: Rust error when using println! with a vector]
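To see both traits side by side, here is a minimal sketch with a hypothetical Episode struct (invented for this illustration) that derives Debug and implements Display by hand.

```rust
use std::fmt;

// Hypothetical type for illustration.
// Debug is derived automatically; Display must be written by hand.
#[derive(Debug)]
struct Episode {
    title: String,
    minutes: u32,
}

impl fmt::Display for Episode {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "{} ({} min)", self.title, self.minutes)
    }
}

fn main() {
    let episode = Episode { title: String::from("Intro"), minutes: 5 };
    println!("{:?}", episode); // Debug output, for developers
    println!("{}", episode); // Display output, for users

    let args = vec![String::from("a"), String::from("b")];
    println!("{:?}", args); // Vec implements Debug, but not Display
}
```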

Looping over an iterator

We may need to loop over the arguments to process multiple links provided by the user. To do so, we call the .iter() method on the vector, and we chain .enumerate() to get the index of each value along with the value itself.

use std::env;

fn main() {
    let msg = "Audiofy: turn your favorite articles into a podcast! 🚀";
    println!("{}", msg);

    let args: Vec<String> = env::args().collect();

    for (index, arg) in args.iter().enumerate() {
        println!("- Arg at index {}: {}", index, arg);
    }
}

Let's give it a try by running:

$ cargo run arg1 arg2 arg3

The result is almost as expected. There is an extra argument at index 0:

- Arg at index 0: target/debug/audiofy
- Arg at index 1: arg1
- Arg at index 2: arg2
- Arg at index 3: arg3

The extra argument at index 0 given by env::args().collect() is typically the path to the executable that's running. This behavior is standard in many programming environments, where the first element of the array of command-line arguments (argv[0] in C and C++, for instance) is the program name or the full path to the executable.

We want to ensure we don't process this initial argument. To achieve this, Rust provides the convenient .skip(n) method, which allows us to skip a specified number of items in the iterator.

use std::env;

fn main() {
    let msg = "Audiofy: turn your favorite articles into a podcast! 🚀";
    println!("{}", msg);

    let args: Vec<String> = env::args().collect();

    for (index, arg) in args.iter().skip(1).enumerate() {
        println!("- Arg at index {}: {}", index, arg);
    }
}
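Because env::args() is itself an iterator, skip can also be applied before collecting. The sketch below shows that variant; the user_args function is a hypothetical helper written for this illustration, made generic so it can be exercised with any iterator of String.

```rust
use std::env;

// Hypothetical helper for illustration: drops the program path and
// collects whatever the user typed after it.
fn user_args<I: Iterator<Item = String>>(args: I) -> Vec<String> {
    args.skip(1).collect()
}

fn main() {
    // env::args() is already an iterator, so no intermediate Vec is needed.
    let links = user_args(env::args());
    for (index, link) in links.iter().enumerate() {
        println!("- Link {}: {}", index, link);
    }
}
```

Note that enumerate() counts from 0 after the skip, so the first user-supplied argument gets index 0.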

4. Validate URL format: add dependencies

At this point we can read the arguments, but we still need to check that each one is a well-formed URL. To do so, we use a crate called url.

As in a JavaScript project, there are two main ways to add a dependency with Cargo:

  1. Manual Addition: Directly edit the Cargo.toml file and include the desired crate under the [dependencies] section.
  2. Using the cargo add command: For a more automated approach.

Cargo provides several options for customized dependency management. You can add a crate specifically for the development environment or choose a specific version, among other options detailed in this documentation.

In our case, to add the url crate, simply run cargo add url. This updates the Cargo.toml file by adding url as a dependency. The next cargo build (or cargo run) then downloads and compiles the new crate along with any of its dependencies.

We aim to use the Url struct and its parse method. If you explore its documentation, you might be wondering:

  • What is a struct?
  • What is pub fn?
  • What is &str?
  • What is Result<Url, ParseError>?

A struct (short for "structure") is a data structure that groups related data together. It's similar to a record, structure, or class in other programming languages, but without built-in notions of inheritance. For example: struct Person { name: String, age: u32 }.

pub fn means the function is public, so it can be accessed from outside the module it's defined in. By default, functions are private.
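The two ideas combine naturally: a struct holds the data, and pub decides what is visible outside its module. A minimal sketch, where the people module, the Person fields, and the method names are all hypothetical and invented for this illustration:

```rust
// Hypothetical module and type, invented for this illustration.
mod people {
    pub struct Person {
        pub name: String,
        age: u32, // private field: only code inside `people` can read it
    }

    impl Person {
        // Public associated function, callable as people::Person::new(...)
        pub fn new(name: &str, age: u32) -> Person {
            Person { name: name.to_string(), age }
        }

        // Public method that relies on a private helper.
        pub fn describe(&self) -> String {
            format!("{} is {}", self.name, self.age_label())
        }

        // No `pub`: private, so it cannot be called from outside the module.
        fn age_label(&self) -> String {
            format!("{} years old", self.age)
        }
    }
}

fn main() {
    let person = people::Person::new("Ada", 36);
    println!("{}", person.describe());
    println!("{}", person.name); // allowed: the field is public
}
```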

&str represents a string slice in Rust, which is a view into a string. It's a reference to a part of a String (or another string slice), denoted by its start and end positions. It's a more efficient way to pass strings around, as it avoids copying the string data.

Result is an enum used for error handling. It represents either a success (Ok) or a failure (Err). Here's a simplified definition to illustrate its structure: enum Result<T, E> { Ok(T), Err(E) }. In Result<Url, ParseError>, Url is the type returned in the case of success, and ParseError is the type of error returned in case of failure. The Result enum has two helpful methods, is_ok() and is_err(), which check whether the Result is an Ok variant or an Err variant, respectively.
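Before applying this to URLs, here is a small sketch of Result using only the standard library: parsing a number from a string slice also returns a Result. The parse_minutes and describe functions are hypothetical helpers written for this illustration.

```rust
use std::num::ParseIntError;

// Hypothetical helpers for illustration; `parse` on a string slice is
// part of the standard library and returns a Result.
fn parse_minutes(input: &str) -> Result<u32, ParseIntError> {
    input.parse::<u32>()
}

fn describe(input: &str) -> String {
    // `match` forces us to handle both variants.
    match parse_minutes(input) {
        Ok(minutes) => format!("{} minutes", minutes),
        Err(error) => format!("not a number: {}", error),
    }
}

fn main() {
    println!("{}", describe("12"));
    println!("{}", describe("twelve"));
    println!("{}", parse_minutes("12").is_ok());
}
```

When only a yes/no answer is needed, is_ok() and is_err() are enough; match is the better fit when the success value or the error itself matters.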

Let's use is_ok to build a helper function is_valid_url.

use std::env::args;
use url::Url;

fn is_valid_url(url: &str) -> bool {
    let result = Url::parse(url);
    result.is_ok() // Without ; at end, the value is returned automatically. It's a shorthand for return result.is_ok();
}

fn main() {
    let args: Vec<String> = args().collect();
    println!("Audiofy: Transform your favorite articles into a podcast 🚀");

    for (index, arg) in args.iter().skip(1).enumerate() {
        if is_valid_url(arg) {
            println!("- Valid URL at index {}: {}", index, arg);
        } else {
            println!("- Invalid argument at index {}: {}", index, arg);
        }
    }
}

Let's try this code with two arguments, one well-formed URL and one invalid:

$ cargo run http://some-interesting-blog-post.com some-invalid-url

// output
Audiofy: Transform your favorite articles into a podcast 🚀
- Valid URL at index 0: http://some-interesting-blog-post.com
- Invalid argument at index 1: some-invalid-url

5. To be continued

To wrap up, our current implementation works as intended. However, it only checks that an argument is a well-formed URL; it cannot yet verify that the URL actually exists. Adding this capability will be the focus of the next article.