Make workshop text field searches ignore punctuation (hyphens, ampersands, periods, dashes, quotes) #737

Merged
maebeale merged 9 commits into main from copilot/adjust-workshop-search-hyphen-logic
Feb 15, 2026

Contributor

Copilot AI commented Jan 20, 2026

What is the goal of this PR and why is this important?

Improve punctuation-agnostic searching so more workshops are found

What works

Punctuation stripping (20 characters)

All of these are ignored in both DB values and search input: - & . " ' ‘ ’ “ ” / : + ! ? , ( )

Hyphen/space/joined variants (all combos match)

| Search ↓ / DB → | self-care | self care | selfcare |
| self-care | spaced ✅ | spaced ✅ | spaceless ✅ |
| self care | spaced ✅ | spaced ✅ | spaceless ✅ |
| selfcare | spaceless ✅ | spaceless ✅ | spaceless ✅ |
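
One way to read the matrix above: each side is normalized into a spaced form (punctuation becomes a space) and a spaceless form (punctuation and spaces removed), and a match in either form counts as a hit. The Ruby below is only a minimal sketch of that reading. The helper names `spaced_form`, `spaceless_form`, and `variant_match?` are hypothetical, the character list is a subset of the one above, and the PR itself does this work in SQL rather than in Ruby.

```ruby
# Hypothetical subset of the stripped characters (straight forms only).
STRIPPED_CHARS = ["-", "&", ".", "\"", "'", "/", ":", "+", "!", "?", ",", "(", ")"].freeze

# Spaced form: each punctuation character becomes a space, then whitespace is collapsed.
def spaced_form(text)
  STRIPPED_CHARS.reduce(text.downcase) { |acc, ch| acc.gsub(ch, " ") }.split.join(" ")
end

# Spaceless form: punctuation and spaces are both removed.
def spaceless_form(text)
  spaced_form(text).delete(" ")
end

# A substring match in either form counts as a hit (similar to LIKE '%...%').
def variant_match?(search, db_value)
  spaced_form(db_value).include?(spaced_form(search)) ||
    spaceless_form(db_value).include?(spaceless_form(search))
end

# Print the full search x DB matrix for the three variants.
variants = ["self-care", "self care", "selfcare"]
variants.product(variants).each do |search, db_value|
  puts format("%-10s vs %-10s => %s", search, db_value, variant_match?(search, db_value))
end
```

Under these assumptions all nine combinations match, which is what the table reports.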

Synonym expansion

| Search | DB | Match |
| arts and crafts | Arts & Crafts | ✅ |
| Arts & Crafts | arts and crafts | ✅ |
| arts + crafts | arts and crafts | ✅ |
| arts and crafts | arts and crafts | ✅ |
| arts crafts | Arts & Crafts | ✅ |
| Painting with Kids | Painting w/ Kids | ✅ |
| Painting w/ Kids | Painting with Kids | ✅ |
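
The synonym rows above are consistent with comparing two normalized forms of each side: a plain form with punctuation stripped, and an expanded form where `&` and `+` read as "and" and `w/` reads as "with". The sketch below illustrates that interpretation in plain Ruby. The names `plain_form`, `expanded_form`, and `synonym_match?` are hypothetical, and the actual SQL used by the PR is not shown in this conversation.

```ruby
# Hypothetical character class covering the straight punctuation forms.
PUNCT = /[-&."'\/:+!?,()]/

# Plain form: lowercase, punctuation to spaces, whitespace collapsed.
def plain_form(text)
  text.downcase.gsub(PUNCT, " ").split.join(" ")
end

# Expanded form: synonyms are written out as words before punctuation is stripped.
def expanded_form(text)
  words = text.downcase.gsub(/\bw\//, " with ").gsub(/[&+]/, " and ")
  plain_form(words)
end

# Match when either the expanded forms or the plain forms line up as substrings.
def synonym_match?(search, db_value)
  expanded_form(db_value).include?(expanded_form(search)) ||
    plain_form(db_value).include?(plain_form(search))
end

puts synonym_match?("arts and crafts", "Arts & Crafts")
puts synonym_match?("arts crafts", "Arts & Crafts")
puts synonym_match?("Painting w/ Kids", "Painting with Kids")
```

The plain-form fallback is what lets "arts crafts" find "Arts & Crafts" even though no synonym appears in the search.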

More examples

| Search | DB | Match |
| own art | Make your own - art kit | ✅ |
| Dr Workshop | Dr. Workshop | ✅ |
| Art Music | Art/Music | ✅ |
| Create | Create! | ✅ |
| Introduction The Basics | Introduction: The Basics | ✅ |

Search targets

  • Workshop title and full_name (native columns) ✅
  • All Rhino rich text content via action_text_rich_texts.plain_text_body
  • Case insensitive (MySQL utf8mb4_unicode_ci collation) ✅

What doesn't work

  • Word order: "crafts arts" won't find "Arts & Crafts" — LIKE requires sequential substring match
  • Searching for punctuation itself: Can't specifically find all workshops containing & since it gets stripped
  • Spaceless false positives (rare): DB "the art" → spaceless "theart" could match search "hear" since t[hear]t is a substring
  • No index usage: Full table scans due to REPLACE() wrapping + leading % wildcards (fine at 700-3k workshops)
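
The first and third limitations above can be reproduced with two small stand-in helpers. This is an illustrative sketch, not the PR's code: `spaced` and `squash` are hypothetical names, and they approximate the stripping by keeping only ASCII letters and digits.

```ruby
# Simplified stand-in for the spaced comparison: non-alphanumerics become single spaces.
def spaced(text)
  text.downcase.gsub(/[^a-z0-9]+/, " ").strip
end

# Simplified stand-in for the spaceless comparison: keep only letters and digits.
def squash(text)
  text.downcase.gsub(/[^a-z0-9]/, "")
end

# Word order: a substring match needs the words in sequence.
puts spaced("Arts & Crafts").include?(spaced("crafts arts"))  # => false

# Spaceless false positive: "theart" contains "hear".
puts squash("the art").include?(squash("hear"))               # => true
```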


Original AI description

This PR makes workshop searches ignore punctuation in both user input and workshop data, improving search usability. Users can now find workshops regardless of punctuation variations in titles or text content (e.g., "self-care" vs "selfcare", "Arts & Crafts" vs "Arts Crafts", "Dr. Workshop" vs "Dr Workshop", "The 'Best' Workshop" vs "The Best Workshop").

How did you approach the change?

  • PunctuationStrippable Concern: Created a reusable concern with helper methods to centralize punctuation stripping logic:
    • strip_punctuation_sql(field_name) - generates nested SQL REPLACE functions for database queries
    • strip_punctuation(text) - strips punctuation from Ruby strings
  • Workshop Model: Includes the PunctuationStrippable concern and modified the title scope to use the helper methods for stripping punctuation from both search input and database fields, with proper SQL injection protection via sanitize_sql_like
  • WorkshopSearchService: Includes the PunctuationStrippable concern and updated filter_by_query method to use the helper methods for custom SQL that strips punctuation from search queries and applies to all searchable workshop fields (title, full_name, objective, materials, setup, introduction, demonstration, etc., including Spanish variants)
  • Tests: Added comprehensive RSpec tests for punctuation-ignoring behavior in both title searches and full-text query searches with various punctuation types
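
For readers who want a picture of the two helpers named above, here is a minimal sketch in plain Ruby. It is an assumption about their shape, not the merged code: the constant name `CHARS`, the character order, and the use of `module_function` (so the sketch runs outside Rails) are all illustrative choices.

```ruby
module PunctuationStrippable
  # Eleven characters, following the list in this description (order assumed).
  CHARS = ["-", "&", ".", "—", "–", "\"", "“", "”", "'", "‘", "’"].freeze

  module_function

  # Strip the characters from a Ruby string, e.g. the user's search input.
  def strip_punctuation(text)
    CHARS.reduce(text.to_s) { |acc, ch| acc.gsub(ch, "") }
  end

  # Wrap a column reference in nested REPLACE() calls, one per character.
  # field_name must be a trusted column name, never user input.
  def strip_punctuation_sql(field_name)
    CHARS.reduce(field_name) do |sql, ch|
      literal = ch.gsub("'", "''") # double single quotes inside a SQL string literal
      "REPLACE(#{sql}, '#{literal}', '')"
    end
  end
end

puts PunctuationStrippable.strip_punctuation("Hello—Goodbye")
puts PunctuationStrippable.strip_punctuation_sql("workshops.title")
```

Note that plain removal turns "Arts & Crafts" into "Arts  Crafts" with two spaces, so matching it against "Arts Crafts" also needs some whitespace handling; this description does not say how that is done.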

Anything else to add?

The implementation ensures bidirectional matching for 11 types of punctuation:

  • Hyphens: -, --
  • Ampersands: &
  • Periods: .
  • Em dashes: —
  • En dashes: –
  • Double quotes: ", “, ”
  • Single quotes: ', ‘, ’

Examples:

  • Searching "Arts Crafts" finds workshops titled "Arts & Crafts"
  • Searching "Dr Workshop" finds workshops titled "Dr. Workshop"
  • Searching "HelloGoodbye" finds workshops titled "Hello—Goodbye" or "Hello–Goodbye"
  • Searching "The Best Workshop" finds workshops titled "The 'Best' Workshop"

All security checks passed with no vulnerabilities introduced. The refactoring into a reusable concern improves code maintainability and ensures consistency between the Workshop model and WorkshopSearchService.

Original prompt

This section details on the original issue you should resolve

<issue_title>Make workshop text field searches ignore hyphens</issue_title>
<issue_description>Need to be stripped out of user input, and also adjust the workshop data against which it's being compared.</issue_description>

<agent_instructions>adjust the workshops index workshop search service to ignore matching against hyphens, esp in the workshop title.

so, if a user inputs "hello -- goodbye" into the search field, it'll find workshops w title (or text, depending on search field) "hello goodbye"
and, if a user inputs "hello goodbye" into the search field, it'll find workshop w title (or text, depending on search field) "hello - goodbye"</agent_instructions>

Comments on the Issue (you are @copilot in this section)


@maebeale
Collaborator

@copilot would you start again?

Contributor Author

Copilot AI commented Jan 27, 2026

> @copilot would you start again?

I've implemented the hyphen-ignoring search functionality. The changes are in commit 5c9261a.

What was changed:

  • Modified the title scope in Workshop model to strip hyphens from both search input and database fields using SQL REPLACE
  • Updated filter_by_query in WorkshopSearchService to use custom SQL that strips hyphens from all searchable workshop fields
  • Added comprehensive tests for both title and query searches

How it works:

  • Searching "hello goodbye" now finds workshops titled "hello - goodbye"
  • Searching "hello -- goodbye" now finds workshops titled "hello goodbye"
  • Works bidirectionally for title searches and full-text query searches across all workshop text fields
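
A small Ruby illustration of the hyphen-only behaviour described above. The comment does not say how the whitespace left behind by a removed hyphen is handled, so this sketch assumes it is collapsed; `normalize_hyphens` is a hypothetical name.

```ruby
# Remove hyphens, then collapse the runs of spaces they leave behind (assumed step).
def normalize_hyphens(text)
  text.delete("-").squeeze(" ").strip
end

puts normalize_hyphens("hello -- goodbye")
puts normalize_hyphens("hello - goodbye")
```

Without the collapse step, "hello - goodbye" would become "hello  goodbye" with two spaces and would not contain "hello goodbye" as a substring.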

All security checks passed with no vulnerabilities found.

Copilot AI changed the title from "[WIP] Make workshop text field searches ignore hyphens" to "Make workshop text field searches ignore hyphens" on Jan 27, 2026
@maebeale
Collaborator

@copilot make it also correctly match titles with ampersand, periods, em dash, en dash, and quotes

Contributor Author

Copilot AI commented Feb 15, 2026

> @copilot make it also correctly match titles with ampersand, periods, em dash, en dash, and quotes

I've extended the search to ignore ampersands, periods, em dashes, en dashes, and quotes in addition to hyphens. Changes are in commits 5074556 and f92cb34.

What was added:

  • Ampersands: &
  • Periods: .
  • Em dashes: —
  • En dashes: –
  • Various quote types: ", ', ‘, ’, “, ”

Implementation:

  • Created a reusable PunctuationStrippable concern with helper methods
  • Updated both title and query searches to strip all these punctuation characters
  • Added comprehensive tests for each punctuation type

Examples:

  • "Arts Crafts" now finds "Arts & Crafts"
  • "Dr Workshop" now finds "Dr. Workshop"
  • "HelloGoodbye" now finds "Hello—Goodbye" or "Hello–Goodbye"
  • "The Best Workshop" now finds "The 'Best' Workshop"

All security checks passed with no vulnerabilities.

Copilot AI changed the title from "Make workshop text field searches ignore hyphens" to "Make workshop text field searches ignore punctuation (hyphens, ampersands, periods, dashes, quotes)" on Feb 15, 2026
@maebeale force-pushed the copilot/adjust-workshop-search-hyphen-logic branch from f92cb34 to 64cbab0 on February 15, 2026 17:14
) #{sort_order == "asc" ? "ASC" : "DESC"}
SQL
end
scope :title, ->(title) { where("workshops.title like ?", "%#{ title }%") }
Collaborator


remove dupe

  def filter_by_title
    return unless params[:title].present?
-   @workshops = @workshops.search("title:#{params[:title]}")
+   @workshops = @workshops.title(params[:title])
Collaborator


stop using SearchCop for title search so we can instead apply this stripped chars version of searching.
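
To make the suggestion concrete, a title scope built on the stripped comparison could hand `where` a SQL fragment plus one bound LIKE pattern. The sketch below is plain Ruby so it runs without Rails: `escape_like` is a stand-in for ActiveRecord's `sanitize_sql_like`, and `title_condition` and its three-character default are hypothetical, not the merged scope.

```ruby
# Stand-in for sanitize_sql_like: escape \, % and _ so they match literally in LIKE.
def escape_like(text)
  text.gsub(/[\\%_]/) { |ch| "\\#{ch}" }
end

# Returns [sql_fragment, bind_value]; the same characters are stripped on both sides.
def title_condition(title, chars: ["-", "&", "."])
  column_sql = chars.reduce("workshops.title") { |sql, ch| "REPLACE(#{sql}, '#{ch}', '')" }
  stripped = chars.reduce(title.to_s) { |acc, ch| acc.gsub(ch, "") }
  ["#{column_sql} LIKE ?", "%#{escape_like(stripped)}%"]
end

sql, bind = title_condition("self-care 100%")
puts sql
puts bind
```

Keeping the user's text in a bind value, with its wildcards escaped, is what prevents a search for "100%" from acting as a wildcard pattern.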

@maebeale marked this pull request as ready for review February 15, 2026 17:49
@maebeale merged commit 4868c31 into main Feb 15, 2026
4 of 5 checks passed
@maebeale deleted the copilot/adjust-workshop-search-hyphen-logic branch February 15, 2026 18:56

Development

Successfully merging this pull request may close these issues.

Make workshop text field searches ignore hyphens

2 participants